OWASP ASI09: Human-Agent Trust Exploitation

In ASI09, the agent manipulates a human into performing the final, audited action. Because logs attribute that action to the human, the agent's role is effectively invisible to forensics. Over-reliance on confident recommendations, fake explainability, and missing confirmation steps convert prompt-level manipulation into real-world harm.

Common ASI09 patterns include:

Insufficient explainability: Opaque reasoning forces users to trust outputs they cannot question.
Missing confirmation for sensitive actions: A single prompt becomes an irreversible transfer, deletion, or privilege change.
Emotional manipulation: Anthropomorphic cues persuade users to disclose secrets or perform unsafe actions.
Fake explainability: The agent fabricates plausible rationales that hide malicious logic.
Consent laundering via read-only previews: Preview-pane side effects exploit users' mental models of safe inspection.
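The "missing confirmation" pattern above can be made concrete with a small gate that refuses to run a sensitive action until a human explicitly approves it. This is a minimal sketch, not a reference implementation: `SENSITIVE_ACTIONS`, `ProposedAction`, and `execute_with_gate` are hypothetical names chosen for illustration, and the confirmation prompt deliberately shows the literal parameters instead of the agent's own rationale, on the assumption that the rationale may be fabricated.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical list of action names treated as high-impact (assumption).
SENSITIVE_ACTIONS = {"transfer_funds", "delete_records", "change_privileges"}


@dataclass
class ProposedAction:
    name: str
    params: dict
    rationale: str  # supplied by the agent, treated as untrusted text


def requires_confirmation(action: ProposedAction) -> bool:
    """Return True when the action is on the high-impact list."""
    return action.name in SENSITIVE_ACTIONS


def render_confirmation(action: ProposedAction) -> str:
    """Build the prompt from the raw action and parameters only.

    The agent's rationale is intentionally left out so a persuasive or
    fabricated explanation cannot colour the approval decision.
    """
    details = ", ".join(f"{k}={v!r}" for k, v in sorted(action.params.items()))
    return f"HIGH-IMPACT ACTION: {action.name}({details}). Approve?"


def execute_with_gate(
    action: ProposedAction,
    confirm: Callable[[str], bool],
    run: Callable[[ProposedAction], None],
) -> str:
    """Run the action, asking the human first when it is sensitive."""
    if requires_confirmation(action) and not confirm(render_confirmation(action)):
        return "rejected"
    run(action)
    return "executed"
```

In a real deployment, `confirm` would be wired to a UI dialog or an approval workflow; here it is just a callable so the gate can be exercised in isolation.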

Helpful-assistant Trojans, invoice copilot fraud, and weaponized explainability incidents illustrate ASI09 in production. Defenses combine explicit human-in-the-loop confirmation, immutable audit logs, content provenance, plan-divergence detection, and risk-aware UI cues that visibly differentiate high-impact actions.
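Two of the defenses listed above, audit logging and plan-divergence detection, can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses a SHA-256 hash chain, which makes tampering detectable but does not by itself make a log immutable, and it treats a "plan" as a simple list of step names. `AuditLog` and `plan_divergence` are hypothetical names, not part of any product or standard mentioned here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


def plan_divergence(approved_plan: list, executed_steps: list) -> list:
    """Return executed steps that were not in the plan the user approved."""
    approved = set(approved_plan)
    return [step for step in executed_steps if step not in approved]


log = AuditLog()
log.append({"actor": "agent", "step": "draft_invoice"})
log.append({"actor": "human", "step": "approve_payment"})
extra = plan_divergence(
    ["draft_invoice", "approve_payment"],
    ["draft_invoice", "change_payee", "approve_payment"],
)
```

Recording the agent's recommendation alongside the human's approval is what lets an investigator later see that the agent influenced the decision. The divergence check flags steps, such as the `change_payee` step in this made-up example, that the user never agreed to.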

How PointGuard AI Helps

PointGuard's MCP Security Gateway routes high-risk actions through approval workflows with step-up authentication, while the Agent Governance Mesh provides cryptographic audit and plan-divergence detection that exposes manipulated explanations before users approve them.

Learn More

OWASP Top 10 for Agentic Applications

UK AI Security Institute: Why Human-AI Relationships Need Socioaffective Alignment

NIST AI 100-2 Adversarial ML Taxonomy