OWASP ASI01: Agent Goal Hijack

Where prompt injection traditionally targets a chat model's output, goal hijack targets the agent's planning loop. A successful hijack can chain multiple tools and persist across many steps, so even a single manipulated input can drive far-reaching actions.

Goal hijack patterns include:

  • Direct instruction override: Hidden text that orders the agent to abandon its task.
  • Indirect injection: Malicious instructions embedded in retrieved documents or emails.
  • Reward hacking: Inputs that exploit the agent's success criteria to redirect behavior.
  • Memory poisoning: Long-term agent memory altered to bias future planning.
  • Tool-call manipulation: Crafted arguments that reroute the agent through unintended tools.

Effective defenses combine input-side inspection with action-side enforcement. Filtering prompts alone is insufficient because hijack instructions are often hidden in retrieved data rather than in the user's prompt. Enforcing authorization on every tool call limits the damage even when a hijack succeeds.
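As a rough illustration of action-side enforcement, the sketch below checks every proposed tool call against a per-task policy before it runs. The names ToolCall, TaskPolicy, and authorize, along with the send_email tool and its "to" argument, are hypothetical and chosen only for this example; a real deployment would tie the policy to its own identity and permission model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical shape)."""
    tool: str
    args: dict


@dataclass
class TaskPolicy:
    """Hypothetical per-task policy: which tools may run, and to whom email may go."""
    allowed_tools: set
    allowed_email_domains: set = field(default_factory=set)


def authorize(call: ToolCall, policy: TaskPolicy) -> bool:
    """Return True only if the call is permitted for the current task."""
    # Deny by default: any tool outside the task's allow-list is rejected.
    if call.tool not in policy.allowed_tools:
        return False
    # Argument-level check for an assumed "send_email" tool: the recipient's
    # domain must be on the allow-list, which limits exfiltration by a hijacked plan.
    if call.tool == "send_email":
        recipient = str(call.args.get("to", ""))
        domain = recipient.rpartition("@")[2].lower()
        return domain in policy.allowed_email_domains
    return True


policy = TaskPolicy(
    allowed_tools={"search_docs", "send_email"},
    allowed_email_domains={"example.com"},
)
allowed = authorize(ToolCall("send_email", {"to": "bob@example.com"}), policy)
blocked = authorize(ToolCall("send_email", {"to": "attacker@evil.test"}), policy)
```

Because the check runs on the action rather than on the text that produced it, the result is the same whether the hijack arrived through a prompt, a retrieved document, or a tool output.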

Detection becomes more reliable when the user's intent is captured at the user interface and compared with the actual tool calls during execution, so that divergence surfaces before downstream systems are affected.
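One simple way to picture this comparison is sketched below, assuming intent can be reduced to the set of tools a request is expected to need. The function first_divergence and the tool names are hypothetical; production systems would likely compare richer signals such as arguments, targets, and data flows.

```python
from typing import Iterable, Optional


def first_divergence(intended_tools: set, observed_calls: Iterable[str]) -> Optional[int]:
    """Return the index of the first tool call outside the captured intent, or None.

    intended_tools: tools the user's request is expected to need (assumed to be
    derived at the user interface when the task is submitted).
    observed_calls: tool names in the order the agent actually invokes them.
    """
    for index, tool in enumerate(observed_calls):
        if tool not in intended_tools:
            # Stop at the first unexpected call so it can be blocked or reviewed
            # before any later step reaches a downstream system.
            return index
    return None


# Intent captured for a "summarize this report" request (illustrative values).
intent = {"search_docs", "summarize"}
clean_run = first_divergence(intent, ["search_docs", "summarize"])
hijacked_run = first_divergence(intent, ["search_docs", "send_email", "summarize"])
```

Here clean_run is None, while hijacked_run is 1, the position of the unexpected send_email call.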

How PointGuard AI Helps

PointGuard's Agent Governance Mesh inspects each agent step against the originating intent, and AI Runtime Guardrails block injection payloads at the prompt and tool-argument layer before they reach the planning loop. The combined approach defeats hijack attempts whether the malicious instruction arrives via prompt, retrieved document, or tool output.
