Multi-Turn Attack (AI)

Single-prompt safety filters were designed to catch overtly malicious inputs, but they often miss attacks assembled from individually benign turns. Multi-turn techniques have become a primary method for jailbreaking modern LLMs and for steering autonomous agents into actions they would refuse in one shot, especially as context windows grow and sessions extend across many exchanges.

Common multi-turn attack patterns include:

Crescendo attacks: Gradually escalating turns where each step appears safe but the trajectory leads to unsafe output.
Split-prompt attacks: Malicious intent divided across multiple turns so no single message triggers a filter.
Context drop exploitation: Attackers wait for earlier safety refusals to fall out of the context window before retrying.
Role-playing escalation: Persistent persona prompts that compound across turns and erode safety boundaries.
Trust-building setup: Benign early turns establish authority or rapport that later, riskier turns exploit.

Multi-turn attacks are particularly dangerous in agentic settings, where each turn can include tool calls or memory writes that persist across sessions. Effective defenses evaluate the trajectory of a conversation rather than individual turns, and capture intent at the workflow level.
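As a rough illustration of trajectory-level evaluation, the sketch below assumes each turn has already been given a risk score between 0 and 1 by some upstream classifier. The function name trajectory_alert, the thresholds, and the scores are all hypothetical values chosen for the example, not settings from any real product.

```python
from typing import List


def trajectory_alert(
    scores: List[float],
    per_turn_limit: float = 0.8,
    rise_limit: float = 0.4,
    window: int = 4,
) -> bool:
    """Alert on a single risky turn or on a steady rise across recent turns."""
    # Case 1: a turn that a single-message filter would already catch.
    if any(score >= per_turn_limit for score in scores):
        return True

    # Case 2: no single turn is over the limit, so look at the trend instead.
    recent = scores[-window:]
    if len(recent) < window:
        return False  # not enough history to judge a trajectory

    never_decreases = all(b >= a for a, b in zip(recent, recent[1:]))
    total_rise = recent[-1] - recent[0]
    return never_decreases and total_rise >= rise_limit


# Crescendo-style session: every turn is below 0.8, but risk climbs steadily.
crescendo_scores = [0.1, 0.25, 0.4, 0.6]
# Flat session: moderately scored turns with no escalation.
flat_scores = [0.3, 0.3, 0.3, 0.3]

crescendo_alert = trajectory_alert(crescendo_scores)
flat_alert = trajectory_alert(flat_scores)
```

Here the crescendo-style session raises an alert even though no individual score reaches the per-turn limit, while the flat session does not. In an agentic setting the same idea could plausibly be applied to scores attached to tool calls or memory writes rather than to chat messages, though how those scores are produced is outside the scope of this sketch.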

How PointGuard AI Helps

PointGuard's Intelligent Guardrails analyze prompts and responses across turns rather than in isolation, surfacing crescendo and split-prompt patterns before they result in a policy violation. The Agent Governance Mesh extends the same trajectory analysis to tool-call sequences, catching multi-turn manipulations that single-message filters would miss.

Learn More

Russinovich et al., Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack (arXiv:2404.01833)

OWASP Top 10 for LLM Applications

NIST AI 100-2 Adversarial ML Taxonomy