Testing Prompt Injection "Defenses": XML vs. Markdown, System vs. User Prompts (schneidenba.ch)

0 points 1 day ago ago | visit original

🤖 AI Summary

A creator demonstrated a simple prompt-injection exploit hidden in benign-looking content and suggested two mitigations: use XML-style explicit tags instead of loose Markdown delimiters, and keep untrusted input out of system prompts (put it in user messages). The idea is XML’s explicit opening/closing tags reduce the chance an attacker can “escape” a content block and inject instructions, and system prompts should be reserved for model rules, not raw untrusted data. An independent test-suite put that advice to the test: 24 injection scenarios × 5 OpenAI models (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini) × two delimiters (Markdown vs XML) × two injection locations (system vs user) = 480 tests. Detection used both marker-based checks and an LLM-as-judge. Results showed little overall difference between Markdown and XML for blocking injections; effectiveness was highly model-dependent. Larger, newer models (gpt-5, gpt-4.1) were substantially more robust, while smaller models were much more susceptible. System-vs-user placement had only minor impact in these tests. Bottom line: switching markup alone isn’t a silver bullet — rigorous, model-specific evals are essential. Developers should minimize untrusted content, run automated evaluations, consider model choice, and where appropriate use external filtering or policy layers rather than relying on format changes alone.

Loading comments...

loading comments...