🤖 AI Summary
The tool's creators released a free “gibberifier” that inserts zero‑width Unicode characters between the characters of a text, so it looks identical to humans but breaks many LLM pipelines. By interleaving thousands of invisible code points, the tool inflates string length and token counts, which can make some models crash, ignore the content, produce confused outputs, or push users into rate limits. The creators recommend applying it selectively (up to ~500 characters) to key prompt sections; in informal tests it reportedly blocks graders like Flint AI and produces confused or ignored output in ChatGPT.
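The core trick can be sketched in a few lines. This is not the actual tool's code, just a minimal illustration of interleaving zero-width code points (the specific characters and the `per_gap` parameter are assumptions for the example):

```python
import random

# A few zero-width code points commonly used for this kind of obfuscation.
ZERO_WIDTH = [
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
]

def gibberify(text: str, per_gap: int = 3) -> str:
    """Insert `per_gap` random zero-width characters after each visible char."""
    out = []
    for ch in text:
        out.append(ch)
        out.extend(random.choices(ZERO_WIDTH, k=per_gap))
    return "".join(out)

s = gibberify("hello")
print(len("hello"), len(s))  # 5 vs 20 code points, yet both render as "hello"
```

The inflated string survives copy-paste and renders identically, but a tokenizer sees many extra code points, multiplying token counts and distorting what the model receives.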
The technique matters because it’s a simple, widely accessible form of input‑level obfuscation and adversarial filtering that exploits how tokenizers and preprocessing treat Unicode. Practical uses include anti‑plagiarism, thwarting web scrapers or automated grading, and deliberate token-wasting, but it also raises ethical concerns and an arms race: robust pipelines can strip or normalize zero‑width characters, or use prefilters to neutralize the attack, while naive systems remain vulnerable. For ML practitioners this highlights the need for explicit Unicode normalization, sanitizer steps, and token accounting policies to prevent both accidental failures and malicious exploitation.