The First Long Context Guardrail (huggingface.co)

🤖 AI Summary
General Analysis today released GA Guard, a family of open-weight moderation models (Guard, Guard Thinking, Lite) purpose-built as long-context guardrails for LMs. The core is a 4.0B-parameter causal LM (3.6B non-embedding params, 36 layers, grouped-query attention with 32 query / 8 KV heads) trained via full fine-tuning and supporting a 262,144-token context. It emits structured special tokens for each of seven policy categories — Illicit Activities, Hate & Abuse, PII & IP, Prompt Security, Sexual Content, Misinformation, and Violence & Self-Harm — so integrations can reliably parse "<policy_violation>" vs "<policy_not_violation>". Important usage note: don't use pipeline("text-generation"), which strips special tokens; decode with skip_special_tokens=False instead (see the sketch below).

GA Guard's practical impact is its robustness at scale: across standard moderation suites, an adversarial jailbreak benchmark, and a new GA Long-Context Bench (up to 256k tokens), the models consistently lead cloud guardrails and even outperform GPT-5 when used as a guardrail. Variants score roughly 0.89–0.93 F1 on these tests (Guard Thinking hitting ~0.933 on the jailbreak bench), while many cloud services collapse on long contexts (AWS and Azure produce very high false positives or near-zero F1). The combination of long context, structured token outputs, compact size, and strong adversarial performance makes GA Guard a practical, deployable moderation layer for developers needing scalable, auditable safety checks.
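A minimal sketch of calling the model with Hugging Face transformers and parsing its structured output. The repo id, chat formatting, and the assumption that each category appears on its own line are illustrative guesses; only the <policy_violation> / <policy_not_violation> tokens and the skip_special_tokens=False requirement come from the summary above.

```python
# Sketch: run GA Guard and parse its per-category policy tags.
# MODEL_ID and the prompt format are assumptions -- check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "GeneralAnalysis/GA-Guard"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def classify(text: str) -> dict:
    """Return {category: violated?} for the given input text."""
    messages = [{"role": "user", "content": text}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)

    # Keep special tokens: this is why pipeline("text-generation") is
    # unsuitable -- it strips the policy tags the integration must parse.
    decoded = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False
    )

    # Naive parse, assuming one category per line followed by its tag.
    results = {}
    for line in decoded.splitlines():
        if "<policy_violation>" in line:
            results[line.split("<policy_violation>")[0].strip()] = True
        elif "<policy_not_violation>" in line:
            results[line.split("<policy_not_violation>")[0].strip()] = False
    return results
```

The key design point is the last step: because the verdicts are encoded as special tokens, any wrapper that silently drops them (as the default text-generation pipeline does) will make every input look compliant.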