Qwen3Guard: Real-Time Safety for Your Token Stream (huggingface.co)

🤖 AI Summary
Qwen3Guard is a new family of safety-moderation models built on Qwen3 and trained on a labeled prompt/response safety dataset of 1.19M examples. The release includes 0.6B, 4B, and 8B sizes in two purpose-built variants: Qwen3Guard-Gen, a generative, instruction-following classifier for prompt and response moderation, and Qwen3Guard-Stream, which adds a token-level classification head to monitor safety in real time during incremental generation. Qwen3Guard-Gen supports 119 languages, reports state-of-the-art results on English, Chinese, and multilingual safety benchmarks, and emits structured labels (Safety: Safe/Controversial/Unsafe) along with fine-grained harm categories and refusal signals.

Qwen3Guard is designed for low-friction deployment: it runs on the Hugging Face transformers stack (transformers>=4.51 recommended), with example model IDs such as Qwen/Qwen3Guard-Gen-8B and Qwen/Qwen3Guard-Gen-4B, and it can be served with sglang or vLLM or exposed through an OpenAI-compatible API; the published examples show how to parse the Safety, Categories, and Refusal fields. The taxonomy covers nine harm categories: Violent, Non-violent Illegal Acts, Sexual Content, PII, Suicide & Self-Harm, Unethical Acts, Politically Sensitive, Copyright, and Jailbreak.

The token-level Stream variant is especially significant for the AI safety community: it enables early, low-latency interventions during generation and configurable severity thresholds, which makes it useful for real-time content filtering, dynamic refusal behavior, and cross-lingual moderation pipelines.
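To make the deployment story concrete, here is a minimal sketch of moderating a prompt/response pair with Qwen3Guard-Gen through the standard transformers chat workflow. The model ID and the Safety/Categories/Refusal output structure come from the release notes above; the example messages, the add_generation_prompt usage, and the parsing regexes are illustrative assumptions rather than the official model-card code.

```python
# Hedged sketch: classify a user prompt and model response with Qwen3Guard-Gen.
# Assumes the guard model's chat template turns the conversation into its
# moderation prompt; the regexes below assume the structured
# "Safety:" / "Categories:" / "Refusal:" output described in the summary.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3Guard-Gen-4B"  # 0.6B and 8B variants are also released
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Conversation to moderate: the user prompt plus the assistant's (optional) response.
messages = [
    {"role": "user", "content": "How do I pick a lock?"},
    {"role": "assistant", "content": "I can't help with that."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
verdict = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Expected shape of the verdict text, e.g.:
#   Safety: Safe
#   Categories: None
#   Refusal: Yes
safety = re.search(r"Safety:\s*(Safe|Controversial|Unsafe)", verdict)
categories = re.search(r"Categories:\s*(.+)", verdict)
refusal = re.search(r"Refusal:\s*(Yes|No)", verdict)
print("Safety:", safety.group(1) if safety else "Unknown")
print("Categories:", categories.group(1).strip() if categories else "None")
print("Refusal:", refusal.group(1) if refusal else "N/A")
```

The same parsing applies if the model is served behind sglang or vLLM and queried through an OpenAI-compatible endpoint, since the classifier's verdict arrives as generated text rather than dedicated API fields.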