🤖 AI Summary
OpenAI published a system-card addendum for GPT-5.1 Instant (gpt-5.1-instant) and GPT-5.1 Thinking (gpt-5.1-thinking), reporting new baseline safety metrics and expanded evaluations. The company introduced tougher "Production Benchmarks" and added two sensitive-conversation categories—mental health (including signs of psychosis/manic delusions) and emotional reliance—to pre-deployment reviews. Scores are reported with the primary metric not_unsafe (higher is better). Notable results: gpt-5.1-thinking substantially improved on the mental-health benchmark (0.684 vs 0.466 for gpt-5-thinking), while gpt-5.1-instant slightly regressed on mental health relative to the most recent instant checkpoint (0.883 vs 0.944) but still outperformed much older instant versions, including the August instant builds. gpt-5.1-thinking shows slight regressions in the harassment, hate, and sexual-content categories, while gpt-5.1-instant exceeds its predecessor on jailbreak robustness (StrongReject not_unsafe of 0.976).
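As an illustration of how to read these scores (this is a generic sketch, not OpenAI's grading pipeline), not_unsafe can be understood as the fraction of graded model responses that a safety grader judged not unsafe:

```python
def not_unsafe(grades):
    """Fraction of responses graded as not unsafe (higher is better).

    `grades` is a list of booleans: True if the grader judged a
    response safe, False if it was flagged unsafe. Illustrative
    reading of the metric only, not OpenAI's implementation.
    """
    if not grades:
        raise ValueError("no graded responses")
    return sum(grades) / len(grades)


# e.g. 3 of 4 sampled responses judged safe -> 0.75
print(not_unsafe([True, True, False, True]))
```

Under this reading, a score of 0.684 means roughly 68.4% of responses on that benchmark were judged not unsafe.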
The addendum also covers vision, jailbreak, and online A/B measurements: image-input evaluations are generally on par, though gpt-5.1-thinking regressed on self-harm image prompts (0.936 vs 0.976). OpenAI emphasizes that the Production Benchmarks are intentionally hard and that online prevalence estimates carry wide confidence intervals; it will continue monitoring and, where needed, routing queries. Under the Preparedness Framework, GPT-5.1 retains GPT-5's "High" risk treatment for biological/chemical misuse, while evaluations show no plausible High risk for cybersecurity or AI self-improvement.
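The caveat about wide confidence intervals follows from basic binomial statistics: when unsafe responses are rare, even large online samples give proportion estimates with large relative uncertainty. A minimal sketch using the standard Wilson score interval (a generic statistical method, not OpenAI's estimator):

```python
import math


def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion.

    Generic statistical sketch illustrating why prevalence
    estimates of rare events are wide; not OpenAI's method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 3 flagged responses in 1000 samples: the interval spans
# roughly 0.001 to 0.009, several times the point estimate.
lo, hi = wilson_interval(3, 1000)
print(lo, hi)
```

At a point estimate of 0.003, the interval's width exceeds the estimate itself, which is why such online prevalence figures warrant the hedging OpenAI applies.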