🤖 AI Summary
OpenAI published a system card addendum for GPT-5-Codex, a GPT-5 variant fine-tuned for agentic coding inside Codex (CLI, IDE extension, cloud, GitHub, and the ChatGPT mobile app). Trained with reinforcement learning on real-world coding tasks, it is designed to produce human-style code, follow precise instructions, run tests iteratively, and operate tools. The release emphasizes broad availability (local and cloud) and is positioned to accelerate developer workflows while maintaining tighter control than a general-purpose chat model.
The addendum’s core contribution is a detailed safety suite: model-level training to refuse harmful/malicious coding (malware refusals scored 1.0 on a curated “golden set”), specialized prompt-injection training with a Codex-focused evaluation (98% of attacks ignored), and jailbreak robustness measured by StrongReject (~0.99 across categories). Product mitigations include per-interface sandboxing (containerized cloud instances with network disabled by default; Seatbelt on macOS, seccomp+landlock on Linux locally), workspace-limited file edits, and configurable per-project network allowlists. OpenAI flags GPT-5-Codex as “High risk” for biological/chemical misuse (as with GPT-5), but says it does not meet the threshold for high cyber capability despite improved CTF performance. For AI/ML practitioners and deployers, this means a more capable coding agent with demonstrably stronger safety controls, but one that still requires cautious configuration (network allowlists, review of outputs) and monitoring to manage dual-use and prompt-injection risks.
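To make the “network disabled by default, per-project allowlist” idea concrete, here is a minimal sketch of how a deployer scripting around a coding agent might gate proposed shell commands. All names (`ALLOWED_HOSTS`, `run_guarded`) are illustrative assumptions, not Codex APIs, and the string check is only a stand-in for the real enforcement the addendum describes at the sandbox layer (containers, Seatbelt, seccomp+landlock).

```python
# Illustrative deployer-side guard (hypothetical, not the Codex implementation):
# refuse to run an agent-proposed command that references hosts outside a
# per-project network allowlist, and keep execution workspace-limited.
import re
import shlex
import subprocess
from urllib.parse import urlparse

# Hypothetical per-project allowlist; an empty set means "network disabled".
ALLOWED_HOSTS: set[str] = {"pypi.org", "files.pythonhosted.org"}

URL_PATTERN = re.compile(r"https?://[^\s\"']+")


def hosts_referenced(command: str) -> set[str]:
    """Best-effort extraction of hostnames mentioned in a shell command."""
    return {urlparse(url).hostname or "" for url in URL_PATTERN.findall(command)}


def run_guarded(command: str, cwd: str) -> subprocess.CompletedProcess:
    """Run an agent-proposed command only if every referenced host is allowlisted."""
    blocked = hosts_referenced(command) - ALLOWED_HOSTS
    if blocked:
        raise PermissionError(f"Non-allowlisted hosts referenced: {sorted(blocked)}")
    # Workspace-limited execution: run inside the project directory only.
    return subprocess.run(shlex.split(command), cwd=cwd, capture_output=True, text=True)


if __name__ == "__main__":
    result = run_guarded(
        "pip install --index-url https://pypi.org/simple requests", cwd="."
    )
    print(result.returncode)
```

A wrapper like this only illustrates the allowlist pattern; a command can reach the network without naming a URL, which is why the addendum relies on OS-level sandboxing rather than inspection of the agent’s output.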