Anthropic donates Petri open-source alignment tool (www.anthropic.com)

🤖 AI Summary
Anthropic has donated Petri, its open-source alignment tool, to Meridian Labs, an AI-evaluation nonprofit. Launched in October 2025, Petri is a toolkit for running alignment tests on large language models, probing behaviors such as deception and compliance with harmful requests. The handover is intended to keep Petri independent of any single AI lab, so that its evaluations are seen as neutral and credible across the industry. The tool has already been used by organizations such as the UK's AI Security Institute to assess models' tendency to sabotage AI research.

Petri is also being updated to version 3.0, which introduces several key enhancements: greater flexibility by decoupling the auditor and target models for customized testing; increased realism, reducing models' awareness that they are being evaluated; and deeper analysis through integration with Bloom, another alignment tool, for detailed behavioral assessments. On this broader, more credible footing, Petri 3.0 is positioned to be a vital resource for the AI/ML community as demand grows for reliable evaluations of AI behavior.