Show HN: Zenith: sota harness for normal models to beat Fable on FrontierSWE (ii.inc)

🤖 AI Summary
Zenith, a new agent harness, raises OpenAI's GPT-5.5 from fifth to first place on the FrontierSWE (Frontier Software Engineering) benchmark, overtaking Claude Fable. Rather than swapping in a larger model, Zenith wraps the existing model in a tailored control loop. This matters because access to the highest-performing models is increasingly restricted by export controls and limited previews, so developers have to optimize the parts of the system they control. Zenith rests on two ideas: it keeps planning and testing continuously throughout long-running tasks, and it uses a component called Meta-Zenith to automate building the harness for new tasks based on real-world feedback. By organizing task execution around managed sub-tasks, checks, and validation, Zenith achieves a higher ranking and better performance metrics, which suggests that harness improvements can yield better outcomes than relying on model size alone. The approach is especially relevant where access to advanced models is limited, since it makes stronger performance achievable with less powerful systems.
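
To make the "control loop around an existing model" idea concrete, here is a minimal sketch of a plan, run, and validate loop. It is not Zenith's implementation, which the summary does not describe in detail. Every name in it (`SubTask`, `run_harness`, `stub_model`) is hypothetical, and the model is a stand-in function rather than a real API call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SubTask:
    """One planned unit of work plus the check that validates its output (hypothetical)."""
    description: str
    check: Callable[[str], bool]
    attempts: int = 0
    result: Optional[str] = None


def run_harness(
    plan: List[SubTask],
    model: Callable[[str, str], str],
    max_attempts: int = 3,
) -> List[SubTask]:
    """Run each sub-task, validating after every attempt and retrying with feedback."""
    for task in plan:
        feedback = ""
        while task.attempts < max_attempts:
            task.attempts += 1
            output = model(task.description, feedback)
            if task.check(output):
                task.result = output  # validated, move on to the next sub-task
                break
            # Validation failed: pass the failure back so the next attempt can adjust.
            feedback = f"attempt {task.attempts} failed validation: {output!r}"
    return plan


def stub_model(prompt: str, feedback: str) -> str:
    # Assumption: a toy stand-in for a model call that only succeeds once it has feedback.
    return "patched" if feedback else "draft"


plan = [SubTask("fix failing test", check=lambda out: out == "patched")]
results = run_harness(plan, stub_model)
```

In this sketch the harness, not the model, owns the retry budget and the validation step, which is the kind of component a developer can tune without changing the underlying model. A sub-task that never passes its check simply keeps `result` as `None`.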