We have Mythos at Home: GLM 5.2 beats Claude in our Cyber Benchmarks (semgrep.dev)

0 points 2 hours ago

🤖 AI Summary

In a recent experiment, Zhipu AI's open-weight model GLM 5.2 outperformed Claude Code at detecting vulnerabilities on an IDOR benchmark, scoring 39% F1 against Claude's 32%. GLM 5.2 achieved this on a straightforward prompt, without the complex harness used by more advanced models. At roughly $0.17 per vulnerability found, the results make a strong case for open-weight models in security work, particularly where data sensitivity and budget are priorities.

GLM 5.2 is a Mixture-of-Experts model with 750 billion parameters, of which 40 billion are active during inference. Its 1-million-token context window matters for bugs like IDORs, which require reasoning across multiple files. The result suggests that open-weight models can now challenge established frontier models, and it gives security teams integrating AI into vulnerability detection a useful point of comparison.
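For readers unfamiliar with the two headline metrics, here is a minimal sketch of how an F1 score and a cost-per-finding figure are typically computed. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the benchmark's data, and the benchmark's exact scoring method is not described in the summary.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, written in terms of raw counts."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0  # nothing reported and nothing to find
    return 2 * true_positives / denominator


def cost_per_finding(total_cost_usd: float, true_positives: int) -> float:
    """Total spend divided by the number of real vulnerabilities found."""
    if true_positives == 0:
        return float("inf")  # money spent, nothing confirmed
    return total_cost_usd / true_positives


# Hypothetical counts, NOT taken from the benchmark:
# 30 real findings, 50 false alarms, 40 missed vulnerabilities.
example_f1 = f1_score(30, 50, 40)
example_cost = cost_per_finding(5.10, 30)
print(f"F1 = {example_f1:.2f}, cost per finding = ${example_cost:.2f}")
```

Because F1 penalizes both false alarms and missed bugs, a score in the 30 to 40% range can still reflect a useful detector when each finding is cheap to produce and triage.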
