GLM-4.7-Flash: 30B MoE model achieves 59.2% on SWE-bench, runs on 24GB GPUs (curateclick.com)

🤖 AI Summary
Z.AI has announced GLM-4.7-Flash, a 30-billion-parameter Mixture of Experts (MoE) model optimized for local deployment, with only 3 billion parameters active per token. Designed for consumer hardware, it runs efficiently on 24GB GPUs such as the RTX 3090/4090 and on M-series Apple chips, with reported throughput of 60-80 tokens per second. The model scored 59.2% on SWE-bench, well ahead of comparable local models like Qwen3-30B and GPT-OSS-20B, which scored 22% and 34% respectively. This positions GLM-4.7-Flash as a cost-effective option for coding assistance, UI generation, and tool calling, with a free API tier adding further accessibility.

The significance of GLM-4.7-Flash lies in filling a gap in the local AI landscape: strong coding performance combined with extended context handling via its Multi-Latent Attention (MLA) mechanism, which lets the model process longer inputs while staying memory-efficient. That makes it suitable for a range of applications from creative writing to agentic workflows. With production-ready capabilities and cross-platform support, GLM-4.7-Flash is a practical tool for developers who want high-quality AI output without the costs typically associated with larger models.
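Because the model targets local deployment, it would typically sit behind a standard OpenAI-compatible inference server (for example vLLM or llama.cpp's server). The sketch below shows what a coding-assistance request might look like in that setup; the base URL, port, API key handling, and the "glm-4.7-flash" model identifier are assumptions for illustration, not values confirmed by the announcement.

```python
# Minimal sketch: querying a locally served GLM-4.7-Flash through an
# OpenAI-compatible endpoint. Endpoint URL and model name are assumed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local inference server
    api_key="not-needed-for-local",       # local servers usually ignore the key
)

response = client.chat.completions.create(
    model="glm-4.7-flash",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=512,
    temperature=0.2,
)

print(response.choices[0].message.content)
```

The same client code would also work against the hosted free API tier by swapping the base URL and supplying a real key, which is part of what makes OpenAI-compatible serving convenient for local-first models.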