NIST's CAISI Evaluation of DeepSeek V4 Pro finds it to be on par with GPT-5 (www.nist.gov)

🤖 AI Summary
In April 2026, the Center for AI Standards and Innovation (CAISI) published an evaluation of the DeepSeek V4 Pro AI model, finding that its performance lags behind leading models by approximately eight months. The assessment gives the AI/ML community a reference point for how newer open-weight models compare with established counterparts such as OpenAI's GPT-5.5. CAISI used a methodology inspired by Item Response Theory (IRT) to analyze DeepSeek V4 Pro across 16 benchmarks spanning five domains, including cybersecurity, software engineering, and natural sciences. While DeepSeek V4 Pro appears competitive on some self-reported benchmarks, it underperformed on critical reasoning tasks and agent-based evaluations, particularly in cyber and software engineering contexts. The evaluation also found partial evidence of cost-effectiveness: DeepSeek V4 Pro can be less expensive than other options, such as GPT-5.4 mini, on various benchmark tasks. The analysis can help prospective users choose models on the basis of efficiency and performance, and it underscores the continuing competition within the AI landscape.