DeepSeek V4–almost on the frontier, a fraction of the price (simonwillison.net)

🤖 AI Summary
Chinese AI lab DeepSeek has released two preview models in its anticipated V4 series: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both offer a 1-million-token context window and use a Mixture of Experts (MoE) architecture: V4-Pro has 1.6 trillion total parameters (49 billion active), while V4-Flash has 284 billion total parameters (13 billion active). V4-Pro is now the largest publicly available model, surpassing Kimi K2.6 and GLM-5.1, and DeepSeek keeps its pricing edge: V4-Flash is the cheapest model among its contemporaries.

The release matters not just for its scale but for its computational efficiency, with significantly lower FLOPs and smaller cache sizes than the predecessor V3.2. DeepSeek claims that V4-Pro, while competitive with leading models like GPT-5.4, trails state-of-the-art models by approximately 3 to 6 months. With input costs as low as $0.14 per million tokens, the models are far more accessible to developers and researchers, potentially reshaping the economics of AI/ML applications and fostering broader experimentation within the community.
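The parameter counts and pricing above lend themselves to some quick back-of-the-envelope math. The sketch below uses only the figures quoted in the summary; the helper functions are illustrative and are not part of any DeepSeek API.

```python
# Back-of-the-envelope math from the figures quoted in the summary.
# Prices and parameter counts come from the article; the helpers are
# illustrative only, not any real DeepSeek interface.

def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters activated per token in an MoE model."""
    return active_params / total_params

def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of `tokens` input tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * price_per_million

# V4-Pro: 1.6T total, 49B active -> roughly 3% of weights used per token
pro_frac = active_fraction(1.6e12, 49e9)

# V4-Flash: 284B total, 13B active -> roughly 4.6% active
flash_frac = active_fraction(284e9, 13e9)

# Filling the entire 1M-token context at $0.14 per million input tokens
full_context_cost = input_cost_usd(1_000_000, 0.14)

print(f"{pro_frac:.1%}  {flash_frac:.1%}  ${full_context_cost:.2f}")
```

The small active fractions are the point of the MoE design: per-token compute scales with the active parameters, not the total, which is how a 1.6-trillion-parameter model can stay cheap to serve.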