🤖 AI Summary
Modular announced a $250M Series C (bringing total funding to $380M) at a $1.6B valuation to scale what it calls AI's "unified compute layer": effectively a hypervisor for AI that removes vendor lock-in and unifies heterogeneous hardware and runtimes. The round, led by Thomas Tull's US Innovative Technology fund with DFJ Growth and existing backers (GV, General Catalyst, Greylock), underscores strong commercial traction: tens of thousands of downloads per month, 24K+ GitHub stars, trillions of tokens served daily, and reported partner wins with latency reductions of up to 70% and cost reductions of up to 80%.
Technically, Modular provides an enterprise-grade inference stack that replaces vendor-specific runtimes (CUDA/ROCm) with a unified low-level layer and three core components: Mammoth (a Kubernetes-native control plane and router for multi-model serving, disaggregated compute/cache, and prefill-aware routing), MAX (high-performance GenAI serving with speculative decoding, operator-level fusions, OpenAI-compatible endpoints, and seamless CPU/GPU/PyTorch support), and Mojo (a kernel-focused systems language combining Pythonic ergonomics with C++/Rust-like performance and safety). The latest release, 25.6, claims 20–50% gains over vLLM and SGLang on next-gen silicon (NVIDIA B200, AMD MI355) and extends support to Apple GPUs and upcoming ASICs. The funding will accelerate cloud/edge deployment and broader hardware support, making portable, efficient AI inference more practical for enterprises and cloud providers.
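Because MAX exposes OpenAI-compatible endpoints, existing OpenAI-style client code can be pointed at a MAX server with only a base-URL change. A minimal sketch of such a request, using only the Python standard library; the base URL, port, and model name here are illustrative placeholders, not Modular defaults:

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style /v1/chat/completions request.

    base_url and model are assumptions for illustration; point them at
    whatever host/model your MAX deployment actually serves.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local deployment; no request is actually sent here.
req = build_chat_request("http://localhost:8000", "my-model", "Hello")
# urllib.request.urlopen(req)  # would dispatch to a running MAX server
```

The same payload shape works with the official `openai` Python client by setting its `base_url`, which is the point of OpenAI compatibility: no application-level code changes when swapping serving backends.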