🤖 AI Summary
Moonshot AI today open-sourced Kimi-K2-Thinking, a 1-trillion-parameter Mixture-of-Experts (MoE) “thinking” model designed to interleave explicit chain-of-thought reasoning with native tool calls. The release claims new state-of-the-art results on long-horizon, agentic benchmarks (HLE, BrowseComp and others) by sustaining coherent, goal-directed behavior across 200–300 sequential tool invocations—far beyond prior models that degrade after ~30–50 steps. K2’s evaluation (all reported under native INT4 precision) shows strong wins on agentic search and multi-step reasoning tasks while remaining competitive on general and coding benchmarks.
Technically, K2 is engineered for long-context, low-cost deployment: it supports a 256k-token context window, uses Quantization-Aware Training to enable lossless INT4 weight-only inference with ~2× generation speedup, and employs a large MoE backbone (1T total params, ~32B activated per token, 384 experts with 8 routed per token, 61 layers, 64 attention heads, SwiGLU activation, MLA attention, 160k vocabulary). It also includes a “heavy” parallel rollout mode and context management to handle overflowing tool outputs. Checkpoints ship in the compressed-tensors format and can be converted to higher precisions; recommended inference engines include vLLM, SGLang, and KTransformers. The model, repo, and weights are released under a Modified MIT license and are accessible via platform.moonshot.ai, with API compatibility for OpenAI/Anthropic-style tool calling.
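Because the API is advertised as OpenAI-compatible, existing agent stacks should be able to drive K2's tool calling with little modification. Below is a minimal sketch of a single tool-calling request; the base URL, model identifier, environment variable, and tool definition are illustrative assumptions rather than details confirmed by the release.

```python
# Minimal sketch of calling Kimi-K2-Thinking through an OpenAI-compatible endpoint.
# Assumptions (not confirmed by the article): the base URL, the model id
# "kimi-k2-thinking", the env var name, and the "web_search" tool are illustrative.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],   # hypothetical env var
    base_url="https://api.moonshot.ai/v1",    # assumed endpoint
)

# One OpenAI-style tool definition; the model is said to interleave explicit
# reasoning with calls like this across hundreds of sequential invocations.
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",                 # hypothetical tool
        "description": "Search the web and return top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-thinking",                 # assumed model id
    messages=[{"role": "user",
               "content": "Find recent INT4 QAT results for MoE models."}],
    tools=tools,
)

# If the model chooses to call the tool, its JSON arguments would be executed by
# the caller and returned as a "tool" message, looping until a final answer.
print(resp.choices[0].message)
```

In an agentic loop, the caller repeats this request after appending each tool result, which is the pattern the long-horizon benchmarks in the release exercise.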