🤖 AI Summary
Today’s announcement introduces two KAT-series code models: KAT-Dev-32B, an open-source 32B-parameter model (62.4% resolved on SWE-Bench Verified, 5th among open-source models), and KAT-Coder, the high-performance variant (73.4% on SWE-Bench Verified). Both are trained through a staged pipeline: Mid-Train to bootstrap agentic capabilities; supervised fine-tuning (SFT) over eight curated task types and scenarios; a novel Reinforcement Fine-Tuning (RFT) phase guided by human-annotated “teacher trajectories”; and a large-scale agentic reinforcement learning stage. The team releases KAT-Dev-32B on Hugging Face and provides access to KAT-Coder via StreamLake/Claude Code.
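To make the stage ordering concrete, here is a minimal Python sketch of that pipeline. Every function name and body below is a hypothetical placeholder standing in for a full training stage, not the team's released code.

```python
# A minimal sketch of the staged pipeline described above. The stage
# order follows the announcement; the function bodies are placeholders.

def mid_train(model, agentic_corpus):       # bootstrap agentic capabilities
    return model

def sft(model, task_mixture):               # 8 curated task types and scenarios
    return model

def rft(model, teacher_trajectories):       # human-annotated teacher trajectories
    return model

def agentic_rl(model, environments):        # large-scale RL in executable envs
    return model

def train(model, data):
    model = mid_train(model, data["agentic"])
    model = sft(model, data["sft_tasks"])
    model = rft(model, data["teacher_trajectories"])
    model = agentic_rl(model, data["environments"])
    return model
```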
Technically, the work tackles three core scaling challenges for agentic RL with three corresponding innovations: prefix caching for log-prob computation, entropy-based trajectory-tree pruning to prioritize high-information nodes under a fixed compute budget, and SeamlessFlow, a decoupled trajectory-tree management layer with tag-driven scheduling that exploits heterogeneous clusters. The team also shifts RL rewards from absolute scores to relative discrepancies from ground truth, supervises rollouts in real time (terminating bad rollouts early), and trains on executable environments, unit tests, and enterprise-grade codebases to increase realism. Notable emergent behaviors after RL include a 32% reduction in multi-turn interactions and parallel tool calling, which the authors attribute to efficiency and branching dynamics in the trajectory tree. The release advances scalable, production-oriented code intelligence and gives researchers a practical path to studying agentic RL at scale.
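To illustrate the first innovation, here is a minimal sketch of prefix caching for log-prob computation. The `logp_next(prefix, token)` scoring call is a hypothetical stand-in for a model forward pass; the point is that sibling trajectories in a tree share long prefixes, so each shared prefix is scored only once.

```python
import math

_prefix_cache = {}   # token-tuple prefix -> cumulative log-prob

def cached_logprob(tokens, logp_next):
    """Total log-prob of `tokens`, reusing the longest already-scored prefix."""
    # Walk back to the longest prefix we have already scored.
    k = len(tokens)
    while k > 0 and tuple(tokens[:k]) not in _prefix_cache:
        k -= 1
    total = _prefix_cache.get(tuple(tokens[:k]), 0.0)
    # Score only the uncached suffix, caching each new prefix along the way.
    for i in range(k, len(tokens)):
        total += logp_next(tokens[:i], tokens[i])   # log p(token_i | prefix)
        _prefix_cache[tuple(tokens[:i + 1])] = total
    return total

# Toy usage with a uniform "model" over a 50-token vocabulary: the two
# sibling trajectories share the [1, 2, 3] prefix, which is scored once.
uniform = lambda prefix, token: math.log(1 / 50)
cached_logprob([1, 2, 3, 4], uniform)   # scores 4 tokens
cached_logprob([1, 2, 3, 5], uniform)   # scores only token 5
```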
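The entropy-based pruning idea can be sketched as a best-first traversal that spends a fixed node budget on the highest-entropy (most informative) parts of the trajectory tree; low-entropy, near-deterministic branches are effectively pruned because they never get selected. The `Node` structure and budget semantics below are illustrative assumptions, not the paper's implementation.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    probs: list                          # action distribution at this node (illustrative)
    children: list = field(default_factory=list)

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def expand_under_budget(root, budget):
    """Best-first traversal: repeatedly take the highest-entropy frontier
    node until the fixed compute budget is spent."""
    tie = itertools.count()              # heap tie-breaker (Nodes are not orderable)
    frontier = [(-entropy(root.probs), next(tie), root)]
    selected = []
    while frontier and len(selected) < budget:
        _, _, node = heapq.heappop(frontier)
        selected.append(node)
        for child in node.children:
            heapq.heappush(frontier, (-entropy(child.probs), next(tie), child))
    return selected

# Toy usage: the deterministic leaf (entropy 0) is never expanded.
root = Node(probs=[0.5, 0.5],
            children=[Node(probs=[1.0]), Node(probs=[0.25, 0.75])])
print([round(entropy(n.probs), 3) for n in expand_under_budget(root, budget=2)])
```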
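The reward and supervision changes can likewise be sketched in a few lines. The Gym-style `env.reset()`/`env.step()` interface, the running-score heuristic, and the `floor` threshold below are all hypothetical stand-ins for whatever signal the team actually monitors.

```python
def relative_reward(outcome_score, ground_truth_score):
    # Reward the discrepancy from the ground-truth outcome, not the raw score.
    return -abs(outcome_score - ground_truth_score)

def supervised_rollout(env, policy, max_steps=50, floor=-5.0):
    """Roll out `policy` in `env`, aborting once a running quality estimate
    drops below `floor` so compute isn't wasted finishing bad rollouts."""
    state, trajectory, running = env.reset(), [], 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, step_score, done = env.step(action)
        trajectory.append((state, action))
        running += step_score
        if running < floor:              # real-time supervision: early termination
            return trajectory, False
        if done:
            break
    return trajectory, True

# Toy environment: every step scores -1.0, so the rollout is aborted on
# step 6 when the running estimate (-6.0) crosses the -5.0 floor.
class ToyEnv:
    def reset(self): return 0
    def step(self, action): return 0, -1.0, False

traj, finished = supervised_rollout(ToyEnv(), policy=lambda s: 0)
print(len(traj), finished)               # -> 6 False
```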