LongCat-Flash-Thinking, LLM from Meituan (China's Equivalent of Uber Eats) (github.com)

🤖 AI Summary
Meituan released LongCat-Flash-Thinking, a 560B-parameter large reasoning model (LRM) built as a Mixture-of-Experts (MoE) that dynamically activates only 18.6B–31.3B parameters per request (≈27B on average). The model was trained with a two-phase pipeline: a Long-CoT cold-start (curriculum learning plus SFT on reasoning/agentic data) followed by large-scale RL using DORA, Meituan's distributed asynchronous RL framework. DORA's innovations (elastic colocation, multi-version asynchronous pipelines, and an adapted GRPO algorithm) target stable, efficient rollouts across tens of thousands of accelerators. Critically, LongCat uses a domain-parallel training scheme that decouples optimization for STEM, coding, and agentic tasks, then fuses the domain experts into a near-Pareto-optimal model to avoid the instability of mixed-domain RL.

Technically notable are the model's formal- and agentic-reasoning features: an expert-iteration pipeline for automated theorem proving (statement formalization, iterative proof synthesis, and consistency filtering) and a dual-path approach to identify queries that genuinely require tool use, with synthesized tool-API trajectories (including MCP servers and simulated tools).

Benchmarks show strong math and theorem-proving performance (MATH500 99.2, MiniF2F pass@1 67.6 / pass@32 81.6) with mixed results on general QA/alignment (MMLU 82.6). Weights are released under the MIT License (inference defaults: temp=1.0, top_p=0.95). Meituan cautions about known LLM limitations and urges careful safety, fairness, and legal evaluation before deployment.
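The 18.6B–31.3B activation range mentioned above implies the router can spend its per-token expert budget unevenly rather than always activating a fixed top-k of real experts. One way to realize that is to let the router choose among both real experts and parameter-free "no-op" experts; the toy sketch below only illustrates why the activated-parameter count then varies per token. The expert counts, top-k, and per-expert sizes are made-up numbers, not the model's actual configuration or routing code.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8          # real experts per MoE layer (toy number)
NUM_NOOP = 4             # parameter-free "no-op" experts (toy number)
TOP_K = 4                # expert slots selected per token
PARAMS_PER_EXPERT = 2.0  # billions of parameters per real expert, illustrative only

def activated_params_per_token(router_logits: np.ndarray) -> float:
    """Top-k routing over real + no-op experts.

    Tokens that route some of their k slots to no-op experts activate fewer
    real-expert parameters, so the activated budget varies per token.
    """
    topk = np.argsort(router_logits)[-TOP_K:]
    real_hits = np.sum(topk < NUM_EXPERTS)  # indices >= NUM_EXPERTS are no-ops
    return float(real_hits) * PARAMS_PER_EXPERT

logits = rng.normal(size=(1000, NUM_EXPERTS + NUM_NOOP))  # router logits for 1000 tokens
budgets = np.array([activated_params_per_token(l) for l in logits])
print(f"activated params per token: min={budgets.min():.1f}B, "
      f"mean={budgets.mean():.1f}B, max={budgets.max():.1f}B")
```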
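The summary does not spell out how DORA's GRPO variant is adapted, but the core of vanilla GRPO is a group-relative advantage: sample several responses per prompt, then normalize each response's reward against the group's mean and standard deviation instead of a learned value baseline. The sketch below shows only that baseline computation; names are illustrative, not from Meituan's codebase.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Vanilla GRPO-style advantages for one prompt's group of sampled responses.

    rewards: shape (G,), one scalar reward per response in the group.
    Returns: shape (G,), rewards normalized by the group mean and std,
    used in place of a learned value-function baseline.
    """
    mean = rewards.mean()
    std = rewards.std()
    return (rewards - mean) / (std + eps)

# Example: 4 responses sampled for the same prompt, scored 1.0/0.0 by a verifier.
rewards = np.array([1.0, 0.0, 1.0, 0.0])
print(group_relative_advantages(rewards))  # positive for correct, negative for wrong
```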
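The released weights ship with sampling defaults of temperature 1.0 and top_p 0.95. A minimal Hugging Face transformers sketch using those defaults is below; the repository ID is an assumption (confirm it against the official release), and a 560B MoE will not fit on a single GPU, so treat this as an illustration of the stated sampling settings rather than a deployment recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo ID -- confirm against the official GitHub/Hugging Face release.
MODEL_ID = "meituan-longcat/LongCat-Flash-Thinking"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",  # a 560B MoE realistically needs a multi-node serving stack
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling defaults stated in the release: temperature=1.0, top_p=0.95.
outputs = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=1.0, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```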