TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models (arxiv.org)

🤖 AI Summary
Researchers have released TabPFN-2.5, the next-generation tabular foundation model, which scales to datasets with up to 50,000 rows and 2,000 features (a roughly 20x increase in data cells over TabPFNv2).

On the industry-standard TabArena benchmark (which includes datasets with up to 100k training points), TabPFN-2.5 sets a new state of the art: it substantially outperforms tuned tree-based baselines and matches the accuracy of AutoGluon 1.4, a multi-hour tuned ensemble that even bundles TabPFNv2. Against default XGBoost it achieves striking win rates: 100% on small-to-medium classification tasks (≤10k rows, ≤500 features), 87% on larger datasets (up to 100k rows, 2k features), and 85% for regression.

Crucially for production, the release includes a distillation engine that compiles TabPFN-2.5 into compact MLPs or tree ensembles, retaining most of its accuracy while delivering orders-of-magnitude lower latency and easy deployment. This combination of high off-the-shelf accuracy, expanded scalability, and practical distillation makes TabPFN-2.5 likely to shift tabular ML workflows: less reliance on heavy model tuning and long AutoML runs, lower inference costs for latency-sensitive applications, and a stronger foundation for the many downstream tools and applications already built on the TabPFN ecosystem.
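The teacher-student pattern behind such a distillation engine can be illustrated in miniature. The sketch below (plain Python, no TabPFN dependency) fits a compact logistic "student" to the soft probabilities of a stand-in "teacher" function by gradient descent on cross-entropy; the release's engine applies the same idea at scale, with TabPFN-2.5 as the teacher and MLPs or tree ensembles as students. Every name here is illustrative: `teacher_proba` is a placeholder, not the TabPFN API.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical "teacher": a stand-in for TabPFN-2.5's predicted class-1
# probabilities. The real model is far richer; this is only a placeholder.
def teacher_proba(x):
    return sigmoid(2.0 * x - 1.0)

# Query the teacher for soft labels on a grid of unlabeled inputs.
xs = [i / 10.0 for i in range(-20, 21)]
soft = [teacher_proba(x) for x in xs]

# Compact "student": a one-feature logistic model fit to the teacher's
# soft probabilities via gradient descent on the cross-entropy loss.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(10000):
    gw = gb = 0.0
    for x, p in zip(xs, soft):
        err = sigmoid(w * x + b) - p  # gradient of cross-entropy w.r.t. the logit
        gw += err * x
        gb += err
    w -= lr * gw / len(xs)
    b -= lr * gb / len(xs)

# The distilled student now mimics the teacher at a fraction of the cost.
max_gap = max(abs(sigmoid(w * x + b) - teacher_proba(x)) for x in xs)
print(round(w, 2), round(b, 2), max_gap < 0.01)  # prints: 2.0 -1.0 True
```

Because the student here shares the teacher's model family, it recovers the teacher exactly; in the real setting the student is a smaller, cheaper model and retains only most of the teacher's accuracy, trading a little fidelity for orders-of-magnitude lower latency.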