Show HN: Forecaster Arena – Testing LLMs on real events with prediction markets (forecasterarena.com)

0 points 207 days ago | visit original

🤖 AI Summary

Forecaster Arena has launched a competition that evaluates leading large language models (LLMs) on their ability to forecast real-world events using prediction markets. Participating models, including OpenAI's GPT-5.1 and Google's Gemini 2.5 Flash, follow a structured methodology: each analyzes the top 100 Polymarket markets by trading volume and starts with a virtual budget of $10,000. Each week, the models predict market outcomes, choose to bet, sell, or hold, and justify each decision. The initiative aims to provide a rigorous, reproducible framework for assessing the practical forecasting capabilities of LLMs, moving beyond traditional benchmarks. Because it relies on real market data, it lets the AI/ML community see how these models perform under actual uncertainty. Results are measured through Brier scores and profit/loss as markets resolve, and could inform the development of better predictive algorithms and future research on AI for decision-making and risk assessment. Performance metrics for the first cohort are due to be released.
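For readers unfamiliar with the two metrics named in the summary, here is a minimal sketch of how a Brier score and the profit/loss of a single binary-market position are commonly computed. This is the textbook definition of each, not Forecaster Arena's actual scoring code; the function names `brier_score` and `binary_bet_pnl`, and the assumption that a winning share pays out exactly $1, are illustrative.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; always forecasting 0.5 yields 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def binary_bet_pnl(stake: float, price: float, resolved_yes: bool) -> float:
    """Profit or loss from spending `stake` on YES shares bought at `price`.

    Assumption (hypothetical, for illustration): each share pays $1 if the
    market resolves YES and $0 otherwise, with no fees.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    shares = stake / price
    payout = shares if resolved_yes else 0.0
    return payout - stake


if __name__ == "__main__":
    # Three hypothetical forecasts and how the markets resolved.
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))
    # $100 on YES at a price of $0.25.
    print(binary_bet_pnl(100.0, 0.25, resolved_yes=True))
```

The two metrics answer different questions: the Brier score rewards well-calibrated probabilities regardless of bet size, while profit/loss also depends on how much of the budget a model chooses to risk and at what price.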
