OpenArena: LLMs Battling in Autonomous Sports Betting Markets (arena.openserv.ai)

0 points 268 days ago

🤖 AI Summary

OpenArena is a live testbed in which a roster of LLMs (Claude Sonnet 4.5, GPT-5, Gemini 2.5 Flash, Grok 4, deepseek-chat-v3.1 and others) autonomously trade positions in sports betting markets such as Ravens vs. Dolphins. The dashboard shows each model's on-chain wallet, its executed trades (size and price), and its running PnL. Gemini leads with +25.60 PnL after large buys (e.g., 172.7 @ 0.41); GPT-5 and Grok sit around +12 with a mix of mid-sized positions; Claude is at +11.97; and deepseek shows a notable loss (-29.90) after taking on large exposure. Trades are logged individually at specific prices (e.g., 35.66 @ 0.53, 20.39 @ 0.17), which reveals each model's real-time pricing, position sizing, and risk-taking behavior.

For AI/ML, OpenArena matters as an empirical arena for evaluating LLMs as decision-making agents under uncertainty and strategic interaction. It exposes model calibration, emergent strategies such as market-making or trend-chasing, and failure modes such as large concentrated losses. Technical implications include benchmarking reinforcement learning and RLHF approaches on sequential decision tasks, studying multi-agent dynamics and adversarial exploitation, and building safety and risk controls (position limits, explainability, monitoring). The dataset of timestamped trades, wallet addresses, and PnL offers researchers a rare, reproducible signal for studying economic reasoning, hedging behavior, and how model architecture or training correlates with market performance.
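To make the "size @ price" notation and the running PnL figures more concrete, here is a minimal sketch of how PnL could be computed from a list of fills. It assumes a binary prediction-market convention (each share pays 1.0 if the outcome occurs and 0.0 otherwise, no fees), and it assumes "size" counts shares; the summary confirms neither, so treat the `Fill` type and the helper functions as hypothetical illustrations, not OpenArena's actual accounting. Its outputs are not meant to reproduce the dashboard's numbers.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    """One executed buy: `size` outcome shares bought at `price` per share.

    Assumption (not stated by OpenArena): each share pays 1.0 if the outcome
    occurs and 0.0 otherwise, so prices lie in [0, 1] and read as implied
    probabilities. Fees are ignored.
    """
    size: float
    price: float


def cost_basis(fills: list[Fill]) -> float:
    """Total amount spent acquiring the position."""
    return sum(f.size * f.price for f in fills)


def mark_to_market_pnl(fills: list[Fill], current_price: float) -> float:
    """Unrealized PnL if the whole position were valued at `current_price`."""
    shares = sum(f.size for f in fills)
    return shares * current_price - cost_basis(fills)


def settled_pnl(fills: list[Fill], outcome_happened: bool) -> float:
    """Realized PnL once the market resolves (payout 1.0 or 0.0 per share)."""
    payout = 1.0 if outcome_happened else 0.0
    shares = sum(f.size for f in fills)
    return shares * payout - cost_basis(fills)


# Hypothetical position: reuses two "size @ price" figures quoted in the
# summary, but treating them as one model's bet on a single outcome is made up.
fills = [Fill(172.7, 0.41), Fill(35.66, 0.53)]

print(round(cost_basis(fills), 2))                # → 89.71
print(round(mark_to_market_pnl(fills, 0.50), 2))  # → 14.47
print(round(settled_pnl(fills, True), 2))         # → 118.65
print(round(settled_pnl(fills, False), 2))        # → -89.71
```

Separating mark-to-market from settled PnL matters for a live dashboard: a running PnL figure on an unresolved market is necessarily a valuation at the current price, and it can swing sharply before the game settles, which is one way concentrated exposure turns into large losses.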
