MTG Bench: Testing how well LLMs can play Magic (mtgautodeck.com)

0 points 3 hours ago

🤖 AI Summary

A recent project called MTG Bench tests how well large language models (LLMs) can play Magic: The Gathering (MTG), using a custom Model Context Protocol (MCP) server. The results are mixed. Some models, such as Gemini 3.5, can carry out complex in-game actions like scrying and tutoring, but many struggle to keep their moves legal, often erring by calling tools prematurely or by trying to undo actions. Efficiency is also a concern: the GPT-5.5 run averaged over 11,000 tokens per turn.

The project matters to the AI/ML community because it probes whether LLMs can understand and apply intricate game rules without a dedicated rules engine. The findings suggest that LLMs are better at checking whether a move is legal than at accurately simulating an entire turn. As cheaper and more capable LLMs arrive, the project could scale to large numbers of simulated games, which could support automated deck optimization and give players new insight into how their decks perform.
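The summary does not describe MTG Bench's internals. As a rough illustration of the two quantities it reports (tokens spent per turn, and moves that are illegal or taken back), here is a hypothetical Python sketch of how a benchmark harness might aggregate per-turn logs. `TurnLog`, `summarize`, and all field names are invented for this example and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class TurnLog:
    """Hypothetical per-turn record for one model's turn (not MTG Bench's format)."""
    tokens: int       # prompt + completion tokens spent on the turn
    actions: int      # tool calls the model issued during the turn
    illegal: int = 0  # calls judged illegal (e.g. made too early)
    undos: int = 0    # calls that tried to take back an earlier action


def summarize(turns: list[TurnLog]) -> dict[str, float]:
    """Aggregate a game's turn logs into two benchmark-style metrics."""
    if not turns:
        raise ValueError("need at least one turn")
    total_tokens = sum(t.tokens for t in turns)
    total_actions = sum(t.actions for t in turns)
    bad_actions = sum(t.illegal + t.undos for t in turns)
    return {
        # mean token cost per turn across the game
        "avg_tokens_per_turn": total_tokens / len(turns),
        # share of tool calls that were illegal or attempted undos
        "illegal_or_undo_rate": bad_actions / total_actions if total_actions else 0.0,
    }
```

With three made-up turns, `summarize([TurnLog(12_000, 6, illegal=1), TurnLog(10_000, 4, undos=1), TurnLog(11_000, 5)])` averages 11,000 tokens per turn, and 2 of its 15 tool calls count against legality. Keeping illegal calls and undos as separate counters lets a harness report them separately later, even though this sketch lumps them into one rate.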
