Can frontier LLMs solve CAD tasks? (kerrickstaley.com)

🤖 AI Summary
Recent trials tested whether frontier large language models (LLMs) such as GPT-5.3-Codex, Gemini 3.1 Pro, and Claude Opus 4.6 can solve a computer-aided design (CAD) task: designing a 3D-printable wall mount for a bike pump. Claude Opus 4.6 achieved a 100% pass rate, producing designs that held the pump, while many other models struggled and often generated unusable or overly simplistic designs. The results point to a gap between LLM training, which is based predominantly on text, and the embodied experience that the spatial reasoning in CAD demands. Even Claude Opus 4.6's passing designs were often impractical, which suggests that LLMs can pass basic functional tests yet still fall short of human-like design intuition. The evaluation itself was difficult to build: simulating the geometries and supplying the LLMs with accurate functional input both proved intricate, illustrating how hard it is to connect AI capabilities to tangible, real-world applications. More robust evaluation frameworks and training datasets with richer spatial reasoning content could make LLMs more adept at CAD tasks.