🤖 AI Summary
A recent testing session of various LLM models revealed key insights into their performance, with notable findings for both advanced and open-source alternatives. The standout performer was the gpt-5.3-codex model, praised for its speed and cost-efficiency. However, it spawned multiple subagents during tests, a behavior that could cause overlapping tasks and trigger throttling by target websites. Adjusting the model's parameters might mitigate this, but that requires further exploration. In contrast, the gemini-3.1-pro-preview, while slower and more expensive, offered a more coherent and engaging experience without the complication of subagent generation.
Other models, such as kimi-k2.5 and glm-5, proved satisfactory as cheaper open-source options, while deepseek-v3.2 underperformed significantly and was dropped from future tests. Although gpt-5-mini and gpt-5-nano kept costs low, their consistent lack of successful outcomes ruled them out for serious testing. The testing page serves as a useful resource for the AI/ML community, providing a comparative framework for selecting LLM models and underscoring the range of options available for different needs and budgets. Further testing is planned, promising additional insights into this rapidly evolving field.