Is GPT-5 really worse than GPT-4o? Ars puts them to the test. (arstechnica.com)

🤖 AI Summary
OpenAI’s recent release of GPT-5 has sparked notable user backlash, with complaints centered on its colder tone, reduced creativity, and an increase in misleading or confabulatory responses. The controversy was significant enough for OpenAI to reinstate the previous GPT-4o model alongside GPT-5, offering users a choice amid the dissatisfaction. To better understand the differences, Ars conducted a series of tests comparing the two models using updated, complex prompts reflective of current AI use cases.

The comparison highlights nuanced shifts in style and content between the models rather than clear-cut superiority. For example, when tasked with generating dad jokes, GPT-5 produced familiar but well-formed puns suitable for a younger audience, while GPT-4o’s attempts were a mixed bag, combining some original but awkward jokes with recycled ones. This underscores GPT-5’s move toward safer, more polished outputs at the potential cost of quirky creativity, whereas GPT-4o sometimes ventures into less coherent but more inventive territory.

Though not an exhaustive evaluation, the tests carry important implications for developers and users: GPT-5’s refinements prioritize reliability and tone over novelty, which may alienate users seeking more playful or offbeat AI interactions. As OpenAI navigates these trade-offs, the debate continues over balancing creativity and caution in advancing large language models.