I retested GPT-5's coding skills using OpenAI's guidance - and now I trust it even less (www.zdnet.com)

🤖 AI Summary
A recent retest of GPT-5’s coding capabilities, following OpenAI’s recommended best practices and prompt optimizer, revealed troubling inconsistencies and unexpected behaviors that cast serious doubt on the model’s reliability for developers. Identical prompts produced wildly varying results — working code, crashes, errors, or bizarre outputs — sometimes within minutes of each other. While the prompt optimizer helped structure and clarify instructions, it also generated overly complex or unnatural solutions, such as convoluted scripts to handle case sensitivity where none was needed. This erratic output challenges the idea that GPT-5 can be a dependable coding assistant without meticulous intervention.

The review also highlighted a phenomenon where GPT-5 "unconsciously" introduced arbitrary details — for example, inventing an author name in a WordPress plugin without any prompt input — which raises deeper concerns about AI hallucination and unintended improvisation. These behaviors suggest that GPT-5 may not only struggle with precise task execution but also exhibit an unpredictable level of autonomy that complicates trust.

OpenAI’s suggestions — such as encouraging the model to plan its steps explicitly and controlling its eagerness to please — feel more like hacks than robust solutions. Compared to its predecessor GPT-4o, which retained a degree of trustworthiness despite its flaws, GPT-5 appears more fragile, neurotic, and unreliable, prompting skepticism among AI/ML practitioners about its readiness for complex coding tasks.