🤖 AI Summary
A hands-on test of Google’s Gemini for coding found it reliably confident and verbose but frequently wrong when tasked with a nontrivial Python job. The author asked Gemini to write async code that authenticates to a remote API and uses a well-documented Python GitHub library to send, receive, and process data. Gemini produced polished, heavily commented, widely varying solutions that looked plausible but failed to work across many iterations: it never produced the same code twice, kept changing how it invoked the library (embedded configs vs. external files vs. CLI args), and often baked in unnecessary complexity. After nearly two hours of back-and-forth and added diagnostics, the author eventually wrote working async code from scratch in about 30 minutes that was simpler and shorter. A smaller Bash date-conversion task fared better, taking roughly four tries to reach acceptable output.
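
The original post does not reproduce the code, so the following is only a minimal sketch of the kind of short async GitHub integration the author describes. It assumes aiohttp as the HTTP client, a personal access token in a GITHUB_TOKEN environment variable, and an issue-listing call against the public GitHub REST API; none of these specifics are confirmed by the article.

```python
import asyncio
import os

import aiohttp  # assumed HTTP client; the article does not name the library it used


async def fetch_open_issues(session: aiohttp.ClientSession, repo: str) -> list[dict]:
    """Fetch open issues for a repository via the GitHub REST API."""
    url = f"https://api.github.com/repos/{repo}/issues"
    async with session.get(url, params={"state": "open"}) as resp:
        resp.raise_for_status()
        return await resp.json()


async def main() -> None:
    # Token-based auth read from the environment; a placeholder, not the author's setup.
    headers = {
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    }
    async with aiohttp.ClientSession(headers=headers) as session:
        issues = await fetch_open_issues(session, "octocat/Hello-World")
        for issue in issues:
            print(issue["number"], issue["title"])


if __name__ == "__main__":
    asyncio.run(main())
```

Roughly thirty lines of this shape is consistent with the article's claim that a simpler, shorter hand-written solution beat the model's repeated, bloated attempts.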
The takeaway for the AI/ML community is a clear reminder of current LLM limits for complex engineering tasks: generative models can reduce grunt work on well-scoped, simple scripts but are prone to nondeterministic, bloated, and plausibly incorrect code when dealing with real-world API integrations, auth flows, and async behavior. This underscores the need for rigorous human review, tests, and cautious deployment of model-generated code in production, and suggests these models are better seen as coding assistants for small tasks or scaffolding, not reliable solo engineers.