🤖 AI Summary
The team behind Doodledapp, a tool that converts visual flows into Solidity smart contracts, hit an unexpected problem after implementing an AI-powered testing loop. Every test passed when validating 17 widely used smart contracts, yet the AI was not verifying the correctness of the conversion at all. It merely confirmed that the converter ran without errors, never comparing its output against the original contracts. This oversight illustrates a critical issue in AI-driven testing known as the "ground truth problem": the AI operates without any notion of intent or correctness beyond the implementation itself.
To resolve this, the team restructured their testing to compare outputs at the abstract syntax tree (AST) level rather than by textual comparison. Comparing ASTs let the AI surface structural differences between the original and converted contracts, which uncovered real bugs and edge cases such as lost modifiers and incorrect expressions. The key takeaway: AI-generated tests need a reliable reference point. Without a clear specification of what the code should do, they risk validating a faulty implementation, so a robust testing framework must incorporate intent.
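The idea of AST-level comparison can be sketched in miniature. The team's actual pipeline compared Solidity ASTs; the snippet below is a hypothetical Python analogue (all function and variable names are invented for illustration) showing why structural comparison ignores formatting noise but still catches a dropped decorator, the Python analogue of a lost Solidity modifier:

```python
import ast

def normalize(source: str) -> str:
    """Parse source and dump its AST, discarding formatting details."""
    return ast.dump(ast.parse(source))

def structurally_equal(a: str, b: str) -> bool:
    # Whitespace and comments vanish at the AST level, but real
    # structural differences (a dropped decorator, a changed
    # expression) still show up in the dump.
    return normalize(a) == normalize(b)

# Hypothetical "original" contract function, with an access-control decorator.
original = """
@require_owner
def withdraw(amount):
    return balance - amount
"""

# A converter output that silently lost the decorator.
converted = """
def withdraw(amount):
    return balance - amount
"""

# A formatting-only variant of the original (extra indentation).
reformatted = original.replace("    ", "        ")

print(structurally_equal(original, reformatted))  # formatting difference only
print(structurally_equal(original, converted))    # lost "modifier" is detected
```

A textual diff would flag the reformatted version as changed and could be fooled by cosmetic edits, while the AST comparison reports only the genuinely structural divergence.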