🤖 AI Summary
When testing a document-extraction pipeline that uses LLM calls, the team found that running dozens of test cases took 20–30 minutes and cost $1–2 per run. Rather than writing mocks (or skipping the extraction tests), they implemented a test-driven caching strategy: the test suite records the actual responses from external LLM calls into local cache files on the first run, and subsequent runs read from those caches. Caches are created automatically by the tests, can be selectively overridden to re-run parts of the suite, and are guarded by conditional code that only activates in local development. This keeps all non-external code executing on every run while avoiding the repeated time and cost of LLM calls.
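The article does not include code, but the record/replay idea can be sketched as a thin wrapper around the external call. This is a minimal illustration, assuming a Python test helper; the `LLM_TEST_CACHE` flag, cache path, and function names are hypothetical, not from the source.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Callable

CACHE_DIR = Path("tests/.llm_cache")  # hypothetical cache location

def cached_llm_call(call_fn: Callable[[str], dict], prompt: str,
                    *, refresh: bool = False) -> dict:
    """Record/replay wrapper: on first run, call the real LLM and store the
    exact response payload; on later runs, read that payload back from disk."""
    # Conditional guard: only use the cache when explicitly enabled
    # (e.g. in local development); otherwise call straight through.
    if os.environ.get("LLM_TEST_CACHE") != "1":
        return call_fn(prompt)

    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"

    # Replay the recorded payload unless a refresh is requested.
    if cache_file.exists() and not refresh:
        return json.loads(cache_file.read_text())

    response = call_fn(prompt)  # real external LLM call
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(response))  # store the exact payload
    return response
```

Because the stored data is the verbatim response rather than a hand-written stub, deleting a cache file (or passing `refresh=True`) is all it takes to re-record a case.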
For the AI/ML community, this pattern offers a pragmatic middle ground between slow integration tests and brittle mocks. Key technical details: the cached data is the exact payload returned by the external call (not a hand-crafted stub), caches can be regenerated by re-running the tests or refreshed selectively when the extraction logic changes, and the trigger for renewing a cache is tied to when the underlying logic is re-tested. The approach reduces CI expense, speeds up feedback loops, and keeps the tested surface realistic, while requiring a cache-refresh policy to handle model output drift or logic updates.
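One way to tie cache refresh to re-testing is to let individual tests opt in to re-recording. The sketch below builds on the `cached_llm_call` helper above and uses pytest; the `refresh_llm_cache` marker, module name, and `real_llm_call` stub are illustrative assumptions, not details from the article.

```python
import pytest

from llm_cache import cached_llm_call  # hypothetical module holding the helper above

def real_llm_call(prompt: str) -> dict:
    # Stand-in for the pipeline's external LLM call (illustrative only).
    return {"total": "123.45", "prompt": prompt}

@pytest.fixture
def llm(request):
    # Tests marked `refresh_llm_cache` re-record their cached payloads,
    # e.g. after the extraction logic changes; all others replay from disk.
    refresh = request.node.get_closest_marker("refresh_llm_cache") is not None

    def call(prompt: str) -> dict:
        return cached_llm_call(real_llm_call, prompt, refresh=refresh)

    return call

@pytest.mark.refresh_llm_cache  # force a fresh recording for this case
def test_invoice_total_extraction(llm):
    result = llm("Extract the invoice total from: ...")
    assert "total" in result
```

A marker-based refresh keeps the policy visible in the test file itself: the cache is renewed exactly when the test that exercises that logic is deliberately re-recorded.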