Show HN: Open-source testing framework for AI agents with semantic validation (github.com)

🤖 AI Summary
Blade47 released SemanticTest, an open-source, pipeline-based testing framework for AI systems and APIs that replaces brittle exact-text checks with semantic validation. Tests are defined as readable JSON pipelines of composable blocks (HttpRequest, JsonParser, StreamParser, ValidateContent, ValidateTools, Loop, LLMJudge, etc.) that pass named-slot data through a central DataBus. It includes an LLMJudge block that invokes GPT-4-family models to score responses (0–1) on configurable criteria (accuracy, completeness, relevance), validate tool usage/order/args, and provide reasoning. CLI commands (npx semtest) generate HTML reports, run test suites, and support setup/teardown, .env contexts, and custom user blocks via a simple Block API. The project is MIT-licensed and available on GitHub/npm.

Significance: SemanticTest addresses core pain points in AI/ML QA — nondeterministic outputs, streaming responses, and tool-calling behavior — by enabling intent- and behavior-focused assertions rather than fragile string matches. Technical implications include seamless evaluation of tool-assisted agents, structured handling of streaming formats (SSE/OpenAI), looped retries, and version-controllable JSON test definitions that integrate into CI. Its LLM-driven judgment and ValidateTools primitives make it practical to assert high-level correctness (intent, helpfulness, permitted tools) and automate robust testing for modern conversational and tool-using AI agents.
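As a sketch of what such a pipeline definition could look like, here is a hypothetical test file. The block names (HttpRequest, JsonParser, LLMJudge) and criteria come from the summary above, but the surrounding schema — field names, slot wiring, the threshold key — is an assumption for illustration, not the project's documented format:

```json
{
  "name": "support-agent-accuracy",
  "pipeline": [
    {
      "block": "HttpRequest",
      "config": {
        "url": "https://api.example.com/chat",
        "method": "POST",
        "body": { "message": "How do I reset my password?" }
      },
      "output": "rawResponse"
    },
    { "block": "JsonParser", "input": "rawResponse", "output": "response" },
    {
      "block": "LLMJudge",
      "input": "response",
      "config": {
        "criteria": ["accuracy", "completeness", "relevance"],
        "threshold": 0.8
      }
    }
  ]
}
```

Because the definition is plain JSON, it can be diffed, code-reviewed, and version-controlled like any other test fixture — which is what makes the CI-integration claim above plausible.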
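The custom-block extension point mentioned above (a "simple Block API" passing named-slot data through a DataBus) can be sketched as follows. The interface and class below are hypothetical — SemanticTest's real Block API may differ; only the DataBus/named-slot idea comes from the summary:

```typescript
// Hypothetical shape of a block: read slots from a shared bus, write results back.
type DataBus = Map<string, unknown>;

interface Block {
  name: string;
  run(bus: DataBus): void | Promise<void>;
}

// A custom validation block that checks the "response" slot mentions required
// keywords, writing a pass/fail result into the "validation" slot.
class KeywordCheck implements Block {
  name = "KeywordCheck";
  constructor(private required: string[]) {}

  run(bus: DataBus): void {
    const text = String(bus.get("response") ?? "").toLowerCase();
    const missing = this.required.filter((k) => !text.includes(k.toLowerCase()));
    bus.set("validation", { passed: missing.length === 0, missing });
  }
}

// Usage: simulate one pipeline step against a canned agent response.
const bus: DataBus = new Map([
  ["response", "Refunds are processed within 5 business days."],
]);
new KeywordCheck(["refund", "days"]).run(bus);
console.log(bus.get("validation")); // { passed: true, missing: [] }
```

A keyword check like this is the trivial end of the spectrum; the LLMJudge block described above would occupy the same slot-in/slot-out position in the pipeline while delegating the scoring to a model call.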