Building Gremlins: AI-powered fuzzing agents to find bugs (blog.sentry.io)

0 points 2 hours ago

🤖 AI Summary

Sentry’s Hackweek team built “Gremlins,” LLM-driven web fuzzing agents that deliberately wreak havoc on SaaS apps to surface real-world bugs. Instead of mutating single inputs, Gremlins drive sequences of user actions across frontends, backends, databases and services, then stream errors, traces and replays into Sentry for rapid triage. The workflow is simple: configure a target site and agent settings, unleash multiple agents, and let Sentry capture whatever breaks. Anything a gremlin can trigger, an actual user might encounter too.

Technically, Gremlins turn noisy DOMs into compact ARIA summaries (Playwright ariaSnapshot → YAML) so an LLM can reason about interactive elements, then expose a small set of “tools” (click, fill, navigate) that the model can request. The core loop appends each page snapshot to the conversation history, asks the LLM which tools to use, executes those tool calls via Playwright, records the results, and repeats.

The team prototyped both a homegrown Claude/Playwright agent and one built on browser-use (noting that GroqLLM improved navigation reliability compared with OpenAI), and ran into practical limits: brittle initial navigation, high CPU and memory use plus leaks, difficulty handling modals, and hallucinations. Integration used Sentry’s SDK/API for error capture and server-sent events (SSE) for a live dashboard.

Implication: LLM-powered fuzzers make realistic, multi-step bug discovery feasible, but reliability, resource cost and careful tool design remain critical for production-grade fuzzing.
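A minimal sketch of the DOM-compaction step, assuming the ARIA snapshot has already been captured as a YAML string (Playwright's Python API exposes this as `locator.aria_snapshot()`, e.g. on `page.locator("body")`). The `INTERACTIVE_ROLES` set, the `interactive_elements` and `compact_summary` helpers, and the sample snapshot are hypothetical illustrations, not Gremlins' actual code. The idea shown is to keep only elements an agent can act on and number them, so the model can refer to an element by index rather than by selector.

```python
import re

# Roles an agent can usually act on; this set is an assumption for illustration.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "radio", "combobox", "menuitem", "tab"}

# Matches the role and optional quoted name at the start of a snapshot line,
# e.g. '- button "Sign up"' or '  - link "Docs":'. Lines like '- /url: /' do not match.
LINE_RE = re.compile(r'^\s*- (?P<role>[a-z]+)(?: "(?P<name>[^"]*)")?')


def interactive_elements(snapshot_yaml: str) -> list[tuple[int, str, str]]:
    """Return (index, role, accessible name) for each interactive node, in page order."""
    elements = []
    for line in snapshot_yaml.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("role") in INTERACTIVE_ROLES:
            elements.append((len(elements), match.group("role"), match.group("name") or ""))
    return elements


def compact_summary(snapshot_yaml: str) -> str:
    """Render interactive elements as short numbered lines for an LLM prompt."""
    return "\n".join(f"[{i}] {role} \"{name}\"" for i, role, name in interactive_elements(snapshot_yaml))


# Hand-written sample in Playwright's ARIA snapshot style (not captured from a real page).
SAMPLE = """\
- banner:
  - link "Home":
    - /url: /
  - heading "Checkout" [level=1]
- textbox "Email"
- checkbox "Subscribe" [checked]
- button "Place order"
"""

print(compact_summary(SAMPLE))
```

The regex only reads the role and quoted name at the start of each line, so attributes such as `[level=1]` and nested `/url:` entries are dropped; a sturdier version would likely parse the YAML properly and handle escaped quotes in names.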
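One way to declare the click/fill/navigate tools, sketched in the JSON-schema shape the Anthropic Messages API uses for `tools` (`name`, `description`, `input_schema`), with each `tool_use` block from the model answered by a `tool_result` block. The tool descriptions, the index-based arguments, `RecordingPage` and `run_tool_use` are assumptions for illustration; `RecordingPage` only records actions where a real agent would call Playwright. Returning `is_error` results for unknown tools or bad arguments, instead of raising, is one hedge against the hallucinated tool calls the summary mentions.

```python
# Tool definitions in the shape the Anthropic Messages API accepts for `tools`.
# The exact tools Gremlins exposed are not published here; these mirror the
# click / fill / navigate set and address elements by their summary index.
TOOLS = [
    {
        "name": "click",
        "description": "Click the interactive element with this index in the latest page summary.",
        "input_schema": {"type": "object", "properties": {"index": {"type": "integer"}}, "required": ["index"]},
    },
    {
        "name": "fill",
        "description": "Type text into the text field with this index.",
        "input_schema": {
            "type": "object",
            "properties": {"index": {"type": "integer"}, "text": {"type": "string"}},
            "required": ["index", "text"],
        },
    },
    {
        "name": "navigate",
        "description": "Load this URL in the current tab.",
        "input_schema": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]},
    },
]
TOOLS_BY_NAME = {tool["name"]: tool for tool in TOOLS}


class RecordingPage:
    """Hypothetical stand-in for a browser page; a real agent would call Playwright here."""

    def __init__(self) -> None:
        self.actions: list[tuple] = []

    def click(self, index: int) -> None:
        self.actions.append(("click", index))

    def fill(self, index: int, text: str) -> None:
        self.actions.append(("fill", index, text))

    def navigate(self, url: str) -> None:
        self.actions.append(("navigate", url))


def tool_result(tool_use_id: str, text: str, is_error: bool = False) -> dict:
    """Build a tool_result content block to send back to the model."""
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": text, "is_error": is_error}


def run_tool_use(page: RecordingPage, block: dict) -> dict:
    """Validate and execute one tool_use block from the model."""
    tool = TOOLS_BY_NAME.get(block["name"])
    if tool is None:  # the model asked for a tool that does not exist
        return tool_result(block["id"], f"unknown tool {block['name']!r}", is_error=True)
    missing = [key for key in tool["input_schema"]["required"] if key not in block["input"]]
    if missing:
        return tool_result(block["id"], f"missing arguments: {missing}", is_error=True)
    try:
        getattr(page, block["name"])(**block["input"])
    except Exception as exc:  # report failures to the model instead of crashing the agent
        return tool_result(block["id"], f"{type(exc).__name__}: {exc}", is_error=True)
    return tool_result(block["id"], "ok")


page = RecordingPage()
print(run_tool_use(page, {"type": "tool_use", "id": "toolu_01", "name": "fill",
                          "input": {"index": 1, "text": "a@example.com"}}))
```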
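The core loop described in the summary can be sketched in a provider-neutral way: take a snapshot, ask the model for tool calls, execute them, record the results, and repeat until the model returns no calls or a step cap is reached. `run_agent`, `ToolCall`, the `kind`-tagged history entries and the scripted `fake_*` stand-ins are hypothetical; a real agent would plug in a Playwright snapshot, an LLM client and a tool executor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str   # e.g. "click", "fill", "navigate"
    args: dict  # arguments chosen by the model


def run_agent(
    snapshot: Callable[[], str],
    ask_llm: Callable[[list[dict]], list[ToolCall]],
    execute: Callable[[ToolCall], str],
    max_steps: int = 20,
) -> list[dict]:
    """Drive one agent: snapshot -> ask LLM -> execute tool calls -> record -> repeat."""
    history: list[dict] = []
    for _ in range(max_steps):  # hard cap so a confused model cannot act forever
        history.append({"kind": "snapshot", "content": snapshot()})
        calls = ask_llm(history)
        if not calls:  # an empty reply is treated as "done"
            break
        history.append({"kind": "tool_calls", "content": calls})
        # Record every result (including error strings) so the model sees what happened.
        history.append({"kind": "tool_results", "content": [execute(c) for c in calls]})
    return history


# Scripted stand-ins for a browser and a model (hypothetical, for illustration).
performed: list[str] = []
plan = [
    [ToolCall("fill", {"index": 1, "text": "x" * 10_000})],  # oversized input
    [ToolCall("click", {"index": 3})],
    [],  # the model stops
]


def fake_snapshot() -> str:
    return f"page after {len(performed)} actions"


def fake_llm(history: list[dict]) -> list[ToolCall]:
    return plan.pop(0) if plan else []


def fake_execute(call: ToolCall) -> str:
    performed.append(call.name)
    return "ok"


history = run_agent(fake_snapshot, fake_llm, fake_execute)
print([entry["kind"] for entry in history])
```

The `max_steps` cap is a design choice rather than something the summary specifies; it bounds how long a model that keeps acting without progress can run.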
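The summary reports high CPU and memory use when running agents. A common mitigation, assumed here rather than taken from the post, is to cap how many browser sessions run at once. This sketch uses `asyncio.Semaphore` for the cap; `run_gremlin`, `unleash`, `Tracker` and the `asyncio.sleep` placeholder for a browser session are all hypothetical.

```python
import asyncio


class Tracker:
    """Counts how many agents run at once, to check that the cap holds."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0


async def run_gremlin(agent_id: int, limit: asyncio.Semaphore, tracker: Tracker) -> str:
    async with limit:  # wait for a free slot before starting a session
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        try:
            # Placeholder for one agent session; a real one would launch a browser,
            # run the agent loop, and close the browser in this finally block
            # so leaked pages do not pile up across agents.
            await asyncio.sleep(0.01)
            return f"gremlin {agent_id} finished"
        finally:
            tracker.active -= 1


async def unleash(n_agents: int, max_concurrent: int) -> tuple[list[str], int]:
    """Start n_agents gremlins but let at most max_concurrent run at the same time."""
    limit = asyncio.Semaphore(max_concurrent)
    tracker = Tracker()
    results = await asyncio.gather(*(run_gremlin(i, limit, tracker) for i in range(n_agents)))
    return list(results), tracker.peak


results, peak = asyncio.run(unleash(n_agents=8, max_concurrent=3))
print(peak, results[0])
```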
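For the live dashboard, server-sent events are plain text sent over a long-lived HTTP response with `Content-Type: text/event-stream`: each event is a few `field: value` lines followed by a blank line. A minimal encoder, assuming one JSON payload per agent update; the `sse_frame` helper and the event name and payload fields in the example are hypothetical.

```python
import json
from typing import Optional


def sse_frame(data: dict, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one server-sent event: optional id/event fields, a data field, and a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so the payload fits on one data: line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # the empty line terminates the event


print(sse_frame({"agent": 2, "status": "error"}, event="gremlin", event_id="17"), end="")
```

In the browser, an `EventSource` listener registered for the `gremlin` event type would receive the JSON string as the event's `data`.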
