LLM Code Review vs. Deterministic SAST Security Tools (blog.fraim.dev)

🤖 AI Summary
The team behind Fraim argues that instead of trying to encode every security policy as brittle, exhaustive SAST rules (Semgrep/Checkov), you can use LLMs to evaluate intent directly. They built a "risk_flagger" workflow that runs an LLM (example: openai/gpt-5, temperature=1) over a git diff to flag custom risks described in plain language. The piece shows that hand-written deterministic rules, such as an LLM-generated Checkov rule listing "sensitive" admin ports and explicit 0.0.0.0/0 checks, miss real-world edge cases that humans spot easily: bespoke admin ports (TeamViewer 5938), non-default ports mapped to services (Redis on 7000), and CIDR-splitting workarounds (0.0.0.0/1 + 128.0.0.0/1) that evade simple scanners.

Technically, Fraim's LLM-based evaluation can inspect diffs, search the repo to map ports to services, and reason about intent rather than exact patterns, producing human-readable findings (severity, location, explanation). The authors note this approach requires prompt engineering (coaxing the model to call tools) and carries trade-offs around nondeterminism and reproducibility, but they argue LLMs excel at subjective, under-specified policies that are impractical to enumerate exhaustively. For practitioners, the implication is to treat LLM analysis as a complementary layer on top of deterministic SAST, better at intent and edge cases, but still requiring CI integration, prompt tuning, and guardrails for consistent results.
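To make the brittleness concrete, here is a hedged sketch of the kind of hand-written Checkov check the post critiques: a hard-coded list of "sensitive" ports plus an exact match on 0.0.0.0/0. The port list, the check id, and the assumed shape of Checkov's parsed `conf` dict are illustrative, not the rule from the post:

```python
# Sketch of a brittle deterministic check (illustrative, not the
# post's actual rule). Assumes Checkov's parsed-HCL convention of
# wrapping attribute values in lists.
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck

SENSITIVE_PORTS = {22, 3389, 5432, 6379}  # SSH, RDP, Postgres, Redis (defaults only!)


class SensitivePortOpenToWorld(BaseResourceCheck):
    def __init__(self):
        super().__init__(
            name="Ensure sensitive admin ports are not open to the internet",
            id="CKV_CUSTOM_001",  # hypothetical id
            categories=[CheckCategories.NETWORKING],
            supported_resources=["aws_security_group"],
        )

    def scan_resource_conf(self, conf):
        for rule in conf.get("ingress", []):
            from_port = rule.get("from_port", [0])[0]
            to_port = rule.get("to_port", [from_port])[0]
            cidrs = rule.get("cidr_blocks", [[]])[0] or []
            # The exact-match tests the post calls out: Redis moved to
            # port 7000, TeamViewer on 5938, or a 0.0.0.0/1 +
            # 128.0.0.0/1 split all sail past this rule.
            exposed = "0.0.0.0/0" in cidrs
            hits_sensitive = any(from_port <= p <= to_port for p in SENSITIVE_PORTS)
            if exposed and hits_sensitive:
                return CheckResult.FAILED
        return CheckResult.PASSED


check = SensitivePortOpenToWorld()
```

The CIDR-splitting evasion is easy to verify with the standard library: the two halves 0.0.0.0/1 and 128.0.0.0/1 together cover every IPv4 address, yet neither contains the literal string the rule above tests for.

```python
import ipaddress

# The two halves from the post: neither matches an exact-string test
# for "0.0.0.0/0", but together they admit all of IPv4.
halves = [ipaddress.ip_network("0.0.0.0/1"),
          ipaddress.ip_network("128.0.0.0/1")]

# collapse_addresses merges adjacent networks back into their supernet.
merged = list(ipaddress.collapse_addresses(halves))
assert merged == [ipaddress.ip_network("0.0.0.0/0")]
print(merged)  # [IPv4Network('0.0.0.0/0')]
```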
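And the LLM side of the comparison, reduced to its core: hand the model a plain-language risk policy plus the diff and ask for structured findings. This is a minimal sketch, not Fraim's risk_flagger implementation; the prompt wording, policy text, and output format are assumptions, while the model and temperature come from the post. A production version would also wire up the tool calls for repo search (mapping ports to services) that the authors describe.

```python
# Minimal sketch of the risk_flagger idea: plain-language policy +
# git diff -> LLM -> human-readable findings. Not Fraim's code.
import subprocess
from openai import OpenAI

RISK_POLICY = """Flag any change that exposes an administrative or
data-store service to the public internet, even if the port is
non-standard or the CIDR range is split to avoid 0.0.0.0/0."""


def flag_risks(base_ref: str = "origin/main") -> str:
    diff = subprocess.run(
        ["git", "diff", base_ref], capture_output=True, text=True, check=True
    ).stdout
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-5",  # the post cites openai/gpt-5, temperature=1
        temperature=1,
        messages=[
            {"role": "system",
             "content": "You are a security reviewer. For each risk you "
                        "find, report severity, file/line location, and "
                        "a one-paragraph explanation."},
            {"role": "user",
             "content": f"Policy:\n{RISK_POLICY}\n\nDiff:\n{diff}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(flag_risks())
```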
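Note the design trade the post highlights: the deterministic check is reproducible but only as good as its enumerated ports and patterns, while the LLM call can reason that "Redis on 7000" or a split CIDR violates the policy's intent, at the cost of nondeterministic output that needs guardrails before it gates a CI pipeline.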