🤖 AI Summary
Scripbox replaced cloud LLM code-review APIs with a self-hosted system that runs Qwen2.5‑Coder:7b locally on a Mac Mini via Ollama, integrated into GitLab CI. The result: instant AI reviews on every merge request with zero API fees, no third‑party data exfiltration, no rate limits or vendor lock‑in, and low operational cost (~$5/month in power plus a one‑time hardware purchase). They claim Qwen2.5‑Coder (7B, Apache‑2.0) outperforms CodeLlama‑34B while running ~5× faster, and that the stack, served on Ollama's default port 11434, returns structured JSON reviews in ~8–9s.
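The serving side can be sketched against Ollama's standard HTTP API on port 11434 (the /api/generate endpoint, "format": "json", and the options shown are real Ollama features; the prompt wording and helper names below are illustrative, not Scripbox's actual script):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_review_request(diff_text: str) -> dict:
    """Build a non-streaming Ollama request that asks for strict-JSON review output."""
    return {
        "model": "qwen2.5-coder:7b",
        "prompt": f"Review this diff and respond with JSON only:\n{diff_text}",
        "stream": False,
        "format": "json",          # constrain the model's output to valid JSON
        "options": {
            "temperature": 0.1,    # LLM params reported in the writeup
            "repeat_penalty": 1.5,
            "num_predict": 600,    # Ollama's name for the max-tokens cap
        },
    }


def review(diff_text: str) -> dict:
    """POST the diff to the local Ollama server and parse the JSON review."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_review_request(diff_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return json.loads(body["response"])  # the model's JSON review
```

Because the output is forced to JSON, the CI job can parse findings reliably instead of scraping free-form prose.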
Technically, the team made local inference production‑ready through careful engineering: run ollama serve (auto‑started with launchd) on an Apple Silicon Mac, trigger an ai_bot CI job on self‑hosted runners that posts diffs to Ollama, and merge per‑batch JSON results. Key optimizations include token budgeting (MAX_PROMPT_CHARS=24000, added‑lines cap), batching (BATCH_SIZE=5, MAX_BATCHES=10, BATCH_DELAY=2s), sending only added lines with [file:line] locations, semantic annotations and module/context extraction, triggered pattern checks for security/finance/DB code, and tuned LLM params (temperature=0.1, repeat_penalty=1.5, max_tokens≈600). Outputs are strict JSON for reliable parsing and non‑blocking CI (allow_failure=true). The writeup is a practical blueprint showing self‑hosted, small‑model inference can be cheaper, private, and performant enough for automated code review.