Show HN: Fantail SLMs for Coding Agents (www.autohand.ai)

🤖 AI Summary
Fantail is a new family of small language models from Autohand, designed for low-latency coding agents and local-first workflows; open weights (0.5B, 1.3B, 3B) will be released under CC BY 4.0 later this week. The line targets short-turn reasoning, retrieval-aware chat, tool calls, and code assistance, with 8K or 32K context windows, Q4/Q5/Q8/FP16 quantization, JSON/BNF-constrained output, and deployment options ranging from on-device (consumer GPUs, Apple Silicon) to inference servers or hosted APIs. Fantail emphasizes fast startup, streaming throughput, predictable costs, and tunable safety policies, and ships with guidance for privacy-preserving local or VPC deployments.

Technically, Fantail is tuned in stages (base pretraining, instruction/safety tuning, task tuning) on permissively licensed and synthetic data, and evaluated on agentic coding benchmarks: on Terminal-Bench, Fantail-mini/base/pro score roughly 31.4%, 38.6%, and 42.0%, respectively. Latency and throughput were measured on an M2 Max and a single T4 (batch=1, streaming on); evaluations used JSON/BNF-constrained decoding and a standard agent harness (Terminus 2), with results averaged across runs and seeds varied.

For teams and researchers, this matters because it provides a practical, open small-model baseline that trades raw scale for responsiveness, local control, and lower inference cost, making tool calls feel instantaneous and keeping sensitive tokens on infrastructure you control.
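As an illustration of the quantization and constrained-decoding options described above, here is a minimal local-inference sketch. It is not official Fantail tooling: it assumes the weights ship as a GGUF file (the filename below is hypothetical, since the release is still pending) and uses the llama-cpp-python bindings to load a Q4-quantized model with an 8K context, force the output into a JSON tool call via a GBNF grammar, and time the first streamed token at batch=1.

    # Minimal sketch, not official Fantail tooling. Assumes a GGUF release;
    # the model filename below is hypothetical.
    import json
    import time

    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(
        model_path="fantail-1.3b.Q4_K_M.gguf",  # hypothetical file; Q4 is one of the announced options
        n_ctx=8192,                              # the 8K context variant
        verbose=False,
    )

    # GBNF grammar constraining output to a {"tool": ..., "args": {...}} JSON object.
    grammar = LlamaGrammar.from_string(r'''
    root   ::= "{" ws "\"tool\"" ws ":" ws string ws "," ws "\"args\"" ws ":" ws object ws "}"
    object ::= "{" ws ( string ws ":" ws value ( ws "," ws string ws ":" ws value )* )? ws "}"
    value  ::= string | number | object
    string ::= "\"" [^"]* "\""
    number ::= [0-9]+
    ws     ::= [ \t\n]*
    ''')

    prompt = "List the files in the current directory using the shell tool."

    start = time.monotonic()
    first_token_at = None
    chunks = []
    for chunk in llm(prompt, grammar=grammar, max_tokens=128, stream=True):
        if first_token_at is None:
            first_token_at = time.monotonic()  # time to first token, batch=1, streaming on
        chunks.append(chunk["choices"][0]["text"])

    print(f"TTFT: {first_token_at - start:.3f}s")
    # Grammar-constrained decoding should yield parseable JSON
    # (unless generation is cut off at max_tokens).
    print(json.loads("".join(chunks)))

On the announced deployment targets (Apple Silicon, consumer GPUs), a script like this should work unchanged; only the model path and quantization level would vary.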