🤖 AI Summary
Cactus (YC S25) launched as an energy‑efficient AI inference framework and kernel stack built specifically for smartphones and AI‑native mobile hardware, with an explicit focus on the budget and mid‑range devices that make up over 70% of the market. Unlike existing frameworks tuned for high‑end SoCs, Cactus is a bottom‑up, dependency‑free implementation that exposes an OpenAI‑compatible C FFI for easy integration and language bindings, plus a high‑level transformer engine, so developers can run LLMs and custom models directly on phones, and on Apple Silicon for testing.
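To make the C FFI idea concrete, here is a minimal sketch of what binding against a plain C inference API of this shape could look like. The cactus_* identifiers and the model path are hypothetical placeholders invented for illustration (and stubbed so the file compiles); they are not Cactus's actual exported symbols.

```c
/* Hedged sketch: the cactus_* functions below are hypothetical stand-ins,
 * not Cactus's real FFI. The point is the integration shape the summary
 * describes: a plain C API that host languages can bind to, driving a
 * local on-device transformer engine. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { const char *model_path; } cactus_ctx; /* hypothetical handle */

/* Hypothetical: load a quantized model from disk and return a handle. */
static cactus_ctx *cactus_init(const char *model_path) {
    cactus_ctx *ctx = malloc(sizeof *ctx);
    if (ctx) ctx->model_path = model_path;
    return ctx;
}

/* Hypothetical: run a completion; a real engine would decode tokens here. */
static const char *cactus_complete(cactus_ctx *ctx, const char *prompt,
                                   int max_tokens) {
    (void)ctx; (void)prompt; (void)max_tokens;
    return "<generated text>";
}

static void cactus_free(cactus_ctx *ctx) { free(ctx); }

int main(void) {
    cactus_ctx *ctx = cactus_init("qwen3-600m-int8.bin"); /* placeholder path */
    if (!ctx) return 1;
    printf("%s\n", cactus_complete(ctx, "Hello from the phone", 64));
    cactus_free(ctx);
    return 0;
}
```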
Technically, Cactus is split into four layers: Cactus FFI (a C API), Cactus Engine (transformer inference), Cactus Graph (a unified zero‑copy numerical graph), and Cactus Kernels (ARM‑specific SIMD ops). It supports INT8 and INT4 quantization, can target NPUs, DSPs, and ARM SMMLA instructions on high‑end phones, and runs CPU‑only on a wide range of devices. For example, Qwen3‑600M‑INT8 (370–420 MB) yields roughly 16–20 tok/s on a Pixel 6a, Galaxy S21, or iPhone 11 Pro, and roughly 50–70 tok/s on a Pixel 9, Galaxy S25, or iPhone 16; a preliminary NPU run of Qwen3‑4B‑INT4 reached 21 tok/s on an iPhone 16 Pro. Tooling includes Hugging Face model conversion, Python/Torch/JAX porting helpers, and production SDKs already serving 500k+ weekly inferences. Cactus fills a gap for mobile‑first, low‑power inference while recommending established x86/desktop stacks (Hugging Face and similar LLM libraries) for non‑mobile environments.
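As a general illustration of the INT8 scheme mentioned above (a sketch of the standard technique, not Cactus's actual kernel code), symmetric per‑tensor quantization stores one float scale plus one signed byte per weight, cutting weight memory roughly 4x versus float32:

```c
/* General illustration of symmetric per-tensor INT8 quantization.
 * Weights are mapped to int8 with a single scale s = max|w| / 127,
 * and recovered approximately as w ≈ s * q at inference time. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Quantize n floats into int8; returns the scale needed to dequantize. */
static float quantize_int8(const float *w, int8_t *q, int n) {
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs) max_abs = a;
    }
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (int i = 0; i < n; i++) {
        float r = w[i] / scale;
        q[i] = (int8_t)(r < 0 ? r - 0.5f : r + 0.5f); /* round to nearest */
    }
    return scale;
}

int main(void) {
    float w[] = {0.12f, -0.87f, 0.40f, 1.25f, -0.03f};
    int8_t q[5];
    float scale = quantize_int8(w, q, 5);
    for (int i = 0; i < 5; i++)
        printf("w=%+.3f  q=%+4d  back=%+.3f\n", w[i], q[i], scale * q[i]);
    return 0;
}
```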