🤖 AI Summary
Loom is an open-source, high-performance neural network framework written in pure Go that now supports full transformer inference, including SmolLM2-135M-Instruct, entirely in the browser via WebAssembly, with no Python anywhere in the stack. It uses WebGPU compute shaders (WGSL) for GPU acceleration and provides hybrid CPU/GPU execution: every layer type has a full forward/backward pass on CPU with automatic differentiation, while Dense, Conv2D, and Attention layers can be selectively offloaded to the GPU for 10–100× speedups. The project also includes a pure-Go BPE tokenizer, a safetensors loader for Hugging Face weights, model conversion tooling, and the ability to compile to shared libraries or WASM for multi-platform deployment (Linux/macOS/Windows/Android/iOS).
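Getting Go-compiled WASM to talk to the page typically goes through the standard library's syscall/js package. The sketch below illustrates only that generic pattern, not Loom's actual API: the loomGenerate export and the generate stub are hypothetical names invented for this example.

```go
//go:build js && wasm

package main

import "syscall/js"

// generate is a hypothetical stand-in for the framework's inference
// entry point; the real Loom API is not documented here.
func generate(prompt string) string {
	// ... tokenize, run the transformer forward pass, detokenize ...
	return "generated text for: " + prompt
}

func main() {
	// Expose the inference function to JavaScript so browser code
	// can call window.loomGenerate("...").
	js.Global().Set("loomGenerate", js.FuncOf(func(this js.Value, args []js.Value) any {
		return generate(args[0].String())
	}))
	// Block forever so the exported function stays alive.
	select {}
}
```

Built with `GOOS=js GOARCH=wasm go build -o main.wasm`, the module is loaded in the browser via the `wasm_exec.js` glue script that ships with the Go toolchain.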
Technically notable: Loom models are organized on flexible 2D grids and support multi-head attention, LayerNorm, residual connections, RNN/LSTM, and 10 Softmax variants. Its Grid Softmax implements Mixture-of-Experts (MoE) natively, and the repo includes a mathematical proof/demo showing a 97.1% loss reduction and exact output/gradient matching versus conventional MoE. The framework exposes a reflection-based API with a registry for dynamic layer creation and runtime introspection, plus FFI bindings for C/C++/Python/Rust/TypeScript/C#. For practitioners this means lightweight, dependency-free deployment of transformers in browsers and native apps, easier model portability from Hugging Face, and a production-ready Go stack for both inference and on-device training.
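In Go, a registry for dynamic layer creation usually means a map from type names to constructor functions. The sketch below shows that generic idiom only; the Layer interface, Register, New, and the toy ReLU layer are hypothetical stand-ins, not Loom's real types.

```go
package main

import "fmt"

// Layer is a hypothetical stand-in for a framework layer interface.
type Layer interface {
	Forward(x []float32) []float32
}

// registry maps layer type names to constructors, so layers can be
// created by name at runtime (e.g. from a config file or over FFI).
var registry = map[string]func() Layer{}

// Register adds a named constructor to the registry.
func Register(name string, ctor func() Layer) {
	registry[name] = ctor
}

// New instantiates a layer by its registered name.
func New(name string) (Layer, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown layer type %q", name)
	}
	return ctor(), nil
}

// ReLU is a toy layer used to demonstrate registration.
type ReLU struct{}

func (ReLU) Forward(x []float32) []float32 {
	out := make([]float32, len(x))
	for i, v := range x {
		if v > 0 {
			out[i] = v
		}
	}
	return out
}

func main() {
	Register("relu", func() Layer { return ReLU{} })
	layer, err := New("relu")
	if err != nil {
		panic(err)
	}
	fmt.Println(layer.Forward([]float32{-1, 2, -3, 4})) // [0 2 0 4]
}
```

A reflection-based variant would store reflect.Type values and build instances with reflect.New, which also enables the runtime introspection the summary mentions.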