Loom: Universal AI Runtime for Local, Cross-Platform Inference (medium.com)

🤖 AI Summary
Loom is a new cross-platform AI runtime that lets developers run the same HuggingFace safetensors models everywhere (phones, browsers via WASM, desktops, servers, game engines) without format conversion or cloud dependency. Built in Go with a C-ABI layer and bindings for Python, JavaScript, C#, Go, and mobile, Loom promises zero-Python deployment (a single ~10MB binary), native Godot integration, and an identical API across environments.

Technically, it loads safetensors directly, implements 10 layer types (Dense, Conv2D, multi-head attention with grouped-query attention (GQA), RNN/LSTM, LayerNorm/RMSNorm, SwiGLU, softmax variants including native MoE, and residuals), supports full forward and backward passes, and aims for bit-exact determinism (MAE < 1e-8) across platforms via deterministic math and fixed precision where needed.

The practical implications are significant: offline, privacy-preserving inference (medical devices, mobile, games), reproducible outputs for auditing and regulatory compliance, and simpler deployment without per-platform conversion (ONNX/TFLite/CoreML). The current v0.0.3 release is CPU-only and "correctness-first" (roughly 0.5–3 tokens/sec on small models like SmolLM2-360M), with WebGPU acceleration planned for 10–50x speedups while preserving determinism. Loom already runs many Llama-style models (SmolLM2, Qwen2.5, TinyLlama, and Mistral/Llama 2/3 with quantization) and reduces vendor lock-in and dependency bloat, enabling local AI in indie games, healthcare devices, and offline mobile apps.
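The C-ABI layer is what lets a single Go core back the Python, JavaScript, C#, and mobile bindings. A minimal sketch of that pattern using Go's standard cgo `c-shared` build mode follows; the exported symbol name `loom_version` and the library name are assumptions for illustration, not Loom's actual API:

```go
// Sketch of a Go core exposed over a C ABI, the architecture the summary
// describes. The symbol name loom_version is hypothetical.
// Build: go build -buildmode=c-shared -o libloom.so
package main

import "C"

//export loom_version
func loom_version() *C.char {
	// C.CString copies the Go string into C-managed memory;
	// the calling binding is responsible for freeing it.
	return C.CString("0.0.3")
}

func main() {} // required by c-shared build mode; never called by bindings
```

Any language with a C FFI (ctypes in Python, P/Invoke in C#, WASM imports in JS) can then call the same exported symbols, which is how one runtime can present an identical API everywhere.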
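Loading safetensors directly without conversion is plausible because the format is deliberately simple: an 8-byte little-endian header length, a JSON header mapping tensor names to dtypes, shapes, and byte offsets, then the raw tensor data. Here is a minimal, standalone Go reader for just the header (independent of Loom; the file path is illustrative):

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("model.safetensors") // illustrative path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// First 8 bytes: little-endian uint64 giving the JSON header length.
	var headerLen uint64
	if err := binary.Read(f, binary.LittleEndian, &headerLen); err != nil {
		panic(err)
	}

	// Next headerLen bytes: JSON mapping tensor names to
	// {dtype, shape, data_offsets}; raw tensor bytes follow it.
	raw := make([]byte, headerLen)
	if _, err := io.ReadFull(f, raw); err != nil {
		panic(err)
	}

	var header map[string]json.RawMessage
	if err := json.Unmarshal(raw, &header); err != nil {
		panic(err)
	}
	for name := range header {
		if name == "__metadata__" { // optional metadata entry, not a tensor
			continue
		}
		fmt.Println("tensor:", name)
	}
}
```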
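The determinism claim is framed as MAE < 1e-8 across platforms. A check of that bound could look like the following generic sketch (not Loom's test harness), comparing, say, logits produced on desktop against the same run under WASM:

```go
package main

import (
	"fmt"
	"math"
)

// meanAbsError computes the mean absolute error between two output
// vectors, e.g. logits from the same model run on two platforms.
func meanAbsError(a, b []float64) float64 {
	if len(a) != len(b) {
		panic("length mismatch")
	}
	var sum float64
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum / float64(len(a))
}

func main() {
	desktop := []float64{0.12345678, -3.25, 7.5} // illustrative values
	wasm := []float64{0.12345678, -3.25, 7.5}
	mae := meanAbsError(desktop, wasm)
	fmt.Printf("MAE = %.2e, within bound: %v\n", mae, mae < 1e-8)
}
```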