Yzma = embedding+inference on VLM/LLM/SLM/TLM in pure Go using llama.cpp (github.com)

🤖 AI Summary
yzma is a new Go-native wrapper that lets developers run local inference with vision and text models (VLMs, LLMs, SLMs, TLMs) by calling prebuilt llama.cpp shared libraries. The repo includes working examples: a SmolLM-135M text demo (tokenize → batch → decode → greedy sampler loop), an image+text VLM run using Qwen2.5-VL-3B with mmproj support, and an interactive chat with qwen2.5-0.5b-instruct. yzma dynamically loads libllama (.so/.dylib/.dll) and exposes tokenizer/model/context APIs, sampler chains, and batching logic, so you can compile and run normal Go programs (go build/go run) without a C compiler. This matters for the AI/ML community because it makes local, hardware-accelerated inference (CUDA, Vulkan, etc.) accessible to Go developers and desktop/server apps without containers or cross-language toolchains. Key technical points: yzma links at runtime to llama.cpp releases (set LD_LIBRARY_PATH and YZMA_LIB), supports multimodal processing (image encoding + mmproj), and allows updating llama.cpp binaries without recompiling Go code (as long as llama.cpp’s API stays stable). It is still a work in progress and borrows definitions from gollama.cpp, but it already provides a lightweight path for integrating efficient, on-device model inference into Go ecosystems.
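
To make the described flow concrete, here is a minimal sketch of what the tokenize → batch → decode → greedy-sample loop from the SmolLM text demo might look like in Go. The import path, type names, and function signatures below are illustrative assumptions, not yzma's actual API; consult the repo's examples for the real calls.

```go
// Hypothetical sketch of a greedy text-generation loop in the style of the
// SmolLM-135M demo. All yzma identifiers here are assumed, not verified.
package main

import (
	"fmt"
	"os"

	yzma "github.com/hybridgroup/yzma" // assumed import path
)

func main() {
	// yzma loads libllama at runtime; YZMA_LIB (or LD_LIBRARY_PATH) points at
	// the prebuilt llama.cpp shared library.
	os.Setenv("YZMA_LIB", "/opt/llama.cpp/lib/libllama.so")

	model, err := yzma.LoadModel("SmolLM-135M.Q8_0.gguf") // hypothetical call
	if err != nil {
		panic(err)
	}
	defer model.Close()

	ctx := model.NewContext()              // hypothetical context API
	tokens := ctx.Tokenize("Hello, ")      // tokenize the prompt
	batch := yzma.NewBatch(tokens)         // put prompt tokens into a batch
	sampler := yzma.NewGreedySampler()     // greedy sampling, as in the demo

	for i := 0; i < 64; i++ {
		ctx.Decode(batch)                  // run the model on the batch
		next := sampler.Sample(ctx)        // pick the highest-probability token
		if model.IsEOG(next) {             // stop at end-of-generation
			break
		}
		fmt.Print(ctx.TokenToPiece(next))  // print the decoded piece
		batch = yzma.NewBatch([]yzma.Token{next}) // feed the new token back in
	}
}
```

Because the shared library is resolved at runtime rather than linked by cgo, the same compiled Go binary can be pointed at a newer llama.cpp release simply by changing YZMA_LIB, as long as the llama.cpp API it relies on remains stable.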