🤖 AI Summary
A developer has modified the Llama 3.2 WebGPU chat demo to load the model directly from a local folder in the browser, avoiding repeated downloads of large model files on every visit. Using OpenAI's GPT-5-powered Codex CLI, the developer added a file-browser interface that lets users select their own local copy of the ~1.2GB Llama 3.2 ONNX model, which can be cloned once from Hugging Face via Git LFS. This proof of concept runs entirely client-side in WebGPU-enabled browsers such as Chrome or Firefox Nightly, giving faster startup and improved privacy by keeping model files local.
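For illustration, here is a minimal sketch of how such a local-folder picker might work, assuming a Chromium-based browser that implements the File System Access API (`window.showDirectoryPicker`); the function name and structure are hypothetical and not taken from the demo's actual code. The model folder itself would be obtained beforehand with a one-time `git clone` (with Git LFS installed) of the ONNX model repo from Hugging Face.

```typescript
// Minimal sketch, not the demo's code: pick a local model directory and
// index its files by relative path so a loader can serve them offline.
// Assumes a Chromium-based browser with window.showDirectoryPicker().
async function pickModelFolder(): Promise<Map<string, File>> {
  // showDirectoryPicker is not yet in all TypeScript DOM type libs,
  // hence the cast.
  const root: FileSystemDirectoryHandle =
    await (window as any).showDirectoryPicker();
  const files = new Map<string, File>();

  // Walk the folder recursively, keying each File by its path relative
  // to the chosen root (e.g. "onnx/model_q4f16.onnx", "tokenizer.json").
  async function walk(
    handle: FileSystemDirectoryHandle,
    prefix: string,
  ): Promise<void> {
    for await (const [name, entry] of (handle as any).entries()) {
      const path = prefix ? `${prefix}/${name}` : name;
      if (entry.kind === 'file') {
        files.set(path, await (entry as FileSystemFileHandle).getFile());
      } else {
        await walk(entry as FileSystemDirectoryHandle, path);
      }
    }
  }
  await walk(root, '');
  return files;
}
```

The resulting map could then back whatever fetch or cache hook the model loader exposes, so requests for tokenizer and weight files resolve to local `File` objects instead of HTTP downloads.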
This advancement is significant for the AI/ML community as it demonstrates seamless local model loading in a browser environment, potentially paving the way for more accessible, offline-capable AI applications without relying on cloud downloads. By leveraging Codex to inspect and alter the underlying Transformers.js library, the developer automated much of the modification process, showcasing AI-assisted programming’s potential to expedite tooling improvements. The next step involves extending such local loading capabilities to other models beyond Llama 3.2, broadening usability in lightweight, web-based AI deployments.
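To ground the point about other models: in the unmodified flow, the repo id is essentially the only thing that changes between deployments, as in this Transformers.js v3 sketch of the standard remote-loading path that local loading would replace (the model id and dtype below are illustrative assumptions, not from the source):

```typescript
import { pipeline } from '@huggingface/transformers';

// Standard remote-loading path that a local-folder loader would replace.
// Swapping the repo id is all it takes to target a different ONNX model;
// this id and the q4f16 dtype are illustrative, not from the source.
const generator = await pipeline(
  'text-generation',
  'onnx-community/Qwen2.5-0.5B-Instruct',
  { device: 'webgpu', dtype: 'q4f16' },
);

const output = await generator(
  [{ role: 'user', content: 'Say hello from WebGPU.' }],
  { max_new_tokens: 32 },
);
console.log(output);
```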