🤖 AI Summary
The author demonstrates running OpenAI’s Codex CLI on a Mac against a self-hosted gpt-oss:120b model hosted on an Nvidia DGX Spark, connected over a Tailscale mesh. They install Tailscale on the Spark (Ubuntu) to get a private Tailscale IP (example: 100.113.1.114), install Ollama as a service, then make Ollama listen on the network by adding a systemd service override that sets OLLAMA_HOST=0.0.0.0:11434 and restarting the service. From the Mac they either set OLLAMA_HOST=100.113.1.114:11434 for Ollama CLI commands (ollama ls, ollama pull gpt-oss:120b, ollama run gpt-oss:120b) or point the Codex CLI at the remote model via its OSS route: CODEX_OSS_BASE_URL=http://100.113.1.114:11434/v1 codex --oss --model gpt-oss:120b. They also show using the llm-ollama plugin and saving OLLAMA_HOST as an environment variable to avoid repeated typing.
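A minimal sketch of the recipe described above, using the Tailscale IP from the summary. The install-script URLs and the `systemctl edit` override step are assumptions based on the standard Tailscale and Ollama install paths, not commands quoted from the original post:

```sh
# --- On the DGX Spark (Ubuntu) ---
# Join the Tailscale mesh and note the machine's private IP.
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
tailscale ip -4                      # e.g. 100.113.1.114

# Install Ollama; the installer sets it up as a systemd service.
curl -fsSL https://ollama.com/install.sh | sh

# Make the service listen on the network instead of localhost only.
sudo systemctl edit ollama.service
# In the override file that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama

# --- On the Mac ---
# Point the Ollama CLI at the Spark over the tailnet.
OLLAMA_HOST=100.113.1.114:11434 ollama ls
OLLAMA_HOST=100.113.1.114:11434 ollama pull gpt-oss:120b
OLLAMA_HOST=100.113.1.114:11434 ollama run gpt-oss:120b

# Or drive the remote model from the Codex CLI via its OSS provider.
CODEX_OSS_BASE_URL=http://100.113.1.114:11434/v1 \
  codex --oss --model gpt-oss:120b
```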
This is significant because it’s a practical recipe for developers who want to run large open models on privately hosted hardware and drive them with familiar tooling (Codex CLI, Ollama, llm plugins) without routing through commercial cloud APIs. Key technical implications: Ollama must be exposed on the network (the systemd override binds it to 0.0.0.0:11434, with the Tailscale mesh keeping access private), clients are pointed at it via OLLAMA_HOST or CODEX_OSS_BASE_URL, and gpt-oss:120b is capable but not on par with the latest proprietary models (GPT-5/Sonnet 4.5); nevertheless the setup enables low-latency, private, portable development workflows.
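For day-to-day use, the summary mentions persisting OLLAMA_HOST and using the llm-ollama plugin. A hedged sketch of what that might look like: the export line assumes zsh on the Mac, and the exact model name the plugin exposes should be confirmed with `llm models`:

```sh
# Persist the remote host so every ollama/llm invocation targets the Spark.
echo 'export OLLAMA_HOST=100.113.1.114:11434' >> ~/.zshrc
source ~/.zshrc

# Drive the same model through the llm tool's Ollama plugin.
llm install llm-ollama
llm models                                   # lists the names llm-ollama registers
llm -m gpt-oss:120b 'Explain what a systemd override file does'
```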