Deploying a ChatGPT clone (the hard way) (www.natebrake.com)

🤖 AI Summary
A developer built “BrakeChat,” a self-hosted ChatGPT clone that runs open models on personal hardware and is accessible from an iPhone PWA. The stack ties together an OpenWebUI fork (customized UI and Docker image), LM Studio on a Mac Mini (M4 Pro, 64GB unified memory) serving gpt-oss-20b with MLX weights, a BrakeChat backend on an Ubuntu desktop, a forked Notion MCP server for context/tooling, Google OAuth for authentication, and Cloudflare Tunnels to securely expose chat.natebrake.com. The author automated builds with GitHub Actions, deployed via docker-compose, and installed the site as an iOS PWA so it behaves like a native app.

Why it matters: the project shows a practical blueprint for running performant, private LLM services on consumer hardware while preserving control over data and tooling. Key technical takeaways include preferring MLX weights for Apple Silicon performance, sizing the model to fit within 64GB of unified memory with headroom for long context windows, manually configuring LM Studio’s inference and context settings, enabling native tool-calling for gpt-oss-20b in OpenWebUI, and forking Notion’s MCP server to add tool descriptions and filtering and to return markdown instead of raw JSON, whose verbosity explodes context length.

The write-up highlights the many interoperability gotchas (Google OAuth, migrating the domain’s DNS to Cloudflare, OpenWebUI licensing limits) and demonstrates that smaller open models can be “good enough” for many everyday tasks, while offering a replicable path for private, on-prem LLM deployments.
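The Cloudflare Tunnel piece is the most reusable part of the recipe: a `cloudflared` ingress config maps a public hostname to a local service without opening any inbound ports. Below is a minimal sketch of what that config could look like; only the `chat.natebrake.com` hostname comes from the write-up, and the tunnel name, credentials path, and local port (8080, OpenWebUI’s default container port) are assumptions.

```yaml
# config.yml for cloudflared -- minimal sketch, assuming the web UI
# listens on port 8080 on the same host
tunnel: brakechat                          # tunnel name or UUID (assumed)
credentials-file: /etc/cloudflared/brakechat.json

ingress:
  # Public hostname from the article, routed to the local web UI
  - hostname: chat.natebrake.com
    service: http://localhost:8080
  # Mandatory catch-all rule: anything else gets a 404
  - service: http_status:404
```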
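The Notion MCP change is also worth dwelling on: handing the model raw Notion JSON wastes thousands of context tokens on IDs and nesting, so flattening tool responses to markdown keeps the context window usable. Here is a minimal sketch of that pattern using the MCP TypeScript SDK; the tool name, the simplified block shape, and the `fetchNotionBlocks` stub are illustrative assumptions, not the fork’s actual code.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Simplified block shape; the real Notion API returns far more nesting.
type Block = { type: "heading" | "paragraph"; text: string };

// Stand-in for the Notion API call; hard-coded so the sketch runs as-is.
async function fetchNotionBlocks(pageId: string): Promise<Block[]> {
  return [
    { type: "heading", text: `Page ${pageId}` },
    { type: "paragraph", text: "Body text recovered from Notion blocks." },
  ];
}

const server = new McpServer({ name: "notion-md", version: "0.1.0" });

// The tool description tells the model when to call this; the handler
// flattens blocks to markdown so the response costs few context tokens.
server.tool(
  "get_page_markdown",
  "Fetch a Notion page and return its content as compact markdown",
  { pageId: z.string().describe("Notion page ID") },
  async ({ pageId }) => {
    const blocks = await fetchNotionBlocks(pageId);
    const markdown = blocks
      .map((b) => (b.type === "heading" ? `## ${b.text}` : b.text))
      .join("\n\n");
    return { content: [{ type: "text", text: markdown }] };
  }
);

await server.connect(new StdioServerTransport());
```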