🤖 AI Summary
An engineer showcases the architecture of MECKs Translator, a closed-source, multi-region AI SaaS built around a self-hosted LLM stack. The platform is composed of four cooperating services: the Core Bot (a Node.js 22 Discord gateway using the discord.js ShardingManager with detached worker backends), a Next.js 15 Dashboard for SaaS control and Stripe billing, a Vite/Tailwind landing site with newsletter and status APIs, and an interactive React-based demos site. Deployments use Docker on Fly.io across regions (e.g., fra, iad) with separate gateway and worker process groups. Resilience and low-latency routing come from Upstash Redis, which provides multi-layer caching, distributed locks, regional worker queues, and failover; persistent storage and funnels live in PostgreSQL. Observability runs through a Leader-Shard Scheduler that coordinates cron jobs and aggregates BI metrics, while multi-stage caching (Redis + Postgres) and Usage Normalization enforce strict cost control. The stack also includes self-hosted LLM endpoints (Llama variants) to avoid third-party API costs, plus real-time cache invalidation that pushes config changes from the dashboard to bots across all regions.
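A minimal sketch of how that cross-region config invalidation might work, assuming Redis pub/sub via the ioredis client: the channel name, payload shape, and client library are illustrative assumptions, not details taken from the article.

```ts
// Hypothetical dashboard-to-bot config invalidation over Redis pub/sub.
// Channel name and payload shape are assumptions; the article only names
// Upstash Redis as the shared layer between regions.
import Redis from "ioredis";

const CONFIG_CHANNEL = "config:invalidate"; // hypothetical channel name

// Dashboard side: announce which guild's settings changed.
export async function publishConfigChange(pub: Redis, guildId: string): Promise<void> {
  await pub.publish(CONFIG_CHANNEL, JSON.stringify({ guildId, ts: Date.now() }));
}

// Bot side (any region): drop the local cache entry so the next read refetches.
export async function subscribeToConfigChanges(
  sub: Redis,
  localCache: Map<string, unknown>,
): Promise<void> {
  await sub.subscribe(CONFIG_CHANNEL);
  sub.on("message", (_channel, raw) => {
    const { guildId } = JSON.parse(raw) as { guildId: string };
    localCache.delete(guildId); // next lookup falls through to Redis/Postgres
  });
}
```

Publishing on every settings save and clearing only the affected key would keep each regional bot's in-process cache warm for everything else.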
For the AI/ML community, this is a practical blueprint for running scalable, cost-conscious LLM services in production: it shows how sharding, regional routing, background queues, and multi-stage caching combine to manage throughput and cost, and how self-hosted models can be integrated into a SaaS lifecycle with observability, billing, and real-time configuration. Key takeaways include architecture patterns for horizontal scaling on messaging platforms, operational strategies for MLOps with self-hosted endpoints, and concrete integration points (Redis, Postgres, Fly.io, Docker, Next.js) for multi-region, production-grade LLM deployments.
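For the multi-stage caching pattern, a read-through sketch with Redis in front of Postgres could look like the following; the table name, key format, TTL, and the ioredis/pg clients are assumptions for illustration, not the project's actual code.

```ts
// Read-through, two-stage lookup: Redis first, Postgres as the source of truth.
// guild_configs table and 5-minute TTL are hypothetical.
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const pg = new Pool({ connectionString: process.env.DATABASE_URL });

export async function getGuildConfig(guildId: string): Promise<Record<string, unknown> | null> {
  const key = `guild:config:${guildId}`;

  // Stage 1: a Redis hit avoids a cross-region Postgres round trip.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Stage 2: fall back to Postgres.
  const { rows } = await pg.query(
    "SELECT config FROM guild_configs WHERE guild_id = $1",
    [guildId],
  );
  if (rows.length === 0) return null;

  // Populate Redis with a short TTL so repeated reads stay cheap.
  await redis.set(key, JSON.stringify(rows[0].config), "EX", 300);
  return rows[0].config;
}
```

The same shape extends to a third, in-process stage on each shard, with the pub/sub invalidation above keeping those local copies from going stale.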