Local LLMs Directory [with VRAM Calculator] (apxml.com)

🤖 AI Summary
A community-maintained “Local LLMs Directory” with a built-in VRAM calculator catalogues dozens of recent models and variants, from 14B instruction-tuned models (Phi-4 Reasoning Plus) up to trillion-parameter bases (Moonshot AI’s Kimi K2 1T), along with their context windows (33K to 1M tokens) and release dates. Standouts include Meta’s Llama 4 Maverick (400B, 1M context), Alibaba’s Qwen3 family (14B–480B, up to 262K context), DeepSeek’s 671B line, Mistral Large 123B, Z.ai’s GLM-4.5 variants, and OpenAI’s GPT-OSS releases (≈21B and 117B). The directory is designed to help practitioners compare model scale, memory footprint, and long-context capability at a glance.

Significance: the list highlights two converging trends, ever larger parameter counts (hundreds of billions to 1T) and rapidly expanding context windows, both of which directly drive inference memory and latency requirements. The VRAM calculator is the practical core: it lets engineers estimate GPU and host memory and plan the multi-GPU sharding, offloading, or quantization strategies (8-bit/4-bit, QLoRA) needed to run a particular checkpoint locally.

For researchers and deployers, the resource speeds hardware planning and tradeoff analysis (context length vs. batch size, model size vs. quantization) and surfaces which models are feasible for on-premise or edge use versus those that still demand cluster-scale resources.
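The calculator's exact formula isn't reproduced here, but the quantities the summary mentions (parameter count, precision/quantization, context window, batch size) are enough for a back-of-the-envelope estimate: weight memory scales with parameters times bytes per parameter, while the KV cache scales with layers, hidden size, context length, and batch size. The following is a minimal Python sketch of that arithmetic; the 14B example architecture (48 layers, hidden size 5120), the GQA ratio, and the 20% overhead factor are illustrative assumptions, not values taken from the directory.

```python
# Back-of-the-envelope VRAM estimate for serving a dense transformer locally.
# This is NOT the apxml calculator's formula -- just a sketch of the same idea:
# weights scale with parameter count and precision; the KV cache scales with
# layers, hidden size, context length, and batch size.

def estimate_vram_gb(
    params_b: float,          # parameters, in billions
    weight_bytes: float,      # 2.0 for fp16/bf16, 1.0 for int8, 0.5 for int4
    n_layers: int,
    hidden_dim: int,
    context_len: int,
    batch_size: int = 1,
    kv_bytes: float = 2.0,    # KV cache typically kept in fp16
    gqa_ratio: float = 1.0,   # n_kv_heads / n_heads; < 1.0 for GQA models
    overhead: float = 1.2,    # ~20% for activations, buffers, fragmentation (assumed)
) -> float:
    weights = params_b * 1e9 * weight_bytes
    # K and V tensors: per layer, per token, per batch element
    kv_cache = 2 * n_layers * context_len * hidden_dim * gqa_ratio * kv_bytes * batch_size
    return (weights + kv_cache) * overhead / 1024**3

# Hypothetical 14B dense model (48 layers, hidden size 5120) at a 32K context:
for label, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{label}: ~{estimate_vram_gb(14, bpp, 48, 5120, 32768):.0f} GB")
```

Even at this level of approximation, the example surfaces the tradeoff the directory is built around: at a 32K context with full multi-head attention, the fp16 KV cache (roughly 30 GB under these assumptions) rivals the 14B weights themselves, which is why context length, quantization, and batch size have to be planned together.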