I built a tool to calculate how much VRAM is needed to run LLMs (www.kolosal.ai)

🤖 AI Summary
A new LLM Memory Calculator for GGUF models lets you instantly estimate the RAM required to load and run a model by reading its metadata (model size, hidden size, number of layers, attention/KV heads) from a URL or upload and deriving the KV cache requirement from it. Rather than downloading the whole file, the tool uses HTTP range requests to fetch only the metadata, computes a breakdown (model parameters, KV cache, total required memory), and flags whether your system can likely run the model. Results are presented as a quick, actionable baseline so you can avoid time-consuming trial-and-error or out-of-memory crashes.

This is significant for developers, researchers, and ops teams who must choose models for local experiments or production deployment: it makes hardware planning predictable, helps compare model variants, and reduces wasted GPU/CPU/RAM allocation. Key technical notes: it targets GGUF-formatted models; it factors in attention heads, KV heads, hidden layers, and context size when estimating the KV cache; and it stresses that the estimates are metadata-driven baselines, not absolute guarantees. Actual memory usage can still vary with the runtime (framework overhead, batch size, tokenization, quantization, and parallelism), so use the calculator to pre-screen models and guide resource provisioning.
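To make the metadata-driven estimate concrete, here is a minimal sketch of the kind of arithmetic involved. It is not the calculator's actual code: the field names are illustrative stand-ins for the real GGUF metadata keys (e.g. `llama.block_count`, `llama.attention.head_count_kv`), it assumes an fp16 KV cache, and it uses the quantized file size as a proxy for weight memory.

```python
# Hypothetical, simplified estimate of LLM memory needs from GGUF-style metadata.
# Assumptions: fp16 KV cache, file size ~= weight memory, flat overhead fraction.

def estimate_memory_bytes(
    file_size_bytes: int,        # size of the quantized GGUF file (weights + metadata)
    n_layers: int,               # number of transformer blocks
    hidden_size: int,            # embedding dimension
    n_attention_heads: int,      # query heads
    n_kv_heads: int,             # key/value heads (grouped-query attention)
    context_length: int,         # tokens to reserve KV cache for
    kv_bytes_per_elem: int = 2,  # 2 bytes for an fp16 cache
    overhead_fraction: float = 0.10,  # rough allowance for runtime/compute buffers
) -> dict:
    """Return a rough breakdown: weights, KV cache, overhead, total (all in bytes)."""
    head_dim = hidden_size // n_attention_heads
    # K and V caches: 2 tensors per layer, each shaped [n_kv_heads, context, head_dim]
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_length * kv_bytes_per_elem
    overhead = int((file_size_bytes + kv_cache) * overhead_fraction)
    return {
        "weights": file_size_bytes,
        "kv_cache": kv_cache,
        "overhead": overhead,
        "total": file_size_bytes + kv_cache + overhead,
    }

# Example: a ~4 GB quantized 7B-class model, 32 layers, hidden size 4096,
# 32 attention heads, 8 KV heads, 8k context.
if __name__ == "__main__":
    est = estimate_memory_bytes(4 * 1024**3, 32, 4096, 32, 8, 8192)
    for name, value in est.items():
        print(f"{name:>9}: {value / 1024**3:.2f} GiB")
```

Under these assumptions the KV cache for the example works out to about 1 GiB at 8k context, which shows why context length and KV-head count matter as much as the weight file size when pre-screening a model.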