Grok 4 Fast now has 2M context window (docs.x.ai)

🤖 AI Summary
xAI’s Grok lineup now exposes much larger context windows and clearer billing and tooling rules. Several Grok models support multi-million-token contexts: grok-4-fast variants are listed at up to 4M tokens, while releases such as grok-4-0709 and grok-code-fast-1 are listed at 2M. Pricing is per million tokens at per-model rates, and most Grok models sit in a 480 rate-limit tier. For billing, token usage is split into input, reasoning (internal chain-of-thought), completion, image, and cached prompt tokens; cached prompt tokens reduce cost for repeated prompts.

Practically, this enables long-context use cases (large documents, extended conversations, complex multi-step reasoning) while changing how developers must design requests: Grok 4 is a reasoning-first model with no “non-reasoning” mode, and parameters like presencePenalty, frequencyPenalty, stop, and reasoning_effort aren’t supported and will return errors rather than being ignored.

Agentic tool calls are billed both for tokens and per tool invocation (Web/X/Code Execution at roughly $10 per 1,000 calls), though invocations are free until Nov 21, 2025; Live Search will be deprecated by Dec 15, 2025. Other limits: image inputs up to 20 MiB each with no cap on image count, and model knowledge cutoffs of Nov 2024.

For builders, the takeaway is straightforward: much longer contexts and agentic capabilities, but watch parameter compatibility, token accounting (including the model’s internal reasoning), and tool-invocation costs when architecting workflows. The sketches below illustrate the token accounting, the parameter constraints, and the cost math.
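On the token-accounting side, here is a minimal sketch of reading the per-category usage from a response. It assumes xAI’s OpenAI-compatible endpoint (base_url https://api.x.ai/v1, a documented pattern) and OpenAI-style usage fields; the model id `grok-4-fast` and the exact detail-field names (`completion_tokens_details.reasoning_tokens`, `prompt_tokens_details.cached_tokens`) are assumptions, not confirmed by this summary:

```python
import os
from openai import OpenAI

# xAI exposes an OpenAI-compatible endpoint; model id and usage-field
# names below follow that convention and may differ in xAI's own docs.
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-4-fast",  # assumed model id; check the models list for your account
    messages=[{"role": "user", "content": "Summarize this contract clause: ..."}],
)

usage = response.usage
print("input tokens:     ", usage.prompt_tokens)
print("completion tokens:", usage.completion_tokens)

# Reasoning and cached-prompt tokens are billed as separate categories;
# on OpenAI-compatible responses they typically appear in these detail objects.
details = getattr(usage, "completion_tokens_details", None)
if details is not None:
    print("reasoning tokens: ", details.reasoning_tokens)

prompt_details = getattr(usage, "prompt_tokens_details", None)
if prompt_details is not None:
    print("cached prompt tokens:", prompt_details.cached_tokens)
```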
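Because the unsupported parameters error out instead of being silently dropped, requests ported from other OpenAI-compatible models need those fields stripped before sending. A hypothetical helper (`grok_safe_kwargs` is not part of any SDK; the snake_case names are the SDK-side spellings of the camelCase parameters listed above):

```python
# Parameters Grok 4 rejects, per the summary (snake_case as sent by
# OpenAI-compatible SDKs; the docs list them as presencePenalty etc.).
UNSUPPORTED_PARAMS = {"presence_penalty", "frequency_penalty", "stop", "reasoning_effort"}

def grok_safe_kwargs(kwargs: dict) -> dict:
    """Drop request parameters that Grok 4 models error on."""
    return {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_PARAMS}

# Example: only temperature survives.
print(grok_safe_kwargs({"temperature": 0.3, "stop": ["\n\n"], "presence_penalty": 0.5}))
# -> {'temperature': 0.3}
```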
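And for budgeting agentic workloads once the free period ends, the quoted rate works out to about a cent per tool call. A back-of-envelope helper (`estimated_tool_cost` is illustrative, not an API):

```python
def estimated_tool_cost(num_calls: int, usd_per_1k_calls: float = 10.0) -> float:
    """Tool-invocation cost at the quoted ~$10 / 1k calls (waived until Nov 21, 2025)."""
    return num_calls * usd_per_1k_calls / 1000.0

# An agent loop that makes 250 web-search calls:
print(f"${estimated_tool_cost(250):.2f}")  # -> $2.50
```

Note this is on top of token billing, which for agentic runs includes the reasoning tokens the model generates between tool calls.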