Google Gemini 3 Pro Model Card [pdf] (web.archive.org)

🤖 AI Summary
Google published the Gemini 3 Pro model card (Nov 2025), detailing its next‑generation sparse mixture‑of‑experts (MoE) transformer with native multimodal inputs (text, images, audio, video, code). Key technical specs: a context window of up to 1M tokens, 64K‑token output, MoE routing to decouple total model capacity from per‑token compute, training on TPUs with JAX/ML Pathways, and post‑training with instruction tuning, human‑preference data, and reinforcement learning for multi‑step reasoning and theorem proving. The training mix includes publicly available datasets, crawled and licensed data, user data (per Google policies), and synthetic data; preprocessing applied deduplication, honored robots.txt, and ran safety filters (including removal of CSAM and violent/explicit content).

Why it matters: Google positions Gemini 3 Pro as its most capable model for complex, long‑context, and agentic tasks (advanced coding, multimodal reasoning, tool use), distributed via the Gemini App, Cloud/Vertex AI, the API, and other channels. Evaluations show significant gains over Gemini 2.5 Pro on reasoning and multimodal benchmarks, alongside mixed automated safety results (e.g., Image→Text +3.1%, Tone +7.9%, while Text→Text automated safety shows a −10.4% delta that Google attributes largely to evaluation changes and false positives). Known limits include hallucinations, occasional timeouts, and a January 2025 knowledge cutoff. For the AI community this advances scalable MoE multimodal capabilities and long‑context agents, while underscoring continued needs for transparent datasets, robust evaluations, and governance around user data and deployment policies.
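The "decouple total capacity from per‑token compute" point is the core idea of sparse MoE layers: a gating network scores all experts, but each token only activates its top‑k, so FLOPs per token stay fixed while total parameters grow with the expert count. The model card does not describe Gemini's actual router; the sketch below is a generic, illustrative top‑k gating routine (expert count, logits, and k are made up for the example):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights.

    Only k expert FFNs run per token, so per-token compute is constant
    regardless of how many total experts (and parameters) the layer has.
    Returns a list of (expert_index, weight) pairs with weights summing to 1.
    """
    probs = softmax(gate_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    return [(i, probs[i] / norm) for i in topk]

# Toy example: 8 experts, one token routed to its top 2
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
print(route_token(logits, k=2))  # experts 1 and 4 win here
```

In a real MoE layer the chosen experts' outputs are combined with these weights, and auxiliary load‑balancing losses keep tokens spread across experts; none of that bookkeeping is shown here.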