Has Google solved two of AI's oldest problems? (generativehistory.substack.com)

🤖 AI Summary
Google appears to be A/B testing a mysterious new model in AI Studio—widely rumored to be Gemini‑3—that’s producing striking results on two long‑standing AI problems: handwriting recognition and abstract symbolic reasoning. Users report the model can generate surprisingly complex software from single prompts, and in a controlled test on historical handwriting (a 50‑document, 10k‑word benchmark used by the author and colleagues) it transcribed five very difficult documents (~1,000 words) with a strict character error rate (CER) of 1.7% and word error rate (WER) of 6.5%. For context, last‑gen Gemini‑2.5‑Pro reached about 4% CER and 11% WER on the same set, and specialist HTR tools typically sit around 8% CER without fine‑tuning. The tester used consistent system prompts and found most remaining errors were non‑semantic (punctuation/capitalization), suggesting near‑expert human performance on content‑critical elements like names, dates, and numbers.

Technically and culturally, this matters because handwriting transcription is a benchmark that requires tight coupling of vision and reasoning: ambiguous marks must be resolved with contextual, historical knowledge—not just probabilistic next‑token prediction. If these early results hold at scale, they imply predictive LLM architectures may be crossing a threshold where visual precision and expert symbolic reasoning cohere, enabling advances in archival research, scientific image interpretation, and any domain needing accurate vision+reasoning.

Caveats remain: the sample was small, training‑set overlap can’t be ruled out, and broader evaluation is needed—but the preliminary gains signal a potentially major capability jump worth close attention.
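For readers unfamiliar with the metrics: CER and WER are both edit distance (insertions + deletions + substitutions) divided by the length of the reference transcription, computed over characters or words respectively. A minimal sketch of how such scores are computed (not the author's actual benchmark code, which isn't shown):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def error_rate(reference, hypothesis, unit="char"):
    """CER (unit='char') or WER (unit='word') against a ground-truth reference."""
    ref = list(reference) if unit == "char" else reference.split()
    hyp = list(hypothesis) if unit == "char" else hypothesis.split()
    return levenshtein(ref, hyp) / len(ref)

# Hypothetical example: a dropped period is a "non-semantic" error --
# it costs one character edit but still counts one word as wrong.
cer = error_rate("Received of Mr. Smith ten pounds",
                 "Received of Mr Smith ten pounds", unit="char")
wer = error_rate("Received of Mr. Smith ten pounds",
                 "Received of Mr Smith ten pounds", unit="word")
```

This illustrates why the article distinguishes strict CER from semantic accuracy: in the example above a single missing period yields a tiny CER (1/32 ≈ 3%) but a much larger WER (1/6 ≈ 17%), even though no name, date, or number was misread.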