🤖 AI Summary
Researchers framed pictographic character recognition as a program-synthesis problem and trained a vision-language model to “decompile” raster images into executable vector programs built from Bézier curve primitives. Instead of predicting pixels or labels, the model outputs a compact geometric description (a sequence of Bézier strokes) that exactly reconstructs characters. It outperforms strong zero-shot baselines — including GPT-4o — on this task and produces editable, interpretable vector representations rather than opaque image embeddings.
The standout result is zero-shot generalization: a model trained only on modern Chinese characters successfully reconstructs ancient Oracle Bone Script, suggesting it learned an abstract, transferable geometric grammar rather than memorizing surface patterns. Technical implications include better alignment between vision, language and mathematical structure, improved interpretability and editability (vector outputs are executable), and new possibilities for robust OCR, font/vectorization, and digital humanities work on historical scripts. The work highlights a path from pixel-level recognition to structured, programmatic scene understanding that could benefit other domains where geometry and symbolic composition matter.
Loading comments...
login to comment
loading comments...
no comments yet