Visual Features Across Modalities: SVG and ASCII Art Cross-Modal Understanding (transformer-circuits.pub)

🤖 AI Summary
Anthropic’s interpretability team reports that LLMs learn cross-modal, high-level semantic features that recognize and generate visual concepts encoded as text—such as eyes, mouths, dogs, and cats—across ASCII art, SVG code, and prose in multiple languages. Using sparse autoencoders on middle layers of models from Haiku 3.5 to Sonnet 4.5, researchers found features that activate on the same semantic part (e.g., “eye”) regardless of modality, but only when sufficient contextual cues are present (position in the SVG, surrounding ASCII lines, element order).

Crucially, some of these “motor” features not only perceive concepts but also steer generation: manipulating them converts smiles to frowns, produces neutral expressions, or adds ears and whiskers while preserving style. Activations are robust to surface changes (color, radius), show smooth interpolation with steering strength, and reveal pareidolia-like behavior in which the model interprets shapes as parts in context.

They also introduce Data Point Initialization (DPI) for dictionary learning in sparse autoencoders and weakly causal crosscoders: weight matrices are seeded with noisy real data points to place parameters in high-density regions of activation space. DPI yields clear empirical gains—e.g., a 524k-feature SAE saw ~17% L0 reduction and ~4% MSE improvement; WCCs showed smaller but positive gains (~1% L0, ~2.3% MSE).

Together these findings advance interpretability and controllable generation for text-based visual content, suggest new routes to identify steering-capable features, and raise open questions about abstract semantics, the mechanisms that map low-level text to concepts, and how internal “motor” representations arise.
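As a rough illustration of the steering idea described above, here is a minimal sketch of adding a scaled SAE feature direction to a model’s residual stream at a middle layer. All names (`sae_decoder`, `FEATURE_ID`, the layer choice, the steering strength) are placeholders, not the paper’s actual API or values; the toy `nn.Identity` stands in for a real transformer layer.

```python
# Hypothetical activation-steering sketch: add a scaled copy of one SAE
# feature's decoder direction to the residual stream during generation.
import torch

d_model, n_features = 512, 16_384               # illustrative sizes only
sae_decoder = torch.randn(n_features, d_model)  # rows = feature directions
sae_decoder = sae_decoder / sae_decoder.norm(dim=-1, keepdim=True)

FEATURE_ID = 12345   # e.g., a hypothetical "frown" feature
STRENGTH = 8.0       # steering coefficient; larger = stronger effect

def steering_hook(module, inputs, output):
    # output: residual-stream activations, shape (batch, seq, d_model)
    direction = sae_decoder[FEATURE_ID].to(output.dtype)
    # Nudge every token position toward the chosen feature direction.
    return output + STRENGTH * direction

# With a real model you would register this on a middle layer, e.g.:
#   handle = model.layers[20].register_forward_hook(steering_hook)
#   ...generate SVG / ASCII output...
#   handle.remove()

# Toy demonstration that the hook logic runs:
layer = torch.nn.Identity()
handle = layer.register_forward_hook(steering_hook)
resid = torch.zeros(1, 4, d_model)
steered = layer(resid)
print(steered.shape, steered[0, 0] @ sae_decoder[FEATURE_ID])  # ~= STRENGTH
handle.remove()
```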
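And a hedged sketch of the Data Point Initialization idea, assuming the straightforward reading of the summary: seed each dictionary row with a noisy copy of a real activation vector so parameters start in high-density regions of activation space. The noise scale, the tied-encoder choice, and all variable names are assumptions, not the paper’s exact recipe.

```python
# Sketch of Data Point Initialization (DPI) for an SAE dictionary.
import torch

def dpi_init(activations: torch.Tensor, n_features: int, noise_scale: float = 0.05):
    """activations: (n_samples, d_model) cached residual-stream activations."""
    n_samples, d_model = activations.shape
    # Sample real data points (with replacement if the dictionary exceeds the buffer).
    idx = torch.randint(0, n_samples, (n_features,))
    seeds = activations[idx].clone()
    # Perturb with small Gaussian noise so repeated seeds don't coincide exactly.
    seeds = seeds + noise_scale * seeds.std() * torch.randn_like(seeds)
    # Normalize rows so each becomes a unit-norm decoder direction.
    W_dec = seeds / seeds.norm(dim=-1, keepdim=True)
    # A common SAE convention: tie the encoder to the decoder transpose at init.
    W_enc = W_dec.t().clone()
    return W_enc, W_dec

# Usage with a stand-in activation buffer (hypothetical shapes):
acts = torch.randn(100_000, 512)
W_enc, W_dec = dpi_init(acts, n_features=16_384)
print(W_enc.shape, W_dec.shape)  # (512, 16384), (16384, 512)
```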