Show HN: Steerling-8B, a language model that can explain any token it generates (www.guidelabs.ai)

🤖 AI Summary
Guide Labs has released Steerling-8B, a language model designed to be inherently interpretable. Every generated token can be traced back to its input context, to human-understandable concepts, and to specific training data. The model was trained on 1.35 trillion tokens and is reported to perform comparably to models trained on 2-7 times as much data. Concepts can be manipulated at inference time, so a targeted concept can be suppressed or amplified without retraining. Steerling-8B is built on a causal discrete diffusion model whose architecture decomposes embeddings into supervised and discovered concepts, so that its predictions are grounded in those concepts. Nearly 84% of the contribution to its token-level predictions comes from the concept module. The model also performs concept detection and attribution, and the team plans further work on concept steering, concept discovery, and data provenance. Guide Labs presents this as a step toward greater transparency and control in language models, with implications for alignment and ethical AI applications.
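The summary gives no equations, so the following is only a toy sketch of the general idea it describes: an embedding split into a concept part and a residual, a per-token logit whose concept share can be measured, and steering by rescaling one concept's activation. Every name, shape, and number here (`concept_vectors`, `activations`, `residual`, `unembed_row`, `steer`) is hypothetical and is not Steerling-8B's actual architecture or API.

```python
# Toy illustration only: all names and values are hypothetical and are
# not taken from Steerling-8B's code or documentation.

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def concept_part(activations, concept_vectors):
    """Weighted sum of concept directions: sum_k a_k * c_k."""
    dim = len(concept_vectors[0])
    return [
        sum(a * c[i] for a, c in zip(activations, concept_vectors))
        for i in range(dim)
    ]

def token_logit(activations, concept_vectors, residual, unembed_row):
    """Logit for one token, split into (concept, residual) contributions."""
    from_concepts = dot(concept_part(activations, concept_vectors), unembed_row)
    from_residual = dot(residual, unembed_row)
    return from_concepts, from_residual

def steer(activations, index, scale):
    """Return a copy with one concept's activation rescaled (0 suppresses it)."""
    steered = list(activations)
    steered[index] *= scale
    return steered

# Two hypothetical concepts in a 3-dimensional embedding space.
concept_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
activations = [2.0, 1.0]
residual = [0.0, 0.0, 0.5]        # whatever the concepts do not explain
unembed_row = [1.0, 2.0, 1.0]     # output weights for one candidate token

from_concepts, from_residual = token_logit(
    activations, concept_vectors, residual, unembed_row
)
# Share of this token's logit that flows through the concept part.
concept_share = from_concepts / (from_concepts + from_residual)

# Suppress the second concept at inference time; no weights are changed.
suppressed = steer(activations, 1, 0.0)
suppressed_logit = sum(
    token_logit(suppressed, concept_vectors, residual, unembed_row)
)
```

In this toy setup the share is a simple ratio, which is only meaningful when both contributions are non-negative. How the reported figure of nearly 84% is actually computed is not stated in the summary.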