Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training (github.com)

🤖 AI Summary
A new method for enhancing large language models (LLMs) without training has been demonstrated, yielding significant improvements in logical reasoning. By replicating three specific layers in models such as Qwen2.5-32B and Devstral-24B using David Ng's RYS method, an overall 17% increase in reasoning performance was achieved; the logical deduction score on the BBH benchmark in particular rose from 0.22 to 0.76. The technique routes hidden states through the same layers twice during the forward pass, letting the model process information an extra time without modifying any weights. Its significance for the AI/ML community lies in what it reveals about "reasoning circuits" in transformer architectures: certain contiguous blocks of layers appear to act as cohesive cognitive units, and duplicating those blocks enhances reasoning. The research also identified distinct layer configurations that produce different cognitive profiles, improving capabilities such as math specialization or emotional intelligence while requiring minimal additional resources. The toolkit, validated on two AMD GPUs in a single evening, offers an accessible path for further exploration and optimization of existing transformer models, potentially opening the door to more efficient reasoning enhancements.
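The core idea, routing hidden states through a repeated block of layers with shared weights, can be sketched in a few lines. This is a minimal illustration, not the linked repo's implementation: the block boundaries, layer count, and toy "layers" below are assumptions chosen for clarity.

```python
# Sketch of layer duplication (depth upscaling without training).
# The duplicated indices refer to the SAME layer objects, so weights
# are shared and nothing is retrained or modified.

def build_layer_schedule(num_layers, dup_start, dup_end):
    """Return a forward-pass order in which the contiguous block
    [dup_start, dup_end] is executed twice."""
    block = list(range(dup_start, dup_end + 1))
    return (list(range(dup_start))          # layers before the block
            + block + block                 # the block, run twice
            + list(range(dup_end + 1, num_layers)))  # layers after

def forward(hidden, layers, schedule):
    # Route the hidden state through the layers in the duplicated order.
    for idx in schedule:
        hidden = layers[idx](hidden)
    return hidden

# Toy demonstration: 6 "layers" that each record their index,
# duplicating a 3-layer block (indices 2..4, chosen arbitrarily).
layers = [lambda h, k=k: h + [k] for k in range(6)]
schedule = build_layer_schedule(6, 2, 4)
print(schedule)  # [0, 1, 2, 3, 4, 2, 3, 4, 5]
print(forward([], layers, schedule))
```

In a real transformer the same scheme means iterating over the model's decoder layers in this order during the forward pass; since the duplicated entries are references to existing layers, memory and parameter counts are essentially unchanged.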