🤖 AI Summary
Arcee has announced Trinity Large, an American open-source foundation model built on a sparse Mixture-of-Experts (MoE) architecture: 400 billion total parameters, 256 experts with 4 activated per token, and roughly 13 billion active parameters per token. Three variants are being released (Trinity-Large-Preview, Trinity-Large-Base, and TrueBase), each targeting a different use case, from chat-ready deployment to pretraining-focused research.
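The announcement does not describe Arcee's router in detail, but the stated figures (256 experts, 4 active per token) follow the standard sparse top-k routing pattern. The sketch below is purely illustrative: the hidden size, weight initialization, and function names are assumptions, only the expert count and top-k value come from the article.

```python
import numpy as np

NUM_EXPERTS = 256   # total experts (from the article)
TOP_K = 4           # experts activated per token (from the article)
HIDDEN = 8          # toy hidden size, illustrative only

rng = np.random.default_rng(0)

def route_token(hidden_state, router_weights):
    """Pick the top-k experts for one token and return their mixing weights."""
    logits = hidden_state @ router_weights            # (NUM_EXPERTS,) router scores
    top_idx = np.argsort(logits)[-TOP_K:]             # indices of the k largest logits
    top_logits = logits[top_idx]
    weights = np.exp(top_logits - top_logits.max())   # softmax over the chosen experts
    weights /= weights.sum()
    return top_idx, weights

router_weights = rng.standard_normal((HIDDEN, NUM_EXPERTS))
token = rng.standard_normal(HIDDEN)
experts, weights = route_token(token, router_weights)
print(experts, weights)   # 4 expert indices and mixing weights summing to 1
```

Because only 4 of 256 experts run per token, each token touches a small fraction of the 400B parameters, which is how the model arrives at the stated ~13B active parameters per token.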
Training ran on 2,048 Nvidia B300 GPUs, processing 17 trillion tokens of curated data in 33 days, which Arcee describes as one of the largest publicly stated pretraining runs. The run relied on stabilization techniques such as z-loss and momentum adjustments in expert routing, and the model is reported to perform well across diverse benchmarks, including math and scientific reasoning. With a focus on real-world application, Trinity Large aims to put cutting-edge open-source technology in the hands of researchers and developers, with community engagement encouraged to refine and extend it.
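The article names z-loss but does not give Arcee's exact formulation. As a reference point, the router z-loss commonly used in MoE training (e.g., ST-MoE) penalizes the squared log-sum-exp of the router logits so they do not grow unboundedly large. A minimal sketch under that assumption:

```python
import numpy as np

def router_z_loss(router_logits):
    """Mean squared log-sum-exp of the router logits over a batch of tokens.

    Keeping this quantity small discourages the router from producing very
    large logits, which is a common source of instability in MoE training.
    """
    m = router_logits.max(axis=-1, keepdims=True)                     # for numerical stability
    lse = m.squeeze(-1) + np.log(np.exp(router_logits - m).sum(axis=-1))
    return np.mean(lse ** 2)

logits = np.random.default_rng(1).standard_normal((16, 256))  # 16 tokens, 256 experts
print(router_z_loss(logits))
```

In practice this term is added to the main training loss with a small coefficient; the momentum adjustments to expert routing mentioned in the article are a separate stabilization measure not detailed in the source.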