🤖 AI Summary
Norway’s National Library is embarking on an ambitious project to develop a large language model (LLM) tailored to the Norwegian language, utilizing 2 petabytes of Huawei OceanStor Dorado flash storage for its AI training data pipeline. Marius Husnes, the library's Head of IT Platform, emphasized the need for sovereign LLMs, arguing that countries lacking such models are at a disadvantage, as general models trained primarily in English do not adequately capture local history, culture, and language nuances. The library holds the largest collection of digitalized Norwegian literature, newspapers, and multimedia due to a legal mandate to preserve the nation's cultural heritage, making it uniquely positioned for this endeavor.
The LLM training process leverages advanced infrastructure, including an Nvidia DGX H200 system and Norway's Sigma2 Olivia supercomputer, featuring 448 GPUs and over 64,000 CPU cores. Key challenges include managing the transition between a preservation storage optimized for durability and a high-throughput AI pipeline, as well as establishing evaluation tools and governance for the LLM. Husnes noted the absence of existing tools to assess the model's performance, prompting them to create their own evaluation criteria. This intricate project not only highlights the significance of local LLMs for non-English-speaking nations but also showcases Huawei’s crucial role in the European AI landscape.
Loading comments...
login to comment
loading comments...
no comments yet