Project OSSAS: Custom LLMs to Process 100M Research Papers (inference.net)

🤖 AI Summary
Project OSSAS, announced by the Alexandria team in collaboration with LAION and Wynd Labs, is an open-science initiative to convert the world’s scientific literature into standardized, machine-readable summaries. Built on Project Alexandria’s “Knowledge Units” legal and technical foundation, OSSAS fine-tunes open models to produce JSON-formatted summaries (classifying texts as SCIENTIFIC_TEXT, PARTIAL, or NON_SCIENTIFIC) that capture title, authors, methodology, results, claims, and more. The project ships an initial batch of 100,000 structured summaries, an interactive UMAP-based visualizer (aella.inference.net), and a plan to scale to 100 million papers using idle compute contributed through a permissionless Inference Devnet.

Technically, OSSAS post-trained Qwen 3 14B and Nemotron 12B on a curated 110k-paper subset, using outputs from top closed models as training targets. Evaluation used an LLM-ensemble judge (GPT-5, Gemini 2.5 Pro, Claude 4.5 Sonnet) and a QA benchmark; the fine-tuned models scored within ~15% of GPT-5 on holistic ratings, and Qwen 3 14B scored 73.9% versus GPT-5’s 74.6% on the QA task. Nemotron yields ~2.25x higher throughput on 8×H200 nodes, and the economics are stark: OSSAS estimates processing 100M papers would cost under $100k, versus more than $5M with GPT-5 (≈50× cost reduction).

Complementary systems (LOGIC log-prob verification, a Solana staking payout protocol, and cryptographic integrity measures) aim to ensure honest, auditable inference. The result is a practical, low-cost pathway to searchable, comparable, and auditable scientific knowledge at unprecedented scale.
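As a rough illustration of what one of these structured summary records might look like, here is a minimal Python sketch. Only the three classification labels and the listed fields (title, authors, methodology, results, claims) come from the announcement; the exact JSON schema, field names, and the `PaperSummary` helper are assumptions for illustration, not the project's actual format.

```python
# Minimal sketch of a structured paper summary, assuming a flat JSON record.
# The classification labels and field list follow the summary above; everything
# else (schema shape, field names) is an illustrative assumption.
import json
from dataclasses import dataclass, field, asdict
from typing import List

ALLOWED_CLASSIFICATIONS = {"SCIENTIFIC_TEXT", "PARTIAL", "NON_SCIENTIFIC"}

@dataclass
class PaperSummary:
    classification: str              # one of ALLOWED_CLASSIFICATIONS
    title: str
    authors: List[str]
    methodology: str
    results: str
    claims: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Reject records whose label is not one of the three announced classes.
        if self.classification not in ALLOWED_CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification}")
        return json.dumps(asdict(self), indent=2)

# Example record; the contents are placeholders, not real model output.
example = PaperSummary(
    classification="SCIENTIFIC_TEXT",
    title="An Example Paper",
    authors=["A. Author", "B. Author"],
    methodology="Randomized controlled trial on ...",
    results="Observed a 12% improvement over baseline ...",
    claims=["Treatment X outperforms baseline Y under condition Z"],
)
print(example.to_json())
```

A fixed, machine-checkable record shape like this is what makes the summaries searchable and comparable across 100M papers, regardless of which fine-tuned model produced them.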