OSSAS (Open Source Summaries at Scale): The Inference.net × LAION × Grass (laion.ai)

🤖 AI Summary
Inference.net, LAION, and Grass announced OSSAS (Open Source Summaries at Scale), a pipeline and initial release for producing standardized, JSON-structured summaries of scientific papers at massive scale. They crawled ~100M papers from public sources (deduplicated and augmented with bethgelab, LAION, and Common Pile subsets), post-trained Qwen 3 14B and Nemotron 12B on GPT-5-generated targets, and released the fine-tuned models plus 100k structured summaries (visualizer: https://laion.inference.net/).

The summary schema captures title/authors/field, an executive summary, research questions, methods, architectures, numeric key results, contradictions and limitations, data/code availability, ethics, key figures, and three takeaways; it is designed to be both human- and machine-consumable (see the schema sketch below). Post-training used a 110k-paper subset (100k train / 10k validation), with strict prompting to keep model outputs aligned to the schema.

For evaluation they combined an ensemble "LLM-as-a-Judge" (GPT-5, Gemini 2.5 Pro, Claude 4.5) with a QA benchmark of generated multiple-choice questions (see the evaluation sketch below). The fine-tuned Qwen 3 scored 4.21 versus GPT-5's 4.81 on the 1–5 rubric and reached 73.9% QA accuracy (GPT-5: 74.6%); the fine-tuned Nemotron 12B scored 4.10 and 71.3%. Nemotron also delivered roughly 2.25× Qwen's throughput on 8×H200, making it the better choice for bulk processing.

The project highlights practical, open alternatives to closed models, shows that decentralized GPU networks can cut costs dramatically, and positions structured summaries as inputs for search and triage and as training data, while cautioning about hallucinations, context limits, and the need to verify critical details against the source texts.
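The post lists the schema's fields but not its exact JSON layout. Below is a minimal sketch of what one summary record might look like as a Python dataclass; the field names are inferred from the description and are not taken from the released schema.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional
import json

# Hypothetical OSSAS summary record; field names are inferred from the
# announcement's description, not from the released schema.
@dataclass
class PaperSummary:
    title: str
    authors: List[str]
    field_of_study: str
    executive_summary: str
    research_questions: List[str]
    methods: str
    architectures: List[str]
    key_results: List[str]                     # numeric results, e.g. "74.6% QA accuracy"
    contradictions_and_limitations: List[str]
    data_availability: Optional[str]           # link or statement, None if unstated
    code_availability: Optional[str]
    ethics: Optional[str]
    key_figures: List[str]                     # short descriptions of important figures
    takeaways: List[str]                       # exactly three, per the schema description

def to_json(summary: PaperSummary) -> str:
    """Serialize a record to the JSON form downstream tools would consume."""
    return json.dumps(asdict(summary), indent=2)
```

Keeping every field present (with an explicit None for missing items) is what makes the output machine-consumable: downstream search or triage code can rely on keys existing instead of parsing free text.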
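The evaluation as described combines two signals: an averaged 1–5 rubric score from three judge models and accuracy on generated multiple-choice questions. A minimal sketch of that scoring logic, assuming hypothetical `ask_judge` and `answer_mcq` callables that wrap the respective model APIs:

```python
from statistics import mean
from typing import Callable, Dict, List

# Judge ensemble named in the post.
JUDGES = ["gpt-5", "gemini-2.5-pro", "claude-4.5"]

def ensemble_rubric_score(
    summary: str,
    paper_text: str,
    ask_judge: Callable[[str, str, str], float],  # hypothetical: (judge, summary, paper) -> score in [1, 5]
) -> float:
    """Average the 1-5 rubric scores across the judge ensemble."""
    return mean(ask_judge(judge, summary, paper_text) for judge in JUDGES)

def qa_accuracy(
    summary: str,
    questions: List[Dict],                     # [{"question": ..., "choices": [...], "answer": "B"}, ...]
    answer_mcq: Callable[[str, Dict], str],    # hypothetical: (summary, question) -> chosen letter
) -> float:
    """Fraction of multiple-choice questions answered correctly from the summary alone."""
    correct = sum(answer_mcq(summary, q) == q["answer"] for q in questions)
    return correct / len(questions)
```

The QA benchmark is the more objective of the two signals: a summary that preserves a paper's key facts should let a model answer questions about the paper without seeing it, which helps explain why the near-parity with GPT-5 on QA accuracy (73.9% vs 74.6%) stands out despite the larger rubric gap.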