Merlin: A computed tomography vision–language foundation model and dataset (www.nature.com)

0 points 3 hours ago

🤖 AI Summary

Stanford researchers have introduced Merlin, a vision-language foundation model for analyzing abdominal computed tomography (CT) scans. Unlike earlier models that focus mainly on 2D images and short reports, Merlin is trained on 3D volumetric CT data together with radiology reports and electronic health records. The training dataset comprises over 6 million images from 15,331 CT scans, paired with diagnosis codes and radiology reports. The work targets the growing demand for radiology services amid a shortage of radiologists. Evaluated across 752 tasks, Merlin outperformed existing 2D models and radiology foundation models, and it supports zero-shot classification, prognosis prediction, and report generation. The researchers have released the trained models and datasets through platforms such as GitHub and Hugging Face, with the aim of supporting further work on automated medical imaging, reducing the burden on radiologists, and opening new avenues for biomarker discovery and disease risk assessment.
