University of Michigan / University of Cologne
A generalist neuroimaging vision foundation model pretrained on 5.24M clinical MRI and CT volumes for radiologic diagnosis and report generation.
NeuroVFM is a generalist vision foundation model for clinical neuroimaging, designed to interpret brain MRI and CT studies across the full diversity of pathology seen in routine practice. Most prior medical imaging foundation models depend on carefully curated, annotation-rich datasets that are expensive to assemble and narrow in scope. NeuroVFM instead embraces "health system learning": it is pretrained directly on uncurated clinical data accumulated through ordinary patient care, allowing it to learn from the messy, heterogeneous, real-world distribution of scans that radiologists actually encounter.
The model was developed by the Machine Learning in Neurosurgery (MLiNS) Lab at the University of Michigan, with collaborators from University of Michigan Radiology, Computational Medicine and Bioinformatics, and Computer Science and Engineering, and the Department of Neurosurgery at the University of Cologne. It was released as a preprint in November 2025 (Kondepudi et al.). The authors demonstrate that frontier general-purpose vision-language models underperform on neuroimaging tasks, whereas a domain-specific model trained at scale on clinical data achieves state-of-the-art results.
NeuroVFM occupies a distinctive niche in the landscape of biomedical foundation models: rather than a protein or genomics model, it is a volumetric imaging encoder that pairs self-supervised visual pretraining with a downstream findings language model, bridging pixel-level brain anatomy and natural-language radiology reporting.
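The architecture summary describes this bridge as a Perceiver-style connector. The idea can be shown in a few lines: a fixed set of learned latent queries attends over however many visual tokens a study produces, then emits a fixed number of soft tokens for the language model. This is a minimal NumPy sketch under stated assumptions, not the released implementation. The function name perceiver_connector, all weight names, and every dimension are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the chosen axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def perceiver_connector(visual_tokens, latents, w_q, w_k, w_v, w_out):
    """Single-head cross-attention resampler (hypothetical sketch).

    visual_tokens: (n_tokens, d_vis)  encoder outputs; n_tokens varies per study
    latents:       (n_latents, d_lat) learned queries; the count is fixed
    returns:       (n_latents, d_llm) soft tokens to hand to the language model
    """
    q = latents @ w_q                          # (n_latents, d_att)
    k = visual_tokens @ w_k                    # (n_tokens, d_att)
    v = visual_tokens @ w_v                    # (n_tokens, d_att)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_latents, n_tokens)
    attn = softmax(scores, axis=-1)            # each latent attends over every visual token
    return (attn @ v) @ w_out                  # project into the LLM embedding width


# Hypothetical sizes; NeuroVFM's real dimensions are not stated on this page.
rng = np.random.default_rng(0)
d_vis, d_lat, d_att, d_llm, n_latents = 64, 32, 48, 128, 16
latents = rng.normal(size=(n_latents, d_lat))
w_q = rng.normal(scale=0.1, size=(d_lat, d_att))
w_k = rng.normal(scale=0.1, size=(d_vis, d_att))
w_v = rng.normal(scale=0.1, size=(d_vis, d_att))
w_out = rng.normal(scale=0.1, size=(d_att, d_llm))

small_study = perceiver_connector(rng.normal(size=(500, d_vis)), latents, w_q, w_k, w_v, w_out)
large_study = perceiver_connector(rng.normal(size=(2000, d_vis)), latents, w_q, w_k, w_v, w_out)
print(small_study.shape, large_study.shape)  # (16, 128) (16, 128)
```

Because the number of latents is fixed, the language model sees the same amount of visual context whether a study yields a few hundred tokens or several thousand. Practical resamplers usually stack several such layers with multi-head attention and feed-forward blocks. This sketch keeps one head and one layer for readability.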
NeuroVFM is built on a 3D Vision Transformer encoder trained with the Vol-JEPA objective, which masks volumetric patches and predicts their representations in embedding space, learning anatomy and pathology without labels. Downstream capabilities are added in stages: multiple-instance learning trains study-level diagnostic heads covering 74 MRI and 82 CT diagnoses, and a Perceiver-style connector links the visual encoder to a large language model that is then supervised fine-tuned to generate radiology findings. The training corpus comprises 5.24 million MRI and CT volumes from approximately 567,000 studies collected through clinical care. Released model variants include the volumetric encoder (neurovfm-encoder), a CT diagnostic head (neurovfm-dx-ct), and the findings LLM (neurovfm-llm). Reported evaluations show state-of-the-art diagnostic accuracy and report quality relative to general-purpose frontier models. Code is released under the MIT license, while model weights are distributed under CC-BY-NC-SA 4.0, and access to them requires approval of a request made from an institutional email address.
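The Vol-JEPA idea of masked prediction in embedding space can be illustrated with a toy loss. The steps are: split a volume into 3D patches, hide most of them, encode the visible remainder, and score predicted embeddings for the hidden patches against a target encoder's outputs. This is a hedged sketch, not NeuroVFM's code. Linear-plus-tanh maps stand in for the 3D ViT and the predictor. Patches are masked uniformly at random, because the actual masking strategy is not described here. The names patchify_3d and jepa_loss, the patch size, the 75% mask ratio, and the target weights starting as a copy of the context weights (as in EMA-based JEPA recipes) are all assumptions.

```python
import numpy as np


def patchify_3d(volume: np.ndarray, p: int) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping p x p x p patches -> (n_patches, p**3)."""
    d, h, w = volume.shape
    assert d % p == 0 and h % p == 0 and w % p == 0, "volume must tile evenly"
    x = volume.reshape(d // p, p, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # patch-grid axes first, intra-patch axes last
    return x.reshape(-1, p ** 3)


def jepa_loss(volume, p, mask_ratio, w_ctx, w_tgt, w_pred, pos_emb, rng):
    """Toy masked latent-prediction loss in the spirit of JEPA (hypothetical sketch)."""
    patches = patchify_3d(volume, p)                  # (N, p**3)
    n = patches.shape[0]
    perm = rng.permutation(n)
    n_masked = int(round(mask_ratio * n))
    masked, visible = perm[:n_masked], perm[n_masked:]

    # Target encoder sees the full volume; its outputs at masked positions are the
    # regression targets. In EMA-based JEPA training it is not updated by backprop.
    targets = np.tanh(patches @ w_tgt)[masked]        # (n_masked, d)

    # Context encoder sees only the visible patches.
    context = np.tanh(patches[visible] @ w_ctx)       # (n_visible, d)

    # Toy predictor: pooled context plus each masked patch's position embedding.
    query = context.mean(axis=0, keepdims=True) + pos_emb[masked]
    preds = np.tanh(query @ w_pred)                   # (n_masked, d)

    # The loss compares embeddings; no voxel intensities are reconstructed.
    return float(np.mean((preds - targets) ** 2))


rng = np.random.default_rng(0)
p, d = 8, 32
volume = rng.normal(size=(32, 32, 32))               # toy stand-in for an MRI/CT volume
n_patches = (32 // p) ** 3                            # 64 patches
w_ctx = rng.normal(scale=0.05, size=(p ** 3, d))
w_tgt = w_ctx.copy()                                  # assumed: target starts as a copy of the context weights
w_pred = rng.normal(scale=0.1, size=(d, d))
pos_emb = rng.normal(scale=0.1, size=(n_patches, d))
print(jepa_loss(volume, p, 0.75, w_ctx, w_tgt, w_pred, pos_emb, rng))
```

Scoring embeddings rather than voxels is meant to spare the model from reproducing unpredictable low-level detail, such as noise patterns that vary between scanners, and to focus it on anatomical and pathological structure.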
NeuroVFM targets clinical and research neuroimaging workflows. Its encoder provides general-purpose volumetric representations that can be adapted to classification, detection, or retrieval tasks, while its diagnostic heads and findings model support automated triage, draft report generation, and decision support for radiologists and neurosurgeons. Researchers can use the released encoder as a feature backbone for downstream neuroimaging studies, and clinical informatics teams can explore it as a template for building generalist medical AI from existing health-system archives. The authors emphasize research-only use and explicitly note it is not a medical device.
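Attention-based MIL pooling is one plausible way to use the encoder as a feature backbone for study-level classification. It is also a common form of the multiple-instance learning heads mentioned above. Each volume in a study is embedded independently, and a small network scores each volume's relevance. The softmax-weighted average of the embeddings then feeds a per-diagnosis classifier. In this NumPy sketch, random vectors stand in for frozen encoder embeddings. The function attention_mil_head, the embedding width, and the independent-sigmoid (multi-label) output are hypothetical. Only the count of 74 MRI diagnoses comes from this page.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def attention_mil_head(volume_embeddings, w_attn, v_attn, w_cls, b_cls):
    """Attention-based MIL pooling over one study's volumes (hypothetical sketch).

    volume_embeddings: (n_volumes, d) frozen encoder features, one row per volume
    returns: per-diagnosis probabilities (n_dx,) and attention weights (n_volumes,)
    """
    h = np.tanh(volume_embeddings @ w_attn)         # (n_volumes, d_hidden)
    scores = h @ v_attn                             # one relevance score per volume
    e = np.exp(scores - scores.max())               # shift for numerical stability
    alpha = e / e.sum()                             # softmax over volumes, sums to 1
    study_embedding = alpha @ volume_embeddings     # (d,) weighted average of volumes
    logits = study_embedding @ w_cls + b_cls        # (n_dx,)
    return sigmoid(logits), alpha                   # independent sigmoids: multi-label output


rng = np.random.default_rng(0)
d, d_hidden, n_dx = 256, 64, 74                     # n_dx = 74 MRI diagnoses; d and d_hidden are assumed
w_attn = rng.normal(scale=0.05, size=(d, d_hidden))
v_attn = rng.normal(scale=0.05, size=d_hidden)
w_cls = rng.normal(scale=0.05, size=(d, n_dx))
b_cls = np.zeros(n_dx)

study = rng.normal(size=(7, d))                     # e.g. embeddings for 7 series in one study
probs, alpha = attention_mil_head(study, w_attn, v_attn, w_cls, b_cls)
print(probs.shape, round(float(alpha.sum()), 6))    # (74,) 1.0
```

Pooling is a weighted sum, so the output does not depend on the order in which a study's series are listed. The attention weights alpha give a rough indication of which volumes drove a prediction.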
NeuroVFM advances the case that uncurated, real-world clinical archives are a viable—and in some respects superior—substrate for training generalist medical AI, challenging the assumption that high-quality models require expensive curated benchmarks. By demonstrating that a domain-specialized volumetric model can outperform frontier vision-language systems on radiologic diagnosis and report generation while reducing hallucinations, it offers a concrete blueprint for "health system learning" in neuroimaging and beyond. As a recent preprint with gated weights, its broader adoption and external validation remain to be established, but it sharpens an important direction for safe, clinically grounded foundation models.
Kondepudi, A., et al. (2025) Health system learning achieves generalist neuroimaging models. Research Square.
DOI: 10.21203/rs.3.rs-8166797/v1
Kondepudi, A., et al. (2025) Health system learning achieves generalist neuroimaging models. arXiv.
DOI: 10.48550/arXiv.2511.18640