NeuroVFM

University of Michigan / University of Cologne

Generalist neuroimaging vision foundation model pretrained on 5.24M clinical MRI and CT volumes for radiologic diagnosis and report generation.

Released: November 2025

NeuroVFM is a generalist vision foundation model for clinical neuroimaging, designed to interpret brain MRI and CT studies across the full diversity of pathology seen in routine practice. Most prior medical imaging foundation models depend on carefully curated, annotation-rich datasets that are expensive to assemble and narrow in scope. NeuroVFM instead embraces "health system learning": it is pretrained directly on uncurated clinical data accumulated through ordinary patient care, allowing it to learn from the messy, heterogeneous, real-world distribution of scans that radiologists actually encounter.

The model was developed by the Machine Learning in Neurosurgery (MLiNS) Lab at the University of Michigan, with collaborators from University of Michigan Radiology, Computational Medicine and Bioinformatics, and Computer Science and Engineering, and the Department of Neurosurgery at the University of Cologne. It was released as a preprint in November 2025 (Kondepudi et al.). The authors demonstrate that frontier general-purpose vision-language models underperform on neuroimaging tasks, whereas a domain-specific model trained at scale on clinical data achieves state-of-the-art results.

NeuroVFM occupies a distinctive niche in the landscape of biomedical foundation models: rather than a protein or genomics model, it is a volumetric imaging encoder that pairs self-supervised visual pretraining with a downstream findings language model, bridging pixel-level brain anatomy and natural-language radiology reporting.

Key Features

Health system learning at scale: Pretrained on 5.24 million clinical MRI and CT volumes drawn from roughly 567,000 imaging studies spanning more than 20 years of routine care, without manual curation or expert annotation.
Volumetric self-supervision (Vol-JEPA): Uses a Volumetric Joint-Embedding Predictive Architecture, a 3D extension of the JEPA paradigm that predicts masked regions in latent space rather than reconstructing raw voxels.
Broad diagnostic coverage: Study-level diagnostic heads span 74 MRI and 82 CT diagnoses, built with multiple-instance learning over slice-level features.
Radiology report generation: A fine-tuned findings language model produces draft radiology reports that exceed frontier models in accuracy while reducing hallucinated findings.
Interpretable grounding: Exhibits emergent neuroanatomic understanding with visual grounding that links predictions back to relevant regions of the scan.
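
The latent-space prediction idea behind Vol-JEPA can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear-plus-tanh "encoders", the mean-pooled predictor, the squared-error loss, the patch size, the mask ratio, and the EMA target update (borrowed from earlier JEPA-style methods) are all assumptions chosen to keep the example short. The names patchify_3d, latent_prediction_loss, and ema_update are hypothetical.

```python
import numpy as np

def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into flattened, non-overlapping cubic patches."""
    d, h, w = (s // patch for s in volume.shape)
    v = volume[: d * patch, : h * patch, : w * patch]
    v = v.reshape(d, patch, h, patch, w, patch).transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(d * h * w, patch ** 3)

def latent_prediction_loss(volume, w_ctx, w_tgt, w_pred, pos,
                           patch=4, mask_ratio=0.5, rng=None):
    """Toy JEPA-style objective: predict embeddings of masked patches.

    The loss compares predicted and target *embeddings*; raw voxels are
    never reconstructed. All weights are plain arrays (assumed toy model).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    patches = patchify_3d(volume, patch)              # (N, patch**3)
    n = patches.shape[0]
    n_masked = min(n - 1, max(1, int(n * mask_ratio)))
    perm = rng.permutation(n)
    masked, visible = perm[:n_masked], perm[n_masked:]

    # Context encoder sees only the visible patches.
    context = np.tanh(patches[visible] @ w_ctx + pos[visible])   # (N_vis, E)
    # Target encoder sees every patch; its outputs act as fixed targets.
    targets = np.tanh(patches @ w_tgt + pos)[masked]             # (N_mask, E)

    # Toy predictor: pooled context plus the position of each masked patch.
    predicted = (context.mean(axis=0) + pos[masked]) @ w_pred    # (N_mask, E)
    return float(np.mean((predicted - targets) ** 2))

def ema_update(w_tgt, w_ctx, momentum=0.99):
    """Move target-encoder weights slowly toward the context encoder (assumed)."""
    return momentum * w_tgt + (1.0 - momentum) * w_ctx

# Usage on a random 8x8x8 "scan" with 4x4x4 patches (8 patches, 16-dim embeddings).
rng = np.random.default_rng(42)
scan = rng.normal(size=(8, 8, 8))
w_ctx = rng.normal(scale=0.1, size=(64, 16))
w_tgt = w_ctx.copy()
w_pred = rng.normal(scale=0.1, size=(16, 16))
pos = rng.normal(scale=0.1, size=(8, 16))
loss = latent_prediction_loss(scan, w_ctx, w_tgt, w_pred, pos, rng=rng)
```

In a real system the encoders would be 3D Vision Transformers and the loss would be minimized by gradient descent on the context encoder and predictor; the sketch only shows where the masking happens and what is compared.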

Technical Details

NeuroVFM is built on a 3D Vision Transformer encoder trained with the Vol-JEPA objective: volumetric patches are masked and their representations are predicted in embedding space, so the encoder learns anatomy and pathology without labels. Downstream capabilities are added in stages: multiple-instance learning trains study-level diagnostic heads covering 74 MRI and 82 CT diagnoses, and a Perceiver-style connector links the visual encoder to a large language model, which is fine-tuned with supervision to generate radiology findings. The training corpus comprises 5.24 million MRI and CT volumes from approximately 567,000 studies collected through clinical care, or roughly nine volumes per study on average. Released components include the encoder (neurovfm-encoder), a CT diagnostic head (neurovfm-dx-ct), and the findings LLM (neurovfm-llm). Reported evaluations show state-of-the-art diagnostic accuracy and report quality relative to general-purpose frontier models. Code is released under the MIT license; model weights are distributed under CC-BY-NC-SA 4.0 and gated behind institutional-email access approval.
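
The two downstream stages described above can be sketched in NumPy as follows. This is a hedged illustration, not the released code: attention-based pooling is one common multiple-instance learning formulation and is assumed here, as are the multi-label sigmoid output, the single cross-attention layer standing in for the connector, and every dimension. The function names attention_mil_head and perceiver_connector are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mil_head(instances, v, w, w_cls):
    """Pool a bag of instance embeddings into study-level probabilities.

    instances: (N, E), one embedding per instance in the study (the bag).
    Returns (probs, attn): per-diagnosis probabilities (C,) and the
    attention weight given to each instance (N,).
    """
    scores = np.tanh(instances @ v) @ w       # (N,) relevance of each instance
    attn = softmax(scores, axis=0)            # weights sum to 1 over the bag
    study = attn @ instances                  # (E,) weighted study embedding
    logits = study @ w_cls                    # (C,)
    probs = 1.0 / (1.0 + np.exp(-logits))     # multi-label sigmoid (assumed)
    return probs, attn

def perceiver_connector(tokens, latents, w_q, w_k, w_v):
    """Cross-attend M learned latent queries over N visual tokens.

    The output always has M rows, so the language model receives a fixed
    number of visual tokens regardless of how large the study is.
    """
    q = latents @ w_q                         # (M, A)
    k = tokens @ w_k                          # (N, A)
    val = tokens @ w_v                        # (N, E_out)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (M, N)
    return attn @ val                         # (M, E_out)

# Usage with made-up sizes: 9 instances per study, 74 MRI diagnoses.
rng = np.random.default_rng(0)
bag = rng.normal(size=(9, 32))
probs, attn = attention_mil_head(
    bag,
    v=rng.normal(size=(32, 16)),
    w=rng.normal(size=16),
    w_cls=rng.normal(size=(32, 74)),
)
visual_tokens = rng.normal(size=(200, 32))
llm_inputs = perceiver_connector(
    visual_tokens,
    latents=rng.normal(size=(8, 24)),
    w_q=rng.normal(size=(24, 16)),
    w_k=rng.normal(size=(32, 16)),
    w_v=rng.normal(size=(32, 48)),
)
```

The attention weights returned by the pooling step indicate which instances drove a study-level prediction, and the connector compresses a variable-length set of visual tokens into a fixed-length sequence for the language model.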

Applications

NeuroVFM targets clinical and research neuroimaging workflows. Its encoder provides general-purpose volumetric representations that can be adapted to classification, detection, or retrieval tasks, while its diagnostic heads and findings model support automated triage, draft report generation, and decision support for radiologists and neurosurgeons. Researchers can use the released encoder as a feature backbone for downstream neuroimaging studies, and clinical informatics teams can explore it as a template for building generalist medical AI from existing health-system archives. The authors emphasize research-only use and explicitly note it is not a medical device.
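
As one example of using encoder outputs as a feature backbone, the sketch below ranks stored studies by cosine similarity to a query embedding. It assumes embeddings have already been extracted and stored as rows of an array; how they are produced from the released encoder is not shown, and the name top_k_similar and the embedding size are hypothetical.

```python
import numpy as np

def top_k_similar(query, bank, k=3, eps=1e-12):
    """Return indices and cosine similarities of the k nearest embeddings.

    query: (E,) embedding of the study being looked up.
    bank:  (N, E) embeddings of previously indexed studies.
    """
    q = query / (np.linalg.norm(query) + eps)
    b = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + eps)
    sims = b @ q                              # (N,) cosine similarities
    order = np.argsort(-sims)[:k]             # highest similarity first
    return order, sims[order]

# Usage with random stand-in embeddings.
rng = np.random.default_rng(7)
bank = rng.normal(size=(100, 64))
query = 2.0 * bank[17]                        # same direction as entry 17
indices, scores = top_k_similar(query, bank, k=3)
```

Because cosine similarity ignores vector length, the rescaled copy of entry 17 is still its own nearest neighbour.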

Impact

NeuroVFM advances the case that uncurated, real-world clinical archives are a viable, and in some respects superior, substrate for training generalist medical AI, challenging the assumption that high-quality models require expensive curated datasets. By demonstrating that a domain-specialized volumetric model can outperform frontier vision-language systems on radiologic diagnosis and report generation while reducing hallucinations, it offers a concrete blueprint for "health system learning" in neuroimaging and beyond. As a recent preprint with gated weights, its broader adoption and external validation remain to be established, but it sharpens an important direction for safe, clinically grounded foundation models.

Citations

Health system learning achieves generalist neuroimaging models

Kondepudi, A., et al. (2025) Health system learning achieves generalist neuroimaging models. Research Square.

DOI: 10.21203/rs.3.rs-8166797/v1

Health system learning achieves generalist neuroimaging models

Preprint

Kondepudi, A., et al. (2025) Health system learning achieves generalist neuroimaging models. arXiv.

DOI: 10.48550/arXiv.2511.18640

Recent citations

Papers that recently cited this model.

From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP
J. Bexten, Nico Scherf, B. Franczyk, et al.
May 2026
Citations: 0

CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing
J. H. Rivera, Daniel K. Low, X. Xiong, et al.
Mar 2026
Citations: 0 · Influential

Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke
A. Kapoor, A. Alyakin, Jin Vivian Lee, et al.
arXiv.org · Jan 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
Xinhai Hou, Shaoyuan Xu, Manan Biyani, et al.
arXiv.org · Nov 2025
Citations: 10

CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing
J. H. Rivera, Daniel K. Low, X. Xiong, et al.
Mar 2026
Citations: 0 · Influential

Large Language Models Predict Functional Outcomes after Acute Ischemic Stroke
A. Kapoor, A. Alyakin, Jin Vivian Lee, et al.
arXiv.org · Jan 2026
Citations: 0

From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP
J. Bexten, Nico Scherf, B. Franczyk, et al.
May 2026
Citations: 0

Citations

Total Citations: 4

Influential: 1

References: 0

GitHub

Stars: 55

Forks: 7

Open Issues: 0

Contributors: 2

Last Push: 15d ago

Language: Python

License: MIT

HuggingFace

Downloads: 599

Likes: 8

Last Modified: 15d ago

Pipeline: image-feature-extraction

Fields of citing research

Computer Science: 100%
Medicine: 75%
Biology: 25%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Score: 57 (Partial)

Usability (can I run it?): 70

Reproducibility (can I retrain it?): 36

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository Research Paper Official Website HuggingFace Model Demo
