M3FM (Lung Cancer Screening)

Rensselaer Polytechnic Institute / Wake Forest University School of Medicine / Massachusetts General Hospital

865M-parameter multimodal foundation model that fuses 3D low-dose chest CT with clinical data to answer 17 lung cancer screening questions.

Released: February 2025

Parameters: 865 Million

M3FM (Medical Multimodal Multitask Foundation Model) is a foundation model for low-dose chest CT lung cancer screening that unifies imaging and clinical information into a single question-answering system. Lung cancer screening generates rich, heterogeneous data — 3D CT volumes alongside demographics, smoking history, prior disease, family history, and laboratory results — yet most deep-learning tools are built for one task and one modality at a time. M3FM reframes the entire screening workflow as a multimodal question-answering problem, so that a single model can answer many clinically distinct questions from any available combination of inputs.
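The question-answering framing can be illustrated with a small, hypothetical sketch. It serializes whichever clinical fields are present into one text prompt, followed by the task question, so that missing inputs are skipped rather than imputed. The field names, the `build_prompt` helper and the prompt template are assumptions for illustration only. They are not the M3FM input format, which covers 49 clinical data types and is defined in the released code.

```python
from typing import Mapping, Optional

# Hypothetical field names for illustration; the real M3FM schema is
# defined in the authors' repository and covers 49 clinical data types.
FIELD_ORDER = ["age", "sex", "smoking_pack_years", "family_history_lung_cancer"]


def build_prompt(question: str, clinical: Mapping[str, Optional[object]]) -> str:
    """Serialize available clinical fields plus a task question into one prompt.

    Fields that are absent or None are skipped, so the same function handles
    any subset of inputs (this sketch's stand-in for missing-data tolerance).
    """
    parts = []
    for name in FIELD_ORDER:
        value = clinical.get(name)
        if value is None:
            continue  # missing input: omit it rather than impute a value
        parts.append(f"{name}: {value}")
    context = "; ".join(parts) if parts else "none"
    return f"Clinical data: {context}. Question: {question}"


if __name__ == "__main__":
    print(build_prompt(
        "What is the 1-year lung cancer risk?",
        {"age": 64, "sex": "F", "smoking_pack_years": None},
    ))
```

In this sketch, a different screening question reuses the same function with a different `question` string. This mirrors, in spirit only, the way one model can serve many tasks.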

The model was developed by Chuang Niu, Qing Lyu, Ge Wang and colleagues at Rensselaer Polytechnic Institute, with clinical collaborators at Wake Forest University School of Medicine and Massachusetts General Hospital / Harvard Medical School, and published in Nature Communications in February 2025. It was pretrained on a large multimodal corpus — 49 clinical data types, more than 163,000 chest CT series, and 17 screening-related tasks — drawn from the National Lung Screening Trial (NLST) and MIDRC, then validated on independent cohorts.

By learning shared representations across tasks and modalities, M3FM demonstrates synergistic multitasking: training on related questions jointly improves performance on each, and the model can be extended to new tasks using only small, out-of-distribution datasets. This positions it as a flexible backbone for the broader effort to build clinical foundation models from real-world screening data.
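The Key Features list notes distributed task-parallel training across 17 tasks. The following toy sketches how shared parameters can be trained jointly on several tasks. It assumes that each worker handles one task and computes that task's gradient, and that the gradients are averaged before a single update (as an all-reduce would do). The quadratic per-task losses, the target values and the helper names are hypothetical. The toy shows the mechanics only, not the performance gains the authors report.

```python
from typing import List


def task_gradient(w: float, target: float) -> float:
    """Gradient of the toy per-task loss (w - target)**2 with respect to w."""
    return 2.0 * (w - target)


def task_parallel_step(w: float, targets: List[float], lr: float) -> float:
    """One synchronous step: each hypothetical worker computes its own task's
    gradient, the gradients are averaged (the role an all-reduce plays across
    workers), and the shared parameter is updated once."""
    grads = [task_gradient(w, t) for t in targets]
    mean_grad = sum(grads) / len(grads)
    return w - lr * mean_grad


def train(targets: List[float], steps: int = 200, lr: float = 0.1) -> float:
    """Run repeated task-parallel steps from w = 0 and return the shared parameter."""
    w = 0.0
    for _ in range(steps):
        w = task_parallel_step(w, targets, lr)
    return w


if __name__ == "__main__":
    print(train([1.0, 2.0, 3.0]))
```

With targets 1.0, 2.0 and 3.0, the shared parameter settles at their mean, which minimizes the averaged loss. Every task therefore shapes the single shared value instead of each task fitting its own copy.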

Key Features

Unified question-answering interface: Clinical tasks are posed as free-text prompts, letting one model perform nodule characterization, risk prediction, abnormality detection, and Lung-RADS categorization without task-specific heads.
Flexible multimodal fusion: A CT vision transformer and a text transformer encode any available combination of imaging and the 49 clinical data types, so the model degrades gracefully when inputs are missing.
Multi-scale, voxel-aware imaging: Multi-scale linear tokenizers and explicit voxel-size embeddings let M3FM process variable-resolution 3D CT volumes at native spacing without resampling.
Synergistic multitask learning: Distributed task-parallel training across 17 tasks improves individual task performance and supports adding new tasks from small datasets.
Open code and data: The implementation is released under the MIT License and the curated OpenM3Chest screening dataset is deposited on Zenodo.
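The multi-scale, voxel-aware tokenization described above can be sketched in NumPy. This is a hypothetical illustration, not the CTViT code. It assumes the following:

- Each scale uses a different 3D patch size with its own linear projection.
- The (z, y, x) voxel spacing in millimetres enters through a linear embedding that is added to every token.
- Random matrices stand in for learned weights.

The names `patchify_3d` and `tokenize` are invented for this sketch.

```python
from typing import Sequence, Tuple

import numpy as np

Patch = Tuple[int, int, int]


def patchify_3d(volume: np.ndarray, patch: Patch) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping patches, one flattened patch per row.

    Trailing voxels that do not fill a whole patch are cropped (an assumption of this sketch).
    """
    pd, ph, pw = patch
    d, h, w = (s - s % p for s, p in zip(volume.shape, patch))
    v = volume[:d, :h, :w].reshape(d // pd, pd, h // ph, ph, w // pw, pw)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # (grid_z, grid_y, grid_x, pd, ph, pw)
    return v.reshape(-1, pd * ph * pw)


def tokenize(volume: np.ndarray, spacing_mm: Tuple[float, float, float],
             patch_sizes: Sequence[Patch], dim: int,
             rng: np.random.Generator) -> np.ndarray:
    """Hypothetical multi-scale linear tokenizer with a voxel-size embedding."""
    # Voxel-size embedding: linear map of (z, y, x) spacing to `dim` (random stand-in weights).
    spacing_proj = rng.standard_normal((3, dim)) * 0.02
    spacing_emb = np.asarray(spacing_mm, dtype=float) @ spacing_proj  # shape (dim,)
    tokens = []
    for patch in patch_sizes:
        flat = patchify_3d(volume, patch)
        # One linear tokenizer per scale (random stand-in for a learned projection).
        proj = rng.standard_normal((flat.shape[1], dim)) * 0.02
        tokens.append(flat @ proj + spacing_emb)  # spacing embedding broadcast to every token
    return np.concatenate(tokens, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct = rng.standard_normal((32, 64, 64))  # toy volume, not real CT data
    toks = tokenize(ct, spacing_mm=(2.5, 0.7, 0.7),
                    patch_sizes=[(4, 16, 16), (8, 32, 32)], dim=128, rng=rng)
    print(toks.shape)
```

Consider a 32 × 64 × 64 toy volume. Patch sizes (4, 16, 16) and (8, 32, 32) yield 128 and 16 tokens respectively, so the printed shape is (144, 128). The volume itself is never resampled. The spacing information reaches the tokens only through the added embedding.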

Technical Details

M3FM combines four components:

- a CT Vision Transformer (CTViT) that tokenizes 3D low-dose CT volumes with multi-scale embeddings;
- a text transformer that encodes clinical fields and task instructions via byte-level BPE;
- a task encoder;
- lightweight MLP predictors.

It is offered in three sizes: Base (257M), Large (502M), and Huge (865M parameters).

Pretraining used 49 clinical data types, 163,725 chest CT series and 17 tasks. The imaging corpus was drawn primarily from NLST (125,090 CT scans from 26,254 patients) and MIDRC. On internal and independent test sets, M3FM reached an AUC of 0.940 for 1-year lung cancer risk. It also improved cardiovascular-disease mortality risk prediction by up to about 10% over prior models. On the external MGH cohort, it improved 1-year lung cancer risk AUC by roughly 20% relative to previous methods. Across the benchmark suite, it outperformed task-specific baselines such as Sybil and Tri2D-Net.
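The results above are reported as ROC AUC, which can be computed directly from its rank interpretation. AUC is the probability that a randomly chosen positive case (for example, a patient diagnosed with lung cancer within a year) receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The minimal pure-Python sketch below is a generic implementation for illustration. The `roc_auc` name is invented for this sketch. It does not reproduce the paper's evaluation protocol, cohorts, or confidence intervals.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative), ties = 0.5.

    Uses an O(n_pos * n_neg) pairwise count, which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels (1 = cancer within a year) and model risk scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In the toy example, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. For real cohorts, a sort-based rank computation avoids the quadratic pairwise loop.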

Applications

M3FM is aimed at the lung cancer screening setting, where radiologists and screening programs must extract multiple clinically actionable signals — cancer risk, cardiovascular risk, nodule characteristics, and incidental chest abnormalities — from a single CT exam plus the patient's record. Because it accepts arbitrary combinations of imaging and clinical data and answers free-text questions, it can be adapted to institution-specific tasks or new endpoints with small fine-tuning datasets, making it a practical foundation for research on opportunistic screening and multi-outcome risk stratification.

Impact

M3FM is a notable demonstration that a single multimodal multitask foundation model can match or exceed specialized models across the diverse tasks embedded in lung cancer screening, while remaining extensible to new ones. Its released code (MIT) and the OpenM3Chest dataset lower the barrier for reproducing and building on medical imaging foundation models. Key limitations include reliance on large curated CT cohorts for pretraining, restricted access to some clinical validation datasets, and the need for prospective clinical evaluation before deployment in screening workflows.

Citation

Medical multimodal multitask foundation model for lung cancer screening

Niu, C., et al. (2025) Medical multimodal multitask foundation model for lung cancer screening. Nature Communications.

DOI: 10.1038/s41467-025-56822-w

Recent citations

Papers that recently cited this model.

Generative Artificial Intelligence and Large Language Models in Clinical Oncology
Yunfang Yu, Zhenhui Zhao, Zehua Wang, et al.
MedComm · Jun 2026
Citations: 0

Foundation model for screening severe mitral regurgitation and severe aortic stenosis from coronary angiograms
Yingqian Zhang, Zhiming Shao, Zechen Wei, et al.
Visual Computing for Industry, Biomedicine, and Art · Jun 2026
Citations: 0

Overview of State-of-the-Art Learning-Based Classification Methods in Medical Imaging
Nafiseh Ghaffar Nia, Rayyan Manwar, K. Avanaki
Annals of Biomedical Engineering · Jun 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset
L. Blankemeier, J. Cohen, Ashwin Kumar, et al.
Nature · Jun 2024
Citations: 128

From Classical Machine Learning to Emerging Foundation Models: Review on Multimodal Data Integration for Cancer Research
A. Muneer, M. Waqas, Maliazurina B. Saad, et al.
Artificial Intelligence Review · Jul 2025
Citations: 19

Oxidative Stress and Inflammation in Hypoxemic Respiratory Diseases and Their Comorbidities: Molecular Insights and Diagnostic Advances in Chronic Obstructive Pulmonary Disease and Sleep Apnea
J. Rodríguez-Pérez, R. Andreu-Martínez, Roberto Daza, et al.
Antioxidants · Jul 2025
Citations: 15

Improving Representation of High-Frequency Components for Medical Visual Foundation Models
Yuetan Chu, Yilan Zhang, Zhongyi Han, et al.
IEEE Transactions on Medical Imaging · Jul 2024
Citations: 9

A narrative review of the prediction of immunotherapy efficacy for treating NSCLC: An artificial intelligence perspective
Shaowei Wu, A. Zhuang, Gengda Huang, et al.
Intelligent Oncology · Jun 2025
Citations: 8

Citations

Total citations: 63

Influential: 4

References: 66

GitHub

Stars: 59

Forks: 13

Open issues: 2

Contributors: 1

Last push: 1y ago

Language: Jupyter Notebook

License: MIT

Fields of citing research

Medicine: 98%
Computer Science: 86%
Engineering: 26%
Environmental Science: 5%
Chemistry: 4%
Biology: 4%
Materials Science: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Overall score: 71 (Open)

Usability (can I run it?): 87

Reproducibility (can I retrain it?): 66

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper · Dataset
