Rensselaer Polytechnic Institute / Wake Forest University School of Medicine / Massachusetts General Hospital
Multimodal multitask foundation model that fuses 3D low-dose chest CT with clinical data to perform 17 lung-cancer-screening tasks via free-text question answering.
M3FM (Medical Multimodal Multitask Foundation Model) is a foundation model for low-dose chest CT lung cancer screening that unifies imaging and clinical information into a single question-answering system. Lung cancer screening generates rich, heterogeneous data — 3D CT volumes alongside demographics, smoking history, prior disease, family history, and laboratory results — yet most deep-learning tools are built for one task and one modality at a time. M3FM reframes the entire screening workflow as a multimodal question-answering problem, so that a single model can answer many clinically distinct questions from any available combination of inputs.
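To make the question-answering framing concrete, here is a minimal sketch of how an arbitrary subset of clinical fields might be serialized together with a free-text question. The field names, the prompt layout, and the `build_question_prompt` helper are illustrative assumptions, not M3FM's actual input format.

```python
from typing import Mapping, Optional


def build_question_prompt(clinical: Mapping[str, Optional[object]], question: str) -> str:
    """Serialize whichever clinical fields are present, then append the question.

    Hypothetical helper: the layout is an assumption made for illustration only.
    """
    # Skip fields that are missing (None) so any subset of inputs is accepted.
    parts = [f"{name}: {value}" for name, value in clinical.items() if value is not None]
    context = "; ".join(parts) if parts else "no clinical data available"
    return f"Clinical data: {context}. Question: {question}"


# Hypothetical record in which family history is unavailable.
record = {"age": 63, "pack_years": 40, "family_history_lung_cancer": None}
prompt = build_question_prompt(record, "What is the 1-year lung cancer risk?")
print(prompt)
```

The same function handles a record with every field filled in or none at all, which is the property the question-answering formulation relies on: the question changes the task, and the available fields change the context.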
The model was developed by Chuang Niu, Qing Lyu, Ge Wang and colleagues at Rensselaer Polytechnic Institute, with clinical collaborators at Wake Forest University School of Medicine and Massachusetts General Hospital / Harvard Medical School, and published in Nature Communications in February 2025. It was pretrained on a large multimodal corpus (49 clinical data types, more than 163,000 chest CT series, and 17 screening-related tasks) drawn from the National Lung Screening Trial (NLST) and the Medical Imaging and Data Resource Center (MIDRC), then validated on independent cohorts.
By learning shared representations across tasks and modalities, M3FM demonstrates synergistic multitasking: training on related questions jointly improves performance on each, and the model can be extended to new tasks using only small, out-of-distribution datasets. This positions it as a flexible backbone for the broader effort to build clinical foundation models from real-world screening data.
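The idea of a shared representation feeding several tasks can be illustrated with a toy forward pass: one shared layer whose output is reused by small task-specific heads. This is a generic multitask sketch with randomly initialized weights, not M3FM's architecture; the task names, dimensions, and the `forward` function are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

FEATURE_DIM = 16  # hypothetical size of a fused image + text representation
HIDDEN_DIM = 8    # hypothetical width of the shared layer

# Hypothetical task names mapped to their output sizes.
TASKS = {"lung_cancer_risk": 1, "cvd_mortality_risk": 1, "nodule_type": 3}

# One set of shared weights, plus one small head per task.
shared_w = rng.normal(scale=0.1, size=(FEATURE_DIM, HIDDEN_DIM))
heads = {name: rng.normal(scale=0.1, size=(HIDDEN_DIM, n_out)) for name, n_out in TASKS.items()}


def forward(features: np.ndarray, task: str) -> np.ndarray:
    """Apply the shared layer (with ReLU), then the head for the requested task."""
    shared = np.maximum(features @ shared_w, 0.0)
    return shared @ heads[task]


batch = rng.normal(size=(4, FEATURE_DIM))  # four hypothetical exams
outputs = {task: forward(batch, task) for task in TASKS}
for task, out in outputs.items():
    print(task, out.shape)
```

Because every task's gradient would flow through `shared_w` during training, related tasks can shape the same features. Adding a new task in this sketch only requires one more entry in `heads`.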
M3FM combines four components: a CT Vision Transformer (CTViT) that tokenizes 3D low-dose CT volumes using multi-scale embeddings, a text transformer that encodes clinical fields and task instructions with byte-level BPE, a task encoder, and lightweight MLP predictors. It is offered in three sizes: Base (257M parameters), Large (502M), and Huge (865M). Pretraining used 49 clinical data types, 163,725 chest CT series, and 17 tasks, with the imaging corpus drawn primarily from NLST (125,090 CT scans from 26,254 patients) and MIDRC.
On internal and independent test sets, M3FM reached an AUC of 0.940 for 1-year lung cancer risk and improved cardiovascular-disease mortality risk prediction by up to about 10% over prior models. On the external MGH cohort it improved 1-year lung cancer risk AUC by roughly 20% relative to previous methods, and it outperformed task-specific baselines such as Sybil and Tri2D-Net across the benchmark suite.
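As a rough illustration of what tokenizing a 3D CT volume involves, the sketch below splits a volume into non-overlapping 3D patches and flattens each one into a token vector. It shows a single patch scale only and is a generic ViT-style patchification, not the CTViT implementation; the `patchify_3d` name, the toy volume size, and the patch sizes are assumptions.

```python
from typing import Tuple

import numpy as np


def patchify_3d(volume: np.ndarray, patch: Tuple[int, int, int]) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping patches, one flattened row per patch."""
    d, h, w = volume.shape
    pd, ph, pw = patch
    if d % pd or h % ph or w % pw:
        raise ValueError("volume dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, patch length).
    blocks = volume.reshape(d // pd, pd, h // ph, ph, w // pw, pw)
    # Move the three patch-index axes to the front, then flatten each patch.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, pd * ph * pw)


# Toy volume standing in for a CT series (real scans are far larger).
volume = np.arange(8 * 16 * 16, dtype=np.float32).reshape(8, 16, 16)
tokens = patchify_3d(volume, (4, 8, 8))
print(tokens.shape)  # (8, 256): 2 x 2 x 2 patches, each holding 4 * 8 * 8 voxels
```

Calling the function again with a different patch size, for example `(2, 4, 4)`, yields more and smaller tokens from the same volume. That gives one intuition for multi-scale embeddings, although how M3FM combines scales is not specified here.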
M3FM is aimed at the lung cancer screening setting, where radiologists and screening programs must extract multiple clinically actionable signals — cancer risk, cardiovascular risk, nodule characteristics, and incidental chest abnormalities — from a single CT exam plus the patient's record. Because it accepts arbitrary combinations of imaging and clinical data and answers free-text questions, it can be adapted to institution-specific tasks or new endpoints with small fine-tuning datasets, making it a practical foundation for research on opportunistic screening and multi-outcome risk stratification.
M3FM is a notable demonstration that a single multimodal multitask foundation model can match or exceed specialized models across the diverse tasks embedded in lung cancer screening, while remaining extensible to new ones. Its code, released under the MIT license, and the OpenM3Chest dataset lower the barrier to reproducing and building on medical imaging foundation models. Key limitations include reliance on large curated CT cohorts for pretraining, restricted access to some clinical validation datasets, and the need for prospective clinical evaluation before deployment in screening workflows.
Niu, C., et al. (2025) Medical multimodal multitask foundation model for lung cancer screening. Nature Communications.
DOI: 10.1038/s41467-025-56822-w