FMCIB (Foundation Model for Cancer Imaging Biomarkers)

Harvard Medical School / Dana-Farber Cancer Institute / Brigham and Women's Hospital / Massachusetts General Hospital / Maastricht University / Aarhus University / Stanford University

Self-supervised 3D CT foundation model that extracts general-purpose tumor representations for cancer imaging biomarker discovery and prognosis.

Released: March 2024

FMCIB (Foundation Model for Cancer Imaging Biomarkers) is a self-supervised deep learning model that learns general-purpose representations of tumors and other lesions directly from three-dimensional computed tomography (CT) scans. Rather than training a bespoke network for each clinical question, FMCIB provides a single pretrained encoder whose features can be adapted — by linear probing or fine-tuning — to a wide range of cancer imaging tasks. The work was developed by the Artificial Intelligence in Medicine (AIM) program led by Hugo J. W. L. Aerts, with collaborators across Harvard Medical School, Mass General Brigham (Brigham and Women's Hospital and Massachusetts General Hospital), Dana-Farber Cancer Institute, Maastricht University, Aarhus University, and Stanford University, and was published in Nature Machine Intelligence in March 2024.

The model addresses a persistent bottleneck in quantitative imaging: hand-crafted radiomic features and task-specific supervised models are brittle, data-hungry, and difficult to transfer across institutions, cancer types, and scanners. FMCIB instead borrows the foundation-model paradigm that reshaped natural language and protein science, pretraining on a large, unlabeled corpus of lesions so that downstream tasks can be solved with comparatively little labeled data. It is one of the first demonstrations that contrastive self-supervised pretraining on radiographic lesions yields broadly transferable imaging biomarkers.

Key Features

General-purpose lesion encoder: A single pretrained model produces fixed feature embeddings for any 3D CT lesion, which can be reused across malignancy classification, anatomical site identification, and survival prognostication.
Self-supervised pretraining: Uses a modified SimCLR contrastive learning scheme that requires no manual labels during pretraining, learning invariances from lesion-centered 3D crops.
Label efficiency: Because the encoder is pretrained, downstream tasks reach strong performance with limited annotated data, including in low-data regimes where supervised baselines degrade.
Flexible adaptation: Supports both frozen linear-probe use (embeddings plus a lightweight classifier) and full fine-tuning, with reproducible YAML-driven configurations.
Open release: Code is MIT-licensed, weights are archived on Zenodo, and a pip package (foundation-cancer-image-biomarker) lets users extract features in a few lines of code.
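
As a rough illustration of the frozen linear-probe workflow listed above, the sketch below fits a logistic-regression classifier on precomputed embeddings and scores it with AUC. It is a minimal sketch under stated assumptions, not the FMCIB package API: the embeddings are synthetic 8-dimensional vectors standing in for encoder outputs, and make_synthetic_embeddings, train_linear_probe, predict_proba, and roc_auc are hypothetical helper names introduced here for illustration only.

```python
import math
import random


def make_synthetic_embeddings(n_per_class=50, dim=8, seed=0):
    """Return (embeddings, labels). ASSUMPTION: synthetic stand-ins for encoder features."""
    rng = random.Random(seed)
    embeddings, labels = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            embeddings.append([rng.gauss(center, 1.0) for _ in range(dim)])
            labels.append(label)
    return embeddings, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability of the positive class for one embedding."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_linear_probe(embeddings, labels, lr=0.1, epochs=200):
    """Fit logistic regression with batch gradient descent; the encoder is never updated."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = predict_proba(weights, bias, x) - y  # d(log-loss)/d(logit)
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Train on one synthetic cohort and evaluate on a second, held-out one.
train_x, train_y = make_synthetic_embeddings(seed=0)
test_x, test_y = make_synthetic_embeddings(seed=1)
weights, bias = train_linear_probe(train_x, train_y)
test_scores = [predict_proba(weights, bias, x) for x in test_x]
print(f"held-out AUC: {roc_auc(test_scores, test_y):.3f}")
```

In practice the synthetic vectors would be replaced by embeddings extracted from lesion crops, and only the small classifier would be trained, which is why this mode needs comparatively little labeled data.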

Technical Details

FMCIB uses a 3D ResNet50 convolutional encoder trained with a SimCLR-style contrastive objective adapted for volumetric medical imaging. Pretraining used 11,467 radiographic lesions drawn from 5,513 unique CT scans across 2,312 patients in the DeepLesion dataset, with lesion-centered 3D patches as the model input.

The learned representations were evaluated on three clinically motivated use cases:

Lesion anatomical site classification: balanced accuracy of 0.804 and mean average precision (mAP) of 0.857.
Lung nodule malignancy prediction: AUC of 0.944 and mAP of 0.953.
Non-small-cell lung cancer (NSCLC) 2-year survival prognostication: AUC of 0.638 on the LUNG1 cohort and 0.653 on the RADIO cohort.

Across tasks, the foundation model matched or exceeded supervised and conventional radiomic baselines, with the largest gains when labeled data were scarce and in stability analyses. Final model weights, extracted features, and predictions are released on Zenodo (DOI 10.5281/zenodo.10528450).
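
To make the SimCLR-style contrastive objective mentioned above more concrete, the sketch below computes the standard NT-Xent loss used by the original SimCLR method on a toy batch of paired embeddings. This is an assumption-laden illustration: the source describes a modified SimCLR scheme, and those modifications are not reproduced here. The function names, the temperature value, and the 2-dimensional toy vectors are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.1):
    """Mean NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmented views of sample i.
    Every other embedding in the batch acts as a negative.
    """
    n = len(view_a)
    z = view_a + view_b  # 2N embeddings; the positive of index i is (i + N) mod 2N
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        logits = [
            cosine_similarity(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i  # a sample is never contrasted with itself
        ]
        positive_logit = cosine_similarity(z[i], z[positive]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)


# Toy batch: two samples, each with two "views" that point in similar directions.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]
aligned = nt_xent_loss(views_a, views_b)
mismatched = nt_xent_loss(views_a, list(reversed(views_b)))
print(f"aligned pairs: {aligned:.4f}, mismatched pairs: {mismatched:.4f}")
```

The loss is small when the two views of each sample sit close together and far from other samples, and large when the pairing is scrambled, which is the pressure that drives the encoder toward augmentation-invariant lesion features.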

Applications

FMCIB targets researchers and clinical scientists building quantitative imaging biomarkers from CT data without assembling large labeled cohorts for every task. Typical uses include screening lung nodules for malignancy, identifying lesion anatomical context, stratifying patient prognosis, and prototyping new imaging-derived endpoints for oncology studies. Because the encoder produces reusable embeddings, it integrates naturally into radiomics and computational pathology pipelines, retrospective cohort analyses, and clinical trial biomarker development, where consistent, transferable features across sites and scanners are valuable.

Impact

FMCIB helped establish that the foundation-model approach — large-scale self-supervised pretraining followed by lightweight adaptation — transfers to 3D oncologic imaging, an area long dominated by hand-engineered radiomics and narrow supervised models. By publicly releasing code, weights, and a simple feature-extraction package, the Aerts lab lowered the barrier to building imaging biomarkers and provided a reusable baseline that subsequent CT foundation-model studies have built upon and benchmarked against. Its main limitations follow from its design: pretraining is restricted to CT lesions from the DeepLesion dataset, performance on harder endpoints such as long-term survival remains modest, and broader generalization to other modalities and cancer types requires further validation.

Citation

Foundation model for cancer imaging biomarkers

Pai, S., et al. (2024) Foundation model for cancer imaging biomarkers. Nature Machine Intelligence.

DOI: 10.1038/s42256-024-00807-9

Recent citations

Papers that recently cited this model.

A self-supervised learning approach for intelligent surface roughness monitoring in thin-walled component machining
Yaoguo Ma, Changfeng Yao, Liang Tan
Mechanical Systems and Signal Processing · Aug 2026 · Citations: 0

Vision Foundation Models in Radiology: A Scoping Review of Data, Methodology, Evaluation and Clinical Translation
A. Vergara-Richart, Xavier Rafael-Palou, A. Fuster-Matanzo, et al.
Jul 2026 · Citations: 0 · Influential

Foundation Models vs. Radiomics for Lung Computed Tomography: A Benchmark of Feature Extractors, Classification Heads, and Segmentation Choices
Nils Neukirch, Martin H. Maurer, Nils Strodthoff
Jul 2026 · Citations: 0

Top citations

The most-cited papers that cite this model.

Artificial intelligence in drug development
Kang Zhang, Xin Yang, Yifei Wang, et al.
Nature Medicine · Jan 2025 · Citations: 262

A review of deep learning for brain tumor analysis in MRI
Felix J. Dorfner, Jay B. Patel, Jayashree Kalpathy-Cramer, et al.
npj Precision Oncology · Jan 2025 · Citations: 136

Medical digital twins: enabling precision medicine and medical artificial intelligence
C. Sadée, S. Testa, T. Barba, et al.
The Lancet Digital Health · Jun 2025 · Citations: 100

A semantic-enhanced multi-modal remote sensing foundation model for Earth observation
Kang Wu, Yingying Zhang, Lixiang Ru, et al.
Nature Machine Intelligence · Aug 2025 · Citations: 70

Acquired resistance in cancer: towards targeted therapeutic strategies
A. Soragni, Erik S. Knudsen, Thomas N. O’Connor, et al.
Nature Reviews. Cancer · Jun 2025 · Citations: 69

Citations

Total Citations: 194
Influential: 20
References: 57

GitHub

Stars: 133
Forks: 18
Open Issues: 2
Contributors: 6
Last Push: 1y ago
Language: Jupyter Notebook
License: MIT

Fields of citing research

Medicine: 93%
Computer Science: 78%
Engineering: 32%
Biology: 9%
Physics: 2%
Environmental Science: 2%
Materials Science: 2%
Agricultural and Food Sciences: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
Overall score: 92 (Open)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 87
Model Openness Framework: Unclassified
No formal model card / data card

Resources

GitHub Repository · Research Paper · Official Website · Dataset
