bio.rodeo

FMCIB (Foundation Model for Cancer Imaging Biomarkers)

Harvard Medical School / Dana-Farber Cancer Institute / Brigham and Women's Hospital / Massachusetts General Hospital / Maastricht University / Aarhus University / Stanford University

A self-supervised 3D CT foundation model that extracts general-purpose tumor representations for cancer imaging biomarker discovery across diverse downstream tasks.

Released: March 2024

FMCIB (Foundation Model for Cancer Imaging Biomarkers) is a self-supervised deep learning model that learns general-purpose representations of tumors and other lesions directly from three-dimensional computed tomography (CT) scans. Rather than training a bespoke network for each clinical question, FMCIB provides a single pretrained encoder whose features can be adapted to a wide range of cancer imaging tasks, either by linear probing or by fine-tuning. The work was developed by the Artificial Intelligence in Medicine (AIM) program led by Hugo J. W. L. Aerts, with collaborators at Harvard Medical School, Mass General Brigham (Brigham and Women's Hospital and Massachusetts General Hospital), Dana-Farber Cancer Institute, Maastricht University, Aarhus University, and Stanford University. It was published in Nature Machine Intelligence in March 2024.

The model addresses a persistent bottleneck in quantitative imaging: hand-crafted radiomic features and task-specific supervised models are brittle, data-hungry, and difficult to transfer across institutions, cancer types, and scanners. FMCIB instead borrows the foundation-model paradigm that reshaped natural language and protein science, pretraining on a large, unlabeled corpus of lesions so that downstream tasks can be solved with comparatively little labeled data. It is one of the first demonstrations that contrastive self-supervised pretraining on radiographic lesions yields broadly transferable imaging biomarkers.

#Key Features

  • General-purpose lesion encoder: A single pretrained model produces fixed feature embeddings for any 3D CT lesion, which can be reused across malignancy classification, anatomical site identification, and survival prognostication.
  • Self-supervised pretraining: Uses a modified SimCLR contrastive learning scheme that requires no manual labels during pretraining, learning invariances from lesion-centered 3D crops.
  • Label efficiency: Because the encoder is pretrained, downstream tasks reach strong performance with limited annotated data, including in low-data regimes where supervised baselines degrade.
  • Flexible adaptation: Supports both frozen linear-probe use (embeddings plus a lightweight classifier) and full fine-tuning, with reproducible YAML-driven configurations.
  • Open release: Code is MIT-licensed, weights are archived on Zenodo, and a pip package (foundation-cancer-image-biomarker) lets users extract features in a few lines of code.
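The lesion-centered 3D crops mentioned above can be illustrated with a minimal sketch. It assumes the CT volume is already loaded as a nested list indexed `[z][y][x]` and that the lesion seed point has already been converted to voxel indices. The function name `crop_lesion_patch`, its default fill value of -1024 (roughly air in Hounsfield units), and the patch sizes in the example are illustrative assumptions, not the preprocessing parameters of the FMCIB release.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # CT intensities indexed as volume[z][y][x]


def crop_lesion_patch(volume: Volume, center: Tuple[int, int, int], size: int,
                      fill_value: float = -1024.0) -> Volume:
    """Return a size x size x size cube of voxels centred on `center` (z, y, x).

    Voxels that fall outside the scan are filled with `fill_value`
    (-1024 is roughly air in Hounsfield units; an illustrative default).
    The centre voxel lands at index size // 2 of the patch along each axis.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    half = size // 2
    cz, cy, cx = center
    patch: Volume = []
    for z in range(cz - half, cz - half + size):
        plane = []
        for y in range(cy - half, cy - half + size):
            row = []
            for x in range(cx - half, cx - half + size):
                inside = 0 <= z < depth and 0 <= y < height and 0 <= x < width
                row.append(volume[z][y][x] if inside else fill_value)
            plane.append(row)
        patch.append(plane)
    return patch


# Toy 4x4x4 "scan" whose voxel value encodes its own (z, y, x) position.
scan = [[[float(100 * z + 10 * y + x) for x in range(4)] for y in range(4)] for z in range(4)]
patch = crop_lesion_patch(scan, center=(0, 0, 0), size=3)
```

Here the crop straddles the scan corner, so the first plane of `patch` is entirely padding while `patch[1][1][1]` is the seed voxel. A real pipeline would typically resample to isotropic spacing and convert physical seed coordinates to voxel indices before cropping; both steps are omitted.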

#Technical Details

FMCIB uses a 3D ResNet50 convolutional encoder trained with a SimCLR-style contrastive objective adapted for volumetric medical imaging. Pretraining used 11,467 radiographic lesions drawn from 5,513 unique CT scans of 2,312 patients in the DeepLesion dataset, with lesion-centered 3D patches as the model input. The learned representations were evaluated on three clinically motivated use cases. For lesion anatomical site classification, the model reached a balanced accuracy of 0.804 and a mean average precision (mAP) of 0.857; for lung nodule malignancy prediction, it achieved an area under the ROC curve (AUC) of 0.944 and an mAP of 0.953; and for 2-year survival prognostication in non-small-cell lung cancer (NSCLC), it reached an AUC of 0.638 on the LUNG1 cohort and 0.653 on the RADIO cohort. Across tasks, the foundation model matched or exceeded supervised and conventional radiomic baselines, with the largest gains in low-data settings and in feature stability analyses. Final model weights, extracted features, and predictions are released on Zenodo (DOI 10.5281/zenodo.10528450).
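To make the pretraining objective concrete, here is a minimal sketch of the standard SimCLR NT-Xent (normalized temperature-scaled cross-entropy) loss. FMCIB uses a modified SimCLR scheme whose modifications are not reproduced here; the temperature of 0.1, the function name `nt_xent_loss`, and the pure-Python implementation are illustrative assumptions. Each pair `(z_a[i], z_b[i])` stands for encoder embeddings of two augmented views of the same lesion crop, and every other embedding in the batch acts as a negative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z_a: Sequence[Vector], z_b: Sequence[Vector], temperature: float = 0.1) -> float:
    """Standard SimCLR NT-Xent loss for N positive pairs (z_a[i], z_b[i])."""
    n = len(z_a)
    # Stack both views into 2N unit vectors so cosine similarity is a dot product.
    z = [_l2_normalize(v) for v in list(z_a) + list(z_b)]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # the other view of the same lesion
        logits = [sum(p * q for p, q in zip(z[i], z[k])) / temperature for k in range(2 * n)]
        # Stable log-sum-exp over every k != i (the positive plus 2N - 2 negatives).
        others = [logits[k] for k in range(2 * n) if k != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - logits[positive]  # -log softmax probability of the positive
    return total / (2 * n)


# Matching views (aligned) versus deliberately mismatched views (swapped).
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

When the two views of each lesion agree, the loss is close to zero; when positives are mismatched it rises to roughly 10 at this temperature. A real training loop would compute the same quantity on batched tensors with automatic differentiation.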

#Applications

FMCIB targets researchers and clinical scientists building quantitative imaging biomarkers from CT data without assembling large labeled cohorts for every task. Typical uses include predicting lung nodule malignancy, identifying the anatomical site of a lesion, stratifying patient prognosis, and prototyping new imaging-derived endpoints for oncology studies. Because the encoder produces reusable embeddings, it integrates naturally into radiomics pipelines, retrospective cohort analyses, and clinical trial biomarker development, where features that stay consistent across sites and scanners are valuable.
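Because downstream use usually amounts to fitting a light model on frozen embeddings, a minimal linear-probe sketch may help. It assumes embeddings have already been extracted (tiny 2-D toy vectors stand in for real FMCIB features) and uses plain logistic regression trained by batch gradient descent as the probe. The paper's exact probe configuration is not reproduced; the learning rate, epoch count, L2 penalty, and function names are illustrative. The sketch also shows how the metrics reported in the technical details are defined: ROC AUC as the probability that a random positive outranks a random negative, and balanced accuracy as the mean per-class recall.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_linear_probe(X: Matrix, y: Sequence[int], lr: float = 0.5,
                       epochs: int = 3000, l2: float = 1e-3) -> Tuple[List[float], float]:
    """Fit logistic-regression weights on frozen embeddings X (n rows, d columns)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Mean log-loss gradient plus an L2 penalty on the weights (not the bias).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, X: Matrix) -> List[float]:
    return [_sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of per-class recall; also valid for multiclass labels such as anatomical sites."""
    recalls = []
    for c in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == c]
        recalls.append(sum(preds[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy usage: two well-separated clusters standing in for benign (0) and malignant (1) embeddings.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.2], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_linear_probe(X, y)
probs = predict_proba(w, b, X)
preds = [1 if p >= 0.5 else 0 for p in probs]
print(roc_auc(y, probs), balanced_accuracy(y, preds))  # both metrics should be 1.0 on this separable toy set
```

Balanced accuracy is preferred over plain accuracy when classes are imbalanced, because a probe that always predicts the majority class scores only 1/k across k classes rather than the majority-class frequency.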

#Impact

FMCIB helped establish that the foundation-model approach (large-scale self-supervised pretraining followed by lightweight adaptation) transfers to 3D oncologic imaging, an area long dominated by hand-engineered radiomics and narrow supervised models. By publicly releasing code, weights, and a simple feature-extraction package, the Aerts lab lowered the barrier to building imaging biomarkers and provided a reusable baseline that subsequent CT foundation-model studies have built upon and benchmarked against. Its main limitations follow from its design: pretraining is restricted to CT lesions from the DeepLesion dataset, performance on harder endpoints such as long-term survival remains modest, and broader generalization to other modalities and cancer types requires further validation.

Citation

Foundation model for cancer imaging biomarkers

Pai, S., et al. (2024) Foundation model for cancer imaging biomarkers. Nature Machine Intelligence.

DOI: 10.1038/s42256-024-00807-9

Citations

Total citations: 179
Influential citations: 18
References: 57

GitHub

Stars: 132
Forks: 18
Open issues: 2
Contributors: 6
Last push: 1y ago
Language: Jupyter Notebook
License: MIT

Openness

bio.rodeo openness: 92 (Open), fully open · usable and reproducible
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 87
Model Openness Framework: Unclassified
No formal model card or data card

Tags

3d_cnn, biomarker_discovery, contrastive_learning, ct_imaging, foundation_model, malignancy_prediction, oncology, prognosis, representation_learning, resnet, self_supervised, transfer_learning

Resources

  • GitHub Repository
  • Research Paper
  • Official Website
  • Dataset