bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance

MaCo

Shanghai AI Laboratory

A masked contrastive chest X-ray foundation model that aligns radiograph patches with report text for zero-shot and fine-grained diagnosis.

Released: September 2024

MaCo (Masked Contrastive) is a chest X-ray foundation model that learns transferable image representations by jointly aligning radiographs with their free-text clinical reports. It addresses a persistent tension in medical vision-language pretraining: methods that excel at coarse, global image-report matching (enabling zero-shot diagnosis) often lose the fine-grained, pixel-level understanding needed for grounding, segmentation, and detection, while methods optimized for dense prediction tend to sacrifice zero-shot transfer. MaCo aims to deliver both within a single pretraining recipe.

The model was developed by Weijian Huang, Cheng Li, Shanshan Wang and colleagues across the Shenzhen Institute of Advanced Technology (Chinese Academy of Sciences), Pengcheng Laboratory, Harvard University, and Shanghai AI Laboratory, and was published in Nature Communications in September 2024. Its central contribution is to combine masked image modeling with contrastive image-report alignment, and to introduce a correlation weighting mechanism that adjusts how strongly individual masked image patches are matched to the report.

By unifying these objectives, MaCo positions itself alongside contrastive medical vision-language models such as ConVIRT, GLoRIA, BioViL, and MGCA, but distinguishes itself through its granular, patch-level alignment strategy that supports both label-free prediction and dense downstream tasks.

#Key Features

  • Masked contrastive pretraining: Combines a masked autoencoding reconstruction objective with cross-modal contrastive learning, so the encoder simultaneously captures local image structure and global image-report semantics.
  • Correlation weighting mechanism: A learnable module generates per-patch importance scores from the masked position map (via a softplus-activated fully connected layer), reweighting both the contrastive logits and the loss to prioritize report-relevant regions.
  • Zero-shot diagnosis: Performs label-free classification by comparing image embeddings to text prompts, removing the need for task-specific annotated training data.
  • Granular downstream transfer: A single pretrained backbone supports fine-tuning for classification, semantic segmentation, object detection, and phrase grounding.
  • Open implementation and weights: Released under an MIT license with MAE and MaCo checkpoints, plus task pipelines for fine-tuning, segmentation (MMSegmentation), and detection (ViTDet).
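To make the correlation weighting idea concrete, here is a minimal pure-Python sketch, not MaCo's implementation. It builds an MAE-style masked position map over a ViT-B/16 patch grid and passes it through a fully connected layer with a softplus activation, yielding one positive score per patch. The 224x224 input size, the 0.75 mask ratio (the MAE default), the random weights, and the helper names `random_mask_map` and `correlation_weights` are assumptions for illustration only.

```python
import math
import random
from typing import List


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); strictly positive for finite x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def random_mask_map(num_patches: int, mask_ratio: float, rng: random.Random) -> List[int]:
    """MAE-style masked position map: 1 = patch hidden from the encoder, 0 = visible."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [1 if i in masked else 0 for i in range(num_patches)]


def correlation_weights(mask_map: List[int], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Hypothetical fully connected layer + softplus: one positive score per patch.

    `weight` has one row per output score and one column per mask position.
    """
    scores = []
    for row, b in zip(weight, bias):
        z = sum(w * m for w, m in zip(row, mask_map)) + b
        scores.append(softplus(z))
    return scores


rng = random.Random(0)
num_patches = (224 // 16) ** 2  # 196 patches, assuming 224x224 inputs to ViT-B/16
mask = random_mask_map(num_patches, mask_ratio=0.75, rng=rng)  # 0.75 ratio is assumed
W = [[rng.gauss(0.0, 0.02) for _ in range(num_patches)] for _ in range(num_patches)]
b = [0.0] * num_patches
scores = correlation_weights(mask, W, b)
print(sum(mask), len(scores), min(scores) > 0)  # 147 196 True
```

Softplus keeps every score strictly positive, so a weight can shrink or amplify a patch's influence without flipping its sign; exactly how MaCo folds these scores into the contrastive logits is not specified here.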

#Technical Details

MaCo pairs a ViT-B/16 image encoder with a BERT text encoder (hidden width 768). It is pretrained on MIMIC-CXR v2, comprising 377,110 chest X-rays associated with 227,827 clinical reports. Training uses a learnable temperature initialized at 0.03 and a loss weight of 0.9 to balance the reconstruction and contrastive terms; pretraining runs in roughly 3.5 hours on four NVIDIA A100 GPUs at a batch size of 512.

Across six open-source X-ray datasets, MaCo was reported to outperform 10 state-of-the-art approaches. Representative results include:

  • Zero-shot classification (AUC): 77.3% on NIH ChestX-ray, 88.6% on RSNA, and 90.4% on SIIM.
  • Phrase grounding on MS-CXR: 25.5% mIoU, with a contrast-to-noise ratio (CNR) of 1.144.
  • Fully supervised segmentation (Dice): 89.4% on SIIM and 75.1% on COVID Rural.
  • Detection on RSNA: 19.2% mAP.
  • Fine-tuned classification (AUC): 88.9% on CheXpert and 85.9% on NIH.
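The pretraining objective described above can be sketched as a temperature-scaled, symmetric InfoNCE loss combined with an MAE reconstruction loss. This is a hedged illustration, not the published code: the way correlation weights scale the logits and the per-sample loss, the averaging of image-to-text and text-to-image terms, and the assignment of the 0.9 weight to the reconstruction term (leaving 0.1 for the contrastive term) are all assumptions, and every function name is hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def neg_log_softmax_at(row: Sequence[float], k: int) -> float:
    # -log(exp(row[k]) / sum_j exp(row[j])), using the log-sum-exp trick
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return lse - row[k]


def weighted_contrastive_loss(img, txt, weights, temperature=0.03):
    """Symmetric InfoNCE over paired image/report embeddings (pair i matches i).

    Assumption: each image's correlation weight scales its row of logits and its
    contribution to the loss. MaCo learns the temperature; here it is an input.
    """
    u = [l2_normalize(v) for v in img]
    t = [l2_normalize(v) for v in txt]
    n = len(u)
    logits = [[weights[i] * sum(a * b for a, b in zip(u[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        image_to_text = neg_log_softmax_at(logits[i], i)
        text_to_image = neg_log_softmax_at([logits[j][i] for j in range(n)], i)
        total += weights[i] * 0.5 * (image_to_text + text_to_image)
    return total / sum(weights)


def masked_reconstruction_loss(pred, target, mask_map):
    """MAE convention: mean squared error averaged over masked patches only."""
    per_patch = [sum((p - q) ** 2 for p, q in zip(pp, tp)) / len(tp)
                 for pp, tp, m in zip(pred, target, mask_map) if m]
    return sum(per_patch) / max(len(per_patch), 1)


def total_loss(rec, con, lam=0.9):
    # Assumption: lam weights reconstruction and (1 - lam) weights the contrastive term.
    return lam * rec + (1.0 - lam) * con


aligned = weighted_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]], [1.0, 1.0])
swapped = weighted_contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]], [1.0, 1.0])
print(aligned < 1e-6, swapped > 30)  # True True
```

With a temperature as low as 0.03, a unit cosine similarity becomes a logit of about 33, so correctly paired batches drive the contrastive loss close to zero while mismatched pairs are penalized heavily.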

#Applications

MaCo is intended for computer-aided diagnosis and radiology research workflows where annotated data is scarce. Its zero-shot capability lets clinicians and researchers screen for pathologies using natural-language prompts without curating labeled training sets, while its phrase-grounding ability can localize findings described in a report to specific image regions, supporting explainable diagnosis. The shared backbone also serves as a strong initialization for downstream classification, segmentation, and detection pipelines, benefiting groups building chest X-ray analysis tools with limited labeled data.
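Zero-shot screening with natural-language prompts generally reduces to comparing an image embedding against embeddings of prompt texts. The sketch below assumes a common scheme with one positive and one negative prompt per finding (for example "pneumonia" vs. "no pneumonia") and a softmax over the two similarities. MaCo's actual prompt templates are not given here, reusing the 0.03 pretraining temperature is an assumption, and the toy vectors stand in for real encoder outputs.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_probabilities(
    image_emb: Sequence[float],
    prompt_pairs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    temperature: float = 0.03,
) -> Dict[str, float]:
    """Per finding, softmax over (positive prompt, negative prompt) similarities.

    Returns the estimated probability that each finding is present.
    """
    probs = {}
    for finding, (pos_emb, neg_emb) in prompt_pairs.items():
        s_pos = cosine(image_emb, pos_emb) / temperature
        s_neg = cosine(image_emb, neg_emb) / temperature
        m = max(s_pos, s_neg)  # subtract the max before exponentiating for stability
        e_pos, e_neg = math.exp(s_pos - m), math.exp(s_neg - m)
        probs[finding] = e_pos / (e_pos + e_neg)
    return probs


# Hypothetical 3-d vectors standing in for MaCo's image and text embeddings.
image = [1.0, 0.2, 0.0]
prompts = {
    "pneumonia": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),     # "pneumonia" vs "no pneumonia"
    "pneumothorax": ([0.0, 0.0, 1.0], [0.9, 0.3, 0.0]),  # "pneumothorax" vs "no pneumothorax"
}
print(zero_shot_probabilities(image, prompts))
```

Because each finding is scored against its own negative prompt, findings are evaluated independently, which suits multi-label chest X-ray screening where several pathologies can co-occur.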

#Impact

By demonstrating that masked image modeling and contrastive report alignment can be reconciled through patch-level correlation weighting, MaCo contributes to the broader effort to build general-purpose medical imaging foundation models. Its publication in Nature Communications and the release of code and pretrained weights under a permissive MIT license lower the barrier for reproducible benchmarking and downstream reuse in radiology AI. Because pretraining draws on a single source (MIMIC-CXR) and evaluation relies on public chest X-ray benchmarks, generalization across imaging hardware, patient populations, and clinical settings remains an important consideration before deployment.

Citation

Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning

Huang, W., et al. (2024) Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning. Nature Communications.

DOI: 10.1038/s41467-024-51749-0

Citations

Total citations: 71
Influential citations: 0
References: 61

GitHub

Stars: 12
Forks: 0
Open issues: 1
Contributors: 1
Last push: 1 year ago
Language: Python
License: MIT

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 74 (Open)
Usability (can I run it?): 94
Reproducibility (can I retrain it?): 57
Model Openness Framework: Class III (Open Model)

Tags

chest_x_ray, contrastive_learning, foundation_model, image_classification, multimodal, phrase_grounding, radiology, segmentation, self_supervised, transformer, vision_transformer, zero_shot_learning

Resources

  • GitHub Repository
  • Research Paper