CheXzero

Self-supervised vision-language model for zero-shot detection of chest X-ray pathologies, trained on image-report pairs without explicit labels.

Released: September 2022

CheXzero is a self-supervised vision-language model that detects pathologies in chest X-rays without ever being trained on explicit pathology labels. Developed by Ekin Tiu, Pranav Rajpurkar, and colleagues at Stanford University and published in Nature Biomedical Engineering in September 2022, it adapts the contrastive language-image pre-training (CLIP) paradigm to radiology by learning directly from raw chest radiographs paired with their free-text clinical reports.

The central problem CheXzero addresses is the labeling bottleneck in medical imaging AI. Conventional supervised classifiers require large datasets annotated by experts for each target pathology, an expensive and time-consuming process that limits how many conditions a model can recognize. By learning the joint structure of images and the natural-language descriptions radiologists already write, CheXzero instead performs zero-shot classification: at inference time it scores an image against text prompts (for example, "pulmonary edema" versus "no pulmonary edema") and can flag findings it was never explicitly trained to detect.
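
The prompt-pair scoring described above can be sketched as a softmax over two cosine similarities. This is a minimal illustration, not the released implementation: the function names, the `temperature` parameter, and the 4-dimensional toy embeddings are all assumptions made for the example, and real embeddings would come from the trained image and text encoders.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    return v / np.linalg.norm(v)

def zero_shot_probability(image_emb, pos_text_emb, neg_text_emb, temperature=1.0):
    """Softmax over the positive/negative prompt similarities; returns P(positive)."""
    img = l2_normalize(image_emb)
    logits = np.array([
        img @ l2_normalize(pos_text_emb),
        img @ l2_normalize(neg_text_emb),
    ]) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])

# Toy embeddings (hypothetical values, chosen only to make the example readable).
image = np.array([0.9, 0.1, 0.0, 0.2])
edema = np.array([1.0, 0.0, 0.0, 0.1])     # stands in for "pulmonary edema"
no_edema = np.array([0.0, 1.0, 0.0, 0.1])  # stands in for "no pulmonary edema"

p_edema = zero_shot_probability(image, edema, no_edema)
```

Because the toy image vector points mostly along the "pulmonary edema" direction, `p_edema` comes out above 0.5; swapping the two prompts yields the complementary probability.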

This made CheXzero a landmark demonstration that self-supervised, report-driven pretraining can reach expert-level performance on chest X-ray interpretation, influencing a subsequent wave of CLIP-style medical foundation models.

Key Features

Label-free training: Learns from chest X-ray images and their accompanying unstructured radiology reports, eliminating the need for manually annotated pathology labels.
Zero-shot pathology detection: Classifies findings via natural-language prompts at inference time, generalizing to pathologies never seen during training.
Expert-level accuracy: Matches three board-certified radiologists on the CheXpert test set, with no statistically significant difference in average Matthews correlation coefficient or F1 score.
Cross-institutional generalization: Maintains strong performance on external datasets, outperforming a fully supervised baseline on 3 of 8 pathologies in PadChest from a different institution and country.
Open and reproducible: Released under an MIT license with training/evaluation code and pre-trained checkpoints publicly available.
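
For readers unfamiliar with the two agreement metrics named in the list above, the following sketch computes the Matthews correlation coefficient and F1 score from binary labels. The label vectors are made-up toy data, not results from the paper, and the helper names are chosen for this example.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data: 3 positive and 5 negative studies, with one miss and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

f1 = f1_score(y_true, y_pred)            # 2/3
mcc = matthews_corrcoef(y_true, y_pred)  # 7/15
```

Unlike F1, MCC also rewards correct negatives, which matters for chest X-ray findings where most studies are negative for any given pathology.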

Technical Details

CheXzero adapts the CLIP dual-encoder architecture, pairing a ViT-B/32 Vision Transformer image encoder with a 12-layer, 63-million-parameter Transformer text encoder. The two encoders are trained with a contrastive objective that aligns each chest X-ray with its corresponding radiology report in a shared embedding space. Training used 377,110 image-report pairs from the MIMIC-CXR dataset, with the "impression" section of reports extracted as the text supervision signal.

On the CheXpert competition test set, an ensemble of CheXzero models achieved a mean AUC of 0.889 across five pathologies (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), within 0.042 of the top fully supervised method (Deep AUC Maximization, 0.931) despite using no labels. On PadChest, it achieved AUC > 0.9 on 14 findings and AUC ≥ 0.700 on 53 of 107 radiographic findings, including many not present during training.
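
The contrastive objective mentioned above can be illustrated with a CLIP-style symmetric cross-entropy loss over a batch of image and report embeddings. This is a sketch under stated assumptions rather than the project's training code: the function names are chosen for this example, the fixed temperature of 0.07 is an assumption (CLIP learns this value during training), and the embeddings are random toy data.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy where row i of each matrix is a matched pair."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) scaled cosine similarities
    diag = np.arange(len(logits))
    # Image-to-text: each row should put its mass on the matching report.
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Text-to-image: each column should put its mass on the matching image.
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return float((loss_i2t + loss_t2i) / 2)

# Toy batch: 4 "report" embeddings and near-identical "image" embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
aligned_loss = clip_contrastive_loss(text + 0.01 * rng.normal(size=(4, 8)), text)
# Reversing the image order breaks every pairing, so the loss should rise.
shuffled_loss = clip_contrastive_loss(text[::-1].copy(), text)
```

Minimizing this loss pulls each image toward its own report and pushes it away from the other reports in the batch, which is what later allows images to be scored against arbitrary text prompts.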

Applications

CheXzero is most useful where labeled training data is scarce or where the set of clinically relevant findings is broad and evolving. Because it classifies via text prompts, radiologists and researchers can query new or rare pathologies without retraining, supporting rapid prototyping of triage and decision-support tools, retrospective dataset curation, and screening workflows in resource-limited settings. Its label-free design also lowers the barrier for institutions that hold large archives of reports and images but lack the annotation budget to build supervised models for each condition.
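
The idea that new findings can be queried without retraining can be sketched as follows: extending the label set amounts to adding a string. Everything here is hypothetical scaffolding. `embed_text` is a placeholder that returns a deterministic pseudo-random vector and stands in for a trained text encoder, the "no {finding}" negative prompt template is an assumption modeled on the prompt example earlier on this page, and the finding names are illustrative.

```python
import hashlib
import numpy as np

DIM = 16  # hypothetical embedding width for this toy example

def embed_text(prompt):
    """Placeholder text encoder (NOT a real model): deterministic unit vector per prompt."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode("utf-8")).digest()[:4], "big")
    vec = np.random.default_rng(seed).normal(size=DIM)
    return vec / np.linalg.norm(vec)

def score_findings(image_emb, findings):
    """Score one image embedding against a positive/negative prompt pair per finding."""
    img = image_emb / np.linalg.norm(image_emb)
    scores = {}
    for finding in findings:
        pos = img @ embed_text(finding)
        neg = img @ embed_text(f"no {finding}")
        # Two-way softmax, written as a sigmoid of the similarity difference.
        scores[finding] = float(1.0 / (1.0 + np.exp(neg - pos)))
    return scores

# Stand-in image embedding; a real one would come from the image encoder.
image_emb = np.random.default_rng(42).normal(size=DIM)

findings = ["pulmonary edema", "pneumothorax"]
scores = score_findings(image_emb, findings)
# Querying an additional finding needs no retraining, only another prompt pair.
scores_extended = score_findings(image_emb, findings + ["rib fracture"])
```

With the placeholder encoder the scores carry no clinical meaning; the point is only that existing scores are unchanged when a new finding is appended to the list.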

Impact

CheXzero was an influential proof that self-supervised pretraining on paired radiology reports could match radiologists on chest X-ray interpretation, and it helped catalyze the adoption of CLIP-style contrastive learning across medical imaging. As a highly cited, openly released model, it became a common baseline and starting point for later chest-radiograph foundation models and vision-language systems in healthcare. Important limitations remain: performance depends on the quality and phrasing of text prompts, the training data derives largely from single-institution sources that may not reflect all populations or imaging conditions, and the model is a research artifact rather than a clinical device cleared by regulators, so prospective validation is required before deployment.

Citation

Tiu, E., et al. (2022) Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nature Biomedical Engineering.

DOI: 10.1038/s41551-022-00936-9

Recent citations

Papers that recently cited this model.

CausalCompNet: Causal intervention meets vision-language priors for robust CXR diagnosis
Mengdi Liu, Qiang Li, Rihao Chang, et al.
Biomedical Signal Processing and Control · 2026 · 1 citation

Super-Generalist: Towards Comprehensive and Accurate Medical Image Understanding via Generalist-Specialist Synergy
Shaoteng Zhang, Weiwei Cao, Wanxing Chang, et al.
Jul 2026 · 0 citations

Beyond the algorithm: nursing as a core foundation for ethical and equitable AI in healthcare
Ying Sun, Yan Dong Li, Yan Liu, et al.
Cognition, Technology & Work · Jul 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

Foundation models for generalist medical artificial intelligence
Michael Moor, Oishi Banerjee, Zahra F H Abad, et al.
Nature · Apr 2023 · 1.7K citations

A visual-language foundation model for computational pathology
Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, et al.
Nature Medicine · Mar 2024 · 932 citations

A foundation model for generalizable disease detection from retinal images
Yukun Zhou, Mark A. Chia, S. K. Wagner, et al.
Nature · Sep 2023 · 861 citations

A visual–language foundation model for pathology image analysis using medical Twitter
Zhi Huang, Federico Bianchi, Mert Yuksekgonul, et al.
Nature Medicine · Aug 2023 · 767 citations

Med-Flamingo: a Multimodal Medical Few-shot Learner
Michael Moor, Qian Huang, Shirley Wu, et al.
ML4H@NeurIPS · Jul 2023 · 566 citations

Citations

Total Citations: 527
Influential: 49
References: 51

GitHub

Stars: 234
Forks: 51
Open Issues: 7
Contributors: 2
Last Push: 2y ago
Language: Python
License: MIT

Fields of citing research

Computer Science: 86%
Medicine: 85%
Engineering: 17%
Physics: 1%
Biology: 1%
Materials Science: 1%
Environmental Science: 1%
Linguistics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
Overall: 70 (Open)
Usability (can I run it?): 94
Reproducibility (can I retrain it?): 57

Model Openness Framework: Unclassified
Restrictive license on core components

Resources

GitHub Repository · Research Paper
