CheXagent

Instruction-tuned vision-language foundation model for chest X-ray interpretation, with 8 billion parameters spanning eight clinical task types.

Released: January 2024

Parameters: 8 Billion

CheXagent is an instruction-tuned vision-language foundation model developed by Stanford University's AIMI (Artificial Intelligence in Medicine and Imaging) center to streamline the interpretation of chest X-rays (CXRs), the most commonly performed medical imaging exam worldwide. Introduced in January 2024, it tackles a persistent bottleneck in radiology: building generalist models that can read CXRs is hampered by the scarcity of large vision-language CXR datasets, the lack of a clinical language model able to parse radiology reports, and the absence of standardized benchmarks for fair evaluation.

The Stanford team addressed all three gaps together. They curated CheXinstruct, a large-scale instruction-tuning corpus assembled from 28 publicly available datasets, trained an 8-billion-parameter model that couples a clinical text decoder with a CXR-specialized vision encoder, and released CheXbench, an evaluation framework spanning eight clinically meaningful task types ranging from image perception to textual understanding. CheXagent sits alongside contrastive CXR models such as CheXzero and CXR-CLIP but is distinguished by its generative, instruction-following design that produces free-text answers and report drafts rather than fixed labels.

Quantitative evaluations and qualitative review by five expert radiologists showed CheXagent outperforming previously developed general-domain and medical-domain foundation models on CheXbench. A later revision extended the work with additional model variants (the CheXagent-2 byproducts) and a clinical reader study that measured the effect on report-writing efficiency.

Key Features

Instruction-following CXR interpretation: A single checkpoint handles diverse tasks — view classification, disease identification, findings/impression generation, and visual question answering — by responding to natural-language instructions.
Purpose-built clinical components: The system pairs a language decoder adapted for parsing radiology reports with a vision encoder specialized for representing CXR images, bridged by a connector network trained for the medical domain.
CheXinstruct training corpus: A large-scale instruction-tuning dataset compiled from 28 public CXR datasets, designed to teach the model the full breadth of interpretation subtasks.
CheXbench benchmark: A standardized evaluation suite covering eight task types across image perception and textual understanding, enabling reproducible, head-to-head comparison of CXR foundation models.
Clinical efficiency gains: In a reader study, residents drafting reports with CheXagent assistance achieved roughly a 36% time saving, with improved writing efficiency in 81% of resident and 61% of attending cases without compromising quality.
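
As a rough illustration of how one checkpoint can serve several tasks through natural-language instructions, the sketch below maps a task name to an instruction string. The task names, the template wording, and the `build_instruction` helper are all hypothetical and are not taken from CheXinstruct or the released code; the resulting text would be passed to the model together with the image.

```python
# Hypothetical instruction templates. The wording is illustrative only and is
# NOT copied from CheXinstruct or the CheXagent repository.
TEMPLATES = {
    "view_classification": "What is the view of this chest X-ray? Options: {options}",
    "disease_identification": "Does this chest X-ray show {finding}? Answer yes or no.",
    "findings_generation": "Write the findings section for this chest X-ray.",
    "visual_question_answering": "{question}",
}


def build_instruction(task: str, **fields: str) -> str:
    """Return the instruction text for a task, filling in its template fields."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    try:
        return TEMPLATES[task].format(**fields)
    except KeyError as missing:
        # str.format raises KeyError when a placeholder has no value.
        raise ValueError(f"task {task!r} needs field {missing}") from None


if __name__ == "__main__":
    print(build_instruction("disease_identification", finding="pleural effusion"))
    print(build_instruction("view_classification", options="AP, PA, lateral"))
```

Keeping the task-specific part in the prompt rather than in the model is what lets a single set of weights cover classification, generation, and question answering.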

Technical Details

CheXagent is an 8-billion-parameter vision-language model. Construction proceeded in stages: the authors first trained a clinical large language model to parse radiology reports, then trained a vision encoder to represent CXR images, and finally trained a bridging network and instruction-tuned the full system on CheXinstruct. CheXinstruct is curated from 28 publicly available datasets (including sources such as MIMIC-CXR), giving broad coverage of CXR appearances and report styles.

The model produces free-text outputs and supports zero-shot and few-shot prompting across heterogeneous tasks. On CheXbench, whose eight task types include view and disease classification, findings/impression generation, and visual question answering, CheXagent surpasses prior general- and medical-domain foundation models in expert evaluation.

Subsequent CheXagent-2 byproducts released by the lab include smaller variants built on a SigLIP-based vision encoder and the RadPhi-2 clinical decoder. Models and code are released for research use only and are explicitly not intended for clinical deployment.
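
A head-to-head comparison across task types comes down to scoring each model per task. The sketch below assumes the closed-ended items are scored as multiple-choice questions by exact match; that scoring rule, the `Item` record, and the sample data are assumptions for illustration and are not the official CheXbench harness.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Item(NamedTuple):
    task: str       # benchmark task type, e.g. "view_classification"
    predicted: str  # option chosen by the model
    correct: str    # reference option


def accuracy_by_task(items: Iterable[Item]) -> Dict[str, float]:
    """Fraction of items answered correctly, grouped by task type."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for item in items:
        totals[item.task] += 1
        # Case- and whitespace-insensitive exact match (an assumed rule).
        if item.predicted.strip().lower() == item.correct.strip().lower():
            hits[item.task] += 1
    return {task: hits[task] / totals[task] for task in totals}


if __name__ == "__main__":
    # Made-up predictions, for illustration only.
    sample = [
        Item("view_classification", "PA", "PA"),
        Item("view_classification", "AP", "PA"),
        Item("disease_identification", "yes", "Yes"),
    ]
    print(accuracy_by_task(sample))
```

Reporting one number per task, rather than a single pooled score, keeps a strong result on an easy task from hiding a weak result on a harder one. Free-text tasks such as findings generation would need a different metric and are outside this sketch.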

Applications

CheXagent targets radiology workflows where chest X-ray volume creates reporting backlogs and turnaround pressure. It can draft structured findings and impressions for radiologist review, answer clinical questions about an image, classify views and abnormalities, and serve as a research backbone for downstream CXR tasks. The Stanford reader study demonstrated tangible benefit to trainees and attending radiologists by accelerating report writing while preserving diagnostic quality, suggesting value as an assistive drafting tool. Researchers also benefit from CheXinstruct and CheXbench as shared resources for training and benchmarking new CXR models.
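
The two kinds of figure reported for the reader study, an overall time saving and a share of cases with improved efficiency, can be computed from paired timings as sketched below. The definitions used here (ratio of total times, and a strict per-case comparison) are assumptions; the paper may define its metrics differently, and the timings are invented.

```python
from typing import Sequence


def _check_paired(baseline: Sequence[float], assisted: Sequence[float]) -> None:
    if len(baseline) != len(assisted) or not baseline:
        raise ValueError("need equal-length, non-empty paired timings")


def relative_time_saving(baseline: Sequence[float], assisted: Sequence[float]) -> float:
    """Overall saving: 1 - (total assisted time / total baseline time)."""
    _check_paired(baseline, assisted)
    return 1.0 - sum(assisted) / sum(baseline)


def share_of_cases_improved(baseline: Sequence[float], assisted: Sequence[float]) -> float:
    """Fraction of cases in which the assisted report was written faster."""
    _check_paired(baseline, assisted)
    faster = sum(1 for b, a in zip(baseline, assisted) if a < b)
    return faster / len(baseline)


if __name__ == "__main__":
    # Hypothetical per-case writing times in seconds (not study data).
    unassisted = [300, 240, 360, 300]
    with_draft = [200, 250, 240, 210]
    print(relative_time_saving(unassisted, with_draft))
    print(share_of_cases_improved(unassisted, with_draft))
```

The two numbers answer different questions: the first is dominated by long cases, while the second counts every case equally, which is why a study can report both.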

Impact

By releasing the model, the CheXinstruct dataset, and the CheXbench benchmark together, CheXagent provided the CXR community with an integrated, reproducible foundation for generative chest X-ray interpretation and helped establish instruction-tuned vision-language models as a competitive paradigm in medical imaging. The science is openly licensed (the arXiv paper, architecture, and CheXbench results are CC-BY-4.0), but the released weights (StanfordAIMI/CheXagent-8b) and the GitHub code are governed by a non-commercial, research-use-only license (CC-BY-NC-ND), so they are not freely reusable for commercial or derivative work. Together with the lab's follow-on CheXagent-2 variants, it has become a widely cited reference point for evaluating radiology foundation models. The chief limitation is that the model is validated for research only and not approved for clinical use; like other generative report tools, it requires expert oversight to guard against hallucinated or incorrect findings.

Citation

A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation

Preprint

Chen, Z., et al. (2024) A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation.

DOI: 10.48550/arXiv.2401.12208

Recent citations

Papers that recently cited this model.

Symbal: Detecting Systematic Misalignments in Model-Generated Captions
M. Varma, Jean-Benoit Delbrouck, S. Ostmeier, et al.
Jul 2026 · 0 citations

CheXpercept: A Benchmark for Evaluating Expert-Level Lesion Perception in Chest X-rays
Geon Choi, Hangyul Yoon, Nalee Kim, et al.
Jun 2026 · 0 citations

NoduLoCC2026: Lung Nodule Localization and Classification Contest from Chest X-Ray Images
Adnan Mustafic, H. Benhabiles, A. Cabani, et al.
Jun 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, et al.
arXiv.org · May 2023 · 347 citations

MAIRA-2: Grounded Radiology Report Generation
Shruthi Bannur, Kenza Bouzid, D. C. Castro, et al.
arXiv.org · Jun 2024 · 139 citations

GREEN: Generative Radiology Report Evaluation and Error Notation
S. Ostmeier, Justin Xu, Zhihong Chen, et al.
Conference on Empirical Methods in Natural Language Processing · May 2024 · 120 citations

CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats
Pierre J. Chambon, Jean-Benoit Delbrouck, Thomas Sounack, et al.
arXiv.org · May 2024 · 92 citations

Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
Qika Lin, Yifan Zhu, Xin Mei, et al.
Information Fusion · Aug 2024 · 85 citations · Influential

Citations

Total Citations: 75

Influential: 11

References: 0

GitHub

Stars: 230

Forks: 28

Open Issues: 5

Contributors: 1

Last Push: 1y ago

Language: Python

HuggingFace

Downloads: 1K

Likes: 46

Last Modified: 2y ago

Pipeline: text-generation

Fields of citing research

Computer Science: 100%
Medicine: 96%
Engineering: 15%
Physics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall: 32 · Closed

Usability (can I run it?): 34

Reproducibility (can I retrain it?): 11

Model Openness Framework

Unclassified

Restrictive license on core components
