SS-CXR

Children's National Hospital / George Washington University / University of Surrey

Self-supervised vision transformer pretrained on chest X-rays to produce a domain-specific foundation model for classification and lung segmentation.

Released: October 2024

SS-CXR is a domain-specific foundation model for chest radiography, built by pretraining a vision transformer on large unlabeled collections of chest X-rays (CXRs) with self-supervised learning. Chest X-rays are among the most common medical imaging exams worldwide and are central to diagnosing thoracic conditions such as pneumonia, COVID-19, and other lung pathologies. Yet most deep learning systems for CXR interpretation rely on transfer learning from natural-image datasets like ImageNet, whose statistics differ sharply from grayscale radiographs. SS-CXR addresses this mismatch by learning general-purpose representations directly from CXR data, so that downstream models start from features attuned to thoracic anatomy rather than everyday photographs.

The model was developed by researchers at Children's National Hospital and George Washington University in Washington, D.C., together with the Centre for Vision, Speech and Signal Processing at the University of Surrey. It was first released as the SPCXR preprint in 2022 and published as "SS-CXR: Self-Supervised Pretraining Using Chest X-Rays Towards A Domain Specific Foundation Model" at the IEEE International Conference on Image Processing (ICIP) in 2024.

The central finding is that domain-specific self-supervised pretraining yields representations that transfer better to clinical CXR tasks than general-domain pretraining, with the largest gains on data-scarce problems such as pediatric COVID-19 detection.

Key Features

Domain-specific pretraining: Representations are learned directly from chest X-rays rather than transferred from natural images, aligning the backbone with the appearance and structure of thoracic radiographs.
Group-masked self-supervision: The model uses group masked model learning (GMML), which masks contiguous groups of image patches and trains the transformer to reconstruct them, encouraging it to capture local anatomical context.
One backbone, multiple tasks: A single pretrained encoder supports both multi-class disease classification (via a DeiT-style ViT head) and lung segmentation (via a UNETR-style decoder).
Label-efficient transfer: Because useful features are learned without annotations, downstream fine-tuning needs comparatively few labeled examples, which is valuable in clinical settings where labeling is costly.
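The group-masking idea behind GMML can be illustrated with a minimal NumPy sketch. It assumes a 224x224 grayscale image split into 16x16 patches (a 14x14 patch grid). The sketch builds a mask from random rectangular blocks of patches until at least half the grid is covered, replaces the masked pixels with Gaussian noise, and scores a reconstruction with an L1 loss restricted to the corrupted pixels. The function names (`group_mask`, `corrupt`, `masked_l1`), the 50% mask ratio, the block sizes, and the noise-only corruption are illustrative assumptions rather than the published SS-CXR configuration. The transformer that would produce the reconstruction is omitted.

```python
from typing import Optional

import numpy as np


def group_mask(grid: int = 14, mask_ratio: float = 0.5, max_block: int = 5,
               rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Boolean (grid, grid) patch mask built from random rectangular blocks.

    Blocks are added until at least `mask_ratio` of the patches are masked,
    so the final ratio can slightly overshoot the target.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_block = min(max_block, grid)
    mask = np.zeros((grid, grid), dtype=bool)
    target = int(np.ceil(mask_ratio * grid * grid))
    while mask.sum() < target:
        # Hypothetical block sampler: a random h x w group of adjacent patches.
        h, w = rng.integers(1, max_block + 1, size=2)
        top = rng.integers(0, grid - h + 1)
        left = rng.integers(0, grid - w + 1)
        mask[top:top + h, left:left + w] = True
    return mask


def corrupt(image: np.ndarray, mask: np.ndarray, patch: int = 16,
            rng: Optional[np.random.Generator] = None):
    """Replace every pixel inside a masked patch with Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    # Upsample the patch-level mask to pixel resolution.
    pixel_mask = np.repeat(np.repeat(mask, patch, axis=0), patch, axis=1)
    corrupted = image.copy()
    corrupted[pixel_mask] = rng.normal(image.mean(), image.std(),
                                       size=int(pixel_mask.sum()))
    return corrupted, pixel_mask


def masked_l1(recon: np.ndarray, target: np.ndarray, pixel_mask: np.ndarray) -> float:
    """Mean absolute error computed only over the corrupted pixels."""
    return float(np.abs(recon - target)[pixel_mask].mean())


rng = np.random.default_rng(0)
image = rng.random((224, 224))  # stand-in for a normalized grayscale CXR
mask = group_mask(grid=224 // 16, mask_ratio=0.5, rng=rng)
corrupted, pixel_mask = corrupt(image, mask, patch=16, rng=rng)
print(mask.mean())                          # at least 0.5 by construction
print(masked_l1(image, image, pixel_mask))  # 0.0 for a perfect reconstruction
```

Masking whole blocks of neighbouring patches, rather than scattered single patches, means a masked region cannot be filled in by copying from an adjacent visible patch. The network has to rely on wider anatomical context instead.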

Technical Details

SS-CXR is built on a small vision transformer backbone (ViT-S) pretrained with GMML. This group-masked self-supervised objective corrupts contiguous groups of patches in the input image and trains the network to reconstruct them, which forces it to learn context-aware representations. The pretrained encoder is then adapted to downstream tasks: a DeiT-style classifier head for thoracic disease classification and a UNETR-style architecture for lung segmentation. Pretraining draws on large public CXR corpora, and the learned features are fine-tuned on task-specific datasets. The authors report roughly a 25% accuracy improvement over supervised transformer baselines on a challenging pediatric COVID-19 detection dataset. They also report competitive results on pneumonia detection, general health screening, and lung segmentation, showing that the same pretrained model transfers to both classification and dense-prediction tasks.
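The downstream adaptation can be sketched in terms of tensor shapes. A single encoder output serves a classifier through its CLS token, and it serves a UNETR-style decoder through patch tokens reshaped into 2D feature maps. This NumPy sketch rests on several assumptions:

- DeiT-S/ViT-S sizes (384-dimensional tokens, 12 blocks).
- A 224x224 input with 16x16 patches.
- Skip connections taken from blocks 3, 6, 9, and 12, as in the original UNETR design.
- A hypothetical three-class label set.

Random arrays stand in for the encoder's hidden states. The decoder's convolutions and the final layer norm are left out, and the helper names `tokens_to_feature_map` and `classify` are hypothetical.

```python
import numpy as np

# Assumed ViT-S configuration (DeiT-S sizes): 224x224 input, 16x16 patches,
# 384-dim embeddings, 12 transformer blocks.
IMG, PATCH, DIM, DEPTH = 224, 16, 384, 12
GRID = IMG // PATCH  # 14 patches per side -> 196 patch tokens


def tokens_to_feature_map(tokens: np.ndarray) -> np.ndarray:
    """(B, 1 + GRID*GRID, DIM) tokens -> (B, DIM, GRID, GRID) map, dropping CLS."""
    b, _, d = tokens.shape
    patches = tokens[:, 1:, :]  # patch tokens, assumed in raster (row-major) order
    return patches.reshape(b, GRID, GRID, d).transpose(0, 3, 1, 2)


def classify(tokens: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear head on the CLS token -> class logits of shape (B, num_classes)."""
    return tokens[:, 0, :] @ weight + bias


rng = np.random.default_rng(0)
# Stand-ins for the hidden states after each of the 12 encoder blocks (batch of 2).
hidden = [rng.standard_normal((2, 1 + GRID * GRID, DIM)) for _ in range(DEPTH)]

# Segmentation path: UNETR-style skip features from blocks 3, 6, 9, 12 (1-indexed).
skips = [tokens_to_feature_map(hidden[i - 1]) for i in (3, 6, 9, 12)]
print([s.shape for s in skips])  # four maps, each (2, 384, 14, 14)

# Classification path: hypothetical 3-class head on the final CLS token.
num_classes = 3
w = rng.standard_normal((DIM, num_classes)) * 0.01
logits = classify(hidden[-1], w, np.zeros(num_classes))
print(logits.shape)  # (2, 3)
```

Reshaping rather than re-encoding is what lets the same pretrained weights feed both heads. The segmentation path only needs the patch tokens laid back out on the 14x14 grid, which a decoder would then upsample to pixel resolution.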

Applications

SS-CXR targets clinical and research workflows that analyze chest radiographs, including triage and screening, pneumonia and COVID-19 detection, and lung segmentation for downstream quantification. Its pretrained encoder is most useful to teams building CXR classifiers or segmentation tools with limited labeled data, since starting from CXR-attuned features reduces the annotation burden and improves performance on rare or pediatric presentations where labeled examples are scarce.

Impact

SS-CXR is part of a broader shift toward domain-specific medical imaging foundation models that pretrain on in-domain data rather than relying on natural-image transfer learning. The work showed that group-masked self-supervised pretraining on CXRs improves downstream classification and segmentation, particularly in low-data regimes. It thereby reinforced the case for self-supervised foundation models in radiology and informed follow-on efforts from the same groups, including federated self-supervised approaches for pediatric COVID-19 detection. Because the work was reported in a conference paper, its gains come from specific benchmark datasets rather than broad multi-site clinical validation. Its compact ViT-S backbone is also modest compared with later large-scale CXR foundation models.

Citation

SS-CXR: Self-Supervised Pretraining Using Chest X-Rays Towards A Domain Specific Foundation Model

Anwar, S., et al. (2024) SS-CXR: Self-Supervised Pretraining Using Chest X-Rays Towards A Domain Specific Foundation Model. IEEE International Conference on Image Processing (ICIP).

DOI: 10.1109/icip51287.2024.10647378

Recent citations

Papers that recently cited this model.

Foundation Model for Bone-Related Radiographs
Ali Tamizifar, Shakiba Berenjkoub, Zahra Sobhaninia, et al.
2026 IEEE World AI IoT Congress (AIIoT) · May 2026
0
A critical review of pretrained deep neural networks for chest x-ray interpretation: architectural trends, clinical relevance, and future directions
Shaik Shabina, S. Kalyani
Engineering Research Express · May 2026
0
Eksplorasi pada Pemetaan Klasifikasi Radiograf Toraks Penyakit Paru-Paru Menggunakan Convolutional Neural Network (CNN)
Andreas Rezeki Zai, B. Suhardi, Surya Tri Nowo, et al.
Syntax : Journal of Software Engineering, Computer Science and Information Technology · Jan 2026
0

Top citations

The most-cited papers that cite this model.

EVA-X: a foundation model for general chest x-ray analysis with self-supervised learning
Jingfeng Yao, Xinggang Wang, Yuehao Song, et al.
npj Digital Medicine · May 2024
26
A Taxonomy of Foundation Model based Systems through the Lens of Software Architecture
Qinghua Lu, Liming Zhu, Xiwei Xu, et al.
2024 IEEE/ACM 3rd International Conference on AI Engineering – Software Engineering for AI (CAIN) · May 2023
19
Zero-Shot Pediatric Tuberculosis Detection in Chest X-Rays Using Self-Supervised Learning
Daniel Capellán-Martín, Abhijeet Parida, Juan J. Gómez-Valverde, et al.
IEEE International Symposium on Biomedical Imaging · Feb 2024
8
Self-Supervised Graph Transformer with Contrastive Learning for Brain Connectivity Analysis Towards Improving Autism Detection
Yicheng Leng, S. Anwar, I. Rekik, et al.
IEEE International Symposium on Biomedical Imaging · Jan 2025
6
MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models
Shanghao Shi, Md Shahedul Haque, Abhijeet Parida, et al.
IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies · Jul 2024
4

Citations

Total Citations: 13

Influential: 0

References: 59

Fields of citing research

Medicine: 83%
Computer Science: 83%
Engineering: 33%
Environmental Science: 8%
Physics: 8%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Score: 22 (Closed)

Usability (can I run it?): 15

Reproducibility (can I retrain it?): 14

Model Openness Framework

Unclassified

Missing required components

Resources

Research Paper
