Children's National Hospital / George Washington University / University of Surrey
Self-supervised vision transformer pretrained on chest X-rays to produce a domain-specific foundation model for classification and lung segmentation.
SS-CXR is a domain-specific foundation model for chest radiography, built by pretraining a vision transformer on large unlabeled collections of chest X-rays (CXRs) with self-supervised learning. Chest X-rays are among the most common medical imaging exams worldwide and are central to diagnosing thoracic conditions such as pneumonia, COVID-19, and other lung pathologies. Yet most deep learning systems for CXR interpretation rely on transfer learning from natural-image datasets like ImageNet, whose statistics differ sharply from grayscale radiographs. SS-CXR addresses this mismatch by learning general-purpose representations directly from CXR data, so that downstream models start from features attuned to thoracic anatomy rather than everyday photographs.
The model was developed by researchers at Children's National Hospital and George Washington University in Washington, D.C., together with the Centre for Vision, Speech and Signal Processing at the University of Surrey. It was first released as the SPCXR preprint in 2022 and published as "SS-CXR: Self-Supervised Pretraining Using Chest X-Rays Towards A Domain Specific Foundation Model" at the IEEE International Conference on Image Processing (ICIP) in 2024.
The central finding is that domain-specific self-supervised pretraining yields representations that transfer better to clinical CXR tasks than general-domain pretraining, with the largest gains on data-scarce problems such as pediatric COVID-19 detection.
SS-CXR is built on a small vision transformer backbone (ViT-S) pretrained with GMML, a group-masked self-supervised objective in which clustered patches of the input image are corrupted and reconstructed, forcing the network to learn context-aware representations. The pretrained encoder is then adapted to downstream tasks: a DeiT-style classifier head for thoracic disease classification and a UNETR-style architecture for lung segmentation. Pretraining draws on large public CXR corpora, and the learned features are fine-tuned on task-specific datasets. The authors report roughly a 25% accuracy improvement over supervised transformer baselines on a challenging pediatric COVID-19 detection dataset, alongside competitive results on pneumonia detection, general health screening, and lung segmentation, demonstrating that the same pretrained model transfers across both classification and dense-prediction tasks.
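The group-masking idea behind GMML can be made concrete with a minimal sketch. It assumes a 224×224 grayscale input, 16×16 patches (a 14×14 grid), masking in random rectangular blocks of connected patches, Gaussian-noise corruption, and an L1 loss computed only on the corrupted pixels. The block-sampling scheme and the function names (`group_mask`, `corrupt`, `masked_l1_loss`) are illustrative and are not taken from the authors' implementation. The original method may use other corruption types, and the ViT encoder and decoder that would produce the reconstruction are omitted here.

```python
import numpy as np

PATCH = 16  # assumed ViT-S/16 patch size
IMG = 224   # assumed input resolution -> 14 x 14 patch grid

def group_mask(grid_h, grid_w, mask_ratio, rng):
    """Boolean patch-level mask; True marks a corrupted patch.

    Patches are masked in connected rectangular blocks rather than one at a
    time, approximating GMML's group masking (block shapes are an assumption).
    """
    mask = np.zeros((grid_h, grid_w), dtype=bool)
    target = int(round(mask_ratio * grid_h * grid_w))
    while mask.sum() < target:
        bh = int(rng.integers(1, grid_h // 2 + 1))
        bw = int(rng.integers(1, grid_w // 2 + 1))
        top = int(rng.integers(0, grid_h - bh + 1))
        left = int(rng.integers(0, grid_w - bw + 1))
        mask[top:top + bh, left:left + bw] = True
    return mask

def corrupt(image, mask, rng):
    """Replace masked patches with Gaussian noise (one possible corruption)."""
    # Upsample the patch-level mask to pixel resolution.
    pixel_mask = np.repeat(np.repeat(mask, PATCH, axis=0), PATCH, axis=1)
    noise = rng.normal(image.mean(), image.std(), size=image.shape)
    out = image.copy()
    out[pixel_mask] = noise[pixel_mask]
    return out, pixel_mask

def masked_l1_loss(reconstruction, original, pixel_mask):
    """Mean absolute error over corrupted pixels only (assumed objective)."""
    return float(np.abs(reconstruction - original)[pixel_mask].mean())

rng = np.random.default_rng(0)
image = rng.random((IMG, IMG)).astype(np.float32)  # stand-in grayscale CXR
grid = IMG // PATCH
mask = group_mask(grid, grid, mask_ratio=0.5, rng=rng)
corrupted, pixel_mask = corrupt(image, mask, rng)

# A perfect reconstruction scores zero; echoing the corrupted input does not.
print(masked_l1_loss(image, image, pixel_mask))          # 0.0
print(masked_l1_loss(corrupted, image, pixel_mask) > 0)  # True
```

Masking connected groups instead of scattered single patches removes whole regions of local context. The network therefore cannot fill a gap by copying from immediate neighbours and must infer the missing anatomy from the wider image.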
SS-CXR targets clinical and research workflows that analyze chest radiographs, including triage and screening, pneumonia and COVID-19 detection, and lung segmentation for downstream quantification. Its pretrained encoder is most useful to teams building CXR classifiers or segmentation tools with limited labeled data, since starting from CXR-attuned features reduces the annotation burden and improves performance on rare or pediatric presentations where labeled examples are scarce.
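Much of the reuse of one pretrained encoder for both classification and segmentation comes down to how its output tokens are read. The following minimal illustration assumes ViT-S/16 dimensions: 384-dimensional tokens, and a 224×224 input that yields a 14×14 patch grid plus one [CLS] token. The random token array stands in for real encoder output. The function names and the two-class label set are hypothetical. This is not the authors' DeiT or UNETR code. In particular, a UNETR-style decoder typically taps several intermediate transformer layers, whereas this sketch only reshapes a single token sequence.

```python
import numpy as np

EMBED_DIM = 384  # ViT-S hidden size
PATCH = 16       # assumed patch size
IMG = 224        # assumed input resolution

def layer_norm(x, eps=1e-6):
    """Normalize each token vector to zero mean and unit variance (no affine params)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def classification_logits(tokens, weight, bias):
    """Classifier head: normalize the [CLS] token (index 0), then apply a linear layer."""
    cls = layer_norm(tokens[:, 0, :])  # (batch, EMBED_DIM)
    return cls @ weight + bias         # (batch, num_classes)

def tokens_to_feature_map(tokens):
    """Drop [CLS] and fold patch tokens (row-major order) into a 2-D feature map."""
    grid = IMG // PATCH
    patches = tokens[:, 1:, :]         # (batch, grid*grid, EMBED_DIM)
    batch = patches.shape[0]
    fmap = patches.reshape(batch, grid, grid, EMBED_DIM)
    return fmap.transpose(0, 3, 1, 2)  # (batch, EMBED_DIM, grid, grid)

rng = np.random.default_rng(0)
num_tokens = (IMG // PATCH) ** 2 + 1  # 196 patches + 1 [CLS] = 197
tokens = rng.normal(size=(2, num_tokens, EMBED_DIM))  # stand-in encoder output
num_classes = 2  # hypothetical label set, e.g. COVID-19 vs. normal
w = rng.normal(scale=0.02, size=(EMBED_DIM, num_classes))
b = np.zeros(num_classes)

print(classification_logits(tokens, w, b).shape)  # (2, 2)
print(tokens_to_feature_map(tokens).shape)        # (2, 384, 14, 14)
```

Because both heads consume the same token sequence, a team with few labels can keep the pretrained encoder fixed or lightly fine-tuned and train mainly the small task-specific head or decoder.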
SS-CXR is part of a broader shift toward domain-specific medical imaging foundation models that pretrain on in-domain data rather than relying on natural-image transfer learning. By demonstrating that group-masked self-supervised pretraining on CXRs improves downstream classification and segmentation, particularly in low-data regimes, the work reinforced the case for self-supervised foundation models in radiology and informed follow-on efforts from the same groups, including federated self-supervised approaches for pediatric COVID-19 detection. Because the work was published as a conference paper, its reported gains come from specific benchmark datasets rather than broad multi-site clinical validation, and its compact ViT-S backbone is modest compared with later large-scale CXR foundation models.
Anwar, S., et al. (2024). SS-CXR: Self-Supervised Pretraining Using Chest X-Rays Towards A Domain Specific Foundation Model. IEEE International Conference on Image Processing (ICIP).
DOI: 10.1109/icip51287.2024.10647378