Whole-slide pathology foundation model pretrained on 1.3 billion tiles from 171,189 clinical WSIs. Achieves state-of-the-art on 25 of 26 pathology benchmark tasks.
Prov-GigaPath is a whole-slide pathology foundation model developed jointly by Microsoft Research, Providence Health System, and the University of Washington, published in Nature in May 2024. It addresses a core limitation of prior computational pathology models: the reliance on small, curated research datasets that fail to reflect the heterogeneity of real clinical practice. Prov-GigaPath was pretrained on 1.3 billion image tiles derived from 171,189 whole-slide images (WSIs) collected across Providence's 28 cancer centers, spanning over 30,000 patients and 31 tissue types — making it the first digital pathology foundation model trained at this scale on real-world clinical data.
The model is designed around the fundamental challenge of gigapixel pathology slides: a single WSI can contain hundreds of thousands of image patches, far exceeding the context capacity of standard vision transformers. Prov-GigaPath addresses this through a two-stage architecture that separately learns patch-level visual semantics and slide-level spatial context, enabling coherent representation across an entire slide without sacrificing local detail.
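The two-stage split described above can be sketched as a minimal pipeline: a patch-level encoder maps each tile to a fixed-width embedding, and a slide-level encoder contextualizes the resulting sequence into one representation. All dimensions and the linear/pooling stand-ins below are illustrative placeholders, not the model's actual layers or sizes.

```python
import numpy as np

# Illustrative dimensions only (the real model is far larger).
EMB_DIM = 16    # tile embedding width (placeholder)
N_TILES = 100   # a real WSI can yield tens of thousands of tiles

def tile_encoder(tiles):
    """Stage 1 stand-in: map each tile to a fixed-width embedding.
    (Prov-GigaPath uses a ViT; a random projection mimics the interface.)"""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((tiles.shape[-1], EMB_DIM))
    return tiles @ W  # shape: (n_tiles, EMB_DIM)

def slide_encoder(tile_embs):
    """Stage 2 stand-in: aggregate the full tile sequence into one
    slide-level vector. (The real slide encoder, LongNet, attends across
    the sequence; mean pooling here just illustrates the interface.)"""
    return tile_embs.mean(axis=0)  # shape: (EMB_DIM,)

tiles = np.random.default_rng(1).standard_normal((N_TILES, 32))
slide_repr = slide_encoder(tile_encoder(tiles))
print(slide_repr.shape)  # (16,)
```

The point of the split is that only the cheap aggregation stage ever sees the whole slide as a sequence; the expensive vision backbone runs per tile.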
In benchmarking, Prov-GigaPath achieves state-of-the-art performance on 25 of 26 tasks in a digital pathology evaluation suite covering cancer subtyping and pathomics prediction, using data from both Providence and The Cancer Genome Atlas (TCGA). The work was published as "A whole-slide foundation model for digital pathology from real-world data" in Nature 630, 181–188 (2024).
Prov-GigaPath consists of two independently pretrained components with a combined parameter count of approximately 1.2 billion. The tile encoder is a Vision Transformer with ViT-g/14 architecture (~1.13 billion parameters), pretrained using DINOv2 — a self-supervised method based on self-distillation without labels. DINOv2 pretraining used a base learning rate of 4e-3 with an effective batch size of 384. The resulting tile encoder produces dense visual embeddings from 256x256 pixel patches extracted at 0.5 microns per pixel resolution.
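A back-of-envelope calculation using the paper's tiling parameters (256×256-pixel tiles at 0.5 microns per pixel) shows why slides produce so many patches; the tissue-region dimensions below are hypothetical examples, not figures from the paper.

```python
# Tile-count arithmetic from the stated tiling parameters.
TILE_PX = 256   # tile side length in pixels
MPP = 0.5       # microns per pixel at the extraction magnification

def n_tiles(width_mm: float, height_mm: float) -> int:
    """Number of whole tiles covering a rectangular tissue region."""
    w_px = int(width_mm * 1000 / MPP)   # mm -> microns -> pixels
    h_px = int(height_mm * 1000 / MPP)
    return (w_px // TILE_PX) * (h_px // TILE_PX)

# A modest 20 mm x 15 mm tissue region already yields ~18k tiles.
print(n_tiles(20, 15))  # 18252
```

In practice background tiles are filtered out before encoding, but tissue-dense slides still reach tile counts far beyond a standard transformer's context window.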
The slide encoder (~86 million trainable parameters) is based on LongNet, an architecture purpose-built for ultra-long sequence modeling via dilated attention. Tile embeddings are serialized in row-major order across the WSI and fed as a sequence into the slide encoder, which applies dilated attention at ratios of [1, 2, 4, 8, 16] and segment lengths of [1024, 5792, 32768, 185363, 1048576]. Slide-level pretraining uses a masked autoencoder objective, masking random subsets of tile embeddings and reconstructing them from context. Training data spans 31 tissue types from Providence's integrated health network, distinguishing this model from predecessors trained on TCGA or other research repositories. That breadth exposes the model to scanner variability, staining-protocol differences, and clinical artifact distributions at realistic scale.
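The core trick in dilated attention is easy to show in isolation: within each segment of length w, only every r-th position participates, shrinking per-segment attention cost from O(w²) toward O((w/r)²). The sketch below illustrates that index-selection pattern on toy values; it is a simplification of LongNet, which mixes several (segment length, ratio) pairs such as those listed above rather than a single pair.

```python
import numpy as np

def dilated_indices(seq_len: int, segment_len: int, ratio: int):
    """Positions attended within each segment under one (w, r) pair:
    split the sequence into segments of length `segment_len`, then keep
    every `ratio`-th position inside each segment."""
    groups = []
    for start in range(0, seq_len, segment_len):
        segment = np.arange(start, min(start + segment_len, seq_len))
        groups.append(segment[::ratio])  # sparse subset for this segment
    return groups

# Toy sequence of 16 tiles, segment length 8, dilation ratio 2.
idx = dilated_indices(seq_len=16, segment_len=8, ratio=2)
print([list(g) for g in idx])  # [[0, 2, 4, 6], [8, 10, 12, 14]]
```

Running multiple such patterns in parallel, with small ratios on short segments and large ratios on long ones, lets attention stay dense locally while still connecting tiles across the whole slide.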
Prov-GigaPath is applicable across the full spectrum of computational pathology tasks. Oncology researchers use it for cancer subtyping — classifying histological subtypes such as lung adenocarcinoma versus squamous cell carcinoma directly from WSI morphology. Translational researchers apply it to pathomics: predicting molecular features including mutation status, microsatellite instability, gene expression signatures, and treatment biomarkers from slide appearance alone, without requiring separate molecular assays. Pathology AI developers use the pretrained tile and slide embeddings as frozen feature extractors, training lightweight downstream classifiers that require far less labeled data than end-to-end approaches. The dual-granularity embedding design means a single pretrained model can serve tasks ranging from patch-level tissue segmentation to patient-level prognosis.
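The frozen-feature-extractor workflow amounts to a linear probe: embeddings from the pretrained encoder are treated as fixed inputs and only a small classifier is fit on top. The sketch below uses synthetic vectors in place of real Prov-GigaPath embeddings and a closed-form ridge-regularized linear probe; both are stand-ins for illustration, not the paper's evaluation protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 32                              # illustrative sizes
X = rng.standard_normal((n, d))             # stand-in for frozen slide embeddings
w_true = rng.standard_normal(d)
y = (X @ w_true > 0).astype(float)          # synthetic binary labels (e.g. subtype)

# Linear probe: one linear layer fit in closed form on +/-1 targets,
# with ridge regularization. No gradients flow into the encoder.
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (2 * y - 1))
acc = ((X @ w > 0) == (y > 0.5)).mean()
print(f"probe train accuracy: {acc:.2f}")
```

Because only `w` is learned, such probes need far fewer labeled slides than end-to-end training, which is the practical appeal of reusing the pretrained embeddings.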
Prov-GigaPath represents a methodological advance in digital pathology by demonstrating that foundation model pretraining at genuine clinical scale — rather than curated research datasets — yields substantially improved and more generalizable representations. Its Nature publication and open model release have made it a reference point for subsequent work in computational pathology. Key limitations are worth noting: the model weights are released for research use only and have not undergone regulatory review for clinical deployment. Performance on populations outside the U.S. health network, or on slides prepared with substantially different staining protocols or scanner hardware, may differ from reported benchmarks. The benchmark evaluation is also weighted toward H&E-based tasks, and IHC-specific performance is less thoroughly characterized. Stain normalization sensitivity, common across pathology vision models, remains a practical consideration when applying the model to out-of-distribution data.
Xu, H., et al. (2024). A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188. DOI: 10.1038/s41586-024-07441-w