
Prov-GigaPath

Microsoft Research

Whole-slide pathology foundation model pretrained on 1.3 billion tiles from 171,189 clinical WSIs. Achieves state-of-the-art on 25 of 26 pathology benchmark tasks.

Released: 2024
Parameters: ~1.2 billion

Overview

Prov-GigaPath is a whole-slide pathology foundation model developed jointly by Microsoft Research, Providence Health System, and the University of Washington, published in Nature in May 2024. It addresses a core limitation of prior computational pathology models: the reliance on small, curated research datasets that fail to reflect the heterogeneity of real clinical practice. Prov-GigaPath was pretrained on 1.3 billion image tiles derived from 171,189 whole-slide images (WSIs) collected across Providence's 28 cancer centers, spanning over 30,000 patients and 31 tissue types — making it the first digital pathology foundation model trained at this scale on real-world clinical data.

The model is designed around the fundamental challenge of gigapixel pathology slides: a single WSI can contain hundreds of thousands of image patches, far exceeding the context capacity of standard vision transformers. Prov-GigaPath addresses this through a two-stage architecture that separately learns patch-level visual semantics and slide-level spatial context, enabling coherent representation across an entire slide without sacrificing local detail.
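The division of labor is easy to make concrete. The sketch below uses hypothetical module names rather than the released API; it only illustrates how a per-patch encoder and a sequence-level encoder compose:

    import torch
    import torch.nn as nn

    class TwoStagePathologyModel(nn.Module):
        """Illustrative composition: per-tile encoder + slide-level encoder."""

        def __init__(self, tile_encoder: nn.Module, slide_encoder: nn.Module):
            super().__init__()
            self.tile_encoder = tile_encoder    # e.g. a ViT over 256x256 tiles
            self.slide_encoder = slide_encoder  # e.g. a long-context transformer

        def forward(self, tiles: torch.Tensor) -> torch.Tensor:
            # tiles: (num_tiles, 3, 256, 256) -- every patch from one slide
            tile_emb = self.tile_encoder(tiles)  # (num_tiles, dim)
            # The whole slide becomes one long sequence of tile embeddings.
            return self.slide_encoder(tile_emb.unsqueeze(0))  # (1, dim)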

In benchmarking against prior state-of-the-art pathology models, Prov-GigaPath achieves the best performance on 25 of 26 tasks across a comprehensive digital pathology evaluation suite covering cancer subtyping and pathomics prediction, using data from both Providence and The Cancer Genome Atlas (TCGA). The work was published as "A whole-slide foundation model for digital pathology from real-world data" in Nature 630, 181–188 (2024).

Key Features

  • Clinical-scale pretraining: Trained on over 1.38 billion 256x256 pixel tiles from 171,189 real-world clinical WSIs, including both hematoxylin and eosin (H&E) and immunohistochemistry (IHC) stained slides from a large multi-site U.S. health network.
  • Two-stage curriculum learning: Tile-level representations are first learned with a ViT-g/14 encoder using DINOv2 self-supervised pretraining; slide-level context is then captured by a masked autoencoder built on LongNet, which learns spatial relationships across entire gigapixel slides.
  • Ultra-long-context slide modeling: The LongNet-based slide encoder uses dilated attention with segment lengths up to 1,048,576 tokens, enabling it to process sequences of tens of thousands of tiles and capture global slide structure without tile subsampling.
  • Dual-granularity embeddings: Produces both patch-level and slide-level embeddings, supporting a broad range of downstream tasks from local morphology classification to whole-slide phenotype prediction.
  • Broad benchmark superiority: Outperforms all comparison models on 25 of 26 tasks spanning 9 cancer subtyping and 17 pathomics prediction challenges, with statistically significant improvements on 18 tasks.
  • Open weights for research: Model weights for both tile and slide encoders are publicly available on HuggingFace under a research-use license, with inference code and demo notebooks on GitHub.

Technical Details

Prov-GigaPath consists of two independently pretrained components with a combined parameter count of approximately 1.2 billion. The tile encoder is a Vision Transformer with ViT-g/14 architecture (~1.13 billion parameters), pretrained using DINOv2 — a self-supervised method based on self-distillation without labels. DINOv2 pretraining used a base learning rate of 4e-3 with an effective batch size of 384. The resulting tile encoder produces dense visual embeddings from 256x256 pixel patches extracted at 0.5 microns per pixel resolution.
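As an illustration of tile-level inference, the released tile encoder can be loaded as a timm-compatible HuggingFace checkpoint and applied to individual tiles. The repo id and preprocessing below follow the public release but should be verified against the model card, the file name is a placeholder, and the gated weights may require an accepted license and a HuggingFace token:

    import timm
    import torch
    from PIL import Image
    from torchvision import transforms

    # Load the tile encoder as a timm checkpoint hosted on HuggingFace.
    tile_encoder = timm.create_model(
        "hf_hub:prov-gigapath/prov-gigapath", pretrained=True
    ).eval()

    # ImageNet-style preprocessing; confirm the exact transform on the model card.
    preprocess = transforms.Compose([
        transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406),
                             std=(0.229, 0.224, 0.225)),
    ])

    tile = preprocess(Image.open("tile_0.png").convert("RGB")).unsqueeze(0)
    with torch.inference_mode():
        embedding = tile_encoder(tile)  # (1, 1536) for the ViT-g backbone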

The slide encoder (~86 million trainable parameters) is based on LongNet, an architecture purpose-built for ultra-long sequence modeling via dilated attention. Tile embeddings are serialized in row-major order across the WSI and fed as a sequence into the slide encoder, which applies dilated attention at ratios of [1, 2, 4, 8, 16] and segment lengths of [1024, 5792, 32768, 185363, 1048576]. Slide-level pretraining uses a masked autoencoder objective, masking random subsets of tile embeddings and reconstructing them from context. Training data spans 31 tissue types from Providence's integrated health network, distinguishing this model from predecessors trained on TCGA or other research repositories, and providing exposure to scanner variability, staining protocol differences, and clinical artifact distributions at realistic scale.
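A toy sketch makes the dilated attention pattern concrete. This is not the official LongNet implementation; it just builds the boolean attention mask implied by one (segment length, dilation ratio) pair and unions the five configured pairs over a short sequence:

    import numpy as np

    def dilated_attention_mask(seq_len: int, segment_len: int, ratio: int) -> np.ndarray:
        """Boolean (seq_len, seq_len) mask: True where attention is allowed."""
        mask = np.zeros((seq_len, seq_len), dtype=bool)
        for start in range(0, seq_len, segment_len):
            # Tokens kept in this segment after subsampling every `ratio`-th one.
            kept = np.arange(start, min(start + segment_len, seq_len))[::ratio]
            mask[np.ix_(kept, kept)] = True
        return mask

    # The five (segment length, dilation ratio) pairs from the configuration:
    # coarser segments use larger dilation, so longer range is covered at
    # similar cost and overall attention stays near-linear in sequence length.
    pairs = zip([1024, 5792, 32768, 185363, 1048576], [1, 2, 4, 8, 16])
    seq_len = 4096  # toy length; real slides run to tens of thousands of tiles
    union = np.zeros((seq_len, seq_len), dtype=bool)
    for seg, r in pairs:
        union |= dilated_attention_mask(seq_len, min(seg, seq_len), r)
    print("attended fraction vs. dense attention:", union.mean())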

Applications

Prov-GigaPath is applicable across the full spectrum of computational pathology tasks. Oncology researchers use it for cancer subtyping — classifying histological subtypes such as lung adenocarcinoma versus squamous cell carcinoma directly from WSI morphology. Translational researchers apply it to pathomics: predicting molecular features including mutation status, microsatellite instability, gene expression signatures, and treatment biomarkers from slide appearance alone, without requiring separate molecular assays. Pathology AI developers use the pretrained tile and slide embeddings as frozen feature extractors, training lightweight downstream classifiers that require far less labeled data than end-to-end approaches. The dual-granularity embedding design means a single pretrained model can serve tasks ranging from patch-level tissue segmentation to patient-level prognosis.
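A minimal sketch of that frozen-feature workflow, assuming slide-level embeddings have already been exported to disk (file names and labels are illustrative placeholders):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X = np.load("slide_embeddings.npy")  # (n_slides, dim) frozen slide features
    y = np.load("subtype_labels.npy")    # e.g. 0 = LUAD, 1 = LUSC

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    # A simple linear probe is often sufficient on strong frozen embeddings.
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("AUROC:", roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))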

Impact

Prov-GigaPath represents a methodological advance in digital pathology by demonstrating that foundation model pretraining at genuine clinical scale — rather than curated research datasets — yields substantially improved and more generalizable representations. Its Nature publication and open model release have made it a reference point for subsequent work in computational pathology. Key limitations are worth noting: the model weights are released for research use only and have not undergone regulatory review for clinical deployment. Performance on populations outside the U.S. health network, or on slides prepared with substantially different staining protocols or scanner hardware, may differ from reported benchmarks. The benchmark evaluation is also weighted toward H&E-based tasks, and IHC-specific performance is less thoroughly characterized. Stain normalization sensitivity, common across pathology vision models, remains a practical consideration when applying the model to out-of-distribution data.

Citation

Xu, H., et al. (2024). A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188.

DOI: 10.1038/s41586-024-07441-w

Metrics

GitHub

Stars: 596
Forks: 99
Open Issues: 74
Contributors: 3
Last Push: 11 months ago
Language: Python
License: Apache-2.0

Citations

Total Citations: 744
Influential: 75
References: 35

HuggingFace

Downloads: 56.4K
Likes: 165
Last Modified: 1 year ago
Pipeline: image-feature-extraction

Tags

vision transformer, foundation model, self-supervised, cancer, histology, whole-slide imaging

Resources

  • GitHub Repository
  • Research Paper
  • Official Website
  • HuggingFace Model