PLUTO

Compact 22M-parameter ViT-S pathology foundation model pre-trained on 195M tiles, spanning subcellular segmentation to slide-level prediction.

Released: May 2024

Parameters: 22 Million

PLUTO (Pathology-Universal Transformer) is a lightweight self-supervised foundation model for computational pathology, developed by PathAI and posted to arXiv in May 2024. It was designed to address a practical tension in digital pathology: the most capable foundation models are large and trained on enormous private corpora, yet real-world pathology workflows require predictions across radically different spatial scales — from individual nuclei to entire whole-slide images (WSIs). PLUTO aims to provide a single, compact feature extractor that performs well across all of these scales rather than excelling at only one.

A defining characteristic of PLUTO is its efficiency. Built on a Vision Transformer Small (ViT-S) backbone with roughly 22 million parameters, it is far smaller than contemporaries such as Virchow (632M parameters, with the later Virchow2G at about 1.9B) or H-optimus-0 (1.1B). Despite this, the authors report that PLUTO matches or outperforms task-specific baselines and larger pathology foundation models on a diverse set of internal and external benchmarks. This suggests that architectural design and training strategy can partly compensate for raw parameter count and dataset scale.
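As a rough check on the ~22M figure, the sketch below counts the parameters of a generic ViT-S/16 encoder. It assumes the standard ViT-S configuration: embedding dimension 384, 12 transformer blocks, MLP ratio 4, 16-pixel patches, 224-pixel input, and no classification head. These values come from the common ViT-S definition, not from the PLUTO paper, and the helper `vit_param_count` is hypothetical. The sketch does not model PLUTO's FlexiViT patch embedding or the training-only DINO, iBOT, and MAE heads and decoder.

```python
def vit_param_count(embed_dim=384, depth=12, mlp_ratio=4, patch=16,
                    in_chans=3, img_size=224, num_classes=0):
    """Count parameters of a standard ViT encoder, including biases and LayerNorms.

    Defaults assume the common ViT-S/16 configuration (not PLUTO-specific).
    """
    hidden = embed_dim * mlp_ratio
    # Per transformer block: fused QKV projection, attention output projection,
    # two-layer MLP, and two LayerNorms (weight + bias each).
    qkv = embed_dim * 3 * embed_dim + 3 * embed_dim
    proj = embed_dim * embed_dim + embed_dim
    mlp = (embed_dim * hidden + hidden) + (hidden * embed_dim + embed_dim)
    norms = 2 * 2 * embed_dim
    per_block = qkv + proj + mlp + norms
    # Patch embedding: a conv whose kernel size and stride equal the patch size.
    patch_embed = in_chans * patch * patch * embed_dim + embed_dim
    num_tokens = (img_size // patch) ** 2 + 1  # patch tokens + [CLS]
    pos_embed = num_tokens * embed_dim
    cls_token = embed_dim
    final_norm = 2 * embed_dim
    head = embed_dim * num_classes + num_classes if num_classes else 0
    return (depth * per_block + patch_embed + pos_embed
            + cls_token + final_norm + head)


print(f"{vit_param_count():,}")  # encoder only, no classification head
```

Under these assumptions the encoder comes to 21,665,664 parameters, consistent with the "roughly 22 million" quoted above. Each of the 12 blocks holds about 1.77M parameters, so the blocks account for nearly all of the budget. The number of attention heads does not appear in the count because heads only partition the 384-dimensional projections.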

The model was developed by a team of roughly 33 researchers led by Dinkar Juyal, with Andrew H. Beck as senior author. PLUTO represents PathAI's first published foundation model and underpins the company's later, larger PLUTO-4 series (2025), which scales the same multi-scale training recipe up to a 1.1B-parameter variant.

Key Features

Multi-scale by design: A single model serves subcellular instance segmentation, tile-level classification, and slide-level prediction, rather than requiring separate task-specific architectures for each scale.
Lightweight backbone: The ViT-S encoder has only ~22M parameters, enabling fast, cost-effective inference and deployment relative to billion-parameter pathology models.
Flexible patch sizes (FlexiViT): Patch sizes are dynamically selected from {8, 16, 32} during training, letting the model operate across multiple magnifications (0.25, 0.5, 1, and 2 microns per pixel) and adapt to the resolution a task demands.
Composite self-supervised objective: Training combines the DINO and iBOT losses used in DINOv2 with an added Masked Autoencoder (MAE) reconstruction term and a Fourier-domain loss, encouraging representations that capture both global context and fine, high-frequency morphological detail.
Highly diverse pretraining corpus: The 195M training tiles span 11 scanners, four stain groups (H&E FFPE, H&E frozen, IHC with 100+ stains, and special stains), 16 tissue groups, and 28 disease areas, improving robustness to site-to-site variation.
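The FlexiViT paper describes keeping a single patch-embedding kernel and resizing it on the fly to each sampled patch size with a "pseudo-inverse resize" (PI-resize). PI-resize chooses the new kernel so that a resized patch, projected through the resized kernel, yields the same token as the original patch through the original kernel. The NumPy sketch below illustrates that idea for a (D, C, p, p) kernel. It assumes square patches and a half-pixel-centred, edge-clamped bilinear resize operator. The function names are hypothetical, and the source does not say which base kernel size PLUTO stores or whether its implementation matches this one.

```python
import numpy as np


def linear_resize_matrix(src: int, dst: int) -> np.ndarray:
    """1-D linear interpolation matrix (half-pixel centres, edge-clamped), shape (dst, src)."""
    m = np.zeros((dst, src))
    for i in range(dst):
        pos = (i + 0.5) * src / dst - 0.5
        pos = min(max(pos, 0.0), src - 1.0)
        lo = int(np.floor(pos))
        hi = min(lo + 1, src - 1)
        frac = pos - lo
        m[i, lo] += 1.0 - frac
        m[i, hi] += frac
    return m


def resize_operator(src: int, dst: int) -> np.ndarray:
    """Matrix B with vec(resize(x)) = B @ vec(x) for a row-major flattened src x src patch."""
    m = linear_resize_matrix(src, dst)
    return np.kron(m, m)  # separable bilinear resize as one linear map


def resize_patch(x: np.ndarray, dst: int) -> np.ndarray:
    """Bilinearly resize a (C, src, src) patch to (C, dst, dst): M @ x_c @ M.T per channel."""
    m = linear_resize_matrix(x.shape[-1], dst)
    return np.einsum("ij,cjk,lk->cil", m, x, m)


def pi_resize_patch_embed(weight: np.ndarray, new_patch: int) -> np.ndarray:
    """PI-resize a patch-embedding kernel from (D, C, p, p) to (D, C, q, q).

    Solves B.T @ w_hat = w in the minimum-norm least-squares sense, so that
    <resize(x), w_hat> equals <x, w> whenever that system is exactly solvable.
    """
    d, c, p, _ = weight.shape
    b = resize_operator(p, new_patch)         # (q*q, p*p)
    flat = weight.reshape(d * c, p * p)       # one row per (output, input-channel) filter
    new_flat = flat @ np.linalg.pinv(b)       # row form of pinv(B.T) @ w
    return new_flat.reshape(d, c, new_patch, new_patch)


rng = np.random.default_rng(0)
w8 = rng.normal(size=(4, 3, 8, 8))   # toy kernel: 4 embedding dims, 3 input channels
x8 = rng.normal(size=(3, 8, 8))      # one 8x8 RGB patch
w16 = pi_resize_patch_embed(w8, 16)
x16 = resize_patch(x8, 16)
token_8 = np.einsum("dchw,chw->d", w8, x8)
token_16 = np.einsum("dchw,chw->d", w16, x16)
```

For the 2x upsampling shown here (8 to 16, and likewise 16 to 32), the resize operator has full column rank, so `token_8` and `token_16` agree up to floating-point error. Shrinking the patch size (for example 32 to 8) gives only a least-squares fit, so token embeddings are preserved approximately rather than exactly.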

Technical Details

PLUTO uses ViT-S student and teacher encoders (~22M parameters) with a shallower MAE decoder. The self-supervised objective sums four terms: a DINO loss, an iBOT masked-image-modeling loss, an MAE reconstruction loss, and a Fourier-based loss. Multi-scale masking is implemented by tying mask sizes to FlexiViT patch sizes dynamically chosen from {8, 16, 32}, and tiles are sampled at four resolutions (0.25, 0.5, 1, and 2 mpp) so the model learns features at multiple magnifications. Pretraining used 195 million image tiles drawn from 158,852 whole-slide images sourced from more than 50 distinct sites.
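The paragraph above names the four loss terms but not their exact form or weighting. The NumPy sketch below is an illustration under stated assumptions, not the paper's implementation. It uses:

- a DINO/iBOT-style cross-entropy with the usual DINO temperature defaults (0.1 for the student, 0.04 for the teacher);
- mean squared error over masked tokens for the MAE term;
- an L1 distance between FFT amplitude spectra standing in for the Fourier loss;
- unit weights, a 224-pixel tile, and a 50% mask ratio.

"Tying mask sizes to patch sizes" is read here as masking whole tokens, so each masked block is one patch wide. All function names are hypothetical.

```python
import numpy as np


def sample_token_mask(rng, tile_px=224, patch_sizes=(8, 16, 32), mask_ratio=0.5):
    """Pick a patch size, then mask whole tokens so each masked block is patch x patch pixels."""
    patch = int(rng.choice(patch_sizes))
    n_tokens = (tile_px // patch) ** 2
    mask = np.zeros(n_tokens, dtype=bool)
    n_masked = max(1, int(round(mask_ratio * n_tokens)))
    mask[rng.choice(n_tokens, size=n_masked, replace=False)] = True
    return patch, mask


def log_softmax(x, temp):
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def dino_style_loss(student_logits, teacher_logits, center, t_s=0.1, t_t=0.04):
    """Cross-entropy between a centred, sharpened teacher distribution and the student's."""
    p_teacher = np.exp(log_softmax(teacher_logits - center, t_t))
    return float(-(p_teacher * log_softmax(student_logits, t_s)).sum(axis=-1).mean())


def mae_loss(pred_patches, target_patches, mask):
    """Mean squared error over masked tokens only (MAE-style reconstruction)."""
    per_token = ((pred_patches - target_patches) ** 2).mean(axis=-1)
    return float(per_token[mask].mean())


def fourier_loss(pred_img, target_img):
    """L1 distance between 2-D FFT amplitude spectra (an assumed form of the Fourier term)."""
    amp_p = np.abs(np.fft.fft2(pred_img, axes=(-2, -1)))
    amp_t = np.abs(np.fft.fft2(target_img, axes=(-2, -1)))
    return float(np.abs(amp_p - amp_t).mean())


def total_loss(terms, weights=None):
    """Weighted sum of named loss terms; missing weights default to 1.0 (an assumption)."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


# Toy forward pass with random tensors standing in for network outputs.
rng = np.random.default_rng(0)
patch, mask = sample_token_mask(rng)
n, k = mask.size, 64                        # k: hypothetical projection-head width
cls_s, cls_t = rng.normal(size=(2, 1, k))   # student / teacher [CLS] logits
tok_s, tok_t = rng.normal(size=(2, n, k))   # student / teacher patch-token logits
pred, target = rng.normal(size=(2, n, 3 * patch * patch))
img_pred, img_target = rng.normal(size=(2, 3, 224, 224))
center = np.zeros(k)
loss = total_loss({
    "dino": dino_style_loss(cls_s, cls_t, center),
    "ibot": dino_style_loss(tok_s[mask], tok_t[mask], center),
    "mae": mae_loss(pred, target, mask),
    "fourier": fourier_loss(img_pred, img_target),
})
```

Real DINO/iBOT training also keeps separate heads and running centres for the [CLS] and patch objectives. The sketch reuses one centre for brevity. The amplitude-spectrum term penalizes mismatched high-frequency content directly, which is one plausible way to encourage fine morphological detail.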

On benchmark evaluations, PLUTO reports 90.2 F1 (in-domain) and 86.1 F1 (out-of-domain) on NSCLC slide-level classification, 96.6% accuracy on the CRC-100K tile classification benchmark, state-of-the-art gland instance segmentation on GlaS (91.2 Dice, 84.5 IoU), and 67.1 binary panoptic quality (bPQ) on PanNuke nuclei segmentation. The authors achieve these results with more than an order of magnitude fewer parameters than several competing pathology foundation models, and with fewer training tiles than some of them.
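To make the segmentation metrics concrete, the sketch below computes pixel-level Dice and IoU for binary masks, and panoptic quality (PQ) for instance label maps. Treating every nucleus as a single class gives binary PQ (bPQ) for one image. The paper's exact evaluation protocols are not described in this summary, for example object-level GlaS scoring or how PanNuke scores are averaged across images and tissue types. These functions are therefore illustrative and will not reproduce the reported numbers. Function names are hypothetical.

```python
import numpy as np


def dice_and_iou(pred: np.ndarray, target: np.ndarray):
    """Pixel-level Dice and IoU for two binary masks (1.0 when both are empty)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)


def panoptic_quality(true_inst: np.ndarray, pred_inst: np.ndarray) -> float:
    """PQ = DQ * SQ for instance label maps where 0 is background.

    A match requires IoU > 0.5, which guarantees each instance matches at most once.
    """
    true_ids = [i for i in np.unique(true_inst) if i != 0]
    pred_ids = [i for i in np.unique(pred_inst) if i != 0]
    matched_ious, used_pred = [], set()
    for t in true_ids:
        t_mask = true_inst == t
        for p in pred_ids:
            if p in used_pred:
                continue
            p_mask = pred_inst == p
            inter = np.logical_and(t_mask, p_mask).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(t_mask, p_mask).sum()
            if iou > 0.5:
                matched_ious.append(iou)
                used_pred.add(p)
                break
    tp = len(matched_ious)
    if tp == 0:
        return 0.0  # convention for this sketch; benchmarks may treat empty images differently
    fp, fn = len(pred_ids) - tp, len(true_ids) - tp
    dq = tp / (tp + 0.5 * fp + 0.5 * fn)  # detection quality (an F1 score over instances)
    sq = sum(matched_ious) / tp           # segmentation quality (mean IoU of matches)
    return float(dq * sq)


gt = np.zeros((6, 6), dtype=int)
gt[0:2, 0:2] = 1
gt[3:5, 3:5] = 2
pred = gt.copy()
pred[pred == 2] = 0                 # the second nucleus is missed
bpq = panoptic_quality(gt, pred)    # one TP, one FN -> DQ = 2/3, SQ = 1
```

PQ factors into detection quality and segmentation quality, so a low bPQ can come from missed or spurious nuclei, from poorly delineated boundaries, or from both.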

Applications

PLUTO is intended as a general-purpose feature backbone for building downstream pathology models across spatial scales. Computational pathology teams can use a single shared encoder for three kinds of task:

- its embeddings for tile-level cancer detection and tissue classification;
- attention-based aggregation of those embeddings for slide-level diagnosis and biomarker prediction;
- its dense outputs for gland and nuclei instance segmentation.

Its small footprint makes it attractive for high-throughput screening, latency-sensitive workflows, and settings with limited GPU resources, where billion-parameter models are impractical. Because its training data spans multiple stains, scanners, and tissue types, it also suits translational research and biopharma pipelines that work with heterogeneous datasets.
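The Applications paragraph mentions attention-based aggregation of tile embeddings for slide-level prediction but does not name a specific aggregator. One common choice is attention-based multiple-instance learning (ABMIL, Ilse et al., 2018). The NumPy sketch below shows its non-gated forward pass on precomputed tile embeddings. The 384-dimensional embeddings assume ViT-S [CLS] features, and PLUTO's actual output features may differ. The weights are random placeholders for parameters that would normally be trained on slide labels, and all names are hypothetical.

```python
import numpy as np


def attention_mil_pool(tiles, V, w):
    """Non-gated attention MIL pooling.

    tiles: (N, D) tile embeddings from a frozen encoder
    V:     (H, D) attention projection; w: (H,) attention vector
    Returns the slide embedding (D,) and per-tile attention weights (N,).
    """
    scores = np.tanh(tiles @ V.T) @ w   # (N,) unnormalised attention scores
    scores = scores - scores.max()      # stable softmax over tiles
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ tiles, attn


def slide_logits(tiles, V, w, W_cls, b_cls):
    """Pool tiles into one slide vector, then apply a linear classifier."""
    slide_vec, attn = attention_mil_pool(tiles, V, w)
    return slide_vec @ W_cls.T + b_cls, attn


rng = np.random.default_rng(0)
D, H, n_tiles, n_classes = 384, 128, 500, 2     # D = 384 assumes ViT-S embeddings
tiles = rng.normal(size=(n_tiles, D))
V = rng.normal(scale=0.05, size=(H, D))
w = rng.normal(scale=0.05, size=H)
W_cls = rng.normal(scale=0.05, size=(n_classes, D))
b_cls = np.zeros(n_classes)
logits, attn = slide_logits(tiles, V, w, W_cls, b_cls)
```

The slide embedding is an attention-weighted sum of tile embeddings, so the prediction does not depend on tile order. The weights in `attn` can be inspected to see which tiles drove a given prediction.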

Impact

PLUTO offered a notable counterpoint to the prevailing "bigger is better" trend in pathology foundation models. It suggests that a 22M-parameter model with a carefully composed multi-scale, multi-objective training recipe can rival or beat far larger systems. This efficiency argument matters in a field where deployment cost and cross-site robustness often count as much as peak benchmark accuracy. The work established the design foundation for PathAI's subsequent PLUTO-4 models, which extend the same FlexiViT multi-scale approach to larger scales and broader clinical integration.

Key limitations:

- the paper is a preprint without a peer-reviewed venue at the time of writing;
- the pretraining data is proprietary and not publicly released;
- the model, like other pathology foundation models, has not received regulatory clearance and requires independent validation before any clinical diagnostic use.

Citation

PLUTO: Pathology-Universal Transformer

Preprint

Juyal, D., et al. (2024). PLUTO: Pathology-Universal Transformer. arXiv preprint arXiv:2405.07905.

DOI: 10.48550/arXiv.2405.07905

Recent citations

Papers that recently cited this model.

Recommendation Statement for the Validation, Implementation, and Clinical Application of Artificial Intelligence Within a Clinical Laboratory from the Digital Pathology Association
N. Silberman, Anil Parwani, David S McClintock, et al.
AI in Precision Oncology · Jun 2026 · 0 citations

Data- and knowledge-driven multimodal learning in computational pathology: A comprehensive survey
Mingxin Liu, Chengfei Cai, Deping Chen, et al.
EngMedicine · Jun 2026 · 0 citations

Foundation Models in Cancer Pathology: Techniques, Applications, and Future Directions
Bo Zhang, Victor Cui, Tong Wu, et al.
Research · May 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
Eric Zimmermann, E. Vorontsov, Julian Viret, et al.
arXiv.org · Aug 2024 · 202 citations

RudolfV: A Foundation Model by Pathologists for Pathologists
Jonas Dippel, Barbara Feulner, Tobias Winterhoff, et al.
arXiv.org · Jan 2024 · 69 citations

A multimodal whole-slide foundation model for pathology
Tong Ding, Sophia J. Wagner, Andrew H. Song, et al.
Nature Medicine · Nov 2025 · 58 citations

Artificial intelligence in digital pathology — time for a reality check
Arpit Aggarwal, Satvika Bharadwaj, Germán Corredor, et al.
Nature Reviews Clinical Oncology · Feb 2025 · 46 citations

From Classical Machine Learning to Emerging Foundation Models: Review on Multimodal Data Integration for Cancer Research
A. Muneer, M. Waqas, Maliazurina B. Saad, et al.
Artificial Intelligence Review · Jul 2025 · 19 citations

Citations

Total citations: 31

Influential citations: 1

References: 57

Fields of citing research

Medicine: 94%
Computer Science: 91%
Biology: 16%
Engineering: 9%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Overall score: 6 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 5

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper · Official Website
