bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Pathology

PLUTO

PathAI

A lightweight 22M-parameter ViT-S pathology foundation model pre-trained on 195M tiles, handling tasks from subcellular segmentation to slide-level prediction.

Released: May 2024
Parameters: 22 Million

PLUTO (Pathology-Universal Transformer) is a lightweight self-supervised foundation model for computational pathology, developed by PathAI and posted to arXiv in May 2024. It was designed to address a practical tension in digital pathology: the most capable foundation models are large and trained on enormous private corpora, yet real-world pathology workflows require predictions across radically different spatial scales — from individual nuclei to entire whole-slide images (WSIs). PLUTO aims to provide a single, compact feature extractor that performs well across all of these scales rather than excelling at only one.

A defining characteristic of PLUTO is its efficiency. Built on a Vision Transformer Small (ViT-S) backbone with roughly 22 million parameters, it is far smaller than contemporaries such as the Virchow family (632M to 1.9B) or H-optimus-0 (1.1B). Despite this, the authors report that PLUTO matches or outperforms task-specific baselines and larger pathology foundation models on a diverse set of internal and external benchmarks. They present this as evidence that architectural design and training strategy can offset a smaller parameter count and dataset scale.
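As a rough check on the "roughly 22 million" figure, the sketch below counts the parameters of a plain ViT-S encoder. It assumes the standard ViT-S configuration (12 blocks, width 384, MLP ratio 4, patch size 16, 224-pixel input); PLUTO's exact layer layout and FlexiViT patch embedding may differ, so treat the result as an estimate rather than the published count.

```python
def vit_param_count(dim=384, depth=12, mlp_ratio=4, patch=16, channels=3, image=224):
    """Count learnable parameters of a plain ViT encoder (no task head).

    Defaults are the standard ViT-S settings, assumed here for illustration.
    """
    hidden = dim * mlp_ratio
    # Attention: fused QKV projection plus output projection, each with a bias.
    attention = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    # MLP: two linear layers with biases.
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * dim
    block = attention + mlp + norms

    patch_embed = patch * patch * channels * dim + dim
    tokens = (image // patch) ** 2
    cls_and_pos = dim + (tokens + 1) * dim  # class token plus position embeddings
    final_norm = 2 * dim
    return depth * block + patch_embed + cls_and_pos + final_norm


vit_s = vit_param_count()
print(f"ViT-S encoder: {vit_s / 1e6:.1f}M parameters")
for name, params in [("Virchow (632M)", 632e6), ("H-optimus-0 (1.1B)", 1.1e9)]:
    print(f"{name} is about {params / vit_s:.0f}x larger")
```

Under these assumptions the encoder comes to about 21.7M parameters, consistent with the rounded 22M figure, and the quoted larger models are tens of times bigger.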

The model was developed by a team of roughly 33 researchers led by Dinkar Juyal, with Andrew H. Beck as senior author. PLUTO represents PathAI's first published foundation model and underpins the company's later, larger PLUTO-4 series (2025), which scales the same multi-scale training recipe up to a 1.1B-parameter variant.

#Key Features

  • Multi-scale by design: A single model serves subcellular instance segmentation, tile-level classification, and slide-level prediction, rather than requiring separate task-specific architectures for each scale.
  • Lightweight backbone: The ViT-S encoder has only ~22M parameters, enabling fast, cost-effective inference and deployment relative to billion-parameter pathology models.
  • Flexible patch sizes (FlexiViT): Patch sizes are dynamically selected from {8, 16, 32} during training, letting the model operate across multiple magnifications (0.25, 0.5, 1, and 2 microns per pixel) and adapt to the resolution a task demands.
  • Composite self-supervised objective: Training combines DINOv2 and iBOT losses with an added Masked Autoencoder (MAE) reconstruction term and a Fourier-domain loss, encouraging representations that capture both global context and fine high-frequency morphological detail.
  • Highly diverse pretraining corpus: The 195M training tiles span 11 scanners, four stain groups (H&E FFPE, H&E frozen, IHC with 100+ stains, and special stains), 16 tissue groups, and 28 disease areas, improving robustness to site-to-site variation.
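To make the interaction between patch size and magnification concrete, the following sketch computes how many tokens a tile produces at each FlexiViT patch size and how much tissue a tile or a single patch covers at each resolution. The 224-pixel tile edge is an assumption for illustration; the text above does not state the tile size.

```python
TILE_PX = 224                      # assumed tile edge in pixels (not from the source)
PATCH_SIZES = (8, 16, 32)          # FlexiViT patch sizes listed above
MPP_LEVELS = (0.25, 0.5, 1.0, 2.0)  # microns per pixel listed above


def tokens_per_tile(tile_px, patch_px):
    """Number of non-overlapping patches (tokens) in a square tile."""
    if tile_px % patch_px:
        raise ValueError("tile edge must be divisible by the patch size")
    return (tile_px // patch_px) ** 2


def extent_um(pixels, mpp):
    """Physical edge length in microns covered by `pixels` at a given resolution."""
    return pixels * mpp


for patch in PATCH_SIZES:
    print(f"patch {patch:>2}px -> {tokens_per_tile(TILE_PX, patch)} tokens per tile")

for mpp in MPP_LEVELS:
    print(f"{mpp} mpp -> tile covers {extent_um(TILE_PX, mpp):g} um per side")

finest = extent_um(min(PATCH_SIZES), min(MPP_LEVELS))
coarsest = extent_um(max(PATCH_SIZES), max(MPP_LEVELS))
print(f"single patch spans {finest:g} um (finest) to {coarsest:g} um (coarsest)")
```

With these assumed settings a smaller patch size means more tokens and finer spatial detail at higher compute cost, while a larger patch size at a coarse resolution gives each token a much wider tissue context.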

#Technical Details

PLUTO uses ViT-S student and teacher encoders (~22M parameters) with a shallower MAE decoder. The self-supervised objective sums four terms: a DINO loss, an iBOT masked-image-modeling loss, an MAE reconstruction loss, and a Fourier-based loss. Multi-scale masking is implemented by tying mask sizes to FlexiViT patch sizes dynamically chosen from {8, 16, 32}, and tiles are sampled at four resolutions (0.25, 0.5, 1, and 2 microns per pixel, mpp) so the model learns features at multiple magnifications. Pretraining used 195 million image tiles drawn from 158,852 whole-slide images sourced from more than 50 distinct sites, an average of roughly 1,230 tiles per slide.
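The four-term objective can be sketched as a weighted sum. The example below is a toy illustration, not the authors' implementation: the unit weights, the 1-D signals, and the specific form of the Fourier term (mean absolute difference between DFT magnitudes) are all assumptions, and the DINO and iBOT terms are passed in as placeholder numbers.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitude of each frequency bin of a naive 1-D discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def fourier_loss(reconstruction, target):
    """Assumed form: mean absolute difference between frequency magnitudes."""
    rec = dft_magnitudes(reconstruction)
    tgt = dft_magnitudes(target)
    return sum(abs(r - t) for r, t in zip(rec, tgt)) / len(tgt)


def mae_loss(reconstruction, target, masked):
    """Mean squared error computed over masked positions only."""
    errors = [(r - t) ** 2 for r, t, m in zip(reconstruction, target, masked) if m]
    return sum(errors) / len(errors)


def composite_loss(dino, ibot, mae, fourier, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; unit weights are an assumption."""
    return sum(w * term for w, term in zip(weights, (dino, ibot, mae, fourier)))


target = [0.0, 1.0] * 4   # high-frequency pattern, e.g. fine texture
blurry = [0.5] * 8        # reconstruction that keeps only the mean intensity
masked = [True] * 8

total = composite_loss(
    dino=1.2,             # placeholder value
    ibot=0.9,             # placeholder value
    mae=mae_loss(blurry, target, masked),
    fourier=fourier_loss(blurry, target),
)
print(f"fourier term: {fourier_loss(blurry, target):.3f}, total: {total:.3f}")
```

The blurry reconstruction has the right mean but no energy at the highest frequency, which is exactly what a frequency-domain penalty exposes. That is the intuition behind adding a Fourier term alongside pixel-space reconstruction.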

On benchmark evaluations, PLUTO reports 90.2 F1 (in-domain) and 86.1 F1 (out-of-domain) on NSCLC slide-level classification, 96.6% accuracy on the CRC-100K tile classification benchmark, state-of-the-art gland instance segmentation on GlaS (91.2 Dice, 84.5 IoU), and 67.1 binary panoptic quality (bPQ) on PanNuke nuclei segmentation. The authors report these results while using far fewer parameters and training tiles than several competing pathology foundation models.
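For readers less familiar with the segmentation metrics quoted above, here is a minimal sketch of Dice and IoU on a single pair of binary masks. The masks are made-up toy data, and benchmark protocols typically average per-image or per-object scores, so this is not a reproduction of the GlaS evaluation.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and intersection-over-union for flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks empty: treat as a perfect match.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


pred = [1, 1, 1, 0, 0, 0]   # toy predicted mask
truth = [0, 1, 1, 1, 0, 0]  # toy ground-truth mask
dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")

# For a single mask pair the two metrics are tied by Dice = 2 * IoU / (1 + IoU).
assert abs(dice - 2 * iou / (1 + iou)) < 1e-12
```

The identity in the last line holds for one mask pair but not for averages over many images, which is why a reported pair such as 91.2 Dice and 84.5 IoU need not satisfy it exactly.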

#Applications

PLUTO is intended as a general-purpose feature backbone for building downstream pathology models across spatial scales. Computational pathology teams can use its embeddings for tile-level cancer detection and tissue classification, attention-based aggregation for slide-level diagnosis and biomarker prediction, and dense outputs for gland and nuclei instance segmentation — all from one frozen encoder. Its small footprint makes it attractive for high-throughput screening, latency-sensitive workflows, and settings with limited GPU resources, where billion-parameter models are impractical. The breadth of its training data (multiple stains, scanners, and tissue types) supports use in translational research and biopharma pipelines that span heterogeneous datasets.
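The slide-level use case can be sketched as attention-based pooling over frozen tile embeddings. Everything below is hypothetical: the random embeddings stand in for encoder outputs, the 384-dimensional width is the standard ViT-S embedding size (assumed to apply here), and the projection `V` and scoring vector `w` would normally be learned rather than sampled.

```python
import math
import random


def attention_pool(embeddings, V, w):
    """Pool tile embeddings into one slide embedding with softmax attention.

    score_i = w . tanh(V h_i); weights = softmax(scores); slide = sum_i a_i h_i
    """
    scores = []
    for h in embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wk * uk for wk, uk in zip(w, hidden)))
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    slide = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return slide, weights


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_TILES = 384, 8, 5

# Stand-ins for frozen-encoder tile embeddings from one slide.
tiles = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_TILES)]
# Hypothetical attention parameters; in practice these are trained per task.
V = [[rng.gauss(0, 0.05) for _ in range(EMBED_DIM)] for _ in range(HIDDEN_DIM)]
w = [rng.gauss(0, 1) for _ in range(HIDDEN_DIM)]

slide_embedding, attention = attention_pool(tiles, V, w)
print("attention weights:", [round(a, 3) for a in attention])
print("slide embedding size:", len(slide_embedding))
```

Because the encoder stays frozen, only the small attention module and a downstream classifier need training, which is part of what makes a compact backbone practical for slide-level work.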

#Impact

PLUTO offered a counterpoint to the prevailing "bigger is better" trend in pathology foundation models: its reported results suggest that a 22M-parameter model with a carefully composed multi-scale, multi-objective training recipe can rival or beat far larger systems on the benchmarks evaluated. This efficiency argument matters in a field where deployment cost and cross-site robustness often count as much as peak benchmark accuracy. The work established the design foundation for PathAI's subsequent PLUTO-4 models, which extend the same FlexiViT multi-scale approach to larger scales and broader clinical integration. Key limitations remain. The paper is a preprint without a peer-reviewed venue at time of writing, and the pretraining data is proprietary and not publicly released. Like other pathology foundation models, PLUTO has not received regulatory clearance and requires independent validation before any clinical diagnostic use.

Citation

PLUTO: Pathology-Universal Transformer

Preprint

Juyal, D., et al. (2024). PLUTO: Pathology-Universal Transformer. arXiv preprint.

DOI: 10.48550/arXiv.2405.07905

Citations

Total Citations: 29

Openness

Unclassified
Restrictive license on core components

Tags

digital_pathology, foundation_model, histology, instance_segmentation, self_supervised, slide_level_prediction, tile_classification, vision_transformer, whole_slide_imaging

Resources

Research Paper, Official Website