bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Pathology foundation models

GenBio-PathFM

genbio.ai

1.1B-parameter histopathology foundation model trained on public data with a JEDI (JEPA+DINO) dual-stage strategy, reaching state-of-the-art results on THUNDER, HEST, and PathoROB.

Released: March 2026
Parameters: 1.1 billion

Histopathology foundation models learn general-purpose representations from large collections of tissue images and have become the backbone of computational pathology, powering tasks from tumor subtyping to biomarker prediction. The leading models, however, are typically scaled up using enormous and often proprietary slide collections, which makes them expensive to train and difficult to reproduce. A recurring question is whether comparable performance can be reached with far less data and with a fully public training set.

GenBio-PathFM, released by GenBio AI in March 2026, is a 1.1-billion-parameter histopathology foundation model designed to answer that question. According to the authors, it is the strongest open-weight histopathology model to date and the only state-of-the-art model trained exclusively on publicly available data. Its efficiency rests on two innovations: an automated data-curation pipeline that prioritizes morphological diversity rather than raw volume, and a dual-stage self-supervised learning strategy the authors call JEDI, which combines a joint-embedding predictive (JEPA) objective with DINO-style self-distillation.
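This page does not spell out how JEDI sequences or weights its two stages, so the sketch below only illustrates the two generic objectives it names, in plain Python on toy vectors. It shows a DINO-style self-distillation loss (a centred, sharpened teacher distribution supervising the student), a JEPA-style latent prediction loss (predicted versus target embeddings for masked patches), and the exponential-moving-average update both families use for their teacher or target encoder. All function names, temperatures, and momentum values here are illustrative assumptions, not GenBio-PathFM's actual settings.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Temperature-scaled log-softmax, stabilised by subtracting the max."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def dino_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
              center: Sequence[float], student_temp: float = 0.1,
              teacher_temp: float = 0.04) -> float:
    """DINO-style self-distillation: cross-entropy H(teacher, student).

    Teacher logits are centred and sharpened with a lower temperature; in a
    real training loop the teacher branch receives no gradient.
    (Illustrative temperatures, assumed, not taken from GenBio-PathFM.)
    """
    centred = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centred, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


def jepa_loss(predicted: Sequence[Vector], targets: Sequence[Vector]) -> float:
    """JEPA-style objective: mean squared error between the predictor's outputs
    and the target encoder's embeddings for masked patches (latent space)."""
    total, count = 0.0, 0
    for pred_vec, target_vec in zip(predicted, targets):
        for p, t in zip(pred_vec, target_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def ema(old: Sequence[float], new: Sequence[float], momentum: float) -> Vector:
    """Exponential moving average, usable for teacher/target weights
    (e.g. momentum 0.996) and for the DINO centre (e.g. momentum 0.9)."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


if __name__ == "__main__":
    zero_center = [0.0, 0.0, 0.0]
    # A student that agrees with the teacher incurs a lower distillation loss.
    print(dino_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], zero_center))  # ≈ 0
    print(dino_loss([0.0, 5.0, 0.0], [5.0, 0.0, 0.0], zero_center))  # ≈ 50
    print(jepa_loss([[0.0, 0.0]], [[3.0, 4.0]]))  # → 12.5
```

The two losses operate on different things: DINO compares probability distributions over prototype logits, while JEPA regresses embeddings directly, which is one reason a combined strategy might use them in separate stages.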

By matching top proprietary models while using a fraction of their training data and relying only on public sources, GenBio-PathFM positions itself as a reproducible, openly available backbone for the computational pathology community.

#Key Features

  • JEDI dual-stage learning: A two-stage self-supervised strategy combines a JEPA predictive objective with DINO self-distillation to learn robust tissue representations.
  • Diversity-driven curation: An automated pipeline selects training tiles to maximize morphological diversity, improving data efficiency over volume-only scaling.
  • Public-data-only training: The model is trained exclusively on publicly available histopathology data, making the full data provenance transparent and reproducible.
  • Open weights: Pretrained weights are released on Hugging Face under the GenBio AI Community License, with reference code on GitHub.
  • State-of-the-art benchmarks: The model reports leading or tied-for-leading accuracy and robustness across THUNDER, HEST, and PathoROB, and ranks first on more HEST subtasks than any other model.
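The curation pipeline itself is not described on this page. As one generic way to favour morphological diversity over raw volume, the sketch below swaps in greedy k-center (farthest-point) selection over precomputed tile embeddings: each step adds the tile farthest from everything already chosen, so near-duplicate tiles are skipped. `select_diverse_tiles` is a hypothetical helper, and treating embedding distance as a proxy for morphology is an assumption, not the authors' method.

```python
import math
from typing import List, Sequence


def select_diverse_tiles(embeddings: Sequence[Sequence[float]], k: int,
                         seed_index: int = 0) -> List[int]:
    """Greedy k-center (farthest-point) selection over tile embeddings.

    Hypothetical helper: returns indices of up to k tiles, each chosen to be as
    far as possible (Euclidean distance) from every tile already selected.
    """
    n = len(embeddings)
    if n == 0 or k <= 0:
        return []
    chosen = [seed_index]
    # Distance from every tile to its nearest already-chosen tile.
    nearest = [math.dist(e, embeddings[seed_index]) for e in embeddings]
    while len(chosen) < min(k, n):
        candidates = [i for i in range(n) if i not in chosen]
        # Pick the candidate worst covered by the current selection.
        next_index = max(candidates, key=lambda i: nearest[i])
        chosen.append(next_index)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], math.dist(e, embeddings[next_index]))
    return chosen


if __name__ == "__main__":
    tiles = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
    print(select_diverse_tiles(tiles, 3))  # → [0, 4, 2]; near-duplicate tile 1 is skipped
```

This greedy variant costs O(n·k) distance evaluations, so at corpus scale it would typically run on clustered or subsampled embeddings rather than every tile.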

#Technical Details

GenBio-PathFM is a 1.1B-parameter vision-transformer-style image encoder. It accepts 224x224 RGB tiles and outputs a 4608-dimensional feature vector, with optional access to 196 patch tokens for dense tasks. Training uses the JEDI strategy, a dual-stage combination of JEPA and DINO objectives, on a curated subset of public histopathology images chosen for morphological diversity. On the THUNDER benchmark the model ties with H-Optimus-1, with both ranking first on 3 of 12 subtasks; on HEST it is the top performer on more subtasks than any competing model; and the authors also report strong robustness on PathoROB. According to the authors, these results are achieved with substantially less training data than comparable leading models. The preprint is released under CC BY-NC-ND, and the weights under the GenBio AI Community License.
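The patch-token count follows from simple arithmetic if one assumes the usual 16x16-pixel ViT patches, which is consistent with 196 = (224 / 16)²; the page does not state the patch size, so `PATCH_SIZE` below is an assumption. Because the encoder consumes fixed-size tiles rather than whole slides, the sketch also enumerates non-overlapping 224x224 tile origins over a region; `tile_origins` is a hypothetical helper that drops partial edge tiles and performs no tissue detection.

```python
from typing import List, Tuple

TILE_SIZE = 224   # input tile edge in pixels, as stated for GenBio-PathFM
PATCH_SIZE = 16   # ASSUMPTION: not stated; consistent with 196 = (224 // 16) ** 2


def num_patch_tokens(tile_size: int = TILE_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch tokens a ViT produces for a square tile."""
    if tile_size % patch_size != 0:
        raise ValueError("tile size must be a multiple of the patch size")
    side = tile_size // patch_size
    return side * side


def tile_origins(width: int, height: int, tile_size: int = TILE_SIZE) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of every full, non-overlapping tile in a region.

    Hypothetical helper: partial tiles at the right/bottom edges are dropped,
    and no tissue detection is performed.
    """
    return [(x, y)
            for y in range(0, height - tile_size + 1, tile_size)
            for x in range(0, width - tile_size + 1, tile_size)]


if __name__ == "__main__":
    print(num_patch_tokens())            # → 196
    print(len(tile_origins(1000, 500)))  # → 8 (4 columns x 2 rows)
```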

#Applications

GenBio-PathFM is intended as a feature extractor for computational pathology workflows, including cancer subtyping, tissue and cell classification, gene-expression prediction from histology, and other slide- or tile-level analyses. Because its weights are openly available and it is trained on public data, it is well suited to academic groups and clinically oriented researchers who need a reproducible backbone and transparent data provenance. Downstream users typically freeze the encoder and train lightweight task heads on its embeddings.
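A minimal sketch of the frozen-encoder workflow, assuming embeddings have already been extracted: a binary logistic-regression head (a linear probe) trained by full-batch gradient descent in plain Python. The 2-D toy vectors stand in for the 4608-dimensional features, and `train_linear_probe` and `predict` are hypothetical helpers; in practice a library classifier would usually replace them.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(features: Sequence[Sequence[float]], labels: Sequence[int],
                       lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a binary logistic-regression head on frozen embeddings with
    full-batch gradient descent; the encoder itself is never updated."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for frozen 4608-D tile embeddings.
    X = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
    y = [0, 0, 1, 1]
    w, b = train_linear_probe(X, y)
    print([predict(w, b, x) for x in X])  # → [0, 0, 1, 1]
```

Keeping the head this small is the point of the frozen-encoder pattern: only `dim + 1` parameters are trained per task, so many downstream tasks can share one set of extracted embeddings.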

#Impact

By reaching state-of-the-art results on major histopathology benchmarks using only public data and a fraction of the usual training volume, GenBio-PathFM challenges the assumption that competitive pathology foundation models require massive proprietary corpora. Its open weights and reproducible training set make it a practical option for the community and a useful baseline for future work. Caveats remain: it is a recent preprint awaiting peer review, the preprint (CC BY-NC-ND) and the weights (GenBio AI Community License) carry non-commercial-style licensing, and its benchmark margins are narrow (on THUNDER it only ties H-Optimus-1) and may shift as other models evolve.

Tags

representation_learning, tissue_classification, vision_transformer, foundation_model, self_supervised, histology