GenBio-PathFM

Histopathology foundation model with 1.1B parameters, trained entirely on public data using JEDI, a dual-stage strategy combining JEPA and DINO.

Released: March 2026

Parameters: 1.1 Billion

Histopathology foundation models learn general-purpose representations from large collections of tissue images and have become the backbone of computational pathology, powering tasks from tumor subtyping to biomarker prediction. The leading models, however, are typically scaled up using enormous and often proprietary slide collections, which makes them expensive to train and difficult to reproduce. A recurring question is whether comparable performance can be reached with far less data and with a fully public training set.

GenBio-PathFM, released by GenBio AI in March 2026, is a 1.1-billion-parameter histopathology foundation model designed to answer that question. According to the authors, it is the strongest open-weight histopathology model to date and the only state-of-the-art model trained exclusively on publicly available data. Its efficiency rests on two innovations: an automated data-curation pipeline that prioritizes morphological diversity over raw volume, and a dual-stage self-supervised learning strategy the authors call JEDI, which combines a joint-embedding predictive (JEPA) objective with DINO-style self-distillation.

By matching top proprietary models while using a fraction of their training data and relying only on public sources, GenBio-PathFM positions itself as a reproducible, openly available backbone for the computational pathology community.

Key Features

JEDI dual-stage learning: A two-stage self-supervised strategy combines a JEPA predictive objective with DINO self-distillation to learn robust tissue representations.
Diversity-driven curation: An automated pipeline selects training tiles to maximize morphological diversity, improving data efficiency over volume-only scaling.
Public-data-only training: The model is trained exclusively on publicly available histopathology data, making the full data provenance transparent and reproducible.
Open weights: Pretrained weights are released on HuggingFace under the GenBio AI Community License, with reference code on GitHub.
State-of-the-art benchmarks: Across THUNDER, HEST, and PathoROB the model achieves leading accuracy and robustness, ranking top on more HEST subtasks than any other model.
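
The idea behind diversity-driven curation can be illustrated with a generic heuristic. The sketch below uses greedy farthest-point selection over tile feature vectors; it is not the authors' pipeline, whose details are not given here, and the function name `farthest_point_selection`, the 2-D toy features, and the cluster layout are all assumptions made for illustration.

```python
import math
import random


def farthest_point_selection(features, k, start=0):
    """Greedily pick k indices so that each new pick is as far as possible
    from everything already selected (a k-center style heuristic).
    Hypothetical helper, not part of any GenBio-PathFM release."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and len(features)")
    selected = [start]
    # Distance from every item to its nearest already-selected item.
    nearest = [math.dist(f, features[start]) for f in features]
    while len(selected) < k:
        # Choose the item that the current selection covers worst.
        idx = max(range(len(features)), key=nearest.__getitem__)
        selected.append(idx)
        for i, f in enumerate(features):
            nearest[i] = min(nearest[i], math.dist(f, features[idx]))
    return selected


# Toy data: 50 near-duplicate tiles in one cluster plus 3 rare morphologies.
rng = random.Random(0)
common = [(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(50)]
rare = [(5.0, 5.0), (-5.0, 5.0), (5.0, -5.0)]
tiles = common + rare

picked = farthest_point_selection(tiles, k=4)
print(sorted(picked))
```

With a budget of four tiles, the selection keeps the starting tile and the three rare examples instead of four near-duplicates, which is the sense in which diversity can matter more than volume.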

Technical Details

GenBio-PathFM is a 1.1B-parameter vision-transformer-style image encoder. It accepts 224x224 RGB tiles and outputs a 4608-dimensional feature vector, with optional access to 196 patch tokens for dense tasks. Training uses the JEDI strategy, a dual-stage combination of JEPA and DINO objectives, on a curated subset of public histopathology images chosen for morphological diversity.

On the THUNDER benchmark the model ties with H-Optimus-1: each ranks first on 3 of 12 subtasks. On HEST it is the top performer on more subtasks than any competing model, and the authors also report strong robustness on PathoROB. According to the authors, these results are achieved with substantially less training data than comparable leading models.

The preprint is released under CC-BY-NC-ND and the weights under the GenBio AI Community License.
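
The 196 patch tokens quoted above are consistent with a standard vision-transformer tiling of a 224x224 input. The arithmetic below is a small check under the assumption of square, non-overlapping patches; the patch size itself is not stated in this description, so the value of 16 is inferred rather than confirmed.

```python
def patch_token_count(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("patch_size must divide image_size evenly")
    per_side = image_size // patch_size
    return per_side * per_side


IMAGE_SIZE = 224      # tile size given in the description
REPORTED_TOKENS = 196  # patch-token count given in the description

# Search every patch size that divides the tile evenly and keep the ones
# that reproduce the reported token count.
consistent_patch_sizes = [
    p for p in range(1, IMAGE_SIZE + 1)
    if IMAGE_SIZE % p == 0 and patch_token_count(IMAGE_SIZE, p) == REPORTED_TOKENS
]
print(consistent_patch_sizes)
```

Under these assumptions only a 16-pixel patch gives a 14x14 grid, that is, 196 tokens.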

Applications

GenBio-PathFM is intended as a feature extractor for computational pathology workflows, including cancer subtyping, tissue and cell classification, gene-expression prediction from histology, and other slide- or tile-level analyses. Because its weights are openly available and it is trained on public data, it is well suited to academic groups and clinically oriented researchers who need a reproducible backbone and transparent data provenance. Downstream users typically freeze the encoder and train lightweight task heads on its embeddings.
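
The frozen-encoder workflow described above can be sketched without the model itself. In the example below, `fake_frozen_encoder` is a hypothetical stand-in that returns synthetic 8-dimensional embeddings (the real model outputs 4608 dimensions, and its loading API is not shown here), and the task head is a plain logistic regression trained by gradient descent. Only the head's weights are updated; the embeddings are treated as fixed inputs.

```python
import math
import random


def fake_frozen_encoder(label, rng, dim=8):
    """Hypothetical stand-in for a frozen encoder: returns a synthetic
    embedding whose mean depends on the class label."""
    center = 1.0 if label == 1 else -1.0
    return [rng.gauss(center, 0.5) for _ in range(dim)]


def train_linear_head(embeddings, labels, lr=0.1, epochs=200):
    """Fit logistic-regression weights on fixed embeddings
    using full-batch gradient descent."""
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - target
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 when the linear score is positive, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


rng = random.Random(0)
train_y = [i % 2 for i in range(200)]
train_x = [fake_frozen_encoder(y, rng) for y in train_y]
test_y = [i % 2 for i in range(100)]
test_x = [fake_frozen_encoder(y, rng) for y in test_y]

w, b = train_linear_head(train_x, train_y)
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the embeddings would be computed once from real tiles and cached, so that several task heads can be trained cheaply on the same features.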

Impact

By reaching state-of-the-art results on major histopathology benchmarks using only public data and a fraction of the usual training volume, GenBio-PathFM challenges the assumption that competitive pathology foundation models require massive proprietary corpora. Its open weights and reproducible training set make it a practical option for the community and a useful baseline for future work. Caveats remain: it is a recent preprint awaiting peer review, the model and weights carry non-commercial-style licensing, and benchmark leads such as the THUNDER tie are narrow and may shift as other models evolve.

Citation

GenBio-PathFM: A State-of-the-Art Foundation Model for Histopathology

Kapse, S., et al. (2026) GenBio-PathFM: A State-of-the-Art Foundation Model for Histopathology. bioRxiv.

DOI: 10.64898/2026.03.17.712534

Recent citations

Papers that recently cited this model.

CellDX AI Autopilot: Agent-Guided Training and Deployment of Pathology Classifiers. Alexey Pchelnikov, A. Pchelnikov. May 2026. Citations: 0
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction. Weiming Chen, Xitong Ling, Zhenyang Cai, et al. May 2026. Citations: 0


Citations

Total Citations: 2
Influential: 0
References: 31

GitHub

Stars: 37
Forks: 2
Open Issues: 1
Contributors: 2
Last Push: 3 months ago
Language: Python

HuggingFace

Downloads: 501
Likes: 16
Last Modified: 3 months ago

Fields of citing research

Share of papers citing this model.

Computer Science: 100%
Medicine: 100%

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Openness score: 21 (Closed)
Usability (can I run it?): 26
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified (restrictive license on core components)

