A BEiT-based self-supervised foundation model pretrained on 11M+ histopathology image tiles for cancer diagnosis, subtyping, and survival prediction.
BEPH (BEiT-based Pre-training on Histopathological images) is a self-supervised foundation model for computational pathology that learns transferable visual representations from hematoxylin and eosin (H&E) stained tissue images. Developed by the Yu Lab at Shanghai Jiao Tong University and published in Nature Communications in March 2025, BEPH addresses a central bottleneck in digital pathology: most clinically relevant tasks have only modest amounts of labeled data, making it difficult to train accurate models from scratch. By pretraining on millions of unlabeled image tiles, BEPH provides a general-purpose encoder that can be efficiently fine-tuned for a wide range of cancer-related tasks.
The model is notable for adapting the BEiT v2 masked image modeling paradigm, originally developed for natural images, to gigapixel histopathology. Rather than predicting raw pixels, BEiT-style pretraining reconstructs discrete visual tokens for masked patches, which encourages the encoder to learn high-level semantic structure relevant to tissue morphology. BEPH demonstrates that this approach yields representations that transfer well across cancer types and across the patch, whole-slide, and prognostic levels of analysis.
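The masked-token objective described above can be illustrated with a minimal NumPy sketch. This is not the BEPH training code: the function name `masked_token_loss`, the 14×14 patch grid, the vocabulary size, and the mask ratio are all assumptions chosen for illustration, and random numbers stand in for the encoder's predictions and the tokenizer's target ids.

```python
import numpy as np

def masked_token_loss(logits, target_tokens, mask):
    """Cross-entropy over masked patches only (hypothetical helper).

    logits:        (num_patches, vocab_size) predictions for every patch
    target_tokens: (num_patches,) integer token ids from a visual tokenizer
    mask:          (num_patches,) boolean, True where the patch was masked
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Log-probability assigned to the correct token of each patch.
    picked = log_probs[np.arange(len(target_tokens)), target_tokens]
    # Average the negative log-likelihood over masked positions only;
    # visible patches contribute nothing to the loss.
    return float(-picked[mask].mean())

rng = np.random.default_rng(0)
num_patches, vocab_size = 196, 8192   # assumed: 14x14 patch grid, illustrative vocabulary
logits = rng.normal(size=(num_patches, vocab_size))             # stand-in for encoder output
target_tokens = rng.integers(0, vocab_size, size=num_patches)   # stand-in for tokenizer ids
mask = rng.random(num_patches) < 0.4                            # illustrative mask ratio
loss = masked_token_loss(logits, target_tokens, mask)
print(f"masked-token loss: {loss:.3f}")
```

The point of the sketch is the target: the loss compares predicted distributions against discrete token ids rather than against pixel values, and only masked positions are scored.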
BEPH fits alongside other pathology foundation models such as UNI, CONCH, and Virchow, but distinguishes itself with a deliberately lightweight design (an ~86M-parameter ViT-Base backbone) that lowers the barrier to local deployment and fine-tuning on commodity hardware.
BEPH uses a ViT-Base backbone (~86M parameters) pretrained with the BEiT v2 self-supervised objective, in which a VQ-KD visual tokenizer supplies discrete target tokens for masked patches. Pretraining data consist of 224×224 tiles sampled at 40× magnification with at least 75% tissue content, drawn from TCGA diagnostic slides. For downstream WSI tasks, tile features are aggregated using the CLAM attention-based multiple-instance-learning framework. Across reported benchmarks, BEPH reaches AUCs of 0.994 for renal cell carcinoma subtyping, 0.970 for non-small cell lung cancer subtyping, and 0.946 for breast cancer subtyping, and achieves concordance indices in the 0.59–0.71 range for survival prediction across six TCGA cohorts (BRCA, CRC, CCRCC, PRCC, LUAD, STAD). Code and pretrained weights are released under the GPL-3.0 license.
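To make the slide-level aggregation step concrete, the following is a simplified sketch of attention-based multiple-instance pooling in NumPy. It is not the CLAM implementation: it uses a plain, non-gated attention scorer and omits CLAM's instance-level clustering. The function name `attention_pool`, the hidden size, and the randomly initialised parameters `V` and `w` (which would be learned in practice) are assumptions; 768 is the standard ViT-Base embedding width.

```python
import numpy as np

def attention_pool(tile_features, V, w):
    """Aggregate tile embeddings into one slide embedding (hypothetical helper).

    tile_features: (num_tiles, dim) embeddings produced by the tile encoder
    V:             (dim, hidden) projection, learned in practice
    w:             (hidden,) scoring vector, learned in practice
    """
    scores = np.tanh(tile_features @ V) @ w          # one scalar score per tile
    scores = scores - scores.max()                   # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()  # attention weights sum to 1
    slide_embedding = weights @ tile_features        # attention-weighted average of tiles
    return slide_embedding, weights

rng = np.random.default_rng(0)
num_tiles, dim, hidden = 500, 768, 128               # tile count and hidden size are illustrative
tile_features = rng.normal(size=(num_tiles, dim))    # stand-in for encoder features
V = rng.normal(scale=0.05, size=(dim, hidden))
w = rng.normal(scale=0.05, size=hidden)
slide_embedding, weights = attention_pool(tile_features, V, w)
print(slide_embedding.shape, float(weights.max()))
```

A slide-level classifier or survival head would then operate on `slide_embedding`, and the per-tile `weights` indicate which regions contributed most to the prediction.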
BEPH is designed for computational pathology researchers and clinical AI developers who need a strong starting point for cancer-image analysis. Typical use cases include detecting malignancy in tissue patches, classifying cancer subtypes from whole-slide images, and stratifying patients by predicted survival risk to support prognosis. Because the backbone is lightweight and the weights are openly available, smaller labs can fine-tune BEPH on their own annotated datasets without the compute demands of larger pathology foundation models, making it suitable for both methods research and translational pipeline development.
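For the survival-stratification use case, predicted risk scores are typically evaluated with the concordance index. The sketch below computes Harrell's C-index in plain Python for a tiny, made-up cohort. The function name and the toy data are assumptions for illustration, tied event times are simply skipped, and this is not the evaluation code used by the BEPH authors.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (hypothetical helper).

    times:  observed follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    risks:  predicted risk score; higher should mean shorter survival
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:      # patient i had the event before j's last follow-up
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0    # higher risk and earlier event: correct ordering
                elif risks[i] == risks[j]:
                    concordant += 0.5    # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort (invented values): five patients, two of them censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.3, 0.1]
c = concordance_index(times, events, risks)
print(c)  # 0.875: 7 of 8 comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for reported concordance indices.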
By showing that a BEiT-based masked-image-modeling backbone can rival or exceed contrastive and supervised baselines across diagnosis, subtyping, and survival tasks, BEPH reinforced masked image modeling as a viable pretraining strategy for histopathology. Its publication in Nature Communications, paired with an openly released GPL-3.0 codebase and pretrained weights, lowered the practical barrier to adopting foundation models in pathology research. The model's emphasis on a compact, deployable architecture offers a useful counterpoint to the trend toward ever-larger pathology encoders. Its main limitation is that pretraining draws from TCGA, so generalization to other scanners, staining protocols, and patient populations still requires external validation.
Yang, Z., et al. (2024) A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images. bioRxiv.
DOI: 10.1038/s41467-025-57587-y