Technical University of Munich
Pathology foundation model that fuses global patch and cell-level tokens via joint-weighted attention pooling for H&E-based biomarker detection.
JWTH (Joint-Weighted Token Hierarchy) is a pathology foundation model (PFM) designed to infer molecular biomarkers directly from routine hematoxylin and eosin (H&E) whole-slide images. It was introduced in the November 2025 arXiv preprint "From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection" (arXiv:2511.05150) by Jingsong Liu, Han Li, Nassir Navab, and Peter J. Schüffler at the Technical University of Munich, spanning the Institute of Pathology, the Computer Aided Medical Procedures (CAMP) group, the Munich Center for Machine Learning (MCML), and the Munich Data Science Institute (MDSI).
The model targets a recognized limitation of existing pathology foundation models: most rely on global, patch-level embeddings and overlook the cell-level morphology that pathologists use to read tissue. AI-based biomarkers promise to predict molecular features, such as receptor status or mutational state, from inexpensive H&E slides rather than from slow and costly molecular assays, but coarse global representations can miss the cellular detail that drives those predictions. JWTH addresses this by combining large-scale self-supervised pretraining with a cell-centric post-tuning stage and an attention-pooling inference strategy that fuses local (cellular) and global (contextual) tokens into a single hierarchical representation.
JWTH is built on a vision-transformer backbone pretrained with a combination of self-supervised objectives: DINO and iBOT self-distillation together with Gram-anchoring to stabilize learned features. Pretraining used approximately 84 million H&E image patches extracted from about 84,000 whole-slide images from the Technical University of Munich. The defining architectural contribution is the inference-time joint-weighted token hierarchy: instead of using only the global class token (as in conventional linear probing), the model applies attention pooling to fuse the class token with local patch tokens, producing a representation that captures both tissue context and cellular detail. On a benchmark of four tasks covering four biomarkers across eight patient cohorts, JWTH achieves up to 8.3% higher balanced accuracy and a 1.2% average gain over prior PFMs, outperforming UNI, Virchow2, CONCH, and the general-purpose DINOv2 and DINOv3 backbones.
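To make the contrast with linear probing concrete, the following is a minimal sketch of generic attention pooling over a class token and patch tokens. It is not the authors' implementation: the preprint's exact joint-weighting scheme is not described here, so the single `query` vector, the scaled dot-product scoring, and the function names are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(cls_token, patch_tokens, query):
    """Fuse the class token with patch tokens into one vector.

    `query` stands in for a learned pooling vector (hypothetical; in a real
    model it would be trained, here it is simply passed in).
    Returns (fused_vector, weights); weights[0] belongs to the class token.
    """
    tokens = [cls_token] + patch_tokens
    dim = len(cls_token)
    scale = math.sqrt(dim)
    # Scaled dot-product score of each token against the query.
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of all tokens, one embedding dimension at a time.
    fused = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
    return fused, weights


# Toy 4-dimensional embeddings (made-up numbers, not model outputs).
cls_token = [0.2, 0.1, 0.0, 0.4]
patch_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query = [2.0, 0.0, 0.0, 0.0]

fused, weights = attention_pool(cls_token, patch_tokens, query)
```

Linear probing would pass only `cls_token` to the classifier. In this sketch the classifier would instead receive `fused`, so a patch token that scores highly against the query can contribute directly to the prediction.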
JWTH is aimed at computational pathology researchers and translational teams developing H&E-based biomarker assays. Because it predicts molecular features from routine stained slides, it could support cheaper and faster pre-screening for receptor status, mutational signatures, and other clinically relevant markers, and could in principle be applied retrospectively to archived slide collections. The attention-pooling design also surfaces which cellular regions drive a prediction, which is valuable for interpretability and for building pathologist trust in AI-derived biomarkers.
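One simple way pooling weights could be inspected is to rank patch tokens by weight and map them back to positions in the image patch. The sketch below assumes the patch tokens are stored in row-major order over a regular grid, as in a standard vision-transformer patch embedding. The function name and the example weights are hypothetical and do not come from JWTH.

```python
def top_attended_patches(patch_weights, grid_width, k=3):
    """Return the k highest-weighted patch tokens as (row, col, weight).

    Assumes `patch_weights` excludes the class token and is laid out
    row-major over a grid that is `grid_width` tokens wide.
    """
    ranked = sorted(
        range(len(patch_weights)),
        key=lambda i: patch_weights[i],
        reverse=True,
    )
    return [(i // grid_width, i % grid_width, patch_weights[i]) for i in ranked[:k]]


# Made-up weights for a 2 x 3 grid of patch tokens.
example_weights = [0.05, 0.30, 0.10, 0.02, 0.40, 0.13]
top = top_attended_patches(example_weights, grid_width=3, k=2)
```

Overlaying the returned grid positions on the H&E patch gives a coarse map of which regions received the most weight. Such a map indicates where the pooling focused, not why, so it would still need review by a pathologist.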
JWTH contributes to the active effort to make pathology foundation models attend to cellular morphology rather than only global tissue context. Its consistent gains over established PFMs such as UNI, Virchow2, and CONCH suggest that joint global-cellular token fusion is a productive direction for interpretable, robust biomarker detection. As an arXiv preprint, its results await peer review and independent validation. The training corpus is private TUM clinical data and, at the time of writing, there is no public code repository, no released model weights, and no Hugging Face artifact, which limits independent reproduction and downstream reuse.
Liu, J., et al. (2025). From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection. arXiv preprint arXiv:2511.05150.
DOI: 10.48550/arXiv.2511.05150