JWTH (Joint-Weighted Token Hierarchy) is a pathology foundation model (PFM) designed to infer molecular biomarkers directly from routine hematoxylin and eosin (H&E) whole-slide images. It was introduced in the November 2025 arXiv preprint "From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection" (arXiv:2511.05150) by Jingsong Liu, Han Li, Nassir Navab, and Peter J. Schüffler at the Technical University of Munich, spanning the Institute of Pathology, the Computer Aided Medical Procedures (CAMP) group, the Munich Center for Machine Learning (MCML), and the Munich Data Science Institute (MDSI).

The model targets a recognized limitation of existing pathology foundation models: most rely on global, patch-level embeddings and overlook the cell-level morphology that pathologists use to read tissue. AI-based biomarkers promise to predict molecular features, such as receptor status or mutational state, from inexpensive H&E slides rather than from slow and costly molecular assays, but coarse global representations can miss the cellular detail that drives those predictions. JWTH addresses this by combining large-scale self-supervised pretraining with a cell-centric post-tuning stage and an attention-pooling inference strategy that fuses local (cellular) and global (contextual) tokens into a single hierarchical representation.

Key Features

Joint global-cellular representation: Fuses global patch-level tokens with cell-centric local tokens, bridging tissue-context and single-cell morphology rather than relying on global embeddings alone.
Attention-pooling token fusion: At inference, an attention-pooling strategy jointly weights the class token and local tokens, letting the model emphasize the cellular regions most relevant to each biomarker.
Cell-centric post-tuning: After self-supervised pretraining, a dedicated post-tuning stage refocuses representations on cell-level features, improving sensitivity to morphology over standard linear probing.
Large-scale H&E pretraining: Pretrained on roughly 84 million H&E patches drawn from about 84,000 whole-slide images, providing broad coverage of tissue appearance.
Strong biomarker accuracy: Reports up to 8.3% higher balanced accuracy and a 1.2% average improvement over prior pathology foundation models across its evaluation suite.
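The headline numbers above are stated in balanced accuracy, which is the mean of per-class recall. The following is a minimal sketch of that metric, included only to make the definition concrete; the labels are hypothetical and are not taken from the paper's cohorts.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for cls in np.unique(y_true):
        mask = y_true == cls                      # samples whose true label is cls
        recalls.append(np.mean(y_pred[mask] == cls))
    return float(np.mean(recalls))

# Hypothetical example: 8 biomarker-negative slides and 2 biomarker-positive slides.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]                         # one of the two positives is missed

plain_acc = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_acc, bal_acc)
```

In this toy case plain accuracy is 0.9 while balanced accuracy is 0.75, because missing half of the rare class halves that class's recall.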

Technical Details

JWTH is built on a vision-transformer backbone pretrained with a combination of self-supervised objectives: DINO and iBOT self-distillation, together with Gram anchoring to stabilize the learned features. Pretraining used approximately 84 million H&E image patches extracted from about 84,000 whole-slide images from the Technical University of Munich. The defining architectural contribution is the inference-time joint-weighted token hierarchy: instead of using only the global class token, as conventional linear probing does, the model applies attention pooling to fuse the class token with local patch tokens, producing a representation that captures both tissue context and cellular detail. On a benchmark of four tasks covering four biomarkers across eight patient cohorts, JWTH achieves up to 8.3% higher balanced accuracy and a 1.2% average gain over prior PFMs, outperforming UNI, Virchow2, CONCH, and the general-purpose DINOv2 and DINOv3 backbones.
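The idea of fusing the class token with local tokens through attention pooling can be illustrated with a small sketch. This is not the authors' implementation, which has not been released. The single query vector, the key and value projections, the scaled dot-product scoring, and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def attention_pool(cls_token, local_tokens, query, w_key, w_value):
    """Fuse one class token and n local tokens into a single vector.

    Assumed shapes (hypothetical, for illustration only):
      cls_token: (d,)   local_tokens: (n, d)
      query: (d_k,)     w_key: (d, d_k)     w_value: (d, d_v)
    """
    tokens = np.vstack([cls_token[None, :], local_tokens])   # (n + 1, d)
    keys = tokens @ w_key                                     # (n + 1, d_k)
    values = tokens @ w_value                                 # (n + 1, d_v)
    scores = keys @ query / np.sqrt(keys.shape[1])            # one score per token
    weights = softmax(scores)                                 # joint weights, sum to 1
    pooled = weights @ values                                 # (d_v,) fused representation
    return pooled, weights

# Random stand-ins for backbone outputs and pooling parameters.
rng = np.random.default_rng(0)
d, d_k, n = 16, 8, 9
cls_token = rng.normal(size=d)
local_tokens = rng.normal(size=(n, d))
query = rng.normal(size=d_k)
w_key = rng.normal(size=(d, d_k))
w_value = rng.normal(size=(d, d))

pooled, weights = attention_pool(cls_token, local_tokens, query, w_key, w_value)
print(pooled.shape, weights.shape)
```

Under these assumptions, `weights[0]` is the share given to the class token and `weights[1:]` are the shares given to the local tokens, so the same vector that drives the fused representation can also be inspected to see which tokens dominated it.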

Applications

JWTH is aimed at computational pathology researchers and translational teams developing H&E-based biomarker assays. Because it predicts molecular features from routine stained slides, it could support cheaper and faster pre-screening for receptor status, mutational signatures, and other clinically relevant markers, and could in principle be applied retrospectively to archived slide collections. The attention-pooling design also surfaces which cellular regions drive a prediction, which is valuable for interpretability and for building pathologist trust in AI-derived biomarkers.

Impact

JWTH contributes to the active effort to make pathology foundation models attend to cellular morphology rather than only global tissue context, and its consistent gains over established PFMs such as UNI, Virchow2, and CONCH suggest that joint global-cellular token fusion is a productive direction for interpretable, robust biomarker detection. As an arXiv preprint, its results await peer review and independent validation. The training corpus is private TUM clinical data and, at the time of writing, there is no public code repository, no released model weights, and no Hugging Face artifact, which limits independent reproduction and downstream reuse.

Citation

Liu, J., Li, H., Navab, N., and Schüffler, P. J. (2025). From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection. arXiv preprint arXiv:2511.05150.

DOI: 10.48550/arXiv.2511.05150

