bio.rodeo
© 2026 Pulsatance. All rights reserved.
Pathology, Spatial omics

H2O

Tencent AI for Life Science Lab / Fudan University / University of Science and Technology of China

A foundation model that predicts spatial transcriptomics and proteomics directly from routine H&E whole-slide images using a vision transformer aligned with a language model.

Released: April 2026

H2O (Histopathology to Omics) is a foundation model that bridges the modality gap between routine histopathology and spatial multi-omics, inferring spatial transcriptomics (ST) and spatial proteomics (SP) landscapes directly from hematoxylin and eosin (H&E) whole-slide images. Spatial omics assays resolve gene and protein expression in their native tissue context, but they remain expensive, technically demanding, and difficult to scale across the millions of H&E slides generated in clinical practice. H2O addresses this gap by learning to read molecular signal out of the morphological patterns pathologists already observe, turning a ubiquitous, low-cost imaging modality into a proxy for high-dimensional spatial profiling.

The model was developed by researchers at the Tencent AI for Life Science Lab in collaboration with Fudan University and the University of Science and Technology of China, and released as a bioRxiv preprint in April 2026. Its central idea is to align histological morphology with semantic molecular knowledge: a vision transformer encodes tissue patches while a language model supplies a molecular semantic space, and contrastive learning ties the two together so that visual features map onto interpretable expression patterns.

H2O is positioned among a fast-growing class of histology-to-omics models (such as OmiCLIP and related visual–omics foundation models), but distinguishes itself by spanning both transcriptomics and proteomics within a single framework and by recovering biologically meaningful cell–cell signaling rather than expression alone.

#Key Features

  • H&E-to-multi-omics inference: Predicts both spatial transcriptomics and spatial proteomics from standard H&E images, removing the need for a dedicated spatial assay to obtain a first-pass molecular landscape.
  • Vision–language alignment: Couples a vision transformer image encoder with a large language model via contrastive learning, grounding histological features in a semantic molecular space rather than a purely numerical regression target.
  • Cell–cell communication recovery: Reconstructs biologically meaningful signaling axes (for example the MIF–CD74/CD44 axis) directly from morphology, pointing to interactions that would otherwise require molecular profiling.
  • Pan-tissue generalization: Trained across 25 organs and cancer types, the model transfers to unseen cohorts spanning fetal and pediatric thymus, metastatic lymph node, and breast cancer tissue.
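
To make the cell–cell communication feature more concrete, the following is a minimal sketch of how a signaling axis such as MIF–CD74 might be scored once per-spot expression has been predicted. It is not H2O's method: the scoring rule (ligand level in a spot multiplied by the mean receptor level among nearby spots) is a generic heuristic, and the function name, arguments, and radius are hypothetical choices made for illustration.

```python
import numpy as np

def ligand_receptor_score(coords, ligand, receptor, radius):
    """Per-spot score: ligand level times mean receptor level among neighbors.

    coords   : (n_spots, 2) spot positions (hypothetical units)
    ligand   : (n_spots,) predicted ligand expression, e.g. MIF
    receptor : (n_spots,) predicted receptor expression, e.g. CD74
    radius   : neighborhood radius, in the same units as coords
    """
    coords = np.asarray(coords, dtype=float)
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)

    # Pairwise Euclidean distances between all spots (fine for small inputs).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # A neighbor lies within the radius; a spot is not its own neighbor.
    neighbors = (dist <= radius) & ~np.eye(len(coords), dtype=bool)

    counts = neighbors.sum(axis=1)
    receptor_sum = neighbors.astype(float) @ receptor

    # Mean receptor level among neighbors; 0 where a spot has no neighbors.
    receptor_mean = np.divide(
        receptor_sum, counts,
        out=np.zeros_like(receptor_sum),
        where=counts > 0,
    )
    return ligand * receptor_mean
```

High-scoring spots would be candidates for follow-up rather than evidence of signaling; the all-pairs distance matrix also limits this sketch to small numbers of spots.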

#Technical Details

H2O is a multimodal architecture that pairs a Vision Transformer (ViT) image encoder with a Large Language Model (LLM) text/semantic encoder, trained with a contrastive objective that aligns histological morphology with molecular semantics. This design embeds H&E patches and molecular descriptions in a shared space, so that expression profiles can be inferred from image features at inference time. The model was trained on roughly 1.3 million paired H&E–spatial omics patches drawn from a pan-tissue corpus covering 25 organs and cancer types, giving it broad morphological and molecular coverage. The preprint reports strong predictive accuracy for omics expression from histology and robust generalization to additional public cohorts (fetal and pediatric thymus, human metastatic lymph node, and breast cancer), where its inferred profiles remained concordant with known biology.
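
The contrastive alignment described above can be illustrated with a symmetric, CLIP-style InfoNCE loss over a batch of paired embeddings. This is a sketch under stated assumptions: the page says only that a contrastive objective is used, so the exact loss form, the temperature value, and all names below are illustrative and not taken from the H2O preprint.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def l2_normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    image_emb : (batch, dim) patch embeddings from the image encoder
    text_emb  : (batch, dim) embeddings of the paired molecular descriptions
    Row i of each array is assumed to describe the same tissue patch.
    temperature is an illustrative default, not a value from the source.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))

    # Similarity of every image to every text in the batch.
    logits = img @ txt.T / temperature
    idx = np.arange(len(logits))

    # Image-to-text: each row should pick out its own column, and vice versa.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Minimizing this loss pulls matched image and text embeddings together and pushes mismatched pairs within the batch apart, which is the general mechanism by which visual features come to sit near their molecular descriptions in the shared space.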

#Applications

H2O is aimed at researchers and computational pathologists who want spatial molecular context from slides that have only been imaged with H&E. Because H&E is the default, low-cost stain used across virtually all clinical and biobank tissue, the model can retrospectively enrich large archives with predicted transcriptomic and proteomic maps, screen cohorts for molecular phenotypes before committing to expensive spatial assays, and surface candidate cell–cell signaling events for follow-up validation. Demonstrated use cases span oncology (breast cancer, metastatic lymph node) and developmental biology (fetal and pediatric thymus).
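
Before relying on predicted maps across an archive, one reasonable check is to compare predictions against measured expression on a small pilot set of slides that do have spatial data. The sketch below computes a per-gene Pearson correlation, a metric commonly used for histology-to-expression models; it is not stated on this page to be the metric H2O reports, and the function name and array layout are assumptions.

```python
import numpy as np

def per_gene_pearson(predicted, measured):
    """Pearson correlation per gene between predicted and measured expression.

    predicted, measured : (n_spots, n_genes) arrays over the same spots and genes
    Returns an (n_genes,) array; genes with zero variance in either input get NaN.
    """
    p = np.asarray(predicted, dtype=float)
    m = np.asarray(measured, dtype=float)
    if p.shape != m.shape:
        raise ValueError("predicted and measured must have the same shape")

    # Center each gene (column) across spots.
    p_c = p - p.mean(axis=0)
    m_c = m - m.mean(axis=0)

    num = (p_c * m_c).sum(axis=0)
    den = np.sqrt((p_c ** 2).sum(axis=0) * (m_c ** 2).sum(axis=0))

    # Leave NaN where the correlation is undefined (constant gene).
    return np.divide(num, den, out=np.full(num.shape, np.nan), where=den > 0)
```

Genes with consistently low correlation on the pilot set are ones where predicted maps warrant the most caution.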

#Impact

By unifying spatial transcriptomics and proteomics prediction within a single histology-anchored foundation model, H2O extends the histology-to-omics paradigm beyond transcriptomics alone and toward interpretable, communication-level biology. Its ability to recover known signaling axes from morphology suggests that routine slides carry more recoverable molecular information than previously exploited, with potential to make spatial multi-omics analysis more scalable and accessible. As a 2026 preprint, its benchmarks and downstream adoption are still being established, and predictions inferred from H&E should be treated as hypotheses for experimental confirmation rather than replacements for direct molecular measurement.

Tags

gene_expression, spatial_omics_prediction, vision_transformer, transformer, foundation_model, contrastive_learning, multimodal, histology, spatial_transcriptomics