Samsung Advanced Institute for Health Sciences and Technology / Samsung Medical Center / Sungkyunkwan University
A multi-task framework that predicts single-cell type composition and reconstructs spatial gene expression directly from H&E histology using a frozen pathology foundation model backbone.
SHEST (Single-cell-level H&E Spatial Transcriptomics) is a multi-task deep learning framework that infers cellular biology directly from routine hematoxylin and eosin (H&E) histology. From a stained tissue image alone, it both predicts the cell-type composition of the tissue and reconstructs spatially resolved gene expression at single-cell resolution — bridging conventional histopathology with spatial transcriptomics without requiring a molecular assay.
The model addresses a practical bottleneck in tumor microenvironment research: spatial transcriptomics platforms are costly and not yet routine in clinical pathology, whereas H&E slides are produced for nearly every tissue specimen. By learning the relationship between tissue morphology and underlying molecular and cellular state, SHEST extracts cell-type and expression information from the inexpensive, ubiquitous H&E modality, making single-cell-level spatial analysis more accessible.
SHEST was developed by Hoyeon Jeong, Junghan Oh, Donggeon Lee, Jae Hwan Kang, and Yoon-La Choi at the Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Samsung Medical Center, and Sungkyunkwan University in Seoul, South Korea. It was posted as a bioRxiv preprint in November 2025 and published in Briefings in Bioinformatics in 2026.
The released code runs on a whole-slide image (python he.py --wsi <file>), pairing Cellpose nuclear segmentation with the SHEST heads to output cell-level h5ad expression and GeoJSON annotations.
SHEST uses a frozen H-optimus-0 vision transformer backbone, a large pathology foundation model trained on histology images, and adds two task-specific heads: one for cell-type prediction and one for gene-expression reconstruction. Inputs use a quadruple-tile strategy that aggregates morphological context around each segmented nucleus, with specialized clustering used to organize predictions.
The model resolves six cell types in its lung adenocarcinoma setting: tumor (LUAD) cells, alveolar cells, macrophages, endothelial cells, fibroblasts, and lymphocytes. On held-out evaluation it reports F1 scores of 0.97 (tumor cells) and 0.91 (lymphocytes), and reconstructed expression reproduces known cell-type-specific marker patterns while preserving spatial relationships and multicellular niche structure.
The released checkpoint combines the trained task heads with the frozen backbone. The implementation targets Python 3.10 and PyTorch 2.6.0 and uses Cellpose for nuclear segmentation.
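The frozen-backbone, two-head design described above can be illustrated with a minimal PyTorch sketch. This is not the SHEST implementation: the class name TwoHeadModel, the single linear layer per head, the gene count, the tile size, and the tiny stand-in backbone (used here in place of H-optimus-0) are all assumptions made for illustration, and the quadruple-tile input strategy is not modeled.

```python
import torch
from torch import nn

# The six cell types listed in the text; the label strings are illustrative.
CELL_TYPES = ["tumor", "alveolar", "macrophage",
              "endothelial", "fibroblast", "lymphocyte"]


class TwoHeadModel(nn.Module):
    """Hypothetical sketch: one frozen feature extractor shared by two heads."""

    def __init__(self, backbone: nn.Module, embed_dim: int, n_genes: int):
        super().__init__()
        self.backbone = backbone
        # Freeze the backbone so that only the two heads receive gradients.
        for param in self.backbone.parameters():
            param.requires_grad_(False)
        # Assumed head design: a single linear layer per task.
        self.cell_type_head = nn.Linear(embed_dim, len(CELL_TYPES))
        self.expression_head = nn.Linear(embed_dim, n_genes)

    def train(self, mode: bool = True):
        # Keep the frozen backbone in eval mode even while the heads train.
        super().train(mode)
        self.backbone.eval()
        return self

    def forward(self, tiles: torch.Tensor):
        with torch.no_grad():  # no gradient bookkeeping for the frozen part
            features = self.backbone(tiles)
        return self.cell_type_head(features), self.expression_head(features)


# Stand-in backbone (assumption): a real setup would load the pathology
# foundation model here instead of this toy feature extractor.
toy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
model = TwoHeadModel(toy_backbone, embed_dim=64, n_genes=50)

tiles = torch.randn(4, 3, 32, 32)      # four fake RGB tiles
logits, expression = model(tiles)      # shapes: (4, 6) and (4, 50)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

In a setup like this, an optimizer would be given only the parameters listed in trainable, so training cost scales with the two small heads and not with the backbone.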
SHEST is aimed at researchers and pathologists studying the tumor microenvironment who want spatially resolved cellular and molecular readouts without running an expensive spatial transcriptomics experiment. Because it operates on standard H&E whole-slide images, it can be applied retrospectively to archival slides to map cell-type composition and infer gene expression across a section, supporting tasks such as immune-infiltration assessment, niche characterization, and hypothesis generation for downstream molecular validation.
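As a rough illustration of the downstream use mentioned above, per-cell predicted labels can be summarized into a cell-type composition, from which a simple lymphocyte fraction follows. The function name, the label strings, and the example labels below are hypothetical; this sketch does not read SHEST's actual output files.

```python
from collections import Counter


def cell_type_composition(labels):
    """Return the fraction of cells assigned to each predicted cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical per-cell predictions for a small tissue region.
predicted = ["tumor", "tumor", "lymphocyte", "fibroblast",
             "tumor", "lymphocyte", "macrophage", "tumor"]

composition = cell_type_composition(predicted)
# A crude immune-infiltration proxy: the share of cells called lymphocytes.
lymphocyte_fraction = composition.get("lymphocyte", 0.0)
```

Computing the same summary per spatial window, instead of once for the whole section, would give a coarse map of where immune cells concentrate.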
By demonstrating that single-cell-level cell typing and spatial expression can be recovered from H&E morphology, SHEST advances a growing line of work that repurposes ubiquitous histology images as a proxy for costly molecular assays. Its strong reported accuracy on tumor and immune cells, external validation, and openly released code and weights make it a practical reference point for histology-to-transcriptomics modeling. Key limitations follow from its training scope: the task heads were trained on lung adenocarcinoma with a fixed six-cell-type taxonomy, so performance on other tissues, cancer types, or cell populations will require further validation. The journal article is released under CC BY-NC 4.0, and the Hugging Face weights are gated behind a contact-sharing agreement.
Jeong, H., et al. (2025) SHEST: single-cell-level artificial intelligence from haematoxylin and eosin morphology for cell-type prediction and spatial transcriptomics reconstruction. bioRxiv.
DOI: 10.1101/2025.11.19.689364
Jeong, H., et al. (2026) SHEST: single-cell-level artificial intelligence from haematoxylin and eosin morphology for cell-type prediction and spatial transcriptomics reconstruction. Briefings in Bioinformatics.
DOI: 10.1093/bib/bbag037