bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Pathology foundation models
Pathology · Spatial omics

SpaFoundation

Central South University

An 80M-parameter histology Vision Transformer foundation model that predicts spatial gene expression from H&E tissue images and transfers to tumor detection and spatial clustering.

Released: August 2025
Parameters: 80 million

Spatial transcriptomics (ST) jointly profiles gene expression and spatial context alongside histological images, but the technology remains costly and time-consuming, limiting routine clinical use. A practical workaround is to infer gene expression directly from inexpensive hematoxylin-and-eosin (H&E) tissue images, yet prior computational approaches have been constrained by limited accuracy and spatial resolution, often a consequence of small training sets and modest model capacity.

SpaFoundation, introduced in August 2025 by researchers at Central South University (Changsha, China), addresses this gap with a large-scale histology foundation model purpose-built to predict spatial gene expression from tissue images. Rather than training a task-specific predictor from scratch, it learns generalizable histological representations through domain-specific self-supervised pretraining, then applies them to spatial gene expression inference and related downstream tasks with minimal or no fine-tuning.

Within the landscape of histology foundation models, SpaFoundation is distinguished by its explicit focus on spatial omics: it couples a general-purpose image encoder with the goal of high-resolution, transferable spatial gene expression prediction, positioning it alongside contemporaries such as BRIDGE that bridge histology and spatial transcriptomics.

#Key Features

  • Spatial gene expression from H&E alone: Predicts spatial gene expression directly from standard tissue images, sidestepping the cost and turnaround of running spatial transcriptomics assays.
  • Self-distillation plus masked image modeling: Combines self-distillation with masked image modeling (MIM) so the encoder captures both high-level semantic representations and fine-grained structural features that enrich per-spot representations.
  • Strong transferability: Pretrained representations transfer to downstream tasks, including tumor detection and spatial domain clustering, with minimal fine-tuning or in a zero-shot setting.
  • Resolution flexibility: Validation across 117 samples demonstrates flexibility across different spatial resolutions, including high-resolution inference.
  • Open weights and code: Implementation and pretrained weights are publicly released under an MIT license on GitHub and Hugging Face.
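Predicting expression per spot presupposes one image patch per spot. The following is a minimal sketch of that preprocessing step, not the authors' pipeline: it assumes each patch is a square centered on the spot's pixel coordinates, and the function name `spot_patch_boxes`, the 224-pixel patch size, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def spot_patch_boxes(
    spots_xy: List[Tuple[float, float]],
    patch_size: int,
    image_width: int,
    image_height: int,
) -> List[Box]:
    """Return one square crop box per spot, centered on the spot.

    Boxes that would overhang an image edge are shifted back inside,
    so every patch has the same size. Centering patches on spots is an
    assumption of this sketch, not a detail taken from the source.
    """
    if patch_size > image_width or patch_size > image_height:
        raise ValueError("patch_size must fit inside the image")
    half = patch_size // 2
    boxes: List[Box] = []
    for x, y in spots_xy:
        left = int(round(x)) - half
        top = int(round(y)) - half
        # Clamp so the whole box stays within the image bounds.
        left = min(max(left, 0), image_width - patch_size)
        top = min(max(top, 0), image_height - patch_size)
        boxes.append((left, top, left + patch_size, top + patch_size))
    return boxes


# Hypothetical spots on a 1000 x 1000 pixel image; 224 is an illustrative size.
boxes = spot_patch_boxes(
    [(500.0, 400.0), (10.0, 990.0)],
    patch_size=224,
    image_width=1000,
    image_height=1000,
)
print(boxes)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the boxes can be passed to an image library directly before the patches are fed to an encoder.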

#Technical Details

SpaFoundation employs a teacher-student Vision Transformer (ViT) architecture that models dependencies among image patches, using an iBOT-style objective that jointly applies self-distillation and masked image modeling. The model has 80 million parameters and is pretrained on 1.79 million histology patches (the GitHub README cites approximately 1.84 million) spanning 26 tissue types, drawn from the HEST-1K spatial transcriptomics resource, which aggregates data from multiple platforms (including Spatial Transcriptomics, Visium, and Xenium) across human and mouse tissue. The authors validate the model on 117 samples and report that it consistently outperforms state-of-the-art baselines across four downstream tasks: spatial gene expression prediction, high-resolution gene expression inference, tumor detection, and spatial domain clustering. Downstream tumor-detection evaluation uses a cutaneous squamous cell carcinoma (cSCC) dataset (GEO accession GSE144240).
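The combined objective described above can be illustrated with a simplified sketch. It is not the SpaFoundation training code: it assumes the usual iBOT formulation, in which a cross-entropy term on the [CLS] token (self-distillation across two views) is added to a cross-entropy term on masked patch tokens (masked image modeling), and the teacher is an exponential moving average of the student. Teacher centering and the symmetric two-view sum are omitted, and the temperatures, momentum, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(logits: Vector, temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits: Vector, temperature: float) -> List[float]:
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_z for v in scaled]


def distill_ce(teacher_logits: Vector, student_logits: Vector,
               t_teacher: float = 0.04, t_student: float = 0.1) -> float:
    """Cross-entropy with the sharpened teacher distribution as target."""
    target = softmax(teacher_logits, t_teacher)
    log_pred = log_softmax(student_logits, t_student)
    return -sum(p * lp for p, lp in zip(target, log_pred))


def ibot_style_loss(student_cls: Vector, teacher_cls: Vector,
                    student_patches: List[Vector],
                    teacher_patches: List[Vector],
                    mask: List[bool]) -> float:
    """Self-distillation on [CLS] plus MIM on masked patch tokens."""
    # Self-distillation: the student's [CLS] output for one view is
    # matched to the teacher's [CLS] output for the other view.
    cls_term = distill_ce(teacher_cls, student_cls)
    # Masked image modeling: the student sees masked patches and must
    # match the teacher's token outputs at the masked positions only.
    masked = [i for i, is_masked in enumerate(mask) if is_masked]
    if masked:
        mim_term = sum(
            distill_ce(teacher_patches[i], student_patches[i]) for i in masked
        ) / len(masked)
    else:
        mim_term = 0.0
    return cls_term + mim_term


def ema_update(teacher: List[float], student: List[float],
               momentum: float = 0.996) -> List[float]:
    """Move teacher weights toward the student (no gradient involved)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


# Toy logits over a 3-way projection head; real heads are far wider.
teacher_cls = [2.0, 0.0, -1.0]
teacher_patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
student_patches = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.5, 0.5, 0.5]]
mask = [True, True, False]  # the third patch is visible, so it is ignored
loss = ibot_style_loss([1.8, 0.1, -0.9], teacher_cls,
                       student_patches, teacher_patches, mask)
print(f"loss = {loss:.4f}")
```

Only the student receives gradients from this loss. The teacher changes solely through `ema_update`, which keeps the targets stable while the student learns.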

#Applications

SpaFoundation is aimed at researchers and pathologists who want spatial molecular insight without the expense of full spatial transcriptomics experiments. By inferring gene expression from routine H&E slides, it can extend molecular characterization to large image archives, support virtual ST for cohorts where sequencing is impractical, and provide transferable features for tumor detection and tissue-region clustering. Its open weights make it a candidate encoder for computational pathology and spatial omics pipelines that need a histology backbone tuned for expression-related tasks.
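As a hedged illustration of tissue-region clustering on frozen features, the sketch below groups per-spot embeddings with plain k-means. The source does not say which clustering algorithm the authors use, so k-means is a stand-in, and the two-dimensional embeddings are made-up values standing in for real encoder outputs.

```python
from typing import List, Sequence

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Point], k: int, iterations: int = 20) -> List[int]:
    """Assign each embedding to one of k clusters with Lloyd's algorithm.

    Initialization is deterministic: start from the first point, then
    repeatedly add the point farthest from all chosen centroids.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up 2-D "embeddings" for four spots forming two obvious groups.
embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans_labels(embeddings, k=2)
print(labels)
```

The labels can then be mapped back onto spot coordinates to draw candidate spatial domains. With real embeddings, the number of clusters and the choice of algorithm would need to be tuned for each dataset.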

#Impact

By reporting that domain-specific pretraining on roughly 1.79 million histology patches yields representations that outperform state-of-the-art baselines across four downstream tasks, SpaFoundation reinforces a broader trend toward foundation-model-driven inference of spatial gene expression from inexpensive imaging. Released openly with code and weights, it lowers the barrier for groups exploring image-to-expression prediction. Because it is a recent preprint, real-world adoption and independent benchmarking are still emerging, and the reported gains rest on the authors' own evaluation. The model's reliance on H&E appearance also means that inferred expression remains a prediction rather than a measurement.

Citation

Preprint

DOI: 10.1101/2025.08.07.669202

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 59 (Partial)
Usability (can I run it?): 83
Reproducibility (can I retrain it?): 42
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

foundation_model, gene_expression_prediction, histology, representation_learning, self_supervised, spatial_transcriptomics, tumor_detection, vision_transformer, zero_shot

Resources

GitHub Repository · Research Paper · HuggingFace Model · Dataset