Lightweight multimodal foundation model integrating spatial transcriptomics and H&E histopathology with pathway activity scores for biologically grounded spatial niche discovery at single-cell resolution.
SpatialFusion is a lightweight multimodal foundation model from the Uhler Lab at MIT that integrates spatial transcriptomics, hematoxylin-and-eosin (H&E) histopathology imaging, and pathway-activity scores into a single representation for spatial niche discovery at single-cell resolution. Posted to bioRxiv in March 2026, SpatialFusion is unusual among multimodal foundation models in being deliberately small (under 300,000 parameters) while, according to the preprint, matching or exceeding much larger models on its target tasks.
In case studies reported in the preprint, the model identifies pre-malignant niches in colorectal cancer and metastasis-associated niches in lung cancer. These results indicate that biologically grounded niche discovery does not require enormous parameter counts when the modalities are well chosen and aligned.
SpatialFusion uses a compact transformer architecture whose cross-modal attention layers fuse spatial gene-expression embeddings, image patch embeddings from H&E sections, and pathway-activity vectors. Training is self-supervised, with reconstruction objectives across modalities. The bioRxiv preprint provides architectural details and benchmark comparisons against larger multimodal models, including BioMedCLIP and Hibou-class pathology models.
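The general idea of cross-modal attention can be illustrated with a minimal sketch. This is not the preprint's implementation: the function names, the 4-dimensional embeddings, and their values are all hypothetical, the learned query/key/value projections are omitted, and the choice of the gene-expression embedding as the query is an assumption made for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, context):
    """Single-head scaled dot-product attention of one query over context tokens.

    Learned projections are left out (treated as identity) to keep the
    sketch short; a real attention layer would apply them first.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, token)) / scale for token in context]
    weights = softmax(scores)
    # Weighted sum of the context tokens, one output value per dimension.
    fused = [sum(w * token[i] for w, token in zip(weights, context))
             for i in range(len(query))]
    return fused, weights

# Hypothetical embeddings for a single cell (made-up values, same width).
expression = [1.0, 0.0, 1.0, 0.0]    # spatial gene-expression embedding
image_patch = [0.9, 0.1, 0.8, 0.0]   # H&E image patch embedding
pathway = [0.0, 1.0, 0.0, 1.0]       # pathway-activity vector

# The expression embedding attends over the other two modalities.
fused, weights = cross_attention(expression, [image_patch, pathway])

# Residual connection: keep the original signal and add the fused context.
cell_repr = [e + f for e, f in zip(expression, fused)]
```

In this toy setup the image patch resembles the expression embedding more closely than the pathway vector does, so it receives the larger attention weight. A self-supervised reconstruction objective would then train such a representation to predict each modality from the fused output.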
The case studies span colorectal cancer (CRC) and lung adenocarcinoma datasets with paired spatial transcriptomics (ST) and H&E data, and use model-derived niche assignments to identify biologically and clinically meaningful spatial subpopulations.
SpatialFusion is suited to spatial-biology research groups that need integrated multimodal analysis without the GPU footprint of larger foundation models. Its parameter efficiency makes it tractable to fine-tune on individual datasets. Applications include cancer microenvironment characterization, niche-based prognostic biomarker discovery, and integration of digital pathology with molecular profiles.
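A back-of-envelope calculation gives a sense of how small the model state is at the stated upper bound of 300,000 parameters. The sketch below assumes 32-bit floats and an Adam-style optimizer that keeps four parameter-sized arrays (weights, gradients, and two moment estimates); neither assumption comes from the preprint, and activation memory, which depends on batch size and input resolution, is not counted.

```python
BYTES_PER_FP32 = 4  # assumption: parameters stored as 32-bit floats

def state_bytes(n_params, copies=1):
    """Bytes needed for `copies` parameter-sized fp32 arrays."""
    return n_params * BYTES_PER_FP32 * copies

n_params = 300_000  # upper bound on the parameter count stated for the model

# Inference: one copy of the weights.
weights_mb = state_bytes(n_params) / 1e6

# Fine-tuning with an Adam-style optimizer (assumed): weights, gradients,
# first-moment and second-moment estimates, i.e. four copies.
training_mb = state_bytes(n_params, copies=4) / 1e6

print(f"weights only:   {weights_mb:.1f} MB")   # 1.2 MB
print(f"training state: {training_mb:.1f} MB")  # 4.8 MB
```

Under these assumptions the parameter and optimizer state together occupy a few megabytes, so activations and input data, rather than the model itself, would dominate memory use during fine-tuning.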
SpatialFusion provides a useful counterpoint to the prevailing scaling-first trajectory in foundation models for spatial biology. By showing that a small, carefully designed multimodal architecture can deliver biologically meaningful niche discovery, it broadens access to multimodal spatial analysis for groups without large compute budgets. It also suggests that careful modality alignment and biological grounding can substitute for raw parameter count in some biological foundation-model settings.
Yates, J., et al. (2026) SpatialFusion: A lightweight multimodal foundation model for pathway-informed spatial niche mapping. bioRxiv.
DOI: 10.64898/2026.03.16.712056