bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Spatial omics foundation models
Spatial omics, Imaging, Pathology

MAD: Microenvironment-Aware Distillation

MIT / Georgia Institute of Technology

A self-supervised, cell-centric pretraining strategy that distills morphology and microenvironment views of each cell into a unified embedding for virtual spatial omics from microscopy.

Released: March 2026

MAD (Microenvironment-Aware Distillation) is a self-supervised pretraining strategy for learning cell-centric representations directly from microscopy images, with the goal of enabling "virtual" spatial omics — reading molecular state from images at single-cell resolution and tissue scale without the cost and throughput limits of experimental omics assays. It addresses an open question in microscopy representation learning: how to encode a single cell's identity together with the tissue environment it sits in, and how much biological information such embeddings can actually capture.

The central idea is dual-view self-distillation. For each indexed cell, MAD forms two complementary views — a morphology view focused on the cell itself and a microenvironment view capturing its surrounding tissue context — and jointly distills them into a single unified embedding space. This explicitly ties a cell's appearance to its spatial neighborhood, rather than treating each cell crop in isolation, so the learned features reflect both intrinsic morphology and the surrounding cellular community.
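The two views can be pictured as two crops of different extent around the same cell. The sketch below is only an illustration of that idea, not the authors' pipeline: the function names (`crop_centered`, `make_views`), the square crop shape, the window sizes, and the zero padding at image borders are all assumptions introduced here, since the page does not specify how MAD constructs its views.

```python
from typing import List, Tuple

Image = List[List[float]]  # toy single-channel image as nested lists


def crop_centered(image: Image, center: Tuple[int, int], size: int,
                  pad_value: float = 0.0) -> Image:
    """Return a size x size window around (row, col), padding outside the image."""
    rows, cols = len(image), len(image[0])
    row_start = center[0] - size // 2
    col_start = center[1] - size // 2
    return [
        [
            image[r][c] if 0 <= r < rows and 0 <= c < cols else pad_value
            for c in range(col_start, col_start + size)
        ]
        for r in range(row_start, row_start + size)
    ]


def make_views(image: Image, center: Tuple[int, int],
               morph_size: int = 4, context_size: int = 12) -> Tuple[Image, Image]:
    """Build a tight morphology crop and a wider microenvironment crop.

    Both window sizes are hypothetical values chosen for this toy example.
    """
    morphology = crop_centered(image, center, morph_size)
    microenvironment = crop_centered(image, center, context_size)
    return morphology, microenvironment


# Toy usage: a 16 x 16 image whose pixel value encodes its position.
image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
morph, micro = make_views(image, center=(8, 8))
```

Both crops share the same center, so the cell of interest sits in the middle of each view while the larger crop adds neighboring tissue. In practice the views would be fed to an image encoder; that step is omitted here.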

MAD was developed by Jiashu Han, Kunzan Liu, Yeojin Kim, Saurabh Sinha, and Sixian You, from the Computational Biophotonics Laboratory in the MIT Research Laboratory of Electronics and Department of Electrical Engineering and Computer Science, in collaboration with the Georgia Institute of Technology. The preprint was posted to arXiv in March 2026. MAD joins a growing class of methods, alongside histology-to-expression models such as BRIDGE, that treat paired image and molecular data as views of the same tissue, but it distinguishes itself by being cell-centric and microenvironment-aware rather than patch- or slide-level.

#Key Features

  • Dual-view design: Each cell is represented by a morphology view and a microenvironment view, capturing both intrinsic appearance and surrounding tissue context in one model.
  • Joint self-distillation: The two views are distilled into a single unified embedding space, encouraging consistency between a cell's morphology and its spatial neighborhood without requiring labels.
  • Cell-centric, self-supervised pretraining: Learns transferable representations from unlabeled microscopy at single-cell resolution, scaling to large image collections with minimal annotation.
  • Modality- and tissue-agnostic: Evaluated across diverse tissues and imaging modalities, supporting cell subtyping, transcriptomic prediction, and downstream bioinformatic inference.
  • Data-efficient at comparable size: Reported to outperform foundation models of similar parameter count that were trained on substantially larger datasets, suggesting that the dual-view objective improves data efficiency.

#Technical Details

MAD is a self-supervised pretraining strategy built on self-distillation, in which two augmented or contextual views of the same input are encouraged to produce consistent representations. Here the views are biologically motivated: a morphology view centered on the indexed cell and a microenvironment view of the surrounding tissue, jointly distilled into one embedding space. Pretraining proceeds without labels, and the resulting embeddings are evaluated on downstream tasks including cell subtyping, transcriptomic (gene-expression) prediction, and bioinformatic inference, across multiple tissues and imaging modalities. The authors report state-of-the-art performance on these tasks and note that MAD surpasses comparably sized foundation models trained on substantially larger corpora. Exact parameter counts, the pretraining dataset composition, and per-benchmark scores are given in the arXiv preprint (posted 2026-03-11), which has not yet been peer reviewed.
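To make the notion of cross-view consistency concrete, here is a minimal sketch of one common self-distillation objective, in the style of DINO: a slowly updated teacher produces a sharpened target distribution from one view, and the student is trained to match it from the other view. This is a swapped-in, generic formulation, not MAD's published loss. The temperatures, the momentum value, the symmetric pairing of views, and every function name are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def distill_term(teacher_logits: Sequence[float], student_logits: Sequence[float],
                 teacher_temp: float = 0.04, student_temp: float = 0.1) -> float:
    """Cross-entropy from the sharpened teacher distribution to the student."""
    teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * q for p, q in zip(teacher_probs, student_log_probs))


def dual_view_loss(student_morph: Sequence[float], student_micro: Sequence[float],
                   teacher_morph: Sequence[float], teacher_micro: Sequence[float]) -> float:
    """Symmetric cross-view loss: each teacher view supervises the other student view."""
    return 0.5 * (
        distill_term(teacher_morph, student_micro)
        + distill_term(teacher_micro, student_morph)
    )


def ema_update(teacher_weights: Sequence[float], student_weights: Sequence[float],
               momentum: float = 0.996) -> List[float]:
    """Move teacher weights toward the student by exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]


# Toy logits: the loss is small when both views agree and large when they do not.
aligned = [2.0, 0.0, -1.0]
flipped = [-1.0, 0.0, 2.0]
loss_consistent = dual_view_loss(aligned, aligned, aligned, aligned)
loss_inconsistent = dual_view_loss(aligned, flipped, aligned, aligned)
```

In a real training loop the logits would come from encoder heads applied to the two crops, gradients would flow only through the student, and `ema_update` would be applied to the teacher's parameters after each step.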

#Applications

MAD targets researchers and computational pathologists who want molecular-level readouts and cell-state characterization from microscopy images, which are far cheaper and higher-throughput than spatial omics assays. Virtual transcriptomic prediction can supplement or pre-screen costly sequencing experiments, while cell subtyping from images supports tissue atlasing and quantitative histopathology. Because MAD learns general-purpose, cell-centric embeddings, it can serve as a backbone for diverse downstream analyses on existing microscopy datasets where paired omics data are scarce.
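One simple way to use frozen cell embeddings for subtyping is a nearest-neighbor vote against a small labeled reference set. The sketch below assumes embeddings are already available as plain vectors. The two-dimensional toy vectors, the labels `subtype_A` and `subtype_B`, and the helper names are hypothetical, and no MAD weights or API are involved, since none have been released.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query: Sequence[float], ref_embeddings: List[Sequence[float]],
                ref_labels: List[str], k: int = 3) -> str:
    """Label a query embedding by majority vote among its k most similar references."""
    ranked = sorted(
        range(len(ref_embeddings)),
        key=lambda i: cosine_similarity(query, ref_embeddings[i]),
        reverse=True,
    )
    votes = Counter(ref_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reference set standing in for embeddings of annotated cells.
reference = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
             [0.1, 1.0], [0.0, 0.9], [0.2, 0.8]]
labels = ["subtype_A"] * 3 + ["subtype_B"] * 3

prediction_a = knn_predict([1.0, 0.0], reference, labels)
prediction_b = knn_predict([0.05, 1.0], reference, labels)
```

A nearest-neighbor probe adds no trainable parameters, so its accuracy reflects the quality of the embedding space rather than the capacity of a downstream classifier.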

#Impact

MAD contributes to the rapidly developing area of image-to-omics modeling by arguing that a cell's identity is best learned together with its microenvironment, and that this dual-view objective yields embeddings that are more biologically informative and more data-efficient than scale alone would provide. Important caveats apply: the work is a non-peer-reviewed preprint, and at the time of writing no public code repository or pretrained weights have been released, so independent reproduction of the reported benchmark gains is not yet possible. If code and weights are released, MAD's approach could become a general tool for representation learning in microscopy and a practical route to virtual spatial omics at scale.

Tags

gene_expression_prediction, cell_type_annotation, representation_learning, vision_transformer, transformer, self_supervised, foundation_model, self_distillation, histology, spatial_transcriptomics