MIT / Georgia Institute of Technology
A self-supervised, cell-centric pretraining strategy that distills morphology and microenvironment views of each cell into a unified embedding for virtual spatial omics from microscopy.
MAD (Microenvironment-Aware Distillation) is a self-supervised pretraining strategy for learning cell-centric representations directly from microscopy images, with the goal of enabling "virtual" spatial omics — reading molecular state from images at single-cell resolution and tissue scale without the cost and throughput limits of experimental omics assays. It addresses an open question in microscopy representation learning: how to encode a single cell's identity together with the tissue environment it sits in, and how much biological information such embeddings can actually capture.
The central idea is dual-view self-distillation. For each indexed cell, MAD forms two complementary views — a morphology view focused on the cell itself and a microenvironment view capturing its surrounding tissue context — and jointly distills them into a single unified embedding space. This explicitly ties a cell's appearance to its spatial neighborhood, rather than treating each cell crop in isolation, so the learned features reflect both intrinsic morphology and the surrounding cellular community.
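As a rough illustration of the dual-view idea, the sketch below cuts two crops around one cell centroid: a tight crop for the morphology view and a wider crop for the microenvironment view. The function names (`crop_centered`, `dual_views`), the crop sizes, and the zero-padding at image borders are all assumptions made for illustration; the preprint's actual view construction may differ.

```python
import numpy as np

def crop_centered(image, center, size):
    """Return a size x size crop centered on (row, col), zero-padded at borders."""
    half = size // 2
    # Pad only the two spatial axes so crops near the border keep a fixed shape.
    pad_width = ((half, half), (half, half)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="constant")
    row, col = int(center[0]), int(center[1])
    # In padded coordinates the original pixel (row, col) sits at (row + half, col + half),
    # so starting the slice at (row, col) places the cell at crop index (half, half).
    return padded[row:row + size, col:col + size]

def dual_views(image, center, morph_size=32, context_size=128):
    """Build a (morphology, microenvironment) pair for one indexed cell.

    morph_size and context_size are hypothetical values, not taken from the preprint.
    """
    morphology = crop_centered(image, center, morph_size)
    microenvironment = crop_centered(image, center, context_size)
    return morphology, microenvironment

# Usage on a synthetic single-channel image with a cell centroid at (50, 50).
image = np.arange(100 * 100, dtype=float).reshape(100, 100)
morph, context = dual_views(image, (50, 50), morph_size=8, context_size=32)
```

Both crops share the same center pixel, so the two views describe the same cell at different spatial scales.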
MAD was developed by Jiashu Han, Kunzan Liu, Yeojin Kim, Saurabh Sinha, and Sixian You, from the Computational Biophotonics Laboratory in the MIT Research Laboratory of Electronics and Department of Electrical Engineering and Computer Science, in collaboration with the Georgia Institute of Technology. It was posted to arXiv in March 2026. It joins a growing class of methods, alongside histology-to-expression models such as BRIDGE, that treat paired image and molecular data as views of the same tissue, but it distinguishes itself by being cell-centric and microenvironment-aware rather than patch- or slide-level.
MAD builds on self-distillation, in which two augmented or contextual views of the same input are encouraged to produce consistent representations. Here the views are biologically motivated: a morphology view centered on the indexed cell and a microenvironment view of the surrounding tissue, jointly distilled into one embedding space. Pretraining proceeds without labels, and the resulting embeddings are evaluated on downstream tasks including cell subtyping, transcriptomic (gene-expression) prediction, and bioinformatic inference, across multiple tissues and imaging modalities. The authors report state-of-the-art performance on these tasks and report that MAD surpasses comparably sized foundation models trained on substantially larger corpora. Exact parameter counts, the pretraining dataset composition, and per-benchmark scores are detailed in the preprint, which was posted to arXiv on 2026-03-11 and has not yet been peer reviewed.
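To make "encouraged to produce consistent representations" concrete, here is a minimal sketch of a generic cross-view self-distillation loss in the style of DINO-like methods: a teacher's sharpened output on one view supervises a student's output on the other view, and the teacher tracks the student by exponential moving average. This is a stand-in, not MAD's actual objective, which is specified in the preprint. The function names, temperatures, and momentum value are assumptions.

```python
import numpy as np

def softmax(logits, temperature):
    """Temperature-scaled softmax over the last axis, computed stably."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_view_loss(student_logits, teacher_logits, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between teacher targets and student predictions.

    Temperatures are hypothetical defaults, not values from the preprint.
    """
    targets = softmax(teacher_logits, t_teacher)
    z = student_logits / t_student
    z = z - z.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=-1).mean())

def symmetric_loss(s_morph, s_context, t_morph, t_context):
    """Each student view is supervised by the teacher's output on the other view."""
    return 0.5 * (cross_view_loss(s_morph, t_context)
                  + cross_view_loss(s_context, t_morph))

def ema_update(teacher_params, student_params, momentum=0.996):
    """Move teacher parameters slowly toward the student (assumed momentum)."""
    return momentum * teacher_params + (1.0 - momentum) * student_params

# Usage: outputs that agree across views give a lower loss than outputs that disagree.
teacher = np.array([[5.0, 0.0, 0.0]])
agree = cross_view_loss(np.array([[5.0, 0.0, 0.0]]), teacher)
disagree = cross_view_loss(np.array([[0.0, 5.0, 0.0]]), teacher)
```

Minimizing such a loss pulls the morphology and microenvironment outputs for the same cell toward each other, which is the sense in which the two views end up in one embedding space.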
MAD targets researchers and computational pathologists who want molecular-level readouts and cell-state characterization from microscopy images, which are far cheaper and higher-throughput than spatial omics assays. Virtual transcriptomic prediction can supplement or pre-screen costly sequencing experiments, while cell subtyping from images supports tissue atlasing and quantitative histopathology. Because MAD learns general-purpose, cell-centric embeddings, it can serve as a backbone for diverse downstream analyses on existing microscopy datasets where paired omics data are scarce.
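One common way to use frozen embeddings as a backbone for transcriptomic prediction is a linear probe. The sketch below fits a ridge regression from per-cell embeddings to gene-expression values. The data are synthetic random arrays standing in for real embeddings and expression measurements, and `fit_ridge`, `predict`, and the regularization strength are illustrative assumptions rather than the evaluation protocol used in the preprint.

```python
import numpy as np

def fit_ridge(embeddings, expression, lam=1.0):
    """Closed-form ridge regression with an intercept.

    embeddings: (n_cells, dim) frozen features; expression: (n_cells, n_genes).
    """
    x_mean = embeddings.mean(axis=0)
    y_mean = expression.mean(axis=0)
    xc = embeddings - x_mean
    yc = expression - y_mean
    # Solve (Xc^T Xc + lam * I) W = Xc^T Yc without forming an explicit inverse.
    gram = xc.T @ xc + lam * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    bias = y_mean - x_mean @ weights
    return weights, bias

def predict(embeddings, weights, bias):
    """Predict expression for new cells from their embeddings."""
    return embeddings @ weights + bias

# Synthetic stand-ins: 200 cells, 16-dim embeddings, 5 genes (not real data).
rng = np.random.default_rng(0)
train_x = rng.normal(size=(200, 16))
true_w = rng.normal(size=(16, 5))
train_y = train_x @ true_w + 2.0
weights, bias = fit_ridge(train_x, train_y, lam=1e-6)

test_x = rng.normal(size=(20, 16))
pred = predict(test_x, weights, bias)
```

Keeping the embeddings frozen and training only the linear map is one way to measure how much molecular information the representation already carries.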
MAD contributes to the rapidly developing area of image-to-omics modeling by arguing that a cell's identity is best learned together with its microenvironment, and that this dual-view objective yields embeddings that are more biologically informative and more data-efficient than scale alone would provide. Important caveats apply: the work is a non-peer-reviewed preprint, and as of writing no public code repository or pretrained weights have been released, so independent reproduction of the reported benchmark gains is not yet possible. If code and weights are released, MAD's approach could become a general tool for representation learning in microscopy and a practical route to virtual spatial omics at scale.