Vermeer

Microsoft Research / Broad Institute / Harvard University

Generative microscopy foundation model that synthesizes in-silico fluorescence images of protein subcellular localization from amino-acid sequence.

Released: June 2026

Fluorescence microscopy reveals where proteins localize inside cells, but imaging every human protein across every relevant cell line, condition, and morphological context is experimentally infeasible. Vermeer addresses this gap by generating microscopy images in silico: given a protein's amino-acid sequence and a set of landmark stains describing a cell's morphology, it predicts what the protein's fluorescence channel would look like in that specific cellular context. This reframes subcellular localization as a conditional image-generation problem rather than a discrete classification task, capturing the continuous, cell-to-cell variation that real imaging exhibits.

Vermeer was developed by researchers at Microsoft Research, the Broad Institute, and Harvard University (including Sandeep Kambhampati, Eric Zimmermann, Emre B. Hayir, Kevin K. Yang, Fei Chen, and Alex X. Lu) and released as a preprint in June 2026. It is a channel-adaptive autoregressive generative foundation model trained on the Human Protein Atlas (HPA), the largest public resource of immunofluorescence images annotated with subcellular localization.

What distinguishes Vermeer is its ability to generalize beyond its training distribution. By conditioning on protein sequence, it transfers zero-shot to proteins never seen during training; by conditioning on morphology landmarks, it transfers to unseen cell lines; and by modeling image channels autoregressively, it adapts to imaging setups whose channel subsets and orderings differ from those used in training.

Key Features

Sequence-conditioned generation: Vermeer ties protein localization to amino-acid sequence, allowing it to predict plausible localization images for proteins absent from its training set.
Morphology-aware conditioning: Generations are conditioned on landmark reference stains (e.g., nucleus, microtubules, endoplasmic reticulum) so predicted localization respects the actual morphology of each target cell.
Channel-adaptive autoregression: By generating channels sequentially, the model supports varying channel subsets and orderings, enabling flexible use with imaging configurations unlike those in training.
Zero-shot transfer: The model extends to unseen proteins, unseen cell lines, and different imaging-channel configurations without retraining.
Improved fidelity: The authors report substantially better perceptual quality and biological fidelity than prior generative approaches to protein-localization imaging.
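
The channel-adaptive idea above can be illustrated with a toy sketch. It is not Vermeer's implementation: the token encoding, the channel names, and the functions `build_context` and `generate_channel` are hypothetical, and the model itself is replaced by a caller-supplied `next_token` stub. It only shows how landmark channels in any subset and order could be flattened into a context, after which the protein channel is generated one token at a time.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical encoding: every token carries the name of its channel,
# so a sequence stays interpretable whatever the channel order is.
Token = Tuple[str, int]


def build_context(landmarks: Dict[str, List[int]], order: Sequence[str]) -> List[Token]:
    """Flatten the requested landmark channels, in the given order, into one sequence."""
    missing = [name for name in order if name not in landmarks]
    if missing:
        raise KeyError(f"channels not available: {missing}")
    return [(name, tok) for name in order for tok in landmarks[name]]


def generate_channel(
    context: List[Token],
    target: str,
    n_tokens: int,
    next_token: Callable[[List[Token]], int],
) -> List[Token]:
    """Autoregressively generate n_tokens for the target channel.

    next_token stands in for the trained model: it sees the context plus
    everything generated so far and returns the next token id.
    """
    generated: List[Token] = []
    for _ in range(n_tokens):
        tok = next_token(context + generated)
        generated.append((target, tok))
    return generated


if __name__ == "__main__":
    # Toy token ids; a real tokenizer would produce these from image patches.
    landmarks = {"nucleus": [3, 1], "microtubules": [7], "er": [0, 2]}
    # Use only two of the three landmarks, in a non-default order.
    ctx = build_context(landmarks, ["er", "nucleus"])
    # Deterministic stub in place of a model.
    print(generate_channel(ctx, "protein", 3, lambda seq: len(seq) % 4))
```

Because each token is tagged with its channel, dropping or reordering landmark channels changes only the context that is built, not the generation loop.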

Technical Details

Vermeer is an autoregressive generative model that factorizes a multi-channel microscopy image into a sequence of channels and tokens. It generates the protein-of-interest channel conditioned on amino-acid sequence embeddings and on landmark stain channels that encode cellular morphology. It is trained on the Human Protein Atlas, whose immunofluorescence images pair a protein-of-interest channel with fixed reference markers: DAPI for the nucleus, β-tubulin for microtubules, and calreticulin for the endoplasmic reticulum. Because channels are modeled autoregressively rather than jointly at a fixed configuration, the model can condition on any available subset of landmark channels in arbitrary order, which underpins its zero-shot transfer to new imaging protocols. The authors evaluate generations on perceptual-quality and biological-fidelity metrics and report improvements over previous generative baselines for protein-localization prediction.
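
One generic way to write the channel-and-token factorization described above is sketched below. The notation is an assumption made for illustration and is not taken from the preprint: s is the amino-acid sequence (or its embedding), π is an ordering of the K channels present in an image, and each channel is represented by T tokens.

```latex
% Hypothetical notation for illustration only.
%   s            amino-acid sequence (or its embedding)
%   \pi          an ordering of the K available channels
%   x^{(c)}_{t}  the t-th of T tokens of channel c
p\bigl(x^{(\pi_1)}, \dots, x^{(\pi_K)} \mid s\bigr)
  = \prod_{k=1}^{K} \prod_{t=1}^{T}
    p\bigl(x^{(\pi_k)}_{t} \,\bigm|\, x^{(\pi_k)}_{<t},\;
           x^{(\pi_1)}, \dots, x^{(\pi_{k-1})},\; s\bigr)
```

Under this form, predicting localization for a given cell corresponds to placing the observed landmark channels first in π and sampling only the factors of the final, protein-of-interest channel. Since π can be any ordering of any subset of channels, the same factorization covers imaging setups with different channel configurations.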

Applications

Vermeer is aimed at cell biologists, microscopists, and computational researchers studying protein subcellular localization at scale. Because it can synthesize localization images for proteins and cell lines that have not been imaged, it can serve as an in-silico screening tool: prioritizing proteins or conditions for wet-lab imaging, hypothesizing localization for understudied proteins, and augmenting datasets for downstream localization classifiers. Its channel-adaptive design also makes it useful for harmonizing or imputing data across microscopy datasets collected under heterogeneous channel configurations.
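
As a rough illustration of the harmonization use case, the sketch below works out, for each dataset, which channels are available to condition on and which would have to be imputed to reach a shared channel set. The function name `imputation_plan`, the dataset names, and the channel names are hypothetical, and the sketch covers only the bookkeeping; it does not call the model.

```python
from typing import Dict, List, Set


def imputation_plan(datasets: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """For each dataset, list the channels it has and the channels it lacks.

    The target configuration is assumed to be the union of all channels
    seen across the datasets (a choice made for this sketch).
    """
    all_channels: Set[str] = set().union(*datasets.values())
    plan: Dict[str, Dict[str, List[str]]] = {}
    for name, available in datasets.items():
        plan[name] = {
            "condition_on": sorted(available),           # observed channels
            "impute": sorted(all_channels - available),  # channels to generate
        }
    return plan


if __name__ == "__main__":
    # Hypothetical datasets with different channel configurations.
    datasets = {
        "dataset_a": {"nucleus", "er"},
        "dataset_b": {"nucleus", "microtubules"},
    }
    for name, entry in imputation_plan(datasets).items():
        print(name, entry)
```

Using the union as the target is only one option; a stricter plan could target a fixed reference configuration instead.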

Impact

Vermeer extends generative foundation models into the domain of subcellular imaging, joining a growing body of work, such as proteome-aware vision models like SubCell, that learns single-cell biology directly from microscopy. By coupling protein sequence with cellular morphology in a single generative framework and demonstrating zero-shot transfer across proteins, cell lines, and imaging setups, it offers a route toward scalable, hypothesis-generating in-silico microscopy. As an early preprint, its real-world adoption is still emerging. Pretrained weights (the vermeer_XL_CA checkpoint) are already available on HuggingFace, though without a declared license or model card, while the GitHub code repository (github.com/microsoft/vermeer) is not yet public. Independent benchmarking and experimental validation of generated localizations will be important to assess how faithfully its predictions reflect ground-truth biology.

Citation

Kambhampati, S., et al. (2026) Vermeer: Autoregressive generative modeling of microscopy predicts protein localization. bioRxiv.

DOI: 10.64898/2026.06.01.729395

Citations

Total citations: 0
Influential: 0
References: 28

GitHub

Stars: 3
Forks: 2
Open issues: 0
Contributors: 4
Last push: 3d ago
Language: Jupyter Notebook
License: MIT

HuggingFace

Downloads: 0
Likes: 1
Last modified: 2mo ago

Openness

bio.rodeo openness: 17, Closed (low usability and reproducibility)
Usability (can I run it?): 14
Reproducibility (can I retrain it?): 14
Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository, Research Paper, HuggingFace Model
