bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Imaging foundation models
Imaging · Protein

Vermeer

Microsoft Research / Broad Institute / Harvard University

Channel-adaptive autoregressive generative model that synthesizes fluorescence microscopy images of protein subcellular localization in silico from an amino-acid sequence and cellular landmark stains.

Released: June 2026

Fluorescence microscopy reveals where proteins localize inside cells, but imaging every human protein across every relevant cell line, condition, and morphological context is experimentally infeasible. Vermeer addresses this gap by generating microscopy images in silico: given a protein's amino-acid sequence and a set of landmark stains describing a cell's morphology, it predicts what the protein's fluorescence channel would look like in that specific cellular context. This reframes subcellular localization as a conditional image-generation problem rather than a discrete classification task, capturing the continuous, cell-to-cell variation that real imaging exhibits.

Developed by researchers at Microsoft Research, the Broad Institute, and Harvard University—including Sandeep Kambhampati, Eric Zimmermann, Emre B. Hayir, Kevin K. Yang, Fei Chen, and Alex X. Lu—Vermeer was released as a preprint in June 2026. It is a channel-adaptive autoregressive generative foundation model trained on the Human Protein Atlas (HPA), the largest public resource of immunofluorescence images annotated with subcellular localization.

What distinguishes Vermeer is its ability to generalize beyond its training distribution. By conditioning on protein sequence, it transfers zero-shot to proteins never seen during training; by conditioning on morphology landmarks, it transfers to unseen cell lines; and by modeling image channels autoregressively, it adapts to imaging setups whose channel subsets and orderings differ from those used in training.

#Key Features

  • Sequence-conditioned generation: Vermeer ties protein localization to amino-acid sequence, allowing it to predict plausible localization images for proteins absent from its training set.
  • Morphology-aware conditioning: Generations are conditioned on landmark reference stains (e.g., nucleus, microtubules, endoplasmic reticulum) so predicted localization respects the actual morphology of each target cell.
  • Channel-adaptive autoregression: By generating channels sequentially, the model supports varying channel subsets and orderings, enabling flexible use with imaging configurations unlike those in training.
  • Zero-shot transfer: The model extends to unseen proteins, unseen cell lines, and different imaging-channel configurations without retraining.
  • Improved fidelity: The authors report substantially better perceptual quality and biological fidelity than prior generative approaches to protein-localization imaging.
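
To make the channel-adaptive idea in the list above concrete, the following is a minimal sketch of how landmark channels could be flattened into one conditioning sequence regardless of which channels are present or in what order. It is not Vermeer's implementation: the channel names, the marker-token scheme, the `quantize` toy tokenizer, and the `build_context` helper are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical channel vocabulary; the real model's channel set and ids are not given here.
CHANNEL_IDS = {"nucleus": 0, "microtubules": 1, "er": 2, "protein": 3}


def quantize(image, levels=16):
    """Toy tokenizer: bin intensities in [0, 1] into integers (stand-in for a learned tokenizer)."""
    clipped = np.clip(image, 0.0, 1.0)
    return np.minimum((clipped * levels).astype(np.int64), levels - 1).ravel()


def build_context(landmarks, order, levels=16):
    """Flatten the chosen landmark channels, in the given order, into one token sequence.

    Each channel contributes a channel-marker token followed by its image tokens,
    so different subsets and orderings change the sequence contents, not its format.
    """
    tokens = []
    for name in order:
        marker = levels + CHANNEL_IDS[name]  # marker ids sit above the image-token range
        tokens.append(np.array([marker], dtype=np.int64))
        tokens.append(quantize(landmarks[name], levels))
    # The final marker signals that protein-channel tokens would be generated next.
    tokens.append(np.array([levels + CHANNEL_IDS["protein"]], dtype=np.int64))
    return np.concatenate(tokens)


rng = np.random.default_rng(0)
landmarks = {name: rng.random((4, 4)) for name in ("nucleus", "microtubules", "er")}
full = build_context(landmarks, ["nucleus", "microtubules", "er"])
partial = build_context(landmarks, ["er", "nucleus"])
print(full.shape, partial.shape)  # (52,) (35,)
```

In this sketch a two-channel setup in a different order simply yields a shorter context with different marker tokens, which is the property an autoregressive model would need in order to accept imaging configurations unlike those seen in training.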

#Technical Details

Vermeer is an autoregressive generative model that factorizes a multi-channel microscopy image into a sequence of channels and tokens, generating the protein-of-interest channel conditioned on amino-acid sequence embeddings and on landmark stain channels that encode cellular morphology. It is trained on the Human Protein Atlas, whose immunofluorescence images pair a protein-of-interest channel with fixed reference markers—DAPI for the nucleus, β-tubulin for microtubules, and calreticulin for the endoplasmic reticulum. Because channels are modeled autoregressively rather than jointly at a fixed configuration, the model can condition on any available subset of landmark channels in arbitrary order, which is what underpins its zero-shot transfer to new imaging protocols. The authors evaluate generations on perceptual-quality and biological-fidelity metrics, reporting improvements over previous generative baselines for protein-localization prediction.
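
One way to write down the factorization described above is sketched below. The notation is an assumption introduced here for illustration and is not taken from the preprint.

```latex
% Assumed notation (not from the preprint):
%   s        : embedding of the amino-acid sequence
%   c_{1:k}  : tokens of the k available landmark channels, in the order supplied
%   z_{1:T}  : tokens of the protein-of-interest channel
\[
  p\left(z_{1:T} \mid s, c_{1:k}\right)
  = \prod_{t=1}^{T} p\left(z_t \mid z_{<t},\, s,\, c_{1:k}\right)
\]
```

Under this reading, the number and order of landmark channels enter only through the conditioning context, so the same expression covers any subset or ordering of landmarks.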

#Applications

Vermeer is aimed at cell biologists, microscopists, and computational researchers studying protein subcellular localization at scale. Because it can synthesize localization images for proteins and cell lines that have not been imaged, it can serve as an in-silico screening tool—prioritizing proteins or conditions for wet-lab imaging, hypothesizing localization for understudied proteins, and augmenting datasets for downstream localization classifiers. Its channel-adaptive design also makes it useful for harmonizing or imputing data across microscopy datasets collected under heterogeneous channel configurations.

#Impact

Vermeer extends generative foundation models into the domain of subcellular imaging, joining a growing body of work that learns single-cell biology directly from microscopy, such as the proteome-aware vision model SubCell. By coupling protein sequence with cellular morphology in a single generative framework and demonstrating zero-shot transfer across proteins, cell lines, and imaging setups, it offers a route toward scalable, hypothesis-generating in-silico microscopy. Because the work is an early preprint, real-world adoption is still emerging: pretrained weights (the vermeer_XL_CA checkpoint) are already available on HuggingFace, though without a declared license or model card, while the GitHub code repository (github.com/microsoft/vermeer) is not yet public. Independent benchmarking and experimental validation of generated localizations will be important to assess how faithfully its predictions reflect ground-truth biology.
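
As a starting point for the kind of independent check mentioned above, a generated protein channel can be compared pixel-wise with a measured one. The sketch below uses Pearson correlation as a simple stand-in; it is not one of the metrics the authors report, and the function name and synthetic images are assumptions for illustration.

```python
import numpy as np


def pearson_image_correlation(generated, measured):
    """Pearson correlation between two single-channel images with equal pixel counts."""
    g = np.asarray(generated, dtype=np.float64).ravel()
    m = np.asarray(measured, dtype=np.float64).ravel()
    if g.shape != m.shape:
        raise ValueError("images must have the same number of pixels")
    g = g - g.mean()
    m = m - m.mean()
    denom = np.sqrt((g * g).sum() * (m * m).sum())
    # A constant image has no variance, so the correlation is undefined.
    return float("nan") if denom == 0 else float((g * m).sum() / denom)


# Synthetic stand-ins for a measured image and a noisy prediction of it.
rng = np.random.default_rng(0)
measured = rng.random((64, 64))
generated = measured + 0.1 * rng.standard_normal((64, 64))
score = pearson_image_correlation(generated, measured)
print(f"pixel-wise Pearson r = {score:.3f}")
```

Pixel-wise correlation is sensitive to small spatial shifts, so a real benchmark would likely pair it with localization-aware measures, such as agreement between a classifier's predicted compartments on generated and measured images.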

Citation

Kambhampati, S., et al. (2026) Vermeer: Autoregressive generative modeling of microscopy predicts protein localization. bioRxiv.

DOI: 10.64898/2026.06.01.729395

Citations

Total Citations: 0
Influential: 0
References: 28

GitHub

Stars: 1
Forks: 0
Open Issues: 0
Contributors: 3
Last Push: 6d ago
Language: Jupyter Notebook
License: MIT

HuggingFace

Downloads: 0
Likes: 0
Last Modified: 15d ago

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 17 (Closed)
Usability (can I run it?): 14
Reproducibility (can I retrain it?): 14
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

autoregressive, cell_biology, fluorescence_microscopy, foundation_model, generative, image_synthesis, multimodal, subcellular_localization, transformer, zero_shot

Resources

GitHub Repository · Research Paper · HuggingFace Model