Yale University / Pennsylvania State University / Helmholtz Munich
A CLIP-style multimodal framework that aligns transcriptomic perturbation signatures with text and cell-painting image embeddings for retrieval, drug-gene inference, and perturbation prediction.
PertOmni is a multimodal representation learning framework for single-cell perturbation screens, designed to characterize how genetic and chemical interventions reshape cellular state. Multimodal perturbation screens have made it possible to measure the effects of thousands of gene knockdowns and compound treatments at single-cell resolution, but most representation learning methods are tailored to a single perturbation modality and do not incorporate external semantic knowledge. This narrow framing limits their ability to generalize across datasets and across genetic versus chemical perturbation types. PertOmni addresses this gap by learning a shared embedding space in which perturbation signatures, natural-language descriptions, and microscopy images can be compared directly.
The method follows a CLIP-style contrastive recipe, which the authors term Contrastive Alignment of Multimodal Biological Embeddings (CAMBE). It aligns transcriptomic perturbation signatures with text-derived embeddings of curated gene and compound descriptions and with image-derived embeddings from Cell Painting assays. By grounding transcriptomic responses in semantic and morphological context, PertOmni connects the readout of a perturbation to prior biological knowledge about the gene or molecule that produced it.
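To make the CLIP-style recipe concrete, here is a minimal sketch of a standard symmetric InfoNCE (CLIP) loss written in plain Python for readability. It illustrates the general technique only. PertOmni's code has not been released, so this is not the authors' implementation. The function names, the temperature of 0.07 (the initial value used in the original CLIP work), and the choice in `multimodal_loss` to sum a transcriptome-text term and a transcriptome-image term are all assumptions.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def clip_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `a` is the positive partner of row i of `b`."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    n = len(a)
    # logits[i][j]: cosine similarity of a[i] and b[j], divided by the temperature
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]
    # a -> b: cross-entropy of each row against its diagonal (matched) entry
    loss_ab = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # b -> a: cross-entropy of each column against its diagonal (matched) entry
    loss_ba = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j]
                  for j in range(n)) / n
    return (loss_ab + loss_ba) / 2

def multimodal_loss(expr, text, image, temperature=0.07):
    """Hypothetical combination: the transcriptomic embedding is aligned
    separately to text and to image embeddings, and the two losses are summed."""
    return clip_loss(expr, text, temperature) + clip_loss(expr, image, temperature)
```

Matched pairs that already point in the same direction give a loss near zero. Swapping the partners drives the loss up, and that gap is the signal that pulls each perturbation signature toward its own description and image.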
PertOmni was developed by researchers at Yale University, the Pennsylvania State University, and Helmholtz Munich, and was posted as a bioRxiv preprint in June 2026; it has not yet been peer reviewed.
PertOmni jointly trains a shared transcriptomic encoder and dataset-specific text encoders using a masked contrastive objective in the CLIP family. The masking and contrastive design emphasize discrimination within a given cell type, which reduces the tendency of models to separate cells by baseline cell-type identity rather than by perturbation effect. Text embeddings are derived from curated gene and compound descriptions, and image embeddings from Cell Painting assays, so each transcriptomic signature is anchored to complementary semantic and morphological representations. The trained joint embedding space is evaluated on three downstream tasks (bidirectional retrieval, drug-gene interaction inference, and perturbation prediction) across both small-molecule and CRISPRi perturbation datasets, where it shows consistent improvements over strong baseline methods.
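The exact masking scheme is not specified in this summary. The sketch below shows only one plausible reading of "within-cell-type discrimination": when a transcriptomic signature is scored against text descriptions, candidates from other cell types are masked out of the softmax, so every negative shares the anchor's cell type. The function `masked_clip_loss`, its `cell_types` argument, and this particular masking rule are hypothetical.

```python
import math

def _normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def masked_clip_loss(expr, text, cell_types, temperature=0.07):
    """Symmetric contrastive loss whose negatives are restricted to samples
    sharing the anchor's cell type (one hypothetical form of masking)."""
    expr = [_normalize(v) for v in expr]
    text = [_normalize(v) for v in text]
    n = len(expr)
    logits = [[sum(x * y for x, y in zip(expr[i], text[j])) / temperature
               for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        # Mask: keep only candidates whose cell type matches sample i.
        same = [j for j in range(n) if cell_types[j] == cell_types[i]]
        # expression -> text: row i, scored only over same-cell-type texts
        total += _logsumexp([logits[i][j] for j in same]) - logits[i][i]
        # text -> expression: column i, scored only over same-cell-type profiles
        total += _logsumexp([logits[j][i] for j in same]) - logits[i][i]
    return total / (2 * n)
```

If every sample gets the same label, the function reduces to the ordinary unmasked CLIP loss. Masking can only remove negatives, so the masked loss never exceeds the unmasked one. Because cross-cell-type pairs no longer enter the softmax, pushing cell types apart does not lower the loss.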
PertOmni targets computational biologists and pharmacology researchers working with large-scale perturbation screens. Because all downstream tasks operate on the fixed trained embedding space, the model can be applied to new analyses without re-training: retrieving the most likely gene or compound description for an observed transcriptomic response, inferring drug-gene interactions to support target identification and mechanism-of-action studies, and predicting the effect of unseen perturbations. Grounding perturbation signatures in curated text and imaging makes the embeddings useful for connecting high-throughput screen readouts to prior biological knowledge during early-stage drug discovery and functional genomics.
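Because the embedding space is fixed after training, retrieval reduces to nearest-neighbor search. The following is a minimal sketch. It assumes embeddings have already been produced by trained encoders and that cosine similarity is the scoring rule, which is an assumption since the summary does not state one. `retrieve`, the labels, and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def retrieve(query, candidates, k=3):
    """Return the labels of the k candidate embeddings most similar to `query`.

    `candidates` maps a label (e.g. a gene or compound description ID) to its
    embedding in the shared space.
    """
    ranked = sorted(candidates.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]

# Toy 3-d embeddings standing in for text-encoder outputs (hypothetical values).
text_bank = {
    "gene_A": [0.9, 0.1, 0.0],
    "compound_B": [0.5, 0.5, 0.0],
    "gene_C": [0.0, 0.1, 0.9],
}
signature = [1.0, 0.0, 0.0]  # an embedded transcriptomic response
top_hits = retrieve(signature, text_bank, k=2)  # ["gene_A", "compound_B"]
```

The same call can run in the other direction. With a compound-description embedding as the query and a bank of gene-description embeddings as the candidates, it returns a simple similarity ranking of candidate genes. Whether PertOmni infers drug-gene interactions this way is an assumption.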
PertOmni contributes to a growing effort to build general representations of cellular perturbation that bridge multiple data modalities, extending CLIP-style contrastive alignment from vision-language settings into single-cell biology. Its emphasis on within-cell-type discrimination directly addresses a recurring confound in perturbation modeling, where cell-type heterogeneity can dominate the learned representation. As a preprint, its results have not yet been peer-reviewed, and no public code or model weights have been released. The reported gains over strong baselines across retrieval, drug-gene inference, and perturbation prediction tasks position it as a step toward unified, knowledge-grounded models of genetic and chemical perturbation effects.