A multimodal generative model that designs molecules from transcriptomic and morphological perturbation phenotypes using rectified flow transformers.
A central goal of phenotypic drug discovery is to invert the usual screening logic: instead of asking what a given molecule does to cells, ask what molecule would produce a desired cellular phenotype. High-content perturbation screens now routinely capture two complementary readouts of how a compound reshapes a cell: transcriptomic profiles (which genes change) and morphological profiles (how cells look under imaging, e.g. Cell Painting). Designing molecules conditioned jointly on both readouts is a natural but underexplored generative problem.
Pert2Mol, introduced by Verma and colleagues at Purdue University in a February 2026 bioRxiv preprint, is a multimodal generative framework that maps perturbation phenotypes to molecular structures. It learns from paired control-treatment experiments, fusing transcriptomic and morphological signals with a bidirectional cross-attention mechanism and generating candidate molecules with a rectified flow transformer. The design lets the model condition on either or both phenotypic modalities and emit chemically valid structures that are intended to reproduce the queried phenotype.
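As a rough illustration of what bidirectional cross-attention fusion means, the sketch below lets each modality attend to the other and then pools the two updated streams into a single conditioning vector. It is a toy under stated assumptions, not the authors' implementation: the function names (`attend`, `bidirectional_fuse`), the identity query/key/value projections, the residual add, and the mean-pool-and-concatenate step are all hypothetical simplifications, and the token embeddings are made-up numbers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    """Scaled dot-product attention with identity projections (toy assumption).

    Each query token becomes a weighted average of the other modality's tokens.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def bidirectional_fuse(rna_tokens, img_tokens):
    """Fuse two modalities: RNA attends to imaging, and imaging attends to RNA."""
    rna_ctx = attend(rna_tokens, img_tokens)   # direction 1: RNA queries imaging
    img_ctx = attend(img_tokens, rna_tokens)   # direction 2: imaging queries RNA
    # Residual add keeps each modality's own signal (hypothetical design choice).
    rna_out = [[a + b for a, b in zip(t, c)] for t, c in zip(rna_tokens, rna_ctx)]
    img_out = [[a + b for a, b in zip(t, c)] for t, c in zip(img_tokens, img_ctx)]
    # Pool each stream and concatenate into one conditioning vector.
    return mean_pool(rna_out) + mean_pool(img_out)

# Made-up embeddings: 3 transcriptomic tokens and 2 morphological tokens, width 2.
rna = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
img = [[0.5, -0.5], [2.0, 0.0]]
fused = bidirectional_fuse(rna, img)
print(len(fused))
```

The two attention calls are what make the fusion bidirectional: neither modality is treated only as context for the other. A real model would add learned projections, multiple heads, and normalization, none of which are specified here.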
By treating phenotype-to-molecule generation as a conditional flow-based problem, Pert2Mol aims to make phenotype-anchored molecular design both fast and reliable, positioning it as a tool for hypothesis generation in target-agnostic discovery campaigns.
Pert2Mol couples a bidirectional cross-attention module, which fuses transcriptomic and morphological perturbation profiles, with a rectified flow transformer that generates molecular structures conditioned on the fused phenotype. Training uses paired control-treatment experiments so the model learns the differential phenotype induced by a perturbation. The authors report a Fréchet ChemNet Distance of 4.996 (lower is better), 100% molecular validity, and 84.7% scaffold diversity, alongside roughly 12.4x faster generation than a diffusion baseline and support for deterministic sampling to aid validation. Code is available on GitHub (wangmengbo/Pert2Mol); the preprint indicates trained weights are forthcoming rather than already released, so reported numbers should be read as preprint-stage results pending peer review.
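The speed and determinism claims are consistent with how rectified flow sampling works in general: generation integrates an ordinary differential equation dx/dt = v(x, t, c) from noise at t = 0 to a sample at t = 1, and training pushes trajectories toward straight lines so that a few integration steps can suffice. The following is a minimal sketch in a toy latent space, not the Pert2Mol code. `toy_velocity` is a hypothetical closed-form stand-in for the learned transformer, and `condition` doubles as the target point here, which a real phenotype embedding would not.

```python
import random

def interpolate(x0, x1, t):
    """Training-time point on the straight path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity network along that path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def toy_velocity(x, t, condition):
    """Hypothetical stand-in for a learned v_theta(x, t, c).

    It points straight at `condition`, scaled so the path is a straight line
    that arrives exactly at t = 1. Valid for t < 1 only.
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, condition)]

def sample(velocity_fn, condition, num_steps, seed):
    """Deterministic Euler integration from seeded Gaussian noise (t=0) to t=1."""
    rng = random.Random(seed)                  # fixed seed -> fixed starting noise
    x = [rng.gauss(0.0, 1.0) for _ in condition]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                             # t stays strictly below 1
        v = velocity_fn(x, t, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

target = [0.3, -1.2, 2.0]                      # made-up latent target
few = sample(toy_velocity, target, num_steps=1, seed=0)
many = sample(toy_velocity, target, num_steps=10, seed=0)
print(few, many)
```

Because the update has no random term after the initial noise draw, rerunning with the same seed reproduces the same output, which is the sense in which such sampling is deterministic. In this idealized straight-line case one Euler step lands at the same point as ten; a learned velocity field is only approximately straight, so the step count needed in practice is an empirical question.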
Pert2Mol targets computational and phenotypic drug-discovery teams that have perturbation screening data (transcriptomic profiles, Cell Painting-style morphological features, or both) and want to propose small molecules predicted to elicit a target phenotype. This supports target-agnostic discovery, where the desired biological effect is known but a clean molecular target is not, as well as exploration around hit compounds by querying the model with the phenotype of a desirable perturbation. The ability to condition on either modality alone makes it usable in settings where only one assay type is available.
Pert2Mol extends generative molecular design into the multimodal phenotype space, joining a small but growing set of models that close the loop between high-content perturbation screens and de novo chemistry. Its use of rectified flow for fast, deterministic generation and its joint handling of transcriptomic and morphological signals are notable methodological choices. As a February 2026 preprint with code available but trained weights not yet released, its reported metrics await peer review and independent benchmarking, and prospective experimental validation of generated molecules will ultimately determine how well phenotype-conditioned generation translates to real biological activity.