University of Pittsburgh / Carnegie Mellon University
A pretrained multi-task flow-matching model for structure-based drug design, unifying de novo design, docking, conformer generation, and pharmacophore conditioning.
OMTRA is a multi-task generative model for structure-based drug design (SBDD) developed by the gnina group in David Koes's lab at the University of Pittsburgh, with collaborators at Carnegie Mellon University. Released as a preprint in December 2025 and slated for presentation at MLSB 2025, it tackles a long-standing fragmentation in computational drug design: tasks such as de novo ligand generation, molecular docking, conformer generation, and pharmacophore-guided design are typically handled by separate, purpose-built tools. OMTRA instead casts these as a single flow-matching problem over heterogeneous molecular graphs, allowing one pretrained model to flexibly perform many SBDD tasks—including combinations with no analogue in conventional workflows.
The model is built on a multi-modal flow-matching formulation that extends the FlowMol3 architecture. It treats a protein-ligand system as a collection of distinct but dependent "modalities": ligand atom positions and types, protein atoms, pharmacophore features, and non-protein, non-designable entities such as ions and cofactors. By conditioning on different subsets of these modalities at inference time, a single set of weights addresses de novo design, docking, and conformer generation under a unified generative process.
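To make the idea of conditioning on subsets of modalities concrete, here is a minimal sketch. It is not OMTRA's code: the task names, the `TASKS` table, and the `make_model_input` helper are hypothetical, and every modality is simplified to a flat list of floats. Generated modalities are placed on a linear noise-to-data path at time `t`, while conditioned modalities are passed through unchanged.

```python
import random

# Hypothetical task table: which modalities are generated and which are
# held fixed as conditioning. Names are illustrative, not OMTRA's own.
TASKS = {
    "pocket_conditioned_denovo": {
        "generate": {"ligand_positions", "ligand_types"},
        "condition": {"protein"},
    },
    "docking": {
        "generate": {"ligand_positions"},
        "condition": {"ligand_types", "protein"},
    },
    "conformer_generation": {
        "generate": {"ligand_positions"},
        "condition": {"ligand_types"},
    },
}


def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def make_model_input(task, data, t, rng):
    """Noise the generated modalities at time t; copy conditioned ones as-is.

    Assumption: every modality is a flat list of floats. Categorical
    modalities such as atom types would need a discrete treatment instead.
    """
    spec = TASKS[task]
    model_input = {}
    for name in sorted(spec["generate"]):
        noise = [rng.gauss(0.0, 1.0) for _ in data[name]]
        model_input[name] = interpolate(noise, data[name], t)
    for name in sorted(spec["condition"]):
        model_input[name] = list(data[name])
    return model_input


# Toy system with made-up numbers, purely for illustration.
example = {
    "ligand_positions": [0.5, -1.0, 2.0],
    "ligand_types": [1.0, 0.0, 0.0],
    "protein": [3.0, 3.5, 4.0],
}
docking_input = make_model_input("docking", example, t=0.3, rng=random.Random(0))
```

In this sketch, switching tasks changes only which entries of the table are noised. The network and the data layout stay the same, which is the sense in which one set of weights can serve several tasks.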
The authors report that the benefits of large-scale pretraining and multi-task training were "modest and inconsistent," a finding that frames OMTRA as much as an investigation of transfer learning in molecular generative models as a deployable tool. The full code, trained weights, and curated dataset are released openly under Apache-2.0.
OMTRA uses an SE(3)-equivariant geometric graph neural network built from geometric vector perceptron (GVP) operations, with type-specific message and node-update functions and four stacked convolution blocks, each containing two graph convolutions plus position and edge updates.
Pretraining draws on the Pharmit dataset of roughly 500 million 3D molecular conformers aggregated from sources including ChEMBL34, Enamine, PubChem, and ZINC (one lowest-energy RDKit ETKDG/UFF conformer per molecule), while protein-ligand training uses PLINDER (400,000+ annotated PDB systems) and CrossDocked (22.5 million poses).
On CrossDocked de novo design (100 pockets × 100 samples), OMTRA reached 89.8% PoseBusters validity versus 86.6% for Pocket2Mol. On the PoseBusters docking benchmark it achieved 92% top-1 accuracy at ≤2 Å RMSD and 91% PB-validity, exceeding AlphaFold3's reported 84% PB-validity. Consistent with the authors' caveats, ligand-only pretraining and multi-task training produced small, sometimes opposing effects, and single-task models matched multi-task performance under equal compute.
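For readers unfamiliar with the docking metric, the sketch below shows one simple way a "top-1 accuracy at ≤2 Å RMSD" figure can be computed. It is an illustration, not the benchmark's evaluation code: the `rmsd` and `success_rate` helpers are hypothetical, the coordinates are invented, and the predicted and reference atoms are assumed to be listed in the same order (real evaluations typically also correct for molecular symmetry).

```python
import math


def rmsd(pred, ref):
    """Heavy-atom RMSD in angstroms between two poses with matching atom order.

    Assumption: no alignment and no symmetry correction are applied.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def success_rate(pairs, threshold=2.0):
    """Fraction of (top-1 pose, reference pose) pairs with RMSD <= threshold."""
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) <= threshold)
    return hits / len(pairs)


# Invented two-atom poses: one exact match, one shifted 3 angstroms along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
exact_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted_pose = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]

rate = success_rate([(exact_pose, reference), (shifted_pose, reference)])
```

Here the exact pose counts as a success and the shifted pose does not, so `rate` is 0.5. A benchmark-level figure is the same ratio taken over one top-ranked pose per complex.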
OMTRA is aimed at computational and medicinal chemists pursuing structure-based drug discovery, where a single model can generate candidate ligands inside a protein pocket, dock proposed molecules, generate 3D conformers, and incorporate pharmacophore hypotheses—all without switching tools. Its pharmacophore-conditioning capability is especially useful for hit-to-lead campaigns that must respect known key interactions, and an interactive web application lowers the barrier for experimentalists to explore generations. The released CLI and weights make it straightforward to integrate into existing screening and design pipelines.
OMTRA extends the gnina ecosystem's open, reproducible approach to molecular modeling into the multi-task generative regime, providing a unified alternative to the patchwork of single-purpose SBDD models. Beyond its competitive benchmark numbers on de novo design and docking, its most consequential contribution may be empirical: by openly reporting that pretraining and cross-task transfer yielded only modest, inconsistent gains, it sharpens an open question for the field about whether molecular generative models can reliably benefit from scale and shared representations. The fully open weights, code, and dataset give other groups a concrete baseline to build on and interrogate.