bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Small molecule foundation models

OMTRA

University of Pittsburgh / Carnegie Mellon University

A pretrained multi-task flow-matching model for structure-based drug design, unifying de novo design, docking, conformer generation, and pharmacophore conditioning.

Released: December 2025

OMTRA is a multi-task generative model for structure-based drug design (SBDD) developed by the gnina group in David Koes's lab at the University of Pittsburgh, with collaborators at Carnegie Mellon University. Released as a preprint in December 2025 and slated for presentation at MLSB 2025, it tackles a long-standing fragmentation in computational drug design: tasks such as de novo ligand generation, molecular docking, conformer generation, and pharmacophore-guided design are typically handled by separate, purpose-built tools. OMTRA instead casts these as a single flow-matching problem over heterogeneous molecular graphs, allowing one pretrained model to flexibly perform many SBDD tasks—including combinations with no analogue in conventional workflows.

The model is built on a multi-modal flow-matching formulation that extends the FlowMol3 architecture. It treats a protein-ligand system as a collection of distinct but interdependent "modalities": ligand atom positions and types, protein atoms, pharmacophore features, and non-protein, non-designable entities such as ions and cofactors. At inference time the user chooses which modalities to generate and which to condition on. This lets a single set of weights address de novo design, docking, and conformer generation under one generative process.
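The idea of conditioning on different subsets of modalities can be made concrete with one training-time corruption routine that serves several tasks. The following is a minimal sketch under stated assumptions, not OMTRA's code. The task names, the modality keys, the Gaussian prior with a linear interpolant for positions, and the masking path for atom types are all assumptions, chosen because they are common flow-matching choices. Any modality a task does not generate is simply left at its clean value.

```python
import numpy as np

# Hypothetical task table: which ligand modalities each task generates.
# Anything not generated (including the protein) is treated as a clean,
# fixed condition. Names are illustrative, not OMTRA's configuration keys.
TASKS = {
    "denovo_pocket": {"lig_pos", "lig_type"},  # protein given
    "docking": {"lig_pos"},                    # protein + ligand identity given
    "conformer": {"lig_pos"},                  # ligand identity given, no protein
}
MASK = -1  # hypothetical "unknown type" token for the discrete path


def corrupt(x1_pos, x1_type, t, task, rng):
    """Return noisy positions/types at time t in [0, 1] and the velocity target.

    Positions: linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I),
    so the regression target for the velocity field is x1 - x0.
    Types: masking path, where each atom shows its true type with probability t.
    Modalities the task does not generate stay at their clean values.
    """
    gen = TASKS[task]
    pos_t, vel_target = x1_pos.copy(), None
    if "lig_pos" in gen:
        x0 = rng.standard_normal(x1_pos.shape)
        pos_t = (1.0 - t) * x0 + t * x1_pos
        vel_target = x1_pos - x0
    type_t = x1_type.copy()
    if "lig_type" in gen:
        reveal = rng.random(x1_type.shape) < t
        type_t = np.where(reveal, x1_type, MASK)
    return pos_t, type_t, vel_target


rng = np.random.default_rng(0)
pos = rng.standard_normal((5, 3))
types = np.array([0, 1, 2, 1, 0])
_, t_dock, _ = corrupt(pos, types, 0.3, "docking", rng)
print(np.array_equal(t_dock, types))  # True: docking keeps atom types fixed
```

In this framing, switching from de novo design to docking changes only which entries are noised. The network and its weights stay the same.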

Notably, the authors are candid that the benefits of large-scale pretraining and multi-task training were "modest and inconsistent". This refreshingly honest finding makes OMTRA as much a rigorous investigation of transfer learning in molecular generative models as a deployable tool. The full code, trained weights, and curated dataset are released openly under Apache-2.0.

#Key Features

  • Unified multi-task design: A single flow-matching model handles pocket-conditioned de novo design, rigid docking, conformer generation, and pharmacophore-conditioned variants, plus task combinations not available in traditional pipelines.
  • Multi-modal heterogeneous graphs: The system represents complex objects as dependent modalities (ligand, protein, pharmacophore, ions/cofactors), with both continuous (positions) and discrete (atom types) variables handled jointly.
  • Pharmacophore conditioning: Supplying pharmacophore constraints markedly improves results. The docking success rate (RMSD ≤2Å) rose from 94.8% to 99.0%, and conditioning on three pharmacophores reduced de novo sampling requirements by roughly 56%.
  • Open release: Downloadable pretrained weights, a multi-task command-line interface, and the training dataset are all distributed under an Apache-2.0 license.
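The docking figures above are success rates: the share of targets whose predicted pose lies within 2 Å RMSD of the reference pose. This is typically measured on the top-ranked pose. The following is a minimal sketch of that bookkeeping, not the benchmark's own code. It assumes poses are already in the receptor frame with matched atom ordering. Real evaluations also correct for symmetry-equivalent atoms, which this sketch omits.

```python
import numpy as np


def pose_rmsd(pred, ref):
    """RMSD in Å between two (N, 3) poses with identical atom ordering.

    No superposition: docked poses are compared in the receptor frame.
    Symmetry-equivalent atom mappings are ignored in this sketch.
    """
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))


def success_rate(top_poses, ref_poses, threshold=2.0):
    """Fraction of targets whose top-ranked pose is within `threshold` Å."""
    hits = [pose_rmsd(p, r) <= threshold for p, r in zip(top_poses, ref_poses)]
    return sum(hits) / len(hits)


# Toy example: two targets, one pose shifted 1 Å and one shifted 3 Å.
ref = np.zeros((4, 3))
near, far = ref + [1.0, 0.0, 0.0], ref + [3.0, 0.0, 0.0]
print(success_rate([near, far], [ref, ref]))  # 0.5
```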

#Technical Details

OMTRA uses an SE(3)-equivariant geometric graph neural network built from geometric vector perceptron (GVP) operations. It has type-specific message and node-update functions and four stacked convolution blocks, each containing two graph convolutions plus position and edge updates.

Pretraining draws on the Pharmit dataset of roughly 500 million 3D molecular conformers aggregated from sources including ChEMBL34, Enamine, PubChem, and ZINC (one lowest-energy RDKit ETKDG/UFF conformer per molecule). Protein-ligand training uses PLINDER (400,000+ annotated PDB systems) and CrossDocked (22.5 million poses).

On CrossDocked de novo design (100 pockets × 100 samples), OMTRA reached 89.8% PoseBusters validity versus 86.6% for Pocket2Mol. On the PoseBusters docking benchmark it achieved 92% top-1 accuracy at ≤2Å RMSD and 91% PB-validity, exceeding AlphaFold3's reported 84% PB-validity. Consistent with the authors' caveats, ligand-only pretraining and multi-task training produced small, sometimes opposing effects, and single-task models matched multi-task performance under equal compute.
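GVP operations are the basic building block of the network described above. As a rough illustration, here is a single GVP in NumPy following the published formulation of Jing et al. (2021), not OMTRA's implementation. Several choices are simplifying assumptions: the weight shapes, the ReLU and sigmoid activations, and the absence of message passing. The check at the end shows the property that matters. Rotating the input vectors rotates the output vectors and leaves the output scalars unchanged.

```python
import numpy as np


def gvp(s, V, Wh, Wu, Wm, b):
    """One geometric vector perceptron layer (after Jing et al., 2021).

    s: (n,) scalar features; V: (nu, 3) vector features, one 3D vector per row.
    Wh: (h, nu), Wu: (mu, h), Wm: (m, h + n), b: (m,).
    Returns updated scalars (m,) and vectors (mu, 3).
    """
    Vh = Wh @ V                           # mix vector channels linearly (h, 3)
    Vu = Wu @ Vh                          # output vector channels (mu, 3)
    sh = np.linalg.norm(Vh, axis=1)       # rotation-invariant norms (h,)
    s_out = np.maximum(Wm @ np.concatenate([sh, s]) + b, 0.0)  # ReLU on scalars
    gate = 1.0 / (1.0 + np.exp(-np.linalg.norm(Vu, axis=1)))   # sigmoid of norms
    V_out = gate[:, None] * Vu            # rescale vectors, keep their directions
    return s_out, V_out


def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(R)))  # make the factorization unique
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]                # flip one axis to remove a reflection
    return Q


# Equivariance check: rotate the input vectors and compare outputs.
rng = np.random.default_rng(0)
n, nu, h, mu, m = 4, 3, 5, 2, 6
s, V = rng.standard_normal(n), rng.standard_normal((nu, 3))
Wh, Wu = rng.standard_normal((h, nu)), rng.standard_normal((mu, h))
Wm, b = rng.standard_normal((m, h + n)), rng.standard_normal(m)
Rot = random_rotation(rng)
s1, V1 = gvp(s, V, Wh, Wu, Wm, b)
s2, V2 = gvp(s, V @ Rot.T, Wh, Wu, Wm, b)
print(np.allclose(s1, s2), np.allclose(V1 @ Rot.T, V2))  # True True
```

A GVP never sees absolute coordinates. In a full message-passing network, edge vectors are typically relative displacements between atoms, and that is where translation invariance comes from. This sketch leaves that part out.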

#Applications

OMTRA is aimed at computational and medicinal chemists pursuing structure-based drug discovery. A single model can generate candidate ligands inside a protein pocket, dock proposed molecules, generate 3D conformers, and incorporate pharmacophore hypotheses, all without switching tools. Its pharmacophore-conditioning capability is especially useful for hit-to-lead campaigns that must respect known key interactions. An interactive web application lowers the barrier for experimentalists to explore generated molecules, and the released CLI and weights make it straightforward to integrate into existing screening and design pipelines.
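Whether a generated ligand respects known key interactions can be checked after generation. The following sketch scores how many pharmacophore points a ligand satisfies. It assumes each point has a feature type and a 3D position, and it counts a point as satisfied when a ligand feature of the same type lies within a distance tolerance. Several details are illustrative assumptions, not OMTRA's or Pharmit's definitions: the feature labels, the 1 Å default tolerance, and the non-exclusive matching.

```python
import numpy as np


def pharmacophore_coverage(pharmacophore, ligand_features, tol=1.0):
    """Fraction of pharmacophore points satisfied by a generated ligand.

    Both arguments are lists of (feature_type, (x, y, z)) pairs, positions in Å.
    A point counts as satisfied if any ligand feature of the same type lies
    within `tol` Å; one ligand feature may satisfy several points here.
    """
    if not pharmacophore:
        return 1.0
    satisfied = 0
    for ptype, pxyz in pharmacophore:
        if any(ltype == ptype and np.linalg.norm(np.subtract(pxyz, lxyz)) <= tol
               for ltype, lxyz in ligand_features):
            satisfied += 1
    return satisfied / len(pharmacophore)


# Hypothetical feature labels and coordinates, for illustration only.
pharm = [("HydrogenDonor", (0.0, 0.0, 0.0)), ("Aromatic", (5.0, 0.0, 0.0))]
ligand = [("HydrogenDonor", (0.5, 0.0, 0.0)), ("Aromatic", (7.0, 0.0, 0.0))]
print(pharmacophore_coverage(pharm, ligand))  # 0.5
```

A stricter variant would require a one-to-one assignment between pharmacophore points and ligand features. The greedy check above keeps the sketch short.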

#Impact

OMTRA extends the gnina ecosystem's open, reproducible approach to molecular modeling into the multi-task generative regime, providing a unified alternative to the patchwork of single-purpose SBDD models. Beyond its competitive benchmark numbers on de novo design and docking, its most consequential contribution may be empirical: by openly reporting that pretraining and cross-task transfer yielded only modest, inconsistent gains, it sharpens an open question for the field about whether molecular generative models can reliably benefit from scale and shared representations. The fully open weights, code, and dataset give other groups a concrete baseline to build on and interrogate.

Citation

Preprint

DOI: 10.48550/arXiv.2512.05080


Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 72 (Open)
Usability (can I run it?): 93
Reproducibility (can I retrain it?): 64
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

conformer_generation, de_novo_design, drug_discovery, flow_matching, foundation_model, generative, graph_neural_network, molecular_docking, multi_task, protein_ligand_interactions, small_molecules

Resources

GitHub Repository · Research Paper · Demo