Diffusion model for structure-based 3D ligand design that explicitly models pocket flexibility, jointly generating ligands and holo pocket conformations from apo protein structures.
Apo2Mol is a diffusion-based generative model for structure-based drug design that designs 3D ligand molecules directly inside a protein binding pocket while explicitly accounting for the pocket's flexibility. Most pocket-conditioned generative models assume a rigid receptor, conditioning on a single fixed conformation of the binding site. In practice, proteins reorganize their side chains and backbone upon ligand binding (the transition from an unbound "apo" state to a ligand-bound "holo" state), so a generator trained on rigid holo pockets implicitly assumes the very induced-fit rearrangement it should be predicting. This mismatch limits the realism of designs when only an apo structure, such as one from an unliganded crystal or a predicted model, is available.
Apo2Mol reframes the task as a joint generation problem: starting from an apo pocket, it simultaneously generates a candidate ligand and the corresponding holo-like pocket conformation, so the receptor and the molecule are co-designed rather than the ligand being fit into a static cavity. It was developed at the Li Lab (AIDD-LiLab) at the University of Florida by Xinzhe Zheng, Shiyu Jiang, Gustavo Seabra, Chenglong Li, and Yanjun Li. The paper was posted to arXiv in November 2025 and accepted to AAAI 2026.
By learning how pockets actually deform during binding, Apo2Mol addresses a gap left by rigid-pocket methods and positions itself alongside the broader family of diffusion-based 3D molecule generators while focusing specifically on receptor flexibility.
Apo2Mol is a full-atom, hierarchical graph-based diffusion model. It learns to denoise atomic coordinates and types for both the ligand and the binding-pocket residues, producing a 3D molecule together with a deformed, holo-like pocket conditioned on the input apo pocket. To learn realistic pocket dynamics, the authors curated a dataset of over 24,000 experimentally resolved apo-holo structure pairs (24,601 paired structures in the released version) drawn from the Protein Data Bank and organized via PLINDER interaction identifiers, with pockets defined as residues within 10 Å of any ligand atom. The implementation is built on PyTorch Lightning with Hydra configuration and Weights & Biases logging. Code is released under an MIT license, and a pretrained checkpoint (apo2mol_checkpoint.ckpt) ships in the repository for sampling and evaluation. The curated dataset is available as a Hugging Face dataset with a detailed data card.
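The pocket definition above (residues within 10 Å of any ligand atom) can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function name `select_pocket_residues`, the input layout, and the toy coordinates are hypothetical, and the sketch assumes that a residue qualifies when any one of its atoms lies within the cutoff (inclusive) of any ligand atom, which the description does not spell out.

```python
from math import dist

CUTOFF_ANGSTROM = 10.0  # pocket radius stated in the dataset description


def select_pocket_residues(residues, ligand_coords, cutoff=CUTOFF_ANGSTROM):
    """Return ids of residues with at least one atom within `cutoff` of a ligand atom.

    residues: dict mapping a residue id to a list of (x, y, z) atom coordinates.
    ligand_coords: list of (x, y, z) ligand atom coordinates.
    The any-atom, inclusive criterion is an assumption made for this sketch.
    """
    pocket = []
    for res_id, atoms in residues.items():
        # Keep the residue as soon as one of its atoms is close to one ligand atom.
        if any(dist(a, l) <= cutoff for a in atoms for l in ligand_coords):
            pocket.append(res_id)
    return pocket


# Toy input: hypothetical coordinates, not taken from any real structure.
toy_residues = {
    "A:10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A:11": [(9.0, 0.0, 0.0)],
    "A:50": [(30.0, 0.0, 0.0)],
}
toy_ligand = [(2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]

pocket_ids = select_pocket_residues(toy_residues, toy_ligand)
print(pocket_ids)
```

In this toy case the residue 28 Å away from the nearest ligand atom is excluded and the other two are kept. For real structures, a spatial index such as a k-d tree would avoid the all-pairs distance loop.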
Apo2Mol targets structure-based drug discovery teams who need to design novel small-molecule binders when only an apo or computationally predicted receptor structure is available, a frequent situation for newer or less-characterized targets. By co-designing the ligand with an induced-fit pocket conformation, it aims to produce candidates that are more compatible with the receptor's actual binding-competent shape, supporting hit generation and de novo design workflows ahead of docking, scoring, and experimental validation.
Apo2Mol contributes to the active area of 3D pocket-conditioned molecule generation by foregrounding receptor flexibility, an aspect that earlier rigid-pocket structure-based generators largely ignore. Its acceptance at AAAI 2026 and its release of code, a pretrained checkpoint, and a documented apo-holo dataset lower the barrier for others to study induced-fit generation. Because the work is recent, its real-world hit rates and its advantages over rigid-pocket baselines remain to be established as the community benchmarks and extends the approach. The explicit apo-holo framing and the curated paired dataset are likely to be of independent value for modeling protein flexibility.
Zheng, X., et al. (2025) Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models. arXiv.org.
DOI: 10.48550/arXiv.2511.14559