A graph-transformer foundation model integrating 3 million protein pockets and 5 million molecules as E(3)-equivariant graphs for joint protein-ligand geometric representation learning.
MolX is a graph-transformer foundation model that learns joint geometric representations of protein binding pockets and small molecules. Posted to bioRxiv in early March 2026 by researchers at Monash University and collaborating institutions, MolX is trained on a corpus of 3 million protein pockets and 5 million molecules represented as E(3)-equivariant graphs, enabling rotation- and translation-invariant geometric reasoning.
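The preprint does not release code, but the invariance claim is easy to make concrete: features built from interatomic distances are unchanged by any rotation or translation of the input coordinates. A minimal NumPy sketch (the function names and toy coordinates are illustrative, not MolX's implementation):

```python
import numpy as np

def pairwise_distances(coords):
    # Pairwise Euclidean distances: an E(3)-invariant geometric feature.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def random_rotation(rng):
    # QR decomposition of a random Gaussian matrix yields an orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))        # fix column signs for uniqueness
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1               # ensure det = +1 (a proper rotation)
    return q

rng = np.random.default_rng(0)
coords = rng.standard_normal((8, 3))            # toy "atom" coordinates
R, t = random_rotation(rng), rng.standard_normal(3)
transformed = coords @ R.T + t                  # rotate then translate
assert np.allclose(pairwise_distances(coords),
                   pairwise_distances(transformed))
```

Angle features between bonded triples are invariant for the same reason, which is why distance-and-angle graph attributes let a model reason about geometry without caring how the pocket is oriented in space.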
MolX achieves state-of-the-art results across eight downstream drug-discovery benchmarks spanning conventional structure-activity tasks (binding affinity, virtual screening), modality-specific tasks (PROTAC degradation activity, molecular glue design, antibody-drug conjugate optimization), and cross-domain transfer.
MolX uses a graph transformer with E(3)-equivariant attention layers. Inputs are heterogeneous graphs jointly representing protein pocket atoms and molecular atoms with distance-and-angle features. Pretraining objectives combine masked atom prediction with pocket-ligand contrastive matching. The bioRxiv preprint provides architectural specifications, training schedule, and full benchmark tables.
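The preprint's exact loss formulation is not reproduced here, but pocket-ligand contrastive matching objectives of this kind are typically InfoNCE-style: matched pocket/ligand embedding pairs are pulled together while in-batch mismatches serve as negatives. A minimal NumPy sketch under that assumption (the function name `pocket_ligand_nce` and the temperature `tau` are illustrative, not MolX's implementation):

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def pocket_ligand_nce(pocket_emb, ligand_emb, tau=0.1):
    """Symmetric InfoNCE over a batch of pocket/ligand embeddings.

    Row i of each array is treated as a matched pocket-ligand pair;
    every other row in the batch acts as an in-batch negative.
    """
    p = pocket_emb / np.linalg.norm(pocket_emb, axis=1, keepdims=True)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    logits = p @ l.T / tau          # cosine similarities over temperature
    idx = np.arange(len(logits))
    loss_p2l = -log_softmax(logits, axis=1)[idx, idx].mean()  # pocket -> ligand
    loss_l2p = -log_softmax(logits, axis=0)[idx, idx].mean()  # ligand -> pocket
    return 0.5 * (loss_p2l + loss_l2p)

rng = np.random.default_rng(1)
emb = rng.standard_normal((16, 32))
# Nearly identical pairs should score a far lower loss than mismatched ones.
matched_loss = pocket_ligand_nce(emb, emb + 0.01 * rng.standard_normal((16, 32)))
shuffled_loss = pocket_ligand_nce(emb, np.roll(emb, 1, axis=0))
```

Combining such a matching term with masked atom prediction gives the model both a local (per-atom chemistry) and a global (pocket-ligand compatibility) training signal.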
The eight downstream benchmarks comprise PDBbind, LIT-PCBA, PROTAC-DB, the Molecular Glue Atlas, ADC-Bench, and three internal modality-specific tasks. Across all eight, MolX outperforms prior structure-aware foundation models, including Uni-Mol, GearNet, and ESM-GearNet.
MolX is positioned as a general-purpose representation backbone for teams working in early-stage drug discovery across multiple therapeutic modalities. The PROTAC and molecular-glue capabilities are particularly valuable given the relative scarcity of foundation models for these emerging modalities, and the cross-domain generalization reduces the need for per-target retraining when the model is applied to new programs.
MolX advances the state of the art in geometric foundation models for drug discovery by demonstrating that joint protein-pocket and small-molecule representation learning can deliver state-of-the-art results across a broad cross-modality benchmark sweep. The integration of multiple emerging modalities (PROTAC, molecular glue, ADC) into a single foundation model is unusual and provides a useful counterpoint to highly specialized per-modality tools.
Liu, J., et al. (2026) MolX: A Geometric Foundation Model for Protein–Ligand Modelling. bioRxiv.
DOI: 10.64898/2026.02.26.708362