A pretrained language model for 3D molecule generation that unifies de novo and fragment-based drug design within a single multi-task framework.
Structure-based drug design increasingly relies on generative models that can propose three-dimensional molecules positioned inside a target protein pocket. In practice, medicinal chemistry projects span two distinct modes of generation: de novo design, where a molecule is built from scratch within a binding site, and fragment-based design, where a known fragment, hit, or scaffold is grown, linked, or elaborated under explicit substructure constraints. Most generative methods specialize in one mode, forcing teams to maintain separate models and limiting how naturally prior chemical knowledge can be injected into a campaign.
UniLingo3DMol, developed by Beijing StoneWise Technology and released as a bioRxiv preprint in November 2025, addresses this fragmentation with a single pretrained language model for 3D molecule generation. By treating molecular generation as a sequence-modeling problem over a fragment-permutation-capable representation, the model unifies de novo and fragment-based design within one multi-task training framework. The same model can generate a complete ligand for an empty pocket or extend, link, and complete user-supplied fragments without switching architectures or retraining.
The work is presented not only as a methodological contribution but also as a prospectively validated drug discovery tool: the authors report applying the model across more than 100 protein targets and using it to discover potent inhibitors of CBL-B, an immuno-oncology target of substantial therapeutic interest, that showed anti-tumor efficacy in vivo.
UniLingo3DMol is a pretrained, transformer-based language model that generates three-dimensional molecules conditioned on a protein binding pocket. Rather than representing a molecule as a fixed atom or token order, the authors adopt a fragment-permutation-capable representation in which a molecule is decomposed into fragments that can be permuted and reassembled; this is what enables a single sequence model to support both unconstrained de novo generation and fragment-conditioned tasks such as growing, linking, or completing partial structures. Training combines these objectives in a multi-task setup, so the model learns a shared notion of pocket-aware 3D chemistry that applies across generation modes. The authors evaluate the approach across more than 100 protein targets and report its use in a live medicinal chemistry campaign. Full architectural hyperparameters, parameter counts, and training-set composition are described in the preprint, which has not yet undergone peer review.
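To make the fragment-permutation idea concrete, the toy sketch below shows how one sequence format could cover both generation modes: known fragments are moved to the front of the ordering and become the prompt, while an empty prompt corresponds to de novo generation. This is an illustration under stated assumptions, not the authors' implementation. The special tokens, the function names, and the use of plain strings as fragments are all hypothetical, and the sketch omits the 3D coordinates and pocket conditioning that the real model uses.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical special tokens; the preprint's actual vocabulary is not reproduced here.
BOS, SEP, EOS = "<bos>", "<sep>", "<eos>"


def serialize(fragments: Sequence[str]) -> List[str]:
    """Flatten an ordered list of fragments into one token sequence."""
    tokens = [BOS]
    for i, frag in enumerate(fragments):
        if i > 0:
            tokens.append(SEP)
        tokens.append(frag)
    tokens.append(EOS)
    return tokens


def orderings_with_prefix(
    fragments: Sequence[str], known: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Return every fragment ordering that starts with the known fragments."""
    rest = list(fragments)
    for frag in known:
        rest.remove(frag)  # raises ValueError if a known fragment is absent
    return [tuple(known) + perm for perm in permutations(rest)]


def split_prompt_target(
    ordering: Sequence[str], n_known: int
) -> Tuple[List[str], List[str]]:
    """Split a serialized ordering into a given prompt and a target to generate."""
    tokens = serialize(ordering)
    # The prompt holds BOS plus each known fragment and its trailing separator.
    # The cut is clamped so the target always contains at least EOS.
    cut = min(1 + 2 * n_known, len(tokens) - 1)
    return tokens[:cut], tokens[cut:]


if __name__ == "__main__":
    # Placeholder fragment strings, chosen only for illustration.
    frags = ["c1ccccc1", "C(=O)N", "C1CCNCC1"]

    # De novo: nothing is given, so the prompt is just the start token.
    print(split_prompt_target(frags, 0))

    # Fragment growing: the amide fragment is known and becomes the prompt.
    for ordering in orderings_with_prefix(frags, ["C(=O)N"]):
        print(split_prompt_target(ordering, 1))
```

Because any fragment can be moved to the front, the same next-token objective applies whether zero, one, or several fragments are supplied. This is one plausible reading of why a permutable representation lets a single model serve both modes.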
UniLingo3DMol is aimed at structure-based drug discovery, where computational chemists and AI-driven discovery teams need to generate candidate ligands that fit a known protein pocket. Its unified design is particularly useful in realistic medicinal chemistry workflows: a project can start with pure de novo ideation against a new target, then switch to fragment growing or scaffold elaboration as hits emerge, all from one model. The reported CBL-B inhibitor campaign illustrates the intended use case: generating novel, synthesizable candidates against a therapeutically relevant immuno-oncology target and advancing them to in vivo efficacy testing.
By consolidating de novo and fragment-based 3D generation into one language-model framework, UniLingo3DMol contributes to a broader trend of treating molecular design as a unified sequence-modeling problem and reducing the number of bespoke models a discovery team must maintain. Its most notable claim is prospective: a reported real-world hit-discovery success against CBL-B with in vivo validation, which, if borne out in peer review, would strengthen the case for language-model-based generative chemistry in practical pipelines. As of this writing, the model is described only in a bioRxiv preprint (CC-BY-NC-ND). No public code or trained weights have been located, and the weights appear to be proprietary. No standalone model card or data card has been published beyond the preprint itself, which limits independent reproduction and openness evaluation.
Wang, H., et al. (2025) A unified language model bridging de novo and fragment-based 3D molecule design delivers potent CBL-B inhibitors for cancer treatment. bioRxiv.
DOI: 10.1101/2025.11.13.688260