A synthesis-constrained generative model that designs synthesizable PROTAC degraders by sampling reaction templates and building blocks, tuned with reinforcement learning.
Proteolysis-targeting chimeras (PROTACs) are bifunctional molecules that recruit an E3 ubiquitin ligase to a protein of interest, tagging the target for degradation by the cell's own machinery. Each PROTAC combines a warhead that binds the target protein, an E3 ligand that engages the ligase, and a linker that joins the two. This modular architecture makes PROTACs an attractive therapeutic modality, but the resulting molecules are unusually large and flexible, which makes them difficult to design. Deep generative models can propose novel structures rapidly, yet the molecules they invent are frequently impractical or impossible to synthesize.
SynPROTAC, developed by researchers at the State Key Laboratory of Anti-Infective Drug Discovery and Development at Sun Yat-sen University in Guangzhou and posted to bioRxiv in December 2025, addresses this synthesizability gap directly. Rather than generating molecular graphs atom by atom and hoping a synthetic route exists, the model assembles PROTACs from a curated library of chemical reaction templates and purchasable building blocks. Every molecule it produces therefore comes with a feasible synthetic route by construction.
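The idea of a route "by construction" can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not SynPROTAC's library or code: the fragment names, the two templates, and the reactive-handle matching are assumptions standing in for real reaction templates and purchasable building blocks, and a uniform random choice stands in for the learned decoder. The point is only that when each step is restricted to compatible (template, building block) pairs, every finished molecule carries its own route.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    role: str           # "warhead", "linker", or "e3_ligand"
    handles: frozenset  # reactive groups this fragment exposes


@dataclass(frozen=True)
class Template:
    name: str
    on_intermediate: str  # handle required on the growing molecule
    on_block: str         # handle required on the incoming building block


# Hypothetical toy library (assumption; the real one is far larger).
TEMPLATES = [
    Template("amide_coupling", "COOH", "NH2"),
    Template("azide_alkyne_click", "azide", "alkyne"),
]
BLOCKS = [
    Block("amino_PEG3_azide", "linker", frozenset({"NH2", "azide"})),
    Block("amino_alkyl_azide", "linker", frozenset({"NH2", "azide"})),
    Block("alkyne_E3_ligand", "e3_ligand", frozenset({"alkyne"})),
]


def valid_actions(handles):
    """Enumerate only the (template, block) pairs that can actually react."""
    return [
        (t, b)
        for t in TEMPLATES
        for b in BLOCKS
        if t.on_intermediate in handles and t.on_block in b.handles
    ]


def sample_route(warhead, rng, max_steps=4):
    """Grow a molecule from the warhead, recording each reaction step."""
    handles = set(warhead.handles)
    route = []
    for _ in range(max_steps):
        actions = valid_actions(handles)
        if not actions:
            break
        # Uniform choice here; a trained decoder would score these actions.
        template, block = rng.choice(actions)
        route.append((template.name, block.name))
        # Consume the two reacting handles, keep whatever the block adds.
        handles = (handles - {template.on_intermediate}) | (
            block.handles - {template.on_block}
        )
        if block.role == "e3_ligand":
            break  # warhead, linker, and E3 ligand are now joined
    return route


warhead = Block("acid_warhead", "warhead", frozenset({"COOH"}))
route = sample_route(warhead, random.Random(0))
for step, (template_name, block_name) in enumerate(route, start=1):
    print(f"step {step}: {template_name} with {block_name}")
```

Because invalid actions are never offered to the sampler, no separate synthesizability check is needed after generation; the recorded `route` is the synthesis plan.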
By coupling this synthesis-constrained generator with reinforcement learning, SynPROTAC steers generation toward novel degraders that can be made and that also exhibit reasonable physicochemical and binding-related properties. This places it within the growing family of synthesizability-aware generative models, adapted here to the specific demands of targeted protein degradation.
SynPROTAC pairs a Graphormer encoder with a transformer-based decoder. The encoder embeds a supplied warhead or E3 ligand, and the decoder then autoregressively samples chemical reaction templates and building blocks to assemble a full PROTAC, yielding a synthetic route alongside the final structure. The generator is pretrained on roughly 20 million chemical reaction trees sampled from a library of 91 reaction templates and 483 filtered building blocks. After pretraining, the model is fine-tuned with reinforcement learning, using a reward that favors PROTACs with desirable binding properties. The authors report that, across their evaluations, SynPROTAC produces novel PROTACs with high synthetic feasibility and reasonable two-dimensional and three-dimensional physicochemical and binding-related properties.
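The reinforcement-learning stage can be pictured with a small policy-gradient sketch. The summary above does not say which RL algorithm or reward SynPROTAC uses, so this example assumes plain REINFORCE with a moving-average baseline, a three-way choice in place of the decoder's action space, and made-up reward values (`TOY_REWARDS`) in place of a real binding score. It shows only the general mechanism: actions that earn above-baseline reward become more probable.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, shifted for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a categorical policy with a moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # start from a uniform policy
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        advantage = rewards[action] - baseline
        for i in range(len(logits)):
            # Gradient of log pi(action) with respect to logit i.
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        # Track the average reward so updates reflect relative quality.
        baseline += 0.05 * (rewards[action] - baseline)
    return softmax(logits)


# Hypothetical rewards for three candidate actions (assumption, not data).
TOY_REWARDS = [0.1, 0.9, 0.3]
final_probs = reinforce(TOY_REWARDS)
print([round(p, 3) for p in final_probs])
```

In a synthesis-constrained setting, an update of this kind would be applied to the decoder's template and building-block choices, so the policy shifts toward better-scoring molecules while still sampling only from valid reaction steps.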
SynPROTAC is aimed at medicinal chemists and computational drug-discovery teams pursuing targeted protein degradation. Given a known warhead or E3 ligand, the model can propose novel, synthesizable PROTAC candidates together with routes for making them, which can shorten the cycle between in-silico design and wet-lab synthesis. Because synthesizability is enforced during generation rather than checked afterward, the tool is well suited to early-stage hit generation and linker exploration where the cost of proposing unmakeable molecules is high.
PROTACs are a fast-growing therapeutic modality, and the disconnect between what generative models propose and what chemists can actually synthesize has been a persistent bottleneck. SynPROTAC contributes to closing that gap by extending synthesis-constrained generation, an approach that has gained traction in general small-molecule design, into the specialized and structurally demanding domain of bifunctional degraders. As a recent preprint, SynPROTAC has yet to show long-term adoption, and its outputs reflect computational property estimates that require experimental validation. Even so, it offers a practical template for embedding synthetic accessibility into degrader design workflows. As of the preprint, no public code repository for SynPROTAC could be located. Trained weights are reported to be shared via figshare, but that release could not be independently verified, so reproducibility currently depends on artifacts that are not yet confirmed to be openly available.