A latent diffusion model with ControlNet-style conditioning for post-translational-modification-aware protein sequence design.
Post-translational modifications (PTMs) such as phosphorylation, glycosylation, and acetylation reshape protein function, localization, and stability, yet most generative protein design tools treat sequences as unmodified amino-acid strings and ignore where modifications can occur. PTM-dCN, introduced by Zhang, Huang, Chen, and Qing at Shanghai Jiao Tong University in a May 2026 bioRxiv preprint, addresses this gap by making PTM sites a first-class, controllable target of sequence generation.
The model is a latent diffusion model for PTM-aware protein sequence design. It builds on a PTM-aware protein language model whose vocabulary is extended with custom modification tokens, so that modified residues are represented explicitly rather than collapsed into their canonical amino acids. On top of this backbone, PTM-dCN adapts the ControlNet paradigm from image generation to the protein latent space: a control branch steers the diffusion process toward sequences carrying designated PTM sites while the pretrained generative weights remain fixed.
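The idea of an extended vocabulary can be illustrated with a minimal sketch. The token names below (`[pS]`, `[acK]`, and so on), the vocabulary layout, and the helper functions are all hypothetical; the actual tokens used by PTM-dCN's language model are not given in this summary.

```python
import re

# Hypothetical vocabulary: 20 canonical residues plus a few made-up PTM tokens.
CANONICAL = list("ACDEFGHIKLMNPQRSTVWY")
PTM_PARENT = {"[pS]": "S", "[pT]": "T", "[pY]": "Y", "[acK]": "K", "[glcN]": "N"}
VOCAB = {tok: i for i, tok in enumerate(CANONICAL + list(PTM_PARENT))}

# Match a bracketed PTM token first, otherwise a single residue letter.
TOKEN_RE = re.compile(r"\[[^\[\]]+\]|[A-Z]")


def tokenize(seq):
    """Split a PTM-annotated sequence into residue and modification tokens."""
    tokens = TOKEN_RE.findall(seq)
    # Reject input that contains characters the pattern silently skipped.
    if "".join(tokens) != seq:
        raise ValueError(f"unparseable sequence: {seq!r}")
    unknown = [t for t in tokens if t not in VOCAB]
    if unknown:
        raise ValueError(f"unknown tokens: {unknown}")
    return tokens


def encode(seq):
    """Map each token to its integer id in the extended vocabulary."""
    return [VOCAB[t] for t in tokenize(seq)]


def strip_ptms(seq):
    """Collapse modified residues to their canonical amino acids."""
    return "".join(PTM_PARENT.get(t, t) for t in tokenize(seq))
```

Here `encode("MK[pS]PE")` keeps the phosphoserine as its own id, whereas `strip_ptms` shows the information that is lost when a modified residue is collapsed to its canonical letter.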
Because conditioning is applied through the ControlNet branch at inference time, users can specify desired modification patterns without retraining the underlying model for each new design task. This positions PTM-dCN within the emerging family of controllable generative protein models, but distinguishes it by targeting the post-translational layer of protein biology that earlier sequence and structure generators largely overlooked.
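How a requested modification pattern is encoded for the control branch is not described in this summary. One plausible encoding, shown here purely as an assumption, is a per-position label track with one integer per residue position; the label names and the `build_control_track` helper are hypothetical.

```python
# Hypothetical control labels; 0 means "no constraint at this position".
CONTROL_LABELS = {"none": 0, "phospho": 1, "acetyl": 2, "glyco": 3}


def build_control_track(length, sites):
    """Turn a {0-based position: PTM name} mapping into a per-position track."""
    track = [CONTROL_LABELS["none"]] * length
    for pos, name in sites.items():
        if not 0 <= pos < length:
            raise IndexError(f"position {pos} outside sequence of length {length}")
        if name == "none" or name not in CONTROL_LABELS:
            raise ValueError(f"unsupported PTM label: {name!r}")
        track[pos] = CONTROL_LABELS[name]
    return track


# Request a phosphorylation site at position 2 and an acetylation site at 5.
track = build_control_track(6, {2: "phospho", 5: "acetyl"})
```

Under this assumed encoding, a new design task only changes the `sites` mapping passed at inference time; nothing about the model itself is touched.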
PTM-dCN combines a latent diffusion generator with a PTM-aware protein language model as its representation backbone. The language model's token set is augmented with custom modification tokens so that PTM sites are encoded directly in the sequence representation. Conditioning follows the ControlNet design: a control network is trained to inject PTM-site information into the diffusion trajectory while the parameters of the pretrained generative model are held fixed, separating learned generative priors from task-specific control. Training data is a curated dataset derived from SwissProt PTM annotations. As a bioRxiv preprint, the reported results have not yet undergone peer review, and the authors do not currently provide a public code repository or model weights; precise parameter counts and benchmark figures should be confirmed against the preprint.
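The separation between a fixed generator and a trainable control network can be sketched with a toy scalar model. This is not PTM-dCN's architecture: the classes, the one-weight "denoiser", and the squared-error loss are illustrative assumptions. The zero-initialized gate mirrors the zero-initialized layers of the original image ControlNet; whether PTM-dCN uses the same initialization is not stated here.

```python
class FrozenDenoiser:
    """Stand-in for the pretrained generator; its weight is never updated."""

    def __init__(self, weight=0.5):
        self.weight = weight

    def __call__(self, z):
        return [self.weight * zi for zi in z]


class ControlBranch:
    """Trainable branch whose output passes through a zero-initialized gate."""

    def __init__(self):
        self.w = 1.0     # trainable mixing weight
        self.gate = 0.0  # trainable; zero means "no effect" at the start

    def hidden(self, z, control):
        return [self.w * (zi + ci) for zi, ci in zip(z, control)]

    def __call__(self, z, control):
        return [self.gate * h for h in self.hidden(z, control)]


def predict(base, branch, z, control):
    """Base prediction plus the gated control correction."""
    return [b + c for b, c in zip(base(z), branch(z, control))]


def train_step(base, branch, z, control, target, lr=0.1):
    """One SGD step on squared error; only the branch parameters move."""
    resid = [p - t for p, t in zip(predict(base, branch, z, control), target)]
    hidden = branch.hidden(z, control)
    grad_gate = sum(2 * r * h for r, h in zip(resid, hidden))
    grad_w = sum(2 * r * branch.gate * (zi + ci)
                 for r, zi, ci in zip(resid, z, control))
    branch.gate -= lr * grad_gate
    branch.w -= lr * grad_w
    return sum(r * r for r in resid)  # loss measured before the update
```

Before any training the gate is zero, so `predict` returns exactly the frozen model's output; training then moves only `gate` and `w`, leaving the pretrained weight untouched.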
PTM-dCN is aimed at researchers designing proteins where modification state matters, including the design of substrates or sensors with defined phosphorylation or glycosylation sites, engineering of signaling-pathway components, and generation of candidate sequences for studying how PTMs modulate function. By letting designers specify modification sites up front rather than filtering modifications after the fact, it can streamline hypothesis generation in protein engineering and synthetic biology workflows that depend on post-translational control.
PTM-dCN illustrates how controllable-generation techniques from other domains, such as ControlNet from image synthesis, can be transferred to protein design to address a biologically important but underexplored axis: post-translational modification. As an early entry in PTM-aware generative design, its broader influence will depend on independent validation and on the release of code and weights, which are not yet available. The bioRxiv preprint carries no declared reuse license (listed as cc_no), so reuse terms are unspecified and the work should be treated as a preliminary research result pending peer review and further benchmarking.
Zhang, S., et al. (2026). PTM-dCN: Latent Space Control for Post-Translational Modification–Aware Protein Design. bioRxiv. DOI: 10.64898/2026.05.06.714367