bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Protein

PTM-dCN

Shanghai Jiao Tong University

A latent diffusion model with ControlNet-style conditioning for post-translational-modification-aware protein sequence design.

Released: May 2026

Post-translational modifications (PTMs) such as phosphorylation, glycosylation, and acetylation reshape protein function, localization, and stability, yet most generative protein design tools treat sequences as unmodified amino-acid strings and ignore where modifications can occur. PTM-dCN, introduced by Zhang, Huang, Chen, and Qing at Shanghai Jiao Tong University in a May 2026 bioRxiv preprint, addresses this gap by making PTM sites a first-class, controllable target of sequence generation.

The model is a latent diffusion model for PTM-aware protein sequence design. It builds on a PTM-aware protein language model whose vocabulary is extended with custom modification tokens, so that modified residues are represented explicitly rather than collapsed into their canonical amino acids. On top of this backbone, PTM-dCN adapts the ControlNet paradigm from image generation to the protein latent space: a control branch steers the diffusion process toward sequences carrying designated PTM sites while the pretrained generative weights remain fixed.
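To make the idea of modification tokens concrete, here is a minimal Python sketch of an extended vocabulary. The bracketed notation (`S[Phospho]`, `K[Acetyl]`, and so on), the token ids, and the `tokenize` helper are hypothetical choices for illustration. PTM-dCN's actual token set is defined in the preprint.

```python
import re

# The 20 canonical amino acids, each a single-character token.
CANONICAL = list("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical modification tokens: a residue plus a bracketed PTM label.
# Illustrative only; not PTM-dCN's published vocabulary.
PTM_TOKENS = ["S[Phospho]", "T[Phospho]", "Y[Phospho]", "K[Acetyl]", "N[Glyco]"]

# Extended vocabulary: canonical residues get ids 0-19, PTM tokens follow.
VOCAB = {tok: i for i, tok in enumerate(CANONICAL + PTM_TOKENS)}

# A token is either a residue with a bracketed modification or a bare residue.
TOKEN_RE = re.compile(r"[A-Z]\[[A-Za-z]+\]|[A-Z]")


def tokenize(seq: str) -> list[int]:
    """Convert an annotated sequence into token ids, keeping modified residues distinct."""
    ids = []
    pos = 0
    while pos < len(seq):
        match = TOKEN_RE.match(seq, pos)
        if match is None or match.group() not in VOCAB:
            raise ValueError(f"cannot tokenize {seq[pos:]!r} at position {pos}")
        ids.append(VOCAB[match.group()])
        pos = match.end()
    return ids
```

With this vocabulary, `tokenize("MKS[Phospho]PE")` returns `[10, 8, 20, 12, 3]`, while the unmodified `"MKSPE"` gives `[10, 8, 15, 12, 3]`. The phosphoserine keeps its own id instead of collapsing to plain serine.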

Because conditioning is applied through the ControlNet branch at inference time, users can specify desired modification patterns without retraining the underlying model for each new design task. This positions PTM-dCN within the emerging family of controllable generative protein models, but distinguishes it by targeting the post-translational layer of protein biology that earlier sequence and structure generators largely overlooked.
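Here is a sketch of what "specifying a modification pattern" might look like at inference time. The helper below turns a design spec (residue position mapped to a modification type) into a per-position one-hot control signal. The label set, the 0-based positions, and the one-hot encoding are assumptions for illustration. The preprint defines how PTM-dCN actually encodes its conditioning.

```python
# Hypothetical PTM label set for the control signal; index 0 means "no modification".
PTM_CLASSES = ["none", "Phospho", "Acetyl", "Glyco"]


def build_condition(length: int, sites: dict[int, str]) -> list[list[float]]:
    """Encode a PTM design spec as a per-position one-hot matrix (length x classes).

    `sites` maps 0-based residue positions to a modification name;
    positions not listed are encoded as "none".
    """
    cond = [[0.0] * len(PTM_CLASSES) for _ in range(length)]
    for row in cond:
        row[0] = 1.0  # default: no modification requested at this position
    for pos, ptm in sites.items():
        if not 0 <= pos < length:
            raise ValueError(f"position {pos} outside sequence of length {length}")
        if ptm not in PTM_CLASSES[1:]:
            raise ValueError(f"unsupported modification: {ptm}")
        cond[pos][0] = 0.0
        cond[pos][PTM_CLASSES.index(ptm)] = 1.0
    return cond
```

Consider two specs, such as `build_condition(6, {2: "Phospho"})` and `build_condition(6, {2: "Phospho", 4: "Glyco"})`. They differ only in this input. Under a ControlNet design, both would be passed to the same trained weights, which is why a new site pattern needs no retraining, assuming the requested modification types are among those the control branch was trained on.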

#Key Features

  • PTM-aware language model backbone: Built on a protein language model whose vocabulary includes custom modification tokens, allowing modified residues to be modeled explicitly rather than as plain amino acids.
  • ControlNet-style latent control: Adapts the ControlNet conditioning approach to the protein latent space, adding a control branch that guides generation toward designated PTM sites while keeping pretrained weights frozen.
  • Inference-time conditioning: PTM specifications are supplied as conditioning at generation time, so new modification targets do not require retraining the base model on a new dataset.
  • Curated PTM training data: Trained on a curated SwissProt-derived PTM dataset that links sequences to their annotated modification sites.
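The ControlNet pattern summarized in the list above can be illustrated with a single linear layer in plain Python. This is a minimal sketch of the general ControlNet idea: a frozen block, a trainable copy initialized from it, and zero-initialized projections into and out of the copy. It assumes a linear layer as a stand-in for a denoiser block. The class and parameter names are hypothetical, and PTM-dCN's actual control architecture is described in the preprint.

```python
import copy


def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix W (rows x cols) by vector x (cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(a: list[float], b: list[float]) -> list[float]:
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


class ControlledBlock:
    """A frozen linear block plus a ControlNet-style trainable branch.

    Hypothetical stand-in for one denoiser layer; not PTM-dCN's architecture.
    """

    def __init__(self, base_weights: list[list[float]], cond_dim: int):
        d = len(base_weights)
        self.base = base_weights                    # frozen pretrained weights (d x d)
        self.control = copy.deepcopy(base_weights)  # trainable copy, initialised from base
        self.zero_in = [[0.0] * cond_dim for _ in range(d)]  # zero-init: condition -> hidden
        self.zero_out = [[0.0] * d for _ in range(d)]        # zero-init: branch -> residual

    def trainable_parameters(self) -> list[list[list[float]]]:
        """Only the control branch is optimised; self.base is never returned."""
        return [self.control, self.zero_in, self.zero_out]

    def forward(self, h: list[float], cond: list[float]) -> list[float]:
        base_out = matvec(self.base, h)
        branch_in = vadd(h, matvec(self.zero_in, cond))
        branch_out = matvec(self.control, branch_in)
        # While zero_out is all zeros the residual vanishes, so the block
        # reproduces the frozen model exactly until training updates it.
        return vadd(base_out, matvec(self.zero_out, branch_out))
```

At initialization, `forward` returns exactly what the frozen `base` weights produce. Training the three matrices returned by `trainable_parameters` then adds a condition-dependent correction. In the original ControlNet, this zero initialization lets control training start from the pretrained model's behavior without disrupting it.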

#Technical Details

PTM-dCN combines a latent diffusion generator with a PTM-aware protein language model as its representation backbone. The language model's token set is augmented with custom modification tokens so that PTM sites are encoded directly in the sequence representation. Conditioning follows the ControlNet design: a control network is trained to inject PTM-site information into the diffusion trajectory while the parameters of the pretrained generative model are held fixed, separating learned generative priors from task-specific control. Training data is a curated dataset derived from SwissProt PTM annotations. As a bioRxiv preprint, the reported results have not yet undergone peer review, and the authors do not currently provide a public code repository or model weights; precise parameter counts and benchmark figures should be confirmed against the preprint.
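For orientation, the standard latent-diffusion training objective with a ControlNet branch can be written as follows. This assumes PTM-dCN uses the common noise-prediction (DDPM) formulation. The preprint's exact parameterization, noise schedule, and loss weighting may differ. Here $z_0$ is taken to be the latent representation of a sequence from the PTM-aware language model, $c$ the PTM-site condition, $\beta_s$ the noise schedule, $\theta$ the frozen pretrained denoiser parameters, and $\phi$ the trainable control-branch parameters.

```latex
\begin{aligned}
z_t &= \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I),
  \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s) \\
\min_{\phi}\; \mathcal{L}(\phi) &= \mathbb{E}_{z_0,\, c,\, t,\, \epsilon}
  \left[ \bigl\| \epsilon - \epsilon_{\theta, \phi}(z_t, t, c) \bigr\|_2^2 \right],
  \qquad \theta \text{ frozen}
\end{aligned}
```

The minimization runs over $\phi$ only. This is the formal counterpart of keeping the pretrained generative weights fixed while the control network learns to inject PTM-site information.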

#Applications

PTM-dCN is aimed at researchers designing proteins where modification state matters, including the design of substrates or sensors with defined phosphorylation or glycosylation sites, engineering of signaling-pathway components, and generation of candidate sequences for studying how PTMs modulate function. By letting designers specify modification sites up front rather than filtering modifications after the fact, it can streamline hypothesis generation in protein engineering and synthetic biology workflows that depend on post-translational control.

#Impact

PTM-dCN illustrates how controllable-generation techniques from other domains, such as ControlNet from image synthesis, can be transferred to protein design to address a biologically important but underexplored axis: post-translational modification. As an early entry in PTM-aware generative design, its broader influence will depend on independent validation and on the release of code and weights, which are not yet available. The bioRxiv preprint is posted without a Creative Commons license (bioRxiv license code cc_no), which means reuse requires permission from the copyright holders. Until it has been peer reviewed and further benchmarked, the work should be treated as a preliminary research result.

Citation

PTM-dCN: Latent Space Control for Post-Translational Modification–Aware Protein Design

Zhang, S., et al. (2026) PTM-dCN: Latent Space Control for Post-Translational Modification–Aware Protein Design. bioRxiv.

DOI: 10.64898/2026.05.06.714367

Openness

Unclassified
Restrictive license on core components

Tags

controlnet, de_novo_design, diffusion, generative, language_model, post_translational_modification, protein_design, proteomics

Resources

Research Paper