A multimodal framework for text-guided protein design, enabling sequence generation, zero-shot editing, and property prediction via contrastive learning.
ProteinDT is a multimodal protein design framework developed by Shengchao Liu and collaborators from UC Berkeley, NVIDIA Research, Mila, and other institutions, published in Nature Machine Intelligence in 2025. It addresses a fundamental challenge in protein engineering: enabling researchers to specify desired protein properties in natural language and receive valid, functional protein sequences in return — without requiring deep expertise in structural biology or sequence design.
The core innovation is a contrastive alignment between two modalities, natural language descriptions and protein sequences, that maps both into a common representation space. Once aligned, text descriptions can steer both sequence generation and editing. This contrasts with purely sequence-based or structure-based design tools, which require design goals to be encoded in a domain-specific form rather than stated in natural language.
The training dataset, SwissProtCLAP, consists of approximately 441,000 text-protein pairs extracted from the SwissProt subset of UniProt, grounding the model's language understanding in curated scientific annotations covering a wide range of protein functions, stability properties, and binding characteristics.
ProteinDT is built around a three-stage pipeline. First, ProteinCLAP trains a contrastive alignment between a protein sequence encoder and a text encoder, using paired SwissProtCLAP data to bring semantically related text-protein pairs close in embedding space. Second, a protein facilitator module learns to generate protein sequence representations from text embeddings alone, effectively bridging the gap between language and the protein embedding space learned by ProteinCLAP. Third, a conditional protein decoder translates these representations into full amino acid sequences.
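The first stage can be illustrated with a minimal sketch of a contrastive objective. It assumes a symmetric InfoNCE-style loss over cosine similarities with a temperature of 0.1; the description above does not specify ProteinCLAP's exact loss, and the function names and toy embeddings below are hypothetical stand-ins for real encoder outputs.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(text_embs, protein_embs, temperature=0.1):
    """Symmetric InfoNCE loss for a batch where row i of each list is a pair.

    Assumption: this is a generic contrastive loss, not the published
    ProteinCLAP training code.
    """
    t = [l2_normalize(v) for v in text_embs]
    p = [l2_normalize(v) for v in protein_embs]
    n = len(t)
    # logits[i][j] is the temperature-scaled cosine similarity
    # between text i and protein j.
    logits = [
        [sum(a * b for a, b in zip(t[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    # Average the text-to-protein and protein-to-text directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))

# Toy 2-D embeddings: matched pairs point in similar directions.
text = [[1.0, 0.0], [0.0, 1.0]]
protein_matched = [[0.9, 0.1], [0.1, 0.9]]
protein_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(text, protein_matched), info_nce(text, protein_swapped))
```

The loss is small when each text embedding is closest to its own protein embedding and large when the pairing is swapped, which is the pressure that pulls related text-protein pairs together in the shared space.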
For editing tasks, ProteinDT operates in the shared latent space rather than at the sequence level. Latent optimization iteratively adjusts a protein's representation toward a target text description, then decodes the modified representation back to a sequence. In benchmarks, ProteinDT achieves the best hit ratio across 12 zero-shot editing tasks, assessed with 21 distinct evaluation methods. For property prediction, the model uses the aligned cross-modal embeddings as features and outperforms sequence-only baselines on tasks that require understanding of stability and binding context.
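The latent optimization idea can be sketched as projected gradient ascent on a toy objective. The sketch assumes unit-normalized embeddings, a dot-product similarity to the text embedding, and a quadratic anchor term that keeps the edited latent close to the original protein; these choices, the parameter values, and the function names are illustrative assumptions, not the published procedure. Encoding the input protein and decoding the edited latent back to a sequence are omitted.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def edit_latent(z_init, text_emb, steps=100, lr=0.1, anchor_weight=0.5):
    """Move a protein latent toward a text embedding on the unit sphere.

    Hypothetical objective: maximize z.t - anchor_weight * ||z - z0||^2,
    re-normalizing z after every step.
    """
    z0 = normalize(z_init)
    t = normalize(text_emb)
    z = list(z0)
    for _ in range(steps):
        # Gradient of the objective with respect to z.
        grad = [ti - 2.0 * anchor_weight * (zi - z0i)
                for zi, z0i, ti in zip(z, z0, t)]
        # Ascent step followed by projection back onto the unit sphere.
        z = normalize([zi + lr * gi for zi, gi in zip(z, grad)])
    return z

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy example: the original latent is orthogonal to the text target.
z_original = [1.0, 0.0]
text_target = [0.0, 1.0]
z_edited = edit_latent(z_original, text_target)
print(cosine(z_original, text_target), cosine(z_edited, text_target))
```

The anchor weight trades off how far the edit may drift from the starting protein against how closely it matches the text; a larger value keeps the result nearer the original latent.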
ProteinDT is primarily aimed at drug discovery and protein therapeutics, where researchers need to engineer candidates with specified binding affinities, thermostability profiles, or reduced immunogenicity. By accepting natural language specifications — for example, "high binding affinity to target receptor with improved thermal stability" — the framework lowers the barrier to entry for wet-lab biologists who are not fluent in sequence-based design. It is also applicable to synthetic biology workflows for designing novel enzymes with custom catalytic properties, and to functional annotation tasks where predicted property scores supplement experimental characterization.
ProteinDT was among the first frameworks to demonstrate that natural language can serve as a practical interface for protein sequence design, influencing subsequent work on multimodal biological foundation models. The work established contrastive text-protein alignment as a viable pretraining strategy and opened a research direction distinct from structure-conditioned design methods. A key limitation is that ProteinDT operates at the sequence level and does not predict or optimize three-dimensional structure directly; users who need structural validation of generated sequences must pair it with a structure prediction tool such as ESMFold or AlphaFold 2. The reliance on SwissProt annotations also means that protein functions poorly covered by curated databases may be underrepresented in the training data, which can limit how well the model generalizes to them.
Liu, S., Li, Y., Li, Z., Gitter, A., Zhu, Y., Lu, J., Xu, Z., Nie, W., Ramanathan, A., Xiao, C., Tang, J., Guo, H., & Anandkumar, A. (2025). A text-guided protein design framework. Nature Machine Intelligence.
DOI: 10.1038/s42256-025-01011-z