Rostlab
A bilingual protein language model that translates bidirectionally between amino acid sequences and the 3Di structural alphabet, enabling inverse folding and structure-aware embeddings.
ProstT5 (Protein structure-sequence T5) is a bilingual protein language model developed by Michael Heinzinger, Burkhard Rost, and colleagues at the Technical University of Munich, in collaboration with Martin Steinegger's group at Seoul National University. Released in July 2023 and published in NAR Genomics and Bioinformatics (December 2024), ProstT5 addresses a fundamental gap in protein language models: the inability to simultaneously reason over both the one-dimensional sequence and the three-dimensional structure of a protein within a single unified model.
Prior to ProstT5, protein language models such as ProtT5 and ESM-2 operated entirely on amino acid sequences, while structure prediction models like AlphaFold 2 handled three-dimensional structure as a separate, downstream task. ProstT5 bridges these two modalities by adopting the 3Di alphabet from Foldseek — a 20-letter token vocabulary that encodes local three-dimensional structural environments into discrete symbols analogous to amino acids. By representing protein structure as a 1D sequence of 3Di tokens, ProstT5 can be trained with the same masked language modeling machinery used for amino acid sequences, making structural knowledge directly accessible within the language model framework.
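As a toy illustration of the two views (the strings below are made up for demonstration, not output of Foldseek's actual encoder), a protein becomes a pair of parallel strings over 20-letter alphabets, one token of each per residue:

```python
# Toy illustration: a protein described by two aligned strings over
# 20-letter alphabets -- amino acids and 3Di structural states.
AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # amino acids (uppercase)
TDI_ALPHABET = "acdefghiklmnpqrstvwy"  # 3Di states (lowercase, ProstT5 convention)

aa_seq = "MKTAYIAKQR"    # hypothetical amino acid sequence
tdi_seq = "dvvlcvvdpl"   # hypothetical 3Di string for the same residues

# The views are aligned one-to-one: residue i has amino acid aa_seq[i]
# and local structural state tdi_seq[i].
assert len(aa_seq) == len(tdi_seq)
assert all(c in AA_ALPHABET for c in aa_seq)
assert all(c in TDI_ALPHABET for c in tdi_seq)
```

Because both alphabets have 20 symbols and the strings have equal length, standard sequence machinery (tokenizers, masking, alignment tools) applies to the structural view unchanged.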
The model was built by fine-tuning ProtT5-XL-U50, a large T5 encoder-decoder pretrained on hundreds of millions of protein sequences. Fine-tuning used 17 million high-quality, non-redundant structure predictions from the AlphaFold Database, teaching the model to translate between the amino acid and 3Di alphabets in both directions. This bilingual capability is the defining innovation: ProstT5 can take an amino acid sequence and produce 3Di tokens (structure prediction), or take 3Di tokens and generate amino acid sequences (inverse folding), all within a single model pass.
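In practice, the translation direction is signaled by how the input is formatted. A sketch of the preprocessing, following the recipe described on the Rostlab/ProstT5 HuggingFace model card (the prefix tokens and residue handling below should be checked against that card, which is authoritative):

```python
import re

def prepare(seq: str, source: str) -> str:
    """Format a sequence for ProstT5-style translation (sketch).

    Amino acids are uppercased, 3Di tokens lowercased, residues are
    space-separated, and a direction prefix selects the translation.
    """
    if source == "aa":                          # amino acids -> 3Di
        seq = re.sub(r"[UZOB]", "X", seq.upper())  # map rare residues to X
        prefix = "<AA2fold>"
    else:                                       # 3Di -> amino acids (inverse folding)
        seq = seq.lower()
        prefix = "<fold2AA>"
    return prefix + " " + " ".join(seq)

print(prepare("MKTAYIAKQR", "aa"))
# -> "<AA2fold> M K T A Y I A K Q R"
```

The formatted string is then fed to the model's tokenizer and seq2seq generation loop; the same encoder-decoder serves both directions.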
ProstT5 is an encoder-decoder transformer based on ProtT5-XL-U50, with approximately 3 billion parameters and a hidden dimension of 1024. Fine-tuning used 17 million AlphaFoldDB predictions, filtered for sequence redundancy to avoid data leakage. The training objective applies the original ProtT5 span-based denoising to both amino acid sequences and their corresponding 3Di token sequences simultaneously, preventing catastrophic forgetting while teaching the new structural vocabulary. Input sequences are prefixed with a special token to indicate the modality being processed (sequence or structure), allowing the encoder-decoder to switch between translation directions.
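The span-based denoising objective can be sketched as follows. This is a simplified single-span version of T5 span corruption (the real objective masks multiple spans with a sampled corruption rate); the sentinel token names follow T5 convention:

```python
import random

def span_corrupt(tokens, span_len=3, seed=0):
    """Simplified T5-style span denoising: replace one contiguous span
    with a sentinel; the decoder target is the sentinel plus the masked
    span. ProstT5 applies this objective to AA and 3Di sequences alike."""
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span_len)
    corrupted = tokens[:start] + ["<extra_id_0>"] + tokens[start + span_len:]
    target = ["<extra_id_0>"] + tokens[start:start + span_len] + ["<extra_id_1>"]
    return corrupted, target

aa = list("MKTAYIAKQR")
corrupted, target = span_corrupt(aa)
print(corrupted)  # input with one span replaced by a sentinel
print(target)     # what the decoder must reconstruct
```

Training the same denoising loss on both alphabets keeps the original sequence knowledge intact while the model acquires the 3Di vocabulary.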
On inverse folding benchmarks, ProstT5 achieves competitive native sequence recovery rates compared to structure-based inverse folding models such as ProteinMPNN, despite using only the 3Di token representation of structure rather than full atomic coordinates. For remote homology detection, 3Di sequences generated by ProstT5 from amino acid input alone outperform traditional sequence-based methods (BLAST, HHblits) and approach the performance of 3Di tokens derived from actual AlphaFold-predicted structures, validating the quality of the model's structural predictions.
ProstT5 is particularly valuable for researchers who need structure-aware protein representations but lack access to experimentally determined or computationally predicted structures for every sequence of interest. In protein design, the inverse folding mode generates novel sequences consistent with a target structural fold, providing a fast alternative to ProteinMPNN for ideation and diversity generation. For function annotation and remote homology detection, ProstT5-derived 3Di sequences enable Foldseek-based structural alignment at sequence-database scale, identifying distant evolutionary relationships invisible to sequence alignment alone. The model's pretrained embeddings also serve as improved feature representations for fine-tuning on structure-sensitive downstream tasks such as solubility prediction, subcellular localization, and secondary structure annotation.
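The embedding use case amounts to training a lightweight probe on top of frozen per-protein vectors. A minimal sketch, with synthetic stand-in vectors so it runs offline (real features would be mean-pooled 1024-dimensional encoder embeddings from ProstT5; the task labels here are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 64, 200
X = rng.normal(size=(N, D))           # stand-in for per-protein embeddings
w_true = rng.normal(size=D)
y = (X @ w_true > 0).astype(float)    # synthetic binary labels (e.g. soluble / not)

# Linear probe: logistic regression trained by batch gradient descent.
w = np.zeros(D)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # predicted probabilities
    w -= 0.1 * (X.T @ (p - y)) / N         # gradient of the logistic loss

acc = ((X @ w > 0) == (y == 1)).mean()
print(f"train accuracy: {acc:.2f}")
```

The same pattern — frozen embeddings plus a small trainable head — is the standard way such representations are evaluated on tasks like solubility or localization prediction.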
ProstT5 demonstrated that the boundary between sequence and structure modeling in proteins can be dissolved within a single language model framework by adopting a structural token vocabulary. Its publication in NAR Genomics and Bioinformatics and public release on HuggingFace made bilingual sequence-structure reasoning accessible to the broader protein science community. The approach inspired subsequent work exploring multi-alphabet protein representations and highlighted the AlphaFold Database as a training resource for structure-conditioned models. A key limitation is that ProstT5's 3Di representation captures local structural environments but does not encode global backbone geometry or atomic coordinates, making it less precise than coordinate-based methods for applications that require atomic-level accuracy.