Protein

xTrimoPGLM

BioMap / Tsinghua University

Unified 100-billion-parameter protein language model combining autoencoding and autoregressive objectives for protein understanding and generation.

Released: 2024
Parameters: 100,000,000,000

Overview

xTrimoPGLM is a 100-billion-parameter protein language model developed jointly by BioMap and Tsinghua University and published in Nature Methods in 2025. It addresses a persistent tension in protein foundation models: autoencoding architectures (such as ESM2) excel at learning contextual sequence representations for understanding tasks, while autoregressive architectures (such as ProGen2) are better suited to sequence generation. Rather than accepting this tradeoff, xTrimoPGLM introduces a unified pretraining framework built on the General Language Model (GLM) backbone that jointly optimizes both objectives within a single model at unprecedented scale.

The core technical insight is that the GLM architecture — which processes input bidirectionally while performing autoregressive span infilling — is structurally compatible with both masked language modeling and causal generation objectives. A two-stage curriculum learning strategy capitalizes on this: the first stage trains exclusively on a masked language model objective over 400 billion tokens to build strong representational foundations, followed by a second stage of 600 billion tokens mixing 20% masked language modeling with 80% GLM generation objectives. This curriculum progression allows the model to develop robust sequence understanding before taking on the harder joint task.
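
To make the span-infilling idea concrete, the toy sketch below corrupts an amino-acid sequence GLM-style: masked spans are removed from the bidirectional context and appended as autoregressive infilling targets. Span lengths, special tokens, and the sampling scheme here are illustrative placeholders, not the model's actual preprocessing.

```python
import random

def glm_corrupt(sequence, span_len=3, n_spans=2, seed=0):
    """Toy GLM-style span corruption for a protein sequence.

    Replaces n_spans contiguous spans with [MASK] placeholders and moves the
    masked residues to the end as autoregressive infilling targets, each
    prefixed by a start-of-span token [S]. Token names are illustrative and
    not the model's actual vocabulary.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    starts = sorted(rng.sample(range(0, len(tokens) - span_len), n_spans))

    prefix, targets, cursor = [], [], 0
    for s in starts:
        if s < cursor:               # skip overlapping spans in this toy version
            continue
        prefix.extend(tokens[cursor:s])
        prefix.append("[MASK]")
        targets.append(["[S]"] + tokens[s:s + span_len])
        cursor = s + span_len
    prefix.extend(tokens[cursor:])

    # The model attends bidirectionally over `prefix` and decodes `targets`
    # left to right, which is how one architecture serves both objectives.
    return prefix, targets

prefix, targets = glm_corrupt("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(prefix)
print(targets)
```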

xTrimoPGLM was trained on approximately 940 million unique protein sequences drawn from UniRef50 and related databases, totaling roughly 200 billion amino acid residues; training consumed about 1 trillion tokens in total. Training ran on 96 NVIDIA DGX nodes, each equipped with eight A100 80GB GPUs. A quantized INT4 version of the 100B model is publicly available and can run inference on a single A100 80GB GPU.

Key Features

  • Unified understanding and generation: A single model handles both protein sequence comprehension and de novo protein design, outperforming task-specific baselines across 18 downstream benchmarks spanning structure, function, interaction, and developability categories.
  • GLM dual-objective pretraining: The GLM backbone enables simultaneous training with bidirectional (masked) and autoregressive (generative) objectives, resolving the longstanding incompatibility between BERT-style and GPT-style protein models.
  • xT-Fold structure prediction: Folding modules attached to the xTrimoPGLM-100B backbone yield an advanced structure predictor achieving a TM-score of 0.86 on CAMEO and 0.70 on CASP15, outperforming ESMFold (0.85 and 0.65 respectively) while remaining practical through 4-bit quantization and FlashAttention.
  • xTrimoPGLM-Ab antibody specialization: A 1-billion-parameter variant fine-tuned on antibody sequences achieves state-of-the-art zero-shot naturalness prediction, outperforming IgLM, AbLang, AntiBERTy, and ESM2-15B, with antibody structure prediction substantially faster than AlphaFold2.
  • Programmable sequence generation: After supervised fine-tuning on curated sequences, the model supports conditioned protein generation. De novo sequences generated by xTrimoPGLM achieve a median pLDDT of 85.4 and a median TM-score of 0.658 against PDB structures, at a median sequence identity of only 11.7% to known proteins — indicating genuine novelty rather than retrieval.
  • Scalable model family: Alongside the flagship 100B model, a family of smaller public checkpoints (1B, 3B, 7B, 10B) in both MLM and CLM configurations is available via Hugging Face for practical fine-tuning workflows, as sketched below.
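
A minimal loading sketch for one of the smaller checkpoints follows. The Hugging Face model ID, the reliance on trust_remote_code, and the output attribute accessed are assumptions rather than the project's documented interface; consult the official repository for the real names.

```python
# Sketch only: the checkpoint name and remote-code loading are assumptions,
# not the project's documented API. Check the official Hugging Face page.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "biomap-research/xtrimopglm-1b-mlm"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.float16)
model.eval()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-residue features for downstream fine-tuning or probing.
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_dim)
```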

Technical Details

xTrimoPGLM-100B uses a transformer architecture based on the General Language Model (GLM) design, which differs from standard encoder-only or decoder-only transformers by supporting bidirectional context during prefix processing and autoregressive decoding during span generation. This architecture supports both in-place token prediction (for understanding) and span prediction with autoregressive infilling (for generation) within the same forward pass. The two-stage curriculum applies a pure masked language model objective for the first 400 billion tokens, then transitions to a mixed regime where 80% of training steps use the GLM span-infilling objective and 20% retain the masked language model loss.
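
The schedule described above can be summarized in a few lines. The sketch below only illustrates the 400B-token stage boundary and the 80/20 objective mix; it is not the authors' training code.

```python
import random

# Illustrative curriculum schedule (not the actual training loop): stage 1
# uses only the MLM objective; stage 2 mixes 80% GLM span infilling with
# 20% MLM, selected per batch.
STAGE1_TOKENS = 400e9   # pure masked-language-model stage
TOTAL_TOKENS = 1_000e9  # full training budget

def pick_objective(tokens_seen, rng=random.Random(0)):
    """Return which loss to apply for the next batch."""
    if tokens_seen < STAGE1_TOKENS:
        return "mlm"                                        # stage 1
    return "glm_span" if rng.random() < 0.8 else "mlm"      # stage 2: 80/20 mix

# Example: sample the objective at a few points along training.
for t in (100e9, 500e9, 900e9):
    print(f"{t / 1e9:.0f}B tokens -> {pick_objective(t)}")
```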

Benchmarking across 18 understanding tasks shows consistent gains over ESM2-15B, ProtTrans, and other baselines. On out-of-distribution perplexity evaluations (a measure of how well a model generalizes to sequences beyond its training distribution), xTrimoPGLM-100B scores 10.81 (vs. 10.98 for ESM2) at a 90% sequence-identity cutoff and 13.35 (vs. 14.30 for the 6.4B-parameter ProGen2-xlarge) at the 50% cutoff. The xT-Fold structure prediction module achieves a TM-score of 0.86 on CAMEO and 0.70 on CASP15, with inference accelerated through INT4 quantization and FlashAttention.
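
For reference, the perplexities quoted above correspond to the exponentiated mean per-token negative log-likelihood on held-out sequences. The sketch below shows that generic computation for any causal protein language model loaded through Hugging Face transformers; the model and tokenizer are placeholders, and this is not the paper's evaluation harness.

```python
import math
import torch

def sequence_perplexity(model, tokenizer, sequences):
    """Exponentiated mean per-token cross-entropy over a set of sequences.

    `model` and `tokenizer` are placeholders for any causal protein LM loaded
    via Hugging Face transformers; this is the generic definition of
    perplexity, not the paper's evaluation code.
    """
    total_nll, total_tokens = 0.0, 0
    model.eval()
    with torch.no_grad():
        for seq in sequences:
            ids = tokenizer(seq, return_tensors="pt").input_ids.to(model.device)
            # Passing labels makes the model return the mean cross-entropy
            # over the shifted target tokens.
            loss = model(ids, labels=ids).loss
            n_tokens = ids.shape[1] - 1
            total_nll += loss.item() * n_tokens
            total_tokens += n_tokens
    return math.exp(total_nll / total_tokens)
```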

Applications

xTrimoPGLM is suited for a wide range of protein science workflows. Researchers can use the MLM-variant checkpoints as sequence encoders for fine-tuning on supervised tasks such as functional annotation, subcellular localization prediction, thermostability estimation, and protein-protein interaction prediction. The CLM-variant checkpoints support sequence generation tasks including scaffold design, linker generation, and unconditional de novo protein design. The specialized xTrimoPGLM-Ab model is directly applicable to therapeutic antibody research, with strong zero-shot performance on naturalness scoring that can prioritize lead candidates before experimental synthesis. The xT-Fold extension enables rapid structure prediction for proteins lacking experimental structural data, which is valuable in drug discovery and structural genomics programs.
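
One common pattern for the encoder use case is a lightweight probe trained on frozen, mean-pooled embeddings, for example for thermostability regression. The sketch below assumes a Hugging Face-style encoder that returns last_hidden_state; it is a generic recipe, not the project's documented fine-tuning procedure.

```python
import torch
import torch.nn as nn

class FrozenEncoderRegressor(nn.Module):
    """Mean-pool frozen encoder embeddings, then regress a scalar property
    (e.g. thermostability). `encoder` can be any module whose output exposes
    `.last_hidden_state`; `hidden_dim` must match its output width.
    """
    def __init__(self, encoder, hidden_dim):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # probe only; the encoder stays frozen
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            hidden = self.encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)   # masked mean over residues
        return self.head(pooled).squeeze(-1)            # one scalar per sequence
```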

Impact

xTrimoPGLM establishes a clear proof of concept that protein language models need not choose between understanding and generation capabilities, and that scaling to 100 billion parameters yields measurable downstream gains over strong smaller baselines. Its publication in Nature Methods in 2025 consolidates the preprint findings and marks it as a peer-reviewed contribution to the protein foundation model literature. The release of a quantized 100B model compatible with a single A100 GPU, alongside a family of smaller open checkpoints, lowers the barrier to entry for researchers working outside of industrial compute environments. A notable limitation is that, like other sequence-only language models, xTrimoPGLM does not incorporate explicit 3D structural information during pretraining — structure emerges only through downstream folding modules rather than being learned directly from coordinates. The xT-Fold performance, while competitive with ESMFold, remains below structure-aware models such as AlphaFold 2 for many targets.

Citation

xTrimoPGLM: unified 100-billion-parameter pretrained transformer for deciphering the language of proteins

Chen, B., Cheng, X., Li, P. et al. xTrimoPGLM: unified 100-billion-parameter pretrained transformer for deciphering the language of proteins. Nat Methods 22, 1028–1039 (2025).

DOI: 10.1038/s41592-025-02636-z

Metrics

GitHub

Stars: 21
Forks: 12
Open Issues: 0
Contributors: 1
Last Push: 1y ago
License: Apache-2.0

Citations

Total Citations: 38
Influential: 1
References: 57

Tags

protein design, structure prediction, transformer, foundation model

Resources

GitHub Repository
Research Paper
HuggingFace Model