Chai-1

Biomolecular structure prediction foundation model covering proteins, small molecules, DNA, RNA, and glycans in a single diffusion framework.

Released: October 2024

Chai-1 is a multi-modal foundation model for biomolecular structure prediction developed by Chai Discovery and released in October 2024. Unlike most structure prediction tools that specialize in a single molecule class, Chai-1 handles proteins, small molecules (ligands), DNA, RNA, and glycosylations within a single unified architecture. This generality makes it one of the broadest open-weights biomolecular structure predictors available to researchers.

The model is built on a diffusion-based framework that generates 3D atomic coordinates directly by iteratively denoising a random configuration of atoms. All molecular entities — from amino acid residues to glycan moieties — are represented as tokens in a shared space, allowing the model to reason jointly about inter-molecular contacts and binding geometries without requiring separate sub-models for each entity type.
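The iterative denoising idea can be illustrated with a toy loop. This is only a sketch under strong assumptions: the learned denoiser is replaced by a stand-in that already knows the target coordinates, and the step size, noise schedule, and function names (`toy_denoise`, `rmsd`) are invented for illustration. It is not Chai-1's sampler.

```python
import math
import random


def toy_denoise(target, steps=50, seed=0):
    """Move randomly initialised 3D points toward `target` while noise decays."""
    rng = random.Random(seed)
    # Start from a random configuration of atoms (pure noise).
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
    for step in range(steps):
        # Injected noise shrinks linearly and reaches zero on the last step.
        noise_scale = 1.0 - (step + 1) / steps
        for atom, goal in zip(coords, target):
            for axis in range(3):
                # Stand-in for a learned denoiser: its "prediction" of the
                # clean coordinate is simply the known target (assumption).
                predicted_clean = goal[axis]
                atom[axis] += 0.2 * (predicted_clean - atom[axis])
                atom[axis] += rng.gauss(0.0, 0.1) * noise_scale
    return coords


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized point lists."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
final = toy_denoise(target)
print(f"RMSD to target after denoising: {rmsd(final, target):.3f}")
```

In a real model the stand-in line would be a neural network call conditioned on the token representation, and the schedule would come from the chosen diffusion formulation.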

Chai-1 achieves a 77% success rate on the PoseBusters benchmark for protein-ligand co-crystal structure prediction, competitive with or exceeding AlphaFold 3. A distinctive capability is its support for experimental restraint conditioning: sparse distance or contact restraints derived from crosslinking mass spectrometry, NMR, or cryo-EM data can be incorporated as soft biases during the diffusion process, yielding double-digit percentage point improvements over unguided prediction for difficult targets.
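To make the benchmark figure concrete, the sketch below computes a success rate in the way pose-prediction benchmarks commonly define it: the fraction of complexes whose predicted ligand pose lies within 2 Å RMSD of the crystal pose. The 2 Å cutoff is assumed here as the conventional criterion, the RMSD values are made up, and the helper names are hypothetical. No symmetry correction or physical-validity check is applied.

```python
import math


def ligand_rmsd(pred, ref):
    """Heavy-atom RMSD between matched coordinate lists (no symmetry handling)."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of atoms")
    sq = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sq / len(ref))


def success_rate(rmsds, threshold=2.0):
    """Fraction of predictions whose ligand RMSD is at or below `threshold`."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)


# Made-up per-complex RMSD values in angstroms, for illustration only.
example_rmsds = [0.8, 1.9, 2.4, 5.1]
print(f"success rate: {success_rate(example_rmsds):.0%}")
```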

Key Features

Unified Multi-Modal Architecture: Predicts structures for proteins, small molecules, DNA, RNA, and glycosylations in a single model without specialized sub-models for each molecule type.
Experimental Restraint Conditioning: Accepts sparse wet-lab restraints (e.g., distances or contacts from crosslinking MS or NMR) as soft biases during inference, yielding double-digit percentage-point gains over unguided prediction on difficult targets.
MSA-Free Inference Mode: Operates without multiple sequence alignments, retaining most full-MSA performance and enabling rapid predictions for novel or orphan sequences with limited evolutionary context.
Open Weights for Research: Model weights and inference code are released as a Python package (chai_lab), described at release as for non-commercial research; the GitHub repository is listed under an Apache-2.0 license. A free web interface supports both research and commercial drug discovery use cases.
PoseBusters-Validated Pose Prediction: Achieves 77% on the PoseBusters benchmark for protein-ligand co-crystal pose prediction, matching or exceeding the performance of AlphaFold 3.
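A minimal sketch of calling the chai_lab package from Python follows. The `run_inference` entry point and its argument names follow the project README as best recalled and may differ between releases, so treat them as assumptions and check the installed version. The protein sequence and entity names are placeholders, and `build_fasta` and `predict` are hypothetical helpers. Running `predict` requires chai_lab to be installed and a CUDA-capable GPU.

```python
from pathlib import Path


def build_fasta(entities):
    """Render (kind, name, sequence_or_smiles) tuples as FASTA-style records."""
    lines = []
    for kind, name, payload in entities:
        lines.append(f">{kind}|name={name}")
        lines.append(payload)
    return "\n".join(lines) + "\n"


def predict(fasta_path: Path, output_dir: Path):
    """Run structure prediction on one input file (needs chai_lab and a GPU)."""
    # Imported lazily so build_fasta works without chai_lab installed.
    from chai_lab.chai1 import run_inference

    # Argument names are assumed from the README; verify against your version.
    return run_inference(
        fasta_file=fasta_path,
        output_dir=output_dir,
        num_trunk_recycles=3,
        num_diffn_timesteps=200,
        seed=42,
        device="cuda:0",
        use_esm_embeddings=True,
    )


# Placeholder protein sequence plus a ligand given as SMILES (aspirin).
example = build_fasta([
    ("protein", "example-protein", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("ligand", "example-ligand", "CC(=O)OC1=CC=CC=C1C(=O)O"),
])
print(example)
```

Writing `example` to a file and passing its path to `predict` would then produce ranked candidate structures in `output_dir`.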

Technical Details

Chai-1 uses a diffusion-based architecture that treats all biomolecular entities in a shared token representation. A pairwise representation module encodes spatial relationships between all atom pairs across molecule types, allowing reasoning about inter-molecular contacts in protein-ligand and protein-nucleic acid complexes. The model incorporates evolutionary information from multiple sequence alignments and structural templates for proteins, with an optional MSA-free pathway for rapid inference. Experimental restraints are integrated as soft biases on pairwise distances during the reverse diffusion trajectory rather than as hard geometric constraints.
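The difference between a soft bias and a hard constraint can be sketched with a flat-bottom distance penalty: a pair closer than the restraint's upper bound is left alone, and a pair beyond it is nudged together by a gradient step rather than snapped into place. The quadratic form, the step size, and the function names are assumptions chosen for illustration, not Chai-1's actual restraint mechanism.

```python
import math


def restraint_energy(a, b, max_dist, weight=1.0):
    """Flat-bottom penalty: zero within max_dist, quadratic beyond it."""
    excess = math.dist(a, b) - max_dist
    return weight * excess ** 2 if excess > 0 else 0.0


def apply_soft_restraint(a, b, max_dist, step=0.1, weight=1.0):
    """Take one gradient-descent step on the penalty; returns new (a, b)."""
    d = math.dist(a, b)
    excess = d - max_dist
    if excess <= 0 or d == 0:
        # Restraint already satisfied: positions are left unchanged.
        return list(a), list(b)
    # Gradient of weight * excess**2 w.r.t. a is 2 * weight * excess * (a - b) / d.
    scale = step * 2.0 * weight * excess / d
    new_a = [x - scale * (x - y) for x, y in zip(a, b)]
    new_b = [y + scale * (x - y) for x, y in zip(a, b)]
    return new_a, new_b


# Two atoms 20 angstroms apart with a 10 angstrom upper-bound restraint.
a, b = [0.0, 0.0, 0.0], [20.0, 0.0, 0.0]
for _ in range(25):
    a, b = apply_soft_restraint(a, b, max_dist=10.0)
print(f"final distance: {math.dist(a, b):.3f}")
```

In a diffusion sampler a term like this would be combined with the denoiser's own update at each reverse step, so the restraint steers the trajectory without overriding the learned geometry.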

Training data derives from experimental structures in the Protein Data Bank, including co-crystal structures of protein-ligand, protein-nucleic acid, and glycosylated protein complexes. The training regime incorporates data augmentation strategies consistent with recent biomolecular diffusion models. On the PoseBusters benchmark, Chai-1 achieves a 77% success rate for protein-ligand pose prediction. Protein-protein interface accuracy is competitive with AlphaFold-Multimer on standard benchmarks. Adding experimental restraints yields double-digit percentage-point improvements over unguided prediction.

Applications

Chai-1 is particularly well-suited to structure-based drug discovery workflows, where the ability to model protein-ligand binding poses, evaluate fragment library binding modes, and assess allosteric interactions in a single unified system offers a practical advantage over pipelines stitched together from multiple specialized tools. Beyond small-molecule discovery, the model supports modeling of multi-chain assemblies including protein-protein, protein-DNA, and protein-RNA complexes, as well as glycoprotein structures with explicit glycan chains. Researchers integrating sparse experimental data from crosslinking MS, SAXS, or cryo-EM can use the restraint-conditioning feature to guide predictions toward conformations consistent with low-resolution experimental observations.

Impact

Chai-1 represents a significant step toward generalist biomolecular structure prediction, demonstrating that a single diffusion-based model can achieve competitive accuracy across multiple molecule classes simultaneously. Its open-weights release has lowered the barrier to high-quality structure prediction for protein-ligand, protein-nucleic acid, and glycoprotein systems that were previously underserved by freely available tools. A limitation noted at release was that model weights were restricted to non-commercial use, with commercial users directed to the Chai Discovery web interface; the GitHub repository is listed under an Apache-2.0 license. As of release, the technical report exists as a bioRxiv preprint and has not yet undergone formal peer review. Accuracy for complex glycan chains lags behind protein and small-molecule performance, and very large or highly disordered complexes remain challenging at standard compute budgets. Nonetheless, Chai-1 has been widely adopted by the drug discovery and structural biology communities as a practical complement or alternative to AlphaFold 3 for multi-modal structure modeling tasks.

Citation

Chai-1: Decoding the molecular interactions of life

Preprint

Boitreaud, J., et al. (2024) Chai-1: Decoding the molecular interactions of life. bioRxiv.

DOI: 10.1101/2024.10.10.615955

Recent citations

Papers that recently cited this model.

Peptide ligand discovery of G protein-coupled receptors
J. Hermes, Marin Matic, Ho Yan Yeung, et al.
Nature Reviews Methods Primers · Jul 2026 · Citations: 0

Characterising AlphaFold 3’s ability to predict T cell antigen specificity
Benjamin McMaster, Ali El Moselhy, Ilija Ilievski, et al.
bioRxiv · Jul 2026 · Citations: 0

Capabilities, specificity gaps and training-data dependence of AlphaFold3 across diverse application areas
O. Follonier, Yan Liu, Pablo Campomanes, et al.
bioRxiv · Jul 2026 · Citations: 0

Top citations

The most-cited papers that cite this model.

Boltz-1 Democratizing Biomolecular Interaction Modeling
Jeremy Wohlwend, Gabriele Corso, Saro Passaro, et al.
bioRxiv · Nov 2024 · Citations: 378

Atom-level enzyme active site scaffolding using RFdiffusion2
Woody Ahern, Jason Yim, D. Tischer, et al.
bioRxiv · Apr 2025 · Citations: 89

Highly pathogenic avian influenza H5N1: history, current situation, and outlook
F. Krammer, Enikő Hermann, Angela L. Rasmussen
Journal of Virology · Mar 2025 · Citations: 86

Have protein-ligand cofolding methods moved beyond memorisation?
Peter Škrinjar, Jérôme Eberhardt, G. Tauriello, et al.
bioRxiv · Aug 2025 · Citations: 71

AI-driven protein design
Huan Yee Koh, Yi Zheng, Maddie Yang, et al.
Nature Reviews Bioengineering · Sep 2025 · Citations: 49

Citations

Total Citations: 400
Influential: 48
References: 0

GitHub

Stars: 2K
Forks: 277
Open Issues: 89
Contributors: 8
Last Push: 25d ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 0
Likes: 22
Last Modified: 1y ago

Fields of citing research

Biology: 77%
Medicine: 64%
Computer Science: 62%
Chemistry: 41%
Environmental Science: 8%
Materials Science: 3%
Engineering: 3%
Physics: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 49 (Partial)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 0 (open weights, closed recipe; not reproducible)
Model Openness Framework: Unclassified (restrictive license on core components)

