Multi-modal foundation model for biomolecular structure prediction covering proteins, small molecules, DNA, RNA, and glycans in a unified diffusion framework.
Chai-1 is a multi-modal foundation model for biomolecular structure prediction developed by Chai Discovery and released in September 2024, with the accompanying technical report posted to bioRxiv in October 2024. Unlike most structure prediction tools that specialize in a single molecule class, Chai-1 handles proteins, small molecules (ligands), DNA, RNA, and glycosylations within a single unified architecture. This generality makes it one of the broadest open-weights biomolecular structure predictors available to researchers.
The model is built on a diffusion-based framework that generates 3D atomic coordinates directly by iteratively denoising a random configuration of atoms. All molecular entities — from amino acid residues to glycan moieties — are represented as tokens in a shared space, allowing the model to reason jointly about inter-molecular contacts and binding geometries without requiring separate sub-models for each entity type.
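The iterative denoising described above can be pictured with a toy sampling loop. This is a minimal sketch, not Chai-1's sampler: the noise schedule, the deterministic update rule, and every name (oracle_denoiser, sample, sigma_max, sigma_min) are assumptions chosen for illustration, and the learned network is replaced by an "oracle" that simply returns a known target structure so the loop can run on its own.

```python
import random


def oracle_denoiser(noisy_coords, target):
    """Stand-in for the learned network (an assumption for illustration).

    A real model would predict clean coordinates from the noisy input and
    its conditioning features; here we just hand back a known target.
    """
    return [list(atom) for atom in target]


def sample(target, num_steps=50, sigma_max=10.0, sigma_min=0.01, seed=0):
    """Denoise random 3D coordinates step by step toward a structure."""
    rng = random.Random(seed)
    # Noise levels shrink geometrically from sigma_max to sigma_min, then hit 0.
    sigmas = [
        sigma_max * (sigma_min / sigma_max) ** (i / (num_steps - 1))
        for i in range(num_steps)
    ] + [0.0]
    # Start from a random configuration of atoms (pure Gaussian noise).
    coords = [[rng.gauss(0.0, sigma_max) for _ in range(3)] for _ in target]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        clean_guess = oracle_denoiser(coords, target)
        # Deterministic step: keep only the fraction of the remaining noise
        # that corresponds to the next, lower noise level.
        keep = sigma_next / sigma
        coords = [
            [g + keep * (c - g) for c, g in zip(atom, guess)]
            for atom, guess in zip(coords, clean_guess)
        ]
    return coords


if __name__ == "__main__":
    toy_target = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
    for atom in sample(toy_target):
        print([round(c, 3) for c in atom])
```

Because the final noise level is zero, the last step lands exactly on the denoiser's prediction; in a real model that prediction comes from the network rather than from a known answer.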
Chai-1 achieves a 77% success rate on the PoseBusters benchmark for protein-ligand co-crystal structure prediction, on par with the 76% reported for AlphaFold 3. A distinctive capability is its support for experimental restraint conditioning: sparse distance or contact restraints derived from crosslinking mass spectrometry, NMR, or cryo-EM data can be incorporated as soft biases during the diffusion process, yielding double-digit percentage point improvements over unguided prediction for difficult targets.
The model weights and inference code are distributed as a Python package (chai_lab) for non-commercial research; a free web interface supports both research and commercial drug discovery use cases.
Chai-1 uses a diffusion-based architecture that treats all biomolecular entities in a shared token representation. A pairwise representation module encodes spatial relationships between all atom pairs across molecule types, allowing reasoning about inter-molecular contacts in protein-ligand and protein-nucleic acid complexes. The model incorporates evolutionary information from multiple sequence alignments and structural templates for proteins, with an optional MSA-free pathway for rapid inference. Experimental restraints are integrated as soft biases on pairwise distances during the reverse diffusion trajectory rather than as hard geometric constraints.
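To make the distinction between a soft bias and a hard constraint concrete, the sketch below applies a flat-bottom distance penalty to a pair of atoms. It is not taken from the Chai-1 code: the penalty form, the function names, and the step size are all assumptions for illustration. The penalty is zero while the pair is within the restraint distance and grows quadratically beyond it, so coordinates are nudged toward agreement rather than forced into it.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def restraint_gradient(coords, i, j, max_dist, weight=1.0):
    """Gradient of 0.5 * weight * max(0, d_ij - max_dist)**2 w.r.t. coords.

    Hypothetical penalty chosen for illustration: no force is applied
    while the restraint is already satisfied.
    """
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    d = distance(coords[i], coords[j])
    excess = d - max_dist
    if excess <= 0.0 or d == 0.0:
        return grad
    for k in range(3):
        g = weight * excess * (coords[i][k] - coords[j][k]) / d
        grad[i][k] = g
        grad[j][k] = -g
    return grad


def apply_soft_bias(coords, restraints, step_size=0.1, num_steps=100):
    """Take small gradient steps that pull restrained pairs together.

    restraints is a list of (atom_i, atom_j, max_distance) tuples. In a
    guided sampler these nudges would be interleaved with denoising
    steps; here they are applied on their own.
    """
    x = [list(atom) for atom in coords]
    for _ in range(num_steps):
        for i, j, max_dist in restraints:
            grad = restraint_gradient(x, i, j, max_dist)
            x = [
                [c - step_size * g for c, g in zip(atom, g_atom)]
                for atom, g_atom in zip(x, grad)
            ]
    return x


if __name__ == "__main__":
    start = [[0.0, 0.0, 0.0], [20.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
    biased = apply_soft_bias(start, [(0, 1, 8.0)])
    print(round(distance(biased[0], biased[1]), 3))
```

A hard constraint would instead project the pair onto the allowed distance at every step; the soft form lets the rest of the model's prediction outweigh a restraint that is noisy or wrong.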
Training data derives from experimental structures in the Protein Data Bank, including co-crystal structures of protein-ligand, protein-nucleic acid, and glycosylated protein complexes. The training regime incorporates data augmentation strategies consistent with recent biomolecular diffusion models. On the PoseBusters benchmark, Chai-1 achieves a 77% success rate for protein-ligand pose prediction. Protein-protein interface accuracy is competitive with AlphaFold-Multimer on standard benchmarks. Adding experimental restraints yields double-digit percentage point improvements over unguided prediction.
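A success rate of this kind is typically the fraction of complexes whose predicted ligand pose falls within a root-mean-square deviation cutoff of the experimental pose. The sketch below assumes the conventional 2 Å cutoff, which the text above does not state, and further assumes that predicted and reference coordinates are already in a common frame with matching atom order; symmetry correction and any physical-validity checks are left out. The function names are illustrative.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two matched lists of 3D atoms."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r)
    )
    return math.sqrt(squared / len(pred))


def success_rate(pairs, threshold=2.0):
    """Fraction of (predicted, reference) ligand poses with RMSD <= threshold.

    threshold is in angstroms; 2.0 is an assumed, conventional cutoff.
    """
    if not pairs:
        raise ValueError("need at least one prediction")
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) <= threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    ref = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    close = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]  # shifted by 1 angstrom
    far = [[3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]    # shifted by 3 angstroms
    print(success_rate([(close, ref), (far, ref), (ref, ref)]))
```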
Chai-1 is particularly well-suited to structure-based drug discovery workflows, where the ability to model protein-ligand binding poses, evaluate fragment library binding modes, and assess allosteric interactions in a single unified system offers a practical advantage over pipelines stitched together from multiple specialized tools. Beyond small-molecule discovery, the model supports modeling of multi-chain assemblies including protein-protein, protein-DNA, and protein-RNA complexes, as well as glycoprotein structures with explicit glycan chains. Researchers integrating sparse experimental data from crosslinking MS, SAXS, or cryo-EM can use the restraint-conditioning feature to guide predictions toward conformations consistent with low-resolution experimental observations.
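As a rough illustration of how a protein-ligand job might be prepared, the sketch below writes a multi-entity FASTA file and hands it to chai_lab. Several things here are assumptions rather than documented facts of this text: the header format (entity kind, then name), the set of accepted kinds, and the run_inference entry point with its fasta_file and output_dir arguments are modelled on the examples in the chai_lab repository and may differ between versions. The helper names build_fasta and predict are hypothetical.

```python
from pathlib import Path

# Assumed entity kinds, modelled on chai_lab example inputs.
ALLOWED_KINDS = {"protein", "ligand", "dna", "rna"}


def build_fasta(entities):
    """Build FASTA text from (kind, name, sequence_or_smiles) tuples.

    Polymers carry a residue sequence; ligands carry a SMILES string.
    """
    lines = []
    for kind, name, payload in entities:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported entity kind: {kind}")
        lines.append(f">{kind}|name={name}")
        lines.append(payload)
    return "\n".join(lines) + "\n"


def predict(fasta_text, workdir):
    """Write the FASTA file and run structure prediction.

    Assumption: chai_lab is installed and exposes run_inference with
    these keyword arguments; check the installed version before use.
    """
    from chai_lab.chai1 import run_inference  # imported lazily on purpose

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    fasta_path = workdir / "input.fasta"
    fasta_path.write_text(fasta_text)
    return run_inference(fasta_file=fasta_path, output_dir=workdir / "outputs")


if __name__ == "__main__":
    # Toy inputs: a short peptide and ethanol as a SMILES string.
    print(build_fasta([
        ("protein", "toy-peptide", "GAAL"),
        ("ligand", "toy-ligand", "CCO"),
    ]))
```

Keeping the input builder separate from the inference call means the file format can be checked without a GPU or the model weights.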
Chai-1 represents a significant step toward generalist biomolecular structure prediction, demonstrating that a single diffusion-based model can achieve competitive accuracy across multiple molecule classes simultaneously. Its open-weights release has lowered the barrier to high-quality structure prediction for protein-ligand, protein-nucleic acid, and glycoprotein systems that were previously underserved by freely available tools. A notable limitation is that model weights are restricted to non-commercial use; commercial users must access predictions through the Chai Discovery web interface. As of release, the technical report exists as a bioRxiv preprint and has not yet undergone formal peer review. Accuracy for complex glycan chains lags behind protein and small-molecule performance, and very large or highly disordered complexes remain challenging at standard compute budgets. Nonetheless, Chai-1 has been widely adopted by the drug discovery and structural biology communities as a practical complement or alternative to AlphaFold 3 for multi-modal structure modeling tasks.
Boitreaud, J., et al. (2024) Chai-1: Decoding the molecular interactions of life. bioRxiv.
DOI: 10.1101/2024.10.10.615955