Boltz-1 is an open-source deep learning model developed at MIT for predicting the three-dimensional structures of biomolecular complexes, including proteins, nucleic acids, and small molecules. Released in November 2024 by Jeremy Wohlwend, Gabriele Corso, Saro Passaro, and colleagues at MIT's Jameel Clinic, under the guidance of professors Regina Barzilay and Tommi Jaakkola, Boltz-1 is notable as the first fully open model to reach accuracy on par with AlphaFold3, the state-of-the-art system from Google DeepMind that had previously been available only under a restrictive non-commercial license.
The model's central contribution is not only technical performance but also access. AlphaFold3 was a significant advance in jointly modeling interactions among proteins, DNA, RNA, and small-molecule ligands, but its unavailability to commercial and many independent researchers left a gap in the ecosystem. Boltz-1 closes that gap by releasing training code, model weights, datasets, and benchmarks under the MIT license, making frontier-level structure prediction available to the global research community for both academic and commercial use.
Boltz-1 follows the general architectural framework established by AlphaFold3 but introduces innovations in four areas: multiple sequence alignment (MSA) pairing, training-time structure cropping, pocket-conditioned prediction, and inference-time steering. The entire training pipeline relies exclusively on publicly available data from the Protein Data Bank, UniProt, and ChEMBL, demonstrating that AlphaFold3-class performance does not require proprietary datasets.
Boltz-1 is built around a diffusion-based architecture that closely follows the AlphaFold3 framework: a trunk network processes paired sequence and structural representations, and a diffusion module generates atomic coordinates. The model was trained entirely on open data, namely pre-processed Protein Data Bank (PDB) structures with pre-computed MSAs of up to 4,096 sequences per chain, along with ligand information sourced from ChEMBL and other chemical databases. Key architectural modifications relative to AlphaFold3 include revised MSA pairing algorithms for heteromeric complexes, changes to representation flow within the trunk, and a reworked confidence model that treats confidence estimation as a fine-tuning task on the trunk layers rather than as a separate head.
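The per-chain MSA cap mentioned above can be illustrated with a minimal sketch. The text does not say how the limit is enforced, so the random subsampling here is an assumption made for illustration, and the names `cap_msa` and `MAX_MSA_SEQS` are hypothetical rather than taken from the Boltz-1 code.

```python
import random

MAX_MSA_SEQS = 4096  # cap quoted in the text; the enforcement strategy below is assumed


def cap_msa(msa, max_seqs=MAX_MSA_SEQS, seed=0):
    """Return at most `max_seqs` aligned sequences, always keeping the query (row 0).

    Assumption: rows beyond the cap are dropped by uniform random subsampling.
    A real pipeline might instead keep the top-ranked hits.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")
    if len(msa) <= max_seqs:
        return list(msa)
    rng = random.Random(seed)
    # Sample indices among the non-query rows, then sort to preserve original order.
    kept = sorted(rng.sample(range(1, len(msa)), max_seqs - 1))
    return [msa[0]] + [msa[i] for i in kept]
```

Calling `cap_msa(rows)` on a list of aligned sequences returns the list unchanged when it is already within the cap, and otherwise a deterministic subsample (for a fixed `seed`) whose first row is still the query.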
On the CASP15 benchmark (66 targets from the 2022 competition), Boltz-1 performs strongly across modalities. For protein-ligand interactions it reaches a median LDDT-PLI of 65%, versus 40% for Chai-1, and 83% of its protein-protein predictions exceed DockQ 0.23, versus 76% for Chai-1. For RNA, it achieves a median LDDT of 0.54 on the CASP15 RNA targets. The Boltz-steering technique further improves output quality at inference time without retraining: it applies constraint-based guidance during diffusion sampling to remove non-physical bond geometries and steric clashes.
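The idea behind inference-time steering can be sketched with a toy example: between denoising updates, the coordinates are nudged down the gradient of a penalty that is nonzero only when two atoms are too close. This is a simplified illustration under stated assumptions, not the Boltz-steering implementation. The "denoiser" is a stand-in that moves points toward a fixed target, and the penalty form, `min_dist`, and all function names are hypothetical.

```python
import numpy as np


def clash_penalty_grad(coords, min_dist=1.0):
    """Gradient of sum_{i<j} max(0, min_dist - d_ij)^2 with respect to coords."""
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3) pairwise displacements
    dist = np.linalg.norm(diff, axis=-1)             # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)                   # ignore self-pairs
    overlap = np.clip(min_dist - dist, 0.0, None)    # positive only for clashing pairs
    safe = np.where(np.isfinite(dist), np.maximum(dist, 1e-8), 1.0)
    # d/dx_i of (min_dist - d_ij)^2 is -2 * overlap_ij * (x_i - x_j) / d_ij
    grad = (-2.0 * overlap / safe)[:, :, None] * diff
    return grad.sum(axis=1)


def toy_denoise_step(coords, target, rate=0.2):
    """Stand-in for a learned denoiser: move a fraction of the way toward `target`."""
    return coords + rate * (target - coords)


def sample(target, steps=50, guidance=0.0, min_dist=1.0, seed=0):
    """Iteratively denoise from random coordinates, optionally steering away from clashes."""
    rng = np.random.default_rng(seed)
    coords = rng.normal(size=target.shape) * 5.0
    for _ in range(steps):
        coords = toy_denoise_step(coords, target)
        if guidance > 0:
            # Steering: one gradient step on the clash penalty after each update.
            coords = coords - guidance * clash_penalty_grad(coords, min_dist)
    return coords


def min_pairwise_distance(coords):
    """Smallest distance between any two distinct points."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return float(dist.min())
```

If the toy target itself contains two points closer than `min_dist`, sampling with `guidance=0.0` reproduces that clash, while a positive `guidance` keeps the pair further apart without any change to the denoiser, which mirrors the "no retraining" property described above.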
Boltz-1 is designed to serve structural biology use cases that previously required access to commercial or institutional tools. Drug discovery teams can use it for structure-based virtual screening, predicting protein-ligand poses for hit identification and lead optimization. The pocket-conditioning feature makes it directly applicable to fragment-based drug design, where partial information about a binding site guides prediction of the complex structure. Protein engineers can use Boltz-1 to predict heteromeric complex structures, assess the effects of mutations on binding interfaces, or validate designed protein-protein interactions before experimental synthesis. Academic researchers benefit from the transparent training pipeline, which supports reproducibility, ablation studies, and further model development in ways that closed systems cannot.
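As an illustration of the kind of partial binding-site information that pocket conditioning relies on, the sketch below picks out residues near a ligand in a reference structure using a simple distance cutoff. The 6 Å cutoff, the data layout, and the function name `pocket_residues` are assumptions made for this example; how Boltz-1 itself encodes or consumes pocket information is not shown here.

```python
import math


def pocket_residues(residue_coords, ligand_coords, cutoff=6.0):
    """Return residues with any atom within `cutoff` angstroms of any ligand atom.

    residue_coords: dict mapping (chain_id, residue_index) -> list of (x, y, z) atoms
    ligand_coords:  list of (x, y, z) ligand atom positions
    The 6.0 default is an illustrative choice, not a value taken from Boltz-1.
    """
    pocket = []
    for key, atoms in residue_coords.items():
        if any(math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand_coords):
            pocket.append(key)
    return sorted(pocket)
```

The returned list of `(chain_id, residue_index)` pairs is the sort of partial site description that could then be supplied as a conditioning hint, leaving the rest of the complex to be predicted.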
Boltz-1 arrived at a moment when the structural biology community was acutely aware of the tension between scientific capability and access. AlphaFold3's non-commercial license had excluded large portions of the research ecosystem from using the most powerful available tool for biomolecular complex prediction. Boltz-1 resolved this by demonstrating that AlphaFold3-level accuracy is achievable with open data and open methods, establishing a new baseline for what the community can expect from open-source tools. The MIT license enables integration into commercial drug discovery platforms, downstream model development, and open-science initiatives. The concurrent release of training data on AWS Open Data further lowers the barrier for groups wishing to replicate or extend the work. A key limitation is that, as a diffusion model, Boltz-1 can still produce hallucinated or non-physical structures for difficult targets, a problem that the Boltz-steering technique mitigates but does not eliminate. As of early 2025, the model has seen rapid adoption in both academic and commercial contexts, with active community development continuing through the public GitHub repository.
Wohlwend, J., Corso, G., Passaro, S., et al. (2024). Boltz-1: Democratizing Biomolecular Interaction Modeling. bioRxiv.
DOI: 10.1101/2024.11.19.624167