Open-source reproduction of AlphaFold 3 from Baidu PaddleHelix that predicts structures of proteins, nucleic acids, and small molecule ligands with accuracy comparable to the original.
HelixFold3 is an open-source biomolecular structure prediction system developed by Baidu's PaddleHelix team and released in August 2024. It is one of the first open reproductions of AlphaFold 3, directly addressing the restricted accessibility of Google DeepMind's proprietary system. HelixFold3 predicts three-dimensional structures of proteins, DNA, RNA, and conventional small molecule ligands, including their mixed complexes, with accuracy benchmarked as comparable to AlphaFold 3 across multiple prediction tasks.
The model fills a significant gap in the open-science ecosystem. AlphaFold 3 demonstrated that a unified diffusion-based framework could handle the full diversity of biological macromolecules, but its weights and inference code were not released for local use at launch. HelixFold3 reproduces this capability using publicly available structural data and releases both code and weights under a non-commercial academic license, enabling the research community to run predictions locally, inspect the implementation, and build upon it without depending on a proprietary API.
A subsequent checkpoint, HelixFold3.1, improved predictions for RNA and other nucleic acids and was evaluated on the CASP15 RNA benchmark. A further update, HelixFold3.2 (July 2025), brought additional gains in protein-related tasks and reduced atomic clashes.
HelixFold3 follows the AlphaFold 3 architectural paradigm. The trunk is a Pairformer that processes joint pair and single sequence representations, extracting co-evolutionary and spatial relationship signals from multiple sequence alignments (MSAs) for proteins and from sequence context for nucleic acids and ligands. Small molecules are represented via SMILES strings. The trunk output is passed to a diffusion module that generates all-atom coordinates across all molecular types in a unified denoising framework.
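To make the input side concrete, the sketch below describes a mixed complex as a list of entities: polymers carry a sequence and ligands carry a SMILES string. The `Entity` class, its field names, and `count_instances` are hypothetical and chosen for illustration only; this is not the HelixFold3 input format, and the protein sequence is made up.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical schema for illustration; not the HelixFold3 input format.
VALID_KINDS = {"protein", "dna", "rna", "ligand"}


@dataclass(frozen=True)
class Entity:
    kind: str        # one of VALID_KINDS
    spec: str        # residue or nucleotide sequence for polymers, SMILES for ligands
    copies: int = 1  # number of identical copies in the complex

    def __post_init__(self):
        # Validate eagerly so a malformed job fails before any prediction step.
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown entity kind: {self.kind!r}")
        if not self.spec:
            raise ValueError("spec must be a non-empty string")
        if self.copies < 1:
            raise ValueError("copies must be at least 1")


def count_instances(entities: List[Entity]) -> Dict[str, int]:
    """Count molecule instances per kind, honoring the copies field."""
    totals: Dict[str, int] = {}
    for entity in entities:
        totals[entity.kind] = totals.get(entity.kind, 0) + entity.copies
    return totals


# Made-up protein sequence; the SMILES string encodes aspirin.
job = [
    Entity("protein", "MKTAYIAKQRQISFVKSHFSRQ", copies=2),
    Entity("ligand", "CC(=O)Oc1ccccc1C(=O)O"),
]

print(count_instances(job))
```

Keeping polymers and ligands in one list mirrors the idea that a single model handles all molecular types in one pass, rather than docking a ligand into a separately predicted protein.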
The model is implemented in PaddlePaddle, Baidu's deep learning framework, and trained on publicly available structural data with a September 30, 2021 cutoff, including Protein Data Bank (PDB) experimental structures and self-distillation datasets augmenting coverage of protein and complex configurations.
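The date cutoff can be pictured as a simple filter over structure release dates. The following is a minimal sketch, assuming records of the form (entry ID, release date); the entry IDs, dates, and the function name `split_by_cutoff` are hypothetical, and treating the cutoff day as inclusive is an assumption rather than something the report states.

```python
from datetime import date

# Cutoff stated in the text; treating it as inclusive is an assumption.
TRAINING_CUTOFF = date(2021, 9, 30)


def split_by_cutoff(entries, cutoff=TRAINING_CUTOFF):
    """Split (entry_id, release_date) pairs into training-eligible and held-out IDs."""
    train, held_out = [], []
    for entry_id, released in entries:
        if released <= cutoff:
            train.append(entry_id)
        else:
            held_out.append(entry_id)
    return train, held_out


# Hypothetical entries with made-up IDs and dates, for illustration only.
example_entries = [
    ("entry_a", date(2020, 5, 14)),
    ("entry_b", date(2021, 9, 30)),
    ("entry_c", date(2022, 3, 2)),
]

train_ids, held_out_ids = split_by_cutoff(example_entries)
print(train_ids)
print(held_out_ids)
```

A split of this kind is what lets structures released after the cutoff serve as evaluation targets the model could not have seen during training.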
Benchmark results show competitive performance against AlphaFold 3 across several task categories. On ligand docking (PoseBusters V1 and V2, with 428 and 308 targets respectively), HelixFold3 achieves high success rates, with a drop of only about 2% on samples that do not overlap the training data. On 186 protein complexes from PDB (January–November 2022) and on nucleic acid benchmarks (41 RNA-only and 41 DNA-only structures), HelixFold3.1 matches or exceeds competing open systems. The primary acknowledged gap is on general protein-protein interfaces and large multimeric complexes, where HelixFold3 still trails AlphaFold 3.
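A docking success rate of this kind is typically the fraction of targets whose predicted ligand pose lies within a distance threshold of the experimental pose. The sketch below uses the common criterion of ligand RMSD at or below 2 Å; that the HelixFold3 report applies exactly this threshold is an assumption here. The function names and all coordinates and RMSD values are illustrative toy data, not benchmark results.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms.

    Assumes both poses are already in the same reference frame; no superposition
    or symmetry correction is performed.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def success_rate(rmsds, threshold=2.0):
    """Fraction of targets whose ligand RMSD is at or below the threshold (in Å)."""
    if not rmsds:
        raise ValueError("need at least one target")
    return sum(r <= threshold for r in rmsds) / len(rmsds)


# Toy two-atom ligand: the predicted pose is shifted by 1 Å along z.
ref_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pred_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(pred_pose, ref_pose))  # 1.0

# Toy per-target RMSDs: two of four fall within 2 Å.
toy_rmsds = [0.8, 1.9, 2.4, 5.1]
print(success_rate(toy_rmsds))  # 0.5
```

Computing the same rate separately on targets that do and do not overlap the training data gives the kind of comparison behind the roughly 2% drop mentioned above.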
HelixFold3 is well suited for structure-based drug discovery, where predicting protein-ligand binding poses for small molecule candidates is a core early-stage task. Researchers in RNA and DNA biology can model RNA tertiary structures, DNA-protein complexes, and mixed nucleic acid assemblies without access to proprietary services. Antibody engineers benefit from the epitope-guided inference capability for antigen-antibody complex geometry. More broadly, structural genomics teams can generate structural models for proteins and complexes lacking experimental data, and the transparent, reproducible codebase makes it a useful baseline for evaluating new biomolecular structure prediction methods.
HelixFold3 represents a meaningful contribution to open-science infrastructure for structural biology. By reproducing AlphaFold 3's capabilities with publicly available data and releasing code and weights, it lowers barriers for researchers at institutions without commercial API access and enables the broader community to audit, extend, and benchmark against the AlphaFold 3 paradigm. Notable limitations include the non-commercial license (CC BY-NC-SA 4.0), which precludes direct commercial deployment without a separate agreement, and the PaddlePaddle dependency, which may add friction for PyTorch-centric workflows. The technical report was an arXiv preprint at time of release, so its results had not undergone formal peer review. Like all AlphaFold 3-style systems, HelixFold3 produces predictions that represent single static conformations and do not capture conformational dynamics or ensemble behavior.
Liu, L., et al. (2024) Technical Report of HelixFold3 for Biomolecular Structure Prediction. arXiv.org.
DOI: 10.48550/arXiv.2408.16975