Open-source PyTorch reproduction of AlphaFold 3 (Apache 2.0) that matches or exceeds AF3 performance on protein-ligand, protein-protein, and protein-nucleic acid benchmarks.
Protenix is a fully open-source PyTorch reimplementation of AlphaFold 3, developed by ByteDance AI Lab and released in January 2025 under the permissive Apache 2.0 license. The project was motivated by the limited accessibility of the original AlphaFold 3 code and model weights, which were released under a restrictive license that prohibited commercial use and required registration. By faithfully reproducing the AF3 architecture and training pipeline in a transparent, auditable codebase, Protenix restores the ability for both academic and commercial researchers to build on the current state of the art in biomolecular structure prediction.
At equivalent training data cutoffs, model scale, and inference budget, Protenix matches or exceeds AlphaFold 3 on multiple standard benchmarks, including PoseBusters V2 for protein-ligand docking and the CASP15 RNA structure prediction challenge. The model handles the full range of molecular inputs that AF3 targets — protein sequences, small molecules, RNA, and DNA — making it a practical drop-in alternative for workflows that require open-weight access or commercial deployment freedom.
The v1.0.0 release in February 2026 added template and RNA MSA support, and the repository is actively maintained with ongoing performance and usability improvements. A lightweight Protenix-Mini variant is also available for reduced-cost inference on standard hardware.
Protenix is a 368-million-parameter model that follows the AlphaFold 3 architecture. The core design departs from AlphaFold 2's iterative structure module in favor of diffusion-based structure generation. Sequence and evolutionary information are processed by a Pairformer (AF3's replacement for AlphaFold 2's Evoformer, operating on single and pair representations) together with an MSA module that captures co-evolutionary signals. A diffusion-based decoder then generates all-atom coordinates by iteratively denoising positions sampled from random noise, allowing flexible and physically plausible generation across diverse molecular types, including small molecules and nucleic acids.
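To make the decoder's role concrete, the following is a minimal sketch of diffusion-style iterative denoising over atom coordinates. It is illustrative only: the `model` callable, the noise schedule, and the simplified Euler update are assumptions for exposition, not Protenix's actual API or sampler.

```python
import numpy as np

def diffusion_denoise(model, x_init, sigmas):
    """Illustrative sketch (not Protenix's actual API): iteratively
    denoise random all-atom coordinates toward a predicted structure.

    model   -- callable(x, sigma) returning the network's estimate of
               the clean coordinates at noise level sigma
    x_init  -- (n_atoms, 3) array of noisy starting coordinates
    sigmas  -- decreasing noise levels, highest first
    """
    x = x_init
    for i, sigma in enumerate(sigmas):
        denoised = model(x, sigma)  # network's clean-coordinate estimate
        sigma_next = sigmas[i + 1] if i + 1 < len(sigmas) else 0.0
        # Simplified Euler step toward the denoised estimate
        x = x + (sigma_next - sigma) * (x - denoised) / sigma
    return x
```

In a real diffusion decoder the `model` call conditions on the Pairformer and MSA module outputs; here it is left abstract to show only the sampling loop.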
The model is trained on publicly available structural data consistent with the AlphaFold 3 training regime: Protein Data Bank structures, sequence databases for MSA construction (UniRef, BFD, MGnify), and small molecule and nucleic acid records from public repositories. The full training pipeline and MSA generation scripts are open-sourced alongside the model weights. Inference is implemented in Python with CUDA and C++ extensions; kernel fusion and shared variable caching are used for optimization, and the Protenix-Mini variants reduce compute requirements further for large-scale screening use cases.
Protenix serves researchers who require open-weight access to AF3-class structure prediction. In drug discovery, the model predicts protein-ligand binding poses for virtual screening and lead-optimization pipelines where the original AF3's commercial licensing terms would be prohibitive. Structural biologists use it to generate atomic models of protein-protein and protein-nucleic acid complexes, particularly for targets with few homologs, where MSA-dependent methods struggle. RNA biologists benefit from RNA structure prediction that matches the leading CASP15 entries. The model is also well suited as a reproducible baseline for benchmarking new structure prediction methods, since its full training and inference pipeline is publicly auditable in a way that AlphaFold 3's is not.
Protenix occupies a strategically important position in the biomolecular structure prediction landscape by delivering AlphaFold 3-class performance under an unrestricted open-source license. Its release directly addresses the gap created by DeepMind's decision to withhold full training code and impose commercial restrictions on AF3, and it has become a reference implementation for the research community seeking a fully reproducible AF3 baseline. Notable limitations mirror those of AlphaFold 3: performance depends on MSA depth and template availability, degrading for sequences with no evolutionary homologs; predictions represent single static conformations rather than conformational ensembles; and inference on large complexes remains computationally demanding despite optimization work. As a preprint-stage model at initial release, some benchmark claims are pending full peer review, though the transparent codebase allows independent verification.
Chen, X., et al. (2025). Protenix: Advancing Structure Prediction Through a Comprehensive AlphaFold3 Reproduction. bioRxiv. DOI: 10.1101/2025.01.08.631967