RoseTTAFold All-Atom (RFAA) extends the RoseTTAFold architecture to model the full chemical complexity of biological systems. While AlphaFold 2 and the original RoseTTAFold transformed protein structure prediction, both were limited to polypeptide chains. RFAA removes that constraint by combining residue-level representations of proteins and nucleic acids with an atomic graph representation of small molecules and covalent modifications, enabling joint structure prediction across all major classes of biological macromolecules and their ligands in a single network pass.
Published in Science in March 2024 by the Baker Lab at the University of Washington, RFAA achieves protein monomer structure prediction accuracy comparable to AlphaFold 2 while simultaneously handling interaction partners that no prior generalist method could model. The work also introduced RFdiffusion All-Atom (RFdiffusionAA), a companion generative model fine-tuned from RFAA that designs entirely new protein scaffolds around target small molecules.
The release marked a significant step toward modeling the true chemical complexity of biological assemblies, where proteins rarely act in isolation but instead interact with metabolites, cofactors, nucleic acids, and post-translational modifications.
RFAA builds on the RoseTTAFold2 three-track architecture, which processes 1D sequence, 2D pairwise distance, and 3D coordinate information in parallel tracks that exchange information through iterative cross-track attention. The key innovation is a dual input representation. Biopolymers (proteins and nucleic acids) are encoded at the residue level, with one unit per amino acid or nucleotide. Small molecules, metals, and covalent modifications are encoded as atomic bond graphs, with element types fed into the 1D track, chemical bonds into the 2D track, and chirality into the 3D track. This asymmetric scheme keeps polymer processing efficient while preserving full bonded geometry for non-polymer components. An SE(3)-equivariant transformer then generates the all-atom coordinates.
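The dual representation can be illustrated with a small Python sketch. This is not RFAA's code or data format: the `Atom` and `Ligand` classes, the `build_tokens` function, and the plain R/S chirality label are hypothetical names and simplifications chosen for illustration. The sketch only shows the idea of placing residue-level units and ligand atoms in one token list, with bond orders stored as a pairwise feature.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str         # feature for the 1D track: element type
    chirality: str = ""  # simplified stand-in for 3D-track chirality: "R", "S", or ""


@dataclass
class Ligand:
    atoms: list          # list of Atom
    bonds: list          # (i, j, bond_order) tuples indexing into atoms


def build_tokens(protein_seq, ligand):
    """Combine residue tokens and ligand-atom tokens into one system (hypothetical helper)."""
    n_res = len(protein_seq)
    # One token per residue, followed by one token per ligand atom.
    tokens = [("residue", aa) for aa in protein_seq]
    tokens += [("atom", a.element) for a in ligand.atoms]
    n = len(tokens)
    # Pairwise feature for the 2D track: bond order between ligand atoms, 0 elsewhere.
    bond = [[0] * n for _ in range(n)]
    for i, j, order in ligand.bonds:
        bond[n_res + i][n_res + j] = order
        bond[n_res + j][n_res + i] = order
    # Per-token chirality labels; residue tokens carry none in this sketch.
    chir = [""] * n_res + [a.chirality for a in ligand.atoms]
    return tokens, bond, chir


# Heavy atoms of (S)-lactic acid: CH3, chiral C, hydroxyl O, carboxyl C, =O, -OH
lactate = Ligand(
    atoms=[Atom("C"), Atom("C", "S"), Atom("O"), Atom("C"), Atom("O"), Atom("O")],
    bonds=[(0, 1, 1), (1, 2, 1), (1, 3, 1), (3, 4, 2), (3, 5, 1)],
)
tokens, bond, chir = build_tokens("MKV", lactate)
print(len(tokens))  # 3 residue tokens + 6 atom tokens
```

In this toy layout the protein contributes three tokens and the ligand six, so the bond matrix is 9 x 9, with non-zero entries only in the ligand block.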
The model was trained on biological assemblies from the Protein Data Bank, including protein-small molecule complexes, protein-metal complexes, and covalently modified proteins. Common solvents and crystallization additives were filtered from training targets to keep the model focused on biologically meaningful interactions. On standard benchmarks, RFAA achieves protein monomer accuracy comparable to AlphaFold 2, strong performance on flexible backbone docking in CAMEO evaluations, and reasonable accuracy on multi-chain assemblies that combine proteins, nucleic acids, and small molecules.
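The filtering step can be sketched as a simple exclusion list applied to PDB chemical component codes. The list below is illustrative only and is an assumption, not the list used to train RFAA; `EXCLUDED_CODES` and `keep_ligands` are hypothetical names.

```python
# Illustrative exclusion list (assumption): water, glycerol, ethylene glycol,
# sulfate, PEG fragment, DMSO, acetate. NOT the actual RFAA training filter.
EXCLUDED_CODES = {"HOH", "GOL", "EDO", "SO4", "PEG", "DMS", "ACT"}


def keep_ligands(het_codes, excluded=EXCLUDED_CODES):
    """Return the ligand codes not on the exclusion list, preserving input order."""
    return [code for code in het_codes if code.upper() not in excluded]


ligands = ["HEM", "HOH", "GOL", "ZN", "SO4", "ATP"]
kept = keep_ligands(ligands)
print(kept)
```

A real pipeline would likely need more than a fixed list, since some additives can also occur as genuine ligands, but the sketch captures the basic idea of removing non-biological components before training.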
RFAA is best suited for research problems that require modeling the chemical context of biological systems. Primary use cases include predicting ligand-bound protein structures where backbone flexibility matters, modeling metalloenzymes and cofactor-bound proteins such as heme proteins or zinc-finger domains, and characterizing covalently modified proteins like glycoproteins. The companion RFdiffusionAA model extends these capabilities into protein design, enabling researchers to generate novel binders for specific small-molecule targets, a workflow relevant to biosensor development, therapeutic protein engineering, and synthetic biology. Together, the prediction and design models form a practical toolkit for labs working at the chemistry-biology interface.
RFAA expanded the generalist structure prediction paradigm beyond polypeptides, addressing a longstanding gap where researchers had to chain together specialized tools to model chemically complex assemblies. The experimentally validated small-molecule binder designs demonstrated that all-atom modeling is useful for generating new proteins as well as for prediction. Limitations remain: RFAA is not a replacement for specialized docking software when the receptor structure is already known, performance decreases for very large or chemically unusual ligands, and all-atom modeling of large assemblies demands substantially more memory than protein-only prediction. Nonetheless, its open-source availability on GitHub and strong benchmark results have made it a widely adopted tool for labs working on protein-ligand systems.
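A rough way to see why memory grows is to count pairwise features. The sketch below assumes that every residue and every ligand atom is one token and that the pairwise track stores one feature vector per token pair. The channel width `d_pair=128` and 4-byte values are hypothetical choices for illustration, not RFAA's actual settings, and the estimate ignores every other source of memory use.

```python
def pair_feature_bytes(n_residues, n_ligand_atoms, d_pair=128, bytes_per_value=4):
    """Estimate bytes for one pairwise feature tensor (illustrative assumptions only)."""
    n_tokens = n_residues + n_ligand_atoms
    # One d_pair-wide vector for every ordered pair of tokens.
    return n_tokens * n_tokens * d_pair * bytes_per_value


protein_only = pair_feature_bytes(300, 0)
with_ligands = pair_feature_bytes(300, 100)
ratio = with_ligands / protein_only
print(protein_only, with_ligands, round(ratio, 2))
```

Under these assumptions, adding 100 ligand atoms to a 300-residue protein raises the token count by a third but the pairwise storage by a factor of (400/300)^2, because the cost scales with the square of the total token count.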
Krishna, R., et al. (2024). Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom. Science, 384, eadl2528.
DOI: 10.1126/science.adl2528