AlphaFold fine-tuned on peptide-MHC and protein-peptide binding data for specificity prediction across MHC class I/II, PDZ, and SH3 domains.
alphafold_finetune is an AlphaFold 2-derived framework developed by Amir Motmaen, Justas Dauparas, Minkyung Baek, Martin Henley, David Baker, and Philip Bradley at the Bradley Lab, Fred Hutchinson Cancer Center, in collaboration with the Institute for Protein Design at the University of Washington. Published in PNAS in February 2023, the work addresses a gap in structure-based modeling of protein-peptide recognition. While AlphaFold 2 is highly accurate for single-chain protein folding and for many protein-protein interactions, its performance on short peptide-binding predictions, particularly for the immunologically critical major histocompatibility complex (MHC) system, can be insufficient for specificity discrimination. The paper demonstrates that systematically fine-tuning AlphaFold's parameters on peptide-MHC structural and binding data substantially improves peptide-MHC specificity classification and that the gains generalize to unrelated peptide-binding systems.
The MHC-peptide interaction is central to adaptive immunity: MHC class I and class II molecules present peptide fragments on the cell surface for recognition by T cell receptors, and the binding specificity of different MHC alleles determines which peptides can trigger immune responses. Predicting which peptides bind which MHC allele is important for vaccine design, neoantigen identification in cancer immunotherapy, and understanding autoimmune disease mechanisms. Prior computational approaches relied primarily on sequence-based predictors trained on binding affinity data, which can struggle with structural nuances that determine peptide compatibility with specific MHC grooves. alphafold_finetune brings AlphaFold's structural modeling capacity directly to bear on specificity prediction through targeted fine-tuning.
The framework goes beyond MHC to demonstrate generalization: the fine-tuned models also show improved performance on peptide-binding specificity for PDZ and SH3 domain families, which are entirely different classes of peptide-binding modules important in cellular signaling. This cross-domain generalization suggests that fine-tuning AlphaFold on structural data for any peptide-binding system can improve specificity discrimination in a transferable way.
alphafold_finetune starts from the pretrained AlphaFold 2 checkpoint (93 million parameters) and fine-tunes using curated datasets of peptide-MHC complex structures from the Protein Data Bank. For MHC class I, the training data includes high-resolution crystal structures of peptide-MHC I complexes across multiple human alleles (HLA-A, HLA-B, HLA-C) along with binding and non-binding peptide pairs used to construct pairwise training objectives. For MHC class II, a similar dataset of peptide-MHC II structures is used. Fine-tuning adjusts the Evoformer and Structure Module weights through gradient descent on a combined structure prediction and binding specificity objective, preserving the general folding capability while sharpening sensitivity to peptide groove compatibility.
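The combined objective described above can be illustrated with a minimal sketch. The function names, the hinge-style pairwise term, and the weighting scheme below are assumptions for illustration, not the paper's actual loss implementation; the idea is simply that a structure-prediction loss is summed with a specificity term that rewards ranking true binders above non-binders of the same MHC allele.

```python
def combined_loss(structure_loss, binder_confidences, nonbinder_confidences,
                  weight=1.0):
    """Sketch of a combined fine-tuning objective (hypothetical form):
    a structure-prediction loss plus a pairwise binding-specificity term.
    Each (binder, non-binder) pair for the same MHC allele contributes a
    hinge penalty when the binder's model confidence does not exceed the
    non-binder's by a margin of 1."""
    pair_losses = [max(0.0, 1.0 - (b - n))
                   for b in binder_confidences
                   for n in nonbinder_confidences]
    specificity_loss = (sum(pair_losses) / len(pair_losses)
                        if pair_losses else 0.0)
    return structure_loss + weight * specificity_loss
```

With this form, a binder already ranked well above every non-binder contributes no specificity penalty, so fine-tuning pressure concentrates on pairs the model currently mis-ranks.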
At inference, the model takes peptide-MHC sequences as input in the standard AlphaFold multimer format and returns structure predictions with associated confidence scores. Binding specificity is assessed by comparing model confidence scores (pLDDT or pTM) for candidate peptides modeled against the same MHC allele: peptides whose predicted complexes score higher in confidence are predicted to be better binders. The fine-tuned model approaches state-of-the-art performance on peptide-MHC class I and class II specificity benchmarks, with particular improvements for alleles underrepresented in sequence-based training datasets. Generalization experiments on PDZ domains (tested on SPOT array data) and SH3 domains confirm that the fine-tuning benefits extend to structurally distinct peptide-binding modules.
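The confidence-based ranking step can be sketched as follows. Here `score_fn` is a stand-in for a full fine-tuned-model inference pass that returns a complex confidence score; the toy score values are invented for illustration.

```python
def rank_peptides(candidates, score_fn, top_k=3):
    """Rank candidate peptides for a single MHC allele by the confidence
    of their predicted peptide-MHC complex (higher = better binder).
    score_fn stands in for running the fine-tuned model and reading out
    a pLDDT/pTM-style confidence for the predicted complex."""
    scored = sorted(((pep, score_fn(pep)) for pep in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy stand-in confidences (invented values for illustration).
toy_scores = {"SIINFEKL": 0.92, "AAAAAAAA": 0.41, "GILGFVFTL": 0.88}
ranking = rank_peptides(toy_scores, toy_scores.get)
```

Because every candidate is scored against the same allele, only the relative ordering of confidences matters for specificity discrimination, not their absolute values.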
The primary application domain is computational immunology, where accurate MHC-peptide specificity prediction is essential for designing T cell vaccines, identifying neoantigen candidates in cancer immunotherapy, and understanding which peptides from a pathogen are immunogenic in a given patient's HLA background. Researchers in neoantigen pipelines for cancer genomics can use alphafold_finetune as an additional structural filter to complement sequence-based MHC predictors like NetMHCpan, potentially improving precision when selecting candidates for experimental validation. In fundamental structural immunology, the model enables rapid hypothesis testing about how specific amino acid changes in a peptide or MHC allele alter binding geometry. For researchers studying cellular signaling, the PDZ and SH3 domain fine-tuning results suggest that alphafold_finetune can improve binding predictions for any system where a curated structural dataset of peptide-protein complexes is available, opening its use to domains far beyond immunology.
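The two-stage filtering idea in a neoantigen pipeline can be sketched as below. The cutoff values, score conventions, and peptide data are hypothetical; the point is the shape of the filter, where a NetMHCpan-style percentile rank (lower is better) is combined with a structure-confidence threshold (higher is better).

```python
def structural_filter(seq_rank, struct_conf, rank_cutoff=2.0, conf_cutoff=0.8):
    """Illustrative two-stage neoantigen candidate filter: keep peptides
    that a sequence-based predictor calls strong binders (percentile
    rank <= rank_cutoff, NetMHCpan-style, lower is better) AND whose
    predicted-complex confidence clears conf_cutoff. Cutoffs here are
    hypothetical, not recommended values."""
    return [pep for pep, rank in seq_rank.items()
            if rank <= rank_cutoff and struct_conf.get(pep, 0.0) >= conf_cutoff]

# Invented example scores for three candidate peptides.
seq_rank = {"SIINFEKL": 0.5, "KVAELVHFL": 1.5, "QQQQQQQQQ": 8.0}
struct_conf = {"SIINFEKL": 0.93, "KVAELVHFL": 0.62, "QQQQQQQQQ": 0.95}
selected = structural_filter(seq_rank, struct_conf)
```

Running the expensive structural model only on peptides that already pass the cheap sequence filter keeps the added inference cost proportional to the shortlist, not the full library.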
alphafold_finetune established one of the earliest demonstrations that AlphaFold 2's pretrained representations could be systematically improved for a specialized structural biology task through fine-tuning on domain-specific data, a paradigm that has since become a productive research direction. The work predated several subsequent AlphaFold fine-tuning efforts and provided a clear methodological template for constructing training data, defining fine-tuning objectives, and evaluating generalization in peptide-binding systems. The Fred Hutchinson Cancer Center released the code and Colab pipeline openly, enabling immediate uptake by the immunoinformatics community. A notable limitation is that the model inherits AlphaFold's computational requirements: each peptide-MHC prediction requires a full AlphaFold inference pass, making exhaustive screening of large peptide libraries expensive compared to lightweight sequence-based predictors. Nonetheless, for cases where high structural accuracy is needed or where sequence-based methods give ambiguous results, alphafold_finetune provides a valuable complementary approach.