AlphaFold fine-tuned on peptide-MHC and protein-peptide binding data for specificity prediction across MHC class I/II, PDZ, and SH3 domains.
alphafold_finetune is an AlphaFold 2-derived framework developed by Amir Motmaen, Justas Dauparas, Minkyung Baek, Martin Henley, David Baker, and Philip Bradley at the Bradley Lab, Fred Hutchinson Cancer Center, in collaboration with the Institute for Protein Design at the University of Washington. Published in PNAS in February 2023, the work addresses a gap in structure-based modeling of protein-peptide recognition. While AlphaFold 2 is highly accurate for single-chain protein folding and for many protein-protein interactions, its performance on short peptide-binding predictions, particularly for the immunologically critical major histocompatibility complex (MHC) system, can be insufficient for specificity discrimination. The paper demonstrates that systematically fine-tuning AlphaFold's parameters on peptide-MHC structural and binding data substantially improves peptide-MHC specificity classification and that the gains generalize to unrelated peptide-binding systems.
The MHC-peptide interaction is central to adaptive immunity: MHC class I and class II molecules present peptide fragments on the cell surface for recognition by T cell receptors, and the binding specificity of different MHC alleles determines which peptides can trigger immune responses. Predicting which peptides bind which MHC allele is important for vaccine design, neoantigen identification in cancer immunotherapy, and understanding autoimmune disease mechanisms. Prior computational approaches relied primarily on sequence-based predictors trained on binding affinity data, which can struggle with structural nuances that determine peptide compatibility with specific MHC grooves. alphafold_finetune brings AlphaFold's structural modeling capacity directly to bear on specificity prediction through targeted fine-tuning.
The framework goes beyond MHC to demonstrate generalization: the fine-tuned models also show improved performance on peptide-binding specificity for PDZ and SH3 domain families, which are entirely different classes of peptide-binding modules important in cellular signaling. This cross-domain generalization suggests that fine-tuning AlphaFold on structural data for any peptide-binding system can improve specificity discrimination in a transferable way.
alphafold_finetune starts from the pretrained AlphaFold 2 checkpoint (93 million parameters) and fine-tunes using curated datasets of peptide-MHC complex structures from the Protein Data Bank. For MHC class I, the training data includes high-resolution crystal structures of peptide-MHC I complexes across multiple human alleles (HLA-A, HLA-B, HLA-C) along with binding and non-binding peptide pairs used to construct pairwise training objectives. For MHC class II, a similar dataset of peptide-MHC II structures is used. Fine-tuning adjusts the Evoformer and Structure Module weights through gradient descent on a combined structure prediction and binding specificity objective, preserving the general folding capability while sharpening sensitivity to peptide groove compatibility.
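The combined objective described above can be illustrated with a minimal sketch. The function names, the hinge-style pairwise term, and the weighting scheme below are assumptions for illustration, not the paper's actual loss implementation; the idea is simply that a structure-prediction loss is summed with a specificity term that rewards ranking true binders above non-binders of the same MHC allele.

```python
def combined_loss(structure_loss, binder_confidences, nonbinder_confidences,
                  weight=1.0):
    """Sketch of a combined fine-tuning objective (hypothetical form):
    a structure-prediction loss plus a pairwise binding-specificity term.
    Each (binder, non-binder) pair for the same MHC allele contributes a
    hinge penalty when the binder's model confidence does not exceed the
    non-binder's by a margin of 1."""
    pair_losses = [max(0.0, 1.0 - (b - n))
                   for b in binder_confidences
                   for n in nonbinder_confidences]
    specificity_loss = (sum(pair_losses) / len(pair_losses)
                        if pair_losses else 0.0)
    return structure_loss + weight * specificity_loss
```

With this form, a binder already ranked well above every non-binder contributes no specificity penalty, so fine-tuning pressure concentrates on pairs the model currently mis-ranks.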
At inference, the model takes peptide-MHC sequences as input in the standard AlphaFold multimer format and returns structure predictions with associated confidence scores. Binding specificity is assessed by comparing model confidence scores (pLDDT or pTM) for candidate peptides modeled against the same MHC allele: peptides whose predicted complexes score higher in confidence are predicted to be better binders. The fine-tuned model approaches state-of-the-art performance on peptide-MHC class I and class II specificity benchmarks, with particular improvements for alleles underrepresented in sequence-based training datasets. Generalization experiments on PDZ domains (tested on SPOT array data) and SH3 domains confirm that the fine-tuning benefits extend to structurally distinct peptide-binding modules.
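The confidence-based ranking step can be sketched as follows. Here `score_fn` is a stand-in for a full fine-tuned-model inference pass that returns a complex confidence score; the toy score values are invented for illustration.

```python
def rank_peptides(candidates, score_fn, top_k=3):
    """Rank candidate peptides for a single MHC allele by the confidence
    of their predicted peptide-MHC complex (higher = better binder).
    score_fn stands in for running the fine-tuned model and reading out
    a pLDDT/pTM-style confidence for the predicted complex."""
    scored = sorted(((pep, score_fn(pep)) for pep in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy stand-in confidences (invented values for illustration).
toy_scores = {"SIINFEKL": 0.92, "AAAAAAAA": 0.41, "GILGFVFTL": 0.88}
ranking = rank_peptides(toy_scores, toy_scores.get)
```

Because every candidate is scored against the same allele, only the relative ordering of confidences matters for specificity discrimination, not their absolute values.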
The primary application domain is computational immunology, where accurate MHC-peptide specificity prediction is essential for designing T cell vaccines, identifying neoantigen candidates in cancer immunotherapy, and understanding which peptides from a pathogen are immunogenic in a given patient's HLA background. Researchers in neoantigen pipelines for cancer genomics can use alphafold_finetune as an additional structural filter to complement sequence-based MHC predictors like NetMHCpan, potentially improving precision when selecting candidates for experimental validation. In fundamental structural immunology, the model enables rapid hypothesis testing about how specific amino acid changes in a peptide or MHC allele alter binding geometry. For researchers studying cellular signaling, the PDZ and SH3 domain fine-tuning results suggest that alphafold_finetune can improve binding predictions for any system where a curated structural dataset of peptide-protein complexes is available, opening its use to domains far beyond immunology.
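The two-stage filtering idea in a neoantigen pipeline can be sketched as below. The cutoff values, score conventions, and peptide data are hypothetical; the point is the shape of the filter, where a NetMHCpan-style percentile rank (lower is better) is combined with a structure-confidence threshold (higher is better).

```python
def structural_filter(seq_rank, struct_conf, rank_cutoff=2.0, conf_cutoff=0.8):
    """Illustrative two-stage neoantigen candidate filter: keep peptides
    that a sequence-based predictor calls strong binders (percentile
    rank <= rank_cutoff, NetMHCpan-style, lower is better) AND whose
    predicted-complex confidence clears conf_cutoff. Cutoffs here are
    hypothetical, not recommended values."""
    return [pep for pep, rank in seq_rank.items()
            if rank <= rank_cutoff and struct_conf.get(pep, 0.0) >= conf_cutoff]

# Invented example scores for three candidate peptides.
seq_rank = {"SIINFEKL": 0.5, "KVAELVHFL": 1.5, "QQQQQQQQQ": 8.0}
struct_conf = {"SIINFEKL": 0.93, "KVAELVHFL": 0.62, "QQQQQQQQQ": 0.95}
selected = structural_filter(seq_rank, struct_conf)
```

Running the expensive structural model only on peptides that already pass the cheap sequence filter keeps the added inference cost proportional to the shortlist, not the full library.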
alphafold_finetune established one of the earliest demonstrations that AlphaFold 2's pretrained representations could be systematically improved for a specialized structural biology task through fine-tuning on domain-specific data, a paradigm that has since become a productive research direction. The work predated several subsequent AlphaFold fine-tuning efforts and provided a clear methodological template for constructing training data, defining fine-tuning objectives, and evaluating generalization in peptide-binding systems. The Fred Hutchinson Cancer Center released the code and Colab pipeline openly, enabling immediate uptake by the immunoinformatics community. A notable limitation is that the model inherits AlphaFold's computational requirements: each peptide-MHC prediction requires a full AlphaFold inference pass, making exhaustive screening of large peptide libraries expensive compared to lightweight sequence-based predictors. Nonetheless, for cases where high structural accuracy is needed or where sequence-based methods give ambiguous results, alphafold_finetune provides a valuable complementary approach.