MHC-Fine

AlphaFold fine-tuned via OpenFold on 944 high-resolution MHC-peptide structures, reaching a median peptide Cα RMSD of 0.65 Å on held-out complexes.

Released: November 2023

MHC-Fine is a specialized variant of AlphaFold fine-tuned exclusively on high-resolution MHC-peptide crystal structures to improve structural predictions of major histocompatibility complex (MHC) molecules with bound peptides. It was developed by Ernest Glukhov, Dmytro Kalitin, Darya Stepanenko, Yimin Zhu, Thu Nguyen, George Jones, Carlos Simmerling, Julie C. Mitchell, Sandor Vajda, Ken A. Dill, Dzmitry Padhorny, and Dima Kozakov at Stony Brook University, with collaborators from Oak Ridge National Laboratory and Boston University. The work was first posted as a bioRxiv preprint in November 2023 and subsequently published in Biophysical Journal in 2024.

The MHC-peptide system poses a particularly demanding structural prediction challenge. MHC molecules present peptide fragments for immune surveillance, and the precise geometry of how a peptide sits within the MHC binding groove — including its backbone conformation, side-chain orientations, and anchor residue interactions — determines whether a T cell receptor will recognize the complex. AlphaFold 2 and AlphaFold-Multimer are trained on the broad diversity of the Protein Data Bank, which provides general structural competence but insufficient specialization for the stereotyped, groove-filling geometry of peptide-MHC interactions. MHC-Fine directly addresses this gap through domain-specific fine-tuning on a curated structural dataset, sharpening AlphaFold's predictions for this immunologically important class of complexes.

A key implementation choice distinguishes MHC-Fine from other AlphaFold fine-tuning approaches: rather than modifying the original JAX-based AlphaFold codebase directly, the developers built on OpenFold — a PyTorch reimplementation of AlphaFold that supports efficient gradient-based fine-tuning through standard deep learning frameworks. This choice provides substantially more flexibility for training modifications, learning rate scheduling, and integration with the broader PyTorch ecosystem, and makes the training procedure more accessible to researchers without JAX expertise.

Key Features

Curated MHC-peptide training dataset: Fine-tuned on 944 high-resolution MHC-peptide crystal structures from the Protein Data Bank, spanning human (HLA), mouse (H-2), and other species MHC alleles, providing a diverse structural basis while remaining focused on this specific complex class.
OpenFold-based PyTorch implementation: Built on the PyTorch reimplementation of AlphaFold rather than the original JAX codebase, enabling flexible training pipeline design, standard gradient utilities, and integration with modern deep learning tooling.
Improved peptide RMSD accuracy: Achieves a median Cα RMSD of 0.65 Å for predicted peptide conformations in held-out MHC-peptide complexes, outperforming both the Pandora homology-modeling approach and AlphaFold-Multimer on this task-specific metric.
Enhanced pLDDT calibration: Provides improved predicted Local Distance Difference Test (pLDDT) scores that more reliably reflect the actual accuracy of MHC-peptide complex predictions compared to the general-purpose AlphaFold model.
Cross-species generalization: The training dataset spans multiple species' MHC alleles, enabling the model to generalize to non-human MHC complexes relevant for veterinary immunology and comparative immunological research.
Complementary to sequence-based tools: Designed to work alongside sequence-based MHC-peptide prediction tools (such as NetMHCpan or MHCflurry) by providing structural accuracy for cases where the 3D geometry of peptide binding is the primary question.
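One simple way to check the kind of pLDDT calibration described above is to correlate a per-complex confidence score with the measured peptide RMSD: a well-calibrated confidence should fall as the error rises. The sketch below is only an illustration under stated assumptions. The function name `pearson` and the toy numbers are hypothetical and are not taken from the MHC-Fine paper or its code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Toy values (hypothetical): mean peptide pLDDT and peptide RMSD in Å
# for four imaginary test complexes.
mean_peptide_plddt = [95.0, 90.0, 80.0, 70.0]
peptide_rmsd = [0.4, 0.6, 1.2, 2.0]

r = pearson(mean_peptide_plddt, peptide_rmsd)
print(f"pLDDT vs RMSD correlation: {r:.2f}")  # negative if confidence tracks error
```

A strongly negative correlation on a real held-out set would support using pLDDT to flag unreliable peptide placements; a rank-based measure such as Spearman correlation is a reasonable alternative when the relationship is not linear.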

Technical Details

MHC-Fine uses the OpenFold framework — a memory-efficient, GPU-friendly PyTorch reproduction of AlphaFold 2 — as its training foundation. The final training dataset consisted of 944 high-resolution MHC-peptide crystal structures collected from the Protein Data Bank, filtered for resolution quality and cleaned to remove redundancies and low-quality structures. This dataset covers MHC class I and class II complexes across multiple human HLA alleles and selected non-human species. Fine-tuning proceeds from the AlphaFold 2 pretrained weights, applying supervised learning on the curated structural dataset with the same structure prediction objectives as the original AlphaFold training, adapted to focus on accurate reproduction of the peptide-MHC binding geometry.
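The curation step described above (a resolution filter followed by redundancy removal) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Entry` fields, the 3.0 Å default cutoff, and the rule of keeping the best-resolved structure per MHC/peptide sequence pair are all assumptions made for the example, and the identifiers are placeholders rather than real PDB entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    pdb_id: str        # placeholder identifier, not a real PDB code
    resolution: float  # in Å; lower is better
    mhc_seq: str
    peptide_seq: str

def curate(entries, max_resolution=3.0):
    """Drop low-resolution entries, then keep one entry per (MHC, peptide) pair.

    The cutoff and the de-duplication key are hypothetical choices.
    """
    best = {}
    for e in entries:
        if e.resolution > max_resolution:
            continue  # fails the resolution filter
        key = (e.mhc_seq, e.peptide_seq)
        # Among redundant structures, keep the best-resolved one.
        if key not in best or e.resolution < best[key].resolution:
            best[key] = e
    return sorted(best.values(), key=lambda e: e.pdb_id)

entries = [
    Entry("toy1", 1.8, "MHCSEQ_A", "PEPTIDE_1"),
    Entry("toy2", 2.4, "MHCSEQ_A", "PEPTIDE_1"),  # redundant with toy1
    Entry("toy3", 3.5, "MHCSEQ_B", "PEPTIDE_2"),  # too low resolution
    Entry("toy4", 2.0, "MHCSEQ_B", "PEPTIDE_3"),
]
kept = curate(entries)
print([e.pdb_id for e in kept])
```

In practice redundancy is often defined by sequence identity clustering rather than exact sequence match; the exact-match key is used here only to keep the sketch short.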

Evaluation against held-out MHC-peptide complexes uses Cα RMSD of predicted versus experimental peptide conformations as the primary accuracy metric, with additional assessment using pLDDT scores as a proxy for prediction confidence. The median peptide RMSD of 0.65 Å on the test set compares favorably to competing methods: Pandora, which uses homology modeling with templates from the structural database, and AlphaFold-Multimer, which is the standard approach for multi-chain complex prediction but lacks specialization for the peptide-groove interaction geometry. The improvement is most pronounced for peptides with unusual sequence motifs or for alleles with limited structural templates, where AlphaFold-Multimer's general training is insufficient to correctly place the peptide backbone.
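The primary metric above, the peptide Cα RMSD and its median over a test set, can be written out as a short sketch. It assumes the predicted and experimental complexes have already been placed in a common frame (for example by superimposing the MHC chains), so no fitting is done on the peptide itself; that alignment convention is an assumption here, not something stated in this description. The function names and toy coordinates are hypothetical.

```python
import math
from statistics import median

def ca_rmsd(pred, ref):
    """RMSD between two equal-length lists of (x, y, z) Cα coordinates in Å.

    Assumes both coordinate sets are already in the same reference frame.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p[i] - r[i]) ** 2 for p, r in zip(pred, ref) for i in range(3))
    return math.sqrt(squared / len(pred))

def median_peptide_rmsd(pairs):
    """Median Cα RMSD over (predicted, experimental) peptide coordinate pairs."""
    return median(ca_rmsd(pred, ref) for pred, ref in pairs)

def shifted(coords, dz):
    """Toy helper: translate every atom by dz along z."""
    return [(x, y, z + dz) for x, y, z in coords]

# Toy two-residue "peptide"; real peptides have roughly 8 to 15 residues.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
pairs = [(shifted(reference, dz), reference) for dz in (0.0, 1.0, 2.0)]

print(median_peptide_rmsd(pairs))
```

Reporting the median rather than the mean keeps a few badly misplaced peptides from dominating the summary statistic.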

Applications

MHC-Fine is directly applicable in computational immunology workflows focused on structural accuracy of MHC-peptide complexes. Vaccine designers modeling how specific peptides from pathogen proteins engage different HLA alleles in target populations can use MHC-Fine to generate higher-fidelity structural models than standard AlphaFold-Multimer provides. Cancer immunotherapy researchers identifying neoantigen candidates can use MHC-Fine predictions to assess structural plausibility of candidate peptides in patient-specific HLA alleles, complementing sequence-based affinity predictions. Structural biologists using computational models to guide experimental mutagenesis — identifying residues in the peptide or MHC allele that alter binding geometry — benefit from the improved peptide RMSD accuracy. For researchers studying the molecular basis of alloreactivity, transplant rejection, or autoimmune antigen presentation, MHC-Fine enables more reliable structural hypotheses about which peptide-MHC combinations are structurally compatible. The multi-species training also makes MHC-Fine useful for veterinary immunology research where non-human MHC systems are studied.

Impact

MHC-Fine represents a clear example of how domain-specific fine-tuning of a general-purpose structure predictor on a high-quality, task-relevant dataset can improve accuracy beyond what broad training achieves. The choice to build on OpenFold in PyTorch rather than the original JAX AlphaFold codebase is noteworthy as a practical contribution: it demonstrates that the OpenFold ecosystem is a viable platform for production-quality fine-tuning workflows, potentially lowering the barrier for future domain-specific AlphaFold adaptations. The 0.65 Å median peptide RMSD improves on AlphaFold-Multimer, and the gain, while modest in absolute terms, is meaningful for the MHC field, where differences of fractions of an angstrom in anchor residue positioning can determine whether a peptide is presented or rejected. Limitations include the dataset size: 944 structures is sufficient for fine-tuning but may not capture the full diversity of the human HLA supertype landscape, and alleles with few or no crystal structures in the PDB will benefit less from the fine-tuning. The model also inherits AlphaFold's computational requirements and does not natively score binding affinities, so it must be combined with sequence-based affinity predictors for comprehensive peptide prioritization.
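Because the model does not score affinity, one possible way to combine it with a sequence-based predictor is a two-stage ranking: first keep peptides predicted to bind, then order them by structural confidence. The sketch below is purely illustrative. The dictionary keys, the 500 nM cutoff (a commonly used convention for calling binders, adopted here as an assumption), and the candidate values are hypothetical and do not come from MHC-Fine, NetMHCpan, or MHCflurry output.

```python
def prioritize(candidates, max_affinity_nm=500.0):
    """Keep predicted binders, then rank by structural confidence (highest first).

    Each candidate is a dict with hypothetical keys:
      "peptide"       - peptide label or sequence
      "affinity_nm"   - predicted affinity from a sequence-based tool, in nM
      "peptide_plddt" - mean peptide pLDDT from the structural model
    """
    binders = [c for c in candidates if c["affinity_nm"] <= max_affinity_nm]
    return sorted(binders, key=lambda c: c["peptide_plddt"], reverse=True)

# Made-up candidates for illustration only.
candidates = [
    {"peptide": "candidate_A", "affinity_nm": 40.0, "peptide_plddt": 78.0},
    {"peptide": "candidate_B", "affinity_nm": 900.0, "peptide_plddt": 93.0},
    {"peptide": "candidate_C", "affinity_nm": 120.0, "peptide_plddt": 91.0},
]

ranked = prioritize(candidates)
print([c["peptide"] for c in ranked])
```

Filtering before ranking keeps a confidently modeled but weakly binding peptide from rising to the top; a weighted score combining both signals would be an equally defensible design.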

Citations

Glukhov, E., et al. (2024) MHC-Fine: Fine-tuned AlphaFold for Precise MHC-Peptide Complex Prediction. Biophysical Journal.

DOI: 10.1016/j.bpj.2024.05.011

Preprint

Glukhov, E., et al. (2023) MHC-Fine: Fine-tuned AlphaFold for Precise MHC-Peptide Complex Prediction. bioRxiv.

DOI: 10.1101/2023.11.29.569310

Recent citations

Papers that recently cited this model.

PP-MAPS: dynamic pharmacophore signatures of protein–peptide interfaces from molecular dynamics trajectories
Camille Depenveiller, Arezki Guerda, Emilia Rabia, et al.
bioRxiv · Apr 2026
Citations: 0

PMGen: From Peptide-MHC Structure Prediction to Peptide Generation
Amir H. Asgary, Amirreza Aleyasin, Jonas A. Mehl, et al.
bioRxiv · Feb 2026
Citations: 0 · Influential

Predicted peptide scaffolds for drug screening in endometrial cancer organoids
Mengmeng Zhang, Yuan Wan, Dingxi Li
Scientific Reports · Oct 2025
Citations: 2

Top citations

The most-cited papers that cite this model.

TCR3d 2.0: expanding the T cell receptor structure database with new structures, tools and interactions
V. Lin, Melyssa Cheung, R. Gowthaman, et al.
Nucleic Acids Research · Sep 2024
Citations: 24

Modeling Protein–Protein and Protein–Ligand Interactions by the ClusPro Team in CASP16
Ryota Ashizawa, S. Kotelnikov, Omeir Khan, et al.
Proteins: Structure, Function, and Bioinformatics · Oct 2025
Citations: 9

Machine-Guided Dual-Objective Protein Engineering for Deimmunization and Therapeutic Functions
Eric Wolfsberg, John Paul, Josh Tycko, et al.
bioRxiv · Feb 2025
Citations: 7

Phospho-Tune: Enhanced Structural Modeling of Phosphorylated Protein Interactions
Ernest Glukhov, Veranika Averkava, S. Kotelnikov, et al.
bioRxiv · Mar 2024
Citations: 6

A structure-guided approach to predict MHC-I restriction of T cell receptors for public antigens
Sagar Gupta, N. Sgourakis
bioRxiv · Jun 2024
Citations: 4

Citations

Total Citations: 9

Influential: 1

References: 24

Fields of citing research

Biology: 89%
Medicine: 89%
Computer Science: 67%
Chemistry: 33%
Materials Science: 11%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 35 (Closed)

Usability (can I run it?): 41

Reproducibility (can I retrain it?): 17

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper Google Colab Link
