bio.rodeo
Protein

MHC-Fine

Stony Brook University

AlphaFold fine-tuned via OpenFold on 944 high-resolution MHC-peptide crystal structures, achieving median peptide RMSD of 0.65 Å on held-out complexes.

Released: 2023

Overview

MHC-Fine is a specialized variant of AlphaFold, fine-tuned exclusively on high-resolution MHC-peptide crystal structures to improve structure prediction for major histocompatibility complex (MHC) molecules with bound peptides. It was developed by Ernest Glukhov, Dmytro Kalitin, Darya Stepanenko, Yimin Zhu, Thu Nguyen, George Jones, Carlos Simmerling, Julie C. Mitchell, Sandor Vajda, Ken A. Dill, Dzmitry Padhorny, and Dima Kozakov at Stony Brook University, with collaborators from Oak Ridge National Laboratory and Boston University. The work was first posted as a bioRxiv preprint in November 2023 and subsequently published in Biophysical Journal in 2024.

The MHC-peptide system poses a particularly demanding structural prediction challenge. MHC molecules present peptide fragments for immune surveillance, and the precise geometry of how a peptide sits within the MHC binding groove — including its backbone conformation, side-chain orientations, and anchor residue interactions — determines whether a T cell receptor will recognize the complex. AlphaFold 2 and AlphaFold-Multimer are trained on the broad diversity of the Protein Data Bank, which provides general structural competence but insufficient specialization for the stereotyped, groove-filling geometry of peptide-MHC interactions. MHC-Fine directly addresses this gap through domain-specific fine-tuning on a curated structural dataset, sharpening AlphaFold's predictions for this immunologically important class of complexes.

A key implementation choice distinguishes MHC-Fine from other AlphaFold fine-tuning approaches: rather than modifying the original JAX-based AlphaFold codebase directly, the developers built on OpenFold — a PyTorch reimplementation of AlphaFold that supports efficient gradient-based fine-tuning through standard deep learning frameworks. This choice provides substantially more flexibility for training modifications, learning rate scheduling, and integration with the broader PyTorch ecosystem, and makes the training procedure more accessible to researchers without JAX expertise.

Key Features

  • Curated MHC-peptide training dataset: Fine-tuned on 944 high-resolution MHC-peptide crystal structures from the Protein Data Bank, spanning human (HLA), mouse (H-2), and other species MHC alleles, providing a diverse structural basis while remaining focused on this specific complex class.
  • OpenFold-based PyTorch implementation: Built on the PyTorch reimplementation of AlphaFold rather than the original JAX codebase, enabling flexible training pipeline design, standard gradient utilities, and integration with modern deep learning tooling.
  • Improved peptide RMSD accuracy: Achieves a median Cα RMSD of 0.65 Å for predicted peptide conformations in held-out MHC-peptide complexes, outperforming both the Pandora homology-modeling approach and AlphaFold-Multimer on this task-specific metric.
  • Enhanced pLDDT calibration: Provides improved predicted Local Distance Difference Test (pLDDT) scores that more reliably reflect the actual accuracy of MHC-peptide complex predictions compared to the general-purpose AlphaFold model.
  • Cross-species generalization: The training dataset spans multiple species' MHC alleles, enabling the model to generalize to non-human MHC complexes relevant for veterinary immunology and comparative immunological research.
  • Complementary to sequence-based tools: Designed to work alongside sequence-based MHC-peptide prediction tools (such as NetMHCpan or MHCflurry) by providing structural accuracy for cases where the 3D geometry of peptide binding is the primary question.
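One simple way to probe the pLDDT calibration claim above is to check how strongly a per-complex confidence score tracks the observed peptide RMSD. The sketch below is an illustration only, not the authors' evaluation protocol: the helper `pearson`, the choice of Pearson correlation, and the example values are all assumptions introduced here, and the numbers are made up rather than taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, NOT results from the MHC-Fine paper:
# mean pLDDT over peptide residues, and peptide RMSD (in Å) per complex.
mean_peptide_plddt = [95.0, 90.0, 80.0, 70.0]
peptide_rmsd = [0.4, 0.6, 1.1, 1.9]

r = pearson(mean_peptide_plddt, peptide_rmsd)
print(f"Pearson r between pLDDT and RMSD: {r:.2f}")
```

A confidence score that is informative about accuracy should give a clearly negative correlation here, since higher pLDDT should accompany lower RMSD. A rank-based measure such as Spearman correlation would be a reasonable alternative if the relationship is not linear.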

Technical Details

MHC-Fine uses the OpenFold framework — a memory-efficient, GPU-friendly PyTorch reproduction of AlphaFold 2 — as its training foundation. The final training dataset consisted of 944 high-resolution MHC-peptide crystal structures collected from the Protein Data Bank, filtered for resolution quality and cleaned to remove redundancies and low-quality structures. This dataset covers MHC class I and class II complexes across multiple human HLA alleles and selected non-human species. Fine-tuning proceeds from the AlphaFold 2 pretrained weights, applying supervised learning on the curated structural dataset with the same structure prediction objectives as the original AlphaFold training, adapted to focus on accurate reproduction of the peptide-MHC binding geometry.
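The curation step described above (resolution filtering plus redundancy removal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Complex` record, the `curate` function, the rule of keeping the best-resolution entry per (allele, peptide) pair, and the cutoff value used in the example are all hypothetical, since the text above does not give the actual criteria. The entries are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """Hypothetical record for one MHC-peptide crystal structure."""
    pdb_id: str
    allele: str
    peptide: str
    resolution: float  # in Å; lower is better


def curate(entries, max_resolution):
    """Drop entries worse than the cutoff, then keep the best-resolution
    entry for each (allele, peptide) pair. The redundancy rule is assumed."""
    best = {}
    for entry in entries:
        if entry.resolution > max_resolution:
            continue  # fails the resolution filter
        key = (entry.allele, entry.peptide)
        if key not in best or entry.resolution < best[key].resolution:
            best[key] = entry
    return sorted(best.values(), key=lambda e: e.pdb_id)


# Made-up entries with placeholder IDs and peptides.
entries = [
    Complex("ex01", "HLA-A*02:01", "AAAAAAAAV", 1.8),
    Complex("ex02", "HLA-A*02:01", "AAAAAAAAV", 2.2),  # redundant, worse resolution
    Complex("ex03", "H-2Kb", "KKKKKKKL", 3.4),         # fails the cutoff
    Complex("ex04", "HLA-B*07:02", "PPPPPPPPL", 2.0),
]

# 2.5 Å is a placeholder cutoff chosen for this example only.
curated = curate(entries, max_resolution=2.5)
print([c.pdb_id for c in curated])
```

Keeping one structure per (allele, peptide) pair is only one possible definition of redundancy; clustering by sequence identity would be another.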

Evaluation against held-out MHC-peptide complexes uses Cα RMSD of predicted versus experimental peptide conformations as the primary accuracy metric, with additional assessment using pLDDT scores as a proxy for prediction confidence. The median peptide RMSD of 0.65 Å on the test set compares favorably to competing methods: Pandora, which uses homology modeling with templates from the structural database, and AlphaFold-Multimer, which is the standard approach for multi-chain complex prediction but lacks specialization for the peptide-groove interaction geometry. The improvement is most pronounced for peptides with unusual sequence motifs or for alleles with limited structural templates, where AlphaFold-Multimer's general training is insufficient to correctly place the peptide backbone.
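The primary metric above, peptide Cα RMSD summarized as a median over test complexes, can be computed as in the sketch below. It assumes the predicted and experimental structures have already been superimposed into a common frame (for example by aligning on the MHC chains) and that peptide Cα atoms are given in matching residue order. The function names and the toy coordinates are assumptions made for illustration, and the alignment convention used in the paper is not stated in the text above.

```python
import math
from statistics import median


def peptide_ca_rmsd(pred, ref):
    """RMSD in Å between matched Cα coordinates, given as (x, y, z) tuples.
    Assumes both structures are already in the same reference frame."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(pred, ref):
        total += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(total / len(pred))


def median_peptide_rmsd(pairs):
    """Median RMSD over (predicted, reference) coordinate pairs."""
    return median([peptide_ca_rmsd(pred, ref) for pred, ref in pairs])


# Toy two-residue example: every atom is displaced by exactly 1 Å along z.
pred = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
ref = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(peptide_ca_rmsd(pred, ref))  # 1.0
```

The median is less sensitive than the mean to a small number of badly mispredicted peptides, which is one reason it is a common summary for this kind of benchmark.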

Applications

MHC-Fine is directly applicable in computational immunology workflows focused on structural accuracy of MHC-peptide complexes. Vaccine designers modeling how specific peptides from pathogen proteins engage different HLA alleles in target populations can use MHC-Fine to generate higher-fidelity structural models than standard AlphaFold-Multimer provides. Cancer immunotherapy researchers identifying neoantigen candidates can use MHC-Fine predictions to assess structural plausibility of candidate peptides in patient-specific HLA alleles, complementing sequence-based affinity predictions. Structural biologists using computational models to guide experimental mutagenesis — identifying residues in the peptide or MHC allele that alter binding geometry — benefit from the improved peptide RMSD accuracy. For researchers studying the molecular basis of alloreactivity, transplant rejection, or autoimmune antigen presentation, MHC-Fine enables more reliable structural hypotheses about which peptide-MHC combinations are structurally compatible. The multi-species training also makes MHC-Fine useful for veterinary immunology research where non-human MHC systems are studied.
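As a rough illustration of how structural predictions might complement sequence-based affinity predictions in neoantigen screening, the sketch below filters candidates on both signals. Everything in it is hypothetical: the field names `affinity_rank` and `peptide_plddt`, the `shortlist` function, the thresholds, and the candidate values are placeholders, and neither MHC-Fine nor any affinity predictor is actually called.

```python
def shortlist(candidates, max_rank, min_plddt):
    """Keep candidates that pass both a sequence-based rank cutoff
    (lower rank = stronger predicted binder) and a structural confidence
    cutoff, ordered by predicted binding rank."""
    kept = [
        c for c in candidates
        if c["affinity_rank"] <= max_rank and c["peptide_plddt"] >= min_plddt
    ]
    return sorted(kept, key=lambda c: c["affinity_rank"])


# Made-up candidates; values do not come from any real predictor.
candidates = [
    {"peptide": "AAAAAAAAV", "affinity_rank": 0.3, "peptide_plddt": 92.0},
    {"peptide": "KKKKKKKKL", "affinity_rank": 0.1, "peptide_plddt": 61.0},
    {"peptide": "PPPPPPPPL", "affinity_rank": 1.5, "peptide_plddt": 88.0},
    {"peptide": "GGGGGGGGV", "affinity_rank": 4.0, "peptide_plddt": 95.0},
]

# Placeholder thresholds chosen for this example only.
selected = shortlist(candidates, max_rank=2.0, min_plddt=80.0)
print([c["peptide"] for c in selected])
```

Whether a hard cutoff or a combined score works better depends on the application; this sketch only shows that the two kinds of prediction can be joined on a per-peptide basis.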

Impact

MHC-Fine is a clear example of how domain-specific fine-tuning of a general-purpose structure predictor on a high-quality, task-relevant dataset can improve accuracy beyond what broad training achieves. The choice to build on OpenFold in PyTorch rather than the original JAX AlphaFold codebase is a practical contribution in its own right: it shows that the OpenFold ecosystem is a viable platform for production-quality fine-tuning workflows, potentially lowering the barrier for future domain-specific AlphaFold adaptations. The improvement over AlphaFold-Multimer reflected in the 0.65 Å median peptide RMSD, while modest in absolute terms, is meaningful for the MHC field, where differences of fractions of an angstrom in anchor residue positioning can determine whether a peptide is presented or rejected.

The main limitation is dataset size. 944 structures are sufficient for fine-tuning but may not capture the full diversity of the human HLA supertype landscape, and alleles with few or no crystal structures in the PDB will benefit less from the fine-tuning. The model also inherits AlphaFold's computational requirements and does not natively score binding affinities, so it must be combined with sequence-based affinity predictors for comprehensive peptide prioritization.

Tags

structure prediction, transformer, fine-tuned, transfer learning, antibody
