bio.rodeo

Protein

SPIRED-Fitness

Tsinghua University

End-to-end framework predicting protein structure and mutational fitness from a single sequence, with 5x faster inference than ESMFold at comparable accuracy.

Released: 2024
Parameters: 125,000,000

Overview

SPIRED-Fitness is an end-to-end computational framework developed at Tsinghua University that jointly predicts protein three-dimensional structure and mutational fitness effects from a single amino acid sequence. Published in Nature Communications in August 2024 by Yinghui Chen, Yunxin Xu, and Haipeng Gong, the framework addresses a longstanding bottleneck in protein engineering: the need for both accurate structural predictions and rapid mutational effect estimates without relying on computationally expensive multiple sequence alignments (MSAs).

The core of the system is a structure prediction engine called SPIRED, which reformulates the folding problem around relative C-alpha displacements in local reference frames rather than predicting global atomic coordinates directly. Paired with a new Relative Displacement Loss (RD Loss), this design reduces training cost by more than an order of magnitude compared to competing single-sequence methods (85 GPU days versus 896 for ESMFold) and accelerates inference approximately five-fold, while delivering accuracy comparable to ESMFold and OmegaFold on standard benchmarks. The resulting structural embeddings are then passed to downstream modules (SPIRED-Fitness, SPIRED-Stab, and SPIRED-Bind) that predict the effects of single and double mutations on function, thermostability, and binding affinity, respectively.
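The idea of describing a structure through displacements in local frames, rather than through global coordinates, can be illustrated with a small sketch. The frame construction used here (Gram-Schmidt on three consecutive C-alpha atoms) and the mean-distance loss are assumptions chosen for illustration only; they are not taken from the SPIRED code or paper, and all function names are hypothetical.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(a):
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]

def local_frame(prev_ca, ca, next_ca):
    """Orthonormal frame at one residue, built from three consecutive
    C-alpha atoms (an assumed construction, not SPIRED's own)."""
    e1 = unit(sub(next_ca, ca))
    u = sub(prev_ca, ca)
    # Gram-Schmidt: remove the component of u that lies along e1.
    e2 = unit(sub(u, [dot(u, e1) * x for x in e1]))
    e3 = cross(e1, e2)
    return e1, e2, e3

def relative_displacements(coords):
    """For every interior residue i, express CA_j - CA_i in i's local frame."""
    out = {}
    for i in range(1, len(coords) - 1):
        frame = local_frame(coords[i - 1], coords[i], coords[i + 1])
        for j in range(len(coords)):
            d = sub(coords[j], coords[i])
            out[(i, j)] = [dot(d, e) for e in frame]
    return out

def displacement_loss(pred, true):
    """Mean Euclidean error between two sets of local displacements."""
    return sum(math.dist(pred[k], true[k]) for k in true) / len(true)

# Toy C-alpha trace (coordinates in angstroms, made up for the example).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.0, 4.0, 2.0)]
# The same trace rotated 90 degrees about z and translated.
moved = [(-y + 1.0, x + 2.0, z + 3.0) for x, y, z in coords]

# Local displacements do not change under rigid motion, so the loss between
# the two descriptions is zero up to floating-point error.
print(displacement_loss(relative_displacements(coords),
                        relative_displacements(moved)))
```

Because the local-frame description is unchanged by rotating or translating the whole structure, a loss defined on it needs no explicit superposition of predicted and reference coordinates.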

The framework is designed for high-throughput screening scenarios where researchers need reliable fitness predictions across thousands or millions of sequence variants without access to MSA data or large compute clusters.

Key Features

  • Single-sequence operation: Requires only a wild-type amino acid sequence as input, with no MSA construction step, making it practical for orphan proteins, synthetic sequences, or large-scale library screening where alignment data is unavailable or costly to generate.
  • Joint structure and fitness prediction: The structure and fitness modules are trained end-to-end, so the geometric representations learned for folding directly inform mutational effect predictions. This joint optimization yields an approximately 2% improvement in fitness prediction accuracy over training the modules separately.
  • Relative Displacement Loss architecture: SPIRED predicts pairwise C-alpha displacement vectors in local residue frames rather than absolute coordinates, replacing the computationally intensive Frame Aligned Point Error (FAPE) loss used by AlphaFold-family models. This reduces training to 85 GPU days versus 896 GPU days for ESMFold and 3,456 GPU days for OmegaFold.
  • Multi-task fitness modules: Three specialized downstream heads cover distinct fitness dimensions — SPIRED-Fitness for general mutational effects, SPIRED-Stab for thermodynamic stability changes (ΔΔG and ΔTm), and SPIRED-Bind for binding affinity perturbations — all built on shared structural representations.
  • State-of-the-art stability prediction: SPIRED-Stab achieves leading performance on the S669 benchmark for ΔΔG prediction (Spearman ρ = 0.58), outperforming prior dedicated stability predictors including GeoDDG.
  • Extreme throughput: Predictions complete within seconds per protein, and the framework is reported to be approximately 1,900x faster than GeoFitness v2 on fitness benchmarks, enabling genome-scale or deep mutational scanning-scale applications.
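The training budgets quoted in the list above can be turned into cost ratios with a few lines of arithmetic. The sketch below uses only the GPU-day figures stated on this page; the variable names are illustrative.

```python
# GPU-day training budgets as quoted in the feature list above.
gpu_days = {"SPIRED": 85, "ESMFold": 896, "OmegaFold": 3456}

baseline = gpu_days["SPIRED"]
ratios = {
    name: days / baseline
    for name, days in gpu_days.items()
    if name != "SPIRED"
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the SPIRED training cost")
# ESMFold: 10.5x the SPIRED training cost
# OmegaFold: 40.7x the SPIRED training cost
```

Both ratios exceed 10, which is consistent with the "more than an order of magnitude" reduction described in the overview.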

Technical Details

SPIRED-Fitness is a 125-million-parameter system built on sequence embeddings from an ESM-2 (650M parameter) language model, followed by the SPIRED structure predictor, which comprises four sequential Folding Units. Each Folding Unit iteratively updates both 1D residue-level and 2D pairwise representations. The pairwise C-alpha displacement prediction strategy avoids the rotation matrix operations inherent to FAPE-based losses, substantially reducing backpropagation cost.

Structural training used 113,609 PDB chains (filtered to a resolution better than 5 Å and a length of 40–1,200 residues, with a cutoff date of March 2022) plus 24,183 CATH domains. The fitness module was trained on 693,000 single mutations and 265,000 double mutations across 485 proteins, sourced from cDNA proteolysis datasets, MaveDB, and DeepSequence collections.

On standard benchmarks, SPIRED reaches a TM-score of 0.786 on CAMEO targets (with GDFold2 post-processing), comparable to OmegaFold's 0.778 while using a similar parameter budget. Fitness prediction on the held-out test set achieves Spearman correlation ρ = 0.85, competitive with ECNet (0.84) and GeoFitness v2 (0.83). On the ProteinGym zero-shot benchmark across 50 assays, SPIRED-Fitness scores ρ = 0.45, surpassing single-sequence baselines such as ESM-1v and CARP, though trailing MSA-dependent methods like TranceptEVE (0.46).
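The fitness results above are reported as Spearman rank correlations between predicted and measured values. As a reminder of what that metric computes, here is a minimal, dependency-free sketch; the function names and the toy numbers are made up for illustration and are not benchmark data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: measured fitness vs. model scores for five variants.
measured = [0.1, 0.4, 0.35, 0.8, 1.2]
predicted = [0.2, 0.5, 0.3, 0.9, 0.7]
rho = spearman_rho(measured, predicted)
print(round(rho, 3))
```

Because only the ordering of variants matters, a model can reach a high ρ without predicting fitness values on the same scale as the assay.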

Applications

SPIRED-Fitness is suited to protein engineering workflows that require rapid, large-scale variant screening. Researchers performing deep mutational scanning can use the framework to prioritize which single or double mutants to synthesize and test experimentally, reducing the number of variants that must go through the wet lab. Enzyme engineers can use SPIRED-Stab to identify thermostabilizing mutations before committing to in vitro expression and differential scanning calorimetry. Antibody and binder developers can use SPIRED-Bind to assess how interface mutations affect affinity without structural data for the complex. The low compute requirements also make the tool accessible to groups without high-performance computing infrastructure, enabling deployment on laboratory workstations or modest cloud instances.
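A screening workflow of this kind can be sketched as: enumerate every single-residue substitution of a wild-type sequence, score each variant, and keep the top candidates. In the sketch below, `toy_score` is a deliberately trivial placeholder that stands in for a real fitness predictor; it is not the SPIRED-Fitness API, and all names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(wild_type):
    """Yield (label, sequence) for every single substitution, e.g. 'M1A'."""
    for pos, wt in enumerate(wild_type, start=1):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{pos}{aa}", wild_type[:pos - 1] + aa + wild_type[pos:]

def toy_score(sequence):
    """Placeholder scorer (fraction of hydrophobic residues).
    Swap in a real fitness predictor here; this is NOT SPIRED-Fitness."""
    hydrophobic = set("AILMFVW")
    return sum(1 for aa in sequence if aa in hydrophobic) / len(sequence)

def top_variants(wild_type, score_fn, k=5):
    """Score all single mutants and return the k best as (score, label)."""
    scored = [(score_fn(seq), label) for label, seq in single_mutants(wild_type)]
    # Highest score first; ties broken alphabetically by mutation label.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]

for score, label in top_variants("MKT", toy_score, k=3):
    print(label, round(score, 3))
```

A sequence of length L has 19·L single mutants, so exhaustive single-site scans stay small; the number of double mutants grows quadratically with L, which is where per-variant inference speed starts to matter.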

Impact

SPIRED-Fitness shows that competitive protein structure and fitness prediction can be achieved at dramatically reduced computational cost, challenging the prevailing assumption that state-of-the-art performance requires massive training budgets or MSA-based co-evolutionary information. By reducing training cost to 85 GPU days and inference to seconds per sequence, the Gong lab at Tsinghua University has made high-throughput mutational landscape characterization feasible for a much broader range of research groups.

A key limitation is that the framework operates on monomeric single-sequence input and does not model complex assemblies, conformational dynamics, or ligand interactions. In addition, SPIRED-Fitness performance on zero-shot benchmarks remains slightly below that of MSA-based approaches (ρ = 0.45 versus 0.46 for TranceptEVE on ProteinGym), reflecting the information advantage of evolutionary data. Future development may incorporate optional MSA inputs or diffusion-based refinement to close this gap.

Citation

Chen, Y., et al. (2024) An end-to-end framework for the prediction of protein structure and fitness from single sequence. Nature Communications.

DOI: 10.1038/s41467-024-51776-x

Metrics

GitHub

Stars: 50
Forks: 4
Open Issues: 0
Contributors: 1
Last Push: 10mo ago
Language: Python
License: MIT

Citations

Total Citations: 32
Influential: 2
References: 64

Tags

protein fitness, structure prediction, variant effect prediction, foundation model

Resources

GitHub Repository, Research Paper