FrustrAI-Seq

Protein language model-based predictor of per-residue local energetic frustration from sequence alone, covering whole proteomes and disordered regions.

Released: February 2026

Local energetic frustration is a concept from energy-landscape theory used to pinpoint residues whose interactions conflict with the protein's overall drive toward a low-energy native fold. Minimally frustrated regions tend to stabilize the fold, whereas highly frustrated patches, far from being mere noise, frequently mark functional sites such as binding interfaces, allosteric hotspots, and catalytic regions. Classical frustration analysis requires a 3D structure and many simulated mutations per contact, which makes proteome-scale analysis slow and excludes regions that lack a well-defined structure.
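To make "many simulated mutations per contact" concrete, here is a minimal sketch of a Z-score-style mutational frustration index for one contact: the native interaction energy is compared with the mean and spread of energies from mutated decoys. It is a simplified illustration in the spirit of structure-based tools such as the Frustratometer, not FrustrAI-Seq code. The decoy energies are synthetic, the 0.78 and -1 cutoffs are the ones commonly cited in the Frustratometer literature, and `frustration_index` and `classify` are hypothetical names.

```python
import numpy as np

def frustration_index(native_energy, decoy_energies):
    """Z-score-style frustration index for one contact (simplified sketch).

    Positive: the native interaction is more favorable (lower energy) than
    typical decoys, i.e. minimally frustrated. Negative: the native
    interaction is less favorable than typical decoys, i.e. frustrated.
    """
    decoys = np.asarray(decoy_energies, dtype=float)
    spread = decoys.std()
    if spread == 0.0:
        raise ValueError("decoy energies have zero spread")
    return float((decoys.mean() - native_energy) / spread)

def classify(fi, minimal_cutoff=0.78, high_cutoff=-1.0):
    """Bin an index using cutoffs commonly cited for the Frustratometer."""
    if fi >= minimal_cutoff:
        return "minimal"
    if fi <= high_cutoff:
        return "high"
    return "neutral"

# Synthetic decoy energies standing in for simulated mutations of one contact.
rng = np.random.default_rng(0)
decoys = rng.normal(loc=-1.0, scale=0.5, size=1000)

for native in (-2.0, -1.0, 0.0):
    fi = frustration_index(native, decoys)
    print(f"native={native:+.1f}  FI={fi:+.2f}  class={classify(fi)}")
```

Every contact needs its own decoy ensemble and energy evaluations, which is why the classical route scales poorly to whole proteomes.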

FrustrAI-Seq, introduced by Leusch and colleagues at Helmholtz Munich (with collaborators including the Rost group) in a February 2026 bioRxiv preprint, removes this structural bottleneck by predicting per-residue local energetic frustration directly from the amino-acid sequence. It learns a mapping from protein language model embeddings to frustration scores, so no explicit structure or mutational sampling is needed at inference time. This makes it possible to score entire proteomes in minutes and, importantly, to extend frustration analysis to intrinsically disordered regions and de novo designed proteins that previously fell outside the reach of structure-based methods.

The authors release model weights, code, and the largest freely available resource of precomputed local frustration scores to date, spanning on the order of one million proteins.

Key Features

Sequence-only frustration prediction: Predicts per-residue local energetic frustration directly from amino-acid sequence using protein language model embeddings, removing the need for an explicit 3D structure.
Proteome-scale speed: Processes entire proteomes within minutes (roughly 17 minutes for the human proteome on a single GPU), a small fraction of the cost of structure-based mutational sampling.
Reaches previously inaccessible regions: Extends frustration analysis to intrinsically disordered regions and de novo designed proteins where structure-based methods struggle.
Open release: Model weights and code are released on GitHub under the Apache 2.0 license, with a precomputed frustration resource covering on the order of one million proteins.
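The reported runtime implies a rough throughput figure. This back-of-the-envelope sketch assumes about 20,000 proteins in the human reference proteome and linear scaling with protein count; both are simplifying assumptions (real runtime depends on sequence lengths, GPU model, and batching), and the extrapolation to one million proteins is illustrative, not a figure reported by the authors.

```python
# Rough throughput estimate from the reported ~17-minute human-proteome run.
# ASSUMPTIONS: ~20,000 human proteins; runtime scales linearly with protein count.
HUMAN_PROTEOME_PROTEINS = 20_000
REPORTED_RUNTIME_MIN = 17
RESOURCE_PROTEINS = 1_000_000  # "on the order of one million proteins"

proteins_per_second = HUMAN_PROTEOME_PROTEINS / (REPORTED_RUNTIME_MIN * 60)
gpu_hours_for_resource = RESOURCE_PROTEINS / proteins_per_second / 3600

print(f"~{proteins_per_second:.1f} proteins per second")
print(f"~{gpu_hours_for_resource:.1f} single-GPU hours for ~1M proteins")
```

Under these assumptions it prints about 19.6 proteins per second and about 14.2 single-GPU hours for a million proteins.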

Technical Details

FrustrAI-Seq is a supervised predictor that maps protein language model (pLM) embeddings to per-residue local energetic frustration scores, learning the relationship between sequence-derived representations and frustration values computed by established structure-based methods. Because it operates on pLM embeddings rather than explicit structures, inference is fast and structure-free: the authors report scoring the full human proteome in roughly 17 minutes on a single GPU and validate that predictions remain biologically relevant across diverse protein families. The release includes trained model weights and code on GitHub under the Apache 2.0 license, and a precomputed dataset of frustration scores for approximately one million proteins. The paper itself is distributed under a CC BY license.
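The supervised setup described above can be illustrated with a linear probe: each residue's embedding is a feature vector, and the target is a frustration score computed by a structure-based method. This minimal sketch fits closed-form ridge regression on synthetic data with NumPy. It is an assumption-laden stand-in: FrustrAI-Seq's actual pLM, model architecture, and training targets are not specified here, and `fit_ridge`, `predict`, and `EMBED_DIM` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(X, w):
    """Per-residue scores from per-residue embeddings."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

rng = np.random.default_rng(42)
EMBED_DIM = 64  # hypothetical; real pLM embeddings are typically larger

# Synthetic stand-ins: each row is one residue's embedding, each target
# that residue's structure-derived frustration score.
true_w = rng.normal(size=EMBED_DIM)
train_X = rng.normal(size=(2000, EMBED_DIM))
train_y = train_X @ true_w + rng.normal(scale=0.1, size=2000)

w = fit_ridge(train_X, train_y, alpha=1.0)

# Inference on a new 150-residue "protein": only embeddings are needed.
new_protein = rng.normal(size=(150, EMBED_DIM))
scores = predict(new_protein, w)
print(scores.shape)  # (150,)
```

The property it highlights is that, once trained, scoring a new sequence requires only its embeddings, which is what makes structure-free, proteome-scale inference possible.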

Applications

FrustrAI-Seq is built for structural and computational biologists who want to map functionally important regions across many proteins quickly. Highly frustrated residues flag candidate binding sites, allosteric regions, and catalytic hotspots, making the tool useful for prioritizing residues in protein engineering, interpreting variant effects, and characterizing intrinsically disordered regions whose conformational behavior matters for function. Its speed enables proteome-wide screens, such as annotating frustration across an organism's entire protein complement, and its applicability to de novo designs makes it relevant for evaluating engineered proteins that have no natural homologs.
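Flagging candidate functional regions usually means looking for runs of highly frustrated residues rather than isolated ones. This minimal sketch scans a per-residue score list for contiguous patches at or below a threshold. It assumes that lower scores mean higher frustration and borrows the -1 cutoff from Frustratometer-style indices; FrustrAI-Seq's output scale and sign convention may differ, so check the repository documentation before choosing a threshold. `frustrated_patches` is a hypothetical helper.

```python
from typing import List, Tuple

def frustrated_patches(
    scores: List[float], threshold: float = -1.0, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) ranges of consecutive residues
    with score <= threshold, keeping only runs of at least min_len residues."""
    patches = []
    start = None
    for i, s in enumerate(scores):
        if s <= threshold:
            if start is None:
                start = i  # a new frustrated run begins
        elif start is not None:
            if i - start >= min_len:
                patches.append((start, i - 1))
            start = None  # run ended
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start >= min_len:
        patches.append((start, len(scores) - 1))
    return patches

example = [0.9, 0.8, -1.2, -1.5, -1.1, 0.2, -1.3, 0.5, -1.0, -2.0, -1.4, -1.1]
print(frustrated_patches(example))  # [(2, 4), (8, 11)]
```

The isolated residue at index 6 is dropped because it is shorter than `min_len`; raising or lowering `min_len` trades sensitivity for fewer spurious single-residue hits.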

Impact

By scaling frustration analysis from individual structures to whole proteomes and to structureless regions, FrustrAI-Seq makes a previously specialized biophysical measure broadly accessible. Its open release of weights, code, and a roughly million-protein precomputed resource lowers the barrier for incorporating frustration into annotation and design pipelines, and its applicability to intrinsically disordered and de novo proteins addresses a long-standing blind spot of structure-based approaches. As a February 2026 preprint, its predictions await peer review and broader community validation, but the permissive licensing and precomputed resource position it for immediate experimentation.

Citation

FrustrAI-Seq: Scaling Local Energetic Frustration to the Protein Sequence Space

Leusch, J., et al. (2026) FrustrAI-Seq: Scaling Local Energetic Frustration to the Protein Sequence Space. bioRxiv.

DOI: 10.64898/2026.02.03.703498

Recent citations

Papers that recently cited this model.

SF-Cluster: Frustration-Guided MSA Subsampling for Alternative Protein Conformation Recovery
Hanqun Cao, Zijun Gao, Chunbin Gu, et al.
Jun 2026

Citations

Total citations: 1

Influential: 1

References: 48

GitHub

Stars: 7

Forks: 2

Open issues: 1

Contributors: 1

Last push: 1 month ago

Language: Python

License: Apache-2.0

Fields of citing research

Share of papers citing this model, by field:

Biology: 100%
Computer Science: 100%

Openness

bio.rodeo openness: Fully open (usable and reproducible)

Score: 78 (Open)

Usability (can I run it?): 97

Reproducibility (can I retrain it?): 55

Model Openness Framework: Class III (Open Model)

Resources

GitHub repository
Research paper
Official website
