bio.rodeo

Protein

Stoic

University of Basel

Predicts protein complex stoichiometry from sequence using protein language model embeddings and a graph neural network, exporting AlphaFold3-ready JSON.

Released: March 2026

Modern structure-prediction systems such as AlphaFold-Multimer and AlphaFold3 have transformed protein complex modeling, but they require the stoichiometry of a complex — the copy number of each distinct protein entity — to be specified in advance. For the many complexes whose composition is unknown, the standard workaround is a brute-force search that runs structure prediction across many candidate stoichiometry combinations, an approach that is both computationally expensive and frequently inaccurate.

Stoic, developed by Daniil Litvinov, Janani Durairaj, Torsten Schwede and colleagues at the University of Basel (Biozentrum and SIB), addresses this gap by predicting complex stoichiometry directly from amino acid sequence, with no structure prediction in the loop. Posted to bioRxiv in March 2026, it reframes stoichiometry as a sequence-level learning problem and produces ranked copy-number predictions in seconds, along with AlphaFold3-ready JSON files that can be fed directly into downstream structure prediction.

By learning to recognize interface-relevant features rather than relying on global sequence statistics, Stoic offers a fast, accessible front end for modeling protein complexes whose unknown composition was previously a bottleneck.

#Key Features

  • Sequence-only prediction: Estimates per-entity copy numbers from amino acid sequences alone, removing the need for expensive brute-force structure prediction over candidate stoichiometries.
  • Interface-aware representation: Learns to identify residues that participate in protein-protein interactions rather than depending on global sequence features, improving discrimination of homomeric versus heteromeric assemblies.
  • AlphaFold3-ready output: Exports JSON specifying predicted stoichiometries so results plug directly into AF3 structure-prediction pipelines.
  • Ranked top-N predictions: Returns multiple ranked stoichiometry hypotheses, supporting downstream evaluation when the top prediction is uncertain.
  • Open and hosted: Released under the MIT license with pretrained weights on HuggingFace, a hosted web demo, and a Colab notebook for no-install use.
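
The ranked top-N output can be pictured with a small example. This is a minimal sketch, not Stoic's actual ranking procedure: it assumes each entity comes with an independent probability distribution over copy numbers and scores a joint hypothesis by the product of those probabilities. The function name `rank_stoichiometries` and the toy probabilities are hypothetical.

```python
import heapq
import itertools
import math


def rank_stoichiometries(entity_probs, top_n=3):
    """Rank joint copy-number assignments by the product of per-entity probabilities.

    entity_probs maps an entity name to {copy_number: probability}.
    Assumes entities are scored independently (an illustrative simplification).
    """
    names = sorted(entity_probs)
    # One list of (copy_number, probability) options per entity.
    choices = [sorted(entity_probs[name].items()) for name in names]
    scored = []
    # Enumerate every combination of one copy number per entity.
    for combo in itertools.product(*choices):
        score = math.prod(p for _, p in combo)
        assignment = {name: copies for name, (copies, _) in zip(names, combo)}
        scored.append((score, assignment))
    # Keep the top_n highest-scoring hypotheses, best first.
    return heapq.nlargest(top_n, scored, key=lambda item: item[0])


# Toy probabilities for a two-entity complex (made up for illustration).
toy_probs = {
    "A": {1: 0.1, 2: 0.7, 4: 0.2},
    "B": {1: 0.6, 2: 0.4},
}
ranked = rank_stoichiometries(toy_probs, top_n=3)
for score, assignment in ranked:
    print(f"{assignment} score={score:.2f}")
```

With these toy numbers the best hypothesis is two copies of A and one of B (score 0.42), followed by A2B2 and A4B1. Keeping several hypotheses lets a downstream structure-prediction run arbitrate when the top scores are close.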

#Technical Details

Stoic uses ESM2-650M to compute residue-level embeddings for each unique protein entity in a complex, then aggregates them into fixed-length per-entity representations via a learned weighted pooling mechanism. These pooled embeddings serve as node features in a fully connected graph that is passed to a graph convolutional network (GCN), which outputs copy numbers as node labels. The task is cast as multi-class classification over 14 copy-number classes, allowing prediction of both homomeric and heteromeric stoichiometries. The pipeline is available as a command-line tool (stoic_predict_stoichiometry), a Python API, and a HuggingFace Space, accepting FASTA input and emitting top-N ranked predictions plus AF3-ready JSON.
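
The data flow described above (residue embeddings, learned weighted pooling, a fully connected entity graph, and per-node classification over 14 classes) can be sketched in NumPy. This is an illustration under stated assumptions, not the released implementation: random arrays stand in for ESM2 embeddings and trained weights, the pooling is taken to be a softmax over a learned per-residue score, and the graph layer uses separate self and neighbor weights, a swapped-in variant chosen so that nodes in a fully connected graph do not all receive identical outputs. The hidden size, layer count, and all function names are hypothetical.

```python
import numpy as np

EMBED_DIM = 1280   # per-residue embedding width of ESM2-650M
HIDDEN_DIM = 64    # hypothetical hidden size
NUM_CLASSES = 14   # copy-number classes, as stated in the text

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_pool(residue_emb, w_score):
    """Collapse an (L, D) residue matrix into one D-vector using learned weights."""
    weights = softmax(residue_emb @ w_score)  # (L,), sums to 1
    return weights @ residue_emb              # (D,)


def neighbor_mean(node_feats):
    """Mean of all other nodes' features (fully connected graph, no self-loops)."""
    n = node_feats.shape[0]
    if n == 1:
        return np.zeros_like(node_feats)  # a lone entity has no neighbors
    total = node_feats.sum(axis=0, keepdims=True)
    return (total - node_feats) / (n - 1)


def graph_layer(node_feats, w_self, w_neigh):
    """Combine each node's own features with the mean of its neighbors."""
    return node_feats @ w_self + neighbor_mean(node_feats) @ w_neigh


def predict_copy_classes(node_feats, params):
    """Two graph layers, then a softmax over copy-number classes per node."""
    hidden = np.maximum(
        graph_layer(node_feats, params["w_self1"], params["w_neigh1"]), 0.0
    )  # ReLU
    logits = graph_layer(hidden, params["w_self2"], params["w_neigh2"])
    return softmax(logits, axis=1)


# Random stand-ins for ESM2 embeddings of three entities of different lengths.
entities = [rng.normal(size=(length, EMBED_DIM)) for length in (120, 85, 300)]

# Random stand-ins for trained parameters.
w_score = rng.normal(size=EMBED_DIM) * 0.01
params = {
    "w_self1": rng.normal(size=(EMBED_DIM, HIDDEN_DIM)) * 0.01,
    "w_neigh1": rng.normal(size=(EMBED_DIM, HIDDEN_DIM)) * 0.01,
    "w_self2": rng.normal(size=(HIDDEN_DIM, NUM_CLASSES)) * 0.01,
    "w_neigh2": rng.normal(size=(HIDDEN_DIM, NUM_CLASSES)) * 0.01,
}

nodes = np.stack([weighted_pool(e, w_score) for e in entities])  # (3, EMBED_DIM)
probs = predict_copy_classes(nodes, params)                      # (3, NUM_CLASSES)
print(probs.shape, probs.argmax(axis=1))
```

Pooling first means each entity contributes exactly one node regardless of sequence length, so the graph size depends only on the number of distinct entities. How the 14 class indices map to actual copy numbers is not specified here and is left out of the sketch.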

#Applications

Stoic is aimed at structural biologists and computational researchers who need to model protein complexes whose composition is not known a priori. It can serve as a rapid pre-processing step ahead of AlphaFold3 or AlphaFold-Multimer, narrowing the space of stoichiometries to evaluate and avoiding combinatorial structure-prediction sweeps. Use cases include interpreting interactomics and cross-linking data, prioritizing assembly hypotheses for cryo-EM or crystallography, and large-scale annotation of complexes across proteomes.
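
As a rough illustration of the hand-off to structure prediction, the sketch below turns a predicted stoichiometry into an input file following the JSON layout documented for the open-source AlphaFold3 release (one `protein` entry per entity, with one chain ID per copy). Whether Stoic's own export matches this field for field is an assumption, and `build_af3_input`, the toy sequences, and the A2B1 prediction are hypothetical.

```python
import json
import string


def build_af3_input(name, sequences, copies, seeds=(1,)):
    """Build an AlphaFold3-style input dict from sequences and copy numbers.

    sequences: {entity_name: amino_acid_sequence}
    copies:    {entity_name: predicted_copy_number}
    """
    total = sum(copies[entity] for entity in sequences)
    if total > len(string.ascii_uppercase):
        raise ValueError("this sketch only assigns single-letter chain IDs (max 26 chains)")
    chain_ids = iter(string.ascii_uppercase)
    entries = []
    for entity, seq in sequences.items():
        # One chain ID per copy; identical copies share a single sequence entry.
        ids = [next(chain_ids) for _ in range(copies[entity])]
        entries.append({"protein": {"id": ids, "sequence": seq}})
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": entries,
        "dialect": "alphafold3",
        "version": 1,
    }


# Toy sequences and a hypothetical A2B1 prediction.
toy_sequences = {"A": "MKTAYIAKQR", "B": "GSHMLEDPVA"}
af3_input = build_af3_input("toy_A2B1", toy_sequences, {"A": 2, "B": 1})
print(json.dumps(af3_input, indent=2))
```

Writing one such file per ranked hypothesis keeps the number of structure-prediction runs equal to N rather than to the size of the full combinatorial sweep.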

#Impact

By decoupling stoichiometry inference from structure prediction, Stoic targets a long-standing practical limitation of complex modeling that becomes acute at scale. Its lightweight, openly licensed implementation — with hosted inference and AF3-compatible exports — lowers the barrier for routine use within existing structure-prediction workflows. As one of several 2025–2026 efforts tackling stoichiometry prediction, it contributes to a maturing toolkit for moving from sequence to assembled complex structures with less manual trial and error.

Tags

stoichiometry_prediction, protein_complex_assembly, graph_neural_network, transformer, supervised, representation_learning, protein_complexes