bio.rodeo

© 2026 Pulsatance. All rights reserved.

HERCULES

Italian Institute of Technology

Protein language model that predicts RNA-binding domains, global RNA-binding propensity, and mutation effects at single-residue resolution from sequence.

Released: March 2026

RNA-binding proteins (RBPs) govern much of post-transcriptional regulation, and identifying which proteins bind RNA — and precisely which residues mediate that binding — is central to understanding gene regulation and disease. HERCULES is a sequence-based predictor that addresses this at single-residue resolution, classifying proteins as RNA-binding or not, localizing RNA-binding domains (RBDs), and assessing how mutations alter RNA-binding capacity.

HERCULES was developed by the Tartaglia lab at the Italian Institute of Technology and released as a preprint in March 2026. Methodologically, it combines a fine-tuned protein language model — ProteinBERT — with explicit physicochemical amino-acid features in a multi-task learning framework. This pairing lets the model draw on the contextual sequence representations learned by a pretrained PLM while retaining interpretable biochemical signal known to be relevant for RNA recognition.
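
As a rough illustration of this pairing, the sketch below concatenates a per-residue embedding with two physicochemical descriptors and passes the shared features to a residue-level head and a pooled protein-level head. It is not the HERCULES architecture: the embedding function is a deterministic placeholder standing in for ProteinBERT outputs, the weights are arbitrary, and the names `placeholder_embedding`, `residue_features`, and `multitask_forward` are hypothetical. Only the Kyte-Doolittle hydropathy values are a standard published scale; the charge table is a simplification.

```python
import math

# Kyte-Doolittle hydropathy index (standard published scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near neutral pH (His treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

EMBED_DIM = 4  # hypothetical; real PLM embeddings are much larger


def placeholder_embedding(sequence):
    """Stand-in for per-residue PLM embeddings (NOT ProteinBERT output)."""
    return [
        [math.sin((i + 1) * (k + 1) + ord(aa)) for k in range(EMBED_DIM)]
        for i, aa in enumerate(sequence)
    ]


def residue_features(sequence):
    """Concatenate the placeholder embedding with physicochemical descriptors."""
    embeddings = placeholder_embedding(sequence)
    return [
        vec + [HYDROPATHY[aa] / 4.5, CHARGE.get(aa, 0.0)]
        for vec, aa in zip(embeddings, sequence)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multitask_forward(sequence, w_residue, w_protein):
    """Return (per-residue scores, protein-level score) from shared features."""
    feats = residue_features(sequence)
    # Residue-level head: one score per position.
    residue_scores = [
        sigmoid(sum(w * f for w, f in zip(w_residue, row))) for row in feats
    ]
    # Protein-level head: mean-pool the shared features, then score once.
    n = len(feats)
    pooled = [sum(row[j] for row in feats) / n for j in range(len(feats[0]))]
    protein_score = sigmoid(sum(w * f for w, f in zip(w_protein, pooled)))
    return residue_scores, protein_score


# Arbitrary illustrative weights: 4 embedding dims + hydropathy + charge.
W_RES = [0.1, -0.2, 0.05, 0.3, -0.5, 1.0]
W_PROT = [0.0, 0.1, -0.1, 0.2, -0.4, 1.5]

residue_scores, protein_score = multitask_forward("MKRGSDEL", W_RES, W_PROT)
```

Both heads read the same feature matrix, which is the essence of a multi-task setup: the tasks share a representation and differ only in how it is pooled and scored.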

The Tartaglia group has a long track record in RNA-binding prediction (e.g. catRAPID and related tools), and HERCULES extends that lineage into the protein language model era, unifying protein-level classification, residue-level domain localization, and mutation-effect assessment in a single model accessible through a web server and Python package.

#Key Features

  • Fine-tuned ProteinBERT backbone: HERCULES adapts a pretrained protein language model to RNA-binding tasks, leveraging learned sequence context for prediction.
  • Physicochemical feature integration: The model combines PLM attention with explicit physicochemical amino-acid descriptors, blending learned and interpretable signal.
  • Three-in-one prediction: It outputs global RNA-binding propensity, residue-level RNA-binding-domain localization, and mutation-effect (in-silico mutational scanning) predictions.
  • Single-residue resolution: Predictions are made per residue, enabling precise localization of binding regions and variants.
  • Web server and Python package: HERCULES is available both as an interactive web tool and an installable package for programmatic use.
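
In-silico mutational scanning, mentioned in the list above, can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the HERCULES package API: `toy_binding_score` is a hypothetical scorer (the fraction of lysine and arginine residues) that stands in for a real model, and `mutational_scan` is an illustrative helper name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_binding_score(sequence):
    """Hypothetical scorer: fraction of K/R residues. NOT the HERCULES model."""
    return sum(aa in "KR" for aa in sequence) / len(sequence)


def mutational_scan(sequence, score_fn):
    """Score every single-residue substitution against the wild type.

    Returns a dict mapping labels such as 'K2A' (reference residue,
    1-based position, substituted residue) to mutant minus wild-type score.
    """
    wild_type = score_fn(sequence)
    deltas = {}
    for pos, ref in enumerate(sequence):
        for alt in AMINO_ACIDS:
            if alt == ref:
                continue  # skip the synonymous "substitution"
            mutant = sequence[:pos] + alt + sequence[pos + 1:]
            deltas[f"{ref}{pos + 1}{alt}"] = score_fn(mutant) - wild_type
    return deltas


deltas = mutational_scan("MKRGS", toy_binding_score)
# Variants with the most negative delta are the predicted most disruptive.
most_disruptive = sorted(deltas, key=deltas.get)[:3]
```

A sequence of length L yields 19 × L single substitutions, so the scan scales linearly with protein length for a fixed scoring cost per sequence.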

#Technical Details

HERCULES fine-tunes ProteinBERT, a pretrained protein language model, for RNA-binding prediction and integrates physicochemical amino-acid features into a multi-task architecture that jointly addresses protein-level classification, residue-level localization, and mutation effects. For the protein-level RBP-versus-non-RBP classification task, the preprint reports an AUROC of 0.86. For mutation-effect assessment, HERCULES correctly classifies 87% of deleterious RNA-binding variants, which suggests that its residue-level signal captures functionally meaningful determinants of binding. The preprint reports results across all three prediction tasks. The model is distributed as a fixed checkpoint rather than a continuously trained system. The codebase is MIT-licensed and requires downloading ProteinBERT weights. A hosted web server provides predictions without local installation.
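
To make the AUROC figure concrete, the sketch below computes the metric from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are invented toy values, not HERCULES outputs, and `auroc` is an illustrative helper rather than part of any package mentioned here.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = RNA-binding protein, 0 = non-binder (invented values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
example_auroc = auroc(labels, scores)
```

In this toy case 8 of the 9 positive-negative pairs are ordered correctly, giving an AUROC of 8/9. By the same reading, an AUROC of 0.86 means a true RBP outranks a non-RBP in roughly 86% of such pairs.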

#Applications

HERCULES is useful to molecular biologists and RNA researchers seeking to identify RNA-binding proteins, pinpoint the domains responsible for binding, and prioritize mutations likely to disrupt RNA interactions. Because it operates from sequence alone, it can be applied to uncharacterized proteins and to variants of unknown significance, supporting both basic studies of post-transcriptional regulation and the interpretation of disease-associated mutations in RBPs. The web server makes it accessible to wet-lab researchers without computational infrastructure.
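
One common way to turn per-residue scores into candidate binding regions is to threshold the profile and keep sufficiently long runs. The sketch below shows that generic post-processing step. The threshold, the minimum length, the scores, and the `call_domains` helper are all assumptions made for illustration; the source does not describe how HERCULES itself delineates domains.

```python
def call_domains(scores, threshold=0.5, min_length=3):
    """Return 1-based inclusive (start, end) runs where score >= threshold."""
    domains = []
    start = None  # 0-based index where the current run began, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at position i - 1; keep it only if long enough.
            if i - start >= min_length:
                domains.append((start + 1, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        domains.append((start + 1, len(scores)))
    return domains


# Invented per-residue scores for a 10-residue toy sequence.
scores = [0.1, 0.7, 0.8, 0.9, 0.2, 0.6, 0.1, 0.8, 0.9, 0.7]
domains = call_domains(scores)
```

Here residues 2 to 4 and 8 to 10 are reported, while the isolated high score at residue 6 is discarded for being shorter than `min_length`.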

#Impact

By unifying RNA-binding classification, domain localization, and mutation-effect prediction in a single language-model-based tool, HERCULES streamlines analyses that previously required separate methods. Its reported AUROC of 0.86 for protein-level classification and its correct classification of 87% of deleterious variants suggest practical utility for prioritizing experiments and interpreting RBP mutations. Coming from a group with deep expertise in RNA-binding prediction, it continues that work into the PLM era and is accessible through an established web-server platform. Because it is a recent preprint with a fixed checkpoint, broader external validation across diverse RBP families remains to be demonstrated.

Tags

rna_binding_prediction, variant_effect_prediction, transformer, transfer_learning, multi_task, rna_binding_protein, proteomics