ProLoc addresses a gap between protein function prediction and mechanistic interpretation. Most protein-text and protein function models capture global, protein-level associations: they can tell you that a protein has a given function, but not which residues are responsible for it. For researchers trying to understand a mechanism or to prioritize residues for experimental validation, that whole-protein answer is too coarse. ProLoc reframes the problem as a span-level grounding task: given a protein sequence and a free-text functional description, it identifies the specific residue regions—domains, motifs, or functional sites—that correspond to that description.
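
One way to picture the input and output of this task is as a (sequence, query, spans) triple, with spans convertible to a per-residue mask and back. This is an illustrative sketch only: the GroundingExample class, the helper names, the toy sequence and query, and the 0-based, end-exclusive span convention are hypothetical and are not ProLoc's published data format.

```python
from dataclasses import dataclass, field

# Hypothetical container for one protein-text-region example. Field names and
# the 0-based, end-exclusive [start, end) span convention are assumptions.
@dataclass
class GroundingExample:
    sequence: str  # amino-acid sequence, one letter per residue
    query: str     # free-text functional description
    spans: list[tuple[int, int]] = field(default_factory=list)  # gold regions

def spans_to_mask(length: int, spans: list[tuple[int, int]]) -> list[int]:
    """Mark every residue covered by any span with 1, all others with 0."""
    mask = [0] * length
    for start, end in spans:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = 1
    return mask

def mask_to_spans(mask: list[int]) -> list[tuple[int, int]]:
    """Recover contiguous [start, end) runs of 1s from a per-residue mask."""
    spans, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                      # a run of covered residues begins
        elif not flag and start is not None:
            spans.append((start, i))       # the run ended at the previous residue
            start = None
    if start is not None:                  # a run reaching the last residue
        spans.append((start, len(mask)))
    return spans

example = GroundingExample(
    sequence="MKTAYIAKQRQISFVKSHFSRQ",   # toy 22-residue sequence
    query="ATP-binding site",            # toy query
    spans=[(2, 6), (12, 15)],
)
mask = spans_to_mask(len(example.sequence), example.spans)
print(mask_to_spans(mask))  # [(2, 6), (12, 15)]
```

The round trip is lossless only when spans neither overlap nor touch; adjacent or overlapping regions merge into a single run in the mask.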

Developed by Peishuo Liu, Jiaxin Fan, Mianzhi Pan, and Jianbing Zhang at Nanjing University and released as a preprint in June 2026, ProLoc introduces both the task formulation, which the authors call text-guided protein functional region localization, and a model built to solve it. The work pairs a curated benchmark derived from InterPro annotations with a text-conditioned localization model that combines a protein language model and a biomedical text encoder.
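
To make "text-conditioned" concrete, here is a deliberately simple stand-in: per-residue vectors (which in ProLoc would come from ESM2-650M) and a query vector (which would come from PubMedBERT) are projected into a shared space and compared with a dot product, giving one relevance score per residue. The random toy embeddings, the dimensions, and this bilinear fusion are assumptions for illustration; they are not how ProLoc is documented to combine its two encoders.

```python
import math
import random

# Toy sizes; real ESM2-650M and PubMedBERT embeddings are far wider.
D_PROT, D_TEXT, D_SHARED = 8, 6, 4

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def residue_scores(residue_embs, text_emb, w_prot, w_text):
    """Hypothetical bilinear fusion: project residues and the query into a
    shared space, then score each residue by its dot product with the query."""
    query = matvec(w_text, text_emb)
    return [sigmoid(sum(a * b for a, b in zip(matvec(w_prot, r), query)))
            for r in residue_embs]

rng = random.Random(0)
residues = [[rng.gauss(0, 1) for _ in range(D_PROT)] for _ in range(10)]  # 10 residues
text = [rng.gauss(0, 1) for _ in range(D_TEXT)]                         # one query
w_prot = [[rng.gauss(0, 0.3) for _ in range(D_PROT)] for _ in range(D_SHARED)]
w_text = [[rng.gauss(0, 0.3) for _ in range(D_TEXT)] for _ in range(D_SHARED)]

scores = residue_scores(residues, text, w_prot, w_text)
print([round(s, 2) for s in scores])  # one score in (0, 1) per residue
```

Because the fusion is linear before the sigmoid, negating the query vector turns every score s into 1 - s: the same residues are ranked differently under a different query, which is the essence of conditioning on text.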

The framing borrows the notion of visual grounding from vision-language research and applies it to proteins: the residue sequence plays the role of the image to be searched, and the functional description serves as the query. This makes ProLoc useful as a residue-level annotation and hypothesis-generation tool rather than a global classifier.

Key Features

Text-guided residue localization: Given a protein sequence and a natural-language functional description, ProLoc returns the residue spans corresponding to that description, enabling residue-level interpretation rather than whole-protein labels.
Generic, open-vocabulary inference: A single trained checkpoint accepts any protein sequence and any free-text query within the InterPro annotation space, with no per-task retraining.
Anchor-free span proposals: Beyond a direct localization output, the model generates anchor-free span proposals that improve recovery of multiple disjoint functional sites within one protein.
Dual-encoder design: ProLoc builds on ESM2-650M for the protein sequence and PubMedBERT for the text description, conditioning residue-level predictions on the functional query.
Purpose-built benchmark: The accompanying InterPro-derived benchmark provides explicit protein-text-region examples with sequence-similarity-aware splits and a unified span-level evaluation protocol.
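
The anchor-free proposals can be pictured along the lines of anchor-free object detectors such as FCOS, carried over to one dimension: every residue predicts a confidence score plus its distances to the left and right edges of the region it falls in, and near-duplicate candidates are pruned with non-maximum suppression (NMS). This decoding scheme is an assumption for illustration; the function names, the end-exclusive spans, and the 0.5 suppression threshold are hypothetical rather than taken from ProLoc.

```python
def span_iou(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Intersection over union of two [start, end) residue spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def decode_proposals(scores, left, right, length, top_k=10, nms_iou=0.5):
    """Turn per-residue (score, left distance, right distance) predictions into
    ranked, de-duplicated [start, end) span proposals."""
    candidates = []
    for i, (s, l, r) in enumerate(zip(scores, left, right)):
        start = max(0, i - round(l))          # clip at the N-terminus
        end = min(length, i + round(r) + 1)   # clip at the C-terminus
        candidates.append((s, (start, end)))
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept = []
    for s, span in candidates:
        # Greedy 1D NMS: drop a candidate that overlaps a stronger kept span.
        if all(span_iou(span, k) < nms_iou for _, k in kept):
            kept.append((s, span))
        if len(kept) == top_k:
            break
    return kept

# Toy 12-residue protein with two disjoint sites, residues 1-3 and 8-10.
scores = [0.1, 0.8, 0.9, 0.7, 0.1, 0.1, 0.1, 0.1, 0.6, 0.85, 0.5, 0.1]
left   = [0,   0,   1,   2,   0,   0,   0,   0,   0,   1,    2,   0]
right  = [0,   2,   1,   0,   0,   0,   0,   0,   2,   1,    0,   0]
print(decode_proposals(scores, left, right, length=12, top_k=2))
# [(0.9, (1, 4)), (0.85, (8, 11))]
```

A single best span would report only the first site; keeping a ranked list of proposals is what lets both disjoint sites surface.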

Technical Details

ProLoc is a text-conditioned localization model built on ESM2-650M, a 650-million-parameter protein language model, and PubMedBERT, a biomedical-domain text encoder. It performs direct residue-level localization and adds an anchor-free span proposal mechanism for recovering multiple functional regions. Training and evaluation use a benchmark constructed from InterPro annotations covering both domain-level and functional-site descriptions, with sequence-similarity-aware splits designed to test generalization to dissimilar sequences.

On the held-out test set, the direct output achieves the best single-region localization, at 0.7730 IoU@1, while the anchor-free proposal output improves visible multi-site (VM) recovery, reaching 0.9671 VM R@10 IoU50 and 0.9489 VM All-Hit@50. The authors report that ProLoc substantially outperforms window-based adaptations of representative protein and protein-text models on the same benchmark.
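
The reported numbers all rest on span-level intersection over union (IoU). The exact protocol is not spelled out here, so the sketch below encodes one plausible reading as an assumption: IoU@1 as the IoU of the top-ranked span with its best-matching ground-truth region, R@10 IoU50 as the fraction of ground-truth regions hit at IoU >= 0.5 by any of the top 10 proposals, and All-Hit as whether every region in a protein is hit. The function names are hypothetical, and the proposal cutoff k used for All-Hit is a placeholder.

```python
def span_iou(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Intersection over union of two [start, end) residue spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def iou_at_1(top1, gold_spans):
    """IoU of the top-ranked prediction with its best-matching gold span."""
    return max((span_iou(top1, g) for g in gold_spans), default=0.0)

def recall_at_k(proposals, gold_spans, k=10, thr=0.5):
    """Fraction of gold spans hit (IoU >= thr) by any of the top-k proposals."""
    if not gold_spans:
        raise ValueError("need at least one gold span")
    top = proposals[:k]
    hits = sum(any(span_iou(p, g) >= thr for p in top) for g in gold_spans)
    return hits / len(gold_spans)

def all_hit(proposals, gold_spans, k=10, thr=0.5):
    """1.0 if every gold span is hit by the top-k proposals, else 0.0."""
    top = proposals[:k]
    return float(all(any(span_iou(p, g) >= thr for p in top) for g in gold_spans))

gold = [(10, 20), (40, 50)]                   # two disjoint functional sites
proposals = [(12, 20), (70, 80), (41, 52)]    # ranked best-first
print(iou_at_1(proposals[0], gold))           # 0.8
print(recall_at_k(proposals, gold, k=2))      # 0.5
print(recall_at_k(proposals, gold, k=10))     # 1.0
print(all_hit(proposals, gold, k=2))          # 0.0
```

Averaging each per-protein value over the test set would give dataset-level figures comparable in form to those quoted above.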

Applications

ProLoc supports residue-level functional annotation of proteins, particularly for newly sequenced or under-characterized proteins where a functional description is available but the responsible regions are unknown. By localizing text descriptions to specific spans, it helps researchers prioritize residues for experimental validation, interpret the structural or mechanistic basis of a function, and pinpoint domains, motifs, and functional sites. The open-vocabulary text query makes it adaptable across the breadth of InterPro annotations without retraining for each function of interest.

Impact

ProLoc defines text-guided protein functional region localization as a distinct span-level grounding task and supplies both a benchmark and a baseline model for it, establishing an evaluation framework that future protein-text models can be measured against. Its emphasis on residue-level grounding rather than global classification moves protein-text modeling toward mechanistic interpretability and experimental prioritization. As of mid-2026 the work is a preprint awaiting peer review; no source code, pretrained weights, or hosted API have been released, and the work is distributed under a restrictive (non-commercial) license, which currently limits independent reproduction and downstream reuse.