TM-Vec 2

Protein structural homology search from sequence alone, embedding proteins so that structural similarity becomes a fast nearest-neighbor lookup.

Released: February 2026

TM-Vec 2 is a deep learning method for detecting protein structural homology directly from amino acid sequence, without first predicting or aligning three-dimensional structures. Structural similarity often reveals evolutionary and functional relationships that sequence identity alone misses, but measuring it traditionally requires structures and structure-alignment tools such as TM-align or Foldseek. TM-Vec 2 instead encodes proteins into vectors whose distances approximate structural similarity scores, so that homology search reduces to fast nearest-neighbor lookups over embeddings.

TM-Vec 2 is the successor to the original TM-Vec. It was developed by Keluskar, Batra, Bezshapkin, Morton, and Zhu at Arizona State University and released as a bioRxiv preprint in February 2026. Its headline contribution is efficiency: a distilled variant, TM-Vec 2s, achieves up to a 258x speedup over the original TM-Vec and up to a 56x speedup over Foldseek on large-scale database queries, while reportedly improving accuracy over the original TM-Vec. This makes structure-aware search practical at the scale of modern protein databases, which now contain hundreds of millions of sequences.

TM-Vec 2 fits into the landscape of structure-informed search tools alongside Foldseek and the original TM-Vec, occupying the niche of sequence-only structural homology detection where no experimental or predicted structure is required at query time.

Key Features

Sequence-only structural search: TM-Vec 2 predicts structural similarity (TM-score-like) between proteins from sequence alone, removing the need to compute or align 3D structures for each query.
Large speedups via distillation: The distilled TM-Vec 2s variant reaches up to 258x speedup over the original TM-Vec and up to 56x over Foldseek on large-scale queries.
Improved accuracy: Despite being faster, TM-Vec 2s is reported to achieve higher accuracy than the original TM-Vec on structural similarity benchmarks.
Embedding-based retrieval: Proteins are mapped to fixed-length vectors, so homology search becomes efficient nearest-neighbor lookup that scales to very large databases.
Two model variants: A full TM-Vec 2 model and a distilled TM-Vec 2s model let users trade maximum accuracy against maximum throughput.
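
For reference, the "TM-score-like" quantity mentioned above can be read against the standard TM-score definition used by TM-align. The formula below is that conventional definition, not a statement of the exact supervision target used in the preprint, which should be checked there. Here L_target is the length of the reference protein, L_ali the number of aligned residue pairs, and d_i the distance between the i-th aligned pair after superposition.

```latex
\mathrm{TM\text{-}score}
  = \max\left[
      \frac{1}{L_{\mathrm{target}}}
      \sum_{i=1}^{L_{\mathrm{ali}}}
      \frac{1}{1 + \left( d_i / d_0(L_{\mathrm{target}}) \right)^{2}}
    \right],
\qquad
d_0(L) = 1.24\,\sqrt[3]{L - 15} - 1.8
```

The score lies in (0, 1], and the length-dependent scale d_0 makes it comparable across proteins of different sizes.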

Technical Details

TM-Vec 2 builds on protein language model embeddings. Each sequence is encoded into a vector such that the distance between two vectors approximates the structural similarity score of the corresponding proteins, and these vectors are then used for fast retrieval. The authors also introduce a distilled student model, TM-Vec 2s, trained to reproduce the behavior of the larger model at much lower compute cost; this is the source of the reported throughput gains. On large-scale database queries, TM-Vec 2s reaches up to a 258x speedup relative to the original TM-Vec and up to 56x relative to Foldseek, while reporting higher accuracy than the original TM-Vec. The preprint details the training data, the structural similarity targets used for supervision, and the benchmark protocols, and is released under a CC BY-NC-ND license. Because this is a recent preprint, confirm the public availability of code and trained weights with the authors before use.
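
The retrieval step described above can be illustrated with a minimal sketch. It is not the TM-Vec 2 API: the embeddings are random stand-ins for model output, the function names are hypothetical, and cosine similarity is assumed as the proxy for structural similarity. A real deployment at database scale would likely replace the brute-force scan with an approximate nearest-neighbor index.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(query, database, k=5):
    """Return the k (index, score) pairs most similar to the query.

    Brute-force scan: every database vector is scored against the query,
    then the scores are sorted in descending order.
    """
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: random vectors stand in for protein embeddings (assumption).
random.seed(0)
DIM = 32
database = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]

# A query that is a slightly perturbed copy of database entry 42,
# mimicking a close structural homolog.
query = [x + random.gauss(0, 0.01) for x in database[42]]

hits = top_k_neighbors(query, database, k=3)
for index, score in hits:
    print(f"database entry {index}: similarity {score:.3f}")
```

Because the database embeddings can be computed once and reused, each new query costs one encoding pass plus a vector search, with no structure prediction or alignment at query time.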

Applications

TM-Vec 2 is aimed at researchers performing large-scale protein homology and function-annotation searches, including metagenomic and proteome-wide studies where most sequences lack experimentally determined structures. Because it operates from sequence and returns structure-aware matches quickly, it is well suited to annotating large collections of uncharacterized proteins, discovering remote homologs that escape sequence-based search, and clustering proteins by structural relatedness. The faster TM-Vec 2s variant is particularly useful for all-against-all comparisons across very large databases.
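
As one illustration of clustering by structural relatedness, the sketch below groups proteins into connected components of a graph whose edges join pairs with predicted similarity at or above a threshold. This is a generic single-linkage approach, not a procedure taken from the preprint. The similarity matrix is made-up toy data, and the 0.5 cutoff follows the common convention that a TM-score above roughly 0.5 suggests the same fold; whether that cutoff transfers to predicted scores is an assumption.

```python
def cluster_by_threshold(similarity, threshold):
    """Group items into connected components of the thresholded similarity graph.

    similarity: square, symmetric list of lists of pairwise scores.
    Two items share a cluster if a chain of pairs with score >= threshold
    connects them (single-linkage behavior).
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy predicted-similarity matrix for four proteins (illustrative values only).
similarity = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]

clusters = cluster_by_threshold(similarity, threshold=0.5)
print(clusters)
```

The nested loop is quadratic in the number of proteins, so at database scale the pairs would more plausibly come from a nearest-neighbor search than from a full similarity matrix.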

Impact

By delivering structure-aware homology search at speeds approaching or exceeding those of fast structure-alignment tools, TM-Vec 2 lowers the cost of incorporating structural similarity into routine large-scale protein analysis. The distillation strategy that yields TM-Vec 2s illustrates how a smaller student model can retain accuracy while dramatically improving throughput, a pattern that becomes more relevant as protein databases grow. Because the work is a February 2026 preprint, the reported speedup and accuracy figures come from the authors alone and await independent benchmarking. Performance on the most remote homologs, and on proteins poorly represented in the training data, has yet to be characterized externally.

Citation

TM-Vec 2: Accelerated Protein Homology Detection for Structural Similarity

Keluskar, A., et al. (2026) TM-Vec 2: Accelerated Protein Homology Detection for Structural Similarity. bioRxiv.

DOI: 10.64898/2026.02.05.704073

Recent citations

Papers that recently cited this model.

New Protein Function Characterization for Human Paralog Discovery, Scraping the Bottom of the Genomics Barrel
BK Pradeep, Weixian Deng, Robert L. Jernigan
bioRxiv · May 2026
Citations: 0

Citations

Total citations: 1
Influential: 0
References: 36

Fields of citing research

Biology: 100%
Computer Science: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 4 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 0 (not reproducible)

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper

