All Competitors
Every biological foundation model, evaluated and ranked by the bio.rodeo team
Showing 1–19 of 19 filtered models
ProtSent
6 | - | -
Contrastively fine-tuned ESM-2 (35M and 150M) protein language models that produce general-purpose sequence embeddings where biological similarity maps to embedding proximity.
Protein | 87 Openness

GATSBI
13 | - | -
A graph-attention model producing context-aware protein embeddings from protein-protein interaction, co-expression, and tissue networks, with biologically motivated data splits.
Protein | 94 Openness

Reverse-distilled ESM-2 checkpoints (up to 15B) producing Matryoshka-style nested embeddings that scale consistently and reach state of the art on ProteinGym.
Protein | 58 Openness

ProtAlign
Lawrence Livermore National Laboratory | March 6, 2026
Tags: contrastive_learning, cross_modal_retrieval, embeddings (+4 more)
A contrastive cross-modal encoder that aligns protein sequence (ESM-2) and structure (ProteinMPNN) representations into a shared embedding space for cross-modal retrieval.
Protein | 35 Openness

EnzPlacer
A contrastive-learning model that predicts the first three Enzyme Commission (EC) digits for enzymes whose exact (fourth-level) function was never seen during training.
Protein | 59 Openness

EVA
- | - | 101
Cross-species, multimodal foundation model of immunology and inflammation that harmonizes transcriptomics and histology into unified patient-level representations.
Single-cell, RNA, Pathology | 27 Openness

TM-Vec 2
- | 1 | -
A distilled deep learning model that predicts structural similarity between proteins directly from sequence, reaching up to 258x speedups for large-scale homology search.
Protein | 4 Openness

MetagenBERT
A pipeline that builds whole-metagenome embeddings directly from raw DNA reads using genomic language models and FAISS k-means clustering, without taxonomic or functional annotation.
DNA & Gene | 22 Openness

MicroGenomer
A 470M-parameter microbial genome foundation model trained hierarchically on 234.5B bp for multi-scale genomic understanding and ecophysiological trait prediction.
DNA & Gene | 44 Openness

vir2vec
A 422M-parameter pan-viral genomic language model that produces fixed genome-level embeddings reused across viral classification tasks without re-training.
DNA & Gene | 53 Openness

Zebraformer
- | 1 | -
A BERT-style transformer language model built on the Geneformer framework and trained on zebrafish single-cell transcriptomics to produce gene and cell embeddings for developmental analysis.
Single-cell | 46 Openness

SFM-Protein
- | 3 | -
A transformer protein language model using integrative co-evolutionary pre-training to capture both short-range and long-range residue interactions from sequence alone.
Protein | 10 Openness

DNABERT-S
1304611.7K
Species-aware DNA embedding model built on DNABERT-2, using contrastive learning to cluster and differentiate genomic sequences by species without labeled data.
DNA & Gene | 53 Openness

Ankh
248696.7K
Optimized protein language model that surpasses state-of-the-art performance with fewer than 10% of the parameters of comparable models.
Protein | 24 Openness

CARP
259 | - | -
CNN-based protein language model series showing convolutions match transformer performance on sequence pretraining while scaling linearly with sequence length.
Protein | 81 Openness

TAPE
7391K | -
Benchmark suite of five biologically relevant tasks for evaluating protein sequence representation learning, covering structure, homology, and engineering.
Protein | 89 Openness

UniRep
3651.1K | -
A multiplicative LSTM protein language model trained on 24M sequences to produce fixed-length embeddings for protein engineering and function prediction.
Protein | 49 Openness