All Competitors

Every biological foundation model, evaluated and ranked by the bio.rodeo team


Showing 1–24 of 31 filtered models

Single-cell

mLLMCelltype

Texas A&M University

Multi-LLM consensus framework for automated cell type annotation in scRNA-seq data, outperforming prior methods by ~15% in mean accuracy.

6407
See the scorecard
Protein

Pinal

Westlake University

A 16B-parameter framework for de novo protein design from natural language, converting text descriptions into functional protein sequences via two-stage structure-conditioned generation.

9321
See the scorecard
Protein

Evolla

Westlake University

An 80B-parameter multimodal protein-language model that decodes protein function through natural language dialogue, integrating sequence, structure, and evolutionary context.

671941
See the scorecard
RNA

RhoFold+

ml4bio

End-to-end RNA 3D structure prediction combining the RNA-FM language model with Invariant Point Attention, achieving state-of-the-art results on RNA-Puzzles and CASP15.

227186
See the scorecard
Single-cell

Cell2Sentence

Yale University

Framework that converts single-cell gene expression profiles into ranked gene-name sequences, enabling standard LLMs to generate, annotate, and analyze cells.

85466
See the scorecard
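The rank transformation behind Cell2Sentence is simple enough to sketch: order a cell's genes by descending expression and emit their names as a space-separated "sentence" a standard LLM can read. A minimal illustration, where the gene names and counts are made up for the example and the real pipeline operates on full expression matrices:

```python
# Toy sketch of the cell-to-sentence idea: rank genes by expression
# (highest first), drop zero-count genes, and join the names into text.
# Gene names and values below are illustrative, not from the paper.

def cell_to_sentence(expression, top_k=None):
    """Return gene names ordered by descending expression, zeros removed."""
    ranked = sorted(
        (gene for gene, value in expression.items() if value > 0),
        key=lambda gene: -expression[gene],
    )
    return " ".join(ranked[:top_k])

cell = {"CD3D": 12.0, "GNLY": 3.2, "CD8A": 7.5, "MS4A1": 0.0}
print(cell_to_sentence(cell))  # CD3D CD8A GNLY
```

Because the output is plain text, generation can be inverted: a model that emits a ranked gene list has effectively generated a cell.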
Protein

Compute-Optimal PLM

BioMap

Scaling law study for protein language models that identifies compute-optimal training regimes for CLM and MLM architectures using 939M protein sequences.

1134
See the scorecard
Single-cell

CellPLM

OmicsML

Single-cell transformer that treats cells as tokens and tissues as sentences, encoding cell-cell relationships with 100x faster inference than prior pre-trained models.

10274
See the scorecard
Protein

PLMSearch

Fudan University

Protein language model-based sequence search that detects remote homologs with threefold higher sensitivity than MMseqs2 at comparable speed.

7967
See the scorecard
Single-cell

GPTCelltype

Columbia University / Duke University

An R package that uses GPT-4 to annotate cell types in scRNA-seq data from marker genes, matching expert accuracy across hundreds of cell types and tissues.

224202
See the scorecard
RNA

ERNIE-RNA

Tsinghua University

A structure-enhanced RNA language model that incorporates base-pairing constraints into self-attention, achieving state-of-the-art RNA structure and function prediction.

4127
See the scorecard
DNA & Gene

Caduceus

Kuleshov Lab

Bidirectional, reverse-complement equivariant DNA language models built on Mamba SSMs. Outperforms models 10x larger on long-range variant effect prediction.

232
See the scorecard
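The reverse-complement equivariance noted above has a concrete meaning: running the model on the reverse complement of a strand should give the reversed per-position output of the original strand. A toy check of that property, where the per-position "model" is a stand-in for illustration, not Caduceus itself:

```python
# Reverse-complement (RC) equivariance sketch. The toy per-position model
# scores 1 for G/C and 0 for A/T; complementation preserves GC identity,
# so RC of the input corresponds to reversing the output.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def toy_model(seq):
    # Stand-in per-position feature extractor (hypothetical, RC-equivariant).
    return [1 if base in "GC" else 0 for base in seq]

seq = "ACGGT"
assert toy_model(reverse_complement(seq)) == toy_model(seq)[::-1]
```

Caduceus builds this symmetry into the architecture, so it holds exactly rather than being learned approximately from data augmentation.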
RNA

RiNALMo

LBCB Sci

650M-parameter RNA language model pre-trained on 36M non-coding RNA sequences. Achieves state-of-the-art generalization on secondary structure prediction across unseen RNA families.

161100
See the scorecard
Protein

ProLLaMA

PKU-YuanGroup

A 7B-parameter protein language model built on LLaMA-2 that performs both protein sequence generation and superfamily classification in a unified framework.

See the scorecard
Single-cell

scMulan

Tsinghua University

A 368M-parameter generative language model for single-cell transcriptomics, enabling zero-shot cell type annotation, batch integration, and conditional cell generation.

617
See the scorecard
RNA

RNA-MSM

Peking University / Griffith University

Unsupervised RNA language model using multiple sequence alignments to predict secondary structure and solvent accessibility from evolutionary information.

See the scorecard
Protein

IgLM

GrayLab

Generative language model trained on 558 million antibody sequences for infilling-based design of CDR loops and full-length immunoglobulin sequences.

188104
See the scorecard
Protein

ProGen2

Salesforce

Family of autoregressive protein language models (151M–6.4B parameters) trained on over a billion sequences for protein generation and zero-shot fitness prediction.

699
See the scorecard
Multimodalities

BioT5

Renmin University of China

Pre-training framework bridging molecules, proteins, and natural language using T5 with SELFIES representations for cross-modal biological understanding.

125
See the scorecard
Multimodalities

DARWIN Series

MasterAI EAM

Domain-specific large language models for natural science, fine-tuned on physics, chemistry, and materials science literature using automated instruction generation.

24748
See the scorecard
DNA & Gene

DNABERT-2

MAGICS Lab

Multi-species genomic foundation model replacing k-mer tokenization with BPE, achieving state-of-the-art performance with 21x fewer parameters than prior leading models.

See the scorecard
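The switch from fixed k-mer tokenization to byte-pair encoding (BPE) mentioned above can be illustrated with a single merge step: BPE repeatedly fuses the most frequent adjacent token pair into one vocabulary entry. A minimal sketch, assuming a toy corpus; real BPE vocabularies are learned over a large genomic corpus, and this shows only one iteration:

```python
# One byte-pair-encoding merge step on a DNA string: find the most
# frequent adjacent pair of tokens, then fuse every occurrence of it.
from collections import Counter

def most_frequent_pair(tokens):
    """Most common adjacent token pair (ties broken by first appearance)."""
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every adjacent occurrence of `pair` with a single token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("ACGACGTT")
pair = most_frequent_pair(tokens)  # most frequent adjacent pair
print(merge_pair(tokens, pair))
```

Iterating this process yields variable-length tokens that adapt to genomic statistics, in contrast to overlapping k-mers of a fixed length.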
Protein

Ankh

Technical University of Munich

Optimized protein language model that surpasses state-of-the-art performance using less than 10% of the parameters of comparable models.

24656
See the scorecard
Protein

ReprogBERT

IBM

Reprograms a frozen English BERT model for antibody CDR sequence infilling via learnable cross-domain projection matrices, without training a new protein language model.

2411
See the scorecard
RNA

SpliceBERT

Biomed AI

A BERT-based RNA language model pre-trained on 2M+ pre-mRNA sequences from 72 vertebrate species for splicing prediction and variant effect analysis.

548
See the scorecard
Multimodalities

Galactica

Meta AI

A large language model trained on 48 million scientific papers and knowledge bases to store, combine, and reason about scientific knowledge.

2.7K / 3.5K
See the scorecard