All Competitors
Every biological foundation model, evaluated and ranked by the bio.rodeo team
mLLMCelltype
Texas A&M University
Multi-LLM consensus framework for automated cell type annotation in scRNA-seq data, outperforming prior methods by ~15% in mean accuracy.
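A minimal sketch of the consensus idea, assuming simple majority voting over per-model predictions (the model names and the cluster labels below are hypothetical; the real framework also uses structured deliberation rounds between models):

```python
from collections import Counter

def consensus_annotation(predictions: dict[str, str]) -> tuple[str, float]:
    """Majority-vote consensus over per-model cell type calls.

    `predictions` maps model name -> predicted cell type for one cluster.
    Returns the winning label and the agreement ratio, a rough proxy for
    the confidence score derived from cross-model agreement.
    """
    votes = Counter(predictions.values())
    label, count = votes.most_common(1)[0]
    return label, count / len(predictions)

# Hypothetical calls from three LLMs for one scRNA-seq cluster:
calls = {"gpt-4o": "CD8+ T cell", "claude": "CD8+ T cell", "gemini": "NK cell"}
print(consensus_annotation(calls))  # ('CD8+ T cell', 0.67)
```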
Pinal
Westlake University
A 16B-parameter framework for de novo protein design from natural language, converting text descriptions into functional protein sequences via two-stage structure-conditioned generation.
Evolla
Westlake University
An 80B-parameter multimodal protein-language model that decodes protein function through natural language dialogue, integrating sequence, structure, and evolutionary context.
RhoFold+
ml4bio
End-to-end RNA 3D structure prediction combining the RNA-FM language model with Invariant Point Attention, achieving state-of-the-art results on RNA-Puzzles and CASP15.
Cell2Sentence
Yale University
Framework that converts single-cell gene expression profiles into ranked gene-name sequences, enabling standard LLMs to generate, annotate, and analyze cells.
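The core rank transformation is simple enough to sketch; this is an illustrative rendering, not the library's own API, and the gene names below are toy inputs:

```python
import numpy as np

def cell_to_sentence(expression: np.ndarray, gene_names: list[str],
                     top_k: int = 100) -> str:
    """Convert one cell's expression vector into a 'cell sentence':
    gene names ordered by descending expression, zeros dropped."""
    order = np.argsort(expression)[::-1]                  # highest first
    order = [i for i in order if expression[i] > 0][:top_k]
    return " ".join(gene_names[i] for i in order)

genes = ["CD3E", "MS4A1", "NKG7", "LYZ"]
counts = np.array([12.0, 0.0, 3.5, 7.1])
print(cell_to_sentence(counts, genes))  # "CD3E LYZ NKG7"
```

The resulting string is ordinary text, which is what lets an off-the-shelf LLM consume and generate cells.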
Compute-Optimal PLM
BioMap
Scaling law study for protein language models that identifies compute-optimal training regimes for causal (CLM) and masked (MLM) language modeling architectures using 939M protein sequences.
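To make "compute-optimal" concrete, here is a Chinchilla-style allocation sweep under a fixed FLOPs budget; the loss coefficients are purely illustrative placeholders, not the values fitted in this study:

```python
import numpy as np

# Parametric loss L(N, D) = E + A/N**alpha + B/D**beta, with the usual
# FLOPs approximation C ~= 6 * N * D. Coefficients are illustrative only.
E, A, B, alpha, beta = 1.7, 4e2, 1e3, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_allocation(C, n_grid=10_000):
    """Sweep model sizes N under a fixed compute budget C (FLOPs),
    setting D = C / (6 N), and return the loss-minimizing (N, D)."""
    N = np.logspace(6, 11, n_grid)   # 1M .. 100B parameters
    D = C / (6 * N)                  # tokens implied by the budget
    idx = np.argmin(loss(N, D))
    return N[idx], D[idx]

for C in (1e20, 1e21, 1e22):
    N_star, D_star = optimal_allocation(C)
    print(f"C={C:.0e}: N*={N_star:.2e} params, D*={D_star:.2e} tokens")
```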
CellPLM
OmicsML
Single-cell transformer that treats cells as tokens and tissues as sentences to encode cell-cell relationships, delivering 100x faster inference than prior pre-trained models.
PLMSearch
Fudan University
Protein language model-based sequence search that detects remote homologs with threefold higher sensitivity than MMseqs2 at comparable speed.
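A simplified stand-in for the search step, assuming precomputed mean-pooled embeddings and plain cosine similarity; PLMSearch itself scores embedding pairs with a trained similarity predictor rather than raw cosine distance:

```python
import numpy as np

def cosine_search(query_emb: np.ndarray, db_embs: np.ndarray, top_k: int = 5):
    """Rank database proteins by cosine similarity of their PLM embeddings
    to the query; returns (index, score) pairs, best first."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    scores = db @ q
    top = np.argsort(scores)[::-1][:top_k]
    return list(zip(top.tolist(), scores[top].tolist()))

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 1280))   # e.g. an ESM-scale embedding dimension
print(cosine_search(db[42] + 0.1 * rng.normal(size=1280), db))
```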
GPTCelltype
Columbia University / Duke University
An R package that uses GPT-4 to annotate cell types in scRNA-seq data from marker genes, matching expert accuracy across hundreds of cell types and tissues.
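The package itself is written in R; the Python sketch below only paraphrases how a marker-gene prompt might be assembled (the exact wording of GPTCelltype's prompt differs):

```python
def build_annotation_prompt(tissue: str, markers: dict[str, list[str]]) -> str:
    """Assemble a GPT-4 prompt from per-cluster marker genes: one cluster
    per line, one cell type expected per line of the reply."""
    lines = [f"Identify cell types of {tissue} cells using the following markers.",
             "Provide one cell type per line; give only the cell type name."]
    for cluster, genes in markers.items():
        lines.append(f"{cluster}: {', '.join(genes)}")
    return "\n".join(lines)

prompt = build_annotation_prompt(
    "human PBMC",
    {"cluster 0": ["CD3D", "CD3E", "IL7R"], "cluster 1": ["MS4A1", "CD79A"]},
)
print(prompt)  # send to any chat-completion API
```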
ERNIE-RNA
Tsinghua University
A structure-enhanced RNA language model that incorporates base-pairing constraints into self-attention, achieving state-of-the-art RNA structure and function prediction.
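A toy rendering of the structure-aware attention idea: an additive bias boosts attention logits between positions whose bases can pair (A-U, G-C, G-U). This is not ERNIE-RNA's exact formulation, just the mechanism in miniature:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(Q, K, V, pair_bias):
    """Scaled dot-product self-attention with an additive (L, L)
    base-pairing bias added to the logits before the softmax."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + pair_bias
    return softmax(logits) @ V

L, d = 6, 8
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
seq = "GGGAUC"
can_pair = {frozenset("AU"), frozenset("GC"), frozenset("GU")}
bias = np.array([[2.0 if frozenset((a, b)) in can_pair else 0.0
                  for b in seq] for a in seq])
print(biased_attention(Q, K, V, bias).shape)  # (6, 8)
```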
Caduceus
Kuleshov Lab
Bidirectional, reverse-complement equivariant DNA language models built on Mamba SSMs, outperforming models 10x larger on long-range variant effect prediction.
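Reverse-complement (RC) equivariance means f(rc(x)) = reverse(f(x)): a DNA model should score a strand and its reverse complement consistently. The sketch below demonstrates the property via post-hoc averaging over the two strands; Caduceus instead builds it into the weights through parameter sharing, so no test-time averaging is needed. The per-base scorer here is a hypothetical stand-in:

```python
import numpy as np

COMP = str.maketrans("ACGT", "TGCA")

def rc(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMP)[::-1]

def per_base_scores(seq: str) -> np.ndarray:
    """Stand-in for a per-position DNA model (e.g. variant effect scores)."""
    table = {"A": 0.1, "C": 0.7, "G": 0.3, "T": 0.9}
    return np.array([table[b] for b in seq]) * np.linspace(1, 2, len(seq))

def rc_equivariant_scores(seq: str) -> np.ndarray:
    """Symmetrize by averaging the forward pass with the reversed pass on
    the reverse complement; the result satisfies f(rc(x)) = reverse(f(x))."""
    return 0.5 * (per_base_scores(seq) + per_base_scores(rc(seq))[::-1])

x = "ACGTTGCA"
assert np.allclose(rc_equivariant_scores(rc(x)), rc_equivariant_scores(x)[::-1])
print("RC equivariance holds")
```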
RiNALMo
lbcb-sci
650M-parameter RNA language model pre-trained on 36M non-coding RNA sequences. Achieves state-of-the-art generalization on secondary structure prediction across unseen RNA families.
ProLLaMA
PKU-YuanGroup
A 7B-parameter protein language model built on LLaMA-2 that performs both protein sequence generation and superfamily classification in a unified framework.
scMulan
Tsinghua University
A 368M-parameter generative language model for single-cell transcriptomics, enabling zero-shot cell type annotation, batch integration, and conditional cell generation.
RNA-MSM
Peking University / Griffith University
Unsupervised RNA language model using multiple sequence alignments to predict secondary structure and solvent accessibility from evolutionary information.
IgLM
GrayLab
Generative language model trained on 558 million antibody sequences for infilling-based design of CDR loops and full-length immunoglobulin sequences.
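A sketch of the infilling setup: cut out the target span (e.g. CDR-H3), replace it with a mask token, and ask the model to generate the span after a separator, conditioned on species and chain tags. Token names below follow the paper's scheme loosely; the IgLM repo defines the exact vocabulary:

```python
def infill_prompt(heavy_chain: str, span: tuple[int, int],
                  species: str = "[HUMAN]", chain: str = "[HEAVY]") -> tuple[str, str]:
    """Format an antibody sequence for IgLM-style span infilling.
    Returns (prompt, expected completion)."""
    start, end = span
    masked = heavy_chain[:start] + "[MASK]" + heavy_chain[end:]
    prompt = f"{species} {chain} {masked} [SEP]"
    target = heavy_chain[start:end] + " [ANS]"
    return prompt, target

seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"   # truncated toy heavy chain
prompt, target = infill_prompt(seq, (20, 28))
print(prompt)
print("model should generate:", target)
```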
ProGen2
Salesforce
Family of autoregressive protein language models (151M–6.4B parameters) trained on over a billion sequences for protein generation and zero-shot fitness prediction.
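Zero-shot fitness prediction with an autoregressive model reduces to ranking variants by total log-likelihood, sum over t of log p(x_t | x_<t). In this sketch the per-token scorer is a uniform placeholder so the code runs anywhere; a real scorer would query a ProGen2 checkpoint:

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"

def token_logprob(context: str, next_aa: str) -> float:
    """Placeholder for an autoregressive PLM's log p(next_aa | context).
    Uniform over the 20 amino acids; swap in a real model to get signal."""
    return math.log(1.0 / len(AA))

def zero_shot_fitness(seq: str) -> float:
    """Score a variant as its total autoregressive log-likelihood;
    higher = more 'natural' under the model."""
    return sum(token_logprob(seq[:t], seq[t]) for t in range(len(seq)))

wild_type = "MKTAYIAKQR"
variant = "MKTAYIAKQW"
# Rank variants by likelihood difference relative to wild type:
print(zero_shot_fitness(variant) - zero_shot_fitness(wild_type))
```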
BioT5
Renmin University of China
Pre-training framework bridging molecules, proteins, and natural language using T5 with SELFIES representations for cross-modal biological understanding.
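SELFIES matters here because every SELFIES string decodes to a syntactically valid molecule, which makes it a robust token stream for a T5-style model, unlike raw SMILES. The `selfies` Python package provides the round-trip:

```python
import selfies as sf  # pip install selfies

smiles = "C1=CC=CC=C1"                 # benzene
selfies_str = sf.encoder(smiles)       # robust, bracketed symbol string
roundtrip = sf.decoder(selfies_str)

print(selfies_str)
print(roundtrip)                       # an equivalent SMILES string
print(list(sf.split_selfies(selfies_str)))  # ready-made tokens for an LM
```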
DARWIN Series
MasterAI-EAM
Domain-specific large language models for natural science, fine-tuned on physics, chemistry, and materials science literature using automated instruction generation.
DNABERT-2
MAGICS Lab
Multi-species genomic foundation model replacing overlapping k-mer tokenization with byte-pair encoding (BPE), achieving state-of-the-art performance with 21x fewer parameters than prior leading models.
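The tokenization difference in miniature: overlapping k-mers produce one token per position with heavy redundancy, while BPE merges frequent pairs into non-overlapping, variable-length tokens. The merge table below is a hand-picked toy; DNABERT-2 learns its vocabulary from multi-species genomes:

```python
def kmer_tokens(seq: str, k: int = 6) -> list[str]:
    """Overlapping k-mer tokenization (DNABERT-1 style): L - k + 1 tokens,
    neighbors sharing k-1 characters."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def bpe_tokens(seq: str, merges: list[tuple[str, str]]) -> list[str]:
    """Greedy BPE with a fixed merge table applied in learned order."""
    toks = list(seq)
    for a, b in merges:
        i, out = 0, []
        while i < len(toks):
            if i + 1 < len(toks) and toks[i] == a and toks[i + 1] == b:
                out.append(a + b); i += 2
            else:
                out.append(toks[i]); i += 1
        toks = out
    return toks

seq = "ATATGCGCATAT"
print(kmer_tokens(seq))   # 7 overlapping 6-mers
print(bpe_tokens(seq, [("A", "T"), ("G", "C"), ("AT", "AT"), ("GC", "GC")]))
# ['ATAT', 'GCGC', 'ATAT'] -- 3 non-overlapping tokens
```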
Ankh
Technical University of Munich
Optimized protein language model that surpasses state-of-the-art models while using fewer than 10% of their parameters.
ReprogBERT
IBM
Reprograms a frozen English BERT model for antibody CDR sequence infilling via learnable cross-domain projection matrices, without training a new protein language model.
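A sketch of the reprogramming setup, assuming the simplest version: one learnable matrix maps amino-acid one-hots into the frozen encoder's embedding space, a second maps hidden states back to amino-acid logits, and only those two projections train. The encoder below is a stub standing in for a frozen BERT body; shapes follow BERT-base:

```python
import torch
import torch.nn as nn

VOCAB = "ACDEFGHIKLMNPQRSTVWY"            # 20 amino acids

class CrossDomainReprogrammer(nn.Module):
    """Learnable in/out projections around a frozen text encoder."""
    def __init__(self, encoder: nn.Module, hidden: int = 768):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False        # frozen English BERT
        self.theta_in = nn.Linear(len(VOCAB), hidden, bias=False)
        self.theta_out = nn.Linear(hidden, len(VOCAB), bias=False)

    def forward(self, aa_onehot: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.theta_in(aa_onehot))   # (B, L, hidden)
        return self.theta_out(h)                     # (B, L, 20) logits

# Stub encoder standing in for the frozen BERT body:
stub = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, 12, batch_first=True), 2)
model = CrossDomainReprogrammer(stub)
x = torch.zeros(1, 30, len(VOCAB)); x[..., 3] = 1.0   # toy one-hot input
print(model(x).shape)                                  # torch.Size([1, 30, 20])
```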
SpliceBERT
biomed-AI
A BERT-based RNA language model pre-trained on 2M+ pre-mRNA sequences from 72 vertebrate species for splicing prediction and variant effect analysis.
Galactica
Meta AI
A large language model trained on 48 million scientific papers and knowledge bases to store, combine, and reason about scientific knowledge.