bio.rodeo
Every biological foundation model, ranked and refreshed.
336 models · 11 categories · Latest: TARIO-2 (3d ago)
Latest Releases
Hot off the preprint servers — the newest biological foundation models.
Multimodal tumor foundation model trained on paired H&E histology and spatial transcriptomics to infer whole-transcriptome and tumor-microenvironment signal from routine H&E alone.
Protein pretraining framework that learns representations directly from cryo-EM density maps, transferring to flexibility, active-site, binding-affinity, and stability tasks.
Self-supervised foundation model that learns reusable representations of cancer genomes from somatic SNVs and copy-number alterations across 33 tumor types.
A self-supervised masked autoencoder for RNA-seq count data, pretrained on 1.4M public samples to learn transferable transcriptomic representations without per-dataset re-training.
A masked discrete-diffusion model over millions of full-length mRNAs, guided by Monte Carlo Tree Search for joint codon optimization and de novo UTR design.
A protein-text foundation model embedding sequences and natural language in a shared token space, enabling protein understanding and de novo design from one checkpoint.
DNA & Gene
Genomic sequence modeling and gene expression analysis
Self-supervised foundation model that learns reusable representations of cancer genomes from somatic SNVs and copy-number alterations across 33 tumor types.
A zebrafish DNA sequence-to-function model predicting cell-type-specific single-cell expression across 85 cell-type × developmental-timepoint combinations during embryogenesis.
A 700M-parameter DNA language model pretrained on the rice pangenome, built as a reusable foundation model for crop genomics and molecular breeding.
RNA
RNA structure, function, and expression modeling
A masked discrete-diffusion model over millions of full-length mRNAs, guided by Monte Carlo Tree Search for joint codon optimization and de novo UTR design.
An RNA language model trained by self-supervised masked-nucleotide prediction on ~50,000 IRES sequences that predicts secondary-structure features rivaling experimental chemical probing.
A cross-modal transfer-learning model that adapts the ESM-2 650M protein language model to mRNA analysis by swapping amino-acid tokens for codon tokens, applied to mRNA benchmarks without re-training.
Protein
Protein sequence and structure prediction
Protein pretraining framework that learns representations directly from cryo-EM density maps, transferring to flexibility, active-site, binding-affinity, and stability tasks.
A protein-text foundation model embedding sequences and natural language in a shared token space, enabling protein understanding and de novo design from one checkpoint.
Small molecule
Molecular representation, generation, and property prediction
A signed heterogeneous graph foundation model pretrained on the SIGMA-KG knowledge graph for zero-shot drug mode-of-action, clinical response, and drug-drug interaction prediction.
A motif-aware graph diffusion transformer for controllable molecular generation that transfers to unseen properties by learning only lightweight task embeddings with the generator frozen.
Protein-ligand foundation model that maps coarse-grained structural representations directly to binding affinity, running ~26x faster than Boltz-2.
Single-cell
Single-cell transcriptomics and genomics
A self-supervised masked autoencoder for RNA-seq count data, pretrained on 1.4M public samples to learn transferable transcriptomic representations without per-dataset re-training.
A zebrafish DNA sequence-to-function model predicting cell-type-specific single-cell expression across 85 cell-type × developmental-timepoint combinations during embryogenesis.
A constrained deep flow-matching framework for distributional translation of omics signatures across biological domains, such as mouse-to-human transcriptomics, without paired samples.
Spatial omics
Spatially-resolved omics and tissue microenvironment modeling
Multimodal tumor foundation model trained on paired H&E histology and spatial transcriptomics to infer whole-transcriptome and tumor-microenvironment signal from routine H&E alone.
A diffusion transformer that synthesizes H&E histopathology image patches conditioned jointly on spatial transcriptomics gene expression and morphological embeddings.
A spatial-transcriptomics foundation model for the tumor microenvironment that produces TME-aware embeddings and enables in silico perturbation from a fixed pretrained checkpoint.
Pathology
Histology and tissue imaging analysis
Multimodal tumor foundation model trained on paired H&E histology and spatial transcriptomics to infer whole-transcriptome and tumor-microenvironment signal from routine H&E alone.
A diffusion transformer that synthesizes H&E histopathology image patches conditioned jointly on spatial transcriptomics gene expression and morphological embeddings.
Genetically aligned foundation model for blood smear cytology that links single white-blood-cell morphology to chromosomal aberrations and mutations for AML/APL diagnosis.
Imaging
Microscopy, fluorescence imaging, and cryo-EM analysis
Protein pretraining framework that learns representations directly from cryo-EM density maps, transferring to flexibility, active-site, binding-affinity, and stability tasks.
A physics-informed generative foundation model for quantitative diffusion MRI that maps brain microstructure (tensor, kurtosis, NODDI) and adapts zero-shot to each participant's data.
A domain-specific foundation model for zero-shot plant root image segmentation, built on a MobileSAM backbone and trained across nine diverse root datasets.
Metabolomics
Metabolite profiles from NMR and mass spectrometry
A self-supervised metabolomic foundation model pretrained on NMR metabolite profiles from 430,000+ UK Biobank participants, applied without backbone retraining to aging, subtyping, and risk tasks.
Biosignals
Continuous physiological and wearable sensor time-series
A dual-stream self-supervised foundation model for continuous glucose monitoring data, separating slow physiological state from transient glucose events.
Language model
Bio/scientific language and generative models
A protein-text foundation model embedding sequences and natural language in a shared token space, enabling protein understanding and de novo design from one checkpoint.
A multimodal Q-former that fuses DNA sequence, gene context, protein function, and text into a prefix for a frozen LLM, enabling zero-shot genetic variant interpretation.
A unified bio-language Mixture-of-Experts foundation model spanning DNA, protein sequence and structure, and biological text, applied across eight task families from a single checkpoint.
The Leaderboard
Ranked by academic impact and model scale.
Most Cited
Academic impact by citation count
| # | Model | Citations |
|---|---|---|
| 1 | | 36.1K |
| 2 | | 11K |
| 3 | | 4.6K |
| 4 | | 3.4K |
| 5 | | 3.1K |
Most Parameters
Model size by parameter count
| # | Model | Parameters |
|---|---|---|
| 1 | | 120B |
| 2 | | 100B |
| 3 | | 98B |
| 4 | | 80B |
| 5 | | 46.7B |
Who Builds Biological AI
The organizations behind the most-cited foundation models.
Top Organizations
By total citation count