DNA & Gene foundation models
DNA & Gene Models

Genomic sequence modeling and gene expression analysis

106 models in this category

What DNA & gene foundation models do

DNA and gene foundation models learn the regulatory and functional grammar of genomic sequences, which lets them predict how nucleotide changes alter expression, splicing, and cellular phenotype. Models like Enformer predict cell-type-specific gene expression tracks from sequence alone, while Evo and the Nucleotide Transformer learn broader representations spanning prokaryotic and eukaryotic genomes. DNABERT applies BERT-style masking to DNA k-mers, and its successor DNABERT-2 replaces k-mer tokenization with byte-pair encoding (BPE); both can be fine-tuned for tasks from promoter classification to variant effect prediction.

Common applications and use cases

Variant effect prediction is among the highest-value applications: models like Enformer and Sei can score non-coding variants for regulatory impact, informing GWAS interpretation and rare disease diagnosis. Regulatory element classification (identifying enhancers, promoters, and silencers) and CRISPR guide efficiency scoring are other well-established use cases. Evo's pretraining at genomic scale also supports sequence generation tasks, including the design of novel regulatory elements and protein-coding sequences.

Notable Models

Top-rated DNA & Gene models from our evaluations

Big Bird

Google Research

Released July 28, 2020

2.8K · 319K · 634

Sparse attention transformer extending BERT to sequences up to 8x longer via random, local, and global attention patterns, with demonstrated applications in genomic sequence modeling.

DNA & Gene
Openness: 49

DNABERT-2

MAGICS Lab

Released June 26, 2023

41697.7K493

Multi-species genomic foundation model replacing k-mer tokenization with BPE, achieving state-of-the-art performance with 21x fewer parameters than prior leading models.

DNA & Gene
Openness: 64

Carbon

Hugging Face +2 others

Released May 1, 2026

7.4K · 193

An open autoregressive genomic foundation model (0.5B–8B params) with a 6-mer DNA tokenizer, matching Evo2-7B win rates at far higher throughput.

DNA & Gene
Openness: 93

Enformer

Google DeepMind

Released October 4, 2021

1.2K · 15K

Transformer model that predicts gene expression and regulatory activity from 200 kb DNA sequences, capturing enhancer-promoter interactions up to 100 kb away.

DNA & Gene
Openness: 84

Caduceus

Kuleshov Lab

Released March 5, 2024

21019.7K240

A family of bidirectional, reverse-complement equivariant DNA language models built on Mamba SSMs that outperforms models 10x larger on long-range variant effect prediction.

DNA & Gene
Openness: 86

GPN

Song Lab

Released October 31, 2023

356344

A DNA language model for unsupervised genome-wide variant effect prediction, trained on multispecies genomes via masked language modeling without functional annotation labels.

DNA & Gene
Openness: 93

Frequently asked questions

What is a DNA and gene foundation model?

A DNA and gene foundation model is a neural network pretrained on large collections of genomic sequences (DNA or RNA) to learn representations of regulatory syntax, coding potential, and sequence function. These representations transfer to downstream tasks like variant effect prediction, gene expression modeling, and regulatory element classification. Examples include Enformer, Nucleotide Transformer, and Evo.

How do genomic foundation models handle the non-coding genome?

Most genomic foundation models are pretrained on whole-genome sequences that include non-coding regions, so they implicitly encode information about regulatory elements, transposons, and intergenic space. Enformer, for example, was designed to predict transcription factor binding and chromatin accessibility from non-coding sequence windows, which makes it well suited to interpreting GWAS hits in regulatory regions.

What is the difference between DNABERT and Enformer?

DNABERT applies BERT-style masked language modeling to DNA tokenized as k-mers, producing general-purpose genomic embeddings that can be fine-tuned for many classification and regression tasks. Enformer is instead trained with supervision to predict genomic assay tracks (CAGE, ATAC-seq, ChIP-seq) from long input windows of roughly 200 kb, which makes it stronger at gene expression and regulatory prediction than at general-purpose sequence representation.
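
The difference starts at the input representation, which can be sketched as follows. This is a minimal illustration, not either project's actual preprocessing code: the function names kmer_tokens and one_hot are hypothetical, k=3 is chosen only to keep the output short, and the sketch assumes stride-1 overlapping k-mers for the DNABERT-style tokens and a four-channel A/C/G/T one-hot encoding for the Enformer-style input.

```python
from typing import List

BASES = "ACGT"  # channel order assumed for the one-hot encoding


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a DNA string into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator row; unknown bases (e.g. N) become all zeros."""
    return [[int(base == b) for b in BASES] for base in seq.upper()]


seq = "ACGTAC"
tokens = kmer_tokens(seq, k=3)   # discrete tokens for a masked language model
encoded = one_hot(seq)           # numeric matrix for a sequence-to-track model

print(tokens)      # ['ACG', 'CGT', 'GTA', 'TAC']
print(encoded[0])  # [1, 0, 0, 0]
```

A masked language model would hide some of the k-mer tokens and learn to recover them, whereas a track-prediction model consumes the one-hot matrix directly and is trained against measured assay signal.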

Can DNA foundation models predict variant pathogenicity?

Yes, this is one of the most actively developed applications. Models like Enformer, Sei, and Nucleotide Transformer can score the predicted functional impact of a single-nucleotide variant by comparing the model's outputs for the reference and alternate sequences. However, most current models are stronger at regulatory variants in well-characterized tissues than at rare coding variants, and calibration against clinical databases remains an active area of evaluation.
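
The reference-versus-alternate comparison can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not any model's published scoring pipeline: apply_snv and variant_effect are hypothetical helper names, positions are assumed to be 0-based, and toy_gc_model is a placeholder that returns GC fraction in place of a real model's predicted track value.

```python
from typing import Callable


def apply_snv(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Return ref_seq with the base at 0-based pos changed from ref to alt."""
    if ref_seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {ref_seq[pos]!r}")
    return ref_seq[:pos] + alt + ref_seq[pos + 1:]


def variant_effect(predict: Callable[[str], float],
                   ref_seq: str, pos: int, ref: str, alt: str) -> float:
    """Score a single-nucleotide variant as predict(alt_seq) - predict(ref_seq)."""
    alt_seq = apply_snv(ref_seq, pos, ref, alt)
    return predict(alt_seq) - predict(ref_seq)


def toy_gc_model(seq: str) -> float:
    """Placeholder predictor (GC fraction); a real model's output would go here."""
    return sum(base in "GC" for base in seq) / len(seq)


score = variant_effect(toy_gc_model, "ACGTACGT", pos=0, ref="A", alt="G")
print(score)  # 0.125
```

In practice the predictor would return many tracks per sequence, and the difference would be summarized per tissue or assay before any comparison with clinical labels.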