Question 1

What is a DNA and gene foundation model?

Accepted Answer

A DNA and gene foundation model is a neural network pretrained on large collections of genomic sequences (DNA or RNA) to learn representations of regulatory syntax, coding potential, and sequence function. These representations transfer to downstream tasks such as variant effect prediction, gene expression modeling, and regulatory element classification. Examples include self-supervised sequence models such as Nucleotide Transformer and Evo, as well as supervised sequence-to-function models such as Enformer, which are often grouped with them.
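
In practice, "transfer" often means freezing the pretrained model and fitting a small classifier, or probe, on its pooled embeddings. The sketch below illustrates that pattern under loud assumptions: `embed` is a hypothetical stand-in that returns 2-mer frequencies instead of a real model's embeddings, and the labels (GC-rich vs AT-rich) are synthetic, chosen only so the example runs without model weights or data.

```python
import math
from itertools import product

# All 16 dinucleotides, used as the feature order of the toy embedding.
DIMERS = ["".join(p) for p in product("ACGT", repeat=2)]

def embed(seq: str) -> list[float]:
    """Hypothetical frozen encoder: normalized 2-mer frequencies (stand-in for a real model)."""
    counts = dict.fromkeys(DIMERS, 0)
    for i in range(len(seq) - 1):
        dimer = seq[i:i + 2]
        if dimer in counts:
            counts[dimer] += 1
    total = max(1, sum(counts.values()))
    return [counts[d] / total for d in DIMERS]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(xs: list[list[float]], ys: list[int], lr: float = 1.0, epochs: int = 500):
    """Fit a logistic-regression probe on frozen embeddings with batch gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_predict(w: list[float], b: float, x: list[float]) -> float:
    """Probability of the positive class for one embedding."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Synthetic toy labels: 1 = GC-rich, 0 = AT-rich (not real regulatory data).
pos = ["GCGCGGCCGC", "GGCCGCGCGG", "CGCGCCGGCG"]
neg = ["ATATAATTAT", "TTAATATAAT", "AATTATATTA"]
w, b = train_probe([embed(s) for s in pos + neg], [1] * len(pos) + [0] * len(neg))
p_gc = probe_predict(w, b, embed("GCGGCGCCGC"))
p_at = probe_predict(w, b, embed("ATTATAATAT"))
```

Swapping `embed` for a real encoder's mean-pooled token embeddings would leave the probe unchanged; a library logistic regression would normally replace the hand-written gradient descent.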

Question 2

How do genomic foundation models handle the non-coding genome?

Accepted Answer

Most genomic foundation models are pretrained on whole-genome sequences that include non-coding regions, so they can implicitly learn features of regulatory elements, transposons, and intergenic sequence. Models like Enformer go further: they are trained to predict transcription factor binding, chromatin accessibility, histone marks, and transcription (CAGE) directly from long DNA sequence windows, which makes them useful for interpreting GWAS hits that fall in non-coding regulatory regions.
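
Sequence-to-function models such as Enformer take raw sequence as input, typically one-hot encoded. A minimal sketch of preparing a window centred on a GWAS lead variant, assuming A, C, G, T channel order and N-padding beyond chromosome ends; the function names are illustrative, and a tiny width replaces Enformer's real input length (196,608 bp) so the example runs instantly.

```python
BASES = "ACGT"  # assumed channel order; confirm against the target model's documentation

def one_hot(seq: str) -> list[list[int]]:
    """Encode DNA as a (len(seq) x 4) one-hot matrix; N and other symbols become all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def centered_window(chrom_seq: str, pos: int, width: int) -> str:
    """Return `width` bases with 0-based `pos` at index width // 2, padding with N off the ends."""
    start = pos - width // 2
    return "".join(
        chrom_seq[i] if 0 <= i < len(chrom_seq) else "N"
        for i in range(start, start + width)
    )

# Toy usage: a short "chromosome" and a variant at 0-based position 1.
chrom = "ACGTACGT"
window = centered_window(chrom, pos=1, width=6)  # "NNACGT"
x = one_hot(window)
```

The explicit `0 <= i` check matters: without it, Python's negative indexing would silently wrap around to the end of the chromosome instead of padding.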

Question 3

What is the difference between DNABERT and Enformer?

Accepted Answer

DNABERT applies BERT-style masked language modeling to DNA tokenized into overlapping k-mers, producing general-purpose genomic embeddings that can be fine-tuned for many classification and regression tasks (DNABERT-2 replaces k-mers with byte-pair encoding). Enformer is a supervised model that combines convolutional layers with transformer blocks and is trained to predict genomic assay tracks (CAGE, DNase-seq and ATAC-seq, ChIP-seq) from input windows of about 200 kb (196,608 bp). This makes it especially strong for gene expression and regulatory prediction rather than general-purpose sequence representation.
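
The overlapping k-mer tokenization behind the original DNABERT can be sketched in a few lines. This is an illustrative reimplementation, not DNABERT's own tokenizer code; the helper names and the `[MASK]` string are assumptions. It also shows why DNABERT's pretraining masks contiguous spans of k-mers rather than isolated tokens: with stride-1 k-mers, a single masked token can be read off its neighbours.

```python
def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Split a sequence into overlapping k-mers with stride 1 (DNABERT-style)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def detokenize(tokens: list[str]) -> str:
    """Rebuild the sequence: the first k-mer in full, then the last base of each later k-mer."""
    if not tokens:
        return ""
    return tokens[0] + "".join(t[-1] for t in tokens[1:])

def mask_span(tokens: list[str], start: int, span: int, mask_token: str = "[MASK]") -> list[str]:
    """Replace `span` consecutive tokens, beginning at `start`, with a mask token."""
    masked = list(tokens)
    for i in range(start, min(start + span, len(masked))):
        masked[i] = mask_token
    return masked

tokens = kmer_tokenize("ACGTACGGA", k=3)
# ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGG', 'GGA']

# Masking token 2 alone leaks it: its first base is the second base of token 1,
# and its last k-1 bases are the first k-1 bases of token 3.
leaked = tokens[1][1] + tokens[3][:-1]  # "GTA", identical to tokens[2]

masked = mask_span(tokens, start=2, span=3)
```

For k = 6 the k-mer vocabulary alone has 4**6 = 4,096 entries, before any special tokens; this growth with k is one motivation for DNABERT-2's switch to byte-pair encoding.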

Question 4

Can DNA foundation models predict variant pathogenicity?

Accepted Answer

Yes, this is one of the most actively developed applications, with an important caveat. Models like Enformer, Sei, and Nucleotide Transformer can score the predicted functional impact of a single-nucleotide variant by comparing the model's outputs for the reference and alternate sequences. Such scores estimate molecular effect rather than clinical pathogenicity directly. Most current models are also stronger on regulatory variants in well-characterized tissues than on rare coding variants, and calibration against clinical databases remains an active area of evaluation.
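
The reference-versus-alternate comparison can be sketched as follows. `toy_predict` is a hypothetical placeholder for a real model's forward pass (a model like Enformer would return many tracks per bin), and summing the per-bin differences is just one simple summary; real pipelines may instead use absolute or log-ratio differences, restrict to bins near the variant, or average forward and reverse-complement predictions.

```python
from typing import Callable

def toy_predict(seq: str, bin_size: int = 4) -> list[float]:
    """Hypothetical stand-in for a sequence-to-track model: GC fraction in each bin."""
    return [
        sum(base in "GC" for base in seq[i:i + bin_size]) / bin_size
        for i in range(0, len(seq) - bin_size + 1, bin_size)
    ]

def variant_effect(window: str, offset: int, ref: str, alt: str,
                   predict: Callable[[str], list[float]]) -> float:
    """Score a single-nucleotide variant as the summed alt-minus-ref prediction difference."""
    if window[offset] != ref:
        raise ValueError(f"reference allele mismatch at offset {offset}: {window[offset]!r} != {ref!r}")
    alt_window = window[:offset] + alt + window[offset + 1:]
    ref_tracks = predict(window)
    alt_tracks = predict(alt_window)
    return sum(a - r for a, r in zip(alt_tracks, ref_tracks))

score = variant_effect("AAAAAAAA", offset=3, ref="A", alt="G", predict=toy_predict)  # 0.25
```

Checking the reference allele first catches a common source of silent errors: coordinates or genome builds that do not match the variant file.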

DNA & Gene Models

Notable Models

BigBird

DNABERT-2

eRNAformer

Enformer

DNABERT-S

GPN