Transformer model that predicts gene expression and regulatory activity from 200kb DNA sequences, capturing enhancer-promoter interactions up to 100kb away.
Enformer is a deep learning architecture developed by DeepMind (now Google DeepMind) and Calico Life Sciences that substantially advances the ability to predict gene expression and epigenomic signals directly from DNA sequence. Published in Nature Methods in October 2021, the model addresses a fundamental problem in regulatory genomics: how noncoding DNA sequence, including distal enhancers, silencers, and insulators, determines gene expression levels across different cell types and tissues.
The central innovation of Enformer is its dramatically extended receptive field. Previous sequence-to-function models such as Basenji2 processed approximately 40,000 base pairs of DNA context, capturing regulatory elements within roughly 20kb of a gene's promoter. Enformer incorporates transformer-based self-attention layers that let the model integrate information across a 196,608bp (~200kb) input window, reaching distal regulatory elements up to 100kb away. This expansion increased the fraction of relevant enhancers within the model's field of view from 47% to 84%, yielding significantly improved expression predictions.
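The arithmetic behind the 20kb and 100kb reach figures is simple: a promoter at the center of the input window can only "see" an enhancer that lies within half the window width on either side. A back-of-the-envelope check (196,608bp is the paper's exact input length behind the rounded 200kb):

```python
# For a promoter centered in the input window, an enhancer is "in view"
# only if it lies within half the window width on either side.
def max_reach_bp(window_bp: int) -> int:
    return window_bp // 2

print(max_reach_bp(40_000))   # 20,000  -> Basenji2-scale ~20kb reach
print(max_reach_bp(196_608))  # 98,304  -> Enformer's ~100kb reach
```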
The model was developed collaboratively by Žiga Avsec and colleagues at DeepMind (including John Jumper, who later shared the 2024 Nobel Prize in Chemistry for AlphaFold 2) alongside David Kelley and Vikram Agarwal at Calico Life Sciences. It builds directly on the Basenji lineage of genomic deep learning models while replacing the purely convolutional architecture with a hybrid convolution-transformer design that captures both local sequence motifs and long-range regulatory grammar.
Model weights are published on TensorFlow Hub (tfhub.dev/deepmind/enformer/1) and code is released under an open license, with Colab notebooks for usage and fine-tuning.

Enformer uses a hybrid convolutional-transformer architecture. DNA sequences are first encoded as one-hot vectors (4 channels per position, for A, C, G, T) and passed through a series of convolutional blocks that downsample the input and extract local sequence features. The resulting representations are then processed by 11 transformer blocks, each with 8 attention heads and a model width of 1,536 channels. Custom relative positional encodings, combining exponential, gamma, and central-mask basis functions, replace absolute positional encodings so the model generalizes across sequence positions. The output layer produces predictions at 128bp resolution over the central 114,688bp of the input, yielding output tensors of shape [batch, 896, num_tracks].
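A minimal usage sketch against the TF Hub release described above. The 393,216bp input length (the 196,608bp modeled region plus flanking context) and the `.model` attribute follow the accompanying usage notebook, so verify them against the current release:

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the published checkpoint from TensorFlow Hub (URL from the text above).
enformer = hub.load("https://tfhub.dev/deepmind/enformer/1").model

# The released model takes a 393,216bp one-hot window: the 196,608bp region
# described in the paper plus flanking context (per the usage notebook).
SEQ_LEN = 393_216

def one_hot(seq: str) -> np.ndarray:
    """Encode A/C/G/T as 4 channels per position; other bases become zeros."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in idx:
            out[i, idx[base]] = 1.0
    return out

# Toy input: a random sequence stands in for a real genomic window.
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=SEQ_LEN))
batch = one_hot(seq)[np.newaxis]   # shape [1, SEQ_LEN, 4]

preds = enformer.predict_on_batch(tf.constant(batch))
print(preds["human"].shape)        # [1, 896, 5313]: 896 bins of 128bp each
```

The human output head emits 5,313 tracks and the mouse head 1,643, matching the [batch, 896, num_tracks] shape described above.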
Training data was drawn from ENCODE and FANTOM5, covering CAGE (cap analysis of gene expression), DNase-seq, ATAC-seq, and ChIP-seq for histone marks and transcription factors across hundreds of human and mouse cell types and tissues. In benchmarks, Enformer outperformed Basenji2 across all assay types (paired Wilcoxon P < 10^-38) and improved eQTL fine-mapping accuracy in 47 of 48 GTEx tissues tested. On the CAGI5 saturation mutagenesis challenge, Enformer achieved state-of-the-art performance in predicting the regulatory consequences of systematic base-pair substitutions.
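To make the "paired Wilcoxon" comparison concrete: the test pairs the two models' held-out correlations track by track and asks whether the per-track differences are systematically positive. A sketch with made-up correlation values (not the paper's data):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-track test-set Pearson correlations for both models;
# the pairing is per prediction track (5,313 human tracks in the paper).
rng = np.random.default_rng(0)
basenji2_r = rng.uniform(0.4, 0.8, size=5313)
enformer_r = np.clip(basenji2_r + rng.normal(0.03, 0.02, size=5313), 0.0, 1.0)

# One-sided paired Wilcoxon signed-rank test: is Enformer better per track?
stat, p = wilcoxon(enformer_r, basenji2_r, alternative="greater")
print(f"W={stat:.0f}, P={p:.2e}")
```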
Enformer is used by researchers working at the intersection of genomics, regulatory biology, and human genetics. A primary application is variant effect prediction: by scoring the predicted functional impact of noncoding variants, Enformer helps fine-map causal variants from genome-wide association studies (GWAS) and prioritize candidates for experimental follow-up. It is also used to generate regulatory hypotheses for rare or de novo variants associated with Mendelian disorders, where experimental validation is difficult. In comparative genomics, Enformer predictions support the study of cis-regulatory evolution by imputing functional activity across species from sequence alone. Downstream models such as Enformer Celltyping and seq2cells have fine-tuned Enformer's learned representations to extend predictions to novel cell types and multimodal epigenomic integration tasks.
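In practice, variant effect prediction with a sequence-to-function model like Enformer is usually done by in silico mutagenesis: predict on the reference window, swap in the alternate allele, and compare. A minimal sketch, assuming the TF Hub model loaded above; summing over output bins is one common summarization choice, not the paper's exact scoring pipeline:

```python
import numpy as np
import tensorflow as tf

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def variant_effect(enformer, ref_onehot: np.ndarray, pos: int, alt: str) -> np.ndarray:
    """Per-track delta for a single-nucleotide variant at `pos` (0-based).

    ref_onehot: [SEQ_LEN, 4] one-hot reference window centered on the variant.
    Returns one signed score per human output track.
    """
    # Build the alternate allele by overwriting the one-hot at the variant site.
    alt_onehot = ref_onehot.copy()
    alt_onehot[pos] = 0.0
    alt_onehot[pos, BASE_IDX[alt]] = 1.0

    ref_pred = enformer.predict_on_batch(tf.constant(ref_onehot[np.newaxis]))["human"]
    alt_pred = enformer.predict_on_batch(tf.constant(alt_onehot[np.newaxis]))["human"]

    # Summarize the [1, 896, num_tracks] difference by summing over the 896
    # bins; restricting to bins near the TSS of a gene of interest is common.
    return tf.reduce_sum(alt_pred - ref_pred, axis=1).numpy()[0]
```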
Enformer established the transformer architecture as a viable — and superior — alternative to purely convolutional approaches for genome-scale regulatory prediction, influencing a generation of follow-on models including Borzoi, Sei, and HyenaDNA. The Nature Methods paper has accumulated thousands of citations since its publication and is widely referenced as a benchmark in the regulatory genomics deep learning literature. The availability of pre-computed variant effect scores and TensorFlow Hub model weights has enabled broad adoption without specialized compute infrastructure. A notable limitation is that the model predicts steady-state population-level epigenomic signals rather than single-cell or dynamic responses, and it cannot account for three-dimensional genome organization beyond what is encoded in the linear sequence context. Performance also degrades for genomic regions poorly represented in ENCODE or FANTOM5 training data, such as repetitive elements or understudied cell types.
Avsec, Ž., et al. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18, 1196–1203.
DOI: 10.1038/s41592-021-01252-x