AlphaGenome

DNA foundation model that predicts thousands of functional genomic tracks, from expression and splicing to chromatin, at single base-pair resolution.

Released: June 2025

Parameters: 450 Million

AlphaGenome is a unified deep learning model from Google DeepMind that takes up to 1 million base pairs of DNA sequence as input and predicts thousands of functional genomic tracks, at single base-pair resolution for most output types. It is designed to decode the regulatory grammar of the non-coding genome: the roughly 98% of human DNA that does not encode proteins but controls when, where, and how much genes are expressed. By jointly predicting diverse molecular readouts, including gene expression, splicing, chromatin accessibility, histone modifications, transcription factor binding, and 3D chromatin contact maps, AlphaGenome provides a comprehensive view of regulatory sequence function from a single model.

AlphaGenome builds on the lineage of sequence-to-function models pioneered by Basenji and Enformer, and advances it by processing longer input sequences (1 Mb versus roughly 200 kb for Enformer), predicting at finer resolution (single base pairs rather than Enformer's 128 bp bins), and covering more output modalities. It complements AlphaMissense, which focuses on missense variants in protein-coding regions, by addressing the regulatory landscape where the majority of disease-associated genetic variants reside. The model was released in June 2025, and the accompanying paper appears in Nature with a 2026 publication date (see Citation).

Key Features

Megabase-scale context: Processes up to 1,000,000 base pairs in a single forward pass, capturing long-range regulatory interactions such as enhancer-promoter communication that shorter-context models miss.
Single base-pair resolution: Delivers predictions at individual nucleotide resolution for most output modalities, enabling precise variant effect scoring.
Comprehensive multimodal output: Jointly predicts 11 assay categories spanning 5,930 human tracks and 1,128 mouse tracks, including RNA-seq, CAGE, PRO-cap, splice junctions, DNase, ATAC, histone marks, transcription factor binding, and Hi-C contact maps.
State-of-the-art variant effect prediction: Matches or exceeds the strongest available external models in 24 of 26 variant effect prediction evaluations and outperforms the best external models on 22 of 24 genome track prediction tasks.
Efficient inference: Scores variants in under one second on consumer GPUs, making genome-wide scans feasible.
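
As a rough illustration of what megabase-scale context means in practice, the sketch below picks a 1,000,000 bp window around a variant and checks whether a distal element falls inside it. The function names, the 0-based half-open coordinate convention, and the clamping behaviour at chromosome ends are assumptions made for this example; they are not taken from the AlphaGenome codebase or API.

```python
WINDOW_BP = 1_000_000  # maximum input length quoted in the feature list


def centered_window(variant_pos: int, chrom_length: int,
                    window: int = WINDOW_BP) -> tuple:
    """Return a 0-based half-open [start, end) window containing variant_pos.

    The window is centred on the variant where possible and shifted inward
    when it would run past either end of the chromosome (an assumption of
    this sketch, not a documented AlphaGenome behaviour).
    """
    if not 0 <= variant_pos < chrom_length:
        raise ValueError("variant position lies outside the chromosome")
    if chrom_length <= window:
        return 0, chrom_length
    start = variant_pos - window // 2
    start = max(0, min(start, chrom_length - window))
    return start, start + window


def covers(window: tuple, position: int) -> bool:
    """True if a genomic position falls inside the half-open window."""
    start, end = window
    return start <= position < end


# Hypothetical coordinates: a variant and an enhancer 400 kb downstream.
win = centered_window(5_000_000, 100_000_000)
enhancer_in_context = covers(win, 5_400_000)
distant_element_in_context = covers(win, 6_200_000)
```

With these hypothetical coordinates the enhancer 400 kb away shares a window with the variant, while an element 1.2 Mb away does not, which is the kind of case the 1 Mb limit excludes.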

Technical Details

AlphaGenome is a 450-million parameter model implemented in JAX using a U-Net-style encoder-decoder architecture. Seven convolutional downsampling stages detect short sequence motifs and progressively compress the spatial dimension while increasing channel depth, preserving fine-grained local patterns such as splice site motifs and transcription factor binding sites. A multi-head self-attention transformer core then operates on the compressed representations to model long-range dependencies across the full megabase window. A U-Net decoder with skip connections progressively restores base-pair resolution, combining local information from the encoder with the long-range context from the transformer. Dedicated pairwise interaction blocks construct 2D representations for predicting Hi-C contact maps at 2,048 bp resolution.
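
The sequence-length arithmetic implied by this description can be worked through in a few lines. This is a back-of-the-envelope sketch, not the model's code: it assumes that "1 Mb" means 2^20 = 1,048,576 bp and that each of the seven downsampling stages halves the sequence length. Neither assumption is stated in the text above.

```python
INPUT_BP = 2 ** 20           # assumption: "1 Mb" taken as 1,048,576 bp
NUM_DOWNSAMPLE_STAGES = 7    # from the description above
CONTACT_MAP_RES_BP = 2_048   # Hi-C contact map resolution, from the description


def encoder_lengths(input_bp: int, stages: int, factor: int = 2) -> list:
    """Sequence length after each downsampling stage (assumed factor of 2)."""
    lengths = [input_bp]
    for _ in range(stages):
        lengths.append(lengths[-1] // factor)
    return lengths


lengths = encoder_lengths(INPUT_BP, NUM_DOWNSAMPLE_STAGES)
transformer_tokens = lengths[-1]                 # positions seen by attention
bp_per_token = INPUT_BP // transformer_tokens    # width of one compressed position
contact_bins = INPUT_BP // CONTACT_MAP_RES_BP    # side length of the 2D contact map

# Self-attention cost grows with the square of the sequence length, so
# compressing before the transformer shrinks the pairwise work by this factor.
attention_saving = (INPUT_BP ** 2) // (transformer_tokens ** 2)

print(lengths)
print(transformer_tokens, bp_per_token, contact_bins, attention_saving)
```

Under these assumptions the transformer sees 8,192 positions of 128 bp each and the contact map is 512 by 512 bins, which shows why attention over the full window is tractable only after the convolutional compression.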

The model is trained on publicly available functional genomics data from ENCODE, GTEx, 4D Nucleome, and FANTOM5, covering hundreds of human and mouse cell types. The deployed 450M-parameter student model is distilled from an ensemble of 64 frozen teacher models over 250,000 training steps on H100 GPUs. A single student model trains in approximately 4 hours on 8 TPUv3 chips — roughly half the compute budget of Enformer.
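
The idea of distilling a student from a frozen teacher ensemble can be sketched generically: average the teachers' predictions for a sequence and train the student to match that average. The example below is a toy illustration with made-up numbers and a plain mean-squared-error loss. The actual loss, targets, and training loop used by the authors are not described here, so everything in this block is an assumption.

```python
def ensemble_mean(teacher_preds: list) -> list:
    """Average per-position predictions across frozen teacher models."""
    n_teachers = len(teacher_preds)
    return [sum(values) / n_teachers for values in zip(*teacher_preds)]


def mse(pred: list, target: list) -> float:
    """Mean squared error between a student track and the distillation target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Hypothetical track values at three positions from two teachers
# (the text describes an ensemble of 64).
teachers = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
]
target = ensemble_mean(teachers)     # what the student is trained to match
student = [2.0, 2.5, 1.5]            # hypothetical student output
loss = mse(student, target)
```

Averaging over the ensemble smooths out the idiosyncrasies of individual teachers, which is one common motivation for deploying a single distilled student instead of the ensemble itself.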

Applications

AlphaGenome's primary application is scoring the functional impact of non-coding genetic variants, which matters because the vast majority of GWAS hits fall outside protein-coding regions. Researchers can use it to prioritize candidate causal variants and understand their regulatory mechanisms. For example, the authors characterized the mechanisms of clinically relevant variants near the TAL1 oncogene. The model also supports gene regulation studies, helping researchers identify enhancers, characterize their tissue specificity, and trace effects on target gene expression. Additional applications include interpreting variants of uncertain significance in whole-genome sequencing studies, supporting drug target discovery by mapping the regulatory landscape around disease loci, and comparative genomics studies that leverage its joint human and mouse training.
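
Variant scoring with a sequence-to-function model generally follows one pattern: predict tracks for the reference sequence, predict again with the alternate allele substituted, and compare. The sketch below shows that pattern with a deliberately trivial stand-in predictor (GC fraction per bin). `toy_track`, `apply_snv`, and `variant_effect` are hypothetical names for this illustration and do not correspond to the AlphaGenome API.

```python
def toy_track(seq: str, bin_size: int = 4) -> list:
    """Stand-in predictor: GC fraction per bin. This is NOT AlphaGenome."""
    return [
        sum(base in "GC" for base in seq[i:i + bin_size]) / bin_size
        for i in range(0, len(seq) - bin_size + 1, bin_size)
    ]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single nucleotide after checking the reference allele."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref!r} at position {pos}, found {seq[pos]!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, ref: str, alt: str,
                   predict=toy_track) -> list:
    """Per-bin difference between alternate and reference predictions."""
    ref_track = predict(seq)
    alt_track = predict(apply_snv(seq, pos, ref, alt))
    return [a - r for r, a in zip(ref_track, alt_track)]


reference = "ATATGCGCATAT"
delta = variant_effect(reference, pos=1, ref="T", alt="G")
score = max(abs(d) for d in delta)   # one simple way to summarise the effect
```

Swapping `toy_track` for a real model's prediction function keeps the rest of the logic unchanged. How per-position differences are aggregated into a single score (maximum, sum, or a tissue-specific subset of tracks) is a separate design choice that depends on the assay being interpreted.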

Impact

AlphaGenome represents a significant step forward for regulatory genomics, unifying prediction of diverse functional genomic modalities under a single architecture and achieving state-of-the-art performance across a comprehensive benchmark suite. By making model weights available through Hugging Face and Kaggle and providing a free API for non-commercial research, Google DeepMind lowered the barrier for researchers without specialized hardware to access megabase-scale regulatory sequence modeling. Key limitations include the inability to capture regulatory elements beyond the 1 Mb input window, reduced reliability for rare or poorly characterized cell types not well represented in training data, and unsuitability for direct clinical interpretation of individual patient genomes without additional validation. The model is not designed to explain complex trait genetics in isolation, as such phenomena involve developmental timing, environmental interactions, and epistasis beyond cis-regulatory sequence.

Citation

Advancing regulatory variant effect prediction with AlphaGenome

Avsec, Z., Latysheva, N., Cheng, J., et al. (2026). Advancing regulatory variant effect prediction with AlphaGenome. Nature, 649, 1206-1218.

DOI: 10.1038/s41586-025-10014-0

Recent citations

Papers that recently cited this model.

In-depth exploration into the multifaceted regulatory mechanisms of carotenoid metabolism in microalgae.
Ming-Hua Liang, Chongping Li, Wei-Ping Zhang, et al.
Bioresource Technology · Sep 2026 · Citations: 0
An encyclopedia of human enhancer-gene regulatory interactions.
A. Gschwind, Kristy S. Mualim, Alireza Karbalayghareh, et al.
Nature · Jul 2026 · Citations: 0
Mendelian disorders of the epigenetic machinery: a decade of insights into the molecular basis.
Leandros Boukas, Hans T Bjornsson
Epigenomics · Jul 2026 · Citations: 0

Top citations

The most-cited papers that cite this model.

A haplotype-resolved view of human gene regulation
Mitchell R. Vollger, Elliott G. Swanson, Shane J. Neph, et al.
bioRxiv · Jun 2025 · Citations: 20
Pre-training genomic language model with variants for better modeling functional genomics
Tianyu Liu, Xiangyu Zhang, Jiecong Lin, et al.
NPJ artificial intelligence · Apr 2026 · Citations: 7
The Cell Ontology in the age of single-cell omics
Shawn Zheng Kai Tan, Aleix Puig-Barbe, Damien Goutte-Gattat, et al.
arXiv.org · Jun 2025 · Citations: 6
Efficient and accurate steering of Large Language Models through attention-guided feature learning
Parmida Davarmanesh, Ashia Wilson, Adityanarayanan Radhakrishnan
arXiv.org · Jan 2026 · Citations: 4
Navigating the promise and pitfalls of artificial intelligence.
Nature Microbiology · Mar 2026 · Citations: 2

Citations

Total Citations: 151

Influential: 17

References: 69

GitHub

Stars: 2K

Forks: 266

Open Issues: 2

Contributors: 14

Last Push: 9d ago

Language: Python

License: Apache-2.0

HuggingFace

Downloads: 0

Likes: 108

Last Modified: 5mo ago

Fields of citing research

Biology: 83%
Medicine: 72%
Computer Science: 70%
Environmental Science: 9%
Engineering: 5%
Agricultural and Food Sciences: 5%
Chemistry: 2%
Materials Science: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

49 · Partial

Usability (can I run it?): 68

Reproducibility (can I retrain it?): 18

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model · Documentation
