Google DeepMind model that predicts thousands of functional genomic tracks at single base-pair resolution from megabase-scale DNA sequences.
AlphaGenome is a unified deep learning model from Google DeepMind that takes up to 1 million base pairs of DNA sequence as input and predicts thousands of functional genomic tracks at single base-pair resolution. It is designed to decode the regulatory grammar of the non-coding genome — the roughly 98% of human DNA that does not encode proteins but critically controls when, where, and how much genes are expressed. By jointly predicting diverse molecular readouts including gene expression, splicing, chromatin accessibility, histone modifications, transcription factor binding, and 3D chromatin contact maps, AlphaGenome provides a comprehensive view of regulatory sequence function from a single model.
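Sequence-to-function models in this family typically receive DNA as a one-hot matrix, with one channel per base. The sketch below makes two assumptions common in Enformer-style models: an A/C/G/T channel order and all-zero rows for ambiguous `N` bases. AlphaGenome's exact input convention is not taken from its code here.

```python
# One-hot encoding of DNA for a sequence-to-function model.
# Assumption: A/C/G/T channel order and all-zero rows for N, as is common in
# Enformer-style models; not taken from AlphaGenome's own code.
BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot_encode(seq: str) -> list[list[float]]:
    """Return a len(seq) x 4 nested list; unknown bases (e.g. N) stay all zero."""
    encoded = []
    for base in seq.upper():
        row = [0.0] * len(BASES)
        idx = BASE_INDEX.get(base)
        if idx is not None:
            row[idx] = 1.0   # mark the channel for this base
        encoded.append(row)
    return encoded


for row in one_hot_encode("ACgtN"):
    print(row)
```

At full scale, a 1 Mb input would be a matrix with roughly one million rows and four columns.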
AlphaGenome builds on the lineage of sequence-to-function models pioneered by Basenji and Enformer, but substantially advances the field in three ways. It processes longer input sequences (1 Mb versus roughly 200 kb for Enformer). It predicts at finer resolution (down to single base pairs, versus 128 bp bins for Enformer). It also covers more output modalities. It complements AlphaMissense, which focuses on missense variants in protein-coding regions, by addressing the vast regulatory landscape where the majority of disease-associated genetic variants reside. The model was announced with a preprint in June 2025 and published in Nature in 2026.
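A quick back-of-the-envelope calculation makes the size difference concrete. The Enformer figures come from the Enformer paper: a 196,608 bp input and 896 output bins of 128 bp. Treating AlphaGenome's "1 Mb" as exactly 2**20 bp at 1 bp resolution is an assumption made for illustration.

```python
# Back-of-the-envelope comparison of input length and output resolution.
# Enformer: 196,608 bp input, 896 output bins of 128 bp (from the Enformer paper).
# AlphaGenome: "1 Mb" treated as exactly 2**20 bp at 1 bp resolution (assumption).
ENFORMER_INPUT_BP = 196_608
ENFORMER_BIN_BP = 128
ENFORMER_OUTPUT_BINS = 896

ALPHAGENOME_INPUT_BP = 2**20
ALPHAGENOME_BIN_BP = 1

context_ratio = ALPHAGENOME_INPUT_BP / ENFORMER_INPUT_BP
enformer_output_bp = ENFORMER_OUTPUT_BINS * ENFORMER_BIN_BP
alphagenome_positions = ALPHAGENOME_INPUT_BP // ALPHAGENOME_BIN_BP

print(f"context ratio: {context_ratio:.2f}x")                              # 5.33x
print(f"Enformer: {ENFORMER_OUTPUT_BINS} bins over {enformer_output_bp:,} bp")
print(f"AlphaGenome (assumed): {alphagenome_positions:,} positions per base-resolution track")
```

Under these assumptions the context is about 5.3 times longer. Each base-resolution track has over a thousand times more output positions than Enformer's 896 bins.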
AlphaGenome is a 450-million-parameter model implemented in JAX using a U-Net-style encoder-decoder architecture. Seven convolutional downsampling stages detect short sequence motifs, such as splice site motifs and transcription factor binding sites, and progressively compress the spatial dimension while increasing channel depth. A multi-head self-attention transformer core then operates on the compressed representations to model long-range dependencies across the full megabase window. A U-Net decoder with skip connections progressively restores base-pair resolution, combining fine-grained local information from the encoder with the long-range context from the transformer. Dedicated pairwise interaction blocks construct 2D representations for predicting Hi-C contact maps at 2,048 bp resolution.
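A toy version with no learned weights can illustrate how data flows through a U-Net with skip connections. This sketch simplifies heavily, and each simplification is an assumption:

- one scalar channel instead of growing channel depth
- max pooling by a factor of 2 in place of learned convolutions
- a global-mean "core" standing in for self-attention
- repetition for upsampling
- an outer product standing in for the pairwise blocks

None of these function names come from AlphaGenome's JAX code.

```python
# Toy, weight-free 1D U-Net showing the data flow only. Assumptions: one scalar
# channel, width-2 max pooling instead of learned convolutions, a global mean
# instead of self-attention, and an outer product instead of pairwise blocks.


def downsample(x: list[float]) -> list[float]:
    """Halve resolution with width-2 max pooling (stand-in for a conv stage)."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def upsample(x: list[float]) -> list[float]:
    """Double resolution by repeating each value."""
    return [v for v in x for _ in range(2)]


def global_core(x: list[float]) -> list[float]:
    """Stand-in for attention: each position is shifted by the global mean."""
    mean = sum(x) / len(x)
    return [v + mean for v in x]


def toy_unet(x: list[float], stages: int) -> tuple[list[float], list[float]]:
    """Return (base-resolution output, coarse core features)."""
    assert len(x) % 2**stages == 0, "length must be divisible by 2**stages"
    skips, h = [], x
    for _ in range(stages):
        skips.append(h)                  # save features at this resolution
        h = downsample(h)
    core = global_core(h)                # long-range mixing at coarse resolution
    h = core
    for skip in reversed(skips):
        h = [u + s for u, s in zip(upsample(h), skip)]  # skip connection
    return h, core


def pairwise_map(core: list[float]) -> list[list[float]]:
    """Outer product of coarse features: a stand-in for a 2D contact map."""
    return [[a * b for b in core] for a in core]


signal = [0.0] * 16
signal[5] = 1.0                          # one sharp "motif" at position 5
out, core = toy_unet(signal, stages=3)
print(len(core), len(out))               # 2 16
print(out[4], out[5], out[15])           # 3.5 4.5 0.5
print(len(pairwise_map(core)))           # 2

# Same arithmetic at assumed full scale: 2**20 bp, seven halvings, 2,048 bp bins.
print(2**20 // 2**7, 2**20 // 2048)      # 8192 512
```

Without the skip connections, positions 4 and 5 would receive identical values after upsampling. The skip path is what lets the decoder recover base-pair precision. Meanwhile, the core lets even distant position 15 pick up context from the motif.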
The model is trained on publicly available functional genomics data from ENCODE, GTEx, 4D Nucleome, and FANTOM5, covering hundreds of human and mouse cell types and tissues. The deployed 450M-parameter student model is distilled from an ensemble of 64 frozen teacher models over 250,000 training steps on H100 GPUs. Training a single model without distillation takes approximately 4 hours on 8 TPUv3 chips, roughly half the compute budget used to train the original Enformer model.
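Distilling an ensemble into one student can be sketched as regression onto the teachers' averaged predictions. This toy example makes three assumptions: a mean-squared-error loss on the ensemble mean, 64 hypothetical one-parameter "teachers", and plain gradient descent. The paper's actual distillation objective, augmentations, and schedule are not reproduced here.

```python
# Toy ensemble distillation: a student regresses onto the averaged predictions
# of frozen teachers. The MSE objective, the 64 one-parameter "teachers", and
# plain gradient descent are illustrative assumptions, not AlphaGenome's recipe.
import random

random.seed(0)
xs = [i / 10 for i in range(20)]                      # toy inputs


def ensemble_target(teacher_preds: list[list[float]]) -> list[float]:
    """Average frozen teachers' predictions position by position."""
    n = len(teacher_preds)
    return [sum(vals) / n for vals in zip(*teacher_preds)]


def mse(pred: list[float], target: list[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Frozen teachers: noisy slopes around 2.0; their outputs are computed once.
teacher_slopes = [2.0 + random.gauss(0.0, 0.3) for _ in range(64)]
teacher_preds = [[s * x for x in xs] for s in teacher_slopes]
target = ensemble_target(teacher_preds)

# Student: one slope parameter trained by gradient descent on the MSE.
w, lr = 0.0, 0.1
for _ in range(200):
    pred = [w * x for x in xs]
    grad = sum(2 * (p - t) * x for p, t, x in zip(pred, target, xs)) / len(xs)
    w -= lr * grad

mean_slope = sum(teacher_slopes) / len(teacher_slopes)
print(f"student slope {w:.4f}, teacher mean {mean_slope:.4f}")
print(f"final distillation loss {mse([w * x for x in xs], target):.2e}")
```

The teachers are never updated. Only the student's parameter moves, so it converges to the ensemble's average behaviour. That is how one model can inherit the robustness of many.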
AlphaGenome's primary application is scoring the functional impact of non-coding genetic variants. This matters because the vast majority of GWAS hits fall outside protein-coding regions. Researchers can use it to prioritize candidate causal variants and understand their regulatory mechanisms. For example, the authors used it to characterize the mechanisms of clinically relevant variants near the TAL1 oncogene, which is implicated in T-cell acute lymphoblastic leukemia. The model also supports gene regulation studies, helping researchers identify enhancers, characterize their tissue specificity, and trace effects on target gene expression. Additional applications include interpreting variants of uncertain significance in whole-genome sequencing studies, supporting drug target discovery by mapping the regulatory landscape around disease loci, and enabling comparative genomics studies that leverage its joint human and mouse training.
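Variant effect scoring with sequence-to-function models generally compares predictions for the reference and alternate alleles. In this sketch, `toy_predict` is a hypothetical stand-in for a real model: it simply marks occurrences of the motif GATA. The scorer, a windowed log2 fold change with a pseudocount, is one common choice. It is assumed here for illustration and is not taken from AlphaGenome's scorers or API.

```python
# Reference-vs-alternate variant scoring. `toy_predict` is a hypothetical
# stand-in for a sequence-to-function model; the windowed log2 fold change
# with a pseudocount is an assumed, commonly used scoring choice.
import math

MOTIF = "GATA"


def toy_predict(seq: str) -> list[float]:
    """Per-position 'signal': 1.0 where MOTIF starts, else 0.0."""
    return [1.0 if seq.startswith(MOTIF, i) else 0.0 for i in range(len(seq))]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute `alt` at 0-based `pos` after checking the reference base."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def score_variant(seq: str, pos: int, ref: str, alt: str,
                  window: int = 10, pseudocount: float = 1.0) -> float:
    """log2((alt + c) / (ref + c)) of summed signal within +/- window bp."""
    lo, hi = max(0, pos - window), pos + window + 1
    ref_sum = sum(toy_predict(seq)[lo:hi])
    alt_sum = sum(toy_predict(apply_snv(seq, pos, ref, alt))[lo:hi])
    return math.log2((alt_sum + pseudocount) / (ref_sum + pseudocount))


seq = "CCCCGATACCCCGATACCCC"
print(score_variant(seq, 5, "A", "C"))   # negative: one motif copy is destroyed
print(score_variant(seq, 0, "C", "G"))   # 0.0: no motif is created or lost
```

A negative score indicates predicted loss of signal and a positive score indicates gain. Ranking many candidate variants by the magnitude of such scores is the basic prioritization step described above.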
AlphaGenome represents a significant step forward for regulatory genomics. It unifies prediction of diverse functional genomic modalities under a single architecture. In the authors' evaluations, it achieves state-of-the-art performance on benchmarks for both genomic track prediction and variant effect prediction. Google DeepMind has made model weights available through Hugging Face and Kaggle and provides a free API for non-commercial research. Together, these lower the barrier to megabase-scale regulatory sequence modeling for researchers without specialized hardware.
Key limitations include:

- the inability to capture regulatory elements that lie outside the 1 Mb input window
- reduced reliability for rare or poorly characterized cell types that are not well represented in the training data
- unsuitability for direct clinical interpretation of individual patient genomes without additional validation

The model is also not designed to explain complex trait genetics in isolation. Such phenomena involve developmental timing, environmental interactions, and epistasis beyond cis-regulatory sequence.
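Whether a distal element can influence a prediction at all depends on whether it lies inside the input window. This sketch assumes a 2**20 bp window centred on a gene's TSS. The coordinates and helper names are hypothetical.

```python
# Does a distal element fall inside one input window centred on a gene's TSS?
# Assumption: a 2**20 bp window centred on the TSS; coordinates are hypothetical.
WINDOW_BP = 2**20


def centred_window(center: int, width: int = WINDOW_BP) -> tuple[int, int]:
    """Half-open [start, end) interval of `width` bp centred on `center`."""
    start = center - width // 2
    return start, start + width


def in_window(position: int, center: int, width: int = WINDOW_BP) -> bool:
    """True if `position` lies inside the window centred on `center`."""
    start, end = centred_window(center, width)
    return start <= position < end


tss = 5_000_000                                  # hypothetical TSS coordinate
print(in_window(tss + 400_000, tss))             # True: enhancer ~400 kb away
print(in_window(tss + 600_000, tss))             # False: beyond the window
```

Because the window here is centred, elements more than about 524 kb from the TSS fall outside it. An uncentred interval could reach further on one side, but never more than the window width in total.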
Avsec, Z., Latysheva, N., Cheng, J., et al. (2026). Advancing regulatory variant effect prediction with AlphaGenome. Nature, 649, 1206-1218.
DOI: 10.1038/s41586-025-10014-0