bio.rodeo
DNA & Gene

AlphaGenome

Google DeepMind

Google DeepMind model that predicts thousands of functional genomic tracks at single base-pair resolution from megabase-scale DNA sequences.

Released: 2025
Parameters: 450,000,000

Overview

AlphaGenome is a unified deep learning model from Google DeepMind that takes up to 1 million base pairs of DNA sequence as input and predicts thousands of functional genomic tracks at single base-pair resolution. It is designed to decode the regulatory grammar of the non-coding genome — the roughly 98% of human DNA that does not encode proteins but critically controls when, where, and how much genes are expressed. By jointly predicting diverse molecular readouts including gene expression, splicing, chromatin accessibility, histone modifications, transcription factor binding, and 3D chromatin contact maps, AlphaGenome provides a comprehensive view of regulatory sequence function from a single model.

AlphaGenome builds on the lineage of sequence-to-function models pioneered by Basenji and Enformer, and advances it by processing longer input sequences (1 Mb versus 200 kb for Enformer), predicting at finer resolution, and covering more output modalities. It complements AlphaMissense, which focuses on missense variants in protein-coding regions, by addressing the regulatory landscape where the majority of disease-associated genetic variants reside. The model was first announced in June 2025, and the peer-reviewed paper appears in Nature (2026).

Key Features

  • Megabase-scale context: Processes up to 1,000,000 base pairs in a single forward pass, capturing long-range regulatory interactions such as enhancer-promoter communication that shorter-context models miss.
  • Single base-pair resolution: Delivers predictions at individual nucleotide resolution for most output modalities, enabling precise variant effect scoring.
  • Comprehensive multimodal output: Jointly predicts 11 assay categories spanning 5,930 human tracks and 1,128 mouse tracks, including RNA-seq, CAGE, PRO-cap, splice junctions, DNase, ATAC, histone marks, transcription factor binding, and Hi-C contact maps.
  • State-of-the-art variant effect prediction: Matches or exceeds the strongest available external models in 24 of 26 variant effect prediction evaluations and outperforms best external models on 22 of 24 genome track prediction tasks.
  • Efficient inference: Scores a variant in under one second on a single NVIDIA H100 GPU, making genome-wide scans feasible.
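
To make the megabase-context feature concrete, here is a minimal sketch of how a fixed-width input window might be placed around a variant before scoring. The `Interval` type, the `window_around` helper, and the 1,000,000 bp width are illustrative assumptions and not part of the AlphaGenome API; the exact input lengths the model accepts are not stated here.

```python
from typing import NamedTuple

# Assumed window length: the text says "up to 1,000,000 base pairs"; the
# exact supported lengths are not given here, so this value is illustrative.
WINDOW_BP = 1_000_000


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def window_around(chrom: str, pos: int, chrom_len: int,
                  width: int = WINDOW_BP) -> Interval:
    """Return a fixed-width window containing `pos`, centered where possible.

    Near chromosome ends the window is shifted (not shrunk) so that it
    stays in bounds and keeps the full context length.
    """
    if not 0 <= pos < chrom_len:
        raise ValueError("variant position is outside the chromosome")
    if width > chrom_len:
        raise ValueError("window is longer than the chromosome")
    start = pos - width // 2
    # Clamp the start so that [start, start + width) lies inside the chromosome.
    start = max(0, min(start, chrom_len - width))
    return Interval(chrom, start, start + width)


# Hypothetical variant at chr1:5,000,000 (0-based) on a 248,956,422 bp chromosome.
print(window_around("chr1", 5_000_000, 248_956_422))
```

Shifting rather than truncating the window keeps the input length constant, which is usually what a fixed-context model expects. Regulatory elements farther from the variant than the window allows remain out of reach.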

Technical Details

AlphaGenome is a 450-million parameter model implemented in JAX using a U-Net-style encoder-decoder architecture. Seven convolutional downsampling stages detect short sequence motifs and progressively compress the spatial dimension while increasing channel depth, preserving fine-grained local patterns such as splice site motifs and transcription factor binding sites. A multi-head self-attention transformer core then operates on the compressed representations to model long-range dependencies across the full megabase window. A U-Net decoder with skip connections progressively restores base-pair resolution, combining local information from the encoder with the long-range context from the transformer. Dedicated pairwise interaction blocks construct 2D representations for predicting Hi-C contact maps at 2,048 bp resolution.
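
The resolution bookkeeping implied by this architecture can be sketched with a few lines of arithmetic. The sketch assumes that each of the seven downsampling stages halves the sequence length and that the input is 2^20 = 1,048,576 bp; neither detail is stated above, so treat both as illustrative. The function names are hypothetical.

```python
def encoder_lengths(input_bp: int, stages: int = 7, factor: int = 2) -> list[int]:
    """Sequence length after each downsampling stage (index 0 is the input)."""
    lengths = [input_bp]
    for _ in range(stages):
        if lengths[-1] % factor:
            raise ValueError("length is not divisible by the pooling factor")
        lengths.append(lengths[-1] // factor)
    return lengths


def contact_map_side(input_bp: int, bin_bp: int = 2048) -> int:
    """Number of bins along one side of a square contact map."""
    return input_bp // bin_bp


# Assumed power-of-two stand-in for the "1 Mb" input described in the text.
INPUT_BP = 2 ** 20

lengths = encoder_lengths(INPUT_BP)
bp_per_position = INPUT_BP // lengths[-1]  # bp summarized by one transformer position
side = contact_map_side(INPUT_BP)          # bins per side at 2,048 bp resolution

print(lengths)
print(bp_per_position, side)
```

Under these assumptions the transformer attends over 8,192 positions (128 bp each) instead of roughly a million, which is what makes self-attention across the full window tractable. The decoder's skip connections then recover the base-pair detail that pooling discarded, and the contact-map head works on a 512 × 512 grid.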

The model is trained on publicly available functional genomics data from ENCODE, GTEx, 4D Nucleome, and FANTOM5, covering hundreds of human and mouse cell types. The deployed 450M-parameter student model is distilled from an ensemble of 64 frozen teacher models over 250,000 training steps on H100 GPUs. Training a single model without distillation takes approximately 4 hours on 8 TPUv3 chips, roughly half the compute budget of Enformer.
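
As a toy illustration of ensemble distillation, the sketch below trains a "student" to reproduce the mean prediction of several frozen "teachers". This is one common formulation and only an assumption about the general idea; the actual AlphaGenome distillation procedure (loss, augmentations, how teachers are combined) is not described here. All names and numbers are hypothetical, and four teachers stand in for the 64 mentioned above.

```python
def ensemble_mean(teacher_preds: list[list[float]]) -> list[float]:
    """Average the teachers' predictions position by position."""
    n = len(teacher_preds)
    return [sum(column) / n for column in zip(*teacher_preds)]


def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equal-length tracks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def distill(target: list[float], steps: int = 100, lr: float = 0.5) -> list[float]:
    """Fit a trivial student (one free value per position) by gradient descent.

    The teachers stay frozen: only the student's values are updated.
    """
    student = [0.0] * len(target)
    n = len(target)
    for _ in range(steps):
        # Gradient of the MSE with respect to each student value.
        grads = [2.0 * (s - t) / n for s, t in zip(student, target)]
        student = [s - lr * g for s, g in zip(student, grads)]
    return student


# Made-up predictions from four teachers over three positions of one track.
teachers = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.1],
    [0.8, 2.2, 2.9],
    [1.0, 2.0, 3.0],
]
target = ensemble_mean(teachers)
student = distill(target)
print(target, mse(student, target))
```

A real student is a neural network that generalizes across sequences, not a lookup table. The point here is only that the training signal comes from the teachers' outputs, so a single deployed model can approximate the ensemble at a fraction of the inference cost.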

Applications

AlphaGenome's primary application is scoring the functional impact of non-coding genetic variants, which matters because the vast majority of GWAS hits fall outside protein-coding regions. Researchers can use it to prioritize candidate causal variants and understand their regulatory mechanisms. For example, the authors characterized the mechanisms of clinically relevant variants near the TAL1 oncogene. The model also supports gene regulation studies, helping researchers identify enhancers, characterize their tissue specificity, and trace effects on target gene expression. Additional applications include interpreting variants of uncertain significance in whole-genome sequencing studies, supporting drug target discovery by mapping the regulatory landscape around disease loci, and comparative genomics studies that draw on its joint human and mouse training.
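
Variant scoring with a sequence-to-function model generally follows a reference-versus-alternate pattern: predict a track for both alleles and summarize the difference. The sketch below shows that pattern with a toy stand-in predictor. `toy_predict`, `apply_snv`, and `variant_effect` are hypothetical names, and the log2 fold-change summary is one assumed choice of score, not necessarily the scoring used by AlphaGenome.

```python
import math
from typing import Callable, Sequence

# Any function mapping a DNA string to a per-base signal track.
Predictor = Callable[[str], Sequence[float]]


def apply_snv(seq: str, offset: int, ref: str, alt: str) -> str:
    """Return `seq` with the base at `offset` changed from `ref` to `alt`."""
    if seq[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    return seq[:offset] + alt + seq[offset + 1:]


def variant_effect(predict: Predictor, seq: str, offset: int,
                   ref: str, alt: str, pseudocount: float = 1.0) -> float:
    """log2 fold change of total predicted signal, alternate over reference."""
    ref_total = sum(predict(seq))
    alt_total = sum(predict(apply_snv(seq, offset, ref, alt)))
    return math.log2((alt_total + pseudocount) / (ref_total + pseudocount))


def toy_predict(seq: str) -> list[float]:
    """Stand-in model: signal of 1.0 wherever a TATA motif starts, else 0.0."""
    return [1.0 if seq.startswith("TATA", i) else 0.0 for i in range(len(seq))]


# A made-up SNV that disrupts the only TATA motif in a short sequence.
score = variant_effect(toy_predict, "GGTATAGG", offset=3, ref="A", alt="C")
print(score)
```

A negative score means the alternate allele reduces the predicted signal. In practice the same comparison would be repeated per track and per cell type, and candidate variants at a locus ranked by the magnitude of their effects.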

Impact

AlphaGenome represents a significant step forward for regulatory genomics, unifying prediction of diverse functional genomic modalities under a single architecture and achieving state-of-the-art performance across a comprehensive benchmark suite. By making model weights available through Hugging Face and Kaggle and providing a free API for non-commercial research, Google DeepMind lowered the barrier for researchers without specialized hardware to access megabase-scale regulatory sequence modeling. Key limitations include the inability to capture regulatory elements beyond the 1 Mb input window, reduced reliability for rare or poorly characterized cell types not well represented in training data, and unsuitability for direct clinical interpretation of individual patient genomes without additional validation. The model is not designed to explain complex trait genetics in isolation, as such phenomena involve developmental timing, environmental interactions, and epistasis beyond cis-regulatory sequence.

Citation

Advancing regulatory variant effect prediction with AlphaGenome

Avsec, Ž., Latysheva, N., Cheng, J., et al. (2026). Advancing regulatory variant effect prediction with AlphaGenome. Nature, 649, 1206-1218.

DOI: 10.1038/s41586-025-10014-0

Metrics

GitHub

Stars: 1.9K
Forks: 258
Open Issues: 1
Contributors: 12
Last Push: 2d ago
Language: Python
License: Apache-2.0

Citations

Total Citations: 49
Influential: 3
References: 69

Tags

gene expression, regulatory genomics, variant effect prediction, foundation model, chromatin, genomics, splicing

Resources

GitHub Repository, Research Paper, Official Website, HuggingFace Model, Documentation