AlphaMissense

Missense variant pathogenicity predictor built on AlphaFold 2 representations. It scores variants across the human proteome and reaches 0.940 AuROC on a ClinVar benchmark.

Released: September 2023

AlphaMissense is a computational model developed by Google DeepMind and published in Science in September 2023, designed to predict whether missense variants — single amino acid changes in a protein sequence — are likely to be pathogenic or benign. Missense variants are among the most common genetic differences between individuals, and determining which ones contribute to disease has been a longstanding challenge in clinical genetics and functional genomics. Most missense variants are classified as "variants of uncertain significance" (VUS) in clinical databases, creating diagnostic uncertainty for patients and physicians. AlphaMissense addresses this gap at proteome scale.

The model adapts the architecture of AlphaFold 2, the landmark protein structure prediction system, repurposing its learned representations of protein sequence and structure to score the functional consequences of amino acid substitutions. Rather than predicting a new 3D structure for each variant, AlphaMissense uses the structural context embedded in AlphaFold's representations, alongside evolutionary signals derived from multiple sequence alignments, to estimate the probability that a given substitution disrupts protein function. The key innovation is that structural and evolutionary information, jointly encoded by the same network architecture that AlphaFold uses, provides a rich substrate for variant effect prediction without requiring a new structure to be predicted for each variant at inference time.

AlphaMissense was released with a freely available precomputed database covering all 71 million possible missense variants across 19,233 canonical human proteins. Predictions are also integrated into the EMBL-EBI AlphaFold Protein Structure Database and are accessible through the Ensembl Variant Effect Predictor (VEP), enabling direct use in clinical genomics workflows.
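A minimal sketch of looking up one variant in the precomputed table, assuming a tab-separated layout with `uniprot_id`, `protein_variant`, `am_pathogenicity` and `am_class` columns. The column names, the sample rows and the `load_scores` helper are illustrative assumptions, not taken from the text above; check the header of the downloaded file before relying on them.

```python
import csv
import io

# Hypothetical excerpt: accessions, variants and scores are invented
# for illustration and do not come from the real database.
SAMPLE_TSV = (
    "uniprot_id\tprotein_variant\tam_pathogenicity\tam_class\n"
    "PROT_A\tM1A\t0.12\tlikely_benign\n"
    "PROT_A\tG12D\t0.97\tlikely_pathogenic\n"
    "PROT_B\tR5K\t0.45\tambiguous\n"
)

def load_scores(handle):
    """Index rows by (protein accession, variant) for dictionary lookup."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["uniprot_id"], row["protein_variant"]): (
            float(row["am_pathogenicity"]),
            row["am_class"],
        )
        for row in reader
    }

scores = load_scores(io.StringIO(SAMPLE_TSV))
print(scores[("PROT_A", "G12D")])
```

For a compressed download, the same function could be given a handle from `gzip.open(path, "rt")`. A dictionary holding all 71 million rows may not fit comfortably in memory, so filtering by protein while streaming is likely the safer choice for the full table.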

Key Features

Proteome-wide coverage: Provides pathogenicity scores for all 71 million possible missense variants in the canonical human proteome, with 89% classified as either likely benign or likely pathogenic at 90% precision cutoffs.
AlphaFold-derived structural context: Leverages the Evoformer representations of AlphaFold 2 — which encode both evolutionary co-variation and inferred structural geometry — to model how a substitution perturbs the local and global protein environment.
Two-stage training: Pretrained on protein structure prediction using essentially the same procedure as AlphaFold 2, then fine-tuned on population frequency data from human and primate variant databases, treating commonly observed variants as benign signals and variants absent from those databases as putative pathogenic signals.
Continuous pathogenicity scores: Outputs a score between 0 and 1 calibrated to approximate the probability of clinical pathogenicity, enabling flexible threshold selection for different sensitivity-specificity trade-offs.
Database integration: Predictions are directly embedded in the AlphaFold Protein Structure Database and the Ensembl VEP pipeline, requiring no additional infrastructure for standard genomic analysis workflows.
Agnostic to experimental labels: The model achieves competitive performance on held-out DMS (deep mutational scanning) assays and ClinVar benchmarks without being directly trained on those data, demonstrating generalization of its underlying representations.
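The continuous score can be turned into the three classes mentioned above (likely benign, ambiguous, likely pathogenic) with two cutoffs. The sketch below uses 0.34 and 0.564 as defaults. These values are an assumption here, since this page does not state them, and they should be checked against the release documentation or replaced with cutoffs tuned to the desired sensitivity-specificity trade-off.

```python
def classify(score, benign_below=0.34, pathogenic_above=0.564):
    """Map a score in [0, 1] to a three-way class label.

    The default cutoffs are assumptions for illustration, not values
    stated in the surrounding text.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < benign_below:
        return "likely_benign"
    if score > pathogenic_above:
        return "likely_pathogenic"
    return "ambiguous"

for s in (0.05, 0.45, 0.91):
    print(s, classify(s))
```

Keeping the cutoffs as parameters makes it easy to trade coverage for precision, for example by widening the ambiguous band when false positives are costly.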

Technical Details

AlphaMissense closely follows the AlphaFold 2 architecture, inheriting its 93-million-parameter network built around the Evoformer stack and Structure Module. During the first training stage, the model is optimized for protein structure prediction in the same way as AlphaFold 2, except that the masked MSA reconstruction loss is given more weight, which strengthens the network's sensitivity to individual sequence positions. In the second stage, the model is fine-tuned on a variant pathogenicity objective derived from human and primate population frequency databases, following the PrimateAI approach: variants frequently observed in human or primate populations are used as benign examples, while variants absent from population data are treated as putative pathogenic examples. The structure prediction and variant classification objectives are optimized jointly during fine-tuning, which preserves the structural representations while adding discriminative power for functional annotation.
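The weak-labelling idea behind the second stage can be sketched as follows. The `weak_label` function, the frequency threshold and the toy records are hypothetical; the actual data sources, filters and sampling scheme are not specified on this page.

```python
from typing import Optional

def weak_label(allele_frequency: Optional[float],
               common_threshold: float = 1e-5) -> Optional[int]:
    """Return 0 (benign proxy), 1 (pathogenic proxy) or None (unused).

    allele_frequency is None when the variant was never observed in the
    population data. common_threshold is an illustrative value only.
    """
    if allele_frequency is None:
        return 1      # absent from population data: putative pathogenic
    if allele_frequency >= common_threshold:
        return 0      # observed often enough: benign proxy
    return None       # observed but rare: left unlabelled in this sketch

# Toy records: variant name -> observed allele frequency (None = unobserved).
toy_variants = {"A10V": 0.02, "R55W": None, "G7S": 1e-7}
labels = {name: weak_label(freq) for name, freq in toy_variants.items()}
print(labels)
```

Labels built this way are expected to be noisy, because a variant can be missing from population data simply by chance rather than because it is harmful.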

At inference, the model takes a protein sequence and its MSA as input and outputs a continuous pathogenicity score for a queried amino acid substitution without re-computing a full 3D structure. On the ClinVar benchmark of 18,924 variants (9,462 pathogenic and 9,462 benign from 999 proteins), AlphaMissense achieves an area under the receiver operating characteristic curve (AuROC) of 0.940, outperforming prior methods including EVE (0.911) and CADD. At the 90% precision cutoff, 32% of all possible human missense variants (22.8 million) are classified as likely pathogenic and 57% (40.9 million) as likely benign, leaving approximately 11% in an ambiguous intermediate range.
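For readers unfamiliar with the metric, AuROC equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one, with ties counted as one half. The sketch below computes it by direct pairwise comparison on toy data. The scores and labels are invented, and the function is not the evaluation code behind the figures above.

```python
def auroc(scores, labels):
    """AuROC via pairwise comparison (normalized Mann-Whitney U statistic).

    labels: 1 = pathogenic, 0 = benign. Runs in quadratic time, so it is
    suitable only for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented toy benchmark, balanced like the ClinVar set described above.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

On a balanced benchmark such as the one described above, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.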

Applications

AlphaMissense is directly applicable to clinical variant interpretation, where it can prioritize VUS candidates from exome and genome sequencing for follow-up functional studies or support diagnostic decision-making for rare disease patients. Researchers in functional genomics use the precomputed database as a fast prior for designing mutagenesis screens, for instance by targeting the subset of variants predicted pathogenic and checking AlphaMissense scores against deep mutational scanning or cell-based assays. Drug developers use the pathogenicity landscape to identify constitutively activating or loss-of-function mutations in disease-relevant proteins, informing target validation and understanding of disease-associated alleles. Population geneticists can overlay AlphaMissense scores onto genome-wide association study (GWAS) signals or linkage disequilibrium blocks to annotate likely functional variants within associated loci. Integration with the Ensembl VEP means that AlphaMissense predictions are immediately available as an annotation layer in the standard genomic analysis pipelines used by clinical and academic sequencing centers worldwide.
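As an illustration of the prioritization use case, the sketch below ranks a handful of VUS by score and keeps those above a chosen cutoff. The variant names, the scores and the 0.564 cutoff are hypothetical placeholders; a real workflow would take scores from the precomputed database or from VEP output.

```python
def prioritize(vus_scores, cutoff=0.564, top_n=None):
    """Return (variant, score) pairs above cutoff, highest score first.

    cutoff is an assumed example value, not one prescribed by the text.
    """
    hits = [(v, s) for v, s in vus_scores.items() if s > cutoff]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits if top_n is None else hits[:top_n]

# Invented example variants in a "gene:protein change" notation.
vus_scores = {
    "GENE1:p.R100W": 0.93,
    "GENE1:p.A20V": 0.08,
    "GENE2:p.G45D": 0.71,
    "GENE3:p.L9F": 0.50,
}
for variant, score in prioritize(vus_scores):
    print(f"{variant}\t{score:.2f}")
```

A ranking like this is a way to order candidates for follow-up, not a substitute for functional or clinical evidence.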

Impact

AlphaMissense's publication in Science attracted immediate attention from both the clinical genetics and computational biology communities, as it offered a principled, proteome-wide approach to the VUS problem that had resisted resolution for decades. The model's transfer of structural learning to variant effect prediction suggested that the AlphaFold representations encode functional information beyond what is needed for structure alone, opening a conceptual direction for adapting foundation models to downstream annotation tasks. The free availability of the precomputed database, with no per-query computational cost, lowers barriers for clinical laboratories that lack access to GPU infrastructure. The model has limitations: it does not predict the structural consequences of individual substitutions (it generates no new 3D coordinates per variant), and its training signal from population frequency data can conflate pathogenicity with selection effects unrelated to clinical disease. Performance is also reduced for understudied protein families with sparse MSAs, where the evolutionary representation is less informative. Nonetheless, AlphaMissense is widely cited as one of the most useful general-purpose tools for proteome-scale missense annotation, and its integration into community databases has accelerated adoption across both academic and clinical genomics.

Citation

Accurate proteome-wide missense variant effect prediction with AlphaMissense

Cheng, J., et al. (2023) Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science.

DOI: 10.1126/science.adg7492

Recent citations

Papers that recently cited this model.

Food-derived phenolic compounds in precision nutrition: computational and AI-assisted approaches for target identification and health intervention
Yu Li, Yongli Wang, Pranesha Prabhakaran, et al.
Food Research International · Oct 2026
0
A WHRN mutation impacts ocular morphology in rhesus macaques
Ana Ripolles-Garcia, A. Raposo, Sophie M. Le, et al.
Frontiers in Cell and Developmental Biology · Jul 2026
0
Germline whole-exome sequencing identifies CTNND1 as a candidate gene for hereditary gastric cancer in a large Brazilian cohort.
Deivid Calebe de Souza, Thaliane Buranello, Dario Tenorio Tavares Neto, et al.
Gastric Cancer · Jul 2026
0

Top citations

The most-cited papers that cite this model.

The Reactome Pathway Knowledgebase 2024
Marija Milacic, Deidre Beavers, P. Conley, et al.
Nucleic Acids Research · Nov 2023
1K
Ensembl 2025
Sarah Dyer, Olanrewaju Austine-Orimoloye, A. G. Azov, et al.
Nucleic Acids Research · Dec 2024
397
AlphaFold Protein Structure Database and 3D-Beacons: New Data and Capabilities.
Jennifer R Fleming, Paulyna Magana, S. Nair, et al.
Journal of Molecular Biology · Jan 2025
166
A guide to artificial intelligence for cancer researchers
R. Pérez-López, N. Ghaffari Laleh, Faisal Mahmood, et al.
Nature Reviews. Cancer · May 2024
163
Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering
Jason Yang, Francesca-Zhoufan Li, Frances H. Arnold
ACS Central Science · Feb 2024
163

Citations

Total Citations: 1.7K

Influential: 241

References: 97

GitHub

Stars: 635

Forks: 86

Open Issues: 1

Contributors: 5

Last Push: 2y ago

Language: Python

License: Apache-2.0

Fields of citing research

Medicine: 27%
Biology: 19%
Computer Science: 11%
Environmental Science: 1%
Chemistry: 1%
Engineering: 1%
Materials Science: 0%
Agricultural and Food Sciences: 0%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Overall score: 44 (Partial)

Usability (can I run it?): 67

Reproducibility (can I retrain it?): 25

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository · Research Paper · Official Website · Dataset
