INTERACT

Deep learning model predicting DNA methylation regulatory variants at CpG sites in the human brain, fine-mapping psychiatric disorder risk loci.

Released: August 2022

INTERACT (Integrative Neural Network for Epigenetic Regulatory Analysis of Chromatin and Transcription) is a deep learning model that predicts the effect of genetic variation on DNA methylation levels at individual CpG sites in the human brain. Developed by Jiyun Zhou, Qiang Chen, Patricia R. Braun, and colleagues primarily at the Lieber Institute for Brain Development at the Johns Hopkins Medical Campus, it was published in Proceedings of the National Academy of Sciences (PNAS) in August 2022. The model addresses a fundamental challenge in human neuroscience genetics: identifying which DNA sequence variants functionally regulate the epigenome, and how these regulatory effects contribute to risk for psychiatric disorders such as schizophrenia.

DNA methylation in the brain is highly dynamic and cell-type-specific, and it is influenced by genetic variation at both nearby and distal sequence positions. Methylation quantitative trait loci (mQTLs), genetic variants that are statistically associated with changes in methylation at specific CpG sites, provide empirical evidence for such regulatory effects. However, mQTL studies are limited by sample size, by linkage disequilibrium (which makes it hard to separate a causal variant from its correlated neighbors), and by the restricted coverage of CpG sites on standard arrays. INTERACT addresses these limitations in two steps: it first learns to predict CpG methylation levels from DNA sequence using a hybrid CNN-transformer architecture trained on hippocampal whole-genome bisulfite sequencing, and the trained model is then used to score the predicted impact of any sequence variant on methylation at its surrounding CpG sites. This yields a quantitative "methylation regulatory variant" (MRV) score that can be computed for any variant in the genome without relying on statistical association data.

A key application demonstrated in the PNAS paper is the use of INTERACT-derived MRV scores for fine-mapping schizophrenia GWAS risk loci. By overlaying predicted regulatory effects on genetic association signals, the researchers prioritized candidate causal variants and identified potential novel risk genes for schizophrenia, demonstrating the utility of epigenomic deep learning for translational neurogenetics. An extended version of the model, Cell-type-INTERACT, was subsequently developed to capture cell-type-specific methylation regulatory effects in the brain and is available in a separate GitHub repository.

Key Features

Hybrid CNN-transformer architecture: INTERACT integrates a convolutional neural network module — which captures local DNA sequence motifs such as transcription factor binding sites and CpG context signals — with a transformer module that uses self-attention to capture long-range sequence interactions. Fully connected layers then predict CpG methylation levels from the combined representations.
2 kbp input window centered on CpG sites: The model accepts 2-kilobase DNA sequence windows centered on the CpG site of interest, providing sufficient context to capture proximal regulatory elements including nearby transcription factor binding motifs and co-methylated regions, while remaining computationally efficient.
Variant effect scoring by in silico mutagenesis: Regulatory variant scores are computed by comparing INTERACT's methylation predictions for the reference allele and alternate allele at each variant position, generating a continuous score for the predicted methylation-altering effect of any nucleotide change without requiring matched genotype and methylation data.
Brain-specific training: The model was pre-trained on approximately 26 million CpG sites from whole-genome bisulfite sequencing (WGBS) of hippocampal tissue and fine-tuned on EPIC array methylation data from brain, blood, saliva, and buccal tissue across 21 donors, so that its predictions reflect the tissue-specific epigenomic context of the brain.
Transcription factor motif discovery: Post-hoc analysis of the model's convolutional filters revealed 37 transcription factors whose DNA binding motifs match INTERACT's learned sequence features in a brain-specific manner, including ZIC2, NR2F1, PPARD, and RXRG — factors with known roles in neurodevelopment and neuropsychiatric disorders.
GWAS fine-mapping integration: Predicted MRV scores can be directly overlaid on GWAS summary statistics to perform epigenomic fine-mapping of risk loci. Applied to schizophrenia, this approach identified 124 risk loci where INTERACT predictions prioritized specific candidate regulatory variants and novel target genes.
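
To make the 2 kbp input window concrete, here is a minimal sketch of how a window centered on a CpG site could be cut from a chromosome sequence and one-hot encoded. The function names, the 'N' padding at chromosome ends, and the all-zero encoding of unknown bases are assumptions made for illustration; they are not taken from the INTERACT repository.

```python
# Illustrative sketch only: helper names and padding rules are assumptions.
WINDOW = 2000      # 2 kbp window, as described in the feature list
ALPHABET = "ACGT"  # channel order used for the one-hot rows


def extract_window(chrom_seq: str, cpg_pos: int, window: int = WINDOW) -> str:
    """Return `window` bases centered on the C of a CpG at 0-based `cpg_pos`.

    Positions that fall off either end of the sequence are padded with 'N'.
    """
    half = window // 2
    start, end = cpg_pos - half, cpg_pos + half
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(chrom_seq))
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    return left_pad + core + right_pad


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as rows of [A, C, G, T]; unknown bases are all zeros."""
    return [[int(base == letter) for letter in ALPHABET] for base in seq.upper()]


if __name__ == "__main__":
    toy_chrom = "A" * 10 + "CG" + "T" * 10   # toy sequence with one CpG at index 10
    win = extract_window(toy_chrom, cpg_pos=10)
    print(len(win), win[1000:1002])          # window length and the centered CpG
    print(one_hot("ACGTN"))
```

With this convention the cytosine of the CpG always sits at index `window // 2`, so a downstream model sees the target site at a fixed position.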

Technical Details

INTERACT's architecture consists of three sequential modules. The first module is a CNN encoder consisting of multiple convolutional layers with increasing filter sizes, designed to hierarchically extract local sequence features from the 2 kbp input window. The second module is a transformer encoder with multi-head self-attention, which processes the CNN output sequence to capture inter-element dependencies across the local regulatory landscape. The third module consists of fully connected layers that map the combined representation to predicted CpG methylation beta values (continuous values between 0 and 1).
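
The three-module layout can be sketched as a toy NumPy forward pass. This is not the published implementation: the number of layers, kernel widths (8 then 16), channel counts, pooling sizes, the use of a single attention head instead of multi-head attention, and the random untrained weights are all assumptions chosen to keep the example short.

```python
import numpy as np


def conv1d_relu(x, kernels):
    """'Valid' 1-D convolution plus ReLU. x: (length, in_ch); kernels: (width, in_ch, out_ch)."""
    width = kernels.shape[0]
    # windows has shape (length - width + 1, in_ch, width)
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=0)
    return np.maximum(np.einsum("lcw,wco->lo", windows, kernels), 0.0)


def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis (drops any remainder)."""
    usable = (x.shape[0] // size) * size
    return x[:usable].reshape(-1, size, x.shape[1]).max(axis=1)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return x + weights @ v


def init_params(rng, channels=(16, 32), scale=0.1):
    """Random, untrained weights; shapes are illustrative assumptions."""
    c1, c2 = channels
    return {
        "conv1": rng.normal(0, scale, (8, 4, c1)),     # narrower first kernel
        "conv2": rng.normal(0, scale, (16, c1, c2)),   # wider second kernel
        "wq": rng.normal(0, scale, (c2, c2)),
        "wk": rng.normal(0, scale, (c2, c2)),
        "wv": rng.normal(0, scale, (c2, c2)),
        "dense_w": rng.normal(0, scale, c2),
        "dense_b": 0.0,
    }


def forward(one_hot_seq, params):
    """CNN encoder -> self-attention -> dense layer with sigmoid output in (0, 1)."""
    h = max_pool(conv1d_relu(one_hot_seq, params["conv1"]), 4)
    h = max_pool(conv1d_relu(h, params["conv2"]), 4)
    h = self_attention(h, params["wq"], params["wk"], params["wv"])
    logit = h.mean(axis=0) @ params["dense_w"] + params["dense_b"]
    return 1.0 / (1.0 + np.exp(-logit))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    x = np.eye(4)[rng.integers(0, 4, size=2000)]  # random one-hot 2 kbp window
    print(float(forward(x, params)))              # a value strictly between 0 and 1
```

The sigmoid at the end is one simple way to keep the output in the range of a methylation beta value; the convolution and pooling steps shorten the 2,000-position input before attention is applied, which keeps the attention matrix small.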

Training proceeded in two phases. Pre-training used WGBS data from hippocampal tissue covering approximately 26 million CpG sites, training the model to predict measured methylation levels from local sequence alone. Fine-tuning used EPIC array methylation data from 21 subjects across four tissue types (brain, blood, saliva, buccal), which provided training signal at the ~850,000 CpG positions covered by the EPIC array.

After training, variant effect scores were generated for all common genetic variants across the genome by running the model twice per variant (once with the reference sequence and once with the alternate allele substituted) and taking the absolute difference in predicted methylation.

These scores were validated against experimental mQTL evidence: high-scoring INTERACT variants were strongly enriched among known mQTL hits in brain tissue. In polygenic risk score analyses for schizophrenia, incorporating INTERACT-derived MRV scores as annotations significantly improved risk prediction, particularly in non-European ancestry samples.
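
The reference-versus-alternate scoring step described above amounts to score = |f(reference window) - f(alternate window)|, where f is the trained predictor. A minimal sketch follows, restricted to single-nucleotide substitutions. The function names are hypothetical, and `toy_predict` is a placeholder (it simply returns the GC fraction of the window) standing in for a trained model so that the example runs on its own.

```python
from typing import Callable


def substitute_allele(window: str, offset: int, ref: str, alt: str) -> str:
    """Return `window` with the single base at `offset` changed from `ref` to `alt`."""
    if window[offset] != ref:
        raise ValueError(
            f"expected {ref!r} at offset {offset}, found {window[offset]!r}"
        )
    return window[:offset] + alt + window[offset + 1:]


def variant_effect(
    predict: Callable[[str], float], window: str, offset: int, ref: str, alt: str
) -> float:
    """Absolute difference between predictions for the ref and alt windows."""
    alt_window = substitute_allele(window, offset, ref, alt)
    return abs(predict(window) - predict(alt_window))


def toy_predict(window: str) -> float:
    """Placeholder for a trained model: GC fraction used as a fake 'beta value'."""
    return sum(base in "GC" for base in window) / len(window)


if __name__ == "__main__":
    window = "ACGTACGTAC"  # toy 10-base window; a real window would be 2 kbp
    print(variant_effect(toy_predict, window, offset=0, ref="A", alt="G"))
    print(variant_effect(toy_predict, window, offset=0, ref="A", alt="T"))
```

Checking that the window actually carries the reference allele before substituting guards against coordinate or strand mix-ups, which would otherwise produce a silently wrong score.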

Applications

INTERACT's primary application domain is psychiatric and neurological genetics, where researchers need to identify which non-coding GWAS risk variants functionally regulate gene expression or chromatin state in brain tissue. The model helps researchers prioritize variants for experimental validation among the many candidates in the linkage disequilibrium (LD) blocks that contain GWAS signals. It is also applicable to studies of neurodevelopmental conditions, Alzheimer's disease, bipolar disorder, and other brain-related traits for which genome-wide association studies have identified non-coding risk loci. Beyond psychiatric genetics, INTERACT's predicted MRV scores provide a resource for understanding the regulatory grammar of DNA methylation in the brain, identifying which sequence motifs and transcription factor binding sites most strongly influence local CpG methylation. In principle, the approach generalizes to any tissue for which WGBS methylation data are available for training.
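
One simple way to use MRV scores for prioritization within an LD block is sketched below: keep the variants in high LD with the lead GWAS SNP, then rank them by predicted methylation effect. The r² threshold of 0.8, the `Variant` fields, the identifiers, and the numbers are all made up for illustration; the paper's actual fine-mapping procedure may differ.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str            # made-up identifier, not a real rsID
    r2_with_lead: float  # LD (r^2) with the lead GWAS SNP
    mrv_score: float     # predicted methylation effect, e.g. |ref - alt|


def prioritize(variants, min_r2=0.8, top_n=3):
    """Keep variants in high LD with the lead SNP, then rank by MRV score (descending)."""
    in_block = [v for v in variants if v.r2_with_lead >= min_r2]
    return sorted(in_block, key=lambda v: v.mrv_score, reverse=True)[:top_n]


if __name__ == "__main__":
    # Fictional values, for illustration only.
    candidates = [
        Variant("var_a", 1.00, 0.02),  # lead SNP itself, small predicted effect
        Variant("var_b", 0.90, 0.15),
        Variant("var_c", 0.30, 0.40),  # large effect but outside the LD block
        Variant("var_d", 0.85, 0.07),
    ]
    for v in prioritize(candidates, top_n=2):
        print(v.name, v.mrv_score)
```

In this toy block the lead SNP is not the top candidate: a linked variant with a larger predicted effect on methylation ranks first, which is the kind of reordering that sequence-based scores are meant to suggest for experimental follow-up.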

Impact

INTERACT demonstrated that deep learning models trained on epigenomic data could generate quantitative regulatory variant scores that both validate against experimental mQTL evidence and improve the resolution of GWAS fine-mapping for psychiatric traits. By explicitly integrating sequence-based predictions with genetic association data, the work established a template for epigenomic deep learning to contribute to causal variant identification — a longstanding bottleneck in human genetics. The PNAS publication attracted attention from psychiatric genetics and functional genomics communities, and the follow-on Cell-type-INTERACT model extended the approach to cell-type-specific methylation prediction. A key limitation is that INTERACT, like all sequence-to-methylation models, predicts the genetically regulated component of methylation variation and cannot account for environmental or stochastic contributions. The model's dependence on brain tissue training data also limits its performance in non-brain tissues without retraining, though the multi-tissue fine-tuning strategy partially addresses this.

Citation

Zhou, J., et al. (2022) Deep learning predicts DNA methylation regulatory variants in the human brain and elucidates the genetics of psychiatric disorders. Proceedings of the National Academy of Sciences of the United States of America.

DOI: 10.1073/pnas.2206069119

Recent citations

Papers that recently cited this model.

DeepMethylation: A deep learning framework for tissue-specific DNA methylation prediction and functional variant annotation
Wenran Li, Shijia Yu, Yingyu Cheng, et al.
PLoS Computational Biology · Jul 2026 · 0 citations

Artificial intelligence across the aging continuum: mechanistic geroscience, therapeutic innovation, and clinical impact.
Hongbo Li, P. O. Abhulimen, Qiuliyang Yu, et al.
Ageing Research Reviews · Jun 2026 · 0 citations

Large-scale multi-omic biosequence transformers for modeling protein–nucleic acid interactions
Sully F Chen, Robert J. Steele, Glen M. Hocky, et al.
PLoS ONE · Feb 2026 · 1 citation

Top citations

The most-cited papers that cite this model.

Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review
Sanghyuk Roy Choi, Minhyeok Lee
Biology · Jul 2023 · 193 citations

Integrating Machine Learning with Multi-Omics Technologies in Geroscience: Towards Personalized Medicine
Nikolaos Theodorakis, G. Feretzakis, L. Tzelves, et al.
Journal of Personalized Medicine · Aug 2024 · 36 citations

Deep Learning Methods for Omics Data Imputation
Lei Huang, Meng Song, Hui Shen, et al.
Biology · Oct 2023 · 33 citations

Application of deep learning in cancer epigenetics through DNA methylation analysis
Maryam Yassi, Aniruddha Chatterjee, Matthew Parry
Briefings Bioinform. · Sep 2023 · 23 citations

From tradition to innovation: conventional and deep learning frameworks in genome annotation
Zhao Chen, N. Ain, Qian Zhao, et al.
Briefings Bioinform. · Mar 2024 · 21 citations

Citations

Total Citations: 25
Influential: 2
References: 72

GitHub

Stars: 11
Forks: 5
Open Issues: 3
Contributors: 1
Last Push: 3y ago
Language: Python

Fields of citing research

Biology: 88%
Medicine: 84%
Computer Science: 80%
Psychology: 8%
Agricultural and Food Sciences: 4%
Environmental Science: 4%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Overall: 9 · Closed
Usability (can I run it?): 11
Reproducibility (can I retrain it?): 9

Model Openness Framework: Unclassified · Restrictive license on core components

Resources

GitHub Repository · Research Paper
