MethylProphet

Transformer that infers whole-genome DNA methylation from gene expression, generalizing zero-shot to unmeasured CpG sites and unseen samples.

Released: February 2026

DNA methylation (DNAm) is a central epigenetic mark: the addition of methyl groups at CpG dinucleotides helps establish and maintain gene-expression programs, cell identity, and disease states. Measuring it genome-wide — by whole-genome bisulfite sequencing or large arrays — is informative but expensive, and many datasets capture only a subset of CpG sites or only some samples. Gene expression, by contrast, is measured ubiquitously. This raises a natural question: how much of the methylation landscape can be inferred from expression alone?

MethylProphet, developed by Huang and colleagues at Columbia University (preprint first posted February 2025, updated February 2026 on bioRxiv), is a transformer-based model that predicts whole-genome DNA methylation from gene-expression input. Framed as a generalized, gene-contextual model, it learns relationships between expression and methylation that let it impute methylation at genomic positions that were not directly measured and generalize to biological samples it did not see during training.

By learning a shared expression-to-methylation mapping rather than memorizing per-site behavior, MethylProphet aims to fill in unmeasured CpGs and extend methylation profiling to samples where only expression is available — a potentially large practical saving for studies that already generate transcriptomic data.

Key Features

Expression-to-methylation inference: Predicts genome-wide DNAm using gene-expression input, exploiting the regulatory link between transcription and methylation.
Gene-contextual modeling: A generalized, gene-aware formulation lets the model reason about methylation in the context of nearby genes rather than treating CpGs in isolation.
Zero-shot to unmeasured sites: Infers methylation at CpG positions not directly assayed, effectively densifying sparse measurements.
Generalization to unseen samples: Transfers to biological samples outside the training set, supporting broad applicability across tissues and conditions.
Trained at large scale: Learned from extensive ENCODE and TCGA datasets spanning many samples and CpG sites.
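
The two zero-shot claims above (unmeasured sites and unseen samples) correspond to different ways of withholding data. The following is a minimal sketch of such a split, not the authors' evaluation protocol: the function name `zero_shot_split`, the 20% hold-out fraction, and the matrix dimensions are all illustrative assumptions.

```python
import numpy as np

def zero_shot_split(n_cpgs, n_samples, holdout_frac=0.2, seed=0):
    """Return boolean masks over a (n_cpgs, n_samples) methylation matrix.

    Hypothetical helper: 'train' covers seen CpGs in seen samples; the other
    masks cover entries whose CpG, sample, or both were withheld entirely.
    """
    rng = np.random.default_rng(seed)
    held_cpg = rng.random(n_cpgs) < holdout_frac        # whole rows withheld
    held_sample = rng.random(n_samples) < holdout_frac  # whole columns withheld
    cpg = held_cpg[:, None]      # shape (n_cpgs, 1), broadcasts across samples
    smp = held_sample[None, :]   # shape (1, n_samples), broadcasts across CpGs
    return {
        "train": ~cpg & ~smp,
        "unseen_cpg": cpg & ~smp,
        "unseen_sample": ~cpg & smp,
        "unseen_both": cpg & smp,
    }

masks = zero_shot_split(n_cpgs=1000, n_samples=50)
for name, mask in masks.items():
    print(name, int(mask.sum()))
```

Withholding entire rows and columns, rather than random individual entries, is what makes the test zero-shot: a model scored on `unseen_cpg` has never seen any measurement at those positions.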

Technical Details

MethylProphet is a transformer trained on two large public resources, ENCODE and TCGA, to map gene expression to DNA methylation across the genome. The authors describe training on a very large collection of CpG-by-sample pairs (on the order of 1.6 billion), giving the model broad coverage of expression–methylation relationships. Its gene-contextual design allows it to infer methylation at unmeasured CpG sites and to generalize zero-shot to previously unseen samples. Because the work is still a preprint (v2, February 2026), exact architectural details such as parameter count and context length, along with code and trained weights, are not yet publicly available; reported capabilities therefore await the full release and independent benchmarking.
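
To make "CpG-by-sample pairs" concrete, the sketch below flattens a small methylation matrix into one record per measured entry. It assumes methylation is stored as a fraction between 0 and 1 with NaN for unmeasured entries; the helper `to_pairs` and the toy values are hypothetical and do not reflect the authors' data pipeline.

```python
import numpy as np

def to_pairs(beta):
    """Flatten a CpG-by-sample matrix into (cpg index, sample index, value)
    arrays, skipping unmeasured (NaN) entries. Hypothetical helper."""
    cpg_idx, sample_idx = np.nonzero(~np.isnan(beta))
    return cpg_idx, sample_idx, beta[cpg_idx, sample_idx]

# Toy matrix: 2 CpG sites (rows) x 3 samples (columns), two entries unmeasured.
beta = np.array([[0.10, np.nan, 0.85],
                 [0.40, 0.55, np.nan]])

cpg_idx, sample_idx, values = to_pairs(beta)
print(len(values))  # 4 measured pairs out of 6 possible
```

The number of usable pairs is the count of measured entries, so it grows with both the number of CpG sites and the number of samples.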

Applications

MethylProphet is aimed at epigenomics and cancer-genomics researchers who have abundant transcriptomic data but limited or partial methylation measurements. It can impute missing CpG values to complete sparse methylation arrays, extend methylation profiling to samples where only RNA-seq was collected, and support studies of how expression and methylation co-vary across tissues and tumors. By reducing the need to assay every CpG directly, it could lower the cost of epigenome-scale analyses in large cohorts such as TCGA-style cancer studies.
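
Anyone using imputed values would likely want to check them against measured methylation where both exist. The sketch below shows two common generic checks, per-CpG Pearson correlation across samples and mean absolute error; these metrics, the function names, and the toy arrays are assumptions for illustration, not the evaluation reported in the preprint.

```python
import numpy as np

def per_cpg_pearson(measured, predicted):
    """Pearson r across samples for each CpG (row).

    Returns NaN for rows where either vector has zero variance after centering.
    """
    m = measured - measured.mean(axis=1, keepdims=True)
    p = predicted - predicted.mean(axis=1, keepdims=True)
    num = (m * p).sum(axis=1)
    den = np.sqrt((m ** 2).sum(axis=1) * (p ** 2).sum(axis=1))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)

def mean_absolute_error(measured, predicted):
    """Average absolute difference over all entries."""
    return float(np.abs(measured - predicted).mean())

# Toy data: 2 CpG sites x 3 samples; the second site does not vary.
measured = np.array([[0.1, 0.5, 0.9],
                     [0.5, 0.5, 0.5]])
predicted = np.array([[0.2, 0.5, 0.8],
                      [0.4, 0.6, 0.5]])

r = per_cpg_pearson(measured, predicted)
mae = mean_absolute_error(measured, predicted)
print(r, mae)
```

The two metrics answer different questions: correlation asks whether predictions track between-sample variation at a site, while absolute error asks how close the predicted level is. A site with little variation can have low error yet an undefined or unstable correlation, as the second row shows.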

Impact

MethylProphet tests how far a single learned model can reconstruct the methylation landscape from expression, positioning gene expression as a partial proxy for the epigenome. If its zero-shot imputation holds up under peer review, it could make genome-wide methylation estimates accessible for the many datasets that include transcriptomics but not comprehensive bisulfite sequencing. Because it remains a bioRxiv preprint without released code or weights, its results require independent validation, but the scale of training and the focus on cross-site and cross-sample generalization make it a notable entry in epigenomic foundation modeling.

Citation

A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input

Preprint

Huang, X., et al. (2025) A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input. bioRxiv.

DOI: 10.1101/2025.02.05.636730

Recent citations

Papers that recently cited this model.

iDNA-DAPHA: a generic framework for methylation prediction via domain-adaptive pretraining and hierarchical attention
Wenjun Wang, Wenchong Tan, Lvlong Lai, et al.
Briefings Bioinform. · Nov 2025
Citations: 0

Artificial intelligence for comprehensive DNA methylation analysis: overview, challenges, and future directions
Aymane Aghziel, M. A. Mahraz, H. Tairi, et al.
Briefings Bioinform. · Aug 2025
Citations: 4 · Influential

Top citations

The most-cited papers that cite this model.

Artificial intelligence for comprehensive DNA methylation analysis: overview, challenges, and future directions
Aymane Aghziel, M. A. Mahraz, H. Tairi, et al.
Briefings Bioinform. · Aug 2025
Citations: 4 · Influential

iDNA-DAPHA: a generic framework for methylation prediction via domain-adaptive pretraining and hierarchical attention
Wenjun Wang, Wenchong Tan, Lvlong Lai, et al.
Briefings Bioinform. · Nov 2025
Citations: 0

Citations

Total Citations: 2

Influential: 1

References: 36

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Score: 10 · Closed

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 12

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper · Official Website
