Shandong University / University of Electronic Science and Technology of China / Chinese Academy of Sciences
A deep learning framework that predicts locus-specific DNA methylation across 39 human tissues from genomic sequence, with a scRNA-seq-augmented variant for unseen cell types.
DNA methylation is a fundamental epigenetic modification in which methyl groups are added at CpG dinucleotides, helping to regulate gene expression, maintain genome stability, and establish tissue-specific cellular identity across processes such as embryonic development, differentiation, and aging. A long-standing question is how much of this methylation landscape is determined by the genomic sequence itself, and whether that sequence-to-methylation relationship can be learned well enough to predict methylation at specific loci in specific tissues without measuring it directly.
Melody, developed by Junru Jin, Leyi Wei, and colleagues at Shandong University (Joint SDU-NTU Centre for Artificial Intelligence Research) together with collaborators at the University of Electronic Science and Technology of China and the Chinese Academy of Sciences, is a deep learning framework that predicts locus-specific DNA methylation from genomic sequence across 39 human tissues. First posted to bioRxiv in November 2025, Melody takes 10-kb sequence windows as input, far longer than the short windows (often ~41 bp) used by earlier methods such as DeepCpG, CpGenie, and iDNA-ABF. The longer context allows it to capture long-range regulatory dependencies that influence methylation state.
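As a rough illustration of what a 10-kb sequence input looks like in practice, the sketch below extracts a fixed-size window around a locus, one-hot encodes it, and lists the CpG sites it contains. The encoding scheme, the N-padding at chromosome ends, and all function names are assumptions made for illustration; the manuscript may preprocess sequence differently.

```python
# Illustrative preprocessing sketch. The encoding, padding, and names here
# are assumptions, not details taken from the Melody manuscript.
BASES = "ACGT"
WINDOW = 10_000  # 10-kb input window, as described in the text


def centered_window(chrom_seq, center, size=WINDOW):
    """Return `size` bases around `center`, padding with N past chromosome ends."""
    start = center - size // 2
    end = start + size
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(chrom_seq))
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    return left_pad + core + right_pad


def one_hot(seq):
    """Encode a sequence as rows of [A, C, G, T]; unknown bases become all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def cpg_positions(seq):
    """Return 0-based positions of the cytosine in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
```

A model of this kind would consume the one-hot matrix and emit a methylation estimate per position, with the CpG positions marking where those estimates are meaningful.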
Unlike prior tools designed for a single cell line or a handful of samples, Melody is built around the extensive cell-type heterogeneity of methylation. The framework comes in several variants tuned to different scenarios. One of them, the extended model Melody-G, augments sequence input with embeddings derived from a single-cell RNA-seq (scRNA-seq) foundation model, enabling zero-shot generalization of methylation prediction to cell types not seen during training.
Melody is a 1D fully convolutional, U-Net-style encoder–decoder trained on DNA methylation profiles spanning 39 human tissues. It takes 10-kb genomic sequence windows as input and predicts locus-specific methylation. The encoder is built from inverted residual blocks and is aggressively downsampled, so the receptive field expands rapidly to capture long-range genomic dependencies, including distal regulatory elements, while residual learning keeps training stable. The decoder upsamples back to base resolution.
The framework is organized into multiple variants optimized for distinct tasks. The single-track configuration (Melody-ST) uses the 1D U-Net backbone with a single output channel in its final 1×1 convolution and retains auxiliary heads that predict CpG counts and regional methylation levels at 100-bp resolution. Multi-task formulations and the scRNA-seq-augmented Melody-G reuse the same convolutional backbone for per-tissue prediction, generalization to unseen cell types, and variant effect estimation. For zero-shot extension to new cell types, Melody-G conditions methylation prediction on embeddings produced by a single-cell RNA-seq foundation model, effectively using transcriptomic state as a proxy for cellular identity.
Melody is currently a bioRxiv preprint (v2, released under a CC BY-NC license). Exact architectural details such as parameter count are described in the manuscript, and at the time of cataloging no public code repository or trained weights had been released; reported benchmark gains therefore await independent reproduction and peer review.
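To see why aggressive downsampling in the encoder matters for a 10-kb input, it helps to work through the standard receptive-field arithmetic for stacked 1D convolutions. The sketch below is a simplification: it treats each encoder stage as a single convolution, and the kernel size of 5 and stride of 2 are hypothetical values chosen for illustration, not Melody's actual configuration.

```python
# Receptive-field arithmetic for a stack of 1D convolutions.
# Kernel sizes and strides used below are hypothetical, not Melody's.

def receptive_field(layers):
    """Given (kernel_size, stride) pairs in order, return the receptive
    field, in input positions, after each layer."""
    rf, jump = 1, 1  # jump = spacing of adjacent outputs, in input positions
    history = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        history.append(rf)
    return history


def stages_needed(kernel, stride, target, max_stages=100_000):
    """Count identical stages required for the receptive field to reach `target`."""
    rf, jump = 1, 1
    for n in range(1, max_stages + 1):
        rf += (kernel - 1) * jump
        jump *= stride
        if rf >= target:
            return n
    raise ValueError("target not reached within max_stages")


if __name__ == "__main__":
    print(receptive_field([(5, 2)] * 6))  # strided stages
    print(receptive_field([(5, 1)] * 6))  # same kernels, no downsampling
    print(stages_needed(5, 2, 10_000), stages_needed(5, 1, 10_000))
```

Under these assumed values, twelve stride-2 stages are enough for one output position to see a full 10-kb window, whereas the same kernels without downsampling would need 2,500 stages. Real inverted residual blocks contain several convolutions each, so the true numbers for Melody will differ.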
Melody is aimed at epigenomics, regulatory-genomics, and statistical-genetics researchers who need methylation estimates where direct measurement is unavailable or incomplete. By predicting methylation from sequence alone, it can impute tissue- and cell-type-specific methylation for loci or cell types that were never assayed and, through Melody-G, extend those predictions to new cell types using only scRNA-seq context. Its cross-task transfer to meQTL effect prediction makes it useful for interpreting non-coding variants and prioritizing candidate regulatory mutations, supporting fine-mapping and functional annotation efforts in disease genetics.
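One common way to use a sequence-to-methylation model for variant interpretation is in-silico mutagenesis: predict on the reference sequence, predict again with the alternate allele substituted, and take the difference. The sketch below shows that general pattern only. Whether Melody scores meQTL effects this way is an assumption, and `toy_predict` is a hypothetical stand-in that merely scores local GC content at CpG sites; it is not the Melody model.

```python
# In-silico mutagenesis sketch. `toy_predict` is a hypothetical placeholder
# for a trained model and has no relation to Melody's actual predictions.

def apply_snv(seq, pos, ref, alt):
    """Return `seq` with the base at 0-based `pos` changed from `ref` to `alt`."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: found {seq[pos]!r}, expected {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict, seq, pos, ref, alt):
    """Per-position difference (alt minus ref) in predicted methylation."""
    ref_pred = predict(seq)
    alt_pred = predict(apply_snv(seq, pos, ref, alt))
    return [a - r for r, a in zip(ref_pred, alt_pred)]


def toy_predict(seq):
    """Placeholder predictor: at each CpG cytosine, return the GC fraction of
    the surrounding 5-bp window; return 0.0 everywhere else."""
    s = seq.upper()
    scores = []
    for i in range(len(s)):
        if s[i:i + 2] == "CG":
            window = s[max(0, i - 2):i + 3]
            scores.append(sum(base in "GC" for base in window) / len(window))
        else:
            scores.append(0.0)
    return scores


if __name__ == "__main__":
    # A>G one base upstream of the CpG at position 2.
    print(variant_effect(toy_predict, "AACGTT", 1, "A", "G"))
```

With a real model in place of `toy_predict`, the same difference vector could be summarized at the CpG of interest and compared against the observed meQTL effect direction.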
Melody contributes to a growing class of models that treat the genome as a learnable code for the epigenome, and it advances the field by combining long-range sequence context, broad tissue coverage, and transcriptome-conditioned generalization in a single framework. If its reported improvements over prior methylation predictors and its zero-shot cell-type generalization hold up under peer review, Melody could make cell-type-specific methylation estimates broadly accessible and strengthen the interpretation of methylation-associated variants. As a preprint without released code or weights, its results currently require independent validation, but its emphasis on locus-specific, cross-tissue, and cross-task prediction marks it as a notable entry in epigenomic sequence modeling.
Jin, J., et al. (2025) Melody: Decoding the Sequence Determinants of Locus-Specific DNA Methylation Across Human Tissues. bioRxiv.
DOI: 10.1101/2025.11.23.689975