EMDLP

China University of Mining and Technology

RNA methylation site predictor combining multiple sequence encodings with dilated convolution and BiLSTM layers to identify m6A and m1A sites.

Released: June 2022

EMDLP (Ensemble Multiscale Deep Learning Predictor) is a computational tool developed at the China University of Mining and Technology (CUMT) for the identification of RNA methylation sites from sequence data. RNA methylation — particularly N6-methyladenosine (m6A) and N1-methyladenosine (m1A) — is among the most prevalent and functionally important post-transcriptional RNA modifications. These chemical marks regulate mRNA stability, translation efficiency, and splicing, and their dysregulation has been linked to cancer, neurological disorders, and viral infection. Accurate computational prediction of methylation sites is therefore essential for researchers who lack access to high-throughput epitranscriptomic sequencing methods such as MeRIP-seq.

EMDLP addresses this problem by combining three sequence encoding strategies with a hybrid neural architecture that captures both local sequence context and long-range dependencies. The key insight driving the model is that different encodings reveal complementary information about the sequence neighborhood around a putative methylation site, and that an ensemble integrating all three encoding streams yields more robust predictions than any single strategy alone.

Published in BMC Bioinformatics in June 2022, EMDLP reported state-of-the-art performance at the time of publication on benchmark datasets for both m1A and m6A site prediction. It is accompanied by a publicly accessible web server that lets users submit RNA sequences without installing local software.

Key Features

Multiscale encoding: Three distinct sequence encoding schemes — RNA word embedding, one-hot encoding, and RGloVe (a GloVe-based word vector method adapted for RNA) — capture complementary local and global sequence patterns around candidate methylation sites.
Dilated convolutional BiLSTM (DCB) architecture: A dilated convolutional neural network (DCNN) feeds into a Bidirectional LSTM (BiLSTM), allowing the model to extract hierarchical sequence features at multiple receptive-field scales without losing temporal context.
Soft-voting ensemble: Predictions from the three independently trained encoding branches are combined via soft voting, reducing variance and improving generalization relative to any single-encoding model.
Multi-modification support: The model is evaluated on both m1A and m6A methylation types, suggesting that the approach generalizes across chemically distinct RNA modifications.
Web server deployment: An interactive web server (labiip.net/EMDLP) allows researchers to submit sequences and receive methylation site predictions without local installation.
Open-source implementation: The full training and inference code is available on GitHub, enabling retraining on new datasets and adaptation to additional RNA modification types.
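The sketch below illustrates the first two encodings named above, in plain Python. It one-hot encodes an RNA window, then splits the window into overlapping k-mers ("RNA words") mapped to integer ids, which is the form a learned embedding layer would consume. It is a minimal sketch under stated assumptions, not EMDLP's code. The following are all hypothetical choices: k = 3, the ACGU column order, the all-zero vector for unknown bases, and the function names `one_hot`, `kmer_tokens` and `kmer_vocab`. The RGloVe vectors themselves are not reproduced here.

```python
from itertools import product
from typing import Dict, List

# Assumption: column order A, C, G, U for the one-hot vectors.
NUCLEOTIDES = "ACGU"


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def one_hot(seq: str) -> List[List[int]]:
    """Return one 4-dimensional indicator vector per position.

    Characters outside ACGU (e.g. 'N' padding) become all-zero vectors.
    """
    return [[1 if base == nt else 0 for nt in NUCLEOTIDES] for base in normalize(seq)]


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split into overlapping k-mers with stride 1 (hypothetical 'RNA word' tokenization)."""
    seq = normalize(seq)
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocab(k: int = 3) -> Dict[str, int]:
    """Map every possible k-mer over ACGU to an integer id (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


if __name__ == "__main__":
    window = "GGACU"  # toy window; real model inputs are longer
    vocab = kmer_vocab(3)
    print(one_hot(window)[2])          # [1, 0, 0, 0] for the central 'A'
    tokens = kmer_tokens(window)
    print(tokens)                      # ['GGA', 'GAC', 'ACU']
    print([vocab[t] for t in tokens])  # [40, 33, 7]
```

Windows padded with 'N' produce k-mers outside this vocabulary, so a real pipeline would need an extra "unknown" token. That detail is left out here.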

Technical Details

EMDLP's core architecture, the DCB module, chains a dilated convolutional neural network to a Bidirectional LSTM. Dilated convolutions space the kernel taps apart. When the dilation rate doubles from one layer to the next, the effective receptive field grows exponentially with depth, while the number of parameters per layer stays the same as in an ordinary convolution. This lets the model integrate information from nucleotides far upstream and downstream of the candidate site. The subsequent BiLSTM then processes the convolved feature maps in both directions, capturing sequential dependencies that a purely convolutional stack does not model explicitly. This combination is applied independently to features derived from each of the three encoding schemes:

- one-hot vectors provide a sparse, position-specific representation;
- RNA word embeddings project k-mer substrings into a dense continuous space;
- RGloVe augments the standard GloVe objective with RNA-specific co-occurrence statistics for improved contextual representations.
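The receptive-field claim can be checked with simple arithmetic. Consider a stack of stride-1 convolutions with kernel size k and dilation rates d_1, ..., d_L. Each output position sees 1 + Σ (k − 1)·d_i input positions, yet every layer still has only k taps per channel pair. The plain-Python sketch below computes that quantity and implements a single-channel "valid" dilated convolution. The kernel size 3 and the doubling schedule 1, 2, 4, 8 are illustrative assumptions, not EMDLP's published hyperparameters. The BiLSTM stage is not shown.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Count the input positions seen by one output of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Apply a single-channel 'valid' dilated convolution.

    Like deep-learning libraries, this computes a cross-correlation:
    out[i] = sum_j weights[j] * x[i + j * dilation].
    """
    k = len(weights)
    span = (k - 1) * dilation + 1  # input positions covered by one kernel application
    return [
        sum(weights[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


if __name__ == "__main__":
    # Same parameter count per layer (3 taps), very different receptive fields.
    print(receptive_field(3, [1, 1, 1, 1]))  # 9  (ordinary convolutions)
    print(receptive_field(3, [1, 2, 4, 8]))  # 31 (dilation doubles each layer)
    signal = [float(v) for v in range(10)]
    # out[i] = x[i] - x[i + 4] for this kernel and dilation
    print(dilated_conv1d(signal, [1.0, 0.0, -1.0], dilation=2))  # six values of -4.0
```

With four layers of three taps each, doubling the dilation widens each output's view from 9 to 31 nucleotides without adding weights.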

The three DCB branches produce independent probability scores for each candidate site, which are combined using soft voting — averaging the predicted probabilities — to produce the final classification. On benchmark datasets, EMDLP achieved an AUROC of 95.56% for m1A prediction and 85.24% for m6A prediction, exceeding previously reported state-of-the-art results on both tasks at the time of publication. Training and evaluation followed the standard positive/negative split conventions used in the RNA modification prediction literature, with balanced sampling to mitigate class imbalance.
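Soft voting itself is a simple averaging step. The plain-Python sketch below does three things:

- averages per-site probabilities from three branches;
- thresholds the result;
- computes AUROC as the probability that a randomly chosen positive site outscores a randomly chosen negative one (ties count one half).

The following are assumptions for illustration, not values taken from EMDLP's code or results: the equal branch weights, the 0.5 threshold, the toy probabilities and the function names.

```python
from typing import List, Sequence


def soft_vote(branch_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average positive-class probabilities across branches, site by site."""
    n_sites = len(branch_probs[0])
    if any(len(p) != n_sites for p in branch_probs):
        raise ValueError("every branch must score the same sites")
    return [sum(p[i] for p in branch_probs) / len(branch_probs) for i in range(n_sites)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the pairwise AUROC: the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # toy ground truth
    one_hot_branch = [0.9, 0.3, 0.4, 0.2]
    embedding_branch = [0.6, 0.7, 0.2, 0.65]
    rglove_branch = [0.7, 0.8, 0.3, 0.4]

    fused = soft_vote([one_hot_branch, embedding_branch, rglove_branch])
    calls = [p >= 0.5 for p in fused]  # hypothetical decision threshold
    print([round(p, 3) for p in fused])
    print(calls)                                              # [True, True, False, False]
    print(auroc(labels, one_hot_branch), auroc(labels, fused))  # 0.75 1.0
```

In this toy case the averaged scores rank every positive above every negative even though the one-hot branch alone does not. With real data, the gain from averaging depends on how independent the branches' errors are.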

Applications

EMDLP is designed for researchers studying epitranscriptomics (the chemical modification landscape of RNA) who need to prioritize candidate methylation sites for experimental validation. Molecular biologists investigating the regulatory roles of m6A or m1A in a specific transcript can use the web server to rapidly assess which adenosine positions are most likely to carry modifications. This reduces the cost and scale of downstream MeRIP-seq or antibody-based enrichment experiments. The tool is also useful in large-scale analyses: given a set of transcript sequences, EMDLP can generate transcriptome-wide methylation site predictions that serve as hypotheses for follow-up functional assays. Cancer researchers and RNA biologists investigating modifications in viral RNAs or non-coding RNA classes can apply the model to other RNA species of interest, provided they account for how those sequences differ from the training data.
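Scanning transcripts usually starts by cutting a fixed-length window around every candidate adenosine, since both m6A and m1A occur on adenosines. The plain-Python sketch below does that, padding transcript ends with "N". The following are hypothetical: the 41-nt window (20 nt on each side), the padding character and the name `candidate_windows`. Match the window length to whatever input length the trained model or web server actually expects.

```python
from typing import Iterator, Tuple


def candidate_windows(
    transcript: str, half_width: int = 20, pad: str = "N"
) -> Iterator[Tuple[int, str]]:
    """Yield (0-based position, window) for every adenosine in the transcript.

    Each window has length 2 * half_width + 1 with the adenosine at its center.
    Positions near either end are padded with `pad`.
    """
    seq = transcript.upper().replace("T", "U")  # accept DNA-style input
    padded = pad * half_width + seq + pad * half_width
    for i, base in enumerate(seq):
        if base == "A":
            # seq[i] sits at padded[i + half_width], the window's center
            yield i, padded[i:i + 2 * half_width + 1]


if __name__ == "__main__":
    for pos, win in candidate_windows("GGACUAGG", half_width=3):
        print(pos, win)  # "2 NGGACUA", then "5 ACUAGGN"
```

Each yielded window would then be encoded and scored, and the position is kept so predictions can be mapped back onto the transcript.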

Impact

EMDLP contributes to a growing body of sequence-based tools for epitranscriptomic site prediction, sitting alongside methods such as SRAMP, m6ANet, and DeepM6ASeq. Its primary technical contribution is the systematic integration of three encoding strategies in a single ensemble framework. This shows that diverse encodings provide complementary signal beyond what any single representation captures. The accompanying web server lowers the barrier to entry for wet-lab researchers without bioinformatics infrastructure. The model is a focused, task-specific tool rather than a broadly pre-trained foundation model, so its predictions are most reliable within the sequence contexts represented in its training data. Users applying EMDLP to RNA species or organisms substantially different from the training distribution should interpret predictions with caution. They should also consider retraining on domain-specific data using the provided codebase.

Citation

EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction

Wang, H., et al. (2022) EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction. BMC Bioinformatics.

DOI: 10.1186/s12859-022-04756-1

Recent citations

Papers that recently cited this model.

Advanced deep learning strategies in nanopore RNA sequencing
C. Ling, Benjamin Lebeau, K. C. Keong, et al.
RNA Biology · Feb 2026 · 0 citations

DT-m6A: A DenseNet-Transformer Hybrid Framework for Accurate Prediction of m6A Modification Sites across Diverse Cell Lines and Tissues
Qi Tao, Jianhua Jia
Frontiers in Bioscience · Jan 2026 · 0 citations

Prediction of RNA m6A Methylation Sites in Multiple Tissues Based on Dual-branch Residual Network
Xiaotian Guo, Wei Gao, Dan Chen, et al.
Biochemistry and Biophysics · Nov 2025 · 0 citations

Top citations

The most-cited papers that cite this model.

BERT2OME: Prediction of 2′-O-Methylation Modifications From RNA Sequence by Transformer Architecture Based on BERT
Necla Nisa Soylu, Emre Sefer
IEEE/ACM Transactions on Computational Biology & Bioinformatics · May 2023 · 28 citations

Molecular insights into regulatory RNAs in the cellular machinery
Sumin Yang, Sung-Hyun Kim, Eunjeong Yang, et al.
Experimental and Molecular Medicine · Jun 2024 · 24 citations

RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models
M. Asim, Muhammad Ali Ibrahim, Tayyaba Asif, et al.
Heliyon · Jan 2025 · 13 citations

DeepIRES: a hybrid deep learning model for accurate identification of internal ribosome entry sites in cellular and viral mRNAs
Jian Zhao, Zhewei Chen, Meng Zhang, et al.
Briefings in Bioinformatics · Jul 2024 · 12 citations

Dynamic regulation and key roles of ribonucleic acid methylation
J. Zou, H. Liu, Wei Tan, et al.
Frontiers in Cellular Neuroscience · Dec 2022 · 12 citations

Citations

Total citations: 26

Influential: 0

References: 45

GitHub

Stars: 1

Forks: 1

Open issues: 0

Contributors: 1

Last push: 4y ago

Language: Python

Fields of citing research

Biology: 92%
Computer Science: 85%
Medicine: 85%
Engineering: 4%

Share of papers citing this model, by field (a paper can count toward several fields).

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 34 (Closed)

Usability (can I run it?): 30

Reproducibility (can I retrain it?): 26

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository · Research Paper · Official Website

