bio.rodeo

RNA

EMDLP

China University of Mining and Technology

Ensemble multiscale deep learning model for RNA methylation site prediction, combining dilated convolution and BiLSTM with multiple sequence encodings.

Released: 2022

Overview

EMDLP (Ensemble Multiscale Deep Learning Predictor) is a computational tool developed at the China University of Mining and Technology (CUMT) for identifying RNA methylation sites from sequence data. RNA methylation, particularly N6-methyladenosine (m6A) and N1-methyladenosine (m1A), is among the most prevalent and functionally important post-transcriptional RNA modifications. These chemical marks regulate mRNA stability, translation efficiency, and splicing, and their dysregulation has been linked to cancer, neurological disorders, and viral infection. Accurate computational prediction of methylation sites is therefore valuable, especially for researchers who lack access to high-throughput epitranscriptomic sequencing methods such as MeRIP-seq.

EMDLP addresses this problem by combining three sequence encoding strategies with a hybrid neural architecture that captures both local sequence context and long-range dependencies. The key insight driving the model is that different encodings reveal complementary information about the sequence neighborhood around a putative methylation site, and that an ensemble integrating all three encoding streams yields more robust predictions than any single encoding alone.

Published in BMC Bioinformatics in June 2022, EMDLP reported state-of-the-art performance at the time on benchmark datasets for both m1A and m6A site prediction. It is accompanied by a publicly accessible web server that lets users submit RNA sequences without installing local software.

Key Features

  • Multiscale encoding: Three distinct sequence encoding schemes — RNA word embedding, one-hot encoding, and RGloVe (a GloVe-based word vector method adapted for RNA) — capture complementary local and global sequence patterns around candidate methylation sites.
  • Dilated convolutional BiLSTM (DCB) architecture: A dilated convolutional neural network (DCNN) feeds into a Bidirectional LSTM (BiLSTM), allowing the model to extract sequence features at multiple receptive-field scales while preserving their order along the sequence.
  • Soft-voting ensemble: Predictions from the three independently trained encoding branches are combined via soft voting, reducing variance and improving generalization relative to any single-encoding model.
  • Multi-modification support: The model is evaluated on both m1A and m6A methylation types, indicating that the approach carries over between two chemically distinct RNA modifications.
  • Web server deployment: An interactive web server (labiip.net/EMDLP) allows researchers to submit sequences and receive methylation site predictions without local installation.
  • Open-source implementation: The full training and inference code is available on GitHub, enabling retraining on new datasets and adaptation to additional RNA modification types.
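
As a rough illustration of the encoding step, the sketch below one-hot encodes an RNA window and splits it into overlapping k-mer "words" of the kind a word-embedding or GloVe-style method would consume. It is not taken from the EMDLP codebase: the function names, the choice of k = 3, the A/C/G/U column order, and the all-zero handling of unknown bases are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed column order for the one-hot rows


def one_hot(seq):
    """Encode an RNA string as a list of 4-element one-hot rows."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0, 0, 0, 0]
        if base in index:  # unknown bases such as N stay all-zero (assumption)
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_tokens(seq, k=3):
    """Slide a width-k window with stride 1 to get overlapping k-mer words."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vocabulary(k=3):
    """Map every possible k-mer to an integer id for an embedding lookup."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


window = "GGACU"  # toy window with the candidate adenosine in the middle
tokens = kmer_tokens(window, k=3)
vocab = kmer_vocabulary(k=3)
print(one_hot(window)[2])                   # one-hot row for the central A
print(tokens, [vocab[t] for t in tokens])   # k-mer words and their ids
```

The integer ids would index a learned embedding table. The one-hot rows can be fed to a convolutional layer directly, which is why the two representations carry different information about the same window.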

Technical Details

EMDLP's core architecture, the DCB module, chains a dilated convolutional neural network to a Bidirectional LSTM. When the dilation rate grows with depth (for example, doubling at each layer), the effective receptive field expands exponentially while each layer keeps the same number of parameters as an undilated convolution with the same kernel size, allowing the model to integrate information from nucleotides far upstream and downstream of the candidate site. The subsequent BiLSTM then processes the convolved feature maps in both directions, capturing sequential dependencies that convolutions alone model poorly. This combination is applied independently to features derived from each of the three encoding schemes: one-hot vectors provide a sparse, position-specific representation; RNA word embeddings project k-mer substrings into dense continuous space; and RGloVe augments the standard GloVe objective with RNA-specific co-occurrence statistics for improved contextual representations.
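
The receptive-field argument can be checked with a few lines of arithmetic. The sketch below assumes stride-1 layers with kernel size 3 and dilation rates that double per layer. These hyperparameters are illustrative and are not EMDLP's published configuration. The small `dilated_conv1d` helper is likewise a hypothetical pure-Python stand-in for a framework layer.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in positions) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated cross-correlation over a plain list of numbers."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Doubling dilations: the receptive field grows as 2**(depth + 1) - 1.
for depth in range(1, 5):
    dilations = [2 ** i for i in range(depth)]
    print(depth, dilations, receptive_field(3, dilations))

# Same depth and parameter count without dilation: growth is only linear.
print(receptive_field(3, [1, 1, 1, 1]))
```

With four layers, the dilated stack covers 31 positions and the undilated stack covers 9, even though both use four kernels of width 3.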

The three DCB branches produce independent probability scores for each candidate site, which are combined using soft voting — averaging the predicted probabilities — to produce the final classification. On benchmark datasets, EMDLP achieved an AUROC of 95.56% for m1A prediction and 85.24% for m6A prediction, exceeding previously reported state-of-the-art results on both tasks at the time of publication. Training and evaluation followed the standard positive/negative split conventions used in the RNA modification prediction literature, with balanced sampling to mitigate class imbalance.
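
Soft voting as described above amounts to averaging per-site probabilities and thresholding the result. The sketch below assumes equal branch weights and a 0.5 decision threshold, neither of which is confirmed by the text. The branch scores are made-up numbers, not EMDLP outputs.

```python
def soft_vote(branch_probs, threshold=0.5):
    """Average per-site probabilities across branches, then threshold."""
    n_sites = len(branch_probs[0])
    if any(len(p) != n_sites for p in branch_probs):
        raise ValueError("every branch must score the same sites")
    averaged = [
        sum(p[i] for p in branch_probs) / len(branch_probs)
        for i in range(n_sites)
    ]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Hypothetical scores for three candidate sites from the three branches.
one_hot_branch = [0.90, 0.95, 0.20]
embedding_branch = [0.80, 0.45, 0.10]
rglove_branch = [0.70, 0.45, 0.30]

averaged, labels = soft_vote([one_hot_branch, embedding_branch, rglove_branch])
print([round(p, 3) for p in averaged], labels)
```

The second site shows how soft voting differs from majority voting: two of three branches fall below 0.5, but one confident branch lifts the average above the threshold.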

Applications

EMDLP is designed for researchers studying epitranscriptomics, the chemical modification landscape of RNA, who need to prioritize candidate methylation sites for experimental validation. Molecular biologists investigating the regulatory roles of m6A or m1A in a specific transcript can use the web server to rapidly assess which adenosine positions are most likely to carry modifications, reducing the cost and scale of downstream MeRIP-seq or antibody-based enrichment experiments. The tool is also useful in large-scale transcriptome analyses: given a set of transcript sequences, EMDLP can generate transcriptome-wide methylation site predictions that serve as hypotheses for follow-up functional assays. Cancer researchers and RNA biologists investigating modifications in viral RNAs or non-coding RNA classes can apply the model to other RNA species of interest, bearing in mind that predictions are most reliable for sequences resembling the training data.
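
Preparing a transcript for site-level prediction usually means cutting a fixed-length window around every adenosine. The sketch below shows one way to do that. The flank length of 20 (41-nt windows), the "N" padding at transcript ends, and the function name are assumptions for illustration. The window length and input format that EMDLP and its web server actually expect should be taken from the official documentation.

```python
def candidate_windows(transcript, flank=20, pad="N"):
    """Yield (position, window) for every adenosine, centered on that A.

    Positions are 0-based. Windows near the ends are padded with `pad`
    so that every window has length 2 * flank + 1.
    """
    seq = transcript.upper().replace("T", "U")
    padded = pad * flank + seq + pad * flank
    for pos, base in enumerate(seq):
        if base == "A":
            # pos in seq corresponds to pos + flank in padded, so this
            # slice places the adenosine at the center of the window.
            yield pos, padded[pos:pos + 2 * flank + 1]


# Toy transcript with a short flank so the windows are easy to read.
for pos, window in candidate_windows("AUGGACUA", flank=2):
    print(pos, window)
```

Each window could then be encoded and scored, and the resulting probabilities ranked to choose sites for experimental follow-up.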

Impact

EMDLP contributes to a growing body of tools for epitranscriptomic site prediction, sitting alongside methods such as SRAMP, m6ANet, and DeepM6ASeq. Its primary technical contribution is the systematic integration of three encoding strategies in a single ensemble framework, showing that encoding diversity adds signal beyond what any single representation provides. The accompanying web server lowers the barrier to entry for wet-lab researchers without bioinformatics infrastructure. The model is a focused, task-specific tool rather than a broadly pre-trained foundation model, so its predictions are most reliable within the sequence contexts represented in its training data. Users applying EMDLP to RNA species or organisms substantially different from the training distribution should interpret predictions with caution and consider retraining on domain-specific data using the provided codebase.

Citation

Wang, H., et al. (2022) EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction. BMC Bioinformatics.

DOI: 10.1186/s12859-022-04756-1

Metrics

GitHub

Stars: 1
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 4y ago
Language: Python

Citations

Total Citations: 26
Influential: 0
References: 45

Tags

sequence analysis, ensemble, methylation
