A CNN-transformer framework that maps enhancer-derived RNA (eRNA) loci genome-wide from DNA sequence and aggregated RNA-seq signal.
Enhancer-derived RNAs (eRNAs) are short, often bidirectionally transcribed non-coding RNAs produced at active enhancers, and their presence is one of the most reliable signatures of enhancer activity. Mapping eRNA loci across a genome is difficult because these transcripts are typically unstable, lowly expressed, and lack the splicing and polyadenylation marks that anchor conventional gene annotation pipelines. eRNAformer addresses this gap with a deep learning framework for genome-wide de novo mapping of eRNA loci directly from DNA sequence and aggregated RNA-seq signal.
Developed by researchers at Nanchang University and released as a bioRxiv preprint on June 27, 2026, eRNAformer is a multimodal model that combines convolutional neural networks with a transformer encoder. The convolutional layers extract local sequence motifs and read-coverage patterns, while the transformer captures the long-range dependencies characteristic of bidirectional enhancer transcription. By learning from both the genomic sequence and experimental expression evidence, the model classifies candidate loci as eRNA-producing without requiring per-dataset retraining.
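The division of labor described above can be sketched in PyTorch. This is a minimal illustration and not the released eRNAformer code: the class name `CnnTransformerSketch`, the five-channel input layout (four one-hot DNA channels plus one coverage channel), and every layer size are assumptions chosen for brevity, and positional encoding is omitted.

```python
import torch
from torch import nn


class CnnTransformerSketch(nn.Module):
    """Illustrative CNN + transformer interval classifier (hypothetical)."""

    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Assumed input layout: 4 one-hot DNA channels + 1 RNA-seq coverage channel.
        self.conv = nn.Sequential(
            nn.Conv1d(5, d_model, kernel_size=9, padding=4),  # local motifs and coverage shapes
            nn.ReLU(),
            nn.MaxPool1d(4),  # shorten the sequence before self-attention
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        # x has shape (batch, 5, length)
        h = self.conv(x)        # (batch, d_model, length // 4)
        h = h.transpose(1, 2)   # (batch, length // 4, d_model)
        h = self.encoder(h)     # self-attention across the pooled positions
        # Mean-pool over positions, then emit one logit per interval.
        return self.head(h.mean(dim=1)).squeeze(-1)


model = CnnTransformerSketch().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 5, 400))  # two random 400 bp intervals
probs = torch.sigmoid(logits)  # probability-like score per interval
```

Pooling before the encoder keeps self-attention cost manageable on long intervals, which is a common reason for placing a convolutional front end ahead of a transformer.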
The framework is positioned as a task-specific tool for enhancer characterization rather than a general-purpose sequence foundation model. It ships with pretrained weights that drive inference out of the box, alongside a fine-tuning path for adapting to new experimental contexts.
eRNAformer integrates a convolutional front end with a transformer encoder to classify genomic intervals as eRNA loci from paired sequence and RNA-seq inputs.
Training benchmarks are built from established enhancer resources, including FANTOM5 enhancer annotations and the eRNAbase reference, with RNA-seq samples drawn from the SRA and processed through transcript assembly with StringTie; reference sequences come from GENCODE (human GRCh38 and mouse GRCm38).
The model was benchmarked on ENCODE datasets, where the authors report high sensitivity and specificity for eRNA locus identification. Applied to GEO datasets spanning multiple hematologic malignancies, eRNAformer identified between 14,219 and 56,451 eRNA loci across cancer types, and the authors report that newly mapped loci are enriched for evolutionarily constrained variants and genetic risk factors for complex diseases.
The implementation is built on PyTorch 2.0.1. Pretrained weights, optimal hyperparameters, and example data are distributed through a Zenodo deposit released under CC BY 4.0, and the GitHub repository is released under the MIT License.
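For readers less familiar with the benchmark metrics named above, sensitivity is the fraction of true eRNA loci that are recovered and specificity is the fraction of non-eRNA intervals that are correctly rejected. The sketch below computes both from binary labels; the function name and the toy labels are hypothetical and are not taken from the preprint.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = eRNA locus."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against an empty class so the ratio is always defined.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels for eight candidate intervals (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 0.75
```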
eRNAformer serves genomics and gene-regulation researchers who need to locate active enhancers and their transcripts in datasets where dedicated enhancer assays such as CAGE or GRO-seq are unavailable. Because it operates on standard RNA-seq coverage plus reference sequence, it can repurpose existing transcriptomic data to annotate the regulatory landscape of a tissue or disease state. The authors demonstrate this in cancer genomics by profiling hematologic malignancies and experimentally validating FOXO1e, a novel eRNA cluster roughly 120 kb upstream of FOXO1 implicated in acute myeloid leukemia.
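Working from reference sequence plus RNA-seq coverage implies turning each candidate interval into numeric input. The following sketch shows one plausible preparation step using only the standard library: one-hot encoding of bases and mean-binning of per-base coverage. The helper names, the bin size, and the treatment of ambiguous bases are assumptions, not the preprocessing documented for eRNAformer.

```python
# Assumed channel order A, C, G, T; any other base (e.g. N) maps to all zeros.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(seq):
    """One-hot encode a DNA string, case-insensitively."""
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def bin_coverage(coverage, bin_size):
    """Average per-base read coverage within consecutive bins."""
    bins = []
    for start in range(0, len(coverage), bin_size):
        window = coverage[start:start + bin_size]
        bins.append(sum(window) / len(window))
    return bins


sequence = "ACGTNacg"                   # toy 8 bp interval
coverage = [0, 2, 4, 6, 1, 1, 3, 5]     # toy per-base read depth
encoded = one_hot_encode(sequence)
binned = bin_coverage(coverage, 4)
print(binned)  # [3.0, 2.5]
```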
eRNAformer extends sequence-based regulatory genomics to a class of elements that conventional annotation pipelines routinely miss, and its validation of FOXO1e illustrates how computational eRNA mapping can surface disease-relevant regulatory loci for follow-up. Because the work is a preprint awaiting peer review, its benchmarks are author-reported and have not been independently confirmed, and adoption is still at an early stage. The model is specific to eRNA locus classification rather than a general sequence model, and it is distributed as a local install with no hosted inference API, so deployment requires running the pretrained weights in a user-configured environment.