bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

eRNAformer

Nanchang University

A CNN-transformer framework that maps enhancer-derived RNA (eRNA) loci genome-wide from DNA sequence and aggregated RNA-seq signal.

Released: June 2026

Enhancer-derived RNAs (eRNAs) are short, often bidirectionally transcribed non-coding RNAs produced at active enhancers, and their presence is one of the most reliable signatures of enhancer activity. Mapping eRNA loci across a genome is difficult because these transcripts are typically unstable, expressed at low levels, and lack the splicing and polyadenylation marks that anchor conventional gene annotation pipelines. eRNAformer addresses this gap with a deep learning framework for genome-wide de novo mapping of eRNA loci directly from DNA sequence and aggregated RNA-seq signal.

Developed by researchers at Nanchang University and released as a bioRxiv preprint on June 27, 2026, eRNAformer is a multimodal model that combines convolutional neural networks with a transformer encoder. The convolutional layers extract local sequence motifs and read-coverage patterns, while the transformer captures the long-range dependencies characteristic of bidirectional enhancer transcription. By learning from both the genomic sequence and experimental expression evidence, the model classifies candidate loci as eRNA-producing without requiring per-dataset retraining.
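
To make the sequence-plus-expression input concrete, here is a minimal sketch of how one interval might be turned into a five-channel matrix: four one-hot DNA rows plus one coverage row. The helper names, the peak scaling, and the channel layout are assumptions for illustration and are not taken from the eRNAformer codebase.

```python
# Hypothetical preprocessing sketch; the real eRNAformer encoding may differ.
BASES = "ACGT"

def one_hot(seq):
    """Return 4 rows (A, C, G, T) of 0.0/1.0 values, one value per position.

    Ambiguous bases such as N become all-zero columns.
    """
    seq = seq.upper()
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]

def build_input(seq, coverage):
    """Stack the one-hot rows with a scaled coverage row into a 5 x L matrix."""
    if len(seq) != len(coverage):
        raise ValueError("sequence and coverage must have the same length")
    peak = max(coverage) or 1.0  # fall back to 1.0 when the signal is all zero
    scaled = [c / peak for c in coverage]  # assumed scaling: divide by the peak
    return one_hot(seq) + [scaled]

# Toy 5 bp interval with made-up coverage values.
features = build_input("ACGTN", [0.0, 2.0, 4.0, 2.0, 0.0])
```

In practice the sequence would come from the reference FASTA and the coverage from a BigWig track for the same coordinates. Both are replaced by literals here so the sketch stays dependency-free.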

The framework is positioned as a task-specific tool for enhancer characterization rather than a general-purpose sequence foundation model. It ships with pretrained weights that support inference out of the box, alongside a fine-tuning path for adapting to new experimental contexts.

#Key Features

  • Multimodal sequence-plus-expression input: The model jointly ingests reference DNA sequence and aggregated RNA-seq coverage, letting it ground sequence-based predictions in measured transcriptional activity.
  • CNN-transformer architecture: Convolutional layers capture local motifs and coverage shape while the transformer models the long-range, bidirectional signal that distinguishes enhancer transcription.
  • Pretrained inference without retraining: A released checkpoint runs in evaluation mode on new genomic intervals, producing per-locus eRNA probabilities directly from a BED interval set, reference FASTA, and BigWig signal.
  • Fine-tuning support: A dedicated script adapts the pretrained weights to new benchmarks and can freeze the backbone to train only the classifier head.
  • Cross-species coverage: The repository provides example datasets and reference genomes for both human (GRCh38) and mouse (GRCm38).
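
The architecture and fine-tuning features above can be illustrated with a small PyTorch sketch. This is a toy stand-in, not the published eRNAformer network: the class name, layer sizes, five-channel input, and the `head.` parameter prefix are all assumptions, and positional encodings are omitted for brevity.

```python
import torch
from torch import nn

class TinyCnnTransformer(nn.Module):
    """Illustrative CNN-transformer classifier (hypothetical, not eRNAformer)."""

    def __init__(self, in_channels=5, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        # Convolutional front end: local motifs and coverage shape.
        self.backbone_cnn = nn.Sequential(
            nn.Conv1d(in_channels, d_model, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # Transformer encoder: longer-range dependencies across the interval.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=64, batch_first=True
        )
        self.backbone_tf = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Classifier head: one logit per interval.
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):  # x: (batch, channels, length)
        h = self.backbone_cnn(x)  # (batch, d_model, length // 4)
        h = self.backbone_tf(h.transpose(1, 2))  # (batch, tokens, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)  # (batch,)

def freeze_backbone(model):
    """Leave only the classifier head trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("head.")

model = TinyCnnTransformer()
freeze_backbone(model)

# Evaluation-mode inference on two random 256 bp toy inputs.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(torch.randn(2, 5, 256)))

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

After `freeze_backbone`, an optimizer built from the parameters with `requires_grad=True` would update only the head, which is the general idea behind head-only fine-tuning. The released fine-tuning script may implement this differently.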

#Technical Details

eRNAformer integrates a convolutional front end with a transformer encoder to classify genomic intervals as eRNA loci from paired sequence and RNA-seq inputs. Training benchmarks are built from established enhancer resources, including FANTOM5 enhancer annotations and the eRNAbase reference. RNA-seq samples are drawn from the SRA and assembled into transcripts with StringTie, and reference sequences come from GENCODE (human GRCh38 and mouse GRCm38).

The authors benchmarked the model on ENCODE datasets and report high sensitivity and specificity for eRNA locus identification. Applied to GEO datasets spanning multiple hematologic malignancies, eRNAformer identified between 14,219 and 56,451 eRNA loci across cancer types, and the authors report that newly mapped loci are enriched for evolutionarily constrained variants and genetic risk factors for complex diseases.

The implementation is built on PyTorch 2.0.1. Pretrained weights, optimal hyperparameters, and example data are distributed through a Zenodo deposit under CC BY 4.0, and the GitHub repository is released under the MIT License.
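
For readers who want to check sensitivity and specificity on their own labeled intervals, the two metrics reduce to counts from a confusion matrix. The sketch below uses made-up toy labels, not values from the preprint, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from binary labels and binary calls."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # eRNA loci recovered
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # eRNA loci missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # background rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # background miscalled
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy example: 4 true eRNA loci and 4 background intervals.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
calls = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(labels, calls)
print(sens, spec)  # 0.75 0.75
```

Sensitivity is the fraction of true eRNA loci that are called, and specificity is the fraction of background intervals that are correctly rejected. Reporting both guards against a classifier that scores well on one by sacrificing the other.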

#Applications

eRNAformer serves genomics and gene-regulation researchers who need to locate active enhancers and their transcripts in datasets where dedicated enhancer assays such as CAGE or GRO-seq are unavailable. Because it operates on standard RNA-seq coverage plus reference sequence, it can repurpose existing transcriptomic data to annotate the regulatory landscape of a tissue or disease state. The authors demonstrate this in cancer genomics by profiling hematologic malignancies and experimentally validating FOXO1e, a novel eRNA cluster roughly 120 kb upstream of FOXO1 implicated in acute myeloid leukemia.
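
Turning per-locus probabilities into an annotation track could look like the sketch below, which thresholds the scores and writes five-column BED lines. The function name, the 0.5 cutoff, the `eRNA_candidate` label, and the coordinates are illustrative assumptions. The actual output format of the released tool is not described here.

```python
def call_loci(intervals, probs, threshold=0.5):
    """Return BED lines for intervals whose probability meets the threshold.

    The BED score column is an integer from 0 to 1000, so the probability
    is rescaled to that range.
    """
    lines = []
    for (chrom, start, end), p in zip(intervals, probs):
        if p >= threshold:
            score = round(p * 1000)
            lines.append(f"{chrom}\t{start}\t{end}\teRNA_candidate\t{score}")
    return lines

# Made-up intervals and probabilities for illustration only.
intervals = [("chr1", 1000, 1500), ("chr1", 2000, 2500), ("chr2", 500, 1000)]
probs = [0.875, 0.25, 0.5]
bed_lines = call_loci(intervals, probs)
```

A stricter threshold trades sensitivity for specificity, so the cutoff is best chosen against a labeled benchmark rather than fixed in advance.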

#Impact

eRNAformer extends sequence-based regulatory genomics to a class of elements that conventional annotation pipelines routinely miss, and its validation of FOXO1e illustrates how computational eRNA mapping can surface disease-relevant regulatory loci for follow-up. Because the work is a preprint awaiting peer review, its benchmarks are author-reported and broader adoption has yet to develop. The model is specific to eRNA locus classification rather than a general sequence model, and it is distributed as a local install with no hosted inference API, so deployment requires running the pretrained weights in a user-configured environment.

Citation

DOI: 10.64898/2026.06.24.734403


Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 95 (Open)
Usability (can I run it?): 100
Reproducibility (can I retrain it?): 92
Model Openness Framework: Class II (Open Tooling)

Tags

cnn, enhancer_annotation, genomics, multimodal, regulatory_element_prediction, supervised, transcriptomics, transformer

Resources

GitHub Repository · Research Paper · Dataset