bio.rodeo

DNA & Gene

CREP

University of Oxford

Fine-tuned Enformer derivative that predicts discrete, interpretable cis-regulatory element class annotations (enhancer, promoter, insulator) directly from DNA sequence across human cell types.

Released: June 2026

CREP (Cis-Regulatory Element Predictor) is a deep learning model that predicts discrete, interpretable cis-regulatory element (CRE) class annotations — such as enhancers, promoters, and insulators — directly from genomic DNA sequence. It was developed by Nicolo' Stranieri, Simone G. Riva, and Jim Raymond Hughes at the MRC Weatherall Institute of Molecular Medicine (MRC WIMM) at the University of Oxford, and released as a bioRxiv preprint in June 2026.

CREP is a fine-tuned derivative of Enformer, the transformer-based sequence-to-function model from Google DeepMind. Where Enformer predicts thousands of continuous epigenomic signal tracks (gene expression, chromatin accessibility, histone marks) from a roughly 200 kb DNA window, CREP repurposes that learned regulatory grammar toward a different output. Instead of imputing raw signal that an analyst must then interpret, CREP emits categorical CRE class labels that map directly onto the functional vocabulary biologists use. This shifts the model from quantitative signal imputation toward direct, human-readable regulatory annotation.

The model addresses a recurring gap between sequence-to-function prediction and downstream interpretation. Continuous epigenomic tracks are powerful, but they require expert thresholding, peak calling, and integration across assays before a region can be confidently called an enhancer or insulator. CREP instead trains a dedicated checkpoint to output element classes across multiple human cell types, supervised by annotations from REgulamentary (a de novo CRE annotation framework, also from the Hughes lab). This collapses the interpretation step into the model itself.
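The contrast between the two workflows can be sketched in a few lines. The example below is a toy illustration only: the track names, thresholds, and rules are assumptions chosen for readability, and they are not REgulamentary's or CREP's actual logic.

```python
from typing import Dict

# Hypothetical thresholds on normalized signal; real pipelines tune these per assay.
ACCESSIBLE = 0.5
MARK_HIGH = 0.6


def call_element_from_tracks(signal: Dict[str, float]) -> str:
    """Toy rule-based caller over continuous tracks (illustrative rules only)."""
    if signal.get("ATAC", 0.0) < ACCESSIBLE:
        return "none"
    if signal.get("H3K4me3", 0.0) >= MARK_HIGH:
        return "promoter"
    if signal.get("H3K27ac", 0.0) >= MARK_HIGH:
        return "enhancer"
    if signal.get("CTCF", 0.0) >= MARK_HIGH:
        return "insulator"
    return "none"


def call_element_from_classes(probs: Dict[str, float]) -> str:
    """With categorical output, the call is simply the most probable class."""
    return max(probs, key=probs.get)


# Made-up values for a single region, used only to exercise both functions.
region_tracks = {"ATAC": 0.9, "H3K4me3": 0.1, "H3K27ac": 0.8, "CTCF": 0.2}
region_probs = {"enhancer": 0.7, "promoter": 0.2, "insulator": 0.1}
print(call_element_from_tracks(region_tracks))
print(call_element_from_classes(region_probs))
```

The first function stands in for the analyst-maintained thresholding that continuous tracks require; the second shows how little is left to do once a model emits class probabilities.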

Key Features

  • Discrete CRE class outputs: CREP predicts categorical regulatory element annotations (enhancer, promoter, insulator) rather than continuous epigenomic signal tracks, so its output can be read directly without post-processing.
  • Enformer-based transfer learning: The model is a fine-tuned Enformer derivative, reusing Enformer's long-range convolution-transformer backbone and its 200kb receptive field while retraining a separate checkpoint for the classification objective.
  • REgulamentary-informed supervision: Training labels are derived from REgulamentary, the Hughes lab's de novo cis-regulatory element annotation method, providing consistent CRE definitions to learn from.
  • Multi-cell-type training: CREP is trained across multiple human cell types, allowing cell-type-aware annotation of regulatory architecture from sequence alone.
  • Variant and de novo element analysis: The authors demonstrate CREP on case studies including a Vanuatu population SNP and a de novo erythroid regulatory element, illustrating its use for interpreting sequence variation and newly emerged elements.

Technical Details

CREP inherits Enformer's hybrid convolutional-transformer architecture, which encodes one-hot DNA sequence through convolutional downsampling blocks followed by transformer self-attention layers operating over a 200kb context (capturing distal regulatory interactions up to roughly 100kb away). Instead of retaining Enformer's continuous multi-track regression objective, CREP trains a separate checkpoint with a classification objective that outputs discrete CRE class annotations. Supervision comes from REgulamentary-derived element annotations spanning multiple human cell types, which casts regulatory annotation as a sequence-to-label task layered on Enformer's pretrained representations. The preprint demonstrates the approach on interpretive case studies, including a Vanuatu-population single-nucleotide polymorphism and a de novo erythroid cis-regulatory element, to show how categorical predictions surface the functional consequences of specific sequence changes.
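As a rough illustration of the two ends of this pipeline, the sketch below one-hot encodes a DNA string and applies a softmax classification head to per-bin embedding vectors. It is not CREP's implementation: the linear head, the per-bin output, the embedding size, and the random weights are all assumptions made for the example, and the backbone itself is replaced by random stand-in embeddings.

```python
import math
import random
from typing import List

BASES = "ACGT"
CRE_CLASSES = ["enhancer", "promoter", "insulator"]  # classes named in the text


def one_hot_encode(seq: str) -> List[List[float]]:
    """One row per base in A, C, G, T order; unknown bases (e.g. N) stay all-zero."""
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in BASES:
            row[BASES.index(base)] = 1.0
        rows.append(row)
    return rows


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_bin(embedding: List[float], weights: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Hypothetical linear head plus softmax for one bin's embedding vector."""
    logits = [
        sum(e * w for e, w in zip(embedding, class_weights)) + b
        for class_weights, b in zip(weights, bias)
    ]
    return softmax(logits)


random.seed(0)
dim = 16  # assumed embedding size, chosen only for the example
# Random stand-ins for backbone output (8 bins) and for trained head weights.
embeddings = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]
weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in CRE_CLASSES]
bias = [0.0] * len(CRE_CLASSES)

probs = [classify_bin(e, weights, bias) for e in embeddings]
labels = [CRE_CLASSES[p.index(max(p))] for p in probs]
print(one_hot_encode("ACGTN"))
print(labels)
```

Swapping a regression output for a head like this is what turns a sequence-to-signal model into a sequence-to-label one; how CREP structures its actual head is not described in the text above.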

Applications

CREP is intended for researchers in regulatory genomics and human genetics who need interpretable annotations of noncoding DNA rather than continuous signal tracks. By emitting element classes directly, it supports annotating the regulatory architecture of a locus, prioritizing and interpreting noncoding variants by their predicted effect on element class, and characterizing newly arisen or cell-type-specific regulatory elements. The demonstrated case studies (a population-specific SNP and a de novo erythroid element) illustrate its applicability to disease-relevant variant interpretation and to studying the emergence of regulatory elements, particularly in hematopoietic and erythroid biology, a focus of the Hughes lab.
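Variant interpretation with a class-emitting model usually amounts to scoring the reference and alternate sequences and comparing the class probabilities. The sketch below shows that comparison under stated assumptions: `toy_predict` is a hypothetical stand-in that counts occurrences of the string GATA purely for illustration, and it is not CREP, for which no public code or weights were identified.

```python
from typing import Callable, Dict

ClassProbs = Dict[str, float]


def apply_snp(seq: str, position: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based position replaced by alt."""
    if not 0 <= position < len(seq):
        raise IndexError("position outside sequence")
    if len(alt) != 1:
        raise ValueError("alt must be a single base")
    return seq[:position] + alt + seq[position + 1:]


def class_shift(predict: Callable[[str], ClassProbs], ref_seq: str,
                position: int, alt: str) -> Dict[str, float]:
    """Per-class probability change (alt minus ref) for a single-base change."""
    ref_probs = predict(ref_seq)
    alt_probs = predict(apply_snp(ref_seq, position, alt))
    return {name: alt_probs[name] - ref_probs[name] for name in ref_probs}


def toy_predict(seq: str) -> ClassProbs:
    """Stand-in predictor, NOT CREP: enhancer score rises with GATA occurrences."""
    hits = seq.count("GATA")
    enhancer = min(0.9, 0.1 + 0.4 * hits)
    rest = (1.0 - enhancer) / 2
    return {"enhancer": enhancer, "promoter": rest, "insulator": rest}


ref = "CCGATTCC"
# Changing position 5 from T to A turns GATT into GATA in this toy sequence.
delta = class_shift(toy_predict, ref, 5, "A")
print(delta)
```

With a real model in place of `toy_predict`, a large positive shift toward one class at a variant site is the kind of signal the case studies describe, although the preprint's own scoring procedure may differ.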

Impact

CREP illustrates a broader trend of adapting large pretrained genomic models such as Enformer toward task-specific, interpretable outputs rather than using them only for raw signal imputation. By coupling Enformer's learned long-range regulatory grammar with REgulamentary's CRE definitions, it offers a route to annotation that is both sequence-driven and directly meaningful to experimentalists. CREP is a June 2026 bioRxiv preprint released under a CC BY-NC license and has not yet been peer reviewed, so its broader adoption and benchmarking remain to be established. At the time of release no public code or model-weights repository was identified, which currently limits independent reproduction and reuse. The case-study results should be read as demonstrative rather than as a comprehensive benchmark against existing CRE-annotation methods.

Citation

CREP: Cis-Regulatory Element Predictor Based on Fine-Tuned Enformer

Stranieri, N., et al. (2026) CREP: Cis-Regulatory Element Predictor Based on Fine-Tuned Enformer. openRxiv.

DOI: 10.64898/2026.06.05.730309


Openness

bio.rodeo openness: Closed · low usability and reproducibility
Overall: 8 · Closed
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 10
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

cis_regulatory_element_annotation, dna, regulatory_genomics, supervised, transfer_learning, transformer, variant_effect_prediction

Resources

Research Paper