eccDNAMamba

Bidirectional state-space (Mamba-2) genomic model for ultra-long extrachromosomal circular DNA, scaling linearly with sequence length.

Released: November 2025

Parameters: approximately 500 million

Extrachromosomal circular DNA (eccDNA) is a class of circular genetic elements that form outside chromosomes and play an increasingly recognized role in cancer, where amplified oncogenes on eccDNA drive tumor heterogeneity and drug resistance. Modeling these sequences is difficult: individual eccDNA molecules can span tens of kilobases, and their circular topology has no natural start or end. Existing genomic foundation models either rely on attention mechanisms whose cost grows quadratically with sequence length or truncate molecules into kilobase fragments, breaking sequence continuity.

eccDNAMamba, introduced in late 2025 by researchers at Brown University, is described by its authors as the first bidirectional state-space model purpose-built for eccDNA. It is based on the Mamba-2 architecture, whose selective state-space design scales linearly with input length, so the model can ingest ultra-long sequences in a single pass rather than splitting them into fragments. To respect the circular nature of eccDNA, the authors introduce a circular augmentation strategy that preserves the topology of each molecule during pretraining.

The model fills a gap left by chromosome-oriented genomic language models, which were not designed for the length and circularity of eccDNA. By pairing linear-time sequence modeling with topology-aware augmentation, eccDNAMamba offers a representation-learning backbone tailored to this emerging class of cancer-relevant genetic elements.

Key Features

Bidirectional Mamba-2 backbone: A selective state-space model processes sequences in both directions, capturing long-range dependencies while scaling linearly rather than quadratically with length.
Ultra-long context: The linear-time design lets the model read full eccDNA molecules spanning tens of kilobases without fragmenting them into shorter windows.
Circular augmentation: Because a circular molecule has no fixed start or end, a topology-preserving augmentation rotates each sequence during training, teaching the model to be invariant to where the linearized sequence begins.
Cancer-relevant tasks: Pretrained representations transfer to discriminating cancer versus healthy eccDNA and to copy-number-level prediction.
Open pretrained weights: Pretrained and fine-tuned checkpoints, along with the associated datasets, are released through the eccDNAMamba HuggingFace organization.
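
As a rough illustration of the first two features, the toy recurrence below shows why a state-space scan costs one update per position and how a sequence can be read in both directions. It is not the Mamba-2 kernel. The fixed decay and gain values, the function names, and the choice to sum the two directions are all assumptions made for this sketch, whereas the real model uses learned, input-dependent parameters.

```python
from typing import List


def linear_scan(xs: List[float], decay: float = 0.9, gain: float = 0.1) -> List[float]:
    """Run h_t = decay * h_{t-1} + gain * x_t left to right.

    One constant-cost update per position, so the total cost is O(L).
    """
    h = 0.0
    states = []
    for x in xs:
        h = decay * h + gain * x
        states.append(h)
    return states


def bidirectional_scan(xs: List[float], decay: float = 0.9, gain: float = 0.1) -> List[float]:
    """Combine a left-to-right and a right-to-left scan position by position.

    Summing the two directions is an assumption of this sketch; other
    combinations (concatenation, gating) are equally plausible.
    """
    forward = linear_scan(xs, decay, gain)
    # Scan the reversed input, then flip the states back into original order.
    backward = linear_scan(xs[::-1], decay, gain)[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    # A single impulse at position 0 decays to the right in the forward pass;
    # the backward pass only sees it when it reaches position 0.
    print(bidirectional_scan([1.0, 0.0, 0.0], decay=0.5, gain=1.0))
```

Each position's output depends on context from both sides, yet the work is two passes over the sequence, so doubling the sequence length doubles the cost rather than quadrupling it as full attention would.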

Technical Details

eccDNAMamba uses a BiMambaForMaskedLM formulation built on the Mamba-2 state-space architecture. It is pretrained with masked-language-style objectives on eccDNA sequences and released at roughly 0.5B parameters. During training, the circular augmentation strategy rotates each sequence, so the same molecule is seen from different start positions and its circular topology is preserved. The authors evaluate transfer to several downstream tasks: cancer versus healthy eccDNA classification, copy-number-level prediction at multiple thresholds, and real-versus-pseudo eccDNA discrimination across Homo sapiens, Gallus gallus, and Arabidopsis thaliana datasets. They report that the model outperforms existing genomic foundation models on cancer discrimination and copy-number prediction. The implementation is in PyTorch and uses the mamba_ssm and causal_conv1d kernels. Checkpoints and datasets are distributed on HuggingFace.
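
The rotation-based augmentation and the masked-token objective described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' pipeline: single-nucleotide tokens, the 15% mask rate, the "[MASK]" symbol, and all function names are assumptions chosen for the example.

```python
import random
from typing import List, Tuple


def rotate(seq: str, offset: int) -> str:
    """Return the same circular molecule read from a different start position."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def circular_augment(seq: str, rng: random.Random) -> str:
    """Pick a random start position; the circle itself is unchanged."""
    if not seq:
        return seq
    return rotate(seq, rng.randrange(len(seq)))


def canonical_rotation(seq: str) -> str:
    """Lexicographically smallest rotation, used here only to check that two
    linear strings describe the same circle (O(L^2), fine for short tests)."""
    if not seq:
        return seq
    return min(rotate(seq, i) for i in range(len(seq)))


def mask_tokens(
    tokens: List[str],
    rng: random.Random,
    mask_rate: float = 0.15,       # assumed rate, not taken from the paper
    mask_token: str = "[MASK]",    # assumed mask symbol
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of tokens; return the masked copy and the
    positions the model would be asked to reconstruct."""
    positions = [i for i in range(len(tokens)) if rng.random() < mask_rate]
    masked = list(tokens)
    for i in positions:
        masked[i] = mask_token
    return masked, positions


if __name__ == "__main__":
    rng = random.Random(0)
    circle = "ACGTTGCA"
    view = circular_augment(circle, rng)
    masked, targets = mask_tokens(list(view), rng)
    print(view, masked, targets)
```

Rotating before masking means that, across epochs, the junction between the arbitrary end and start of the stored string lands in the interior of the input, so context that spans the junction is not systematically cut.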

Applications

eccDNAMamba is aimed at cancer genomics researchers studying oncogene amplification, tumor heterogeneity, and treatment resistance driven by extrachromosomal DNA. Its ability to classify cancer versus healthy eccDNA and predict copy-number levels makes it useful for analyzing sequencing data where eccDNA content may serve as a biomarker, and its species-spanning real-versus-pseudo classifiers support filtering and validation of detected circular elements. As a pretrained backbone, it can be fine-tuned for new eccDNA classification or regression tasks with relatively small labeled datasets.

Impact

eccDNAMamba is an early example of adapting state-space architectures to a specialized genomic problem where sequence length and circular topology defeat standard attention-based genomic language models. By releasing open weights and curated datasets, the authors lower the barrier to eccDNA-focused modeling and provide a reusable foundation for a growing area of cancer research. As a recent preprint, its benchmark comparisons and downstream adoption remain to be validated by the broader community, and its evaluation focuses on the specific eccDNA tasks the authors curated rather than a wide genomic benchmark suite.

Citation

From Circles to Signals: Representation Learning on Ultra-Long Extrachromosomal Circular DNA

Preprint

Li, J., et al. (2025) From Circles to Signals: Representation Learning on Ultra-Long Extrachromosomal Circular DNA. bioRxiv.

DOI: 10.1101/2025.11.22.689941

Citations

Total citations: 0

Influential: 0

References: 31

GitHub

Stars: 5

Forks: 1

Open issues: 2

Contributors: 2

Last push: 8 months ago

Language: Python

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Overall score: 54 (Partial)

Usability (can I run it?): 54

Reproducibility (can I retrain it?): 48

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository

Research Paper

HuggingFace Model
