bio.rodeo
© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene foundation models

eccDNAMamba

Brown University

Bidirectional state-space (Mamba-2) genomic model for ultra-long extrachromosomal circular DNA, scaling linearly with sequence length.

Released: November 2025
Parameters: 500 Million

Extrachromosomal circular DNA (eccDNA) is a class of circular genetic elements that form outside chromosomes and play an increasingly recognized role in cancer, where amplified oncogenes on eccDNA drive tumor heterogeneity and drug resistance. Modeling these sequences is difficult: individual eccDNA molecules can span tens of kilobases, and their circular topology has no natural start or end. Existing genomic foundation models either rely on attention mechanisms whose cost grows quadratically with sequence length or truncate molecules into kilobase fragments, breaking sequence continuity.
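The arithmetic below makes the quadratic-versus-linear contrast concrete. It counts the dominant per-layer work for full self-attention, which computes one score per token pair, and for a recurrent state-space scan, which performs one state update per token. This is a rough sketch under hypothetical assumptions: one token per nucleotide and a state size of 64. Real models add hidden-size factors and constants that are ignored here.

```python
# Back-of-the-envelope cost comparison for one layer, ignoring constants
# and the hidden dimension. Assumptions (hypothetical): one token per
# nucleotide, and a fixed state size of 64 for the state-space scan.


def attention_pairs(n_tokens: int) -> int:
    """Full self-attention scores every token pair: grows as n^2."""
    return n_tokens * n_tokens


def scan_updates(n_tokens: int, state_size: int = 64) -> int:
    """A recurrent scan updates a fixed-size state once per token: grows as n."""
    return n_tokens * state_size


for length in (1_000, 10_000, 50_000):
    ratio = attention_pairs(length) / scan_updates(length)
    print(f"{length:>6} tokens: attention does {ratio:,.1f}x more work")
```

The ratio works out to n / state_size, so attention's relative penalty itself grows linearly with length. At 50,000 tokens, attention computes 2.5 billion pairwise scores per layer.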

eccDNAMamba, introduced in late 2025 by researchers at Brown University, is described by its authors as the first bidirectional state-space model purpose-built for eccDNA. It builds on the Mamba-2 architecture, whose selective state-space design scales linearly with input length. This lets the model ingest ultra-long sequences in a single pass rather than chopping them into pieces. To respect the circular nature of eccDNA, the authors introduce a circular augmentation strategy that preserves the topology of each molecule during pretraining.
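The exact augmentation recipe is not spelled out here, so the following is a minimal sketch of the simplest form of circular augmentation: pick a random start position on the circle and read the molecule from there. The helper names `rotate_circular`, `random_rotation`, and `is_rotation` are hypothetical. The check in `is_rotation` relies on a standard fact: two equal-length strings describe the same circle exactly when one occurs inside the other concatenated with itself.

```python
import random


def rotate_circular(seq: str, offset: int) -> str:
    """Read the same circular molecule starting `offset` bases later."""
    if not seq:
        return seq
    k = offset % len(seq)  # offsets wrap around the circle
    return seq[k:] + seq[:k]


def random_rotation(seq: str, rng: random.Random) -> str:
    """Hypothetical augmentation: choose a uniformly random start point."""
    if not seq:
        return seq
    return rotate_circular(seq, rng.randrange(len(seq)))


def is_rotation(a: str, b: str) -> bool:
    """True when a and b are linearizations of the same circular sequence."""
    return len(a) == len(b) and b in a + a


rng = random.Random(0)
molecule = "ATGCGTACCTTAG"
augmented = random_rotation(molecule, rng)
print(augmented, is_rotation(molecule, augmented))
```

Every rotation is a valid reading of the same molecule. Sampling a fresh offset each epoch therefore exposes a model to many linearizations of one circle, which is one way to encourage invariance to where the sequence is "cut open."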

The model fills a gap left by chromosome-oriented genomic language models, which were not designed for the length and circularity of eccDNA. By pairing linear-time sequence modeling with topology-aware augmentation, eccDNAMamba offers a representation-learning backbone tailored to this emerging class of cancer-relevant genetic elements.

#Key Features

  • Bidirectional Mamba-2 backbone: A selective state-space model processes sequences in both directions, capturing long-range dependencies while scaling linearly rather than quadratically with length.
  • Ultra-long context: The linear-time design lets the model read full eccDNA molecules spanning tens of kilobases without fragmenting them into shorter windows.
  • Circular augmentation: A topology-preserving augmentation reflects the start/end ambiguity of circular DNA, teaching the model invariance to rotation of the sequence.
  • Cancer-relevant tasks: Pretrained representations transfer to discriminating cancer versus healthy eccDNA and to copy-number-level prediction.
  • Open pretrained weights: Pretrained and fine-tuned checkpoints, along with the associated datasets, are released through the eccDNAMamba HuggingFace organization.
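A toy one-channel recurrence can illustrate the bidirectional, linear-time idea behind the first two features. This is not Mamba-2. The input-dependent decay below is a hypothetical stand-in for selectivity. Summing a left-to-right pass with a right-to-left pass is one common way to make a causal scan bidirectional; how eccDNAMamba actually combines its two directions is not specified here.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(x: List[float], w: float = 1.0, b: float = 0.0) -> List[float]:
    """Toy causal recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The decay a_t depends on the current input (a hypothetical stand-in for
    selectivity). A single pass over x, so the cost is O(len(x)).
    """
    h = 0.0
    out = []
    for xt in x:
        a = sigmoid(w * xt + b)
        h = a * h + (1.0 - a) * xt
        out.append(h)
    return out


def bidirectional_scan(x: List[float], w: float = 1.0, b: float = 0.0) -> List[float]:
    """Scan forward and on the reversed input, flip the second result back,
    and add position-wise so every output sees context from both sides."""
    fwd = selective_scan(x, w, b)
    bwd = selective_scan(x[::-1], w, b)[::-1]
    return [f + r for f, r in zip(fwd, bwd)]


signal = [1.0, 0.0, 0.0, 0.0, 2.0]
print(selective_scan(signal))      # position 0 cannot see the 2.0 at the end
print(bidirectional_scan(signal))  # position 0 now depends on it
```

Both directions remain single linear passes, so making the model bidirectional roughly doubles the work rather than squaring it.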

#Technical Details

eccDNAMamba uses a BiMambaForMaskedLM formulation built on the Mamba-2 state-space architecture. It is pretrained with masked-language-style objectives on eccDNA sequences and released at roughly 0.5B parameters. The circular augmentation strategy rotates sequences to preserve eccDNA topology during training. The authors evaluate transfer to several downstream tasks: cancer versus healthy eccDNA classification, copy-number-level prediction at multiple thresholds, and real-versus-pseudo eccDNA discrimination across Homo sapiens, Gallus gallus, and Arabidopsis thaliana datasets. They report that the model outperforms existing genomic foundation models on cancer discrimination and copy-number prediction. The implementation is in PyTorch with the mamba_ssm and causal_conv1d kernels, and checkpoints and datasets are distributed on HuggingFace.
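A masked-language objective hides part of the input and trains the model to restore it. The sketch below shows only the masking step. It assumes single-nucleotide tokens, a hypothetical `[MASK]` token, and a 15% masking rate borrowed from BERT-style pretraining; eccDNAMamba's actual tokenizer, rate, and replacement scheme may differ.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical mask token


def mask_tokens(
    tokens: List[str], mask_rate: float, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of positions with MASK.

    Returns the corrupted sequence and the masked positions. During
    pretraining, the loss is computed only at those positions, against
    the original tokens.
    """
    if not tokens:
        return [], []
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = MASK
    return corrupted, positions


rng = random.Random(0)
seq = list("ATGCGTACCTTAGGCATTAC")  # one token per nucleotide (assumption)
corrupted, positions = mask_tokens(seq, mask_rate=0.15, rng=rng)
targets = [seq[i] for i in positions]  # what the model must predict
print(corrupted)
print(positions, targets)
```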

#Applications

eccDNAMamba is aimed at cancer genomics researchers studying oncogene amplification, tumor heterogeneity, and treatment resistance driven by extrachromosomal DNA. Its ability to classify cancer versus healthy eccDNA and predict copy-number levels makes it useful for analyzing sequencing data where eccDNA content may serve as a biomarker, and its species-spanning real-versus-pseudo classifiers support filtering and validation of detected circular elements. As a pretrained backbone, it can be fine-tuned for new eccDNA classification or regression tasks with relatively small labeled datasets.
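A common fine-tuning pattern is to pool the backbone's per-position embeddings into one vector per molecule and train a small head on top. The sketch below shows that pattern; it is a minimal illustration, not the authors' recipe. The hidden states, weights, and function names are hypothetical placeholders for backbone outputs and learned parameters.

```python
import math
from typing import List, Sequence


def mean_pool(hidden: Sequence[Sequence[float]]) -> List[float]:
    """Average per-position embeddings into one fixed-size vector."""
    if not hidden:
        raise ValueError("need at least one position")
    n, dim = len(hidden), len(hidden[0])
    return [sum(row[d] for row in hidden) / n for d in range(dim)]


def logistic_head(pooled: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear layer plus sigmoid: a probability for a binary task such as
    cancer versus healthy."""
    z = sum(p * w for p, w in zip(pooled, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 3-position, 2-dimensional hidden states standing in for
# backbone outputs on one eccDNA molecule.
hidden_states = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.1]]
pooled = mean_pool(hidden_states)
prob = logistic_head(pooled, weights=[1.5, -0.5], bias=0.0)
print(pooled, prob)
```

For a regression target such as copy number, the sigmoid would be dropped and the linear output used directly. With small labeled datasets, keeping the backbone frozen and training only the head is a typical starting point.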

#Impact

eccDNAMamba is an early example of adapting state-space architectures to a specialized genomic problem where sequence length and circular topology defeat standard attention-based genomic language models. By releasing open weights and curated datasets, the authors lower the barrier to eccDNA-focused modeling and provide a reusable foundation for a growing area of cancer research. As a recent preprint, its benchmark comparisons and downstream adoption remain to be validated by the broader community, and its evaluation focuses on the specific eccDNA tasks the authors curated rather than a wide genomic benchmark suite.

Tags

variant_effect_prediction, sequence_classification, state_space_model, mamba, foundation_model, self_supervised, dna, genomics