bio.rodeo

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene

DamageFormer

University of Florida

Multimodal deep-learning framework that detects and localizes DNA lesions directly from native nanopore sequencing, built on the damage-aware LesionBERT foundation model.

Released: May 2026

DNA lesions—chemically altered or damaged bases arising from oxidation, alkylation, UV exposure, and other insults—are central to mutagenesis, aging, and disease, yet they are difficult to map directly because conventional sequencing chemistries are blind to most non-canonical base modifications. DamageFormer addresses this gap with a multimodal deep-learning framework that detects and localizes DNA lesions directly from native (PCR-free) nanopore sequencing reads, exploiting the subtle perturbations a damaged base imparts on the raw ionic current signal as DNA translocates through the pore.

Developed by Yang, Li, Ma, and Yin at the University of Florida's Department of Health Outcomes & Biomedical Informatics (HOBI / Yin Lab) and posted to bioRxiv in May 2026, DamageFormer pairs a damage-aware genomic foundation model with a dedicated nanopore signal encoder. Its core component, LesionBERT, is fine-tuned from DNABERT-2 using lesion-focused masked-reconstruction objectives so that the language-model representation becomes sensitized to the sequence context surrounding damage. This sequence representation is then fused with features from the raw-signal encoder to produce per-position lesion predictions.
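The summary does not spell out how the lesion-focused masked-reconstruction objective decides what to mask. A minimal sketch, assuming one plausible scheme (tokens within a fixed window of a known lesion are masked with high probability, all other tokens with low probability), is shown below. The function name, the `window`, `p_near`, and `p_far` parameters, and the one-token-per-base tokenization are all hypothetical; DNABERT-2 itself uses a BPE tokenizer, so its real tokens can span several bases.

```python
import random
from typing import List, Sequence, Set, Tuple

MASK = "[MASK]"


def lesion_focused_mask(
    tokens: Sequence[str],
    lesion_positions: Set[int],
    window: int = 2,
    p_near: float = 0.5,
    p_far: float = 0.05,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Hypothetical lesion-focused masking for masked-reconstruction training.

    Tokens within `window` positions of a known lesion are masked with
    probability `p_near`; all other tokens with probability `p_far`.
    Returns the masked sequence and the masked indices, whose original
    tokens would serve as the reconstruction targets.
    """
    rng = random.Random(seed)  # seeded so the mask is reproducible
    near = {
        i
        for pos in lesion_positions
        for i in range(pos - window, pos + window + 1)
        if 0 <= i < len(tokens)
    }
    masked = list(tokens)
    targets: List[int] = []
    for i in range(len(tokens)):
        p = p_near if i in near else p_far
        if rng.random() < p:
            masked[i] = MASK
            targets.append(i)
    return masked, targets


# One token per base for readability (an assumption; DNABERT-2 uses BPE tokens).
seq = list("ACGTACGTAC")
masked, targets = lesion_focused_mask(seq, {5}, window=1, p_near=1.0, p_far=0.0)
print("".join(t if t != MASK else "_" for t in masked), targets)  # ACGT___TAC [4, 5, 6]
```

Setting `p_near` well above `p_far` concentrates the reconstruction loss on damage-adjacent context, which is the intuition behind "sensitizing" the representation to lesions.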

What distinguishes DamageFormer is its generalization: the model transfers zero-shot to chemically distinct lesion types it never saw during training, reaching an AUROC of 0.99997 on held-out damage chemistries. This suggests the framework learns transferable signatures of structural distortion rather than memorizing individual lesion fingerprints.
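To put the reported figure in perspective: AUROC equals the probability that a randomly chosen lesion-containing example is scored higher than a randomly chosen lesion-free one, with ties counted as one half. An AUROC of 0.99997 therefore means that only about 3 in every 100,000 such pairs are ranked the wrong way round. The sketch below computes AUROC directly from that pairwise definition; it is illustrative, runs in O(n_pos · n_neg) time, and is not the evaluation code used in the preprint. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a correct ordering.
    Pairwise O(n_pos * n_neg): fine for illustration, slow for large sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy scores: label 1 = lesion present, 0 = undamaged.
print(auroc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```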

#Key Features

  • Native nanopore detection: Operates directly on raw, PCR-free nanopore signal, so labile and non-canonical lesions are preserved rather than erased by amplification or bisulfite-style conversion chemistry.
  • LesionBERT foundation model: A damage-aware genomic encoder fine-tuned from DNABERT-2 with lesion-focused masked-reconstruction objectives plus a LoRA adapter, enabling efficient specialization without retraining the full backbone.
  • Multimodal adaptive gating: A CNN/BiLSTM signal encoder is fused with the LesionBERT sequence representation through an adaptive gating mechanism that learns how much to weight each modality per position.
  • Zero-shot cross-chemistry generalization: Detects chemically distinct lesion types absent from training, achieving AUROC 0.99997 and indicating the model captures general damage signatures.
  • Per-position localization: Beyond binary detection, the framework localizes lesions along the read, supporting fine-grained damage mapping.
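The feature list does not say how per-position scores become reported lesion sites. One common post-processing step, sketched here as an assumption rather than as DamageFormer's documented behavior, thresholds per-position probabilities and merges nearby calls into intervals. `call_lesion_intervals`, `threshold`, and `max_gap` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def call_lesion_intervals(
    probs: Sequence[float],
    threshold: float = 0.5,
    max_gap: int = 0,
) -> List[Tuple[int, int]]:
    """Hypothetical post-processing: per-position probabilities -> intervals.

    Positions with probability >= threshold are called as lesions. A call is
    merged into the previous interval when at most `max_gap` uncalled
    positions separate them. Intervals are half-open: [start, end).
    """
    intervals: List[Tuple[int, int]] = []
    for i, p in enumerate(probs):
        if p < threshold:
            continue
        if intervals and i - intervals[-1][1] <= max_gap:
            start, _ = intervals[-1]
            intervals[-1] = (start, i + 1)  # extend the current interval
        else:
            intervals.append((i, i + 1))  # start a new interval
    return intervals


probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.95]
print(call_lesion_intervals(probs))             # [(1, 3), (4, 5), (7, 8)]
print(call_lesion_intervals(probs, max_gap=1))  # [(1, 5), (7, 8)]
```

A larger `max_gap` trades positional resolution for fewer fragmented calls around a single damaged stretch.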

#Technical Details

DamageFormer is a two-branch architecture. The sequence branch, LesionBERT, inherits the DNABERT-2 transformer backbone and is adapted via lesion-focused masked-language modeling together with a low-rank (LoRA) adapter for parameter-efficient fine-tuning. The signal branch encodes raw nanopore current with convolutional layers followed by a bidirectional LSTM to capture local and sequential dependencies in the translocation trace. The two modality embeddings are combined by an adaptive gating module that dynamically weights sequence-context versus signal evidence before a prediction head emits lesion calls. Inference is run through inference_multimodal.py, which loads a trained model from the path given by --checkpoint together with the foundation-model weights in the directory given by --pretrained_dir. On zero-shot evaluation against lesion chemistries excluded from training, the framework reports an AUROC of 0.99997.
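The LoRA adapter described above keeps the pretrained weight matrix W frozen and learns a low-rank update, so an adapted layer computes y = xWᵀ + (α/r)·xAᵀBᵀ, with A of shape r × d_in and B of shape d_out × r. The pure-Python sketch below follows the standard LoRA formulation, including zero-initialized B so that training starts from the unmodified backbone. The rank, the scaling α, and the initialization scale are assumptions, since DamageFormer's LoRA settings are not given here.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def init_lora(d_in: int, d_out: int, r: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """A: r x d_in with small random values; B: d_out x r of zeros (standard LoRA init)."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def lora_linear(x: Matrix, W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and A, B trainable."""
    r = len(A)
    base = matmul(x, transpose(W))                         # frozen path
    delta = matmul(matmul(x, transpose(A)), transpose(B))  # low-rank path
    scale = alpha / r
    return [[b + scale * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]


def lora_param_counts(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """(frozen parameters in W, trainable parameters in A and B)."""
    return d_in * d_out, r * (d_in + d_out)


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # frozen weights: d_out=3, d_in=2
x = [[1.0, 1.0]]
A, B = init_lora(d_in=2, d_out=3, r=1)
print(lora_linear(x, W, A, B, alpha=2.0))  # B is zero, so output = x W^T: [[3.0, 7.0, 11.0]]
print(lora_param_counts(768, 768, 8))      # (589824, 12288)
```

For an illustrative 768 × 768 projection at rank 8, the adapter trains 12,288 parameters against 589,824 frozen ones, which is why LoRA specialization is cheap compared with retraining the full backbone.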
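The adaptive gating step can be pictured as a learned, per-position convex combination of the two modality embeddings. The sketch below assumes both branches have already produced same-width embeddings for each position and uses a single scalar gate g = σ(w_seq·s + w_sig·q + b); DamageFormer's actual gate may be vector-valued or parameterized differently, and the names `gated_fusion` and `lesion_head` are hypothetical. A logistic head then maps each fused vector to a per-position lesion probability.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def gated_fusion(
    seq_emb: Sequence[Vector],
    sig_emb: Sequence[Vector],
    w_seq: Vector,
    w_sig: Vector,
    bias: float,
) -> Tuple[List[Vector], List[float]]:
    """Per-position fusion h = g*s + (1-g)*q, with g = sigmoid(w_seq.s + w_sig.q + b)."""
    fused, gates = [], []
    for s, q in zip(seq_emb, sig_emb):
        g = sigmoid(dot(w_seq, s) + dot(w_sig, q) + bias)
        gates.append(g)
        fused.append([g * a + (1.0 - g) * b for a, b in zip(s, q)])
    return fused, gates


def lesion_head(fused: Sequence[Vector], v: Vector, c: float) -> List[float]:
    """Logistic prediction head: one lesion probability per position."""
    return [sigmoid(dot(v, h) + c) for h in fused]


# Two positions, 2-dimensional embeddings from each (hypothetical) branch.
seq_emb = [[1.0, 0.0], [0.2, 0.4]]  # sequence-context embeddings
sig_emb = [[0.0, 1.0], [0.9, 0.7]]  # raw-signal embeddings
fused, gates = gated_fusion(seq_emb, sig_emb, w_seq=[0.0, 0.0], w_sig=[0.0, 0.0], bias=0.0)
print(gates)     # zero weights give g = 0.5 everywhere: [0.5, 0.5]
print(fused[0])  # equal mix of both modalities: [0.5, 0.5]
print(lesion_head(fused, v=[1.0, -1.0], c=0.0))
```

A gate near 1 leans on the sequence context and a gate near 0 on the signal evidence; because g is computed from both inputs, the weighting can change from one position to the next.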
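Only two arguments of inference_multimodal.py are named here: --checkpoint and --pretrained_dir. The helper below assembles a command line from just those two; it is a hypothetical convenience wrapper, the paths are placeholders rather than files shipped with the repository, and the actual script may require further arguments, so check its argument parser before running it.

```python
import shlex
import sys
from typing import List


def build_inference_command(
    checkpoint: str,
    pretrained_dir: str,
    script: str = "inference_multimodal.py",
) -> List[str]:
    """Hypothetical helper: build an argv list for the DamageFormer inference script.

    Only the two flags named in the description are included; the real
    script may need additional arguments.
    """
    return [
        sys.executable,  # run with the current Python interpreter
        script,
        "--checkpoint", checkpoint,
        "--pretrained_dir", pretrained_dir,
    ]


# Placeholder paths, not files provided by the repository.
cmd = build_inference_command("path/to/checkpoint", "path/to/lesionbert")
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```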

#Applications

DamageFormer is aimed at researchers studying genome integrity, DNA-repair biology, mutagenesis, environmental and chemical genotoxicity, and aging, where knowing precisely where lesions occur is essential. Because it reads native nanopore signal rather than amplified DNA, it suits workflows that must preserve fragile modifications, and its zero-shot transfer makes it attractive for surveying novel or uncharacterized damage chemistries without assembling new labeled training sets for each.

#Impact

DamageFormer demonstrates that pairing a damage-aware genomic foundation model with raw-signal encoding can turn standard nanopore sequencing into a direct DNA-damage assay, and its strong cross-chemistry generalization points toward a single tool that maps diverse lesion types. As a recent (May 2026) bioRxiv preprint, its real-world adoption and independent benchmarking remain to be established, and the near-perfect reported AUROC warrants validation on broader, biologically realistic datasets. The code is released under the MIT license on GitHub; the pretrained weights are stated to be available but are not yet present in the repository tree (which currently contains only source code), and no separate license is specified for the weights themselves.

Citation

Yang, Q., et al. (2026) DamageFormer: a damage-aware multimodal deep learning framework for DNA lesion identification from nanopore sequencing. bioRxiv.

DOI: 10.64898/2026.05.14.725245

Openness

Unclassified (missing required components)

Tags

bert, bilstm, cnn, dna, dna_damage_detection, foundation_model, genomics, multimodal, nanopore, sequence_classification, transfer_learning, transformer, variant_effect_prediction, zero_shot

Resources

GitHub Repository
Research Paper