bio.rodeo


scDNAm-GPT

Guangzhou Medical University / Guangzhou National Laboratory

A foundation model using Mamba and cross-attention to capture genome-wide CpG methylation dependencies in single-cell whole-genome bisulfite sequencing data.

Released: February 2025

scDNAm-GPT is a foundation model for single-cell DNA methylation analysis, developed by researchers at Guangzhou Medical University and Guangzhou National Laboratory and first posted to bioRxiv in February 2025. It addresses a persistent gap in single-cell epigenomics: while transformer-based foundation models have reshaped single-cell RNA sequencing analysis, single-cell whole-genome bisulfite sequencing (scWGBS) has lacked a comparable general-purpose model. The central challenge is scale: a single methylome contains millions of CpG sites, far more than the context length that standard transformers can process efficiently.

The model captures genome-wide CpG methylation dependencies across extremely long genomic sequences, enabling a single pretrained backbone to support multiple downstream tasks without task-specific retraining. Rather than treating methylation as a fixed feature matrix, scDNAm-GPT learns representations directly from the raw methylome, allowing it to generalize across tissues, species, and analysis goals.

scDNAm-GPT was trained on over one million single cells spanning 35 human and mouse tissues, making it one of the first broadly applicable foundation models purpose-built for the single-cell methylation modality. It is released as open source, positioning it alongside RNA-focused single-cell foundation models while extending the foundation-model paradigm into the epigenetic layer of cellular identity.

Key Features

  • Genome-scale context: A selective state space (Mamba) backbone combined with cross-attention processes very long CpG sequences—up to tens of millions of sites—that exceed the practical limits of conventional self-attention.
  • Universal pretraining: A single model trained on more than one million single cells from 35 human and mouse tissues serves as a shared backbone across diverse downstream tasks.
  • Zero-shot gene expression prediction: The model infers gene expression patterns directly from methylation signals without task-specific training.
  • Trajectory inference: Learned representations support reconstruction of developmental and differentiation trajectories from methylome data.
  • Cell-free DNA deconvolution: The model deconvolutes cfDNA methylation mixtures to estimate tissue-of-origin contributions, a capability relevant to liquid-biopsy applications.
  • Multiple released variants: Three pretrained checkpoints are provided: a human/mouse brain model, a human body/mouse model, and a compact "small" model.
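
To make the "genome-scale context" point concrete, the following back-of-the-envelope sketch compares the work done by full self-attention with that of a linear-time scan. The sequence length of 20 million sites is an illustrative stand-in for "tens of millions", and the memory figure assumes the full score matrix is materialized at 2 bytes per entry; neither number comes from the scDNAm-GPT paper or repository.

```python
def attention_pairs(n_sites: int) -> int:
    """Pairwise scores that full self-attention computes over n_sites tokens."""
    return n_sites * n_sites


def scan_steps(n_sites: int) -> int:
    """State updates performed by a single linear-time pass over n_sites tokens."""
    return n_sites


# Illustrative sequence length (assumption): "tens of millions" of CpG sites.
n = 20_000_000

pairs = attention_pairs(n)
steps = scan_steps(n)
ratio = pairs // steps

# Assumption: the score matrix is stored explicitly at 2 bytes per entry.
score_matrix_terabytes = pairs * 2 / 1e12

print(f"self-attention scores: {pairs:.1e}")
print(f"scan updates:          {steps:.1e}")
print(f"ratio:                 {ratio:,}x")
print(f"score matrix size:     {score_matrix_terabytes:.0f} TB")
```

Under these assumptions the quadratic term is twenty million times larger than the linear one, which is why a sub-quadratic backbone matters at this sequence length.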

Technical Details

scDNAm-GPT pairs a Mamba selective state space model with cross-attention to model long-range dependencies among CpG sites efficiently. State space models scale near-linearly with sequence length, whereas the cost of standard self-attention grows quadratically, so the architecture can ingest sequences far longer than standard transformers can handle while retaining the ability to relate distant methylation events. Pretraining used scWGBS data from over one million single cells across 35 human and mouse tissues, and the authors report strong cell-type classification accuracy across human-body and brain cell types. The repository provides three model variants: a human/mouse brain model, a human body/mouse model, and a compact "small" model. Pretrained weights are distributed via Google Drive, and tutorial notebooks demonstrate clustering, expression prediction, trajectory inference, and cfDNA deconvolution.
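
The two building blocks named above can be illustrated with a toy sketch. This is not the authors' implementation: `ssm_scan` is a plain scalar linear recurrence with fixed parameters (Mamba additionally makes its parameters depend on the input, which is omitted here), and `cross_attention` shows the generic mechanism of a few query vectors attending over a long sequence. How scDNAm-GPT actually wires these parts together is not described on this page, so all function names, shapes, and values are assumptions for illustration.

```python
import math
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear state space recurrence:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    One pass over the sequence with a constant-size state, so cost is O(n).
    """
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t
        y.append(c * h)
    return y


def cross_attention(
    queries: List[List[float]],
    keys: List[List[float]],
    values: List[List[float]],
) -> List[List[float]]:
    """Scaled dot-product cross-attention.

    Each of the q queries attends over all n keys, so cost is O(q * n)
    rather than the O(n * n) of self-attention over the full sequence.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
        )
    return outputs


# A methylated site at position 0 still influences later outputs, decaying by `a`.
decay = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0)

# With identical keys the attention weights are uniform, so the output is the mean value.
pooled = cross_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    values=[[0.0], [1.0], [2.0]],
)
```

The scan carries information forward through its state without ever comparing pairs of positions, which is where the near-linear scaling comes from; the cross-attention step stays cheap as long as the number of queries is small relative to the sequence length.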

Applications

scDNAm-GPT supports researchers studying epigenetic regulation, cell-type identity, and development through single-cell methylation data. Its zero-shot gene expression prediction lets investigators link methylation states to transcriptional output without paired multi-omic measurements, while trajectory inference aids studies of differentiation and lineage commitment. The cell-free DNA deconvolution capability is particularly relevant to liquid-biopsy and non-invasive diagnostics, where estimating the tissue of origin of circulating methylation signals can inform cancer detection and monitoring.
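
For readers unfamiliar with tissue-of-origin deconvolution, the sketch below shows the classical reference-based formulation of the problem: a cfDNA methylation profile is modeled as a non-negative mixture of per-tissue reference profiles, and the mixing fractions are recovered by constrained least squares. This is a generic baseline technique, not scDNAm-GPT's method, and the reference matrix, tissue labels, and fractions are invented for illustration.

```python
from typing import List


def deconvolve(
    reference: List[List[float]], mixture: List[float], n_iter: int = 2000
) -> List[float]:
    """Estimate tissue fractions by projected gradient descent.

    reference[i][k] is the methylation level of marker i in tissue k and
    mixture[i] is the observed level of marker i in the cfDNA sample.
    Minimizes ||reference @ w - mixture||^2 subject to w >= 0, then
    normalizes w to sum to 1.
    """
    n_markers = len(reference)
    n_tissues = len(reference[0])
    # The sum of squared entries equals trace(R^T R), which is at least the
    # largest eigenvalue of R^T R, so this step size is conservative.
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / n_tissues] * n_tissues
    for _ in range(n_iter):
        residual = [
            sum(r * wk for r, wk in zip(row, w)) - m
            for row, m in zip(reference, mixture)
        ]
        grad = [
            sum(reference[i][k] * residual[i] for i in range(n_markers))
            for k in range(n_tissues)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wk - step * g) for wk, g in zip(w, grad)]
    total = sum(w)
    return [wk / total for wk in w] if total > 0 else w


# Hypothetical reference atlas: 4 marker regions (rows) x 3 tissues (columns).
reference = [
    [0.9, 0.1, 0.2],
    [0.1, 0.8, 0.3],
    [0.2, 0.2, 0.9],
    [0.5, 0.5, 0.1],
]
true_fractions = [0.7, 0.2, 0.1]  # made-up ground truth for the demo

# Simulate a noise-free cfDNA sample as the weighted mix of the tissue profiles.
mixture = [sum(r * f for r, f in zip(row, true_fractions)) for row in reference]

estimated = deconvolve(reference, mixture)
print([round(f, 3) for f in estimated])
```

On this noise-free toy input the estimate recovers the simulated fractions; real cfDNA data is noisy and sparse, which is part of the motivation for learned approaches.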

Impact

By bringing the foundation-model paradigm to single-cell whole-genome bisulfite sequencing, scDNAm-GPT helps close a gap between the rapidly maturing ecosystem of RNA-based single-cell models and the comparatively underserved methylation modality. Its use of a state space backbone to handle genome-scale CpG context offers a practical template for modeling other ultra-long biological sequences. The work is currently a preprint, released with open code (MIT license) and pretrained weights, so its long-term influence and benchmark standing remain to be established through peer review and independent evaluation. It nonetheless represents a notable early step toward general-purpose single-cell epigenomic models.

Citation

scDNAm-GPT Captures Genome-wide CpG Dependencies in Single-cell DNA methylomes to Revolutionize Epigenetic Analysis

Preprint

Liang, C., et al. (2025) scDNAm-GPT Captures Genome-wide CpG Dependencies in Single-cell DNA methylomes to Revolutionize Epigenetic Analysis. bioRxiv.

DOI: 10.1101/2025.02.19.638959

Recent citations

Papers that recently cited this model.

  • Artificial intelligence for comprehensive DNA methylation analysis: overview, challenges, and future directions

    Aymane Aghziel, M. A. Mahraz, H. Tairi, et al.

    Briefings Bioinform. · Aug 2025

    4 · Influential

Top citations

The most-cited papers that cite this model.

  • Artificial intelligence for comprehensive DNA methylation analysis: overview, challenges, and future directions

    Aymane Aghziel, M. A. Mahraz, H. Tairi, et al.

    Briefings Bioinform. · Aug 2025

    4 · Influential

Citations

Total Citations: 1
Influential: 1
References: 31

GitHub

Stars: 19
Forks: 6
Open Issues: 2
Contributors: 2
Last Push: 6mo ago
Language: Jupyter Notebook
License: MIT

Fields of citing research

  • Biology: 100%
  • Computer Science: 100%
  • Medicine: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
78 · Open
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 56
Model Openness Framework: Class III (Open Model)

Tags

cell_free_dna, cell_type_annotation, cross_attention, dna_methylation, foundation_model, gene_expression, state_space_model, trajectory_inference, zero_shot

Resources

GitHub Repository · Research Paper