Decoding the sequence-based logic that drives cell fate transitions is one of the central challenges in regulatory genomics. While chromatin accessibility assays like scATAC-seq reveal which genomic regions are open in each cell state, converting that information into mechanistic understanding — which transcription factors are active, how their activities change during differentiation, and how the interplay of sequence features and cell topology shapes regulatory dynamics — requires modeling that integrates DNA sequence, epigenomic signal, and cell-cell relationships simultaneously. MuBind was developed to address this integrative challenge within a single, unified deep learning architecture.

MuBind was developed by Ignacio L. Ibarra, Jonas Schneeberger, Erkan Erdogan, Linda Redl, Lara Martens, Dominik Klein, Hananeh Aliee, and Fabian J. Theis at the Institute of Computational Biology, Helmholtz Center Munich (Theis Lab). The preprint was posted on bioRxiv in August 2024. The model distinguishes itself from existing sequence-to-activity models by incorporating cell transition graphs — dynamic relationships between cell states derived from RNA-based trajectory models — as a structural prior that guides how motif activities are propagated between neighboring cells during training. This allows MuBind to learn not just which motifs are active in individual cell clusters but how motif activity evolves across the developmental landscape.

The integration of graph structure into a sequence-activity model addresses a fundamental limitation of existing approaches: standard models like ChromBPNet and DeepSEA predict genomic signal from sequence features independently for each genomic locus or cell state, without encoding information about how regulatory programs evolve across a continuous developmental trajectory. MuBind's graph component allows the model to borrow statistical strength across related cell states, improving the identification of TFs that drive transitions between them — precisely the regulators most relevant to developmental and disease biology.

Key Features

Joint sequence-activity-graph modeling: MuBind integrates DNA sequence features (via a convolutional sequence encoder), cell-based motif activities (estimated per cell or cluster), and cell neighborhood relationships (via a graph neural network) in a single end-to-end architecture.
Cell transition graph priors: Graphs derived from RNA-based dynamical models (e.g., RNA velocity) are used as structural priors in the GNN component, providing biological knowledge about cell state relationships that improves the identification of TFs driving specific transitions.
SELEX-competitive binding prediction: On a benchmark of 100 HT-SELEX datasets, MuBind's learned TF binding specificities show high agreement with the ground truth (R = 0.81, P < 0.001, n = 100 TFs). This performance is comparable to dedicated binding models such as PyProBound.
Motif activity tracking across pseudotime: Because cell activities are modeled in the context of the cell graph, MuBind can quantify how each motif's activity changes along differentiation trajectories, identifying which TFs become progressively more or less active during specific transitions.
Validated regulatory discoveries: Applied to pancreatic endocrinogenesis (scATAC-seq), MuBind identified Sox9 as a key regulator consistent with independent evidence; in mouse and human neurogenesis data, Gli3 and Prdm16 were identified as transition-driving TFs with supporting literature and chromatin-pseudotime concordance.
Compatibility with bulk and single-cell data: MuBind shows competitive performance in both bulk ATAC-seq/ChIP-seq prediction and single-cell chromatin accessibility count prediction, making it applicable across genomic data modalities.
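
The following minimal sketch illustrates motif activity tracking. It correlates each motif's per-cell activity with pseudotime using a rank (Spearman-style) correlation. It assumes two inputs: activities already extracted from a trained model as a cells × motifs NumPy array, and pseudotime computed by a separate trajectory method. The function names are hypothetical and are not part of MuBind's API. Spearman correlation is an illustrative choice and is not necessarily the statistic used in the paper.

```python
import numpy as np


def rank_correlation(x, y):
    """Spearman-style correlation: Pearson correlation of ranks (ties broken arbitrarily)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def motif_pseudotime_trends(activities, pseudotime):
    """Correlate each motif's activity with pseudotime.

    activities: (n_cells, n_motifs) array of per-cell motif activities
                (hypothetical input, e.g. exported from a trained model)
    pseudotime: (n_cells,) array of pseudotime values
    Returns an (n_motifs,) array; positive values mean activity rises along the trajectory.
    """
    return np.array([rank_correlation(activities[:, k], pseudotime)
                     for k in range(activities.shape[1])])


# Toy data: one rising motif, one falling motif, one unrelated motif.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
acts = np.column_stack([
    t + 0.05 * rng.normal(size=50),
    -t + 0.05 * rng.normal(size=50),
    rng.normal(size=50),
])
trends = motif_pseudotime_trends(acts, t)
```

Motifs with strongly positive or negative values are candidates for activity that is progressively gained or lost along the trajectory.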

Technical Details

MuBind's architecture consists of three integrated modules. First, a convolutional sequence encoder processes the DNA sequence of each genomic region (ATAC-seq peak or ChIP-seq region) to extract sequence features and learn de novo motif representations. These sequence features are parameterized as convolutional filters that learn position-weight-matrix-like representations of TF binding preferences, analogous to those learned by SELEX-based models. Second, a cell activity module assigns a scalar activity weight to each learned motif in each cell (or cell cluster), representing the effective binding activity of the corresponding TF in that cell state. Third, a graph neural network module takes the cell activity matrix and propagates information across the cell transition graph — where nodes are cells or clusters and edges represent transition relationships — producing activity representations that reflect both local cell state and neighborhood context.
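
A minimal NumPy forward pass can illustrate the three modules. This is a sketch under simplifying assumptions and is not MuBind's implementation:

- Filters are scanned on the forward strand only.
- Each filter's position scores are summed as exp(score) into one occupancy-like value per region.
- The learned GNN is replaced by a single fixed smoothing step over a row-normalized transition graph.

All function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix; non-ACGT bases stay all-zero."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x


def motif_scores(x, filters):
    """Scan PWM-like filters of shape (n_motifs, width, 4) along a one-hot sequence.

    Returns one value per motif: the sum over positions of exp(filter score),
    a simple occupancy-like summary of the region's sequence.
    """
    n_motifs, width, _ = filters.shape
    n_pos = x.shape[0] - width + 1
    scores = np.empty(n_motifs)
    for k in range(n_motifs):
        logits = np.array([np.sum(x[p:p + width] * filters[k]) for p in range(n_pos)])
        scores[k] = np.sum(np.exp(logits))
    return scores


def propagate(activities, adjacency, alpha=0.5):
    """One fixed graph-smoothing step standing in for a learned GNN layer.

    activities: (n_cells, n_motifs) motif activity matrix
    adjacency:  (n_cells, n_cells) transition graph, adjacency[i, j] > 0 for an edge i -> j
    Each cell keeps (1 - alpha) of its own activities and takes alpha from the
    edge-weighted mean of its out-neighbours; cells without out-edges keep their own values.
    """
    activities = np.asarray(activities, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    deg = adjacency.sum(axis=1, keepdims=True)
    norm_adj = np.divide(adjacency, deg, out=np.zeros_like(adjacency), where=deg > 0)
    neighbour_mean = norm_adj @ activities
    isolated = deg[:, 0] == 0
    neighbour_mean[isolated] = activities[isolated]
    return (1 - alpha) * activities + alpha * neighbour_mean
```

Whether activities flow along out-edges or in-edges of the transition graph is a modeling choice. This sketch uses out-edges.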

The final read count prediction for a given genomic region in a given cell is computed as a function of the sequence features, the cell's (GNN-updated) motif activities, and a learned baseline. The model is trained to minimize the divergence between predicted and observed read counts from scATAC-seq or bulk ATAC-seq data.

Performance was evaluated against PyProBound, a state-of-the-art binding affinity prediction model, on a benchmark of 100 HT-SELEX datasets. MuBind's learned motifs and their relative activities showed high agreement with the ground-truth TF binding specificities (R = 0.81). This indicates that the sequence learning component produces biologically accurate motif representations.

Three biological case studies were presented: pancreatic endocrinogenesis, mouse neurogenesis, and human brain organoids. In each, motif-pseudotime correlation plots and TF expression data provided independent validation of the identified regulators.
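
The read count prediction and training objective described above can be sketched as follows. The description does not specify the link function or the divergence, so this minimal example makes two assumptions:

- A log-linear link: predicted log-rate = sequence scores × cell activities + a per-cell baseline.
- A Poisson negative log-likelihood as the training loss.

Both are common choices for count data. The function names are hypothetical.

```python
import numpy as np


def predict_rates(seq_scores, activities, baseline):
    """Expected read counts for every (region, cell) pair.

    seq_scores: (n_regions, n_motifs) sequence-derived motif scores
    activities: (n_cells, n_motifs) graph-updated motif activities
    baseline:   (n_cells,) learned per-cell offset on the log scale
    The exponential link keeps every predicted rate positive.
    """
    log_rate = seq_scores @ activities.T + baseline[None, :]
    return np.exp(log_rate)


def poisson_nll(rates, counts):
    """Mean Poisson negative log-likelihood, omitting the constant log(counts!) term."""
    return float(np.mean(rates - counts * np.log(rates)))


# Toy check: the loss is lowest when predicted rates match the observed counts.
counts = np.full((3, 4), 2.0)
losses = {r: poisson_nll(np.full((3, 4), r), counts) for r in (1.0, 2.0, 3.0)}
```

In a trained model, the scores, activities, and baseline would be parameters optimized by gradient descent. The NumPy version only shows how the pieces combine into a count prediction and a loss.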

Applications

MuBind's primary application is the identification of transcriptional regulators driving cell fate transitions from single-cell chromatin accessibility data. In developmental biology, where understanding which TFs orchestrate differentiation is a central question, MuBind provides a data-driven framework to nominate key regulators from scATAC-seq atlases without requiring prior knowledge of the relevant TFs. The model is particularly suited to cell transition analysis, that is, identifying which TFs are most active at decision points between cell states, rather than simply characterizing the motif landscape of stable terminal cell types. In disease contexts including cancer, where epigenomic reprogramming drives oncogenic state transitions, MuBind could help identify the sequence-based regulatory logic underlying observed chromatin changes. The model can also serve as a component of multi-omic regulatory analysis pipelines. There it provides sequence-grounded motif activity estimates that complement TF inference methods such as CellOracle, which operate on RNA-level regulon information.
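
The sketch below makes the idea of nominating regulators at decision points concrete. It ranks motifs by their activity change along each directed edge of a cluster-level transition graph. This is an illustrative heuristic that assumes per-cluster motif activities are available from a trained model. It is not a procedure described in the MuBind preprint, and the cluster and motif names are placeholders.

```python
import numpy as np


def transition_drivers(cluster_activities, edges, motif_names, top_k=3):
    """Rank motifs by activity gain along each directed transition.

    cluster_activities: dict mapping cluster name -> (n_motifs,) activity array
    edges: list of (source, target) cluster pairs from a transition graph
    Returns a dict mapping (source, target) -> list of (motif, delta) pairs,
    sorted from largest gain to smallest and truncated to top_k.
    """
    drivers = {}
    for src, dst in edges:
        delta = np.asarray(cluster_activities[dst]) - np.asarray(cluster_activities[src])
        order = np.argsort(delta)[::-1][:top_k]
        drivers[(src, dst)] = [(motif_names[i], float(delta[i])) for i in order]
    return drivers


# Placeholder clusters and motifs, not results from the paper.
acts = {
    "cluster_0": np.array([1.0, 0.2, 0.5]),
    "cluster_1": np.array([0.3, 1.5, 0.6]),
}
ranked = transition_drivers(acts, [("cluster_0", "cluster_1")],
                            ["motif_A", "motif_B", "motif_C"], top_k=2)
```

Reversing the sort order would instead highlight motifs whose activity is lost during a transition.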

Impact

MuBind represents a meaningful advance in the integration of DNA sequence modeling with single-cell regulatory genomics, particularly in its use of cell transition graphs as structural priors that encode developmental dynamics. Its binding prediction performance is competitive with dedicated SELEX models, and it also provides single-cell resolved motif activity estimates in a developmental trajectory context. In this way, MuBind bridges the gap between sequence-level TF binding characterization and cell-level regulatory dynamics analysis. The model recovered known regulators in well-studied developmental systems (pancreatic endocrinogenesis and neurogenesis). This suggests that its discoveries in less well-characterized biological contexts are likely to be biologically meaningful. Within the Theis Lab's broader portfolio of single-cell analysis tools, which includes scGen and CPA for perturbation prediction, MuBind contributes sequence-level mechanistic grounding to regulatory inference. It also complements GRN inference tools developed elsewhere, such as CellOracle (GRN inference from multi-omics).

Citation

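Ibarra IL, Schneeberger J, Erdogan E, Redl L, Martens L, Klein D, Aliee H, Theis FJ. Learning sequence-based regulatory dynamics in single-cell genomics. bioRxiv (2024).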
