Changping Laboratory / Peking University
A Mixture-of-Experts generative model that turns DNA sequence plus cell-type ATAC-seq into unified epigenomic, transcriptomic, and 3D chromatin profiles, generalizing to unseen cell types.
The non-coding genome regulates gene expression through a complex, multiscale system in which cell-type-specific histone modifications, transcription factor binding, and three-dimensional genome conformation all interact. Predicting and interpreting this regulatory logic from sequence alone has remained difficult, because each layer is typically modeled by a separate specialized tool and most predictors do not transfer well to cell types absent from their training data. GenoME, introduced in a December 2025 bioRxiv preprint from Changping Laboratory and Peking University, addresses this by jointly modeling these layers in a single generative framework.
GenoME is a Mixture-of-Experts (MoE) generative model that takes a DNA sequence together with a cell-type-specific chromatin accessibility signal (ATAC-seq or DNase-seq) and produces a unified genomic profile spanning epigenomics, transcriptomics, and chromatin architecture at resolutions ranging from individual base pairs to kilobases. Crucially, because chromatin accessibility is supplied as an input rather than learned only from a fixed set of training cell types, GenoME can predict the full regulatory landscape of an unseen or individualized cell type from a single accessibility experiment, without retraining.
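To make the input contract concrete, the sketch below shows one common way to pair a DNA sequence with a per-base accessibility track: one-hot encode the bases and append the signal as an extra channel. This is an illustration only. The function names, the (L, 5) layout, and the handling of N bases are assumptions made for this example, and the preprint's actual preprocessing may differ.

```python
import numpy as np

BASES = "ACGT"  # channel order assumed for this example


def one_hot_dna(seq: str) -> np.ndarray:
    """Return an (L, 4) one-hot matrix; unknown bases such as N become all-zero rows."""
    lookup = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = lookup.get(base)
        if idx is not None:
            encoded[pos, idx] = 1.0
    return encoded


def build_input(seq: str, accessibility) -> np.ndarray:
    """Stack the sequence and a per-base accessibility signal into an (L, 5) array."""
    signal = np.asarray(accessibility, dtype=np.float32)
    if signal.shape != (len(seq),):
        raise ValueError("accessibility must have one value per base")
    return np.concatenate([one_hot_dna(seq), signal[:, None]], axis=1)


# Hypothetical 5-bp example; real inputs would span far longer windows.
x = build_input("ACGTN", [0.0, 1.5, 3.0, 0.5, 0.0])
print(x.shape)  # (5, 5)
```

Because the cell-type information lives entirely in the last channel, swapping in a different accessibility track changes the conditioning without touching the sequence encoding, which is the property that allows prediction for unseen cell types.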
Beyond prediction, GenoME ships with an in silico perturbation framework for causal interrogation of regulatory function, positioning it as an all-in-one platform for generative modeling, cross-cell-type generalization, and mechanistic investigation of the regulatory genome. It was developed by Jiachen Wei, Yue Xue, Hao Chai, and Yi Qin Gao.
GenoME is a generative model built on a Mixture-of-Experts framework. Its inputs are a DNA sequence and a matched cell-type-specific chromatin accessibility profile (ATAC-seq or DNase-seq); its output is a unified, multiscale prediction covering epigenomic, transcriptomic, and 3D-conformation modalities at native base-pair-to-kilobase resolutions. The model is evaluated on held-out genomic regions to assess sequence-level generalization and on held-out cell types to assess cross-context transfer driven by the accessibility input. For regulatory inference, the authors report that GenoME's perturbation-based enhancer-promoter predictions exceed the performance of Activity-by-Contact, a widely used heuristic for linking enhancers to target genes. Exact parameter counts, the number of experts, training-corpus composition, and full benchmark tables are described in the preprint; specific figures are not reproduced here because they could not be independently verified from the abstract and indexed metadata.
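For context on the baseline, the Activity-by-Contact score assigns each candidate enhancer a share of a gene's regulatory input proportional to its activity multiplied by its contact frequency with the gene's promoter, normalized over all candidate elements near that gene. The sketch below implements that published formula in its simplest form. The input values are made up for illustration, and it is not the comparison pipeline used in the preprint.

```python
import numpy as np


def abc_scores(activity, contact) -> np.ndarray:
    """Activity-by-Contact scores for candidate elements around one gene.

    activity: per-element activity (e.g. derived from accessibility and H3K27ac signal)
    contact:  per-element contact frequency with the gene's promoter
    """
    activity = np.asarray(activity, dtype=float)
    contact = np.asarray(contact, dtype=float)
    weighted = activity * contact
    total = weighted.sum()
    if total == 0:
        # No element carries any signal, so no element gets credit.
        return np.zeros_like(weighted)
    return weighted / total


# Three hypothetical elements: products are 1.0, 2.0, 1.0, so the shares are 0.25, 0.5, 0.25.
scores = abc_scores([10.0, 5.0, 5.0], [0.1, 0.4, 0.2])
print(scores)
```

The score is a heuristic built from measured signals rather than a learned model, which is why a perturbation-based predictor can in principle capture effects it misses.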
GenoME is aimed at researchers studying gene regulation, functional genomics, and the interpretation of non-coding variation. Because it produces a complete regulatory profile for a cell type from a single ATAC-seq input, it is useful for characterizing rare, patient-derived, or otherwise data-sparse cellular contexts where comprehensive multi-omic profiling is impractical. Its perturbation framework supports prioritizing candidate regulatory variants, mapping enhancers to their target genes, and dissecting transcription factor grammar, tasks relevant to disease-variant interpretation, enhancer annotation, and the design of cell-type-specific regulatory hypotheses for downstream experimental validation.
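The general pattern behind in silico perturbation can be sketched as follows: predict a readout, silence a candidate region in the input, predict again, and take the difference as that region's estimated effect. In the example below, `toy_predict` is a deliberately trivial stand-in and not GenoME, the (L, 5) feature layout (four one-hot base channels plus one accessibility channel) is an assumption, and zeroing a window is only one of several possible perturbation schemes.

```python
import numpy as np


def toy_predict(features: np.ndarray) -> float:
    """Stand-in predictor (NOT GenoME): accessibility-weighted G/C content."""
    gc = features[:, 1] + features[:, 2]  # columns for C and G in ACGT order
    return float((gc * features[:, 4]).sum())


def knockout_effect(predict, features: np.ndarray, start: int, end: int) -> float:
    """Change in the predicted readout after silencing positions [start, end)."""
    perturbed = features.copy()
    perturbed[start:end, :] = 0.0  # erase both sequence and accessibility
    return predict(perturbed) - predict(features)


# Hypothetical 6-bp locus with an accessible G/C-rich element at the start.
seq = "GCATGC"
features = np.zeros((len(seq), 5))
for pos, base in enumerate(seq):
    features[pos, "ACGT".index(base)] = 1.0
features[:, 4] = [2.0, 2.0, 0.0, 0.0, 1.0, 1.0]

# Scan non-overlapping 2-bp windows and record each window's effect.
effects = [knockout_effect(toy_predict, features, s, s + 2)
           for s in range(0, len(seq), 2)]
print(effects)  # [-4.0, 0.0, -2.0]
```

Ranking windows by the magnitude of their effect is the basic idea behind prioritizing candidate enhancers or variants; with a real model, the readout would be a predicted expression or epigenomic signal at the target gene.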
GenoME contributes to a growing class of sequence-to-function models that move beyond single-modality prediction toward unified, conditionable representations of the regulatory genome. Its central design choice, supplying cell-type chromatin accessibility as an input so the model generalizes to unseen cell types without retraining, addresses a recurring limitation of fixed-cell-type predictors and aligns it conceptually with recent multimodal genomic foundation models. As a December 2025 preprint released under a CC BY-NC-ND license, its long-term influence remains to be established and it has yet to be independently benchmarked. No public code repository, model card, or data card was located at the time of writing, which currently limits external reproduction and adoption.