Self-attention and densely connected convolutional model for predicting gene expression from histone modifications, with transfer learning for cross-cell-line generalization across 56 REMC cell types.
TransferChrome is a deep learning model for predicting gene expression from histone modification signals, developed by Yuchi Chen, Minzhu Xie, and Jie Wen at the College of Information Science and Engineering, Hunan Normal University, Changsha, China, and published in Frontiers in Genetics in December 2022. It occupies a distinct position in the landscape of chromatin-to-expression models: while predecessors such as DeepChrome and AttentiveChrome established deep learning approaches for this task, and Chromoformer advanced the field by incorporating 3D chromatin contacts, TransferChrome focuses specifically on improving generalization across cell types through transfer learning — the ability to train a model on a source cell type and apply it effectively to an unseen target cell type with limited or no additional training data.
The core problem TransferChrome addresses is that histone modification patterns are highly cell-type-specific: a model trained to predict gene expression in one cell type may generalize poorly to another if the relationship between chromatin marks and expression differs substantially. This challenge is particularly acute when researchers want to predict gene expression in cell types for which ChIP-seq data are available but matched RNA-seq data are not, a common scenario in epigenomics studies with incomplete multi-omics profiles. The model tackles this by combining a densely connected convolutional network for local feature extraction with self-attention layers for global context aggregation, then fine-tuning the representations learned on a source cell type for a target cell type using a standard transfer learning protocol.
The model was evaluated across the same 56 cell-type datasets from the Roadmap Epigenomics Mapping Consortium (REMC) that have served as the standard benchmark for AttentiveChrome, DeepChrome, and related methods, enabling direct performance comparison. TransferChrome achieved an average AUC of 84.79% across all 56 cell types and outperformed AttentiveChrome, DeepChrome, and CRNN in cross-cell-line prediction experiments, validating the benefit of combining self-attention with transfer learning in this domain.
TransferChrome's architecture combines two complementary components. The densely connected convolutional module processes the input histone modification signal matrix (5 marks × 100 bins) through multiple convolutional layers where each layer receives feature maps from all previous layers concatenated along the channel dimension. This dense connectivity is designed to mitigate the vanishing gradient problem and to preserve feature information at all scales of the convolutional hierarchy. The output of the convolutional module is a sequence of feature vectors, one per genomic bin, which are then passed to the self-attention module.
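The following is a minimal sketch, not the authors' released code, of how such a densely connected convolutional module could be implemented in PyTorch. Only the (5 marks × 100 bins) input shape and the concatenate-all-previous-feature-maps connectivity come from the description above; the layer count, growth rate, kernel size, and normalization choices are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DenseConvBlock(nn.Module):
    """Densely connected 1-D convolutions over the genomic bins.

    Each layer receives the channel-wise concatenation of the raw input and
    every previous layer's feature maps (DenseNet-style connectivity).
    Layer count, growth rate, and kernel size are illustrative assumptions.
    """

    def __init__(self, in_channels=5, growth_rate=32, num_layers=4, kernel_size=5):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                # 'same'-style padding keeps one feature vector per genomic bin
                nn.Conv1d(channels, growth_rate, kernel_size, padding=kernel_size // 2),
                nn.BatchNorm1d(growth_rate),
                nn.ReLU(inplace=True),
            ))
            channels += growth_rate  # the next layer sees all earlier feature maps
        self.out_channels = channels

    def forward(self, x):
        # x: (batch, 5 histone marks, 100 bins)
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        # (batch, out_channels, 100): a feature sequence with one vector per bin
        return torch.cat(features, dim=1)
```

The concatenated output preserves one feature vector per genomic bin, matching the per-bin feature sequence that the description says is handed to the self-attention module.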
The self-attention module applies scaled dot-product attention across all genomic positions, computing pairwise attention weights that reflect the relevance of each position to every other position in the prediction. The attention output is then pooled (using global average pooling or a learned pooling mechanism) and passed to a classification head with sigmoid activation for binary expression prediction (high vs. low). For transfer learning, the convolutional layers are frozen or fine-tuned at a lower learning rate, while the attention and classification layers receive full gradient updates on the target cell type's data.
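Continuing the sketch above, the snippet below illustrates one plausible form of the attention-and-classification stage together with a fine-tuning optimizer setup. The use of `nn.MultiheadAttention`, the projection width, head count, average pooling, and the specific learning rates are assumptions for illustration; only the scaled dot-product attention over bins, the sigmoid head, and the frozen-or-slow backbone versus fully updated attention/classifier split follow the description in this section.

```python
class AttentionClassifier(nn.Module):
    """Self-attention over per-bin features, pooling, and a sigmoid head."""

    def __init__(self, feat_dim, d_model=128, num_heads=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # per-bin feature projection (assumed)
        # Scaled dot-product attention across all 100 bin positions
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, feats):
        # feats: (batch, channels, bins) from the convolutional module
        seq = self.proj(feats.transpose(1, 2))   # (batch, bins, d_model)
        attended, _ = self.attn(seq, seq, seq)   # pairwise attention over positions
        pooled = attended.mean(dim=1)            # global average pooling
        return torch.sigmoid(self.head(pooled))  # P(high expression)


class TransferChromeLike(nn.Module):
    """DenseConvBlock features followed by the attention classifier."""

    def __init__(self):
        super().__init__()
        self.conv = DenseConvBlock()
        self.cls = AttentionClassifier(self.conv.out_channels)

    def forward(self, x):
        return self.cls(self.conv(x))


# Fine-tuning on a target cell type: the convolutional backbone gets a much
# smaller learning rate (or lr=0.0 to freeze it outright), while the attention
# and classification layers receive full gradient updates. Learning rates here
# are illustrative, not the paper's values.
model = TransferChromeLike()  # in practice, load weights trained on the source cell type
optimizer = torch.optim.Adam([
    {"params": model.conv.parameters(), "lr": 1e-5},  # near-frozen backbone
    {"params": model.cls.parameters(), "lr": 1e-3},   # fully trained on target data
])

x = torch.randn(8, 5, 100)  # 8 genes: 5 histone marks x 100 bins each
probs = model(x)            # (8, 1) predicted probability of "high" expression
```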
In same-cell-type evaluation across the 56 REMC cell lines, TransferChrome achieved a mean AUC of 84.79%. In cross-cell-line transfer experiments, it outperformed AttentiveChrome, DeepChrome, and CRNN across the majority of source-to-target cell type pairs tested, with the performance advantage most pronounced when the source and target cell types were from similar tissue lineages (e.g., different hematopoietic cells), where the learned chromatin representations transferred most effectively.
TransferChrome is most directly useful in epigenomics research where histone modification ChIP-seq data are available for a cell type of interest but matched gene expression data are unavailable or scarce. By transferring a model trained on a well-characterized cell type, researchers can generate predicted expression profiles to guide experimental prioritization or to complete a multi-omics dataset. The model is also applicable in comparative epigenomics studies where researchers want to understand how chromatin-to-expression relationships differ across cell types: by examining which model components transfer well and which require substantial fine-tuning, researchers gain insight into the universality versus cell-type-specificity of chromatin regulatory logic. Researchers building expression prediction pipelines in non-model organisms where training data are limited can also leverage the transfer learning framework by pre-training on richly profiled model organism data and fine-tuning on the target organism.
TransferChrome contributes a focused methodological improvement to a well-established benchmark problem — chromatin-to-expression prediction — by demonstrating that combining self-attention with transfer learning provides consistent gains in cross-cell-line generalization. Its publication in Frontiers in Genetics adds to the growing body of work showing that transfer learning, a technique widely applied in computer vision and NLP, is beneficial in epigenomics contexts where labeled data in target domains are sparse. The model serves as a useful baseline for researchers developing improved cross-cell-type epigenomic models, particularly as the field moves toward larger and more diverse cell-type atlases. Limitations include the same 10 kbp window constraint shared by its predecessors, which prevents the model from accessing distal regulatory elements, and the relatively modest architecture compared to more recent large transformer models applied to genomics. Future extensions incorporating longer-range context or additional data modalities such as chromatin accessibility would likely further improve cross-cell-type generalization.