Chromnitron

Multimodal foundation model predicting genome-wide binding of chromatin-associated proteins from protein sequence, DNA sequence, and chromatin state.

Released: August 2025

Chromnitron is a multimodal foundation model from the Broad Institute of MIT and Harvard that predicts the genome-wide binding landscapes of chromatin-associated proteins (CAPs) — transcription factors, cofactors, and chromatin regulators — directly from molecular inputs. Where most binding predictors are trained one protein at a time and cannot generalize beyond their training targets, Chromnitron jointly reasons over three modalities: the amino-acid sequence of the protein, the local DNA sequence, and the cell-type-specific chromatin accessibility landscape. By learning the shared grammar that links protein identity, sequence motifs, and chromatin context, the model predicts where a given protein binds in a given cellular state.

The central advance is generalization. Because Chromnitron represents proteins by their sequence rather than as fixed output classes, it can produce binding predictions for proteins and cell types that were never observed during training — a zero-shot capability that conventional CAP-specific models lack. This reframes protein-DNA binding prediction as a transferable, foundation-model problem rather than a collection of isolated supervised tasks.
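The contrast between sequence-based protein representations and fixed output classes can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `composition_embedding` is a crude amino-acid frequency vector standing in for a learned protein encoder, and the protein names and the sequence are invented. It is not Chromnitron's encoder.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_embedding(protein_seq):
    """Toy stand-in for a learned protein encoder: amino-acid frequency vector."""
    counts = Counter(protein_seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]

def class_index(protein_name, training_proteins):
    """Fixed-output-class lookup: only defined for proteins seen in training."""
    return training_proteins.index(protein_name)  # raises ValueError if unseen

training_proteins = ["CTCF", "GATA1"]   # example training targets
unseen_seq = "MKRAHLEDCCK"              # invented sequence for an unseen protein

# A sequence-based representation exists for any amino-acid string.
embedding = composition_embedding(unseen_seq)

# A class-based model has no output slot for a protein it never saw.
try:
    class_index("NOVEL_CAP", training_proteins)
    unseen_supported = True
except ValueError:
    unseen_supported = False
```

The point of the sketch is structural: a model conditioned on a sequence-derived vector has a defined input for a new protein, whereas a model with one output head per training protein does not.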

First posted to bioRxiv in August 2025, Chromnitron sits at the intersection of protein language modeling and regulatory genomics, complementing sequence-only chromatin models such as Enformer by adding an explicit, transferable representation of the binding protein itself.

Key Features

Three-way multimodal inputs: Integrates protein amino-acid sequence, local DNA sequence, and cell-type-specific chromatin accessibility (ATAC-seq) to predict binding, rather than relying on DNA motifs alone.
Zero-shot generalization: Predicts binding landscapes for chromatin-associated proteins and cell types absent from the training atlas, enabled by sequence-based protein representations.
Experimental validation: Predictions were used to discover and experimentally confirm previously unrecognized protein regulators of T-cell exhaustion.
Dynamic regulatory insight: Revealed uncharacterized shifts in the binding behavior of regulatory proteins over the course of neurogenesis.
Outperforms prior tools: Surpasses motif-accessibility baselines, maxATAC, and Enformer in predicting CAP binding profiles.
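For context on the motif-accessibility baselines mentioned above, here is a minimal sketch of one common way to build such a baseline: scan a position weight matrix (PWM) over the DNA and scale the best match by local accessibility. The PWM values, the forward-strand-only scan, and the multiplicative gating are assumptions chosen for illustration; the exact baseline used in the Chromnitron benchmarks is not described here.

```python
# Hypothetical log-odds PWM for a 3-bp motif "ACG" (one dict per position).
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

def best_motif_score(dna, pwm):
    """Maximum PWM log-odds score over all forward-strand windows."""
    width = len(pwm)
    scores = [
        sum(pwm[i][dna[start + i]] for i in range(width))
        for start in range(len(dna) - width + 1)
    ]
    return max(scores)

def motif_accessibility_score(dna, accessibility, pwm):
    """Baseline score: non-negative motif match gated by accessibility in [0, 1]."""
    return max(best_motif_score(dna, pwm), 0.0) * accessibility

open_site = motif_accessibility_score("TTACGTT", 0.9, PWM)    # motif, open chromatin
closed_site = motif_accessibility_score("TTACGTT", 0.1, PWM)  # motif, closed chromatin
no_motif = motif_accessibility_score("TTTTTTT", 0.9, PWM)     # no motif, open chromatin
```

A baseline of this kind needs a known motif for each protein and has no representation of the protein itself, which is the gap the multimodal design is meant to close.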

Technical Details

Chromnitron was pretrained on a curated atlas of more than 1,100 ChIP-seq binding maps spanning 767 distinct chromatin-associated proteins, then fine-tuned per protein to capture target-specific nuances. The architecture couples a protein-sequence encoder with DNA-sequence and chromatin-accessibility encoders, allowing the model to learn the relationship between a protein's identity, the underlying genomic sequence, and the chromatin environment that gates access to DNA. This multimodal design lets it generalize to unseen proteins and cellular contexts, where single-task and conventional multi-task learning approaches fail. A companion module, Chrom2Vec, processes raw ATAC-seq data into the chromatin-state representations the model consumes. In reported benchmarks, Chromnitron outperforms motif-based accessibility scoring, maxATAC, and Enformer at predicting genome-wide CAP binding.
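The three-encoder layout described above can be sketched schematically. This is a structural illustration only, under loud assumptions: the placeholder encoders (frequency vectors and simple track summaries), the fusion by concatenation, the logistic head, and the random untrained weights are all hypothetical and do not reflect Chromnitron's actual layers, dimensions, or training.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASES = "ACGT"

def encode_protein(seq):
    """Placeholder protein encoder: amino-acid frequencies (20 values)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def encode_dna(seq):
    """Placeholder DNA encoder: base frequencies (4 values)."""
    return [seq.count(b) / len(seq) for b in BASES]

def encode_accessibility(signal):
    """Placeholder chromatin encoder: mean and max of an ATAC-like track."""
    return [sum(signal) / len(signal), max(signal)]

def predict_binding(protein, dna, signal, weights, bias=0.0):
    """Concatenate the three encodings, then apply a logistic output head."""
    features = encode_protein(protein) + encode_dna(dna) + encode_accessibility(signal)
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(20 + 4 + 2)]  # untrained, illustrative

locus_dna = "TTACGTT"          # invented sequence
locus_atac = [0.1, 0.8, 0.5]   # invented accessibility track

# Same locus and chromatin state, two different (invented) protein sequences.
p = predict_binding("MKRAHLEDCCK", locus_dna, locus_atac, weights)
p_other = predict_binding("GGSSPPTTQQN", locus_dna, locus_atac, weights)
```

Because the protein enters as an input rather than as an output index, the same function can be called with any protein sequence at any locus, which is what makes zero-shot queries possible in principle.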

Applications

Chromnitron supports researchers studying gene regulation, cell-state transitions, and disease-associated regulatory rewiring. Because it generalizes to unprofiled proteins and cell types, it can prioritize candidate regulators for experimental follow-up without first generating ChIP-seq data — as demonstrated by its identification of novel T-cell-exhaustion regulators, a finding directly relevant to immunology and cancer immunotherapy. It also enables in-silico exploration of how binding landscapes shift across developmental processes such as neurogenesis. The released inference pipeline, paired with Chrom2Vec, lets users run predictions on their own ATAC-seq data.
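One simple way to turn predicted binding landscapes into a shortlist of candidate regulators is to rank proteins by how much their predicted binding changes between two cell states. The sketch below assumes per-locus binding scores are already available as plain lists; the function name `rank_by_binding_shift`, the protein labels, and all values are invented for illustration, and this is not the prioritization procedure used in the paper.

```python
def rank_by_binding_shift(pred_state_a, pred_state_b):
    """Rank proteins by mean absolute change in predicted binding across loci."""
    shifts = {}
    for protein, scores_a in pred_state_a.items():
        scores_b = pred_state_b[protein]
        diffs = [abs(a - b) for a, b in zip(scores_a, scores_b)]
        shifts[protein] = sum(diffs) / len(diffs)
    # Largest shift first.
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)

# Toy predicted binding scores at three loci in two cell states (invented values).
state_a = {"CAP_1": [0.9, 0.1, 0.5], "CAP_2": [0.2, 0.2, 0.2]}
state_b = {"CAP_1": [0.1, 0.8, 0.5], "CAP_2": [0.2, 0.3, 0.2]}

ranking = rank_by_binding_shift(state_a, state_b)
```

In this toy input, `CAP_1` ranks first because its predicted binding changes at two of the three loci, while `CAP_2` is nearly unchanged.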

Impact

By recasting protein-DNA binding prediction as a transferable, multimodal foundation-model problem, Chromnitron expands what is computationally accessible in regulatory genomics: binding profiles for proteins and cell types that have never been experimentally assayed. Its combination of zero-shot generalization and concrete experimental validation — discovering bona fide T-cell-exhaustion regulators — distinguishes it from purely benchmark-driven models and signals a path toward broadly applicable models of the gene-regulatory landscape. At present the public repository releases only the inference pipeline and the ATAC-seq processing component, with a full release pending and the code license unspecified; trained weights for arbitrary targets and complete training code are not yet broadly available.

Citation

Multimodal learning decodes the global binding landscape of chromatin-associated proteins

Preprint

Tan, J., et al. (2025) Multimodal learning decodes the global binding landscape of chromatin-associated proteins. bioRxiv.

DOI: 10.1101/2025.08.17.670761

Recent citations

Papers that recently cited this model.

Mammalian genome writing: unlocking new length scales for genome engineering
S. Pinglay, John T. Atwater, Ran Brosh, et al.
Cell · Jan 2026
2

Citations

Total Citations: 1
Influential: 0
References: 0

GitHub

Stars: 26
Forks: 5
Open Issues: 1
Contributors: 3
Last Push: 2mo ago
Language: Python

Fields of citing research

Biology: 100%
Engineering: 100%
Medicine: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Overall: 25 (Closed)
Usability (can I run it?): 24
Reproducibility (can I retrain it?): 12

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository
Research Paper
