bio.rodeo

© 2026 Pulsatance. All rights reserved.
DNA & Gene foundation models

Chromnitron

Broad Institute

Multimodal foundation model predicting genome-wide binding of chromatin-associated proteins from protein sequence, DNA sequence, and cell-type chromatin state.

Released: August 2025

Chromnitron is a multimodal foundation model from the Broad Institute of MIT and Harvard that predicts the genome-wide binding landscapes of chromatin-associated proteins (CAPs) — transcription factors, cofactors, and chromatin regulators — directly from molecular inputs. Where most binding predictors are trained one protein at a time and cannot generalize beyond their training targets, Chromnitron jointly reasons over three modalities: the amino-acid sequence of the protein, the local DNA sequence, and the cell-type-specific chromatin accessibility landscape. By learning the shared grammar that links protein identity, sequence motifs, and chromatin context, the model predicts where a given protein binds in a given cellular state.

The central advance is generalization. Because Chromnitron represents proteins by their sequence rather than as fixed output classes, it can produce binding predictions for proteins and cell types that were never observed during training — a zero-shot capability that conventional CAP-specific models lack. This reframes protein-DNA binding prediction as a transferable, foundation-model problem rather than a collection of isolated supervised tasks.
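
The contrast between fixed output classes and sequence-based protein inputs can be illustrated with a toy sketch. Nothing below is taken from the Chromnitron codebase: the protein names, the sequences, the `kmer_embedding` featurizer, and both functions are hypothetical stand-ins chosen only to show why a class-indexed model has no answer for an unseen protein while a sequence-conditioned one still receives a valid input.

```python
from collections import Counter

# Hypothetical training targets: name -> amino-acid sequence (toy strings, not real proteins).
TRAIN_PROTEINS = {
    "CAP_A": "MKRLSPEELKK",
    "CAP_B": "MSTPEKRHLLA",
}


def predict_fixed_class(protein_name: str) -> int:
    """Class-indexed design: one output head per training protein."""
    heads = {name: i for i, name in enumerate(sorted(TRAIN_PROTEINS))}
    return heads[protein_name]  # raises KeyError for any protein not seen in training


def kmer_embedding(sequence: str, k: int = 2) -> dict:
    """Toy featurizer (an assumption, not a protein language model):
    normalized k-mer frequencies of the amino-acid sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


# A protein that was never part of the training set.
unseen_name, unseen_seq = "CAP_NEW", "MKRHSPELLKA"

try:
    predict_fixed_class(unseen_name)
    fixed_ok = True
except KeyError:
    fixed_ok = False  # the class-indexed design has no head for this protein

# The sequence-based representation is defined for any amino-acid string.
embedding = kmer_embedding(unseen_seq)
print(fixed_ok, len(embedding) > 0)
```

In a real system the featurizer would be a learned protein encoder; the point of the sketch is only that the input space, not the output layer, carries protein identity.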

First posted to bioRxiv in August 2025, Chromnitron sits at the intersection of protein language modeling and regulatory genomics, complementing sequence-only chromatin models such as Enformer by adding an explicit, transferable representation of the binding protein itself.

#Key Features

  • Three-way multimodal inputs: Integrates protein amino-acid sequence, local DNA sequence, and cell-type-specific chromatin accessibility (ATAC-seq) to predict binding, rather than relying on DNA motifs alone.
  • Zero-shot generalization: Predicts binding landscapes for chromatin-associated proteins and cell types absent from the training atlas, enabled by sequence-based protein representations.
  • Experimental validation: Predictions were used to discover and experimentally confirm previously unrecognized protein regulators of T-cell exhaustion.
  • Dynamic regulatory insight: Revealed uncharacterized shifts in the binding behavior of regulatory proteins over the course of neurogenesis.
  • Outperforms prior tools: In the authors' reported benchmarks, surpasses motif-based accessibility scoring, maxATAC, and Enformer at predicting CAP binding profiles.
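
Comparisons between binding predictors are often scored by correlating predicted and observed signal tracks over genomic bins. This page does not state which metric the Chromnitron benchmarks used, so the Pearson correlation below is an assumption, and the signal values and model names are invented toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signal tracks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("tracks must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant track
    return cov / math.sqrt(var_x * var_y)


# Toy per-bin signal (hypothetical values, one number per genomic bin).
observed = [0.1, 0.2, 3.5, 4.0, 0.3, 0.1]  # stand-in for a measured ChIP-seq track
model_a = [0.2, 0.3, 3.0, 3.8, 0.4, 0.2]   # prediction that follows the peak
model_b = [1.0, 1.1, 1.2, 1.0, 1.1, 1.0]   # nearly flat prediction

scores = {
    "model_a": pearson(observed, model_a),
    "model_b": pearson(observed, model_b),
}
best = max(scores, key=scores.get)
print(best)
```

A ranking-based metric such as area under the precision-recall curve over called peaks would be an equally plausible choice; the sketch only shows the shape of such a comparison.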

#Technical Details

Chromnitron was pretrained on a curated atlas of more than 1,100 ChIP-seq binding maps spanning 767 distinct chromatin-associated proteins, then fine-tuned per protein to capture target-specific binding behavior. The architecture couples a protein-sequence encoder with DNA-sequence and chromatin-accessibility encoders, so the model learns the relationship between a protein's identity, the underlying genomic sequence, and the chromatin environment that gates binding. This multimodal design lets it generalize to unseen proteins and cellular contexts, where single-task and conventional multi-task learning approaches fail. A companion module, Chrom2Vec, processes raw ATAC-seq data into the chromatin-state representations the model consumes. In reported benchmarks, Chromnitron outperforms motif-based accessibility scoring, maxATAC, and Enformer at predicting genome-wide CAP binding.
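
To make the three input modalities concrete, the sketch below assembles one toy example for a single genomic window. It is not the Chromnitron or Chrom2Vec API: the function names, the token vocabulary, the one-hot layout, and the choice of one accessibility value per base are all assumptions made for illustration.

```python
DNA_ALPHABET = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_dna(seq):
    """One row per base with columns A, C, G, T; ambiguous bases such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in DNA_ALPHABET] for base in seq.upper()]


def tokenize_protein(seq):
    """Map residues to integer tokens 1..20; 0 is reserved for unknown residues."""
    index = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
    return [index.get(aa, 0) for aa in seq.upper()]


def build_example(protein_seq, dna_seq, accessibility):
    """Bundle the three modalities for one window (hypothetical layout)."""
    if len(accessibility) != len(dna_seq):
        raise ValueError("accessibility track must have one value per DNA base")
    return {
        "protein_tokens": tokenize_protein(protein_seq),
        "dna_one_hot": one_hot_dna(dna_seq),
        "accessibility": [float(v) for v in accessibility],
    }


# Toy inputs: a 4-residue protein fragment, a 5-bp window, and a made-up ATAC-seq signal.
example = build_example("MKRX", "ACGTN", [0.0, 0.5, 2.0, 1.5, 0.0])
print(example["protein_tokens"])
```

In a model of the kind described above, each of these three fields would feed its own encoder before the representations are combined; how Chromnitron fuses them is not specified on this page.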

#Applications

Chromnitron supports researchers studying gene regulation, cell-state transitions, and disease-associated regulatory rewiring. Because it generalizes to unprofiled proteins and cell types, it can prioritize candidate regulators for experimental follow-up without first generating ChIP-seq data — as demonstrated by its identification of novel T-cell-exhaustion regulators, a finding directly relevant to immunology and cancer immunotherapy. It also enables in-silico exploration of how binding landscapes shift across developmental processes such as neurogenesis. The released inference pipeline, paired with Chrom2Vec, lets users run predictions on their own ATAC-seq data.
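
One way predictions like these could feed candidate prioritization is to rank proteins by how much their predicted binding changes between two cell states. The sketch below assumes each protein has already been reduced to a single summary score per state (for example, mean predicted signal over a locus set of interest). The scores, protein names, and the `rank_by_binding_shift` helper are hypothetical and are not part of the released pipeline.

```python
def rank_by_binding_shift(state_a, state_b, top_n=3):
    """Rank proteins present in both states by absolute change in predicted binding.

    state_a, state_b: dicts mapping protein name -> summary binding score.
    Returns (name, shift) pairs, largest absolute shift first; ties break by name.
    """
    shared = set(state_a) & set(state_b)
    shifts = {name: state_b[name] - state_a[name] for name in shared}
    ranked = sorted(shifts.items(), key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:top_n]


# Toy summary scores (invented numbers) for two cell states.
state_a = {"CAP_A": 0.20, "CAP_B": 0.75, "CAP_C": 0.50, "CAP_D": 0.10}
state_b = {"CAP_A": 0.90, "CAP_B": 0.70, "CAP_C": 0.10, "CAP_E": 0.40}

ranked = rank_by_binding_shift(state_a, state_b)
for name, shift in ranked:
    print(f"{name}\t{shift:+.2f}")
```

Proteins scored in only one state (here `CAP_D` and `CAP_E`) are dropped rather than treated as zero, since a missing prediction is not evidence of absent binding.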

#Impact

By recasting protein-DNA binding prediction as a transferable, multimodal foundation-model problem, Chromnitron expands what is computationally accessible in regulatory genomics: binding profiles for proteins and cell types that have never been experimentally assayed. Its combination of zero-shot generalization and experimental validation, namely the discovery of previously unrecognized T-cell-exhaustion regulators, distinguishes it from models evaluated on benchmarks alone and points toward broadly applicable models of the gene-regulatory landscape. Availability is currently limited. The public repository releases only the inference pipeline and the ATAC-seq processing component; a full release is pending and the code license is unspecified. Trained weights for arbitrary targets and complete training code are not yet broadly available.

GitHub

Stars: 24
Forks: 5
Open Issues: 2
Contributors: 3
Last Push: 1mo ago
Language: Python

Openness

bio.rodeo openness: 25, Closed (low usability and reproducibility)
Usability (can I run it?): 24
Reproducibility (can I retrain it?): 12
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

protein_dna_binding_prediction, gene_regulation, variant_effect_prediction, transformer, foundation_model, multimodal, zero_shot, chromatin, genomics, transcription_factors

Resources

GitHub Repository
Research Paper