A multimodal foundation model that distills Evo 2 (7B) into a compact encoder guided by Hi-C data to predict cell-type-specific 3D genome architecture and epigenomic signals.
Evo2HiC is a multimodal foundation model for the integrative analysis of genome sequence and three-dimensional chromatin architecture, developed by Tangqi Fang, Sheng Wang, William Stafford Noble, and colleagues at the University of Washington and released as a bioRxiv preprint in November 2025. It addresses a central bottleneck in 3D genomics: the largest DNA foundation models, such as Evo 2 (7B parameters), capture rich sequence features but are too computationally expensive to apply routinely to chromatin-structure tasks, while specialized Hi-C predictors lack the broad genomic priors that large sequence models provide.
The model's core idea is knowledge distillation guided by structure. Evo2HiC distills the 7-billion-parameter Evo 2 model into a compact encoder, using Hi-C chromatin contact data to steer the distillation so that the features most relevant to 3D genome organization are preserved. The result is a lightweight encoder that retains the predictive power of a much larger sequence model while being efficient enough to run cell-type-specific prediction at scale.
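The idea of structure-guided distillation can be illustrated with a toy loss function. The sketch below is not the published Evo2HiC objective, which this summary does not specify. It assumes a hypothetical setup in which a student encoder and the Evo 2 teacher each produce one embedding per genomic bin (with the student already projected to the teacher's dimension), and in which a dot product between bin embeddings stands in for a contact-prediction head. The names `structure_guided_loss`, `contact_scores`, and `weight` are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Mean squared error between two equally shaped lists of vectors."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def contact_scores(embeddings: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical contact head: dot product between every pair of bin embeddings."""
    return [
        [sum(x * y for x, y in zip(u, v)) for v in embeddings]
        for u in embeddings
    ]


def structure_guided_loss(
    student: Sequence[Vector],
    teacher: Sequence[Vector],
    hic: Sequence[Vector],
    weight: float = 1.0,
) -> float:
    """Toy objective: match the teacher's features AND the observed contacts.

    Assumption: both terms are plain MSE and are combined by a scalar weight.
    """
    feature_term = mse(student, teacher)                   # stay close to the teacher
    structure_term = mse(contact_scores(student), hic)     # stay consistent with Hi-C
    return feature_term + weight * structure_term


# Two bins with orthogonal teacher embeddings and an identity-like contact map.
teacher = [[1.0, 0.0], [0.0, 1.0]]
hic = [[1.0, 0.0], [0.0, 1.0]]
shrunken_student = [[0.5, 0.0], [0.0, 0.5]]

perfect_loss = structure_guided_loss(teacher, teacher, hic)
shrunken_loss = structure_guided_loss(shrunken_student, teacher, hic)
```

In this toy setup a student that reproduces the teacher exactly incurs zero loss, while a student whose embeddings are shrunk is penalized by both terms. The `weight` parameter controls how strongly the Hi-C term steers which teacher features are kept.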
By jointly representing one-dimensional sequence and two-dimensional contact information, Evo2HiC sits at the intersection of genomic language modeling and chromatin biology. It predicts both Hi-C contact maps and epigenomic profiles, and it generalizes across species in a zero-shot setting, positioning it as a general-purpose tool for studying how genome sequence encodes nuclear architecture.
Evo2HiC builds on Evo 2, a genomic foundation model based on StripedHyena 2, a convolutional multi-hybrid architecture that interleaves Hyena convolution operators with attention. Rather than running the full 7B model, Evo2HiC distills it into a compact encoder whose distillation objective is shaped by Hi-C contact data, preserving the long-range sequence dependencies that matter for chromatin folding. The architecture has three parts: a 1D DNA-sequence encoder for epigenomic profile prediction, a 2D joint sequence-structure encoder for Hi-C contact-matrix prediction and resolution enhancement, and a SigLIP-based retrieval module for Hi-C embeddings. On Hi-C prediction the authors report a 10.9% improvement in Spearman correlation over Orca, a leading sequence-to-Hi-C baseline, alongside state-of-the-art results across multiple chromatin-analysis tasks. The cross-species evaluation spans 177 species, which the authors present as evidence that the distilled representations transfer beyond the training organisms.
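Spearman correlation, the metric quoted for Hi-C prediction, compares the rank order of predicted and observed contact values rather than their raw magnitudes. The following is a minimal standard-library sketch of that computation on a small symmetric contact map. The exact evaluation protocol used in the preprint (for example, which bins are included or whether scores are stratified by genomic distance) is not described here, so flattening the upper triangle is an assumption, and the function names and example matrices are illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the upper triangle (diagonal included) of a symmetric contact map."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


# Made-up 3x3 contact maps, for illustration only.
observed = [[9, 4, 1], [4, 8, 3], [1, 3, 7]]
predicted = [[0.9, 0.5, 0.2], [0.5, 0.7, 0.3], [0.2, 0.3, 0.8]]

rho = spearman(upper_triangle(predicted), upper_triangle(observed))
```

Because only ranks matter, any monotonic rescaling of the predicted map leaves the score unchanged, which makes the metric insensitive to differences in normalization between predicted and experimental contact maps.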
Evo2HiC is aimed at researchers studying genome organization and gene regulation. Functional and regulatory genomicists can predict cell-type-specific Hi-C contact maps and epigenomic signals directly from sequence, including for cell types or species where experimental Hi-C is unavailable or low-resolution. Its resolution-enhancement capability lets groups upsample sparse contact maps, and its cross-species generalization supports comparative genomics across the 177 species evaluated. Because it identifies cell-type-specific sequence patterns, it can also help interpret how non-coding variation reshapes chromatin architecture, complementing variant-effect workflows. The compact encoder makes these analyses more tractable on modest hardware than running the full Evo 2 model.
Evo2HiC demonstrates a practical recipe for transferring the capabilities of very large DNA foundation models into efficient, task-specialized tools: rather than scaling up, it distills down while using an orthogonal data modality (Hi-C) to retain the most relevant features. The reported 10.9% Spearman-correlation gain over Orca and broad cross-species generalization suggest that structure-guided distillation can outperform both standalone Hi-C predictors and naive use of large sequence models. As a recent preprint, its benchmarks await peer review and independent replication, and the distillation approach is tied to the availability and quality of Hi-C training data. Nonetheless, by pairing an open Apache-2.0 codebase with archived checkpoints, Evo2HiC offers the chromatin-biology community an accessible foundation model that bridges genomic language modeling and 3D genome analysis.
Fang, T., et al. (2025) Evo2HiC: a multimodal foundation model for integrative analysis of genome sequence and architecture. bioRxiv.
DOI: 10.1101/2025.11.18.689171