bio.rodeo
DNA & Gene foundation models

Puget

University of Washington

Deep learning model that predicts cell-type-specific gene expression from DNA sequence and Hi-C 3D chromatin organization, generalizing to unseen cell types and species without retraining.

Released: November 2025

Puget is a deep learning model that predicts cell-type-specific gene expression by combining DNA sequence with experimentally measured three-dimensional chromatin organization. It was developed by Shu Hang, William Stafford Noble, and colleagues in the Noble Lab at the University of Washington and released as a bioRxiv preprint in November 2025. Puget addresses a persistent limitation of sequence-only expression predictors: the input DNA sequence at a locus is essentially identical across cell types, so models that rely on sequence alone struggle to explain why the same gene is expressed differently in different cellular contexts.

Gene regulation depends not only on the linear arrangement of regulatory elements but also on how the genome folds in three dimensions, bringing distal enhancers into physical contact with their target promoters. Puget makes this 3D context explicit by feeding Hi-C contact maps (which measure genome-wide DNA–DNA proximity) into the model alongside sequence. This pairing lets Puget capture the cell-type-specific looping that determines which enhancers can act on a given gene, a signal that is invisible to sequence-only architectures such as Enformer and Borzoi.
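
To make the Hi-C input concrete, the following is a minimal sketch of how pairwise contacts can be binned into a symmetric contact matrix. It is an illustration only, not Puget's preprocessing: the function name `contact_map`, the bin size, and the toy coordinates are all assumptions, and real pipelines also normalize the counts.

```python
import numpy as np

def contact_map(pairs, region_start, region_end, bin_size):
    """Bin (pos_a, pos_b) contacts into a symmetric count matrix."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos_a, pos_b in pairs:
        # Skip contacts with either end outside the region of interest.
        if not (region_start <= pos_a < region_end
                and region_start <= pos_b < region_end):
            continue
        i = (pos_a - region_start) // bin_size
        j = (pos_b - region_start) // bin_size
        counts[i, j] += 1
        if i != j:
            counts[j, i] += 1  # contacts are unordered, so mirror them
    return counts

# Toy contacts on one chromosome (hypothetical coordinates).
pairs = [(1_200, 48_500), (1_900, 47_000), (12_000, 13_500), (30_000, 95_000)]
cmap = contact_map(pairs, region_start=0, region_end=50_000, bin_size=10_000)
print(cmap)
```

In this toy map, the two contacts between the first and last bins are the kind of distal signal a sequence-only model never sees; the fourth pair is dropped because one end falls outside the region.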

Puget sits in the same regulatory-genomics niche as Enformer and Chromoformer but is distinguished by its use of a pretrained Hi-C encoder and its demonstrated ability to generalize to held-out cell types and across species without retraining, behaving as a genuine pretrained foundation model rather than a per-cell-type regression.

#Key Features

  • Sequence-plus-Hi-C inputs: Puget conditions expression predictions on both DNA sequence and cell-type-specific Hi-C contact maps, allowing it to model the 3D enhancer-promoter contacts that drive context-dependent regulation.
  • Pretrained dual encoders: The model pairs a pretrained sequence encoder with a pretrained Hi-C encoder (built on the Noble Lab's HiCFoundation masked-autoencoder model). Both encoders are frozen, and their representations feed a lightweight transformer decoder that is the only trained component.
  • Cross-cell-type generalization: Unlike sequence-only baselines, Puget generalizes to held-out biosamples not seen during training, predicting expression for new cell types directly from their Hi-C maps without retraining.
  • Cross-species transfer: In the human-to-mouse evaluation, Puget transfers across species without retraining, suggesting that the learned sequence-to-expression rules are not memorized per genome.
  • In silico perturbation: Counterfactual modification of the input prioritizes experimentally validated enhancer-gene pairs, supporting hypothesis generation for regulatory element function.
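
The in silico perturbation idea in the last bullet can be sketched as follows. This is a toy illustration, not Puget's procedure: `toy_predict` is a hypothetical stand-in predictor (a contact-weighted sum of per-bin activity), and the choice to perturb the Hi-C contact rather than the sequence is an assumption, since the page does not say which input is modified.

```python
import numpy as np

def toy_predict(activity, contacts, promoter_bin):
    """Stand-in predictor (NOT Puget): contact-weighted sum of bin activity."""
    return float(contacts[promoter_bin] @ activity)

def perturbation_scores(predict, activity, contacts, promoter_bin):
    """Score each bin by the drop in predicted expression when its
    contact with the promoter bin is removed from the input."""
    baseline = predict(activity, contacts, promoter_bin)
    scores = np.zeros(len(activity))
    for b in range(len(activity)):
        if b == promoter_bin:
            continue
        perturbed = contacts.copy()
        perturbed[promoter_bin, b] = 0.0  # delete the contact symmetrically
        perturbed[b, promoter_bin] = 0.0
        scores[b] = baseline - predict(activity, perturbed, promoter_bin)
    return scores

# Hypothetical per-bin regulatory activity and a symmetric contact map.
activity = np.array([0.1, 2.0, 0.0, 1.0, 0.5])
contacts = np.array([
    [5, 1, 0, 3, 0],
    [1, 5, 1, 0, 0],
    [0, 1, 5, 1, 2],
    [3, 0, 1, 5, 1],
    [0, 0, 2, 1, 5],
], dtype=float)

scores = perturbation_scores(toy_predict, activity, contacts, promoter_bin=0)
ranked = np.argsort(-scores)  # candidate bins, largest effect first
print(scores, ranked[:2])
```

Bins whose removal changes the prediction most are the candidates one would prioritize for experimental follow-up; with a real model, `predict` would be replaced by a forward pass.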

#Technical Details

Puget couples two pretrained encoders to a lightweight transformer decoder. One encoder processes DNA sequence; the other processes Hi-C contact matrices and is based on HiCFoundation, a Vision-Transformer masked autoencoder pretrained on hundreds of Hi-C assays. The pretrained encoders are held fixed, and a compact transformer decoder integrates their embeddings to produce cell-type-specific expression predictions. This design keeps the number of trainable parameters small and concentrates learning on the cross-modal integration step.
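
The frozen-encoder, trained-readout pattern described above can be sketched in a few lines. Everything here is a simplifying assumption: fixed random projections stand in for the pretrained sequence and Hi-C encoders, a ridge-regularized linear readout is swapped in for the transformer decoder, and the dimensions and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "encoders": fixed random projections standing in for pretrained nets.
W_SEQ = rng.normal(size=(16, 8))  # sequence features -> 8-d embedding
W_HIC = rng.normal(size=(12, 8))  # Hi-C features     -> 8-d embedding

def encode(seq_feats, hic_feats):
    """Embed both modalities with the frozen weights and concatenate."""
    return np.concatenate(
        [np.tanh(seq_feats @ W_SEQ), np.tanh(hic_feats @ W_HIC)], axis=1
    )

def add_bias(embeddings):
    return np.hstack([embeddings, np.ones((len(embeddings), 1))])

def fit_readout(embeddings, targets, ridge=1e-2):
    """Fit the only trainable part: a ridge-regularized linear readout."""
    X = add_bias(embeddings)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def predict(embeddings, weights):
    return add_bias(embeddings) @ weights

# Synthetic training data (200 examples).
seq = rng.normal(size=(200, 16))
hic = rng.normal(size=(200, 12))
emb = encode(seq, hic)
y = emb @ rng.normal(size=emb.shape[1]) + 0.01 * rng.normal(size=200)

w_seq_before = W_SEQ.copy()
weights = fit_readout(emb, y)

n_frozen = W_SEQ.size + W_HIC.size  # never updated
n_trainable = weights.size          # the readout only
print(n_frozen, n_trainable)
```

Only `weights` is fitted; the encoder matrices are read but never written, which is the property that keeps the trainable parameter count small.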

The model was trained on paired Hi-C and RNA-seq data from 36 human and 4 mouse biosamples. Evaluation tested three generalization regimes: held-out genes, held-out biosamples, and human-to-mouse transfer. Relative to a sequence-only baseline, Puget improves cross-biosample Pearson correlation by up to 25% on highly variable genes. Unlike the sequence-only model, it also generalizes to held-out biosamples and across species without retraining. Highly variable genes, which differ most across cell types, are precisely the cases where sequence-only models fail and where the Hi-C signal contributes most.
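
As a rough illustration of the metric named above, the sketch below computes a per-gene Pearson correlation across biosamples, selects highly variable genes by variance, and expresses a gain as a relative percentage. The function names, the variance-based selection rule, and the reading of "25%" as a relative rather than absolute improvement are assumptions; the page does not specify them.

```python
import numpy as np

def per_gene_pearson(pred, obs):
    """Pearson r across biosamples for each gene.
    Inputs have shape (genes, biosamples); genes with zero variance give NaN."""
    pc = pred - pred.mean(axis=1, keepdims=True)
    oc = obs - obs.mean(axis=1, keepdims=True)
    denom = np.sqrt((pc ** 2).sum(axis=1) * (oc ** 2).sum(axis=1))
    with np.errstate(divide="ignore", invalid="ignore"):
        return (pc * oc).sum(axis=1) / denom

def highly_variable(obs, top_fraction=0.2):
    """Indices of the genes with the largest variance across biosamples."""
    var = obs.var(axis=1)
    k = max(1, int(round(top_fraction * len(var))))
    return np.argsort(-var)[:k]

def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Toy matrices: 3 genes x 4 biosamples (hypothetical values).
obs = np.array([[1, 2, 3, 4], [5, 5, 5, 5], [0, 10, 0, 10]], dtype=float)
pred = np.array([[2, 4, 6, 8], [5, 5, 5, 5], [10, 0, 10, 0]], dtype=float)

r = per_gene_pearson(pred, obs)
hv = highly_variable(obs, top_fraction=1 / 3)
print(r, hv, relative_improvement(0.50, 0.40))
```

A gene that is constant across biosamples (the second row) has no defined cross-biosample correlation, which is one reason evaluations of this kind focus on highly variable genes.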

#Applications

Puget is aimed at researchers in regulatory genomics, functional genomics, and gene-regulation modeling who need expression predictions that are sensitive to cellular context. Because it generalizes to held-out cell types from their Hi-C maps, it can impute expression for biosamples that have chromatin-conformation data but limited expression profiling, and its in silico perturbation capability lets investigators prioritize candidate enhancer-gene links for experimental follow-up. A practical constraint is that Puget requires Hi-C data as input in addition to sequence, so it is best suited to settings where 3D chromatin maps are already available rather than to purely sequence-driven, genome-wide screens.

#Impact

Puget demonstrates that incorporating measured 3D genome organization, rather than relying on sequence alone, can meaningfully improve cell-type-specific expression prediction and enable generalization to unseen cell types and species, where sequence-only regulatory models have long been weak. By framing the problem around frozen pretrained sequence and Hi-C encoders with a lightweight trained decoder, it offers a parameter-efficient template for multimodal regulatory modeling. As a November 2025 preprint, its downstream adoption is still emerging, and its reliance on Hi-C inputs narrows its applicability relative to sequence-only predictors. At the time of writing, no public code repository or model weights had been released by the authors.

Citation

Puget predicts gene expression across cell types using sequence and 3D chromatin organization data

Preprint

Hang, S., et al. (2025) Puget predicts gene expression across cell types using sequence and 3D chromatin organization data. bioRxiv.

DOI: 10.1101/2025.11.19.689320

Citations

Total citations: 0
Influential: 0
References: 45

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Overall: 8 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 10
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

chromatin, dna, foundation_model, gene_expression, in_silico_perturbation, multimodal, regulatory_genomics, transfer_learning, transformer
