bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene · Single-cell

DanioDecima

Chan Zuckerberg Biohub

A zebrafish DNA sequence-to-function model predicting cell-type-specific single-cell expression across 85 cell-type × developmental-timepoint combinations during embryogenesis.

Released: May 2026

DanioDecima is a sequence-to-function foundation model for the zebrafish (Danio rerio) that predicts cell-type-specific gene expression directly from DNA sequence across embryonic development. It fills a gap in regulatory genomics: sequence-to-expression models such as Enformer, Borzoi, and Decima have advanced rapidly for human and mouse, but no comparable model existed for zebrafish, one of the most widely used vertebrate models for studying development, organogenesis, and disease. Bringing a modern sequence-to-function model to this organism makes it possible to interrogate the regulatory code that shapes cell identity during embryogenesis in a tractable, experimentally accessible system.

The model was developed by Voges, Kim, Frank, Iovino, Senbabaoglu, and Royer at the Chan Zuckerberg Biohub San Francisco and released as a bioRxiv preprint in 2026. Because zebrafish training data are relatively scarce compared with human and mouse compendia, DanioDecima uses transfer learning from the human/mouse Borzoi and Decima lineage and benchmarks it against training from scratch. This strategy carries regulatory knowledge learned from large mammalian datasets into a vertebrate whose lineage diverged from that of humans roughly 450 million years ago, testing how well the learned regulatory grammar generalizes across deep evolutionary distance.
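
The page frames this transfer as a choice of weight initialization. A minimal, framework-agnostic sketch of that idea follows, assuming parameters are stored as a mapping from name to (shape, flat values): layers whose name and shape match the source checkpoint are copied, and everything else (for example an output head sized for a different set of tracks) is freshly initialized. The parameter names, the Gaussian initializer, and `std=0.02` are hypothetical; the actual DanioDecima code may handle transfer differently, for instance when adapting the input stem to the extra gene-mask channel.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


def init_from_pretrained(target_shapes: Dict[str, Shape],
                         pretrained: Optional[Dict[str, Tuple[Shape, List[float]]]],
                         seed: int = 0, std: float = 0.02):
    """Copy parameters whose name and shape match `pretrained`; randomly initialize the rest.

    Passing pretrained=None reproduces a random-initialization baseline.
    All parameter names used with this function are hypothetical.
    """
    rng = random.Random(seed)
    weights: Dict[str, List[float]] = {}
    copied: List[str] = []
    fresh: List[str] = []
    for name, shape in target_shapes.items():
        src = (pretrained or {}).get(name)
        if src is not None and tuple(src[0]) == tuple(shape):
            weights[name] = list(src[1])  # transfer pretrained values unchanged
            copied.append(name)
        else:
            size = math.prod(shape)  # flat parameter count for this tensor
            weights[name] = [rng.gauss(0.0, std) for _ in range(size)]
            fresh.append(name)
    return weights, copied, fresh
```

Because a zebrafish head with 85 output tracks will not match the shape of a mammalian head, it falls into the freshly initialized group, while shape-compatible trunk layers are reused.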

A distinctive contribution of the work is the use of the trained model for in-silico directed evolution: candidate sequences are iteratively mutated and scored with the model to design synthetic promoters predicted to drive expression in specific cell types. This shows that the model can be used not only to predict expression but also to design sequences, a capability directly relevant to developmental and synthetic biology.
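
The mutate-and-score loop can be sketched as greedy hill climbing. This is a minimal illustration, not the authors' pipeline: the specificity objective (target prediction minus the mean of the other cell types), the number of rounds and proposals, and `toy_predict` are all hypothetical stand-ins, and the real model would take the full 5-channel input rather than a short string.

```python
import random
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def specificity(preds: Dict[str, float], target: str) -> float:
    """Hypothetical objective: target-cell-type prediction minus the mean over all others."""
    others = [v for k, v in preds.items() if k != target]
    return preds[target] - sum(others) / len(others)


def directed_evolution(seq: str,
                       predict: Callable[[str], Dict[str, float]],
                       target: str,
                       rounds: int = 10,
                       proposals: int = 50,
                       seed: int = 0) -> Tuple[str, float]:
    """Greedy in-silico evolution over random single-nucleotide mutants."""
    rng = random.Random(seed)
    best_seq = seq
    best_score = specificity(predict(seq), target)
    for _ in range(rounds):
        # Propose random single-base substitutions of the current best sequence.
        candidates = []
        for _ in range(proposals):
            pos = rng.randrange(len(best_seq))
            alt = rng.choice([b for b in BASES if b != best_seq[pos]])
            candidates.append(best_seq[:pos] + alt + best_seq[pos + 1:])
        # Score every candidate and keep the best one only if it improves the objective.
        score, cand = max((specificity(predict(c), target), c) for c in candidates)
        if score > best_score:
            best_seq, best_score = cand, score
    return best_seq, best_score


def toy_predict(seq: str) -> Dict[str, float]:
    # HYPOTHETICAL stand-in for the model; this is not DanioDecima.
    return {"neuron": seq.count("G") / len(seq), "muscle": seq.count("A") / len(seq)}
```

For example, `directed_evolution("A" * 20, toy_predict, target="neuron")` returns a sequence whose score can only stay equal or rise from round to round. Greedy acceptance is the simplest choice; population-based or simulated-annealing variants trade more model calls for less risk of stalling in a local optimum.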

#Key Features

  • Zebrafish-specific expression prediction: Predicts single-cell RNA-seq expression for 85 cell-type × developmental-timepoint combinations spanning zebrafish embryogenesis, derived from the ZebraHub developmental atlas (Lange et al., 2024).
  • Long-range sequence context: Accepts 524,288 bp input windows encoded as five channels (four DNA bases plus a gene mask), capturing distal regulatory elements that act over hundreds of kilobases.
  • Transfer learning across species: Compares four initialization strategies (random, Human-Borzoi, Human-Decima, and Mouse-Borzoi pretraining) to quantify how much mammalian regulatory knowledge transfers to zebrafish.
  • In-silico directed evolution: Includes a pipeline that evolves synthetic regulatory sequences toward cell-type-specific expression objectives, enabling model-guided design of synthetic promoters.
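
The five-channel input can be illustrated with a small encoder. This is a sketch under stated assumptions: channel order A, C, G, T, then gene mask; a position-major layout; a half-open interval marking the target gene; and ambiguous bases such as N left as all zeros. None of these conventions is confirmed for the DanioDecima data loader.

```python
from typing import List

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_with_gene_mask(seq: str, gene_start: int, gene_end: int) -> List[List[float]]:
    """Return a (len(seq), 5) matrix: one-hot A/C/G/T plus a gene-mask channel.

    Channel 4 is 1.0 for positions in [gene_start, gene_end), the assumed target gene.
    """
    rows = []
    for i, base in enumerate(seq.upper()):
        row = [0.0] * 5
        idx = BASE_INDEX.get(base)
        if idx is not None:  # ambiguous bases (e.g. N) stay all-zero
            row[idx] = 1.0
        if gene_start <= i < gene_end:
            row[4] = 1.0  # mark positions belonging to the target gene
        rows.append(row)
    return rows
```

The real window is 524,288 bp (2^19), so a production loader would use a dense array rather than nested lists; the sketch works on any length.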

#Technical Details

DanioDecima extends the Borzoi/Decima architecture, combining 7 convolutional blocks with 8 transformer blocks operating at 1,920 embedding channels, an exponential output activation, and a task-wise Poisson-multinomial loss for count-based expression targets. Inputs are 524,288 bp sequences with a 5-channel encoding (the four nucleotides plus a gene mask that focuses prediction on a target gene). Training targets are cell-type-specific pseudobulk profiles aggregated from the ZebraHub single-cell atlas across 85 cell-type × timepoint combinations. The experiments systematically evaluate four weight-initialization strategies, each with four replicate training runs, to isolate the contribution of mammalian pretraining relative to training from scratch. Because the work is a bioRxiv preprint, these results have not yet undergone peer review.
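
A Poisson-multinomial loss splits the likelihood of a count vector into a Poisson term for the total count and a multinomial term for how that total is distributed across entries; the exponential activation keeps predicted rates positive. The sketch below is a pure-Python illustration for a single vector. The relative weight `total_weight`, the stabilizer `eps`, and the axis the multinomial runs over (which "task-wise" leaves open here) are assumptions, not values from the DanioDecima paper.

```python
import math
from typing import List, Sequence


def exp_activation(logits: Sequence[float]) -> List[float]:
    """Exponential output activation: maps unbounded outputs to positive rates."""
    return [math.exp(z) for z in logits]


def poisson_multinomial_loss(y_true: Sequence[float], y_pred: Sequence[float],
                             total_weight: float = 1.0, eps: float = 1e-7) -> float:
    """Poisson-multinomial negative log-likelihood (up to constants) for one count vector."""
    t = [v + eps for v in y_true]
    p = [v + eps for v in y_pred]
    n = len(t)
    s_true, s_pred = sum(t), sum(p)
    # Poisson NLL of the total count, dropping the log-factorial constant.
    poisson_term = (s_pred - s_true * math.log(s_pred)) / n
    # Multinomial NLL of the observed distribution under predicted proportions.
    multinomial_term = -sum(ti * math.log(pi / s_pred) for ti, pi in zip(t, p)) / n
    return multinomial_term + total_weight * poisson_term
```

Separating the two terms lets the model be penalized independently for getting the overall expression level wrong and for misallocating expression across entries; `total_weight` controls that balance.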

#Applications

DanioDecima is intended for developmental biologists, regulatory genomicists, and synthetic biologists working in zebrafish. Researchers can use it to predict the transcriptional consequences of sequence changes in specific cell types and timepoints, prioritize candidate regulatory variants, and interpret enhancer and promoter function during embryogenesis. Its directed-evolution capability supports practical design tasks such as engineering synthetic, cell-type-selective promoters for reporter lines and gene-expression tools. Zebrafish's optical transparency and rapid external development make it especially well suited to testing such designs.
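
Predicting the consequence of a sequence change usually means running the model on the reference and the mutated sequence and comparing outputs per track. A minimal sketch, assuming a hypothetical `predict` callable that returns predicted expression per cell-type × timepoint track; the log2 fold-change score, the pseudocount of 1, and ranking by absolute effect are common choices, not necessarily what DanioDecima uses.

```python
import math
from typing import Callable, Dict, List, Tuple


def variant_effect(ref_seq: str, pos: int, alt: str,
                   predict: Callable[[str], Dict[str, float]],
                   pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-track log2 fold change in predicted expression, alternate vs reference."""
    if not 0 <= pos < len(ref_seq):
        raise ValueError("variant position lies outside the input window")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    ref_pred, alt_pred = predict(ref_seq), predict(alt_seq)
    return {track: math.log2((alt_pred[track] + pseudocount) / (ref_pred[track] + pseudocount))
            for track in ref_pred}


def rank_variants(ref_seq: str, variants: List[Tuple[int, str]],
                  predict: Callable[[str], Dict[str, float]],
                  track: str) -> List[Tuple[Tuple[int, str], float]]:
    """Sort (pos, alt) variants by absolute predicted effect in one track, largest first."""
    scored = [((pos, alt), variant_effect(ref_seq, pos, alt, predict)[track])
              for pos, alt in variants]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

Scoring per track is what makes the prioritization cell-type-specific: the same variant can rank highly in one cell-type × timepoint track and near zero in another.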

#Impact

DanioDecima extends the rapidly growing family of sequence-to-function models beyond mammals, providing a quantitative test of how well regulatory grammar learned in human and mouse transfers across deep vertebrate evolutionary distance. By pairing prediction with model-guided synthetic promoter design, it offers a template for using foundation models as both interpretive and generative tools in developmental systems. Practical adoption depends on distribution details that remain limited at release: the GitHub repository ships a training and fine-tuning framework rather than a clearly distributed, ready-to-use pretrained checkpoint, and the code carries a Non-Commercial Software License v1.0 (commercial use prohibited) inherited from the upstream Decima repositories, with the licensing of any released weights unconfirmed. Users should verify checkpoint availability and licensing terms before relying on the model in downstream work.

Citation

DanioDecima: A DNA sequence-to-function model of zebrafish embryogenesis

Voges, M. J., et al. (2026) DanioDecima: A DNA sequence-to-function model of zebrafish embryogenesis. bioRxiv.

DOI: 10.64898/2026.05.29.728876

Openness

Unclassified
Restrictive license on core components

Tags

cnn, de_novo_design, dna, foundation_model, gene_expression, genomics, single_cell, transfer_learning, transformer, variant_effect_prediction

Resources

GitHub Repository · Research Paper