A zebrafish DNA sequence-to-function model predicting cell-type-specific single-cell expression across 85 cell-type x developmental-timepoint combinations during embryogenesis.
DanioDecima is a sequence-to-function foundation model for the zebrafish (Danio rerio) that predicts cell-type-specific gene expression directly from DNA sequence across embryonic development. It addresses a gap in regulatory genomics: while sequence-to-expression models such as Enformer, Borzoi, and Decima have advanced rapidly for human and mouse, no comparable model existed for zebrafish — one of the most widely used vertebrate models for studying development, organogenesis, and disease. By bringing a modern sequence-to-function model to this organism, DanioDecima makes it possible to interrogate the regulatory code that shapes cell identity during embryogenesis in a tractable, experimentally accessible system.
The model was developed by Voges, Kim, Frank, Iovino, Senbabaoglu, and Royer at the Chan Zuckerberg Biohub San Francisco and released as a bioRxiv preprint in 2026. Rather than training from scratch on the relatively small amount of zebrafish data, DanioDecima leverages transfer learning from the human/mouse Borzoi and Decima lineage. This strategy transfers regulatory knowledge learned from large mammalian compendia into a vertebrate that diverged from humans roughly 450 million years ago, testing how well the learned regulatory grammar generalizes across deep evolutionary distance.
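The general idea of initializing from a pretrained model can be sketched as below. This is a minimal illustration, not the authors' procedure: the layer names, shapes, and the rule "reuse a tensor only when name and shape match" are all assumptions made for the example, and the text here does not specify which layers DanioDecima actually transfers.

```python
import random

def init_from_pretrained(target_shapes, pretrained, seed=0):
    """Build a parameter dict for a new model (illustrative rule only).

    target_shapes: {name: (rows, cols)} for the model being trained.
    pretrained:    {name: 2-D list of floats} from the source model.
    A pretrained tensor is reused only when its name and shape both match;
    anything else (e.g. an output head sized for a different set of tracks)
    is freshly initialized with small random values.
    """
    rng = random.Random(seed)
    params, reused = {}, []
    for name, (rows, cols) in target_shapes.items():
        src = pretrained.get(name)
        if src is not None and len(src) == rows and all(len(r) == cols for r in src):
            params[name] = [list(r) for r in src]  # copy, do not alias
            reused.append(name)
        else:
            params[name] = [[rng.gauss(0.0, 0.02) for _ in range(cols)]
                            for _ in range(rows)]
    return params, reused

# Hypothetical toy layers: a shared trunk layer and a head resized to 85 outputs.
pretrained = {"trunk.w": [[0.1, 0.2], [0.3, 0.4]], "head.w": [[1.0, 1.0]] * 3}
target_shapes = {"trunk.w": (2, 2), "head.w": (85, 2)}
params, reused = init_from_pretrained(target_shapes, pretrained)
```

In this toy case the trunk weights are copied unchanged while the head is re-initialized, because its size (85 outputs) differs from the source model's.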
A distinctive contribution of the work is its use of the trained model for in-silico directed evolution: iteratively mutating candidate sequences and scoring them with the model to design synthetic promoters predicted to drive expression in specific cell types. This demonstrates that the model is not only predictive but also generative in a practical, design-oriented sense relevant to developmental biology and synthetic biology.
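The mutate-and-score loop described above can be sketched as a simple greedy search. This is a hedged illustration rather than the procedure used in the preprint: `toy_score` is a hypothetical stand-in for a model prediction, and the round count, proposal count, and acceptance rule are assumptions chosen for the example.

```python
import random

BASES = "ACGT"

def toy_score(seq):
    """Hypothetical stand-in for a model prediction: counts 'TATA' occurrences."""
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA")

def directed_evolution(seq, score_fn, rounds=20, proposals=50, seed=0):
    """Greedy loop: each round, propose single-base mutants of the current
    sequence and keep the best one if it scores at least as well."""
    rng = random.Random(seed)
    best, best_score = seq, score_fn(seq)
    history = [best_score]
    for _ in range(rounds):
        candidates = []
        for _ in range(proposals):
            pos = rng.randrange(len(best))
            new_base = rng.choice([b for b in BASES if b != best[pos]])
            candidates.append(best[:pos] + new_base + best[pos + 1:])
        top = max(candidates, key=score_fn)
        top_score = score_fn(top)
        if top_score >= best_score:
            best, best_score = top, top_score
        history.append(best_score)
    return best, history

start = "ACGT" * 10
evolved, history = directed_evolution(start, toy_score)
```

For cell-type-selective design, the scoring function would presumably reward predicted expression in the target cell type while penalizing it elsewhere; that objective is an assumption here and is not modeled by `toy_score`.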
DanioDecima extends the Borzoi/Decima architecture, combining 7 convolutional blocks with 8 transformer blocks operating at 1,920 embedding channels, with an exponential output activation and a task-wise Poisson-multinomial loss for count-based expression targets. Inputs are 524,288 bp sequences with a 5-channel encoding (the four nucleotides plus a gene mask that focuses prediction on a target gene). Training targets are cell-type-specific pseudobulk profiles aggregated from the ZebraHub single-cell atlas across 85 cell-type x timepoint combinations. The experiments systematically evaluate four weight-initialization strategies, each across four replicates, to isolate the contribution of mammalian pretraining versus training from scratch. Because the work is a bioRxiv preprint, these results have not yet undergone peer review.
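To make the input encoding and loss more concrete, the sketch below shows one plausible form of each. Both are assumptions for illustration: the mask is taken to mark the target gene's span within the window, and the loss follows a common Poisson-plus-multinomial formulation (a Poisson term on the total across tracks and a multinomial term on the split across tracks). The exact weighting and normalization used by DanioDecima are not stated here, and the function names are hypothetical.

```python
import math

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_with_gene_mask(seq, gene_start, gene_end):
    """Return a 5 x len(seq) matrix: rows 0-3 are one-hot A/C/G/T, row 4 is 1
    inside [gene_start, gene_end) and 0 elsewhere. Unknown bases (e.g. N)
    are left as all-zero columns in the nucleotide rows."""
    x = [[0.0] * len(seq) for _ in range(5)]
    for i, base in enumerate(seq.upper()):
        row = BASE_INDEX.get(base)
        if row is not None:
            x[row][i] = 1.0
        if gene_start <= i < gene_end:
            x[4][i] = 1.0
    return x

def poisson_multinomial_loss(pred, target, total_weight=1.0, eps=1e-7):
    """Loss for one gene over its tracks (cell type x timepoint).
    Poisson term: negative log-likelihood of the summed counts, with the
    prediction-independent constant dropped.
    Multinomial term: negative log-likelihood of how counts split across tracks."""
    pred = [p + eps for p in pred]  # predictions are positive (exponential output)
    total_pred, total_target = sum(pred), sum(target)
    poisson = total_pred - total_target * math.log(total_pred)
    multinomial = -sum(t * math.log(p / total_pred) for p, t in zip(pred, target))
    return multinomial + total_weight * poisson

# Toy 5 bp window with the "gene" covering positions 1 and 2.
x = encode_with_gene_mask("ACGTN", gene_start=1, gene_end=3)
# Matching predictions should score lower than a mis-allocated prediction.
loss_good = poisson_multinomial_loss([2.0, 6.0, 2.0], [2.0, 6.0, 2.0])
loss_bad = poisson_multinomial_loss([6.0, 2.0, 2.0], [2.0, 6.0, 2.0])
```

Splitting the loss this way lets the model be penalized separately for getting a gene's overall level wrong and for distributing that level incorrectly across cell types and timepoints.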
DanioDecima is intended for developmental biologists, regulatory genomicists, and synthetic biologists working in zebrafish. Researchers can use it to predict the transcriptional consequences of sequence changes in specific cell types and timepoints, prioritize candidate regulatory variants, and interpret enhancer and promoter function during embryogenesis. Its directed-evolution capability supports practical design tasks such as engineering synthetic, cell-type-selective promoters for reporter lines and gene-expression tools — applications where zebrafish's optical transparency and rapid external development are particularly advantageous.
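Scoring the effect of a sequence change typically comes down to comparing predictions for the reference and altered sequences. The sketch below illustrates that comparison as a per-track log2 fold change. `toy_predict` and its two track names are hypothetical stand-ins for the model and its outputs, not part of any released API.

```python
import math

def toy_predict(seq):
    """Hypothetical stand-in for the model: returns one positive value per
    track. Two made-up tracks respond to G/C and non-G/C base counts."""
    gc = sum(seq.count(b) for b in "GC")
    at = len(seq) - gc
    return {"track_gc": 1.0 + gc, "track_at": 1.0 + at}

def apply_snv(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    if alt == seq[pos]:
        raise ValueError("alt allele equals the reference base")
    return seq[:pos] + alt + seq[pos + 1:]

def variant_effect(seq, pos, alt, predict_fn):
    """Per-track log2 fold change of predicted expression, alt vs. ref."""
    ref_pred = predict_fn(seq)
    alt_pred = predict_fn(apply_snv(seq, pos, alt))
    return {k: math.log2(alt_pred[k] / ref_pred[k]) for k in ref_pred}

effects = variant_effect("AATTAATT", pos=2, alt="G", predict_fn=toy_predict)
```

Ranking candidate variants by the magnitude of such scores in the cell types and timepoints of interest is one straightforward way to prioritize them for follow-up.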
DanioDecima extends the rapidly growing family of sequence-to-function models beyond mammals, providing a quantitative test of how well regulatory grammar learned in human and mouse transfers across deep vertebrate evolutionary distance. By pairing prediction with model-guided synthetic promoter design, it offers a template for using foundation models as both interpretive and generative tools in developmental systems. Practical adoption depends on distribution details that remain limited at release. The GitHub repository ships a training and fine-tuning framework rather than a clearly distributed, ready-to-use pretrained checkpoint. The code carries a Non-Commercial Software License v1.0 (commercial use prohibited) inherited from the upstream Decima repositories, and the licensing of any released weights is unconfirmed. Users should verify checkpoint availability and licensing terms before relying on the model in downstream work.
Voges, M. J., et al. (2026) DanioDecima: A DNA sequence-to-function model of zebrafish embryogenesis. bioRxiv.
DOI: 10.64898/2026.05.29.728876