bio.rodeo
Single-cell

Scooby

Technical University of Munich / Helmholtz Munich / Harvard Medical School / Broad Institute / Harvard University

Predicts single-cell-resolution scRNA-seq coverage and scATAC-seq insertion profiles directly from DNA sequence by adapting the Borzoi predictor with a cell-specific decoder.

Released: October 2025

Scooby is a sequence-to-function foundation model that predicts multimodal genomic profiles at single-cell resolution directly from DNA sequence. Where most sequence models output bulk or pseudobulk signal averaged over many cells, Scooby produces per-cell predictions of both scRNA-seq coverage and scATAC-seq insertion profiles, capturing how regulatory grammar plays out across the continuum of cell states within a tissue. It was developed by the Gagneur lab at the Technical University of Munich together with collaborators at Helmholtz Munich, Harvard Medical School, the Broad Institute, and Harvard University, and published in Nature Methods in 2025.
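Per-cell output matters because of what pseudobulk averaging does to a small population. The sketch below is a toy illustration with made-up coverage values, not Scooby output. It averages 95 cells that have no signal in one bin with 5 cells that have strong signal there, and the resulting pseudobulk bin looks like background.

```python
from statistics import fmean


def pseudobulk(profiles: list[list[float]]) -> list[float]:
    """Average per-cell coverage profiles bin by bin (a simple pseudobulk)."""
    n_bins = len(profiles[0])
    return [fmean(cell[b] for cell in profiles) for b in range(n_bins)]


# Toy data (invented for illustration): 95 common cells with no signal in bin 1,
# plus 5 rare cells with strong signal in that bin.
common = [[1.0, 0.0, 1.0] for _ in range(95)]
rare = [[1.0, 20.0, 1.0] for _ in range(5)]

print(pseudobulk(common + rare))  # [1.0, 1.0, 1.0]: the rare signal is diluted to background
print(pseudobulk(rare))           # [1.0, 20.0, 1.0]: visible only if the rare cells are isolated
```

A per-cell model keeps the rare cells' profile separable. A pseudobulk model can only recover it if those cells were defined in advance as their own group.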

The central idea is to reuse the regulatory knowledge already learned by Borzoi, a large convolutional-transformer predictor trained on thousands of bulk RNA-seq and related assays, rather than training a single-cell model from scratch. Single-cell data are far too sparse and noisy to support a model of Borzoi's scale on their own, so Scooby keeps Borzoi's pretrained sequence trunk and grafts on a lightweight, cell-specific decoder. This lets the model inherit a rich representation of cis-regulatory sequence while learning how that signal is modulated in individual cells.

By bridging large-scale bulk sequence models and single-cell genomics, Scooby addresses a longstanding gap. It can predict the effects of genetic variants and sequence perturbations on expression and accessibility in specific, even rare, cell populations. Pseudobulk approaches can only approximate this for predefined groups of cells that are large enough to aggregate.

#Key Features

  • Single-cell-resolution prediction: Outputs per-cell scRNA-seq coverage and scATAC-seq insertion tracks from sequence alone, rather than bulk or pseudobulk averages.
  • Multimodal output: Jointly models transcription (RNA) and chromatin accessibility (ATAC) from the same 10x multiome data, linking regulatory sequence to both readouts.
  • Borzoi transfer learning: Adapts the pretrained Borzoi trunk via low-rank adaptation (LoRA). Only small adapter matrices are trained rather than all of Borzoi's weights, so a model of Borzoi's size never has to be trained from scratch on sparse single-cell data.
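  • Poisson-MultiVI cell embeddings: Each cell is represented by an embedding learned with Poisson-MultiVI, a MultiVI variant with a Poisson likelihood for accessibility counts. A lightweight cell-specific decoder uses this embedding to map Borzoi's sequence embeddings to that cell's expected counts in each modality.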
  • In silico perturbation: Supports variant effect and motif perturbation analysis at the level of individual cell states and trajectories.
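LoRA, as applied to the Borzoi trunk, leaves a pretrained weight matrix W0 untouched and learns a low-rank correction, so a layer computes W0·x + (alpha/r)·B·A·x. The pure-Python sketch below illustrates the general technique and is not Scooby's implementation. The rank, alpha, initialization scale, and the 1536 x 1536 layer size in the parameter count are hypothetical values chosen for illustration.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Linear layer y = W0 x + (alpha / r) * B (A x), with W0 frozen.

    Only A (r x k) and B (d x r) would be updated during fine-tuning.
    """

    def __init__(self, w0: list[list[float]], rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d, k = len(w0), len(w0[0])
        self.w0 = w0  # pretrained weights, never updated
        self.scale = alpha / rank
        # A starts with small random values; B starts at zero, so the adapted
        # layer initially reproduces the pretrained layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d)]

    def __call__(self, x: list[float]) -> list[float]:
        base = matvec(self.w0, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def lora_param_count(d: int, k: int, rank: int) -> int:
    """Trainable parameters of a rank-r adapter for a d x k weight matrix."""
    return rank * (d + k)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
print(layer([1.0, 1.0]))         # [3.0, 7.0]: identical to W0 x at initialization
print(layer.trainable_params())  # 4
# Hypothetical 1536 x 1536 layer: full weight count vs. rank-8 adapter count.
print(1536 * 1536, lora_param_count(1536, 1536, rank=8))  # 2359296 24576
```

Initializing B at zero is the usual LoRA choice. Fine-tuning then starts from exactly the pretrained model, and for a d x k matrix only r(d + k) values are trained instead of d·k.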

#Technical Details

Scooby is built on the Borzoi architecture, a convolutional-transformer network that ingests long DNA sequences (524 kb in the published Borzoi model) and predicts genomic coverage tracks at 32 bp resolution. Scooby does not update all of Borzoi's pretrained weights. It keeps them frozen and fine-tunes the sequence trunk through trainable LoRA adapters, while a new decoder is trained on single-cell multiome data. The decoder is conditioned on per-cell embeddings computed with Poisson-MultiVI, a variant of the MultiVI latent-variable model that treats chromatin accessibility as Poisson-distributed fragment counts. As a result, the same genomic sequence yields different RNA and ATAC profiles depending on cell state. The model was trained and evaluated on 10x Multiome data. It accurately predicts cell-type-specific expression and accessibility, recovers known regulatory elements, and produces variant effects that agree with measured effects.
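The cell-conditioning idea can be sketched as a small hypernetwork. A cell embedding is mapped to the weights of a linear output layer, and that layer is applied to the trunk's per-bin sequence embeddings. This is a minimal illustration under stated assumptions, not Scooby's code. The one-layer hypernetwork, the softplus link, the function names, and the Poisson negative log-likelihood used as the count loss are all assumptions, and the published architecture and training objective may differ in detail.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + e^x); keeps predicted rates positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def cell_decoder_params(z, hyper_w, hyper_b):
    """Hypothetical hypernetwork: one linear layer maps a cell embedding z
    (length m) to k decoder weights plus one bias. hyper_w has k + 1 rows of
    length m; the last output is used as the bias."""
    out = [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(hyper_w, hyper_b)]
    return out[:-1], out[-1]


def predict_rates(seq_emb, z, hyper_w, hyper_b):
    """Expected counts per bin for one cell: softplus(w_c . h_bin + b_c)."""
    w_c, b_c = cell_decoder_params(z, hyper_w, hyper_b)
    return [softplus(sum(w * h for w, h in zip(w_c, h_bin)) + b_c) for h_bin in seq_emb]


def poisson_nll(rates, counts):
    """Poisson negative log-likelihood summed over bins: lambda - y*log(lambda) + log(y!)."""
    return sum(lam - y * math.log(lam) + math.lgamma(y + 1) for lam, y in zip(rates, counts))


# Toy inputs (invented): 3 bins with 2-dimensional embeddings from a frozen trunk.
seq_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hyper_w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
hyper_b = [0.0, 0.0, 0.0]

# Same sequence, two cell states: the predicted profiles differ per cell.
rates_a = predict_rates(seq_emb, [2.0, 0.0], hyper_w, hyper_b)
rates_b = predict_rates(seq_emb, [0.0, 2.0], hyper_w, hyper_b)
print([round(r, 3) for r in rates_a])  # [2.127, 0.693, 2.127]
print([round(r, 3) for r in rates_b])  # [0.693, 2.127, 2.127]
# Observed counts [2, 0, 2] are fit better (lower NLL) by cell a's profile.
print(poisson_nll(rates_a, [2, 0, 2]) < poisson_nll(rates_b, [2, 0, 2]))
```

In this sketch the cell enters only through the decoder weights. The expensive sequence embedding can therefore be computed once per genomic window and reused for every cell.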

#Applications

Scooby is aimed at researchers studying gene regulation and the functional consequences of genetic variation. Because it resolves predictions to individual cell states, it is well suited to interpreting non-coding variants and regulatory mutations in the specific cell types where they act, prioritizing candidate causal variants from GWAS and eQTL studies, and performing in silico perturbation experiments along developmental or disease trajectories. It is also useful for dissecting cell-type-specific enhancer and promoter logic that is washed out in bulk assays.
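Per-cell variant scoring follows the usual in silico mutagenesis recipe. Predict coverage for the reference and the alternative sequence, then compare the two cell by cell over the bins of interest. In the sketch below, `toy_predict` is a hypothetical stand-in for a trained model. The log2 fold-change score with a pseudocount is one common choice, not Scooby's documented API, and the sequence, motif, and cell weights are invented.

```python
import math


def toy_predict(seq: str, cell_weight: float, bin_size: int = 4) -> list[float]:
    """Hypothetical stand-in for a trained model: every bin's coverage rises with
    the number of 'GATA' motifs in the sequence, scaled by a per-cell weight."""
    n_motifs = seq.count("GATA")
    return [1.0 + cell_weight * n_motifs for _ in range(len(seq) // bin_size)]


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with a single-nucleotide variant applied, checking the reference base."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(predict, seq, pos, ref, alt, cells, gene_bins, pseudocount=1.0):
    """Per-cell log2 fold change (alt vs ref) of predicted coverage summed over gene_bins."""
    alt_seq = apply_snv(seq, pos, ref, alt)
    effects = {}
    for name, cell in cells.items():
        ref_pred, alt_pred = predict(seq, cell), predict(alt_seq, cell)
        ref_cov = sum(ref_pred[b] for b in gene_bins)
        alt_cov = sum(alt_pred[b] for b in gene_bins)
        effects[name] = math.log2((alt_cov + pseudocount) / (ref_cov + pseudocount))
    return effects


# A C at position 5 breaks the only GATA motif; only the responsive cell is affected.
seq = "ACGTGATAACGT"
cells = {"responsive_cell": 4.0, "unresponsive_cell": 0.0}
effects = variant_effect(toy_predict, seq, pos=5, ref="A", alt="C", cells=cells, gene_bins=range(3))
print(effects)  # {'responsive_cell': -2.0, 'unresponsive_cell': 0.0}
```

The same variant can therefore score as strongly repressive in one cell state and neutral in another. A single bulk score would blur this distinction.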

#Impact

Scooby is among the first models to bring large sequence-to-function predictors to true single-cell resolution, demonstrating that the regulatory knowledge captured by bulk-trained models like Borzoi can be transferred to sparse single-cell readouts through parameter-efficient adaptation. This establishes a practical recipe for single-cell regulatory genomics that sidesteps the data and compute barriers of training from scratch. The code is released under the MIT license and training data are shared on Zenodo under CC-BY-4.0; the model weights are distributed via HuggingFace, though those weight repositories currently lack an explicit license and model card.

Citation


Hingerl, J. C., et al. (2025) scooby: modeling multimodal genomic profiles from DNA sequence at single-cell resolution. Nature Methods.

DOI: 10.1038/s41592-025-02854-5

Openness

Unclassified
Restrictive license on core components

Tags

chromatin, chromatin_accessibility_prediction, convolutional_neural_network, foundation_model, gene_expression, single_cell_genomics, transfer_learning, transformer, variant_effect_prediction

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model
  • Dataset