bio.rodeo
© 2026 Pulsatance. All rights reserved.
Built by Pulsatance

scDFM

Westlake University

Distributional flow matching model for single-cell perturbation prediction that models population-level expression shifts using a graph-aware differential-attention transformer.

Released: February 2026

scDFM (single-cell Distributional Flow Matching) is a model for predicting how cell populations respond transcriptionally to perturbations such as gene knockouts or drug combinations. A key observation motivating the work is that perturbations often induce population-level shifts in gene expression rather than changes that can be tracked in individual cells. Moreover, because single-cell sequencing is destructive, the same cell cannot be measured both before and after a perturbation, so control and perturbed cells cannot be matched one-to-one. scDFM therefore models the full distribution of perturbed expression profiles conditioned on control states instead of relying on cell-level correspondences.

Developed by Tailin Wu's group at Westlake University and accepted to ICLR 2026, scDFM uses conditional flow matching together with a maximum mean discrepancy (MMD) objective to learn maps between control and perturbed distributions. It is paired with a custom backbone, the PAD-Transformer, which incorporates gene-interaction graphs and a differential-attention mechanism to capture context-specific expression changes.
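The flow-matching part of this objective can be illustrated in a few lines. This is a minimal NumPy sketch, not scDFM's implementation: it assumes the common straight-line probability path, a random coupling between unpaired control and perturbed cells, and an unconditioned velocity field, whereas scDFM also conditions on the perturbation and uses its PAD-Transformer as the velocity network. The names `cfm_loss`, `random_coupling`, and `zero_field` are hypothetical.

```python
import numpy as np


def random_coupling(x0, x1, rng):
    """Pair unpaired control and perturbed cells by random permutation (hypothetical helper).

    Returns two arrays with the same number of rows, truncated to the smaller set.
    """
    n = min(len(x0), len(x1))
    i = rng.permutation(len(x0))[:n]
    j = rng.permutation(len(x1))[:n]
    return x0[i], x1[j]


def cfm_loss(velocity_fn, x0, x1, rng):
    """Conditional flow matching loss for coupled rows of x0 (control) and x1 (perturbed).

    velocity_fn(x_t, t) must return an array with the same shape as x_t.
    """
    # One time per cell, drawn uniformly from [0, 1).
    t = rng.uniform(0.0, 1.0, size=(x0.shape[0], 1))
    # Straight-line path between each coupled pair.
    x_t = (1.0 - t) * x0 + t * x1
    # Velocity of that path; constant in t for a straight line.
    target = x1 - x0
    pred = velocity_fn(x_t, t)
    # Mean squared error between predicted and target velocities.
    return float(np.mean((pred - target) ** 2))


def zero_field(x, t):
    """Toy velocity field that predicts no change (stand-in for a trained network)."""
    return np.zeros_like(x)


rng = np.random.default_rng(0)
control = rng.normal(size=(128, 50))              # toy control profiles
perturbed = rng.normal(loc=1.0, size=(96, 50))    # toy perturbed profiles, different count
x0, x1 = random_coupling(control, perturbed, rng)
print(cfm_loss(zero_field, x0, x1, rng))
```

With a random coupling the per-pair targets are noisy, so the loss is minimized by the average target velocity at each point `x_t`; that averaged field is what transports the control distribution onto the perturbed one without any cell-level pairing.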

By framing perturbation prediction as distribution matching with flow-based generative modeling, scDFM targets robustness in challenging settings such as combinatorial (multi-gene) perturbations, where predicting joint effects is especially difficult.
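Generating predictions from a trained flow model usually means integrating the learned velocity field from the source (control) distribution toward the target. The sketch below assumes a plain fixed-step Euler integrator and a toy constant velocity field; scDFM's actual solver, step count, and conditioning inputs are not given here, and `euler_sample` and `shift_field` are hypothetical names.

```python
import numpy as np


def euler_sample(velocity_fn, x0, n_steps=20):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with fixed-step Euler.

    x0: control cells, shape (batch, n_genes). Returns the transported cells.
    """
    x = np.array(x0, dtype=float, copy=True)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        # Time at the start of step k, one entry per cell.
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x


def shift_field(x, t):
    """Toy velocity field: a constant +1.5 shift for every gene (hypothetical)."""
    return np.full_like(x, 1.5)


rng = np.random.default_rng(0)
control = rng.normal(size=(64, 10))
predicted = euler_sample(shift_field, control, n_steps=10)
# A constant field is a pure translation, so every value moves by about 1.5.
print(np.abs(predicted - control - 1.5).max())
```

Each control cell is transported to a predicted perturbed profile, but the transported cells have no true one-to-one counterparts, so predictions are still best compared with observed cells at the population level, for example with a distributional metric such as MMD.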

# Key Features

  • Distributional flow matching: Uses conditional flow matching to map control distributions to perturbed distributions, sidestepping the need for paired single cells.
  • MMD objective: Trains with a maximum mean discrepancy loss to align predicted and observed population-level expression distributions.
  • PAD-Transformer backbone: Employs gene-interaction graphs and differential attention to model context-specific, gene-level expression changes.
  • Combinatorial robustness: Reported to reduce mean squared error by 19.6% relative to prior methods in combinatorial perturbation settings.
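MMD measures the distance between two sample sets through their average kernel similarities, so the sets need neither equal sizes nor paired rows. A minimal sketch of the biased (V-statistic) squared-MMD estimator with a single Gaussian kernel follows; the kernel, bandwidth, and the way scDFM computes and weights this term against the flow-matching loss are assumptions, not details from the source.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets x and y.

    x: (n, n_genes), y: (m, n_genes); n and m may differ and rows are unpaired.
    """
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


rng = np.random.default_rng(0)
observed = rng.normal(size=(200, 5))            # toy observed perturbed cells
close_pred = rng.normal(size=(150, 5))          # toy prediction, same distribution
far_pred = rng.normal(loc=3.0, size=(150, 5))   # toy prediction, shifted distribution
print(mmd2(observed, close_pred), mmd2(observed, far_pred))
```

A prediction drawn from the right distribution scores near zero even though none of its cells matches an observed cell, while a shifted prediction scores clearly higher; this is the population-level behavior the objective relies on.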

# Technical Details

scDFM combines conditional flow matching with a maximum mean discrepancy objective to learn distribution-to-distribution maps between control and perturbed single-cell expression states. Its PAD-Transformer backbone integrates gene-interaction graph structure with a differential-attention mechanism to capture context-specific changes in gene expression. The authors report a 19.6% reduction in mean squared error relative to prior methods in combinatorial (multi-gene) perturbation settings. The work was accepted at ICLR 2026; the code is released under an MIT license, with pretrained checkpoints provided for the Norman and ComboSciPlex benchmark datasets.
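Differential attention, as introduced for the Differential Transformer, subtracts a second softmax attention map, scaled by λ, from the first to cancel attention mass that both maps place on irrelevant tokens. The sketch below assumes the PAD-Transformer follows that formulation, applies it to gene tokens, and adds the gene-interaction graph as an additive bias on the attention logits. The graph mechanism, the fixed `lam`, and all names are assumptions for illustration: the source does not describe how the graph is injected, and the original differential attention also uses a learnable λ and per-head normalization, both omitted here.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def graph_diff_attention(x, w_q1, w_k1, w_q2, w_k2, w_v, adjacency,
                         lam=0.5, graph_weight=1.0):
    """Single-head differential attention over gene tokens with an additive graph bias.

    x: (n_genes, d_model), one embedding per gene.
    w_*: (d_model, d_head) projection matrices.
    adjacency: (n_genes, n_genes) gene-interaction matrix. Adding it to the logits
    is an assumption made for this sketch, not a documented scDFM detail.
    """
    d_head = w_q1.shape[1]
    # With graph_weight > 0, interacting gene pairs get larger logits.
    bias = graph_weight * adjacency
    scale = 1.0 / np.sqrt(d_head)
    a1 = softmax((x @ w_q1) @ (x @ w_k1).T * scale + bias)
    a2 = softmax((x @ w_q2) @ (x @ w_k2).T * scale + bias)
    # Subtracting the second map cancels attention mass that both maps share.
    return (a1 - lam * a2) @ (x @ w_v)


rng = np.random.default_rng(0)
n_genes, d_model, d_head = 6, 8, 4
x = rng.normal(size=(n_genes, d_model))
w = [rng.normal(size=(d_model, d_head)) for _ in range(5)]
adj = (rng.uniform(size=(n_genes, n_genes)) > 0.7).astype(float)
out = graph_diff_attention(x, *w, adjacency=adj, lam=0.5)
print(out.shape)  # (6, 4)
```

Setting `lam=0` recovers ordinary graph-biased softmax attention, which makes the subtraction term easy to ablate when checking what it contributes.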

# Applications

scDFM is intended for systems biology and drug-discovery workflows that use single-cell perturbation screens, where accurate prediction of expression responses—especially to untested gene or drug combinations—can prioritize experiments and inform mechanistic hypotheses. Its distribution-level formulation is well suited to the unpaired, population-shifting nature of perturbation data, and the released Norman and ComboSciPlex checkpoints make it directly usable on common combinatorial benchmarks.

# Impact

scDFM advances distribution-based single-cell perturbation modeling by combining flow matching with a graph-aware, differential-attention transformer, and reports meaningful gains in the difficult combinatorial regime. Acceptance at ICLR 2026 plus an open MIT-licensed implementation with pretrained checkpoints lowers the barrier to adoption and reproduction. As with other perturbation-prediction models, its broader impact will depend on how well benchmark improvements generalize to new biological systems and experimental designs.

Tags

perturbation_prediction, gene_expression, flow_matching, transformer, generative, transcriptomics, perturbation