bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

Lingshu-Cell

DAMO Academy

A generative cellular world model that uses masked discrete diffusion to learn whole-transcriptome scRNA-seq distributions and simulate perturbation responses across tissues and species.

Released: March 2026

Lingshu-Cell is a generative cellular "world model" for single-cell transcriptomics, developed by researchers at DAMO Academy (Alibaba Group) and released as an arXiv preprint in March 2026. It reframes the virtual cell problem as learning the full distribution of cellular transcriptomic states, then sampling from that distribution to simulate how cells respond to perturbations. Rather than regressing a single expected expression vector, the model captures the heterogeneity of cell populations directly.
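The difference between regressing an expected expression vector and sampling from a distribution can be seen in a toy case. The sketch below uses a hypothetical gene that is off in half of the cells and on in the other half; the numbers are illustrative and are not taken from the preprint.

```python
import random
import statistics

# Hypothetical bimodal population for one gene: 500 cells with the gene off
# (count 0) and 500 cells with the gene on (count 10). Values are illustrative.
population = [0] * 500 + [10] * 500

# A point-estimate model trained with squared error would predict the mean.
mean_expr = statistics.mean(population)

# The mean (5) matches no real cell: every cell sits 5 counts away from it.
closest_gap = min(abs(x - mean_expr) for x in population)

# A distributional model is instead judged by its samples. Here sampling is
# simulated by drawing from the population itself, which keeps both modes.
rng = random.Random(0)
samples = [rng.choice(population) for _ in range(1000)]
frac_on = sum(s > 0 for s in samples) / len(samples)
```

The mean prediction describes a cell state that never occurs, while the samples recover roughly the right proportion of on and off cells.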

The central technical idea is to model the whole transcriptome (approximately 18,000 genes) with a masked discrete diffusion process operating in token space, without prior gene selection or feature filtering. This formulation suits the sparse, non-sequential, and highly variable nature of scRNA-seq counts, where most genes are zero in any given cell. Pretrained across multiple tissues and species, Lingshu-Cell aims to serve as a general substrate for in silico experimentation: predicting transcriptome-wide expression changes for novel combinations of cell identity and perturbation.
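As a rough illustration of the forward (corruption) half of masked discrete diffusion, the sketch below turns raw counts into discrete tokens and then masks each token independently with probability `t`. The log-spaced binning, the `MASK` sentinel, and the function names are assumptions made for this example; the preprint's actual tokenizer and noise schedule are not reproduced here.

```python
import math
import random

MASK = -1  # hypothetical sentinel id standing in for a [MASK] token


def bin_counts(counts, n_bins=8):
    """Map non-negative integer counts to tokens: 0 stays 0, positive
    counts fall into log2-spaced bins capped at n_bins - 1."""
    tokens = []
    for c in counts:
        if c <= 0:
            tokens.append(0)
        else:
            tokens.append(min(n_bins - 1, 1 + int(math.log2(c))))
    return tokens


def forward_mask(tokens, t, rng):
    """Corrupt a cell at noise level t in [0, 1]: each gene token is
    replaced by MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
cell = bin_counts([0, 0, 1, 2, 3, 4, 0, 1000])  # mostly-zero toy cell
half_masked = forward_mask(cell, 0.5, rng)
```

During training, a denoising network would be asked to recover the original tokens at the masked positions. Because the corruption only hides tokens and never reorders them, nothing in the process depends on an ordering of genes.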

Lingshu-Cell joins a fast-growing class of perturbation-oriented virtual cell models such as STATE and SCALE. It is distinguished by its generative formulation (diffusion in token space) and by its emphasis on reproducing the full distribution of cellular states rather than point estimates.

#Key Features

  • Whole-transcriptome modeling: Operates over ~18,000 genes directly, with no prior gene selection, capturing transcriptome-wide expression dependencies in a single generative model.
  • Masked discrete diffusion: Learns transcriptomic state distributions via a discrete diffusion process in token space, a design suited to the sparse, non-sequential structure of scRNA-seq data.
  • Conditional perturbation simulation: Supports conditional generation under genetic and cytokine perturbations, using classifier-free guidance to predict responses for unseen identity–perturbation combinations.
  • Multi-tissue, multi-species pretraining: Trained across diverse human tissues (e.g., neocortex, heart, lung, colon) and species (mouse, rhesus macaque, zebrafish, fruit fly), reproducing cell-state distributions, marker-gene patterns, and cell-subtype proportions.
  • Distributional fidelity: Models population-level heterogeneity rather than a single expected cell, enabling realistic simulation of cellular variation.
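The classifier-free guidance mentioned in the list above is commonly implemented by mixing the logits of a conditional and an unconditional forward pass. The sketch below shows that standard mixing rule for a single gene's token distribution; the guidance weight `w`, the function names, and the toy logits are assumptions for illustration, and the preprint may apply guidance differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_probs(cond_logits, uncond_logits, w):
    """Classifier-free guidance on token logits.

    w = 0 gives the unconditional distribution, w = 1 the conditional one,
    and w > 1 extrapolates further in the direction of the condition.
    """
    mixed = [u + w * (c - u) for c, u in zip(cond_logits, uncond_logits)]
    return softmax(mixed)


# Toy logits over three expression bins for one gene (hypothetical values):
# the condition (e.g. a perturbation label) favors bin 1.
cond = [0.0, 2.0, 0.0]
uncond = [0.0, 0.0, 0.0]
p_plain = guided_probs(cond, uncond, 1.0)
p_strong = guided_probs(cond, uncond, 2.0)
```

With `w = 2.0` the probability of the condition-favored bin is higher than with `w = 1.0`, which is the usual trade of sample diversity for stronger adherence to the condition.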

#Technical Details

Lingshu-Cell is a masked discrete diffusion model that tokenizes single-cell expression and learns the joint distribution over the full ~18,000-gene transcriptome. The architecture incorporates classifier-free guidance for conditional generation, sequence compression to scale to large gene panels, and biological prior injection to ground generation in known structure. Reported experiments scale to roughly 200,000 cells, demonstrated on a PARSE 10M PBMC dataset. On the Virtual Cell Challenge (H1) genetic perturbation benchmark, the authors report leading performance, with strong scores on the challenge's perturbation-discrimination and differential-expression metrics; they also report accurate prediction of cytokine-induced responses in human PBMCs. The preprint does not disclose a parameter count.
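Generation with a masked discrete diffusion model typically starts from a fully masked cell and reveals tokens over several steps. The sketch below shows one simple version of that loop. The `toy_denoiser` is a hypothetical stand-in that returns uniform probabilities, the equal-share unmasking schedule is an assumption, and conditioning, sequence compression, and biological priors are all omitted.

```python
import math
import random

MASK = -1  # hypothetical sentinel id standing in for a [MASK] token


def toy_denoiser(tokens, vocab_size):
    """Hypothetical stand-in for a trained network: returns a uniform
    distribution over tokens for every gene position."""
    return [[1.0 / vocab_size] * vocab_size for _ in tokens]


def sample_cell(n_genes, vocab_size, n_steps, denoiser, rng):
    """Iteratively unmask a cell, starting from all-MASK tokens."""
    tokens = [MASK] * n_genes
    for step in range(n_steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        probs = denoiser(tokens, vocab_size)
        # Reveal an equal share of the remaining masked positions per step;
        # the final step reveals whatever is left.
        n_reveal = math.ceil(len(masked) / (n_steps - step))
        for i in rng.sample(masked, n_reveal):
            tokens[i] = rng.choices(range(vocab_size), weights=probs[i])[0]
    return tokens


cell = sample_cell(n_genes=50, vocab_size=8, n_steps=5,
                   denoiser=toy_denoiser, rng=random.Random(0))
```

Calling `sample_cell` repeatedly with different random states yields different cells, which is how a generative model of this kind would represent population heterogeneity rather than a single expected profile.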

#Applications

Lingshu-Cell targets computational and experimental biologists who want to forecast cellular responses before running costly wet-lab screens. By simulating transcriptome-wide responses to genetic knockouts/knockdowns and cytokine treatments, it can support target discovery, immunology and drug-response studies, and hypothesis prioritization in single-cell pharmacology. Its multi-species pretraining also makes it useful as a general reference distribution for cell-state characterization, marker analysis, and cross-tissue comparison.

#Impact

As a reported top-performing entry on the Virtual Cell Challenge (H1) genetic perturbation benchmark, Lingshu-Cell demonstrates that whole-transcriptome generative diffusion is a competitive route to virtual cell modeling, and it broadens the methodological landscape beyond regression- and flow-matching-based perturbation predictors. At the time of writing, the model is available as an arXiv preprint with an official project homepage and a Hugging Face collection, but no public code or downloadable weights have been confirmed. Downstream adoption and independent reproduction therefore remain to be established until code and weights are released.

Tags

perturbation_response_prediction, virtual_cell_modeling, gene_expression, diffusion, foundation_model, generative, single_cell_transcriptomics