bio.rodeo

Single-cell

X-Cell

Xaira Therapeutics

4.9-billion-parameter diffusion language model for predicting genome-wide genetic perturbation responses, trained on the largest CRISPRi Perturb-seq dataset built to date.

Released: 2026
Parameters: 4,900,000,000

Overview

X-Cell is a 4.9-billion-parameter diffusion language model for predicting genome-wide genetic perturbation responses in single cells, developed by Xaira Therapeutics and announced in March 2026 with an accompanying bioRxiv preprint. It is trained on X-Atlas/Pisces, the largest CRISPRi Perturb-seq corpus assembled to date, comprising 25.6 million perturbed single-cell transcriptomes spanning thousands of gene knockdowns across multiple cell contexts.

X-Cell is presented as the first virtual-cell model to demonstrate clear scaling laws in the perturbation domain: performance on out-of-distribution gene knockdowns improves predictably with both data and parameter count, mirroring the scaling behavior observed in language models. This puts CRISPR perturbation modeling on a similar empirical footing to natural-language modeling and supports the case for continued investment in larger Perturb-seq datasets.
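A scaling law of this kind is usually summarized as a power law, error ≈ a · N^(-b), which appears as a straight line in log-log space. The sketch below shows one way such a fit could be computed. The function name `fit_power_law` and all numbers are illustrative assumptions; the data points are synthetic and are not results from the X-Cell preprint.

```python
import math

def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic, illustrative values only (NOT taken from the preprint):
# held-out error generated from an exact power law with a=2.0, b=0.1.
param_counts = [1e8, 3e8, 1e9, 4.9e9]
held_out_error = [2.0 * n ** -0.1 for n in param_counts]

a, b = fit_power_law(param_counts, held_out_error)
print(f"fitted prefactor a={a:.3f}, exponent b={b:.3f}")
```

The same fit can be repeated with dataset size on the x-axis; a stable exponent across both axes is what "predictable" improvement would look like in practice.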

Key Features

  • Genome-wide perturbation prediction: Predicts transcriptome-wide expression changes following knockdown of any human gene, including those held out from training.
  • Diffusion-based generative modeling: Uses a discrete diffusion objective over gene-expression count vectors rather than a fixed regression head, allowing sampling of post-perturbation cell states.
  • Empirical scaling laws: Performance improves smoothly with parameter count and dataset size, providing the first clear scaling-law evidence in the virtual-cell domain.
  • X-Atlas/Pisces training corpus: Trained on 25.6 million single-cell transcriptomes from CRISPRi screens, the largest causal perturbation dataset publicly known.
  • Cell-context conditioning: Generates predictions conditioned on baseline cell state, supporting context-specific drug target prioritization.

Technical Details

X-Cell uses a transformer backbone adapted for token-like representations of gene-expression count vectors, with a discrete diffusion forward process that masks and reconstructs gene expression conditional on perturbation identity and baseline cell state. The model is trained on Xaira's proprietary X-Atlas/Pisces corpus, which combines published Perturb-seq datasets with substantial in-house data generation. Training was performed on standard transformer infrastructure; full hyperparameters are reported in the bioRxiv preprint.
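To make the masking step concrete, here is a minimal sketch of an absorbing-state (mask-based) discrete diffusion forward process over binned expression counts. Everything specific in it is an assumption for illustration: the bin edges, the `MASK` token id, the uniform sampling of the noise level, and the helper names are hypothetical, and X-Cell's actual tokenization and noise schedule are described only in the preprint.

```python
import random

MASK = -1  # hypothetical id for the absorbing "masked" state

def bin_counts(counts, edges=(0, 1, 3, 7, 15)):
    """Map raw counts to token ids: the index of the highest edge <= count."""
    return [sum(c >= e for e in edges) - 1 for c in counts]

def forward_mask(tokens, t, rng):
    """Forward process: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

def make_training_example(counts, perturbed_gene, baseline_tokens, rng):
    """Build one denoising example conditioned on perturbation and baseline state."""
    tokens = bin_counts(counts)
    t = rng.random()  # noise level in [0, 1); assumed uniform here
    noised = forward_mask(tokens, t, rng)
    # A denoiser would be trained to recover the original token at masked positions.
    targets = {i: tokens[i] for i, tok in enumerate(noised) if tok == MASK}
    return {
        "noised": noised,
        "condition": (perturbed_gene, baseline_tokens),
        "targets": targets,
        "t": t,
    }

rng = random.Random(0)
example = make_training_example(
    counts=[0, 2, 20, 5],          # post-perturbation counts (toy values)
    perturbed_gene="GENE_A",       # hypothetical perturbation label
    baseline_tokens=[0, 1, 4, 2],  # binned baseline cell state (toy values)
    rng=rng,
)
print(example["noised"], example["targets"])
```

At sampling time, a model trained this way would start from a fully masked vector and fill in tokens step by step, which is what allows drawing post-perturbation cell states rather than predicting a single point estimate.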

The model is benchmarked on held-out perturbation prediction (predicting the expression response to knockdowns not seen during training), held-out cell-context prediction, and downstream drug-target prioritization. X-Cell outperforms scGPT, Geneformer, and prior task-specific perturbation models at the largest scales tested.
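A held-out perturbation benchmark can be sketched as follows: split by perturbation so that test knockdowns never appear in training, then score each prediction by how well its expression change relative to control matches the observed change. The Pearson correlation of these changes is a common choice in the perturbation-modeling literature, but it is an assumption here; the preprint's exact metrics and splits are not reproduced, and the gene labels are hypothetical.

```python
import math

def split_by_perturbation(perturbations, held_out):
    """Partition perturbation labels so held-out genes appear only in the test set."""
    held = set(held_out)
    train = [p for p in perturbations if p not in held]
    test = [p for p in perturbations if p in held]
    return train, test

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0

def delta_correlation(predicted, observed, control):
    """Correlate predicted and observed expression changes relative to control."""
    pred_delta = [p - c for p, c in zip(predicted, control)]
    obs_delta = [o - c for o, c in zip(observed, control)]
    return pearson(pred_delta, obs_delta)

train, test = split_by_perturbation(["GENE_A", "GENE_B", "GENE_C"], ["GENE_C"])

# Toy mean-expression vectors over three genes (illustrative only).
control = [1.0, 2.0, 3.0]
observed = [2.0, 2.0, 5.0]
predicted = [3.0, 2.0, 7.0]
score = delta_correlation(predicted, observed, control)
print(train, test, round(score, 3))
```

Scoring the change from control rather than raw expression keeps a model from looking accurate simply because it reproduces the unperturbed baseline.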

Applications

X-Cell is designed for in silico target prioritization in early drug discovery. Pharma teams can rank candidate genetic targets by their predicted phenotypic effect in disease-relevant cell contexts before committing wet-lab resources. The model also supports counterfactual reasoning, that is, asking how a cell would respond to a perturbation under which it has never been measured, which is critical for novel target nomination.
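One simple way to turn predicted responses into a ranked target list is to compare each knockdown's predicted expression shift with a desired shift, for example the direction from a disease signature toward a healthy one. The sketch below uses cosine similarity for that comparison. The scoring rule, the function names, and the gene labels are assumptions for illustration and are not described as Xaira's method.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def rank_targets(predicted_shifts, desired_shift):
    """Rank candidate knockdowns by alignment of predicted shift with desired shift."""
    scored = [
        (gene, cosine(shift, desired_shift))
        for gene, shift in predicted_shifts.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy predicted shifts (perturbed minus baseline) for hypothetical candidates.
predicted_shifts = {
    "GENE_A": [1.0, 0.0, -1.0],
    "GENE_B": [-1.0, 0.0, 1.0],
    "GENE_C": [0.0, 1.0, 0.0],
}
desired_shift = [1.0, 0.0, -1.0]  # hypothetical disease-to-healthy direction

ranking = rank_targets(predicted_shifts, desired_shift)
for gene, score in ranking:
    print(gene, round(score, 3))
```

In a real workflow the predicted shifts would come from the model conditioned on a disease-relevant baseline cell state, and the top-ranked candidates would still need wet-lab confirmation.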

Impact

X-Cell raises the ceiling on what foundation models can do for genetic perturbation prediction and reframes the perturbation-modeling problem as one that scales with data, compute, and parameters in the same way language modeling does. The model is the largest causal perturbation model built to date, and the X-Atlas/Pisces corpus it was trained on is itself a notable contribution. Xaira has not committed to fully open-sourcing the weights, though the preprint is public and reports detailed methodology. The work is likely to motivate larger Perturb-seq data-generation projects across the academic and commercial sectors.

Citation

X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models

Wang, C., et al. (2026) X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models. bioRxiv.

DOI: 10.64898/2026.03.18.712807

Metrics

Citations

Total Citations: 2
Influential: 0
References: 0

Tags

perturbation prediction, virtual cell modeling, drug target discovery, diffusion, transformer, self-supervised, foundation model, single-cell transcriptome, CRISPR perturbation

Resources

Research Paper, Official Website