X-Cell

A 4.9-billion-parameter diffusion language model for predicting genome-wide genetic perturbation responses, trained on the largest CRISPRi Perturb-seq dataset built to date.

Released: March 2026

Parameters: 4.9 Billion

X-Cell is a 4.9-billion-parameter diffusion language model for predicting genome-wide genetic perturbation responses in single cells, developed by Xaira Therapeutics and announced in March 2026 with an accompanying bioRxiv preprint. It is trained on X-Atlas/Pisces, the largest CRISPRi Perturb-seq corpus assembled to date, comprising 25.6 million perturbed single-cell transcriptomes spanning thousands of gene knockdowns across multiple cell contexts.

X-Cell is notable as the first virtual-cell model to demonstrate clear scaling laws in the perturbation domain: performance on out-of-distribution gene knockdowns improves predictably with both data and parameter count, mirroring the scaling behavior observed in language models. This puts CRISPR perturbation modeling on an empirical footing similar to that of natural-language modeling and supports the case for continued investment in larger Perturb-seq datasets.
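A scaling-law claim of this kind is usually checked by fitting a power law, error ≈ a · N^(-b), to error measured at several model or dataset sizes. The sketch below shows one way to do that with a least-squares fit in log-log space. The function name `fit_power_law` and all numbers are illustrative assumptions; the data are synthetic and are not results from the X-Cell preprint.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of error ≈ a * size**(-b), done in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic, illustrative values only; NOT taken from the X-Cell preprint.
param_counts = [1e7, 1e8, 1e9, 4.9e9]
held_out_error = [2.0 * n ** -0.1 for n in param_counts]

a, b = fit_power_law(param_counts, held_out_error)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A straight line in log-log space (a stable exponent b across sizes) is what "improves predictably" would look like under this assumption; real measurements would scatter around the fit.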

Key Features

Genome-wide perturbation prediction: Predicts transcriptome-wide expression changes following knockdown of any human gene, including those held out from training.
Diffusion-based generative modeling: Uses a discrete diffusion objective over gene-expression count vectors rather than a fixed regression head, allowing sampling of post-perturbation cell states.
Empirical scaling laws: Performance improves smoothly with parameter count and dataset size, providing the first clear scaling-law evidence in the virtual-cell domain.
X-Atlas/Pisces training corpus: Trained on 25.6 million single-cell transcriptomes from CRISPRi screens, the largest causal perturbation dataset publicly reported.
Cell-context conditioning: Generates predictions conditioned on baseline cell state, supporting context-specific drug target prioritization.

Technical Details

X-Cell uses a transformer backbone adapted for token-like representations of gene-expression count vectors, with a discrete diffusion forward process that masks and reconstructs gene expression conditional on perturbation identity and baseline cell state. The model is trained on Xaira's proprietary X-Atlas/Pisces corpus, which combines published Perturb-seq datasets with substantial in-house data generation. Training was performed on standard transformer infrastructure; full hyperparameters are reported in the bioRxiv preprint.
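The mask-and-reconstruct forward process described above can be illustrated with a toy absorbing-state corruption step. This is a minimal sketch under stated assumptions, not Xaira's implementation: the `MASK` sentinel, the uniform noise level, the binned-count tokens, and the `condition` labels are all hypothetical, and the actual tokenization and noise schedule are those reported in the preprint.

```python
import random

MASK = -1  # hypothetical sentinel for a masked expression token

def forward_mask(tokens, t, rng):
    """Absorbing-state forward step: mask each token independently with probability t."""
    noised = [MASK if rng.random() < t else tok for tok in tokens]
    masked_positions = [i for i, tok in enumerate(noised) if tok == MASK]
    return noised, masked_positions

rng = random.Random(0)

# Toy binned expression counts for 8 genes (illustrative values only).
expression_tokens = [0, 3, 1, 0, 7, 2, 0, 5]

# Hypothetical conditioning inputs: perturbation identity and baseline cell state.
condition = {"perturbation": "GENE_A", "baseline": "toy_cell_context"}

t = rng.random()  # noise level in [0, 1), assumed uniform here
noised, masked_positions = forward_mask(expression_tokens, t, rng)

# A denoiser would be trained to predict expression_tokens[i] for each masked
# position i, given `noised` and `condition`.
print(noised, masked_positions)
```

At t = 0 nothing is masked and at t = 1 every token is masked, so sampling t across that range exposes the denoiser to every corruption level between a fully observed and a fully hidden expression profile.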

The model is benchmarked on held-out perturbation prediction (predicting the expression response to knockdowns not seen during training), held-out cell-context prediction, and downstream drug-target prioritization. X-Cell outperforms scGPT, Geneformer, and prior task-specific perturbation models at the largest scales tested.
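Held-out perturbation evaluation depends on splitting by perturbation identity, so that no cell carrying a test knockdown appears in training. The sketch below shows such a split and one commonly used score, the Pearson correlation between predicted and observed expression change relative to control. The metric choice, gene names, and values are assumptions for illustration; the preprint defines the metrics actually used.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_by_perturbation(perturbations, held_out):
    """Split on perturbation identity so held-out knockdowns never reach training."""
    train = [p for p in perturbations if p not in held_out]
    held = [p for p in perturbations if p in held_out]
    return train, held

# Toy data: mean expression of 4 genes (illustrative values only).
control_mean = [5.0, 2.0, 8.0, 1.0]
observed_mean = {"GENE_C": [4.0, 4.0, 6.0, 1.0]}
predicted_mean = {"GENE_C": [4.5, 3.0, 7.0, 1.5]}  # stand-in for model output

train, held_out_genes = split_by_perturbation(["GENE_A", "GENE_B", "GENE_C"], {"GENE_C"})

# Score on the change relative to control, not on raw expression.
obs_delta = [o - c for o, c in zip(observed_mean["GENE_C"], control_mean)]
pred_delta = [p - c for p, c in zip(predicted_mean["GENE_C"], control_mean)]
score = pearson(pred_delta, obs_delta)
print(train, held_out_genes, round(score, 3))
```

Scoring the change relative to control matters because raw expression profiles correlate highly even when the predicted perturbation effect is wrong.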

Applications

X-Cell is designed for in silico target prioritization in early drug discovery. Pharma teams can rank candidate genetic targets by their predicted phenotypic effect in disease-relevant cell contexts before committing wet-lab resources. The model also supports counterfactual reasoning, that is, asking how a cell would respond to a perturbation under which it has never been measured, which is critical for novel target nomination.
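One simple way to turn predicted responses into a target ranking is to score each candidate knockdown by how closely its predicted expression shift aligns with a desired shift, for example from a disease state toward a healthy one. The sketch below uses cosine similarity for that score. The scoring rule, gene names, and vectors are hypothetical; the source does not specify how X-Cell predictions are converted into rankings.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Desired expression shift over 4 genes (illustrative values only).
desired_shift = [-1.0, 0.5, 0.0, 1.0]

# Stand-ins for model-predicted shifts after each candidate knockdown.
predicted_shift = {
    "GENE_A": [-0.8, 0.4, 0.1, 0.9],
    "GENE_B": [0.9, -0.3, 0.0, -1.1],
    "GENE_C": [0.0, 0.1, 1.0, 0.0],
}

# Rank candidates from best to worst alignment with the desired shift.
ranking = sorted(
    predicted_shift,
    key=lambda gene: cosine(predicted_shift[gene], desired_shift),
    reverse=True,
)
print(ranking)
```

In practice the predicted shifts would be computed per cell context, so the same gene can rank differently in different disease-relevant contexts.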

Impact

X-Cell raises the ceiling on what foundation models can do for genetic perturbation prediction and reframes the perturbation-modeling problem as one that scales with data, compute, and parameters in the same way language modeling does. The model is the largest causal perturbation model built to date, and the X-Atlas/Pisces corpus it was trained on is itself a notable contribution. Xaira has not committed to fully open-sourcing the weights, though the preprint is public and reports detailed methodology. The work is likely to motivate larger Perturb-seq data-generation projects across the academic and commercial sectors.

Citation

X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models

Wang, C., et al. (2026) X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models. bioRxiv.

DOI: 10.64898/2026.03.18.712807

Recent citations

Papers that recently cited this model.

Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction
S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
bioRxiv · Jun 2026 · 0 citations

OCOO-T: A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction
Dan Jiang, Zheming An, Yalong Zhao, et al.
Jun 2026 · 0 citations

Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction
Mufan Qiu, Genhui Zheng, Yinuo Xu, et al.
May 2026 · 0 citations · Influential

Top citations

The most-cited papers that cite this model.

Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction
S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
bioRxiv · Jun 2026 · 0 citations

Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction
Mufan Qiu, Genhui Zheng, Yinuo Xu, et al.
May 2026 · 0 citations · Influential

Harnessing AI to Build Virtual Cells
Xingyi Cheng, Pan Li, Han Guo, et al.
bioRxiv · Apr 2026 · 0 citations

PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling
Zichao Yan, Yan Wu, Mica Xu Ji, et al.
Apr 2026 · 0 citations

Systematic identification of seed-driven off-target effects in Perturb-seq experiments
Austin Hartman, John D. Blair, Thao P. Nguyen, et al.
bioRxiv · Mar 2026 · 0 citations

Citations

Total Citations: 6

Influential: 1

References: 0

GitHub

Stars: 100

Forks: 5

Open Issues: 3

Contributors: 1

Last Push: 3 months ago

Language: Python

HuggingFace

Downloads: 0

Likes: 12

Last Modified: 3 months ago

Pipeline: other

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 17%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall: 20 · Closed

Usability (can I run it?): 26

Reproducibility (can I retrain it?): 5

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model · Dataset
