bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

X-Cell

Xaira Therapeutics

4.9-billion-parameter diffusion language model for predicting genome-wide genetic perturbation responses, trained on the largest CRISPRi Perturb-seq dataset built to date.

Released: March 2026
Parameters: 4.9 billion

X-Cell is a 4.9-billion-parameter diffusion language model for predicting genome-wide genetic perturbation responses in single cells, developed by Xaira Therapeutics and announced in March 2026 with an accompanying bioRxiv preprint. It is trained on X-Atlas/Pisces, the largest CRISPRi Perturb-seq corpus assembled to date, comprising 25.6 million perturbed single-cell transcriptomes spanning thousands of gene knockdowns across multiple cell contexts.

X-Cell is the first virtual-cell model reported to show clear scaling laws in the perturbation domain: performance on out-of-distribution gene knockdowns improves predictably with both dataset size and parameter count, mirroring the scaling behavior observed in language models. This puts CRISPR perturbation modeling on a similar empirical footing to natural-language modeling and supports the case for continued investment in larger Perturb-seq datasets.
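To make "improves predictably" concrete: a scaling law is often summarized as a power law, error ≈ a · N^(-b), which is a straight line in log-log space. The sketch below fits that line by ordinary least squares. The data points are synthetic placeholders generated from a known power law, not X-Cell results, and `fit_power_law` is a hypothetical helper; the functional form used in the preprint may differ.

```python
import math

def fit_power_law(sizes, errors):
    """Fit errors ~= a * sizes**(-b) by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - b * log(N), so a = exp(intercept) and b = -slope
    return math.exp(intercept), -slope

# Synthetic placeholder points drawn from error = 2.0 * N**(-0.1).
# These are NOT measurements from X-Cell; they only exercise the fit.
sizes = [1e7, 1e8, 1e9, 4.9e9]
errors = [2.0 * n ** -0.1 for n in sizes]
a, b = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real evaluation numbers, a good linear fit in log-log space (and a stable exponent `b` as more points are added) is the kind of evidence a scaling-law claim rests on.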

#Key Features

  • Genome-wide perturbation prediction: Predicts transcriptome-wide expression changes following knockdown of any human gene, including those held out from training.
  • Diffusion-based generative modeling: Uses a discrete diffusion objective over gene-expression count vectors rather than a fixed regression head, allowing sampling of post-perturbation cell states.
  • Empirical scaling laws: Performance improves smoothly with parameter count and dataset size, providing the first clear scaling-law evidence in the virtual-cell domain.
  • X-Atlas/Pisces training corpus: Trained on 25.6 million single-cell transcriptomes from CRISPRi screens, the largest causal perturbation dataset publicly reported to date.
  • Cell-context conditioning: Generates predictions conditioned on baseline cell state, supporting context-specific drug target prioritization.

#Technical Details

X-Cell uses a transformer backbone adapted for token-like representations of gene-expression count vectors, with a discrete diffusion forward process that masks and reconstructs gene expression conditional on perturbation identity and baseline cell state. The model is trained on Xaira's proprietary X-Atlas/Pisces corpus, which combines published Perturb-seq datasets with substantial in-house data generation. Training was performed on standard transformer infrastructure; full hyperparameters are reported in the bioRxiv preprint.
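The masking forward process described above can be illustrated with a toy absorbing-state corruption step. This is a minimal sketch under stated assumptions, not the preprint's implementation: the `MASK` sentinel, the binned integer tokens, and the per-token masking probability `t` are all hypothetical choices, and the conditioning on perturbation identity and baseline cell state (which a denoiser would take as extra inputs) is not shown.

```python
import random

MASK = -1  # hypothetical sentinel for a masked gene-expression token

def forward_mask(tokens, t, rng):
    """Corrupt a token sequence by masking each position independently with probability t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return [MASK if rng.random() < t else tok for tok in tokens]

def reconstruction_targets(clean, noisy):
    """Map each masked position to the clean token a denoiser would be trained to recover."""
    return {i: c for i, (c, z) in enumerate(zip(clean, noisy)) if z == MASK}

rng = random.Random(0)
clean = [0, 3, 1, 0, 7, 2, 0, 5]  # toy binned expression counts for 8 genes
noisy = forward_mask(clean, t=0.5, rng=rng)
targets = reconstruction_targets(clean, noisy)
print(noisy)
print(targets)
```

At `t = 0` the sequence is untouched and at `t = 1` every token is masked; training a denoiser across intermediate values of `t` is what lets a model of this kind generate a full expression profile by iteratively unmasking.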

The model is benchmarked on held-out perturbation prediction (predicting the expression response to knockdowns not seen during training), held-out cell-context prediction, and downstream drug-target prioritization. X-Cell outperforms scGPT, Geneformer, and prior task-specific perturbation models at the largest scales tested.
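The held-out perturbation setting depends on splitting by perturbation identity rather than by cell, so that no cell carrying a test knockdown leaks into training. A minimal sketch of such a split follows; the record layout, the `split_by_perturbation` helper, and the gene labels are illustrative assumptions, and the preprint's actual split protocol may differ.

```python
import random

def split_by_perturbation(cells, held_out_fraction, seed=0):
    """Split cells so that every held-out perturbation is entirely absent from training."""
    genes = sorted({c["perturbation"] for c in cells})
    rng = random.Random(seed)
    rng.shuffle(genes)
    n_test = max(1, round(len(genes) * held_out_fraction))
    test_genes = set(genes[:n_test])
    train = [c for c in cells if c["perturbation"] not in test_genes]
    test = [c for c in cells if c["perturbation"] in test_genes]
    return train, test, test_genes

# Toy records: 5 knockdown targets, 4 cells each (labels are arbitrary examples).
cells = [
    {"cell_id": i, "perturbation": g}
    for i, g in enumerate(["TP53", "MYC", "KRAS", "EGFR", "BRCA1"] * 4)
]
train, test, test_genes = split_by_perturbation(cells, held_out_fraction=0.2)
print(sorted(test_genes), len(train), len(test))
```

A random split over cells would place cells from the same knockdown on both sides and overstate generalization; the held-out cell-context benchmark would apply the same idea with the context label as the grouping key.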

#Applications

X-Cell is designed for in silico target prioritization in early drug discovery. Pharma teams can rank candidate genetic targets by their predicted phenotypic effect in disease-relevant cell contexts before committing wet-lab resources. The model also supports counterfactual reasoning — asking how a cell would respond to a perturbation it has never been measured under — which is critical for novel target nomination.
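One way to picture the ranking step: given a predicted expression shift for each candidate knockdown, score how well it aligns with a desired shift (for example, from a disease-like state toward a healthy-like state) and sort. The sketch below uses cosine similarity as the score. The scoring rule, the `rank_targets` helper, the placeholder gene names, and the three-gene vectors are all assumptions made for illustration; the source does not specify how X-Cell's predictions are turned into a ranking.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_targets(predicted_shifts, desired_shift):
    """Rank candidate knockdowns by alignment of their predicted shift with the desired one."""
    scores = {g: cosine(s, desired_shift) for g, s in predicted_shifts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical model outputs: predicted log-fold changes over three readout genes.
desired_shift = [-1.0, 0.5, 0.0]
predicted_shifts = {
    "GENE_A": [-0.9, 0.4, 0.1],   # moves in the desired direction
    "GENE_B": [0.8, -0.3, 0.0],   # moves in the opposite direction
    "GENE_C": [0.0, 0.0, 0.2],    # orthogonal to the desired shift
}
ranking = rank_targets(predicted_shifts, desired_shift)
for gene, score in ranking:
    print(f"{gene}: {score:+.3f}")
```

In practice the predicted shifts would come from the model conditioned on a disease-relevant baseline cell state, and the top of the ranking would be a shortlist for wet-lab validation rather than a final answer.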

#Impact

X-Cell raises the ceiling on what foundation models can do for genetic perturbation prediction and reframes the perturbation-modeling problem as one that scales with data, compute, and parameters in the same way language modeling does. The model is the largest causal perturbation model built to date, and the X-Atlas/Pisces corpus it was trained on is itself a notable contribution. Xaira has not committed to fully open-sourcing the weights, though the preprint is public and reports detailed methodology. The work is likely to motivate larger Perturb-seq data-generation projects across the academic and commercial sectors.

Citation

X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models

Wang, C., et al. (2026) X-Cell: Scaling Causal Perturbation Prediction Across Diverse Cellular Contexts via Diffusion Language Models. bioRxiv.

DOI: 10.64898/2026.03.18.712807

Recent citations

Papers that recently cited this model.

  • Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction
    S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
    bioRxiv · Jun 2026 · 0 citations

  • OCOO-T: A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction
    Dan Jiang, Zheming An, Yalong Zhao, et al.
    Jun 2026 · 0 citations

  • Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction
    Mufan Qiu, Genhui Zheng, Yinuo Xu, et al.
    May 2026 · 0 citations · Influential

Top citations

The most-cited papers that cite this model.

  • Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction
    S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
    bioRxiv · Jun 2026 · 0 citations

  • Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction
    Mufan Qiu, Genhui Zheng, Yinuo Xu, et al.
    May 2026 · 0 citations · Influential

  • Harnessing AI to Build Virtual Cells
    Xingyi Cheng, Pan Li, Han Guo, et al.
    bioRxiv · Apr 2026 · 0 citations

  • PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling
    Zichao Yan, Yan Wu, Mica Xu Ji, et al.
    Apr 2026 · 0 citations

  • Systematic identification of seed-driven off-target effects in Perturb-seq experiments
    Austin Hartman, John D. Blair, Thao P. Nguyen, et al.
    bioRxiv · Mar 2026 · 0 citations

Citations

Total Citations: 6
Influential: 1
References: 0

GitHub

Stars: 100
Forks: 5
Open Issues: 3
Contributors: 1
Last Push: 3mo ago
Language: Python

HuggingFace

Downloads: 0
Likes: 12
Last Modified: 3mo ago
Pipeline: other

Fields of citing research

  • Biology: 100%
  • Computer Science: 100%
  • Medicine: 17%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 20 (Closed)
Usability (can I run it?): 26
Reproducibility (can I retrain it?): 5
Model Openness Framework: Unclassified (missing required components)

Tags

crispr_perturbation, diffusion, foundation_model, perturbation_prediction, self_supervised, single_cell_transcriptome, transformer, virtual_cell_modeling

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model · Dataset