bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Single-cell foundation models

Tahoe-x1

Tahoe Therapeutics

Perturbation-trained single-cell foundation models (up to 3B parameters) that jointly model genes, cells, and compounds for precision oncology tasks.

Released: October 2025
Parameters: 3 Billion

Tahoe-x1 (Tx1) is a family of single-cell foundation models developed by Tahoe Therapeutics and released as a bioRxiv preprint in October 2025. While foundation models have reshaped natural language processing and computer vision, their potential in single-cell biology — and particularly in complex diseases such as cancer — has remained comparatively underexplored. Tx1 targets this gap directly, scaling perturbation-trained single-cell models up to 3 billion parameters and orienting their training and evaluation toward cancer-relevant tasks.

The central idea behind Tx1 is to learn not just from baseline transcriptional states but from how cells respond to perturbation. The models are pretrained on large-scale single-cell transcriptomic data, including the Tahoe-100M perturbation compendium, and then fine-tuned for disease-relevant downstream applications. Rather than treating gene expression in isolation, Tx1 jointly learns representations of genes, cells, and compounds, allowing a single backbone to reason about both cellular identity and the effect of chemical interventions.

Tx1 sits alongside earlier single-cell foundation models such as Geneformer, scGPT, scFoundation, and AIDO.Cell, but differentiates itself through its perturbation-centric training signal, its explicit modeling of compounds via a drug token, and its focus on precision oncology benchmarks. Tahoe Therapeutics released the model as an unusually open package — pretrained checkpoints, training code, and evaluation workflows — to accelerate community work on perturbation-trained single-cell models.

#Key Features

  • Perturbation-trained pretraining: Tx1 is pretrained on large-scale single-cell transcriptomic data including the Tahoe-100M perturbation compendium, so the models learn from how cells respond to interventions rather than from baseline states alone.
  • Joint gene, cell, and compound modeling: A masked-expression generative objective augmented with a drug token lets a single model jointly represent genes, cells, and chemical compounds, enabling flexible adaptation across downstream tasks.
  • Scaling to 3 billion parameters: Three checkpoints are released, Tx1-70M (~70M parameters), Tx1-1B (~1.3B), and Tx1-3B (~3B), spanning a range of compute-performance trade-offs.
  • High compute efficiency: Through architectural optimizations, data-loader refinements, and efficient training strategies, Tx1 reaches 3-30x higher compute efficiency than prior implementations of cell-state models.
  • Cancer-focused benchmarking: The models are evaluated on four disease-relevant tasks — gene essentiality prediction, hallmarks-of-cancer gene identification, cell-type classification, and perturbation-response prediction in held-out contexts.
  • Open release: Pretrained checkpoints and code are released under Apache-2.0, with an interactive HuggingFace Space demo for hands-on exploration.
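To make the compute-performance trade-off between the three checkpoints more concrete, the sketch below estimates how much memory the weights alone would occupy at two common numeric precisions. It assumes the approximate parameter counts quoted above and the standard sizes of 4 bytes (fp32) and 2 bytes (bf16) per parameter. The function name and dictionaries are illustrative, and the estimate ignores activations, optimizer state, and any framework overhead.

```python
# Approximate parameter counts quoted above for the three released checkpoints.
PARAM_COUNTS = {
    "Tx1-70M": 70e6,
    "Tx1-1B": 1.3e9,
    "Tx1-3B": 3e9,
}

# Bytes needed to store one parameter at two common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gb(n_params: float, dtype: str = "bf16") -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n_params in PARAM_COUNTS.items():
        fp32 = weight_memory_gb(n_params, "fp32")
        bf16 = weight_memory_gb(n_params, "bf16")
        print(f"{name}: ~{fp32:.2f} GB in fp32, ~{bf16:.2f} GB in bf16")
```

Under these assumptions the largest checkpoint needs roughly 6 GB for its weights in bf16 and about twice that in fp32, which is a lower bound on what inference would require in practice.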

#Technical Details

Tx1 is a transformer-based single-cell foundation model trained with a masked-expression generative objective. The key architectural addition is a drug token incorporated alongside the gene and cell representations, which lets the model condition expression predictions on chemical perturbations and jointly learn gene, cell, and compound embeddings. Pretraining draws on roughly 266 million single-cell transcriptomic profiles, including the Tahoe-100M perturbation compendium, before task-specific fine-tuning.

Architectural optimizations, data-loader refinements, and efficient training strategies together yield a reported 3-30x improvement in compute efficiency relative to prior cell-state model implementations. The released family spans three sizes (approximately 70M, 1.3B, and 3B parameters), and the authors report state-of-the-art performance on all four evaluated benchmarks: overall and context-specific gene essentiality, hallmarks-of-cancer gene identification, cell-type classification, and perturbation-response prediction in held-out cellular contexts.
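The general shape of a masked-expression objective with a drug token can be illustrated with a toy example. The sketch below is not Tx1's tokenization, architecture, or loss. It only shows the idea under simple assumptions: a cell is a list of gene identifiers with expression values, a compound identifier stays visible as the conditioning token, a random subset of expression values is hidden, and the error is scored only at the hidden positions. All names (`CellExample`, `mask_cell`, `masked_mse`, `GENE_A`, `DRUG_X`) are hypothetical, and mean squared error is used purely for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional

MASK = None  # placeholder standing in for a hidden expression value


@dataclass
class CellExample:
    drug: str                      # compound identifier, kept visible as the drug token
    genes: list[str]               # gene identifiers, one per position
    values: list[Optional[float]]  # expression values, MASK where hidden
    targets: dict[int, float]      # position -> true value, masked positions only


def mask_cell(drug, genes, expression, mask_fraction=0.3, seed=0):
    """Hide a random subset of expression values; the drug token is never masked."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_fraction * len(genes)))
    masked_positions = set(rng.sample(range(len(genes)), n_mask))
    values = [MASK if i in masked_positions else v for i, v in enumerate(expression)]
    targets = {i: expression[i] for i in masked_positions}
    return CellExample(drug, list(genes), values, targets)


def masked_mse(predictions, example):
    """Mean squared error computed only over the masked positions."""
    errors = [(predictions[i] - true) ** 2 for i, true in example.targets.items()]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    expression = [2.0, 0.0, 5.0, 1.0]
    example = mask_cell("DRUG_X", genes, expression, mask_fraction=0.5)
    print(example.drug, example.values)
    # A prediction that reproduces the true values exactly has zero masked error.
    print(masked_mse(expression, example))
```

In a real model the predictions would come from a network that sees the drug token together with the unmasked genes, so the hidden values can only be recovered well if the network has learned how that compound shifts expression.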

#Applications

Tx1 is aimed at precision oncology and broader perturbation biology. By predicting context-specific gene essentiality, it can help prioritize candidate therapeutic targets in particular cancer backgrounds, while its hallmarks-of-cancer gene identification supports mechanistic interpretation of tumor biology. The model's ability to predict perturbation responses in held-out cellular contexts is directly useful for in silico screening — anticipating how cells will respond to genetic or chemical interventions before committing to expensive wet-lab experiments. Cell-type classification rounds out a toolkit relevant to computational biologists, cancer researchers, and drug discovery teams analyzing single-cell and perturbation datasets.
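As a rough illustration of how context-specific essentiality predictions might feed target prioritization, the sketch below ranks genes within one cellular context and measures how much a gene's score in that context exceeds its average elsewhere. The scores are made-up placeholder values, not model output, and the gene and context labels, the function names, and the scoring scale are all hypothetical.

```python
from statistics import mean

# Placeholder scores keyed by (gene, context); higher means "more essential".
# These numbers are invented for illustration and are not Tx1 predictions.
EXAMPLE_SCORES = {
    ("GENE_A", "line_1"): 0.90, ("GENE_A", "line_2"): 0.85,
    ("GENE_B", "line_1"): 0.70, ("GENE_B", "line_2"): 0.10,
    ("GENE_C", "line_1"): 0.20, ("GENE_C", "line_2"): 0.30,
}


def rank_targets(scores, context, top_k=3):
    """Rank genes by predicted essentiality within one context, highest first."""
    in_context = [(gene, s) for (gene, ctx), s in scores.items() if ctx == context]
    return sorted(in_context, key=lambda pair: pair[1], reverse=True)[:top_k]


def context_specificity(scores, gene, context):
    """Score in the chosen context minus the mean score across all other contexts."""
    others = [s for (g, ctx), s in scores.items() if g == gene and ctx != context]
    return scores[(gene, context)] - (mean(others) if others else 0.0)


if __name__ == "__main__":
    for gene, score in rank_targets(EXAMPLE_SCORES, "line_1", top_k=2):
        delta = context_specificity(EXAMPLE_SCORES, gene, "line_1")
        print(f"{gene}: score={score:.2f}, specificity={delta:+.2f}")
```

In this toy data, the first gene ranks highest in the chosen context but is similarly essential elsewhere, while the second shows a large context-specific gap. That is the kind of distinction a context-specific essentiality predictor is meant to surface.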

#Impact

Tahoe-x1 demonstrates that single-cell foundation models can be scaled to billions of parameters and trained on perturbation data while remaining compute-efficient and competitive on cancer-relevant benchmarks. Its joint modeling of genes, cells, and compounds via a drug token is a notable design choice that extends single-cell models toward chemical-perturbation reasoning. By releasing pretrained checkpoints, training code, and evaluation workflows under the permissive Apache-2.0 license, Tahoe Therapeutics lowers the barrier for other groups to build on perturbation-trained models for precision oncology. Because Tx1 is a 2025 preprint, independent validation and downstream adoption are still developing, and the reported state-of-the-art results await broader external benchmarking.

Citation

Tahoe-x1: Scaling Perturbation-Trained Single-Cell Foundation Models to 3 Billion Parameters

Preprint

Gandhi, S., et al. (2025) Tahoe-x1: Scaling Perturbation-Trained Single-Cell Foundation Models to 3 Billion Parameters. bioRxiv.

DOI: 10.1101/2025.10.23.683759

Recent citations

Papers that recently cited this model.

  • Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction
    S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
    bioRxiv · Jun 2026 · Citations: 0
  • PertDiffBench: Benchmarking Diffusion Models for Single-Cell Perturbation Response Prediction
    Zijun Song, Yujia Xiang, Zhi-yi Song, et al.
    bioRxiv · Jun 2026 · Citations: 0
  • Identifying fate-determining transcription factors with single-cell omics.
    Xi Xi, Chen Li, Lei Wei, et al.
    Trends in Genetics · Jun 2026 · Citations: 0

Top citations

The most-cited papers that cite this model.

  • Virtual Cells Need Context, Not Just Scale
    Payam Dibaeinia, Sudarshan Babu, Mei Knudson, et al.
    bioRxiv · Feb 2026 · Citations: 3
  • Discrete Diffusion for Single-Cell Gene Expression Modeling
    S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
    bioRxiv · Feb 2026 · Citations: 2
  • Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction
    S. Bhattacharya, Christian Gensbigler, Shaamil Karim, et al.
    bioRxiv · Jun 2026 · Citations: 0
  • Identifying fate-determining transcription factors with single-cell omics.
    Xi Xi, Chen Li, Lei Wei, et al.
    Trends in Genetics · Jun 2026 · Citations: 0
  • Effective Biological Representation Learning by Masking Gene Expression
    Kian Kenyon-Dean, Alina Selega, Ihab Bendidi, et al.
    May 2026 · Citations: 0

Citations

Total Citations: 13
Influential: 2
References: 28

GitHub

Stars: 158
Forks: 25
Open Issues: 3
Contributors: 9
Last Push: 21d ago
Language: Python
License: Apache-2.0

HuggingFace

Downloads: 54
Likes: 73
Last Modified: 8mo ago

Fields of citing research

  • Computer Science: 92%
  • Biology: 85%
  • Medicine: 31%
  • Engineering: 8%
  • Materials Science: 8%
  • Chemistry: 8%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
Score: 95 (Open)
Usability (can I run it?): 95
Reproducibility (can I retrain it?): 92
Model Openness Framework: Unclassified
No formal model card / data card

Tags

cancer, cell_type_annotation, drug_discovery, foundation_model, gene_essentiality_prediction, multimodal, perturbation_modeling, self_supervised, transcriptomics, transformer

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model · Demo · Dataset