PanFoMa

Shanghai Jiao Tong University / Jiangxi University of Finance and Economics / Shenzhen University / University of Technology Sydney / Chongqing University of Posts and Telecommunications

Pan-cancer single-cell foundation model with a hybrid Transformer-Mamba architecture, released with the PanFoMaBench cancer evaluation benchmark.

Released: December 2025

PanFoMa is a single-cell foundation model designed specifically for pan-cancer transcriptomics. Single-cell RNA sequencing (scRNA-seq) is central to dissecting tumor heterogeneity, but cancer-focused modeling faces two persistent obstacles: learning discriminative yet computationally efficient single-cell representations, and the lack of a comprehensive, cancer-specific evaluation benchmark. PanFoMa addresses both by pairing a lightweight hybrid architecture with a curated pan-cancer benchmark.

The model was introduced in December 2025 by Xiaoshui Huang and colleagues across Shanghai Jiao Tong University, Jiangxi University of Finance and Economics, Shenzhen University, the University of Technology Sydney, and Chongqing University of Posts and Telecommunications, and was accepted to the AAAI Conference on Artificial Intelligence. Its central design idea is to combine the expressive, order-independent attention of Transformers with the linear-time scalability of a state-space (Mamba) model, so that both local gene-gene interactions and global transcriptome context can be captured without the quadratic cost of attention over very long gene sequences.

Alongside the model, the authors release PanFoMaBench, a large-scale benchmark assembled from published cancer studies that provides a standardized testbed for evaluating foundation models on cancer single-cell data.

Key Features

Hybrid Transformer-Mamba design: A front-end local-context encoder with shared self-attention layers captures complex, order-independent gene interactions, while a back-end global sequential decoder uses a linear-time state-space model to integrate long-range context efficiently.
Lightweight and scalable: The state-space decoder avoids the quadratic cost of full attention over long gene sequences, balancing representational power with efficiency for large single-cell datasets.
Gene and expression embeddings: Each gene is represented by a discrete gene-ID embedding combined with a binned expression-value embedding, fused before encoding.
Pan-cancer benchmark (PanFoMaBench): A companion benchmark spanning 33 cancer subtypes and over 3.5 million high-quality cells from 616 patients, drawn from 83 published studies, enabling standardized cross-model comparison.
Multi-task evaluation: Validated across pan-cancer diagnosis, cell-type annotation, batch integration, gene regulatory network inference, and multi-omic integration.
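
The gene and expression embedding step listed above can be illustrated with a minimal sketch. It is not the authors' code: the equal-width binning rule, the table sizes, and the helper names (`bin_expression`, `make_table`, `fuse_embeddings`) are assumptions made for illustration. The only detail taken from this page is that a gene-ID embedding and a binned expression-value embedding are combined by element-wise addition.

```python
import random


def bin_expression(values, n_bins):
    """Map non-negative expression values to integer bins (assumed equal-width).

    Zero expression stays in bin 0; positive values fall in bins 1..n_bins-1.
    """
    max_val = max(values)
    if max_val <= 0:
        return [0 for _ in values]
    bins = []
    for v in values:
        if v <= 0:
            bins.append(0)
        else:
            b = 1 + int((v / max_val) * (n_bins - 2))
            bins.append(min(b, n_bins - 1))
    return bins


def make_table(n_rows, dim, rng):
    """Create a small random embedding table (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_rows)]


def fuse_embeddings(gene_ids, expr_bins, gene_table, expr_table):
    """Look up both embeddings for each gene and add them element-wise."""
    tokens = []
    for g, b in zip(gene_ids, expr_bins):
        tokens.append([x + y for x, y in zip(gene_table[g], expr_table[b])])
    return tokens


rng = random.Random(0)
DIM, N_GENES, N_BINS = 8, 10, 5  # hypothetical toy sizes
gene_table = make_table(N_GENES, DIM, rng)
expr_table = make_table(N_BINS, DIM, rng)

gene_ids = [3, 7, 1, 9]            # toy gene indices for one cell
expression = [0.0, 1.0, 2.0, 4.0]  # toy expression values for those genes
expr_bins = bin_expression(expression, N_BINS)
tokens = fuse_embeddings(gene_ids, expr_bins, gene_table, expr_table)
print(expr_bins, len(tokens), len(tokens[0]))
```

Addition keeps the token width equal to the embedding dimension, whereas concatenation would double it. The binning scheme PanFoMa actually uses is not described on this page.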

Technical Details

PanFoMa couples a 6-layer Transformer encoder with parameters shared across layers to a 6-layer bidirectional Mamba decoder. To handle the large gene vocabulary efficiently, genes are processed in four chunks of 768 genes (3,072 genes sampled per epoch), with each gene encoded through separate gene-ID and binned expression-value embedding layers that are combined by element-wise addition. The model is pretrained generatively on large unlabeled single-cell datasets and then evaluated on downstream tasks. On the authors' reported benchmarks, PanFoMa reaches 94.74% accuracy on pan-cancer diagnosis (versus 90.13% for scGPT and 91.24% for Geneformer), 98.15% accuracy on hPancreas cell-type annotation, and a 0.9641 integration score on the Immune batch-integration task, exceeding the compared baselines including scGPT, Geneformer, and GeneMamba. The released GitHub repository is currently an early stub (README only) with no published model weights, dataset card, or explicit code license at the time of writing.
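
The chunking figures above (four chunks of 768 genes, 3,072 genes in total) can be checked with a short sketch. It rests on several assumptions not stated on this page: that genes are sampled uniformly without replacement, that self-attention is applied within each chunk rather than across all 3,072 genes, and a hypothetical vocabulary of 20,000 genes. The function names are illustrative.

```python
import random

CHUNK_SIZE = 768  # genes per chunk, from the description above
N_CHUNKS = 4      # number of chunks, from the description above


def sample_and_chunk(gene_ids, rng):
    """Sample N_CHUNKS * CHUNK_SIZE genes without replacement and split them."""
    n_sampled = CHUNK_SIZE * N_CHUNKS
    sampled = rng.sample(gene_ids, n_sampled)
    return [sampled[i:i + CHUNK_SIZE] for i in range(0, n_sampled, CHUNK_SIZE)]


def attention_pairs(seq_len):
    """Number of query-key scores in full self-attention over seq_len tokens."""
    return seq_len * seq_len


vocab = list(range(20000))  # hypothetical gene vocabulary size
chunks = sample_and_chunk(vocab, random.Random(0))

full_cost = attention_pairs(CHUNK_SIZE * N_CHUNKS)     # one 3,072-gene sequence
chunked_cost = N_CHUNKS * attention_pairs(CHUNK_SIZE)  # four 768-gene chunks
print(len(chunks), [len(c) for c in chunks])
print(full_cost, chunked_cost, full_cost // chunked_cost)
```

Under these assumptions, attention within chunks computes a quarter of the pairwise scores that full attention over 3,072 genes would need. Integrating context across chunks is then the job of the linear-time state-space decoder.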

Applications

PanFoMa targets computational oncology and single-cell analysis workflows where researchers need efficient, transferable representations of tumor cells. Its demonstrated tasks (cell-type annotation, batch integration, gene regulatory network inference, multi-omic integration, and pan-cancer diagnosis) map directly onto common steps in tumor microenvironment characterization and cancer atlas building. PanFoMaBench also gives the broader community a standardized yardstick for benchmarking single-cell foundation models on cancer data, a setting for which general-purpose models such as scGPT and Geneformer were not specifically tuned.

Impact

By focusing a single-cell foundation model on cancer and releasing a matched benchmark, PanFoMa addresses a gap left by general-purpose models trained on healthy or mixed tissue atlases. Its Transformer-Mamba hybrid is part of a broader trend of incorporating state-space models into single-cell modeling to control the cost of long gene sequences, and the accompanying PanFoMaBench offers a reusable evaluation resource for the field. As of its release, the reported gains over scGPT, Geneformer, and GeneMamba rest on the authors' own benchmarks; independent validation and a public release of pretrained weights would help establish the model's real-world utility.

Citation

PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer

Preprint

Huang, X., et al. (2025) PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer. Proceedings of the AAAI Conference on Artificial Intelligence.

DOI: 10.48550/arXiv.2512.03111

Citations

Total Citations: 0

Influential: 0

References: 31

GitHub

Stars: 2

Forks: 0

Open Issues: 1

Contributors: 1

Last Push: 7mo ago

Openness

bio.rodeo openness: Closed · low usability and reproducibility

13 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 18

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository

Research Paper
