bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Single-cell foundation models

RegFormer

BGI Research

GRN-informed single-cell foundation model combining gene regulatory hierarchy priors with long-sequence Mamba modeling for clustering, batch integration, perturbation modeling, and drug response prediction.

Released: April 2026

RegFormer is a single-cell foundation model from BGI Research that combines gene regulatory network (GRN) priors with long-sequence Mamba state-space modeling. It was first posted to bioRxiv in January 2025 and published in Nature Communications in 2026. By incorporating regulatory-hierarchy priors derived from public GRN databases, RegFormer biases its representations toward biologically meaningful regulatory dependencies rather than relying purely on patterns learned from expression data.

The Mamba backbone enables efficient long-sequence modeling over thousands of gene tokens per cell, at lower computational cost than full self-attention transformers. Across clustering, batch integration, perturbation modeling, and drug response prediction benchmarks, the authors report that RegFormer consistently outperforms scGPT, Geneformer, scFoundation, and scBERT.
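To make the cost argument concrete, here is a minimal sketch of a linear state-space recurrence, the general family that Mamba belongs to. It is not RegFormer's implementation: the function names (`ssm_scan`, `attention_pair_count`), the scalar state, and the fixed coefficients `a`, `b`, `c` are illustrative assumptions. Real Mamba layers use input-dependent (selective) parameters, vector-valued states, and a hardware-aware scan. The sketch only shows that the recurrence visits each token once, whereas full self-attention scores every pair of tokens.

```python
def ssm_scan(tokens, a=0.5, b=1.0, c=1.0):
    """Toy scalar state-space recurrence (assumed, fixed coefficients).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    One state update per token, so cost grows linearly with length.
    """
    h = 0.0
    outputs = []
    for x in tokens:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def scan_step_count(seq_len):
    """Number of state updates the recurrence performs."""
    return seq_len


def attention_pair_count(seq_len):
    """Number of token-pair scores in full self-attention."""
    return seq_len * seq_len


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 2.0]))  # [1.0, 0.5, 2.25]
    for n in (1000, 4000):
        print(n, scan_step_count(n), attention_pair_count(n))
```

For a cell represented by 4,000 gene tokens, the toy recurrence performs 4,000 updates while full attention scores 16,000,000 pairs, which is the gap the efficiency claim refers to.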

#Key Features

  • Gene regulatory network priors: Regulatory hierarchies from public GRN databases shape model architecture and training, biasing learned representations toward known regulatory dependencies.
  • Mamba long-sequence backbone: State-space architecture enables efficient processing of thousands of gene tokens per cell without quadratic attention cost.
  • Multi-task performance: Reported to consistently outperform scGPT, Geneformer, scFoundation, and scBERT on clustering, batch integration, perturbation modeling, and drug response prediction.
  • Knowledge-data integration: Demonstrates that regulatory priors provide signal beyond what scale alone delivers.
  • Code and weights released: Published in Nature Communications with code and model weights released, although the bio.rodeo openness assessment flags a restrictive license on core components.

#Technical Details

RegFormer uses a Mamba-based state-space backbone with GRN-derived priors integrated through gene-token embeddings. Pretraining is self-supervised over a large pan-tissue scRNA-seq corpus. The published paper reports architectural details, training schedule, GRN preprocessing, and comprehensive benchmark comparisons against prior single-cell FMs.
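The published paper specifies how the priors are actually encoded; this page does not. As a hedged illustration of what "GRN-derived priors integrated through gene-token embeddings" could look like, the sketch below derives a hierarchy level for each gene from a toy regulator-to-target map and adds a per-level vector to a gene embedding. Everything here is an assumption for illustration: the gene names, `TOY_GRN`, `hierarchy_levels`, `add_level_prior`, and the hand-written `level_table` (which would be learned in a real model). It also assumes an acyclic network, whereas real GRNs contain feedback loops that would need separate handling.

```python
from graphlib import TopologicalSorter

# Hypothetical toy GRN: each regulator maps to the genes it regulates.
TOY_GRN = {
    "TF_A": ["TF_B", "GENE_X"],
    "TF_B": ["GENE_X", "GENE_Y"],
}


def hierarchy_levels(grn):
    """Level 0 for genes with no regulators, else 1 + max regulator level."""
    # Invert to gene -> set of regulators, the predecessor form
    # that TopologicalSorter expects.
    regulators = {}
    for reg, targets in grn.items():
        regulators.setdefault(reg, set())
        for target in targets:
            regulators.setdefault(target, set()).add(reg)

    levels = {}
    # static_order() yields each gene after all of its regulators and
    # raises CycleError if the network is not acyclic.
    for gene in TopologicalSorter(regulators).static_order():
        regs = regulators[gene]
        levels[gene] = 1 + max(levels[r] for r in regs) if regs else 0
    return levels


def add_level_prior(gene_embedding, level, level_table):
    """Add the vector for a hierarchy level to a gene-token embedding."""
    return [g + p for g, p in zip(gene_embedding, level_table[level])]


if __name__ == "__main__":
    levels = hierarchy_levels(TOY_GRN)
    print(sorted(levels.items()))
    # [('GENE_X', 2), ('GENE_Y', 2), ('TF_A', 0), ('TF_B', 1)]

    # Hand-written stand-in for a learned level-embedding table.
    level_table = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
    print(add_level_prior([0.5, 0.5], levels["TF_B"], level_table))  # [1.5, 0.5]
```

The design point is that the prior enters through the token representation, so the sequence backbone itself needs no change to make use of it.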

#Applications

RegFormer is suited to single-cell research groups working on perturbation response prediction, drug response modeling, and integrated multi-batch analysis. The GRN-informed architecture is particularly valuable when ground-truth regulatory knowledge is available for the system under study and when interpretable representations are desired.

#Impact

RegFormer is among the first single-cell foundation models to combine state-space architectures (Mamba) with explicit biological priors (GRNs), establishing a useful template for knowledge-augmented single-cell FMs. The consistent improvements over scGPT, Geneformer, scFoundation, and scBERT on multiple downstream tasks suggest that informative biological priors continue to provide meaningful signal even at the foundation-model scale.

Citations

Hu, L., et al. (2026) RegFormer: a single-cell foundation model powered by gene regulatory hierarchies. Nature Communications. DOI: 10.1038/s41467-026-72198-x

Hu, L., et al. (2025) RegFormer: A Single-Cell Foundation Model Powered by Gene Regulatory Hierarchies. bioRxiv (preprint). DOI: 10.1101/2025.01.24.634217

Citations

Total citations: 3
Influential: 0
References: 26

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 10 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 13
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

batch_integration, cell_clustering, drug_response_prediction, foundation_model, gene_regulatory_network, mamba, perturbation_modeling, self_supervised, single_cell_transcriptome, state_space_model
