scMOBA

Chinese Academy of Sciences / Shanghai AI Laboratory

Conversational single-cell and spatial multi-omics brain foundation model, with zero-shot cell annotation and disease prediction across species.

Released: December 2025

scMOBA (single-cell Multi-Omics Brain Agent) is a conversational foundation model for analyzing single-cell and spatial multi-omics data from the brain across species. Single-cell and spatial assays are transforming our understanding of the developing, aging, and diseased brain, but integrating knowledge across modalities and across species remains a persistent challenge. scMOBA addresses this by coupling a large language model with a gene encoder, allowing biological data and natural-language questions to be reasoned over jointly within a single conversational system.

Developed by researchers at the Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology (Chinese Academy of Sciences) in Shanghai, in collaboration with Shanghai AI Laboratory, the model was released as a bioRxiv preprint in December 2025. It is positioned as a brain-specialized counterpart to general single-cell foundation models, pretrained specifically on a large cross-species and multi-omics brain corpus to capture the cellular heterogeneity underlying brain complexity.

The central design idea is a multi-omics Feature-Question-Answer framework that lets the agent generate biological insights from gene-level data together with text inputs. Rather than treating each analytical task as a separate trained model, scMOBA frames annotation, comparison, and prediction as questions answered against learned cellular representations, supporting zero-shot and few-shot use without task-specific retraining.
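To make the question-framing idea concrete, the sketch below shows how one cell's gene-level features could be paired with different questions to form Feature-Question-Answer records. The `FQARecord` class, its fields, the question templates, and the top-k gene ranking are all hypothetical names and choices made for illustration; the preprint's actual prompt format and gene encoding are not specified in this entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical question templates; the wording is illustrative only.
QUESTION_TEMPLATES: Dict[str, str] = {
    "annotation": "What is the cell type of this cell?",
    "disease": "Is this cell from a diseased or a healthy donor?",
    "age": "What is the approximate age of the donor of this cell?",
}


@dataclass
class FQARecord:
    """One Feature-Question-Answer triple (assumed structure)."""
    features: List[str]            # gene-level input, here a ranked gene list
    question: str                  # natural-language task description
    answer: Optional[str] = None   # None when the answer is to be generated


def top_genes(expression: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k most highly expressed genes, highest first."""
    ranked = sorted(expression.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


def build_record(expression: Dict[str, float], task: str,
                 answer: Optional[str] = None, k: int = 3) -> FQARecord:
    """Pair a cell's top genes with the question for the requested task."""
    if task not in QUESTION_TEMPLATES:
        raise KeyError(f"unknown task: {task}")
    return FQARecord(top_genes(expression, k), QUESTION_TEMPLATES[task], answer)


# Toy expression values for a single cell (made-up numbers).
cell = {"GFAP": 8.1, "AQP4": 6.4, "SLC1A3": 5.9, "RBFOX3": 0.2}

# The same features are reused across tasks; only the question changes.
records = [build_record(cell, task) for task in QUESTION_TEMPLATES]
for record in records:
    print(record.question, record.features)
```

Under these assumptions, switching from annotation to disease prediction changes only the question text, which is the property that allows a single model to cover several tasks without task-specific retraining.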

Key Features

Conversational multi-omics agent: Combines a large language model, a gene encoder, and a cross-attention projector so that single-cell measurements and natural-language prompts can be processed within one system.
Brain-focused, cross-species pretraining: Pretrained on roughly 130 million single-cell and spatial multi-omics data points spanning the brain across diverse species, developmental stages, aging, and disease.
Feature-Question-Answer framework: A multi-omics prompting scheme that turns analytical tasks into questions answered from gene and text inputs, enabling generalization without additional training.
Zero-shot and few-shot generalization: Supports fine-grained cell type annotation, cross-species comparison, aging-clock construction, and disease status prediction with little or no labeled data.
Fine-tunable for downstream tasks: Adaptable to customized fine-tuning for multi-omics data integration, cell trajectory analysis, and gene regulatory network inference.

Technical Details

scMOBA is built from three components: a large language model backbone, a gene encoder that embeds single-cell and spatial multi-omics measurements, and a cross-attention projector that aligns the gene-derived representations with the language model so the two modalities can be reasoned over together. Pretraining used approximately 130 million single-cell and spatial multi-omics data points drawn from across the brain in multiple species and across development, aging, and disease states. According to the preprint, the model achieves state-of-the-art performance on fine-grained cell type classification across species and modalities, as well as on batch correction and multi-omics data integration. Exact parameter counts, the specific language-model backbone, and detailed benchmark scores are not reported in the publicly available metadata and are omitted here to avoid overstating the verified record.
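The role of a cross-attention projector can be illustrated with a minimal NumPy sketch. It assumes a common design in which a small set of query vectors attends over per-gene embeddings and returns a fixed number of tokens in the language model's hidden size. Which side supplies the queries, the single attention head, all dimensions, and the random weights are assumptions for illustration, not details confirmed by the preprint.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention_project(queries, gene_emb, w_q, w_k, w_v):
    """Single-head cross-attention from query tokens to gene embeddings.

    queries:  (n_queries, d_model)  stand-in for learned query tokens
    gene_emb: (n_genes, d_gene)     output of a gene encoder
    Returns (tokens, weights) with shapes (n_queries, d_model)
    and (n_queries, n_genes).
    """
    q = queries @ w_q                          # (n_queries, d_model)
    k = gene_emb @ w_k                         # (n_genes, d_model)
    v = gene_emb @ w_v                         # (n_genes, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product scores
    weights = softmax(scores, axis=-1)         # each query row sums to 1
    return weights @ v, weights


# Toy sizes chosen only for the example.
rng = np.random.default_rng(0)
n_genes, d_gene, n_queries, d_model = 16, 8, 4, 12

gene_emb = rng.normal(size=(n_genes, d_gene))
queries = rng.normal(size=(n_queries, d_model))
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_gene, d_model))
w_v = rng.normal(size=(d_gene, d_model))

tokens, weights = cross_attention_project(queries, gene_emb, w_q, w_k, w_v)
print(tokens.shape, weights.shape)
```

In this sketch the number of output tokens is set by the number of queries rather than by the number of genes, so cells with different gene counts would yield inputs of the same length for the language model.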

Applications

scMOBA targets neuroscience and translational research workflows that depend on interpreting heterogeneous brain single-cell and spatial data. Researchers can use it for fine-grained cell type annotation, cross-species comparison of brain cell populations, construction of cell-type-specific aging clocks, and disease status prediction, while its fine-tuning support extends to data integration, trajectory analysis, and gene regulatory network inference. The conversational, question-answering interface lowers the barrier for biologists to query large multi-omics datasets, supporting studies of brain development, aging, and neurological disease.

Impact

By specializing a single-cell foundation model for the brain and pairing it with a conversational language-model interface, scMOBA aims to serve as a discovery engine for multi-omics neuroscience and to support precision prediction and early intervention for neurological aging and disease. Because it is a December 2025 preprint, its long-term adoption and independent validation remain to be established, and key implementation details such as model size and code availability could not be confirmed from the available metadata at the time of writing.

Citation

scMOBA: A conversational single-cell Multi-Omics Brain Agent across species

Wei, R., et al. (2025) scMOBA: A conversational single-cell Multi-Omics Brain Agent across species. bioRxiv.

DOI: 10.64898/2025.12.01.691565


Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 5 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 0 (not reproducible)

Model Openness Framework: Unclassified (restrictive license on core components)

