Chinese Academy of Sciences / Shanghai AI Laboratory
A conversational single-cell and spatial multi-omics brain agent pretrained on 130 million cells across species for zero-shot cell annotation and disease prediction.
scMOBA (single-cell Multi-Omics Brain Agent) is a conversational foundation model for analyzing single-cell and spatial multi-omics data from the brain across species. Single-cell and spatial assays are transforming our understanding of the developing, aging, and diseased brain, but integrating knowledge across modalities and across species remains a persistent challenge. scMOBA addresses this by coupling a large language model with a gene encoder, allowing biological data and natural-language questions to be reasoned over jointly within a single conversational system.
Developed by researchers at the Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology (Chinese Academy of Sciences) in Shanghai, in collaboration with Shanghai AI Laboratory, the model was released as a bioRxiv preprint in December 2025. It is positioned as a brain-specialized counterpart to general single-cell foundation models, pretrained specifically on a large cross-species and multi-omics brain corpus to capture the cellular heterogeneity underlying brain complexity.
The central design idea is a multi-omics Feature-Question-Answer framework that lets the agent generate biological insights from gene-level data together with text inputs. Rather than treating each analytical task as a separate trained model, scMOBA frames annotation, comparison, and prediction as questions answered against learned cellular representations, supporting zero-shot and few-shot use without task-specific retraining.
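To make the Feature-Question-Answer idea concrete, here is a minimal sketch of what one such record might look like. The `FQAExample` class, its field names, the prompt layout, and the expression values are all hypothetical illustrations; the preprint metadata does not describe scMOBA's actual data schema, and a real system would pass learned gene embeddings to the language model rather than gene names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FQAExample:
    """One hypothetical Feature-Question-Answer record (field names are assumptions)."""
    features: Dict[str, float]  # gene symbol -> expression value for one cell
    question: str               # natural-language question about that cell
    answer: str = ""            # left empty at inference time (zero-shot use)


def top_genes(features: Dict[str, float], k: int) -> List[str]:
    """Return the k most highly expressed genes, ties broken alphabetically."""
    ranked = sorted(features.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]


def build_prompt(example: FQAExample, k: int = 3) -> str:
    """Render a record as text; a readable stand-in for embedding-based input."""
    genes = ", ".join(top_genes(example.features, k))
    return f"Features: {genes}\nQuestion: {example.question}\nAnswer:"


# Illustrative values only, not taken from any dataset.
cell = FQAExample(
    features={"GAD1": 5.2, "SLC17A7": 0.1, "GAD2": 4.8, "AQP4": 0.0, "SST": 3.9},
    question="What is the most likely cell type?",
)
prompt = build_prompt(cell)
print(prompt)
```

Framing annotation, comparison, and prediction this way means only the `question` field changes between tasks, while the feature side stays the same.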
scMOBA is built from three components: a large language model backbone, a gene encoder that embeds single-cell and spatial multi-omics measurements, and a cross-attention projector that aligns the gene-derived representations with the language model so the two modalities can be reasoned over together. Pretraining used approximately 130 million single-cell and spatial multi-omics data points drawn from across the brain in multiple species and across development, aging, and disease states. According to the preprint, the model achieves state-of-the-art performance on fine-grained cell type classification across species and modalities, as well as on batch correction and multi-omics data integration. Exact parameter counts, the specific language-model backbone, and detailed benchmark scores are not reported in the publicly available metadata and are omitted here to avoid overstating the verified record.
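The role of a cross-attention projector can be illustrated with a small, dependency-free sketch. It assumes one common design, in which a fixed set of query vectors attends over gene-encoder outputs to produce a few "soft tokens" in the language model's embedding space. The single-head layout, the dimensions, and the random weights below are assumptions for illustration; the preprint metadata does not specify how scMOBA's projector is built.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_attention_project(queries: Matrix, gene_tokens: Matrix,
                            w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head cross-attention: queries attend over gene-encoder outputs."""
    keys = matmul(gene_tokens, w_k)               # (n_genes x d_model)
    values = matmul(gene_tokens, w_v)             # (n_genes x d_model)
    keys_t = [list(col) for col in zip(*keys)]    # (d_model x n_genes)
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, keys_t)              # (n_queries x n_genes)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values)                # (n_queries x d_model)


# Hypothetical sizes: 5 gene tokens of width 4, projected to 2 soft tokens of width 6.
rng = random.Random(0)
n_genes, d_gene, d_model, n_queries = 5, 4, 6, 2
gene_tokens = random_matrix(n_genes, d_gene, rng)   # stand-in for gene encoder output
queries = random_matrix(n_queries, d_model, rng)    # stand-in for learned queries
w_k = random_matrix(d_gene, d_model, rng)
w_v = random_matrix(d_gene, d_model, rng)

soft_tokens = cross_attention_project(queries, gene_tokens, w_k, w_v)
print(len(soft_tokens), len(soft_tokens[0]))
```

The output has one row per query and the language model's embedding width, so it could be concatenated with ordinary text-token embeddings regardless of how many genes were measured.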
scMOBA targets neuroscience and translational research workflows that depend on interpreting heterogeneous brain single-cell and spatial data. Researchers can use it for fine-grained cell type annotation, cross-species comparison of brain cell populations, construction of cell-type-specific aging clocks, and disease status prediction, while its fine-tuning support extends to data integration, trajectory analysis, and gene regulatory network inference. The conversational, question-answering interface lowers the barrier for biologists to query large multi-omics datasets, supporting studies of brain development, aging, and neurological disease.
By specializing a single-cell foundation model for the brain and pairing it with a conversational language-model interface, scMOBA aims to serve as a discovery engine for multi-omics neuroscience, with the stated goal of advancing precision prediction and early intervention for neurological aging and disease. As a December 2025 preprint, its long-term adoption and independent validation remain to be established, and key implementation details such as model size and code availability could not be confirmed from the available metadata at the time of writing.