bio.rodeo

Single-cell · Language model

scMOBA

Chinese Academy of Sciences / Shanghai AI Laboratory

A conversational single-cell and spatial multi-omics brain agent, pretrained on roughly 130 million cross-species single-cell and spatial data points, for zero-shot cell annotation and disease prediction.

Released: December 2025

scMOBA (single-cell Multi-Omics Brain Agent) is a conversational foundation model for analyzing single-cell and spatial multi-omics data from the brain across species. Single-cell and spatial assays are transforming our understanding of the developing, aging, and diseased brain, but integrating knowledge across modalities and across species remains a persistent challenge. scMOBA addresses this by coupling a large language model with a gene encoder, allowing biological data and natural-language questions to be reasoned over jointly within a single conversational system.

Developed by researchers at the Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology (Chinese Academy of Sciences) in Shanghai, in collaboration with Shanghai AI Laboratory, the model was released as a bioRxiv preprint in December 2025. It is positioned as a brain-specialized counterpart to general single-cell foundation models, pretrained specifically on a large cross-species and multi-omics brain corpus to capture the cellular heterogeneity underlying brain complexity.

The central design idea is a multi-omics Feature-Question-Answer framework that lets the agent generate biological insights from gene-level data together with text inputs. Rather than treating each analytical task as a separate trained model, scMOBA frames annotation, comparison, and prediction as questions answered against learned cellular representations, supporting zero-shot and few-shot use without task-specific retraining.
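
To make the Feature-Question-Answer idea concrete, the following is a minimal sketch of how one cell could be posed as different tasks by changing only the question. The `FQARecord` class, its field names, the `render_prompt` text layout, and the expression values are all hypothetical. The preprint's actual prompt format is not described in the available metadata, and the real model consumes gene-encoder embeddings rather than gene names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FQARecord:
    """One hypothetical Feature-Question-Answer example (field names are assumptions)."""
    features: Dict[str, float]    # gene symbol -> expression value for one cell
    question: str                 # natural-language task description
    answer: Optional[str] = None  # label for few-shot examples, None for a query


def top_genes(features: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k most highly expressed gene symbols, highest first."""
    ranked = sorted(features.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


def render_prompt(record: FQARecord, k: int = 3) -> str:
    """Serialize a record as text (illustrative layout only)."""
    lines = [
        "Features: " + ", ".join(top_genes(record.features, k)),
        "Question: " + record.question,
        "Answer: " + (record.answer if record.answer is not None else ""),
    ]
    return "\n".join(lines)


# Made-up expression values for a single cell.
cell = {"GAD1": 5.2, "SLC17A7": 0.1, "GAD2": 4.8, "AQP4": 0.0}

# The same features are reused; only the question changes the task.
annotation = FQARecord(cell, "What cell type is this?")
disease = FQARecord(cell, "Is this cell from a diseased donor?")

print(render_prompt(annotation))
```

In this framing, a few-shot setup would simply prepend records whose `answer` field is filled in, while a zero-shot query leaves it empty.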

Key Features

  • Conversational multi-omics agent: Combines a large language model, a gene encoder, and a cross-attention projector so that single-cell measurements and natural-language prompts can be processed within one system.
  • Brain-focused, cross-species pretraining: Pretrained on roughly 130 million single-cell and spatial multi-omics data points spanning the brain across diverse species, developmental stages, aging, and disease.
  • Feature-Question-Answer framework: A multi-omics prompting scheme that turns analytical tasks into questions answered from gene and text inputs, enabling generalization without additional training.
  • Zero-shot and few-shot generalization: Supports fine-grained cell type annotation, cross-species comparison, aging-clock construction, and disease status prediction with little or no labeled data.
  • Fine-tunable for downstream tasks: Adaptable to customized fine-tuning for multi-omics data integration, cell trajectory analysis, and gene regulatory network inference.

Technical Details

scMOBA is built from three components: a large language model backbone, a gene encoder that embeds single-cell and spatial multi-omics measurements, and a cross-attention projector that aligns the gene-derived representations with the language model so the two modalities can be reasoned over together. Pretraining used approximately 130 million single-cell and spatial multi-omics data points drawn from across the brain in multiple species and across development, aging, and disease states. According to the preprint, the model achieves state-of-the-art performance on fine-grained cell type classification across species and modalities, as well as on batch correction and multi-omics data integration. Exact parameter counts, the specific language-model backbone, and detailed benchmark scores are not reported in the publicly available metadata, so they are omitted here rather than estimated.
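
As an illustration of what a cross-attention projector does, the sketch below lets a small set of query vectors attend over gene-encoder outputs and returns a fixed number of tokens in the language model's embedding width. This is generic single-head cross-attention under stated assumptions, not scMOBA's implementation: the use of learned queries, all dimensions, and the random weights are placeholders chosen for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention_project(gene_tokens, queries, w_k, w_v):
    """Single-head cross-attention: queries attend over gene tokens.

    gene_tokens: (n_genes x d_gene) outputs of a gene encoder
    queries:     (n_query x d_llm) query vectors (an assumed design choice)
    w_k, w_v:    (d_gene x d_llm) key and value projections
    Returns the (n_query x d_llm) projected tokens and the
    (n_query x n_genes) attention weights.
    """
    keys = matmul(gene_tokens, w_k)
    values = matmul(gene_tokens, w_v)
    keys_t = [list(col) for col in zip(*keys)]
    scale = math.sqrt(len(queries[0]))
    scores = [[s / scale for s in row] for row in matmul(queries, keys_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, values), weights


# Placeholder sizes; the real model's dimensions are not reported.
rng = random.Random(0)
n_genes, d_gene, d_llm, n_query = 6, 4, 8, 2
gene_tokens = random_matrix(n_genes, d_gene, rng)
queries = random_matrix(n_query, d_llm, rng)
w_k = random_matrix(d_gene, d_llm, rng)
w_v = random_matrix(d_gene, d_llm, rng)

llm_tokens, attn = cross_attention_project(gene_tokens, queries, w_k, w_v)
print(len(llm_tokens), len(llm_tokens[0]))  # prints: 2 8
```

Whatever the number of input genes, the output has `n_query` rows of width `d_llm`, which is what allows gene-derived tokens to sit alongside text tokens in a language model's input sequence.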

Applications

scMOBA targets neuroscience and translational research workflows that depend on interpreting heterogeneous brain single-cell and spatial data. Researchers can use it for fine-grained cell type annotation, cross-species comparison of brain cell populations, construction of cell-type-specific aging clocks, and disease status prediction, while its fine-tuning support extends to data integration, trajectory analysis, and gene regulatory network inference. The conversational, question-answering interface lowers the barrier for biologists to query large multi-omics datasets, supporting studies of brain development, aging, and neurological disease.
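
For readers unfamiliar with aging clocks, a conventional formulation is a regression from a per-cell feature to donor age, fitted separately for each cell type. The sketch below shows that generic idea with ordinary least squares on synthetic data. It is not scMOBA's method, which the available metadata does not describe; the scalar `score` feature, the function names, and all values are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_aging_clocks(cells):
    """Fit one clock per cell type from (cell_type, score, donor_age) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for cell_type, score, age in cells:
        grouped[cell_type][0].append(score)
        grouped[cell_type][1].append(age)
    return {ct: fit_line(xs, ys) for ct, (xs, ys) in grouped.items()}


def predict_age(clocks, cell_type, score):
    """Apply the clock fitted for the given cell type."""
    slope, intercept = clocks[cell_type]
    return slope * score + intercept


# Synthetic (cell_type, score, donor_age) rows, made up for illustration.
cells = [
    ("microglia", 0.0, 20.0), ("microglia", 1.0, 40.0), ("microglia", 2.0, 60.0),
    ("astrocyte", 0.0, 30.0), ("astrocyte", 2.0, 50.0), ("astrocyte", 4.0, 70.0),
]
clocks = fit_aging_clocks(cells)
print(predict_age(clocks, "microglia", 1.5))  # prints: 50.0
```

Fitting per cell type matters because the same feature value can correspond to different ages in different populations, as the two synthetic groups here show.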

Impact

By specializing a single-cell foundation model for the brain and pairing it with a conversational language-model interface, scMOBA aims to serve as a discovery engine for multi-omics neuroscience, with the stated goal of advancing precise prediction and early intervention for neurological aging and disease. As a December 2025 preprint, it has yet to be independently validated or widely adopted, and key implementation details such as model size and code availability could not be confirmed from the available metadata at the time of writing.

Citation

DOI: 10.64898/2025.12.01.691565

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 5 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified. Restrictive license on core components.

Tags

cell_biology, cell_type_annotation, data_integration, disease_status_prediction, foundation_model, language_model, spatial_transcriptomics, transformer, zero_shot

Resources

Research Paper