Multi-LLM consensus framework for automated cell type annotation in scRNA-seq data, outperforming prior methods by ~15% in mean accuracy.
mLLMCelltype is an iterative multi-LLM consensus framework for automated cell type annotation in single-cell RNA sequencing (scRNA-seq) data. Developed by Chen Yang, Xianyang Zhang, and Jun Chen at Texas A&M University and Mayo Clinic, and posted as a bioRxiv preprint in April 2025, the tool addresses a longstanding bottleneck in single-cell analysis: manual annotation of cell clusters is labor-intensive, requires deep domain expertise, and does not scale to the size of modern atlasing studies. Rather than relying on a single model, mLLMCelltype orchestrates multiple state-of-the-art large language models in a structured deliberation process, then aggregates their outputs into a single, uncertainty-aware annotation call.
The consensus strategy draws on a well-established principle in machine learning — that ensembling diverse models reduces individual error and improves generalization — and applies it to the biological knowledge encoded in frontier LLMs. Each participating model independently evaluates the marker gene signatures for each cluster, and a cross-model deliberation round allows the models to exchange reasoning and refine their initial calls before a final consensus is reached. This transparent reasoning chain sets mLLMCelltype apart from black-box automated annotation tools and gives researchers visibility into why a given annotation was or was not confidently assigned.
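The independent-call, deliberation, and consensus steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's implementation: the Model callable interface, the run_consensus function, the agreement threshold, and the round limit are all hypothetical, and the toy models stand in for real LLM API calls.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a "model" receives the cluster's marker genes and,
# during deliberation, the other models' current answers (None in round one).
Model = Callable[[List[str], Optional[Dict[str, str]]], str]


def majority(answers: Dict[str, str]) -> Tuple[str, float]:
    """Return the most common label and the fraction of models that chose it."""
    label, votes = Counter(answers.values()).most_common(1)[0]
    return label, votes / len(answers)


def run_consensus(
    models: Dict[str, Model],
    markers: List[str],
    threshold: float = 1.0,  # assumed agreement level needed to stop early
    max_rounds: int = 2,     # assumed cap on deliberation rounds
) -> Tuple[str, float, Dict[str, str]]:
    # Round one: every model annotates the cluster independently.
    answers = {name: model(markers, None) for name, model in models.items()}
    for _ in range(max_rounds):
        label, proportion = majority(answers)
        if proportion >= threshold:
            break
        # Deliberation: each model sees its peers' answers and may revise.
        answers = {
            name: model(markers, {k: v for k, v in answers.items() if k != name})
            for name, model in models.items()
        }
    label, proportion = majority(answers)
    return label, proportion, answers


# Toy stand-ins for LLM calls (assumptions for illustration only).
def steady_t_cell(markers: List[str], peers: Optional[Dict[str, str]]) -> str:
    return "T cell"


def swayable(markers: List[str], peers: Optional[Dict[str, str]]) -> str:
    # Starts with a different call, then adopts the peers' majority view.
    if not peers:
        return "NK cell"
    return Counter(peers.values()).most_common(1)[0][0]


toy_models = {"model_a": steady_t_cell, "model_b": steady_t_cell, "model_c": swayable}
label, proportion, final_answers = run_consensus(toy_models, ["CD3D", "CD3E", "IL7R"])
print(label, proportion)
```

In this toy run the third model disagrees at first and converges after one deliberation round. A real ensemble would replace the toy functions with API calls and would need to normalize free-text answers before counting votes.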
Critically, mLLMCelltype requires no reference dataset or specialized fine-tuning. The framework passes cluster-level marker gene lists directly to the LLMs as structured prompts, which means it can be applied to any tissue or organism for which marker gene lists can be derived, without the domain-locking that affects supervised classifiers trained on specific cell atlases.
The framework is distributed as R and Python packages (the Python package installs with pip install mllmcelltype), integrating directly into standard Seurat- and Scanpy-based workflows.
mLLMCelltype is a prompt-engineering and orchestration framework rather than a trained neural network. It has no learned parameters of its own; instead, it systematically queries external LLM APIs. The input to each model is a structured natural-language prompt containing the ranked marker gene list for a given cluster, contextual tissue metadata when available, and instructions to reason about which cell type the marker profile most likely represents. In the deliberation phase, model outputs are shared across the ensemble and each model is prompted to reconsider its annotation in light of the other models' reasoning, an approach analogous to structured expert panel review.
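To make the prompt structure concrete, the following sketch assembles an initial annotation prompt and a deliberation prompt from the three ingredients named above: ranked marker genes, optional tissue context, and reasoning instructions. The function names and the prompt wording are hypothetical and are not taken from the mLLMCelltype package, whose actual prompts may differ.

```python
from typing import Dict, List, Optional


def build_annotation_prompt(
    cluster_id: str, markers: List[str], tissue: Optional[str] = None
) -> str:
    """Build a first-round prompt from a ranked marker list (hypothetical wording)."""
    context = f"Tissue: {tissue}\n" if tissue else ""
    ranked = ", ".join(markers)
    return (
        f"{context}"
        f"Cluster {cluster_id} has these marker genes, "
        f"ranked from strongest to weakest: {ranked}.\n"
        "Which cell type does this profile most likely represent? "
        "Give one cell type name and a brief justification."
    )


def build_deliberation_prompt(base_prompt: str, peer_answers: Dict[str, str]) -> str:
    """Append the other models' answers so a model can reconsider its call."""
    peers = "\n".join(f"- {name}: {answer}" for name, answer in peer_answers.items())
    return (
        f"{base_prompt}\n\n"
        f"Other models proposed:\n{peers}\n"
        "Reconsider your annotation in light of their reasoning "
        "and give a final answer."
    )


first_round = build_annotation_prompt("3", ["CD3D", "CD3E", "IL7R"], tissue="PBMC")
second_round = build_deliberation_prompt(
    first_round, {"model_a": "T cell", "model_b": "NK cell"}
)
print(second_round)
```

Keeping prompt construction in plain functions like these makes the exact text sent to each model easy to log, which supports the transparent reasoning chain the framework emphasizes.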
Benchmarking across 50 diverse scRNA-seq datasets spanning 26 tissue types and more than 8 million cells demonstrated a mean accuracy of 77.3%, compared to 61.3% for the best prior single-model or supervised-classifier baseline, a gain of about 16 percentage points by these figures. On some datasets, accuracy reached 95%. The framework's uncertainty metrics (Consensus Proportion and Shannon Entropy) showed strong calibration: clusters flagged as ambiguous exhibited lower ground-truth agreement, validating the practical utility of the uncertainty outputs for prioritizing expert review.
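The two uncertainty metrics can be illustrated with a short calculation. The text here does not give their formulas, so this sketch assumes the usual definitions: Consensus Proportion as the fraction of models that chose the most common label, and Shannon Entropy (in bits) of the label distribution across models. The function name consensus_metrics is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def consensus_metrics(labels: List[str]) -> Tuple[str, float, float]:
    """Return (majority label, consensus proportion, Shannon entropy in bits).

    Assumed definitions, not taken from the package:
      consensus proportion = votes for the top label / number of models
      entropy = -sum(p * log2(p)) over the observed label frequencies
    """
    counts = Counter(labels)
    n = len(labels)
    top_label, top_votes = counts.most_common(1)[0]
    proportion = top_votes / n
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return top_label, proportion, entropy


# Four hypothetical model calls for one cluster: three agree, one dissents.
top, cp, h = consensus_metrics(["T cell", "T cell", "T cell", "NK cell"])
print(top, cp, round(h, 3))
```

Under these definitions a unanimous cluster has proportion 1 and entropy 0, while disagreement lowers the proportion and raises the entropy, which is the behavior the calibration result relies on.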
mLLMCelltype is designed for any scRNA-seq study that requires cell type annotation at scale, from single-tissue experiments to large-scale cell atlas projects. It is particularly valuable in settings where curated reference datasets are unavailable — rare tissue types, non-model organisms, or disease contexts that differ substantially from healthy-tissue atlases. Computational biologists can integrate it directly into Seurat or Scanpy workflows using the R or Python packages. The uncertainty outputs are actionable in practice: research groups can set a Consensus Proportion threshold to route confidently annotated clusters directly to downstream analysis while reserving uncertain clusters for manual curation, optimizing the trade-off between throughput and annotation quality.
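The threshold-based routing described above might look like the following sketch. The route_clusters function, the 0.7 cutoff, and the example values are invented for illustration; they are not defaults or outputs of the package, and a suitable threshold would need to be chosen per study.

```python
from typing import Dict, List, Tuple


def route_clusters(
    consensus: Dict[str, Tuple[str, float]],
    min_proportion: float = 0.7,  # assumed cutoff, to be tuned per study
) -> Tuple[Dict[str, str], List[str]]:
    """Split clusters into auto-accepted labels and a manual-review queue.

    `consensus` maps cluster id -> (consensus label, consensus proportion).
    """
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for cluster_id, (label, proportion) in consensus.items():
        if proportion >= min_proportion:
            accepted[cluster_id] = label
        else:
            needs_review.append(cluster_id)
    return accepted, needs_review


# Invented example values for three clusters.
example = {
    "0": ("T cell", 1.0),
    "1": ("Monocyte", 0.75),
    "2": ("Plasmacytoid dendritic cell", 0.5),
}
accepted, needs_review = route_clusters(example)
print(accepted)
print(needs_review)
```

Raising the cutoff sends more clusters to manual curation and lowers the risk of accepting a wrong label; lowering it does the reverse.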
As a preprint posted in April 2025, mLLMCelltype represents an early but methodologically significant application of frontier LLM consensus strategies to a core task in single-cell genomics. The reported mean accuracy gain over prior state-of-the-art methods is substantial relative to the margins typical of incremental improvements in this field. By decoupling annotation quality from the availability of curated reference atlases and enabling transparent, uncertainty-aware annotation, the framework lowers the barrier for single-cell studies in under-characterized biological systems. A key limitation is that the framework depends on paid LLM API access, which introduces per-run costs and raises reproducibility concerns as commercial model versions are updated or deprecated. Its performance also depends on the quality and completeness of the input marker gene lists, so upstream clustering quality remains a critical dependency.
Yang, C., et al. (2025) Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. bioRxiv.
DOI: 10.1101/2025.04.10.647852