mLLMCelltype

Multi-LLM consensus framework for automated cell type annotation in scRNA-seq data, reported to outperform prior methods by roughly 15 percentage points in mean accuracy.

Released: April 2025

mLLMCelltype is an iterative multi-LLM consensus framework for automated cell type annotation in single-cell RNA sequencing (scRNA-seq) data. Developed by Chen Yang, Xianyang Zhang, and Jun Chen at Texas A&M University and Mayo Clinic, and posted as a bioRxiv preprint in April 2025, the tool addresses a longstanding bottleneck in single-cell analysis: manual annotation of cell clusters is labor-intensive, requires deep domain expertise, and does not scale to the size of modern atlasing studies. Rather than relying on a single model, mLLMCelltype orchestrates multiple state-of-the-art large language models in a structured deliberation process, then aggregates their outputs into a single, uncertainty-aware annotation call.

The consensus strategy draws on a well-established principle in machine learning — that ensembling diverse models reduces individual error and improves generalization — and applies it to the biological knowledge encoded in frontier LLMs. Each participating model independently evaluates the marker gene signatures for each cluster, and a cross-model deliberation round allows the models to exchange reasoning and refine their initial calls before a final consensus is reached. This transparent reasoning chain sets mLLMCelltype apart from black-box automated annotation tools and gives researchers visibility into why a given annotation was or was not confidently assigned.
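
The deliberation loop described above can be sketched roughly as follows. This is a minimal illustration and not the package's implementation: the `Annotator` callable type, `run_consensus`, the agreement threshold, and the round limit are all hypothetical names and values, and the stub annotators stand in for real LLM API calls.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical interface: an annotator receives the marker genes plus the
# previous round's answers from all models (None in the first round) and
# returns a cell type label.
Annotator = Callable[[List[str], Optional[Dict[str, str]]], str]


def run_consensus(
    markers: List[str],
    annotators: Dict[str, Annotator],
    threshold: float = 0.7,  # assumed agreement level needed to stop early
    max_rounds: int = 3,     # assumed cap on discussion rounds
) -> Dict[str, object]:
    """Collect independent calls, then let models revise until they agree."""
    # Round 0: every model annotates independently.
    votes = {name: fn(markers, None) for name, fn in annotators.items()}
    rounds_used = 0
    while True:
        label, count = Counter(votes.values()).most_common(1)[0]
        proportion = count / len(votes)
        if proportion >= threshold or rounds_used >= max_rounds:
            break
        # Deliberation: each model is shown the previous round's answers.
        previous = dict(votes)
        votes = {name: fn(markers, previous) for name, fn in annotators.items()}
        rounds_used += 1
    return {"label": label, "proportion": proportion,
            "rounds": rounds_used, "votes": votes}


# Stub annotators standing in for real LLM calls.
def always_t_cell(markers, others):
    return "T cell"


def follows_majority(markers, others):
    if others is None:
        return "NK cell"  # initial independent guess
    return Counter(others.values()).most_common(1)[0][0]


result = run_consensus(
    ["CD3D", "CD3E", "IL7R"],
    {"model_a": always_t_cell, "model_b": always_t_cell,
     "model_c": follows_majority},
    threshold=1.0,
)
print(result["label"], result["proportion"], result["rounds"])
```

With these stubs, the dissenting model switches to the majority label after one discussion round. Real models may of course fail to converge, which is why the sketch caps the number of rounds and still reports the final agreement proportion.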

Critically, mLLMCelltype requires no reference dataset or specialized fine-tuning. The framework passes cluster-level marker gene lists directly to the LLMs as structured prompts, which means it can be applied to any tissue or organism for which marker gene lists can be derived, without the domain-locking that affects supervised classifiers trained on specific cell atlases.
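
Because the only data input is a ranked marker list per cluster, preparing that input is simple. The sketch below assumes differential-expression results are already available as `(cluster, gene, score)` rows; the `top_markers` helper and the toy scores are hypothetical and not part of the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_markers(
    rows: Iterable[Tuple[str, str, float]],
    n: int = 10,
) -> Dict[str, List[str]]:
    """Group (cluster, gene, score) rows and keep the n highest-scoring genes."""
    by_cluster: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for cluster, gene, score in rows:
        by_cluster[cluster].append((gene, score))
    return {
        cluster: [g for g, _ in
                  sorted(genes, key=lambda gs: gs[1], reverse=True)[:n]]
        for cluster, genes in by_cluster.items()
    }


# Toy differential-expression output (scores are made-up log fold changes).
rows = [
    ("0", "CD3D", 2.9), ("0", "IL7R", 2.1), ("0", "CD3E", 2.5),
    ("1", "MS4A1", 3.2), ("1", "CD79A", 3.4),
]
markers = top_markers(rows, n=2)
print(markers)
```

In a Seurat or Scanpy workflow, the rows would typically come from the toolkit's own marker-detection step. The point here is only that nothing beyond a ranked gene list is needed.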

Key Features

Multi-LLM consensus deliberation: Multiple LLMs from providers including OpenAI (GPT series), Anthropic (Claude series), Google (Gemini series), and DeepSeek independently annotate clusters and then participate in iterative cross-model discussion rounds before a final consensus label is produced.
Uncertainty quantification: Each annotation is accompanied by a Consensus Proportion score and Shannon Entropy value, giving researchers a principled way to identify ambiguous clusters that warrant manual review — reducing both over-annotation and missed cell types.
Reference-free operation: The framework requires only a ranked list of marker genes per cluster and LLM API access. No labeled training data, curated reference atlas, or domain-specific fine-tuning is needed, enabling immediate application to novel tissue types and non-model organisms.
Transparent reasoning chains: The deliberation process exposes the intermediate reasoning each LLM generates, making annotation decisions auditable and enabling researchers to assess biological plausibility step by step.
Dual-language implementation: Available as both an R package and a Python package (pip install mllmcelltype), integrating directly into standard Seurat- and Scanpy-based workflows.
Extensible LLM backend: New LLMs can be added to the consensus pool without modifying core framework logic, ensuring the tool benefits from future improvements in frontier models.
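
The two uncertainty scores named above can be illustrated with a short calculation. The sketch assumes the usual definitions: Consensus Proportion is the fraction of models backing the most common label, and Shannon entropy is computed in bits over the label distribution. The exact formulas used by the package may differ, and `consensus_metrics` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List, Tuple


def consensus_metrics(labels: List[str]) -> Tuple[str, float, float]:
    """Return (majority label, consensus proportion, Shannon entropy in bits)."""
    counts = Counter(labels)
    total = len(labels)
    top_label, top_count = counts.most_common(1)[0]
    proportion = top_count / total
    # H = sum over labels of p * log2(1 / p); zero when all models agree.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return top_label, proportion, entropy


# Four hypothetical model outputs for one cluster.
label, cp, h = consensus_metrics(["T cell", "T cell", "T cell", "NK cell"])
print(label, cp, round(h, 3))

# Unanimous agreement gives proportion 1.0 and entropy 0.0.
_, cp_all, h_all = consensus_metrics(["B cell"] * 4)
```

A high proportion together with low entropy indicates that the models agree. A low proportion or high entropy marks a cluster worth inspecting by hand.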

Technical Details

mLLMCelltype is a prompt-engineering and orchestration framework rather than a trained neural network. It does not have its own learned parameters; instead, it systematically queries external LLM APIs. The input to each model is a structured natural-language prompt containing the ranked marker gene list for a given cluster, contextual tissue metadata when available, and instructions to reason about which cell type the marker profile most likely represents. In the deliberation phase, model outputs are shared across the ensemble and each model is prompted to reconsider its annotation in light of the other models' reasoning — an approach analogous to structured expert panel review.
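
A prompt of the kind described above might be assembled as follows. The wording, the `build_prompt` function, and its parameters are assumptions made for illustration; the prompts actually used by the framework are not reproduced here.

```python
from typing import Dict, List, Optional


def build_prompt(
    cluster_markers: Dict[str, List[str]],
    tissue: Optional[str] = None,
    species: str = "human",
) -> str:
    """Render ranked marker lists and optional tissue context as plain text."""
    context = f"{species} {tissue}" if tissue else species
    lines = [
        f"Dataset: {context} scRNA-seq.",
        "For each cluster, reason about the marker genes (ranked, strongest "
        "first) and name the most likely cell type.",
        "Answer with one line per cluster in the form 'cluster_id: cell type'.",
        "",
    ]
    for cluster_id, genes in cluster_markers.items():
        lines.append(f"Cluster {cluster_id}: {', '.join(genes)}")
    return "\n".join(lines)


prompt = build_prompt(
    {"0": ["CD3D", "CD3E", "IL7R"], "1": ["CD79A", "MS4A1"]},
    tissue="PBMC",
)
print(prompt)
```

The same text would be sent to every model in the ensemble. In the deliberation phase, a follow-up prompt would additionally include the other models' answers and reasoning.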

Benchmarking across 50 diverse scRNA-seq datasets spanning 26 tissue types and more than 8 million cells demonstrated a mean accuracy of 77.3%, compared to 61.3% for the best prior single-model or supervised-classifier baseline, a gain of roughly 15 to 16 percentage points (16.0 by the two figures quoted here). On some datasets, accuracy reached 95%. The framework's uncertainty metrics (Consensus Proportion and Shannon Entropy) tracked annotation quality: clusters flagged as ambiguous exhibited lower ground-truth agreement, which supports using the uncertainty outputs to prioritize expert review.

Applications

mLLMCelltype is designed for any scRNA-seq study that requires cell type annotation at scale, from single-tissue experiments to large-scale cell atlas projects. It is particularly valuable in settings where curated reference datasets are unavailable — rare tissue types, non-model organisms, or disease contexts that differ substantially from healthy-tissue atlases. Computational biologists can integrate it directly into Seurat or Scanpy workflows using the R or Python packages. The uncertainty outputs are actionable in practice: research groups can set a Consensus Proportion threshold to route confidently annotated clusters directly to downstream analysis while reserving uncertain clusters for manual curation, optimizing the trade-off between throughput and annotation quality.
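
The threshold-based routing mentioned above amounts to a simple partition of clusters. In this sketch the cutoff of 0.7, the `route_clusters` helper, and the example scores are hypothetical; an appropriate threshold would need to be chosen per study.

```python
from typing import Dict, List, Tuple


def route_clusters(
    consensus_proportion: Dict[str, float],
    threshold: float = 0.7,  # assumed cutoff, not a recommended value
) -> Tuple[List[str], List[str]]:
    """Split cluster IDs into (accepted, needs manual review)."""
    accepted = [c for c, cp in consensus_proportion.items() if cp >= threshold]
    review = [c for c, cp in consensus_proportion.items() if cp < threshold]
    return accepted, review


# Made-up Consensus Proportion scores for three clusters.
scores = {"0": 1.0, "1": 0.5, "2": 0.75}
accepted, review = route_clusters(scores)
print("accepted:", accepted)
print("review:", review)
```

Raising the threshold sends more clusters to manual curation and lowers the risk of accepting a wrong label. Lowering it does the opposite.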

Impact

As a preprint posted in April 2025, mLLMCelltype represents an early but methodologically significant application of frontier LLM consensus strategies to a core task in single-cell genomics. The reported gain of roughly 15 percentage points over prior state-of-the-art methods is substantial relative to the margins typical of incremental improvements in this field. By decoupling annotation quality from the availability of curated reference atlases and enabling transparent, uncertainty-aware annotation, the framework lowers the barrier for single-cell studies in under-characterized biological systems. A key limitation is that the framework depends on paid LLM API access, which introduces per-run costs and raises reproducibility concerns as commercial model versions are updated or deprecated. Its performance also depends on the quality and completeness of the input marker gene lists, so upstream clustering quality remains a critical dependency.

Citation

Yang, C., et al. (2025). Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. bioRxiv (preprint).

DOI: 10.1101/2025.04.10.647852

Recent citations

Papers that recently cited this model.

NanoCellAnnotator: Formalizing Expert Cell Type Annotation with Large Language Models. Md Ishtyaq Mahmud, V. Kochat, Humaira Anzum, et al. bioRxiv, Jun 2026. Citations: 0

Computational modelling of cell identity. Woo Jun Shim, C. Chow, Shaine Chenxin Bao, et al. Biochemical Journal, May 2026. Citations: 0

Macrophage reprogramming nodes for bone repair identified by single-cell and spatial omics. Hang Chen, Chang Lei, Giselle C. Yeo, et al. Bone, May 2026. Citations: 0

Top citations

The most-cited papers that cite this model.

ChatSpatial: Schema-Enforced Agentic Orchestration for Reproducible and Cross-Platform Spatial Transcriptomics. Chen Yang, Xianyang Zhang, Jun Chen. bioRxiv, Mar 2026. Citations: 3

Benchmarking large language models for cell typing in single-cell RNA-Seq. Tianxiang Xiao, Dezhi Hua, Yanan Wang, et al. Briefings Bioinform., Nov 2025. Citations: 2

Model confrontation and collaboration: A debate intelligence framework for enhancing medical reasoning in large language models. Xinti Sun, Q. Hong, Mengyang Zhang, et al. Cell Reports Medicine, Jan 2026. Citations: 1

Generative Artificial Intelligence in Bioinformatics: A Systematic Review of Models, Applications, and Methodological Advances. Riasad Alvi, Sayeem Been Zaman, Wasimul Karim, et al. arXiv.org, Nov 2025. Citations: 1 (influential)

LLM-Assisted Functional Gene Annotation. C. Boyce, C. Pereira, G. Kim, et al. bioRxiv, Aug 2025. Citations: 1

Citations

Total citations: 10
Influential: 1
References: 0

GitHub

Stars: 653
Forks: 57
Open issues: 9
Contributors: 3
Last push: 5d ago
Language: Python
License: MIT

Fields of citing research

Share of papers citing this model, by field:
Biology: 80%
Medicine: 80%
Computer Science: 70%

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 37 (Closed)
Usability (can I run it?): 73
Reproducibility (can I retrain it?): 0
open weights, closed recipe; not reproducible

Model Openness Framework: Unclassified (missing required components)

