Tongji University / Helmholtz Munich
An LLM-based single-cell foundation model that fine-tunes LLaMA-3.1-8B via LoRA on transcriptomic and PPI-network data reformulated as natural-language Q&A pairs.
CellHermes is a single-cell foundation model that reframes single-cell analysis as a natural-language problem, allowing a general-purpose large language model to serve as a unified encoder, predictor, and explainer across many downstream tasks. Rather than training a bespoke transformer on tokenized gene expression from scratch — the approach taken by models such as scGPT and Geneformer — CellHermes adapts an existing instruction-tuned LLM (LLaMA-3.1-8B-Instruct) to the single-cell domain through parameter-efficient fine-tuning.
The model's central idea is to convert two complementary sources of biological information — single-cell transcriptomic profiles and protein-protein interaction (PPI) network structure — into natural-language question-and-answer pairs. By learning over this reformulated corpus, CellHermes inherits the reasoning and generalization capacity of a modern LLM while grounding it in cell-state and gene-network knowledge. This lets a single set of weights handle ten distinct downstream single-cell tasks without per-task architecture changes or retraining from scratch.
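The reformulation described above can be illustrated with a minimal sketch. The prompt wording, the function names `cell_to_qa` and `ppi_to_qa`, and the choice to describe a cell by its top-ranked genes are all assumptions made for illustration; the description here does not specify the actual CellHermes templates or encoding.

```python
from typing import Dict


def cell_to_qa(expression: Dict[str, float], cell_type: str, top_k: int = 5) -> Dict[str, str]:
    """Turn one cell's expression profile into a Q&A pair (hypothetical template)."""
    # Keep only expressed genes, then rank by expression (descending).
    # Ties are broken alphabetically so the output is deterministic.
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda gv: (-gv[1], gv[0]))
    genes = [gene for gene, _ in ranked[:top_k]]
    question = (
        "The most highly expressed genes in this cell, in descending order, are: "
        + ", ".join(genes)
        + ". What is the cell type?"
    )
    return {"question": question, "answer": cell_type}


def ppi_to_qa(gene_a: str, gene_b: str, interacts: bool) -> Dict[str, str]:
    """Turn one PPI-network edge (or non-edge) into a Q&A pair (hypothetical template)."""
    question = f"Do the proteins encoded by {gene_a} and {gene_b} interact?"
    return {"question": question, "answer": "Yes" if interacts else "No"}


# Toy inputs, invented for this example only.
toy_cell = {"ACTB": 20.0, "CD3E": 12.0, "CD8A": 7.5, "GZMB": 3.0, "MS4A1": 0.0}
cell_pair = cell_to_qa(toy_cell, cell_type="CD8+ T cell", top_k=3)
ppi_pair = ppi_to_qa("CD3E", "CD8A", interacts=True)

print(cell_pair["question"])
print(cell_pair["answer"])
print(ppi_pair["question"], ppi_pair["answer"])
```

Both sources end up in the same `{"question", "answer"}` shape, which is what would let a single instruction-tuned model train on them as one corpus.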
CellHermes was developed by researchers at Tongji University in collaboration with the Theis Lab at Helmholtz Munich, with Fabian Theis among the co-authors, and was posted to bioRxiv as a preprint in November 2025. It sits in the emerging space of LLM-based cellular foundation models, where the goal is to unify representation learning, prediction, and interpretability under one language-native framework.
CellHermes is built on LLaMA-3.1-8B-Instruct, an 8-billion-parameter decoder-only transformer, adapted via low-rank adaptation (LoRA) so that only a small set of adapter weights is updated during specialization. Training data combines single-cell transcriptomic measurements with protein-protein interaction network information, both serialized into instruction-style natural-language Q&A pairs; the multi-task instruction-tuning stage draws on seven databases spanning ten downstream tasks. The framework releases multiple checkpoints — a base pretrained model, a multi-task adapter, and a T-cell-reactivity adapter — each accessible through the same LLM interface. Code is distributed under the GPL-3.0 license, and weights are hosted on Hugging Face; note that the weights are released on a personal account rather than an organizational namespace.
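To give a sense of how small the adapter is relative to the backbone, the sketch below counts LoRA parameters for one plausible configuration. The layer dimensions (hidden size 4096, 32 decoder layers, key/value projection width 1024) are those of the published LLaMA-3.1-8B architecture, but the rank of 16 and the choice of `q_proj` and `v_proj` as target modules are assumptions; the description above does not state the LoRA hyperparameters CellHermes uses.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by LoRA to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the added parameter count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# LLaMA-3.1-8B dimensions.
HIDDEN = 4096          # model width
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32        # decoder layers
BASE_PARAMS = 8_000_000_000  # nominal "8-billion-parameter" backbone

# ASSUMPTIONS for illustration only: not taken from the CellHermes description.
ASSUMED_RANK = 16
ASSUMED_TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, KV_DIM),
}


def total_lora_params(rank: int = ASSUMED_RANK, num_layers: int = NUM_LAYERS) -> int:
    """Total adapter parameters across all layers for the assumed target modules."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in ASSUMED_TARGETS.values())
    return per_layer * num_layers


trainable = total_lora_params()
fraction = trainable / BASE_PARAMS
print(f"trainable adapter parameters: {trainable:,}")
print(f"share of backbone parameters: {fraction:.4%}")
```

Under these assumed settings the adapter comes to well under 0.1% of the backbone's parameters, which is why separate adapters (such as the multi-task and T-cell-reactivity checkpoints) can be stored and swapped cheaply on top of one shared base model.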
CellHermes targets computational biologists and single-cell researchers who want a single model that can move fluidly between tasks such as cell type annotation, gene expression and perturbation prediction, gene-interaction inference, cell fitness estimation, and T-cell reactivity prediction. Because predictions and their explanations are expressed in natural language, the framework is well suited to interpretability-focused workflows where understanding why a prediction was made matters as much as the prediction itself. The parameter-efficient design also lowers the barrier for groups that want to adapt a strong LLM backbone to their own single-cell datasets.
CellHermes is part of a broader shift toward using general-purpose language models, rather than purpose-built expression transformers, as the substrate for single-cell foundation models. By demonstrating that an off-the-shelf LLM, fine-tuned with LoRA on transcriptomic and network data cast as text, can serve as a unified encoder, predictor, and explainer across ten tasks, it offers a template for multi-task generalization without proliferating task-specific models. As a recent preprint (November 2025), its benchmark standing and adoption remain to be established through peer review and independent evaluation, and groups planning to depend on it should account for the weights being hosted on a personal account.
Gao, Y., et al. (2025) Language may be all omics needs: Harmonizing multimodal data for omics understanding with CellHermes. bioRxiv.
DOI: 10.1101/2025.11.07.687322
Papers that recently cited this model.
Till Richter, Eric Zimmermann, J. Hall, et al.
bioRxiv · Mar 2026
Mojtaba Bahrami, Till Richter, Niklas A. Schmacke, et al.
Cell Systems · Feb 2026