CellHermes

An LLM-based single-cell foundation model that fine-tunes LLaMA-3.1-8B via LoRA on transcriptomic and PPI-network data reformulated as natural-language Q&A pairs.

Released: November 2025

CellHermes is a single-cell foundation model that reframes single-cell analysis as a natural-language problem, allowing a general-purpose large language model to serve as a unified encoder, predictor, and explainer across many downstream tasks. Rather than training a bespoke transformer on tokenized gene expression from scratch — the approach taken by models such as scGPT and Geneformer — CellHermes adapts an existing instruction-tuned LLM (LLaMA-3.1-8B-Instruct) to the single-cell domain through parameter-efficient fine-tuning.

The model's central idea is to convert two complementary sources of biological information — single-cell transcriptomic profiles and protein-protein interaction (PPI) network structure — into natural-language question-and-answer pairs. By learning over this reformulated corpus, CellHermes inherits the reasoning and generalization capacity of a modern LLM while grounding it in cell-state and gene-network knowledge. This lets a single set of weights handle ten distinct downstream single-cell tasks without per-task architecture changes or retraining from scratch.
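
The reformulation described above can be illustrated with a minimal sketch. The prompt templates, the function names `cell_to_qa` and `ppi_to_qa`, the choice to list genes in descending order of expression, and the toy data are all assumptions made for illustration; the source does not specify the actual templates or serialization CellHermes uses.

```python
from typing import Dict, List, Tuple


def cell_to_qa(expression: Dict[str, float], cell_type: str, top_k: int = 5) -> Dict[str, str]:
    """Turn one cell's expression profile into a Q&A pair (hypothetical template)."""
    # Rank genes by expression, highest first; ties are broken alphabetically
    # so the output is deterministic.
    ranked = sorted(expression.items(), key=lambda kv: (-kv[1], kv[0]))
    # Keep at most top_k genes and drop any that are not expressed at all.
    top_genes = [gene for gene, value in ranked[:top_k] if value > 0]
    question = (
        "The most highly expressed genes in this cell, in descending order, are: "
        + ", ".join(top_genes)
        + ". What is the cell type?"
    )
    return {"question": question, "answer": cell_type}


def ppi_to_qa(edges: List[Tuple[str, str]], gene_a: str, gene_b: str) -> Dict[str, str]:
    """Turn a PPI-network lookup into a yes/no Q&A pair (hypothetical template)."""
    # Treat the network as undirected: (A, B) and (B, A) are the same interaction.
    undirected = {frozenset(edge) for edge in edges}
    interacts = frozenset((gene_a, gene_b)) in undirected
    question = f"Do the proteins encoded by {gene_a} and {gene_b} interact?"
    return {"question": question, "answer": "Yes" if interacts else "No"}


# Toy inputs, invented for this example only.
toy_cell = {"CD3E": 8.2, "CD8A": 6.1, "GZMB": 4.4, "MS4A1": 0.0, "ACTB": 9.5}
toy_edges = [("CD3E", "CD3D"), ("CD8A", "CD8B")]

print(cell_to_qa(toy_cell, cell_type="CD8+ T cell", top_k=3))
print(ppi_to_qa(toy_edges, "CD3D", "CD3E"))
```

Both functions emit the same `question`/`answer` record shape, which is what would let expression-derived and network-derived examples be mixed into a single instruction-tuning corpus.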

CellHermes was developed by researchers at Tongji University in collaboration with the Theis Lab at Helmholtz Munich (co-authored by Fabian Theis), and posted to bioRxiv in November 2025. It sits in the emerging space of LLM-based cellular foundation models, where the goal is to unify representation learning, prediction, and interpretability under one language-native framework.

Key Features

Language-native single-cell modeling: Transcriptomic profiles and PPI network structure are reformulated as natural-language Q&A pairs, so a general LLM can be applied directly to single-cell biology without a custom tokenizer.
Parameter-efficient fine-tuning: CellHermes adapts LLaMA-3.1-8B-Instruct using LoRA adapters, making it feasible to specialize an 8B-parameter backbone on single-cell data without full fine-tuning.
Unified encoder, predictor, and explainer: One model spans the full pipeline — learning cell and gene representations, making predictions, and producing interpretable, language-based rationales.
Ten downstream tasks, no retraining: A single multi-task adapter (trained across seven databases) covers ten distinct single-cell tasks, including perturbation prediction, cell fitness prediction, and gene-interaction prediction.
Specialized checkpoints: Beyond the base pretrained model, the release includes a multi-task adapter and a dedicated T-cell-reactivity checkpoint for applications requiring task-specific tuning.

Technical Details

CellHermes is built on LLaMA-3.1-8B-Instruct, an 8-billion-parameter decoder-only transformer, adapted via low-rank adaptation (LoRA) so that only a small set of adapter weights is updated during specialization. Training data combines single-cell transcriptomic measurements with protein-protein interaction network information, both serialized into instruction-style natural-language Q&A pairs; the multi-task instruction-tuning stage draws on seven databases spanning ten downstream tasks. The framework releases multiple checkpoints — a base pretrained model, a multi-task adapter, and a T-cell-reactivity adapter — each accessible through the same LLM interface. Code is distributed under the GPL-3.0 license, and weights are hosted on Hugging Face; note that the weights are released on a personal account rather than an organizational namespace.
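
To make the low-rank adaptation step concrete, here is a minimal pure-Python sketch of the general LoRA update, in which a frozen weight matrix W is augmented by a scaled product of two small trainable matrices, W + (alpha / r) · B · A. The rank and alpha values below are assumptions chosen for illustration, since the source does not state the LoRA hyperparameters used for CellHermes; the helper names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A, with W frozen and only A, B trainable.

    Shapes: W is d_out x d_in, B is d_out x rank, A is rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters added by one LoRA pair (A and B) on a d_out x d_in matrix."""
    return rank * (d_in + d_out)


# Tiny demonstration with assumed hyperparameters (rank=2, alpha=4.0).
random.seed(0)
d_out, d_in, rank, alpha = 4, 3, 2, 4.0
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero, as in standard LoRA

# With B = 0 the adapted layer starts out identical to the frozen base layer.
print(lora_effective_weight(w, a, b, alpha, rank) == w)

# Parameter arithmetic for a single 4096 x 4096 projection with an assumed rank of 16.
full = 4096 * 4096
added = lora_param_count(4096, 4096, 16)
print(full, added, added / full)
```

Under these assumptions, a rank-16 adapter on one 4096 × 4096 projection adds 131,072 trainable parameters, about 0.78% of the 16,777,216 in the frozen matrix. This is the sense in which only a small set of adapter weights is updated.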

Applications

CellHermes targets computational biologists and single-cell researchers who want a single model that can move fluidly between tasks such as cell type annotation, gene expression and perturbation prediction, gene-interaction inference, cell fitness estimation, and T-cell reactivity prediction. Because predictions and their explanations are expressed in natural language, the framework is well suited to interpretability-focused workflows where understanding why a prediction was made matters as much as the prediction itself. The parameter-efficient design also lowers the barrier for groups that want to adapt a strong LLM backbone to their own single-cell datasets.

Impact

CellHermes is part of a broader shift toward using general-purpose language models — rather than purpose-built expression transformers — as the substrate for single-cell foundation models, joining efforts to make biological reasoning language-native. By demonstrating that an off-the-shelf LLM, fine-tuned with LoRA on transcriptomic and network data cast as text, can serve as a unified encoder, predictor, and explainer across ten tasks, it offers a template for multi-task generalization without proliferating task-specific models. As a recent preprint (November 2025), its benchmark standing and adoption remain to be established through peer review and independent evaluation, and the personal-account hosting of weights is a practical consideration for groups planning to depend on it.

Citation

Language may be all omics needs: Harmonizing multimodal data for omics understanding with CellHermes

Preprint

Gao, Y., et al. (2025) Language may be all omics needs: Harmonizing multimodal data for omics understanding with CellHermes. bioRxiv.

DOI: 10.1101/2025.11.07.687322

Recent citations

Papers that recently cited this model.

Beyond alignment: synergistic integration is required for multimodal cell foundation models
Till Richter, Eric Zimmermann, J. Hall, et al.
bioRxiv · Mar 2026 · Cited by 1

From modality-specific to compositional foundation models for cell biology
Mojtaba Bahrami, Till Richter, Niklas A. Schmacke, et al.
Cell Systems · Feb 2026 · Cited by 3

Citations

Total Citations: 2

Influential: 0

References: 76

GitHub

Stars: 29

Forks: 6

Open Issues: 1

Contributors: 1

Last Push: 5mo ago

Language: Python

License: GPL-3.0

HuggingFace

Downloads: 11

Likes: 1

Last Modified: 8mo ago

Pipeline: feature-extraction

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

55 (Partial)

Usability (can I run it?): 55

Reproducibility (can I retrain it?): 69

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository · Research Paper · HuggingFace Model · HuggingFace Model
