bio.rodeo
Single-cell foundation models
Single-cell · RNA

CellHermes

Tongji University / Helmholtz Munich

An LLM-based single-cell foundation model that fine-tunes LLaMA-3.1-8B via LoRA on transcriptomic and PPI-network data reformulated as natural-language Q&A pairs.

Released: November 2025

CellHermes is a single-cell foundation model that reframes single-cell analysis as a natural-language problem, allowing a general-purpose large language model to serve as a unified encoder, predictor, and explainer across many downstream tasks. Rather than training a bespoke transformer on tokenized gene expression from scratch — the approach taken by models such as scGPT and Geneformer — CellHermes adapts an existing instruction-tuned LLM (LLaMA-3.1-8B-Instruct) to the single-cell domain through parameter-efficient fine-tuning.

The model's central idea is to convert two complementary sources of biological information — single-cell transcriptomic profiles and protein-protein interaction (PPI) network structure — into natural-language question-and-answer pairs. By learning over this reformulated corpus, CellHermes inherits the reasoning and generalization capacity of a modern LLM while grounding it in cell-state and gene-network knowledge. This lets a single set of weights handle ten distinct downstream single-cell tasks without per-task architecture changes or retraining from scratch.
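The reformulation described above can be illustrated with a minimal sketch. The prompt wording, the `cell_to_qa` and `ppi_to_qa` helpers, the top-k gene ranking, and the toy data are all assumptions made for illustration; the actual templates CellHermes uses are not given here.

```python
from typing import Dict, List, Tuple


def cell_to_qa(expression: Dict[str, float], cell_type: str, top_k: int = 5) -> Dict[str, str]:
    """Turn one cell's expression profile into a hypothetical Q&A pair.

    Genes with non-zero expression are ranked from highest to lowest
    (ties broken alphabetically) and the top_k names go into the question.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    top_genes = [gene for gene, _ in ranked[:top_k]]
    question = (
        "The most highly expressed genes in this cell, in descending order, are: "
        + ", ".join(top_genes)
        + ". What is the most likely cell type?"
    )
    return {"question": question, "answer": cell_type}


def ppi_to_qa(edges: List[Tuple[str, str]], gene_a: str, gene_b: str) -> Dict[str, str]:
    """Turn a lookup in an undirected PPI edge list into a hypothetical yes/no Q&A pair."""
    undirected = {frozenset(edge) for edge in edges}
    interacts = frozenset((gene_a, gene_b)) in undirected
    question = f"Do the proteins encoded by {gene_a} and {gene_b} interact?"
    return {"question": question, "answer": "Yes" if interacts else "No"}


# Toy inputs for illustration only: not real measurements or curated interactions.
cell = {"CD3E": 8.1, "CD8A": 6.4, "GZMB": 5.9, "MS4A1": 0.0, "LYZ": 0.2}
edges = [("CD3E", "CD8A"), ("GZMB", "PRF1")]

cell_pair = cell_to_qa(cell, "CD8+ T cell", top_k=3)
ppi_pair = ppi_to_qa(edges, "CD8A", "CD3E")

print(cell_pair["question"], "->", cell_pair["answer"])
print(ppi_pair["question"], "->", ppi_pair["answer"])
```

Once both data sources are plain text pairs like these, they can share one instruction-tuning corpus, which is the property the paragraph above relies on.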

CellHermes was developed by researchers at Tongji University in collaboration with the Theis Lab at Helmholtz Munich (co-authored by Fabian Theis), and posted to bioRxiv in November 2025. It sits in the emerging space of LLM-based cellular foundation models, where the goal is to unify representation learning, prediction, and interpretability under one language-native framework.

Key Features

  • Language-native single-cell modeling: Transcriptomic profiles and PPI network structure are reformulated as natural-language Q&A pairs, so a general LLM can be applied directly to single-cell biology without a custom tokenizer.
  • Parameter-efficient fine-tuning: CellHermes adapts LLaMA-3.1-8B-Instruct using LoRA adapters, making it feasible to specialize an 8B-parameter backbone on single-cell data without full fine-tuning.
  • Unified encoder, predictor, and explainer: One model spans the full pipeline — learning cell and gene representations, making predictions, and producing interpretable, language-based rationales.
  • Ten downstream tasks, no retraining: A single multi-task adapter (trained across seven databases) covers ten distinct single-cell tasks, including perturbation prediction, cell fitness prediction, and gene-interaction prediction.
  • Specialized checkpoints: Beyond the base pretrained model, the release includes a multi-task adapter and a dedicated T-cell-reactivity checkpoint for applications requiring task-specific tuning.

Technical Details

CellHermes is built on LLaMA-3.1-8B-Instruct, an 8-billion-parameter decoder-only transformer, adapted via low-rank adaptation (LoRA) so that only a small set of adapter weights is updated during specialization. Training data combines single-cell transcriptomic measurements with protein-protein interaction network information, both serialized into instruction-style natural-language Q&A pairs. The multi-task instruction-tuning stage draws on seven databases spanning ten downstream tasks. The framework releases three checkpoints (a base pretrained model, a multi-task adapter, and a T-cell-reactivity adapter), each accessible through the same LLM interface. Code is distributed under the GPL-3.0 license, and weights are hosted on Hugging Face under a personal account rather than an organizational namespace.
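To make "a small set of adapter weights" concrete, the sketch below counts the parameters a LoRA adapter would add to an 8B Llama-style backbone. The layer shapes are the publicly documented Llama 3.1 8B dimensions as best understood here. The rank (16) and the choice of target matrices (`q_proj`, `v_proj`) are assumptions picked for illustration and are not stated in the CellHermes description.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA leaves W frozen and learns B (d_out x rank) and A (rank x d_in),
    so the low-rank update B @ A costs rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


# Llama 3.1 8B shapes: hidden size 4096, 32 layers, and 8 key/value heads
# of width 128, so the value projection maps 4096 -> 1024.
HIDDEN = 4096
KV_DIM = 8 * 128
N_LAYERS = 32

# ASSUMPTIONS (not taken from the CellHermes description): rank and targets.
RANK = 16
TARGETS = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
adapter_total = per_layer * N_LAYERS
targeted_frozen = sum(d_in * d_out for d_in, d_out in TARGETS.values()) * N_LAYERS

print(f"LoRA adapter parameters: {adapter_total:,}")
print(f"Frozen parameters in the targeted matrices: {targeted_frozen:,}")
print(f"Adapter size relative to targeted weights: {adapter_total / targeted_frozen:.2%}")
```

Under these assumed settings the adapter holds 6,815,744 trainable parameters, roughly 1% of the matrices it modifies and well under 0.1% of the full 8-billion-parameter backbone. Different ranks or target modules would change the count but not the order-of-magnitude saving.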

Applications

CellHermes targets computational biologists and single-cell researchers who want a single model that can move fluidly between tasks such as cell type annotation, gene expression and perturbation prediction, gene-interaction inference, cell fitness estimation, and T-cell reactivity prediction. Because predictions and their explanations are expressed in natural language, the framework is well suited to interpretability-focused workflows where understanding why a prediction was made matters as much as the prediction itself. The parameter-efficient design also lowers the barrier for groups that want to adapt a strong LLM backbone to their own single-cell datasets.

Impact

CellHermes is part of a broader shift toward using general-purpose language models, rather than purpose-built expression transformers, as the substrate for single-cell foundation models, joining efforts to make biological reasoning language-native. By demonstrating that an off-the-shelf LLM, fine-tuned with LoRA on transcriptomic and network data cast as text, can serve as a unified encoder, predictor, and explainer across ten tasks, it offers a template for multi-task generalization without proliferating task-specific models. Because the work is a recent preprint (November 2025), its benchmark standing and adoption have yet to be established through peer review and independent evaluation. Hosting the weights on a personal account is also a practical consideration for groups planning to depend on the model.

Citation

Language may be all omics needs: Harmonizing multimodal data for omics understanding with CellHermes

Preprint

Gao, Y., et al. (2025) Language may be all omics needs: Harmonizing multimodal data for omics understanding with CellHermes. bioRxiv.

DOI: 10.1101/2025.11.07.687322

Recent citations

Papers that recently cited this model.

  • Beyond alignment: synergistic integration is required for multimodal cell foundation models

    Till Richter, Eric Zimmermann, J. Hall, et al.

    bioRxiv · Mar 2026

    1
  • From modality-specific to compositional foundation models for cell biology.

    Mojtaba Bahrami, Till Richter, Niklas A. Schmacke, et al.

    Cell Systems · Feb 2026

    3

Top citations

The most-cited papers that cite this model.

  • From modality-specific to compositional foundation models for cell biology.

    Mojtaba Bahrami, Till Richter, Niklas A. Schmacke, et al.

    Cell Systems · Feb 2026

    3
  • Beyond alignment: synergistic integration is required for multimodal cell foundation models

    Till Richter, Eric Zimmermann, J. Hall, et al.

    bioRxiv · Mar 2026

    1

Citations

Total Citations: 2
Influential: 0
References: 76

GitHub

Stars: 29
Forks: 6
Open Issues: 1
Contributors: 1
Last Push: 5mo ago
Language: Python
License: GPL-3.0

HuggingFace

Downloads: 11
Likes: 1
Last Modified: 8mo ago
Pipeline: feature-extraction

Fields of citing research

  • Biology: 100%
  • Computer Science: 100%
  • Medicine: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: 55 (Partial)
Usability (can I run it?): 55
Reproducibility (can I retrain it?): 69
Model Openness Framework: Unclassified (restrictive license on core components)

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model
  • HuggingFace Model