bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Spatial omics · Single-cell

TissueNarrator

Carnegie Mellon University

An LLM-based generative model that treats spatially resolved transcriptomes as 'spatial sentences' to simulate cellular profiles, predict cell-cell interactions, and run in silico perturbations.

Released: November 2025

TissueNarrator reframes the analysis of spatially resolved transcriptomics as a language modeling problem. Spatial transcriptomics technologies such as MERFISH and CosMx SMI measure gene expression while preserving the physical location of each cell within a tissue section, producing rich maps of cellular state and neighborhood context. Putting these data to generative and predictive use, rather than only clustering them descriptively, has remained difficult because cellular profiles, spatial coordinates, and metadata do not naturally fit a single model. TissueNarrator addresses this by encoding each tissue section as a sequence of "spatial sentences": ranked lists of expressed genes paired with spatial coordinates and metadata, rendered as text that a large language model can read and generate.
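To make the "spatial sentence" idea concrete, here is a minimal sketch of how one cell might be serialized into text. The token layout (`<cell>`, `x=`, `y=`, `genes:`), the function names, and the toy expression values are all assumptions made for illustration; the source does not specify the exact format the authors use.

```python
from typing import Dict, List


def rank_genes(expression: Dict[str, float], top_k: int) -> List[str]:
    """Return up to top_k expressed genes, highest expression first."""
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Sort by descending expression; break ties alphabetically for determinism.
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [gene for gene, _ in expressed[:top_k]]


def cell_to_sentence(
    expression: Dict[str, float],
    x: float,
    y: float,
    metadata: Dict[str, str],
    top_k: int = 5,
) -> str:
    """Render one cell as a line of text (hypothetical layout, not the authors')."""
    genes = " ".join(rank_genes(expression, top_k))
    meta = " ".join(f"{key}={value}" for key, value in sorted(metadata.items()))
    return f"<cell> x={x:.1f} y={y:.1f} {meta} genes: {genes}"


# Toy cell: unexpressed genes (value 0) are dropped before ranking.
cell = {"Gad1": 12.0, "Slc17a7": 0.0, "Snap25": 30.0, "Gad2": 12.0}
sentence = cell_to_sentence(
    cell, 104.3, 88.5, {"cell_type": "inhibitory_neuron"}, top_k=3
)
print(sentence)
# <cell> x=104.3 y=88.5 cell_type=inhibitory_neuron genes: Snap25 Gad1 Gad2
```

A tissue section would then be a sequence of such lines, which is what lets an ordinary text tokenizer and language model consume it.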

Developed by Jian Ma's lab at Carnegie Mellon University and released as a bioRxiv preprint in November 2025, the model adapts the open-weight Qwen3-4B-Base language model through parameter-efficient LoRA fine-tuning. Rather than training a bespoke architecture from scratch, TissueNarrator transfers the sequence-modeling capacity of a general pretrained LLM to the structured, spatially aware language of tissue biology.

This positions TissueNarrator within a growing class of single-cell and spatial foundation models that borrow the tokenize-and-generate paradigm of natural language processing. Its distinguishing move is treating spatial context as part of the generated sequence, enabling a single model to simulate realistic profiles, reason about intercellular communication, and predict the consequences of perturbations.

#Key Features

  • Spatial sentences: Tissue sections are serialized into text sequences combining ranked gene lists, spatial coordinates, and metadata, letting a standard LLM operate directly on spatial transcriptomics data.
  • Generative cell profiles: The model generates realistic cellular expression profiles conditioned on spatial and metadata context rather than only classifying existing cells.
  • Intercellular interaction prediction: TissueNarrator predicts cell-cell interactions and recovers known ligand-receptor signaling pathways from spatial neighborhoods.
  • In silico perturbation: It supports computational perturbation experiments, simulating how cellular states shift in response to gene-level changes.
  • Parameter-efficient adaptation: LoRA fine-tuning of Qwen3-4B-Base keeps training tractable while reusing a general-purpose pretrained language model.
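One plausible way to picture in silico perturbation in a ranked-gene representation is as an edit to the input sequence followed by a comparison of the output. The sketch below assumes a knockout is encoded by deleting the gene from the ranked list, which the source does not confirm. The `generated` list is a hand-written stand-in for a model output, not a real result, and `knock_out` and `rank_shift` are hypothetical helper names.

```python
from typing import Dict, List


def knock_out(ranked_genes: List[str], gene: str) -> List[str]:
    """Assumed knockout encoding: delete the gene from the ranked list."""
    return [g for g in ranked_genes if g != gene]


def rank_shift(before: List[str], after: List[str]) -> Dict[str, int]:
    """Rank change per gene present in both lists (positive = moved up)."""
    position_after = {gene: i for i, gene in enumerate(after)}
    return {
        gene: i - position_after[gene]
        for i, gene in enumerate(before)
        if gene in position_after
    }


baseline = ["Snap25", "Gad1", "Gad2", "Sst"]
perturbed_input = knock_out(baseline, "Gad1")

# Hand-written placeholder for what a generative model might return when
# prompted with the perturbed input. This is NOT real model output.
generated = ["Snap25", "Sst", "Gad2"]

shift = rank_shift(baseline, generated)
print(perturbed_input)  # ['Snap25', 'Gad2', 'Sst']
print(shift)            # {'Snap25': 0, 'Gad2': 0, 'Sst': 2}
```

Comparing ranks rather than raw counts keeps the readout in the same representation the model consumes, at the cost of losing magnitude information.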

#Technical Details

TissueNarrator is built on Qwen3-4B-Base, a 4-billion-parameter transformer language model, adapted via low-rank adaptation (LoRA) so that only a small set of additional weights is trained. Inputs are constructed as spatial sentences (per-cell ranked gene expression combined with spatial coordinates and metadata), so that autoregressive next-token prediction becomes the mechanism for generating and reasoning about tissue. The authors evaluate the approach across three spatial profiling platforms (MERFISH, Perturb-FISH, and CosMx SMI) on four tasks: profile generation, intercellular interaction inference, ligand-receptor pathway recovery, and in silico perturbation. A pretrained LoRA checkpoint fine-tuned on a MERFISH mouse-brain dataset is distributed via Google Drive, with optional per-dataset fine-tuning supported. Training and inference in the reported experiments used an NVIDIA GPU with roughly 48 GB of VRAM.
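The reason LoRA trains only a small set of weights can be shown with simple arithmetic. LoRA freezes a weight matrix W of shape d_out × d_in and learns a low-rank update B·A, where A is rank × d_in and B is d_out × rank. The dimensions and rank below are illustrative assumptions, not values reported for TissueNarrator.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Weights in one dense projection matrix W (d_out x d_in)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights in a LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


# Illustrative sizes only; the source does not state the LoRA rank or
# which projection matrices are adapted.
d_in = d_out = 2560
rank = 16

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
fraction = lora / full

print(full)      # 6553600
print(lora)      # 81920
print(fraction)  # 0.0125
```

Under these assumed sizes, the adapter for one matrix holds 1.25% as many weights as the matrix itself, which is why fine-tuning fits in far less optimizer memory than full training would.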

#Applications

TissueNarrator is aimed at researchers studying tissue organization, cellular neighborhoods, and signaling in spatial transcriptomics data. By generating realistic cellular profiles and predicting intercellular interactions, it can help prioritize candidate ligand-receptor pathways, hypothesize the composition of cellular microenvironments, and screen perturbations computationally before committing to wet-lab Perturb-FISH or similar experiments. Because it adapts a general open-weight LLM with lightweight LoRA training, groups can fine-tune it on their own spatial datasets without the cost of building a model from scratch.
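When prioritizing ligand-receptor candidates, a simple co-expression baseline can serve as a sanity check alongside model predictions. The sketch below is that baseline heuristic, not TissueNarrator's method: it counts how often a ligand is expressed in one cell while its receptor is expressed in a neighbor within a fixed radius. The cells, coordinates, radius, and function names are toy assumptions. The two pairs listed (Cxcl12 with Cxcr4, Bdnf with Ntrk2) are well-known ligand-receptor pairs used here purely as examples.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def neighbors(cells: List[Dict], i: int, radius: float) -> List[int]:
    """Indices of cells within `radius` of cell i (Euclidean distance)."""
    origin = cells[i]["xy"]
    return [
        j
        for j, cell in enumerate(cells)
        if j != i and math.dist(origin, cell["xy"]) <= radius
    ]


def count_lr_pairs(cells: List[Dict], lr_pairs: List[Pair], radius: float) -> Counter:
    """Count sender/receiver neighbor pairs co-expressing each ligand-receptor pair."""
    counts: Counter = Counter()
    for i, sender in enumerate(cells):
        for j in neighbors(cells, i, radius):
            receiver = cells[j]
            for ligand, receptor in lr_pairs:
                if ligand in sender["genes"] and receptor in receiver["genes"]:
                    counts[(ligand, receptor)] += 1
    return counts


# Toy tissue: cells 0 and 1 are 5 units apart; cell 2 is far from both.
cells = [
    {"xy": (0.0, 0.0), "genes": {"Cxcl12"}},
    {"xy": (3.0, 4.0), "genes": {"Cxcr4", "Bdnf"}},
    {"xy": (50.0, 50.0), "genes": {"Ntrk2"}},
]
lr_pairs = [("Cxcl12", "Cxcr4"), ("Bdnf", "Ntrk2")]

counts = count_lr_pairs(cells, lr_pairs, radius=10.0)
print(counts.most_common())  # [(('Cxcl12', 'Cxcr4'), 1)]
```

The Bdnf and Ntrk2 pair scores zero here because the only Ntrk2-expressing cell lies outside the radius, which illustrates why spatial context matters for this kind of screen.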

#Impact

TissueNarrator is an early demonstration that general-purpose large language models can be repurposed as generative engines for spatial transcriptomics through a text-based encoding of tissue. By unifying profile generation, interaction prediction, pathway recovery, and perturbation simulation in one LLM-based framework, it broadens the toolkit for spatial biology beyond descriptive analysis toward generative, hypothesis-driven modeling. Because the work is a recent preprint, its benchmarks still await peer review and broader independent validation. Practical adoption may also be constrained by GPU memory, since the reported experiments used roughly 48 GB of VRAM. The open MIT-licensed code and released checkpoint nonetheless lower the barrier for other labs to reproduce and extend the approach.

Citation

TissueNarrator: Generative Modeling of Spatial Transcriptomics with Large Language Models

Preprint

Liu, S., et al. (2025) TissueNarrator: Generative Modeling of Spatial Transcriptomics with Large Language Models. bioRxiv.

DOI: 10.1101/2025.11.24.690325

Citations

Total Citations: 0
Influential: 0
References: 45

GitHub

Stars: 5
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 3mo ago
Language: Jupyter Notebook
License: MIT

Openness

bio.rodeo openness
Fully open · usable and reproducible
53 (Partial)
Usability (can I run it?): 61
Reproducibility (can I retrain it?): 54
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

gene_expression, generative, language_model, perturbation_prediction, spatial_transcriptomics, transfer_learning, transformer

Resources

GitHub Repository · Research Paper