An LLM-based generative model that treats spatially resolved transcriptomes as 'spatial sentences' to simulate cellular profiles, predict cell-cell interactions, and run in silico perturbations.
TissueNarrator reframes the analysis of spatially resolved transcriptomics as a language modeling problem. Spatial transcriptomics technologies such as MERFISH and CosMx SMI measure gene expression while preserving the physical location of each cell within a tissue section, producing rich maps of cellular state and neighborhood context. Putting these data to generative and predictive use, rather than stopping at descriptive clustering, has remained difficult because cellular profiles, spatial coordinates, and metadata do not naturally fit into a single model. TissueNarrator addresses this by encoding each tissue section as a sequence of "spatial sentences": ranked lists of expressed genes paired with spatial coordinates and metadata, rendered as text that a large language model can read and generate.
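To make the encoding idea concrete, here is a minimal Python sketch of how one cell might be rendered as a "spatial sentence". The exact text format used by TissueNarrator is not given in this description, so the `Cell` class, the `<cell>` marker, the field layout, and the `top_k` cutoff are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    """Hypothetical container for one cell; not the authors' data structure."""
    cell_id: str
    x: float
    y: float
    cell_type: str
    expression: Dict[str, float]  # gene symbol -> expression value


def cell_to_sentence(cell: Cell, top_k: int = 5) -> str:
    """Render a cell as text: coordinates, metadata, then genes ranked by expression.

    The output layout is an assumption for illustration, not the model's real format.
    """
    # Keep only expressed genes (value > 0).
    expressed = [(gene, value) for gene, value in cell.expression.items() if value > 0]
    # Rank by descending expression; break ties alphabetically for determinism.
    ranked = sorted(expressed, key=lambda gv: (-gv[1], gv[0]))[:top_k]
    genes = " ".join(gene for gene, _ in ranked)
    return f"<cell> x={cell.x:.1f} y={cell.y:.1f} type={cell.cell_type} genes: {genes}"


def section_to_text(cells: List[Cell], top_k: int = 5) -> str:
    """Join per-cell sentences into one text block, one cell per line."""
    return "\n".join(cell_to_sentence(cell, top_k) for cell in cells)


# Toy example with made-up values.
cells = [
    Cell("c1", 10.0, 20.5, "astrocyte", {"Gfap": 12, "Aqp4": 7, "Snap25": 0, "Slc1a3": 7}),
    Cell("c2", 14.2, 21.0, "neuron", {"Snap25": 30, "Gfap": 0, "Rbfox3": 9}),
]
text = section_to_text(cells, top_k=3)
print(text)
```

Only the gene ranking is kept here, with expression values dropped and zero-count genes omitted, which is one way to read "ranked lists of expressed genes"; a real pipeline would also need to settle how coordinates are normalized and how cells are ordered within a section.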
Developed by Jian Ma's lab at Carnegie Mellon University and released as a bioRxiv preprint in November 2025, the model adapts the open-weight Qwen3-4B-Base language model through parameter-efficient LoRA fine-tuning. Rather than training a bespoke architecture from scratch, TissueNarrator transfers the sequence-modeling capacity of a general pretrained LLM to the structured, spatially aware language of tissue biology.
This positions TissueNarrator within a growing class of single-cell and spatial foundation models that borrow the tokenize-and-generate paradigm of natural language processing. Its distinguishing move is treating spatial context as part of the generated sequence, enabling a single model to simulate realistic profiles, reason about intercellular communication, and predict the consequences of perturbations.
TissueNarrator is built on Qwen3-4B-Base, a 4-billion-parameter transformer language model, adapted via low-rank adaptation (LoRA) so that only a small set of additional weights is trained. Inputs are constructed as spatial sentences (per-cell ranked gene expression combined with spatial coordinates and metadata), so that autoregressive next-token prediction becomes the mechanism for generating and reasoning about tissue. The authors evaluate the approach across three spatial profiling platforms (MERFISH, Perturb-FISH, and CosMx SMI) on four tasks: profile generation, intercellular interaction inference, ligand-receptor pathway recovery, and in silico perturbation. A pretrained LoRA checkpoint fine-tuned on a MERFISH mouse-brain dataset is distributed via Google Drive, with optional per-dataset fine-tuning supported. Training and inference in the reported experiments used an NVIDIA GPU with roughly 48 GB of VRAM.
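The reason LoRA trains only a small set of weights can be shown with a short sketch of the standard formulation, in which a frozen weight matrix W is augmented by a scaled low-rank product, y = Wx + (alpha / r) · B(Ax). The code below is a generic pure-Python illustration of that formula, not TissueNarrator's training code; the rank, scaling factor, and matrix sizes are illustrative assumptions, since the configuration the authors used is not stated here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight (d_out x d_in); only A (r x d_in)
    and B (d_out x r) would be updated during fine-tuning.
    """
    rank = len(A)                       # r = number of rows of A
    base = matvec(W, x)                 # frozen path
    update = matvec(B, matvec(A, x))    # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of parameters in the two low-rank factors A and B."""
    return rank * (d_out + d_in)


# Tiny worked example with rank 1 (all values made up).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]       # 1 x 2
B = [[0.5], [0.0]]     # 2 x 1
y = lora_forward(W, A, B, x=[2.0, 3.0], alpha=2.0)

# Parameter savings for an illustrative 2048 x 2048 layer at rank 8
# (these sizes are assumptions, not the dimensions of Qwen3-4B-Base).
full = 2048 * 2048
low_rank = lora_trainable_params(2048, 2048, rank=8)
fraction = low_rank / full
print(y, low_rank, fraction)
```

In the standard setup B is initialized to zero, so the adapted model starts out identical to the pretrained one and departs from it only as the low-rank factors are trained.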
TissueNarrator is aimed at researchers studying tissue organization, cellular neighborhoods, and signaling in spatial transcriptomics data. By generating realistic cellular profiles and predicting intercellular interactions, it can help prioritize candidate ligand-receptor pathways, hypothesize the composition of cellular microenvironments, and screen perturbations computationally before committing to wet-lab Perturb-FISH or similar experiments. Because it adapts a general open-weight LLM with lightweight LoRA training, groups can fine-tune it on their own spatial datasets without the cost of building a model from scratch.
TissueNarrator is an early demonstration that general-purpose large language models can be repurposed as generative engines for spatial transcriptomics through a text-based encoding of tissue. By unifying profile generation, interaction prediction, pathway recovery, and perturbation simulation in one LLM-based framework, it broadens the toolkit for spatial biology beyond descriptive analysis toward generative, hypothesis-driven modeling. Because the work is a recent preprint, its benchmarks still await peer review and broader independent validation, and practical adoption is constrained by the substantial GPU memory (~48 GB VRAM) required. The open MIT-licensed code and released checkpoint nonetheless lower the barrier for other labs to reproduce and extend the approach.
Liu, S., et al. (2025) TissueNarrator: Generative Modeling of Spatial Transcriptomics with Large Language Models. bioRxiv.
DOI: 10.1101/2025.11.24.690325