Duke University School of Medicine
A pretrained whole-cell segmentation model for spatial transcriptomics that fuses DAPI nuclear images with RNA transcript density to recover cell boundaries.
Halo is a pretrained whole-cell segmentation model for imaging-based spatial transcriptomics, developed by Xingyuan Zhang, Haotian Zhuang, and Zhicheng Ji at the Duke University School of Medicine and posted to bioRxiv in April 2026. It addresses a persistent bottleneck in platforms such as 10x Genomics Xenium: while a DAPI stain reliably marks cell nuclei, the full cell boundary is rarely imaged directly. Most analysis pipelines default to nuclear expansion — dilating each nucleus by a fixed radius — which ignores true cell shape and frequently misassigns RNA transcripts to neighboring cells, corrupting the single-cell expression matrices that all downstream analysis depends on.
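For reference, the nuclear-expansion baseline described above can be sketched in a few lines. This is a minimal illustration, not any platform's actual implementation: the function name `expand_nuclei`, the brute-force distance computation, and the toy radius are all assumptions chosen for clarity. On real images, a distance-transform routine such as scikit-image's `expand_labels` would be the practical choice.

```python
import numpy as np

def expand_nuclei(nuclei: np.ndarray, radius: float) -> np.ndarray:
    """Grow each labeled nucleus outward by at most `radius` pixels.

    `nuclei` is a 2-D integer label image (0 = background). Every background
    pixel within `radius` of a nucleus takes the label of the nearest nuclear
    pixel. Brute force: memory is O(background pixels x nuclear pixels), so
    this is only suitable for small illustrative images.
    """
    ys, xs = np.nonzero(nuclei)            # coordinates of nuclear pixels
    labels = nuclei[ys, xs]                # label carried by each nuclear pixel
    expanded = nuclei.copy()
    if labels.size == 0:
        return expanded
    by, bx = np.nonzero(nuclei == 0)       # background pixels to consider
    # Squared distance from every background pixel to every nuclear pixel.
    d2 = (by[:, None] - ys[None, :]) ** 2 + (bx[:, None] - xs[None, :]) ** 2
    nearest = d2.argmin(axis=1)
    within = d2[np.arange(by.size), nearest] <= radius ** 2
    expanded[by[within], bx[within]] = labels[nearest[within]]
    return expanded

# Toy example: two single-pixel "nuclei" expanded by a fixed radius of 2.
nuclei = np.zeros((9, 9), dtype=int)
nuclei[2, 2] = 1
nuclei[6, 6] = 2
cells = expand_nuclei(nuclei, radius=2)
```

Every expanded cell here is a disc of the same radius regardless of the true cell shape, which is the limitation the paragraph above describes.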
Halo reframes segmentation as a multimodal problem. Rather than relying on nuclear morphology alone, it integrates the spatial distribution of detected RNA transcripts, which trace out cytoplasmic extent, with the DAPI nuclear image. Transcript coordinates are converted into a continuous molecular density map (a "pseudoimage") that can be processed jointly with the nuclear channel inside a single segmentation network.
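As a rough sketch of how transcript coordinates might become a pseudoimage, the snippet below places one isotropic Gaussian on each transcript and sums them on a pixel grid. The function name `rna_pseudoimage`, the kernel width `sigma`, the assumption that coordinates are already in pixel units, and the channel order are all illustrative choices, not Halo's published settings.

```python
import numpy as np

def rna_pseudoimage(xy, shape, sigma=2.0):
    """Sum one isotropic Gaussian per transcript on a pixel grid.

    xy    : (n, 2) array of (x, y) transcript positions in pixel units.
    shape : (height, width) of the output image.
    sigma : kernel standard deviation in pixels (hypothetical default).
    """
    xy = np.asarray(xy, dtype=float).reshape(-1, 2)
    h, w = shape
    yy = np.arange(h, dtype=float)
    xx = np.arange(w, dtype=float)
    # An isotropic 2-D Gaussian is separable into a y factor and an x factor.
    gy = np.exp(-((yy[None, :] - xy[:, 1:2]) ** 2) / (2 * sigma ** 2))  # (n, h)
    gx = np.exp(-((xx[None, :] - xy[:, 0:1]) ** 2) / (2 * sigma ** 2))  # (n, w)
    # The matrix product sums the per-transcript outer products over n.
    return gy.T @ gx                                                    # (h, w)

# Three made-up transcripts: two close together, one far away.
transcripts = np.array([[3.0, 4.0], [3.5, 4.2], [10.0, 2.0]])
rna = rna_pseudoimage(transcripts, shape=(8, 16), sigma=1.5)

# Stack with a nuclear channel so both can be fed to one network.
dapi = np.zeros((8, 16))                   # placeholder for a real DAPI image
two_channel = np.stack([dapi, rna], axis=0)  # channel order is an assumption
```

The density is highest where transcripts cluster, so the pseudoimage carries information about cytoplasmic extent that the nuclear channel alone does not.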
What distinguishes Halo from earlier approaches is that it ships as a pretrained model. It is trained once on multimodally stained Xenium samples and, according to the authors, can be applied directly to new datasets and tissue types without dataset-specific retraining — an important practical advantage for labs that lack ground-truth membrane stains for their own tissues.
Halo is built on Cellpose-SAM, a recent vision-transformer-based segmentation foundation model, and is trained on 15 multimodally stained Xenium samples spanning 12 tissue types, including human pancreatic cancer, glioblastoma, melanoma, and lung adenocarcinoma, alongside mouse brain and additional tissues. Inputs are two-channel images: the DAPI nuclear stain and an RNA pseudoimage formed by summing isotropic Gaussian kernels over transcript coordinates. Across diverse held-out tissues, Halo substantially outperforms the standard nuclear-expansion strategy: its median image-based intersection-over-union (IoU) against ground-truth boundaries is near 0.70, roughly 0.15 above expansion, and its gene-based IoU is also higher. It likewise improves downstream cell-type clustering agreement, measured by the adjusted Rand index (ARI) and adjusted mutual information (AMI), in nearly all tissues. More faithful boundaries also yield more reliable morphological features such as area, aspect ratio, and roundness.
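To make the image-based IoU metric concrete, the sketch below scores a predicted label image against a ground-truth one. The matching rule (each ground-truth cell is paired with its best-overlapping predicted cell) and the function names are assumptions made for illustration; the preprint's exact evaluation protocol may differ.

```python
import numpy as np

def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union of two masks of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, truth).sum() / union)

def median_best_iou(pred_labels: np.ndarray, true_labels: np.ndarray) -> float:
    """Median over ground-truth cells of the best IoU with any predicted cell.

    Both inputs are integer label images with 0 as background.
    """
    scores = []
    for t in np.unique(true_labels):
        if t == 0:
            continue
        truth = true_labels == t
        # Only predicted cells that overlap this ground-truth cell can match.
        candidates = np.unique(pred_labels[truth])
        candidates = candidates[candidates != 0]
        best = max((mask_iou(pred_labels == p, truth) for p in candidates),
                   default=0.0)
        scores.append(best)
    return float(np.median(scores)) if scores else float("nan")

# Toy example: one cell is half recovered, the other is recovered exactly.
true_labels = np.zeros((6, 6), dtype=int)
true_labels[0:4, 0:4] = 1
true_labels[4:6, 2:6] = 2
pred_labels = np.zeros((6, 6), dtype=int)
pred_labels[0:4, 0:2] = 5      # covers half of ground-truth cell 1
pred_labels[4:6, 2:6] = 7      # matches ground-truth cell 2 exactly
score = median_best_iou(pred_labels, true_labels)
```

In this toy case the per-cell scores are 0.5 and 1.0, so the median is 0.75.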
Halo targets researchers working with Xenium and similar imaging-based spatial transcriptomics data who need accurate single-cell boundaries before quantifying expression. By improving which transcripts are assigned to which cells, it produces cleaner per-cell expression profiles for cell-type annotation, spatial neighborhood analysis, and tumor microenvironment studies in cancer and neuroscience. Because it is distributed as a pretrained pipeline that ingests standard Xenium output folders and emits cell masks, it can be dropped into existing workflows without requiring labs to annotate training data for their own tissues.
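The effect of boundaries on transcript assignment can be illustrated with a small sketch. It looks up the mask label under each transcript and tallies genes per cell. The function name `count_transcripts`, the gene names, and the toy masks are hypothetical, and coordinates are assumed to be already converted to pixel units; this is not Halo's own interface.

```python
from collections import Counter
import numpy as np

def count_transcripts(mask, xs, ys, genes):
    """Return {cell_label: Counter({gene: count})} for a label mask.

    mask  : 2-D integer label image, 0 = background.
    xs/ys : transcript positions in pixel units (x = column, y = row).
    genes : gene name for each transcript.
    """
    mask = np.asarray(mask)
    cols = np.floor(np.asarray(xs, dtype=float)).astype(int)
    rows = np.floor(np.asarray(ys, dtype=float)).astype(int)
    h, w = mask.shape
    counts = {}
    for r, c, g in zip(rows, cols, genes):
        if not (0 <= r < h and 0 <= c < w):
            continue                      # transcript falls outside the image
        label = int(mask[r, c])
        if label == 0:
            continue                      # background: left unassigned
        counts.setdefault(label, Counter())[g] += 1
    return counts

# Two candidate segmentations that differ only in where the boundary sits.
mask_a = np.zeros((4, 6), dtype=int)
mask_a[:, :3], mask_a[:, 3:] = 1, 2
mask_b = np.zeros((4, 6), dtype=int)
mask_b[:, :4], mask_b[:, 4:] = 1, 2

xs = [0.5, 3.2, 5.1]
ys = [1.0, 2.5, 0.2]
genes = ["GeneA", "GeneA", "GeneB"]       # made-up gene names

counts_a = count_transcripts(mask_a, xs, ys, genes)
counts_b = count_transcripts(mask_b, xs, ys, genes)
```

Shifting the boundary by a single pixel column moves the middle transcript from cell 2 to cell 1, which is how segmentation errors end up in the per-cell expression matrix.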
Cell segmentation is a foundational and error-prone step in spatial transcriptomics, and inaccurate boundaries propagate into every downstream conclusion. By demonstrating that a transferable, transcript-aware model can outperform the ubiquitous nuclear-expansion heuristic without per-dataset retraining, Halo offers a practical drop-in improvement for the rapidly growing Xenium user base. The model, software, and training data are openly released (model and code on Hugging Face, training data on Zenodo), supporting reproducibility and adoption. As a recent preprint, its results await peer review and broader independent benchmarking across platforms and tissue types.