Halo

Whole-cell segmentation model for spatial transcriptomics that fuses DAPI nuclear images with RNA transcript density to recover true cell boundaries.

Released: April 2026

Halo is a pretrained whole-cell segmentation model for imaging-based spatial transcriptomics, developed by Xingyuan Zhang, Haotian Zhuang, and Zhicheng Ji at the Duke University School of Medicine and posted to bioRxiv in April 2026. It addresses a persistent bottleneck in platforms such as 10x Genomics Xenium: while a DAPI stain reliably marks cell nuclei, the full cell boundary is rarely imaged directly. Most analysis pipelines default to nuclear expansion — dilating each nucleus by a fixed radius — which ignores true cell shape and frequently misassigns RNA transcripts to neighboring cells, corrupting the single-cell expression matrices that all downstream analysis depends on.
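To make the nuclear-expansion baseline concrete, here is a minimal NumPy sketch. It is not the implementation used by any Xenium pipeline or by the Halo authors. The function name `expand_nuclei`, the brute-force nearest-nucleus search, and the tie-breaking rule are assumptions made for this example. On real images a distance-transform routine such as scikit-image's `expand_labels` would be the practical choice.

```python
import numpy as np

def expand_nuclei(labels, radius):
    """Grow each labeled nucleus by a fixed radius (in pixels).

    Every background pixel is given the label of its nearest nucleus pixel,
    provided that pixel lies within `radius`. Ties go to whichever nucleus
    pixel comes first in row-major order (an arbitrary choice for this sketch).
    Memory use is O(background pixels x nucleus pixels), so this is only
    suitable for small illustrative images.
    """
    ys, xs = np.nonzero(labels)
    if ys.size == 0:
        return labels.copy()
    seeds = np.stack([ys, xs], axis=1)        # (n_seed, 2) nucleus pixels
    seed_labels = labels[ys, xs]              # label of each nucleus pixel
    bg = np.argwhere(labels == 0)             # (n_bg, 2) background pixels

    # Squared distance from every background pixel to every nucleus pixel.
    d2 = ((bg[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    within = d2[np.arange(len(bg)), nearest] <= radius ** 2

    out = labels.copy()
    out[bg[within, 0], bg[within, 1]] = seed_labels[nearest[within]]
    return out
```

Because the radius is the same for every nucleus, the resulting "cells" are close to dilated copies of the nuclei regardless of the true cell shape, which is the limitation described above.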

Halo reframes segmentation as a multimodal problem. Rather than relying on nuclear morphology alone, it integrates the spatial distribution of detected RNA transcripts, which trace out cytoplasmic extent, with the DAPI nuclear image. Transcript coordinates are converted into a continuous molecular density map (a "pseudoimage") that can be processed jointly with the nuclear channel inside a single segmentation network.
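The conversion from transcript coordinates to a density map can be sketched as follows. This is an illustrative NumPy version, not the authors' code. The function name `transcript_pseudoimage`, the use of pixel units for σ, and the unnormalized kernels (peak value 1 per transcript) are assumptions. The default σ = 2.5 is the value quoted in the Key Features list.

```python
import numpy as np

def transcript_pseudoimage(xs, ys, shape, sigma=2.5):
    """Sum one isotropic Gaussian per transcript onto a pixel grid.

    xs, ys : transcript coordinates in pixel units (assumed for this sketch)
    shape  : (height, width) of the output image
    sigma  : Gaussian standard deviation, assumed to be in pixels
    """
    height, width = shape
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    rows = np.arange(height)[None, :]         # (1, H)
    cols = np.arange(width)[None, :]          # (1, W)

    # An isotropic 2D Gaussian factorizes into a row term and a column term,
    # so the sum over transcripts is a single matrix product.
    gy = np.exp(-0.5 * ((rows - ys[:, None]) / sigma) ** 2)   # (N, H)
    gx = np.exp(-0.5 * ((cols - xs[:, None]) / sigma) ** 2)   # (N, W)
    return gy.T @ gx                                          # (H, W)
```

For millions of transcripts, binning counts per pixel and applying a Gaussian filter would likely be more economical. The direct sum is shown here because it mirrors the description most closely.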

What distinguishes Halo from earlier approaches is that it ships as a pretrained model. It is trained once on multimodally stained Xenium samples and, according to the authors, can be applied directly to new datasets and tissue types without dataset-specific retraining — an important practical advantage for labs that lack ground-truth membrane stains for their own tissues.

Key Features

Multimodal fusion of nuclei and transcripts: Combines the DAPI nuclear channel with a transcript-derived molecular density map, using RNA signal to recover cytoplasmic boundaries that nuclear staining alone cannot reveal.
Transcript pseudoimages: Converts discrete transcript coordinates into a smooth density representation via Gaussian kernels (σ = 2.5), turning point-cloud molecular data into an image-compatible channel.
Cellpose-SAM backbone: Builds on the Cellpose-SAM segmentation foundation model, fine-tuned on two-channel (DAPI + transcript density) inputs for whole-cell boundary prediction.
Pretrained and transferable: Generalizes to new datasets and tissue types out of the box, removing the need for per-dataset annotation or retraining.
Improved RNA-to-cell assignment: Recovers cell shape more faithfully than nuclear expansion, yielding more accurate transcript assignment and cleaner cell-type clustering.
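The two-channel input mentioned in the list above can be assembled along these lines. This is a sketch under stated assumptions: the channel order (DAPI first), the channels-first layout, and the percentile normalization are illustrative choices, not documented properties of Halo or Cellpose-SAM. The helper names `normalize` and `two_channel_input` are hypothetical.

```python
import numpy as np

def normalize(img, low=1.0, high=99.0):
    """Rescale an image to [0, 1] between two percentiles (assumed scheme)."""
    img = np.asarray(img, dtype=float)
    lo, hi = np.percentile(img, [low, high])
    if hi <= lo:
        # Constant image: no contrast to rescale.
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

def two_channel_input(dapi, density):
    """Stack DAPI and transcript density into a (2, H, W) array."""
    dapi = np.asarray(dapi)
    density = np.asarray(density)
    if dapi.shape != density.shape:
        raise ValueError("DAPI and density images must have the same shape")
    return np.stack([normalize(dapi), normalize(density)], axis=0)
```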

Technical Details

Halo is built on Cellpose-SAM, a recent vision-transformer-based segmentation foundation model, and is trained on 15 multimodally stained Xenium samples spanning 12 tissue types, including human pancreatic cancer, glioblastoma, melanoma, and lung adenocarcinoma, alongside mouse brain and additional tissues. Inputs are two-channel images: the DAPI nuclear stain and an RNA pseudoimage formed by summing isotropic Gaussian kernels over transcript coordinates. Across diverse held-out tissues, Halo substantially outperforms the standard nuclear-expansion strategy, reaching a median image-based intersection-over-union (IoU) near 0.70 against ground-truth boundaries (roughly +0.15 over expansion) and higher gene-based IoU, and it improves downstream cell-type clustering agreement (ARI/AMI) in nearly all tissues. More faithful boundaries also yield more reliable morphological features such as area, aspect ratio, and roundness.
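For readers unfamiliar with the image-based IoU metric, the sketch below shows one way a median per-cell IoU could be computed from two label masks. The matching rule (each ground-truth cell is compared with the predicted cell that overlaps it most) is an assumption for illustration; the text above does not state the exact evaluation protocol. The function names are hypothetical.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def median_matched_iou(pred, truth):
    """Median IoU over ground-truth cells (label 0 is background).

    pred, truth : integer label masks of identical shape.
    A ground-truth cell with no overlapping prediction scores 0.
    """
    ious = []
    for t in np.unique(truth):
        if t == 0:
            continue
        t_mask = truth == t
        overlapping = pred[t_mask]
        overlapping = overlapping[overlapping > 0]
        if overlapping.size == 0:
            ious.append(0.0)
            continue
        # Predicted label covering the most pixels of this ground-truth cell.
        best = np.bincount(overlapping).argmax()
        ious.append(mask_iou(pred == best, t_mask))
    return float(np.median(ious))
```

As a worked case, a predicted cell covering exactly half of a ground-truth cell and nothing outside it has intersection equal to half the true area and union equal to the full true area, giving an IoU of 0.5.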

Applications

Halo targets researchers working with Xenium and similar imaging-based spatial transcriptomics data who need accurate single-cell boundaries before quantifying expression. By improving which transcripts are assigned to which cells, it produces cleaner per-cell expression profiles for cell-type annotation, spatial neighborhood analysis, and tumor microenvironment studies in cancer and neuroscience. Because it is distributed as a pretrained pipeline that ingests standard Xenium output folders and emits cell masks, it can be dropped into existing workflows without requiring labs to annotate training data for their own tissues.
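Once cell masks are available, assigning transcripts to cells amounts to a lookup of each transcript's pixel in the label mask. The sketch below illustrates that step and the resulting cell-by-gene count matrix. The function names, the pixel-unit coordinates, and the integer gene encoding are assumptions for this example and do not reflect the Xenium file format or Halo's own output handling.

```python
import numpy as np

def assign_transcripts(mask, xs, ys):
    """Return the cell label under each transcript (0 = unassigned).

    mask   : integer label mask, 0 for background
    xs, ys : transcript coordinates in pixel units (assumed)
    """
    rows = np.floor(np.asarray(ys, dtype=float)).astype(int)
    cols = np.floor(np.asarray(xs, dtype=float)).astype(int)
    inside = (
        (rows >= 0) & (rows < mask.shape[0]) &
        (cols >= 0) & (cols < mask.shape[1])
    )
    cell_ids = np.zeros(rows.shape, dtype=int)
    cell_ids[inside] = mask[rows[inside], cols[inside]]
    return cell_ids

def count_matrix(cell_ids, gene_ids, n_cells, n_genes):
    """Build a (n_cells, n_genes) matrix of transcript counts.

    Cells are assumed to be labeled 1..n_cells and genes 0..n_genes-1.
    Transcripts with cell id 0 are dropped.
    """
    counts = np.zeros((n_cells + 1, n_genes), dtype=int)
    np.add.at(counts, (np.asarray(cell_ids), np.asarray(gene_ids)), 1)
    return counts[1:]
```

Any error in the mask changes `cell_ids` directly, which is why boundary quality feeds straight into the expression matrix.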

Impact

Cell segmentation is a foundational and error-prone step in spatial transcriptomics, and inaccurate boundaries propagate into every downstream conclusion. By demonstrating that a transferable, transcript-aware model can outperform the ubiquitous nuclear-expansion heuristic without per-dataset retraining, Halo offers a practical drop-in improvement for the rapidly growing Xenium user base. The model, software, and training data are openly released (model and code on Hugging Face, training data on Zenodo), supporting reproducibility and adoption. As a recent preprint, its results await peer review and broader independent benchmarking across platforms and tissue types.

Citation

Zhang, X., et al. (2026) Halo: a pretrained model for whole-cell segmentation from nuclei images in spatial transcriptomics. bioRxiv.

DOI: 10.64898/2026.04.02.716237


Citations

Total Citations: 0

Influential: 0

References: 30

HuggingFace

Downloads: 0

Likes: 0

Last Modified: 3mo ago

Pipeline: image-segmentation


Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Score: 63 (Partial)

Usability (can I run it?): 94

Reproducibility (can I retrain it?): 39

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper, HuggingFace Model, Dataset
