bio.rodeo

Imaging

CellSeg3D

Mathis Lab

Self-supervised 3D cell segmentation for fluorescence microscopy using WNet3D and Swin-UNetR, achieving supervised-level performance without annotated training data.

Released: 2024

Overview

CellSeg3D is a self-supervised 3D cell segmentation toolkit developed by the Mathis Lab at EPFL, designed to quantify cells in volumetric fluorescence microscopy data without requiring manually annotated training examples. The central challenge it addresses is a practical bottleneck familiar to anyone working with 3D imaging: generating ground-truth segmentation masks for volumetric data is extraordinarily labor-intensive, yet most high-performing segmentation methods depend on large labeled datasets. CellSeg3D sidesteps this constraint through its novel WNet3D architecture, which learns spatially coherent segmentations from raw, unlabeled 3D volumes.

The toolkit pairs WNet3D with a second model, Swin-UNetR, a supervised 3D Swin Transformer that delivers strong performance when limited labeled data is available. Together, the two models cover a range of annotation budgets — from zero labeled examples to a small curated set — making CellSeg3D adaptable to the reality of most fluorescence imaging labs. The software was published in eLife in 2024 and accompanies a newly released, fully human-annotated mesoSPIM benchmark dataset of cleared-tissue mouse brain volumes.

CellSeg3D is distributed as the napari-cellseg3d plugin, integrating directly into the napari image viewer. Researchers can train models, run inference, and review predictions interactively without writing code, or use the provided Jupyter notebook workflows for batch processing. The toolkit is open source under the MIT license and installable via PyPI.

Key Features

  • WNet3D self-supervised architecture: Adapts the classical WNet encoder-decoder to fully volumetric 3D inputs and trains using a SoftNCuts loss — a differentiable normalized graph cut objective — requiring no ground-truth labels during training.
  • Swin-UNetR supervised model: A 3D Swin Transformer with UNet-style decoder that processes volumetric patches via shifted window attention, achieving strong segmentation performance with as little as 10% of available labeled data.
  • Zero-shot generalization: Pre-trained WNet3D achieves the highest average F1-score across four independent benchmark datasets without any fine-tuning, enabling immediate deployment on new tissue types and imaging modalities.
  • napari GUI plugin: The napari-cellseg3d plugin provides a point-and-click interface for training, inference, and manual label curation, lowering the barrier to entry for wet-lab researchers.
  • Multi-modality support: Validated on mesoSPIM light-sheet, confocal, and cleared-tissue volumetric data, with preprocessing options to accommodate anisotropic voxel dimensions common in these modalities.
  • Curated benchmark dataset: Ships with a fully annotated mesoSPIM dataset of 2,632 TPH2-tdTomato neurons in cleared mouse brain tissue, providing a rigorous 3D benchmark previously absent from the field.
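
The SoftNCuts objective mentioned above can be illustrated on a toy example. Below is a minimal numpy sketch of a soft normalized-cut loss for a 1D "image" with a Gaussian intensity affinity — the function name, the affinity choice, and the absence of a spatial term are simplifications for illustration, not the library's actual implementation:

```python
import numpy as np

def soft_ncuts_loss(probs, intensities, sigma=0.5):
    """Simplified soft normalized-cut loss for a 1D 'image'.

    probs:       (N, K) soft cluster assignments (rows sum to 1)
    intensities: (N,) pixel intensities used to build affinities
    Returns K - sum_k assoc(A_k, A_k) / assoc(A_k, V); lower is better.
    """
    n, k = probs.shape
    # Gaussian affinity between every pair of pixels (toy: intensity only)
    diff = intensities[:, None] - intensities[None, :]
    w = np.exp(-(diff ** 2) / (2 * sigma ** 2))
    loss = float(k)
    for c in range(k):
        p = probs[:, c]
        assoc_kk = p @ w @ p          # within-cluster association
        assoc_kv = p @ w.sum(axis=1)  # association with all pixels
        loss -= assoc_kk / assoc_kv
    return loss

# Two well-separated intensity groups: a crisp 2-way split scores lower
img = np.array([0.0, 0.1, 0.05, 0.9, 1.0, 0.95])
crisp = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
mixed = np.full((6, 2), 0.5)
print(soft_ncuts_loss(crisp, img) < soft_ncuts_loss(mixed, img))  # True
```

Because the loss is differentiable in `probs`, a network predicting the soft assignments can be trained by gradient descent on it directly — which is what makes the label-free training possible.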

Technical Details

CellSeg3D provides two complementary segmentation architectures. WNet3D is a fully convolutional 3D encoder-decoder with skip connections, trained end-to-end using a weighted combination of reconstruction loss and SoftNCuts loss. The SoftNCuts objective encourages spatially coherent cluster assignments without requiring labels, analogous to spectral clustering but implemented as a differentiable neural network loss. The architecture explicitly handles anisotropic voxel dimensions, a practical requirement for light-sheet and mesoSPIM data where axial resolution differs from lateral resolution.
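
The anisotropy handling can be pictured as resampling the coarse axial axis so voxels become isotropic before segmentation. A hedged numpy sketch using nearest-neighbor lookup (the plugin's own preprocessing may use a different interpolation scheme):

```python
import numpy as np

def resample_isotropic(volume, spacing_zyx, target=None):
    """Nearest-neighbor resample of a (Z, Y, X) volume to isotropic voxels.

    spacing_zyx: physical voxel size per axis, e.g. (5.0, 1.0, 1.0) in µm
    target:      desired isotropic spacing; defaults to the finest axis
    """
    spacing = np.asarray(spacing_zyx, dtype=float)
    if target is None:
        target = spacing.min()
    # New shape so that new_shape * target spans the same physical extent
    new_shape = np.round(np.array(volume.shape) * spacing / target).astype(int)
    # Nearest-neighbor index lookup along each axis
    idx = [np.minimum((np.arange(n) * target / s).astype(int), old - 1)
           for n, s, old in zip(new_shape, spacing, volume.shape)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]

# Light-sheet-like stack: 5 µm axial step, 1 µm lateral pixels
vol = np.random.rand(10, 64, 64)
iso = resample_isotropic(vol, (5.0, 1.0, 1.0))
print(iso.shape)  # (50, 64, 64)
```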

Swin-UNetR is built on the 3D Swin Transformer backbone, which partitions volumetric inputs into non-overlapping 3D patches and applies self-attention within shifted local windows. This design captures long-range spatial context while remaining tractable for large volumes. The UNet-style decoder combines hierarchical features via skip connections. On the mesoSPIM benchmark with 10% labeled training data, Swin-UNetR achieves an F1-score of 0.78 (± 0.07), outperforming Cellpose (0.33 ± 0.40) and StarDist (0.69 ± 0.02). Pre-trained WNet3D with post-processing artifact filtering reaches an F1-score of 0.81 (± 0.004), exceeding all tested supervised baselines — a notable result for a fully label-free approach.
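
The shifted-window mechanism starts from partitioning the volume into non-overlapping 3D windows, with attention computed independently inside each one. A minimal sketch of that partition step (the window size and shapes here are illustrative, not the model's actual hyperparameters):

```python
import numpy as np

def window_partition_3d(volume, win):
    """Split a (D, H, W) volume into non-overlapping win**3 windows.

    Returns an array of shape (num_windows, win, win, win). Assumes
    D, H, W are divisible by win (real models pad the input first).
    """
    d, h, w = volume.shape
    v = volume.reshape(d // win, win, h // win, win, w // win, win)
    # Group the three window-index axes, then the three in-window axes
    v = v.transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(-1, win, win, win)

vol = np.arange(4 * 4 * 4).reshape(4, 4, 4)
windows = window_partition_3d(vol, 2)
print(windows.shape)  # (8, 2, 2, 2)

# A "shifted" layer simply rolls the volume by half a window before
# partitioning, so information flows across window boundaries:
shifted = np.roll(vol, shift=(-1, -1, -1), axis=(0, 1, 2))
```

Restricting attention to fixed-size windows is what keeps the cost linear in volume size rather than quadratic, which matters for the large cleared-tissue stacks discussed above.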

Applications

CellSeg3D is suited for any volumetric fluorescence imaging workflow where annotation cost is a limiting factor. Primary use cases include neuroscience applications such as quantifying labeled neurons in cleared whole-brain light-sheet datasets (mesoSPIM, iDISCO+, CUBIC), and developmental biology applications such as segmenting nuclei in embryonic specimens imaged by confocal or light-sheet microscopy. The toolkit is also applicable to high-throughput screening contexts requiring batch cell counting across volumetric time-lapse or multi-well datasets. Its napari integration makes it particularly accessible to wet-lab researchers who need an end-to-end solution without a software engineering background, while the Jupyter notebook interface serves computational labs requiring reproducible, scriptable pipelines.

Impact

CellSeg3D addresses a genuine gap in the 3D fluorescence microscopy tooling ecosystem: prior state-of-the-art tools such as Cellpose and StarDist were primarily designed for 2D or pseudo-3D analysis, and adapting them to true volumetric data required substantial labeled training sets. By demonstrating that a self-supervised model can outperform supervised baselines across multiple 3D benchmarks, the work challenges the assumption that large annotated datasets are a prerequisite for high-quality volumetric segmentation. The release of the mesoSPIM annotated dataset provides a community benchmark that will facilitate fair comparison of future 3D segmentation methods. Key limitations to note: WNet3D produces semantic segmentation masks and relies on post-processing for instance separation of touching cells; performance on highly anisotropic confocal z-stacks may require preprocessing adjustments; and large volumes with Swin-UNetR demand substantial GPU memory, making patch-based inference the recommended approach.
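
The instance-separation limitation mentioned above is typically handled by post-processing: threshold the semantic mask, then label connected components (and filter out small artifacts). A minimal sketch using plain numpy and a breadth-first flood fill — real pipelines often use optimized labeling routines plus watershed splitting for touching cells:

```python
import numpy as np
from collections import deque

def label_components(mask):
    """6-connected component labeling of a boolean (Z, Y, X) mask.

    Returns (labels, count): labels is an int array in which each
    connected foreground region receives a distinct id starting at 1.
    """
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # already assigned to an earlier component
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                if all(0 <= c < s for c, s in zip(nb, mask.shape)) \
                        and mask[nb] and not labels[nb]:
                    labels[nb] = count
                    queue.append(nb)
        # (size-based artifact filtering would go here)
    return labels, count

# Two separate blobs in a small volume
m = np.zeros((4, 4, 4), dtype=bool)
m[0, 0, 0:2] = True
m[3, 3, 3] = True
labels, n = label_components(m)
print(n)  # 2
```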

Citation

Achard, C., et al. (2025). CellSeg3D: Self-supervised 3D cell segmentation for fluorescence microscopy. eLife.

DOI: 10.7554/eLife.99848

Metrics

GitHub

Stars: 119
Forks: 21
Open Issues: 4
Contributors: 8
Last Push: 1 month ago
Language: Jupyter Notebook
License: MIT

Citations

Total Citations: 0
Influential: 0
References: 32

Tags

segmentation · transformer · self-supervised · 3D imaging · microscopy

Resources

  • GitHub Repository
  • Research Paper
  • Official Website
  • Documentation