Self-supervised 3D cell segmentation for fluorescence microscopy using WNet3D and Swin-UNetR, achieving supervised-level performance without annotated training data.
CellSeg3D is a self-supervised 3D cell segmentation toolkit developed by the Mathis Lab at EPFL, designed to quantify cells in volumetric fluorescence microscopy data without requiring manually annotated training examples. The central challenge it addresses is a practical bottleneck familiar to anyone working with 3D imaging: generating ground-truth segmentation masks for volumetric data is extraordinarily labor-intensive, yet most high-performing segmentation methods depend on large labeled datasets. CellSeg3D sidesteps this constraint through its novel WNet3D architecture, which learns spatially coherent segmentations from raw, unlabeled 3D volumes.
The toolkit pairs WNet3D with a second model, Swin-UNetR, a supervised 3D Swin Transformer that delivers strong performance when limited labeled data is available. Together, the two models cover a range of annotation budgets — from zero labeled examples to a small curated set — making CellSeg3D adaptable to the reality of most fluorescence imaging labs. The software was published in eLife in 2024 and accompanies a newly released, fully human-annotated mesoSPIM benchmark dataset of cleared-tissue mouse brain volumes.
CellSeg3D is distributed as the napari-cellseg3d plugin, integrating directly into the napari image viewer. Researchers can train models, run inference, and review predictions interactively without writing code, or use the provided Jupyter notebook workflows for batch processing. The toolkit is open source under the MIT license and installable via PyPI.
The napari-cellseg3d plugin provides a point-and-click interface for training, inference, and manual label curation, lowering the barrier to entry for wet-lab researchers.

CellSeg3D provides two complementary segmentation architectures. WNet3D is a fully convolutional 3D encoder-decoder with skip connections, trained end-to-end using a weighted combination of a reconstruction loss and a SoftNCuts loss. The SoftNCuts objective encourages spatially coherent cluster assignments without requiring labels, analogous to spectral clustering but implemented as a differentiable neural-network loss. The architecture explicitly handles anisotropic voxel dimensions, a practical requirement for light-sheet and mesoSPIM data, where axial resolution differs from lateral resolution.
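To make the SoftNCuts idea concrete, here is a minimal numpy sketch of a soft normalized-cut objective on a toy problem. It assumes a dense voxel-affinity matrix and soft class probabilities; the actual WNet3D implementation computes affinities from intensity and spatial proximity within a local radius and is far more memory-efficient, so treat this purely as an illustration of the loss shape (K minus the sum of within-class association over class volume).

```python
import numpy as np

def soft_ncuts_loss(probs, affinity):
    """Soft N-cut loss for soft cluster assignments.
    probs:    (N, K) per-voxel class probabilities
    affinity: (N, N) symmetric voxel-affinity matrix
    Returns K - sum_k (p_k^T W p_k) / (p_k^T d), where d = W @ 1.
    A perfectly coherent clustering of a block-structured affinity
    drives the loss toward 0; incoherent assignments raise it.
    """
    degree = affinity.sum(axis=1)                  # d = W @ 1
    loss = float(probs.shape[1])                   # start at K
    for k in range(probs.shape[1]):
        p = probs[:, k]
        assoc = p @ affinity @ p                   # within-class association
        vol = p @ degree                           # class "volume"
        loss -= assoc / vol
    return loss

# Toy example: 4 voxels forming two affinity blocks of two voxels each.
W = np.kron(np.eye(2), np.ones((2, 2)))            # block-diagonal affinity
perfect = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
uniform = np.full((4, 2), 0.5)                     # maximally uncertain
```

Minimizing this loss pushes the network toward assignments that keep strongly connected voxels in the same cluster, which is why no labels are needed.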
Swin-UNetR is built on the 3D Swin Transformer backbone, which partitions volumetric inputs into non-overlapping 3D patches and applies self-attention within shifted local windows. This design captures long-range spatial context while remaining tractable for large volumes. The UNet-style decoder combines hierarchical features via skip connections. On the mesoSPIM benchmark with 10% labeled training data, Swin-UNetR achieves an F1-score of 0.78 ± 0.07, outperforming Cellpose (0.33 ± 0.40) and StarDist (0.69 ± 0.02). Pre-trained WNet3D with post-processing artifact filtering reaches an F1-score of 0.81 ± 0.004, exceeding all tested supervised baselines — a notable result for a fully label-free approach.
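The window partitioning that makes this tractable can be sketched in a few lines of numpy. This is an illustration of the data movement only (no attention weights); the `shift` parameter mimics the cyclic shift that lets successive Swin layers mix information across window borders. Window size 4 and the cubic input are arbitrary choices for the example.

```python
import numpy as np

def partition_windows_3d(volume, win, shift=0):
    """Split a 3D volume into non-overlapping (win, win, win) windows.
    A nonzero `shift` cyclically rolls the volume first, emulating the
    shifted-window step of Swin-style attention."""
    if shift:
        volume = np.roll(volume, (-shift, -shift, -shift), axis=(0, 1, 2))
    d, h, w = volume.shape
    assert d % win == 0 and h % win == 0 and w % win == 0, \
        "volume dims must be divisible by the window size"
    return (volume
            .reshape(d // win, win, h // win, win, w // win, win)
            .transpose(0, 2, 4, 1, 3, 5)       # group the three window axes
            .reshape(-1, win, win, win))       # (num_windows, win, win, win)

vol = np.arange(8 ** 3).reshape(8, 8, 8)
windows = partition_windows_3d(vol, win=4)     # 8 windows of shape 4x4x4
```

Self-attention is then computed independently within each window, so cost grows with window volume rather than with the full input volume.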
CellSeg3D is suited for any volumetric fluorescence imaging workflow where annotation cost is a limiting factor. Primary use cases include neuroscience applications such as quantifying labeled neurons in cleared whole-brain light-sheet datasets (mesoSPIM, iDISCO+, CUBIC), and developmental biology applications such as segmenting nuclei in embryonic specimens imaged by confocal or light-sheet microscopy. The toolkit is also applicable to high-throughput screening contexts requiring batch cell counting across volumetric time-lapse or multi-well datasets. Its napari integration makes it particularly accessible to wet-lab researchers who need an end-to-end solution without a software engineering background, while the Jupyter notebook interface serves computational labs requiring reproducible, scriptable pipelines.
CellSeg3D addresses a genuine gap in the 3D fluorescence microscopy tooling ecosystem: prior state-of-the-art tools such as Cellpose and StarDist were primarily designed for 2D or pseudo-3D analysis, and adapting them to true volumetric data required substantial labeled training sets. By demonstrating that a self-supervised model can outperform supervised baselines across multiple 3D benchmarks, the work challenges the assumption that large annotated datasets are a prerequisite for high-quality volumetric segmentation. The release of the mesoSPIM annotated dataset provides a community benchmark that will facilitate fair comparison of future 3D segmentation methods. Key limitations to note: WNet3D produces semantic segmentation masks and relies on post-processing for instance separation of touching cells; performance on highly anisotropic confocal z-stacks may require preprocessing adjustments; and large volumes with Swin-UNetR demand substantial GPU memory, making patch-based inference the recommended approach.
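As a concrete illustration of the instance-separation step mentioned above, the simplest post-processing is connected-component labeling of the semantic mask with a size filter for small artifacts. This sketch uses scipy and a hypothetical `min_voxels` threshold; it is not CellSeg3D's exact pipeline, and by construction it will not split touching cells, for which watershed-style methods are needed.

```python
import numpy as np
from scipy import ndimage

def semantic_to_instances(mask, min_voxels=5):
    """Convert a binary semantic mask into instance labels via 3D
    connected components, discarding components smaller than
    `min_voxels` (a hypothetical artifact-size threshold)."""
    labels, n_found = ndimage.label(mask)          # 6-connectivity by default
    counts = np.bincount(labels.ravel())
    small = [i for i in range(1, n_found + 1) if counts[i] < min_voxels]
    labels[np.isin(labels, small)] = 0             # drop tiny components
    return labels

# Toy volume: one 27-voxel cell plus a single-voxel artifact.
vol = np.zeros((10, 10, 10), dtype=int)
vol[1:4, 1:4, 1:4] = 1
vol[8, 8, 8] = 1
instances = semantic_to_instances(vol)             # artifact filtered out
```

Each surviving component receives its own integer label, which is what downstream cell counting and per-cell statistics operate on.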
Achard, C., et al. (2025). CellSeg3D: self-supervised 3D cell segmentation for fluorescence microscopy. eLife.
DOI: 10.7554/eLife.99848