Generalist cell segmentation model combining SAM's ViT-L backbone with Cellpose flow fields. First model to surpass average human annotators on the Cellpose benchmark.
Cellpose-SAM is a generalist cell segmentation model developed at HHMI Janelia Research Campus that grafts the pretrained ViT-L image encoder from Meta's Segment Anything Model (SAM) onto the Cellpose flow-field prediction framework. Released as Cellpose v4, it is the first generalist model to consistently outperform average human annotators on the Cellpose benchmark, achieving an error rate of 0.163, well below the inter-annotator agreement level of 0.257 and a 44% improvement over its predecessor, Cellpose3.
The central innovation is the pairing of SAM's large-scale visual representations with the proven Cellpose inference mechanism. SAM's ViT-L encoder, originally trained on over one billion natural image masks, supplies rich, transferable feature representations that generalize across microscopy modalities without requiring domain-specific pretraining from scratch. The Cellpose convolutional decoder translates those representations into per-pixel gradient flows and cell probability maps, which are then converted into instance segmentation masks through Cellpose's established dynamics-based postprocessing.
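The dynamics-based postprocessing can be illustrated with a toy sketch: pixels above a cell-probability threshold are advected along the predicted flow field until they converge toward cell centers, and pixels sharing a fixed point become one instance. This is a simplified illustration only, not the actual Cellpose implementation, which uses bilinear flow interpolation and more careful clustering.

```python
import numpy as np

def follow_flows(flow_y, flow_x, cellprob, n_iter=100, prob_thresh=0.0):
    """Toy sketch of Cellpose-style dynamics postprocessing.

    Pixels with cellprob above prob_thresh are moved along the flow
    field for n_iter steps; pixels that converge to the same fixed
    point are grouped into one instance mask.
    """
    H, W = cellprob.shape
    ys, xs = np.nonzero(cellprob > prob_thresh)
    py, px = ys.astype(float), xs.astype(float)
    for _ in range(n_iter):
        # look up the flow at each pixel's current (rounded) position
        iy = np.clip(np.round(py).astype(int), 0, H - 1)
        ix = np.clip(np.round(px).astype(int), 0, W - 1)
        py += flow_y[iy, ix]
        px += flow_x[iy, ix]
    # pixels converging to the same fixed point share one label
    centers = np.stack([np.round(py), np.round(px)], axis=1).astype(int)
    _, labels = np.unique(centers, axis=0, return_inverse=True)
    masks = np.zeros((H, W), dtype=int)
    masks[ys, xs] = labels.ravel() + 1
    return masks
```

On a synthetic flow field pointing toward two centers, this recovers two instance masks, which is the essential behavior the real postprocessing implements at scale.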
Cellpose-SAM was released in May 2025 and is distributed as part of the Cellpose ecosystem, maintaining full backward compatibility with the existing GUI, fine-tuning API, image restoration pipeline, and 3D segmentation tools. It is available under a CC-BY-NC license reflecting the non-commercial terms of the SAM pretrained weights.
Cellpose-SAM combines two components: the ViT-L image encoder from SAM, consisting of 24 transformer blocks with a 1024-dimensional embedding, and a two-branch Cellpose convolutional decoder that predicts gradient flows and cell probability independently. The ViT-L encoder accounts for approximately 305 million of the model's 312 million total parameters. Training used 22,826 microscopy images drawn from 20 public segmentation datasets, encompassing approximately 3.34 million annotated cell regions of interest. This breadth of training data spans fluorescence, phase-contrast, and brightfield modalities across a wide variety of cell and tissue types.
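The stated parameter budget is consistent with the architecture: a back-of-the-envelope count of the transformer weights alone (24 blocks, 1024-dimensional embedding, standard 4x MLP width; ignoring biases, norms, and the patch-embedding and positional terms) lands close to the roughly 305 million encoder parameters reported.

```python
# Rough parameter count for a ViT-L-style encoder:
# 24 transformer blocks with embedding dimension d = 1024.
d, blocks = 1024, 24
attn = 4 * d * d        # Q, K, V, and output projection matrices
mlp = 2 * d * (4 * d)   # two linear layers with 4x hidden width
per_block = attn + mlp
total = blocks * per_block
print(f"~{total / 1e6:.0f}M parameters")  # ~302M, near the ~305M reported
```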
Data augmentation included random resizing between 0.25x and 4x scale, random channel permutation, and four types of simulated image degradation. The model was evaluated on the Cellpose generalist benchmark, where it achieved an error rate of 0.163 compared to 0.257 for average human annotators and 0.292 for Cellpose3. Because the SAM weights are released under a non-commercial license, Cellpose-SAM inherits a CC-BY-NC restriction; researchers requiring commercial use should use the Cellpose3 model or fine-tune from a permissively licensed backbone.
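The resize and channel-permutation augmentations can be sketched as follows; this is an illustrative stand-in (nearest-neighbor resampling, a hypothetical `augment` helper), not Cellpose's actual augmentation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Sketch of two Cellpose-SAM-style augmentations on a (C, H, W)
    image: random rescaling in [0.25, 4] (nearest-neighbor here for
    simplicity) followed by a random permutation of the channels."""
    C, H, W = img.shape
    scale = rng.uniform(0.25, 4.0)
    newH, newW = max(1, int(H * scale)), max(1, int(W * scale))
    # nearest-neighbor resampling via index lookup
    ys = np.clip((np.arange(newH) / scale).astype(int), 0, H - 1)
    xs = np.clip((np.arange(newW) / scale).astype(int), 0, W - 1)
    img = img[:, ys][:, :, xs]
    # random channel permutation makes the model channel-order invariant
    return img[rng.permutation(C)]

out = augment(np.zeros((3, 64, 64)))
print(out.shape)
```

Channel permutation is what lets a single model handle arbitrary stain-to-channel assignments, and the wide 0.25x to 4x scale range covers the large variation in apparent cell size across magnifications.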
Cellpose-SAM is designed as a general-purpose default for light microscopy cell segmentation. Cell biologists processing large-scale imaging screens benefit from its high out-of-the-box accuracy across diverse cell lines and staining protocols, reducing the manual correction burden that limits throughput with earlier models. Developmental biologists and neuroscientists working with morphologically complex cells (neurons, organoids, or heterogeneous tissue sections) benefit from the model's improved handling of irregular shapes. Researchers with degraded or low-quality image data gain robustness that previously required preprocessing pipelines. Because Cellpose-SAM integrates directly into the existing Cellpose ecosystem, any lab already using Cellpose can adopt it with minimal workflow changes, and the fine-tuning API enables specialization for unusual cell morphologies not well represented in the training set.
Cellpose-SAM establishes a new performance ceiling for generalist biological image segmentation by being the first model in this category to cross the human inter-annotator agreement threshold on a standard benchmark. This milestone is practically significant: it implies that for typical microscopy data, automated segmentation with Cellpose-SAM will produce results indistinguishable from those of a skilled human annotator, substantially reducing the manual curation that has long been a bottleneck in high-content imaging pipelines. The model builds on the broad adoption the Cellpose family has accumulated across the cell biology community since its initial release. A key limitation is the CC-BY-NC license restriction arising from the SAM backbone, which excludes commercial applications. Additionally, the model targets 2D and 3D light microscopy; cryo-EM or other specialized modalities may require domain-specific fine-tuning to achieve comparable accuracy.
Pachitariu, M., et al. (2025) Cellpose-SAM: superhuman generalization for cellular segmentation. bioRxiv.
DOI: 10.1101/2025.04.28.651001