Generalist cell segmentation model combining SAM's ViT-L backbone with Cellpose flow fields. First model to surpass average human annotators on the Cellpose benchmark.
Cellpose-SAM is a generalist cell segmentation model developed at HHMI Janelia Research Campus that grafts the pretrained ViT-L image encoder from Meta's Segment Anything Model (SAM) onto the Cellpose flow-field prediction framework. Released as Cellpose v4, it is the first generalist model to consistently outperform average human annotators on the Cellpose benchmark, achieving an error rate of 0.163, well below the inter-annotator agreement level of 0.257 and a 44% improvement over its predecessor, Cellpose3.
The central innovation is the pairing of SAM's large-scale visual representations with the proven Cellpose inference mechanism. SAM's ViT-L encoder, originally trained on over one billion natural image masks, supplies rich, transferable feature representations that generalize across microscopy modalities without requiring domain-specific pretraining from scratch. The Cellpose convolutional decoder translates those representations into per-pixel gradient flows and cell probability maps, which are then converted into instance segmentation masks through Cellpose's established dynamics-based postprocessing.
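The dynamics-based postprocessing can be illustrated with a toy sketch: pixels above a cell-probability threshold are advected along the predicted flow field until they converge toward cell centers, and pixels sharing a fixed point become one instance. This is a simplified illustration only, not the actual Cellpose implementation, which uses bilinear flow interpolation and more careful clustering.

```python
import numpy as np

def follow_flows(flow_y, flow_x, cellprob, n_iter=100, prob_thresh=0.0):
    """Toy sketch of Cellpose-style dynamics postprocessing.

    Pixels with cellprob above prob_thresh are moved along the flow
    field for n_iter steps; pixels that converge to the same fixed
    point are grouped into one instance mask.
    """
    H, W = cellprob.shape
    ys, xs = np.nonzero(cellprob > prob_thresh)
    py, px = ys.astype(float), xs.astype(float)
    for _ in range(n_iter):
        # look up the flow at each pixel's current (rounded) position
        iy = np.clip(np.round(py).astype(int), 0, H - 1)
        ix = np.clip(np.round(px).astype(int), 0, W - 1)
        py += flow_y[iy, ix]
        px += flow_x[iy, ix]
    # pixels converging to the same fixed point share one label
    centers = np.stack([np.round(py), np.round(px)], axis=1).astype(int)
    _, labels = np.unique(centers, axis=0, return_inverse=True)
    masks = np.zeros((H, W), dtype=int)
    masks[ys, xs] = labels.ravel() + 1
    return masks
```

On a synthetic flow field pointing toward two centers, this recovers two instance masks, which is the essential behavior the real postprocessing implements at scale.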
Cellpose-SAM was released in May 2025 and is distributed as part of the Cellpose ecosystem, maintaining full backward compatibility with the existing GUI, fine-tuning API, image restoration pipeline, and 3D segmentation tools. It is available under a CC-BY-NC license reflecting the non-commercial terms of the SAM pretrained weights.
Cellpose-SAM combines two components: the ViT-L image encoder from SAM, consisting of 24 transformer blocks with a 1024-dimensional embedding, and a two-branch Cellpose convolutional decoder that predicts gradient flows and cell probability independently. The ViT-L encoder accounts for approximately 305 million of the model's 312 million total parameters. Training used 22,826 microscopy images drawn from 20 public segmentation datasets, encompassing approximately 3.34 million annotated cell regions of interest. This breadth of training data spans fluorescence, phase-contrast, and brightfield modalities across a wide variety of cell and tissue types.
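The stated parameter budget is consistent with the architecture: a back-of-the-envelope count of the transformer weights alone (24 blocks, 1024-dimensional embedding, standard 4x MLP width; ignoring biases, norms, and the patch-embedding and positional terms) lands close to the roughly 305 million encoder parameters reported.

```python
# Rough parameter count for a ViT-L-style encoder:
# 24 transformer blocks with embedding dimension d = 1024.
d, blocks = 1024, 24
attn = 4 * d * d        # Q, K, V, and output projection matrices
mlp = 2 * d * (4 * d)   # two linear layers with 4x hidden width
per_block = attn + mlp
total = blocks * per_block
print(f"~{total / 1e6:.0f}M parameters")  # ~302M, near the ~305M reported
```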
Data augmentation included random resizing between 0.25x and 4x scale, random channel permutation, and four types of simulated image degradation. The model was evaluated on the Cellpose generalist benchmark, where it achieved an error rate of 0.163 compared to 0.257 for average human annotators and 0.292 for Cellpose3. Because the SAM weights are released under a non-commercial license, Cellpose-SAM inherits a CC-BY-NC restriction; researchers requiring commercial use should use the Cellpose3 model or fine-tune from a permissively licensed backbone.
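The resize and channel-permutation augmentations can be sketched as follows; this is an illustrative stand-in (nearest-neighbor resampling, a hypothetical `augment` helper), not Cellpose's actual augmentation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Sketch of two Cellpose-SAM-style augmentations on a (C, H, W)
    image: random rescaling in [0.25, 4] (nearest-neighbor here for
    simplicity) followed by a random permutation of the channels."""
    C, H, W = img.shape
    scale = rng.uniform(0.25, 4.0)
    newH, newW = max(1, int(H * scale)), max(1, int(W * scale))
    # nearest-neighbor resampling via index lookup
    ys = np.clip((np.arange(newH) / scale).astype(int), 0, H - 1)
    xs = np.clip((np.arange(newW) / scale).astype(int), 0, W - 1)
    img = img[:, ys][:, :, xs]
    # random channel permutation makes the model channel-order invariant
    return img[rng.permutation(C)]

out = augment(np.zeros((3, 64, 64)))
print(out.shape)
```

Channel permutation is what lets a single model handle arbitrary stain-to-channel assignments, and the wide 0.25x to 4x scale range covers the large variation in apparent cell size across magnifications.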
Cellpose-SAM is designed as a general-purpose default for light microscopy cell segmentation. Cell biologists processing large-scale imaging screens benefit from its high out-of-the-box accuracy across diverse cell lines and staining protocols, reducing the manual correction burden that limits throughput with earlier models. Developmental biologists and neuroscientists working with morphologically complex cells (neurons, organoids, or heterogeneous tissue sections) benefit from the model's improved handling of irregular shapes. Researchers with degraded or low-quality image data gain robustness that previously required preprocessing pipelines. Because Cellpose-SAM integrates directly into the existing Cellpose ecosystem, any lab already using Cellpose can adopt it with minimal workflow changes, and the fine-tuning API enables specialization for unusual cell morphologies not well represented in the training set.
Cellpose-SAM establishes a new performance ceiling for generalist biological image segmentation by being the first model in this category to cross the human inter-annotator agreement threshold on a standard benchmark. This milestone is practically significant: it implies that for typical microscopy data, automated segmentation with Cellpose-SAM will produce results indistinguishable from those of a skilled human annotator, substantially reducing the manual curation that has long been a bottleneck in high-content imaging pipelines. The model builds on the broad adoption the Cellpose family has accumulated across the cell biology community since its initial release. A key limitation is the CC-BY-NC license restriction arising from the SAM backbone, which excludes commercial applications. Additionally, the model targets 2D and 3D light microscopy; cryo-EM or other specialized modalities may require domain-specific fine-tuning to achieve comparable accuracy.
Pachitariu, M., et al. (2025) Cellpose-SAM: superhuman generalization for cellular segmentation. bioRxiv.
DOI: 10.1101/2025.04.28.651001