Universal cell segmentation model adapting Meta's SAM for biology. Segments mammalian cells, yeast, and bacteria across diverse imaging modalities with human-level accuracy.
CellSAM is a foundation model for universal cell segmentation developed by the Van Valen Lab at Caltech. It adapts Meta's Segment Anything Model (SAM) for the cellular imaging domain by pairing it with a purpose-built object detector, CellFinder, that automatically generates bounding-box prompts for SAM's mask decoder. The result is a single model capable of segmenting cells across mammalian tissue culture, yeast, and bacteria imaged with fluorescence microscopy, brightfield, phase contrast, and H&E histology — without domain-specific retraining.
The central challenge CellSAM addresses is the fragmentation of the cell segmentation landscape: prior tools were typically tuned for specific cell types or imaging modalities, requiring researchers to train or select a new model for each experimental context. CellSAM was designed to replace this patchwork with a single general-purpose system that generalizes across the diversity of cell biology. First posted as a preprint in November 2023 and subsequently published in Nature Methods in 2025, the work demonstrated that this universal approach could match human expert performance across benchmark datasets spanning five imaging archetypes.
The model's design is notable for how it adapts natural-image foundation model representations to microscopy without full retraining. By freezing SAM's Vision Transformer (ViT) encoder and fine-tuning only the connecting neck layers, CellSAM preserves SAM's broad generalization while acquiring sensitivity to cellular morphology. CellFinder's automated detection step removes the requirement for user-supplied prompts, making the full pipeline end-to-end and suitable for high-throughput analysis.
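A minimal PyTorch sketch of this selective fine-tuning pattern is shown below. The module names and layer shapes are illustrative placeholders rather than the actual SAM or CellSAM code; the point is only how the encoder and decoder stay frozen while the neck's parameters are the only ones exposed to the optimizer.

```python
import torch
import torch.nn as nn

class ToySAMStyleModel(nn.Module):
    """Toy stand-in with the three-part layout the text describes:
    a large image encoder, a small connecting neck, and a mask decoder."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Placeholder "ViT" encoder: patchify with a strided convolution.
        self.image_encoder = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        # The neck: the only block whose weights will be updated.
        self.neck = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, kernel_size=1),
            nn.GELU(),
        )
        # Placeholder mask decoder.
        self.mask_decoder = nn.Conv2d(embed_dim, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.image_encoder(x)
        feats = self.neck(feats)
        return self.mask_decoder(feats)

model = ToySAMStyleModel()

# Freeze everything, then re-enable gradients for the neck only.
for param in model.parameters():
    param.requires_grad = False
for param in model.neck.parameters():
    param.requires_grad = True

# The optimizer only ever sees the neck's parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```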
CellSAM has a two-component architecture. CellFinder, the detection stage, is built on the Anchor DETR framework and shares SAM's ViT backbone for feature extraction. It is trained first to predict bounding boxes around individual cells; once trained, its ViT weights are frozen. SAM then uses these bounding boxes as spatial prompts, with fine-tuning applied only to the neck, the layers that connect SAM's frozen ViT encoder to its mask decoder. This targeted fine-tuning adapts SAM's representations to cellular imaging without sacrificing the cross-modality robustness learned on natural images.
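The detect-then-prompt inference flow can be sketched as below, using the open-source segment-anything package as a stand-in for the fine-tuned SAM and a hypothetical detect_cells function as a stand-in for CellFinder; neither is the actual CellSAM implementation.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def detect_cells(image: np.ndarray) -> np.ndarray:
    """Stand-in for CellFinder: return an (N, 4) array of per-cell bounding
    boxes in XYXY pixel coordinates. The real detector is an Anchor DETR
    head that shares SAM's ViT backbone."""
    raise NotImplementedError("placeholder for the CellFinder detection stage")

def segment_image(image: np.ndarray, checkpoint: str) -> np.ndarray:
    """Detect-then-prompt inference: every detected box becomes a spatial
    prompt for SAM's mask decoder, yielding one instance mask per cell."""
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)                 # expects an HxWx3 uint8 image

    labels = np.zeros(image.shape[:2], dtype=np.int32)
    for idx, box in enumerate(detect_cells(image), start=1):
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        labels[masks[0]] = idx                 # write this cell's instance label
    return labels
```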
The training data, assembled by the Van Valen Lab, spans five broad imaging archetypes: histological tissue sections including H&E preparations, mammalian cell culture across fluorescence and transmitted-light modalities, budding and fission yeast, and bacterial cells from diverse species. Benchmark evaluation demonstrated human-level segmentation across these archetypes, with zero-shot performance on held-out conditions and few-shot adaptation for domains outside the training distribution. Specific instance segmentation metrics (e.g., mean average precision across IoU thresholds) are reported in the Nature Methods publication.
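As a point of reference for how such metrics are typically computed, the sketch below matches predicted and ground-truth masks at a single IoU threshold and scores them as TP / (TP + FP + FN). This is one common convention in the cell segmentation literature, not necessarily the exact evaluation code used in the paper.

```python
import numpy as np

def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union of two boolean instance masks."""
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, truth).sum() / union

def precision_at_threshold(pred_masks, true_masks, threshold=0.5) -> float:
    """Greedily match each prediction to its best unmatched ground-truth
    mask; count a true positive when the IoU clears the threshold.
    Averaging this score over a range of thresholds (e.g. 0.5 to 0.95)
    gives a mean average precision in the convention common to cell
    segmentation papers."""
    matched = set()
    tp = 0
    for pred in pred_masks:
        ious = [0.0 if i in matched else mask_iou(pred, t)
                for i, t in enumerate(true_masks)]
        if ious and max(ious) >= threshold:
            matched.add(int(np.argmax(ious)))
            tp += 1
    fp = len(pred_masks) - tp
    fn = len(true_masks) - tp
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```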
CellSAM is suited for any workflow requiring robust, scalable cell segmentation without per-dataset model development. High-content screening operations benefit from automated segmentation across large image collections and diverse cell lines. Spatial transcriptomics pipelines use CellSAM for cell boundary delineation prior to gene expression assignment. Live-cell and time-lapse experiments can apply the model for consistent segmentation across frames without manual intervention. Microbiologists gain a tool for segmenting bacteria in phase contrast and fluorescence images where dedicated models are scarce. Pathologists and computational pathology researchers can apply CellSAM to instance segmentation of cells in histological sections. The web server at cellsam.deepcell.org provides access without local installation, lowering the barrier for wet-lab researchers unfamiliar with deep learning infrastructure.
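For programmatic use, a minimal sketch is shown below, assuming the cellSAM Python package from the project's GitHub repository exposes a segment_cellular_image entry point as shown in its README; the exact function name, arguments, and return values should be checked against the current release, and the input file path is hypothetical.

```python
# A sketch under the assumptions stated above; verify the API against the
# vanvalenlab/cellSAM README before relying on it.
import numpy as np
from skimage import io
from cellSAM import segment_cellular_image   # assumed entry point

image = io.imread("example_fov.tif")          # hypothetical input image

# The README-style call returns (at least) an integer-labeled instance mask;
# the exact return signature may differ between versions.
mask, _, _ = segment_cellular_image(image, device="cuda")

# Count segmented cells: distinct non-zero labels in the instance mask.
print("cells detected:", len(np.unique(mask)) - 1)
```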
CellSAM's publication in Nature Methods marks a meaningful step toward consolidating the fragmented cell segmentation tool ecosystem. By demonstrating that a single foundation model can match expert performance across the major archetypes of cellular imaging, the work challenges the established practice of developing and maintaining separate models for each imaging context. The few-shot adaptation capability has practical implications for rare cell types or novel imaging setups where large labeled datasets are unavailable. A key limitation is that CellSAM is designed for instance segmentation of individual cells and is not suited for semantic tissue segmentation tasks lacking discrete cell boundaries; performance may also degrade on imaging conditions far outside the training distribution or in images with dense, heavily overlapping cells where CellFinder detection is challenged. The open-source release on GitHub and the accessible web server have contributed to adoption across both computational and experimental biology communities.
Marks, M., et al. (2025). CellSAM: a foundation model for cell segmentation. Nature Methods. DOI: 10.1038/s41592-025-02879-w