United Imaging Intelligence / ShanghaiTech University
A general CT image segmentation foundation model that uses task-prompted automatic pathway decoding to segment 83 anatomical structures and lesions across whole-body CT.
Computed tomography (CT) is one of the most widely used diagnostic imaging modalities, and automated segmentation of organs, vessels, bones, and lesions is a foundational step for quantitative analysis, treatment planning, and disease monitoring. The conventional approach trains a separate deep-learning model for each anatomical target or lesion type, which ignores the rich anatomical and contextual relationships shared across tasks and makes broad clinical deployment cumbersome. gCIS (general CT Image Segmentation) addresses this fragmentation by jointly learning a wide range of segmentation tasks within a single model.
Developed by researchers at United Imaging Intelligence and ShanghaiTech University (Dinggang Shen's lab) and published in Communications Engineering in October 2024, gCIS demonstrates that a general model trained across many tasks can match or exceed the accuracy of dedicated task-specific networks. The team led by Xi Ouyang and Dongdong Gu assembled one of the largest CT segmentation collections to date and showed that shared representations transfer across organs, tubular structures, and tumors.
gCIS fits into the emerging class of "universal" medical image segmentation models alongside efforts such as MedSAM and the MSD-trained nnU-Net family, but distinguishes itself with an explicit task-prompt mechanism and learnable feature routing that let a single network specialize on demand without separate weights per task.
gCIS couples a 3D Swin Transformer image encoder (four stages, 2×2×2 patches, ~8M parameters) with a pre-trained language-model text encoder (~63M parameters, 12 layers) that embeds task prompts. Decoding is handled by seven Automatic Pathway modules, each containing learnable routing layers over M=6 sub-pathways, enabling task-conditioned feature selection.
The model was trained on 36,419 CT scans carrying 64,674 annotated masks across 83 segmentation tasks (32,170 scans for training, 4,249 for testing), totaling more than 11 million slices of whole-body anatomy and pathology.
Across all 83 tasks gCIS reaches an average Dice coefficient of 82.84%. Representative results include 92.59% on stomach and 80.75% on lung tumor, both ahead of a strongly tuned nnU-Net baseline, with consistently lower standard deviations, indicating greater robustness.
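The task-conditioned routing described above can be illustrated with a small NumPy sketch of a single module. This is not the authors' implementation. It assumes a routing layer that projects the task-prompt embedding to M=6 logits, a softmax over those logits, and a weighted blend of the sub-pathway outputs; the published model may select or combine sub-pathways differently. The class name `RoutedModule`, the embedding size, the random prompt vectors, and the reduction of each sub-pathway to a per-channel linear map are all hypothetical simplifications.

```python
import numpy as np

M = 6  # sub-pathways per Automatic Pathway module, as stated in the text


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


class RoutedModule:
    """Hypothetical stand-in for one Automatic Pathway module."""

    def __init__(self, prompt_dim, channels, rng):
        # Routing layer: maps a task-prompt embedding to M routing logits.
        self.route_w = rng.normal(scale=0.1, size=(prompt_dim, M))
        # Each sub-pathway is reduced to a channel-mixing matrix here;
        # a real decoder would use 3D convolution blocks instead.
        self.path_w = rng.normal(scale=0.1, size=(M, channels, channels))

    def routing_weights(self, prompt_emb):
        # Assumed soft routing: one non-negative weight per sub-pathway.
        return softmax(prompt_emb @ self.route_w)

    def __call__(self, feat, prompt_emb):
        # feat: (channels, D, H, W) feature map from the previous stage.
        w = self.routing_weights(prompt_emb)
        # Run every sub-pathway: result has shape (M, channels, D, H, W).
        outs = np.einsum("moc,cdhw->modhw", self.path_w, feat)
        # Blend the sub-pathway outputs with the task-conditioned weights.
        return np.tensordot(w, outs, axes=1)


rng = np.random.default_rng(0)
module = RoutedModule(prompt_dim=16, channels=4, rng=rng)
feat = rng.normal(size=(4, 2, 8, 8))

# Placeholders for text-encoder embeddings of two different task prompts.
liver_prompt = rng.normal(size=16)
tumor_prompt = rng.normal(size=16)

out_liver = module(feat, liver_prompt)
out_tumor = module(feat, tumor_prompt)
```

The same weights and the same input features yield different outputs for the two prompts, which is the sense in which one network can specialize on demand without separate weights per task.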
gCIS is designed for radiology and clinical research workflows that require segmenting many different structures from CT volumes, such as automated organ-at-risk delineation for radiotherapy planning, tumor burden quantification, vascular analysis, and large-scale anatomical labeling for downstream biomarker studies. Because a single deployed model handles dozens of targets through task prompts, it simplifies integration into PACS and treatment-planning pipelines compared with maintaining many separate networks, benefiting radiologists, radiation oncologists, and imaging researchers.
By showing that joint multi-task training improves rather than dilutes per-task accuracy, gCIS strengthens the case for general-purpose CT segmentation models over fragmented task-specific pipelines. Its combination of language-driven task prompting and learnable routing offers a scalable template for adding new structures without retraining bespoke models, and its release of code and weights supports reproduction and extension. Practical adoption is constrained by data-sharing limits — only partial training data could be released due to hospital privacy regulations — and by a non-commercial license, but the model remains a notable demonstration of unified whole-body CT segmentation at scale.
Ouyang, X., et al. (2024). Towards a general computed tomography image segmentation model for anatomical structures and lesions. Communications Engineering.
DOI: 10.1038/s44172-024-00287-0