bio.rodeo
Imaging foundation models

FetalCLIP

Mohamed bin Zayed University of Artificial Intelligence / Corniche Hospital

A CLIP-based vision-language foundation model for fetal ultrasound, pretrained on 210,035 image-caption pairs and evaluated on plane classification, biometry, anomaly detection, and segmentation.

Released: February 2025

FetalCLIP is a vision-language foundation model purpose-built for fetal ultrasound image analysis. Fetal ultrasound is the primary modality for monitoring pregnancy, yet automated interpretation is uniquely difficult: anatomical structures vary rapidly with gestational age, image quality is operator-dependent, and labeled data is scarce because annotation requires specialized obstetric expertise. General-purpose medical imaging models trained on radiology or pathology transfer poorly to this domain, motivating a dedicated foundation model.

FetalCLIP was developed by researchers at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with clinicians at Corniche Hospital (Abu Dhabi Health Services Company, SEHA) and released as a preprint in February 2025. It adapts the Contrastive Language-Image Pretraining (CLIP) paradigm to the fetal domain and is pretrained on 210,035 fetal ultrasound images paired with text descriptions, which the authors describe as the largest paired dataset of its kind used for foundation model development to date.

By learning a joint image-text embedding space, FetalCLIP produces general-purpose representations that transfer across many clinically relevant downstream tasks, including zero-shot settings where no task-specific training data is available. This positions it as a backbone for building obstetric ultrasound tools without retraining from scratch for each application.
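As a rough illustration of how zero-shot classification works in a joint image-text embedding space, the sketch below scores one image embedding against one text-prompt embedding per class and picks the closest. It does not use the FetalCLIP code or weights: the function names, the class labels, the three-dimensional toy embeddings, and the logit scale of 100 are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, logit_scale=100.0):
    """Return (best_label, probabilities) for a single image embedding.

    prompt_embeddings maps each label to the embedding of its text prompt.
    logit_scale is an assumed temperature, not a value from the FetalCLIP release.
    """
    image = l2_normalize(image_embedding)
    labels = list(prompt_embeddings)
    logits = []
    for label in labels:
        text = l2_normalize(prompt_embeddings[label])
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(logit_scale * cosine)
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Toy embeddings standing in for real encoder outputs (hypothetical values).
toy_prompts = {
    "brain": [1.0, 0.1, 0.0],
    "abdomen": [0.0, 1.0, 0.1],
    "heart": [0.1, 0.0, 1.0],
}
toy_image = [0.9, 0.2, 0.1]
label, probs = zero_shot_classify(toy_image, toy_prompts)
```

In a real pipeline the two encoders would supply the embeddings; no classifier is trained, which is what makes the setting zero-shot.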

#Key Features

  • Domain-specific contrastive pretraining: Trained with image-caption contrastive learning on 210,035 fetal ultrasound pairs, aligning ultrasound imagery with clinical text to capture fetal-specific anatomical semantics.
  • Strong zero-shot transfer: Achieves an 87.1% F1 score on zero-shot fetal plane classification, substantially outperforming the SonoNet baseline (69.9%) without task-specific fine-tuning.
  • Multi-task versatility: A single backbone supports plane classification, gestational age estimation, congenital heart defect (CHD) detection, and anatomical segmentation.
  • Extended text encoder: The text encoder accepts up to 117 tokens (versus CLIP's standard 77) to accommodate detailed clinical descriptions and captions.
  • Label-efficient: Delivers strong performance even with limited labeled data, addressing the chronic annotation bottleneck in fetal imaging.
  • Publicly released: Code and pretrained weights are available on GitHub and Hugging Face under a non-commercial (CC-BY-NC-4.0) license.
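For readers unfamiliar with the F1 metric quoted above, here is a minimal sketch of a multi-class F1 computation. The page does not say which averaging scheme produced the reported figures, so the macro (unweighted per-class) average used here is an assumption, as are the function name and the toy plane labels.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        raise ValueError("no samples to score")
    scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six images.
truth = ["brain", "brain", "abdomen", "femur", "femur", "femur"]
preds = ["brain", "abdomen", "abdomen", "femur", "femur", "brain"]
score = macro_f1(truth, preds)
```

Here the per-class F1 values are 0.5 (brain), 2/3 (abdomen), and 0.8 (femur), so the macro average is about 0.656.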

#Technical Details

FetalCLIP uses a dual-encoder CLIP architecture, initialized from a general medical-domain CLIP checkpoint and fine-tuned on fetal data using a modified OpenCLIP training pipeline. The image encoder is a ViT-L vision transformer with 24 transformer layers operating on 224×224 inputs split into 14×14-pixel patches (a 16×16 grid, or 256 patches per image); the text encoder has 12 transformer layers, and both project into a shared 768-dimensional embedding space. Pretraining maximizes the similarity of paired image-caption embeddings while minimizing that of unpaired examples.

On downstream evaluations, FetalCLIP reaches 87.1% F1 on zero-shot plane classification, an 83.5% prediction validity rate for gestational age estimation, and 78.72% AUROC for CHD detection from four-chamber heart videos. For segmentation, it attains Dice similarity coefficients of 97.92% (brain view), 81.82% (abdomen view), and 72.91% (four-chamber view).
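Two ideas from this section can be made concrete with a short sketch: the contrastive pretraining objective and the Dice similarity coefficient used for the segmentation results. The loss below is the standard symmetric CLIP-style (InfoNCE) formulation. Whether FetalCLIP's modified pipeline uses exactly this form is not stated here, so treat it as an assumption, along with the function names, the logit scale of 100, and the toy inputs.

```python
import math


def _unit(vec):
    """Return vec scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def _diag_log_softmax(logits):
    """For each row i, return the log-softmax value of the diagonal entry."""
    out = []
    for i, row in enumerate(logits):
        peak = max(row)
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
        out.append(row[i] - log_sum)
    return out


def clip_contrastive_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric contrastive loss; image i and text i form the positive pair."""
    if not image_embs or len(image_embs) != len(text_embs):
        raise ValueError("need equally sized, non-empty batches")
    images = [_unit(v) for v in image_embs]
    texts = [_unit(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = scaled cosine similarity between image i and text j.
    logits = [[logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in texts]
              for img in images]
    transposed = [list(col) for col in zip(*logits)]
    loss_image_to_text = -sum(_diag_log_softmax(logits)) / n
    loss_text_to_image = -sum(_diag_log_softmax(transposed)) / n
    return 0.5 * (loss_image_to_text + loss_text_to_image)


def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as nested lists."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, true) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in true if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = clip_contrastive_loss(aligned, aligned)
loss_mismatched = clip_contrastive_loss(aligned, shuffled)
dice = dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Matched pairs drive the loss toward zero while mismatched pairs are heavily penalized, which is the behavior the pretraining description implies. The Dice example has one overlapping pixel out of three labeled pixels in total, giving 2/3.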

#Applications

FetalCLIP serves as a reusable backbone for obstetric ultrasound AI, enabling automated standard-plane recognition, fetal biometry and gestational age estimation, screening for congenital heart defects, and segmentation of fetal anatomy. Because it transfers in zero-shot and low-label regimes, it is well suited to clinical and research settings where annotated fetal ultrasound data is limited. It can support sonographers and obstetricians with quality control, triage, and decision support, and it gives researchers a starting point for new fetal-imaging tools without large labeled datasets.

#Impact

FetalCLIP is among the first foundation models tailored specifically to fetal ultrasound, a domain underserved by general medical imaging models. By assembling what its authors describe as the largest paired fetal image-text corpus used for foundation model development and reporting consistent gains across classification, biometry, anomaly detection, and segmentation, it establishes a strong reference point for vision-language modeling in obstetric imaging and lowers the barrier to building data-efficient prenatal screening tools. Its public release of weights and code supports reproducibility and downstream research. Two limitations remain: the CC-BY-NC-4.0 license restricts commercial deployment, and because the model is at the preprint stage, broader clinical validation is still future work.

Citation

FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

Preprint

Maani, F., et al. (2025). FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis. arXiv preprint arXiv:2502.14807.

DOI: 10.48550/arXiv.2502.14807

Citations

Total Citations: 14
Influential: 0
References: 54

GitHub

Stars: 65
Forks: 15
Open Issues: 4
Contributors: 2
Last Push: 4mo ago
Language: Python

HuggingFace

Downloads: 0
Likes: 2
Last Modified: 1y ago

Openness

bio.rodeo openness: Closed · low usability and reproducibility
13 · Closed
Usability (can I run it?): 14
Reproducibility (can I retrain it?): 9
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

classification, contrastive_learning, foundation_model, multimodal, obstetrics, segmentation, transformer, ultrasound, vision_transformer, zero_shot

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model