All Competitors
Every biological foundation model, evaluated and ranked by the bio.rodeo team
Showing 1–24 of 35 filtered models
MIMO
1223—
A medical vision-language model that accepts visual-referring multimodal input and produces pixel-grounded multimodal output, jointly answering and segmenting medical images.
Imaging, Language model
Openness: 11

SigPhi-Med
583
Chongqing University of Technology
July 1, 2025
histology, instruction_tuning, medical_image_understanding, +5
A 4.2B-parameter lightweight biomedical vision-language assistant built on Phi-2 that outperforms larger LLaVA-Med models on medical visual question answering.
Imaging, Language model
Openness: 15

EyeCLIP
8147—
The Hong Kong Polytechnic University +5 others
June 21, 2025
clip, contrastive_learning, cross_modal_retrieval, +11
A CLIP-based visual-language foundation model for multi-modal ophthalmic imaging, enabling zero-shot disease detection across 11 modalities including fundus, OCT, and slit-lamp.
Imaging, Language model
Openness: 15

UniBiomed
619215
Hong Kong University of Science and Technology +2 others
April 30, 2025
foundation_model, histology, multimodal, +6
Universal foundation model that jointly generates diagnostic text and segments the corresponding targets across ten biomedical imaging modalities.
Imaging, Language model
Openness: 64

GMAI-VL-R1
1828—
A reinforcement-learning-enhanced general medical vision-language model that adds step-by-step reasoning for medical image diagnosis and visual question answering.
Imaging, Language model
Openness: 17

MedVLM-R1
29171419
A 2B-parameter medical vision-language model that uses reinforcement learning (GRPO) to produce explicit, human-interpretable reasoning for radiology visual question answering.
Imaging, Language model
Openness: 83

HealthGPT
1.6K10022
Zhejiang University +4 others
February 14, 2025
histology, image_reconstruction, medical_image_generation, +7
Medical large vision-language model unifying image comprehension and generation in one autoregressive framework via heterogeneous LoRA knowledge adaptation.
Pathology, Imaging
Openness: 68

EndoChat
504219
Chinese University of Hong Kong +5 others
January 20, 2025
endoscopy, grounded_dialogue, instruction_tuning, +6
Grounded multimodal large language model for endoscopic surgery, supporting visual dialogue, region-based question answering, and bounding-box grounding across surgical scene understanding tasks.
Imaging, Language model
Openness: 22

MUSK
229246—
A vision-language foundation model for precision oncology that pretrains on 50M pathology images and 1B text tokens via unified masked modeling.
Pathology, Language model
Openness: 12

BiMediX2
731620
Mohamed bin Zayed University of Artificial Intelligence
December 10, 2024
histology, instruction_tuning, language_model, +7
A bilingual (Arabic-English) bio-medical large multimodal model built on Llama 3.1 for medical image understanding and clinical text conversation.
Language model, Imaging, Pathology
Openness: 11

MedRegA
452613
Hong Kong University of Science and Technology +1 other
October 24, 2024
histology, image_classification, instruction_tuning, +8
Region-aware bilingual (Chinese-English) medical multimodal LLM that handles image- and region-level vision-language tasks across eight imaging modalities.
Pathology, Language model
Openness: 65

BiomedGPT
709373—
Open-source, lightweight generalist vision-language foundation model for diverse biomedical imaging and text tasks.
Language model, Imaging, Pathology
Openness: 33

LLaVA-Tri
409843
A medical multimodal large language model pretrained on the 25M-image MedTrinity-25M dataset, achieving state-of-the-art accuracy on biomedical visual question answering.
Language model, Pathology
Openness: 30

PathChat
—420—
A multimodal vision-language copilot for human pathology that analyzes histology images and answers diverse pathology queries in natural language.
Pathology, Language model
Openness: 35

HuatuoGPT-Vision
3991871.9K
Shenzhen Research Institute of Big Data +1 other
June 27, 2024
histology, instruction_tuning, medical_image_understanding, +5
A family of open medical multimodal LLMs (7B and 34B) trained on PubMedVision, a 1.3M-sample medical VQA dataset distilled from PubMed image-text pairs.
Pathology, Language model
Openness: 52

MAIRA-2
—1393.3K
Microsoft Research multimodal LLM for grounded chest X-ray report generation, localizing each described finding with bounding boxes on the image.
Imaging, Language model
Openness: 35

EyeFound
—40—
The Hong Kong Polytechnic University +3 others
May 18, 2024
disease_diagnosis, foundation_model, masked_autoencoder, +7
A multimodal generalist foundation model for ophthalmic imaging, self-supervised on 2.78M images across 11 modalities for diagnosis, prognosis, and visual question answering.
Imaging
Openness: 4

MedDr
982655
Hong Kong University of Science and Technology
April 23, 2024
foundation_model, histology, instruction_tuning, +7
A 40B-parameter generalist medical vision-language foundation model spanning radiology, pathology, dermatology, retinography, and endoscopy.
Imaging, Language model
Openness: 69

Med-MoE
15882—
A lightweight mixture-of-experts medical vision-language model that routes between domain-specific experts for VQA and image classification while activating only 30-50% of parameters.
Imaging, Language model, Pathology
Openness: 81

M3D
442159873
A multimodal large language model for 3D medical imaging, handling retrieval, report generation, VQA, positioning, and segmentation on CT volumes.
Imaging, Language model
Openness: 77

CheXagent
226711.3K
An instruction-tuned vision-language foundation model from Stanford for interpreting and summarizing chest X-rays across eight clinical task types.
Imaging, Language model
Openness: 32

Qilin-Med-VL
65816
The first Chinese medical large vision-language model, pairing a pretrained ViT with an LLM to interpret medical images and answer clinical questions in Chinese.
Language model, Pathology
Openness: 44