bio.rodeo
ModelsOrganizationsLeaderboardAbout
bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

AboutFAQSubmit a modelContact
© 2026 Pulsatance. All rights reserved. ~
Built by Pulsatance
Language model foundation models
Language modelImagingPathology

BiMediX2

Mohamed bin Zayed University of Artificial Intelligence

A bilingual (Arabic-English) bio-medical large multimodal model built on Llama 3.1 for medical image understanding and clinical text conversation.

Released: December 2024
Parameters: 8 Billion

BiMediX2 is a bilingual (Arabic-English) bio-medical large multimodal model (LMM) designed to unify text-based and image-based medical interactions in a single conversational system. It addresses a persistent gap in clinical AI: most medical vision-language models are English-only and narrowly scoped, leaving Arabic-speaking clinicians and patients underserved and limiting cross-lingual medical reasoning. BiMediX2 supports multi-turn conversation grounded in diverse medical imaging modalities, including radiology, CT, and histology, alongside text-only medical question answering.

Developed by the Oryx research group at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), with collaborators including the University of Melbourne, the model was first released in December 2024 and later accepted to the EMNLP 2025 Findings track. It builds on the Llama 3.1 language backbone and couples it with a vision encoder, extending the earlier text-only BiMediX line into the multimodal setting.

Beyond the model itself, the authors contribute a large bilingual instruction dataset (BiMed-V) and BiMed-MBench, described as the first Arabic-English medical LMM evaluation benchmark verified by medical experts, providing the community with both training material and a means of standardized assessment.

#Key Features

  • Bilingual medical reasoning: Natively handles both Arabic and English medical conversations, narrowing the language gap in clinical AI tools for the Middle East and North Africa region.
  • Multimodal understanding: Processes radiology, CT, and histology images in addition to text, supporting medical visual question answering, report generation, and summarization within multi-turn dialogue.
  • Llama 3.1 foundation: Pairs a Meta-Llama-3.1-8B-Instruct language backbone with a CLIP-based vision encoder, trained via projector pretraining followed by LoRA finetuning.
  • Expert-verified benchmark: Introduces BiMed-MBench, a GPT-4o-graded evaluation set of 386 medical queries reviewed by clinicians to assess bilingual multimodal competence.
  • Open release: Code, multiple model checkpoints, and the BiMed-V instruction dataset are publicly available under a CC-BY-NC-SA 4.0 (research-only) license.

#Technical Details

BiMediX2 adopts a LLaVA-style architecture, connecting a CLIP vision encoder to the Meta-Llama-3.1-8B-Instruct language model through a multimodal projector. Training proceeds in two stages: projector pretraining to align visual and text representations, followed by LoRA finetuning on the BiMed-V dataset, a curated collection of roughly 1.6 million bilingual samples spanning medical conversations, report generation, and visual question answering. Image sources include PMC, SLAKE-VQA, RAD-VQA, and PATH-VQA. The collection includes an 8B variant alongside larger 70B and smaller 4B configurations. On BiMed-MBench, the model reports improvements of more than 9% in English and over 20% in Arabic relative to existing open-source models, exceeds GPT-4 by roughly 8% on USMLE, and surpasses GPT-4 by about 9% on UPHILL factual accuracy.

#Applications

BiMediX2 targets clinical and research settings where bilingual, image-grounded medical dialogue is valuable: assisting radiologists and pathologists in interpreting scans and slides, drafting structured reports, answering exam-style and patient-facing medical questions, and supporting education in Arabic-speaking healthcare contexts. By covering both languages in a single model, it benefits clinicians, students, and researchers in regions where English-only tools fall short, while its open weights enable academic study of multilingual medical multimodal reasoning.

#Impact

BiMediX2 is among the first efforts to bring Arabic-English parity to medical multimodal AI, extending the multilingual medical LLM landscape beyond English and providing reusable assets, the BiMed-V dataset and the expert-verified BiMed-MBench benchmark, that lower the barrier for future bilingual work. Its acceptance to EMNLP 2025 Findings and the release of multiple checkpoints have made it a reference point for evaluating cross-lingual clinical models. Its licensing and the authors' explicit caution that the system is research-only and not validated for clinical or commercial deployment remain important limitations for any downstream use.

Citation

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Preprint

Mullappilly, S. S., et al. (2024) BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities. Conference on Empirical Methods in Natural Language Processing.

DOI: 10.48550/arXiv.2412.07769

Recent citations

Papers that recently cited this model.

Not enough citation data yet.

Top citations

The most-cited papers that cite this model.

Not enough citation data yet.

Citations

Total Citations16
Influential3
References28

GitHub

Stars73
Forks8
Open Issues0
Contributors2
Last Push7mo ago
LanguagePython

HuggingFace

Downloads19
Likes8
Last Modified1y ago
Pipelineimage-text-to-text

Fields of citing research

Not enough data

Openness

bio.rodeo opennessClosed · low usability and reproducibility
11Closed
Usability — can I run it?12
Reproducibility — can I retrain it?11
Model Openness Framework
Unclassified
Restrictive license on core components

Tags

histologyinstruction_tuninglanguage_modelmedical_question_answeringmultimodalradiologyreport_generationtransformervision_transformervisual_question_answering

Resources

GitHub RepositoryResearch PaperHuggingFace ModelDataset