Mohamed bin Zayed University of Artificial Intelligence
A bilingual (Arabic-English) bio-medical large multimodal model built on Llama 3.1 for medical image understanding and clinical text conversation.
BiMediX2 is a bilingual (Arabic-English) bio-medical large multimodal model (LMM) designed to unify text-based and image-based medical interactions in a single conversational system. It addresses a persistent gap in clinical AI: most medical vision-language models are English-only and narrowly scoped, leaving Arabic-speaking clinicians and patients underserved and limiting cross-lingual medical reasoning. BiMediX2 supports multi-turn conversation grounded in diverse medical imaging modalities, including radiology, CT, and histology, alongside text-only medical question answering.
Developed by the Oryx research group at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), with collaborators including the University of Melbourne, the model was first released in December 2024 and later accepted to the EMNLP 2025 Findings track. It builds on the Llama 3.1 language backbone and couples it with a vision encoder, extending the earlier text-only BiMediX line into the multimodal setting.
Beyond the model itself, the authors contribute a large bilingual instruction dataset (BiMed-V) and BiMed-MBench, described as the first Arabic-English medical LMM evaluation benchmark verified by medical experts, providing the community with both training material and a means of standardized assessment.
BiMediX2 adopts a LLaVA-style architecture, connecting a CLIP vision encoder to the Meta-Llama-3.1-8B-Instruct language model through a multimodal projector. Training proceeds in two stages: projector pretraining to align visual and text representations, followed by LoRA finetuning on the BiMed-V dataset, a curated collection of roughly 1.6 million bilingual samples spanning medical conversations, report generation, and visual question answering. Image sources include PMC, SLAKE-VQA, RAD-VQA, and PATH-VQA.
The release includes an 8B variant alongside larger 70B and smaller 4B configurations. On BiMed-MBench, the model reports improvements of more than 9% in English and over 20% in Arabic relative to existing open-source models. It also exceeds GPT-4 by roughly 8% on USMLE and by about 9% on UPHILL factual accuracy.
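The LLaVA-style data flow and the LoRA stage described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the two-layer MLP projector with GELU, the placement of visual tokens before the text tokens, the toy dimensions, and all function names are assumptions made for illustration. The source states only that a multimodal projector connects the CLIP encoder to the language model and that finetuning uses LoRA.

```python
import math
import random

def init_linear(rng, d_in, d_out):
    """Create a toy linear layer as (weight rows, bias); values are random placeholders."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weight, bias

def linear(x, layer):
    """Apply y = W x + b to a single vector."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def project_patches(patch_features, layer1, layer2):
    """Map each vision-encoder patch vector into the language model's embedding size.

    Assumption: a two-layer MLP projector; the source does not specify its form.
    """
    return [linear([gelu(h) for h in linear(p, layer1)], layer2) for p in patch_features]

def build_input_sequence(visual_tokens, text_embeddings):
    """Assumption: projected visual tokens are placed before the text embeddings."""
    return visual_tokens + text_embeddings

def lora_trainable_params(d_in, d_out, rank):
    """LoRA learns W + B @ A with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)

# Toy sizes chosen for readability, not the real model dimensions.
VISION_DIM, LM_DIM, NUM_PATCHES, NUM_TEXT = 6, 8, 4, 3

rng = random.Random(0)
layer1 = init_linear(rng, VISION_DIM, LM_DIM)
layer2 = init_linear(rng, LM_DIM, LM_DIM)

# Stand-ins for vision-encoder patch features and text token embeddings.
patches = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text = [[rng.gauss(0, 1) for _ in range(LM_DIM)] for _ in range(NUM_TEXT)]

visual_tokens = project_patches(patches, layer1, layer2)
sequence = build_input_sequence(visual_tokens, text)
print(len(sequence), len(sequence[0]))  # 7 8

# Hypothetical example: one 4096 x 4096 weight matrix adapted at rank 16.
print(lora_trainable_params(4096, 4096, 16), 4096 * 4096)  # 131072 16777216
```

In stage one of such a setup only the projector layers would be trained, so that visual tokens land in the same space as text embeddings. In stage two the low-rank factors are trained while the original weights stay frozen, which is why the trainable parameter count per matrix is so much smaller than the full matrix.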
BiMediX2 targets clinical and research settings where bilingual, image-grounded medical dialogue is valuable: assisting radiologists and pathologists in interpreting scans and slides, drafting structured reports, answering exam-style and patient-facing medical questions, and supporting education in Arabic-speaking healthcare contexts. By covering both languages in a single model, it benefits clinicians, students, and researchers in regions where English-only tools fall short, while its open weights enable academic study of multilingual medical multimodal reasoning.
BiMediX2 is among the first efforts to bring Arabic-English parity to medical multimodal AI, extending the multilingual medical LLM landscape beyond English and providing reusable assets, the BiMed-V dataset and the expert-verified BiMed-MBench benchmark, that lower the barrier for future bilingual work. Its acceptance to EMNLP 2025 Findings and the release of multiple checkpoints have made it a reference point for evaluating cross-lingual clinical models. Its licensing and the authors' explicit caution that the system is research-only and not validated for clinical or commercial deployment remain important limitations for any downstream use.
Mullappilly, S. S., et al. (2024). BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities. Conference on Empirical Methods in Natural Language Processing.
DOI: 10.48550/arXiv.2412.07769