BiMediX2

Mohamed bin Zayed University of Artificial Intelligence

Bilingual Arabic-English medical multimodal model built on Llama 3.1 for radiology, CT, and histology image understanding and question answering.

Released: December 2024

Parameters: 8 Billion

BiMediX2 is a bilingual (Arabic-English) bio-medical large multimodal model (LMM) designed to unify text-based and image-based medical interactions in a single conversational system. It addresses a persistent gap in clinical AI: most medical vision-language models are English-only and narrowly scoped, leaving Arabic-speaking clinicians and patients underserved and limiting cross-lingual medical reasoning. BiMediX2 supports multi-turn conversation grounded in diverse medical imaging modalities, including radiology, CT, and histology, alongside text-only medical question answering.

Developed by the Oryx research group at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), with collaborators including the University of Melbourne, the model was first released in December 2024 and later accepted to the EMNLP 2025 Findings track. It builds on the Llama 3.1 language backbone and couples it with a vision encoder, extending the earlier text-only BiMediX line into the multimodal setting.

Beyond the model itself, the authors contribute a large bilingual instruction dataset (BiMed-V) and BiMed-MBench, described as the first Arabic-English medical LMM evaluation benchmark verified by medical experts, providing the community with both training material and a means of standardized assessment.

Key Features

Bilingual medical reasoning: Natively handles both Arabic and English medical conversations, narrowing the language gap in clinical AI tools for the Middle East and North Africa region.
Multimodal understanding: Processes radiology, CT, and histology images in addition to text, supporting medical visual question answering, report generation, and summarization within multi-turn dialogue.
Llama 3.1 foundation: Pairs a Meta-Llama-3.1-8B-Instruct language backbone with a CLIP-based vision encoder, trained via projector pretraining followed by LoRA finetuning.
Expert-verified benchmark: Introduces BiMed-MBench, a GPT-4o-graded evaluation set of 386 medical queries reviewed by clinicians to assess bilingual multimodal competence.
Open release: Code, multiple model checkpoints, and the BiMed-V instruction dataset are publicly available under a CC-BY-NC-SA 4.0 (research-only) license.

Technical Details

BiMediX2 adopts a LLaVA-style architecture, connecting a CLIP vision encoder to the Meta-Llama-3.1-8B-Instruct language model through a multimodal projector. Training proceeds in two stages: projector pretraining to align visual and text representations, followed by LoRA finetuning on the BiMed-V dataset, a curated collection of roughly 1.6 million bilingual samples spanning medical conversations, report generation, and visual question answering. Image sources include PMC, SLAKE-VQA, RAD-VQA, and PATH-VQA. The model family includes this 8B variant alongside a larger 70B and a smaller 4B configuration.

On BiMed-MBench, the authors report improvements of more than 9% in English and over 20% in Arabic relative to existing open-source models. The model also exceeds GPT-4 by roughly 8% on USMLE and by about 9% on UPHILL factual accuracy.
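To make the LLaVA-style design and the LoRA stage described in this section more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the tiny matrix sizes, the choice to place image tokens ahead of the text tokens, and the LoRA rank of 16 are all assumptions chosen for illustration. It shows (1) a linear projector mapping vision-encoder patch features into the language model's embedding space and (2) the standard LoRA update W + (alpha / r) * B A, together with the number of trainable parameters it adds for a single 4096 x 4096 weight matrix (4096 being the hidden size of Llama 3.1 8B).

```python
# Toy, pure-Python sketch of a LLaVA-style projector and a LoRA weight update.
# Every size, value, and function name here is a hypothetical illustration,
# not BiMediX2's actual configuration or code.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def project(patch_features, projector):
    """Stage 1 idea: map vision features into the LLM embedding space."""
    return matmul(patch_features, projector)

def build_input_sequence(patch_features, projector, text_embeddings):
    """Concatenate projected image tokens with text token embeddings.

    Assumption: image tokens are simply placed before the text tokens.
    """
    return project(patch_features, projector) + text_embeddings

def lora_effective_weight(w, lora_b, lora_a, alpha):
    """Stage 2 idea: W_eff = W + (alpha / r) * B @ A, with r = rows of A."""
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)
    factor = alpha / rank
    return [[wx + factor * dx for wx, dx in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_counts(d_in, d_out, rank):
    """Return (full matrix parameters, LoRA adapter parameters)."""
    return d_in * d_out, rank * (d_in + d_out)

if __name__ == "__main__":
    patches = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]        # 2 patches, dim 3
    projector = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # 3 -> 2 projection
    text = [[0.1, 0.2]]                                 # 1 text token, dim 2
    print(build_input_sequence(patches, projector, text))

    w = [[1.0, 0.0], [0.0, 1.0]]       # frozen base weight (2 x 2)
    lora_b = [[1.0], [2.0]]            # trainable, d_out x r
    lora_a = [[0.5, -0.5]]             # trainable, r x d_in
    print(lora_effective_weight(w, lora_b, lora_a, alpha=2.0))

    # Hidden size 4096 matches Llama 3.1 8B; rank 16 is an assumed value.
    full, adapter = lora_param_counts(4096, 4096, 16)
    print(full, adapter, f"{adapter / full:.2%}")
```

The parameter count is the practical motivation for the second stage: under these assumed settings the adapter trains well under 1% of the parameters of the matrix it modifies, while the base weights stay frozen.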

Applications

BiMediX2 targets clinical and research settings where bilingual, image-grounded medical dialogue is valuable: assisting radiologists and pathologists in interpreting scans and slides, drafting structured reports, answering exam-style and patient-facing medical questions, and supporting education in Arabic-speaking healthcare contexts. By covering both languages in a single model, it benefits clinicians, students, and researchers in regions where English-only tools fall short, while its open weights enable academic study of multilingual medical multimodal reasoning.

Impact

BiMediX2 is among the first efforts to bring Arabic-English parity to medical multimodal AI, extending the multilingual medical LLM landscape beyond English and providing reusable assets, the BiMed-V dataset and the expert-verified BiMed-MBench benchmark, that lower the barrier for future bilingual work. Its acceptance to EMNLP 2025 Findings and the release of multiple checkpoints have made it a reference point for evaluating cross-lingual clinical models. Its licensing and the authors' explicit caution that the system is research-only and not validated for clinical or commercial deployment remain important limitations for any downstream use.

Citation

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Preprint

Mullappilly, S. S., et al. (2024). BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities. Conference on Empirical Methods in Natural Language Processing.

DOI: 10.48550/arXiv.2412.07769

Recent citations

Papers that recently cited this model.

UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA
M. Chen, Yan Shu, Chi Liu, et al.
Jun 2026 · 0 citations

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
Chunzheng Zhu, Jiaqi Zeng, Junyue Jiang, et al.
Apr 2026 · 9 citations

Learning from Medical Entity Trees: An Entity-Centric Medical Data Engineering Framework for MLLMs
Jianghang Lin, Haihua Yang, Deli Yu, et al.
Apr 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Komal Kumar, Tajamul Ashraf, Omkar Thawakar, et al.
arXiv.org · Feb 2025 · 105 citations

Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions
Mohammad Almansoori, Komal Kumar, Hisham Cholakkal
International Conference on Medical Image Computing and Computer-Assisted Intervention · Mar 2025 · 32 citations

MedQ-Bench: Evaluating and Exploring Medical Image Quality Assessment Abilities in MLLMs
Jiyao Liu, Jinjie Wei, Wanying Qu, et al.
arXiv.org · Oct 2025 · 9 citations

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution
Chunzheng Zhu, Jiaqi Zeng, Junyue Jiang, et al.
Apr 2026 · 9 citations

Beyond N-grams: A Hierarchical Reward Learning Framework for Clinically-Aware Medical Report Generation
Yuan Wang, Shujian Gao, Jiaxiang Liu, et al.
AAAI Conference on Artificial Intelligence · Dec 2025 · 2 citations

Citations

Total Citations: 21

Influential: 3

References: 28

GitHub

Stars: 74

Forks: 8

Open Issues: 0

Contributors: 2

Last Push: 9mo ago

Language: Python

HuggingFace

Downloads: 18

Likes: 8

Last Modified: 1y ago

Pipeline: image-text-to-text

Fields of citing research

Computer Science: 100%
Medicine: 88%
Environmental Science: 12%
Engineering: 12%
Agricultural and Food Sciences: 6%
Linguistics: 6%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

11 · Closed

Usability (can I run it?): 12

Reproducibility (can I retrain it?): 11

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

GitHub Repository
Research Paper
HuggingFace Model
Dataset
