bio.rodeo
Pathology · Imaging

Chiron-o1

Shanghai AI Laboratory / Fudan University / Shanghai Jiao Tong University

Medical multimodal LLM (2B and 8B) trained for generalizable, step-by-step clinical reasoning via Mentor-Intern Collaborative Search.

Released: June 2025

Chiron-o1 is a family of medical multimodal large language models (MLLMs) built to perform deep, verifiable, step-by-step reasoning over clinical images and text, rather than producing single-shot answers to visual questions. It was introduced in June 2025 by researchers from Shanghai Artificial Intelligence Laboratory, Fudan University, and Shanghai Jiao Tong University, and the work was accepted to NeurIPS 2025.

The central problem the authors target is that most medical MLLMs answer visual questions directly, without an explicit reasoning trace, which limits both interpretability and the ability to generalize to complex clinical scenarios. High-quality medical chain-of-thought (CoT) supervision is scarce, and naive prompting of a single model tends to produce shallow or unreliable reasoning paths. Chiron-o1 addresses this with a data-generation strategy called Mentor-Intern Collaborative Search (MICS), which searches for effective reasoning paths by having strong "mentor" models propose reasoning steps and weaker "intern" models continue and stress-test them.

Released in 2B and 8B parameter sizes built on the InternVL3 backbone, Chiron-o1 reports state-of-the-art results across a range of medical visual question answering and reasoning benchmarks, positioning it among the open, reasoning-focused medical MLLMs alongside efforts such as HuatuoGPT-Vision, Med-R1, and MedVLM-R1.

#Key Features

  • Mentor-Intern Collaborative Search (MICS): A reasoning-path search scheme in which mentor models (GPT-4o, Gemini 2.5 Pro, Qwen2.5-VL-72B) propose reasoning steps and intern models continue them. Candidate paths are ranked by an MICS-Score that measures how learnable a reasoning path is.
  • MMRP reasoning dataset: A ranked multimodal medical reasoning dataset combining simple QA pairs, image-text alignment annotations, and MICS-generated multimodal chain-of-thought data for complex cases.
  • Curriculum learning: Training proceeds from simpler alignment and QA tasks toward harder multimodal CoT, progressively building generalizable reasoning ability.
  • Two open sizes: Chiron-o1-2B (InternVL3-2B, ~8GB GPU memory) and Chiron-o1-8B (InternVL3-8B, ~19GB GPU memory), with released weights and an MIT-licensed codebase.
  • Verifiable reasoning traces: Outputs include explicit step-by-step chains rather than opaque single answers, aiding interpretability in clinical contexts.
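
The curriculum idea in the list above can be illustrated with a minimal sketch. The stage names mirror the three MMRP components, but the relative order of QA and alignment, the `Sample` type, and the `curriculum_batches` helper are assumptions made for illustration; the actual training schedule and mixing ratios are not described here.

```python
from dataclasses import dataclass

# Hypothetical stage order, easiest first. The names follow the MMRP
# components listed above; the numeric ranks are illustrative only.
STAGE_ORDER = {"qa": 0, "alignment": 1, "multimodal_cot": 2}


@dataclass(frozen=True)
class Sample:
    stage: str   # one of the keys in STAGE_ORDER
    prompt: str
    target: str


def curriculum_batches(samples, batch_size):
    """Yield batches stage by stage, easiest stage first.

    Samples within a stage keep their original order (sorted() is stable).
    """
    ordered = sorted(samples, key=lambda s: STAGE_ORDER[s.stage])
    batch = []
    for sample in ordered:
        # Close the batch when it is full or when the stage changes,
        # so a single batch never mixes two stages.
        if batch and (len(batch) == batch_size or batch[-1].stage != sample.stage):
            yield batch
            batch = []
        batch.append(sample)
    if batch:
        yield batch
```

Keeping stages in separate batches is one simple design choice; a real schedule could just as well blend earlier-stage data into later stages to limit forgetting.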

#Technical Details

Chiron-o1 fine-tunes the InternVL3 vision-language architecture (2B and 8B variants) on the MMRP dataset with a curriculum learning schedule. MICS generates the chain-of-thought training data: mentor models seed reasoning steps, and intern models (Qwen2.5-VL-7B, Qwen2-VL-7B, InternVL3-8B) continue them. The MICS-Score then ranks candidate paths by how well interns can follow and complete them, favoring reasoning that is both correct and learnable. On benchmarks, the 8B model reports VQA-RAD 76.8%, SLAKE 83.2%, PathVQA 74.0%, PMC-VQA 57.5%, and MMMU Health & Medicine 54.6%. The authors report that it outperforms larger general-purpose and medical baselines, such as HuatuoGPT-Vision-34B and Gemini 2.5 Pro, on several tasks. On the held-out MMRP reasoning split it reaches 92.1% (pure text) and 58.4% (multimodal) accuracy.
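
The mentor-intern loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: mentors and interns are plain Python callables standing in for real model calls, the score is taken to be the fraction of intern rollouts that reach the reference answer, and the search is greedy. The paper's exact MICS-Score definition, prompts, and stopping rule may differ.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (stand-ins for real model calls):
#   Mentor: given the question and the steps so far, propose the next step.
#   Intern: given the question and a path prefix, return a final answer.
Mentor = Callable[[str, List[str]], str]
Intern = Callable[[str, List[str]], str]


def mics_score(question: str, path: List[str], interns: Sequence[Intern],
               reference: str, rollouts: int = 2) -> float:
    """Assumed score: share of intern rollouts that reach `reference`."""
    hits = 0
    total = 0
    for intern in interns:
        for _ in range(rollouts):
            if intern(question, path) == reference:
                hits += 1
            total += 1
    return hits / total if total else 0.0


def mics_search(question: str, reference: str, mentors: Sequence[Mentor],
                interns: Sequence[Intern], max_steps: int = 4) -> Tuple[List[str], float]:
    """Greedy search: at each depth keep the mentor step with the best score."""
    if not mentors:
        raise ValueError("at least one mentor is required")
    path: List[str] = []
    best_score = 0.0
    for _ in range(max_steps):
        # Each mentor extends the current path by one candidate step.
        candidates = [path + [mentor(question, path)] for mentor in mentors]
        scored = [(mics_score(question, c, interns, reference), c) for c in candidates]
        best_score, path = max(scored, key=lambda pair: pair[0])
        if best_score == 1.0:  # every intern rollout already succeeds
            break
    return path, best_score
```

Scoring a path by what weaker models can do with it is what ties the selection to learnability: a step that only a mentor could follow earns no credit.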

#Applications

Chiron-o1 targets medical visual question answering and reasoning across modalities including radiology, pathology, and general clinical imagery. Its explicit reasoning traces make it useful for research on interpretable clinical decision support, medical education and tutoring, and as a base for further fine-tuning. The compact 2B variant runs on modest GPUs (~8GB), lowering the barrier for academic groups and resource-constrained deployments to experiment with reasoning-capable medical MLLMs.
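
To make the memory figures concrete, a small helper can pick the largest variant that fits a given GPU. The requirements are the approximate values quoted above; the `pick_variant` function and its `headroom_gb` safety margin are hypothetical, and real usage will vary with numeric precision, image resolution, and sequence length.

```python
# Approximate GPU memory needs in GB, as quoted in this article.
APPROX_VRAM_GB = {"Chiron-o1-2B": 8, "Chiron-o1-8B": 19}


def pick_variant(available_gb, headroom_gb=1.0):
    """Return the largest variant that fits, or None if neither does.

    `headroom_gb` is an assumed safety margin, not a measured value.
    """
    fitting = [
        (need, name)
        for name, need in APPROX_VRAM_GB.items()
        if need + headroom_gb <= available_gb
    ]
    # Tuples compare by memory need first, so max() picks the larger model.
    return max(fitting)[1] if fitting else None
```

For example, `pick_variant(24)` selects the 8B model, while a 12GB card falls back to the 2B model.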

#Impact

By reframing medical MLLM training around searched, ranked chain-of-thought data rather than direct-answer supervision, Chiron-o1 demonstrates that collaborative search over reasoning paths can yield more generalizable clinical reasoning. Its NeurIPS 2025 acceptance, openly released 2B and 8B weights, and MIT-licensed code make it a practical reference point for the growing class of reasoning-oriented medical foundation models. Key limitations include reliance on proprietary mentor models (GPT-4o, Gemini) to generate training data and licensing restrictions on parts of the underlying image corpus (e.g., Radiopaedia), which constrain full reproduction of the training dataset.

Citation

Sun, H., et al. (2025). Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search. Preprint.

DOI: 10.48550/arXiv.2506.16962

Citations

Total Citations: 9
Influential: 0
References: 83

GitHub

Stars: 59
Forks: 9
Open Issues: 0
Contributors: 1
Last Push: 7mo ago
Language: Python
License: MIT

HuggingFace

Downloads: 6
Likes: 3
Last Modified: 11mo ago
Pipeline: image-text-to-text

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 67 (Partial)
Usability (can I run it?): 87
Reproducibility (can I retrain it?): 48
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

chain_of_thought, clinical_reasoning, histology, language_model, medical_visual_question_answering, multimodal, radiology, transformer, vision_transformer

Resources

GitHub Repository · Research Paper · HuggingFace Model