bio.rodeo

Imaging foundation models
Imaging, Language model

Brainfound

Tsinghua University / Chinese PLA General Hospital / Beijing Tiantan Hospital

A multimodal vision-text foundation model for brain CT and MRI, pretrained on ~10M paired images and reports to act as a clinical copilot across seven imaging tasks.

Released: January 2025

Brainfound is a multimodal vision-text foundation model for brain imaging that functions as an interactive clinical copilot across both CT and MRI. Rather than training a separate network for each clinical task, Brainfound learns shared representations spanning brain CT, brain MRI, and the radiology reports that accompany them, then applies that single foundation to a broad spectrum of work—from low-level image enhancement up to high-level report generation and free-form human-AI dialogue. It targets the gap between narrow, single-task medical imaging models and the generalist assistants clinicians increasingly want at the point of care.

The model was introduced in a January 2025 medRxiv preprint, "A Multimodal Vision-text AI Copilot for Brain Disease Diagnosis and Medical Imaging," by Guoxun Zhang, Yuchen Guo, Xin Lou, Qionghai Dai, and colleagues. The work is a collaboration led by Tsinghua University together with the Chinese PLA General Hospital and Beijing Tiantan Hospital (Capital Medical University), pairing a strong computational-imaging group with two major neuroimaging clinical centers. A peer-reviewed version was subsequently published in Cell Reports Medicine.

Brainfound's central design choice is to combine generative image modeling with image-text alignment: a diffusion-based visual module learns to model and enhance brain images, while contrastive learning aligns that visual representation with paired clinical text. This pairing lets one backbone both generate and reason over images and language, which is what enables zero-shot classification and conversational use without task-specific retraining.
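The alignment idea can be illustrated with a small sketch of a symmetric image-text contrastive loss. This is a CLIP-style formulation chosen for illustration; the source does not state which contrastive objective Brainfound uses, and the function names, the temperature value, and the toy two-dimensional embeddings are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_entropy_row(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair k of images matches pair k of texts.

    The temperature of 0.07 is an illustrative assumption, not a value
    taken from the Brainfound paper.
    """
    n = len(image_embs)
    logits = [[cosine(i, t) / temperature for t in text_embs] for i in image_embs]
    # Image-to-text direction: each row should peak on its own column.
    img_to_txt = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on its own row.
    txt_to_img = sum(
        cross_entropy_row([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (img_to_txt + txt_to_img)

# Toy embeddings (hypothetical): two scans and their two reports.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[0.9, 0.1], [0.1, 0.9]]
swapped_reports = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = contrastive_loss(images, matched_reports)
loss_swapped = contrastive_loss(images, swapped_reports)
```

With correctly paired reports the loss is close to zero, and swapping the reports raises it sharply. That gap is the signal a contrastive objective uses to pull matching image and text embeddings together.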

Key Features

  • Vision-text copilot: Accepts flexible image and text input and produces image or text output, supporting free human-AI conversation about brain scans rather than a single fixed prediction.
  • CT and MRI in one model: A single foundation handles both brain CT and brain MRI, learning a shared representation across modalities and their paired reports.
  • Seven downstream tasks: Covers brain disease diagnosis, lesion segmentation, MRI enhancement, cross-modality translation, automatic report generation, zero-shot disease classification, and human-AI dialogue.
  • Diffusion plus contrastive alignment: Built on a diffusion-based generative framework with image-text contrastive learning aligning the visual and language modules.
  • Zero-shot capability: Image-text alignment enables zero-shot brain disease classification without additional task-specific labels.
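Zero-shot classification in an aligned embedding space can be sketched as a nearest-prompt lookup. The example below assumes embeddings have already been produced by some image and text encoders. The label names, the three-dimensional vectors, and the `zero_shot_classify` helper are hypothetical and are not Brainfound's actual interface.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, label_embs):
    """Return the label whose text embedding is closest to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical text embeddings for three candidate findings.
label_embs = {
    "hemorrhage": [0.9, 0.1, 0.0],
    "infarct": [0.1, 0.9, 0.1],
    "no acute finding": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of one brain scan.
image_emb = [0.8, 0.3, 0.1]

prediction, scores = zero_shot_classify(image_emb, label_embs)
```

Because the candidate labels are supplied as text at inference time, adding a new condition only requires a new text embedding, not new labeled training images.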

Technical Details

Brainfound was pretrained on a large multimodal corpus—over 3 million brain CT images and over 7 million brain MRI images, each with paired clinical reports (roughly 10 million image-report pairs in total). Its architecture combines a diffusion-based generative visual module with a language module, the two aligned by contrastive learning so that visual features and report text share a common embedding space. The authors report state-of-the-art results across the seven evaluated tasks: in automatic report generation for brain imaging it exceeded the prior leading model by 51.75%, and on brain-imaging multiple-choice questions it outperformed GPT-4V by 47.68%, with diagnostic performance approaching that of expert physicians on the evaluated benchmarks. Three experienced radiologists from three hospitals participated in labeling and evaluation.
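For readers unfamiliar with diffusion-based modeling, the sketch below shows the forward noising step that a diffusion model is trained to reverse. The source does not say which diffusion formulation Brainfound uses, so this uses the standard DDPM forward process with a linear noise schedule as a stand-in. The schedule endpoints, the step count, and the four-pixel "image" are illustrative assumptions.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta_t), often written alpha-bar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(pixels, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(pixels, noise)]
    return noisy, noise

rng = random.Random(0)  # fixed seed so the sketch is repeatable
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

pixels = [0.2, -0.5, 0.9, 0.0]  # toy stand-in for a normalized image
t = 500
noisy, noise = add_noise(pixels, t, alpha_bars, rng)

# If the noise were known exactly, the clean signal could be recovered.
# A trained denoiser approximates this by predicting the noise.
a = alpha_bars[t]
recovered = [(y - math.sqrt(1.0 - a) * e) / math.sqrt(a) for y, e in zip(noisy, noise)]
```

The last lines show why noise prediction is a useful training target: an accurate noise estimate gives an estimate of the clean image, which is the basis for enhancement-style tasks in diffusion models generally.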

Applications

Brainfound is aimed at radiologists and clinical-imaging teams who need a single assistant that can read brain CT and MRI, draft structured reports, answer imaging questions, enhance or translate scans between contrasts, and segment lesions. Because it accepts mixed image and text input and supports dialogue, it can slot into reporting workflows as a drafting and question-answering aid, support triage and second-read scenarios, and enable zero-shot classification of conditions not seen during fine-tuning—particularly valuable in neuroimaging settings where annotated data is scarce.

Impact

Brainfound is an early example of a generalist, conversational foundation model purpose-built for a single organ system's imaging. It demonstrates that combining diffusion-based generation with image-text contrastive learning can unify enhancement, segmentation, diagnosis, and report generation in one brain-imaging model. Its reported margins over strong baselines, including GPT-4V on brain-imaging multiple-choice questions, and its progression from medRxiv preprint to publication in Cell Reports Medicine, signal growing interest in clinical copilots for radiology. As a clinically oriented model, its real-world value still depends on prospective, multi-site validation. The training corpus and weights are not openly released, which limits independent reproduction.

Citation

Zhang, G., et al. (2025). A Multimodal Vision-text AI Copilot for Brain Disease Diagnosis and Medical Imaging. medRxiv (preprint).

DOI: 10.1101/2025.01.09.25320293

Citations

Total citations: 3
Influential: 0
References: 0


Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 7 (Closed)
Usability (can I run it?): 9
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified (missing required components)

Tags

contrastive_learning, cross_modality_translation, diffusion, disease_diagnosis, foundation_model, image_enhancement, multimodal, neuroimaging, report_generation, segmentation, transformer, zero_shot

Resources

Research Paper