PathChat

Mahmood Lab / Brigham and Women's Hospital / Harvard Medical School / Massachusetts General Hospital / The Ohio State University

Multimodal vision-language copilot for pathology that answers open-ended questions about histology images and reasons about differential diagnoses.

Released: July 2024

Parameters: 13 billion

PathChat is a multimodal generative AI copilot for human pathology that lets pathologists hold an interactive, natural-language conversation about a histology image. Given a region of interest from a whole-slide image, it can describe morphology, reason about likely diagnoses, answer open-ended questions, and incorporate clinical context supplied in the prompt. It was developed by the Mahmood Lab at Brigham and Women's Hospital and Harvard Medical School, with collaborators at Massachusetts General Hospital and The Ohio State University, and published in Nature in 2024 (preprinted as "A Foundational Multimodal Vision Language AI Assistant for Human Pathology" in December 2023).

The model addresses a gap left by earlier computational pathology tools, which were typically narrow classifiers trained for a single tissue type or task. By coupling a pathology-specialized vision encoder to a large language model, PathChat instead acts as a general-purpose, instruction-following assistant that generalizes across tissue origins and disease models without task-specific retraining. This positions it alongside vision-language foundation models in pathology while distinguishing it through its conversational, copilot-style interface aimed at real diagnostic workflows.

PathChat fits within the Mahmood Lab's broader ecosystem of pathology foundation models, building directly on the lab's CONCH vision-language encoder and complementing vision-only encoders such as UNI. It was among the first systems to demonstrate that a pathology-grounded multimodal LLM could match or exceed specialized models on diagnostic question answering.

Key Features

Conversational pathology copilot: Pathologists can ask free-form questions about an image and receive grounded, multi-turn answers, including differential diagnoses and morphological descriptions.
Pathology-specialized vision encoder: It uses the CONCH family encoder, pretrained on roughly 100 million histology image tiles and over 1.18 million image-caption pairs, giving it domain-aware visual representations rather than generic natural-image features.
Clinical-context awareness: When relevant clinical information is provided in the prompt, response accuracy improves substantially, mirroring how human pathologists integrate patient history.
Broad task generality: A single fixed checkpoint handles diagnosis, description, and open-ended querying across diverse tissue types and disease models without per-task fine-tuning.
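
The clinical-context feature listed above amounts to placing patient information in the prompt text alongside the image. A minimal sketch of how such a prompt might be assembled for a multiple-choice diagnostic question is given here. The `<image>` placeholder, the `DiagnosticQuestion` class, and the `build_prompt` and `accuracy` helpers are hypothetical names chosen for illustration; PathChat's actual prompt template and evaluation code are not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical marker for where image features would be spliced in.
IMAGE_PLACEHOLDER = "<image>"


@dataclass
class DiagnosticQuestion:
    question: str
    choices: List[str]
    # Optional free-text context such as age, sex, site, or history.
    clinical_context: Optional[str] = None


def build_prompt(q: DiagnosticQuestion) -> str:
    """Assemble a single-turn multiple-choice prompt, with context if given."""
    parts = [IMAGE_PLACEHOLDER]
    if q.clinical_context:
        parts.append(f"Clinical context: {q.clinical_context.strip()}")
    parts.append(f"Question: {q.question.strip()}")
    for index, choice in enumerate(q.choices):
        letter = chr(ord("A") + index)  # 0 -> "A", 1 -> "B", ...
        parts.append(f"{letter}. {choice}")
    parts.append("Answer with the letter of the single best option.")
    return "\n".join(parts)


def accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of answer letters that match, ignoring case and whitespace."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(
        p.strip().upper() == e.strip().upper()
        for p, e in zip(predicted, expected)
    )
    return correct / len(expected)
```

Building the same question once with `clinical_context` set and once without, then comparing `accuracy` over the two sets of model answers, is one simple way to measure how much the supplied context helps.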

Technical Details

PathChat connects a pathology foundation vision encoder (CONCH-Large) to a 13-billion-parameter pretrained large language model through a multimodal projector module, following a LLaVA-style vision-language architecture. The vision encoder was pretrained on approximately 100 million histology images from more than 100,000 patient cases plus 1.18 million pathology image-caption pairs. The full system was then instruction-tuned on a curated, disease-agnostic dataset of over 456,000 visual-language instructions comprising roughly 999,000 question-and-answer turns. On multiple-choice diagnostic questions drawn from publicly available cases, PathChat reached about 87% accuracy when clinical context was supplied, which was state-of-the-art relative to contemporary multimodal models. In blinded expert evaluation, pathologists preferred its responses over those of baseline assistants.
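
The encoder, projector, and language model arrangement described above can be illustrated with a toy sketch in plain Python. It assumes the projector is a single linear layer and uses tiny, made-up dimensions (`VISION_DIM`, `LLM_DIM`) with random weights; the real projector design, the embedding widths, and all function names here are assumptions for illustration, not PathChat's implementation.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]

VISION_DIM = 4  # toy width of one image token from the vision encoder
LLM_DIM = 6     # toy width of one language-model input embedding


def make_projector(in_dim: int, out_dim: int, seed: int = 0) -> Matrix:
    """Random out_dim x in_dim matrix standing in for learned projector weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(image_tokens: List[Vector], weight: Matrix) -> List[Vector]:
    """Map each image token from the vision width to the LLM embedding width."""
    in_dim = len(weight[0])
    for token in image_tokens:
        if len(token) != in_dim:
            raise ValueError("image token width does not match projector input")
    return [[sum(w * x for w, x in zip(row, token)) for row in weight]
            for token in image_tokens]


def assemble_sequence(text_before: List[Vector],
                      image_tokens: List[Vector],
                      text_after: List[Vector],
                      weight: Matrix) -> List[Vector]:
    """Splice projected image tokens between the surrounding text embeddings."""
    out_dim = len(weight)
    for embedding in text_before + text_after:
        if len(embedding) != out_dim:
            raise ValueError("text embedding width does not match LLM width")
    return text_before + project(image_tokens, weight) + text_after
```

In this arrangement the language model receives one sequence in which image tokens and text tokens share the same embedding width, which is what allows a pretrained LLM to attend over visual content without changes to its own architecture.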

Applications

PathChat is aimed at diagnostic and educational pathology workflows. Practising pathologists can use it as a second-opinion copilot to surface differential diagnoses, summarize morphological findings, and draft narrative descriptions of regions of interest, potentially accelerating sign-out and reducing routine documentation burden. Trainees and educators benefit from an interactive tutor that can explain what is visible in a slide and why a given diagnosis is favored. Because it accepts arbitrary natural-language queries, it can also support research tasks such as exploratory annotation and hypothesis generation across heterogeneous tissue types.

Impact

PathChat helped establish the multimodal "copilot" as a paradigm for computational pathology, shifting the field beyond single-task classifiers toward conversational, instruction-following assistants that integrate vision and language. Published in Nature and backed by the Mahmood Lab's track record with CONCH and UNI, it drew substantial attention and was followed by an improved successor, PathChat 2. The model weights are not released. The authors state that the weights cannot be made available because the model was trained on proprietary internal patient data subject to privacy and intellectual-property obligations. The system has also been exclusively licensed to the commercial spin-out Modella AI, leaving the trained model effectively unavailable to the broader community. The training code is released, but under terms restricted to academic research use rather than an open-source license. The system is positioned as decision support rather than an autonomous diagnostic device, an important limitation given that clinical deployment requires regulatory validation and expert oversight.

Citation

A multimodal generative AI copilot for human pathology

Lu, M. Y., et al. (2024) A multimodal generative AI copilot for human pathology. Nature.

DOI: 10.1038/s41586-024-07618-3

Recent citations

Papers that recently cited this model.

MMLNB: Multi-Modal Learning for Neuroblastoma subtyping classification assisted with textual description generation
Huangwei Chen, Yifei Chen, Zhenyu Yan, et al.
Biomedical Signal Processing and Control · Oct 2026 · 0 citations
Bridging the Gap — Translating AI in Pathology into Clinical Impact
Fang-Yi Su, Eliana Marostica, Xiyue Wang, et al.
NEJM AI · Jul 2026 · 0 citations
Auditing Data Leakage in Whole-Slide Image Multimodal Benchmarks
Wenhao Zhang, Zhongliang Zhou, John Kang, et al.
Jul 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

Vision-language models for medical report generation and visual question answering: a review
Iryna Hartsock, Ghulam Rasool
Frontiers Artif. Intell. · Mar 2024 · 246 citations
A Vision-Language Foundation Model for Precision Oncology
Jinxi Xiang, Xiyue Wang, Xiaoming Zhang, et al.
Nature · Jan 2025 · 245 citations
A Review on Edge Large Language Models: Design, Execution, and Applications
Yue Zheng, Yuhao Chen, Bin Qian, et al.
ACM Computing Surveys · Sep 2024 · 190 citations
Generalist foundation models from a multimodal dataset for 3D computed tomography
I. Hamamci, Sezgin Er, Furkan Almas, et al.
Nature Biomedical Engineering · Mar 2024 · 177 citations
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis
Guillaume Jaume, Paul Doucet, Andrew H. Song, et al.
Neural Information Processing Systems · Jun 2024 · 141 citations

Citations

Total Citations: 476

Influential: 19

References: 0

Fields of citing research

Medicine: 88%
Computer Science: 87%
Engineering: 12%
Biology: 8%
Environmental Science: 4%
Education: 3%
Chemistry: 2%
Linguistics: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

35 · Closed

Usability (can I run it?): 36

Reproducibility (can I retrain it?): 14

Model Openness Framework

Unclassified

Missing required components

Resources

Research Paper · Official Website · Documentation


