Mahmood Lab / Brigham and Women's Hospital / Harvard Medical School / Massachusetts General Hospital / The Ohio State University
A multimodal vision-language copilot for human pathology that analyzes histology images and answers diverse pathology queries in natural language.
PathChat is a multimodal generative AI copilot for human pathology that lets pathologists hold an interactive, natural-language conversation about a histology image. Given a region of interest from a whole-slide image, it can describe morphology, reason about likely diagnoses, answer open-ended questions, and incorporate clinical context supplied in the prompt. It was developed by the Mahmood Lab at Brigham and Women's Hospital and Harvard Medical School, with collaborators at Massachusetts General Hospital and The Ohio State University, and published in Nature in 2024 (preprinted as "A Foundational Multimodal Vision Language AI Assistant for Human Pathology" in December 2023).
The model addresses a gap left by earlier computational pathology tools, which were typically narrow classifiers trained for a single tissue type or task. By coupling a pathology-specialized vision encoder to a large language model, PathChat instead acts as a general-purpose, instruction-following assistant that generalizes across tissue origins and disease models without task-specific retraining. This positions it alongside vision-language foundation models in pathology while distinguishing it through its conversational, copilot-style interface aimed at real diagnostic workflows.
PathChat fits within the Mahmood Lab's broader ecosystem of pathology foundation models, building directly on the lab's CONCH vision-language encoder and complementing vision-only encoders such as UNI. It was among the first systems to demonstrate that a pathology-grounded multimodal LLM could match or exceed specialized models on diagnostic question answering.
PathChat connects a pathology foundation vision encoder (CONCH-Large) to a 13-billion-parameter pretrained large language model through a multimodal projector module, following a LLaVA-style vision-language architecture. The vision encoder was pretrained on approximately 100 million histology images from more than 100,000 patient cases plus 1.18 million pathology image-caption pairs. The full system was instruction-tuned on a curated dataset of over 456,000 diverse visual-language instructions comprising roughly 999,000 question-and-answer turns, assembled to be disease-agnostic. On multiple-choice diagnostic questions drawn from publicly available cases, PathChat reached about 87% accuracy when clinical context was supplied, achieving state-of-the-art performance relative to contemporary multimodal models, and in blinded expert evaluation produced responses that pathologists preferred over baseline assistants.
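The encoder-projector-LLM wiring described above can be illustrated with a toy sketch. This is not PathChat's code: the single linear projector, the placement of image tokens before the text tokens, all dimensions, and the helper names (`make_linear`, `project`, `build_llm_input`) are assumptions chosen for illustration, and random vectors stand in for real encoder outputs and text embeddings.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a randomly initialised weight matrix (out_dim x in_dim) and a zero bias."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(tokens, weight, bias):
    """Apply y = W x + b to every token vector in `tokens`."""
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]


def build_llm_input(image_tokens, text_embeddings, weight, bias):
    """Map image tokens into the LLM embedding space and prepend them to the text."""
    return project(image_tokens, weight, bias) + text_embeddings


# Toy sizes for illustration only (hypothetical, not the real model's dimensions).
VISION_DIM, LLM_DIM = 8, 16
NUM_IMAGE_TOKENS, NUM_TEXT_TOKENS = 4, 3

rng = random.Random(0)
weight, bias = make_linear(VISION_DIM, LLM_DIM, rng)

# Stand-ins for the vision encoder's patch tokens and the LLM's text embeddings.
image_tokens = [[rng.random() for _ in range(VISION_DIM)] for _ in range(NUM_IMAGE_TOKENS)]
text_embeddings = [[rng.random() for _ in range(LLM_DIM)] for _ in range(NUM_TEXT_TOKENS)]

sequence = build_llm_input(image_tokens, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))  # 7 16
```

The point of the design is that the language model never sees pixels. It receives one sequence of embedding vectors, some derived from the image and some from the prompt, all with the same width.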
PathChat is aimed at diagnostic and educational pathology workflows. Practising pathologists can use it as a second-opinion copilot to surface differential diagnoses, summarize morphological findings, and draft narrative descriptions of regions of interest, potentially accelerating sign-out and reducing routine documentation burden. Trainees and educators benefit from an interactive tutor that can explain what is visible in a slide and why a given diagnosis is favored. Because it accepts arbitrary natural-language queries, it can also support research tasks such as exploratory annotation and hypothesis generation across heterogeneous tissue types.
PathChat helped establish the multimodal "copilot" as a paradigm for computational pathology, shifting the field beyond single-task classifiers toward conversational, instruction-following assistants that integrate vision and language. Published in Nature and backed by the Mahmood Lab's track record with CONCH and UNI, it drew substantial attention and was followed by an improved successor, PathChat 2. The model weights are not released: the authors state they cannot be made available because they were trained on proprietary internal patient data subject to privacy and intellectual-property obligations. The system has also been exclusively licensed to the commercial spin-out Modella AI, leaving the trained model effectively unavailable to the broader community. The training code is released but restricted to academic research use only (not an open-source license). The system is positioned as decision support rather than an autonomous diagnostic device, an important limitation given that clinical deployment requires regulatory validation and expert oversight.
Lu, M. Y., et al. (2024) A multimodal generative AI copilot for human pathology. Nature.
DOI: 10.1038/s41586-024-07618-3