PathAsst

Westlake University / Zhejiang University / The Ohio State University / Hangzhou City University

Multimodal pathology assistant that answers questions about histology and cytology images, pairing the PathCLIP vision encoder with a Vicuna-13B LLM.

Released: May 2023

PathAsst is a multimodal generative foundation model that acts as an AI assistant for computational pathology, conversing about pathology images and orchestrating specialized diagnostic tools in response to natural-language instructions. Introduced by researchers from Westlake University, Zhejiang University, The Ohio State University, and Hangzhou City University, it was first released as a preprint in May 2023 and published at AAAI 2024. The work frames its ambition explicitly as a step toward artificial general intelligence for pathology — a single, instruction-following system that can handle the diverse questions a pathologist might pose about a histology or cytology image.

PathAsst belongs to the wave of medical multimodal large language models (MLLMs) that adapt the LLaVA-style "vision encoder plus LLM" recipe to a specialized domain. Its core contribution is twofold: a pathology-dedicated CLIP model, PathCLIP, trained on large-scale pathology image-text pairs to give the assistant a domain-aware visual backbone, and a curated instruction-tuning corpus that teaches a general-purpose LLM to reason about pathology and to invoke external tools when a task exceeds conversational reasoning.

Rather than predicting a single label, PathAsst is designed as an interactive copilot. It can answer open-ended questions, generate captions and descriptive reports, and route requests to a toolkit of eight pathology-specific models — for example a nucleus segmentation or tissue-classification model — plus a literature retrieval system, returning the combined result in a conversational format.
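
The routing behaviour described above can be pictured as a small dispatcher. The Python sketch below is illustrative only: the `ToolCall` structure, the tool names, and the stub functions are hypothetical and are not taken from the PathAsst codebase. It shows one way a parsed tool request from the LLM could be mapped to a registered sub-model and folded back into a conversational reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolCall:
    """A parsed request from the LLM to run a tool (hypothetical format)."""
    name: str
    image_path: str


# Hypothetical stand-ins for pathology sub-models; real tools would run inference.
def segment_nuclei(image_path: str) -> str:
    return f"nucleus segmentation mask for {image_path}"


def classify_tissue(image_path: str) -> str:
    return f"tissue class prediction for {image_path}"


# Registry mapping tool names to callables (the names are assumptions).
TOOLS: Dict[str, Callable[[str], str]] = {
    "nucleus_segmentation": segment_nuclei,
    "tissue_classification": classify_tissue,
}


def respond(text_answer: str, call: Optional[ToolCall] = None) -> str:
    """Combine the LLM's text answer with a tool result, if a tool was requested."""
    if call is None:
        return text_answer  # plain conversational answer, no tool needed
    tool = TOOLS.get(call.name)
    if tool is None:
        return f"{text_answer} (requested tool '{call.name}' is not available)"
    return f"{text_answer} Tool output: {tool(call.image_path)}"
```

A registry keeps the conversational layer independent of any particular sub-model, so adding a tool in this sketch only requires a new dictionary entry.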

Key Features

PathCLIP vision encoder: A pathology-adapted CLIP built from collected pathology image-text pairs, giving the assistant a domain-specialized visual representation rather than a generic natural-image encoder.
LLM-driven tool use: A Vicuna-13B backbone is instruction-tuned to call eight pathology-specific sub-models (segmentation, classification, detection, and related tasks) when a query requires capabilities beyond text generation.
Literature retrieval: An integrated paper-retrieval system over a corpus of roughly 5.3 million biomedical articles lets the assistant ground answers in published sources.
Released datasets (non-commercial, gated): The PathCap image-caption corpus and PathInstruct instruction-following data are available on HuggingFace, but are access-gated and non-commercially licensed (CC-BY-NC), supporting research use rather than fully open reuse.
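
To make the literature-retrieval feature concrete, here is a toy Python sketch of ranking articles against a question. It is not PathAsst's retrieval method, which this page does not describe: the word-overlap scoring, the `retrieve` function, and the three-entry `CORPUS` are all assumptions chosen only to illustrate the idea of returning the best-matching articles for a query.

```python
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split on whitespace (deliberately simplistic)."""
    return set(text.lower().split())


def retrieve(query: str, corpus: Dict[str, str], top_k: int = 2) -> List[Tuple[str, int]]:
    """Rank articles by how many words they share with the query (toy scoring)."""
    query_terms = tokenize(query)
    scored = [(doc_id, len(query_terms & tokenize(abstract)))
              for doc_id, abstract in corpus.items()]
    scored = [item for item in scored if item[1] > 0]  # drop non-matching articles
    scored.sort(key=lambda item: (-item[1], item[0]))  # best score first, ties by id
    return scored[:top_k]


# Hypothetical miniature corpus standing in for a real article index.
CORPUS = {
    "paper_a": "nucleus segmentation in histology images",
    "paper_b": "tissue classification of colorectal histology images",
    "paper_c": "protein folding dynamics",
}
```

Calling `retrieve("histology nucleus segmentation", CORPUS)` ranks `paper_a` first and `paper_b` second, and omits `paper_c` because it shares no words with the query. A production system over millions of articles would need an index and a stronger relevance model than word overlap.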

Technical Details

PathAsst combines the PathCLIP vision encoder with the Vicuna-13B large language model, connected through a projection layer in the LLaVA tradition. Training proceeds in stages: the vision encoder and LLM are initially frozen while only the connecting fully connected layer is trained to align visual features with the language space, followed by instruction tuning on pathology-specific data.

The training corpus comprises over 207,000 high-quality pathology image-text pairs gathered from authoritative sources (PathCap) and more than 180,000 instruction-following samples generated with ChatGPT (PathInstruct), including examples that teach the model when and how to invoke its eight specialized sub-models.

The released artifacts comprise the PathCLIP encoder weights (CC-BY-NC-4.0) and the access-gated PathCap and PathInstruct datasets (CC-BY-NC-2.0) on HuggingFace. The full instruction-tuned PathAsst (Vicuna-13B) assistant weights are not released, and the GitHub repository ships data-construction tooling without a license.

The authors report that PathAsst improves pathology-image understanding and tool-augmented question answering relative to general-purpose multimodal baselines.
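
The staged training described in this section can be summarised as a schedule of which components receive gradient updates. The Python sketch below is an illustration under stated assumptions, not the authors' training configuration: the `Stage` class and component names are hypothetical, the pairing of each stage with a dataset is assumed, and the set of components unfrozen during instruction tuning follows the usual LLaVA convention because this page does not specify it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stage:
    """One training stage: its data source and which components are trainable."""
    name: str
    data: str
    trainable: Dict[str, bool] = field(default_factory=dict)


SCHEDULE: List[Stage] = [
    # Stage 1 matches the description: only the projection layer is trained.
    # Using PathCap as the alignment data is an ASSUMPTION.
    Stage("alignment", "PathCap image-text pairs",
          {"vision_encoder": False, "projection": True, "llm": False}),
    # Stage 2: the trainable set here is an ASSUMPTION based on the LLaVA recipe.
    Stage("instruction_tuning", "PathInstruct samples",
          {"vision_encoder": False, "projection": True, "llm": True}),
]


def trainable_components(stage: Stage) -> List[str]:
    """Return the names of components that are updated in this stage."""
    return sorted(name for name, is_on in stage.trainable.items() if is_on)
```

In a deep-learning framework, these flags would typically translate into enabling or disabling gradient computation for each component's parameters before the stage begins.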

Applications

PathAsst is aimed at researchers and clinicians who want an interactive assistant for histopathology and cytology workflows. Practical uses include answering questions about a tissue or cell image, generating descriptive captions and draft reports, and dispatching quantitative tasks such as nucleus segmentation or tissue classification to dedicated models without leaving the conversational interface. Its retrieval component additionally helps users connect an image-based question to relevant biomedical literature, making it a candidate building block for diagnostic decision support and pathology education tools.

Impact

As one of the earlier pathology-specific multimodal LLMs, PathAsst helped establish a template (a domain-adapted CLIP encoder plus an instruction-tuned LLM with tool calling) that later computational-pathology assistants such as PathChat and SlideChat would build on. Its released PathCLIP encoder and the PathCap and PathInstruct datasets, all under non-commercial CC-BY-NC licenses and with the datasets access-gated, have been reused as resources for subsequent pathology vision-language work; of the model weights, only the encoder is shared, not the full assistant. Its limitations are typical of the approach: it operates on image patches rather than whole-slide images, inherits the factual reliability concerns of its underlying LLM, and relies on synthetic instruction data that can propagate biases, so outputs require expert verification before any clinical use.

Citation

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Sun, Y., et al. (2024) PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. AAAI Conference on Artificial Intelligence.

DOI: 10.1609/aaai.v38i5.28308

Recent citations

Papers that recently cited this model.

Language-Guided Segmentation of Medical Images: A Review of Foundation Models
Saqib Qamar
Bioengineering · Jul 2026
Citations: 0

IRIS: An Intelligent Vision-Language System for Ocular Surface Diseases via Topic Tree and Scene-Driven VQA Generation
Hao Wei, Wenjin Qi, Dasen Dai, et al.
Jul 2026
Citations: 0

Co-assistant networks by pathology foundation model and convolutional neural network for gigapixel whole slide image analysis.
Zhuoran Liu, Jun-yi Shen, Lei Cui, et al.
Medical Image Analysis · Jul 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

A Vision-Language Foundation Model for Precision Oncology
Jinxi Xiang, Xiyue Wang, Xiaoming Zhang, et al.
Nature · Jan 2025
Citations: 245

Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions
Yuting He, Fuxiang Huang, Xinrui Jiang, et al.
IEEE Reviews in Biomedical Engineering · Apr 2024
Citations: 134

Embodied Task Planning with Large Language Models
Zhenyu Wu, Ziwei Wang, Xiuwei Xu, et al.
arXiv.org · Jul 2023
Citations: 125

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
Binxu Li, Tian Yan, Yuanting Pan, et al.
Conference on Empirical Methods in Natural Language Processing · Jul 2024
Citations: 120

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao, Feizhong Zhou, X. Liu, et al.
Information Fusion · May 2024
Citations: 116 · Influential

Citations

Total Citations: 104
Influential: 8
References: 41

GitHub

Stars: 136
Forks: 6
Open Issues: 6
Contributors: 2
Last Push: 2y ago
Language: Python

HuggingFace

Downloads: 0
Likes: 8
Last Modified: 2y ago

Fields of citing research

Computer Science: 95%
Medicine: 87%
Engineering: 11%
Environmental Science: 4%
Biology: 4%
Education: 2%
Psychology: 2%
Physics: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

17 · Closed

Usability (can I run it?): 23

Reproducibility (can I retrain it?): 6

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository · Research Paper · HuggingFace Model · Dataset · Dataset
