bio.rodeo


© 2026 Pulsatance. All rights reserved. ~
Built by Pulsatance
Pathology · Language model

PathAsst

Westlake University / Zhejiang University / The Ohio State University / Hangzhou City University

A multimodal generative AI assistant for pathology, pairing the PathCLIP vision encoder with a Vicuna-13B LLM and a toolkit of eight pathology-specific models.

Released: May 2023

PathAsst is a multimodal generative foundation model that acts as an AI assistant for computational pathology, conversing about pathology images and orchestrating specialized diagnostic tools in response to natural-language instructions. Introduced by researchers from Westlake University, Zhejiang University, The Ohio State University, and Hangzhou City University, it was first released as a preprint in May 2023 and published at AAAI 2024. The work frames its ambition explicitly as a step toward artificial general intelligence for pathology — a single, instruction-following system that can handle the diverse questions a pathologist might pose about a histology or cytology image.

PathAsst belongs to the wave of medical multimodal large language models (MLLMs) that adapt the LLaVA-style "vision encoder plus LLM" recipe to a specialized domain. Its core contribution is twofold: a pathology-dedicated CLIP model, PathCLIP, trained on large-scale pathology image-text pairs to give the assistant a domain-aware visual backbone, and a curated instruction-tuning corpus that teaches a general-purpose LLM to reason about pathology and to invoke external tools when a task exceeds conversational reasoning.

Rather than predicting a single label, PathAsst is designed as an interactive copilot. It can answer open-ended questions, generate captions and descriptive reports, and route requests to a toolkit of eight pathology-specific models — for example a nucleus segmentation or tissue-classification model — plus a literature retrieval system, returning the combined result in a conversational format.
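The tool-routing behaviour described above can be illustrated with a small dispatcher. This is a minimal, hypothetical sketch and not PathAsst's actual interface. It assumes the LLM signals a tool call by emitting a JSON object with `tool` and `args` keys. The tool names (`nucleus_segmentation`, `tissue_classification`) and their stub outputs are placeholders for the real sub-models.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry. The names and stub outputs are placeholders,
# not the actual eight PathAsst sub-models or their interfaces.
ToolFn = Callable[[dict], str]


def nucleus_segmentation(args: dict) -> str:
    # Stub: a real tool would run a segmentation model on the image patch.
    return f"segmented nuclei in {args.get('image', 'the input image')}"


def tissue_classification(args: dict) -> str:
    # Stub: a real tool would return a predicted tissue class.
    return f"classified tissue in {args.get('image', 'the input image')}"


TOOLS: Dict[str, ToolFn] = {
    "nucleus_segmentation": nucleus_segmentation,
    "tissue_classification": tissue_classification,
}


def dispatch(llm_output: str) -> str:
    """Run a tool if the LLM reply is a JSON tool call; otherwise pass the text through."""
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return llm_output  # ordinary conversational answer
    if not isinstance(call, dict) or "tool" not in call:
        return llm_output  # valid JSON, but not a tool call
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"unknown tool: {call['tool']}"
    result = tool(call.get("args", {}))
    # A full assistant would hand `result` back to the LLM to phrase the final reply.
    return f"[{call['tool']}] {result}"


if __name__ == "__main__":
    print(dispatch('{"tool": "nucleus_segmentation", "args": {"image": "patch_001.png"}}'))
    # [nucleus_segmentation] segmented nuclei in patch_001.png
    print(dispatch("The patch shows well-formed glandular structures."))
    # The patch shows well-formed glandular structures.
```

Keeping non-JSON replies as plain text lets one output channel carry both conversational answers and tool requests. A production system would also validate arguments before invoking a model.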

#Key Features

  • PathCLIP vision encoder: A pathology-adapted CLIP built from collected pathology image-text pairs, giving the assistant a domain-specialized visual representation rather than a generic natural-image encoder.
  • LLM-driven tool use: A Vicuna-13B backbone is instruction-tuned to call eight pathology-specific sub-models (segmentation, classification, detection, and related tasks) when a query requires capabilities beyond text generation.
  • Literature retrieval: An integrated paper-retrieval system over a corpus of roughly 5.3 million biomedical articles lets the assistant ground answers in published sources.
  • Released datasets (non-commercial, gated): The PathCap image-caption corpus and PathInstruct instruction-following data are available on HuggingFace, but are access-gated and non-commercially licensed (CC-BY-NC), supporting research use rather than fully open reuse.
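PathCLIP follows the CLIP recipe of aligning image and caption embeddings in a shared space. The sketch below shows the standard CLIP-style symmetric contrastive (InfoNCE) objective on a toy batch in plain Python. It illustrates the general technique and is not PathCLIP's training code. The embeddings are made-up inputs. The temperature is fixed at 0.07 for simplicity, whereas CLIP learns it during training.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_loss(image_emb: List[Vector], text_emb: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: image i and caption i are the only positive pair."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts] for im in imgs]
    # Image-to-text direction: row i should score caption i highest.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should score image j highest.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[0.9, 0.1], [0.1, 0.9]]
    print(clip_loss(images, captions))        # near zero: pairs are aligned
    print(clip_loss(images, captions[::-1]))  # large: captions swapped
```

Every other caption in the batch acts as a negative. This is why CLIP-style training benefits from large image-text corpora such as the pathology pairs described above.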

#Technical Details

PathAsst combines the PathCLIP vision encoder with the Vicuna-13B large language model, connected through a projection layer in the LLaVA tradition. Training proceeds in stages: the vision encoder and LLM are initially frozen while only the connecting fully connected layer is trained to align visual features with the language space, followed by instruction tuning on pathology-specific data.

The training corpus comprises over 207,000 high-quality pathology image-text pairs gathered from authoritative sources (PathCap) and more than 180,000 instruction-following samples generated with ChatGPT (PathInstruct), including examples that teach the model when and how to invoke its eight specialized sub-models.

The released artifacts comprise the PathCLIP encoder weights (CC-BY-NC-4.0) and the access-gated PathCap and PathInstruct datasets (CC-BY-NC-2.0) on HuggingFace. The full instruction-tuned PathAsst (Vicuna-13B) assistant weights are not released, and the GitHub repository ships data-construction tooling without a license.

The authors report that PathAsst improves pathology-image understanding and tool-augmented question answering relative to general-purpose multimodal baselines.
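The first training stage can be sketched in plain Python. In this stage the vision encoder and LLM are frozen and only the fully connected projector is trained. The sketch shows only the forward path and the freezing schedule; it contains no optimizer. Module names, dimensions, and the stage-2 trainable set are illustrative assumptions. The text above states only that the connecting layer is trained first, so the stage-2 set follows the general LLaVA recipe rather than a confirmed PathAsst detail. Placing visual tokens before the text tokens is also just one common arrangement.

```python
import random
from typing import Dict, List, Set

Vector = List[float]

ALL_MODULES: Set[str] = {"vision_encoder", "projector", "llm"}

# Modules that receive gradient updates in each stage. Stage 1 follows the
# description above; the stage-2 set is an assumption borrowed from LLaVA.
STAGE_TRAINABLE: Dict[str, Set[str]] = {
    "feature_alignment": {"projector"},
    "instruction_tuning": {"projector", "llm"},  # assumption, not confirmed for PathAsst
}


def frozen_modules(stage: str) -> Set[str]:
    """Modules whose weights stay fixed during the given stage."""
    return ALL_MODULES - STAGE_TRAINABLE[stage]


class LinearProjector:
    """Fully connected layer mapping vision features (d_vision) to LLM embeddings (d_llm)."""

    def __init__(self, d_vision: int, d_llm: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random init; in stage 1 only this layer would be updated.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_vision)] for _ in range(d_llm)]
        self.bias = [0.0] * d_llm

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weight, self.bias)]


def build_input_sequence(patch_features: List[Vector], text_embeddings: List[Vector],
                         projector: LinearProjector) -> List[Vector]:
    """Project each visual token into the LLM embedding space and place it before the text tokens."""
    visual_tokens = [projector(f) for f in patch_features]
    return visual_tokens + text_embeddings


if __name__ == "__main__":
    projector = LinearProjector(d_vision=8, d_llm=4)  # toy sizes for illustration
    patches = [[0.1] * 8 for _ in range(3)]            # 3 visual tokens from the encoder
    text = [[0.0] * 4 for _ in range(5)]               # 5 text-token embeddings
    seq = build_input_sequence(patches, text, projector)
    print(len(seq), len(seq[0]))                       # 8 4
    print(sorted(frozen_modules("feature_alignment"))) # ['llm', 'vision_encoder']
```

Training only the projector first lets the LLM learn to read visual tokens without disturbing either pretrained model. This matches the motivation for the frozen first stage described above.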

#Applications

PathAsst is aimed at researchers and clinicians who want an interactive assistant for histopathology and cytology workflows. Practical uses include answering questions about a tissue or cell image, generating descriptive captions and draft reports, and dispatching quantitative tasks such as nucleus segmentation or tissue classification to dedicated models without leaving the conversational interface. Its retrieval component additionally helps users connect an image-based question to relevant biomedical literature, making it a candidate building block for diagnostic decision support and pathology education tools.
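This page does not specify how the literature-retrieval component ranks papers. One common approach, assumed for this sketch, embeds the query and each paper and returns the top-k papers by cosine similarity. The paper IDs and vectors below are made up, and the function name `top_k_papers` is hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Treat zero vectors as having no similarity instead of dividing by zero.
    return dot / (na * nb) if na and nb else 0.0


def top_k_papers(query: Vector, corpus: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k paper IDs most similar to the query embedding, best first."""
    scored = ((cosine(query, emb), pid) for pid, emb in corpus.items())
    best = heapq.nlargest(k, scored)
    return [(pid, score) for score, pid in best]


if __name__ == "__main__":
    corpus = {
        "paper_a": [1.0, 0.0, 0.0],
        "paper_b": [0.7, 0.7, 0.0],
        "paper_c": [0.0, 0.0, 1.0],
    }
    for pid, score in top_k_papers([1.0, 0.1, 0.0], corpus, k=2):
        print(pid, round(score, 3))  # paper_a ranks first, then paper_b
```

This linear scan is proportional to corpus size. At the scale of millions of articles, an approximate nearest-neighbour index would typically replace it.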

#Impact

As one of the earlier pathology-specific multimodal LLMs, PathAsst helped establish the template (a domain-adapted CLIP plus an instruction-tuned LLM with tool-calling) that later computational-pathology assistants such as PathChat and SlideChat would build on. Its PathCLIP encoder and the PathCap and PathInstruct datasets, all released under non-commercial CC-BY-NC licenses and with the datasets access-gated, have been reused as resources for subsequent pathology vision-language work. Of the model weights, however, only the encoder is shared; the full instruction-tuned assistant is not. Limitations are typical of the approach: it operates on image patches rather than whole-slide images, inherits the factual reliability concerns of its underlying LLM, and its synthetic instruction data can propagate biases, so outputs require expert verification before any clinical use.

Citation

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Sun, Y., et al. (2024). PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5).

DOI: 10.1609/aaai.v38i5.28308

Citations

Total citations: 95
Influential citations: 8
References: 41

GitHub

Stars: 133
Forks: 6
Open issues: 6
Contributors: 2
Last push: 2y ago
Language: Python

HuggingFace

Downloads: 0
Likes: 7
Last modified: 2y ago

Openness

bio.rodeo openness: 17 (Closed; low usability and reproducibility)
Usability (can I run it?): 23
Reproducibility (can I retrain it?): 6
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

cytology, foundation_model, histology, image_captioning, instruction_tuning, multimodal, report_generation, transformer, vision_transformer, visual_question_answering

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model
  • Dataset
  • Dataset