MetaboliteChat

A multimodal ChatGPT-style LLM that fuses a molecular-graph GNN, a molecular-image CNN, and a Vicuna-13B backbone for interactive, free-form prediction of metabolite mechanisms and properties.

Released: November 2025

MetaboliteChat is a multimodal large language model that brings a ChatGPT-style conversational interface to metabolite analysis. Rather than predicting a single fixed endpoint, it accepts a metabolite's molecular structure together with a free-form natural-language question and returns free-text answers about the molecule's biological mechanisms, functions, and physicochemical properties. The model was developed by Zhenhao Guo and colleagues in Pengtao Xie's lab at New York University and released as a bioRxiv preprint in November 2025.

Metabolomics has historically relied on narrow, task-specific predictors that each address one property and must be retrained for every new question. This fragments the analysis workflow and makes it hard for non-specialists to interrogate a metabolite holistically. MetaboliteChat reframes the problem as instruction-following: by aligning molecular representations with a language model, it lets a researcher pose arbitrary questions in plain English and receive explanatory, multi-turn responses grounded in the molecule's structure.

The model sits within a growing family of biomedical LLMs (such as the multimodal LLaVA-Med and the text-only BioGPT) but is distinctive in targeting small-molecule metabolites specifically and in fusing two complementary structural views, a molecular graph and a rendered molecular image, into a single conversational system.

Key Features

Conversational metabolite analysis: Provides an interactive, free-form question-answering interface over metabolites instead of fixed single-task outputs, supporting multi-turn reasoning about mechanisms and properties.
Dual structural encoders: Combines a graph neural network operating on the molecular graph with a convolutional neural network operating on a rendered molecular image, capturing both topological and visual structural cues.
Vicuna-13B language backbone: Uses a 13-billion-parameter instruction-tuned LLM as its reasoning and generation engine, connected to the structural encoders through a trainable projection layer.
Zero-shot inference on unseen metabolites: A single fixed pretrained checkpoint generalizes to metabolites not seen during training, with no per-molecule or per-task retraining required.
Broad metabolite coverage: Trained across 152,222 metabolites, giving the model wide exposure to the chemical space of human metabolism.

Technical Details

MetaboliteChat follows the now-common vision-language alignment recipe (its code is adapted from MiniGPT-4 and LAVIS) but substitutes molecule-aware encoders for a generic image encoder. A GNN encodes the molecular graph derived from the SMILES string while a CNN encodes a rendered image of the molecule; their features are projected into the embedding space of a Vicuna-13B-v1.5 backbone via a trainable projection layer, with the backbone providing language understanding and generation. The instruction-tuning corpus is built from the Human Metabolome Database (HMDB), formatted as per-molecule question–answer pairs, and spans 152,222 metabolites. The authors report that MetaboliteChat outperforms both generic LLMs and task-specific baselines on metabolite analysis tasks. Inference runs from a fixed checkpoint, enabling zero-shot use on new metabolites; training and evaluation require an NVIDIA GPU with roughly 70–80 GB of memory.
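The alignment step described above can be illustrated with a small sketch. This is not the authors' code. The feature sizes, the random weights, the helper names (`init_linear`, `linear`, `build_llm_inputs`), the choice of one linear projection per encoder, and the convention of prepending the projected vectors to the text token embeddings are all assumptions made for illustration. The source states only that encoder features pass through a trainable projection layer into the backbone's embedding space.

```python
import random


def init_linear(in_dim, out_dim, rng):
    """Create a small random weight matrix (out_dim x in_dim) and a zero bias."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector stored as a Python list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# Hypothetical toy sizes, chosen only so the example runs instantly.
GRAPH_DIM, IMAGE_DIM, LLM_DIM = 4, 6, 8

rng = random.Random(0)
graph_proj = init_linear(GRAPH_DIM, LLM_DIM, rng)  # stands in for a trained projection
image_proj = init_linear(IMAGE_DIM, LLM_DIM, rng)  # stands in for a trained projection


def build_llm_inputs(graph_feat, image_feat, text_embeds):
    """Project both structural views to the LLM width and prepend them to the text embeddings."""
    graph_token = linear(graph_feat, *graph_proj)
    image_token = linear(image_feat, *image_proj)
    return [graph_token, image_token] + text_embeds


# Placeholder vectors standing in for GNN output, CNN output, and three question tokens.
graph_feat = [0.1] * GRAPH_DIM
image_feat = [0.2] * IMAGE_DIM
text_embeds = [[0.0] * LLM_DIM for _ in range(3)]

inputs = build_llm_inputs(graph_feat, image_feat, text_embeds)
print(len(inputs), len(inputs[0]))  # 5 8
```

In a real system the placeholder vectors would be replaced by the outputs of the GNN and CNN encoders, the projections would be learned during instruction tuning, and the target width would match the hidden size of the language backbone rather than the toy value used here.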

Applications

MetaboliteChat is aimed at metabolomics researchers, computational chemists, and drug-discovery scientists who need to rapidly characterize metabolites without building a bespoke model for each property of interest. Because it answers free-form questions, it can serve as an exploratory assistant — summarizing a molecule's likely mechanisms, suggesting functional roles, or comparing properties across candidates — and lowers the barrier for wet-lab biologists who lack machine-learning expertise. Its zero-shot generalization is particularly useful for newly identified or poorly annotated metabolites where labeled training data is scarce.

Impact

MetaboliteChat brings to metabolomics, where multimodal LLMs remain comparatively rare, the conversational, instruction-following paradigm that has already reshaped protein and pathology AI. By unifying structural and textual reasoning in one interactive system, it points toward more accessible, explanation-oriented tools for small-molecule analysis. The work is a preprint, with code released under a BSD-3-Clause license and weights distributed via Google Drive; its results await peer review and independent benchmarking, and the authors explicitly position it as a prototype requiring expert validation before any applied or pharmaceutical use.

Citation

MetaboliteChat: A Unified Multimodal Large Language Model for Interactive Metabolite Analysis and Functional Insights

Preprint

Guo, Z., et al. (2025) MetaboliteChat: A Unified Multimodal Large Language Model for Interactive Metabolite Analysis and Functional Insights. bioRxiv.

DOI: 10.1101/2025.11.07.687008

Recent citations

Papers that recently cited this model.

Integrated Transcriptomics-Metabolomics Framework for Predictive and Translational Biostimulant Design: Insights into Smart Systems for Sustainable Agriculture
M. D. Mashabela, Nompumelelo R. Sibanyoni, Kamogelo Mmotla, et al.
Current Plant Biology · Jun 2026
When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs
Shuai Xiao, Suhai Liu, Wei Zhou, et al.
May 2026
From Classification to Cross-Modal Understanding: Leveraging Vision-Language Models for Fine-Grained Renal Pathology
Zhenhao Guo, Rachit Saluja, Tianyuan Yao, et al.
arXiv.org · Nov 2025

Citations

Total Citations: 3
Influential: 0
References: 40

GitHub

Stars: 4
Forks: 2
Open Issues: 0
Contributors: 1
Last Push: 6mo ago
Language: Python
License: BSD-3-Clause

Fields of citing research

Computer Science: 67%
Agricultural and Food Sciences: 33%
Biology: 33%
Environmental Science: 33%
Linguistics: 33%
Medicine: 33%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 48 (Partial)
Usability (can I run it?): 61
Reproducibility (can I retrain it?): 45
Model Openness Framework: Unclassified (restrictive license on core components)
