bio.rodeo

MMPT-RAG

Emory University

A retrieval-augmented foundation model for matched molecular pair transformations that proposes controllable, medicinal-chemistry-style analog edits.

Released: February 2026

MMPT-RAG is a retrieval-augmented foundation model for matched molecular pair (MMP) transformations: the local, single-edit modifications that medicinal chemists routinely use to design analogs of a lead compound. A matched molecular pair is two molecules that share a constant scaffold and differ only in one variable fragment. Existing machine-learning approaches to analog generation tend to fall into two camps: whole-molecule generative models that offer limited control over where and how a molecule is edited, and MMP-style models trained in restricted settings with small models. MMPT-RAG reframes the task as variable-to-variable generation, mapping the variable fragment of the input to a replacement fragment. It learns these substructure edits directly from large-scale transformation data, so the edits it proposes are localized and chemically meaningful.
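To make the matched-pair idea concrete, here is a minimal sketch of how MMP data can be represented once each pair has been split into a shared constant part and a differing variable part. The `MatchedPair` class, the `[*:1]` attachment-point notation, and the toy pairs are all assumptions made for illustration; they are not taken from the preprint, and the counting step is a simple frequency table rather than the model's learned representation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedPair:
    """One matched molecular pair, already split into constant and variable parts.

    Hypothetical structure for illustration; not the preprint's data format.
    """
    constant: str   # shared scaffold, with [*:1] marking the attachment point
    var_from: str   # variable fragment in the source molecule
    var_to: str     # variable fragment in the analog

    @property
    def transformation(self) -> str:
        # The edit itself, independent of the scaffold it was observed on.
        return f"{self.var_from}>>{self.var_to}"


# Hand-written toy pairs (illustrative only, not from the preprint's datasets).
PAIRS = [
    MatchedPair("[*:1]c1ccccc1", "[*:1]C", "[*:1]C(F)(F)F"),
    MatchedPair("[*:1]c1ccncc1", "[*:1]C", "[*:1]C(F)(F)F"),
    MatchedPair("[*:1]c1ccccc1", "[*:1]O", "[*:1]OC"),
]


def transformation_counts(pairs):
    """Count how often each variable-to-variable edit occurs across scaffolds."""
    return Counter(p.transformation for p in pairs)


counts = transformation_counts(PAIRS)
for edit, n in counts.most_common():
    print(n, edit)
```

The same methyl-to-trifluoromethyl edit appears on two different scaffolds, which is the kind of scaffold-independent pattern a transformation-level model can pick up and a whole-molecule generator cannot target directly.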

The model was introduced in February 2026 by Bo Pan, Liang Zhao, and colleagues at Emory University as an arXiv preprint. Its defining feature is retrieval augmentation: rather than relying solely on parametric knowledge, MMPT-RAG retrieves external reference analogs and uses them as contextual guidance when generating a transformation. This lets the model condition each edit on relevant precedent, helping it recapitulate the kind of intuition an experienced medicinal chemist brings to analog design.
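The retrieval step described above can be sketched as a nearest-neighbor lookup followed by context assembly. This is a toy under stated assumptions, not the preprint's retriever: similarity here is Tanimoto overlap of character bigrams of SMILES strings (a crude stand-in for a chemical fingerprint), and the `REF` / `QUERY` line prefixes, function names, and reference molecules are hypothetical.

```python
def ngrams(smiles: str, n: int = 2) -> set:
    """Character n-grams of a SMILES string (a crude stand-in for a fingerprint)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, references: list, k: int = 2) -> list:
    """Return the k reference analogs most similar to the query molecule."""
    q = ngrams(query)
    ranked = sorted(references, key=lambda ref: tanimoto(q, ngrams(ref)), reverse=True)
    return ranked[:k]


def build_context(query: str, references: list, k: int = 2) -> str:
    """Join retrieved references and the query into one conditioning string.

    The REF/QUERY prefixes are a hypothetical format chosen for this sketch.
    """
    lines = [f"REF {smi}" for smi in retrieve(query, references, k)]
    lines.append(f"QUERY {query}")
    return "\n".join(lines)


# Toy reference pool (illustrative only).
REFERENCES = ["Cc1ccccc1", "FC(F)(F)c1ccccc1", "CCO", "Cc1ccncc1"]
context = build_context("Cc1ccccc1C", REFERENCES, k=2)
print(context)
```

A generator conditioned on `context` sees precedent molecules alongside the query. In practice a fingerprint-based or learned similarity would replace the bigram overlap used here.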

The authors report gains in diversity, novelty, and controllability across chemical and patent datasets. As a recent preprint, MMPT-RAG has no released weights or code yet, and its reported results have not been independently validated.

#Key Features

  • MMP transformation modeling: Learns matched-molecular-pair edits directly, framing analog design as variable-to-variable generation rather than whole-molecule sampling.
  • Retrieval-augmented generation: Retrieves external reference analogs as contextual guidance, conditioning each proposed edit on relevant precedent.
  • Edit controllability: Targets local, controllable chemical edits, addressing a key limitation of whole-molecule generators.
  • Foundation-model scale: Trained on large-scale transformation data to capture broad medicinal-chemistry patterns.
  • Diversity and novelty: Reports improvements in the diversity and novelty of generated analogs across chemical and patent benchmarks.
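For readers unfamiliar with how diversity and novelty are typically scored, the sketch below uses two common definitions: novelty as the fraction of distinct generated analogs absent from a known set, and diversity as one minus the mean pairwise similarity. These definitions are assumptions, and the exact metrics used in the preprint may differ. The `char_jaccard` similarity is a toy placeholder, and string-level novelty is only meaningful if all SMILES are canonicalized first.

```python
from itertools import combinations


def char_jaccard(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of the characters in two SMILES strings."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def novelty(generated, known) -> float:
    """Fraction of distinct generated analogs that do not appear in the known set."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(known)) / len(unique)


def diversity(generated, similarity=char_jaccard) -> float:
    """One minus the mean pairwise similarity among distinct generated analogs."""
    unique = sorted(set(generated))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Toy inputs (illustrative only).
generated = ["CCO", "CCN", "CCO", "CCF"]
known = ["CCO"]

print(f"novelty   = {novelty(generated, known):.3f}")
print(f"diversity = {diversity(generated):.3f}")
```

Passing a fingerprint-based Tanimoto function as `similarity` would bring the diversity score closer to what is usually reported for molecular generators.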

#Technical Details

MMPT-RAG combines a foundation model trained on large-scale matched-molecular-pair transformations with a retrieval-augmented generation (RAG) layer. The generative core treats analog design as a variable-to-variable problem—mapping the variable region of a molecule to a transformed variable region—so edits remain local and controllable. At generation time, the retrieval component surfaces external reference analogs that serve as in-context guidance, steering the model toward precedented, chemically sensible edits. The authors evaluate on chemical and patent datasets and report improvements in diversity, novelty, and controllability relative to prior approaches. The preprint (CC BY 4.0) does not disclose a specific parameter count, and no public weights or code accompany it at the time of writing.

#Applications

MMPT-RAG is aimed at lead optimization and analog design in drug discovery, where medicinal chemists iteratively make small structural edits to improve potency, selectivity, or ADMET properties. By proposing controllable, precedent-grounded MMP transformations, the model could assist computational chemists in enumerating high-quality analog ideas, prioritizing edits, and exploring chemical space around a hit or lead in a way that mirrors expert intuition.

#Impact

MMPT-RAG sits at the intersection of two active trends: foundation models for molecular generation and retrieval augmentation for grounding generative systems in external knowledge. By bringing RAG to matched-molecular-pair editing, it offers a path toward more controllable, interpretable analog design than whole-molecule generators. As a February 2026 preprint without released weights, its downstream influence and the robustness of its diversity/novelty/controllability gains await independent validation.

Tags

drug_discovery, molecule_generation, transformer, foundation_model, generative, medicinal_chemistry