LinkLlama

Molecular linker design model fine-tuned from Llama 3 that emits PROTAC and fragment linkers as SMILES from natural-language geometry prompts.

Released: April 2026

Molecular linker design is a recurring bottleneck in modern drug discovery. Whether joining two binding fragments in fragment-based drug design or connecting the warhead and E3-ligase binder of a PROTAC, the chemist must place a chemically sensible bridge that satisfies geometric constraints (the distance and orientation between anchor atoms) while remaining synthetically and pharmacologically reasonable. Most generative approaches to this problem are bespoke 3D-aware models trained from scratch, often paired with reinforcement-learning loops that are expensive to tune and hard to steer.

LinkLlama, introduced by Sun and colleagues in the Head-Gordon group at UC Berkeley in a 2026 bioRxiv preprint, reframes linker design as a natural-language task. The model is a fine-tuned Meta Llama 3 large language model that accepts text prompts specifying geometric targets (e.g., anchor-atom distances and angles) together with physicochemical objectives such as Lipinski's rules and rotatable-bond limits, and emits candidate linkers as SMILES strings. By relying on the chemical grammar a language model absorbs during supervised fine-tuning, LinkLlama prioritizes chemically valid output without the complex reinforcement-learning machinery used by many earlier generators.
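To make the prompting idea concrete, here is a minimal sketch of how geometric and physicochemical constraints might be rendered as text. The prompt wording, the `LinkerConstraints` class, the `build_prompt` function, and the placeholder fragment SMILES are all hypothetical illustrations; the actual prompt template used by LinkLlama is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass
class LinkerConstraints:
    """Hypothetical container for the constraints a prompt might carry."""
    anchor_distance_angstrom: float   # target distance between anchor atoms
    anchor_angle_degrees: float       # target angle between exit vectors
    max_rotatable_bonds: int          # upper bound on linker flexibility
    lipinski_compliant: bool = True   # ask for rule-of-five compliance


def build_prompt(fragment_a: str, fragment_b: str, c: LinkerConstraints) -> str:
    """Render the constraints as plain text (illustrative wording only)."""
    lines = [
        "Design a linker joining the two fragments below.",
        f"Fragment A (SMILES): {fragment_a}",
        f"Fragment B (SMILES): {fragment_b}",
        f"Anchor-atom distance: {c.anchor_distance_angstrom:.1f} angstroms",
        f"Anchor angle: {c.anchor_angle_degrees:.0f} degrees",
        f"Rotatable bonds: at most {c.max_rotatable_bonds}",
    ]
    if c.lipinski_compliant:
        lines.append("The assembled molecule should satisfy Lipinski's rules.")
    lines.append("Answer with the linker as a SMILES string.")
    return "\n".join(lines)


# Placeholder fragments; [*] marks an assumed attachment point.
constraints = LinkerConstraints(8.5, 120.0, 5)
prompt = build_prompt("c1ccccc1[*]", "[*]C(=O)N", constraints)
print(prompt)
```

Changing an objective in this scheme means editing a field and regenerating the text, which is the sense in which the model can be steered without retraining.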

Its central result is a more than two-fold improvement in the proportion of chemically reasonable designs over a baseline: the success rate rises from about 35% to over 80%, while geometric fidelity remains competitive with strictly 3D-aware models.
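As a quick check on the size of that gain, the short calculation below uses the two rounded rates quoted above. Because the text gives "about 35%" and "over 80%", the numbers are approximations and the fold change should be read as a lower bound.

```python
baseline_rate = 0.35   # "about 35%" from the text
linkllama_rate = 0.80  # lower bound: the text says "over 80%"

fold_change = linkllama_rate / baseline_rate
absolute_gain = linkllama_rate - baseline_rate

print(f"fold change: {fold_change:.2f}x")                              # 2.29x
print(f"absolute gain: {absolute_gain * 100:.0f} percentage points")   # 45
```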

Key Features

Natural-language constraint prompting: Users specify geometric goals (distances, angles) and physicochemical targets (Lipinski's rules, rotatable-bond limits) directly in text, steering generation without retraining the model.
Chemically reasonable output: Over 80% of generated linkers pass strict structural filters, versus roughly 35% for the baseline, judged against criteria including PAINS, non-drug-like substructures, and overly complex ring systems.
No reinforcement-learning loop required: Chemical validity is captured through supervised fine-tuning on drug-like molecules, avoiding the costly RL tuning common to competing linker generators.
Competitive geometric fidelity: Benchmarks show geometry on par with dedicated 3D-aware models despite operating on text-based SMILES representations.
Versatile across design tasks: Demonstrated on both small-molecule scaffold hopping and PROTAC linker design from a single fine-tuned model.
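The physicochemical targets named above can be checked after generation with a few comparisons. The sketch below applies the standard Lipinski rule-of-five thresholds and a rotatable-bond cap to descriptors assumed to be precomputed elsewhere (for example by a cheminformatics toolkit). The `Descriptors` class, the function names, the zero-violation default, and the example values are illustrative assumptions, not part of LinkLlama.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one candidate (supplied by the caller)."""
    mol_weight: float        # daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many rule-of-five thresholds the candidate exceeds."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def meets_targets(d: Descriptors, max_rotatable_bonds: int,
                  max_violations: int = 0) -> bool:
    """True if the candidate satisfies both the Lipinski and flexibility limits."""
    return (lipinski_violations(d) <= max_violations
            and d.rotatable_bonds <= max_rotatable_bonds)


# Made-up descriptor values for two hypothetical candidates.
small = Descriptors(432.5, 3.1, 2, 6, 7)
large = Descriptors(812.0, 5.6, 4, 12, 18)

print(meets_targets(small, max_rotatable_bonds=8))   # True
print(lipinski_violations(large))                    # 3
print(meets_targets(large, max_rotatable_bonds=8))   # False
```

The `max_violations` parameter is left adjustable because some workflows tolerate a single violation; which threshold a given project uses is a design choice rather than something stated in the source.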

Technical Details

LinkLlama is built by supervised fine-tuning of a Meta Llama 3 model on a curated corpus of drug-like molecules drawn from ChEMBL, teaching the model to reproduce chemically valid SMILES while conditioning on the geometric and physicochemical descriptors encoded in its prompts. The authors benchmark the model on the ZINC and HiQBind datasets, measuring both geometric agreement with reference structures and the fraction of outputs that survive a comprehensive battery of chemical-reasonableness filters (PAINS alerts, non-drug-like patterns, and complex ring systems). On these benchmarks LinkLlama matches the geometric performance of strictly 3D-aware baselines while roughly doubling the share of chemically reasonable designs, from approximately 35% to more than 80%. Prospective case studies were validated with molecular docking and molecular dynamics simulations against known crystal poses.
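The geometric descriptors referred to above (anchor-atom distance and the angle between exit vectors) can be computed from 3D coordinates with elementary vector arithmetic. The sketch below is an illustration under assumed conventions: the function names, the definition of an exit vector as "anchor to bonded neighbor", the tolerances, and the coordinates are hypothetical, and the benchmark's actual geometric metric is not specified in this text.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def exit_vector(anchor: Vec, neighbor: Vec) -> Vec:
    """Vector pointing from an anchor atom toward its bonded neighbor."""
    return tuple(n - a for a, n in zip(anchor, neighbor))


def angle_degrees(u: Vec, v: Vec) -> float:
    """Angle between two vectors, clamped to guard against rounding error."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def within_tolerance(value: float, target: float, tol: float) -> bool:
    """True if a measured descriptor lies within tol of its target."""
    return abs(value - target) <= tol


# Made-up coordinates in angstroms.
anchor_a, neighbor_a = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
anchor_b, neighbor_b = (3.0, 4.0, 0.0), (3.0, 5.0, 0.0)

dist = math.dist(anchor_a, anchor_b)
angle = angle_degrees(exit_vector(anchor_a, neighbor_a),
                      exit_vector(anchor_b, neighbor_b))

print(f"anchor distance: {dist:.2f} A")   # 5.00 A
print(f"exit-vector angle: {angle:.1f}")  # 90.0
print(within_tolerance(dist, target=5.5, tol=1.0))  # True
```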

Applications

LinkLlama targets medicinal chemists and computational drug-discovery teams who need to generate candidate linkers under explicit geometric and drug-likeness constraints. Two prospective use cases are highlighted: novel small-molecule scaffold hopping, where the core of a known binder is replaced while preserving key interactions, and PROTAC linker design, where the geometry between a target-protein ligand and an E3-ligase recruiter is critical to forming a productive ternary complex. Because constraints are expressed in natural language, the same model can be re-steered to new objectives without additional training, lowering the barrier for non-experts to explore linker chemistry.

Impact

LinkLlama is an early demonstration that a general-purpose large language model, fine-tuned on chemical data, can rival purpose-built 3D generative models on a structurally demanding design task while producing markedly more synthetically and pharmacologically plausible molecules. Its prompt-driven interface points toward more accessible, steerable design tools for fragment linking and targeted-protein-degradation programs. As a 2026 preprint, its results await peer review. Code is available on GitHub (THGLab/LinkLlama) and the fine-tuned weights are released on Hugging Face, though under the upstream Llama 3.2 community license and a non-commercial code license rather than fully open terms.

Citation

LinkLlama: Enabling Large Language Model for Chemically Reasonable Linker Design

Sun, K., et al. (2026) LinkLlama: Enabling Large Language Model for Chemically Reasonable Linker Design. bioRxiv.

DOI: 10.64898/2026.04.15.718690

Citations

Total Citations: 2
Influential: 1
References: 44

GitHub

Stars: 10
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 3 months ago
Language: Python

HuggingFace

Downloads: 13
Likes: 0
Last Modified: 3 months ago

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 27 (Closed)
Usability (can I run it?): 21
Reproducibility (can I retrain it?): 24
Model Openness Framework: Unclassified (restrictive license on core components)
