Stanford University / University of Tokyo / RIKEN Center for Advanced Intelligence Project / Chinese University of Hong Kong
A reasoning-guided foundation model for de novo antibody CDR design, pairing a multimodal-LLM understanding expert with a Boltz-1-based diffusion generation expert.
Proteo-R1 is a reasoning-guided protein design foundation model for de novo antibody complementarity-determining region (CDR) design. Most generative protein models map a target directly to a designed sequence or structure, leaving the underlying "why these residues" reasoning implicit. Proteo-R1 instead separates molecular understanding from geometric generation: a multimodal large language model first reasons over the input sequence and structure to identify the functionally critical residues, then passes those decisions as hard constraints to a diffusion model that builds the corresponding three-dimensional structure.
The system is built from two cooperating experts. The understanding expert couples a Qwen3-4B language model with a Protenix structural encoder, giving the LLM access to residue-level geometric context so it can reason about which positions drive binding. The generation expert is a Boltz-1-based conditional diffusion model that performs framework inpainting and diffusion sampling to design CDR loops under the constraints emitted by the understanding expert.
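The handoff between the two experts can be pictured with a toy sketch. Everything here is hypothetical and is not the Proteo-R1 API: the names `ResidueConstraint`, `understand`, and `generate` are invented for illustration, the "understanding" step simply takes a hand-picked list of positions, and the "generation" step fills free positions at random rather than running diffusion. It also assumes a hard constraint fixes the residue identity at a position, which the description above does not spell out. The only point is the data flow: constraints are decided first and then held fixed during generation.

```python
from dataclasses import dataclass
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


@dataclass(frozen=True)
class ResidueConstraint:
    """Hypothetical hard constraint: keep `residue` at `position`."""
    position: int  # 0-based index into the CDR loop
    residue: str   # one-letter amino acid code


def understand(cdr_template: str, important_positions: list[int]) -> list[ResidueConstraint]:
    """Stand-in for the understanding expert.

    The real model reasons over sequence and structure; this toy version
    just turns a hand-picked list of positions into constraints.
    """
    return [ResidueConstraint(i, cdr_template[i]) for i in important_positions]


def generate(length: int, constraints: list[ResidueConstraint], seed: int = 0) -> str:
    """Stand-in for the generation expert.

    Constrained positions are copied verbatim; every other position is
    sampled uniformly at random (NOT diffusion, purely illustrative).
    """
    rng = random.Random(seed)
    fixed = {c.position: c.residue for c in constraints}
    return "".join(fixed.get(i, rng.choice(AMINO_ACIDS)) for i in range(length))


# Illustrative 12-residue loop; positions 2, 3 and 9 are treated as critical.
template = "ARDYYGSSYFDY"
constraints = understand(template, [2, 3, 9])
design = generate(len(template), constraints)
print(constraints)
print(design)
```

Keeping the constraint list as an explicit intermediate value is what makes the pipeline inspectable: the same list that conditions generation can be shown to a researcher.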
Proteo-R1 was introduced in 2026 by a collaboration led by researchers at Stanford University (including Jure Leskovec and Yejin Choi), with contributors from the University of Tokyo and RIKEN AIP (Naoto Yokoya, Masashi Sugiyama) and the Chinese University of Hong Kong (Pheng-Ann Heng). It was accepted to ICML 2026.
proteor1-prepare-cdr, proteor1-design); the framework ships with fixed weights and is not intended for user-side training.
Proteo-R1 is trained through a three-stage curriculum on protein structures from the Protein Data Bank (PDB) together with antibody-antigen complexes from SAbDab, producing fixed weights for inference. The understanding expert is a roughly 4-billion-parameter Qwen3 model augmented with a Protenix structural encoder; the generation expert is a Boltz-1-based conditional diffusion model. On the RAbD CDR-H3 design benchmark, Proteo-R1 reaches a DockQ of 0.801, substantially above the reported baseline of 0.473, indicating markedly more accurate reconstruction of bound antibody-antigen geometry. The reference implementation is released on GitHub under Apache 2.0, and the two checkpoints (thinking-bio-lab/proteor1-understand and thinking-bio-lab/proteor1-generate) download automatically from HuggingFace on first inference.
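To put the two reported DockQ scores in perspective, the short calculation below works out the absolute and relative gain and maps each score onto the commonly cited DockQ quality bands (below 0.23 incorrect, 0.23 to 0.49 acceptable, 0.49 to 0.80 medium, 0.80 and above high). The band cut-offs are the usual DockQ convention and are an assumption here, not something stated by the Proteo-R1 authors; the helper name `dockq_band` is likewise made up for this example.

```python
# Reported DockQ scores on the RAbD CDR-H3 benchmark (from the text above).
proteo_r1_dockq = 0.801
baseline_dockq = 0.473

absolute_gain = proteo_r1_dockq - baseline_dockq
relative_gain = absolute_gain / baseline_dockq


def dockq_band(score: float) -> str:
    """Map a DockQ score to the conventional quality band (assumed cut-offs)."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


print(f"absolute gain: {absolute_gain:.3f}")   # absolute gain: 0.328
print(f"relative gain: {relative_gain:.1%}")   # relative gain: 69.3%
print(f"Proteo-R1 band: {dockq_band(proteo_r1_dockq)}")
print(f"baseline band:  {dockq_band(baseline_dockq)}")
```

Under these conventional bands the reported score sits just inside the high-quality range, while the baseline falls in the acceptable range.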
Proteo-R1 is aimed at computational antibody engineering and de novo binder design. Given an antigen and an antibody framework, it proposes CDR sequences and structures predicted to bind the antigen, which is useful for therapeutic antibody discovery, affinity optimization, and prospective design campaigns whose candidates are then validated experimentally. Because the understanding expert exposes which residues it deems functionally important, the workflow can also help researchers interpret and prioritize candidate designs rather than treating generation as a black box.
By coupling a reasoning language model to a structure-generating diffusion model, Proteo-R1 illustrates a broader trend of bringing explicit, residue-level reasoning into protein design instead of relying solely on end-to-end generation. Its large reported gain on RAbD CDR-H3 (DockQ 0.801 vs. 0.473) suggests that constraint extraction by an understanding expert can meaningfully improve downstream geometric generation for antibodies. As an ICML 2026 contribution with open code and downloadable checkpoints, it offers a concrete template for reasoning-guided design that other groups can build on. Note that the HuggingFace checkpoints currently ship without a model card or a stated weights license (distinct from the Apache 2.0 code), so users should verify licensing terms before deployment.