bio.rodeo

Protein

Proteo-R1

Stanford University / University of Tokyo / RIKEN Center for Advanced Intelligence Project / Chinese University of Hong Kong

A reasoning-guided foundation model for de novo antibody CDR design, pairing a multimodal-LLM understanding expert with a Boltz-1-based diffusion generation expert.

Released: May 2026

Proteo-R1 is a reasoning-guided protein design foundation model for de novo antibody complementarity-determining region (CDR) design. Most generative protein models map a target directly to a designed sequence or structure, leaving the underlying "why these residues" reasoning implicit. Proteo-R1 instead separates molecular understanding from geometric generation: a multimodal large language model first reasons over the input sequence and structure to identify functionally critical residues, then passes those decisions as hard constraints to a diffusion model that builds the corresponding three-dimensional structure.

The system is built from two cooperating experts. The understanding expert couples a Qwen3-4B language model with a Protenix structural encoder, giving the LLM access to residue-level geometric context so it can reason about which positions drive binding. The generation expert is a Boltz-1-based conditional diffusion model that performs framework inpainting and diffusion sampling to design CDR loops under the constraints emitted by the understanding expert.
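The hand-off between the two experts can be pictured with a toy sketch. Everything in it is hypothetical: the class and function names, the contact-score heuristic, and the random residue sampler are illustrative stand-ins, not Proteo-R1's API, its Qwen3/Protenix understanding expert, or its Boltz-1 diffusion model. It shows only the data flow described above: stage one emits residue-level hard constraints, and stage two redesigns the CDR span while leaving constrained and framework positions untouched.

```python
import random
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class Constraint:
    """Hypothetical hard constraint: keep `residue` at `position` (0-based)."""
    position: int
    residue: str


def understand(sequence: str, contact_scores: list[float], top_k: int) -> list[Constraint]:
    """Stand-in for the understanding expert.

    Picks the top_k positions with the highest (made-up) contact score and
    pins their residues. The real model reasons with an LLM; this does not.
    """
    ranked = sorted(range(len(sequence)), key=lambda i: contact_scores[i], reverse=True)
    return [Constraint(i, sequence[i]) for i in sorted(ranked[:top_k])]


def generate(sequence: str, cdr: range, constraints: list[Constraint], seed: int = 0) -> str:
    """Stand-in for the generation expert.

    Resamples residues inside the CDR span uniformly at random, except where
    a constraint pins the residue. Framework positions are copied unchanged.
    """
    rng = random.Random(seed)
    pinned = {c.position: c.residue for c in constraints}
    designed = list(sequence)
    for i in cdr:
        designed[i] = pinned.get(i, rng.choice(AMINO_ACIDS))
    return "".join(designed)


# Toy input: a 12-residue chain whose CDR spans positions 4..8.
seq = "EVQLARDYWGQG"
scores = [0.1, 0.0, 0.2, 0.1, 0.3, 0.9, 0.8, 0.2, 0.4, 0.1, 0.0, 0.1]
constraints = understand(seq, scores, top_k=2)
design = generate(seq, cdr=range(4, 9), constraints=constraints)
print(constraints)
print(design)
```

Keeping the constraints as an explicit, inspectable list between the two stages is what makes the reasoning step auditable in this sketch: the pinned positions can be reviewed before any structure is generated.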

Proteo-R1 was introduced in 2026 by a collaboration led by researchers at Stanford University (including Jure Leskovec and Yejin Choi), with contributors from the University of Tokyo and RIKEN AIP (Naoto Yokoya, Masashi Sugiyama) and the Chinese University of Hong Kong (Pheng-Ann Heng). It was accepted to ICML 2026.

#Key Features

  • Reasoning-then-design pipeline: A multimodal LLM identifies functionally critical residues and passes them as explicit constraints to the generator, separating molecular understanding from geometric generation rather than predicting structure end-to-end.
  • Dual-expert architecture: An understanding expert (Qwen3-4B paired with a Protenix encoder) handles residue-level reasoning, while a generation expert built on Boltz-1 conditional diffusion handles structure synthesis.
  • Antibody CDR specialization: The model targets de novo design of antibody CDR loops, including the difficult CDR-H3, the most variable and binding-relevant loop.
  • Inference-only release: The published checkpoints support an inference CLI (proteor1-prepare-cdr, proteor1-design); the framework ships with fixed weights and is not intended for user-side training.
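Before starting a design run, it can help to confirm that the two CLI entry points named above are actually installed. The following is a minimal sketch that assumes the package exposes `proteor1-prepare-cdr` and `proteor1-design` as executables on `PATH`. The helper name `find_cli_commands` is hypothetical, and no command-line flags are assumed or invoked.

```python
import shutil

# Command names as given in the text above; no flags are assumed here.
CLI_COMMANDS = ("proteor1-prepare-cdr", "proteor1-design")


def find_cli_commands(commands=CLI_COMMANDS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in commands}


status = find_cli_commands()
for name, path in status.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

For the actual arguments each command accepts, consult the repository's own documentation or the commands' built-in help.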

#Technical Details

Proteo-R1 is trained through a three-stage curriculum on protein structures from the Protein Data Bank (PDB) together with antibody-antigen complexes from SAbDab. The understanding expert is a roughly 4-billion-parameter Qwen3 model augmented with a Protenix structural encoder; the generation expert is a Boltz-1-based conditional diffusion model. On the RAbD CDR-H3 design benchmark, Proteo-R1 reaches a DockQ of 0.801 against a reported baseline of 0.473, an absolute gain of 0.328 that indicates markedly more accurate reconstruction of bound antibody-antigen geometry. The reference implementation is released on GitHub under Apache 2.0, and the two fixed-weight checkpoints (thinking-bio-lab/proteor1-understand and thinking-bio-lab/proteor1-generate) download automatically from HuggingFace on first inference.
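To put the two DockQ figures in context, the short calculation below works out the absolute and relative gain and maps each score onto the commonly used DockQ quality bands (cutoffs 0.23, 0.49, and 0.80). The function name `dockq_quality` is illustrative. Applying per-complex bands to benchmark-level figures is only indicative, since the text does not state how the reported numbers are aggregated.

```python
def dockq_quality(score: float) -> str:
    """Classify a DockQ score using the standard cutoffs (0.23, 0.49, 0.80)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ is defined on [0, 1]")
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


proteo_r1, baseline = 0.801, 0.473  # values reported in the text

absolute_gain = proteo_r1 - baseline
relative_gain = absolute_gain / baseline

print(f"absolute gain: {absolute_gain:.3f}")   # 0.328
print(f"relative gain: {relative_gain:.1%}")   # 69.3%
print(dockq_quality(proteo_r1), dockq_quality(baseline))  # high acceptable
```

Read this way, the reported improvement is about 69% relative to the baseline and moves the headline figure from the "acceptable" band to just inside the "high" band.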

#Applications

Proteo-R1 is aimed at computational antibody engineering and de novo binder design. Given an antigen and an antibody framework, it proposes CDR sequences and structures predicted to bind. This is useful for therapeutic antibody discovery, affinity optimization, and prospective design campaigns whose outputs are then validated experimentally. Because the understanding expert exposes which residues it deems functionally important, the workflow can also help researchers interpret and prioritize candidate designs rather than treating generation as a black box.

#Impact

By coupling a reasoning language model to a structure-generating diffusion model, Proteo-R1 illustrates a broader trend of bringing explicit, residue-level reasoning into protein design instead of relying solely on end-to-end generation. Its large reported gain on RAbD CDR-H3 (DockQ 0.801 vs. 0.473) suggests that constraint extraction by an understanding expert can meaningfully improve downstream geometric generation for antibodies. As an ICML 2026 contribution with open code and downloadable checkpoints, it offers a concrete template for reasoning-guided design that other groups can build on. Note that the HuggingFace checkpoints currently ship without a model card or a stated weights license (distinct from the Apache 2.0 code), so users should verify licensing terms before deployment.

Citation

Preprint

DOI: 10.48550/arXiv.2605.02937


Openness

Unclassified
Restrictive license on core components

Tags

antibody, de_novo_design, diffusion, foundation_model, multimodal, protein_design, structure_prediction, transformer

Resources

GitHub Repository, Research Paper, HuggingFace Model, HuggingFace Model