bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

GermRL

Johns Hopkins University

A reinforcement learning framework that fine-tunes the ProGen2-OAS antibody language model with GRPO to reduce germline bias and generate diverse, plausible antibody sequences.

Released: June 2026
Parameters: 764 Million

GermRL is a reinforcement learning (RL) framework that fine-tunes pre-trained autoregressive antibody language models to overcome their tendency to generate sequences that stay close to inherited germline genes. Antibody repertoire models such as ProGen2-OAS learn the statistics of natural B-cell receptor sequences, which are dominated by low-mutation, near-germline variants. This germline bias limits the diversity of candidates a generative model proposes, even though the affinity-matured, heavily mutated antibodies prized in therapeutic discovery lie far from germline. GermRL directly targets this bias, steering generation toward sequences with a controllable number of mutations from germline while keeping them biologically plausible.

GermRL was developed by Laurent Ludwig, Michael Chungyoun, and Jeffrey J. Gray in the Gray Lab at Johns Hopkins University and posted to bioRxiv in June 2026. It applies Group Relative Policy Optimization (GRPO) to the pre-trained ProGen2-OAS base model, producing a fixed RL-adapted checkpoint. The authors note that prior work on germline bias focused on masked antibody language models; GermRL is among the first to address the bias in generative autoregressive models, where sampling dynamics make the problem distinct.

GermRL ships downloadable fine-tuned weights on Hugging Face, so users can run inference directly against the released checkpoint rather than retraining the RL policy for each new dataset or mutation target.

#Key Features

  • Germline-bias mitigation: GRPO fine-tuning rewards sequences that hit specified mutation thresholds from germline, dramatically expanding the diversity an autoregressive antibody model will generate.
  • Controllable mutation level: The framework steers generation toward a target distance from germline, enabling one-shot sampling of antibodies at a low (5 mutations) or high (35 mutations) threshold as needed.
  • Reward-hacking safeguards: A pair of GRPO modifications (per-epoch weight synchronization and exclusive sampling from the updating policy) improves training efficiency and discourages the model from gaming the reward.
  • Preserved biological realism: RL-generated sequences retain identifiable germline V/J assignments, embedding-level similarity to natural antibodies, and comparable developability profiles.
  • Inference-ready weights: A released checkpoint (GermRL-LD5) lets practitioners generate candidates without running the RL loop themselves.
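
The mutation-threshold idea in the list above can be illustrated with a small sketch. This is not the authors' reward function: it assumes the generated sequence and its germline are already aligned to the same length, that "mutations" are counted as mismatched positions, and that hitting a threshold means meeting or exceeding it. The names `count_mutations` and `threshold_reward` and the toy sequences are hypothetical.

```python
def count_mutations(sequence: str, germline: str) -> int:
    """Count mismatched positions between two pre-aligned, equal-length sequences."""
    if len(sequence) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(sequence, germline) if a != b)


def threshold_reward(sequence: str, germline: str, min_mutations: int) -> float:
    """Return 1.0 if the sequence carries at least `min_mutations` mismatches.

    Assumption: a threshold is "hit" when the mutation count meets or exceeds it.
    """
    return 1.0 if count_mutations(sequence, germline) >= min_mutations else 0.0


# Toy fragments for illustration only; not taken from the paper or any dataset.
germline = "EVQLVESGGGLVQPGGSLRLSCAAS"
candidate = "EVQLVESGGGLIQPGRSLRLSCTAS"

print(count_mutations(candidate, germline))      # 3 mismatched positions
print(threshold_reward(candidate, germline, 5))  # 0.0, below a threshold of 5
```

A real pipeline would first need germline assignment and alignment, which this sketch skips entirely.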

#Technical Details

GermRL builds on the ~764M-parameter ProGen2-OAS autoregressive transformer (using the open ProGen2-OAS implementation by Hrbáň et al., derived from Nijkamp et al.) and fine-tunes it with a customized GRPO algorithm. Generation begins from the start token and is rewarded for satisfying a target mutation distance from germline while maintaining structural plausibility. Two GRPO modifications stabilize training and curb reward hacking in the antibody setting: policy weights are synchronized once per epoch rather than once per step, and samples are drawn exclusively from the updating policy.

On the central benchmark, GermRL reaches 0.992 pass@1 at a low threshold of 5 mutations from germline and 0.950 pass@1 at a high threshold of 35 mutations, versus 0.398 and 0.034, respectively, for the unmodified pre-trained model. The released GermRL-LD5 checkpoint (Safetensors, F32) is the low-distance variant. The code is MIT-licensed; the released weights carry a BSD-3-Clause license and are hosted under a personal Hugging Face account.
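
For readers unfamiliar with GRPO and pass@1, the sketch below shows the two calculations in their generic textbook form. It does not reproduce GermRL's customized algorithm: the group-relative advantage here is the standard "standardize rewards within a sampled group" step, the use of the population standard deviation is a choice made for this example, and pass@1 is assumed to be the fraction of independent single samples that satisfy the criterion. The function names are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples (generic GRPO step).

    Each sample's advantage is its reward minus the group mean, divided by
    the group standard deviation; `eps` avoids division by zero when every
    sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def pass_at_1(passed):
    """Fraction of single samples that meet the criterion (assumed definition)."""
    return sum(passed) / len(passed)


# One group of four sampled sequences with binary rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]

# Three of four independent samples pass the threshold check.
print(pass_at_1([True, False, True, True]))  # 0.75
```

Because advantages are computed relative to the group, a sample is only reinforced when it scores better than its siblings, which is why no separate value network is needed in this family of methods.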

#Applications

GermRL is aimed at antibody engineers and computational immunologists who use generative models to propose novel candidates. Because near-germline sequences are over-represented in natural repertoires, off-the-shelf antibody language models under-sample the highly mutated regions of sequence space where many desirable therapeutic properties emerge. GermRL lets researchers dial in a target mutation level and generate diverse yet plausible antibodies in a single shot, supporting library design, lead diversification, and exploration of alternative evolutionary mutational patterns during early-stage therapeutic discovery.

#Impact

GermRL extends germline-bias research, previously confined to masked antibody models, into the generative autoregressive regime. It demonstrates that reinforcement learning can reshape a pre-trained language model's sampling distribution without sacrificing the global properties (germline identifiability, embedding similarity, developability) that make antibodies usable. By packaging the approach as a lightweight, modular RL framework with downloadable weights, the Gray Lab makes germline-bias mitigation practical for other antibody models. As a June 2026 preprint, GermRL is early-stage: validation rests on computational metrics and pass@1 benchmarks rather than experimental affinity data, the released weights cover a single low-distance configuration, and documentation remains limited. Still, it offers a reusable recipe for navigating the antibody sequence landscape beyond germline.

Citation

Ludwig, L., et al. (2026) GermRL: Alleviating The Germline Bias In Autoregressive Antibody Language Models Through Reinforcement Learning. bioRxiv.

DOI: 10.64898/2026.06.08.730660

Citations

Total Citations: 0
Influential: 0
References: 61

GitHub

Stars: 1
Forks: 0
Open Issues: 0
Contributors: 1
Last Push: 16d ago
Language: Python
License: MIT

HuggingFace

Downloads: 86
Likes: 0
Last Modified: 16d ago

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Overall score: 65 (Partial)
Usability (can I run it?): 94
Reproducibility (can I retrain it?): 40
Model Openness Framework: Unclassified (missing required components)

Tags

protein_design, antibody_design, de_novo_design, transformer, reinforcement_learning, language_model, generative, antibody, immunology

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model