GermRL is a reinforcement learning (RL) framework that fine-tunes pre-trained autoregressive antibody language models to overcome their tendency to generate sequences that stay close to inherited germline genes. Antibody repertoire models such as ProGen2-OAS learn the statistics of natural B-cell receptor sequences, which are dominated by low-mutation, near-germline variants. This germline bias limits the diversity of candidates a generative model proposes, even though the affinity-matured, heavily mutated antibodies prized in therapeutic discovery lie far from germline. GermRL directly targets this bias, steering generation toward sequences with a controllable number of mutations from germline while keeping them biologically plausible.

Developed by Laurent Ludwig, Michael Chungyoun, and Jeffrey J. Gray at Johns Hopkins University (Gray Lab) and posted to bioRxiv in June 2026, GermRL uses Group Relative Policy Optimization (GRPO) to fine-tune a pre-trained ProGen2-OAS base model, producing a fixed RL-adapted checkpoint. The authors note that prior work on germline bias focused on masked antibody language models; GermRL is among the first to address the bias in generative autoregressive models, where sampling dynamics make the problem distinct.

GermRL also ships downloadable pretrained weights on Hugging Face, so users can run inference directly against a released checkpoint instead of running the RL training loop themselves.

Key Features

Germline-bias mitigation: GRPO fine-tuning rewards sequences that hit specified mutation thresholds from germline, dramatically expanding the diversity an autoregressive antibody model will generate.
Controllable mutation level: The RL reward is set by a target distance from germline, so a fine-tuned model samples antibodies at a chosen low (5) or high (35) mutation threshold in one shot.
Reward-hacking safeguards: A pair of GRPO modifications (per-epoch weight synchronization and exclusive sampling from the updating policy) improves training efficiency and discourages the model from gaming the reward.
Preserved biological realism: RL-generated sequences retain identifiable germline V/J assignments, embedding-level similarity to natural antibodies, and comparable developability profiles.
Inference-ready weights: A released checkpoint (GermRL-LD5) lets practitioners generate candidates without running the RL loop themselves.
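The mutation-threshold reward described in the features above can be illustrated with a minimal sketch. It assumes each sequence is already aligned to equal-length germline V-gene sequences and that "hitting a threshold" means carrying at least that many mutations from the closest germline. The function names, the `min_mutations` parameter, and the toy germlines are hypothetical; GermRL's actual reward (which also accounts for plausibility) is not reproduced here, and real pipelines assign germlines with alignment tools such as IgBLAST or ANARCI rather than raw Hamming distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GermlineHit:
    """Closest germline gene and the number of positions that differ from it."""
    name: str
    mutations: int


def hamming(seq: str, germline: str) -> int:
    """Count mismatches between two pre-aligned, equal-length sequences."""
    if len(seq) != len(germline):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(a != b for a, b in zip(seq, germline))


def closest_germline(seq: str, germlines: dict[str, str]) -> GermlineHit:
    """Pick the germline with the fewest mismatches (a crude stand-in for alignment-based assignment)."""
    best = min(germlines, key=lambda name: hamming(seq, germlines[name]))
    return GermlineHit(best, hamming(seq, germlines[best]))


def threshold_reward(seq: str, germlines: dict[str, str], min_mutations: int) -> float:
    """Hypothetical binary reward: 1.0 if the sequence is at least `min_mutations` away from its closest germline."""
    hit = closest_germline(seq, germlines)
    return 1.0 if hit.mutations >= min_mutations else 0.0


# Truncated toy "germlines" for illustration only (not curated germline database entries).
TOY_GERMLINES = {"toyV1": "EVQLVESGGG", "toyV2": "QVQLQQSGAE"}

if __name__ == "__main__":
    seq = "EVKLVESGRG"
    print(closest_germline(seq, TOY_GERMLINES))     # GermlineHit(name='toyV1', mutations=2)
    print(threshold_reward(seq, TOY_GERMLINES, 2))  # 1.0
    print(threshold_reward(seq, TOY_GERMLINES, 5))  # 0.0
```

Because the reward is binary, a low threshold such as 5 is easy for near-germline samples to meet, while a high threshold such as 35 rewards only heavily mutated sequences.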

Technical Details

GermRL builds on the ~764M-parameter ProGen2-OAS autoregressive transformer (using the open ProGen2-OAS implementation by Hrbáň et al., derived from Nijkamp et al.) and fine-tunes it with a customized GRPO algorithm. Each sequence is generated from the start token and rewarded for satisfying a target mutation distance from germline while remaining structurally plausible. The two key GRPO modifications, synchronizing policy weights once per epoch rather than once per step and sampling exclusively from the updating policy, stabilize training and curb reward hacking in the antibody setting.

On the central benchmark, GermRL reaches 0.992 pass@1 at a low threshold of 5 mutations from germline and 0.950 pass@1 at a high threshold of 35 mutations, versus 0.398 and 0.034, respectively, for the unmodified pre-trained model.

The released GermRL-LD5 checkpoint (Safetensors, F32) is the low-distance variant. Code is MIT-licensed; the released weights carry a BSD-3-Clause license and are hosted under a personal Hugging Face account.
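GRPO scores a group of sequences sampled for the same prompt (here, generation from the start token) and standardizes each reward against that group, which removes the need for a learned value model. The sketch below shows this advantage computation next to a pass@1 estimate, assuming binary threshold rewards (1.0 if a sequence meets its mutation threshold, else 0.0) and assuming pass@1 is the fraction of single samples that pass. It follows the standard GRPO formulation, not GermRL's customized variant: the per-epoch weight synchronization and policy-sampling changes are not modeled, and the function names are hypothetical.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def pass_at_1(rewards: list[float]) -> float:
    """Fraction of single samples whose binary reward shows the mutation threshold was met."""
    return sum(1 for r in rewards if r >= 1.0) / len(rewards)


if __name__ == "__main__":
    # Hypothetical group of 4 sequences sampled from the start token: 3 meet the threshold, 1 does not.
    group = [1.0, 0.0, 1.0, 1.0]
    print(pass_at_1(group))                                # 0.75
    print([round(a, 3) for a in group_advantages(group)])  # [0.577, -1.732, 0.577, 0.577]
    print(group_advantages([1.0, 1.0, 1.0, 1.0]))          # [0.0, 0.0, 0.0, 0.0]
```

A group in which every sample passes (or every sample fails) yields all-zero advantages and so contributes no reward-driven gradient; the spread of rewards within each group is what drives the policy update.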

Applications

GermRL is aimed at antibody engineers and computational immunologists who use generative models to propose novel candidates. Because near-germline sequences are over-represented in natural repertoires, off-the-shelf antibody language models under-sample the highly mutated regions of sequence space where many desirable therapeutic properties emerge. GermRL lets researchers dial in a target mutation level and generate diverse yet plausible antibodies in a single shot, supporting library design, lead diversification, and exploration of alternative evolutionary mutational patterns during early-stage therapeutic discovery.

Impact

GermRL extends germline-bias research, previously confined to masked antibody models, into the generative autoregressive regime. It demonstrates that reinforcement learning can reshape a pre-trained language model's sampling distribution without sacrificing the global properties (germline identifiability, embedding similarity, developability) that make antibodies usable. By packaging the approach as a lightweight, modular RL framework with downloadable weights, the Gray Lab makes germline-bias mitigation practical to apply to other antibody models.

As a June 2026 preprint, GermRL is early-stage: validation rests on computational metrics and pass@1 benchmarks rather than experimental affinity data, the released weights cover a single low-distance configuration, and documentation remains limited. Still, it offers a reusable recipe for navigating the antibody sequence landscape beyond germline.

Citation

GermRL: Alleviating The Germline Bias In Autoregressive Antibody Language Models Through Reinforcement Learning

Ludwig, L., et al. (2026) GermRL: Alleviating The Germline Bias In Autoregressive Antibody Language Models Through Reinforcement Learning. bioRxiv.

DOI: 10.64898/2026.06.08.730660