A promptable DNA language model that generates multi-kilobase plasmid sequences from human-readable component specifications, post-trained with verifiable rewards.
PlasmidLM is a promptable DNA language model for plasmid design, developed by McClain Thiel and Chris P. Barnes at University College London (UCL) and released as a bioRxiv preprint in May 2026. Plasmids are the workhorse vectors of molecular and synthetic biology, encoding origins of replication, selectable markers, promoters, and payloads across multiple kilobases. Designing a functional plasmid traditionally requires manually assembling validated parts and carefully checking sequence-level constraints. PlasmidLM aims to automate this process by generating complete constructs directly from a natural-language description of the desired components.
The model's central innovation is the application of verifiable-reward post-training to DNA generation. Rather than relying purely on likelihood maximization over a sequence corpus, PlasmidLM refines a pretrained autoregressive base model using Group Relative Policy Optimization (GRPO), where the reward is computed from a curated registry of sequence motifs. This connects recent advances in reinforcement-learning-based alignment of language models to the concrete, checkable requirements of a useful plasmid, such as the presence of a specified resistance gene or reporter.
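The group-relative step that gives GRPO its name can be illustrated with a minimal sketch. It assumes the commonly described formulation in which each sampled sequence's reward is normalized against the mean and standard deviation of its own sampling group; the function name, the `eps` term, and the use of the population standard deviation are illustrative choices, not details taken from the PlasmidLM preprint.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples drawn for the same prompt.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    Implementations differ on population vs. sample std; this sketch uses
    the population standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sequences generated from one prompt
example_rewards = [1.0, 0.5, 0.0, 0.5]
example_advantages = group_relative_advantages(example_rewards)
```

Samples that score above their group's average receive a positive advantage and are reinforced, while below-average samples are pushed down. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.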
PlasmidLM builds on PlasmidGPT, a related autoregressive base model pretrained on roughly 153,000 engineered plasmids from Addgene. PlasmidLM inherits that base and adds a reward-driven post-training stage, positioning it alongside other generative genomic models while targeting the specific, practically constrained problem of vector construction.
PlasmidLM is a 19.3M-parameter autoregressive transformer that operates over DNA sequence. The base model, PlasmidGPT, is pretrained on roughly 153,000 engineered plasmids sourced from Addgene, learning the statistical structure of real vector backbones and payloads. Post-training applies Group Relative Policy Optimization (GRPO), a verifiable-reward reinforcement-learning method. Candidate sequences are scored against a registry of 660 sequence motifs that encode the requested functional components, and sequences satisfying more of the specified constraints receive higher reward. The model is distributed as a fixed checkpoint. On a 1,000-prompt held-out benchmark, PlasmidLM produces a useful plasmid from a single sample 48.5% of the time, rising to 89.7% when the best of four sampled sequences is selected. Modest oversampling therefore substantially improves the rate of constraint-satisfying constructs.
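The motif-based scoring described above can be sketched as a function that checks which requested components are present in a candidate sequence. This is a toy illustration under stated assumptions: the registry entries are made-up placeholder sequences rather than real parts, and exact substring matching on either strand of a circular sequence is an assumed matching rule. The preprint's actual registry format and matching method are not reproduced here.

```python
from typing import Mapping, Sequence

# Toy registry: names and sequences are made-up placeholders, not real parts.
TOY_REGISTRY = {
    "marker_A": "ATGGCTAGC",
    "reporter_B": "GGTACCTTG",
    "origin_C": "TTGACAGCT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_motif(plasmid: str, motif: str) -> bool:
    """Exact-match search on both strands of a circular sequence."""
    # Plasmids are circular, so append the start of the sequence to its end
    # to catch motifs that span the arbitrary linearization point.
    circular = plasmid + plasmid[: len(motif) - 1]
    return motif in circular or reverse_complement(motif) in circular


def motif_reward(
    plasmid: str,
    requested: Sequence[str],
    registry: Mapping[str, str] = TOY_REGISTRY,
) -> float:
    """Fraction of requested components whose motif is found (assumed reward shape)."""
    if not requested:
        return 0.0
    plasmid = plasmid.upper()
    hits = sum(contains_motif(plasmid, registry[name]) for name in requested)
    return hits / len(requested)
```

A fractional score of this kind gives partial credit, so a sequence containing two of three requested components ranks above one containing none. Real functional elements often tolerate sequence variation, so a production reward would likely need alignment-based or fuzzy matching rather than exact substrings.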
PlasmidLM is aimed at molecular biologists and synthetic biology engineers who need to assemble plasmid vectors from a high-level description rather than manually curating parts. By translating specifications such as copy number, host organism, resistance marker, and reporter into candidate full-length sequences, it can accelerate early-stage construct design, support rapid iteration over vector variants, and lower the expertise barrier for routine cloning workflows. The best-of-N sampling strategy makes it practical to generate several candidates and select one that meets the requested constraints before downstream synthesis and validation.
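Best-of-N selection itself is simple to express. The sketch below assumes two hypothetical callables, `generate` (draws one candidate sequence) and `score` (rates it, for example with a motif-based check); neither corresponds to a documented PlasmidLM interface. The `gc_fraction` scorer is only a stand-in so the example runs, and `independent_best_of_n_rate` gives a reference value under the assumption that samples succeed independently.

```python
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[], str],
    score: Callable[[str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Draw n candidates and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    scored = [(score(candidate), candidate) for candidate in (generate() for _ in range(n))]
    # max() keeps the first candidate when several share the top score
    best_score, best_candidate = max(scored, key=lambda pair: pair[0])
    return best_candidate, best_score


def gc_fraction(seq: str) -> float:
    """Stand-in scorer for the demo: fraction of G/C bases (not a real plasmid reward)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def independent_best_of_n_rate(p: float, n: int) -> float:
    """Chance that at least one of n samples succeeds, assuming independence."""
    return 1.0 - (1.0 - p) ** n
```

If the four samples were independent and each succeeded at the reported single-sample rate of 48.5%, the best-of-four rate would be 1 - (1 - 0.485)^4, or about 93.0%. The reported 89.7% is somewhat lower. One possible reading is that success rates vary across prompts, with some specifications harder to satisfy than others, although the source does not analyze this.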
PlasmidLM demonstrates that verifiable-reward post-training, an approach popularized for aligning general-purpose language models, can be transferred to genomic sequence generation, where success is defined by checkable biological constraints rather than human preference. By coupling a domain-specific pretrained base (PlasmidGPT) with a motif-based reward, it offers a template for steering generative DNA models toward functional, specification-compliant outputs. As a compact, openly described model with released code and weights, it provides a reproducible starting point for further work on controllable plasmid and vector design. The licensing terms for the released weights were not confirmed at the time of writing.
Thiel, M. & Barnes, C. P. (2026) PlasmidLM: A Promptable DNA Language Model via Verifiable-Reward Post-Training. bioRxiv.
DOI: 10.64898/2026.05.19.725242