bio.rodeo


DNA & Gene foundation models

PatchDNA

Relation Therapeutics

A DNA language model that replaces fixed tokenization with conservation-guided patching, letting models up to 10x smaller match or beat state-of-the-art results on genomic benchmarks.

Released: March 2026
Parameters: 19.2 Million

DNA language models inherit a design choice from natural-language processing: tokenization. Whether using single nucleotides, fixed k-mers, or byte-pair encoding, these schemes are decided before training and frozen into the model, often splitting the genome in ways that ignore biology and forcing larger models and longer context windows to compensate. PatchDNA argues that this fixed tokenization is a bottleneck and replaces it with a flexible, biologically informed alternative called patching.

Developed at Relation Therapeutics and presented as an ICLR 2026 paper (with a bioRxiv preprint), PatchDNA segments DNA into contiguous, variable-length patches rather than a fixed vocabulary of tokens. During pretraining, patch boundaries are guided by evolutionary conservation scores, concentrating the model's capacity on functionally important, conserved regions while compressing less informative stretches. Crucially, because patching is a preprocessing strategy rather than a learned vocabulary, the boundaries can be changed at inference time without retraining the model, a flexibility that fixed tokenizers cannot offer.
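To make the idea of conservation-guided boundaries concrete, here is a minimal sketch. It is not PatchDNA's actual boundary rule, which this page does not specify. The function name `conservation_patches`, the `threshold` cutoff, and the `max_len` cap are all hypothetical. The assumed rule gives every conserved position (score at or above the threshold) its own single-nucleotide patch and merges unconserved positions into patches of at most `max_len` nucleotides.

```python
from typing import List, Sequence, Tuple


def conservation_patches(
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    max_len: int = 8,        # hypothetical cap on patch length
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) patches covering every position once.

    Assumed rule: a boundary is placed before and after each conserved
    position, and unconserved runs are cut every max_len nucleotides.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    patches: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(scores)):
        conserved_here = scores[i] >= threshold
        conserved_prev = scores[i - 1] >= threshold
        if conserved_here or conserved_prev or i - start >= max_len:
            patches.append((start, i))
            start = i
    if len(scores) > 0:
        patches.append((start, len(scores)))  # close the final patch
    return patches


# Toy example: positions 2 and 3 are "conserved" (illustrative scores only).
sequence = "ACGTACGTA"
scores = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
patches = conservation_patches(scores, threshold=0.5, max_len=4)
pieces = [sequence[s:e] for s, e in patches]
print(pieces)
```

In this toy run the two conserved nucleotides become patches of length one, while the unconserved stretch after them is compressed into a single four-nucleotide patch. In this sketch, that is how model capacity ends up concentrated on conserved regions.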

The headline result is efficiency: the authors report that models up to an order of magnitude smaller than current state-of-the-art systems match or surpass them on established DNA benchmarks. The approach builds on the byte-latent-transformer line of work while grounding patch boundaries in genomic conservation.

#Key Features

  • Conservation-guided patching: Patch boundaries are placed using evolutionary conservation scores, focusing model capacity on functionally important regions instead of arbitrary fixed tokens.
  • Tokenization-free flexibility: Patching replaces a frozen vocabulary, so the segmentation scheme can be altered at inference time without any retraining.
  • Extreme parameter efficiency: Models up to 10x smaller than prior approaches are reported to reach or exceed state-of-the-art accuracy on DNA benchmarks.
  • Long-range context: A 7.7M-parameter variant operates over a 131 kbp context window, enabling whole-locus modeling at very low parameter cost.

#Technical Details

PatchDNA pretrains transformer models that consume variable-length DNA patches whose boundaries are derived from evolutionary conservation scores. The work releases two main configurations: a 19.2M-parameter model with a 16 kbp context window and a 7.7M-parameter model with a 131 kbp context window. The smaller, long-context variant is reported to outperform baseline long-sequence models on 6 of 7 tasks in the Genomics Long Range Benchmark, and across standard DNA benchmarks PatchDNA matches or surpasses larger state-of-the-art models. Because patches are computed rather than learned as a fixed vocabulary, the patching strategy is a post-hoc, adjustable component, allowing the same trained model to be re-segmented for different downstream needs.
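The sketch below illustrates what "consuming patches" means for sequence length: the transformer sees one vector per patch rather than one per nucleotide. This page does not describe how PatchDNA embeds patches, and byte-latent-transformer-style models typically use a learned local encoder for this step. Plain mean pooling over one-hot nucleotides is swapped in here as a simple stand-in. The names `ONE_HOT` and `pool_patches` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One-hot encoding over A, C, G, T; unknown bases get a uniform vector.
ONE_HOT: Dict[str, Tuple[float, float, float, float]] = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}
UNKNOWN = (0.25, 0.25, 0.25, 0.25)


def pool_patches(
    seq: str, patches: Sequence[Tuple[int, int]]
) -> List[List[float]]:
    """Mean-pool one-hot nucleotide vectors inside each (start, end) patch.

    Stand-in for a learned patch encoder: the output has one vector per
    patch, so the downstream model sees len(patches) positions, not len(seq).
    """
    pooled: List[List[float]] = []
    for start, end in patches:
        if end <= start:
            raise ValueError("patches must be non-empty half-open intervals")
        totals = [0.0, 0.0, 0.0, 0.0]
        for base in seq[start:end]:
            for k, value in enumerate(ONE_HOT.get(base, UNKNOWN)):
                totals[k] += value
        length = end - start
        pooled.append([t / length for t in totals])
    return pooled


# Eight nucleotides collapse to four patch vectors (illustrative boundaries).
seq = "ACGTAAAA"
patch_vectors = pool_patches(seq, [(0, 1), (1, 2), (2, 4), (4, 8)])
print(len(seq), "nucleotides ->", len(patch_vectors), "patch vectors")
```

Under this assumption, the cost of attention depends on the number of patches, not the number of nucleotides. That is one plausible reading of how a small model can cover a long context window.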

#Applications

PatchDNA is aimed at genomics researchers and computational-biology teams who need efficient DNA foundation models for tasks such as regulatory-element annotation, variant effect prediction, and long-range genomic context modeling. Its parameter efficiency and long context make it attractive where compute or memory is constrained, or where modeling large genomic loci end-to-end matters. The ability to change patching at inference time is particularly useful for adapting a single pretrained model to new tasks or resolutions without the cost of retraining.
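Changing the patching at inference time can be pictured as swapping the boundary function in front of an unchanged model. The sketch below is illustrative only and does not use any PatchDNA API. It compares two hypothetical strategies: `fixed_patches` gives a uniform resolution, and `region_patches` gives single-nucleotide resolution inside user-chosen regions of interest with coarser patches elsewhere.

```python
from typing import List, Sequence, Tuple

Patch = Tuple[int, int]  # half-open interval (start, end)


def fixed_patches(n: int, size: int) -> List[Patch]:
    """Uniform patches of `size` nucleotides (the last one may be shorter)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [(s, min(s + size, n)) for s in range(0, n, size)]


def region_patches(
    n: int, fine_regions: Sequence[Patch], coarse: int
) -> List[Patch]:
    """Single-nucleotide patches inside fine_regions, coarse patches elsewhere."""
    if coarse < 1:
        raise ValueError("coarse must be at least 1")
    fine = [False] * n
    for lo, hi in fine_regions:
        for i in range(max(lo, 0), min(hi, n)):
            fine[i] = True
    patches: List[Patch] = []
    start = 0
    for i in range(1, n + 1):
        at_end = i == n
        # Cut at the sequence end, around fine positions, or at the coarse cap.
        if at_end or fine[i] or fine[i - 1] or i - start >= coarse:
            patches.append((start, i))
            start = i
    return patches


# Same 12-nucleotide input, two segmentations chosen at inference time.
uniform = fixed_patches(12, 4)
focused = region_patches(12, fine_regions=[(4, 6)], coarse=4)
print(uniform)
print(focused)
```

Both outputs cover the same sequence. Only the resolution differs, which is the kind of adjustment the page describes as possible without retraining.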

#Impact

PatchDNA challenges the assumption that DNA language models must scale up parameters and vocabularies to improve, showing instead that biologically informed, flexible input representations can deliver state-of-the-art results at a fraction of the size. Its inference-time adjustability reframes tokenization from a fixed architectural commitment into a tunable knob, which could influence how future genomic foundation models are designed. Because the work is a recent preprint and conference paper, the reported gains await broader independent replication, and the public availability of code and weights was not confirmed at the time of writing.

Tags

variant_effect_prediction, representation_learning, transformer, foundation_model, self_supervised, genomics, dna