Birkbeck, University of London
Autoregressive temporal-convolutional generative model for synthetic Saccharomyces cerevisiae promoter design, guided during training by a pretrained sequence-to-expression predictor.
Gen-DNA-TCN is a generative model for designing synthetic promoter sequences in the budding yeast Saccharomyces cerevisiae. Promoters are the regulatory DNA elements that control when and how strongly a gene is transcribed, and designing synthetic promoters with desired expression behavior is a core task in synthetic biology and metabolic engineering. The challenge is to generate sequences that are simultaneously novel, diverse, and functional, meaning they carry realistic arrangements of transcription factor binding sites (TFBS) and drive expression in the intended range.
Gen-DNA-TCN was developed in the School of Computing and Mathematical Sciences at Birkbeck, University of London, and posted to bioRxiv in October 2024 (with an updated version in 2026). It is an autoregressive generative model built on temporal convolutional networks (TCNs). Its distinguishing idea is to use a pretrained sequence-to-expression predictor while training the generator, so that the generator learns to produce DNA that is not only realistic but also associated with the desired expression level. This connects generative sequence modeling to the large body of work mapping yeast promoter sequence to expression level.
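The summary above does not say how the predictor enters the training objective. One common pattern, shown here purely as an assumption and not as the authors' method, is to add a penalty on the gap between predicted and target expression to the usual next-nucleotide negative log-likelihood. The names `guided_loss`, `guidance_weight`, and `target_expression` are hypothetical, and the numbers are toy values.

```python
import math

def sequence_nll(step_probs):
    """Negative log-likelihood of one sequence under an autoregressive model.

    step_probs[t] is the probability the model assigned to the observed
    nucleotide at position t, given all earlier positions.
    """
    return -sum(math.log(p) for p in step_probs)

def guided_loss(step_probs, predicted_expression, target_expression,
                guidance_weight=1.0):
    """Hypothetical combined objective: likelihood term plus expression penalty.

    ASSUMPTION: a squared-error penalty weighted by guidance_weight. The
    source does not state the form of the guidance term.
    """
    penalty = (predicted_expression - target_expression) ** 2
    return sequence_nll(step_probs) + guidance_weight * penalty

# Toy numbers only: four positions, each predicted with probability 0.5,
# and a predictor output of 8.0 against a target of 10.0.
loss = guided_loss([0.5, 0.5, 0.5, 0.5],
                   predicted_expression=8.0,
                   target_expression=10.0,
                   guidance_weight=0.1)
```

In a gradient-based setup the penalty would also need to be differentiable with respect to the generator's outputs, which this scalar sketch does not address.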
The model fits within the broader landscape of regulatory-DNA generative models, where autoregressive language-model-style approaches have been used to design realistic regulatory sequences. Gen-DNA-TCN specializes this idea to yeast promoters using a convolutional autoregressive backbone and expression-aware guidance.
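To make the "convolutional autoregressive backbone" concrete, the following is a minimal pure-Python sketch of the causal, dilated 1D convolution that TCNs are generally built from. It is not the Gen-DNA-TCN architecture: the kernel values, the two-layer stack, and the single-channel input are illustrative assumptions, and residual connections and nonlinearities are omitted.

```python
def causal_dilated_conv(signal, kernel, dilation=1):
    """1D causal convolution: output[t] uses signal[t], signal[t-d], signal[t-2d], ...

    Positions before the start of the sequence are treated as zeros
    (left padding), so no output ever depends on a future input.
    """
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * signal[idx]
        out.append(total)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input positions visible to one output after stacking layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Toy single-channel input: 1.0 where the sequence has an 'A', else 0.0.
seq = "ACGTAAGC"
a_channel = [1.0 if base == "A" else 0.0 for base in seq]

# Two stacked layers with dilations 1 and 2 (illustrative kernel values).
layer1 = causal_dilated_conv(a_channel, kernel=[0.5, 0.5], dilation=1)
layer2 = causal_dilated_conv(layer1, kernel=[0.5, 0.5], dilation=2)

# Causality check: changing position 6 must leave outputs 0..5 unchanged.
perturbed = list(a_channel)
perturbed[6] = 1.0
layer2_perturbed = causal_dilated_conv(
    causal_dilated_conv(perturbed, [0.5, 0.5], 1), [0.5, 0.5], 2)
assert layer2[:6] == layer2_perturbed[:6]
```

Stacking layers with growing dilation widens the receptive field quickly; here `receptive_field(2, [1, 2])` is 4 positions. For next-nucleotide prediction, an autoregressive model additionally has to ensure the output for position t sees only positions before t, typically by shifting the input by one.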
Gen-DNA-TCN is an autoregressive sequence generator whose backbone is a temporal convolutional network, trained to model S. cerevisiae promoter sequences position by position. During training, a separately pretrained sequence-to-expression predictive model is incorporated to guide generation toward sequences with desired expression characteristics. The authors evaluate generated promoters on properties expected of functional regulatory DNA, reporting that synthetic sequences encode TFBS distributions comparable to real promoters while remaining novel and diverse relative to the training set. The study is restricted to yeast promoters: the model is trained and evaluated within this single organism and regulatory context. Weights and code are not reported as released at preprint time.
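The evaluation criteria named above (novelty, diversity, TFBS content) can be illustrated with simple stand-in metrics. The sketch below is an assumption about how such properties might be quantified, not the authors' evaluation protocol: it uses Hamming distance on equal-length sequences and exact-match motif counting, whereas real TFBS analyses typically scan with position weight matrices. The sequences and the `TATA` motif are toy examples.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def novelty(generated, training):
    """Mean normalized distance from each generated sequence to its nearest training sequence."""
    nearest = [min(hamming(g, t) for t in training) / len(g) for g in generated]
    return sum(nearest) / len(nearest)

def diversity(generated):
    """Mean normalized pairwise Hamming distance within the generated set."""
    pairs = list(combinations(generated, 2))
    return sum(hamming(a, b) / len(a) for a, b in pairs) / len(pairs)

def motif_frequency(seqs, motif):
    """Average number of exact (possibly overlapping) motif matches per sequence."""
    def count(seq):
        return sum(seq.startswith(motif, i)
                   for i in range(len(seq) - len(motif) + 1))
    return sum(count(s) for s in seqs) / len(seqs)

# Toy 8-bp sequences, far shorter than real promoters.
training = ["TATAAAGG", "GGCCTATA"]
generated = ["TATAAACC", "GGCGTATA"]

novelty_score = novelty(generated, training)
diversity_score = diversity(generated)
motif_real = motif_frequency(training, "TATA")
motif_synthetic = motif_frequency(generated, "TATA")
```

Comparing `motif_real` with `motif_synthetic` across many motifs gives a rough picture of whether generated sequences carry motif content similar to the training promoters.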
Gen-DNA-TCN serves synthetic biologists and metabolic engineers who need custom yeast promoters with tunable, realistic regulatory behavior, for example to balance expression of pathway enzymes or to build genetic circuits in S. cerevisiae. By generating diverse candidate promoters whose TFBS content and predicted expression resemble natural sequences, it can supply design libraries for downstream experimental screening, reducing reliance on hand-curated or randomly mutated promoter variants.
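As a rough illustration of turning generated candidates into a screening library, the sketch below keeps unique sequences whose predicted score falls inside a target window and drops any that already appear in a reference set. The function `build_library` is hypothetical, and `gc_fraction` is only a placeholder scoring function standing in for a sequence-to-expression predictor; it carries no biological meaning here.

```python
def gc_fraction(seq):
    """Placeholder score; NOT a real expression predictor."""
    return sum(base in "GC" for base in seq) / len(seq)

def build_library(candidates, predictor, low, high, exclude=()):
    """Keep unique candidates whose predicted score lies in [low, high].

    Sequences listed in `exclude` (for example, training promoters) are
    skipped so the library contains only novel designs.
    """
    seen = set(exclude)
    library = []
    for seq in candidates:
        if seq in seen:
            continue  # duplicate or already-known sequence
        seen.add(seq)
        score = predictor(seq)
        if low <= score <= high:
            library.append((seq, score))
    # Sort by score so a screen can sample across the target range.
    return sorted(library, key=lambda item: item[1])

# Toy candidates, including one duplicate and one known sequence.
candidates = ["ATATATAT", "GCGCATAT", "GCGCATAT", "GCGCGCGC", "ATGCATGC"]
library = build_library(candidates, gc_fraction, low=0.4, high=0.6,
                        exclude={"ATGCATGC"})
```

Swapping `gc_fraction` for a trained predictor's scoring call would leave the selection logic unchanged.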
Gen-DNA-TCN demonstrates that coupling a temporal-convolutional autoregressive generator with a pretrained sequence-to-expression predictor can yield synthetic yeast promoters that respect natural regulatory grammar while remaining novel. Its contribution is methodological and domain-focused: the scope is limited to S. cerevisiae promoter design, and the absence of released weights or code constrains immediate adoption. Within yeast synthetic biology, it adds to the toolkit of expression-aware generative models for regulatory DNA and illustrates a guidance strategy that ties sequence generation to functional readouts.