bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
DNA & Gene foundation models

Central Dogma Transformer

Nobuyuki Ota

A multimodal architecture that couples pretrained DNA, RNA, and protein language models via directional cross-attention following the central dogma to form a unified Virtual Cell Embedding.

Released: January 2026

The Central Dogma Transformer (CDT) is a mechanism-oriented architecture that tries to model cellular information flow the way molecular biology describes it: DNA is transcribed into RNA, and RNA is translated into protein. Rather than training a single monolithic sequence model, CDT integrates three separate pretrained language models (one each for DNA, RNA, and protein) and connects them with directional cross-attention modules that mirror the central dogma. DNA-to-RNA attention is intended to capture transcriptional regulation, RNA-to-protein attention is intended to capture translational relationships, and the combined signal is distilled into a unified representation the author calls a Virtual Cell Embedding.

CDT was developed and released as a single-author preprint by Nobuyuki Ota in January 2026. It is explicitly framed as a proof-of-concept ("CDT v1") and a step toward mechanism-oriented AI for cellular understanding, rather than a production foundation model. The design philosophy contrasts with purely data-driven multimodal models by hard-wiring the directionality of the central dogma into the attention structure, which the author argues yields more interpretable, biologically grounded representations.

The work sits at the intersection of genomic, transcriptomic, and proteomic language modeling, and is positioned as a bridge between single-modality foundation models (such as DNA, RNA, and protein language models) and the emerging goal of integrated "virtual cell" representations.

#Key Features

  • Directional cross-attention along the central dogma: DNA-to-RNA attention models transcriptional regulation and RNA-to-protein attention models translational relationships, encoding biological directionality into the architecture rather than learning it implicitly.
  • Frozen backbones, trained connectors: The pretrained DNA, RNA, and protein language models are kept frozen; only the cross-attention modules are trained, keeping the approach lightweight and modular.
  • Unified Virtual Cell Embedding: The three modalities are fused into a single representation intended to summarize a cell's molecular state.
  • Built-in interpretability: Attention and gradient analyses provide complementary mechanistic insight, including identification of a CTCF binding site corroborated by Hi-C data.
  • Honest proof-of-concept scope: CDT v1 uses fixed, non-cell-specific RNA and protein embeddings, a limitation the author states explicitly.
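The first three features can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: it assumes single-head scaled dot-product attention, a residual connection, queries taken from the downstream modality with keys and values from the upstream one, and mean pooling of the final tokens into the embedding. Random matrices stand in for the frozen language-model outputs, and every name here (`cross_attention`, `virtual_cell_embedding`, the dimensions) is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_attention(target, source, w_q, w_k, w_v):
    """Target tokens (queries) read from source tokens (keys and values)."""
    q = matmul(target, w_q)
    k = matmul(source, w_k)
    v = matmul(source, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Residual connection (an assumption): keep the target's own features.
    updated = [[t + a for t, a in zip(t_row, a_row)]
               for t_row, a_row in zip(target, attended)]
    return updated, weights


rng = random.Random(0)
dim = 4

# Stand-ins for frozen backbone outputs (tokens x dim); these are never updated.
dna_tokens = random_matrix(6, dim, rng)
rna_tokens = random_matrix(3, dim, rng)
protein_tokens = random_matrix(2, dim, rng)

# Only these projection matrices would be trained in this sketch.
dna_to_rna_params = [random_matrix(dim, dim, rng) for _ in range(3)]
rna_to_protein_params = [random_matrix(dim, dim, rng) for _ in range(3)]

# Information flows DNA -> RNA -> protein and never in the reverse direction.
rna_context, attn_dna_to_rna = cross_attention(rna_tokens, dna_tokens, *dna_to_rna_params)
protein_context, attn_rna_to_protein = cross_attention(
    protein_tokens, rna_context, *rna_to_protein_params)

# Mean pooling into one vector is an assumed fusion step.
virtual_cell_embedding = [sum(col) / len(col) for col in zip(*protein_context)]
```

Because the projection matrices are the only trainable parameters in this sketch, swapping one backbone for another would leave the rest untouched, which is the modularity the feature list describes. The returned attention weights (`attn_dna_to_rna`, `attn_rna_to_protein`) are the quantities an attention-inspection analysis would examine.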

#Technical Details

CDT is a transformer-based multimodal model that wires together three frozen pretrained language models with trainable directional cross-attention layers. In the v1 proof of concept, the RNA and protein embeddings are fixed rather than cell-state-specific, so the learned coupling is concentrated in the cross-attention connectors. The model was validated on CRISPRi enhancer perturbation data from K562 cells, where it predicted perturbation effects with a Pearson correlation of r = 0.503. That is about 63% of an estimated theoretical ceiling of r = 0.797 set by cross-experiment variability (0.503 / 0.797 ≈ 0.63). Interpretability analyses combined attention inspection with gradient attribution; the gradient analysis surfaced a CTCF binding site that was consistent with Hi-C chromatin contact evidence, supporting the claim that the architecture captures biologically meaningful regulatory signal.
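The headline metric is easy to reproduce by hand. The sketch below shows how a Pearson correlation is computed and how the reported value relates to the estimated ceiling. Only the two constants 0.503 and 0.797 come from the text above; the toy effect values are invented purely to exercise the function and are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (hypothetical perturbation effects).
toy_observed = [0.0, -0.1, -0.4, -0.2, -0.8]
toy_predicted = [-0.05, -0.2, -0.3, -0.1, -0.6]
toy_r = pearson_r(toy_observed, toy_predicted)

# Values reported for CDT v1 on the K562 CRISPRi enhancer data.
reported_r = 0.503
ceiling_r = 0.797
fraction_of_ceiling = reported_r / ceiling_r
print(f"{fraction_of_ceiling:.1%}")  # 63.1%
```

The 63% figure is a ratio of correlation coefficients, so it compares the model with the ceiling on the correlation scale rather than as a share of variance explained.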

#Applications

CDT is aimed at researchers interested in modeling regulatory information flow across DNA, RNA, and protein within a single framework, particularly for predicting the effects of genomic perturbations such as enhancer CRISPRi screens. Its Virtual Cell Embedding could serve as a feature representation for downstream functional genomics tasks, and its interpretability tooling makes it useful for hypothesis generation about transcriptional regulation, for example locating candidate regulatory elements like CTCF sites. As a v1 prototype it is best suited to methodological exploration rather than turnkey deployment.
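One common way to locate candidate regulatory positions with gradients is input-gradient saliency: score each input position by how strongly the prediction changes when that position's features are nudged. The sketch below is a generic illustration under stated assumptions, not the attribution method used in the paper. `toy_effect_model` is a hypothetical stand-in for a trained predictor, and gradients are approximated with central finite differences, where a real implementation would normally use automatic differentiation.

```python
import math


def toy_effect_model(positions):
    """Hypothetical predictor: per-position feature vectors -> one scalar score.

    Position 2 is given a large weight by construction, so a correct
    saliency method should rank it first.
    """
    weights = [0.1, 0.1, 2.0, 0.1, 0.1]
    total = sum(w * sum(vec) for w, vec in zip(weights, positions))
    return math.tanh(total)


def saliency(model, positions, eps=1e-5):
    """L2 norm of the finite-difference gradient at each input position."""
    scores = []
    for i, vec in enumerate(positions):
        grad_sq = 0.0
        for j in range(len(vec)):
            up = [list(v) for v in positions]
            down = [list(v) for v in positions]
            up[i][j] += eps
            down[i][j] -= eps
            grad = (model(up) - model(down)) / (2 * eps)
            grad_sq += grad * grad
        scores.append(math.sqrt(grad_sq))
    return scores


# Five sequence positions with two features each (made-up numbers).
positions = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1], [-0.1, 0.0], [0.05, 0.05]]
scores = saliency(toy_effect_model, positions)
top_position = max(range(len(scores)), key=scores.__getitem__)
```

A position that stands out in such a ranking is only a hypothesis. In the work described here, the highlighted CTCF site was checked against independent Hi-C contact evidence.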

#Impact

CDT contributes a biologically structured alternative to generic multimodal fusion by encoding the directionality of the central dogma directly into model attention. Its early validation on K562 enhancer perturbation data and its interpretability results are promising signals for mechanism-oriented modeling of the cell. However, the work is a single-author preprint with a clearly stated proof-of-concept scope, fixed non-cell-specific embeddings in v1, and no public code, weights, or license located at the time of writing, so its broader adoption and influence remain to be demonstrated.

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Overall score: 22 (Closed)
Usability (can I run it?): 15
Reproducibility (can I retrain it?): 14
Model Openness Framework: Unclassified (missing required components)

Tags

gene_expression, variant_effect_prediction, representation_learning, transformer, multimodal, genomics, cell_biology

Resources

Research Paper