Generative multimodal foundation model that jointly models DNA, RNA, protein, and cellular context across six biological modalities, with state-of-the-art splicing prediction.
MIMIC is a generative multimodal foundation model that jointly represents the molecules of the central dogma—DNA, RNA, and protein—together with the regulatory, evolutionary, structural, and contextual signals that constrain them. Most biological foundation models are trained on a single modality (a protein language model, a genomic sequence model, an RNA structure predictor) and therefore cannot reason about how a coding change propagates into altered splicing, structure, or function. MIMIC instead conditions on arbitrary subsets of observed modalities and reconstructs or generates the missing components of a molecular state, allowing any-to-any inference across the genome, transcriptome, and proteome.
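The any-to-any setup described above can be pictured as splitting one molecular record into a conditioning set and a set of components to be generated. The following is a minimal sketch of that bookkeeping only, not the authors' interface: the function name `split_conditioning`, the modality keys, and the dictionary layout are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def split_conditioning(
    record: Dict[str, Optional[str]],
    condition_on: Iterable[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split a record into observed inputs and modalities left to generate.

    Hypothetical helper: modality names and record layout are assumptions,
    not part of any released MIMIC code.
    """
    requested = set(condition_on)
    unknown = requested - set(record)
    if unknown:
        raise KeyError(f"unknown modalities: {sorted(unknown)}")

    # Only modalities that are both requested and actually measured
    # can be used as conditioning inputs.
    inputs = {
        name: value
        for name, value in record.items()
        if name in requested and value is not None
    }
    # Everything else is a generation or reconstruction target.
    targets = [name for name in record if name not in inputs]
    return inputs, targets


# Example: DNA is observed; RNA and protein are to be inferred.
record = {"dna": "ATGGCC", "rna": "AUGGCC", "protein": None}
inputs, targets = split_conditioning(record, ["dna"])
print(inputs, targets)
```

Changing the `condition_on` argument is all it takes to move between tasks in this sketch, which mirrors the idea that a single model serves many input and output combinations.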
The model was developed by Polymathic AI, a research collaboration based at the Flatiron Institute, with computation supported by the Simons Foundation and Schmidt Sciences' AI2050 program. It was released as an arXiv preprint in April 2026 by Siavash Golkar, Shirley Ho, and roughly 30 collaborators. MIMIC is positioned as a unifying alternative to the fragmented landscape of modality-specific biological models, and the authors report that cross-modal supervision improves performance over sequence-only training.
A central claim of the work is that coupled constraints across sequence, structure, regulation, evolution, and cellular context are best learned jointly. By aligning these modalities during pretraining, MIMIC produces representations that transfer to RNA and protein downstream tasks and that enable constrained generative design rather than prediction alone.
MIMIC is a roughly one-billion-parameter split-track encoder-decoder transformer. Rather than being concatenated into one sequence, inputs are organized into distinct track groups by biological coordinate system. Position indices use localized RoPE and reset at track boundaries, while learnable register tokens aggregate context across tracks. Training uses around 25 distinct pathways to ensure rare modality combinations are represented, along with a staged curriculum that scales context windows from 1,000 to 10,000 tokens. The model is trained on LORE, a newly curated cross-modal dataset linking nucleic-acid, protein, evolutionary, structural, regulatory, and semantic modalities. LORE comprises 13 million RNA transcripts, 15.5 million proteins, and over 4 billion natural-language tokens, and covers more than 6,000 organisms.
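To make the idea of position indices that reset at track boundaries concrete, here is a minimal sketch that builds such indices for a set of tracks. It illustrates the indexing scheme only and is not the authors' implementation: the function name, the track names, and in particular the choice to place register tokens first and give them position 0 are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def track_local_positions(
    track_lengths: Dict[str, int],
    n_registers: int = 0,
) -> Tuple[List[str], List[int]]:
    """Return per-token track labels and positions that restart at 0 per track.

    Assumption (not from the source): register tokens are prepended and all
    share position 0, since they are not tied to any coordinate system.
    """
    labels: List[str] = []
    positions: List[int] = []

    for _ in range(n_registers):
        labels.append("register")
        positions.append(0)

    # Each track counts from 0 independently, so a token's index reflects
    # its offset within its own coordinate system, not within the full input.
    for name, length in track_lengths.items():
        for offset in range(length):
            labels.append(name)
            positions.append(offset)

    return labels, positions


labels, positions = track_local_positions({"dna": 4, "protein": 3}, n_registers=2)
for label, pos in zip(labels, positions):
    print(f"{label:>8} {pos}")
```

In a rotary-embedding setup, indices like these would be fed to the rotation step in place of a single global counter, so that relative offsets are meaningful within a track but carry no spurious ordering between tracks.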
MIMIC targets researchers studying gene regulation, RNA biology, and protein function who need a single model spanning the central dogma. Demonstrated use cases include identifying RNA editing in clinically relevant mutations using evolutionary and structural signals, designing proteins with multimodal conditioning for target binding, predicting and inversely designing splice patterns, and modeling experimental-context-dependent RNA reactivity. Because it conditions on partial observations, it fits naturally into workflows where some modalities are measured and others must be inferred or designed.
MIMIC offers early evidence that jointly modeling the central dogma yields better downstream performance and richer generative capabilities than modality-specific models, particularly for splicing, where it surpasses established baselines such as SpliceAI and AlphaGenome. As a unifying any-to-any framework, it suggests a path toward integrated biological foundation models. A key limitation at release is availability: the authors state that code, weights, and LORE assets are in preparation for public release on the Polymathic AI GitHub but are not yet downloadable. The results are also reported in a non-peer-reviewed preprint, so independent reproduction remains pending.