A 52.3M-parameter bimodal masked language model that jointly learns representations of bulk RNA-seq expression and DNA methylation for cancer genomics.
MOJO (MultiOmics JOint representation learning) is a bimodal foundation model developed by InstaDeep that jointly learns representations of two complementary omics layers: bulk RNA-seq gene expression and DNA methylation. Cancer is driven by both transcriptional dysregulation and aberrant epigenetic states, yet most omics foundation models operate on a single modality. MOJO addresses this gap by training a shared encoder that captures the coordinated signal across expression and methylation, producing patient-level embeddings useful for downstream oncology tasks.
Released in June 2025 as a bioRxiv preprint and presented at the ICML 2025 Workshop on Generative AI and Biology, MOJO builds directly on InstaDeep's earlier BulkRNABert, a unimodal bulk RNA-seq encoder. Where BulkRNABert demonstrated that masked language modeling over binned expression profiles yields transferable embeddings, MOJO extends that recipe to a second modality and tackles the practical reality that paired multi-omics data is often incomplete in clinical settings.
The model sits in the growing landscape of transcriptomics and epigenomics foundation models, but is distinguished by its explicit bimodal training objective and its engineering for robustness when one modality is missing at inference time.
MOJO is a 52.3M-parameter transformer encoder trained with a BERT-style masked language modeling objective applied jointly to bulk RNA-seq and DNA methylation inputs. Expression and methylation values are discretized into bins and tokenized, and the model reconstructs masked tokens across both modalities simultaneously, forcing the shared encoder to integrate cross-modal structure. Pretraining uses paired multi-omics samples from The Cancer Genome Atlas (TCGA); the BulkRNABert predecessor (approximately 6M parameters) was additionally pretrained on GTEx and ENCODE expression data. During fine-tuning, a mutual-information minimization regularizer is added to improve resilience to missing modalities. Evaluation centers on TCGA cancer-type classification and survival analysis, where the bimodal representation is compared against unimodal baselines including BulkRNABert.
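The binning and masking steps described above can be illustrated with a minimal, dependency-free sketch. It is not MOJO's actual preprocessing: the bin count, value ranges, equal-width binning, masking rate, `MASK_ID`, and the `-100` ignore label are all assumptions chosen for illustration, and how the two token streams are combined inside the encoder is not covered here.

```python
import bisect
import math
import random

N_BINS = 64        # hypothetical number of bins per modality
MASK_ID = N_BINS   # hypothetical token id reserved for [MASK]
IGNORE = -100      # hypothetical label for positions excluded from the loss


def make_bin_edges(lo, hi, n_bins):
    """Return the n_bins - 1 interior edges of equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def tokenize(values, edges):
    """Map each continuous value to a bin index in 0 .. len(edges)."""
    return [bisect.bisect_right(edges, v) for v in values]


def mask_tokens(tokens, mask_prob, rng):
    """Mask a random subset of tokens.

    Returns (inputs, labels): masked positions hold MASK_ID in inputs and the
    original token in labels; unmasked positions keep the token and get IGNORE.
    """
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE)
    return inputs, labels


rng = random.Random(0)

# Toy inputs: log-transformed expression values and methylation beta values.
expression = [math.log1p(x) for x in (0.0, 3.2, 150.0, 12.5)]
methylation = [0.02, 0.48, 0.97, 0.75]

rna_tokens = tokenize(expression, make_bin_edges(0.0, 10.0, N_BINS))
meth_tokens = tokenize(methylation, make_bin_edges(0.0, 1.0, N_BINS))

# Both modalities are masked, so a model trained on these pairs has to
# reconstruct missing tokens in either stream.
rna_inputs, rna_labels = mask_tokens(rna_tokens, 0.15, rng)
meth_inputs, meth_labels = mask_tokens(meth_tokens, 0.15, rng)
```

Equal-width bins are used only because they are the simplest choice; quantile-based bins would fit the same interface by swapping `make_bin_edges`.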
MOJO is aimed at computational oncology and translational research groups working with TCGA-style multi-omics cohorts. Its patient-level embeddings support cancer-type classification, survival and risk modeling, and exploratory analyses such as patient stratification or subtype discovery. Because the model degrades gracefully when methylation or expression data is absent, it is well suited to real-world clinical datasets where complete paired profiling is the exception rather than the rule. The Transformers-compatible interface lets researchers integrate MOJO embeddings into existing scikit-learn or PyTorch pipelines as features for downstream predictors.
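As a rough sketch of the "embeddings as features" workflow, the example below fits a nearest-centroid classifier on patient-level vectors. The embeddings are random placeholders rather than MOJO outputs, `EMBED_DIM` and the class labels are hypothetical, and the hand-written classifier is only a dependency-free stand-in for whatever scikit-learn or PyTorch predictor a pipeline would actually use.

```python
import math
import random

EMBED_DIM = 8  # hypothetical embedding size, chosen only for this toy example


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_centroids(embeddings, labels):
    """Group embeddings by label and return {label: centroid vector}."""
    groups = {}
    for vec, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: centroid(rows) for lab, rows in groups.items()}


def predict(centroids, vec):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


rng = random.Random(0)


def fake_embedding(shift):
    """Placeholder for a patient embedding; real vectors would come from the model."""
    return [rng.gauss(shift, 1.0) for _ in range(EMBED_DIM)]


# Two synthetic, well-separated classes standing in for cancer types.
train_x = [fake_embedding(0.0) for _ in range(20)] + [fake_embedding(5.0) for _ in range(20)]
train_y = ["type_A"] * 20 + ["type_B"] * 20

centroids = fit_centroids(train_x, train_y)
prediction = predict(centroids, fake_embedding(5.0))
```

In practice the embedding matrix would be handed to an existing estimator unchanged; the point of the sketch is only that each patient contributes one fixed-length feature vector.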
MOJO contributes to the emerging class of multi-omics foundation models by showing that joint masked modeling across expression and methylation produces representations that outperform single-modality encoders on cancer tasks, while remaining robust to incomplete inputs. As an open release with accessible weights and code, alongside its widely cited BulkRNABert predecessor, it lowers the barrier for groups seeking pretrained embeddings for oncology genomics. As a 2025 preprint, its benchmark comparisons and downstream adoption are still maturing, and broader validation across cohorts beyond TCGA remains an open direction. The model's license has not been formally confirmed; users should verify terms before deployment.
Gélard, M., et al. (2025) Bimodal masked language modeling for bulk RNA-seq and DNA methylation representation learning. bioRxiv.
DOI: 10.1101/2025.06.25.661237
Gélard, M., et al. (2024) BulkRNABert: Cancer prognosis from bulk RNA-seq based language models. bioRxiv.
DOI: 10.1101/2024.06.18.599483