A deep generative decoder that infers microRNA expression directly from bulk or single-cell mRNA gene expression via a shared mRNA/miRNA latent space.
miDGD is a multi-modal deep generative model that predicts microRNA (miRNA) expression directly from messenger RNA (mRNA) gene expression, operating on both bulk and single-cell transcriptomic data. It was developed by Farhad Zamani, Asta Mannstaedt Rasmussen, Viktoria Schuster, Mathilde Hartvig Diekema, Anders Krogh, and Jakob Skou Pedersen at Aarhus University and posted as a bioRxiv preprint in 2026. The model addresses a practical asymmetry in transcriptomics: mRNA expression is now routinely and inexpensively profiled across enormous sample collections and at single-cell resolution, whereas matched small-RNA sequencing that measures miRNA abundance is far less common, more expensive, and rarely available at single-cell resolution. By learning to infer miRNA levels computationally, miDGD makes it possible to study miRNA regulation in datasets where only mRNA was measured.
MicroRNAs are short non-coding RNAs that post-transcriptionally repress target transcripts and play central roles in development, differentiation, and disease, particularly cancer. Because miRNAs and their target mRNAs are coupled through regulatory interactions, the mRNA expression profile of a sample carries a measurable signature of its underlying miRNA activity. miDGD exploits this coupling by jointly modeling matched mRNA and miRNA profiles within a single shared representation, so that an unseen sample's mRNA can be mapped into the latent space and decoded into a predicted miRNA profile.
miDGD belongs to the Deep Generative Decoder (DGD) family of models from the same research group, which also produced multiDGD for single-cell multi-omics integration. Unlike a competing line of work that frames miRNA inference as a regression problem, miDGD adopts a generative latent-variable formulation, which lends itself to handling multiple data modalities and generalizing across distinct datasets.
miDGD is built on the Deep Generative Decoder framework, in which sample representations are obtained by maximum a posteriori estimation over a learned latent distribution rather than through a separate amortized encoder as in a variational autoencoder. In miDGD this decoder-centric generative approach is extended to two coupled modalities: matched mRNA and miRNA expression profiles are embedded in a shared latent space, and a learned decoder maps latent representations to both mRNA and miRNA outputs. At inference time, a new sample's mRNA expression is used to find its latent representation, from which the corresponding miRNA profile is generated. This formulation contrasts with regression-based predictors that map mRNA features directly to individual miRNA values.
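The inference step described above can be illustrated with a toy sketch. This is not the authors' implementation (none had been released at the time of cataloging): the decoder is replaced by two fixed linear heads, the learned latent distribution by a single zero-mean Gaussian prior, and the likelihood by squared error. All names, dimensions, and hyperparameters (`W_MRNA`, `W_MIRNA`, `infer_latent`, `predict_mirna`, `prior_precision`) are hypothetical and chosen only to show how a latent point is fitted to mRNA alone and then decoded into miRNA.

```python
import numpy as np

# Hypothetical stand-in for a trained decoder: two linear heads that share
# one 2-dimensional latent space. A real DGD decoder is a neural network.
W_MRNA = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0],
                   [1.0, -1.0]])   # latent -> 4 mRNA features
W_MIRNA = np.array([[2.0, 0.0],
                    [0.0, -1.0],
                    [1.0, 1.0]])   # latent -> 3 miRNA features


def infer_latent(x_mrna, prior_precision=1e-3, lr=0.05, steps=200):
    """MAP estimate of the latent point z using only the mRNA profile.

    Gradient descent on
        ||x_mrna - W_MRNA z||^2 + prior_precision * ||z||^2,
    i.e. a squared-error likelihood plus a zero-mean Gaussian prior
    (both are simplifying assumptions of this sketch).
    """
    z = np.zeros(W_MRNA.shape[1])
    for _ in range(steps):
        residual = W_MRNA @ z - x_mrna
        grad = 2.0 * W_MRNA.T @ residual + 2.0 * prior_precision * z
        z -= lr * grad
    return z


def predict_mirna(x_mrna):
    """Fit z to the mRNA profile, then decode it with the miRNA head."""
    z = infer_latent(x_mrna)
    return W_MIRNA @ z


if __name__ == "__main__":
    # Simulate a sample whose true latent point is known.
    z_true = np.array([0.5, -1.0])
    x_mrna = W_MRNA @ z_true
    # The prediction should be close to W_MIRNA @ z_true = [1.0, 1.0, -0.5].
    print(predict_mirna(x_mrna))
```

No encoder network appears anywhere in the sketch: the representation of a new sample is found by optimizing `z` directly against the decoder, which is the decoder-centric idea the paragraph describes. The miRNA head is never consulted while fitting `z`, which mirrors the situation where only mRNA was measured.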
The model was trained on matched mRNA and miRNA data drawn from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), and human cell-line datasets, spanning tumor, normal tissue, and in vitro contexts. According to the authors, miDGD learns a shared latent representation of matched mRNA and miRNA profiles and generalizes across these datasets, outperforming the prior method miRSCAPE as well as more recent miRNA-activity inference approaches on the task of predicting miRNA abundance. The preprint is released under a CC BY-NC-ND license. At the time of cataloging, no public code repository or trained model weights had been released.
miDGD is most useful for researchers who have mRNA expression data but lack matched miRNA measurements, which describes the majority of large transcriptomic resources and essentially all single-cell mRNA datasets. Cancer researchers can recover miRNA expression for TCGA-style tumor cohorts to study dysregulated miRNAs and their prognostic associations without additional sequencing. Single-cell biologists can begin to resolve miRNA activity across cell types and states in scRNA-seq experiments, a regime where direct miRNA profiling is largely impractical. More broadly, the model supports retrospective miRNA analysis of existing mRNA datasets, hypothesis generation about miRNA-mediated regulation, and prioritization of candidate miRNAs for experimental follow-up.
miDGD contributes to a growing effort to recover hard-to-measure molecular layers from abundant, inexpensive mRNA data, lowering the barrier to studying miRNA regulation at scale and at single-cell resolution. By extending the Deep Generative Decoder family from single-cell multi-omics integration into cross-modal miRNA inference, it indicates, on the authors' reported results, that a generative, shared-latent formulation can transfer across heterogeneous datasets and outperform regression-based and other prior inference methods such as miRSCAPE. Because the preprint has no accompanying public code or weights release, its reproducibility and adoption will depend on an implementation becoming available, and its predictions, like those of any inference model, should be validated experimentally before being treated as direct measurements.
Zamani, F., et al. (2026) miDGD: a multi-modal deep generative model predicts microRNA expression from bulk or single-cell mRNA expression. bioRxiv.
DOI: 10.64898/2026.05.29.727918