bio.rodeo
RNA foundation models
RNA · Single-cell

miDGD

Aarhus University

A deep generative decoder that infers microRNA expression directly from bulk or single-cell mRNA gene expression via a shared mRNA/miRNA latent space.

Released: June 2026

miDGD is a multi-modal deep generative model that predicts microRNA (miRNA) expression directly from messenger RNA (mRNA) gene expression, operating on both bulk and single-cell transcriptomic data. It was developed by Farhad Zamani, Asta Mannstaedt Rasmussen, Viktoria Schuster, Mathilde Hartvig Diekema, Anders Krogh, and Jakob Skou Pedersen at Aarhus University and posted as a bioRxiv preprint in 2026. The model addresses a practical asymmetry in transcriptomics: mRNA expression is now routinely and inexpensively profiled across enormous sample collections and at single-cell resolution, whereas matched small-RNA sequencing that measures miRNA abundance is far less common, more expensive, and rarely available at single-cell resolution. By learning to infer miRNA levels computationally, miDGD makes it possible to study miRNA regulation in datasets where only mRNA was measured.

MicroRNAs are short non-coding RNAs that post-transcriptionally repress target transcripts and play central roles in development, differentiation, and disease, particularly cancer. Because miRNAs and their target mRNAs are coupled through regulatory interactions, the mRNA expression profile of a sample carries a measurable signature of its underlying miRNA activity. miDGD exploits this coupling by jointly modeling matched mRNA and miRNA profiles within a single shared representation, so that an unseen sample's mRNA can be mapped into the latent space and decoded into a predicted miRNA profile.

miDGD belongs to the Deep Generative Decoder (DGD) family of models from the same research group, which also produced multiDGD for single-cell multi-omics integration. Unlike a competing line of work that frames miRNA inference as a regression problem, miDGD adopts a generative latent-variable formulation, which lends itself to handling multiple data modalities and generalizing across distinct datasets.

#Key Features

  • mRNA-to-miRNA inference: Predicts a full miRNA expression profile from mRNA gene expression alone, removing the need for dedicated and costly small-RNA sequencing to study miRNA regulation.
  • Bulk and single-cell support: Operates on both bulk tissue transcriptomes and single-cell mRNA data, extending miRNA-level analysis to single-cell datasets where small-RNA profiling is generally infeasible.
  • Shared mRNA/miRNA latent space: Learns a joint latent representation of matched mRNA and miRNA profiles, capturing the regulatory coupling between the two modalities rather than treating prediction as an isolated regression.
  • Cross-dataset generalization: Trained across heterogeneous sources (tumor, healthy tissue, and cell lines) and shown to transfer across datasets, an important property given the variation in tissue composition and platform between transcriptomic resources.
  • State-of-the-art accuracy: Reported to outperform miRSCAPE and other recent miRNA-activity inference methods in predicting miRNA abundance across data types and experimental contexts.

#Technical Details

miDGD is built on the Deep Generative Decoder framework, in which sample representations are obtained by maximum a posteriori estimation over a learned latent distribution rather than through a separate amortized encoder as in a variational autoencoder. In miDGD this decoder-centric generative approach is extended to two coupled modalities: matched mRNA and miRNA expression profiles are embedded in a shared latent space, and a learned decoder maps latent representations to both mRNA and miRNA outputs. At inference time, a new sample's mRNA expression is used to find its latent representation, from which the corresponding miRNA profile is generated. This formulation contrasts with regression-based predictors that map mRNA features directly to individual miRNA values.
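The inference step described above can be illustrated with a minimal sketch. It is not the authors' implementation, since no code has been released. Linear maps stand in for the neural decoder, a fixed zero-mean Gaussian stands in for the learned latent distribution, and all names (`W_mrna`, `W_mirna`, `infer_latent`, `predict_mirna`) and sizes are hypothetical. What it shows is the encoder-free pattern: optimize a latent vector so the decoder reproduces the observed mRNA, then decode that same vector into a miRNA profile.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): 8 latent dimensions, 50 mRNA genes, 12 miRNAs.
n_latent, n_mrna, n_mirna = 8, 50, 12

# Stand-ins for a trained two-headed decoder. A real DGD decoder is a neural
# network; linear maps keep the sketch short and the gradient exact.
W_mrna = rng.normal(size=(n_mrna, n_latent))
W_mirna = rng.normal(size=(n_mirna, n_latent))


def infer_latent(x_mrna, W_mrna, prior_precision=1e-3, steps=500):
    """MAP estimate of the latent vector from mRNA alone.

    Minimizes ||x_mrna - W_mrna @ z||^2 + prior_precision * ||z||^2 by
    gradient descent. The Gaussian prior is an assumption of this sketch.
    """
    z = np.zeros(W_mrna.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lipschitz = 2.0 * (np.linalg.norm(W_mrna, 2) ** 2 + prior_precision)
    lr = 1.0 / lipschitz
    for _ in range(steps):
        residual = W_mrna @ z - x_mrna
        grad = 2.0 * (W_mrna.T @ residual) + 2.0 * prior_precision * z
        z -= lr * grad
    return z


def predict_mirna(x_mrna):
    """Find the latent vector from mRNA, then decode it with the miRNA head."""
    z = infer_latent(x_mrna, W_mrna)
    return W_mirna @ z


# Simulate one sample from a known latent vector and recover its miRNA profile.
z_true = rng.normal(size=n_latent)
x_mrna = W_mrna @ z_true
mirna_hat = predict_mirna(x_mrna)
```

In this toy setting the optimization has a closed-form solution, but the iterative loop mirrors how a representation would be found when the decoder is nonlinear and no encoder exists to produce it in one pass.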

The model was trained on matched mRNA and miRNA data drawn from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), and human cell-line datasets, spanning tumor, normal tissue, and in vitro contexts. According to the authors, miDGD learns a shared latent representation of matched mRNA and miRNA profiles and generalizes across these datasets, outperforming the prior method miRSCAPE as well as more recent miRNA-activity inference approaches on the task of predicting miRNA abundance. The preprint is released under a CC BY-NC-ND license. At the time of cataloging, no public code repository or trained model weights had been released.
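The text above does not specify which metric was used to score predicted miRNA abundance. As an illustration only, one common way to compare predictions with held-out measurements is a per-miRNA Pearson correlation across samples. The function name `per_mirna_pearson` and the tiny arrays below are hypothetical and are not taken from the preprint.

```python
import numpy as np


def per_mirna_pearson(measured, predicted):
    """Pearson r for each miRNA (column) across samples (rows)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    m = measured - measured.mean(axis=0)
    p = predicted - predicted.mean(axis=0)
    denom = np.sqrt((m ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    # A constant column has zero variance; report NaN rather than divide by 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, (m * p).sum(axis=0) / denom, np.nan)


# Made-up values: 3 samples (rows) by 2 miRNAs (columns).
measured = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 1.0]])
predicted = np.array([[1.1, 0.9], [2.0, 2.1], [2.9, 3.0]])
r = per_mirna_pearson(measured, predicted)
```

Here the first miRNA is predicted in the right direction and the second is anti-correlated with the measurement, so a per-miRNA summary exposes failures that a single pooled score could hide.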

#Applications

miDGD is most useful for researchers who have mRNA expression data but lack matched miRNA measurements, which describes the majority of large transcriptomic resources and essentially all single-cell mRNA datasets. Cancer researchers can recover miRNA expression for TCGA-style tumor cohorts to study dysregulated miRNAs and their prognostic associations without additional sequencing. Single-cell biologists can begin to resolve miRNA activity across cell types and states in scRNA-seq experiments, a regime where direct miRNA profiling is largely impractical. More broadly, the model supports retrospective miRNA analysis of existing mRNA datasets, hypothesis generation about miRNA-mediated regulation, and prioritization of candidate miRNAs for experimental follow-up.

#Impact

miDGD contributes to a growing effort to recover hard-to-measure molecular layers from abundant, inexpensive mRNA data, lowering the barrier to studying miRNA regulation at scale and at single-cell resolution. By extending the Deep Generative Decoder family from single-cell multi-omics integration into cross-modal miRNA inference, it provides evidence, as reported by the authors, that a generative, shared-latent formulation can transfer across heterogeneous datasets and outperform regression-based and prior inference methods such as miRSCAPE. As a recent preprint without an accompanying public code or weights release, its reproducibility and adoption will depend on subsequent availability of an implementation, and its predictions, like those of any inference model, should be validated experimentally before being treated as direct measurements.

Citation

Zamani, F., et al. (2026) miDGD: a multi-modal deep generative model predicts microRNA expression from bulk or single-cell mRNA expression. bioRxiv.

DOI: 10.64898/2026.05.29.727918

Citations

Total Citations: 0
Influential: 0
References: 54

Openness

bio.rodeo openness: Closed · low usability and reproducibility
8 · Closed
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 10
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

autoencoder, deep_generative_decoder, gene_expression, generative, microrna, mirna_expression_inference, multimodal, representation_learning, transcriptomics

Resources

Research Paper