RNAGAN

Generative adversarial network trained on single-cell and bulk RNA-seq for sample stratification, marker analysis, and synthetic data generation.

Released: March 2026

RNAGAN is a generative adversarial network for human RNA sequencing analysis, developed by Zhaozheng Hou, Wei Dai, and colleagues at the University of Hong Kong and posted to bioRxiv in March 2026. The model addresses a recurring inefficiency in transcriptomic machine learning: practitioners typically train a separate model for each downstream task—classification, marker discovery, data augmentation, and feature extraction. RNAGAN instead packages these capabilities into a single shared adversarial training procedure, encapsulated by the paper's title, "Train One and Get Four."

The model is trained jointly on single-cell and bulk RNA-seq, drawing on roughly 4.6 million single cells spanning multiple organs and about 5,900 bulk samples covering various cancer types alongside normal references. By learning a common generative representation across both data modalities, RNAGAN can be reused for sample stratification, marker analysis, synthetic ("pseudo") data generation, and vectorization of expression profiles without retraining a bespoke architecture for each. Its emphasis on interpretability and small-sample robustness positions it as a practical tool for laboratories that lack large training cohorts of their own.

Key Features

Four tasks from one model: A single adversarial training run yields stratification, marker analysis, pseudo-data generation, and vectorization, removing the need for separate task-specific pipelines.
Pathway-aware layer: A dedicated neural layer extracts activities of predefined pathways from MSigDB or newly learned pathways from single-cell data, improving the interpretability of learned features.
Cross-modality training: Joint training on single-cell and bulk RNA-seq lets the generator and discriminator share representations across data types.
Small-data capability: The design is reported to retain useful performance when downstream datasets are modest in size, lowering the barrier for typical wet-lab cohorts.
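One way to picture the pathway-aware layer is as a linear layer whose weights are masked by gene-set membership, so that each output unit aggregates only the genes of one pathway and can be read directly as that pathway's activity. The sketch below assumes that design, which is a common way to build knowledge-constrained layers; RNAGAN's actual layer may differ. The gene sets, `membership_mask`, and `PathwayLayer` are hypothetical illustrations, not RNAGAN code or MSigDB content.

```python
import numpy as np

# Hypothetical toy gene sets standing in for MSigDB-style pathway definitions.
GENES = ["TP53", "MDM2", "CDKN1A", "MYC", "CCND1", "EGFR"]
PATHWAYS = {
    "p53_signaling": ["TP53", "MDM2", "CDKN1A"],
    "proliferation": ["MYC", "CCND1", "EGFR"],
}


def membership_mask(genes, pathways):
    """Binary (n_genes, n_pathways) matrix: 1 where a gene belongs to a pathway."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for g in members:
            if g in index:
                mask[index[g], j] = 1.0
    return mask


class PathwayLayer:
    """Linear layer whose weights are zeroed outside each pathway's gene set,
    so every output unit can be interpreted as one pathway's activity."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        # Random positive weights; entries outside a pathway are masked out below.
        self.weights = rng.uniform(0.5, 1.5, size=mask.shape)

    def __call__(self, expression):
        # expression: (n_samples, n_genes) -> activities: (n_samples, n_pathways)
        return expression @ (self.weights * self.mask)


layer = PathwayLayer(membership_mask(GENES, PATHWAYS))
x = np.array([[5.0, 4.0, 6.0, 0.0, 0.0, 0.0]])  # only p53 genes expressed
print(layer(x))  # proliferation activity is exactly 0.0
```

Because the mask is fixed, the interpretability comes from the wiring rather than from post hoc analysis; "newly learned pathways" would correspond to learning the mask itself, which this sketch does not attempt.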

Technical Details

RNAGAN follows a classic generator–discriminator GAN structure augmented with a pathway neural layer that maps expression into curated or learned pathway activities. Training data comprise approximately 4.6 million single cells from public human atlases across multiple organs and roughly 5,900 bulk cancer/normal samples. The implementation is primarily MATLAB (with Python components) and ships with pretrained weights as both .mat files and a TensorFlow-exported weights.h5, allowing reuse outside the MATLAB environment. The code is released under GPL-3.0; documentation is provided as a bundled PDF and inline comments within each .m file.
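The adversarial part of training can be illustrated with the standard GAN objective. The exact losses RNAGAN uses are not described here, so the sketch below assumes the common binary cross-entropy formulation with the non-saturating generator loss. The function names are hypothetical, and they operate on discriminator output probabilities rather than on any RNAGAN model.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(G(z)) toward 0.
    d_real and d_fake are discriminator probabilities in (0, 1)."""
    return -np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS))


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return -np.mean(np.log(d_fake + EPS))


# A discriminator that cannot tell real from generated profiles outputs 0.5.
confused = np.full(4, 0.5)
print(discriminator_loss(confused, confused))  # approximately 2*log(2) = 1.386
print(generator_loss(confused))                # approximately log(2) = 0.693
```

At the theoretical equilibrium the discriminator cannot distinguish real from generated expression profiles and outputs 0.5 everywhere, which gives a discriminator loss of 2 log 2.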

Applications

RNAGAN is aimed at researchers analyzing human transcriptomes who want a single reusable model rather than a stack of task-specific tools. Concrete uses include stratifying tumor or tissue samples, identifying marker genes and pathway activities, generating synthetic expression profiles to augment small datasets, and producing fixed-length vector embeddings of cells or samples for downstream clustering and classification. The pathway layer makes it particularly suited to studies that need biologically interpretable features rather than opaque embeddings.

In a follow-up study, the pretrained model was applied without any additional training to nasopharyngeal carcinoma metastasis, a cancer type absent from its training data. Used in an in-context, few-shot format, all four of its functions (stratification, vectorization, pseudo-data generation, and marker identification) transferred to the new cohort, indicating that the model can be reused across cancer types without retraining.
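For illustration, marker identification on top of pathway activities or embedding vectors can be approximated by ranking features by a standardized mean difference between two groups, for example metastatic versus non-metastatic samples. This is a generic stand-in, not the procedure described in the RNAGAN papers; `rank_markers`, the pathway names, and the activity values are hypothetical.

```python
import numpy as np


def rank_markers(activities, labels, names):
    """Rank features by standardized mean difference (group 1 minus group 0).

    activities: (n_samples, n_features) array, e.g. pathway activities or
    embedding dimensions; labels: 0/1 array of group membership."""
    activities = np.asarray(activities, dtype=float)
    labels = np.asarray(labels)
    a, b = activities[labels == 1], activities[labels == 0]
    # Pooled standard deviation from the two sample variances (equal group sizes).
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2.0)
    effect = (a.mean(axis=0) - b.mean(axis=0)) / (pooled_sd + 1e-12)
    order = np.argsort(-np.abs(effect))  # largest absolute effect first
    return [(names[i], float(effect[i])) for i in order]


# Hypothetical activities for 3 pathways: 3 metastatic (1), 3 non-metastatic (0).
acts = np.array([
    [2.0, 0.5, 1.0],
    [2.2, 0.4, 1.1],
    [1.9, 0.6, 0.9],
    [0.5, 0.5, 1.0],
    [0.6, 0.4, 1.2],
    [0.4, 0.6, 0.8],
])
groups = np.array([1, 1, 1, 0, 0, 0])
for name, d in rank_markers(acts, groups, ["EMT", "apoptosis", "hypoxia"]):
    print(f"{name}: {d:+.2f}")
```

A real analysis would add a significance test and multiple-testing correction; the effect-size ranking is shown alone here for brevity.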

Impact

By consolidating four common transcriptomic workflows into one trained model, RNAGAN offers a pragmatic alternative to the proliferation of single-purpose tools in single-cell and bulk RNA-seq analysis. Its interpretability-focused pathway layer and small-data emphasis target groups without the resources to train large bespoke models. Version 2.0 of the codebase accompanies the follow-up application to nasopharyngeal carcinoma metastasis, which provides early evidence that the frozen model transfers to previously unseen cancer types. Because both papers are preprints, broader adoption and head-to-head benchmarking against established single-cell foundation models remain to be established, and the MATLAB-centric implementation may limit integration into Python-dominant pipelines.

Citations

RNAGAN: Train One and Get Four, Multipurpose Human RNA-Seq Analysis Tool with Enhanced Interpretability and Small Data Size Capability

Hou, Z., et al. (2026) RNAGAN: Train One and Get Four, Multipurpose Human RNA-Seq Analysis Tool with Enhanced Interpretability and Small Data Size Capability. bioRxiv.

DOI: 10.64898/2026.03.17.712527

Foundation Model RNAGAN Enhances Biomedical Insight of Nasopharyngeal Carcinoma Metastasis

Hou, Z., et al. (2026) Foundation Model RNAGAN Enhances Biomedical Insight of Nasopharyngeal Carcinoma Metastasis. bioRxiv.

DOI: 10.64898/2026.07.02.736240

Citation metrics

Total citations: 89

Influential citations: 3

References: 63

GitHub

Stars: 1

Forks: 0

Open issues: 0

Contributors: 1

Last push: 1 month ago

Language: MATLAB

License: GPL-3.0

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Openness score: 60 (Partial)

Usability (can I run it?): 77

Reproducibility (can I retrain it?): 40

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository

Research Paper
