RNAGAN

The University of Hong Kong

Multipurpose generative adversarial network trained once on single-cell and bulk RNA-seq to perform stratification, marker analysis, data generation, and vectorization.

Released: March 2026

RNAGAN is a generative adversarial network for human RNA sequencing analysis, developed by Zhaozheng Hou, Wei Dai, and colleagues at the University of Hong Kong and posted to bioRxiv in March 2026. The model addresses a recurring inefficiency in transcriptomic machine learning: practitioners typically train a separate model for each downstream task (classification, marker discovery, data augmentation, and feature extraction). RNAGAN instead obtains all four capabilities from a single shared adversarial training procedure, an approach summed up by the paper's title, "Train One and Get Four."

The model is trained jointly on single-cell and bulk RNA-seq, drawing on roughly 4.6 million single cells spanning multiple organs and about 5,900 bulk samples covering various cancer types alongside normal references. By learning a common generative representation across both data modalities, RNAGAN can be reused for sample stratification, marker analysis, synthetic ("pseudo") data generation, and vectorization of expression profiles without training a separate architecture for each task. Its emphasis on interpretability and small-sample robustness positions it as a practical tool for laboratories that lack large training cohorts of their own.

#Key Features

  • Four tasks from one model: A single adversarial training run yields stratification, marker analysis, pseudo-data generation, and vectorization, removing the need for separate task-specific pipelines.
  • Pathway-aware layer: A dedicated neural layer extracts activities of predefined pathways from MSigDB or newly learned pathways from single-cell data, improving the interpretability of learned features.
  • Cross-modality training: Joint training on single-cell and bulk RNA-seq lets the generator and discriminator share representations across data types.
  • Small-data capability: The design is reported to retain useful performance when downstream datasets are modest in size, lowering the barrier for typical wet-lab cohorts.
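The pathway-aware layer described above can be pictured as a linear layer whose weights are masked by gene-set membership, so each output unit only sees the genes in its pathway. The following is a minimal sketch of that general idea, not RNAGAN's actual implementation: the function name `pathway_layer`, the toy gene sets, and the weight values are all hypothetical, and the real layer may aggregate genes differently.

```python
# Hypothetical sketch of a pathway layer as a masked linear map.
# Names, gene sets, and weights are illustrative assumptions, not RNAGAN's code.

def pathway_layer(expression, weights, mask):
    """Compute one activity per pathway.

    expression: list of expression values, one per gene
    weights:    one row of per-gene weights for each pathway
    mask:       one row of 0/1 membership flags for each pathway;
                a 0 removes that gene's contribution regardless of its weight
    """
    activities = []
    for w_row, m_row in zip(weights, mask):
        activities.append(sum(w * m * x for w, m, x in zip(w_row, m_row, expression)))
    return activities


# Four toy genes and two toy pathways standing in for curated gene sets.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
mask = [
    [1, 1, 0, 0],  # PATHWAY_A contains GENE1 and GENE2
    [0, 1, 1, 1],  # PATHWAY_B contains GENE2, GENE3 and GENE4
]
weights = [
    [0.5, 0.5, 0.9, 0.9],    # the 0.9 entries are masked out for PATHWAY_A
    [0.3, 0.25, 0.25, 0.5],  # the 0.3 entry is masked out for PATHWAY_B
]
expression = [2.0, 4.0, 1.0, 0.0]

activities = pathway_layer(expression, weights, mask)
print(dict(zip(["PATHWAY_A", "PATHWAY_B"], activities)))
```

Because each output depends only on its member genes, a large activity can be traced back to a named gene set, which is the interpretability benefit the feature list refers to.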

#Technical Details

RNAGAN follows a classic generator–discriminator GAN structure augmented with a pathway neural layer that maps expression into curated or learned pathway activities. Training data comprise approximately 4.6 million single cells from public human atlases across multiple organs and roughly 5,900 bulk cancer/normal samples. The implementation is primarily MATLAB (with Python components) and ships with pretrained weights as both .mat files and a TensorFlow-exported weights.h5, allowing reuse outside the MATLAB environment. The code is released under GPL-3.0; documentation is provided as a bundled PDF and inline comments within each .m file.
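For readers unfamiliar with the generator and discriminator setup, the sketch below shows the standard GAN objectives in their common binary cross-entropy form, with the non-saturating generator loss. It is an assumption that RNAGAN uses losses of this shape; the paper may use a different variant, and the function names here are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Cross-entropy the discriminator minimizes.

    d_real: discriminator outputs in (0, 1) for real expression profiles
    d_fake: discriminator outputs in (0, 1) for generated profiles
    Real samples are pushed toward 1, generated samples toward 0.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# Toy discriminator outputs, invented for illustration.
d_real = [0.9, 0.8]
d_fake = [0.1, 0.3]
print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

In training, the two losses are minimized in alternation: the discriminator learns to separate real from generated profiles, and the generator learns to produce profiles the discriminator scores as real.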

#Applications

RNAGAN is aimed at researchers analyzing human transcriptomes who want a single reusable model rather than a stack of task-specific tools. Concrete uses include stratifying tumor or tissue samples, identifying marker genes and pathway activities, generating synthetic expression profiles to augment small datasets, and producing fixed-length vector embeddings of cells or samples for downstream clustering and classification. The pathway layer makes it particularly suited to studies that need biologically interpretable features rather than opaque embeddings.
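As an illustration of the downstream classification use, the sketch below assigns a sample to the nearest class centroid by cosine similarity over fixed-length embeddings. The embedding values and labels are invented for the example, and the helper names are hypothetical; in practice the vectors would come from the model's vectorization output, whose exact interface is not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centroid(query, labeled):
    """Return the label whose centroid is most similar to the query embedding."""
    centroids = {label: centroid(vecs) for label, vecs in labeled.items()}
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Toy 3-dimensional embeddings; real embeddings would be longer.
labeled = {
    "tumor": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "normal": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
predicted = nearest_centroid([0.7, 0.3, 0.1], labeled)
print(predicted)
```

A nearest-centroid rule needs only a handful of labeled samples per class, which matches the small-cohort setting the model targets; any standard classifier or clustering method could be substituted.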

#Impact

By consolidating four common transcriptomic workflows into one trained model, RNAGAN offers a pragmatic alternative to the proliferation of single-purpose tools in single-cell and bulk RNA-seq analysis. Its interpretability-focused pathway layer and small-data emphasis target groups without the resources to train large bespoke models. Because it is a recent preprint, real-world adoption is not yet documented and head-to-head benchmarks against established single-cell foundation models have not yet been reported. Its MATLAB-centric implementation may also limit integration into Python-dominant pipelines.

Tags

gene_expression, cell_type_annotation, data_generation, generative_adversarial_network, generative, representation_learning, transcriptomics, cancer