bio.rodeo

Single-cell

RNABag

HomiGen Intelligence Technology Co., Ltd.

A transcriptome foundation model for precision oncology that generalizes zero-shot across tissue, plasma cfRNA, and tumor-educated platelet biopsy modalities.

Released: April 2026

RNABag is a transcriptome foundation model for precision oncology, developed by HomiGen Intelligence Technology Co., Ltd. and released as a bioRxiv preprint in April 2026. It targets a long-standing obstacle in applying RNA-seq to clinical inference: transcriptomic measurements are exquisitely sensitive to cancer state and progression, yet conventional analyses are undermined by technical batch effects and poor generalization across sequencing platforms, cohorts, and sample types.

The central design choice that distinguishes RNABag is its emphasis on robustness over raw capacity. Rather than ingesting the full transcriptome, the model restricts its input to a curated set of highly variable genes to suppress noise, and it is pretrained with extensive data augmentation so that its learned representations are robust to batch variation. The authors present this as the basis for the model's portability: they report strong zero-shot generalization to external cohorts and to in-house clinical samples, in addition to internal validation performance.
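To make the augmentation idea concrete, here is a minimal sketch of the kind of perturbation that could simulate batch variation on log1p-transformed expression. The preprint summary does not describe RNABag's actual augmentations, so the function name `augment_expression`, the three perturbations, and all default strengths are illustrative assumptions rather than the authors' recipe.

```python
import numpy as np

def augment_expression(x, rng, scale_sd=0.2, noise_sd=0.1, dropout_p=0.05):
    """Return a perturbed copy of log1p expression (genes on the last axis).

    Hypothetical augmentations, chosen only to illustrate the idea:
    they are NOT taken from the RNABag preprint or repository.
    """
    x = np.asarray(x, dtype=np.float64)
    # One additive shift per sample in log space, mimicking a global
    # depth or library-size difference between batches.
    shift = rng.normal(0.0, scale_sd, size=x.shape[:-1] + (1,))
    # Independent per-value noise, mimicking gene-specific platform bias.
    noise = rng.normal(0.0, noise_sd, size=x.shape)
    # Randomly zero a fraction of values, mimicking undetected genes.
    keep = rng.random(x.shape) >= dropout_p
    out = (x + shift + noise) * keep
    # log1p of non-negative expression is non-negative, so clip at zero.
    return np.clip(out, 0.0, None)

rng = np.random.default_rng(0)
profile = np.log1p(np.array([[0.0, 5.0, 100.0], [1.0, 2.0, 3.0]]))
view_a = augment_expression(profile, rng)
view_b = augment_expression(profile, rng)
```

Two augmented views of the same sample, such as `view_a` and `view_b`, are the usual raw material for objectives that push a model toward batch-invariant representations. Which objective RNABag uses is not stated in the public description.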

RNABag sits alongside bulk RNA-seq foundation models for cancer such as GeneBag, but is differentiated by its explicit cross-modality reach. The same pretrained backbone is applied not only to solid-tissue RNA-seq but also to two liquid-biopsy modalities — plasma cell-free RNA (cfRNA) and tumor-educated platelets (TEPs) — positioning it as a unified model for both invasive and non-invasive cancer monitoring.

#Key Features

  • Cross-modality generalization: A single transcriptome model spans tissue RNA-seq, plasma cfRNA, and tumor-educated platelet biopsies, supporting both diagnostic and non-invasive monitoring workflows.
  • Batch-invariant pretraining: Extensive data augmentation during pretraining encourages representations that are robust to platform and batch effects, a primary cause of failure in transcriptome-based models.
  • Highly-variable-gene focus: Restricting input to a fixed panel of high-variability genes reduces noise and improves transfer to unseen datasets.
  • Zero-shot external validation: Reports strong performance on external cohorts and in-house clinical samples without task-specific retraining, plus stronger results after specialized fine-tuning.
  • Clinical-task breadth: After fine-tuning, supports pan-cancer tissue-of-origin classification, cancer detection, survival stratification, and relapse-risk prediction.
  • Released checkpoints and inference code: Task-specific .ckpt weights and a unified inference pipeline are provided publicly under an MIT license.
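As an illustration of the highly-variable-gene idea in the list above, the sketch below ranks genes by their variance across samples and keeps the top ones. The page states only that the panel holds 4,096 genes chosen with TCGA-derived criteria; plain variance ranking and the helper name `select_hvg_panel` are assumptions made for this example, not the authors' selection procedure.

```python
import numpy as np

def select_hvg_panel(log_expr, gene_names, panel_size=4096):
    """Return the names of the `panel_size` highest-variance genes.

    log_expr: array of shape (n_samples, n_genes), e.g. log1p(FPKM).
    Variance ranking is an illustrative stand-in for the undocumented
    TCGA-derived criteria.
    """
    log_expr = np.asarray(log_expr, dtype=np.float64)
    if log_expr.ndim != 2 or log_expr.shape[1] != len(gene_names):
        raise ValueError("log_expr must be (n_samples, n_genes) matching gene_names")
    variances = log_expr.var(axis=0)
    k = min(panel_size, len(gene_names))
    # Sort by descending variance; a stable sort keeps tied genes
    # in their original order, so the panel is reproducible.
    top = np.argsort(-variances, kind="stable")[:k]
    return [gene_names[i] for i in top]

toy = np.array([[0.0, 1.0, 5.0],
                [0.0, 2.0, 0.0],
                [0.0, 3.0, 10.0]])
panel = select_hvg_panel(toy, ["GENE_A", "GENE_B", "GENE_C"], panel_size=2)
```

Fixing the panel once and reusing it for every dataset is what keeps the model's input layout identical across cohorts and sample types.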

#Technical Details

RNABag uses a transformer-based architecture that operates on bulk RNA-seq expression profiles. Inputs are provided as FPKM expression matrices and preprocessed via gene-annotation mapping, filtering, log1p normalization, and reduction to a curated panel of 4,096 highly variable genes (selected using TCGA-derived criteria). Pretraining relies on heavy data augmentation to learn batch-invariant representations, after which the model is fine-tuned for individual downstream tasks. The public repository ships separate checkpoints for tissue cancer detection, tissue origin identification, plasma cancer detection, platelet cancer detection, and platelet tumor localization, each invoked through a unified main.py inference entry point. The authors report superior performance in pan-cancer tissue-of-origin classification and cancer detection on internal validation sets, with high diagnostic accuracy carried over to plasma cfRNA and TEP samples; precise parameter counts and per-task benchmark metrics are not detailed in the public documentation.
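The preprocessing steps named above (gene-annotation mapping, log1p normalization, and reduction to a fixed panel) can be sketched as follows. This is a simplified stand-in, not the repository's code: the function names, the assumption that genes are identified by versioned Ensembl-style IDs, and the choice to fill panel genes missing from the input with zero are all hypothetical. The filtering step is omitted because its criteria are not documented.

```python
import numpy as np

def strip_version(gene_id):
    """Drop a trailing version suffix: 'ENSG00000141510.17' -> 'ENSG00000141510'.

    Assumes Ensembl-style IDs; gene symbols containing dots would be mangled.
    """
    return gene_id.split(".", 1)[0]

def preprocess_fpkm(fpkm, gene_ids, panel):
    """Align an FPKM matrix to a fixed gene panel and apply log1p.

    fpkm: array of shape (n_samples, n_genes); gene_ids: its column labels.
    panel: ordered gene IDs the model expects (hypothetical here).
    Returns an array of shape (n_samples, len(panel)).
    """
    fpkm = np.asarray(fpkm, dtype=np.float64)
    if fpkm.ndim != 2 or fpkm.shape[1] != len(gene_ids):
        raise ValueError("fpkm must be (n_samples, n_genes) matching gene_ids")
    if np.isnan(fpkm).any() or (fpkm < 0).any():
        raise ValueError("FPKM values must be finite and non-negative")
    # Map each unversioned ID to its column; the first occurrence wins
    # if the input contains duplicates.
    column_of = {}
    for j, gene in enumerate(gene_ids):
        column_of.setdefault(strip_version(gene), j)
    # Panel genes absent from the input stay at zero (log1p(0) == 0).
    out = np.zeros((fpkm.shape[0], len(panel)))
    for k, gene in enumerate(panel):
        j = column_of.get(strip_version(gene))
        if j is not None:
            out[:, k] = np.log1p(fpkm[:, j])
    return out

features = preprocess_fpkm(
    [[1.0, 0.0, 9.0]],
    ["ENSG1.3", "ENSG2.1", "ENSG3"],
    panel=["ENSG3", "ENSG9", "ENSG1"],
)
```

Emitting columns in panel order, regardless of how the input file orders its genes, means every sample reaches the model with the same feature layout.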

#Applications

RNABag is aimed at computational oncology and liquid-biopsy research. Its tissue-based tasks (pan-cancer tissue-of-origin classification, cancer detection, survival stratification, and relapse-risk prediction) support tumor characterization and prognosis from solid-biopsy RNA-seq, while its plasma cfRNA and tumor-educated platelet capabilities extend toward non-invasive, blood-based cancer detection and monitoring. Because the authors report that the released checkpoints generalize zero-shot to external cohorts, the model may suit groups seeking a portable starting point for transcriptome-based diagnostics without assembling large platform-matched training sets. According to the authors, interpretability analysis additionally pointed to tumor immune escape as a driver of cancer-induced plasma cfRNA signals, offering biological hypotheses alongside predictions.

#Impact

RNABag contributes to a growing line of transcriptome foundation models that treat bulk RNA-seq as a substrate for clinical inference, and it is notable for tackling the generalization gap that has limited the clinical translation of such models. By demonstrating that a batch-invariant, highly-variable-gene model can transfer across tissue, plasma cfRNA, and platelet modalities, it argues for unified rather than modality-specific approaches to cancer monitoring. As a 2026 preprint with publicly released checkpoints and inference code but without reported parameter counts or fully detailed benchmarks, its claims await peer review and broader independent validation; nonetheless, the cross-modality, zero-shot framing and open release make it a useful reference point for researchers building liquid-biopsy diagnostics.

Tags

cancer_detection, cell_type_annotation, survival_prediction, transformer, foundation_model, self_supervised, zero_shot, transcriptomics, oncology