Pan-cancer multi-omic BERT-like foundation model that jointly encodes CpG-island DNA methylation and RNA-seq for zero-shot cancer classification and mutation prediction.
ISTS (Islands of Signal and Transcriptomic Sequencing) is a pan-cancer, multi-omic foundation model that jointly represents two complementary readouts of tumor biology: CpG-island DNA methylation and bulk RNA-seq gene expression. Released as a December 2025 bioRxiv preprint by Alexandros Alexakos and Aristotelis Tsirigos at New York University, the model addresses a persistent gap in computational oncology: most foundation models are trained on a single modality, yet methylation and transcription jointly encode lineage identity and the downstream consequences of driver mutations. By learning a shared embedding space across both, ISTS aims to produce compact, transferable representations of tumor state.
The model's central design choice is to compress raw, high-dimensional inputs into informative features before pretraining. Probe-level methylation arrays are aggregated into CpG-island-level features, and RNA-seq is reduced to a high-variance gene panel. These modality-specific inputs feed dedicated encoders, and a BERT-like transformer with masked reconstruction and cross-modal prediction objectives fuses them into a single representation that tolerates missing-modality inputs at inference.
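The two compression steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the preprint's pipeline: the function names, the use of a plain mean over probes, the NaN handling, and ranking genes by raw variance are all hypothetical choices made for the example.

```python
import numpy as np


def aggregate_probes_to_islands(beta, probe_island_ids, n_islands):
    """Average probe-level beta values within each CpG island.

    beta: (n_samples, n_probes) array; NaN marks a missing probe value.
    probe_island_ids: (n_probes,) integer island index for each probe.
    Returns (n_samples, n_islands); islands with no observed probe are NaN.
    """
    valid = ~np.isnan(beta)
    filled = np.where(valid, beta, 0.0)
    # One-hot probe-to-island membership matrix, shape (n_probes, n_islands).
    membership = np.zeros((beta.shape[1], n_islands))
    membership[np.arange(beta.shape[1]), probe_island_ids] = 1.0
    sums = filled @ membership
    counts = valid.astype(float) @ membership
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


def select_high_variance_genes(expr, k):
    """Return the sorted column indices of the k most variable genes."""
    variances = expr.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    return np.sort(top)


# Toy data (assumed): 2 samples, 4 probes mapped onto 2 islands.
beta = np.array([[0.2, 0.4, np.nan, 0.9],
                 [0.1, 0.3, 0.8, 1.0]])
probe_island_ids = np.array([0, 0, 1, 1])
islands = aggregate_probes_to_islands(beta, probe_island_ids, n_islands=2)

# Toy expression matrix (assumed): 3 samples, 3 genes.
expr = np.array([[1.0, 5.0, 2.0],
                 [1.0, 9.0, 2.5],
                 [1.0, 1.0, 3.0]])
panel = select_high_variance_genes(expr, k=2)
```

Averaging over observed probes only keeps an island usable when some of its probes fail quality control; a real pipeline would also need a probe-to-island annotation, which is taken as given here.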
ISTS sits alongside emerging multi-omic and methylation foundation models (such as MethylGPT and bimodal RNA/methylation models) but is distinguished by its CpG-island grouping strategy and its explicit pan-cancer, zero-shot evaluation across both lineage and mutation tasks.
ISTS uses modality-specific MLP encoders that map CpG-island methylation features and a high-variance RNA-seq gene panel into a common latent space, which a BERT-like transformer then refines via masked reconstruction and cross-modal prediction objectives. Pretraining draws on harmonized public pan-cancer resources (TCGA, TARGET, CPTAC-3, and HCMI) spanning a broad range of adult and pediatric tumor types. The learned representations are evaluated in two zero-shot settings, where zero-shot means the transformer backbone stays frozen and only a lightweight head is trained: cancer-type classification via a linear probe on frozen embeddings, and mutation prediction across 214 genes via a shallow MLP head. The authors report strong performance for many tumor types and gene–cancer pairs, and observe that the embedding space recovers biologically meaningful structure. Detailed hyperparameters (embedding dimension, layer count, and total parameter count) are specified in the preprint; the work is released under a CC BY license.
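To make the linear-probe evaluation concrete, the sketch below fits a softmax classifier on top of fixed embeddings and scores it on held-out samples. It is a minimal illustration, not the authors' evaluation code: the embeddings are synthetic Gaussian clusters standing in for frozen backbone outputs, and the function names, learning rate, step count, and L2 penalty are assumptions chosen for the example.

```python
import numpy as np


def fit_linear_probe(embeddings, labels, n_classes, lr=0.1, steps=500, l2=1e-3):
    """Fit softmax regression on fixed embeddings with batch gradient descent."""
    n, d = embeddings.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = embeddings @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (embeddings.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(embeddings, W, b):
    """Return the highest-scoring class index for each embedding."""
    return (embeddings @ W + b).argmax(axis=1)


# Synthetic stand-in for frozen embeddings (assumed): three separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [-4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
embeddings = centers[labels] + rng.normal(scale=0.5, size=(150, 2))

# Hold out a third of the samples; only the probe's W and b are ever updated.
perm = rng.permutation(150)
train, test = perm[:100], perm[100:]
W, b = fit_linear_probe(embeddings[train], labels[train], n_classes=3)
accuracy = (predict(embeddings[test], W, b) == labels[test]).mean()
```

Because the probe has only a weight matrix and a bias, its held-out accuracy mostly reflects how linearly separable the frozen embeddings already are, which is the property this kind of evaluation is meant to measure.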
ISTS targets computational oncology workflows where joint multi-omic context improves inference: assigning tumor lineage or tissue of origin, prioritizing likely driver mutations from molecular profiles, and producing reusable tumor embeddings for downstream classifiers. Its tolerance for missing modalities makes it practical for retrospective cohorts and clinical archives, where samples frequently have only methylation arrays or only RNA-seq. Cancer genomics researchers and translational bioinformaticians benefit most, using frozen embeddings as a feature backbone rather than training bespoke models per task.
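One simple way to obtain a single embedding when only some modalities are present is to average whichever per-modality embeddings exist for each sample. The sketch below illustrates that idea only; it is not taken from the preprint, whose actual missing-modality mechanism is not described here, and `fuse_available` and its arguments are hypothetical names.

```python
import numpy as np


def fuse_available(meth_emb, rna_emb, has_meth, has_rna):
    """Average the modality embeddings that are present for each sample.

    meth_emb, rna_emb: (n_samples, d) arrays; rows for absent modalities
    are ignored and may contain NaN.
    has_meth, has_rna: (n_samples,) boolean availability flags.
    """
    stacked = np.stack([meth_emb, rna_emb], axis=1)   # (n_samples, 2, d)
    present = np.stack([has_meth, has_rna], axis=1)   # (n_samples, 2)
    if not present.any(axis=1).all():
        raise ValueError("every sample needs at least one modality")
    # Zero out absent modalities, then divide by the number present.
    masked = np.where(present[:, :, None], stacked, 0.0)
    return masked.sum(axis=1) / present.sum(axis=1, keepdims=True)


# Toy cohort (assumed): both modalities, methylation only, RNA-seq only.
nan = np.nan
meth_emb = np.array([[1.0, 1.0], [2.0, 2.0], [nan, nan]])
rna_emb = np.array([[3.0, 3.0], [nan, nan], [4.0, 4.0]])
has_meth = np.array([True, True, False])
has_rna = np.array([True, False, True])
fused = fuse_available(meth_emb, rna_emb, has_meth, has_rna)
```

In a retrospective cohort, the availability flags would come from the sample manifest, so methylation-only and RNA-seq-only samples can pass through the same downstream classifier.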
ISTS belongs to a new wave of multi-omic foundation models for cancer, and it contributes evidence that pairing CpG-island grouping with cross-modal self-supervised pretraining yields compact, informative embeddings that transfer zero-shot to both lineage and mutation tasks. By demonstrating useful representations without encoder finetuning, it lowers the barrier to applying foundation-model embeddings in cancer genomics. Because the work is a recent preprint, its results await peer review and independent validation. At the time of writing, no public code repository, model weights, model card, or data card had been located, which currently limits external reproduction and adoption.