ISTS

Pan-cancer multi-omic foundation model encoding CpG-island DNA methylation and RNA-seq for zero-shot cancer classification and mutation prediction.

Released: December 2025

ISTS (Islands of Signal and Transcriptomic Sequencing) is a pan-cancer, multi-omic foundation model that jointly represents two complementary readouts of tumor biology: CpG-island DNA methylation and bulk RNA-seq gene expression. Released as a bioRxiv preprint in December 2025 by Alexandros Alexakos and Aristotelis Tsirigos at New York University, the model addresses a persistent gap in computational oncology: most foundation models are trained on a single modality, yet methylation and transcription together encode lineage identity and the downstream consequences of driver mutations. By learning a shared embedding space across both modalities, ISTS aims to produce compact, transferable representations of tumor state.

The model's central design choice is to compress raw, high-dimensional inputs into informative features before pretraining. Probe-level methylation arrays are aggregated into CpG-island-level features, and RNA-seq is reduced to a high-variance gene panel. These modality-specific inputs feed dedicated encoders, and a BERT-like transformer with masked reconstruction and cross-modal prediction objectives fuses them into a single representation that tolerates missing-modality inputs at inference.
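The two compression steps described above can be illustrated with a minimal sketch. It assumes that island-level features are the mean of the probe beta values mapped to each island and that the gene panel is chosen by ranking genes on variance across samples; the preprint may use a different aggregation or selection rule. The function names, probe IDs, island IDs, and gene names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def aggregate_to_islands(probe_betas, probe_to_island):
    """Average probe-level beta values within each CpG island.

    probe_betas: dict of probe_id -> beta value in [0, 1], or None if missing.
    probe_to_island: dict of probe_id -> island_id (hypothetical annotation).
    Probes without an island annotation or without a value are skipped.
    """
    grouped = defaultdict(list)
    for probe, beta in probe_betas.items():
        island = probe_to_island.get(probe)
        if island is not None and beta is not None:
            grouped[island].append(beta)
    return {island: mean(values) for island, values in grouped.items()}


def top_variance_genes(expression, k):
    """Return the k genes with the highest variance across samples.

    expression: dict of gene -> list of expression values, one per sample.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]


# Toy methylation input: five probes, four of them annotated to two islands.
probe_betas = {"cg01": 0.10, "cg02": 0.30, "cg03": 0.80, "cg04": None, "cg05": 0.50}
probe_to_island = {
    "cg01": "island_A",
    "cg02": "island_A",
    "cg03": "island_B",
    "cg04": "island_B",
}
island_features = aggregate_to_islands(probe_betas, probe_to_island)

# Toy expression input: three genes measured in three samples.
expression = {
    "GENE1": [5.0, 5.1, 4.9],
    "GENE2": [1.0, 9.0, 4.0],
    "GENE3": [2.0, 2.0, 2.0],
}
panel = top_variance_genes(expression, k=2)
```

In this toy input, the unannotated probe and the missing value are dropped before averaging, so each island feature reflects only the probes that were actually measured.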

ISTS sits alongside emerging multi-omic and methylation foundation models (such as MethylGPT and bimodal RNA/methylation models) but is distinguished by its CpG-island grouping strategy and its explicit pan-cancer, zero-shot evaluation across both lineage and mutation tasks.

Key Features

Joint methylation + expression encoding: Combines CpG-island DNA methylation and high-variance RNA-seq genes in one model, capturing signal that neither modality reveals alone.
CpG-island aggregation: Collapses noisy, high-dimensional probe-level methylation into island-level features, yielding compact inputs and reducing the burden on the encoder.
BERT-like masked pretraining: Uses masked reconstruction plus cross-modal prediction objectives to learn a shared embedding space in a self-supervised manner, without task labels.
Missing-modality robustness: The shared embedding supports inputs where one modality is absent, reflecting the reality that many clinical samples have only methylation or only expression.
Zero-shot evaluation: Frozen embeddings are assessed without encoder finetuning, using a linear probe for cancer-type classification and a shallow MLP for mutation prediction.
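As a rough illustration of how masked pretraining and missing-modality inputs can share one data format, the sketch below builds a single training example. It is an assumption-laden toy, not the authors' pipeline: the mask fraction, the `MASK` placeholder, and the `present` flag are all hypothetical, and a real model would use learned mask and modality tokens.

```python
import random

MASK = "[MASK]"  # stand-in for a learned mask token (assumption)


def mask_features(values, mask_fraction, rng):
    """Hide a random subset of features; return (masked inputs, targets)."""
    n_mask = max(1, round(len(values) * mask_fraction))
    masked_idx = set(rng.sample(range(len(values)), n_mask))
    inputs = [MASK if i in masked_idx else v for i, v in enumerate(values)]
    # Targets are the original values at the masked positions.
    targets = {i: values[i] for i in masked_idx}
    return inputs, targets


def build_training_example(methylation, expression, mask_fraction=0.15, seed=0):
    """Mask each available modality; flag a modality as absent when it is None."""
    rng = random.Random(seed)
    example = {}
    for name, values in (("methylation", methylation), ("expression", expression)):
        if values is None:
            example[name] = {"present": False, "inputs": None, "targets": {}}
        else:
            inputs, targets = mask_features(values, mask_fraction, rng)
            example[name] = {"present": True, "inputs": inputs, "targets": targets}
    return example


# A sample with both modalities, and one with methylation only.
paired = build_training_example([0.2, 0.9, 0.4, 0.1], [3.1, 0.0, 7.5, 2.2])
methylation_only = build_training_example([0.2, 0.9, 0.4, 0.1], None)
```

Under this layout, a reconstruction loss would be computed only over the `targets` of modalities marked as present, which is one simple way a model could be trained and queried on samples that lack a modality.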

Technical Details

ISTS uses modality-specific MLP encoders that map CpG-island methylation features and a high-variance RNA-seq gene panel into a common latent space, which a BERT-like transformer then refines via masked reconstruction and cross-modal prediction objectives. Pretraining draws on harmonized public pan-cancer resources — TCGA, TARGET, CPTAC-3, and HCMI — spanning a broad range of adult and pediatric tumor types. The learned representations are evaluated in two zero-shot settings: cancer-type classification via a linear probe on frozen embeddings, and mutation prediction across 214 genes via a shallow MLP head, with no finetuning of the transformer backbone. The authors report strong performance for many tumor types and gene–cancer pairs, and observe that the embedding space recovers biologically meaningful structure. Detailed hyperparameters (embedding dimension, layer count, and total parameter count) are specified in the preprint; the work is released under a CC BY license.
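The linear-probe evaluation can be sketched as softmax regression trained on fixed embedding vectors, with no gradient flowing back into the encoder. The example below is a minimal pure-Python illustration under stated assumptions: the two-dimensional "embeddings", the three classes, the learning rate, and the epoch count are invented toy values, and the preprint's actual probe setup may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, biases, x):
    """Linear class scores W x + b for one embedding vector."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, biases)]


def train_linear_probe(embeddings, labels, n_classes, lr=0.5, epochs=200):
    """Fit softmax regression by full-batch gradient descent on frozen embeddings."""
    dim, n = len(embeddings[0]), len(embeddings)
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(embeddings, labels):
            probs = softmax(scores(weights, biases, x))
            for c in range(n_classes):
                # Cross-entropy gradient: predicted probability minus one-hot label.
                err = (probs[c] - (1.0 if c == y else 0.0)) / n
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(n_classes):
            biases[c] -= lr * grad_b[c]
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    logits = scores(weights, biases, x)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy frozen embeddings for three hypothetical cancer types (labels 0, 1, 2).
train_x = [[2.0, 0.0], [1.8, 0.2], [0.0, 2.0], [0.2, 1.8], [-2.0, -2.0], [-1.8, -2.1]]
train_y = [0, 0, 1, 1, 2, 2]
weights, biases = train_linear_probe(train_x, train_y, n_classes=3)
held_out = [predict(weights, biases, x) for x in ([1.5, 0.1], [0.1, 1.5], [-1.5, -1.5])]
```

Because only the probe's weights and biases are updated, the quality of the predictions reflects how linearly separable the classes already are in the embedding space. The mutation task described above would replace this linear layer with a shallow MLP head, again leaving the backbone untouched.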

Applications

ISTS targets computational oncology workflows where joint multi-omic context improves inference: assigning tumor lineage or tissue of origin, prioritizing likely driver mutations from molecular profiles, and producing reusable tumor embeddings for downstream classifiers. Its tolerance for missing modalities makes it practical for retrospective cohorts and clinical archives, where samples frequently have only methylation arrays or only RNA-seq. The most likely users are cancer genomics researchers and translational bioinformaticians, who can use the frozen embeddings as a feature backbone rather than train a bespoke model for each task.

Impact

As one of a new wave of multi-omic foundation models for cancer, ISTS contributes evidence that pairing CpG-island grouping with cross-modal self-supervised pretraining yields compact, informative embeddings that transfer zero-shot to both lineage and mutation tasks. By demonstrating useful representations without encoder finetuning, it lowers the barrier to applying foundation-model embeddings in cancer genomics. As a recent preprint, its results await peer review and independent validation. At the time of writing, no public code repository, model weights, model card, or data card had been located, which limits external reproduction and adoption.

Citation

Islands of Signal and Transcriptomic Sequencing: A Foundation Model for Mutation and Lineage Prediction based on DNA Methylation and RNA-seq

Alexakos, A. & Tsirigos, A. (2025) Islands of Signal and Transcriptomic Sequencing: A Foundation Model for Mutation and Lineage Prediction based on DNA Methylation and RNA-seq. bioRxiv.

DOI: 10.64898/2025.12.01.691534


Citations

Total citations: 0

Influential citations: 0

References: 51

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Overall score: 20 (Closed)

Usability (can I run it?): 13

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified (missing required components)

Resources

Research Paper
