muat

Transformer that classifies tumour types and subtypes from somatic variants in whole-genome and whole-exome data, with auto-downloading checkpoints.

Released: April 2026

muat is a transformer-based software tool for classifying tumour types and subtypes directly from somatic variants observed in whole-genome (WGS) and whole-exome (WES) sequencing data. Rather than reducing a tumour's mutational landscape to aggregated counts or signature exposures, muat applies an attention mechanism over individual mutations, representing each variant as a combination of its type and genomic context. This lets the model learn which mutations are characteristic of a given cancer type and produce interpretable, biologically grounded representations of a tumour.
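As a rough illustration of what "type plus genomic context" can mean for one variant, the sketch below turns a single-nucleotide variant into two tokens: a pyrimidine-normalised motif such as A[C>T]G and a coarse position bin. The function names, the trinucleotide window and the 1 Mb bin size are assumptions made for this example. They are not taken from the muat code base, whose actual encoding may differ.

```python
# Illustrative sketch only: a hypothetical tokenisation, not muat's real encoding.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def encode_snv(chrom, pos, ref, alt, context, bin_size=1_000_000):
    """Encode an SNV as (motif token, position token).

    `context` is the reference trinucleotide centred on the variant and
    `pos` is the 1-based coordinate. The bin size is an assumed value.
    """
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        raise ValueError("expected a single-nucleotide substitution")
    if len(context) != 3 or context[1] != ref:
        raise ValueError("context must be a trinucleotide centred on ref")
    # Report purine references on the opposite strand so that the
    # mutated base is always a pyrimidine (C or T).
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        context = reverse_complement(context)
    motif = f"{context[0]}[{ref}>{alt}]{context[2]}"
    position = f"{chrom}:{(pos - 1) // bin_size}"
    return motif, position
```

A model of this kind would then map each token to an embedding, so that the sequence of embedded mutations forms the input over which attention is computed.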

The tool packages the Mutation-Attention (MuAt) approach, originally introduced by Sanjaya and colleagues at the University of Helsinki, as portable, reproducible software designed to run across heterogeneous research environments. A central motivation is deployability: cancer genomics increasingly takes place inside high-performance computing clusters and secure processing environments (such as national biobanks), where installing and reproducing deep learning pipelines is difficult. muat addresses this by distributing pretrained checkpoints that auto-download on first use and by shipping through standard package managers and containers.

The accompanying preprint was posted to bioRxiv in April 2026 by P. Sanjaya and E. Pitkänen. The software is distributed under the permissive Apache 2.0 license, making it usable in both academic and applied clinical-research settings.

Key Features

Mutation-level attention: The model attends to individual somatic variants rather than aggregated mutation counts, learning representations that combine mutation type and genomic position and highlighting tumour-type-characteristic mutations.
Portable, reproducible distribution: muat installs via Bioconda (conda install bioconda::muat) and is available as a container through BioContainers, targeting HPC systems and secure processing environments where reproducibility is critical.
Auto-downloading pretrained checkpoints: Benchmark WGS and WES models are fetched automatically from Hugging Face on first use, so users can run inference without retraining.
Strong zero-shot transfer: Applied to an independent cohort, the pretrained model reaches roughly 81% accuracy with no retraining and about 89% after fine-tuning.
Flexible input handling: Accepts raw variant calls (VCF, MAF, TSV) and a preprocessed .muat.tsv format, supporting SNVs and MNVs from both genome and exome assays.
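To make the input side of the list above concrete, here is a minimal sketch of how SNVs and MNVs could be picked out of VCF text before any model-specific preprocessing. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT). The helper names are hypothetical, and the sketch does not reproduce muat's own preprocessing or its .muat.tsv format.

```python
# Illustrative sketch only: generic VCF filtering, not muat's preprocessing code.
BASES = set("ACGT")


def classify_variant(ref, alt):
    """Label a REF/ALT pair as 'SNV', 'MNV' or 'other'."""
    if not ref or not alt or ref == alt:
        return "other"
    if not (set(ref) <= BASES and set(alt) <= BASES):
        return "other"  # symbolic alleles, missing values, ambiguity codes
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"  # same-length substitution of two or more bases
    return "other"  # insertions and deletions


def parse_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt, kind) for SNVs and MNVs in VCF text lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # split multi-allelic records
            kind = classify_variant(ref.upper(), alt.upper())
            if kind != "other":
                yield chrom, int(pos), ref.upper(), alt.upper(), kind
```

Indels and symbolic alleles are dropped here only because the feature list mentions SNVs and MNVs. Whether muat accepts other variant classes is not stated in this description.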

Technical Details

muat implements a transformer whose attention mechanism operates over individual somatic mutations, each encoded by features such as mutation type and genomic position. The architecture builds on the Mutation-Attention (MuAt) representation-learning framework. The benchmark whole-genome checkpoint was trained on the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort, and the whole-exome checkpoint on TCGA Pan-Cancer Atlas data. On these reference datasets the models reach approximately 89% accuracy for histological tumour typing across 24 tumour types from whole genomes and about 64% across 20 types from exomes. In cross-cohort evaluation, where a model trained on one dataset is applied to an independent cohort, muat attains roughly 81% accuracy without any retraining and about 89% after fine-tuning, which suggests that the learned representations transfer rather than overfit to a single dataset. The software is implemented in PyTorch and distributed as a versioned Bioconda package.
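The idea of attending over individual mutations can be sketched with plain scaled dot-product attention, in which one query vector scores every mutation embedding and the resulting weights show which mutations dominate the pooled tumour representation. This is a generic, single-head, dependency-free illustration. The function names and toy vectors are assumptions, and muat's actual PyTorch architecture (layers, heads, embedding sizes) is not described here.

```python
# Illustrative sketch only: generic attention pooling, not muat's architecture.
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, keys, values):
    """Pool per-mutation value vectors using scaled dot-product attention.

    query:  one vector of length d
    keys:   one vector of length d per mutation
    values: one vector per mutation (all of equal length)
    Returns (weights, pooled), where weights sum to 1 across mutations.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return weights, pooled


# Toy example: three "mutations" with two-dimensional embeddings.
weights, pooled = attention_pool(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
)
```

In this toy example the first mutation receives the largest weight because its key is most aligned with the query. Inspecting such weights is one way an attention-based model can indicate which mutations drove a prediction.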

Applications

muat is intended for cancer genomics researchers and translational teams who need to assign tumour type or subtype from somatic variant profiles, including investigations of cancers of unknown primary where the tissue of origin is uncertain. Because it runs from auto-downloaded checkpoints and installs through Conda or containers, it is well suited to large biobank-scale analyses inside secure processing environments (for example, Genomics England) and to HPC pipelines where reproducibility and offline-friendly deployment matter. The learned mutation representations can also support downstream representation learning beyond classification.

Impact

muat lowers the practical barrier to applying deep learning for tumour classification by turning a research model into portable, reproducible software with pretrained checkpoints and standard packaging. Its zero-shot cross-cohort performance (~81% without retraining) is notable because it suggests that the somatic-mutation representations generalise across sequencing cohorts, a recurring challenge for genomic classifiers. Because the preprint was released in 2026, adoption metrics are still emerging. Reported accuracies depend on the diversity of tumour types in the training cohorts and on how consistent the sequencing assay and variant calling are between training and target data. By emphasising deployability in secure and high-performance environments, muat targets a real gap between published cancer-genomics models and their use on protected real-world datasets.

Citation

muat: portable transformer-based method for tumour classification and representation learning from somatic variants

Sanjaya, P. & Pitkänen, E. (2026) muat: portable transformer-based method for tumour classification and representation learning from somatic variants. bioRxiv.

DOI: 10.64898/2026.04.01.715762

Citations

Total Citations: 2

Influential: 0

References: 40

GitHub

Stars: 8

Forks: 3

Open Issues: 0

Contributors: 1

Last Push: 25d ago

Language: Python

License: Apache-2.0

HuggingFace

Downloads: 0

Likes: 0

Last Modified: 1y ago

Openness

bio.rodeo openness: Fully open · usable and reproducible

Overall score: 65 (Partial)

Usability (can I run it?): 85

Reproducibility (can I retrain it?): 54

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository · Research Paper · HuggingFace Model

