bio.rodeo


© 2026 Pulsatance. All rights reserved.
DNA & Gene

muat

University of Helsinki

A portable transformer that classifies tumour types and learns representations from somatic variants, with auto-downloading WGS and WES checkpoints.

Released: April 2026

muat is a transformer-based software tool for classifying tumour types and subtypes directly from somatic variants observed in whole-genome (WGS) and whole-exome (WES) sequencing data. Rather than reducing a tumour's mutational landscape to aggregated counts or signature exposures, muat applies an attention mechanism over individual mutations, representing each variant as a combination of its type and genomic context. This lets the model learn which mutations are characteristic of a given cancer type and produce interpretable, biologically grounded representations of a tumour.
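
As a rough illustration of representing a variant by its type and genomic context, the sketch below turns one single-nucleotide substitution into a motif token (flanking reference bases plus the substitution) and a coarse position token. The function name, the token format, and the 1 Mb bin size are assumptions made for illustration and are not taken from muat.

```python
def encode_snv(chrom, pos, ref, alt, left_base, right_base, bin_size=1_000_000):
    """Return (motif_token, position_token) for one SNV.

    Hypothetical encoding: the token format and bin size are assumptions,
    not muat's actual representation.
    """
    bases = set("ACGT")
    # Reject anything that is not a single-base substitution over A/C/G/T.
    if not ({ref, alt, left_base, right_base} <= bases) or ref == alt:
        raise ValueError("expected a single-base substitution over A/C/G/T")
    # Mutation type in its immediate sequence context, e.g. A[C>T]G.
    motif_token = f"{left_base}[{ref}>{alt}]{right_base}"
    # Coarse genomic position: chromosome plus a 0-based bin index (pos is 1-based).
    position_token = f"{chrom}:{(pos - 1) // bin_size}"
    return motif_token, position_token


print(encode_snv("chr1", 1_500_000, "C", "T", "A", "G"))
```

In a model of this kind, each token would typically be mapped to a learned embedding and the embeddings combined per mutation before attention is applied.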

The tool packages the Mutation-Attention (MuAt) approach, originally introduced by Sanjaya and colleagues at the University of Helsinki, into portable, reproducible software designed to run across heterogeneous research environments. A central motivation is deployability: cancer genomics increasingly takes place inside high-performance computing clusters and secure processing environments (such as national biobanks), where installing and reproducing deep learning pipelines is difficult. muat addresses this by distributing pretrained checkpoints that auto-download on first use and by shipping through standard package managers and containers.
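
The "download on first use" pattern can be sketched generically as a local cache check followed by a fetch only when the file is missing. This is a minimal sketch of the general idea, not muat's download code: `ensure_checkpoint`, the `fetch` callable, and the cache layout are all hypothetical.

```python
from pathlib import Path


def ensure_checkpoint(name, cache_dir, fetch):
    """Return the local path of a checkpoint, fetching it only if absent.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    checkpoint name and returns its bytes; muat's real logic may differ.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary file first, then rename, so an interrupted
        # download is never mistaken for a complete checkpoint.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(fetch(name))
        tmp.replace(target)
    return target
```

In an offline secure environment, the same pattern lets an administrator pre-populate the cache directory so that no network access is attempted at run time.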

muat was described in a bioRxiv preprint by P. Sanjaya and E. Pitkänen in April 2026. It is distributed under the permissive Apache 2.0 license, which allows use in both academic and applied clinical-research settings.

#Key Features

  • Mutation-level attention: The model attends to individual somatic variants rather than aggregated mutation counts, learning representations that combine mutation type and genomic position and highlighting tumour-type-characteristic mutations.
  • Portable, reproducible distribution: muat installs via Bioconda (conda install bioconda::muat) and is available as a container through BioContainers, targeting HPC systems and secure processing environments where reproducibility is critical.
  • Auto-downloading pretrained checkpoints: Benchmark WGS and WES models are fetched automatically from Hugging Face on first use, so users can run inference without retraining.
  • Strong zero-shot transfer: The pretrained model generalises across cohorts, reaching roughly 81% accuracy on a held-out cohort with no retraining and about 89% after fine-tuning.
  • Flexible input handling: Accepts raw variant calls (VCF, MAF, TSV) and a preprocessed .muat.tsv format, supporting SNVs and MNVs from both genome and exome assays.
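
To make the SNV/MNV distinction in the input-handling item concrete, the sketch below classifies a tab-separated VCF data line by comparing the REF and ALT alleles. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, ...); the function name is hypothetical and this is not muat's parser.

```python
def classify_vcf_record(line):
    """Classify one tab-separated VCF data line as 'SNV', 'MNV', or 'other'.

    Illustrative only, not muat's parser. Multi-allelic records
    (comma-separated ALT) yield one result per alternate allele.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 5:
        raise ValueError("not a VCF data line")
    chrom, pos, _id, ref, alt_field = fields[:5]
    calls = []
    for alt in alt_field.split(","):
        is_plain = set(ref + alt) <= set("ACGTacgt")
        if is_plain and len(ref) == len(alt) == 1:
            kind = "SNV"  # single-nucleotide variant
        elif is_plain and len(ref) == len(alt) > 1:
            kind = "MNV"  # multi-nucleotide variant of equal length
        else:
            kind = "other"  # indels, symbolic alleles such as <DEL>, '*'
        calls.append((chrom, int(pos), ref, alt, kind))
    return calls


for record in classify_vcf_record("chr2\t5\t.\tGA\tTC,G\t.\tPASS\t."):
    print(record)
```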

#Technical Details

muat implements a transformer whose attention mechanism operates over individual somatic mutations, each encoded by features such as mutation type and genomic position; the architecture builds on the Mutation-Attention (MuAt) representation-learning framework. The benchmark whole-genome checkpoint was trained on the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort and the whole-exome checkpoint on TCGA Pan-Cancer Atlas data. On these reference cohorts the models reach approximately 89% accuracy for histological tumour typing across 24 tumour types from whole genomes and about 64% across 20 types from exomes. In cross-cohort evaluation, where a model trained on one dataset is applied to an independent cohort, muat attains roughly 81% accuracy without any retraining and about 89% after fine-tuning, which suggests that the learned representations transfer rather than overfit to a single dataset. The software is implemented in PyTorch and distributed as a versioned Bioconda package.
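
The idea of attending over individual mutations can be illustrated with single-query scaled dot-product attention, written here in plain Python so it runs without PyTorch. This is a toy stand-in for the general technique, not muat's architecture; the function name and the use of one pooled query are assumptions.

```python
import math


def attention_pool(query, keys, values):
    """Single-query scaled dot-product attention over mutation embeddings.

    Returns (weights, pooled). A toy illustration, not muat's model.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equal length")
    scale = math.sqrt(len(query))
    # One similarity score per mutation.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over mutations
    dim = len(values[0])
    # Tumour-level representation: attention-weighted sum of value vectors.
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, pooled


weights, pooled = attention_pool(
    [1.0, 0.0],
    [[1.0, 0.0], [1.0, 0.0]],
    [[2.0, 0.0], [4.0, 2.0]],
)
print(weights, pooled)
```

In this kind of setup the per-mutation weights are what one would inspect to see which variants contribute most to a prediction.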

#Applications

muat is intended for cancer genomics researchers and translational teams who need to assign tumour type or subtype from somatic variant profiles, including investigations of cancers of unknown primary where the tissue of origin is uncertain. Because it runs from auto-downloaded checkpoints and installs through Conda or containers, it is well suited to large biobank-scale analyses inside secure processing environments (for example, Genomics England) and to HPC pipelines where reproducibility and offline-friendly deployment matter. The learned mutation representations can also support downstream representation learning beyond classification.

#Impact

muat lowers the practical barrier to applying deep learning for tumour classification by turning a research model into portable, reproducible software with pretrained checkpoints and standard packaging. Its zero-shot cross-cohort performance (~81% without retraining) is notable because it suggests the somatic-mutation representations generalise across sequencing cohorts, a recurring challenge for genomic classifiers. Because the preprint was released in 2026, adoption metrics are still emerging. Reported accuracies depend on the diversity of tumour types in the training cohorts and on how consistent the sequencing assay and variant calling are between training and target data. By emphasising deployability in secure and high-performance environments, muat targets a real gap between published cancer-genomics models and their use on protected real-world datasets.

Tags

tumour_classification, representation_learning, transformer, attention, self_supervised, transfer_learning, cancer_genomics, somatic_mutations