Popformer

Self-supervised transformer for population genetics, pretrained on 1000 Genomes data, that detects positive selection via haplotype-wise attention.

Released: March 2026

Popformer is a self-supervised transformer for population genetics, introduced by Leon Zong, Sorelle A. Friedler, and Sara Mathieson in a March 2026 bioRxiv preprint. The model brings the pretraining-then-finetuning paradigm that transformed protein and genomic sequence modeling into population-scale analysis, where the unit of study is not a single sequence but a panel of haplotypes sampled across many individuals. It is among the first population-genetics foundation models, learning general representations of genetic variation that can be reused across downstream evolutionary inference tasks.

The central problem Popformer addresses is detecting signatures of positive selection: genomic regions where particular variants have risen in frequency faster than expected under neutrality. Traditional approaches rely on hand-engineered summary statistics or on supervised classifiers trained on simulations under a specific demographic model, and they can fail when the assumed demography is misspecified. Popformer instead pretrains on real human data and learns representations that transfer, so the downstream selection classifier remains accurate even when it is evaluated under mis-specified demographic scenarios.
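As a concrete picture of what a hand-engineered summary statistic looks like, the sketch below computes nucleotide diversity (the mean number of pairwise differences between haplotypes) on a tiny 0/1 haplotype matrix. It illustrates the traditional approach only, not anything Popformer computes; the function name and the toy matrix are made up for this example.

```python
import numpy as np
from itertools import combinations

def nucleotide_diversity(haps):
    """Mean pairwise differences for a (n_haplotypes, n_sites) 0/1 matrix."""
    n = haps.shape[0]
    alt = haps.sum(axis=0)  # alternate-allele count per site
    # Each site contributes alt * (n - alt) differing pairs out of n*(n-1)/2.
    return float((2 * alt * (n - alt)).sum() / (n * (n - 1)))

# Toy data (illustrative only): 3 haplotypes, 4 biallelic sites.
haps = np.array([
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
])

pi = nucleotide_diversity(haps)

# Cross-check against an explicit loop over all haplotype pairs.
brute = np.mean([(a != b).sum() for a, b in combinations(haps, 2)])
print(pi, brute)
```

A selective sweep tends to reduce such diversity locally, which is why statistics of this kind are used in selection scans. Their expected values also depend on demographic history, which is the source of the misspecification problem described above.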

Key Features

Site- and haplotype-wise attention: Two complementary attention mechanisms let the model capture variation both across genomic positions and across individuals in a sample, matching the two-dimensional structure of a haplotype matrix.
Masked-modeling pretraining on real data: Popformer is pretrained with a masked-language-modeling analog on real 1000 Genomes haplotypes, an objective closely related to genotype imputation, rather than relying solely on simulations.
Zero-shot population structure: Pretrained embeddings of genomic windows recover population structure without any labels, indicating that the model learns biologically meaningful representations.
Robust selection classification: Fine-tuned for selection detection, Popformer outperforms specialized methods under both well-specified and mis-specified demographic models.
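The two attention axes can be pictured with a toy NumPy sketch. This is not Popformer's implementation: the single-head attention, the residual connections, the random weights, the toy allele embedding, and names such as axial_block are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the second-to-last axis.

    x has shape (batch, tokens, d); each batch element is attended separately.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def axial_block(x, site_params, hap_params):
    """Apply site-wise then haplotype-wise attention to x: (n_hap, n_sites, d)."""
    # Site-wise: each haplotype (row) attends across its own SNP positions.
    x = x + self_attention(x, *site_params)
    # Haplotype-wise: swap axes so each site (column) attends across haplotypes.
    xt = np.swapaxes(x, 0, 1)  # (n_sites, n_hap, d)
    xt = xt + self_attention(xt, *hap_params)
    return np.swapaxes(xt, 0, 1)

rng = np.random.default_rng(0)
n_hap, n_sites, d = 8, 16, 4

# Toy haplotype matrix of 0/1 alleles, embedded with a random lookup table.
haplotypes = rng.integers(0, 2, size=(n_hap, n_sites))
embed = rng.normal(size=(2, d))
x = embed[haplotypes]  # (n_hap, n_sites, d)

site_params = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
hap_params = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]

out = axial_block(x, site_params, hap_params)
print(out.shape)
```

Because this toy version adds no positional information along the haplotype axis, reordering the input haplotypes simply reorders the output rows. That suits an unordered sample of individuals, though whether Popformer itself has this property is not stated here.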

Technical Details

Popformer is a transformer architecture adapted to operate on haplotype matrices, combining site-wise attention (across SNP positions) with haplotype-wise attention (across sampled individuals). Pretraining uses a masked-modeling objective on real human genomic data from the 1000 Genomes Project, conceptually analogous to genotype imputation: the model learns to reconstruct masked genotypes from surrounding context. The resulting embeddings of genomic windows align with known population structure in a zero-shot setting. For selection detection, the pretrained encoder is fine-tuned as a classifier and benchmarked against specialized selection-scan methods on simulations spanning both correctly specified and mis-specified demographic histories, where the authors report higher accuracy.
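The masked-modeling objective can be sketched as follows: hide a random subset of alleles, predict them, and score only the hidden entries. The 15% mask rate, the MASK token id, and the frequency-based stand-in predictor are assumptions for illustration; the preprint's actual masking scheme and loss are not described here.

```python
import numpy as np

MASK = 2  # hypothetical token id for a hidden allele (0 = ref, 1 = alt)

def mask_haplotypes(haps, mask_rate, rng):
    """Return a masked copy of haps and a boolean array marking hidden entries."""
    hidden = rng.random(haps.shape) < mask_rate
    masked = np.where(hidden, MASK, haps)
    return masked, hidden

def masked_cross_entropy(prob_alt, haps, hidden, eps=1e-9):
    """Binary cross-entropy averaged over the hidden entries only."""
    p = np.clip(prob_alt[hidden], eps, 1 - eps)
    y = haps[hidden]
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

rng = np.random.default_rng(0)
haps = rng.integers(0, 2, size=(10, 50))  # toy (n_haplotypes, n_sites) matrix
masked, hidden = mask_haplotypes(haps, 0.15, rng)

# Stand-in for a model: predict each hidden allele from the smoothed
# alternate-allele frequency among the visible haplotypes at the same site.
visible = ~hidden
alt_counts = (haps * visible).sum(axis=0)
n_visible = visible.sum(axis=0)
freq = (alt_counts + 0.5) / (n_visible + 1.0)
prob_alt = np.broadcast_to(freq, haps.shape)

loss = masked_cross_entropy(prob_alt, haps, hidden)
print(round(loss, 3))
```

The stand-in predictor uses only per-site allele frequencies. A trained transformer would replace it with predictions conditioned on the whole window, which is what makes the objective resemble imputation.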

Applications

Popformer is intended for population geneticists and evolutionary biologists who study natural selection, demographic history, and the structure of human genetic variation. Beyond selection scans, the authors point to future applications such as inferring recombination rates and local ancestry, leveraging the same pretrained backbone. Because the model learns transferable representations, it can serve as a shared starting point for multiple population-genomic inference tasks rather than requiring a bespoke estimator for each.

Impact

By demonstrating that a self-supervised transformer pretrained on real human genomes can capture population structure zero-shot and improve selection inference under model misspecification, Popformer extends foundation-model methodology into a field that has historically depended on simulation-trained, demography-specific estimators. As an early population-genetics foundation model, it charts a path toward reusable representations for evolutionary inference. Because it is a recent preprint with no confirmed public code release or pretrained weights, its broader adoption and independent benchmarking remain to be seen.

Citation

Popformer: Learning general signatures of positive selection with a self-supervised transformer

Zong, L., Friedler, S. A., and Mathieson, S. (2026) Popformer: Learning general signatures of positive selection with a self-supervised transformer. bioRxiv.

DOI: 10.64898/2026.03.06.710163

Citations

Total citations: 37

Influential: 5

References: 69

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Overall score: 19 (Closed)

Usability (can I run it?): 14

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified (missing required components)

