PRIMO (PRotein In-context Mutation Oracle) is a transformer-based framework for few-shot protein fitness prediction. Protein engineers frequently need to rank variants of a target protein after measuring only a handful of examples—often no more than a single 96-well plate—yet most supervised fitness models require hundreds of labeled observations, including a separate validation set to prevent overfitting. PRIMO addresses this gap by combining in-context learning (ICL) with test-time training (TTT), allowing it to adapt rapidly to a new protein or assay without large task-specific datasets and without a dedicated validation split.

PRIMO was introduced in December 2025 by Felix Teufel, Aaron Kollasch, Yining Huang, Ole Winther, Kevin Yang, Pascal Notin, and Debora Marks, whose affiliations span Harvard Medical School, the University of Copenhagen, Novo Nordisk, Microsoft Research (Cambridge, MA), and the Technical University of Denmark. It builds on the ProteinGym / Tranception / ProteinNPT lineage from the Marks and Notin groups and was published at the AI for Science Workshop at NeurIPS 2025.

PRIMO's central idea is to pre-train a single model across many deep mutational scanning (DMS) assays so that it learns to extract fitness signal from labeled context sets, then sharpen that model on each new task at inference time. Unlike prior set-based methods such as ProteinNPT or Metalic, PRIMO handles both substitution and insertion/deletion (indel) variants, broadening its applicability across protein engineering tasks.

Key Features

In-context learning over labeled sets: PRIMO processes a set of sequence variants together, treating each amino acid sequence, its fitness label, an assay-type ID, and an auxiliary zero-shot score as a unified token set, and learns to rank masked variants via a preference-based loss.
Test-time training: At inference, PRIMO fine-tunes its weights on the available few-shot observations for a fixed 25 gradient steps before predicting, then discards the adapted weights—remedying the distribution shift that limits pure ICL.
Handles indels and substitutions: A pooled cross-sequence attention mechanism (instead of column attention) accommodates variable-length sequences, so PRIMO scores insertions and deletions as well as point mutations.
No validation set required: Because PRIMO is trained with a ranking objective, it avoids the held-out validation data that fine-tuning and ProteinNPT-scale models need, making it usable in strict low-budget regimes.
Rigorous data splitting: The authors curate a held-out ProteinGym split controlled for sequence-identity overlap, showing that prior splits inflate apparent zero-shot performance.
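
The preference-based loss and the fixed 25-step test-time training listed above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a linear scorer over made-up feature vectors stands in for the transformer, the loss is a generic pairwise logistic (Bradley-Terry style) ranking loss that may differ from the exact preference loss PRIMO uses, and the names `pairwise_logistic_loss`, `adapt_at_test_time`, and the learning rate are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, features):
    # Toy linear scorer standing in for the real model (assumption).
    return sum(w * x for w, x in zip(weights, features))


def ordered_pairs(labels):
    # All (i, j) where variant i was measured as fitter than variant j.
    n = len(labels)
    return [(i, j) for i in range(n) for j in range(n) if labels[i] > labels[j]]


def pairwise_logistic_loss(weights, feats, labels):
    # Mean of log(1 + exp(-(s_i - s_j))) over pairs where i should outrank j.
    pairs = ordered_pairs(labels)
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        margin = score(weights, feats[i]) - score(weights, feats[j])
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(pairs)


def adapt_at_test_time(pretrained, feats, labels, steps=25, lr=0.1):
    # Fixed step budget, so no validation set is needed for early stopping.
    weights = list(pretrained)  # work on a copy; the pretrained weights stay untouched
    pairs = ordered_pairs(labels)
    for _ in range(steps):
        if not pairs:
            break
        grad = [0.0] * len(weights)
        for i, j in pairs:
            margin = score(weights, feats[i]) - score(weights, feats[j])
            coeff = -_sigmoid(-margin)  # derivative of the pair loss w.r.t. the margin
            for k in range(len(weights)):
                grad[k] += coeff * (feats[i][k] - feats[j][k])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


# Hypothetical few-shot data: four measured variants with two features each.
pretrained = [0.0, 0.0]
shots = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
fitness = [0.1, 0.2, 0.3, 0.4]

adapted = adapt_at_test_time(pretrained, shots, fitness)
loss_before = pairwise_logistic_loss(pretrained, shots, fitness)
loss_after = pairwise_logistic_loss(adapted, shots, fitness)
```

Because `adapt_at_test_time` returns a new weight list and leaves `pretrained` unchanged, discarding the adapted weights after prediction amounts to simply not keeping `adapted`.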

Technical Details

PRIMO is a masked language model with 6 PRIMO layers, a hidden size of 400, 8 attention heads, and a feedforward factor of 4. Amino acid sequences are embedded with the ESM-2 650M protein language model, and autoregressive zero-shot scores come from ProGen2-medium; both pretrained models stay frozen during training. Each PRIMO layer combines per-sequence self-attention with an attention-pooling step (3 pooled vectors per sequence) and pooled cross-sequence attention, keeping the limiting complexity at O(NL²), for a set of N sequences of length L, rather than the O(N²L²) of full sequence-of-sequences attention. The layers use rotary positional embeddings, pre-LayerNorm, and skip connections.

Pre-training draws 150,000 sets of size N=32 (sequences cropped to 512 residues) from 116 ProteinGym DMS assays spanning stability, enzymatic activity, abundance, fluorescence, and binding, on a single RTX 6000 GPU.

On a sequence-identity-controlled held-out split, PRIMO with TTT improves from an average Spearman correlation of 0.51 at zero shots to 0.67 at 128 shots, outperforming Gaussian process, ridge regression, and random forest baselines at every number of shots, and beating Metalic on a clean split. On a new "natural evolution" benchmark (chorismate mutase, Rubisco, PPAT), PRIMO with TTT reaches 0.30 Spearman at 32 shots versus roughly 0.24 for the baselines.
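
To make the O(NL²) versus O(N²L²) comparison concrete, the short calculation below counts token-pair interactions for the pre-training set shape given in the text (N=32, L=512). It is only a back-of-the-envelope sketch: it ignores constants, heads, and hidden size, and it leaves out the pooled cross-sequence term because the text gives only the limiting term. The function name `attention_pair_counts` is hypothetical.

```python
def attention_pair_counts(n_seqs, seq_len):
    # Per-sequence self-attention: each of the N sequences attends within itself.
    per_sequence = n_seqs * seq_len ** 2
    # Full sequence-of-sequences attention: every token attends to every token in the set.
    full = (n_seqs * seq_len) ** 2
    return per_sequence, full


per_sequence, full = attention_pair_counts(n_seqs=32, seq_len=512)
ratio = full // per_sequence

print(f"per-sequence attention pairs: {per_sequence:,}")
print(f"full set attention pairs:     {full:,}")
print(f"ratio:                        {ratio}")
```

The ratio of the two counts is exactly N, so the saving from avoiding full attention grows linearly with the size of the context set.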

Applications

PRIMO targets protein engineering campaigns where labeled fitness data is scarce and expensive to generate. After measuring a small number of variants—for properties such as thermostability, enzymatic activity, binding affinity, or fluorescence—researchers can use PRIMO to prioritize promising candidates for the next experimental round, including designs involving insertions and deletions that many variant-effect models cannot score. Its ability to operate without a validation set makes it suitable for directed-evolution and machine-learning-guided design workflows constrained to a single plate of measurements.
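
The prioritization step described above can be sketched as a simple top-k selection over predicted scores. This is an illustrative assumption about how predictions might be consumed, not part of PRIMO itself; `select_next_round`, the variant names, and the scores are all hypothetical, and real campaigns often add diversity or exploration criteria on top of a pure greedy ranking.

```python
def select_next_round(candidates, predicted_scores, already_measured, k):
    # Keep only variants that have not been assayed yet.
    pool = [
        (score, variant)
        for variant, score in zip(candidates, predicted_scores)
        if variant not in already_measured
    ]
    # Highest predicted fitness first; ties keep their input order.
    pool.sort(key=lambda item: item[0], reverse=True)
    return [variant for _, variant in pool[:k]]


# Hypothetical variants (substitutions and one deletion) with made-up scores.
candidates = ["A12V", "G45del", "L78P", "T90S"]
predicted_scores = [0.1, 0.9, 0.5, 0.7]
already_measured = {"G45del"}

next_round = select_next_round(candidates, predicted_scores, already_measured, k=2)
```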

Impact

PRIMO demonstrates that pre-training across diverse deep mutational scans, followed by efficient test-time adaptation, can deliver state-of-the-art few-shot fitness prediction while supporting both substitutions and indels. Equally important is the paper's methodological critique: by exposing how sequence-identity overlap between train and test partitions inflates reported "zero-shot" performance, it underscores the need for fit-for-purpose data splits in protein fitness benchmarking. The model is trained only on ProteinGym's 116 assays, which the authors note limits pure in-context learning; pretrained weights are not released, and the public code targets reproduction from ProteinGym rather than turnkey inference, so adoption currently requires retraining. Even so, PRIMO offers a clear template for ICL-plus-TTT approaches as larger, more diverse fitness datasets become available.
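
The kind of train/test overlap check that motivates an identity-controlled split can be sketched as below. This is a crude illustration only: it uses Python's `difflib.SequenceMatcher` ratio as a stand-in for alignment-based sequence identity, the 0.9 threshold is a hypothetical value rather than one taken from the paper, and the authors' actual splitting procedure is not described here. The function names and example sequences are made up.

```python
from difflib import SequenceMatcher


def approx_identity(seq_a, seq_b):
    # Crude similarity proxy in [0, 1]; NOT a true alignment-based identity.
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def flag_overlapping(train_seqs, test_seqs, threshold=0.9):
    # Report held-out sequences that look too similar to any training sequence.
    flagged = []
    for test_seq in test_seqs:
        best = max(
            (approx_identity(test_seq, train_seq) for train_seq in train_seqs),
            default=0.0,
        )
        if best >= threshold:
            flagged.append((test_seq, best))
    return flagged


# Hypothetical wild-type sequences for training and held-out assays.
train_wild_types = ["MKTAYIAKQR"]
test_wild_types = ["MKTAYIAKQR", "GGGGGGGGGG"]

leaks = flag_overlapping(train_wild_types, test_wild_types)
```

Any held-out protein that appears in `leaks` would make a "zero-shot" evaluation on it optimistic, since a close homolog was seen during training.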
