Proust

Causal 309M-parameter protein language model that scores variant fitness zero-shot and generates sequences, reaching 0.390 Spearman on ProteinGym.

Released: February 2026

Parameters: 309 Million

Proust is a 309-million-parameter causal protein language model designed to close a long-standing gap in the field. Masked protein language models (such as the ESM series) are strong at scoring the fitness effect of mutations, and causal (autoregressive) models are needed for generating new sequences, but each family has historically been weak at the other task. Proust is built to do both well: it estimates variant fitness zero-shot while retaining the generative capability of an autoregressive model. It was introduced by Furkan Eris (ETH Zurich) in a February 2026 arXiv preprint.

The central claim of the work is that careful architectural design, rather than sheer scale, can make a causal model competitive with much larger masked models on fitness prediction. Proust reaches a Spearman correlation of 0.390 on the ProteinGym substitution benchmark, and the paper reports state-of-the-art results on indel (insertion/deletion) tasks, while the model remains small and inexpensive to train relative to contemporary protein language models.

Proust is positioned alongside generative protein language models like ProtGPT2 and RITA, but with an explicit emphasis on matching the variant-effect performance usually associated with bidirectional masked models. Both code and pretrained weights have been released, making it directly usable for scoring and embedding extraction.

Key Features

Dual capability: A single causal model performs both zero-shot fitness estimation and native autoregressive sequence generation, avoiding the usual tradeoff between masked and causal modeling objectives.
Efficient attention design: The GQA-S2 transformer uses grouped-query attention with key-value sharing and depthwise causal convolutions, reducing memory cost while preserving sequence modeling quality.
Strong indel performance: The authors report state-of-the-art results on insertion/deletion variant tasks, a regime where many protein language models struggle.
Compute-efficient training: The model was trained on roughly 33 billion tokens in about 40 B200 GPU-hours, modest compared to larger contemporaries.
Released weights and code: Pretrained checkpoints (nappenstance/proust_v0) and inference code are publicly available for log-likelihood scoring and embedding extraction.
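
The depthwise causal convolutions named in the feature list can be illustrated with a minimal NumPy sketch. It is not taken from the Proust code: the function name, the kernel size, and the tensor layout are assumptions chosen for illustration. Each channel is filtered independently (depthwise), and the input is left-padded so that position t only sees positions up to t (causal).

```python
import numpy as np


def depthwise_causal_conv(x, kernels):
    """Hypothetical helper, not the Proust implementation.

    x:       (seq_len, channels) activations
    kernels: (kernel_size, channels), one 1-D filter per channel
    """
    seq_len, channels = x.shape
    k = kernels.shape[0]
    # Left-pad with k-1 zeros so output position t never reads position t+1 or later.
    padded = np.concatenate([np.zeros((k - 1, channels)), x], axis=0)
    out = np.zeros((seq_len, channels), dtype=float)
    for j in range(k):
        # kernels[j] weights the input that lies (k-1-j) steps in the past;
        # the multiplication is per channel, so channels never mix.
        out += kernels[j] * padded[j : j + seq_len]
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
kernels = rng.normal(size=(3, 4))
y = depthwise_causal_conv(x, kernels)

# Perturbing position 4 must leave outputs at positions 0..3 unchanged.
x_perturbed = x.copy()
x_perturbed[4] += 1.0
y_perturbed = depthwise_causal_conv(x_perturbed, kernels)
```

Comparing `y[:4]` with `y_perturbed[:4]` is a quick way to check that no information flows backwards from later residues.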

Technical Details

Proust is a 24-layer decoder-only transformer with a hidden dimension of 1,024, 16 attention heads, and 2 key-value heads, totalling 309 million parameters. The architecture, termed GQA-S2, combines grouped-query attention with KV-sharing and rotary position information, augmented by depthwise causal convolutions and cross-layer value residuals to improve representation quality without increasing model size. It uses an ESM-style 32-token vocabulary (20 standard amino acids plus special tokens). Training consumed approximately 33 billion tokens in roughly 40 B200 GPU-hours.
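
To make the head counts above concrete, here is a minimal NumPy sketch of causal grouped-query attention with 16 query heads and 2 key-value heads. It is an illustration under stated assumptions, not the GQA-S2 implementation: the head dimension is assumed to be hidden size divided by head count (1,024 / 16 = 64), and rotary positions, KV-sharing, convolutions, and value residuals are omitted. The function name is hypothetical.

```python
import numpy as np

# Values quoted in the text above.
D_MODEL = 1024
N_HEADS = 16
N_KV_HEADS = 2
# Assumption: conventional split of the hidden size across query heads.
HEAD_DIM = D_MODEL // N_HEADS
# Number of query heads that read from the same key-value head.
GROUP_SIZE = N_HEADS // N_KV_HEADS


def gqa_causal_attention(q, k, v):
    """q: (n_heads, T, d); k and v: (n_kv_heads, T, d). Returns (n_heads, T, d)."""
    n_heads, T, d = q.shape
    n_kv = k.shape[0]
    assert n_heads % n_kv == 0, "query heads must divide evenly into KV groups"
    group = n_heads // n_kv
    # Query head h uses key-value head h // group.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Causal mask: position t may not attend to positions after t.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
q = rng.normal(size=(N_HEADS, 5, HEAD_DIM))
k = rng.normal(size=(N_KV_HEADS, 5, HEAD_DIM))
v = rng.normal(size=(N_KV_HEADS, 5, HEAD_DIM))
out = gqa_causal_attention(q, k, v)
```

With these numbers, only 2 of 16 heads store keys and values, so a plain GQA key-value cache would be 8 times smaller than with full multi-head attention. Any further savings from Proust's KV-sharing are not modeled here.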

On the ProteinGym substitution benchmark, Proust attains a Spearman correlation of 0.390, competitive with masked models several times larger. The authors also report state-of-the-art performance on indel tasks and strong results on the EVEREST viral fitness benchmarks. Code and weights are distributed under a PolyForm Noncommercial license, and the weights are downloaded automatically from Hugging Face on first use.
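
For readers unfamiliar with the metric, the following sketch shows how a Spearman correlation between model scores and measured fitness can be computed: rank both lists (averaging tied ranks), then take the Pearson correlation of the ranks. The function names and the numbers are made up for illustration, and the way the benchmark aggregates results across assays is not reproduced here.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i : j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Illustrative numbers only, not benchmark data.
model_scores = [-1.2, 0.3, -0.5, 0.9, -2.0]
measured_fitness = [0.10, 0.55, 0.60, 0.80, 0.05]
rho = spearman(model_scores, measured_fitness)
```

Because only ranks matter, the metric rewards a model for ordering variants correctly even when its log-likelihood scores are on a different scale from the assay readout.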

Applications

Proust is intended for protein engineering and variant-effect workflows where both scoring and generation are useful. Because it produces zero-shot fitness estimates from sequence log-likelihoods, it can rank point mutations, insertions, and deletions without task-specific labeled data, which is valuable for prioritizing variants in directed-evolution and stability-engineering campaigns. Its autoregressive nature also allows sampling of novel candidate sequences, and its embedding interface supports downstream property prediction. The small footprint makes it practical for groups without large GPU budgets.
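
The log-likelihood scoring described above can be sketched as follows. This does not use the Proust inference code, whose interface is not shown on this page: `toy_next_token_logprobs` is a hypothetical stand-in for a causal protein language model, and scoring a variant as the difference in total log-likelihood from the wild type is a common convention assumed here rather than a confirmed detail of the paper.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_next_token_logprobs(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for a causal LM: log p(next residue | prefix).

    Toy rule: repeating the previous residue is twice as likely as any other.
    """
    weights = {aa: 1.0 for aa in AMINO_ACIDS}
    if prefix:
        weights[prefix[-1]] = 2.0
    total = sum(weights.values())
    return {aa: math.log(w / total) for aa, w in weights.items()}


def sequence_log_likelihood(
    seq: str,
    next_logprobs: Callable[[str], Dict[str, float]] = toy_next_token_logprobs,
) -> float:
    """Sum of log p(residue_i | residues before i), left to right."""
    return sum(next_logprobs(seq[:i])[aa] for i, aa in enumerate(seq))


def fitness_score(wild_type: str, variant: str) -> float:
    """Assumed convention: log-likelihood of the variant minus the wild type."""
    return sequence_log_likelihood(variant) - sequence_log_likelihood(wild_type)


wild_type = "MKTA"
variants = {
    "T3K (substitution)": "MKKA",
    "T3del (deletion)": "MKA",
    "K2_T3insG (insertion)": "MKGTA",
}
scores = {name: fitness_score(wild_type, seq) for name, seq in variants.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because a causal model assigns a likelihood to a sequence of any length, the same function handles substitutions, insertions, and deletions. In this toy example the deletion scores highest simply because shorter sequences accumulate fewer log-probability terms, which is why some pipelines normalize by length; whether Proust does so is not stated here.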

Impact

Proust contributes to an ongoing line of work questioning whether large model scale is necessary for strong protein language modeling, showing that a 309M-parameter causal model can rival much larger masked models on fitness benchmarks while remaining generative. Because the preprint is recent (February 2026), broader adoption and independent validation are still emerging, and the reported benchmark numbers come from the authors. The noncommercial license may limit some industrial use, but the public release of weights and inference code lowers the barrier for academic experimentation with efficient causal protein models.

Citation

No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation

Preprint

Eris, F. (2026) No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation. arXiv.org.

DOI: 10.48550/arXiv.2602.01845


Citations

Total Citations: 0

Influential: 0

References: 82

GitHub

Stars: 9

Forks: 0

Open Issues: 0

Contributors: 1

Last Push: 2mo ago

Language: Python

HuggingFace

Downloads: 0

Likes: 2

Last Modified: 2mo ago

Pipeline: text-generation


Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Openness score: 9 (Closed)

Usability (can I run it?): 11

Reproducibility (can I retrain it?): 5

Model Openness Framework

Unclassified

Restrictive license on core components

Resources

GitHub Repository
Research Paper
HuggingFace Model
