AINN-P1

Compact 167M-parameter protein language model built on a multiplicative LSTM, providing zero-shot variant-effect and fitness prediction from sequence alone.

Released: March 2026

Parameters: 167 million

AINN-P1 is a compact protein language model developed by Ainnocence, an AI drug discovery company, and released as a bioRxiv preprint in March 2026. It targets a practical question raised by the recent generation of very large protein language models: how much of their predictive power actually depends on billions of parameters, attention mechanisms, or multiple sequence alignments (MSAs). AINN-P1 addresses this question by showing that a 167M-parameter, attention-free, sequence-only model can match much larger transformers on standard protein-fitness benchmarks.

Rather than the now-dominant transformer architecture, AINN-P1 is built on a multiplicative LSTM (mLSTM), a recurrent design that scales linearly with sequence length and avoids the growing key-value caches that make transformer inference expensive on long sequences. The model is trained autoregressively on UniRef with straightforward next-token prediction, and protein fitness is read out zero-shot from the model's likelihoods, with optional frozen-encoder few-shot regression on top of its embeddings.
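
The summary does not give AINN-P1's layer sizes or implementation, so the following is only a minimal NumPy sketch of one common formulation of a multiplicative LSTM cell (Krause et al., 2016), with hypothetical dimensions, random weights, and illustrative names (`MLSTMCell`, `one_hot`). It shows the two properties described above. First, the intermediate state m is an elementwise product of an input projection and a hidden-state projection, so the recurrent transition is conditioned on the current residue. Second, the sequence is processed one residue at a time with a fixed-size state (h, c), so cost grows linearly with length and nothing like a key-value cache accumulates.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MLSTMCell:
    """Hypothetical mLSTM cell; sizes and initialization are illustrative only."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        H = hidden_size
        # Projections whose elementwise product gives the multiplicative state m_t.
        self.W_mx = rng.uniform(-s, s, (H, input_size))
        self.W_mh = rng.uniform(-s, s, (H, H))
        # Stacked weights for input gate, forget gate, output gate and candidate.
        self.W_x = rng.uniform(-s, s, (4 * H, input_size))
        self.W_m = rng.uniform(-s, s, (4 * H, H))
        self.b = np.zeros(4 * H)
        self.H = H

    def step(self, x, h, c):
        m = (self.W_mx @ x) * (self.W_mh @ h)     # input-conditioned transition
        z = self.W_x @ x + self.W_m @ m + self.b  # gates read m_t instead of h_{t-1}
        H = self.H
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])                    # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

    def run(self, xs):
        """Scan left to right; the recurrent state is a fixed-size (h, c) pair."""
        h, c = np.zeros(self.H), np.zeros(self.H)
        hs = []
        for x in xs:  # one step per residue, so cost is linear in sequence length
            h, c = self.step(x, h, c)
            hs.append(h)
        return np.stack(hs)


def one_hot(seq):
    X = np.zeros((len(seq), len(AA)))
    for k, a in enumerate(seq):
        X[k, AA.index(a)] = 1.0
    return X


cell = MLSTMCell(input_size=len(AA), hidden_size=32)
states = cell.run(one_hot("MKTAYIAKQR"))
print(states.shape)  # (10, 32)
```

A trained model would add a learned residue embedding, likely several stacked layers, and an output layer predicting the next residue; none of those details are specified here.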

The work positions itself as evidence for a "sequence-only paradigm", in which strong variant-effect prediction is achievable without structural inputs, MSAs, or hundred-billion-parameter scale. That combination is attractive for cost-sensitive, high-throughput screening in drug discovery pipelines.

Key Features

Attention-free recurrent backbone: A multiplicative LSTM provides input-conditioned gating and nonlinear residue dependencies while scaling linearly in sequence length, avoiding the quadratic attention cost of transformers.
Compact parameter budget: At 167M parameters, AINN-P1 is orders of magnitude smaller than the 100B-scale transformers it is benchmarked against, lowering training and inference cost.
Sequence-only, MSA-free: Fitness is predicted directly from single sequences without structural inputs or multiple sequence alignments, simplifying deployment.
Zero-shot and few-shot readouts: The model supports both zero-shot likelihood scoring and frozen-encoder few-shot regression on its embeddings for fitness tasks.
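
A common way to read fitness zero-shot from an autoregressive model is the log-likelihood ratio between the mutant and wild-type sequences. The summary does not state AINN-P1's exact scoring rule, so treat this as an assumption. The sketch is model-agnostic: `next_log_probs` is a hypothetical callable returning log p(next residue | prefix), and `toy_model` is a context-free stand-in, not AINN-P1.

```python
import math
from typing import Callable, Dict

AA = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface: prefix -> {amino acid: log p(next residue | prefix)}.
NextLogProbs = Callable[[str], Dict[str, float]]


def sequence_log_likelihood(seq: str, next_log_probs: NextLogProbs) -> float:
    """log p(seq) = sum over t of log p(x_t | x_<t) for an autoregressive model."""
    return sum(next_log_probs(seq[:t])[seq[t]] for t in range(len(seq)))


def apply_mutation(wt: str, mutation: str) -> str:
    """Apply a substitution written like 'K2I' (wild type, 1-based position, mutant)."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wt[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {wt[pos - 1]}")
    return wt[:pos - 1] + alt + wt[pos:]


def zero_shot_score(wt: str, mutation: str, next_log_probs: NextLogProbs) -> float:
    """Log-likelihood ratio log p(mutant) - log p(wild type); higher = predicted fitter."""
    mutant = apply_mutation(wt, mutation)
    return (sequence_log_likelihood(mutant, next_log_probs)
            - sequence_log_likelihood(wt, next_log_probs))


def toy_model(prefix: str) -> Dict[str, float]:
    """Stand-in for a trained model: ignores context and mildly favours hydrophobics."""
    weights = {a: (1.0 if a in "AILMFVW" else 0.0) for a in AA}
    log_z = math.log(sum(math.exp(w) for w in weights.values()))
    return {a: w - log_z for a, w in weights.items()}


wt = "MKTAYIAKQR"
for m in ["K2I", "K2D"]:
    print(m, round(zero_shot_score(wt, m, toy_model), 3))
# K2I 1.0
# K2D 0.0
```

With a real model the prefix matters, so every position after the mutation contributes to the ratio, not just the mutated site. The few-shot readout instead fits a small regressor on frozen embeddings, which only requires labeled variants and a forward pass per sequence.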

Technical Details

AINN-P1 is a 167M-parameter mLSTM trained by autoregressive next-token prediction on UniRef protein sequences. On the ProteinGym deep mutational scanning benchmark, the preprint reports an average Spearman's rho of 0.441 across four task categories and 0.625 on the stability category. According to the authors, this average narrowly exceeds the ProSST baseline (0.438), while AINN-P1 uses far fewer parameters than billion- and hundred-billion-scale transformer protein language models. Evaluation uses both zero-shot scoring from model likelihoods and few-shot regression on top of frozen encoder representations.
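
To make the evaluation concrete, here is a sketch of a few-shot readout and its scoring: closed-form ridge regression on frozen embeddings, Spearman's rho between predicted and measured fitness, and an average over task categories. The embeddings and labels are synthetic, the choice of ridge regression is an assumption (the summary only says "few-shot regression"), and the simple unweighted category average may differ from ProteinGym's official aggregation.

```python
import numpy as np


def rankdata(a):
    """1-based ranks; tied values share their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(pred, true):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rankdata(pred), rankdata(true))[0, 1])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on frozen embeddings; intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def category_average(rho_by_assay, category_of):
    """Mean rho within each category, then an unweighted mean across categories."""
    per_category = {}
    for assay, rho in rho_by_assay.items():
        per_category.setdefault(category_of[assay], []).append(rho)
    return float(np.mean([np.mean(v) for v in per_category.values()]))


# Synthetic stand-ins: 120 variants with 16-dim "frozen embeddings" and noisy labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))
y = X @ rng.normal(size=16) + 0.1 * rng.normal(size=120)

# Few-shot: fit on 24 labeled variants, evaluate rank correlation on the rest.
w, b = fit_ridge(X[:24], y[:24], alpha=1.0)
rho = spearman_rho(X[24:] @ w + b, y[24:])
print(f"held-out Spearman rho: {rho:.3f}")
```

Spearman's rho only depends on the ordering of predictions, which is why it suits deep mutational scanning assays whose raw scores are on different scales. For zero-shot evaluation, the log-likelihood ratios take the place of the regression predictions.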

Applications

AINN-P1 is aimed at protein engineering and therapeutic discovery workflows where variant-effect and fitness prediction must be run at scale and at low cost. Because it requires only a single sequence (no structure and no MSA) and has a modest parameter budget, it is well suited to high-throughput in silico screening of mutant libraries, stability optimization, and prioritization of candidate variants before wet-lab validation. As a commercial model from Ainnocence, it is positioned to support the company's drug discovery platform.
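
An in silico screen of a mutant library can be as simple as enumerating every single substitution and ranking the results by a model score. In the sketch below, `score_fn` is a hypothetical hook (for example, a zero-shot log-likelihood ratio from a protein language model), and `toy_score` is a placeholder, not AINN-P1's predictions.

```python
from typing import Callable, List, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"


def single_mutant_library(wt: str) -> List[str]:
    """Every single substitution (19 per position), written like 'M1A' (1-based)."""
    return [f"{ref}{pos}{alt}"
            for pos, ref in enumerate(wt, start=1)
            for alt in AA if alt != ref]


def rank_variants(wt: str, score_fn: Callable[[str, str], float],
                  top_k: int = 10) -> List[Tuple[str, float]]:
    """Score the whole library and return the top_k variants, best first."""
    scored = [(m, score_fn(wt, m)) for m in single_mutant_library(wt)]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep library order
    return scored[:top_k]


def toy_score(wt: str, mutation: str) -> float:
    """Placeholder score that simply prefers I, L or V as the new residue."""
    return 1.0 if mutation[-1] in "ILV" else 0.0


library = single_mutant_library("MKTAYIAKQR")
print(len(library))  # 190
print(rank_variants("MKTAYIAKQR", toy_score, top_k=3))
# [('M1I', 1.0), ('M1L', 1.0), ('M1V', 1.0)]
```

A full single-mutant scan of a length-L protein has 19 × L variants. If each variant is scored with a full-sequence likelihood, a recurrent model's per-variant cost stays linear in L, which is part of the cost argument made above.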

Impact

AINN-P1 contributes to an ongoing debate about scaling in protein language models, offering concrete evidence that competitive fitness prediction does not strictly require massive transformers or alignment-derived signals. Its mLSTM design also revisits recurrent architectures as an efficient alternative for biological sequence modeling. The main limitation for the broader community is access: the model is a commercial release under a CC-BY-NC license, and no public weights or code accompany the preprint, so independent reproduction and downstream use are currently constrained.

Citation

AINN-P1: A Compact Sequence-Only Protein Language Model Achieves Competitive Fitness Prediction on ProteinGym

Wang, R., et al. (2026) AINN-P1: A Compact Sequence-Only Protein Language Model Achieves Competitive Fitness Prediction on ProteinGym. bioRxiv.

DOI: 10.64898/2026.03.26.714619

Citations

Total citations: 0

Influential citations: 0

References: 22

Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Openness score: 12 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 18

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper
