Aiki-XP is a multimodal model from Aikium Inc. that predicts the relative expression of a protein within its host bacterium's proteome. Rather than forecasting absolute yield in µg/mL, it ranks candidate genes by per-species z-scored abundance, answering the practical question of which sequences a given organism is likely to express well. This addresses a long-standing bottleneck in heterologous protein production and synthetic biology, where expression levels are notoriously difficult to predict from sequence alone.
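Per-species z-scoring puts every host on a common relative scale, which is why the outputs are rankings within an organism rather than yields. The sketch below standardizes a hypothetical list of `(species, gene_id, abundance)` records separately for each species and then ranks genes by the result. The function names are illustrative, not part of the Aiki-XP client, and two details are assumptions: whether abundances are log-transformed before standardization (this sketch standardizes whatever values it receives) and whether the population or sample standard deviation is used (this sketch uses the population one).

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_species(records):
    """Standardize abundances separately for each species (hypothetical helper).

    records: iterable of (species, gene_id, abundance) tuples.
    Returns {(species, gene_id): z}. Uses the population standard deviation
    (an assumption); a species whose values are all identical gets z = 0.0.
    """
    by_species = defaultdict(list)
    for species, gene, value in records:
        by_species[species].append((gene, value))

    z = {}
    for species, items in by_species.items():
        values = [v for _, v in items]
        mu, sigma = mean(values), pstdev(values)
        for gene, v in items:
            z[(species, gene)] = 0.0 if sigma == 0 else (v - mu) / sigma
    return z


def rank_within_species(records):
    """Return {species: [gene_id, ...]} ordered from highest to lowest z-score."""
    records = list(records)  # materialize, since the records are iterated twice
    z = zscore_within_species(records)
    genes = defaultdict(list)
    for species, gene, _ in records:
        genes[species].append(gene)
    return {
        sp: sorted(gs, key=lambda g: z[(sp, g)], reverse=True)
        for sp, gs in genes.items()
    }
```

Keying scores by `(species, gene_id)` rather than `gene_id` alone avoids collisions when the same locus name appears in more than one genome.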
The central methodological contribution is rigorous leakage control. Because homologous proteins recur across bacterial genomes, a naive random train/test split places close homologs on both sides, leaking information and inflating reported performance. Aiki-XP instead groups genes into MMseqs2 sequence clusters and assigns whole clusters to either training or evaluation, so the model is scored only on genes whose clusters it never saw during training. Critically, the authors report that all training recipes and hyperparameters were locked before external evaluation, and the model transfers to bacterial phyla absent from training, evidence that it captures generalizable determinants of expression rather than memorizing phylogenetic signal.
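The key property of a cluster-level split is that every member of a cluster lands on the same side. The sketch below assumes cluster assignments come from a two-column `representative<TAB>member` table, the layout MMseqs2's `easy-cluster` writes to its `*_cluster.tsv` output, and assigns each cluster to one of five folds by hashing its representative ID. The hashing scheme, fold count, and helper names are illustrative assumptions; the source does not describe how Aiki-XP assigns clusters to folds or which identity threshold it clusters at.

```python
import hashlib


def read_mmseqs_clusters(lines):
    """Parse 'representative<TAB>member' lines into {member: representative}."""
    member_to_rep = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")[:2]
        member_to_rep[member] = rep
    return member_to_rep


def fold_of(cluster_id, n_folds=5):
    """Deterministically map a cluster ID to a fold.

    sha256 is used instead of Python's built-in hash(), which is salted
    per process and would give a different split on every run.
    """
    digest = hashlib.sha256(cluster_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_folds


def split_by_cluster(member_to_rep, test_fold, n_folds=5):
    """Return (train_genes, test_genes); whole clusters go to one side only."""
    train, test = [], []
    for member, rep in member_to_rep.items():
        target = test if fold_of(rep, n_folds) == test_fold else train
        target.append(member)
    return sorted(train), sorted(test)
```

Hash-based assignment is reproducible but does not balance fold sizes; with a few very large clusters, a greedy assignment that fills the currently smallest fold is a common alternative.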
Posted to bioRxiv in April 2026, Aiki-XP belongs to Aikium's broader family of "Aiki" foundation models for protein phenotypes and operates at pan-bacterial scale across hundreds of species.
Aiki-XP is built around a 25M-parameter multimodal fusion head that combines embeddings from several pretrained foundation models with handcrafted biophysical descriptors. Training spans 492,026 genes drawn from 385 bacterial species (with 1,831 host genomes available); fitting the fusion head across five folds took roughly 360 A100-hours, plus about 1,000 A100-hours to precompute the upstream embeddings. On non-conserved (held-out) genes, the full Tier D/XP5 model reaches a Spearman correlation of ρ_nc ≈ 0.59, compared with 0.518 for the protein-only Tier A and 0.509 for an ESM-C 600M baseline; its median absolute error is roughly 0.47 z-score units, and 95% of predictions fall within 1.5 units of the measured value (|Δ| < 1.5). Code is released under Apache 2.0, while model weights and training data are archived on Zenodo (DOI 10.5281/zenodo.19639621, CC-BY 4.0, ~28 GB); a Python client, Docker images, and a hosted demo are provided for inference.
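The evaluation figures quoted for Aiki-XP (a Spearman ρ, a median absolute error in z-score units, and the share of predictions with |Δ| < 1.5) can be computed from paired predicted and measured z-scores. The sketch below is a dependency-free version; it computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties the standard way. Which genes enter the calculation (for ρ_nc, the non-conserved held-out set) and whether scores are pooled across species or averaged per species are assumptions left to the caller; the helper names are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def error_summary(pred, true, tol=1.5):
    """Spearman rho, median |pred - true|, and fraction with |pred - true| < tol."""
    errs = [abs(p - t) for p, t in zip(pred, true)]
    return {
        "spearman": spearman(pred, true),
        "median_abs_error": median(errs),
        "frac_within_tol": sum(e < tol for e in errs) / len(errs),
    }
```

Because Spearman's ρ depends only on ordering, it matches the model's ranking objective, while the absolute-error figures describe how far individual z-score predictions land from the measured values.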
Aiki-XP is aimed at researchers and engineers optimizing recombinant protein production, where selecting expression-friendly constructs or host organisms can dramatically reduce trial-and-error at the bench. By ranking candidates before synthesis, it can accelerate protein engineering campaigns, inform choice of expression host, and help prioritize sequence variants in synthetic biology and industrial enzyme workflows. The tiered design lets users apply the model with only a protein sequence or with full genomic and operon context, fitting both early-stage triage and detailed construct design.
By foregrounding leakage control and recipes fixed before external evaluation, Aiki-XP offers a more honest benchmark for bacterial expression prediction than splits that allow homolog leakage, and its cross-phylum transfer suggests that the learned signal reflects real determinants of expression. The release of permissively licensed code, weights, and training data, together with a client library and a live demo, lowers the barrier to adoption and reproduction. As a recent preprint, its broader influence on protein engineering practice and downstream tooling remains to be established, and its predictions are explicitly relative rankings rather than calibrated absolute yields, a limitation the authors emphasize.