Multimodal antibody developability predictor that combines text and protein language models with in-context learning to predict many properties without retraining.
CMAP (Context-aware Multi-Property Antibody Predictor) is a multimodal framework from Amazon that predicts antibody developability properties by integrating a text language model with a protein language model. It was released as a preprint in January 2026 and subsequently published in npj Systems Biology and Applications. The model addresses a practical bottleneck in therapeutic antibody discovery: property predictors typically must be retrained for each laboratory-specific assay, struggle with incomplete datasets, and are vulnerable to batch effects that introduce systematic bias across measurement campaigns.
Rather than fine-tuning a separate model per property, CMAP frames developability prediction as an in-context learning task. A prompt supplies a handful of example antibody sequence/property pairs, and the model conditions its prediction for a query antibody on those examples. This lets a single trained model adapt to new or laboratory-specific properties at inference time without parameter updates, which is valuable when assay data are scarce or expensive to generate.
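The prompt construction described above can be sketched as follows. CMAP's actual prompt template and tokenization are not given here, so the `Property:`/`Sequence:`/`Value:` text layout, the function name `build_icl_prompt`, and the rule of drawing only measured examples are hypothetical. The sketch only shows how a few labeled context pairs and an unlabeled query are assembled into one input, with no parameter update. The sequences are toy fragments, not real antibodies.

```python
import random
from typing import Optional, Sequence


def build_icl_prompt(
    property_name: str,
    pool: Sequence[tuple[str, Optional[float]]],
    query_sequence: str,
    k: int = 4,
    seed: int = 0,
) -> str:
    """Assemble a hypothetical in-context prompt for one query antibody.

    `pool` holds (sequence, measured value) pairs for a single assay.
    A value of None marks a missing measurement and is skipped, so an
    incomplete dataset can still supply context examples.
    """
    # Keep only examples that were actually measured and differ from the query.
    labeled = [(s, v) for s, v in pool if v is not None and s != query_sequence]
    if len(labeled) < k:
        raise ValueError(f"need {k} labeled examples, found {len(labeled)}")

    # Sample k context pairs reproducibly; no model weights are touched.
    context = random.Random(seed).sample(labeled, k)

    parts = [f"Property: {property_name}"]
    for seq, value in context:
        parts.append(f"Sequence: {seq}\nValue: {value:.3f}")
    # The query ends with an empty value slot for the model to fill in.
    parts.append(f"Sequence: {query_sequence}\nValue:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    toy_pool = [
        ("EVQLVESGGG", 0.82),
        ("QVQLQQSGAE", None),  # missing measurement, ignored
        ("EVQLLESGGG", 0.64),
        ("QVQLVQSGAE", 0.91),
    ]
    print(build_icl_prompt("thermal stability (hypothetical units)", toy_pool, "DIQMTQSPSS", k=2))
```

Swapping in a different pool (for example, measurements from another laboratory's assay) changes the prompt but not the model, which is the portability the in-context framing is meant to provide.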
Unlike approaches that rely on extensive fine-tuning of large foundation models such as TxGemma, CMAP is a compact, prompt-driven predictor. Its multimodal design pairs natural-language context with learned protein representations, placing it within the emerging class of text-plus-protein models applied to antibody engineering.
CMAP is a compact multimodal transformer that links a text language model and a protein language model through a dedicated tokenization and embedding-projection module. It was trained on 876,898 antibodies using the AB-context-aware learning strategy, which constructs prompts of example sequence/property pairs so that the model learns to read context rather than memorize sequence-to-label mappings. The authors report Spearman's ρ above 0.8 for several developability properties, and the framework is designed to handle the incomplete datasets and batch effects that commonly degrade single-property predictors.
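The reported metric, Spearman's rank correlation, compares the rank order of predicted and measured values rather than their raw magnitudes, so it is unaffected by any monotonic rescaling of either set of values. A minimal sketch follows, assuming plain Python lists of paired predictions and measurements. It implements the standard definition (the Pearson correlation of average ranks, which handles ties) and is not code from the CMAP authors. The example numbers are made up.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    predicted = [0.1, 0.4, 0.35, 0.8]  # made-up model outputs
    measured = [1.0, 2.5, 3.0, 4.0]    # made-up assay values
    print(round(spearman_rho(predicted, measured), 3))  # 0.8
```

In the toy data, one adjacent pair of antibodies is ranked in the wrong order, which is enough to pull ρ from 1.0 down to 0.8.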
CMAP targets antibody engineers and biopharma discovery teams who must triage large candidate panels for developability (properties affecting expression, stability, aggregation, and manufacturability), often using assays that differ across labs. Because it adapts through in-context examples, teams can apply one model to bespoke or low-data assays and prioritize candidates for experimental characterization without building a new predictor for every property or measurement protocol.
CMAP illustrates how in-context learning can make antibody property prediction more portable across assays and laboratories, reducing the retraining burden that limits many developability models. Its publication in npj Systems Biology and Applications signals peer acceptance of the text-plus-protein, prompt-conditioned approach. A notable limitation is that the model and training code are not publicly released (it originates from industry research), so independent reproduction and external benchmarking on novel targets remain constrained.