CMAP

Antibody developability predictor pairing text and protein language models, using in-context learning to fit new assays without retraining.

Released: January 2026

CMAP (Context-aware Multi-Property Antibody Predictor) is a multimodal framework from Amazon that predicts antibody developability properties by integrating a text language model with a protein language model. It was released as a preprint in January 2026 and subsequently published in npj Systems Biology and Applications. The model addresses a practical bottleneck in therapeutic antibody discovery: property predictors typically must be retrained for each laboratory-specific assay, struggle with incomplete datasets, and are vulnerable to batch effects that introduce systematic bias across measurement campaigns.

Rather than fine-tuning a separate model per property, CMAP frames developability prediction as an in-context learning task. A prompt supplies a handful of example antibody sequence/property pairs, and the model conditions its prediction for a query antibody on those examples. This lets a single trained model adapt to new or laboratory-specific properties at inference time without parameter updates, which is valuable when assay data are scarce or expensive to generate.
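The prompt structure described above can be sketched in a few lines of Python. This is only an illustration: the source does not specify CMAP's actual prompt format, so the `ContextExample` class, the `build_prompt` function, the `Property:` / `Example:` / `Query:` layout, the property name, and the short placeholder sequences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextExample:
    """One in-context example: an antibody sequence and its measured value."""
    sequence: str
    value: float


def build_prompt(property_name: str,
                 examples: List[ContextExample],
                 query_sequence: str) -> str:
    """Assemble a few-shot prompt (hypothetical layout, not CMAP's real format)."""
    if not examples:
        raise ValueError("in-context prediction needs at least one example")
    lines = [f"Property: {property_name}"]
    for index, example in enumerate(examples, start=1):
        lines.append(f"Example {index}: {example.sequence} -> {example.value:.2f}")
    # The query is left unanswered; the model would fill in the value.
    lines.append(f"Query: {query_sequence} -> ?")
    return "\n".join(lines)


# Placeholder sequence fragments and values, for illustration only.
examples = [
    ContextExample("EVQLVESGGG", 0.42),
    ContextExample("QVQLQQSGAE", 0.87),
]
prompt = build_prompt("hypothetical_assay_score", examples, "EVQLLESGGG")
print(prompt)
```

Under this sketch, switching to a new assay means swapping the example list, not retraining: the same function serves any property for which a few measured antibodies are available.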

CMAP is distinguished from approaches that rely on extensive fine-tuning of large foundation models (such as TxGemma) by being a compact, prompt-driven predictor. Its multimodal design pairs natural-language context with learned protein representations, positioning it within the emerging space of text-plus-protein models applied to antibody engineering.

Key Features

In-context multi-property prediction: A single model predicts multiple developability properties from prompts containing example antibody sequence/property pairs, removing the need to retrain for each property.
Text + protein multimodality: CMAP combines a text language model with a protein language model through a specialized tokenization and embedding-projection system that fuses the two modalities.
Context-aware training strategy: An "AB-context-aware" objective forces the model to condition predictions on the in-context examples rather than learning shortcuts from sequence alone, improving robustness to incomplete data and batch effects.
Strong correlation across properties: The authors report Spearman's rho above 0.8 across multiple developability properties on their evaluation set.
Rapid assay adaptation: Because adaptation happens at inference via prompting, the model can be applied to new laboratory-specific assays without a new training run.
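For readers unfamiliar with the metric quoted above, Spearman's rho is the Pearson correlation computed on ranks rather than raw values. The sketch below is a plain-Python implementation using average ranks for ties. It is not the authors' evaluation code, and the measured and predicted values are made-up numbers chosen only to show the arithmetic.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return covariance / (var_x * var_y) ** 0.5


# Made-up values: two adjacent pairs are swapped in the predicted ordering.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(measured, predicted)
print(round(rho, 3))
```

Because rho depends only on rank order, adding a constant offset to every prediction leaves it unchanged. It measures how well candidates are ordered, not how close the predicted values are to the measured ones.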

Technical Details

CMAP is a compact multimodal transformer that links a text language model and a protein language model via a dedicated tokenization and embedding-projection module. It was trained on 876,898 antibodies using the AB-context-aware learning strategy, which constructs prompts of example sequence/property pairs so the model learns to read context rather than memorize sequence-to-label mappings. Reported performance reaches Spearman's rho greater than 0.8 across several developability properties, with the framework designed to handle incomplete datasets and batch effects that commonly degrade single-property predictors.
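The source does not detail how the embedding-projection module works, so the following is only a generic illustration of one common way to fuse two modalities: a linear map takes protein-model embeddings into the text model's embedding dimension, and the projected vectors are appended to the text token embeddings. The function names, the dimensions, and the random weights (which would be learned in a real system) are all assumptions, and this should not be read as CMAP's actual architecture.

```python
import random
from typing import List

Vector = List[float]


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Random out_dim x in_dim weight matrix; a stand-in for learned weights."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights: List[Vector], vector: Vector) -> Vector:
    """Apply the linear map: one dot product per output dimension."""
    if any(len(row) != len(vector) for row in weights):
        raise ValueError("vector length must match the projection input size")
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def fuse(text_embeddings: List[Vector],
         protein_embeddings: List[Vector],
         weights: List[Vector]) -> List[Vector]:
    """Project protein embeddings to the text width and append them to the text tokens."""
    projected = [project(weights, vector) for vector in protein_embeddings]
    return text_embeddings + projected


# Toy sizes (assumed): text width 4, protein width 6.
TEXT_DIM, PROTEIN_DIM = 4, 6
text_tokens = [[0.1] * TEXT_DIM for _ in range(3)]       # 3 text token embeddings
residues = [[0.5] * PROTEIN_DIM for _ in range(2)]       # 2 protein embeddings
weights = make_projection(PROTEIN_DIM, TEXT_DIM)
fused = fuse(text_tokens, residues, weights)
print(len(fused), len(fused[0]))
```

After projection, every vector in the fused sequence has the same width, so a single transformer could attend over text and protein positions together.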

Applications

CMAP targets antibody engineers and biopharma discovery teams who must triage large candidate panels for developability (properties affecting expression, stability, aggregation, and manufacturability), often using assays that differ across labs. By adapting through in-context examples, it lets teams apply one model to bespoke or low-data assays and prioritize candidates for experimental characterization without standing up a new predictor for every property or measurement protocol.

Impact

CMAP illustrates how in-context learning can make antibody property prediction more portable across assays and laboratories, reducing the retraining burden that limits many developability models. Its publication in npj Systems Biology and Applications indicates that the text-plus-protein, prompt-conditioned approach has passed peer review. A notable limitation is that the model and training code are not publicly released (the work originates from industry research), so independent reproduction and external benchmarking on novel targets remain constrained.

Citation

Context-aware Multi-Property Antibody Predictor: a Novel Framework Integrating Text and Protein Language Models (2026). bioRxiv.

DOI: 10.64898/2026.01.07.698270

Citations

Total citations: 0
Influential: 0
References: 34

Openness

bio.rodeo openness: 4 (Closed), low usability and reproducibility
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified (restrictive license on core components)

Resources

Research Paper
