bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Protein · Language model

CMAP

Amazon Web Services

Multimodal antibody developability predictor that combines text and protein language models with in-context learning to predict many properties without retraining.

Released: January 2026

CMAP (Context-aware Multi-Property Antibody Predictor) is a multimodal framework from Amazon that predicts antibody developability properties by integrating a text language model with a protein language model. It was released as a preprint in January 2026 and subsequently published in npj Systems Biology and Applications. The model addresses a practical bottleneck in therapeutic antibody discovery: property predictors typically must be retrained for each laboratory-specific assay, struggle with incomplete datasets, and are vulnerable to batch effects that introduce systematic bias across measurement campaigns.

Rather than fine-tuning a separate model per property, CMAP frames developability prediction as an in-context learning task. A prompt supplies a handful of example antibody sequence/property pairs, and the model conditions its prediction for a query antibody on those examples. This lets a single trained model adapt to new or laboratory-specific properties at inference time without parameter updates, which is valuable when assay data are scarce or expensive to generate.
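
As a rough illustration of the in-context framing, the sketch below assembles a prompt from a few example sequence/property pairs followed by an open query. The prompt layout, the `Example` type, the `build_prompt` helper, the property name, and the sequence fragments and values are all hypothetical; the source does not specify CMAP's actual prompt format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One in-context example: an antibody sequence and a measured value."""
    sequence: str
    value: float


def build_prompt(property_name: str, examples: List[Example], query_sequence: str) -> str:
    """Assemble a few-shot prompt (hypothetical layout, not CMAP's real format)."""
    if not examples:
        raise ValueError("in-context prediction needs at least one example")
    lines = [f"Property: {property_name}"]
    for ex in examples:
        lines.append(f"Antibody: {ex.sequence} -> {ex.value:.2f}")
    # The query line is left open; a model would be asked to fill in the value.
    lines.append(f"Antibody: {query_sequence} -> ?")
    return "\n".join(lines)


# Made-up sequence fragments and values, for illustration only.
context = [
    Example("EVQLVESGGGLVQPGG", 0.82),
    Example("QVQLQESGPGLVKPSE", 0.41),
]
prompt = build_prompt("expression_titer", context, "EVQLLESGGGLVQPGG")
print(prompt)
```

Under this framing, switching to a different assay means swapping the examples in `context`, not retraining the model.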

Unlike approaches that rely on extensive fine-tuning of large foundation models (such as TxGemma), CMAP is a compact, prompt-driven predictor. Its multimodal design pairs natural-language context with learned protein representations, placing it among the emerging text-plus-protein models applied to antibody engineering.

#Key Features

  • In-context multi-property prediction: A single model predicts multiple developability properties from prompts containing example antibody sequence/property pairs, removing the need to retrain for each property.
  • Text + protein multimodality: CMAP combines a text language model with a protein language model through a specialized tokenization and embedding-projection system that fuses the two modalities.
  • Context-aware training strategy: An "AB-context-aware" objective forces the model to condition predictions on the in-context examples rather than learning shortcuts from sequence alone, improving robustness to incomplete data and batch effects.
  • Strong correlation across properties: The authors report Spearman's rho above 0.8 across multiple developability properties on their evaluation set.
  • Rapid assay adaptation: Because adaptation happens at inference via prompting, the model can be applied to new laboratory-specific assays without a new training run.
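
For readers unfamiliar with the reported metric, Spearman's rho is the Pearson correlation computed on ranks, so it rewards getting the ordering of candidates right instead of the absolute values. The following is a minimal standard-library sketch; the function names and the toy numbers are illustrative and are not taken from the paper or its evaluation set.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rm = average_ranks(predicted), average_ranks(measured)
    mean_p = sum(rp) / len(rp)
    mean_m = sum(rm) / len(rm)
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_p * var_m) ** 0.5


# Toy numbers: one adjacent pair of candidates is ranked in the wrong order.
predicted = [0.1, 0.4, 0.35, 0.8, 0.9]
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
rho = spearman_rho(predicted, measured)
print(round(rho, 3))  # 0.9
```

In this toy case a single swapped pair among five candidates gives a rho of 0.9.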

#Technical Details

CMAP is a compact multimodal transformer that links a text language model and a protein language model through a dedicated tokenization and embedding-projection module. It was trained on 876,898 antibodies using the AB-context-aware learning strategy, which builds prompts from example sequence/property pairs so that the model learns to read the context rather than memorize sequence-to-label mappings. The authors report Spearman's rho above 0.8 on several developability properties, and the framework is designed to tolerate the incomplete datasets and batch effects that commonly degrade single-property predictors.
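
To make the idea of an embedding projection concrete, here is a generic sketch of one pattern used in multimodal models: a linear map takes a protein embedding into the text model's embedding space, and the result is written into a placeholder slot in the token sequence. This is an assumption-laden toy, not CMAP's module. The source does not describe the architecture at this level, and the names `project` and `fuse`, the dimensions, and the weights are all hypothetical (in a real model the weights would be learned).

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(protein_vec: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Linear map from protein-embedding space into text-embedding space."""
    if len(weight) != len(bias) or any(len(row) != len(protein_vec) for row in weight):
        raise ValueError("weight must be (text_dim x protein_dim), bias must be text_dim")
    return [sum(w * x for w, x in zip(row, protein_vec)) + b
            for row, b in zip(weight, bias)]


def fuse(text_embeddings: List[Vector], is_protein_slot: List[bool],
         protein_vecs: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Replace each placeholder slot with a projected protein embedding."""
    if len(is_protein_slot) != len(text_embeddings):
        raise ValueError("need one slot flag per token position")
    if sum(is_protein_slot) != len(protein_vecs):
        raise ValueError("need one protein embedding per placeholder slot")
    remaining = iter(protein_vecs)
    fused = []
    for text_vec, is_slot in zip(text_embeddings, is_protein_slot):
        fused.append(project(next(remaining), weight, bias) if is_slot else text_vec)
    return fused


# Toy setup: protein_dim = 3, text_dim = 2, hand-picked (not learned) weights.
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 1.0]]
bias = [0.0, 0.5]
text_embeddings = [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4]]  # middle token is a placeholder
is_protein_slot = [False, True, False]
fused = fuse(text_embeddings, is_protein_slot, [[2.0, 3.0, 4.0]], weight, bias)
print(fused)  # [[0.1, 0.2], [2.0, 7.5], [0.3, 0.4]]
```

The fused sequence has the same shape as an ordinary sequence of text embeddings, which is what lets a transformer process text tokens and protein representations together.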

#Applications

CMAP targets antibody engineers and biopharma discovery teams who must triage large candidate panels for developability (properties affecting expression, stability, aggregation, and manufacturability), often using assays that differ across labs. Because it adapts through in-context examples, teams can apply one model to bespoke or low-data assays and prioritize candidates for experimental characterization without standing up a new predictor for every property or measurement protocol.

#Impact

CMAP illustrates how in-context learning can make antibody property prediction more portable across assays and laboratories, reducing the retraining burden that limits many developability models. Its publication in npj Systems Biology and Applications indicates that the text-plus-protein, prompt-conditioned approach has passed peer review. A notable limitation is that the model and training code are not publicly released (it originates from industry research), so independent reproduction and external benchmarking on novel targets remain constrained.

Openness

bio.rodeo openness: 4 (Closed · low usability and reproducibility)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

property_prediction, antibody_developability, transformer, multimodal, in_context_learning, language_model, antibody

Resources

Research Paper