Parapred

Antibody paratope prediction model that identifies antigen-contacting residues from heavy and light CDR sequences alone, using CNN and RNN layers.

Released: September 2018

Parapred is a sequence-based machine learning algorithm for predicting the paratope of an antibody — the subset of residues within the hypervariable complementarity-determining regions (CDRs) that directly contact the antigen during binding. Developed by Edgar Liberis, Petar Velickovic, Pietro Sormanni, Michele Vendruscolo, and Pietro Liò at the University of Cambridge and published in Bioinformatics in September 2018, Parapred addresses a fundamental challenge in antibody science: while CDR loops can be identified readily from sequence using established numbering schemes, pinpointing which specific residues engage in binding is considerably harder and conventionally required experimental structural data.

The model works purely from amino acid sequence, requiring only the CDR sequences of the heavy and light chains as input — no antigen structure or sequence is needed. This positions Parapred as an accessible, computationally lightweight complement to the broader antibody structure prediction and design ecosystem, applicable at early stages of antibody discovery and engineering before structural characterization is available.

Parapred was trained and evaluated on a non-redundant set of 239 antibody-antigen complexes curated from the Structural Antibody Database (SAbDab). The complexes were filtered to keep crystal structures with resolution better than 3 Angstroms, to ensure that no two antibody sequences shared more than 95% identity, and to require at least five antigen-contacting residues per complex. The redundancy filter matters for evaluation: without it, near-identical antibodies could appear on both sides of a train/test split and inflate the reported performance.
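The three curation filters above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Complex` record and its field names are hypothetical, the PDB IDs in the usage lines are toy values, and `identity` is a simplified ungapped comparison standing in for the proper sequence alignment a real redundancy filter would use.

```python
from dataclasses import dataclass


# Hypothetical record type: field names are illustrative, not SAbDab's schema.
@dataclass
class Complex:
    pdb_id: str
    resolution: float   # crystal resolution in Angstroms
    antibody_seq: str   # antibody sequence used for the redundancy check
    n_contacts: int     # number of antigen-contacting antibody residues


def identity(a: str, b: str) -> float:
    """Simplified ungapped identity: matching positions over the shorter length.
    A real pipeline would align the sequences first (assumption)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def curate(complexes, max_resolution=3.0, max_identity=0.95, min_contacts=5):
    # Quality filters: resolution better than 3 A and at least five contacts.
    passing = [c for c in complexes
               if c.resolution < max_resolution and c.n_contacts >= min_contacts]
    # Best-resolved structures first, so they win when near-duplicates collide.
    passing.sort(key=lambda c: c.resolution)
    kept = []
    for c in passing:
        # Greedy redundancy removal: drop c if any kept antibody exceeds 95% identity.
        if all(identity(c.antibody_seq, k.antibody_seq) <= max_identity for k in kept):
            kept.append(c)
    return kept


# Toy records (made-up IDs and values).
data = [
    Complex("1abc", 2.1, "EVQLVESGGG", 12),
    Complex("2def", 2.5, "EVQLVESGGG", 9),   # identical antibody, worse resolution
    Complex("3ghi", 3.4, "QVQLQQSGAE", 15),  # fails the resolution filter
    Complex("4jkl", 1.9, "DIQMTQSPSS", 3),   # too few contacts
]
print([c.pdb_id for c in curate(data)])  # ['1abc']
```

Sorting by resolution before the greedy pass is one reasonable tie-breaking choice; other curation schemes keep the first-deposited structure or cluster sequences instead.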

Key Features

Sequence-only input: Accepts CDR sequences (including two flanking framework residues at each end, known to occasionally participate in binding) without any requirement for antigen information or 3D structural coordinates.
Hybrid CNN-LSTM architecture: Combines convolutional layers, which capture features of each residue's local neighborhood, with bidirectional LSTM layers, which model longer-range dependencies along the CDR sequence, so that both local and long-range sequence context inform each prediction.
Residue-level probabilistic output: Produces per-residue binding probability scores, allowing researchers to rank and threshold contact predictions and understand confidence levels across different CDR positions.
Chothia numbering integration: CDR identification uses the Chothia antibody numbering scheme, which is consistent with standard structural annotation pipelines and ensures compatibility with other antibody modeling tools.
Docking-compatible predictions: The authors demonstrated that Parapred's predicted paratopes, when used as constraints in rigid-body docking simulations, improve docking accuracy to a degree approaching that of using experimentally determined paratopes — offering a practical route to structure-guided antigen docking from sequence alone.
Freely available: The model is openly released on GitHub with a Python implementation and a companion web server, making it accessible without specialized infrastructure.
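Here is a minimal sketch of how a CDR input with two flanking framework residues per side might be cut from a Chothia-numbered chain. The CDR boundaries below are the commonly used Chothia definitions, and the input format (a list of `((position, insertion_code), residue)` tuples) is similar in shape to what numbering tools such as ANARCI produce. Both are assumptions to check against the Parapred repository, and `extract_cdr` is a hypothetical helper.

```python
# Commonly used Chothia CDR boundaries (inclusive). Treat these as an
# assumption to verify against the Parapred repository.
CHOTHIA_CDRS = {
    "H1": (26, 32), "H2": (52, 56), "H3": (95, 102),
    "L1": (24, 34), "L2": (50, 56), "L3": (89, 97),
}
FLANK = 2  # framework residues added at each end of the CDR


def extract_cdr(numbered, cdr, flank=FLANK):
    """Return the CDR sequence plus `flank` framework residues on each side.

    `numbered` is a list of ((position, insertion_code), residue) tuples for
    one chain; gap positions are marked '-'. Insertions such as 31A are kept
    because only the integer position is compared against the boundaries.
    """
    start, end = CHOTHIA_CDRS[cdr]
    return "".join(
        aa for (pos, _ins), aa in numbered
        if start - flank <= pos <= end + flank and aa != "-"
    )


# Toy heavy-chain fragment (made-up residues), numbered 20..35 with insertion 31A.
toy = [((p, " "), "A") for p in range(20, 24)]    # framework outside the flank
toy += [((24, " "), "S"), ((25, " "), "G")]       # two flanking framework residues
toy += [((p, " "), "Y") for p in range(26, 32)]   # CDR-H1 positions 26..31
toy += [((31, "A"), "W"), ((32, " "), "F")]       # insertion 31A, then CDR end
toy += [((33, " "), "M"), ((34, " "), "H"), ((35, " "), "V")]
print(extract_cdr(toy, "H1"))  # SGYYYYYYWFMH
```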

Technical Details

Parapred's architecture processes each CDR independently as a short amino acid sequence. Residues are encoded as one-hot vectors augmented with physicochemical property features, and passed through a stack of 1D convolutional filters to extract local sequence motifs. The resulting feature maps are then fed into a bidirectional LSTM, whose hidden states capture context from both the N-terminal and C-terminal directions across the CDR. A final dense layer with sigmoid activation produces a per-residue binding probability. The model is trained with binary cross-entropy loss on contact labels derived from the SAbDab complexes using a 4 Angstrom heavy-atom distance cutoff to define paratope residues.
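The contact-labelling step can be illustrated with a brute-force distance check. This sketch assumes coordinates have already been parsed into plain `(x, y, z)` tuples of heavy atoms. `paratope_labels` is a hypothetical name, and whether the 4 Angstrom boundary is inclusive is an assumption (here `<=`).

```python
import math

CUTOFF = 4.0  # Angstroms, the heavy-atom cutoff described above


def paratope_labels(antibody_residues, antigen_atoms, cutoff=CUTOFF):
    """Label each antibody residue 1 if any of its heavy atoms lies within
    `cutoff` of any antigen heavy atom, otherwise 0.

    antibody_residues: list of residues, each a list of (x, y, z) coordinates.
    antigen_atoms: list of (x, y, z) antigen heavy-atom coordinates.
    """
    labels = []
    for atoms in antibody_residues:
        contact = any(math.dist(a, b) <= cutoff
                      for a in atoms for b in antigen_atoms)
        labels.append(int(contact))
    return labels


antigen = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
residues = [
    [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)],  # one atom 3 A from the antigen: contact
    [(0.0, 5.0, 0.0)],                     # nearest antigen atom 5 A away: no contact
    [(10.0, 0.0, 4.0)],                    # exactly 4 A: contact under <=
]
print(paratope_labels(residues, antigen))  # [1, 0, 1]
```

The double loop is quadratic in the number of atoms, which is acceptable for a sketch; for whole complexes a spatial index such as Biopython's `Bio.PDB.NeighborSearch` would be the more practical choice.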

On the 239-complex SAbDab benchmark, Parapred outperformed prior sequence-based paratope prediction approaches and produced predictions useful for downstream docking. The architecture is lightweight by modern standards and its inputs are short (CDR sequences are typically 10 to 20 residues), so inference is fast even on a standard laptop CPU. This makes it well suited to high-throughput screening across large antibody repertoires.
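To make the pipeline described under Technical Details concrete, here is a forward pass written in plain NumPy with random, untrained weights. It is a sketch of the general CNN, then BiLSTM, then per-residue sigmoid pattern, not Parapred's released model. The layer sizes and helper names are hypothetical, a single Kyte-Doolittle hydropathy value stands in for Parapred's physicochemical features, and the published model's convolution, padding, and masking details may differ.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy, used only as a stand-in physicochemical feature;
# Parapred's actual property features differ.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def encode(seq):
    """One-hot (20 dims) plus one hydropathy feature per residue -> (L, 21)."""
    x = np.zeros((len(seq), len(AMINO_ACIDS) + 1))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
        x[i, -1] = KD[aa] / 4.5  # scale roughly into [-1, 1]
    return x


def conv1d(x, w, b):
    """'Same'-padded 1D convolution followed by ReLU. w: (k, d_in, d_out), k odd."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("kd,kdo->o", xp[i:i + k], w) for i in range(len(x))])
    return np.maximum(out + b, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(x, W, U, b):
    """Single-direction LSTM. W: (d_in, 4h), U: (h, 4h), b: (4h,)."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    outputs = []
    for x_t in x:
        i, f, o, g = np.split(x_t @ W + h @ U + b, 4)  # input, forget, output, cell
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(x, fwd, bwd):
    """Concatenate forward and (re-reversed) backward LSTM states -> (L, 2h)."""
    return np.concatenate([lstm(x, *fwd), lstm(x[::-1], *bwd)[::-1]], axis=1)


def predict(seq, params):
    """Per-residue binding probabilities for one CDR sequence."""
    x = conv1d(encode(seq), *params["conv"])
    x = bilstm(x, params["fwd"], params["bwd"])
    w, b = params["dense"]
    return sigmoid(x @ w + b).ravel()


def init_params(d_in=21, filters=16, kernel=3, hidden=8, seed=0):
    """Random weights with hypothetical layer sizes (untrained)."""
    rng = np.random.default_rng(seed)

    def r(*shape):
        return rng.normal(0.0, 0.1, shape)

    return {
        "conv": (r(kernel, d_in, filters), np.zeros(filters)),
        "fwd": (r(filters, 4 * hidden), r(hidden, 4 * hidden), np.zeros(4 * hidden)),
        "bwd": (r(filters, 4 * hidden), r(hidden, 4 * hidden), np.zeros(4 * hidden)),
        "dense": (r(2 * hidden, 1), np.zeros(1)),
    }


probs = predict("GFTFSSYAMS", init_params())  # untrained: values are arbitrary
print(probs.shape)  # (10,)
```

Each step is a handful of small matrix products over a few dozen residues at most, which illustrates why inference of this kind is cheap on a CPU.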

Applications

Parapred is primarily useful in early-stage antibody discovery and optimization workflows, where crystal or cryo-EM structures are not yet available. Predicted paratope probabilities can focus mutagenesis campaigns on putative contact residues, guide the design of variant libraries for affinity maturation, and identify which CDR positions are likely important for antigen engagement before investing in structural experiments. The docking application is particularly valuable for epitope mapping campaigns: paratope predictions serve as spatial constraints that improve docked pose quality without requiring experimental structural data. Parapred is also used as a standard benchmark comparison point when evaluating newer paratope prediction algorithms, reflecting its role as an established sequence-based baseline in the field.
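Turning per-residue probabilities into a short list of mutagenesis candidates amounts to thresholding and sorting. In this sketch `rank_candidates` is a hypothetical helper, the residue labels and probabilities are made-up values, and the default threshold of 0.5 is an arbitrary choice rather than a value taken from the Parapred paper.

```python
def rank_candidates(residues, probs, threshold=0.5, top_k=None):
    """Return (residue_label, probability) pairs at or above `threshold`,
    highest probability first, optionally truncated to `top_k` entries."""
    if len(residues) != len(probs):
        raise ValueError("residues and probs must have the same length")
    hits = [(r, p) for r, p in zip(residues, probs) if p >= threshold]
    hits.sort(key=lambda rp: rp[1], reverse=True)
    return hits[:top_k] if top_k is not None else hits


# Made-up predictions for a few heavy-chain CDR positions.
labels = ["H31", "H32", "H33", "H52", "H53", "H100"]
probs = [0.81, 0.34, 0.66, 0.92, 0.48, 0.73]
print(rank_candidates(labels, probs, threshold=0.6, top_k=3))
# [('H52', 0.92), ('H31', 0.81), ('H100', 0.73)]
```

A threshold controls how many positions are flagged overall, while `top_k` caps the list at what a mutagenesis budget can cover; in practice either would be tuned to the campaign.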

Impact

Parapred established sequence-based deep learning as a viable approach to paratope prediction and provided the community with a practical, open tool that remains in active use. It has served as a reference baseline for a subsequent generation of paratope predictors, including Paragraph (which uses graph neural networks over predicted antibody structures), ParaAntiProt (which leverages protein language model embeddings), and ParaDeep (which extends the BiLSTM-CNN approach with chain-aware modeling). These comparisons consistently acknowledge Parapred as the foundational sequence-only method against which improvements are measured.

A notable limitation is the relatively small training set of 239 complexes, which constrains generalization across the full diversity of antibody-antigen binding modes. The model also does not capture antibody-antigen complementarity: predictions are made solely from the antibody side, so it cannot account for the epitope-specific contact geometries that more recent structure-aware methods exploit.

Citation

Parapred: antibody paratope prediction using convolutional and recurrent neural networks

Liberis, E., et al. (2018) Parapred: antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics.

DOI: 10.1093/bioinformatics/bty305

Recent citations

Papers that recently cited this model.

In silico discovery and serological validation of Trypanosoma cruzi-specific B-cell epitopes for high-precision Chagas disease diagnosis
Mayron Antonio Candia-Puma, L. D. Goyzueta-Mamani, H. L. Barazorda-Ccahuana, et al.
Frontiers in Microbiology · Jul 2026 · 0 citations

AI-Driven Design Platforms of Next-Generation Antibody Therapeutics
Ying-Jie Wang, Afsheen Saba, Yue Ran, et al.
Topics in current chemistry · Jun 2026 · 0 citations

Discovery and characterization of an anti-Neisseria gonorrhoeae NGO_1985 monoclonal antibody and cognate antigen
Pardis Mokhtary, S. Stazzoni, Eleonora Marini, et al.
Frontiers in Microbiology · Jun 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
Kexin Huang, Tianfan Fu, Wenhao Gao, et al.
NeurIPS Datasets and Benchmarks · Feb 2021 · 449 citations

BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning
David Prihoda, Jad Maamary, A. Waight, et al.
bioRxiv · Aug 2021 · 186 citations

Computational approaches to therapeutic antibody design: established methods and emerging trends
R. A. Norman, Francesco Ambrosetti, A. Bonvin, et al.
Briefings Bioinform. · Oct 2019 · 182 citations · Influential

A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding
R. Akbar, Philippe A. Robert, Milena Pavlović, et al.
bioRxiv · Sep 2019 · 181 citations

Antibody complementarity determining region design using high-capacity machine learning
Ge Liu, Haoyang Zeng, Jonas W. Mueller, et al.
bioRxiv · Jun 2019 · 171 citations

Citations

Total citations: 153

Influential: 18

References: 29

GitHub

Stars: 61

Forks: 16

Open issues: 8

Contributors: 1

Last push: 3y ago

Language: Python

License: MIT

Fields of citing research

Computer Science: 82%
Medicine: 82%
Biology: 65%
Chemistry: 14%
Engineering: 3%
Business: 3%
Mathematics: 1%
Materials Science: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open (usable and reproducible)

Openness score: 88 (Open)

Usability (can I run it?): 90

Reproducibility (can I retrain it?): 92

Model Openness Framework

Unclassified

No formal model card / data card

Resources

GitHub Repository · Research Paper · Research Paper · Dataset
