bio.rodeo

Protein

Parapred

University of Cambridge

Sequence-based deep learning model for antibody paratope prediction using convolutional and recurrent neural networks. Identifies antigen-contacting residues from CDR sequences alone.

Released: 2018

Overview

Parapred is a sequence-based machine learning algorithm for predicting the paratope of an antibody: the subset of residues within the hypervariable complementarity-determining regions (CDRs) that directly contact the antigen during binding. It was developed by Edgar Liberis, Petar Veličković, Pietro Sormanni, Michele Vendruscolo, and Pietro Liò at the University of Cambridge and published in Bioinformatics in September 2018. Parapred addresses a fundamental challenge in antibody science: CDR loops can be identified readily from sequence using established numbering schemes, but pinpointing which specific residues engage in binding is considerably harder and has conventionally required experimental structural data.

The model works purely from amino acid sequence. It requires only the CDR sequences of the heavy and light chains as input; no antigen structure or sequence is needed. This positions Parapred as an accessible, computationally lightweight complement to the broader antibody structure prediction and design ecosystem, applicable at early stages of antibody discovery and engineering before structural characterization is available.

Parapred was trained and evaluated on a non-redundant set of 239 antibody-antigen complexes curated from the Structural Antibody Database (SAbDab). Complexes were filtered so that crystal resolution is better than 3 Angstroms, no two antibody sequences share more than 95% identity, and each complex contains at least five antigen-contacting residues. Removing near-duplicate antibodies limits leakage between training and evaluation data, so the reported performance is less likely to reflect memorization of redundant structures.
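The three curation filters can be sketched as a short greedy pass over candidate complexes. This is only an illustration, not the authors' pipeline: the record fields (`resolution`, `n_contacts`, `sequence`) and the toy sequences are hypothetical, and `difflib.SequenceMatcher.ratio()` is used as a crude stand-in for alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a real pipeline would use a proper alignment."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def curate(complexes, max_resolution=3.0, max_identity=0.95, min_contacts=5):
    """Keep complexes that pass the resolution, contact-count, and redundancy filters."""
    kept = []
    for c in complexes:
        if c["resolution"] >= max_resolution:  # resolution must be better than 3 Angstroms
            continue
        if c["n_contacts"] < min_contacts:  # at least five antigen-contacting residues
            continue
        # Greedy redundancy removal: drop anything too similar to an already kept entry.
        if any(identity(c["sequence"], k["sequence"]) > max_identity for k in kept):
            continue
        kept.append(c)
    return kept


# Hypothetical records, invented for this example.
examples = [
    {"id": "a", "resolution": 2.1, "n_contacts": 12, "sequence": "QVQLVQSGAEVKKPGASVKV"},
    {"id": "b", "resolution": 2.4, "n_contacts": 9, "sequence": "QVQLVQSGAEVKKPGASVKV"},
    {"id": "c", "resolution": 3.5, "n_contacts": 15, "sequence": "EVQLLESGGGLVQPGGSLRL"},
    {"id": "d", "resolution": 1.9, "n_contacts": 3, "sequence": "DIQMTQSPSSLSASVGDRVT"},
    {"id": "e", "resolution": 2.8, "n_contacts": 7, "sequence": "EIVLTQSPGTLSLSPGERAT"},
]
kept_ids = [c["id"] for c in curate(examples)]
print(kept_ids)
```

Here "b" is dropped as a duplicate of "a", "c" fails the resolution filter, and "d" has too few contacts. The greedy order matters: which member of a redundant pair survives depends on which is seen first.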

Key Features

  • Sequence-only input: Accepts CDR sequences (including two flanking framework residues at each end, known to occasionally participate in binding) without any requirement for antigen information or 3D structural coordinates.
  • Hybrid CNN-LSTM architecture: Combines convolutional layers for capturing local residue neighborhood features with bidirectional LSTM layers for modeling long-range dependencies along the CDR sequence, so that both local and sequence-wide context inform predictions.
  • Residue-level probabilistic output: Produces per-residue binding probability scores, allowing researchers to rank and threshold contact predictions and understand confidence levels across different CDR positions.
  • Chothia numbering integration: CDR identification uses the Chothia antibody numbering scheme, which is consistent with standard structural annotation pipelines and ensures compatibility with other antibody modeling tools.
  • Docking-compatible predictions: The authors demonstrated that Parapred's predicted paratopes, when used as constraints in rigid-body docking simulations, improve docking accuracy to a degree approaching that of using experimentally determined paratopes. This offers a practical route to constraint-guided antigen docking when no experimental paratope is available.
  • Freely available: The model is openly released on GitHub with a Python implementation and a companion web server, making it accessible without specialized infrastructure.
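As an illustration of the sequence-only input described above, the sketch below cuts CDRs out of a Chothia-numbered chain and extends each by two flanking residues. The boundary table uses the commonly quoted Chothia CDR definitions, which are an assumption here rather than something stated on this page; the input format (a list of `(number, insertion_code, amino_acid)` tuples) and the toy chain fragment are hypothetical.

```python
# Commonly quoted Chothia CDR boundaries (inclusive residue numbers); assumed, not from the source.
CHOTHIA_CDRS = {
    "H": {"H1": (26, 32), "H2": (52, 56), "H3": (95, 102)},
    "L": {"L1": (24, 34), "L2": (50, 56), "L3": (89, 97)},
}


def extract_cdrs(numbered, chain_type, flank=2):
    """Return {cdr_name: sequence} with `flank` extra residues on each side.

    numbered: list of (number, insertion_code, amino_acid) tuples in chain order.
    """
    cdrs = {}
    for name, (start, end) in CHOTHIA_CDRS[chain_type].items():
        # Insertion codes (e.g. 100A) share the base number, so they fall inside the range.
        idx = [i for i, (num, _ins, _aa) in enumerate(numbered) if start <= num <= end]
        if not idx:
            continue  # this CDR is not present in the supplied fragment
        lo = max(idx[0] - flank, 0)
        hi = min(idx[-1] + flank, len(numbered) - 1)
        cdrs[name] = "".join(aa for _num, _ins, aa in numbered[lo:hi + 1])
    return cdrs


# Toy heavy-chain fragment around CDR-H3, invented for this example.
fragment = [
    (91, "", "Y"), (92, "", "C"), (93, "", "A"), (94, "", "R"),
    (95, "", "D"), (96, "", "Y"), (97, "", "G"), (98, "", "S"),
    (99, "", "S"), (100, "", "Y"), (100, "A", "W"), (100, "B", "F"),
    (101, "", "D"), (102, "", "V"), (103, "", "W"), (104, "", "G"),
    (105, "", "Q"),
]
h3 = extract_cdrs(fragment, "H")
print(h3)
```

Flanks are taken by position in the chain rather than by residue number, so insertions next to a CDR boundary are handled without special cases.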

Technical Details

Parapred's architecture processes each CDR independently as a short amino acid sequence. Residues are encoded as one-hot vectors augmented with physicochemical property features, and passed through a stack of 1D convolutional filters to extract local sequence motifs. The resulting feature maps are then fed into a bidirectional LSTM, whose hidden states capture context from both the N-terminal and C-terminal directions across the CDR. A final dense layer with sigmoid activation produces a per-residue binding probability. The model is trained with binary cross-entropy loss on contact labels derived from the SAbDab complexes using a 4 Angstrom heavy-atom distance cutoff to define paratope residues.
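The data side of this description can be sketched in plain Python: a one-hot residue encoding, contact labels from a heavy-atom distance cutoff, and the binary cross-entropy loss. This is a minimal sketch under stated assumptions, not Parapred's code. The function names, the extra "unknown residue" slot, and the toy coordinates are all hypothetical, and the physicochemical features and the CNN-LSTM network itself are left out.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-way one-hot vector plus one slot for unknown residues (an assumption)."""
    vec = [0.0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1.0
    return vec


def contact_labels(cdr_atoms, antigen_atoms, cutoff=4.0):
    """Label a residue 1 if any of its heavy atoms lies within `cutoff` of any antigen atom.

    cdr_atoms: one list of (x, y, z) heavy-atom coordinates per CDR residue.
    """
    labels = []
    for residue_atoms in cdr_atoms:
        in_contact = any(
            math.dist(a, b) <= cutoff for a in residue_atoms for b in antigen_atoms
        )
        labels.append(1 if in_contact else 0)
    return labels


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy coordinates in Angstroms, invented for this example.
cdr_atoms = [[(0.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)], [(0.0, 2.0, 0.0), (0.0, 8.0, 0.0)]]
antigen_atoms = [(0.0, 0.0, 3.0)]
labels = contact_labels(cdr_atoms, antigen_atoms)
loss = binary_cross_entropy([0.9, 0.2, 0.6], labels)
print(labels, round(loss, 4))
```

A residue counts as a contact as soon as one atom pair is within the cutoff, so the third residue is labeled positive even though its second atom is far from the antigen.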

On the 239-complex SAbDab benchmark, Parapred outperformed prior sequence-based paratope prediction approaches and provided predictions of practical utility for downstream docking. The architecture is lightweight by modern standards: CDR sequences are typically 10 to 20 residues long, and inference runs quickly on a standard laptop CPU, which makes the model well suited to high-throughput screening across large antibody repertoires.

Applications

Parapred is primarily useful in early-stage antibody discovery and optimization workflows, where crystal or cryo-EM structures are not yet available. Predicted paratope probabilities can focus mutagenesis campaigns on putative contact residues, guide the design of variant libraries for affinity maturation, and identify which CDR positions are likely important for antigen engagement before investing in structural experiments. The docking application is particularly valuable for epitope mapping campaigns: paratope predictions serve as spatial constraints that improve docked pose quality without requiring experimental structural data. Parapred is also used as a standard benchmark comparison point when evaluating newer paratope prediction algorithms, reflecting its role as an established sequence-based baseline in the field.
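One way per-residue probabilities might feed a mutagenesis campaign is to threshold and rank positions, as in the sketch below. The helper name `select_positions`, the 0.5 threshold, and the probability values are hypothetical; real scores would come from running the model on the CDR.

```python
def select_positions(cdr, probs, threshold=0.5, top_k=None):
    """Return (index, residue, probability) tuples above threshold, highest first."""
    if len(cdr) != len(probs):
        raise ValueError("need exactly one probability per residue")
    ranked = sorted(
        ((i, aa, p) for i, (aa, p) in enumerate(zip(cdr, probs)) if p >= threshold),
        key=lambda item: item[2],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores for a toy CDR-H3 with two flanking residues on each side.
cdr = "ARDYGSSYWFDVWG"
probs = [0.05, 0.20, 0.55, 0.81, 0.62, 0.40, 0.35,
         0.77, 0.90, 0.48, 0.30, 0.10, 0.04, 0.02]
picks = select_positions(cdr, probs, threshold=0.5, top_k=3)
for index, residue, p in picks:
    print(index, residue, p)
```

The threshold trades precision against recall: a lower value widens the variant library, while a higher one concentrates effort on the most confident predicted contacts.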

Impact

Parapred established sequence-based deep learning as a viable approach to paratope prediction and provided the community with a practical, open tool that remains in use. It has served as a reference baseline for a subsequent generation of paratope predictors, including Paragraph (which uses graph neural networks over predicted antibody structures), ParaAntiProt (which leverages protein language model embeddings), and ParaDeep (which extends the BiLSTM-CNN approach with chain-aware modeling). These comparisons consistently treat Parapred as the foundational sequence-only method against which improvements are measured.

A notable limitation is the relatively small training set of 239 complexes, which constrains generalization across the full diversity of antibody-antigen binding modes. The model also does not capture antibody-antigen complementarity: predictions are made solely from the antibody side, so it cannot account for epitope-specific contact geometries that more recent structure-aware methods exploit.

Citation

Liberis, E., et al. (2018) Parapred: antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics.

DOI: 10.1093/bioinformatics/bty305

Metrics

GitHub

Stars: 61
Forks: 16
Open Issues: 8
Contributors: 1
Last Push: 2y ago
Language: Python
License: MIT

Citations

Total Citations: 148
Influential: 18
References: 29

Tags

paratope prediction, antibody

Resources

GitHub Repository, Research Paper