A multimodal protein pre-training framework that jointly learns sequence, 3D structure, and surface representations using implicit neural representations.
ProteinINR is a multimodal pre-training framework developed by Kakao Brain that addresses a long-standing gap in protein representation learning: the underutilization of molecular surface information. Protein surfaces govern interactions with other molecules — including drug candidates, binding partners, and cellular receptors — yet most representation learning approaches at the time of its publication relied exclusively on amino acid sequences or backbone atomic coordinates. ProteinINR proposes integrating all three levels of protein description — sequence, 3D structure, and surface — into a unified pre-training pipeline.
The core technical contribution is the application of Implicit Neural Representations (INRs) to protein molecular surfaces. Rather than discretizing surfaces into fixed-resolution meshes (which introduces resolution artifacts and limits generalization), INRs parameterize the surface as a continuous function of spatial coordinates. A small neural network learns to map any 3D query point to its corresponding surface property, enabling resolution-independent surface reconstruction. This allows ProteinINR to capture fine-grained geometric and chemical features of protein surfaces that are not recoverable from backbone coordinates alone.
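The coordinate-to-property mapping at the heart of an INR can be sketched with a tiny multilayer perceptron. The sketch below uses sine activations in the style of the SIREN family of INRs; ProteinINR's actual decoder is latent-conditioned and transformer-based, so the network shape, sizes, and `omega` frequency here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_inr(in_dim=3, hidden=64, out_dim=1):
    # Random weights for a small coordinate MLP (untrained, illustrative).
    return {
        "W1": rng.normal(0.0, 1.0 / in_dim, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / hidden, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def query_surface(params, xyz, omega=30.0):
    # Continuous map from 3D coordinates to a scalar surface value
    # (e.g. an occupancy or signed-distance logit). Sine activations
    # follow the SIREN style of INR; treat this as the core idea only.
    h = np.sin(omega * (xyz @ params["W1"] + params["b1"]))
    return h @ params["W2"] + params["b2"]

params = init_inr()
# Resolution independence: the same network answers queries at any
# density, coarse or fine, with no fixed mesh in between.
coarse = query_surface(params, rng.uniform(-1.0, 1.0, (8, 3)))
fine = query_surface(params, rng.uniform(-1.0, 1.0, (10_000, 3)))
```

Because the surface is a function rather than a mesh, "reconstruction resolution" is just the density of query points chosen at inference time.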
ProteinINR was published as a conference paper at ICLR 2024 by Youhan Lee, Hasun Yu, Jaemyung Lee, and Jaehoon Kim. The work demonstrates that incorporating surface pre-training consistently improves downstream task performance compared to methods trained on sequence and structure alone, providing empirical support for the hypothesis that surface geometry encodes functional information complementary to what is captured by sequence and backbone encoders.
ProteinINR builds on a two-encoder architecture: a sequence encoder (typically a protein language model backbone) processes amino acid sequences, and a structure encoder processes 3D coordinates. The surface learning component uses a transformer encoder to produce a latent embedding of the protein instance, which is then decoded by an INR decoder that maps 3D query coordinates to surface occupancy or property values. The decoder introduces a locality inductive bias — attending preferentially to spatially proximate surface features — that substantially improves reconstruction of fine surface detail over generic INR decoders such as TransINR.
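The locality inductive bias can be pictured as distance-penalized attention: when decoding a query point, latent tokens anchored near that point in 3D space dominate the pooled feature. The squared-distance penalty below is a hypothetical rendering of that idea; the paper realizes it inside a transformer decoder, and the function and parameter names here are not from the paper.

```python
import numpy as np

def locality_biased_decode(query_xyz, tokens, token_xyz, content_logits,
                           scale=1.0):
    """Decode a feature at one 3D query point from latent tokens,
    down-weighting tokens whose anchor coordinates lie far from the
    query (a sketch of a locality inductive bias, not ProteinINR's
    exact decoder)."""
    d2 = np.sum((token_xyz - query_xyz) ** 2, axis=-1)  # (T,) distances
    logits = content_logits - scale * d2                # nearby tokens win
    w = np.exp(logits - logits.max())
    w /= w.sum()                                        # softmax weights
    return w @ tokens                                   # pooled (D,) feature

# Two tokens: one anchored at the query point, one 5 units away.
token_xyz = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
out = locality_biased_decode(np.zeros(3), tokens, token_xyz,
                             content_logits=np.zeros(2), scale=1.0)
# out is dominated by the nearby token's feature [1, 0].
```

A generic INR decoder such as TransINR attends to all latent tokens by content alone; the locality term is what lets fine local surface detail survive the pooling.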
Pre-training proceeds in three stages. First, the sequence encoder is pre-trained on protein sequence data. Second, the structure encoder is pre-trained on molecular surfaces using the ProteinINR objective, which trains the model to reconstruct protein surfaces from the learned structural embeddings. Third, the two encoders are jointly fine-tuned with multi-view contrastive learning across the sequence and structure modalities. Downstream evaluation covers standard protein representation benchmarks: fold classification, enzyme reaction classification, gene ontology term prediction, and protein function annotation. Presented as a poster at ICLR 2024, the paper benchmarks against GearNet and other leading structure-based pre-training methods of the era.
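The third stage's multi-view contrastive alignment can be sketched with a symmetric InfoNCE loss: in a batch of paired sequence/structure embeddings, each matched pair is a positive and all other pairings are negatives. This is a generic formulation of multi-view contrastive learning, assumed for illustration; the paper's exact loss and temperature may differ.

```python
import numpy as np

def info_nce(seq_emb, struct_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired sequence/structure
    embeddings (rows of the two arrays are matched pairs). A generic
    multi-view contrastive sketch, not ProteinINR's exact objective."""
    seq = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    st = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    logits = (seq @ st.T) / temperature  # (B, B) cosine similarities
    n = logits.shape[0]

    def xent(lg):
        # Cross-entropy with the matched pair (diagonal) as the target.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(logp[np.arange(n), np.arange(n)])

    # Average both retrieval directions: sequence->structure and back.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
a = rng.normal(size=(16, 32))
# Aligned views (small perturbation) should score a lower loss than
# unrelated embeddings.
aligned = info_nce(a, a + 0.01 * rng.normal(size=(16, 32)))
mismatched = info_nce(a, rng.normal(size=(16, 32)))
```

Minimizing this loss pulls the sequence and structure embeddings of the same protein together while pushing apart embeddings of different proteins, which is what ties the two pre-trained encoders into one shared representation space.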
ProteinINR is particularly relevant to tasks where protein surface geometry determines biological outcome: protein-protein interaction prediction, binding site identification, drug-protein docking, and functional annotation of structurally characterized proteins. Researchers working on computational drug discovery can leverage ProteinINR representations to better model surface complementarity between targets and potential ligands. The surface-informed embeddings also benefit enzyme engineering workflows where the active site geometry — a surface property — dictates catalytic specificity and efficiency.
ProteinINR contributes to a broader movement in computational biology toward multimodal protein representations that capture information at multiple scales of biological description. By demonstrating that molecular surface geometry can be encoded in a generalizable, resolution-independent way and incorporated into pre-training, the work establishes a new modality for protein representation learning alongside sequences and backbone structures. The surface-as-modality framing has subsequently influenced related multimodal models. A practical limitation is the absence of a public code release at the time of publication, which constrains independent replication and downstream adoption. The work also relies on pre-computed surface representations, which adds an additional preprocessing step compared to sequence- or structure-only baselines.