ATOMICA

Geometric deep learning model that learns atomic-scale representations of molecular interfaces across proteins, small molecules, and nucleic acids.

Released: March 2026

Molecular function in cells is mediated by physical interactions at interfaces: a drug docking into an enzyme pocket, a metal ion coordinating a catalytic site, a protein recognizing an RNA strand. ATOMICA is a geometric deep learning model that learns a single, universal representation of these intermolecular interfaces at atomic resolution, rather than handling each interaction type with a separate, modality-specific tool. It covers five chemical modalities (proteins, small molecules, metal ions, lipids, and nucleic acids) in one shared embedding space.

Developed by the Zitnik lab at Harvard University and collaborators (including the Loscalzo and Pentelute groups), ATOMICA was released as a preprint in 2025 and revised in 2026. It addresses a long-standing fragmentation in structural biology: models for protein–protein, protein–ligand, protein–nucleic-acid, and metal-binding interactions have historically been trained and evaluated in isolation. By learning from all of these jointly, ATOMICA captures the shared physical grammar of interfaces and transfers knowledge across interaction types.

The model is trained self-supervised on over 2 million experimentally determined interaction complexes, producing embeddings that support downstream tasks without task-specific labels. This positions ATOMICA alongside foundation models such as ESM and structural tools like AlphaFold, but with an explicit focus on the chemistry of binding interfaces across modalities.

Key Features

Five-modality coverage: A single model represents interfaces involving proteins, small molecules, metal ions, lipids, and nucleic acids, enabling cross-modality transfer in one embedding space.
Atomic-resolution geometry: ATOMICA operates on atomic graphs, capturing fine-grained spatial arrangements that coarser residue- or fragment-level models miss.
Large-scale self-supervised pretraining: Trained on more than 2 million interaction complexes, the model learns interface representations without requiring task-specific annotations.
Zero-shot ligand prediction in the dark proteome: ATOMICA predicts binding sites and ligands for understudied proteins, with five predicted heme-binding sites experimentally confirmed.
State-of-the-art on RNA benchmarks: The model achieves leading performance on RNAGlib 3D structural benchmarks, demonstrating strength on protein–nucleic-acid interfaces.
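To make the "atomic graphs" idea above concrete, the following is a minimal sketch of how cross-partner edges of an interface graph could be built from atom coordinates. It is not ATOMICA's actual preprocessing: the function name `interface_edges`, the 4.0 Å cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# (element symbol, (x, y, z) in angstroms); a hypothetical minimal atom record
Atom = Tuple[str, Tuple[float, float, float]]

def interface_edges(partner_a: List[Atom], partner_b: List[Atom],
                    cutoff: float = 4.0) -> List[Tuple[int, int, float]]:
    """Return (index_in_a, index_in_b, distance) for every atom pair
    across the two partners that lies within `cutoff` angstroms.
    The cutoff value is an assumption, not a documented ATOMICA setting."""
    edges = []
    for i, (_, pos_a) in enumerate(partner_a):
        for j, (_, pos_b) in enumerate(partner_b):
            d = math.dist(pos_a, pos_b)
            if d <= cutoff:
                edges.append((i, j, d))
    return edges

# Toy coordinates invented for illustration (not from any real structure)
protein_atoms = [("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0)), ("C", (10.0, 0.0, 0.0))]
ligand_atoms = [("FE", (1.5, 2.0, 0.0)), ("C", (20.0, 0.0, 0.0))]

edges = interface_edges(protein_atoms, ligand_atoms)
for i, j, d in edges:
    print(protein_atoms[i][0], ligand_atoms[j][0], round(d, 2))
```

Only the two protein atoms near the iron atom end up connected to it, which is the sense in which an atom-level graph records contacts that a residue-level summary would blur. The nested loop is quadratic in atom count; a real pipeline would likely use a spatial index instead.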

Technical Details

ATOMICA is a geometric (equivariant) graph neural network that encodes molecular interfaces as atomic graphs, preserving 3D spatial relationships between atoms across the interacting partners. It is pretrained in a self-supervised manner on 2,037,972 interaction complexes drawn from experimental structural data spanning the five supported modalities. The learned embeddings transfer to multiple downstream evaluations: ATOMICA reports state-of-the-art results on RNAGlib 3D benchmarks for RNA structural tasks and supports zero-shot prediction of ligand binding in poorly characterized "dark proteome" proteins. In a notable wet-lab validation, five ATOMICA-predicted heme-binding sites were experimentally confirmed, illustrating that the representations capture chemically meaningful binding determinants rather than dataset artifacts.
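One property that geometric models rely on is that interatomic distances do not change when a complex is rotated or translated. The sketch below checks this numerically on invented coordinates. It illustrates only the invariance of distance features, under the assumption that such features are representative of what a geometric network consumes; it does not reproduce ATOMICA's architecture, and all names in it are hypothetical.

```python
import math

def rotate_z(p, theta):
    """Rotate point p about the z-axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def translate(p, t):
    """Shift point p by vector t."""
    return tuple(a + b for a, b in zip(p, t))

def pairwise_distances(points):
    """All distances between distinct pairs, in a fixed order."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)]

# Toy atom positions invented for illustration
coords = [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0), (3.0, 0.0, 1.0), (-1.0, 0.5, 2.0)]

# Apply an arbitrary rigid motion: rotation followed by translation
moved = [translate(rotate_z(p, 0.7), (5.0, -3.0, 2.0)) for p in coords]

before = pairwise_distances(coords)
after = pairwise_distances(moved)
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(max_diff < 1e-9)
```

Equivariance is the broader requirement: vector-valued outputs should rotate along with the input, while scalar outputs such as the distances here stay fixed.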

Applications

ATOMICA is useful for researchers studying molecular recognition across chemical classes: identifying and characterizing binding pockets, annotating ligand- and ion-binding sites in understudied proteins, analyzing protein–RNA interfaces, and generating transferable interface embeddings for downstream machine learning. Its zero-shot capabilities make it especially valuable for the dark proteome, where functional annotations are scarce, and for drug discovery workflows that need a unified view of protein, small-molecule, and nucleic-acid interactions.
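As a rough illustration of how transferable interface embeddings can feed a downstream workflow, the sketch below ranks a few reference binding sites by cosine similarity to a query embedding. This is a generic nearest-neighbor pattern, not the procedure the ATOMICA authors describe for zero-shot prediction; the vectors, site names, and function names are invented, and real embeddings would come from the model and have far more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query, references):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine(query, emb)) for name, emb in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional embeddings, invented for illustration
reference_embeddings = {
    "heme_site": [0.9, 0.1, 0.0],
    "zinc_site": [0.1, 0.9, 0.1],
    "atp_site": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query_embedding, reference_embeddings)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy setup the query lies closest to the hypothetical heme site, so that annotation would be proposed first. Any such proposal is a hypothesis to test, not a confirmed binding site.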

Impact

By unifying five interaction modalities under a single atomic-scale representation, ATOMICA offers a foundation-model approach to interface biology that previously required a patchwork of specialized tools. The experimental confirmation of predicted heme-binding sites lends credibility to its zero-shot predictions, and its strong RNA benchmark results extend foundation-model thinking into protein–nucleic-acid space. As a representation backbone, it could accelerate binding-site annotation, ligand discovery, and interface-aware modeling across structural biology. Because the work is a recent preprint, its broader adoption and its generalization across diverse protein families remain to be established.

Citation

Learning Universal Representations of Intermolecular Interactions with ATOMICA

Preprint

Fang, A., et al. (2025) Learning Universal Representations of Intermolecular Interactions with ATOMICA. bioRxiv.

DOI: 10.1101/2025.04.02.646906

Recent citations

Papers that recently cited this model.

Structure-Based TCR–pMHC Binding Prediction and Generalization to Unseen Peptides
A. Abeer, Rajashree Roy, Xiaoning Qian, et al.
bioRxiv · Feb 2026 · Citations: 0

Greater than the sum of Its Parts: Building Substructure into Protein Encoding Models
Robert Calef, A. Liang, M. Kellis, et al.
arXiv.org · Dec 2025 · Citations: 2

Tokenizing loops of antibodies
Ada Fang, R. Alberstein, Simon Kelow, et al.
mAbs · Sep 2025 · Citations: 2

Top citations

The most-cited papers that cite this model.

Greater than the sum of Its Parts: Building Substructure into Protein Encoding Models
Robert Calef, A. Liang, M. Kellis, et al.
arXiv.org · Dec 2025 · Citations: 2

Tokenizing loops of antibodies
Ada Fang, R. Alberstein, Simon Kelow, et al.
mAbs · Sep 2025 · Citations: 2

Structure-Based TCR–pMHC Binding Prediction and Generalization to Unseen Peptides
A. Abeer, Rajashree Roy, Xiaoning Qian, et al.
bioRxiv · Feb 2026 · Citations: 0

Citations

Total citations: 3

Influential: 0

References: 191

Fields of citing research

Biology: 100%
Computer Science: 100%
Medicine: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Openness score: 88 (Open)

Usability (can I run it?): 93

Reproducibility (can I retrain it?): 87

Model Openness Framework: Class II (Open Tooling)

Resources

Research Paper
