Geometric deep learning model that learns universal atomic-scale representations of intermolecular interfaces across proteins, small molecules, ions, lipids, and nucleic acids.
Molecular function in cells is mediated by physical interactions at interfaces: a drug docking into an enzyme pocket, a metal ion coordinating a catalytic site, a protein recognizing an RNA strand. ATOMICA is a geometric deep learning model that learns a single, universal representation of these intermolecular interfaces at atomic resolution, rather than handling each interaction type with a separate, modality-specific tool. It covers five chemical modalities (proteins, small molecules, metal ions, lipids, and nucleic acids) in one shared embedding space.
Developed by the Zitnik lab at Harvard University and collaborators (including the Loscalzo and Pentelute groups), ATOMICA was released as a preprint in 2025 and revised in 2026. It addresses a long-standing fragmentation in structural biology: models for protein–protein, protein–ligand, protein–nucleic-acid, and metal-binding interactions have historically been trained and evaluated in isolation. By learning from all of these jointly, ATOMICA captures the shared physical grammar of interfaces and transfers knowledge across interaction types.
The model is pretrained with self-supervision on more than 2 million experimentally determined interaction complexes, so its embeddings are learned without task-specific labels and can be reused across downstream tasks. This places ATOMICA alongside foundation models such as ESM and structural tools such as AlphaFold, but with an explicit focus on the chemistry of binding interfaces across modalities.
ATOMICA is a geometric (equivariant) graph neural network that encodes molecular interfaces as atomic graphs, preserving 3D spatial relationships between atoms across the interacting partners. It is pretrained in a self-supervised manner on 2,037,972 interaction complexes drawn from experimental structural data spanning the five supported modalities. The learned embeddings transfer to multiple downstream evaluations: ATOMICA reports state-of-the-art results on RNAGlib 3D benchmarks for RNA structural tasks and supports zero-shot prediction of ligand binding in poorly characterized "dark proteome" proteins. In a notable wet-lab validation, five ATOMICA-predicted heme-binding sites were experimentally confirmed, illustrating that the representations capture chemically meaningful binding determinants rather than dataset artifacts.
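To make "encoding an interface as an atomic graph" concrete, the sketch below connects atoms on opposite sides of an interface when they fall within a distance cutoff, and stores the interatomic distance on each edge. It is a minimal illustration under stated assumptions, not ATOMICA's featurization: the helper `build_interface_graph`, the 5.0 Å cutoff, and the toy coordinates are all hypothetical, and only rotation-invariant distances are used as edge features (equivariant networks typically also carry direction vectors). The rotation check shows why distance-based features respect 3D geometry without depending on how the complex is oriented.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int, float]


def build_interface_graph(
    atoms_a: List[Coord], atoms_b: List[Coord], cutoff: float = 5.0
) -> List[Edge]:
    """Hypothetical helper: connect atoms on opposite sides of an interface.

    Returns (i, j, distance) for every atom i in partner A and atom j in
    partner B closer than `cutoff` (in Å). The 5.0 Å default is an
    illustrative assumption, not a value taken from ATOMICA.
    """
    edges: List[Edge] = []
    for i, a in enumerate(atoms_a):
        for j, b in enumerate(atoms_b):
            d = math.dist(a, b)  # Euclidean distance, unchanged by rigid motion
            if d < cutoff:
                edges.append((i, j, d))
    return edges


def rotate_z(coords: List[Coord], theta: float) -> List[Coord]:
    """Rotate every coordinate by `theta` radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Toy example: two "protein" atoms and one "ligand" atom (made-up coordinates).
protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(3.0, 0.0, 0.0)]

edges = build_interface_graph(protein, ligand)
rotated = build_interface_graph(rotate_z(protein, 0.7), rotate_z(ligand, 0.7))
print(edges)  # [(0, 0, 3.0)]
```

Rotating both partners by the same angle yields the same edge set and, up to floating-point error, the same distances, which is the invariance property a geometric model relies on.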
ATOMICA is useful for researchers studying molecular recognition across chemical classes: identifying and characterizing binding pockets, annotating ligand- and ion-binding sites in understudied proteins, analyzing protein–RNA interfaces, and generating transferable interface embeddings for downstream machine learning. Its zero-shot capabilities make it especially valuable for the dark proteome, where functional annotations are scarce, and for drug discovery workflows that need a unified view of protein, small-molecule, and nucleic-acid interactions.
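One generic way to reuse interface embeddings for annotation is label transfer by similarity search: rank labeled reference interfaces by cosine similarity to a query and take the closest labels. The sketch below assumes embeddings are already available as plain vectors; the toy 3-dimensional vectors, the labels, and the helper `nearest_labels` are hypothetical. This k-nearest-neighbor approach is a swapped-in illustration of working with embeddings, not ATOMICA's documented zero-shot method.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def nearest_labels(
    query: Sequence[float],
    reference: List[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[Tuple[float, str]]:
    """Hypothetical helper: rank labeled reference embeddings by similarity."""
    scored = [(cosine(query, emb), label) for label, emb in reference]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Toy 3-d vectors standing in for real interface embeddings (made up).
reference = [
    ("heme", [1.0, 0.0, 0.1]),
    ("zinc", [0.0, 1.0, 0.0]),
    ("ATP", [0.0, 0.0, 1.0]),
]
query = [0.9, 0.1, 0.0]
print(nearest_labels(query, reference, k=2))
```

Here the query vector points mostly along the same direction as the "heme" reference, so that label ranks first. With real embeddings, the quality of such transfer depends entirely on how well the representation separates binding types.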
By unifying five interaction modalities under a single atomic-scale representation, ATOMICA offers a foundation-model approach to interface biology that previously required a patchwork of specialized tools. The experimental confirmation of predicted heme-binding sites lends credibility to its zero-shot predictions, and its strong RNA benchmark results extend foundation-model thinking into protein–nucleic-acid space. As a representation backbone, it could accelerate binding-site annotation, ligand discovery, and interface-aware modeling across structural biology. Because ATOMICA is a recent preprint, its broader adoption and its generalization across diverse protein families remain to be established.