Kihara Lab / Purdue University
A two-stage deep learning framework that detects ligand densities in cryo-EM maps and reconstructs their atomic structures with a diffusion generative model.
Cryo-electron microscopy (cryo-EM) has become a dominant technique for determining the structures of macromolecular complexes, but identifying and modeling the small-molecule ligands bound within these structures remains a manual, expertise-intensive bottleneck. Ligand densities are often weak, fragmented, or ambiguous, and at moderate resolutions they can be difficult to distinguish from water, ions, or noise. Emap2lig, developed by the Kihara Lab at Purdue University and released as a bioRxiv preprint in June 2026, addresses this problem with a fully automated, two-stage deep learning pipeline that both locates ligand densities and reconstructs their atomic structures directly from a cryo-EM map.
The framework decomposes the task into detection and modeling. The first stage, Emap2lig-Find, segments candidate ligand density blobs from the input map, operating at resolutions as coarse as roughly 5 Å where ligand features are typically hard to interpret by eye. The second stage, Emap2lig-Build, takes a segmented density together with the chemical identity of the candidate ligand and uses a diffusion-based generative model to produce atomic coordinates that fit the density.
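To make the notion of a "candidate density blob" concrete, the sketch below groups above-threshold voxels into connected regions. It is emphatically not how Emap2lig-Find works: the source describes a learned segmentation network, whereas this is a classical threshold plus 6-connected flood fill, shown only to illustrate the kind of object the first stage hands to the second. The function name `label_blobs`, the nested-list grid layout, and the threshold are all hypothetical choices made for this illustration.

```python
from collections import deque


def label_blobs(density, threshold):
    """Group voxels strictly above `threshold` into 6-connected blobs.

    `density` is a nested list indexed as density[z][y][x] (an assumed layout).
    Returns a list of blobs, largest first; each blob is a sorted list of
    (z, y, x) voxel indices.
    """
    nz, ny, nx = len(density), len(density[0]), len(density[0][0])
    # Face-sharing neighbours only, so diagonal contacts do not merge blobs.
    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    seen = set()
    blobs = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or density[z][y][x] <= threshold:
                    continue
                # Breadth-first flood fill starting from this unvisited voxel.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                blob = []
                while queue:
                    cz, cy, cx = queue.popleft()
                    blob.append((cz, cy, cx))
                    for dz, dy, dx in neighbours:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                        if inside and (pz, py, px) not in seen and density[pz][py][px] > threshold:
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                blobs.append(sorted(blob))
    blobs.sort(key=len, reverse=True)
    return blobs
```

A simple threshold like this tends to break down exactly where the text says ligand density is hard to interpret: weak or fragmented density splits into several small blobs or merges with neighbouring protein density, which is the gap a learned segmentation stage is meant to close.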
By unifying ligand discovery and atomic modeling into a single framework, Emap2lig complements other cryo-EM tools from the Kihara group and the broader community—such as map-enhancement methods like EMReady2 and segmentation approaches like CryoSAM—while focusing specifically on the underserved problem of ligand interpretation across a broad range of map resolutions.
Emap2lig is a two-stage deep learning system rather than a single monolithic network. Stage one (Emap2lig-Find) uses a MUNet-style segmentation architecture to identify ligand-associated densities within a cryo-EM map, which is supplied as a .map.gz volume together with a list of candidate ligands. Stage two (Emap2lig-Build) reconstructs atomic ligand structures using a PairFormer attention module coupled to an AtomDiffusion generative process, an approach inspired by recent diffusion-based structure predictors.
The pipeline is implemented in Python (3.12) and requires an NVIDIA GPU with at least 8 GB of VRAM and a CUDA 12 or 13 driver; pretrained weights are pulled from the Hugging Face Hub on first run. The repository ships inference code only (training code is not included), so the model behaves as a fixed predictor.
The associated preprint, "Direct Detection and Atomic Modeling of Ligands in Cryo-EM Maps Using Deep Learning" (Li, Jain, Kagaya, Park, and Kihara), presents the framework as a unified solution spanning a broad range of resolutions; quantitative benchmark figures should be read from the preprint itself.
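Before submitting a map, it can help to confirm that the .map.gz file is what it claims to be. The sketch below reads the fixed 1024-byte header of a gzipped map using only the Python standard library. It assumes the file follows the MRC2014/CCP4 layout that cryo-EM maps conventionally use, and that it is little-endian; the source does not state either. The helper names `parse_mrc_header` and `read_map_gz_header` are hypothetical and are not part of Emap2lig.

```python
import gzip
import struct

HEADER_BYTES = 1024  # size of the fixed main header in the MRC2014 layout


def parse_mrc_header(header: bytes) -> dict:
    """Decode grid size, data mode, and voxel spacing from a 1024-byte header."""
    if len(header) < HEADER_BYTES:
        raise ValueError("expected at least 1024 header bytes")
    if header[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier; not an MRC/CCP4 map?")
    # Assumption: little-endian byte order (big-endian files are not handled here).
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)   # columns, rows, sections, data mode
    mx, my, mz = struct.unpack_from("<3i", header, 28)        # sampling intervals along the cell
    cell = struct.unpack_from("<3f", header, 40)              # cell edge lengths in angstroms
    # Voxel spacing per axis is cell length divided by the number of intervals.
    voxel = tuple(c / m if m > 0 else None for c, m in zip(cell, (mx, my, mz)))
    return {"grid": (nx, ny, nz), "mode": mode, "voxel_size": voxel}


def read_map_gz_header(path: str) -> dict:
    """Open a gzipped map and parse only its header, without loading the volume."""
    with gzip.open(path, "rb") as fh:
        return parse_mrc_header(fh.read(HEADER_BYTES))
```

Calling `read_map_gz_header("my_map.map.gz")` (a placeholder path) returns the grid dimensions, the data mode (mode 2 denotes 32-bit floats in MRC2014), and the voxel spacing in angstroms. Voxel spacing is not the same thing as map resolution, so it should not be read as an indicator of whether ligand density will be interpretable.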
Emap2lig is aimed at structural biologists and cryo-EM practitioners who need to identify and model small-molecule ligands (such as drug candidates, cofactors, and substrates) bound to proteins and complexes resolved by cryo-EM. It is most useful when ligand density is present but difficult to interpret manually, including at moderate resolutions, and it can accelerate structure-based drug discovery and mechanistic studies by automating a step that traditionally requires expert intervention. A free web server is also available, which lowers the barrier for labs without dedicated GPU resources.
Emap2lig extends deep learning into one of the last manual frontiers of cryo-EM model building: ligand interpretation. By pairing density segmentation with diffusion-based atomic generation, it offers an automated, reproducible alternative to hand-fitting ligands and broadens the resolution range over which ligands can be modeled with confidence. Because it is a recent (June 2026) preprint, its real-world adoption and its accuracy relative to established ligand-fitting workflows have yet to be demonstrated through community use and peer review. Since only inference code and pretrained weights are released, the method is best understood as a ready-to-use predictor rather than a retrainable system.
Li, S., et al. (2026) Direct Detection and Atomic Modeling of Ligands in Cryo-EM Maps Using Deep Learning. bioRxiv.
DOI: 10.64898/2026.06.01.729423