bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

Imaging foundation models
Imaging · Small molecule

Emap2lig

Kihara Lab / Purdue University

A two-stage deep learning framework that detects ligand densities in cryo-EM maps and reconstructs their atomic structures with a diffusion generative model.

Released: June 2026

Cryo-electron microscopy (cryo-EM) has become a dominant technique for determining the structures of macromolecular complexes, but identifying and modeling the small-molecule ligands bound within these structures remains a manual, expertise-intensive bottleneck. Ligand densities are often weak, fragmented, or ambiguous, and at moderate resolutions they can be difficult to distinguish from water, ions, or noise. Emap2lig, developed by the Kihara Lab at Purdue University and released as a bioRxiv preprint in June 2026, addresses this problem with a fully automated, two-stage deep learning pipeline that both locates ligand densities and reconstructs their atomic structures directly from a cryo-EM map.

The framework decomposes the task into detection and modeling. The first stage, Emap2lig-Find, segments candidate ligand density blobs from the input map, operating at resolutions as coarse as roughly 5 Å where ligand features are typically hard to interpret by eye. The second stage, Emap2lig-Build, takes a segmented density together with the chemical identity of the candidate ligand and uses a diffusion-based generative model to produce atomic coordinates that fit the density.
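The detection step can be pictured with a classical stand-in. The sketch below is not Emap2lig-Find, which the source describes as a learned segmentation network. It only thresholds a toy density grid and groups the surviving voxels into face-connected blobs, to show the kind of output the first stage hands to the second. The names `DensityBlob`, `find_blobs`, `threshold`, and `min_voxels` are hypothetical and chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass

Voxel = tuple[int, int, int]

# Face-sharing (6-connected) neighbour offsets on a cubic grid.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


@dataclass
class DensityBlob:
    """Hypothetical container for one segmented blob (not an Emap2lig type)."""
    voxels: list[Voxel]

    def centroid(self) -> tuple[float, ...]:
        """Mean voxel index along each axis."""
        n = len(self.voxels)
        return tuple(sum(v[axis] for v in self.voxels) / n for axis in range(3))


def find_blobs(density: dict[Voxel, float], threshold: float,
               min_voxels: int = 2) -> list[DensityBlob]:
    """Group above-threshold voxels into connected blobs via breadth-first search.

    A toy stand-in for a learned segmenter: blobs smaller than min_voxels
    are discarded as likely noise.
    """
    strong = {v for v, value in density.items() if value >= threshold}
    seen: set[Voxel] = set()
    blobs: list[DensityBlob] = []
    for start in sorted(strong):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            x, y, z = queue.popleft()
            members.append((x, y, z))
            for dx, dy, dz in NEIGHBOURS:
                nxt = (x + dx, y + dy, z + dz)
                if nxt in strong and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(members) >= min_voxels:
            blobs.append(DensityBlob(sorted(members)))
    return blobs


# Toy sparse map (voxel index -> density value), invented for this example.
toy_map = {
    (0, 0, 0): 0.9, (1, 0, 0): 0.8, (1, 1, 0): 0.7,  # first blob, 3 voxels
    (5, 5, 5): 0.9, (5, 5, 6): 0.6,                  # second blob, 2 voxels
    (9, 9, 9): 0.95,                                 # isolated spike, dropped
    (3, 3, 3): 0.1,                                  # below threshold
}
blobs = find_blobs(toy_map, threshold=0.5)
for blob in blobs:
    print(len(blob.voxels), blob.centroid())
```

A fixed threshold is exactly what tends to fail on weak or fragmented ligand density, which is the motivation the page gives for a learned detector.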

By unifying ligand discovery and atomic modeling into a single framework, Emap2lig complements other cryo-EM tools from the Kihara group and the broader community—such as map-enhancement methods like EMReady2 and segmentation approaches like CryoSAM—while focusing specifically on the underserved problem of ligand interpretation across a broad range of map resolutions.

#Key Features

  • Two-stage detect-then-build design: Emap2lig-Find first segments ligand density blobs, then Emap2lig-Build reconstructs atomic coordinates, separating the localization problem from the structure-generation problem.
  • Robust to moderate resolution: Ligand detection is designed to work at resolutions as coarse as approximately 5 Å, a regime where manual ligand placement is especially error-prone.
  • Diffusion-based atomic modeling: The build stage uses a diffusion generative model (with a PairFormer-style attention module and an AtomDiffusion process) to generate ligand atomic structures conditioned on density and ligand identity.
  • Inference-only pipeline: The released pipeline runs from fixed pretrained weights that download automatically from Hugging Face; no user training or fine-tuning is required.
  • Multiple access modes: Available as a local command-line tool and web GUI, and as a free GPU-backed web server at em.kiharalab.org for users without local hardware.
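To make the diffusion-based build stage more concrete, here is a minimal sketch of a noise-to-structure sampling loop. It is a toy under stated assumptions, not the Emap2lig-Build sampler. It uses annealed Langevin dynamics, one common score-based sampling scheme, and replaces the learned network with an oracle score that already knows the target coordinates. In a real model that role is played by a network conditioned on the segmented density and the ligand identity. The function names, the noise schedule, and the three target coordinates are all hypothetical.

```python
import math
import random

Coord = list[float]


def oracle_score(x: Coord, target: Coord, sigma: float) -> Coord:
    """Exact score of N(target, sigma^2 I); a stand-in for a learned denoiser."""
    return [(t - xi) / sigma**2 for xi, t in zip(x, target)]


def sample_atoms(targets: list[Coord], sigmas: list[float],
                 steps_per_level: int = 50, step_frac: float = 0.2,
                 seed: int = 0) -> list[Coord]:
    """Annealed Langevin sampling: start from noise, refine as sigma shrinks."""
    rng = random.Random(seed)
    # Every atom starts as pure Gaussian noise at the largest noise level.
    atoms = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in targets]
    for sigma in sigmas:  # anneal from coarse to fine noise levels
        eps = step_frac * sigma**2  # step size scaled to the current level
        for _ in range(steps_per_level):
            for atom, target in zip(atoms, targets):
                score = oracle_score(atom, target, sigma)
                for k in range(3):
                    # Langevin update: drift along the score plus fresh noise.
                    atom[k] += 0.5 * eps * score[k] + math.sqrt(eps) * rng.gauss(0.0, 1.0)
    return atoms


# Hypothetical three-atom "ligand" pose, known only to the oracle score.
targets = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.2, 1.3, 0.0]]
# Geometric noise schedule from 5.0 down to 0.05 over ten levels.
sigmas = [5.0 * (0.05 / 5.0) ** (i / 9) for i in range(10)]

atoms = sample_atoms(targets, sigmas)
for atom in atoms:
    print([round(c, 2) for c in atom])
```

The point of the sketch is the control flow: coordinates begin as noise and are pulled toward a plausible pose as the noise level is lowered. What distinguishes a real generative model is that the score is predicted from the conditioning inputs rather than read from a known answer.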

#Technical Details

Emap2lig is a multi-stage deep learning system rather than a single monolithic network. Stage one (Emap2lig-Find) uses a MUNet-style segmentation architecture to identify ligand-associated densities within a cryo-EM map, supplied as a .map.gz volume together with a list of candidate ligands. Stage two (Emap2lig-Build) reconstructs atomic ligand structures using a PairFormer attention module coupled to an AtomDiffusion generative process, an approach inspired by recent diffusion-based structure predictors.

The pipeline is implemented in Python 3.12 and requires an NVIDIA GPU with at least 8 GB of VRAM and a CUDA 12 or 13 driver; pretrained weights are pulled from the Hugging Face Hub on first run. The repository ships inference code only, with no training code, so the model behaves as a fixed predictor. The associated preprint, "Direct Detection and Atomic Modeling of Ligands in Cryo-EM Maps Using Deep Learning" (Li, Jain, Kagaya, Park, and Kihara), presents the framework as a unified solution spanning a broad range of resolutions; quantitative benchmark figures should be read from the preprint itself.
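Before a local install, it can help to confirm that a machine meets the stated GPU memory requirement. The following is a small preflight sketch, not part of the Emap2lig repository. It relies on the standard `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` query, which prints one value in MiB per GPU. The helper names are hypothetical, and reading "8 GB" as 8192 MiB is an assumption: some cards report slightly less than their nominal size, so `min_mib` may need to be relaxed.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 8 * 1024  # "at least 8 GB", read here as 8 GiB (assumption)


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one integer MiB value per line; skip blanks and non-numeric fields."""
    values = []
    for line in smi_output.splitlines():
        field = line.strip()
        if field.isdigit():  # ignores entries such as "[N/A]"
            values.append(int(field))
    return values


def gpus_meeting_requirement(vram_mib: list[int],
                             min_mib: int = MIN_VRAM_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory is at least min_mib."""
    return [i for i, mib in enumerate(vram_mib) if mib >= min_mib]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    detected = query_vram_mib()
    usable = gpus_meeting_requirement(detected)
    if usable:
        print(f"GPUs with enough memory: {usable}")
    else:
        print("No GPU with sufficient memory detected; consider the hosted web server.")
```

This checks memory only. Driver and CUDA compatibility would need a separate check.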

#Applications

Emap2lig is aimed at structural biologists and cryo-EM practitioners who need to identify and model small-molecule ligands—such as drug candidates, cofactors, and substrates—bound to proteins and complexes resolved by cryo-EM. It is most useful when ligand density is present but difficult to interpret manually, including at moderate resolutions, and can accelerate structure-based drug discovery and mechanistic studies by automating a step that traditionally requires expert intervention. The free web server lowers the barrier for labs without dedicated GPU resources.

#Impact

Emap2lig extends deep learning into one of the last manual frontiers of cryo-EM model building: ligand interpretation. By pairing density segmentation with diffusion-based atomic generation, it offers an automated, reproducible alternative to hand-fitting ligands and aims to broaden the resolution range over which ligands can be modeled with confidence. As a recent (June 2026) preprint, its real-world adoption and its accuracy relative to established ligand-fitting workflows have yet to be demonstrated through community use and peer review. Because only inference code and pretrained weights are released, the method is best understood as a ready-to-use predictor rather than a retrainable system.

Citation

Li, S., et al. (2026) Direct Detection and Atomic Modeling of Ligands in Cryo-EM Maps Using Deep Learning. bioRxiv.

DOI: 10.64898/2026.06.01.729423

Citations

Total Citations: 0
Influential: 0
References: 36

GitHub

Stars: 1
Forks: 1
Open Issues: 0
Contributors: 1
Last Push: 4d ago
Language: Python
License: GPL-3.0

HuggingFace

Downloads: 0
Likes: 0
Last Modified: 5d ago

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
25 · Closed
Usability (can I run it?): 54
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

atomic_modeling, cryo_em, diffusion, generative, ligand_detection, segmentation, supervised, transformer, unet

Resources

GitHub Repository · Research Paper · Official Website · HuggingFace Model