bio.rodeo
Imaging, Protein

CryoProt

Hunan University / Xiangtan University

Protein pretraining framework that learns representations directly from cryo-EM density maps, transferring to flexibility, active-site, binding-affinity, and stability tasks.

Released: June 2026

CryoProt is a protein pretraining framework that learns structural representations directly from cryo-electron microscopy (cryo-EM) density maps, rather than from the atomic coordinate structures or amino acid sequences that dominate existing protein foundation models. Developed by researchers at Hunan University and Xiangtan University and released as an arXiv preprint in June 2026, it targets a largely untapped data modality: the volumetric density maps deposited in the Electron Microscopy Data Bank (EMDB), which encode experimentally observed structural information including conformational heterogeneity and regions of low local order that are often lost or idealized in downstream coordinate models.

The central premise is that density maps carry signal complementary to sequence- and coordinate-based representations. Sequence models such as ESM capture evolutionary constraints, and structure-based models capture geometry from solved coordinates, but neither directly ingests the raw experimental density that reflects how flexible or well-resolved each region of a protein actually is. CryoProt learns from this density signal through self-supervised pretraining, producing representations intended to transfer to a range of downstream protein property prediction tasks.

CryoProt fits into the emerging landscape of cryo-EM machine learning alongside models such as CryoFM and CryoViT, but differs in objective: rather than processing or segmenting maps for structure determination, it treats density maps as a pretraining corpus for learning general-purpose protein representations.

#Key Features

  • Density-map pretraining: Learns protein representations directly from cryo-EM volumetric density maps, a modality complementary to sequence and atomic-coordinate inputs and reflective of experimentally observed flexibility.
  • Transformer Map Encoder: A transformer-based encoder processes 3D density volumes, capturing structural context across the map for downstream transfer.
  • Multi-head latent attention (MLA): Employs MLA to compress and attend over latent representations, reducing memory overhead relative to standard multi-head attention while modeling long-range dependencies.
  • Cross-box dependency modeling: Models dependencies across the partitioned sub-volumes ("boxes") of a density map, allowing the encoder to integrate information that spans local crop boundaries.
  • Multi-task transfer: A single pretrained encoder is fine-tuned to flexibility prediction, active-site identification, binding affinity prediction, and stability change (ΔΔG) estimation.

#Technical Details

CryoProt is pretrained on approximately 20,530 density maps drawn from the EMDB, filtered to a reported resolution range of roughly 2–4 Angstroms to ensure usable structural detail. The architecture centers on a transformer Map Encoder that partitions each density map into sub-volumes and applies multi-head latent attention together with a cross-box dependency mechanism to integrate information across those partitions. After self-supervised pretraining on the map corpus, the encoder is adapted to downstream tasks through fine-tuning.
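
To make the sub-volume partitioning concrete, here is a minimal sketch of tiling a 3D map into non-overlapping boxes. The function names, the 96-voxel map edge, and the 32-voxel box edge are all hypothetical; the preprint's actual box size, overlap, and padding strategy are not stated in this summary.

```python
from itertools import product
from typing import Iterator, List, Tuple

Box = Tuple[slice, slice, slice]


def box_starts(length: int, box: int) -> List[int]:
    """Start offsets of non-overlapping boxes along one axis."""
    return list(range(0, length, box))


def partition_boxes(shape: Tuple[int, int, int], box: int) -> Iterator[Box]:
    """Yield index slices that tile a 3D volume of `shape` with cubes of edge `box`.

    Boxes on the far edge are clipped to the volume, so they may be smaller.
    """
    if box <= 0:
        raise ValueError("box size must be positive")
    axes = [box_starts(n, box) for n in shape]
    for z, y, x in product(*axes):
        yield (
            slice(z, min(z + box, shape[0])),
            slice(y, min(y + box, shape[1])),
            slice(x, min(x + box, shape[2])),
        )


# Hypothetical 96^3-voxel map split into 32^3-voxel boxes.
boxes = list(partition_boxes((96, 96, 96), 32))
print(len(boxes))  # 3 boxes per axis -> 27 boxes
print(boxes[0])
```

Each yielded tuple can index a NumPy-style array directly (`volume[b]`). Because a hard tiling like this cuts structural features at box boundaries, some mechanism for passing information between boxes, such as the cross-box dependency modeling described above, is needed to recover context that spans them.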

The authors report that fine-tuned CryoProt representations improve performance on downstream benchmarks (protein flexibility prediction, active-site identification, binding affinity prediction, and ΔΔG stability-change estimation) by up to approximately 12% over baselines. As of the June 2026 preprint, all reported results come from task-specific fine-tuning; zero-shot transfer of the pretrained representations remains unconfirmed. The only code reference is an anonymized peer-review placeholder (anonymous.4open.science), and no permanent GitHub or HuggingFace repository, weights, or license has been published. These details should be verified before the framework is used in production settings.
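
When reading a headline figure such as "up to approximately 12%", it helps to check whether the paper means a relative or an absolute improvement, since this summary does not say which. The sketch below shows how far apart the two readings can be; the baseline and model scores are hypothetical and are not results from the preprint.

```python
def absolute_gain(baseline: float, model: float) -> float:
    """Difference in metric units (e.g. points of accuracy or correlation)."""
    return model - baseline


def relative_gain(baseline: float, model: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (model - baseline) / baseline


# Hypothetical scores for illustration only.
baseline_score, model_score = 0.50, 0.56

print(f"absolute: {absolute_gain(baseline_score, model_score):+.2f}")
print(f"relative: {relative_gain(baseline_score, model_score):+.0%}")
```

Here a 0.06 absolute gain on a 0.50 baseline is a 12% relative gain, so the same result could be quoted either way.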

#Applications

CryoProt is aimed at computational structural biologists and protein engineers who want to leverage the growing volume of cryo-EM data beyond structure determination. By transferring density-derived representations, it can support flexibility analysis for understanding conformational dynamics, active-site identification for functional annotation and enzyme studies, binding affinity prediction relevant to drug discovery, and stability change prediction (ΔΔG) for protein engineering and mutational analysis. Because the framework draws on experimentally observed density rather than idealized coordinates, it is potentially most useful for proteins whose flexibility or partial disorder is poorly captured by static structure models.
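
For readers less familiar with the stability task, ΔΔG is usually defined as the change in folding free energy caused by a mutation. The form below is one common convention and is given here as an assumption; the sign convention used by CryoProt's benchmark datasets is not stated in this summary.

```latex
\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild-type}}
```

Under this convention, with ΔG taken as the folding free energy, a positive ΔΔG indicates a destabilizing mutation. Some datasets report unfolding free energies instead, which flips the sign, so the convention should be checked before comparing predictions across benchmarks.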

#Impact

CryoProt introduces cryo-EM density maps as a pretraining modality for protein representation learning, expanding the set of data sources that protein foundation models can exploit beyond sequence and solved coordinates. If the reported gains hold under peer review, the work suggests that experimental density carries transferable signal complementary to existing approaches. Its current significance should be weighed against clear limitations: the work is an unreviewed preprint, results are reported only with fine-tuning rather than zero-shot evaluation, and no permanent code repository, released weights, or confirmed license is yet available. Reproducibility and adoption therefore remain open questions pending a full release and peer review.

Citation

Preprint

DOI: 10.48550/arXiv.2606.00955


Openness

Unclassified
Restrictive license on core components

Tags

active_site_identification, binding_affinity, cryo_em, density_maps, foundation_model, representation_learning, self_supervised, stability_prediction, transfer_learning, transformer

Resources

Research Paper