bio.rodeo
© 2026 Pulsatance. All rights reserved.

HD-Prot

The Hong Kong Polytechnic University / genbio.ai / Mohamed bin Zayed University of Artificial Intelligence

A hybrid-diffusion protein language model that adds a continuous-token diffusion head to a discrete pLM for joint sequence-structure modeling.

Released: December 2025

HD-Prot is a multimodal protein language model (pLM) that jointly models protein sequence and structure within a single architecture. Proteins have an inherent sequence-structure duality: sequence data is abundant and naturally expressed as discrete tokens, whereas structure is continuous and three-dimensional. Most multimodal pLMs reconcile this mismatch by discretizing structure with a vector-quantized codebook (as in ESM3 and DPLM-2), which loses fine-grained geometric detail. HD-Prot's central argument is that this loss is avoidable: a sequence-based pLM can be extended to the structure modality using continuous tokens, that is, high-fidelity structure latents that skip quantization entirely.

To do this, HD-Prot places a continuous-valued diffusion "head" on top of a discrete protein language model. The model operates over a mixed stream of discrete sequence tokens and continuous structure tokens, tying them together through a single absorbing diffusion process. At each token position, it either predicts a categorical distribution (for amino-acid identity) or runs a small continuous diffusion sampler (for the structure latent), so both modalities are estimated by one shared language-model backbone.
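The split between a categorical output and a continuous sampler can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`matvec`, `sequence_head`, `structure_head`), the 8-dimensional backbone state, the 20-letter alphabet, the random weights, and the linear stand-in denoiser are all hypothetical. Only the latent dimension of 20 comes from the description on this page.

```python
import math
import random

LATENT_DIM = 20  # latent size this page reports for the structure autoencoder
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 standard residues only


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sequence_head(hidden, w_seq):
    """Categorical distribution over amino acids for one token position."""
    logits = matvec(w_seq, hidden)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def structure_head(hidden, w_struct, rng, steps=10):
    """Toy deterministic (DDIM-style) sampler for one structure latent.

    The denoiser is a hypothetical linear x0-predictor of the backbone state;
    a learned head would also condition on the noisy latent and the timestep.
    """
    x0_pred = matvec(w_struct, hidden)
    # alpha_bar rises from nearly pure noise (0.001) to clean data (exactly 1.0).
    alpha_bar = [1.0 - 0.999 * (steps - i) / steps for i in range(steps + 1)]
    x = [rng.gauss(0.0, 1.0) for _ in x0_pred]  # start from Gaussian noise
    for i in range(steps):
        a_t, a_next = alpha_bar[i], alpha_bar[i + 1]
        # Noise implied by the current sample and the predicted clean latent.
        eps = [(xi - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t)
               for xi, x0 in zip(x, x0_pred)]
        # Move to the next, less noisy level.
        x = [math.sqrt(a_next) * x0 + math.sqrt(1.0 - a_next) * e
             for x0, e in zip(x0_pred, eps)]
    return x


rng = random.Random(0)
hidden = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in backbone state
w_seq = [[rng.gauss(0.0, 0.1) for _ in hidden] for _ in AMINO_ACIDS]
w_struct = [[rng.gauss(0.0, 0.1) for _ in hidden] for _ in range(LATENT_DIM)]

aa_probs = sequence_head(hidden, w_seq)         # discrete modality
latent = structure_head(hidden, w_struct, rng)  # continuous modality
```

Because the toy denoiser ignores the noisy latent, the loop lands exactly on its own prediction. A learned denoiser would not collapse this way; the point here is only that one backbone state can feed two differently shaped outputs.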

The work comes from researchers at The Hong Kong Polytechnic University, GenBio AI, and the Mohamed bin Zayed University of Artificial Intelligence. It was posted to arXiv in December 2025 and accepted to KDD 2026. The authors emphasize efficiency: they report matching state-of-the-art multimodal pLMs while using less than one-tenth of the compute budget for the modality-extension fine-tuning stage.

#Key Features

  • Continuous structure tokens: Structures are encoded as non-quantized latents via the salad protein autoencoder (a sparse invariant point attention design, latent dimension 20), avoiding the information loss of VQ-VAE codebooks used by prior multimodal pLMs.
  • Hybrid diffusion head: A continuous diffusion module is mounted on a discrete pLM so the same model emits categorical predictions for sequence and continuous diffusion samples for structure.
  • Unified absorbing diffusion: A single absorbing-state diffusion process captures inter-token dependencies across both modalities rather than training separate sequence and structure models.
  • Multi-task capability: One trained model handles unconditional sequence-structure co-generation, motif-scaffolding, structure prediction, and inverse folding.
  • Compute-efficient adaptation: The modality-extension fine-tuning reportedly uses under one-tenth the budget of comparable SOTA multimodal pLMs.
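The absorbing-state corruption named in the list above can be sketched as a forward noising step that replaces tokens with a mask. This is an illustrative sketch only: the `MASK` sentinel, the `absorb` function, and the choice to mask the two modalities independently at the same rate `t` are assumptions, not details confirmed by the source.

```python
import random

MASK = None  # hypothetical absorbing state shared by both modalities


def absorb(sequence, latents, t, rng):
    """Mask each token independently with probability t.

    sequence: list of amino-acid letters (discrete tokens)
    latents:  list of float vectors (continuous structure tokens)
    Returns corrupted copies; masked entries are replaced by MASK.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    noisy_seq = [MASK if rng.random() < t else aa for aa in sequence]
    noisy_lat = [MASK if rng.random() < t else z for z in latents]
    return noisy_seq, noisy_lat


rng = random.Random(0)
sequence = list("MKTAYIAK")
latents = [[0.1 * i] * 20 for i in range(len(sequence))]  # stand-in latents

clean_seq, clean_lat = absorb(sequence, latents, 0.0, rng)  # nothing masked
half_seq, half_lat = absorb(sequence, latents, 0.5, rng)    # partly masked
full_seq, full_lat = absorb(sequence, latents, 1.0, rng)    # everything masked
```

A denoising model trained against this corruption learns to fill masked slots from the unmasked context, which is what allows one process to cover both discrete and continuous tokens.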

#Technical Details

HD-Prot extends a sequence-pretrained discrete pLM and is evaluated at roughly 155M and 670M parameter scales. Structure latents come from the salad autoencoder (Jendrusch & Korbel, 2025); the modality-extension stage was trained on approximately 210K filtered protein structures, following the data setup used by DPLM-2. Reported benchmarks for the 670M model:

  • Unconditional co-generation (300 residues): pLDDT 81.1, self-consistency RMSD (scRMSD) 4.9 Å, scTM 0.878.
  • Motif-scaffolding: 19.4 of 24 tasks solved, with a separately reported success rate of 24.1% (not the solved fraction, which would be about 81%).
  • Structure prediction (CAMEO): RMSD 7.47 Å, TM-score 0.769.
  • Inverse folding: scRMSD 4.7 Å, scTM 0.866.

According to the authors, these results place HD-Prot on par with state-of-the-art multimodal pLMs despite the reduced training budget.
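For readers unfamiliar with the RMSD and TM-score figures reported above, the sketch below computes both for two coordinate sets that are assumed to be already superimposed and paired residue by residue. It is a simplification: standard TM-score tools search for the superposition that maximizes the score, which is omitted here. The function names and the synthetic coordinates are illustrative, and the 0.5 Å floor on the distance scale for short chains is an assumption in this sketch.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points (same units)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_score(coords_a, coords_b):
    """TM-score for a fixed superposition and one-to-one residue pairing."""
    length = len(coords_a)
    if length != len(coords_b) or length == 0:
        raise ValueError("coordinate lists must be non-empty and equally long")
    if length > 21:
        # Length-dependent distance scale from the TM-score definition.
        d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed floor so short chains keep a positive scale
    terms = (1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
             for p, q in zip(coords_a, coords_b))
    return sum(terms) / length


# Synthetic 300-residue trace, and a copy displaced by 3 Å along y.
reference = [(3.8 * i, 0.0, 0.0) for i in range(300)]
shifted = [(x, y + 3.0, z) for x, y, z in reference]

print(f"RMSD: {rmsd(reference, shifted):.2f} Å")
print(f"TM-score: {tm_score(reference, shifted):.3f}")
```

RMSD grows without bound and is dominated by the worst-placed residues, while TM-score is bounded by 1 and normalized by chain length, which is why benchmark reports often quote both.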

#Applications

HD-Prot targets protein design and analysis workflows that benefit from joint reasoning over sequence and structure. Unconditional co-generation produces novel sequence-structure pairs for de novo design; motif-scaffolding builds new proteins around a fixed functional motif; structure prediction folds a given sequence; and inverse folding designs sequences for a target backbone. Researchers in computational protein engineering and generative biology can use a single model across these tasks rather than maintaining separate specialized pipelines.
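One way to picture how a single masked-generation model could serve all four tasks is to vary which tokens are given and which are left for the model to fill in. The sketch below is an assumption about how such conditioning might be set up, not the authors' interface: `MASK`, `TASKS`, `build_input`, and the stand-in structure tokens are all hypothetical names.

```python
MASK = "<mask>"  # hypothetical marker for a position the model must generate

TASKS = ("co_generation", "structure_prediction", "inverse_folding",
         "motif_scaffolding")


def build_input(task, sequence, structure, motif=()):
    """Return (seq_tokens, struct_tokens) with to-be-generated slots masked.

    sequence:  list of amino-acid letters
    structure: list of per-residue structure tokens (any objects)
    motif:     residue indices kept fixed in both modalities (scaffolding only)
    """
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    everything = set(range(len(sequence)))
    fixed = set(motif)
    # Which positions are supplied as context in each modality, per task.
    keep_seq = {
        "co_generation": set(),
        "structure_prediction": everything,
        "inverse_folding": set(),
        "motif_scaffolding": fixed,
    }[task]
    keep_struct = {
        "co_generation": set(),
        "structure_prediction": set(),
        "inverse_folding": everything,
        "motif_scaffolding": fixed,
    }[task]
    seq = [aa if i in keep_seq else MASK for i, aa in enumerate(sequence)]
    struct = [z if i in keep_struct else MASK for i, z in enumerate(structure)]
    return seq, struct


sequence = list("MKTAYIAK")
structure = [f"z{i}" for i in range(len(sequence))]  # stand-in structure tokens

fold_seq, fold_struct = build_input("structure_prediction", sequence, structure)
scaf_seq, scaf_struct = build_input("motif_scaffolding", sequence, structure,
                                    motif=(2, 3))
```

Under this framing, structure prediction and inverse folding are mirror images of each other, and motif-scaffolding is co-generation with a few positions pinned in both modalities.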

#Impact

HD-Prot offers evidence that multimodal protein language models can incorporate structure through continuous tokens instead of quantized codebooks, preserving geometric detail while remaining compatible with the language-modeling framework. By demonstrating that categorical and continuous distributions can be estimated together in one architecture, and doing so under a small compute budget, it points to a practical alternative direction for multimodal pLM design. Limitations include the modest training-set size (~210K structures) and the lack of publicly released pretrained weights at the time of the preprint. Code and preprocessed data are on GitHub; the checkpoints (hdprot_155m and hdprot_670m) were stated to be released upon paper acceptance and to be available on request in the interim.

Citation

Preprint

DOI: 10.48550/arXiv.2512.15133

Openness

bio.rodeo openness: Closed (low usability and reproducibility)
Score: 14 (Closed)
Usability (can I run it?): 11
Reproducibility (can I retrain it?): 18
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

diffusion, generative, inverse_folding, language_model, motif_scaffolding, multimodal, protein_design, structure_prediction, transformer

Resources

GitHub Repository
Research Paper