bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Protein

MuseDrift

University of Florida / University of California, Irvine

An 85M-parameter conditional discrete diffusion model for protein variant generation with a calibrated identity dial to steer similarity to a wild-type sequence.

Released: May 2026
Parameters: 85 Million

MuseDrift is a conditional discrete diffusion model for protein variant generation that lets researchers explicitly control how far a generated sequence drifts from a starting wild-type protein. Rather than sampling without constraints or relying on a coarse temperature parameter, MuseDrift exposes a calibrated "identity dial" that targets a desired sequence-identity range relative to the input, allowing users to navigate a protein's evolutionary manifold from conservative, near-native variants to more divergent designs. The model was introduced in May 2026 by Chaoyang Wang (University of California, Irvine) and Yiquan Wang (University of Florida) in a bioRxiv preprint.

The central problem MuseDrift addresses is steerability. Many generative protein models can produce plausible sequences but offer little well-calibrated control over how similar the output is to a reference, which is exactly the knob protein engineers care about when balancing novelty against the risk of disrupting function. By framing variant generation as conditional discrete diffusion and training on a large corpus of related sequence pairs, MuseDrift learns transitions between evolutionarily connected proteins and can be guided at inference time toward a target identity level.

Notably, the authors report that this 85M-parameter model matches or exceeds the structural-confidence performance of models roughly 20 times larger, suggesting that explicit conditioning and a well-constructed training corpus can substitute for raw scale on these benchmarks.

#Key Features

  • Calibrated identity dial: A tunable parameter τ ∈ [0.55, 0.95] steers the expected sequence identity between a generated variant and the wild-type, giving fine-grained control over conservative versus divergent designs.
  • Conditional discrete diffusion: Variant generation is modeled as a discrete diffusion process conditioned on a reference sequence, operating directly in amino-acid token space rather than continuous latent embeddings.
  • Compact yet competitive: At 85M parameters, the model is reported to match or surpass much larger systems on structural-confidence (pLDDT) benchmarks, which the authors attribute to explicit conditioning and curated training data.
  • Evolution-aware training corpus: Trained on the Seed-and-Stratify corpus of 38.2M sequence pairs, which captures evolutionary relationships used to learn realistic sequence transitions.
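
The quantity the identity dial targets can be made concrete with a minimal sketch. It assumes equal-length, substitution-only variants, so identity is simply the fraction of matching positions; the preprint's exact identity definition (for example, whether an alignment is involved) is not given here. The function names and the example sequences are hypothetical and are not part of any MuseDrift release.

```python
def sequence_identity(variant: str, wild_type: str) -> float:
    """Fraction of positions where the variant matches the wild-type.

    Assumption: equal-length, substitution-only variants (no alignment step).
    """
    if len(variant) != len(wild_type):
        raise ValueError("this simple metric needs equal-length sequences")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(variant, wild_type))
    return matches / len(wild_type)


def in_dial_range(identity: float, low: float = 0.55, high: float = 0.95) -> bool:
    """Check whether an identity lies inside the dial range quoted above."""
    return low <= identity <= high


# Hypothetical 20-residue wild-type and a variant with one substitution.
wild_type = "MKTAYIAKQRQISFVKSHFS"
variant = "MKTAYIAKQRQISFVKSHFA"
identity = sequence_identity(variant, wild_type)
print(f"identity = {identity:.2f}, within dial range: {in_dial_range(identity)}")
```

For variants that contain insertions or deletions, a pairwise alignment would be needed before counting matches; this sketch deliberately leaves that out.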

#Technical Details

MuseDrift is built on a conditional discrete diffusion framework with approximately 85 million parameters. It was trained on the Seed-and-Stratify corpus of 38.2 million sequence pairs, a construction designed to expose the model to evolutionarily related proteins so that it can learn well-behaved transitions across a protein's sequence neighborhood. At inference, the identity dial τ spans 0.55 to 0.95, and the dial is calibrated so that the identity a user requests is intended to match the identity that the generated variants actually realize.
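
What "calibrated" means here can be checked empirically: request a τ, generate variants, and compare the mean realized identity with the request. The sketch below illustrates that check only. Because no weights or code were located, `toy_generate` is a hypothetical stand-in that substitutes a fixed number of random positions; it is not MuseDrift and says nothing about the model's actual calibration. All names and the example sequence are assumptions.

```python
import random
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_generate(wild_type: str, tau: float, rng: random.Random) -> str:
    """Hypothetical stand-in for a generator: substitute about (1 - tau) * L positions."""
    n_mut = round((1.0 - tau) * len(wild_type))
    residues = list(wild_type)
    for pos in rng.sample(range(len(wild_type)), n_mut):
        # Always pick a residue that differs from the current one.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def identity(variant: str, wild_type: str) -> float:
    """Fraction of matching positions (equal-length sequences assumed)."""
    return sum(a == b for a, b in zip(variant, wild_type)) / len(wild_type)


def calibration_error(wild_type: str, tau: float, n_samples: int = 50, seed: int = 0) -> float:
    """Absolute gap between the requested tau and the mean realized identity."""
    rng = random.Random(seed)
    realized = [identity(toy_generate(wild_type, tau, rng), wild_type)
                for _ in range(n_samples)]
    return abs(mean(realized) - tau)


wild_type = "MKTAYIAKQRQISFVKSHFS"  # hypothetical 20-residue sequence
for tau in (0.55, 0.75, 0.95):
    print(f"tau={tau:.2f}  calibration error={calibration_error(wild_type, tau):.3f}")
```

With a real generator in place of `toy_generate`, the same loop would show how closely the realized identity tracks each requested value across the dial range.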

On structural-confidence benchmarks scored with pLDDT (predicted local distance difference test, reported on a 0 to 100 scale where higher means more confident), MuseDrift reports 84.97 on Mol-Instructions and 83.14 on CAMEO at τ = 0.95 (the most conservative, highest-identity setting). The authors emphasize that these results match or exceed those of generative protein models roughly 20× larger in parameter count, which would put the comparison models on the order of 1.7 billion parameters, and they frame MuseDrift as a parameter-efficient approach to controllable variant design.

#Applications

MuseDrift is aimed at protein engineers and computational biologists who need to generate variant libraries around a known wild-type with explicit control over how conservative or exploratory each design is. The identity dial maps naturally onto common engineering workflows, such as proposing near-native point-variant sets for stability or expression optimization at high τ, or generating more divergent candidates for functional diversification or scaffold exploration at lower τ. Because generation is conditioned on a reference sequence, the model fits into directed-evolution-style campaigns and in silico library design where similarity to the parent must be controlled.
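
As a rough planning aid for such a campaign, each τ setting can be translated into an approximate number of substitutions for a parent of a given length. This is a minimal sketch that assumes substitution-only variants and identity measured over the full sequence, so the expected count is about (1 − τ) × length. The function names, the 300-residue parent, and the 100 variants per level are all hypothetical.

```python
def expected_substitutions(length: int, tau: float) -> int:
    """Approximate substitutions implied by a target identity tau.

    Assumption: substitution-only variants, identity over the full length.
    """
    return round((1.0 - tau) * length)


def plan_library(length: int, taus: list[float], variants_per_level: int) -> list[dict]:
    """Build one planning row per dial setting."""
    return [
        {
            "tau": tau,
            "expected_substitutions": expected_substitutions(length, tau),
            "n_variants": variants_per_level,
        }
        for tau in taus
    ]


# Hypothetical campaign: 300-residue parent, five dial settings, 100 variants each.
plan = plan_library(300, [0.95, 0.85, 0.75, 0.65, 0.55], 100)
for row in plan:
    print(f"tau={row['tau']:.2f}  ~{row['expected_substitutions']} substitutions  "
          f"{row['n_variants']} variants")
```

Under these assumptions, a 300-residue parent differs at about 15 positions at τ = 0.95 and about 135 positions at τ = 0.55, which gives a sense of how far apart the two ends of the dial are.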

#Impact

MuseDrift contributes to a growing line of work on steerable generative models for proteins, arguing that calibrated conditioning and an evolution-aware training corpus can rival brute-force scaling on structural-confidence benchmarks. If the reported parameter efficiency holds up under independent evaluation, it points toward more accessible, controllable design tools that run on modest hardware. As of this writing, the work is a bioRxiv preprint that has not yet completed peer review, and no model weights or code repository were located. The preprint itself is released under a CC BY-NC 4.0 (non-commercial) license, which restricts commercial use and reuse until further resources are published.

Citation

MuseDrift: Navigating Protein Evolutionary Manifolds with Conditional Discrete Diffusion

Wang, C. & Wang, Y. (2026) MuseDrift: Navigating Protein Evolutionary Manifolds with Conditional Discrete Diffusion. bioRxiv.

DOI: 10.64898/2026.05.11.724439

Openness

Unclassified
Restrictive license on core components

Tags

de_novo_design, diffusion, generative, protein_design, self_supervised, transformer

Resources

Research Paper