University of Florida / University of California, Irvine
An 85M-parameter conditional discrete diffusion model for protein variant generation with a calibrated identity dial to steer similarity to a wild-type sequence.
MuseDrift is a conditional discrete diffusion model for protein variant generation that lets researchers explicitly control how far a generated sequence drifts from a starting wild-type protein. Rather than sampling sequences unconstrained or relying on coarse temperature parameters, MuseDrift exposes a calibrated "identity dial" that targets a desired sequence-identity range relative to the input, allowing users to navigate a protein's evolutionary manifold from conservative, near-native variants to more divergent designs. The model was introduced in May 2026 by Chaoyang Wang (University of California, Irvine) and Yiquan Wang (University of Florida) in a bioRxiv preprint.
The central problem MuseDrift addresses is steerability. Many generative protein models can produce plausible sequences but offer little well-calibrated control over how similar the output is to a reference, which is exactly the knob protein engineers care about when balancing novelty against the risk of disrupting function. By framing variant generation as conditional discrete diffusion and training on a large corpus of related sequence pairs, MuseDrift learns transitions between evolutionarily connected proteins and can be guided at inference time toward a target identity level.
Notably, the authors report that this 85M-parameter model matches or exceeds the structural-confidence performance of models roughly 20 times larger, suggesting that explicit conditioning and a well-constructed training corpus can substitute for raw scale on these benchmarks.
MuseDrift is built on a conditional discrete diffusion framework with approximately 85 million parameters. It was trained on the Seed-and-Stratify corpus, which comprises 38.2 million sequence pairs and is constructed to expose the model to evolutionarily related proteins so that it can learn well-behaved transitions across a protein's sequence neighborhood. At inference, the identity dial τ spans 0.55 to 0.95, and the dial is described as calibrated, meaning that a requested identity target should correspond to the realized similarity of the generated variants to the wild-type.
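To make the notion of calibration concrete, the following is a minimal sketch of how one might check realized identity against a requested τ. It assumes equal-length, substitution-only variants and defines identity as the fraction of matching positions; the preprint's exact identity definition is not given here, and the function names and the toy sequences are hypothetical.

```python
from statistics import mean


def sequence_identity(wild_type: str, variant: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Assumption: sequences are already aligned position by position
    (substitutions only, no insertions or deletions).
    """
    if len(wild_type) != len(variant):
        raise ValueError("this sketch assumes equal-length sequences")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(wild_type, variant))
    return matches / len(wild_type)


def calibration_error(wild_type: str, variants: list[str], tau: float) -> float:
    """Mean absolute gap between the requested tau and realized identity."""
    return mean(abs(sequence_identity(wild_type, v) - tau) for v in variants)


# Hypothetical 20-residue wild-type and two single-substitution variants.
wild_type = "MKTAYIAKQRQISFVKSHFS"
variants = ["MKTAYIAKQRQISFVKSHFA", "MKTAYLAKQRQISFVKSHFS"]

identities = [sequence_identity(wild_type, v) for v in variants]
error = calibration_error(wild_type, variants, tau=0.95)
print(identities, error)
```

A well-calibrated dial would keep this error small across the whole τ range, not only at the conservative end.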
On structural-confidence benchmarks measured with predicted local distance difference test (pLDDT) scores, MuseDrift reports 84.97 on Mol-Instructions and 83.14 on CAMEO at τ = 0.95 (the most conservative, highest-identity setting). The authors emphasize that these results match or exceed those of generative protein models roughly 20× larger in parameter count, framing MuseDrift as a parameter-efficient approach to controllable variant design.
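For readers unfamiliar with how a single pLDDT number is obtained for a sequence, here is a minimal sketch that averages per-residue pLDDT over C-alpha atoms of a predicted structure. It assumes a PDB-format file in which the predictor stores pLDDT in the B-factor column, as AlphaFold-style outputs do; which structure predictor and averaging scheme the authors used is not stated here, and the helper `atom_line` and the toy values are purely illustrative.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column over C-alpha ATOM records.

    Assumption: the B-factor field (PDB columns 61-66) holds per-residue pLDDT.
    """
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies PDB columns 13-16 (0-based slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def atom_line(serial: int, name: str, res: str, resseq: int, b: float) -> str:
    """Build a fixed-width ATOM record with dummy coordinates (illustration only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res:>3} A{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}"
    )


# Toy three-residue structure; the backbone N atom is ignored by mean_plddt.
example = "\n".join([
    atom_line(1, " N  ", "MET", 1, 90.0),
    atom_line(2, " CA ", "MET", 1, 90.0),
    atom_line(3, " CA ", "LYS", 2, 80.0),
    atom_line(4, " CA ", "THR", 3, 85.0),
])
print(mean_plddt(example))
```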
MuseDrift is aimed at protein engineers and computational biologists who need to generate variant libraries around a known wild-type with explicit control over how conservative or exploratory each design is. The identity dial maps naturally onto common engineering workflows, such as proposing near-native point-variant sets for stability or expression optimization at high τ, or generating more divergent candidates for functional diversification or scaffold exploration at lower τ. Because generation is conditioned on a reference sequence, the model fits into directed-evolution-style campaigns and in silico library design where similarity to the parent must be controlled.
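As a rough guide to what a given dial setting means for library design, the sketch below converts a target identity into an approximate number of substitutions. It assumes substitution-only variants of the same length as the wild-type, so that identity τ leaves about (1 − τ) · L positions changed; the function name and the 300-residue length are hypothetical.

```python
def substitution_budget(length: int, tau: float) -> int:
    """Approximate number of substituted positions at target identity tau.

    Assumption: no insertions or deletions, so identity = matches / length.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return round((1.0 - tau) * length)


# Hypothetical 300-residue wild-type across the reported dial range.
for tau in (0.95, 0.75, 0.55):
    print(f"tau={tau:.2f}: ~{substitution_budget(300, tau)} substitutions")
```

Under these assumptions, a 300-residue protein changes at about 15 positions at the conservative end of the dial (τ = 0.95) and at about 135 positions at the divergent end (τ = 0.55).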
MuseDrift contributes to a growing line of work on steerable generative models for proteins, arguing that calibrated conditioning and an evolution-aware training corpus can rival brute-force scaling on structural-confidence benchmarks. If the reported parameter efficiency holds up under independent evaluation, it points toward more accessible, controllable design tools that run on modest hardware. As of this writing the work is a bioRxiv preprint that has not yet completed peer review, and no model weights or code repository were located. The preprint is released under a CC BY-NC 4.0 (non-commercial) license, which constrains commercial use and reuse until further resources are published.
Wang, C. & Wang, Y. (2026) MuseDrift: Navigating Protein Evolutionary Manifolds with Conditional Discrete Diffusion. bioRxiv.
DOI: 10.64898/2026.05.11.724439