MoE-Bind is an autoregressive protein-binder generator developed by Dipayan Sarkar and Chiranjib Sarkar at the Computational Systems Biology Laboratory, Department of Bioinformatics, University of North Bengal, India, and posted to bioRxiv in June 2026. It addresses a practical bottleneck in de novo binder design: structure-based pipelines such as RFdiffusion-style workflows require a known three-dimensional target conformation and consume substantial compute and wall-clock time per design, which limits throughput for large-scale binder exploration.

The model's central premise is that sparse architectural design, rather than sheer scale, can deliver fast, structure-free binder generation. Sequence-only generative models had promised a lighter alternative to structure-based methods, but existing systems remained uniformly dense and often reintroduced structural computation at inference, eroding the speed advantage they were meant to provide. MoE-Bind is, to the authors' knowledge, the first sequence-only protein binder generator to adopt a sparse Mixture-of-Experts (MoE) feed-forward design, mirroring the dense-to-sparse transition that reshaped natural-language transformers.

MoE-Bind generates candidate binder sequences conditioned on a receptor (target) sequence alone, with no 3D structure input required at generation time. Generated binders are then evaluated zero-shot using two independent external structure predictors, Boltz-2 and AlphaFold2-Multimer, on a leakage-free Docking Benchmark 5.0 (DB5.0) protocol.
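
The receptor-conditioned, autoregressive generation described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the separator and end-of-sequence tokens, the `generate_binder` helper, the placeholder receptor string, and especially `toy_next_token_probs`, which stands in for the trained model and returns a fixed distribution rather than real predictions. The point is only the shape of the loop: the prompt is the receptor sequence alone, and each sampled residue is appended to the context before the next one is drawn.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SEP = "|"  # hypothetical separator between receptor and binder
EOS = "*"  # hypothetical end-of-sequence token


def toy_next_token_probs(context):
    """Stand-in for the trained model (assumption): a fixed next-token distribution.

    A real model would compute these probabilities from `context`.
    """
    vocab = list(AMINO_ACIDS) + [EOS]
    weights = [1.0] * len(AMINO_ACIDS) + [0.5]
    total = sum(weights)
    return vocab, [w / total for w in weights]


def generate_binder(receptor, next_token_probs, max_len=50, seed=0):
    """Sample a binder one residue at a time, conditioned on the receptor sequence."""
    rng = random.Random(seed)
    context = receptor + SEP  # prompt: receptor sequence only, no 3D structure
    binder = []
    while len(binder) < max_len:
        vocab, probs = next_token_probs(context)
        token = rng.choices(vocab, weights=probs, k=1)[0]
        if token == EOS:
            break
        binder.append(token)
        context += token  # feed the sampled residue back into the context
    return "".join(binder)


# "MKTAYIAKQR" is a made-up placeholder receptor, not a real target.
binder = generate_binder("MKTAYIAKQR", toy_next_token_probs, max_len=30, seed=1)
print(binder)
```

Because the toy distribution ignores its context, the output here carries no biological meaning; swapping in a real model's next-token probabilities is what would make the sample depend on the receptor.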

Key Features

Structure-free, receptor-conditioned generation: Produces full-length binder sequences from a target receptor sequence alone, removing the requirement for a resolved or predicted 3D target structure at design time.
Sparse Mixture-of-Experts efficiency: Combines Multi-head Latent Attention with a sparse MoE feed-forward network (top-2 of 8 experts plus a shared expert), activating fewer than half the per-token parameters of compute-matched dense baselines while matching or exceeding their binder quality.
Zero-shot transfer to short peptides: Transfers to short-peptide design without peptide-specific training, indicating the learned representations generalize beyond the full-length receptor-conditioned setting.
Interpretable expert specialization: Routing analysis on generated binders reveals expert specialization at both the individual amino acid and biochemical group level, a structured expert-token alignment not previously reported for natural-language MoE models.
Reduced training and inference compute: Decouples model capacity from per-token compute, cutting both training and inference cost by a large margin relative to dense baselines.
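
The routing analysis mentioned in the feature list can be pictured as a tally of which residues each expert receives. The sketch below is an assumption-laden illustration, not the authors' analysis code: the record format `(residue, expert_ids)`, the helper names, the example records, and the biochemical grouping (one common convention among several) are all hypothetical.

```python
from collections import Counter, defaultdict

# One common grouping of the 20 standard amino acids (assumption: the source
# does not say which grouping the authors used).
GROUPS = {
    "hydrophobic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
RESIDUE_TO_GROUP = {aa: group for group, aas in GROUPS.items() for aa in aas}


def expert_profiles(routing_records):
    """Count, per expert, the residues and biochemical groups routed to it.

    routing_records: iterable of (residue, expert_ids) pairs, one per token.
    """
    by_residue = defaultdict(Counter)
    by_group = defaultdict(Counter)
    for residue, expert_ids in routing_records:
        for expert in expert_ids:
            by_residue[expert][residue] += 1
            by_group[expert][RESIDUE_TO_GROUP[residue]] += 1
    return by_residue, by_group


def dominant_share(counter):
    """Return the most frequent label and its share of the expert's tokens."""
    label, count = counter.most_common(1)[0]
    return label, count / sum(counter.values())


# Made-up routing records: each token goes to two experts (top-2 routing).
records = [
    ("K", (0, 3)), ("R", (0, 5)), ("K", (0, 2)),
    ("L", (1, 3)), ("I", (1, 4)), ("D", (2, 5)),
]
by_residue, by_group = expert_profiles(records)
print(dominant_share(by_group[0]))
```

An expert whose dominant group share is far above that group's background frequency would be a candidate for the kind of specialization the feature list describes; a real analysis would need that background comparison, which this sketch omits.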

Technical Details

MoE-Bind is an approximately 100M-parameter decoder-only autoregressive transformer. Attention uses Multi-head Latent Attention, and each feed-forward block is replaced by a sparse MoE layer that routes each token to the top 2 of 8 experts alongside a persistent shared expert, yielding roughly 38.9M active parameters per token. Training follows a two-stage pipeline: pre-training on protein sequences (tokenized from FASTA corpora) followed by fine-tuning on receptor-binder interaction pairs. Evaluation is run on a leakage-free DB5.0 split, with generated binders scored zero-shot by Boltz-2 and AlphaFold2-Multimer; under both predictors MoE-Bind matches or exceeds compute-matched dense baselines despite its lower per-token compute. The reference implementation is open source under an MIT license, written in Python, and ships with a small demonstration dataset and 100M-scale configurations for end-to-end runs. Note that pretrained weights are not distributed; the repository provides training code and configs and instructs users to supply their own corpora to reproduce the 100M-scale model.
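
The routing scheme described in this section (top 2 of 8 experts plus an always-on shared expert) can be sketched in plain Python. This is a generic illustration of top-k routing, not the reference implementation: the toy scaling experts, the random router weights, the 4-dimensional token, and the choice to renormalize the two selected gate weights so they sum to 1 are all assumptions.

```python
import math
import random

NUM_EXPERTS = 8
TOP_K = 2


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts; renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, router_weights, experts, shared_expert):
    """Apply the shared expert to every token, plus a weighted sum of the top-k routed experts."""
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    routed = route_top_k(logits)
    out = shared_expert(token)  # the shared expert is never skipped
    for idx, weight in routed:
        expert_out = experts[idx](token)  # only the selected experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, routed


def make_scaling_expert(scale):
    """Toy expert (assumption): multiplies the token vector by a constant."""
    return lambda vec: [scale * v for v in vec]


experts = [make_scaling_expert(i + 1) for i in range(NUM_EXPERTS)]
shared_expert = make_scaling_expert(0.5)
rng = random.Random(0)
router_weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(NUM_EXPERTS)]

token = [0.1, -0.2, 0.3, 0.4]
output, routed = moe_layer(token, router_weights, experts, shared_expert)
print(routed)
```

Only two of the eight routed experts run for any given token, which is how total capacity is decoupled from per-token compute. Using the figures quoted above, about 38.9M active out of roughly 100M parameters means each token touches on the order of 39% of the model's weights.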

Applications

MoE-Bind targets early-stage de novo binder discovery where a target sequence is known but a high-quality experimental or predicted structure may be unavailable, costly, or slow to obtain. By generating receptor-conditioned binder candidates from sequence alone, it suits high-throughput in silico exploration that can be triaged downstream with structure predictors such as Boltz-2 or AlphaFold2-Multimer before committing to expensive folding or wet-lab validation. Its zero-shot transfer to short peptides extends its utility to peptide binder and therapeutic-lead ideation, and the interpretable routing signal offers a handle for analyzing which sequence and biochemical features drive a given design.
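
The generate-then-triage workflow described above can be sketched as a simple filter-and-rank step. The sketch makes no claim about how Boltz-2 or AlphaFold2-Multimer are invoked: `score_fn` is a hypothetical callable that a user would back with whichever predictor-derived confidence metric they trust, and `toy_score` is a deliberately meaningless placeholder so the example runs on its own.

```python
def triage(candidates, score_fn, keep=2, min_score=0.0):
    """Score each candidate, drop those below min_score, return the top `keep`."""
    scored = [(score_fn(seq), seq) for seq in candidates]
    passing = [(score, seq) for score, seq in scored if score >= min_score]
    passing.sort(key=lambda pair: pair[0], reverse=True)
    return passing[:keep]


def toy_score(seq):
    """Placeholder score (assumption): fraction of charged residues.

    This is NOT a binding metric; it only makes the example executable.
    """
    return sum(seq.count(aa) for aa in "DEKR") / len(seq)


# Made-up candidate sequences for illustration only.
candidates = ["AKDE", "AAAA", "KKAA", "DEKR"]
shortlist = triage(candidates, toy_score, keep=2, min_score=0.25)
print(shortlist)
```

Keeping the scorer behind a plain callable means the cheap generation step and the more expensive structure-prediction step stay decoupled, so the shortlist size can be tuned to the available folding budget.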

Impact

MoE-Bind is an early demonstration that sparse Mixture-of-Experts architectures, long established in language modeling, can be brought to sequence-only protein binder generation without sacrificing quality. Its main contributions are methodological: showing that architectural sparsity rather than scale can deliver competitive, structure-free binder design, and surfacing biochemically interpretable expert specialization. As a recent preprint from a single academic lab, it has not yet undergone peer review and has seen only modest adoption so far. Its practical reach is further constrained by the absence of released pretrained weights, which means that reproducing the model requires users to assemble their own training corpora. Its evaluation under two independent structure predictors on a leakage-free DB5.0 protocol nonetheless provides a credible signal that the sparse approach is worth pursuing further.

Citation

MoE-Bind: Guiding De Novo Protein Binder Generation with Sparse Experts
