A ~100M-parameter autoregressive protein-binder generator using a sparse Mixture-of-Experts architecture for sequence-only, receptor-conditioned binder design without 3D structure inputs.
MoE-Bind is an autoregressive protein-binder generator developed by Dipayan Sarkar and Chiranjib Sarkar at the Computational Systems Biology Laboratory, Department of Bioinformatics, University of North Bengal, India, and posted to bioRxiv in June 2026. It addresses a practical bottleneck in de novo binder design: structure-based pipelines such as RFdiffusion-style workflows require a known three-dimensional target conformation and consume substantial compute and wall-clock time per design, which limits throughput for large-scale binder exploration.
The model's central premise is that sparse architectural design, rather than sheer scale, can deliver fast, structure-free binder generation. Sequence-only generative models had promised a lighter alternative to structure-based methods, but existing systems remained uniformly dense and often reintroduced structural computation at inference, eroding the speed advantage they were meant to provide. MoE-Bind is, to the authors' knowledge, the first sequence-only protein binder generator to adopt a sparse Mixture-of-Experts (MoE) feed-forward design, mirroring the dense-to-sparse transition that reshaped natural-language transformers.
MoE-Bind generates candidate binder sequences conditioned on a receptor (target) sequence alone, with no 3D structure input required at generation time. Generated binders are then evaluated zero-shot using two independent external structure predictors, Boltz-2 and AlphaFold2-Multimer, on a leakage-free Docking Benchmark 5.0 (DB5.0) protocol.
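The receptor-conditioned generation described above can be illustrated with a minimal sampling loop. This is a sketch under stated assumptions, not the MoE-Bind implementation: the `<sep>` and `<eos>` tokens, the prompt layout (receptor, separator, then binder), and the function names are hypothetical, and `toy_next_token_probs` is a stand-in for a trained model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEP, EOS = "<sep>", "<eos>"  # hypothetical special tokens, not taken from the source


def toy_next_token_probs(context):
    """Stand-in for a trained model: uniform over residues plus a small stop chance."""
    vocab = list(AMINO_ACIDS) + [EOS]
    weights = [1.0] * len(AMINO_ACIDS) + [0.5]
    total = sum(weights)
    return vocab, [w / total for w in weights]


def generate_binder(receptor, next_token_probs, max_len=30, seed=0):
    """Sample a binder one residue at a time, conditioned on the receptor sequence."""
    rng = random.Random(seed)
    # Assumed prompt layout: receptor residues, then a separator token.
    context = list(receptor) + [SEP]
    binder = []
    while len(binder) < max_len:
        vocab, probs = next_token_probs(context)
        token = rng.choices(vocab, weights=probs, k=1)[0]
        if token == EOS:
            break
        binder.append(token)
        context.append(token)  # each sampled residue becomes part of the context
    return "".join(binder)


candidate = generate_binder("MKTAYIAKQR", toy_next_token_probs)
print(candidate)
```

Only sequence tokens enter the loop; in a workflow like the one described, a structure predictor would be applied afterwards to score the sampled candidates.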
MoE-Bind is an approximately 100M-parameter decoder-only autoregressive transformer. Attention uses Multi-head Latent Attention, and each feed-forward block is replaced by a sparse MoE layer that routes each token to the top 2 of 8 experts alongside a persistent shared expert, yielding roughly 38.9M active parameters per token. Training follows a two-stage pipeline: pre-training on protein sequences (tokenized from FASTA corpora) followed by fine-tuning on receptor-binder interaction pairs. Evaluation is run on a leakage-free DB5.0 split, with generated binders scored zero-shot by Boltz-2 and AlphaFold2-Multimer; under both predictors MoE-Bind matches or exceeds compute-matched dense baselines despite its lower per-token compute. The reference implementation is open source under an MIT license, written in Python, and ships with a small demonstration dataset and 100M-scale configurations for end-to-end runs. Note that pretrained weights are not distributed; the repository provides training code and configs and instructs users to supply their own corpora to reproduce the 100M-scale model.
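The routing scheme described above (top 2 of 8 experts plus a shared expert that sees every token) can be sketched in a few lines of plain Python. This is an illustrative toy, not the reference implementation: the ReLU expert, the softmax over only the selected logits, the dimensions, and all function names are assumptions chosen for clarity.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def make_ffn(rng, d_model, d_hidden):
    """Build a tiny ReLU feed-forward expert (illustrative, not the paper's block)."""
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def ffn(x):
        hidden = [max(0.0, h) for h in matvec(w1, x)]
        return matvec(w2, hidden)

    return ffn


def route_token(x, router_w, experts, shared_expert, top_k=2):
    """Send one token vector to its top_k experts plus the always-on shared expert."""
    logits = matvec(router_w, x)  # one router logit per expert
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:top_k]
    # Assumption: gate weights are a softmax over the selected logits only.
    peak = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - peak) for e in top]
    gates = [v / sum(exps) for v in exps]
    out = shared_expert(x)  # the shared expert processes every token
    for gate, e in zip(gates, top):
        expert_out = experts[e](x)  # only the selected experts are evaluated
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out, top, gates


rng = random.Random(0)
D_MODEL, D_HIDDEN, N_EXPERTS = 8, 16, 8
experts = [make_ffn(rng, D_MODEL, D_HIDDEN) for _ in range(N_EXPERTS)]
shared = make_ffn(rng, D_MODEL, D_HIDDEN)
router_w = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(N_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(D_MODEL)]

out, chosen, gates = route_token(token, router_w, experts, shared)
print("experts used:", chosen, "gates:", [round(g, 3) for g in gates])
```

Because only 2 of the 8 routed experts run for any given token, the parameters touched per token are a fraction of the total, which is the mechanism behind the gap between total and active parameter counts.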
MoE-Bind targets early-stage de novo binder discovery where a target sequence is known but a high-quality experimental or predicted structure may be unavailable, costly, or slow to obtain. By generating receptor-conditioned binder candidates from sequence alone, it suits high-throughput in silico exploration that can be triaged downstream with structure predictors such as Boltz-2 or AlphaFold2-Multimer before committing to expensive folding or wet-lab validation. Its zero-shot transfer to short peptides extends its utility to peptide binder and therapeutic-lead ideation, and the interpretable routing signal offers a handle for analyzing which sequence and biochemical features drive a given design.
MoE-Bind is an early demonstration that sparse Mixture-of-Experts architectures, long established in language modeling, can be brought to sequence-only protein binder generation without sacrificing quality. Its main contributions are methodological: showing that architectural sparsity rather than scale can deliver competitive, structure-free binder design, and surfacing biochemically interpretable expert specialization. As a recent preprint from a single academic lab, it has not yet undergone peer review and has seen only modest adoption so far. Its practical reach is further constrained by the absence of released pretrained weights, so reproducing the model requires users to assemble their own training corpora. Its independent evaluation under two structure predictors on a leakage-free DB5.0 protocol nonetheless provides a credible signal that the sparse approach is worth pursuing further.
Sarkar, D. & Sarkar, C. (2026) MoE-Bind: Guiding De Novo Protein Binder Generation with Sparse Experts. bioRxiv.
DOI: 10.64898/2026.06.13.732043