bio.rodeo

DNA & Gene

PlantBiMoE

Huazhong University of Science and Technology

A lightweight plant genome foundation model pairing a bidirectional Mamba backbone with a sparse Mixture-of-Experts, pretrained on 25.4B nucleotides from 42 plant species.

Released: December 2025
Parameters: 116 Million

PlantBiMoE is a genome foundation model for plant DNA that combines a bidirectional Mamba (state-space) backbone with a sparse Mixture-of-Experts (SparseMoE) feedforward design. It was introduced in December 2025 by Kepeng Lin, Qizhe Zhang, Rui Wang, Xuehai Hu, and Wei Xu at Huazhong University of Science and Technology, and the work was accepted to the IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

Plant genomics has lagged behind human and microbial genomics in the availability of strong sequence models, and the genomic models that do exist for plants — such as AgroNT (a 1B-parameter Nucleotide-Transformer-style model) and prior plant DNA language models (PDLLMs) — are often computationally heavy. PlantBiMoE addresses this by replacing the quadratic-cost transformer attention with a linear-time Mamba state-space backbone and by activating only a sparse subset of experts per token, keeping inference cost low while preserving model capacity.

The result is a model with 116M total parameters, of which only 64M are active per token (about 55% of the total, leaving roughly 45% idle on any given token). This places it in the same size class as DNABERT-2 while remaining far smaller than the billion-parameter AgroNT. Despite this lightweight footprint, PlantBiMoE reports state-of-the-art results across a broad plant genomics benchmark.
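
To make the total-versus-active distinction concrete, here is a minimal pure-Python sketch of top-k expert routing. It is not PlantBiMoE's implementation: the expert count, the value of `k`, the toy scaling experts, and the choice to renormalize gate weights over the selected experts are all illustrative assumptions. Only the 116M and 64M figures come from the text above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized over the selected experts (one common
    convention; assumed here, not confirmed for PlantBiMoE).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale):
    """Hypothetical toy expert: multiplies every feature by a constant."""
    return lambda v: [scale * vi for vi in v]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token

routes = top_k_route(logits, k=2)
y = moe_forward([1.0, -1.0], experts, logits, k=2)

# Share of parameters used per token, from the figures quoted above.
active_fraction = 64 / 116
print(routes, y, round(active_fraction, 2))
```

Only two of the four toy experts run for this token. The same mechanism is what lets a model hold 116M parameters while touching about 64M of them per token.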

Key Features

  • Bidirectional Mamba backbone: Sequences are processed in the forward and reverse-complement directions independently, and the two representations are fused by element-wise addition, capturing both-strand dependencies without adding parameters.
  • Sparse Mixture-of-Experts: Top-k routing activates only a fraction of experts per token, so the model holds 116M parameters but uses just 64M per forward pass, trading dense compute for sparse capacity.
  • Single-base tokenization: A nucleotide-level tokenizer (A, T, C, G, N) avoids the information loss of k-mer or BPE schemes and, at one token per base, supports a pretraining context window of 32,768 tokens.
  • Long-context pretraining: Genomic sequences are segmented into 32,768 bp windows with 64–128 bp overlaps, so the model can learn long-range plant genomic structure.
  • Broad benchmark coverage: Evaluated on the Modified Plants Genome Benchmark (MPGB) spanning 11 task categories and 31 datasets.
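
The tokenization, strand handling, and segmentation features above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the released tokenizer: the token ids, the function names, and the fixed 128 bp default overlap (the upper end of the 64–128 bp range) are hypothetical.

```python
# Hypothetical id assignment; the real vocabulary order is not given above.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def tokenize(seq):
    """One token per base; anything outside A/C/G/T is mapped to N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def reverse_complement(seq):
    """Complement each base, then reverse, giving the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def windows(seq, size=32768, overlap=128):
    """Yield consecutive windows of `size` bases sharing `overlap` bases."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not seq:
        return
    step = size - overlap
    start = 0
    while True:
        yield seq[start:start + size]
        if start + size >= len(seq):
            break
        start += step


print(tokenize("acgtnx"))
print(reverse_complement("AACGN"))
print(list(windows("ABCDEFGHIJ", size=4, overlap=1)))
```

The last call uses a tiny window size so the overlap is visible: each window starts on the final base of the previous one.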

Technical Details

PlantBiMoE was pretrained with masked language modeling (15% masking, BERT-style) on approximately 25.40 billion nucleotides drawn from 42 representative plant species in NCBI. Preprocessing replaced non-standard bases with N, discarded sequences containing more than 2% N, and reverse-complemented 30% of the data for strand augmentation. Pretraining ran for roughly 166 hours over 10 epochs on 8 NVIDIA A800-80GB GPUs using AdamW (β₁=0.95, β₂=0.9, weight decay 0.1) with a linear warmup to a peak learning rate of 8e-3 followed by cosine decay. The architecture interleaves SparseMoE layers with SwiGLU feedforward blocks atop the bidirectional Mamba encoder.
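
The learning-rate schedule described above (linear warmup to a peak of 8e-3, then cosine decay) can be written as a small function. The warmup length, total step count, and final learning rate are not given here, so the defaults below are placeholders rather than the authors' settings.

```python
import math


def lr_at(step, peak_lr=8e-3, warmup_steps=1000, total_steps=100_000, min_lr=0.0):
    """Learning rate at a given optimizer step.

    peak_lr comes from the text; warmup_steps, total_steps, and min_lr
    are hypothetical values chosen only for illustration.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 500, 1000, 50_500, 100_000):
    print(s, lr_at(s))
```

The two branches meet at `step == warmup_steps`, where the cosine term equals 1 and the rate is exactly the peak, so the schedule has no jump between warmup and decay.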

On the MPGB — which covers polyadenylation, splice sites, lncRNA, enhancer regions, chromatin accessibility, promoter strength, terminator strength, histone modification, core promoter, conservation, and open chromatin tasks across sequence lengths of 50–6,000 bp — PlantBiMoE achieves the best performance on 20 of 31 datasets and the best average overall. Reported examples include promoter strength R² of 75.23 (vs. 73.85 for AgroNT), chromatin accessibility AUC of 96.55 (vs. 96.37), and open chromatin MCC of 46.88 (vs. 43.21).
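
For readers less familiar with the Matthews correlation coefficient (MCC) quoted above, the sketch below computes it from a binary confusion matrix. The counts are made up for illustration, and the multiplication by 100 reflects an assumption that the reported scores are on a 0–100 scale (so 46.88 would correspond to an MCC of about 0.47).

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for binary classification, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, not taken from the benchmark.
score = mcc(tp=40, tn=45, fp=5, fn=10)
print(round(100 * score, 2))  # same 0-100 scale assumed for the figures above
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred on imbalanced datasets.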

Applications

PlantBiMoE provides plant biologists and crop genomics researchers with a compact, fine-tunable backbone for downstream sequence prediction tasks: identifying promoters and terminators, predicting promoter strength and chromatin accessibility, annotating splice sites and lncRNAs, and scoring regulatory and conservation features. Because the model is small and uses sparse activation, it can be fine-tuned and deployed on modest hardware, lowering the barrier for agricultural and plant-science labs that lack large GPU clusters but want genome-scale representation learning for breeding, gene regulation, and functional genomics studies.

Impact

PlantBiMoE demonstrates that linear-time state-space backbones combined with sparse Mixture-of-Experts can match or exceed much larger transformer-based plant genome models while using a fraction of the active parameters. By matching DNABERT-2's size class yet outperforming the 1B-parameter AgroNT on the majority of MPGB datasets, it makes a case for efficiency-oriented architectures in genomics. The pretrained weights are released on HuggingFace under the Apache-2.0 license, and the training and finetuning scripts (pretrain.py, finetune.py) are publicly available on GitHub, though that repository carries no explicit license file. As a preprint accepted to BIBM 2025, its longer-term influence on plant genomic modeling remains to be established.

Citation

Preprint

DOI: 10.48550/arXiv.2512.07113


Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Score: 53 (Partial)
Usability (can I run it?): 79
Reproducibility (can I retrain it?): 24
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

dna, foundation_model, gene_expression, genomics, mixture_of_experts, promoter_prediction, self_supervised, state_space_model, variant_effect_prediction

Resources

GitHub Repository, Research Paper, HuggingFace Model