Huazhong University of Science and Technology
A lightweight plant genome foundation model pairing a bidirectional Mamba backbone with a sparse Mixture-of-Experts, pretrained on 25.4B nucleotides from 42 plant species.
PlantBiMoE is a genome foundation model for plant DNA that combines a bidirectional Mamba (state-space) backbone with a sparse Mixture-of-Experts (SparseMoE) feedforward design. It was introduced in December 2025 by Kepeng Lin, Qizhe Zhang, Rui Wang, Xuehai Hu, and Wei Xu at Huazhong University of Science and Technology, and the work was accepted to the IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
Plant genomics has lagged behind human and microbial genomics in the availability of strong sequence models. The genomic models that do exist for plants, such as AgroNT (a 1B-parameter Nucleotide-Transformer-style model) and prior plant DNA language models (PDLLMs), are often computationally heavy. PlantBiMoE reduces this cost in two ways: it replaces quadratic-cost transformer attention with a linear-time Mamba state-space backbone, and it activates only a sparse subset of experts per token, which keeps inference cost low while preserving model capacity.
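To make the idea of sparse expert activation concrete, here is a minimal sketch of top-k routing in plain Python. It is an illustration only, not PlantBiMoE's implementation: the number of experts, the value of k, the softmax over the selected scores, and the names `route_top_k` and `sparse_moe` are all assumptions, and the router scores are passed in directly rather than computed by a learned layer.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalise their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def sparse_moe(token, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    out = [0.0] * len(token)
    for idx, weight in route_top_k(router_logits, k):
        expert_out = experts[idx](token)  # experts not selected are never evaluated
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out

# Four toy "experts" that just scale the input (hypothetical stand-ins
# for feedforward blocks).
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.0, 5.0, 1.0, 5.0]  # made-up router scores for one token
output = sparse_moe([1.0, 2.0], experts, router_logits, k=2)
print(output)  # [3.0, 6.0]
```

Only two of the four experts run for this token, so compute per token scales with k rather than with the total number of experts, while the unused experts still contribute to overall capacity.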
The result is a model with 116M total parameters, of which only about 64M (roughly 55%) are active per token. This places it in the same size class as DNABERT-2 while remaining far smaller than the billion-parameter AgroNT. Despite this lightweight footprint, PlantBiMoE reports state-of-the-art results across a broad plant genomics benchmark.
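The arithmetic behind these parameter counts can be checked in a few lines. The counts are taken from the text above and AgroNT is treated as exactly 1B parameters; the variable names are illustrative.

```python
total_params = 116e6    # total parameters
active_params = 64e6    # parameters active per token
agront_params = 1e9     # AgroNT, taken as 1B for this comparison

active_fraction = active_params / total_params
inactive_fraction = 1.0 - active_fraction
active_vs_agront = active_params / agront_params

print(f"active per token:   {active_fraction:.1%}")    # 55.2%
print(f"inactive per token: {inactive_fraction:.1%}")  # 44.8%
print(f"active vs. AgroNT:  {active_vs_agront:.1%}")   # 6.4%
```

So a little over half of the weights take part in any single token's forward pass, and that active set is about one sixteenth the size of AgroNT.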
PlantBiMoE was pretrained with masked language modeling (15% masking, BERT-style) on approximately 25.40 billion nucleotides drawn from 42 representative plant species in NCBI. Preprocessing replaced non-standard bases with N, discarded sequences containing more than 2% N, and reverse-complemented 30% of the data for strand augmentation. Pretraining ran for roughly 166 hours over 10 epochs on 8 NVIDIA A800-80GB GPUs using AdamW (β₁=0.95, β₂=0.9, weight decay 0.1) with a linear warmup to a peak learning rate of 8e-3 followed by cosine decay. The architecture interleaves SparseMoE layers with SwiGLU feedforward blocks atop the bidirectional Mamba encoder.
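The preprocessing steps and the learning-rate schedule described above can be sketched as follows. This is a minimal illustration, not the released pretrain.py: the function names, the random seed, the choice to reverse-complement a sequence in place with probability 0.30, and the `warmup_steps` and `total_steps` arguments are all assumptions, since the text gives only the percentages and the peak learning rate.

```python
import math
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def clean(seq):
    """Uppercase and replace any base outside A/C/G/T with N."""
    return "".join(b if b in "ACGT" else "N" for b in seq.upper())

def keep(seq, max_n_fraction=0.02):
    """Keep a sequence only if at most 2% of its bases are N."""
    return len(seq) > 0 and seq.count("N") / len(seq) <= max_n_fraction

def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]

def preprocess(seqs, rc_fraction=0.30, seed=0):
    """Clean, filter, and randomly reverse-complement about 30% of sequences."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    kept = []
    for raw in seqs:
        seq = clean(raw)
        if not keep(seq):
            continue
        if rng.random() < rc_fraction:
            seq = reverse_complement(seq)
        kept.append(seq)
    return kept

def learning_rate(step, total_steps, warmup_steps, peak_lr=8e-3):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

print(clean("acgtRYn"))             # ACGTNNN
print(reverse_complement("AACGN"))  # NCGTT
print(learning_rate(10, 100, 10))   # 0.008
```

The schedule is continuous at the end of warmup: the linear ramp reaches the peak exactly where the cosine term starts at its maximum.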
The MPGB benchmark covers polyadenylation, splice sites, lncRNA, enhancer regions, chromatin accessibility, promoter strength, terminator strength, histone modification, core promoter, conservation, and open chromatin tasks across sequence lengths of 50–6,000 bp. On this benchmark, PlantBiMoE achieves the best performance on 20 of 31 datasets and the best average overall. Reported examples include promoter strength R² of 75.23 (vs. 73.85 for AgroNT), chromatin accessibility AUC of 96.55 (vs. 96.37), and open chromatin MCC of 46.88 (vs. 43.21).
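For readers less familiar with two of the metrics quoted above, the sketch below computes R² (for regression tasks such as promoter strength) and the Matthews correlation coefficient, MCC (for binary classification), from their standard definitions. The function names and toy data are illustrative. The reported figures appear to be scaled by 100, which is an inference from the values rather than something stated here.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data, not benchmark results.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 4))    # 0.5774
```

MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is one reason it is commonly used for genomic classification tasks.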
PlantBiMoE provides plant biologists and crop genomics researchers with a compact, fine-tunable backbone for downstream sequence prediction tasks: identifying promoters and terminators, predicting promoter strength and chromatin accessibility, annotating splice sites and lncRNAs, and scoring regulatory and conservation features. Because the model is small and uses sparse activation, it can be fine-tuned and deployed on modest hardware, lowering the barrier for agricultural and plant-science labs that lack large GPU clusters but want genome-scale representation learning for breeding, gene regulation, and functional genomics studies.
PlantBiMoE demonstrates that linear-time state-space backbones combined with sparse Mixture-of-Experts can match or exceed much larger transformer-based plant genome models while using a fraction of the active parameters. By matching DNABERT-2's size class yet outperforming the 1B-parameter AgroNT on the majority of MPGB datasets, it makes a case for efficiency-oriented architectures in genomics. The pretrained weights are released on HuggingFace under the Apache-2.0 license, and the training and fine-tuning scripts (pretrain.py, finetune.py) are publicly available on GitHub, though that repository carries no explicit license file. The work is a recent preprint accepted to BIBM 2025, so its longer-term influence on plant genomic modeling remains to be established.