A 1.25B-parameter Mixture-of-Experts genomic foundation model for rice, pretrained on 422 Oryza genomes with a 1 Mbp context window.
OneGenome-Rice (OGR) is a genomic foundation model purpose-built for rice (genus Oryza), one of the world's most important food crops. Developed jointly by Zhejiang Lab and BGI Research and released as a bioRxiv preprint in April 2026, the model addresses a gap in genomic deep learning: most large DNA foundation models are either trained across broad swaths of life or focused on the human genome. That leaves crop genomics, where pangenome diversity and long-range regulatory context matter enormously, comparatively underserved.
Rather than training on a single reference assembly, OGR is pretrained on 422 cultivated and wild rice genomes, capturing the structural and sequence variation that distinguishes rice subspecies and populations. The model pairs this diverse pretraining corpus with a 1 million base-pair context window, allowing it to reason over long-range regulatory relationships that shorter-context models cannot represent. This combination is designed to make a single pretrained checkpoint useful across a wide spectrum of functional genomics and population genetics tasks in rice.
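To make the context limit concrete, the following minimal sketch tiles a sequence longer than the context into windows of at most 1,000,000 bp. It is an illustration only: the source does not describe how OGR handles inputs longer than its context, and the function name `iter_windows`, the `overlap` parameter, and the 2.5 Mbp example length are all hypothetical.

```python
from typing import Iterator, Tuple

CONTEXT_BP = 1_000_000  # context length stated for OGR, in base pairs


def iter_windows(
    seq_len: int, window: int = CONTEXT_BP, overlap: int = 0
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) coordinates that cover [0, seq_len).

    Hypothetical helper: the tiling strategy is an assumption, not OGR's method.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        yield start, end
        if end == seq_len:
            break  # the last window reached the end of the sequence
        start += step


# Hypothetical 2.5 Mbp sequence tiled with a 100 kbp overlap between windows.
windows = list(iter_windows(2_500_000, overlap=100_000))
for start, end in windows:
    print(f"{start:>9,d} - {end:>9,d}  ({end - start:,d} bp)")
```

A non-zero overlap is one common way to avoid cutting a regulatory element at a window boundary. Whether it is needed depends on the downstream task.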
OGR sits alongside genomic foundation models such as Evo, Nucleotide Transformer, and plant-specific efforts, but is distinguished by its crop-specific, pangenome-scale pretraining and its sparse Mixture-of-Experts (MoE) design that keeps inference cost low relative to its total capacity.
OGR is a 12-layer transformer with Mixture-of-Experts feed-forward layers. It has 1.25 billion parameters in total, of which approximately 0.33 billion (about 26%) are activated per token. Self-supervised pretraining was performed over 422 rice genomes, and the model operates on contexts of up to 1,000,000 base pairs. Evaluation is anchored on RiceBenchmark, a 26-category benchmark covering functional genomics tasks (chromatin accessibility, histone and other epigenetic marks, splice site identification) as well as population-genetics tasks such as population structure and subspecies introgression. OGR reports strong results across the suite under zero-shot, few-shot, frozen-encoder, and fine-tuned protocols.
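The gap between total and activated parameters comes from sparse routing: each token is sent to only a few experts. The sketch below computes the active fraction from the figures above and shows a toy top-k router. The source does not state OGR's expert count, its value of k, or its gating scheme, so the four scalar "experts", `k=2`, and softmax gating here are assumptions chosen purely for illustration.

```python
import math
from typing import Callable, List, Sequence

TOTAL_PARAMS_B = 1.25   # total parameters in billions (from the text)
ACTIVE_PARAMS_B = 0.33  # approx. parameters activated per token (from the text)
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Toy top-k MoE layer: evaluate only the k highest-scoring experts and
    mix their outputs with gate weights renormalized over the selected k."""
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)


# Four hypothetical scalar "experts"; a real expert is a feed-forward network.
experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: v * v, lambda v: -v]
gate_logits = [0.0, 2.0, 2.0, -1.0]  # in a real model, produced by a learned router
output = moe_forward(3.0, gate_logits, experts, k=2)

print(f"active fraction: {active_fraction:.3f}")
print(f"toy MoE output:  {output}")
```

Note that the active fraction does not map directly onto k divided by the number of experts, because attention and embedding parameters are typically shared by every token regardless of routing.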
OGR targets plant genomicists and crop-breeding researchers who need predictive models of regulatory and functional genomic signals in rice. Practical use cases include predicting chromatin accessibility and epigenetic marks, annotating splice sites, forecasting gene expression, and analyzing population structure and subspecies introgression directly from the pretrained checkpoint. Because the model supports frozen-encoder and few-shot workflows, groups with limited labeled data can extract useful representations without large fine-tuning budgets, supporting tasks from variant interpretation to candidate regulatory-region discovery in breeding programs.
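A frozen-encoder workflow can be sketched as follows: embeddings are computed once by a fixed encoder, and only a small classifier is trained on top. Everything in this example is a stand-in. The `embed` function is a hypothetical base-composition featurizer used in place of the pretrained model, the sequences and labels are made-up toy data, and the source does not describe OGR's actual inference interface.

```python
import math
from typing import List, Sequence, Tuple


def embed(seq: str) -> List[float]:
    """Hypothetical stand-in for a frozen encoder: fractions of A, C, G, T.
    A real workflow would call the pretrained model and pool its hidden states."""
    n = max(len(seq), 1)
    return [seq.count(base) / n for base in "ACGT"]


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(
    feats: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 1.0,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on fixed features.
    Only the probe weights change; the encoder is never updated."""
    w, b, n = [0.0] * len(feats[0]), 0.0, len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(feats, labels):
            err = 1.0 / (1.0 + math.exp(-score(w, b, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, seq: str) -> int:
    return 1 if score(w, b, embed(seq)) > 0 else 0


# Made-up toy task: label 1 = GC-rich window, label 0 = AT-rich window.
train_seqs = ["GCGCGGCC", "CCGGGCGA", "GGCCGCTC", "ATATTAAT", "TTAAATAG", "AATTTACA"]
train_labels = [1, 1, 1, 0, 0, 0]

features = [embed(s) for s in train_seqs]  # computed once, then reused
w, b = train_probe(features, train_labels)
print(predict(w, b, "GGGCCCAT"), predict(w, b, "ATTAGATA"))
```

Because the features are computed once and cached, the cost of training scales with the size of the probe rather than the size of the encoder, which is what makes this protocol attractive when labeled data and compute are limited.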
By bringing pangenome-scale, long-context foundation modeling to a single staple crop, OneGenome-Rice demonstrates how species-focused training can yield broadly capable models for agricultural genomics. Its permissive Apache 2.0 release of weights, code, and the accompanying RiceBenchmark suite lowers the barrier for the plant-genomics community to evaluate and build on genomic foundation models, and provides a reusable benchmark for measuring progress on rice functional genomics. Because the work is a recent preprint, its results still await peer review and independent replication, but the model offers a template for crop-specific foundation models beyond rice.
Qian, B., et al. (2026) OneGenome-Rice (OGR): A genomic foundation model for rice. bioRxiv.
DOI: 10.64898/2026.04.21.719822