bio.rodeo


© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene

OneGenome-Rice

Zhejiang Lab / BGI Research

A 1.25B-parameter Mixture-of-Experts genomic foundation model for rice, pretrained on 422 Oryza genomes with a 1 Mbp context window.

Released: April 2026
Parameters: 1.25 Billion total (~0.33 Billion activated per token)

OneGenome-Rice (OGR) is a genomic foundation model purpose-built for rice (genus Oryza), one of the world's most important food crops. Developed jointly by Zhejiang Lab and BGI Research and released as a bioRxiv preprint in April 2026, the model addresses a gap in genomic deep learning: most large DNA foundation models are trained across broad swaths of life or focused on the human genome, leaving crop genomics comparatively underserved, even though pangenome diversity and long-range regulatory context are especially important there.

Rather than training on a single reference assembly, OGR is pretrained on 422 cultivated and wild rice genomes, capturing the structural and sequence variation that distinguishes rice subspecies and populations. The model pairs this diverse pretraining corpus with a 1 million base-pair context window, allowing it to reason over long-range regulatory relationships that shorter-context models cannot represent. This combination is designed to make a single pretrained checkpoint useful across a wide spectrum of functional genomics and population genetics tasks in rice.
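This page does not describe how OGR chunks or tokenizes its input, so the following is only a hypothetical sketch of one way to cover a chromosome longer than the 1,000,000 bp context window. The helper name `tile_windows` and its stride behavior are our own assumptions, not part of the OGR release: it returns half-open (start, end) coordinates and shifts the final window left so the sequence end is still covered.

```python
def tile_windows(seq_len, window=1_000_000, stride=None):
    """Split [0, seq_len) into (start, end) windows of at most `window` bp.

    Hypothetical helper. stride defaults to `window` (no overlap); a smaller
    stride gives overlapping windows. If the regular grid stops short of the
    end, one final window is aligned to end exactly at seq_len.
    """
    if stride is None:
        stride = window
    if not 0 < stride <= window:
        raise ValueError("stride must be in (0, window] to avoid gaps")
    if seq_len <= window:
        return [(0, seq_len)]
    starts = list(range(0, seq_len - window + 1, stride))
    if starts[-1] + window < seq_len:
        starts.append(seq_len - window)  # last window ends at seq_len
    return [(s, s + window) for s in starts]

# Example: a hypothetical 2.5 Mbp region tiled at the 1 Mbp context length.
for start, end in tile_windows(2_500_000):
    print(f"{start:>9,} - {end:>9,}")
```

Overlapping windows (a stride smaller than the window) are a common way to give every position some flanking context; whether OGR's own pipeline does this is not stated here.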

OGR sits alongside genomic foundation models such as Evo, Nucleotide Transformer, and plant-specific efforts, but is distinguished by its crop-specific, pangenome-scale pretraining and its sparse Mixture-of-Experts (MoE) design that keeps inference cost low relative to its total capacity.

#Key Features

  • Pangenome-scale pretraining: Trained on 422 cultivated and wild Oryza genomes rather than a single reference, the model internalizes sequence and structural diversity across rice subspecies and populations.
  • Sparse Mixture-of-Experts architecture: With 1.25B total parameters but only ~0.33B activated per token, OGR combines high representational capacity with modest per-token compute.
  • Long-range context: A 1 million base-pair context window lets the model capture distal regulatory signals and large-scale genomic structure in a single pass.
  • Broad benchmark coverage: Reported strong performance across the 26-category RiceBenchmark suite, spanning chromatin accessibility, epigenetic marks, splice sites, and population structure.
  • Flexible adaptation modes: The pretrained checkpoint supports zero-/few-shot use, frozen-encoder feature extraction, and full fine-tuning for downstream tasks such as gene-expression prediction and subspecies introgression analysis.
  • Open release: Weights are distributed in Safetensors format on HuggingFace under the Apache 2.0 license, with the RiceBenchmark dataset available on both HuggingFace and ModelScope.
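
The gap between total and activated parameters can be made concrete with simple arithmetic on the two reported figures. The FLOPs line assumes the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass; that estimate is our assumption, not a number from the preprint, and it ignores attention, whose cost grows with context length and matters at 1 Mbp scale.

```python
total_params = 1.25e9   # total parameters reported for OGR
active_params = 0.33e9  # approximate parameters activated per token

active_fraction = active_params / total_params
capacity_ratio = total_params / active_params
# Rule-of-thumb assumption: ~2 FLOPs (one multiply, one add) per active weight
# per token; attention and router overhead are ignored.
flops_per_token = 2 * active_params

print(f"active fraction: {active_fraction:.1%}")                # 26.4%
print(f"total / active parameters: {capacity_ratio:.1f}x")      # 3.8x
print(f"~{flops_per_token / 1e9:.2f} GFLOPs per token (estimate)")  # ~0.66
```

In other words, each token pays roughly the matrix-multiply cost of a ~0.33B dense model while the network as a whole stores about 3.8 times as many weights.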

#Technical Details

OGR is a 12-layer transformer whose feed-forward blocks use a Mixture-of-Experts design; of its 1.25 billion total parameters, approximately 0.33 billion are activated per token. The model was pretrained with a self-supervised objective on 422 rice genomes and accepts contexts of up to 1,000,000 base pairs. Evaluation is anchored on RiceBenchmark, a 26-category benchmark that covers functional genomics tasks (chromatin accessibility, histone and other epigenetic marks, splice site identification) as well as population-genetics tasks such as population structure and subspecies introgression. The authors report strong results across the suite under zero-shot, few-shot, frozen-encoder, and fine-tuned protocols.
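
The expert count, routing rule, and expert shapes used in OGR are not given on this page. As a generic illustration of how a sparse MoE feed-forward layer activates only part of its parameters, here is a minimal top-k routing sketch in NumPy. The function `moe_forward`, the top-2 choice, the ReLU MLP experts, and the toy sizes are all assumptions, not OGR's implementation.

```python
import numpy as np

def moe_forward(x, w_gate, experts, k=2):
    """Sparse MoE feed-forward: each token is processed by only its top-k experts.

    x:       (tokens, d_model) token representations
    w_gate:  (d_model, n_experts) router weights
    experts: list of (w_in, w_out) pairs, each a small ReLU MLP
    Returns the gate-weighted outputs and the chosen expert ids per token.
    """
    logits = x @ w_gate                          # router score per token/expert
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = topk[t]
        # Softmax over the selected experts only, so gate weights sum to 1.
        scores = logits[t, chosen]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()
        for gate, e in zip(gates, chosen):
            w_in, w_out = experts[e]
            hidden = np.maximum(x[t] @ w_in, 0.0)  # ReLU MLP expert
            out[t] += gate * (hidden @ w_out)
    return out, topk

# Toy sizes (hypothetical, far smaller than OGR): 8 experts, top-2 routing.
rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, n_tokens = 16, 32, 8, 4
x = rng.standard_normal((n_tokens, d_model))
w_gate = rng.standard_normal((d_model, n_experts))
experts = [(rng.standard_normal((d_model, d_hidden)) * 0.1,
            rng.standard_normal((d_hidden, d_model)) * 0.1)
           for _ in range(n_experts)]
out, topk = moe_forward(x, w_gate, experts, k=2)
print("experts used per token:", topk.tolist())
```

Because each token touches only `k` of the `n_experts` expert MLPs, per-token compute scales with `k` while stored parameters scale with `n_experts`; this is the general trade-off behind a large-total, small-active parameter budget.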

#Applications

OGR targets plant genomicists and crop-breeding researchers who need predictive models of regulatory and functional genomic signals in rice. Practical use cases include predicting chromatin accessibility and epigenetic marks, annotating splice sites, forecasting gene expression, and analyzing population structure and subspecies introgression directly from the pretrained checkpoint. Because the model supports frozen-encoder and few-shot workflows, groups with limited labeled data can extract useful representations without large fine-tuning budgets, supporting tasks from variant interpretation to candidate regulatory-region discovery in breeding programs.
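
A frozen-encoder, few-shot workflow can be as simple as a nearest-centroid (prototype) classifier over fixed embeddings: the encoder is never updated, and only one mean vector per class is estimated from a handful of labeled examples. In this sketch the embeddings are synthetic stand-ins; in practice they would come from OGR's frozen encoder, whose loading and pooling API is not described here and is not assumed. `fit_centroids` and `predict` are hypothetical helper names.

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Average the frozen embeddings of each class into one prototype vector."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(embeddings, classes, centroids):
    """Assign each embedding to the class whose prototype is nearest (Euclidean)."""
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(dists, axis=1)]

# Synthetic stand-in for encoder outputs: two classes (for example, accessible
# vs. inaccessible regions) whose 64-d embeddings differ by a mean shift.
rng = np.random.default_rng(0)
dim, shots, n_query = 64, 5, 50

def sample(label, n):
    return rng.standard_normal((n, dim)) + label * 1.0

support_x = np.vstack([sample(0, shots), sample(1, shots)])
support_y = np.array([0] * shots + [1] * shots)
query_x = np.vstack([sample(0, n_query), sample(1, n_query)])
query_y = np.array([0] * n_query + [1] * n_query)

classes, centroids = fit_centroids(support_x, support_y)
accuracy = (predict(query_x, classes, centroids) == query_y).mean()
print(f"5-shot accuracy on synthetic data: {accuracy:.2f}")
```

A linear probe (for example, logistic regression on the same frozen features) is the usual next step once more labels are available; the prototype approach is shown here because it needs no training loop.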

#Impact

By bringing pangenome-scale, long-context foundation modeling to a single staple crop, OneGenome-Rice demonstrates how species-focused training can yield broadly capable models for agricultural genomics. Its permissive Apache 2.0 release of weights, code, and the accompanying RiceBenchmark suite lowers the barrier for the plant-genomics community to evaluate and build on genomic foundation models, and provides a reusable benchmark for measuring progress on rice functional genomics. As a recent preprint, results await peer review and independent replication, but the model offers a template for crop-specific foundation models beyond rice.

Citation

OneGenome-Rice (OGR): A genomic foundation model for rice

Qian, B., et al. (2026) OneGenome-Rice (OGR): A genomic foundation model for rice. bioRxiv.

DOI: 10.64898/2026.04.21.719822

Openness

Class II
Open Tooling

Tags

chromatin, chromatin_accessibility, dna, foundation_model, gene_expression, genomics, mixture_of_experts, self_supervised, splice_site_prediction, transformer, zero_shot

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model
  • Dataset