OneGenome-Rice

Genomic foundation model for rice, pretrained on 422 Oryza genomes with a 1 Mbp context window and a 1.25B-parameter mixture-of-experts transformer.

Released: April 2026

Parameters: 1.25 billion (about 0.33 billion active per token)

OneGenome-Rice (OGR) is a genomic foundation model purpose-built for rice (genus Oryza), one of the world's most important food crops. Developed jointly by Zhejiang Lab and BGI Research and described in a bioRxiv preprint in April 2026, the model addresses a gap in genomic deep learning: most large DNA foundation models are trained across broad swaths of life or focus on the human genome, which leaves crop genomics comparatively underserved even though pangenome diversity and long-range regulatory context matter a great deal there.

Rather than training on a single reference assembly, OGR is pretrained on 422 cultivated and wild rice genomes, capturing the structural and sequence variation that distinguishes rice subspecies and populations. The model pairs this diverse pretraining corpus with a 1 million base-pair context window, allowing it to reason over long-range regulatory relationships that shorter-context models cannot represent. This combination is designed to make a single pretrained checkpoint useful across a wide spectrum of functional genomics and population genetics tasks in rice.

OGR sits alongside genomic foundation models such as Evo, Nucleotide Transformer, and plant-specific efforts, but is distinguished by its crop-specific, pangenome-scale pretraining and its sparse Mixture-of-Experts (MoE) design that keeps inference cost low relative to its total capacity.

Key Features

Pangenome-scale pretraining: Trained on 422 cultivated and wild Oryza genomes rather than a single reference, the model internalizes sequence and structural diversity across rice subspecies and populations.
Sparse Mixture-of-Experts architecture: With 1.25B total parameters but only ~0.33B activated per token, OGR achieves high representational capacity while keeping per-token compute modest.
Long-range context: A 1 million base-pair context window lets the model capture distal regulatory signals and large-scale genomic structure in a single pass.
Broad benchmark coverage: Strong performance across the 26-category RiceBenchmark suite, spanning chromatin accessibility, epigenetic marks, splice sites, and population structure.
Flexible adaptation modes: The pretrained checkpoint supports zero-/few-shot use, frozen-encoder feature extraction, and full fine-tuning, including gene-expression prediction and subspecies introgression analysis.
Open release: Weights are distributed in Safetensors format on HuggingFace under the Apache 2.0 license, with the RiceBenchmark dataset available on both HuggingFace and ModelScope.
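
To make the long-range context feature concrete, here is a minimal sketch of how a sequence longer than the 1,000,000 bp window might be tiled into model-sized chunks. The source does not describe how inputs are chunked, so the function name `tile_sequence`, the optional overlap, and the example contig length are all assumptions for illustration.

```python
from typing import Iterator, Tuple

MAX_CONTEXT_BP = 1_000_000  # context length stated for OGR


def tile_sequence(seq_len: int, window: int = MAX_CONTEXT_BP,
                  overlap: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows that cover [0, seq_len)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        yield start, end
        if end == seq_len:
            break  # last window reached the end of the sequence
        start += step


# Hypothetical 2.5 Mbp contig tiled with a 100 kbp overlap (assumed values)
windows = list(tile_sequence(2_500_000, overlap=100_000))
```

Overlapping windows are one common way to avoid cutting a regulatory element at a chunk boundary; whether OGR's own tooling does this is not stated in the source.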

Technical Details

OGR is a 12-layer transformer with Mixture-of-Experts feed-forward layers. It has 1.25 billion parameters in total, of which approximately 0.33 billion (about 26%) are activated per token. Self-supervised pretraining was performed over 422 rice genomes, and the model operates on contexts of up to 1,000,000 base pairs. Evaluation is anchored on RiceBenchmark, a 26-category benchmark covering functional genomics tasks (chromatin accessibility, histone and other epigenetic marks, splice site identification) as well as population-genetics tasks such as population structure and subspecies introgression. The authors report strong results across the suite under zero-shot, few-shot, frozen-encoder, and fine-tuned protocols.
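
The gap between total and activated parameters comes from sparse routing: each token is sent to only a few experts. The sketch below shows generic top-k routing in NumPy. It is not OGR's implementation; the source does not state the number of experts, the value of k, or the router design, so `n_experts`, `top_k`, `d_model`, and the linear experts are hypothetical.

```python
import numpy as np


def moe_forward(x, gate_w, expert_w, top_k):
    """Route each token to its top_k experts and mix their outputs.

    x:        (tokens, d_model) token representations
    gate_w:   (d_model, n_experts) router weights
    expert_w: (n_experts, d_model, d_model) one linear map per expert
    """
    logits = x @ gate_w                                 # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]   # chosen experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over the selected experts only
    weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top_idx[t, slot]
            # Only the selected experts' weights are touched for this token
            out[t] += weights[t, slot] * (x[t] @ expert_w[e])
    return out, top_idx


rng = np.random.default_rng(0)
n_experts, top_k, d_model = 8, 2, 16   # hypothetical sizes, not OGR's
x = rng.normal(size=(4, d_model))
gate_w = rng.normal(size=(d_model, n_experts))
expert_w = rng.normal(size=(n_experts, d_model, d_model))
out, chosen = moe_forward(x, gate_w, expert_w, top_k)

# Reported figures: ~0.33B of 1.25B parameters active per token
active_fraction = 0.33 / 1.25
```

With the reported figures, `active_fraction` works out to roughly a quarter of the parameters per token, which is why per-token compute stays well below what a dense 1.25B model would need.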

Applications

OGR targets plant genomicists and crop-breeding researchers who need predictive models of regulatory and functional genomic signals in rice. Practical use cases include predicting chromatin accessibility and epigenetic marks, annotating splice sites, forecasting gene expression, and analyzing population structure and subspecies introgression directly from the pretrained checkpoint. Because the model supports frozen-encoder and few-shot workflows, groups with limited labeled data can extract useful representations without large fine-tuning budgets, supporting tasks from variant interpretation to candidate regulatory-region discovery in breeding programs.
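
The frozen-encoder workflow mentioned above can be sketched as a linear probe: the encoder's weights are never updated, and only a small classifier is trained on its outputs. The source does not document OGR's loading or embedding API, so `placeholder_encoder` below is a deliberately trivial stand-in (nucleotide frequencies) rather than real OGR embeddings, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)


def toy_sequences(n, gc, length=200):
    """Generate n random DNA strings with the given GC probability (toy data)."""
    probs = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return ["".join(rng.choice(list("ACGT"), size=length, p=probs))
            for _ in range(n)]


def placeholder_encoder(seq):
    """Hypothetical stand-in for a frozen encoder: A/C/G/T frequencies.

    In a real workflow this would return embeddings from the pretrained
    checkpoint; that API is not specified in the source.
    """
    return np.array([seq.count(base) / len(seq) for base in "ACGT"])


# Synthetic two-class task: GC-rich (label 1) vs AT-rich (label 0)
seqs = toy_sequences(100, gc=0.7) + toy_sequences(100, gc=0.3)
labels = np.array([1] * 100 + [0] * 100)
features = np.stack([placeholder_encoder(s) for s in seqs])  # encoder stays frozen

# Shuffle, then split into training and held-out sets
order = rng.permutation(len(seqs))
train, test = order[:150], order[150:]

# Logistic-regression probe trained by plain gradient descent
w = np.zeros(features.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(features[train] @ w + b)))
    grad = p - labels[train]
    w -= 1.0 * features[train].T @ grad / len(train)
    b -= 1.0 * grad.mean()

pred = (features[test] @ w + b) > 0
accuracy = (pred == labels[test]).mean()
```

Because only `w` and `b` are trained, this kind of probe needs little labeled data and compute, which is the appeal of the frozen-encoder mode for groups without large fine-tuning budgets.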

Impact

By bringing pangenome-scale, long-context foundation modeling to a single staple crop, OneGenome-Rice demonstrates how species-focused training can yield broadly capable models for agricultural genomics. Its permissive Apache 2.0 release of weights, code, and the accompanying RiceBenchmark suite lowers the barrier for the plant-genomics community to evaluate and build on genomic foundation models, and provides a reusable benchmark for measuring progress on rice functional genomics. Because the work is a recent preprint, its results await peer review and independent replication, but the model offers a template for crop-specific foundation models beyond rice.

Citation

Qian, B., et al. (2026) OneGenome-Rice (OGR): A genomic foundation model for rice. bioRxiv.

DOI: 10.64898/2026.04.21.719822

Citations

Total Citations: 73

Influential: 3

References: 62

GitHub

Stars: 22

Forks: 5

Open Issues: 3

Contributors: 1

Last Push: 1mo ago

Language: Python

License: Apache-2.0

HuggingFace

Downloads: 11

Likes: 4

Last Modified: 2mo ago

Openness

bio.rodeo openness: Fully open · usable and reproducible

Score: 90 (Open)

Usability (can I run it?): 99

Reproducibility (can I retrain it?): 82

Model Openness Framework: Class II (Open Tooling)

Resources

GitHub Repository, Research Paper, HuggingFace Model, Dataset
