Deep-Plant

Colorado State University / University of Michigan

Chromatin-informed foundation model predicting regulatory activity and chromatin state directly from plant genomic sequence in Arabidopsis and rice.

Released: April 2026

Sequence-to-function deep learning models have transformed regulatory genomics by learning to predict molecular phenotypes directly from DNA sequence, but the vast majority of this progress has concentrated on human and mammalian genomes. Plant regulatory genomics has remained comparatively underexplored, despite its importance for crop improvement and basic plant biology. Deep-Plant, introduced in a 2026 bioRxiv preprint from researchers at Colorado State University and the University of Michigan, addresses this gap with a supervised foundation model trained to predict chromatin state directly from plant genomic sequence.

Rather than following the self-supervised DNA language model paradigm—where a model learns from raw sequence alone—Deep-Plant is trained on a large collection of genome-wide functional experiments. This supervised, chromatin-informed pretraining gives the model biological context beyond the sequence itself, which the authors position as a more practical and effective alternative to fine-tuning general-purpose DNA language models for plants. The design follows the spirit of human models such as Enformer, adapted to the data and species of the plant kingdom.

The pretrained chromatin model serves as a reusable backbone that is then fine-tuned for downstream regulatory tasks. Deep-Plant models are released for Arabidopsis thaliana and rice (Oryza sativa), and the authors show they transfer usefully as a building block for related species such as corn (maize).

Key Features

Chromatin-state pretraining: The foundation model is trained to predict chromatin state across tissues and conditions from sequence, using DNA accessibility, transcription factor binding, and histone modification data as supervision.
Three downstream tasks: A single backbone is fine-tuned for chromatin state prediction (CSP), gene expression prediction (GEP), and enhancer activity prediction (EAP, available for Arabidopsis).
Multi-species coverage: Pretrained models are provided for Arabidopsis and rice, with demonstrated utility as a starting point for sequence modeling in corn.
Interpretability: The supervised design supports in-silico mutagenesis (ISM) and variant scoring, enabling identification of regulatory regions and prediction of the effects of sequence variants.
Open weights and data: Pretrained weights and training datasets are released under an Apache 2.0 license, with a command-line tool and notebooks for analysis.
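In-silico mutagenesis works by mutating each position of an input sequence to every alternative base and recording how the model's prediction changes. Variant scoring applies the same alt-minus-ref difference to a single substitution. The sketch below is a minimal, model-agnostic illustration under stated assumptions. `predict` stands in for any function that maps a sequence to one scalar output, such as a single chromatin track. `toy_predict` is a made-up motif counter used only so the example runs. None of these names are part of the Deep-Plant API.

```python
# Hypothetical ISM / variant-scoring sketch, NOT the Deep-Plant implementation.
# `predict` is any sequence -> float function (e.g. one output track of a model).
from typing import Callable

BASES = "ACGT"


def in_silico_mutagenesis(seq: str, predict: Callable[[str], float]) -> list[dict[str, float]]:
    """Score every single-base substitution as predict(mutant) - predict(reference).

    Returns one dict per position mapping each base to its effect;
    the reference base itself maps to 0.0.
    """
    seq = seq.upper()
    ref_score = predict(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref_base:
                row[alt] = 0.0  # no change at the reference base
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = predict(mutant) - ref_score
        effects.append(row)
    return effects


def score_variant(seq: str, pos: int, alt: str, predict: Callable[[str], float]) -> float:
    """Effect of one substitution at 0-based `pos`, as alt-minus-ref prediction."""
    seq = seq.upper()
    mutant = seq[:pos] + alt.upper() + seq[pos + 1:]
    return predict(mutant) - predict(seq)


def toy_predict(seq: str) -> float:
    """Toy stand-in model: counts (possibly overlapping) 'TATA' occurrences."""
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA"))


if __name__ == "__main__":
    effects = in_silico_mutagenesis("GTATAG", toy_predict)
    print(effects[2])  # every change to the A inside TATA lowers the toy score
    print(score_variant("GTATAG", 3, "C", toy_predict))
```

With a real model, the per-position effect matrix is typically plotted as a heatmap or reduced to a per-base importance score to highlight candidate regulatory elements.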

Technical Details

Deep-Plant is a supervised sequence-to-function model that operates on fixed 2.5 kb input windows. Longer sequences are center-cropped and shorter ones padded to that length. The pretraining objective predicts chromatin state profiles derived from DNA accessibility, transcription factor binding, and histone modification assays, and the resulting representation is then fine-tuned for gene expression and enhancer activity readouts. The authors report large improvements in speed, accuracy, and interpretability over the alternative approach of fine-tuning self-supervised DNA language models on the same plant tasks. Pretrained weights (~9.9 GB across tasks and species) and training data (~26.5 GB) are distributed via Zenodo, and a command-line tool accepts FASTA sequences, genomic loci, or gene identifiers as input. Exact parameter counts and the full architecture specification are not summarized here; they are given in the configuration files of the code release.
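The fixed-window input step can be illustrated with a short preprocessing sketch. It reads FASTA records, center-crops or pads each sequence to 2,500 bp, and one-hot encodes the result. The window length comes from the description above. The choice of 'N' as the padding character, placing the odd extra padding base on the right, the A/C/G/T column order, and all function names are assumptions for illustration. They are not taken from the Deep-Plant code release.

```python
# Hypothetical preprocessing sketch, NOT the Deep-Plant code release.
# Assumptions: 'N' padding, odd extra pad base on the right, ACGT column order.

WINDOW = 2500  # 2.5 kb input length described for Deep-Plant
ALPHABET = "ACGT"


def center_crop_or_pad(seq: str, length: int = WINDOW, pad_char: str = "N") -> str:
    """Return `seq` trimmed or padded around its center to exactly `length` bases."""
    seq = seq.upper()
    if len(seq) >= length:
        start = (len(seq) - length) // 2  # drop the excess evenly from both ends
        return seq[start:start + length]
    missing = length - len(seq)
    left = missing // 2
    right = missing - left  # the odd extra base goes on the right
    return pad_char * left + seq + pad_char * right


def one_hot(seq: str) -> list[list[int]]:
    """One-hot encode a sequence; any non-ACGT base (e.g. N) becomes all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def read_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser mapping each record ID (first header word) to its sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


if __name__ == "__main__":
    fasta = ">promoter_1\nACGTACGT\n>promoter_2\n" + "A" * 3000 + "\n"
    for name, seq in read_fasta(fasta).items():
        window = center_crop_or_pad(seq)
        x = one_hot(window)
        print(name, len(window), len(x), len(x[0]))
```

Center-cropping keeps the middle of a longer input. This matters when the region of interest, such as a promoter or peak summit, is centered in the supplied sequence.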

Applications

Deep-Plant is aimed at plant genomicists and crop scientists who need accurate, interpretable predictions of regulatory activity from sequence. Concrete use cases include annotating chromatin state and candidate enhancers across the genome, predicting gene expression from promoter and regulatory sequence, and scoring the likely functional impact of natural or engineered variants—work directly relevant to breeding, trait dissection, and synthetic promoter design. Because the model transfers to related species, researchers studying crops without their own large functional genomics datasets can use the Arabidopsis or rice backbones as a starting point.
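Genome-wide annotation with a fixed-window model generally means tiling each chromosome into overlapping windows, predicting on each window, and stitching the outputs back together. The sketch below shows only the tiling step. The 2,500 bp window matches the described input length. The 1,250 bp stride, the end-aligned final window, the BED-style output, and the chromosome name "Chr1" are illustrative assumptions and do not describe Deep-Plant's own behavior.

```python
# Hypothetical genome-tiling sketch; stride and tail handling are assumptions.
from typing import Iterator


def tile_windows(chrom_len: int, window: int = 2500, stride: int = 1250) -> Iterator[tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows covering [0, chrom_len).

    The final window is aligned to the chromosome end so the tail is covered.
    A chromosome no longer than `window` yields a single (0, chrom_len) interval,
    which would then be padded to full length before prediction.
    Assumes stride <= window so that consecutive windows leave no gaps.
    """
    if chrom_len <= window:
        yield (0, chrom_len)
        return
    start = 0
    last_start = chrom_len - window
    while start < last_start:
        yield (start, start + window)
        start += stride
    yield (last_start, chrom_len)


def windows_to_bed(chrom: str, chrom_len: int, **kwargs) -> list[str]:
    """Format the tiled windows as tab-separated BED lines (chrom, start, end)."""
    return [f"{chrom}\t{s}\t{e}" for s, e in tile_windows(chrom_len, **kwargs)]


if __name__ == "__main__":
    for line in windows_to_bed("Chr1", 6000):
        print(line)
```

Overlapping windows give each base more than one prediction. A common design choice is to average the overlapping predictions or to keep only each window's central region.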

Impact

By demonstrating that supervised, chromatin-informed pretraining can outperform the fine-tuning of DNA language models on plant regulatory tasks, Deep-Plant offers the plant genomics community an Enformer-style foundation model tailored to its organisms and data. It helps close the gap between the rapidly advancing human regulatory genomics toolkit and the comparatively under-resourced plant field. As a preprint, its benchmark claims await peer review, and downstream adoption will depend on validation across additional species and assays; the open release of weights, data, and tooling lowers the barrier for the community to build on and test the approach.

Citation

Deep-Plant: a supervised foundation model for plant regulatory genomics

Daoud, A., et al. (2026) Deep-Plant: a supervised foundation model for plant regulatory genomics. bioRxiv.

DOI: 10.64898/2026.04.06.716755

Citations

Total citations: 0

Influential: 0

References: 72

GitHub

Stars: 1

Forks: 0

Open issues: 0

Contributors: 1

Last push: 26d ago

Language: Jupyter Notebook

License: Apache-2.0

Openness

bio.rodeo openness: Fully open (usable and reproducible)

Score: 87 (Open)

Usability (can I run it?): 100

Reproducibility (can I retrain it?): 87

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository

Research Paper

Demo

Dataset
