bio.rodeo

DNA & Gene

Deep-Plant

Colorado State University / University of Michigan

A supervised, chromatin-informed foundation model that predicts regulatory activity directly from plant genomic sequence in Arabidopsis and rice.

Released: April 2026

Sequence-to-function deep learning models have transformed regulatory genomics by learning to predict molecular phenotypes directly from DNA sequence, but the vast majority of this progress has concentrated on human and mammalian genomes. Plant regulatory genomics has remained comparatively underexplored, despite its importance for crop improvement and basic plant biology. Deep-Plant, introduced in a 2026 bioRxiv preprint from researchers at Colorado State University and the University of Michigan, addresses this gap with a supervised foundation model trained to predict chromatin state directly from plant genomic sequence.

Rather than following the self-supervised DNA language model paradigm—where a model learns from raw sequence alone—Deep-Plant is trained on a large collection of genome-wide functional experiments. This supervised, chromatin-informed pretraining gives the model biological context beyond the sequence itself, which the authors position as a more practical and effective alternative to fine-tuning general-purpose DNA language models for plants. The design follows the spirit of human models such as Enformer, adapted to the data and species of the plant kingdom.

The pretrained chromatin model serves as a reusable backbone that is then fine-tuned for downstream regulatory tasks. Deep-Plant models are released for Arabidopsis thaliana and rice (Oryza sativa), and the authors report that the models also work as a starting point for related species such as maize (corn).

#Key Features

  • Chromatin-state pretraining: The foundation model is trained to predict chromatin state across tissues and conditions from sequence, using DNA accessibility, transcription factor binding, and histone modification data as supervision.
  • Three downstream tasks: A single backbone is fine-tuned for chromatin state prediction (CSP), gene expression prediction (GEP), and enhancer activity prediction (EAP, available for Arabidopsis).
  • Multi-species coverage: Pretrained models are provided for Arabidopsis and rice, with demonstrated utility as a starting point for sequence modeling in corn.
  • Interpretability: The supervised design supports in-silico mutagenesis (ISM) and variant scoring, enabling identification of regulatory regions and prediction of the effects of sequence variants.
  • Open weights and data: Pretrained weights and training datasets are released under an Apache 2.0 license, with a command-line tool and notebooks for analysis.
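To make the interpretability bullet concrete, the following is a minimal sketch of in-silico mutagenesis: every position is mutated to each alternative base and the change in a predicted score is recorded. It is not the Deep-Plant implementation. The names `in_silico_mutagenesis`, `position_importance`, and `toy_score` are hypothetical, and `toy_score` (GC fraction) is only a stand-in for a real model prediction so the example runs without any weights.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Stand-in predictor (GC fraction). An assumption, NOT the Deep-Plant model."""
    return sum(base in "GC" for base in seq) / max(len(seq), 1)


def in_silico_mutagenesis(
    seq: str, predict: Callable[[str], float]
) -> List[Dict[str, float]]:
    """For each position, map every alternative base to (mutant score - reference score)."""
    seq = seq.upper()
    reference = predict(seq)
    effects = []
    for pos, ref_base in enumerate(seq):
        deltas = {}
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the unchanged base
            mutated = seq[:pos] + alt + seq[pos + 1:]
            deltas[alt] = predict(mutated) - reference
        effects.append(deltas)
    return effects


def position_importance(effects: List[Dict[str, float]]) -> List[float]:
    """Summarize each position by its largest absolute predicted change."""
    return [max(abs(delta) for delta in deltas.values()) for deltas in effects]


effects = in_silico_mutagenesis("ACGT", toy_score)
importance = position_importance(effects)
```

With a real model, `predict` would wrap a forward pass for one output track, and positions with high importance would be candidates for regulatory elements. The cost is three predictions per position, so batching matters for long windows.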

#Technical Details

Deep-Plant is a supervised sequence-to-function model that operates on fixed 2.5 kb input windows; longer sequences are center-cropped and shorter ones are padded to that length. The pretraining objective predicts chromatin state profiles derived from DNA accessibility, transcription factor binding, and histone modification assays, and the resulting representation is fine-tuned for gene expression and enhancer activity readouts. The authors report large improvements in speed, accuracy, and interpretability relative to the alternative approach of fine-tuning self-supervised DNA language models on the same plant tasks. Pretrained weights (~9.9 GB across tasks and species) and training data (~26.5 GB) are distributed via Zenodo, and a command-line tool accepts FASTA sequences, genomic loci, or gene identifiers as input. Exact parameter counts and the full architecture specification are given in the configuration files of the code release and are not summarized here.
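The fixed-window preprocessing described above can be sketched as follows. This is an illustration only: the helper names `fit_to_window` and `one_hot` are hypothetical, and padding symmetrically with `N` and encoding `N` as an all-zero row are assumptions, since the source states only that sequences are center-cropped or padded to 2.5 kb.

```python
from typing import List

WINDOW = 2500  # 2.5 kb input length stated in the text


def fit_to_window(seq: str, window: int = WINDOW, pad_char: str = "N") -> str:
    """Center-crop a long sequence, or pad a short one on both sides (assumed symmetric)."""
    seq = seq.upper()
    if len(seq) >= window:
        start = (len(seq) - window) // 2
        return seq[start:start + window]
    missing = window - len(seq)
    left = missing // 2
    return pad_char * left + seq + pad_char * (missing - left)


def one_hot(seq: str) -> List[List[int]]:
    """Encode A/C/G/T as 4-element rows; any other symbol (e.g. N) becomes all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq:
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


cropped = fit_to_window("ACGT" * 1000)  # 4000 bp input, cropped to the window
padded = fit_to_window("ACGT")          # 4 bp input, padded to the window
encoded = one_hot(padded)
```

The released command-line tool presumably handles this step itself; the sketch is useful mainly for understanding which part of a longer locus the model would actually see.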

#Applications

Deep-Plant is aimed at plant genomicists and crop scientists who need accurate, interpretable predictions of regulatory activity from sequence. Concrete use cases include annotating chromatin state and candidate enhancers across the genome, predicting gene expression from promoter and regulatory sequence, and scoring the likely functional impact of natural or engineered variants—work directly relevant to breeding, trait dissection, and synthetic promoter design. Because the model transfers to related species, researchers studying crops without their own large functional genomics datasets can use the Arabidopsis or rice backbones as a starting point.
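For the variant-scoring use case, one plausible workflow is to cut a model-sized window around the variant, substitute the alternative allele, and compare predictions. The sketch below assumes single-nucleotide substitutions and 0-based coordinates. `window_around`, `score_variant`, and `gc_fraction` are hypothetical names, and `gc_fraction` is a placeholder predictor rather than Deep-Plant itself.

```python
from typing import Callable, Tuple

WINDOW = 2500  # 2.5 kb input length stated in the Technical Details


def window_around(chrom_seq: str, pos: int, window: int = WINDOW) -> Tuple[str, int]:
    """Return a window roughly centered on pos and the offset of pos inside it.

    The window is shifted inward near chromosome ends so it stays full length
    whenever the chromosome is at least `window` bases long.
    """
    start = max(0, min(pos - window // 2, len(chrom_seq) - window))
    return chrom_seq[start:start + window], pos - start


def score_variant(
    chrom_seq: str, pos: int, ref: str, alt: str, predict: Callable[[str], float]
) -> float:
    """Predicted effect of a substitution: score(alt window) - score(ref window)."""
    seq, offset = window_around(chrom_seq.upper(), pos)
    if seq[offset] != ref.upper():
        raise ValueError(f"reference allele mismatch at position {pos}")
    alt_seq = seq[:offset] + alt.upper() + seq[offset + 1:]
    return predict(alt_seq) - predict(seq)


def gc_fraction(seq: str) -> float:
    """Placeholder predictor (an assumption), used only so the sketch runs."""
    return sum(base in "GC" for base in seq) / max(len(seq), 1)


chrom = "A" * 5000
delta = score_variant(chrom, 2500, "A", "G", gc_fraction)
```

Checking the reference allele before scoring catches coordinate or assembly mismatches early, which is a common source of silently wrong variant scores.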

#Impact

By demonstrating that supervised, chromatin-informed pretraining can outperform the fine-tuning of DNA language models on plant regulatory tasks, Deep-Plant offers the plant genomics community an Enformer-style foundation model tailored to its organisms and data. It helps close the gap between the rapidly advancing human regulatory genomics toolkit and the comparatively under-resourced plant field. As a preprint, its benchmark claims await peer review, and downstream adoption will depend on validation across additional species and assays; the open release of weights, data, and tooling lowers the barrier for the community to build on and test the approach.

Tags

gene_expression, variant_effect_prediction, enhancer_prediction, cnn, foundation_model, supervised, transfer_learning, chromatin, genomics