bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
DNA & Gene

BOTANIC-0

Living Models

Family of plant genomic foundation models (0.1B-1B params) pretrained on 43 phylogenetically diverse plant genomes for regulatory, expression, and variant tasks.

Released: February 2026
Parameters: up to ~1 billion (Botanic0-L; the family spans ~0.1B to ~1B)

BOTANIC-0 is a family of genomic foundation models built specifically for plant biology. Most large DNA language models are trained predominantly on human, mammalian, or microbial genomes, leaving crop and plant genomics comparatively underserved despite its importance for food security, climate resilience, and agricultural biotechnology. BOTANIC-0 targets this gap by pretraining on a broad, phylogenetically diverse panel of plant genomes.

The models were developed by Living Models, a Franco-American startup (Paris and Berkeley) building foundation models for living systems, and were released alongside the company's emergence from stealth in early 2026 (preprint on bioRxiv, February 2026). BOTANIC-0 is offered in three sizes: Botanic0-S (~0.1B parameters), Botanic0-M (~0.3B), and Botanic0-L (~1B). Together they form the first generation of a longer-term research program on genotype-to-phenotype modeling and sequence-based genome editing.

Despite a modest training budget, the models reach performance competitive with state-of-the-art genomic foundation models across a suite of plant genomic and genetic prediction tasks, in both zero-shot and fine-tuned settings. The weights are openly available on Hugging Face for the research community.

#Key Features

  • Plant-specialized pretraining: Trained on nuclear genome assemblies from 43 phylogenetically diverse plant species, capturing sequence patterns specific to plant regulatory and coding regions.
  • Three model sizes: Released as S (~0.1B), M (~0.3B), and L (~1B) parameter variants, enabling a compute-versus-accuracy tradeoff and scaling analysis.
  • Broad task coverage: Evaluated on regulatory element annotation, gene expression inference, and variant effect prediction, with strong results both zero-shot and after fine-tuning across roughly 22 benchmark tasks.
  • Efficient and open: Trained on a small GPU footprint (reportedly eight NVIDIA H100 GPUs) and released as open weights on Hugging Face under a research license.

#Technical Details

BOTANIC-0 uses an encoder-only transformer pretrained with masked language modeling (15% masking) over a 6-mer DNA tokenizer with a vocabulary of 4,105 tokens. The largest variant, Botanic0-L, has roughly 1B parameters, with a hidden size of 1,500, 40 layers, 20 attention heads, an intermediate size of 5,120, and a maximum sequence length of 1,026 tokens (approximately 6,156 base pairs of DNA per context, at 6 bp per token). Pretraining data comprise nuclear genome assemblies from 43 plant species selected for phylogenetic diversity. Across the reported benchmark suite, the models are competitive with state-of-the-art genomic foundation models, and the scaling analyses show consistent improvements in predictive power with increased model capacity.
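
The Botanic0-L figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes a standard transformer encoder block (four attention projection matrices plus a two-matrix feed-forward layer), ignores biases, layer norms, and any output head, and treats the tokenizer as non-overlapping 6-mers, which is what 1,026 tokens for roughly 6,156 bp implies. The `EncoderConfig` class and the function names are hypothetical and are not taken from any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hyperparameters quoted for Botanic0-L (names are illustrative)."""
    vocab_size: int = 4_105
    hidden_size: int = 1_500
    num_layers: int = 40
    num_heads: int = 20
    intermediate_size: int = 5_120
    max_tokens: int = 1_026
    kmer: int = 6


def estimate_parameters(cfg: EncoderConfig) -> int:
    """Rough weight count, ASSUMING a vanilla encoder block.

    Biases, layer norms, positional parameters, and the MLM head are ignored.
    """
    # Q, K, V, and output projections: four hidden x hidden matrices.
    attention = 4 * cfg.hidden_size ** 2
    # Assumed two-matrix feed-forward layer (up-projection and down-projection).
    feed_forward = 2 * cfg.hidden_size * cfg.intermediate_size
    # Token embedding table.
    embeddings = cfg.vocab_size * cfg.hidden_size
    return cfg.num_layers * (attention + feed_forward) + embeddings


def context_base_pairs(cfg: EncoderConfig) -> int:
    """Base pairs covered if every token is one non-overlapping k-mer."""
    return cfg.max_tokens * cfg.kmer


cfg = EncoderConfig()
total_params = estimate_parameters(cfg)
head_dim = cfg.hidden_size // cfg.num_heads
context_bp = context_base_pairs(cfg)
kmer_vocab = 4 ** cfg.kmer  # number of distinct 6-mers over A, C, G, T

print(f"estimated parameters: {total_params:,}")
print(f"per-head dimension:   {head_dim}")
print(f"context length:       {context_bp:,} bp")
print(f"non-k-mer tokens:     {cfg.vocab_size - kmer_vocab}")
```

Under these assumptions the estimate comes to about 0.98B parameters, in line with the "roughly 1B" figure. A gated feed-forward layer (three matrices instead of two) would push the estimate to about 1.29B, so the exact block design matters. The 4,096 possible 6-mers leave 9 vocabulary entries unaccounted for, presumably special or fallback tokens, though the source does not itemize them.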

#Applications

BOTANIC-0 is aimed at plant and crop scientists working on genotype-to-phenotype prediction, identification of genetic markers for traits such as disease resistance and climate resilience, regulatory element annotation, and variant effect prediction. Because the weights are openly available, researchers can extract embeddings, fine-tune on their own crop datasets, or use the models for zero-shot scoring within breeding and functional-genomics pipelines.
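
As an illustration of what zero-shot scoring can look like with a masked language model, the sketch below computes a log-likelihood ratio between the reference and alternate alleles at a masked position. This is a common scoring recipe for masked DNA models in general; the source does not confirm that BOTANIC-0's evaluations use it. No model is loaded here: the probability table is made up for the example, and the helper names are hypothetical.

```python
import math
from typing import Mapping


def substitute_base(kmer: str, offset: int, alt_base: str) -> str:
    """Return the k-mer token with one base replaced by the variant allele."""
    if not 0 <= offset < len(kmer):
        raise ValueError("offset must fall inside the k-mer")
    return kmer[:offset] + alt_base + kmer[offset + 1:]


def log_likelihood_ratio(
    token_probs: Mapping[str, float], ref_token: str, alt_token: str
) -> float:
    """log p(alt) - log p(ref) at a masked position.

    Negative values mean the model finds the alternate allele less
    plausible than the reference in this sequence context.
    """
    p_ref = token_probs[ref_token]
    p_alt = token_probs[alt_token]
    if p_ref <= 0.0 or p_alt <= 0.0:
        raise ValueError("probabilities must be positive")
    return math.log(p_alt) - math.log(p_ref)


# Toy stand-in for a model's predicted distribution over 6-mer tokens at
# the masked position (ASSUMED values, not real model output).
masked_position_probs = {
    "ACGTAC": 0.60,  # reference 6-mer
    "ACGTGC": 0.15,  # same 6-mer carrying an A>G substitution at offset 4
    "ACGTTC": 0.05,
}

ref_token = "ACGTAC"
alt_token = substitute_base(ref_token, offset=4, alt_base="G")
score = log_likelihood_ratio(masked_position_probs, ref_token, alt_token)
print(f"{ref_token} -> {alt_token}: LLR = {score:.3f}")
```

In a real pipeline the probability table would come from the model's output at the masked token that contains the variant, and because each token spans six bases, a single-nucleotide variant changes exactly one 6-mer token.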

#Impact

BOTANIC-0 is among the first openly released foundation-model families dedicated to plant genomics, addressing a domain that has lagged behind human and microbial genomic modeling. By demonstrating competitive performance at modest compute and releasing open weights in three sizes, it lowers the barrier for plant-genomics groups to adopt foundation-model methods. Because it is a research-licensed release described in a preprint, its results still need broader validation across crops and tasks, but it establishes a practical baseline for plant sequence modeling.

Tags

variant_effect_prediction, gene_expression, regulatory_element_annotation, transformer, foundation_model, self_supervised, plant_genomics, dna