bio.rodeo

DNA & Gene

OmniNA

Beijing Institute of Genomics / Chinese Academy of Sciences

Self-supervised generative foundation model trained jointly on 91.7M nucleotide sequences (1.076 trillion bases) and their structured annotations, reporting state-of-the-art results on 23 nucleotide-language benchmarks.

Released: 2026

Overview

OmniNA is a generative DNA foundation model published in Nucleic Acids Research in April 2026. Developed at the Beijing Institute of Genomics, Chinese Academy of Sciences, OmniNA is trained on 91.7 million nucleotide sequences spanning 1.076 trillion bases and learns jointly from the raw sequences and their structured annotations (gene names, species, functional descriptions, ontology terms). This joint training objective bridges raw sequence modeling and semantic annotation learning within a single foundation model.
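Dividing the two reported corpus totals gives a rough sense of scale: the mean record is on the order of 11.7 kb. This is plain arithmetic on the figures above and says nothing about the length distribution, which the totals alone do not describe.

```latex
\bar{L} \;=\; \frac{1.076 \times 10^{12}\ \text{bases}}{9.17 \times 10^{7}\ \text{sequences}} \;\approx\; 1.17 \times 10^{4}\ \text{bases per sequence}
```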

OmniNA achieves state-of-the-art results across 23 benchmarks covering sequence detection, species classification, and mutation-effect prediction, outperforming prior DNA language models including DNABERT-2, Nucleotide Transformer, and Caduceus on most evaluated tasks.

Key Features

  • Joint sequence-annotation training: Co-trained on nucleotide sequences and their structured metadata, allowing the model to bridge raw sequence and semantic annotation tasks within one set of weights.
  • Massive training corpus: 91.7M sequences totaling 1.076 trillion bases from broad public genomic resources, with annotations sourced from RefSeq, Ensembl, and ontology databases.
  • State-of-the-art on 23 benchmarks: Including sequence detection (taxonomic and functional), species classification, mutation-effect prediction, and regulatory-element identification.
  • Generative annotation: Can generate plausible functional annotations conditioned on raw sequence inputs.
  • Open access via NAR: Published in Nucleic Acids Research with code and model weights available.
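The generative annotation feature (sequence in, annotation out) can be pictured as ordinary autoregressive decoding: the sequence tokens plus an annotation marker form the prompt, and the model extends it token by token. The sketch below is a minimal illustration under stated assumptions: the `<ann>` marker convention, greedy decoding, and the names `greedy_annotate` and `toy_logits` are all hypothetical, and OmniNA's actual prompt format and decoding strategy (greedy, beam search, or sampling) are not described here.

```python
from typing import Callable, List, Sequence

# Stand-in for a trained model (hypothetical interface): given the token ids
# so far, return one score per vocabulary entry.
LogitsFn = Callable[[Sequence[int]], List[float]]


def greedy_annotate(prompt: List[int], next_token_logits: LogitsFn,
                    eos_id: int, max_new_tokens: int = 32) -> List[int]:
    """Greedily extend the prompt until <eos> or the token budget is used up.

    Returns only the newly generated ids, i.e. the predicted annotation.
    """
    ids = list(prompt)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        scores = next_token_logits(ids)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if best == eos_id:
            break
        ids.append(best)
        generated.append(best)
    return generated


# Toy vocabulary: 0=<eos>, 1=<ann>, 2="promoter", 3="region", 4..7 = A, C, G, T.
def toy_logits(ids: Sequence[int]) -> List[float]:
    """Scripted 'model': continues <ann> -> promoter -> region -> <eos>, else emits <eos>."""
    follow = {1: 2, 2: 3, 3: 0}
    target = follow.get(ids[-1], 0)
    return [1.0 if i == target else 0.0 for i in range(8)]


if __name__ == "__main__":
    prompt = [4, 7, 4, 7, 1]  # "ATAT" followed by the <ann> marker
    print(greedy_annotate(prompt, toy_logits, eos_id=0))  # [2, 3]
```

Returning only the generated ids keeps the prompt and the annotation separate, which makes it easy to detokenize just the predicted annotation downstream.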

Technical Details

OmniNA uses a transformer-based architecture trained with a self-supervised next-token prediction objective over an interleaved stream of sequence and annotation tokens. Annotation tokens are placed in the training corpus alongside nucleotide tokens, so the model can learn cross-modal correspondences between sequence and metadata.
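The interleaved training stream can be sketched as follows: each record becomes one token sequence containing the bases followed by the annotation words, and next-token prediction pairs are obtained by shifting that sequence by one position. This is a minimal sketch under stated assumptions: the special tokens (`<bos>`, `<seq>`, `<ann>`, `<eos>`), per-base tokenization, whitespace-split annotation words, and the helper names are hypothetical and not OmniNA's actual tokenizer or record layout.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens and per-base tokenization; OmniNA's actual
# vocabulary and record layout are not specified in this summary.
SPECIAL = ["<pad>", "<bos>", "<eos>", "<seq>", "<ann>"]
NUCLEOTIDES = ["A", "C", "G", "T", "N"]


def build_vocab(annotation_texts: List[str]) -> Dict[str, int]:
    """Assign integer ids to special tokens, bases, then lowercase annotation words."""
    vocab: Dict[str, int] = {}
    for tok in SPECIAL + NUCLEOTIDES:
        vocab.setdefault(tok, len(vocab))
    for text in annotation_texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def interleave(sequence: str, annotation: str, vocab: Dict[str, int]) -> List[int]:
    """Encode one record as: <bos> <seq> bases... <ann> words... <eos>."""
    tokens = ["<bos>", "<seq>", *sequence.upper(), "<ann>"]
    tokens += annotation.lower().split()
    tokens.append("<eos>")
    return [vocab[t] for t in tokens]  # KeyError on bases/words outside the vocab


def next_token_pairs(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Shift by one position: the model reads ids[:-1] and is trained to predict ids[1:]."""
    return ids[:-1], ids[1:]


if __name__ == "__main__":
    annotation = "Homo sapiens TP53 exon"
    vocab = build_vocab([annotation])
    ids = interleave("ATGGCC", annotation, vocab)
    inputs, targets = next_token_pairs(ids)
    print(inputs)
    print(targets)
```

Placing the annotation after the sequence means its tokens are predicted conditioned on the bases; whether OmniNA uses this order, the reverse, or both is an assumption of this sketch.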

Training used a standard transformer setup; hyperparameters, ablations, and full benchmark results are reported in the published paper. Evaluation spans 23 downstream tasks, including TF binding-site detection, promoter classification, splice-site identification, taxonomic classification, and missense variant pathogenicity prediction.
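One common zero-shot way to score mutation effects with a generative sequence model is the log-likelihood ratio between the variant and reference sequences. Whether OmniNA's mutation-effect and pathogenicity benchmarks use this scoring or fine-tuned classifier heads is not stated here, so treat this as a generic illustration; the `NextBaseProbs` interface, `variant_llr`, and the toy `sticky_model` are hypothetical.

```python
import math
from typing import Callable, Dict

# Hypothetical interface: probability of each next base given the preceding bases.
# A trained left-to-right model would supply this; here it is a plain callable.
NextBaseProbs = Callable[[str], Dict[str, float]]


def log_likelihood(seq: str, next_base_probs: NextBaseProbs) -> float:
    """Sum of log P(base_i | bases before i) under a left-to-right model."""
    return sum(math.log(next_base_probs(seq[:i])[base]) for i, base in enumerate(seq))


def variant_llr(ref_seq: str, pos: int, alt: str, next_base_probs: NextBaseProbs) -> float:
    """log P(alt sequence) - log P(ref sequence) for a single-base substitution.

    Negative scores mean the model finds the variant sequence less likely than the reference.
    """
    if ref_seq[pos] == alt:
        raise ValueError("alt allele must differ from the reference base")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return log_likelihood(alt_seq, next_base_probs) - log_likelihood(ref_seq, next_base_probs)


def sticky_model(prefix: str) -> Dict[str, float]:
    """Toy model that expects each base to repeat the previous one."""
    if not prefix:
        return {b: 0.25 for b in "ACGT"}
    return {b: (0.7 if b == prefix[-1] else 0.1) for b in "ACGT"}


if __name__ == "__main__":
    print(variant_llr("AAAA", 2, "C", sticky_model))  # 2 * log(0.1 / 0.7), about -3.89
```

For a causal model the bases before `pos` contribute identically to both terms, so a practical implementation would only score positions from `pos` onward; the full sum is kept here for clarity.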

Applications

OmniNA is suited for genomics researchers building automated annotation pipelines, variant interpretation workflows, and species classification tools. The joint sequence-annotation training should be most useful when annotation data is sparse, since the model can propagate semantic information from related sequences. The generative annotation capability supports rapid functional hypothesis generation for previously uncharacterized sequences.
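The idea of propagating annotation from related sequences can be illustrated with a simple nearest-neighbour label transfer over sequence embeddings. This is a swapped-in technique for illustration, not a pipeline described in the OmniNA paper: it assumes some fixed-length embedding per sequence (for example, mean-pooled hidden states), and `propagate_label` and the example vectors are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_label(query: Sequence[float],
                    annotated: List[Tuple[Sequence[float], str]],
                    k: int = 3) -> Tuple[str, float]:
    """Majority vote over the k annotated embeddings most similar to the query.

    Returns (label, fraction of the k neighbours that voted for it).
    """
    if not annotated:
        raise ValueError("need at least one annotated reference sequence")
    ranked = sorted(annotated, key=lambda item: cosine(query, item[0]), reverse=True)
    top = [label for _, label in ranked[:k]]
    label, votes = Counter(top).most_common(1)[0]
    return label, votes / len(top)


if __name__ == "__main__":
    # Made-up 2-D vectors standing in for per-sequence model embeddings.
    references = [
        ([1.0, 0.0], "promoter"),
        ([0.9, 0.1], "promoter"),
        ([0.0, 1.0], "enhancer"),
        ([0.1, 0.9], "enhancer"),
    ]
    print(propagate_label([1.0, 0.05], references, k=3))
```

Returning the vote fraction alongside the label gives a crude confidence score that an annotation pipeline could use to route low-agreement predictions to manual review.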

Impact

OmniNA advances the state of the art in DNA foundation modeling by integrating semantic annotation learning into the pretraining objective rather than treating annotation as a downstream prediction target. The 23-benchmark sweep demonstrates broad applicability and competitive performance against narrowly specialized prior models. Published in Nucleic Acids Research with open weights, OmniNA is well-positioned for adoption in academic genomics workflows.

Citation

A foundation model for nucleotide sequences.

Shen, X., et al. (2026). A foundation model for nucleotide sequences. Nucleic Acids Research.

DOI: 10.1093/nar/gkag083

Metrics

Citations

Total citations: 1
Influential citations: 0
References: 50

Tags

sequence detection, species classification, mutation effect prediction, genomic annotation, transformer, self-supervised, foundation model, DNA, genome, nucleotide

Resources

Research Paper