Beijing Institute of Genomics / Chinese Academy of Sciences
Self-supervised generative foundation model jointly trained on 91.7M nucleotide sequences and structured annotations spanning 1.076 trillion bases, achieving state-of-the-art results on 23 nucleotide-language benchmarks.
OmniNA is a generative DNA foundation model published in Nucleic Acids Research in April 2026. Developed at the Beijing Institute of Genomics under the Chinese Academy of Sciences, OmniNA is trained on 91.7 million nucleotide sequences spanning 1.076 trillion bases, jointly learning over raw sequences and their structured annotations (gene names, species, functional descriptions, ontology terms). This joint training objective bridges raw sequence modeling with semantic annotation learning in a single foundation model.
OmniNA achieves state-of-the-art results across 23 benchmarks covering sequence detection, species classification, and mutation-effect prediction, outperforming prior DNA language models including DNABERT-2, Nucleotide Transformer, and Caduceus on most evaluated tasks.
OmniNA uses a transformer-based architecture trained with a self-supervised next-token prediction objective over an interleaved sequence+annotation stream. Annotation tokens are introduced into the training corpus alongside nucleotide tokens, allowing the model to learn cross-modal correspondences between sequence and metadata.
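The interleaved training stream can be sketched as follows. This is a minimal illustration of the idea, not the paper's actual tokenizer: the special tokens (`<seq>`, `<ann>`, `<eos>`) and the per-base / per-word token granularity are assumptions for the sketch.

```python
# Sketch of an interleaved sequence+annotation training example (hypothetical
# token scheme; OmniNA's actual vocabulary and delimiters may differ).
def build_example(sequence: str, annotation: str) -> list[str]:
    """Interleave nucleotide tokens with annotation tokens in one stream.

    A causal LM trained with next-token prediction over such streams must
    model both the nucleotide sequence and its annotation, and therefore
    the correspondence between them.
    """
    seq_tokens = list(sequence.upper())        # per-base tokens: A/C/G/T
    ann_tokens = annotation.lower().split()    # word-level annotation tokens
    return ["<seq>"] + seq_tokens + ["<ann>"] + ann_tokens + ["<eos>"]

example = build_example("ATGGCC", "promoter homo sapiens")
# one flat stream: ['<seq>', 'A', 'T', 'G', 'G', 'C', 'C',
#                   '<ann>', 'promoter', 'homo', 'sapiens', '<eos>']
```

Because annotation tokens appear after the sequence in the same stream, the standard next-token objective doubles as a supervised annotation objective with no architectural changes.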
Training used standard transformer infrastructure; hyperparameters, ablations, and full benchmark protocols are detailed in the published paper. Evaluation spans 23 downstream tasks, including TF binding-site detection, promoter classification, splice-site identification, taxonomic classification, and missense variant pathogenicity prediction.
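One common way a language model supports mutation-effect prediction is to score a variant by the log-likelihood ratio of the mutant sequence versus the reference. The sketch below illustrates that scoring scheme only; the independent-base probability table is a toy stand-in for a trained model's conditional probabilities, and the source does not specify that OmniNA uses exactly this formulation.

```python
import math

# Toy stand-in for a trained model's per-base probabilities.
BASE_PROBS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def seq_log_likelihood(seq: str) -> float:
    """Log-likelihood of a sequence under the toy independent-base model."""
    return sum(math.log(BASE_PROBS[base]) for base in seq)

def variant_effect_score(ref_seq: str, pos: int, alt_base: str) -> float:
    """Score a single-base variant as log P(mutant) - log P(reference).

    Negative scores mean the mutant is less likely than the reference,
    a common proxy for deleteriousness in LM-based variant scoring.
    """
    mut_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return seq_log_likelihood(mut_seq) - seq_log_likelihood(ref_seq)
```

With a real foundation model, `seq_log_likelihood` would sum the model's per-token conditional log-probabilities instead of a fixed table.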
OmniNA is suited for genomics researchers building automated annotation pipelines, variant interpretation workflows, and species classification tools. The joint sequence-annotation training is particularly useful when annotation data is sparse and the model must propagate semantic information from related sequences. The generative annotation capability supports rapid functional hypothesis generation for previously uncharacterized sequences.
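Generative annotation of an uncharacterized sequence amounts to conditioning on the nucleotide tokens and decoding annotation tokens. The sketch below shows greedy decoding over that interface; `next_token_probs` is a stub standing in for the trained model's conditional distribution, and the `<seq>`/`<ann>`/`<eos>` tokens are illustrative assumptions.

```python
def next_token_probs(tokens: list[str]) -> dict[str, float]:
    """Stub for a trained model's next-token distribution."""
    if tokens[-1] == "<ann>":
        return {"promoter": 0.9, "<eos>": 0.1}
    return {"<eos>": 1.0}

def annotate(sequence: str, max_new_tokens: int = 8) -> str:
    """Greedily decode annotation tokens conditioned on a nucleotide prompt."""
    tokens = ["<seq>"] + list(sequence.upper()) + ["<ann>"]
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # greedy: pick the argmax token
        if token == "<eos>":
            break
        tokens.append(token)
    # return only the generated annotation, after the <ann> delimiter
    return " ".join(tokens[tokens.index("<ann>") + 1:])
```

In practice one would replace the stub with the model's forward pass and could use sampling or beam search instead of greedy argmax to generate multiple candidate annotations per sequence.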
OmniNA advances the state of the art in DNA foundation modeling by integrating semantic annotation learning into the pretraining objective rather than treating annotation as a downstream prediction target. The 23-benchmark sweep demonstrates broad applicability and competitive performance against narrowly specialized prior models. Published in Nucleic Acids Research with open weights, OmniNA is well-positioned for adoption in academic genomics workflows.
Shen, X., et al. (2026). A foundation model for nucleotide sequences. Nucleic Acids Research.
DOI: 10.1093/nar/gkag083