A 7B-parameter genomic foundation model using the StripedHyena architecture to model prokaryotic DNA, RNA, and proteins at single-nucleotide resolution with a 131k-token context window.
Evo is a 7-billion-parameter genomic foundation model developed by the Arc Institute in collaboration with Hazy Research and Together AI, described in a Science paper published in November 2024. The model was built to address a fundamental limitation of existing biological language models: their inability to process DNA at the raw nucleotide level across the full range of scales at which genomic information operates, from individual codons to entire bacterial chromosomes. Evo operates at single-nucleotide, byte-level resolution, processing up to 131,072 nucleotides in a single context window, which lets it model the long-range dependencies that govern genome organization and function.
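To make single-nucleotide, byte-level resolution concrete, the sketch below shows the kind of tokenizer this implies: each base maps to one token, so a 131,072-token window spans 131,072 bp. This is an illustrative simplification, not Evo's actual tokenizer code; the `tokenize`/`detokenize` helpers are hypothetical.

```python
# Minimal sketch of byte-level DNA tokenization: one base = one token,
# no k-mers, no BPE merges. Illustrative only; Evo's released tokenizer
# handles vocabulary details (special tokens, padding) differently.
def tokenize(seq: str) -> list[int]:
    return [ord(base) for base in seq.upper()]  # ASCII byte value per base

def detokenize(ids: list[int]) -> str:
    return "".join(chr(i) for i in ids)

ids = tokenize("ATGCGTAA")
assert detokenize(ids) == "ATGCGTAA"
print(ids)  # [65, 84, 71, 67, 71, 84, 65, 65]
```

Because there is no k-mer vocabulary, every point mutation changes exactly one token, which is what makes single-nucleotide effect scoring possible downstream.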
The central insight motivating Evo is that the three molecular layers of the central dogma — DNA, RNA, and protein — are not independent. Gene regulatory elements, non-coding RNAs, and protein-coding sequences are physically and evolutionarily intertwined within whole genomes. By training on 2.7 million complete prokaryotic and phage genomes (300 billion nucleotide tokens, compiled into a dataset called OpenGenome), Evo learns the statistical relationships across all three modalities simultaneously, without requiring separate tokenization schemes or modality-specific fine-tuning. This enables genuine multimodal zero-shot inference from a single genomic model.
Evo was trained using a next-token prediction objective on raw DNA sequences and evaluated across an unusually diverse set of tasks, including bacterial protein mutation-effect prediction, non-coding RNA fitness, promoter activity, gene essentiality, and generative design of multi-element CRISPR systems and transposable elements. The accompanying work provided the first experimental validation of AI-generated protein-RNA and protein-DNA co-design, a qualitative advance over prior generative biology tools.
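The objective itself is the standard causal language-modeling loss applied to nucleotides: predict token i from tokens 1 through i-1 and average the cross-entropy. A hedged sketch follows; the toy model and the `next_token_loss` helper are illustrative stand-ins, not Evo's training code.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    # Shift inputs/targets by one: the model predicts each next nucleotide.
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

# Toy stand-in model over a 4-letter nucleotide vocabulary {A, C, G, T}.
toy = torch.nn.Sequential(torch.nn.Embedding(4, 16), torch.nn.Linear(16, 4))
batch = torch.randint(0, 4, (2, 128))  # two sequences of 128 nt
print(next_token_loss(toy, batch).item())  # roughly log(4) ≈ 1.39 at init
```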
Evo is built on the StripedHyena architecture, which alternates Hyena operators (input-dependent long convolution filters that efficiently capture local and global sequence context) with sparse multi-head attention layers. The stack comprises 32 blocks at a model width of 4,096 dimensions. Hyena layers apply compositions of short and long convolution filters in a data-controlled manner, making them effective at aggregating nucleotide sequences into higher-order motifs while filtering noise. The three attention layers (3 of the 32 blocks, roughly 10%) retain the capacity for precise long-range token interactions that benefit certain genomic signals.
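For intuition about the Hyena operator, here is a toy sketch of a data-controlled long convolution: a filter as long as the sequence, applied via FFT and gated elementwise by an input-dependent projection. Real Hyena operators use implicitly parameterized filters, multiple orders, and short-convolution mixing; every name below is hypothetical and none of it comes from the Evo codebase.

```python
import torch

def hyena_like(x, filt, gate_proj):
    # x: (batch, seq_len, dim); filt: (seq_len, dim) long-convolution filter.
    L = x.shape[1]
    X = torch.fft.rfft(x, n=2 * L, dim=1)      # zero-pad so the conv is causal
    H = torch.fft.rfft(filt, n=2 * L, dim=0)
    y = torch.fft.irfft(X * H, n=2 * L, dim=1)[:, :L]  # linear convolution
    return y * torch.sigmoid(gate_proj(x))     # data-controlled gating

B, L, D = 2, 1024, 64
x = torch.randn(B, L, D)
y = hyena_like(x, torch.randn(L, D) / L, torch.nn.Linear(D, D))
print(y.shape)  # torch.Size([2, 1024, 64])
```

The FFT route is what makes the operator subquadratic in sequence length, O(L log L) versus O(L^2) for dense attention, which is why only 3 of the 32 blocks need full attention.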
Training used the OpenGenome dataset: 2.7 million raw prokaryotic and phage genome sequences totaling 300 billion nucleotide tokens. Eukaryotic and human sequences were intentionally excluded for biosafety reasons. A two-stage training approach was used, first pretraining at shorter context lengths and then extending to 131k tokens, a strategy borrowed from large language model context extension methods. The model was trained in collaboration with Together AI using distributed GPU infrastructure.
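A hedged sketch of that two-stage schedule is below. The stage-1 length of 8,192 matches the separately released 8k-context checkpoint; everything else here is a placeholder, not the published training configuration. Context extension for the attention layers typically also requires adapting the positional encoding, as in LLM rotary-scaling methods.

```python
# Illustrative two-stage context-extension schedule (values are assumptions,
# not the paper's actual hyperparameters).
TRAINING_STAGES = [
    {"name": "pretrain",       "context_len": 8_192},
    {"name": "context-extend", "context_len": 131_072},
]
for stage in TRAINING_STAGES:
    print(f"{stage['name']}: sequences packed to {stage['context_len']:,} tokens")
```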
Reported benchmarks include gene essentiality prediction at 0.90 AUROC on lambda phage data, a mean promoter-activity correlation of 0.43 across independent studies, and zero-shot mutation-effect predictions competitive with ESM-1v and other protein-specific models on bacterial deep mutational scanning datasets.
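The zero-shot protocol behind numbers like these is, at its core, likelihood comparison: score mutant and wild-type sequences under the model and take the difference. A hedged sketch follows, with `model` and `tokenize` as stand-ins for any causal genomic LM; it mirrors the general approach rather than the paper's exact evaluation code.

```python
import torch

@torch.no_grad()
def log_likelihood(model, tokenize, seq: str) -> float:
    # Mean per-nucleotide log-probability of `seq` under a causal LM.
    ids = torch.tensor([tokenize(seq)])
    logp = torch.log_softmax(model(ids[:, :-1]), dim=-1)
    token_logp = logp.gather(-1, ids[:, 1:, None]).squeeze(-1)
    return token_logp.mean().item()

def mutation_effect(model, tokenize, wild_type: str, mutant: str) -> float:
    # > 0: the mutant looks more "natural" to the model; < 0: disfavored.
    return (log_likelihood(model, tokenize, mutant)
            - log_likelihood(model, tokenize, wild_type))
```

Length normalization (the mean rather than the sum) keeps scores comparable when insertions or deletions change sequence length.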
Evo is suited for research at the intersection of genome biology and generative AI. Microbiologists can use the model's zero-shot scoring to prioritize variants in bacterial genetic screens or to predict the fitness consequences of mutations in non-model organisms lacking deep mutational scanning data. Synthetic biologists can use Evo's generative capabilities to design novel CRISPR systems, regulatory circuits, or transposable element-derived delivery vehicles. Genomics researchers can leverage the model's learned representations for tasks such as promoter activity prediction, essential gene identification, or genomic element annotation without labeled training data. Because Evo operates at the whole-genome scale, it is particularly valuable for studying genomic context effects — regulatory interactions, gene synteny, and co-evolutionary constraints — that single-gene or single-molecule models cannot capture.
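For the zero-shot and generative uses above, a minimal loading sketch is shown below. The repository id, the AutoTokenizer support, and the standard `.logits` output convention are assumptions based on the Together AI release; consult the model card for the exact incantation (StripedHyena ships as remote code, hence `trust_remote_code=True`).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HuggingFace repo id for the 131k-context base model; verify on the hub.
model_id = "togethercomputer/evo-1-131k-base"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

ids = tok("ATGGCGTTAACC", return_tensors="pt")["input_ids"]
with torch.no_grad():
    out = model(ids)
print(out.logits.shape)  # (1, sequence length in nucleotides, vocab size)
```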
Evo established that long-context, byte-level genomic models using non-transformer architectures can match or exceed domain-specific models trained exclusively on proteins or RNA. Its experimental validation of generative CRISPR and transposon design marked the first time a single language model was used to co-design interacting DNA and protein components, a result that attracted broad attention across the synthetic biology and genomics communities. The model weights are publicly released on HuggingFace via Together AI, lowering the barrier for academic groups to apply large-scale genomic modeling to their research. A limitation of Evo (v1) is its restriction to prokaryotic and phage genomes; eukaryotic biology, including human genetics and gene regulation in higher organisms, falls outside its training distribution. This scope was directly addressed in the subsequent Evo 2 model, which extended training to genomes from all domains of life.
Nguyen, E., et al. (2024). Sequence modeling and design from molecular to genome scale with Evo. Science.
DOI: 10.1126/science.ado9336