EDEN

Metagenomic foundation model trained on 9.7 trillion nucleotide tokens for generative therapeutic design across genes, peptides, and microbiomes.

Released: January 2026

Parameters: 28 Billion

EDEN (Environmentally-Derived Evolutionary Network) is a family of metagenomic foundation models from Basecamp Research, introduced in a January 2026 preprint. Its flagship is a 28-billion-parameter model trained on 9.7 trillion nucleotide tokens drawn from BaseData1, a proprietary dataset that — at the time of training — contained more than 10 billion novel genes from over 1 million new species. The dataset is intentionally enriched for environmental and host-associated metagenomes, phage sequences, and mobile genetic elements, so the model learns from cross-species evolutionary mechanisms that are largely absent from public sequence repositories.
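For a rough sense of scale, the two headline figures above imply a training-token-to-parameter ratio. The arithmetic is sketched below. The constant names are illustrative, and the ratio is derived here from the stated numbers, not taken from the preprint.

```python
# Headline figures as stated in the model description.
PARAMETERS = 28e9          # 28 billion parameters
TRAINING_TOKENS = 9.7e12   # 9.7 trillion nucleotide tokens

# Derived quantity (not reported in the source): tokens seen per parameter.
tokens_per_parameter = TRAINING_TOKENS / PARAMETERS

print(f"{tokens_per_parameter:.0f} training tokens per parameter")
```

This works out to roughly 346 nucleotide tokens per parameter.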

The central thesis is that dramatically expanding the diversity of biology a model learns from — rather than overfitting to a handful of model organisms — unlocks new scaling behavior and turns therapeutic design into a more predictable engineering discipline. EDEN is positioned alongside other genomic foundation models such as Evo and the Nucleotide Transformer family, but distinguishes itself through the scale and novelty of its training corpus and an explicit focus on generative therapeutic design across multiple modalities and biological scales.

To demonstrate generality, the authors challenge a single architecture to design biological novelty across three distinct modalities: large gene insertion, antibiotic peptide design, and microbiome design. Weights are not publicly released; EDEN is a commercial model developed and deployed internally by Basecamp Research.

Key Features

Evolution-scale training corpus: Trained on 9.7T nucleotide tokens from BaseData1, enriched for metagenomes, phages, and mobile genetic elements that capture rare cross-species evolutionary signal.
Programmable gene insertion: Designs de novo large serine recombinases (LSRs) prompted on only 30 bp of target DNA, with a reported 63.2% overall functional hit rate on out-of-distribution prompts.
Antimicrobial peptide design: Generated a focused library of novel antimicrobial peptides in which 97% showed activity, with top candidates reaching single-digit micromolar potency against critical-priority multidrug-resistant pathogens.
Microbiome-scale generation: Produced a gigabase-scale synthetic microbiome of over 94,000 metagenomic assemblies spanning 9,067 species with 99% biome-specific taxonomic accuracy.
Multi-modal therapeutic design: A single foundation-model architecture spans DNA, protein, and inter-genomic scales rather than relying on bespoke, task-specific pipelines.

Technical Details

EDEN is a transformer-based metagenomic language model; the flagship checkpoint has 28 billion parameters and was trained on 9.7 trillion nucleotide tokens. The authors report state-of-the-art performance across a range of predictive and generative genomic and protein benchmarks. In low-N experimental validation, EDEN-generated recombinases were active across ten disease-associated loci (including ATM, DMD, F9, and USH2A) and four candidate safe-harbor sites. Roughly 50% of generated LSRs were active in human cells, and the authors report therapeutically relevant CAR insertion in primary human T cells. The model also generated active bridge recombinases when prompted on guide RNA alone, with sequence identities to training and public data as low as 65%. Model weights are not released.
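The "sequence identity as low as 65%" figure is easier to interpret with a concrete definition in hand. The sketch below computes percent identity over two pre-aligned sequences, counting matching columns over all aligned columns. This is one common convention and an assumption here: the source does not state which alignment method or identity definition the authors used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumed convention: matches / aligned columns, ignoring columns
    where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Drop columns that are gaps in both sequences.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")

    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy example (not real EDEN output): 8 matching columns out of 10.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(f"{identity:.1f}% identity")
```

Under this convention, a generated sequence at 65% identity differs from its nearest known relative in about one of every three aligned positions.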

Applications

EDEN targets cell and gene therapy, anti-infective discovery, and synthetic biology. Programmable gene insertion offers a route to site-specific integration of large genetic payloads without double-strand breaks, addressing the payload and safety limits of viral and nuclease-based editing at otherwise intractable targets. The antimicrobial peptide work supports discovery against drug-resistant pathogens, and the microbiome generation capability could inform synthetic community and metabolic-pathway design.

Impact

EDEN reframes therapeutic design as a scaling-driven engineering problem, arguing that combining vast evolutionary data with therapeutic readouts yields a single architecture that designs candidates across modalities and disease areas. Its experimental validation across recombinases, peptides, and microbiomes is unusually broad for a genomic foundation model. The principal limitation for the open research community is that EDEN and its underlying BaseData corpus are proprietary, so independent benchmarking and reuse are not currently possible.

Citation

Designing AI-programmable therapeutics with the EDEN family of foundation models

Munsamy, G., et al. (2026) Designing AI-programmable therapeutics with the EDEN family of foundation models. bioRxiv.

DOI: 10.64898/2026.01.12.699009

Recent citations

Papers that recently cited this model.

Metagenomic contextualization of proteins with state space models
Nima Azbijari, J. H. Wynne, A. Thurber, et al.
bioRxiv · Jul 2026 · 0 citations

Mining the code of life for new antibiotics
A. Crysler, César de la Fuente-Núñez
Cell Host and Microbe · Jul 2026 · 0 citations

Deterministic Overlapping Multimorbidity Phenotypes for Leakage-Safe EHR Modeling of Incident Cognitive Impairment in All of Us
Zahra Rahemi, Meisam Omidi
Journal of interdisciplinary research applied to medicine · May 2026 · 0 citations

Top citations

The most-cited papers that cite this model.

Metagenomic contextualization of proteins with state space models
Nima Azbijari, J. H. Wynne, A. Thurber, et al.
bioRxiv · Jul 2026 · 0 citations

AI-programmable therapeutics via metagenomic foundation models for rare phage-mediated autoimmune modulations: early translational risks and benefits
E. Habib, Izmal Urooj, Hamna Danyal Barry, et al.
Annals of Medicine and Surgery · May 2026 · 0 citations

Mining the code of life for new antibiotics
A. Crysler, César de la Fuente-Núñez
Cell Host and Microbe · Jul 2026 · 0 citations

Deterministic Overlapping Multimorbidity Phenotypes for Leakage-Safe EHR Modeling of Incident Cognitive Impairment in All of Us
Zahra Rahemi, Meisam Omidi
Journal of interdisciplinary research applied to medicine · May 2026 · 0 citations

Citations

Total Citations: 4

Influential: 0

References: 0

Fields of citing research

Computer Science: 75%
Medicine: 75%
Biology: 75%
Chemistry: 25%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility

Overall score: 13 (Closed)

Usability (can I run it?): 13

Reproducibility (can I retrain it?): 0 (not reproducible)

Model Openness Framework: Unclassified (missing required components)

