A 28B-parameter metagenomic foundation model trained on 9.7T nucleotide tokens for programmable therapeutic design across genes, peptides, and microbiomes.
EDEN (Environmentally-Derived Evolutionary Network) is a family of metagenomic foundation models from Basecamp Research, introduced in a January 2026 preprint. Its flagship is a 28-billion-parameter model trained on 9.7 trillion nucleotide tokens drawn from BaseData1, a proprietary dataset that — at the time of training — contained more than 10 billion novel genes from over 1 million new species. The dataset is intentionally enriched for environmental and host-associated metagenomes, phage sequences, and mobile genetic elements, so the model learns from cross-species evolutionary mechanisms that are largely absent from public sequence repositories.
The central thesis is that dramatically expanding the diversity of biology a model learns from — rather than overfitting to a handful of model organisms — unlocks new scaling behavior and turns therapeutic design into a more predictable engineering discipline. EDEN is positioned alongside other genomic foundation models such as Evo and the Nucleotide Transformer family, but distinguishes itself through the scale and novelty of its training corpus and an explicit focus on generative therapeutic design across multiple modalities and biological scales.
To demonstrate generality, the authors challenge a single architecture to design biological novelty across three distinct modalities: large gene insertion, antibiotic peptide design, and microbiome design. Weights are not publicly released; EDEN is a commercial model developed and deployed internally by Basecamp Research.
EDEN is a transformer-based metagenomic language model, with the flagship checkpoint at 28 billion parameters trained on 9.7 trillion nucleotide tokens. The authors report state-of-the-art performance across a range of predictive and generative genomic and protein benchmarks. In low-N experimental validation, EDEN-generated recombinases were active across ten disease-associated loci (including ATM, DMD, F9, and USH2A) and four candidate safe-harbor sites. Roughly 50% of the generated large serine recombinases (LSRs) were active in human cells, and the designs achieved therapeutically relevant chimeric antigen receptor (CAR) insertion in primary human T cells. The model also generated active bridge recombinases when prompted with the guide RNA alone; these had sequence identities to training and public data as low as 65%. Model weights are not released.
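To make the "as low as 65%" figure concrete, the sketch below computes percent identity between two sequences. The source does not say how identity was measured, so everything here is an illustrative assumption: the function names (`global_align`, `percent_identity`), the Needleman-Wunsch global alignment with simple match/mismatch/gap scores, and the choice of alignment length as the denominator (other conventions divide by the shorter sequence length). It is not the authors' pipeline.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment (hypothetical helper).

    Returns the two input strings padded with '-' so they have equal length.
    The scoring values are illustrative defaults, not taken from the source.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aligned_a, aligned_b = global_align(a, b)
    # A gap in one string never equals the residue opposite it, so gapped
    # columns are counted as non-matching.
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)
```

For example, `percent_identity("ACGT", "ACGA")` returns `75.0` under these assumptions. The quadratic-time alignment is fine for short illustrative sequences; comparing a design against a large corpus would in practice rely on a dedicated search tool rather than this sketch.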
EDEN targets cell and gene therapy, anti-infective discovery, and synthetic biology. Programmable gene insertion offers a route to site-specific integration of large genetic payloads without double-strand breaks, addressing the payload and safety limits of viral and nuclease-based editing at otherwise intractable targets. The antimicrobial peptide work supports discovery against drug-resistant pathogens, and the microbiome generation capability could inform synthetic community and metabolic-pathway design.
EDEN reframes therapeutic design as a scaling-driven engineering problem, arguing that combining vast evolutionary data with therapeutic readouts yields a single architecture that designs candidates across modalities and disease areas. Its experimental validation across recombinases, peptides, and microbiomes is unusually broad for a genomic foundation model. The principal limitation for the open research community is that EDEN and its underlying BaseData corpus are proprietary, so independent benchmarking and reuse are not currently possible.