bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene

Genos-m

BGI-HangzhouAI

A 4.7B-parameter Mixture-of-Experts genomic foundation model pretrained on ~1.2 trillion nucleotide tokens from human-associated microbial genomes.

Released: May 2026
Parameters: 4.7 Billion

Genos-m is a genomic foundation model purpose-built for the human microbiome, the dense and taxonomically diverse community of microbes that colonize the gut, oral cavity, skin, respiratory tract, and urogenital niches. General nucleotide language models such as the Nucleotide Transformer, HyenaDNA, and Evo have largely been trained on eukaryotic reference genomes or broad prokaryotic collections. Genos-m instead concentrates its capacity on the bacteria, archaea, and bacteriophages that are most relevant to human health. It was developed by the Genos team at BGI-HangzhouAI (Hangzhou, China) and released as a bioRxiv preprint in May 2026.

The model addresses a practical gap: microbiome research increasingly needs representations that transfer across strains, species, and functional elements without task-specific retraining. Genos-m is designed to be used in frozen-representation mode, meaning its pretrained embeddings can be fed to lightweight downstream heads for regression and classification without fine-tuning the backbone. This makes it well suited to the many small, noisy, heterogeneous datasets typical of microbial functional genomics.
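
A minimal sketch of the frozen-representation workflow follows. It assumes per-sequence embeddings have already been extracted from the Genos-m backbone (how to do that depends on the released code and is not shown here). The backbone is never updated; only a closed-form ridge-regression head is fit. The function names, the embedding dimension, and the choice of ridge regression are illustrative assumptions, not the authors' evaluation protocol.

```python
import numpy as np


def fit_ridge_head(embeddings: np.ndarray, targets: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Fit a linear ridge-regression head on frozen embeddings.

    embeddings: (n_samples, d) array, assumed to come from a frozen backbone.
    targets:    (n_samples,) array, e.g. gene-fitness scores.
    Returns a weight vector of length d + 1 (last entry is the intercept).
    """
    n, d = embeddings.shape
    # Append a constant column so the intercept is learned with the weights.
    X = np.hstack([embeddings, np.ones((n, 1))])
    # L2 penalty on the weights only; the intercept is left unpenalized.
    penalty = alpha * np.eye(d + 1)
    penalty[-1, -1] = 0.0
    # Closed-form ridge solution: (X^T X + penalty) w = X^T y.
    return np.linalg.solve(X.T @ X + penalty, X.T @ targets)


def predict_ridge(w: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Apply the fitted head to new frozen embeddings."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])
    return X @ w


if __name__ == "__main__":
    # Synthetic stand-ins for pooled embeddings (hypothetical d = 32).
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(300, 32))
    y = emb @ rng.normal(size=32) + 0.5 + 0.05 * rng.normal(size=300)
    w = fit_ridge_head(emb[:200], y[:200], alpha=1.0)
    pred = predict_ridge(w, emb[200:])
    print("held-out Pearson r:", np.corrcoef(pred, y[200:])[0, 1])
```

Because only the head is trained, moving to a new task means refitting a (d + 1)-parameter model rather than touching the 4.7B backbone parameters; in practice `alpha` would be chosen by cross-validation.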

By scaling a sparse Mixture-of-Experts (MoE) architecture to 4.7 billion total parameters while activating only 330 million per token, Genos-m aims to capture the breadth of microbial sequence diversity at a manageable inference cost, positioning it as a microbiome-specialized complement to general-purpose DNA foundation models.
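
The gap between total and activated parameters comes from top-k routing: a small router scores every expert for each token, and only the highest-scoring experts run. The sketch below is a generic top-k MoE feed-forward layer in NumPy. The number of experts, the value of `k`, the ReLU experts, and the softmax-over-selected-experts gating are all assumptions for illustration, since this page does not describe Genos-m's routing configuration. The stated active fraction (330M / 4.7B, about 7%) presumably also counts dense components such as attention and embeddings, so it is not simply `k` divided by the number of experts.

```python
import numpy as np

# Stated figures: ~330M active out of 4.7B total parameters per token.
ACTIVE_FRACTION = 330e6 / 4.7e9  # about 0.070


def moe_forward(x, router_w, experts, k=2):
    """Generic sparse MoE feed-forward layer (illustrative, not Genos-m's code).

    x:        (n_tokens, d_model) token representations
    router_w: (d_model, n_experts) router weights
    experts:  list of (W_in, W_out) pairs, one two-layer MLP per expert
    k:        number of experts evaluated per token
    Returns the layer output and the (n_tokens, k) indices of the chosen experts.
    """
    logits = x @ router_w                        # router score for every expert
    topk = np.argsort(-logits, axis=1)[:, :k]    # k highest-scoring experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                     # softmax over the selected experts only
        for gate, e in zip(gates, topk[t]):
            W_in, W_out = experts[e]
            hidden = np.maximum(x[t] @ W_in, 0.0)  # ReLU MLP; unselected experts never run
            out[t] += gate * (hidden @ W_out)
    return out, topk


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_experts, d_hidden = 8, 4, 16
    experts = [(rng.normal(size=(d_model, d_hidden)), rng.normal(size=(d_hidden, d_model)))
               for _ in range(n_experts)]
    x = rng.normal(size=(5, d_model))
    out, chosen = moe_forward(x, rng.normal(size=(d_model, n_experts)), experts, k=2)
    print(out.shape, chosen)
    print(f"active fraction ~ {ACTIVE_FRACTION:.3f}")
```

Per-token compute scales with `k` rather than with the total number of experts, which is what lets total capacity grow without a matching growth in inference cost.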

#Key Features

  • Microbiome-specialized pretraining: Trained exclusively on human-associated microbial genomes spanning 186 phyla, 3,448 families, and 69,056 species across five body-site niches, rather than generic reference collections.
  • Sparse Mixture-of-Experts backbone: 4.7B total parameters with only 330M activated per token, decoupling model capacity from per-token compute.
  • Long-context modeling: Supports up to 1 million base pairs of context, enough to model whole operons, biosynthetic gene clusters, and megabase-scale genomic regions in a single window.
  • Frozen-representation evaluation: Benchmarked entirely without retraining the backbone, using fixed embeddings plus lightweight heads across diverse tasks.
  • Cross-modality transfer: Shows zero-shot transfer of fitness prediction from DNA to RNA, suggesting that the learned representations generalize beyond the pretraining modality.
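
Even a 1 Mbp window is smaller than many bacterial genomes, which commonly span several megabases, so whole-genome tasks still need a way to cover a genome with multiple windows. The sketch below tiles a sequence into overlapping windows no longer than the stated context and length-weights the per-window embeddings into a single genome vector. The `embed_window` callable, the 10 kb overlap, and length-weighted mean pooling are hypothetical choices; this page does not say how Genos-m aggregates long inputs for strain phenotype prediction.

```python
from typing import Callable, List, Tuple

import numpy as np

MAX_CONTEXT_BP = 1_000_000  # context length stated for Genos-m


def tile_genome(length: int, window: int = MAX_CONTEXT_BP, overlap: int = 10_000) -> List[Tuple[int, int]]:
    """Return (start, end) spans covering [0, length), each at most `window` bp long."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    spans, start = [], 0
    while True:
        end = min(start + window, length)
        spans.append((start, end))
        if end == length:
            return spans
        start += step


def genome_embedding(seq: str, embed_window: Callable[[str], np.ndarray],
                     window: int = MAX_CONTEXT_BP, overlap: int = 10_000) -> np.ndarray:
    """Length-weighted mean of per-window embeddings (hypothetical aggregation)."""
    spans = tile_genome(len(seq), window, overlap)
    vecs = np.stack([embed_window(seq[s:e]) for s, e in spans])
    lengths = np.array([e - s for s, e in spans], dtype=float)
    return (vecs * lengths[:, None]).sum(axis=0) / lengths.sum()
```

For a 4.6 Mbp genome (roughly the size of *E. coli* K-12), the defaults produce five windows, so `len(tile_genome(4_600_000))` is 5.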

#Technical Details

Genos-m is a decoder-style Transformer with sparse Mixture-of-Experts feed-forward layers, totaling 4.7 billion parameters of which roughly 330 million are activated per token. Pretraining used approximately 1.2 trillion nucleotide tokens drawn from human-associated microbial genomes, including prokaryotic isolates, metagenome-assembled genomes (MAGs), bacteriophages, and GTDB reference genomes, covering 186 phyla, 3,448 families, and 69,056 species. The context window extends to 1 million base pairs.

Evaluation was conducted in frozen-representation mode across eight gene-fitness regression tasks, biosynthetic gene cluster (BGC) classification, whole-genome strain phenotype prediction, and a zero-shot RNA fitness transfer task, with no backbone retraining on any benchmark.

Model weights are released under Apache 2.0 in two checkpoints (Genos-m-4.7B and a Megatron variant), and the paper is licensed under CC BY.

#Applications

Genos-m supports a range of microbiome and microbial genomics workflows: predicting gene fitness effects, classifying biosynthetic gene clusters for natural-product and antibiotic discovery, and predicting strain-level phenotypes from whole genomes. Because it operates from frozen embeddings, researchers can attach simple downstream models to tackle small or imbalanced datasets common in functional microbiology, metagenomics, and translational microbiome studies. Its demonstrated DNA-to-RNA transfer further suggests utility for RNA-level fitness questions without dedicated RNA pretraining.
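
For the small or imbalanced classification tasks mentioned above (for example, a rare biosynthetic gene cluster class), one simple head is a logistic regression with balanced class weights trained on frozen embeddings. This NumPy sketch is illustrative only: the weighting mirrors scikit-learn's `class_weight="balanced"` convention, and the learning rate, epoch count, and L2 strength are arbitrary placeholders rather than settings from the Genos-m paper.

```python
import numpy as np


def fit_balanced_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1,
                          epochs: int = 500, l2: float = 1e-3):
    """Binary logistic-regression head with balanced class weights.

    X: (n_samples, d) frozen embeddings; y: (n_samples,) labels in {0, 1}.
    Each class contributes equally to the loss regardless of its frequency.
    """
    n, d = X.shape
    pos_rate = y.mean()
    if not 0.0 < pos_rate < 1.0:
        raise ValueError("both classes must be present")
    # Balanced weights n / (2 * class_count), matching scikit-learn's "balanced" rule.
    sample_w = np.where(y == 1, 0.5 / pos_rate, 0.5 / (1.0 - pos_rate))
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        z = np.clip(X @ w + b, -30.0, 30.0)   # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))          # predicted probability of class 1
        grad_z = sample_w * (p - y)           # gradient of the weighted log-loss w.r.t. z
        w -= lr * (X.T @ grad_z / n + l2 * w)
        b -= lr * grad_z.mean()
    return w, b


def predict_labels(w: np.ndarray, b: float, X: np.ndarray) -> np.ndarray:
    """Hard 0/1 predictions at a 0.5 probability threshold."""
    return (X @ w + b > 0).astype(int)
```

With 180 negatives and 20 positives, an unweighted classifier can reach 90% accuracy by always predicting the majority class; balancing the loss removes that shortcut.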

#Impact

Genos-m extends the genomic foundation model paradigm into the human microbiome, a domain underserved by general DNA language models despite its centrality to health and disease. By pairing microbiome-focused pretraining with an efficient sparse MoE design, long context, and openly released Apache 2.0 weights, it offers the community a reusable backbone for microbial functional prediction. As a recent preprint, its benchmarks still await peer review and independent reproduction, and its performance relative to general-purpose models such as Evo on broader tasks remains to be established. Even so, its frozen-representation results across diverse tasks and taxa suggest that microbiome-specific pretraining is a promising specialization strategy.

Citation

Genos-m: a foundation model for human-associated microbial genomes

Fang, C., et al. (2026) Genos-m: a foundation model for human-associated microbial genomes. bioRxiv.

DOI: 10.64898/2026.05.21.726868

Openness

Class III
Open Model

Tags

foundation_model, gene_fitness_prediction, metagenomics, microbiome, mixture_of_experts, phenotype_prediction, self_supervised, transformer, variant_effect_prediction, zero_shot

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model