Language Models
Bio and scientific language models bring LLM-style architectures to biology: they are text-conditioned, instruction-tuned, agentic, or generative models that reason over scientific knowledge and across molecular modalities. This category is a model-type axis rather than a single data type. It groups systems whose defining trait is that they are generative or language-driven, from molecule-and-text models to scientific assistants. They are emerging as a connective layer across the field, translating between natural language and the languages of biology.
12 models in this category
Notable Models
Top-rated language models from our evaluations
A protein-text foundation model embedding sequences and natural language in a shared token space, enabling protein understanding and de novo design from one checkpoint.
A multimodal Q-former that fuses DNA sequence, gene context, protein function, and text into a prefix for a frozen LLM, enabling zero-shot genetic variant interpretation.
A unified bio-language Mixture-of-Experts foundation model spanning DNA, protein sequence and structure, and biological text, applied across eight task families from a single checkpoint.
OpenAI's first life-sciences frontier reasoning model, optimized for multi-step scientific workflows spanning protein engineering, genomics, drug-target discovery, and biochemistry reasoning.
A reasoning language model post-trained on virtual cell simulations to answer complex biological questions about gene perturbations in natural language.
A unified science foundation model from Microsoft Research that treats molecules, proteins, RNA, DNA, and materials as a shared sequence language for cross-domain generation.