Language Models

Biological and scientific language models bring LLM-style architectures to biology: text-conditioned, instruction-tuned, agentic, or generative models that reason over scientific knowledge and across molecular modalities. This category is a model-type axis rather than a data-type one. It groups systems whose defining trait is that they are generative or language-driven, from molecule-and-text models to scientific assistants. They are emerging as a connective layer across the field, translating between natural language and the languages of biology.

12 models in this category

Notable Models

Top-rated language models from our evaluations

AMix-2

Shanghai AI Laboratory +4 others

Released May 30, 2026

A protein-text foundation model embedding sequences and natural language in a shared token space, enabling protein understanding and de novo design from one checkpoint.

Protein, Language model

Bio-BLIP

Stanford University

Released May 15, 2026

A multimodal Q-former that fuses DNA sequence, gene context, protein function, and text into a prefix for a frozen LLM, enabling zero-shot genetic variant interpretation.

DNA & Gene, Language model

A unified bio-language Mixture-of-Experts foundation model spanning DNA, protein sequence and structure, and biological text, applied across eight task families from a single checkpoint.

Language model, DNA & Gene, Protein

GPT-Rosalind

OpenAI

Released April 16, 2026

1.3K

OpenAI's first life-sciences frontier reasoning model, optimized for multi-step scientific workflows spanning protein engineering, genomics, drug-target discovery, and biochemistry reasoning.

Language model

rBio

Chan Zuckerberg Initiative

Released August 18, 2025

13141

A reasoning language model post-trained on virtual cell simulations to answer complex biological questions about gene perturbations in natural language.

Language model

836

Unified science foundation model from Microsoft Research treating molecules, proteins, RNA, DNA, and materials as a shared sequence language for cross-domain generation.

Language model, Small molecule, Protein