Chan Zuckerberg Initiative
A reasoning language model post-trained on virtual cell simulations to answer complex biological questions about gene perturbations in natural language.
rBio is the first reasoning language model trained on virtual cell simulations rather than exclusively on experimental data. Developed by the AI team at the Chan Zuckerberg Initiative (CZI), led by Theofanis Karaletsos and Ana-Maria Istrate, rBio addresses a long-standing tension in computational biology: large language models are adept at synthesizing published literature, but they struggle to generate reliable step-by-step reasoning about complex biological mechanisms, particularly when the relevant experimental data is sparse or scattered across many studies. Meanwhile, virtual cell models — computational systems trained to simulate cellular behavior — contain rich mechanistic information but present steep usability barriers for most researchers. rBio bridges these worlds by distilling the predictive knowledge encoded in virtual cell models into a natural language reasoning system.
The model was released alongside a preprint in August 2025 (Istrate et al., bioRxiv 2025.08.18.670981) and is hosted on CZI's Virtual Cells Platform. rBio v1.0 focuses on gene perturbation biology, enabling users to pose questions such as "Would suppressing gene A result in increased expression of gene B?" and receive reasoned, step-by-step answers grounded in virtual cell predictions. This capability is particularly valuable in early-stage research, where the cost of running systematic experimental screens makes computational pre-selection of perturbation candidates essential.
The release of rBio reflects a broader strategy at CZI to make the Virtual Cells Platform accessible without specialized machine learning expertise. By exposing biological simulation capabilities through natural language interaction, rBio lowers the barrier for wet-lab biologists and translational researchers to benefit from the computational power of virtual cell models.
rBio-1 is built on Qwen2.5-3B-Instruct, a 3-billion parameter instruction-tuned language model, and post-trained using Group Relative Policy Optimization (GRPO) for 100,000 steps across 8 NVIDIA H100 GPUs over approximately 10 days. The post-training procedure departs from standard RLHF by replacing human reward signals with soft biological verifiers. TranscriptFormer supplies pointwise mutual information (PMI) scores that quantify gene co-expression relationships; these PMI values generate continuous reward signals proportional to how well rBio's predicted gene interactions agree with TranscriptFormer's learned co-expression structure. Gene Ontology annotations contribute keyword-based and ROUGE-based rewards that evaluate whether the model's reasoning faithfully invokes relevant biological concepts.
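The reward scheme above can be sketched in a few lines. This is an illustrative reconstruction, not the actual rBio training code: the helper names (`pmi_reward`, `keyword_reward`, `group_relative_advantages`), the sigmoid squashing, and the 0.7/0.3 weighting are all assumptions introduced here for clarity.

```python
import math
import statistics


def pmi_reward(predicted_direction: int, pmi_score: float, scale: float = 1.0) -> float:
    """Continuous reward from a virtual cell verifier (hypothetical form).

    predicted_direction: +1 if the model predicts increased expression,
    -1 for decreased. pmi_score: TranscriptFormer-style PMI for the gene
    pair (positive ~ co-expressed). Agreement is squashed to (0, 1) so
    the reward stays bounded and graded rather than binary.
    """
    agreement = predicted_direction * pmi_score
    return 1.0 / (1.0 + math.exp(-scale * agreement))


def keyword_reward(reasoning: str, go_terms: list[str]) -> float:
    """Keyword-based reward: fraction of relevant Gene Ontology terms
    that the chain-of-thought actually mentions."""
    text = reasoning.lower()
    hits = sum(term.lower() in text for term in go_terms)
    return hits / len(go_terms) if go_terms else 0.0


def combined_reward(direction: int, pmi: float, reasoning: str,
                    go_terms: list[str], w_pmi: float = 0.7,
                    w_go: float = 0.3) -> float:
    # Weighted mix of verifier signals (weights are illustrative).
    return w_pmi * pmi_reward(direction, pmi) + w_go * keyword_reward(reasoning, go_terms)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's reward
    against the mean and std of its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1e-8  # guard against zero variance
    return [(r - mean) / std for r in rewards]
```

Because the verifier returns a graded score rather than a hard label, completions that are partially right still receive useful gradient signal, which is the point of "soft" verification.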
On the PerturbQA benchmark — a held-out evaluation of genetic perturbation predictions across four cancer cell lines (K562, RPE1, HEPG2, and JURKAT) — rBio-1 achieves F1 scores of 0.74–0.79 and balanced accuracy of 0.83–0.91 using chain-of-thought prompting, outperforming SUMMER (ICLR 2025) and baseline Qwen2.5 models. Notably, the best rBio composite variant (trained with TranscriptFormer, Gene Ontology, MLP, and experimental data jointly) matches the performance of an rBio ablation trained directly on experimental data, demonstrating that virtual cell verifiers can substitute for scarce experimental labels. The model weights and training code are available on GitHub under the MIT License, subject to the underlying Qwen Research License.
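For readers less familiar with the two metrics reported above, both follow from the standard confusion-matrix definitions; balanced accuracy averages sensitivity and specificity, which matters because perturbation labels are typically imbalanced. A minimal sketch:

```python
def f1_and_balanced_accuracy(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute F1 and balanced accuracy for binary labels (1 = positive).

    Standard definitions; not taken from the rBio evaluation code.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    balanced_acc = (recall + specificity) / 2
    return f1, balanced_acc
```

With imbalanced classes, plain accuracy can look high for a model that always predicts "no effect"; balanced accuracy penalizes that failure mode, which is why it is reported alongside F1.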
rBio is designed for researchers exploring gene regulatory networks, functional genomics, and cellular perturbation biology who want to rapidly evaluate hypotheses before committing to experimental screens. A typical use case involves asking rBio to predict whether knocking out a transcription factor will upregulate or downregulate a panel of downstream targets, using the model's step-by-step reasoning to prioritize candidates for CRISPR validation. The model is also applicable to literature-informed hypothesis generation in fields such as neurodegenerative disease, cancer biology, and immunology, where gene interaction questions often arise early in project design. Because rBio operates through a conversational interface, it is accessible to biologists without computational expertise, substantially widening the practical user base for virtual cell simulation technology.
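Since the weights are published, a query of this kind could in principle be run locally with Hugging Face transformers. The sketch below is hypothetical: the model identifier `"chanzuckerberg/rBio-1"` is an assumed placeholder, not a confirmed repository name, and the prompt wording is illustrative.

```python
def build_perturbation_messages(gene_a: str, gene_b: str) -> list[dict]:
    """Compose a chat-format question of the kind rBio is trained to answer."""
    question = (
        f"Would suppressing gene {gene_a} result in increased "
        f"expression of gene {gene_b}? Explain your reasoning step by step."
    )
    return [{"role": "user", "content": question}]


def ask_rbio(gene_a: str, gene_b: str,
             model_id: str = "chanzuckerberg/rBio-1") -> str:
    """Query the model locally (model_id is an assumed placeholder)."""
    # Imports are local so the prompt builder works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer.apply_chat_template(
        build_perturbation_messages(gene_a, gene_b),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

In a screening workflow, a loop over candidate transcription-factor/target pairs would collect the model's yes/no verdicts and reasoning traces, which can then be ranked for CRISPR validation.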
rBio introduces a new paradigm for training scientific reasoning models: using computational world models of biology as substitutes for scarce experimental labels. This approach partially resolves a fundamental bottleneck in applying reinforcement learning to life sciences — the shortage of large, unambiguous training signals — by reframing biological world models as probabilistic oracles. The model's state-of-the-art performance on PerturbQA and its cross-task generalization results establish soft verification as a viable training strategy for biological question answering. As CZI expands the Virtual Cells Platform to cover additional cellular domains, the rBio framework is positioned to become a natural language gateway to an increasingly comprehensive computational biology infrastructure. Current limitations include a narrow focus on gene perturbation in the v1.0 release, reliance on the TranscriptFormer model's coverage of cell types and species, and the inherent difficulty of validating open-ended biological reasoning at scale.