bio.rodeo


© 2026 Pulsatance. All rights reserved.
Single-cell

RVQ-Alpha

Guangzhou National Laboratory

A Qwen3-4B language model that reads and reasons over single cells by tokenizing scRNA-seq with residual vector quantization and training with verifiable reinforcement learning.

Released: April 2026
Parameters: 4 Billion

RVQ-Alpha is a framework for connecting single-cell transcriptomics to large language models (LLMs) by representing each cell as a short sequence of discrete tokens that live directly inside the LLM vocabulary. Single-cell RNA sequencing (scRNA-seq) produces continuous, high-dimensional expression profiles that do not map cleanly onto the discrete token streams LLMs expect. Prior LLM-for-biology approaches either serialize cells into long text "sentences" of ranked gene names or attach continuous embeddings through a separate encoder, both of which inflate sequence length and leave the model prone to hallucinating biological claims that are not grounded in the underlying measurements.

RVQ-Alpha addresses this with a residual vector quantization (RVQ) tokenizer that compresses each cell into a fixed 10-token sequence, embedding the new cell tokens natively in the vocabulary of a Qwen3-4B base model. A single autoregressive model can then interpret existing cell states (for example, annotating cell type) and generate new ones (for example, predicting a post-perturbation profile), with the generated tokens decoded back into expression space by the RVQ decoder.
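
The residual quantization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the codebooks here are random stand-ins for learned ones, the latent dimension `DIM` is an assumption (the source does not give it), and the upstream encoder that maps an expression profile to a latent vector is omitted. Only the eight codebooks and 32 entries per codebook come from the text.

```python
import random

NUM_CODEBOOKS = 8    # from the text
CODEBOOK_SIZE = 32   # from the text
DIM = 16             # assumption: the latent size is not stated in the source


def make_codebooks(seed=0):
    """Random stand-in codebooks; a real tokenizer would learn these."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0 / (level + 1)) for _ in range(DIM)]
         for _ in range(CODEBOOK_SIZE)]
        for level in range(NUM_CODEBOOKS)
    ]


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, latent):
    """Each level quantizes the residual left over by the previous levels."""
    residual = list(latent)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - e for r, e in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Sum the selected entries to approximate the original latent."""
    out = [0.0] * DIM
    for codebook, idx in zip(codebooks, codes):
        out = [o + e for o, e in zip(out, codebook[idx])]
    return out


codebooks = make_codebooks()
rng = random.Random(1)
latent = [rng.gauss(0.0, 1.0) for _ in range(DIM)]  # stand-in for an encoded cell

codes = rvq_encode(codebooks, latent)
approx = rvq_decode(codebooks, codes)
print(codes)  # eight integers, one per codebook, each in [0, 32)
```

Because each level only sees what earlier levels failed to explain, the first codes carry the coarse structure and later codes carry corrections, which matches the coarse-to-fine behavior the page attributes to the tokenizer.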

The model was developed by researchers at Guangzhou National Laboratory and described in a preprint posted to bioRxiv in April 2026. It is a single Qwen3-4B checkpoint produced by continued pretraining, supervised fine-tuning (SFT), and reinforcement learning with verifiable rewards (RLVR) on scRNA-seq data.

#Key Features

  • Compact discrete cell tokenization: An eight-codebook residual quantizer (32 entries per codebook) encodes each cell in just 10 tokens, roughly 3.4x fewer than prior discrete tokenization methods. Earlier codebooks capture broad cell identity, and later codebooks refine within-lineage variation.
  • Vocabulary-native cell tokens: Cell tokens are embedded directly in the LLM vocabulary rather than passed through a separate encoder, letting one autoregressive model both interpret and generate cell states.
  • Evidence-first reasoning (scCoT-Synth): A teacher-student engine grounds the newly added biological tokens through "evidence-before-conclusion" chain-of-thought, reducing unsupported claims.
  • Fact-Aware RLVR: A verifiable-reward reinforcement learning stage pairs an ontology-grounded answer judge with saliency-weighted verification of biological claims against the actual expression data.
  • Generative and discriminative in one model: The same checkpoint supports cell type annotation and autoregressive generation of post-perturbation cell states.
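
To make the idea of vocabulary-native cell tokens concrete, the sketch below maps eight codebook indices to token strings and back. The token names (`<c0_17>`, `<cell>`, `</cell>`) are hypothetical. The page does not say how eight codes become a 10-token sequence, so the two boundary markers used here are purely an assumption for illustration.

```python
NUM_CODEBOOKS = 8     # from the text
CODEBOOK_SIZE = 32    # from the text
TOKENS_PER_CELL = 10  # from the text

# Assumption: two boundary markers account for the gap between 8 codes and 10 tokens.
CELL_START, CELL_END = "<cell>", "</cell>"


def new_vocab_tokens():
    """All token strings that would be added to the base vocabulary."""
    tokens = [CELL_START, CELL_END]
    for level in range(NUM_CODEBOOKS):
        for index in range(CODEBOOK_SIZE):
            tokens.append(f"<c{level}_{index}>")
    return tokens


def codes_to_tokens(codes):
    """Turn one cell's codebook indices into a fixed-length token sequence."""
    if len(codes) != NUM_CODEBOOKS:
        raise ValueError(f"expected {NUM_CODEBOOKS} codes, got {len(codes)}")
    body = [f"<c{level}_{index}>" for level, index in enumerate(codes)]
    return [CELL_START, *body, CELL_END]


def tokens_to_codes(tokens):
    """Inverse of codes_to_tokens; rejects malformed sequences."""
    if (len(tokens) != TOKENS_PER_CELL
            or tokens[0] != CELL_START or tokens[-1] != CELL_END):
        raise ValueError("not a well-formed cell token sequence")
    codes = []
    for level, token in enumerate(tokens[1:-1]):
        prefix = f"<c{level}_"
        if not (token.startswith(prefix) and token.endswith(">")):
            raise ValueError(f"unexpected token at level {level}: {token}")
        codes.append(int(token[len(prefix):-1]))
    return codes


cell_tokens = codes_to_tokens([3, 17, 0, 31, 8, 8, 21, 5])
print(" ".join(cell_tokens))
print(len(new_vocab_tokens()))          # 8 * 32 code tokens plus 2 markers
print(CODEBOOK_SIZE ** NUM_CODEBOOKS)   # number of distinct code combinations
```

Under these assumptions the vocabulary grows by only a few hundred entries, while the eight positions together can still distinguish 32^8 (about 1.1 trillion) code combinations. In a Hugging Face workflow, tokens like these would typically be registered with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; whether RVQ-Alpha does it this way is not stated in the source.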

#Technical Details

RVQ-Alpha is built on the Qwen3-4B transformer (~4 billion parameters). Each cell is quantized by eight residual codebooks of 32 entries each into a fixed 10-token representation that is inserted into the LLM vocabulary, reducing sequence length substantially relative to text-based gene-name serialization. Training proceeds in three stages: continued pretraining to integrate the cell tokens, supervised fine-tuning with scCoT-Synth-generated evidence-first reasoning traces, and a Fact-Aware RLVR stage combining an ontology-grounded answer judge with saliency-weighted claim verification. The model is evaluated on eight held-out, out-of-distribution (OOD) datasets, where the authors report improved generalization and rare-cell recognition. Ablation studies report that evidence-first grounding reduces hallucination more than fivefold relative to baselines.
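
The Fact-Aware RLVR reward is only described at a high level here, so the following is a toy sketch of how an ontology-grounded answer score and a saliency-weighted claim check might be combined. Everything specific is an assumption for illustration: the partial credit for ontology ancestors, the `(gene, direction)` claim format, the expression threshold, the per-gene saliency weights, and the mixing weight `alpha` are hypothetical and are not taken from the preprint.

```python
def answer_score(predicted, gold, parents):
    """1.0 for an exact match, 0.5 if predicted is an ontology ancestor of gold.

    `parents` maps each cell type to its parent term (hypothetical mini-ontology).
    """
    if predicted == gold:
        return 1.0
    node = gold
    while node in parents:
        node = parents[node]
        if node == predicted:
            return 0.5
    return 0.0


def claim_score(claims, expression, saliency, threshold=1.0):
    """Saliency-weighted fraction of expression claims that the data supports.

    Each claim is (gene, "high" | "low"); genes missing from `saliency`
    contribute zero weight.
    """
    total = 0.0
    supported = 0.0
    for gene, direction in claims:
        weight = saliency.get(gene, 0.0)
        is_high = expression.get(gene, 0.0) >= threshold
        claim_holds = is_high if direction == "high" else not is_high
        total += weight
        if claim_holds:
            supported += weight
    return supported / total if total > 0 else 0.0


def reward(predicted, gold, parents, claims, expression, saliency, alpha=0.5):
    """Convex combination of the answer score and the claim score."""
    return (alpha * answer_score(predicted, gold, parents)
            + (1.0 - alpha) * claim_score(claims, expression, saliency))


# Hypothetical example: the model answers "T cell" for a CD8 T cell and makes
# three expression claims, one of which (MS4A1 high) the data does not support.
parents = {"CD8 T cell": "T cell", "T cell": "lymphocyte"}
expression = {"CD3E": 2.4, "CD8A": 1.8, "MS4A1": 0.0}
saliency = {"CD3E": 0.5, "CD8A": 0.4, "MS4A1": 0.1}
claims = [("CD3E", "high"), ("CD8A", "high"), ("MS4A1", "high")]

print(reward("T cell", "CD8 T cell", parents, claims, expression, saliency))
```

The design point the sketch tries to capture is that a correct label alone does not maximize the reward: unsupported claims about salient genes pull the score down, which is one way a verifiable reward could discourage ungrounded reasoning.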

#Applications

RVQ-Alpha targets single-cell analysis workflows that benefit from natural-language interaction, including cell type annotation on unseen datasets and prediction of post-perturbation cell states for in-silico screening. By generating reasoning that cites supporting expression evidence, it is positioned for settings where computational biologists need annotations and hypotheses that can be traced back to the underlying measurements rather than accepted as opaque outputs.

#Impact

RVQ-Alpha contributes to a fast-growing line of work adapting LLMs to single-cell biology, alongside efforts such as Cell2Sentence, by showing that compact discrete tokenization plus verifiable reinforcement learning can improve out-of-distribution generalization and sharply curb hallucination. Its emphasis on evidence-grounded reasoning and ontology-based reward verification offers a template for making LLM-based cell models more trustworthy. As of the April 2026 preprint, model weights have not been publicly released, which currently limits independent reproduction and downstream adoption.

Tags

cell_type_annotation, perturbation_prediction, transformer, residual_vector_quantization, language_model, reinforcement_learning, multimodal, single_cell_rna_seq