bio.rodeo

© 2026 Pulsatance. All rights reserved.
Single-cell, Language model

CellPulse

Wuhan Institute of Virology

A direction-aware foundation model, trained on ~23M bulk RNA-seq differential-expression profiles, that simulates coordinated gene dynamics in viral infection.

Released: April 2026

CellPulse is a transcriptomic foundation model developed at the State Key Laboratory of Virology and Biosafety, Wuhan Institute of Virology (Chinese Academy of Sciences), and released as a bioRxiv preprint in April 2026. It addresses a gap left by most gene-expression foundation models: their reliance on single-cell data and static snapshots of expression, which limits their relevance to real clinical scenarios where the biologically meaningful signal is how a tissue's transcriptome shifts in response to a perturbation such as a viral infection.

Rather than modeling absolute expression levels, CellPulse is explicitly direction-aware: it learns from differential-expression profiles that encode the direction and magnitude by which each gene moves between a baseline state and an infected state. This framing lets the model capture coordinated gene dynamics, i.e. the regulatory programs that groups of genes engage together during infection, instead of treating each gene independently. The model is trained on a newly assembled resource the authors call the Virus Stimulated Atlas (VISTA), comprising over 23 million bulk RNA-seq differential-expression profiles drawn from viral-infection studies.
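To make the idea of a direction-aware profile concrete, here is a minimal sketch of one common way to express a differential-expression profile: a per-gene log2 fold change between an infected and a baseline sample. This is an illustration only. The function name `differential_profile`, the pseudocount, and the toy expression values are assumptions, and the preprocessing actually used to build VISTA is not described here.

```python
import math

def differential_profile(infected, baseline, pseudocount=1.0):
    """Per-gene log2 fold change (infected vs. baseline).

    The sign gives the direction of change (positive = up in infection),
    the absolute value gives the magnitude. The pseudocount avoids
    division by zero and is an illustrative choice, not a CellPulse setting.
    """
    profile = {}
    for gene, base_value in baseline.items():
        inf_value = infected.get(gene, 0.0)
        ratio = (inf_value + pseudocount) / (base_value + pseudocount)
        profile[gene] = math.log2(ratio)
    return profile

# Toy expression values, invented for illustration only.
baseline = {"IFIT1": 7.0, "ISG15": 15.0, "ACTB": 999.0, "MYC": 63.0}
infected = {"IFIT1": 63.0, "ISG15": 127.0, "ACTB": 999.0, "MYC": 15.0}

profile = differential_profile(infected, baseline)
for gene, lfc in profile.items():
    direction = "up" if lfc > 0 else "down" if lfc < 0 else "unchanged"
    print(f"{gene}: log2FC={lfc:+.1f} ({direction})")
```

In this toy case a gene with unchanged expression maps to zero regardless of how highly it is expressed, which is the sense in which the profile encodes change rather than a static snapshot.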

Positioned at the intersection of single-cell/transcriptomic foundation models (such as Geneformer and scGPT) and infection biology, CellPulse is distinctive in being built on bulk rather than single-cell data and in being purpose-tuned for the perturbation-response setting most relevant to infectious-disease research.

#Key Features

  • Direction-aware modeling: Trained on differential-expression profiles, the model encodes the sign and magnitude of each gene's response, capturing coordinated up- and down-regulation rather than static expression snapshots.
  • VISTA training atlas: Learns from the Virus Stimulated Atlas of over 23 million bulk RNA-seq differential-expression profiles assembled from viral-infection datasets.
  • 31-class virus-type classification: Identifies 31 distinct virus types from host transcriptional signatures alone, without requiring viral genomic sequence.
  • Emergent host-factor discovery: Surfaces host factors involved in infection without explicit supervision for that task, providing candidate targets for downstream interrogation.
  • Drug-discovery readout: Host factors highlighted by the model were used to nominate therapeutic candidates, with hits validated in laboratory and animal experiments.
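As a rough illustration of what "coordinated up- and down-regulation" means in the first feature above, the sketch below measures how often two genes move in the same direction across a set of differential-expression profiles. It is a simple descriptive statistic, not the mechanism CellPulse uses. The helper names, the fold-change threshold of 1.0, and the toy profiles are all assumptions.

```python
def sign(log_fc, threshold=1.0):
    """Map a log2 fold change to +1 (up), -1 (down), or 0 (unchanged)."""
    if log_fc >= threshold:
        return 1
    if log_fc <= -threshold:
        return -1
    return 0

def sign_concordance(profiles, gene_a, gene_b, threshold=1.0):
    """Fraction of profiles in which both genes respond and move the same way.

    Profiles where either gene is unchanged are skipped. Returns None if
    no profile has both genes responding.
    """
    same = both = 0
    for p in profiles:
        sa = sign(p.get(gene_a, 0.0), threshold)
        sb = sign(p.get(gene_b, 0.0), threshold)
        if sa != 0 and sb != 0:
            both += 1
            if sa == sb:
                same += 1
    return same / both if both else None

# Toy log2 fold-change profiles, invented for illustration only.
profiles = [
    {"IFIT1": 3.0, "ISG15": 2.5, "MYC": -2.0},
    {"IFIT1": 2.0, "ISG15": 1.5, "MYC": -1.2},
    {"IFIT1": 0.2, "ISG15": 1.1, "MYC": 0.1},
    {"IFIT1": -1.5, "ISG15": -1.0, "MYC": 1.4},
]

co_regulated = sign_concordance(profiles, "IFIT1", "ISG15")
opposed = sign_concordance(profiles, "IFIT1", "MYC")
print(co_regulated, opposed)
```

A pair that scores near 1 moves together, and a pair that scores near 0 moves in opposite directions. A static expression snapshot carries no sign, so it cannot distinguish the two cases.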

#Technical Details

CellPulse is a transformer-based foundation model trained in a self-supervised fashion on the VISTA corpus of more than 23 million bulk RNA-seq differential-expression profiles. The key modeling choice is its representation of input: instead of raw counts, the model operates on differential-expression signals that explicitly encode the direction of change, allowing it to learn regulatory patterns of coordinated gene dynamics. On the benchmark virus-typing task, the model classifies 31 distinct virus types from host gene-activity patterns alone. Beyond classification, the authors report that the model recovers infection-relevant host factors without dedicated training for that objective, and they use those factors to drive a drug-screening pipeline whose predictions were corroborated by wet-lab and animal-model experiments. As an April 2026 preprint, the work has not yet completed peer review, and a public code or weights release was not identified at the time of writing.
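The virus-typing task described above, predicting a virus label from the host response alone, can be sketched with a deliberately simple stand-in: a nearest-centroid classifier using cosine similarity over log2 fold-change profiles. This is not CellPulse's transformer classifier, only a baseline that shows the shape of the task. The gene names `G1` to `G3`, the labels `virus_A` and `virus_B`, and all values are hypothetical.

```python
import math

def cosine(u, v, genes):
    """Cosine similarity between two profiles over a fixed gene list."""
    dot = sum(u.get(g, 0.0) * v.get(g, 0.0) for g in genes)
    norm_u = math.sqrt(sum(u.get(g, 0.0) ** 2 for g in genes))
    norm_v = math.sqrt(sum(v.get(g, 0.0) ** 2 for g in genes))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(profiles, genes):
    """Mean log2 fold change per gene across a class's profiles."""
    return {g: sum(p.get(g, 0.0) for p in profiles) / len(profiles) for g in genes}

def classify(query, centroids, genes):
    """Return the label whose centroid is most similar to the query profile."""
    return max(centroids, key=lambda label: cosine(query, centroids[label], genes))

# Hypothetical genes, labels, and fold changes, for illustration only.
genes = ["G1", "G2", "G3"]
training = {
    "virus_A": [{"G1": 3.0, "G2": 2.0, "G3": -1.0}, {"G1": 2.0, "G2": 3.0, "G3": -1.0}],
    "virus_B": [{"G1": -2.0, "G2": 0.0, "G3": 3.0}, {"G1": -3.0, "G2": 1.0, "G3": 2.0}],
}
centroids = {label: centroid(ps, genes) for label, ps in training.items()}

query = {"G1": 2.5, "G2": 2.0, "G3": -0.5}
predicted = classify(query, centroids, genes)
print(predicted)
```

No viral sequence enters the computation, only the host's direction-aware response, which is what makes the approach genome-agnostic. The reported task has 31 classes rather than two, and a learned model would replace the centroids.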

#Applications

CellPulse is aimed at infectious-disease and antiviral-discovery researchers who work from host transcriptomes. Its virus-typing capability supports genome-agnostic identification of infection from host response signatures, which is useful when viral sequence is unavailable, degraded, or from an uncharacterized agent. Its host-factor discovery and drug-screening outputs offer a computational starting point for target identification and antiviral repurposing, with the authors' experimental validation illustrating how predictions can be carried into the lab. More broadly, the differential-expression framing is applicable to other perturbation-response settings where coordinated gene dynamics, rather than static expression, carry the signal of interest.

#Impact

CellPulse contributes a perturbation-centric, bulk-RNA-seq foundation model to a field dominated by single-cell, static-expression models, and pairs it with VISTA, a large viral-infection differential-expression atlas that is itself a notable resource. By demonstrating an end-to-end path from a transcriptomic model to experimentally validated antiviral candidates, the work argues that direction-aware modeling of coordinated gene dynamics can yield clinically actionable hypotheses. Its broader influence will depend on independent evaluation, peer review, and the availability of the model and VISTA atlas to the community. The preprint is distributed under a CC BY-NC-ND license, which permits non-commercial sharing with attribution but does not allow derivative works.

Tags

gene_expression, variant_effect_prediction, drug_discovery, transformer, foundation_model, self_supervised, virology, transcriptomics