bio.rodeo

CLOP-DiT

Third Military Medical University

Generates single-cell transcriptomic profiles from structured biological metadata via contrastive language-omics pretraining and a diffusion transformer.

Released: March 2026

Generative models of single-cell transcriptomes promise controllable in silico cell populations for simulation, data augmentation, and hypothesis testing, but steering generation with biologically meaningful descriptions rather than opaque latent codes remains difficult. CLOP-DiT addresses this by conditioning generation on structured metadata, so researchers can specify cells in biological terms and obtain matching expression profiles.

Developed by researchers at the Third Military Medical University (Army Medical University, Chongqing) and posted to bioRxiv in March 2026, CLOP-DiT is a modular three-stage pipeline. First, a Contrastive Language-Omics Pretraining (CLOP) stage aligns text embeddings from BiomedBERT with cell embeddings from scGPT in a shared 512-dimensional space. Second, a conditional Diffusion Transformer (DiT) generates scGPT-compatible latent states via flow matching, steered by a five-field biological template (cell type, tissue, organism, marker genes, and disease). Third, a frozen scGPT decoder maps the generated latents back to gene expression.
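
To make the five-field template concrete, the sketch below shows one way such metadata could be serialized into text for a language encoder. The field names come from the description above; the `CellTemplate` class, the "field: value" wording, and the default values are hypothetical, since the source does not give the exact prompt format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CellTemplate:
    """The five metadata fields named in the text; the class itself is hypothetical."""
    cell_type: str
    tissue: str
    organism: str
    marker_genes: List[str] = field(default_factory=list)
    disease: str = "normal"  # assumed default, not taken from the source

    def to_prompt(self) -> str:
        # Assumed serialization: "field: value" pairs joined by "; ".
        # The actual prompt wording used by CLOP-DiT is not given in the source.
        markers = ", ".join(self.marker_genes) if self.marker_genes else "unspecified"
        return (
            f"cell type: {self.cell_type}; tissue: {self.tissue}; "
            f"organism: {self.organism}; marker genes: {markers}; "
            f"disease: {self.disease}"
        )


template = CellTemplate(
    cell_type="CD8-positive T cell",
    tissue="lung",
    organism="Homo sapiens",
    marker_genes=["CD8A", "GZMB"],
    disease="lung adenocarcinoma",
)
prompt = template.to_prompt()
print(prompt)
```

In a pipeline of this kind, a string like `prompt` would be what the text encoder embeds before projection into the shared space.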

By reusing frozen, pretrained omics and language backbones and learning only the alignment and the diffusion generator, CLOP-DiT produces realistic, metadata-controllable single-cell profiles without retraining its foundation-model components.
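
The alignment stage is described only as contrastive, so the exact objective is not stated here. A common choice for this kind of text-to-modality alignment is a CLIP-style symmetric InfoNCE loss, which the sketch below assumes. The function names, the temperature of 0.07, and the 2-dimensional toy vectors (standing in for 512-dimensional projections) are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Fall back to 1.0 so an all-zero vector does not divide by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(text: List[Vector], cell: List[Vector],
                               temperature: float = 0.07) -> float:
    """CLIP-style loss (assumed): text[i] and cell[i] form the matching pair."""
    t = [l2_normalize(v) for v in text]
    c = [l2_normalize(v) for v in cell]
    n = len(t)
    # logits[i][j] = cosine similarity of text i and cell j, scaled by 1/temperature.
    logits = [[sum(a * b for a, b in zip(ti, cj)) / temperature for cj in c] for ti in t]
    # Text-to-cell direction: each row should select its own column.
    loss_t2c = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Cell-to-text direction: each column should select its own row.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_c2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_t2c + loss_c2t)


aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Under this assumed objective, only the projection heads would receive gradients, which is consistent with keeping the BiomedBERT and scGPT backbones frozen.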

#Key Features

  • Metadata-conditioned generation: Produces cells from a structured five-field template (cell type, tissue, organism, marker genes, disease), enabling biologically interpretable control over generated states.
  • Contrastive language-omics alignment: The CLOP stage maps BiomedBERT text and scGPT cell embeddings into a shared 512-dimensional space, grounding generation in natural-language biology.
  • Diffusion transformer with flow matching: A conditional DiT generates scGPT-compatible latents via flow matching, supporting high-fidelity or high-diversity sampling regimes.
  • Frozen foundation-model backbones: Reuses pretrained scGPT (encoder and decoder) and BiomedBERT without retraining, generating new profiles from a fixed trained DiT checkpoint.
  • Tunable fidelity-diversity tradeoff: Classifier-free guidance lets users trade realism against diversity for different downstream needs.
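
The flow-matching and guidance features above can be illustrated with a toy sampler. This is a minimal sketch, not the CLOP-DiT implementation: it assumes the common classifier-free guidance convention in which a scale of 1.0 returns the conditional prediction unchanged, uses plain Euler integration, and replaces the trained DiT with a hypothetical closed-form velocity field (`straight_line_field`) in two dimensions.

```python
import random
from typing import Callable, List

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def guided_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    # Assumed CFG convention: scale = 1.0 returns the conditional velocity unchanged,
    # larger scales extrapolate away from the unconditional velocity.
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def sample(cond_field: Velocity, uncond_field: Velocity, x0: Vector,
           scale: float, steps: int = 50) -> Vector:
    """Euler-integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (latent)."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1.0, so 1 - t is never zero
        v = guided_velocity(cond_field(x, t), uncond_field(x, t), scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_field(target: Vector) -> Velocity:
    # Toy stand-in for the trained DiT: the velocity of a straight path to `target`.
    return lambda x, t: [(m - xi) / (1.0 - t) for m, xi in zip(target, x)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2)]
cond = straight_line_field([2.0, 0.0])    # hypothetical conditional mode
uncond = straight_line_field([1.0, 1.0])  # hypothetical unconditional mode

x_w1 = sample(cond, uncond, noise, scale=1.0)
x_w2 = sample(cond, uncond, noise, scale=2.0)
print([round(v, 6) for v in x_w1], [round(v, 6) for v in x_w2])
```

In this toy, a scale of 1.0 ends at the conditional target, while 2.0 overshoots it in the direction away from the unconditional target. That shows the mechanics of guidance only; the fidelity and diversity effects reported for the real model depend on the learned velocity field.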

#Technical Details

CLOP-DiT comprises three components. The contrastive aligner (CLOP) projects BiomedBERT text embeddings and scGPT cell embeddings into a shared 512-dimensional space. The conditional Diffusion Transformer generates scGPT-compatible latent states through flow matching, steered by the five-field template. A frozen scGPT decoder converts the latents to gene expression. The system was trained on 220,304 cells spanning 69 cell types drawn from 80 GEO datasets.

Evaluation reports a fidelity-diversity tradeoff controlled by the classifier-free guidance (CFG) scale. In the high-fidelity regime (CFG = 2.0), the model reaches 36.9% KNN accuracy, roughly 25x chance for 69 cell types, with 81.0% steering accuracy. In the high-diversity regime (CFG = 1.0), it achieves a diversity ratio of 0.93 with 80.7% steering accuracy. Once trained, the DiT checkpoint generates new profiles without retraining.
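
The "roughly 25x chance" figure can be checked with a quick calculation. The sketch assumes that chance means uniform guessing over the 69 cell types; the source does not state how the baseline was defined.

```python
n_cell_types = 69      # from the text
knn_accuracy = 0.369   # reported KNN accuracy at CFG = 2.0

# Assumption: "chance" is uniform guessing over all 69 cell types.
chance = 1.0 / n_cell_types
fold_over_chance = knn_accuracy / chance

print(f"chance = {chance:.2%}, fold = {fold_over_chance:.1f}x")
# prints: chance = 1.45%, fold = 25.5x
```

If class frequencies are imbalanced, a majority-class baseline would be higher than uniform chance, so the fold figure is best read against the baseline the authors actually used.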

#Applications

CLOP-DiT is aimed at computational biologists and method developers who need controllable synthetic single-cell data. Use cases include augmenting rare cell-type representation in training sets, simulating cell states under specified tissue, organism, marker, or disease conditions, benchmarking analysis pipelines on data with known generative ground truth, and hypothesis-driven creation of cell states described in natural-language metadata.

#Impact

By coupling a biomedical language model with a single-cell foundation model through contrastive alignment and a diffusion transformer, CLOP-DiT advances controllable, metadata-grounded transcriptome generation. Its modular reuse of frozen scGPT and BiomedBERT backbones makes the approach lightweight to deploy and adapt. CLOP-DiT is a preprint, and no public weights release was confirmed at the time of writing; its reported steering and fidelity metrics await peer review and independent benchmarking against other single-cell generative models.

Tags

single_cell_generation, data_augmentation, in_silico_simulation, diffusion, transformer, generative, contrastive_learning, multimodal, transcriptomics, gene_expression