bio.rodeo

cfRNA-ICL

Eigen Bio

In-context learning foundation model for cell-free RNA, pretrained on synthetic tasks from a cfRNA-specific structural causal model for few-shot cancer classification.

Released: December 2025

Cell-free RNA (cfRNA) circulating in human plasma offers a minimally invasive window into tissue physiology and disease, making it an attractive substrate for liquid-biopsy cancer detection. However, cfRNA data are unusually difficult for machine learning: measurements are extremely sparse and strongly zero-inflated, abundances follow heavy-tailed distributions, library sizes vary widely, and genes differ markedly in detectability. These properties differ substantially from bulk or single-cell transcriptomics, so models that perform well on conventional tabular or expression data tend to generalize poorly when applied directly to cfRNA, a problem compounded by the scarcity of large, well-labeled cfRNA cohorts.
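The properties listed above can be measured on any samples-by-genes count matrix. The following is a minimal sketch rather than anything taken from the preprint: the function name `count_matrix_diagnostics`, the chosen summary statistics, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def count_matrix_diagnostics(counts):
    """Summarize sparsity and depth variation for a samples x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=1)           # total counts per sample
    detection_rate = (counts > 0).mean(axis=0)   # fraction of samples detecting each gene
    return {
        "zero_fraction": float((counts == 0).mean()),
        "library_size_cv": float(library_sizes.std() / library_sizes.mean()),
        "min_detection_rate": float(detection_rate.min()),
        "max_detection_rate": float(detection_rate.max()),
    }

# Hypothetical toy matrix: 3 samples x 4 genes (not real cfRNA data).
toy = np.array([[0, 5, 0, 100],
                [0, 0, 0, 40],
                [2, 0, 0, 900]])
stats = count_matrix_diagnostics(toy)
```

A high `zero_fraction`, a large `library_size_cv`, and a wide gap between the minimum and maximum detection rates are the kinds of signals that would distinguish cfRNA-like data from denser expression matrices.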

cfRNA-ICL, introduced by Eigen Bio in a December 2025 bioRxiv preprint, addresses this gap with an in-context learning (ICL) approach adapted from the prior-data fitted network (PFN) / TabPFN family. Instead of training on real labeled cohorts, the model is meta-trained entirely on synthetic classification tasks sampled from a biologically grounded structural causal model (SCM) that is purpose-built to reproduce the statistical geometry of cfRNA. At inference the model performs classification in-context: labeled examples are supplied as context and predictions for new samples are produced in a single forward pass, with no task-specific gradient updates.
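To make the in-context interface concrete, here is a toy stand-in, not the cfRNA-ICL network, whose code and weights are not public. It replaces the trained transformer with a fixed softmax attention over distances to the context samples; the function name `in_context_predict`, the `temperature` parameter, and the example data are assumptions. What it shares with the approach described above is the call shape: labeled context and unlabeled queries go in, class probabilities come out, and nothing is fitted along the way.

```python
import numpy as np

def in_context_predict(X_ctx, y_ctx, X_qry, n_classes, temperature=1.0):
    """Toy in-context classifier (illustrative; not the cfRNA-ICL model).

    Each query attends over the labeled context samples with softmax weights
    on negative squared distance, then averages the one-hot context labels.
    The function has no trainable state and takes no gradient steps.
    """
    X_ctx = np.asarray(X_ctx, dtype=float)
    X_qry = np.asarray(X_qry, dtype=float)
    y_ctx = np.asarray(y_ctx)
    # Pairwise squared distances, shape (n_query, n_context).
    d2 = ((X_qry[:, None, :] - X_ctx[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    one_hot = np.eye(n_classes)[y_ctx]            # (n_context, n_classes)
    return weights @ one_hot                      # class probabilities per query

# Hypothetical few-shot task: four labeled samples, two queries.
X_ctx = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
y_ctx = [0, 0, 1, 1]
X_qry = [[0.1, 0.1], [3.0, 3.0]]
probs = in_context_predict(X_ctx, y_ctx, X_qry, n_classes=2)
pred = probs.argmax(axis=1)
```

In a real PFN the mapping from context to prediction is learned during meta-training on synthetic tasks; only the gradient-free usage pattern carries over from this sketch.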

The central contribution is the domain-specific prior. Where general-purpose tabular ICL models draw their training tasks from domain-agnostic SCMs, cfRNA-ICL's SCM encodes empirical cfRNA behavior, giving the model inductive biases matched to real plasma data. The authors report that this yields consistently stronger cancer classification than tabular ICL models trained on generic synthetic data, with the largest gains in few-shot settings.

#Key Features

  • cfRNA-specific synthetic prior: A structural causal model parameterized from empirical cfRNA measurements generates the meta-training task universe, encoding gene-level dropout, overdispersion, tissue-mixture latent factors, compositional variability, and sequencing noise.
  • In-context, gradient-free inference: Following the PFN/TabPFN paradigm, the model classifies new plasma samples from a handful of labeled examples supplied as context, without per-task retraining or fine-tuning.
  • Strong few-shot behavior: The largest improvements over generic tabular ICL baselines appear in low-label regimes, where exposure to cfRNA-specific statistical structure during meta-training is most valuable.
  • Biologically coherent representations: Unsupervised analyses show the model organizes samples into manifolds that preserve cancer-type identity without supervised constraints.
  • Label-efficient by design: Synthetic pretraining sidesteps the scarcity of large labeled cfRNA cohorts, a major practical bottleneck for the field.
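
The synthetic-prior idea in the first bullet can be sketched as a small task sampler. This is an illustrative toy generator, not the authors' SCM: the function name `sample_synthetic_task`, every distribution choice (Dirichlet mixtures, log-normal depth, gamma-Poisson counts, Beta detection probabilities), and all parameter values are assumptions chosen only to exhibit the listed ingredients.

```python
import numpy as np

def sample_synthetic_task(n_samples=32, n_genes=50, n_tissues=4, seed=0):
    """Draw one toy cfRNA-like classification task (illustrative prior only)."""
    rng = np.random.default_rng(seed)
    # Tissue-specific expression profiles; each row sums to 1 (compositional).
    profiles = rng.dirichlet(np.full(n_genes, 0.3), size=n_tissues)
    # The binary label shifts the tissue-mixture prior toward tissue 0.
    y = rng.integers(0, 2, size=n_samples)
    alpha = np.ones((n_samples, n_tissues))
    alpha[y == 1, 0] += 3.0
    mix = np.vstack([rng.dirichlet(a) for a in alpha])   # latent tissue mixtures
    # Library sizes vary widely across samples (log-normal depth).
    depth = rng.lognormal(mean=9.0, sigma=1.0, size=n_samples)
    mu = depth[:, None] * (mix @ profiles)               # expected counts
    # Overdispersion: gamma-Poisson mixture, i.e. negative binomial counts.
    dispersion = 2.0
    lam = rng.gamma(shape=dispersion, scale=mu / dispersion)
    counts = rng.poisson(lam)
    # Gene-level dropout: each gene has its own detection probability.
    detect_p = rng.beta(2.0, 2.0, size=n_genes)
    counts = counts * (rng.random(counts.shape) < detect_p)
    return counts, y

X, y = sample_synthetic_task()
```

Meta-training in the PFN style would draw many such tasks with different seeds and settings, so the network sees a distribution over tasks rather than a single dataset.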

#Technical Details

cfRNA-ICL is a transformer-based prior-data fitted network in the lineage of TabPFN, which uses self-attention among context (training) samples and cross-attention from query (test) samples to those context samples to approximate Bayesian inference over the synthetic prior. The defining departure from prior work is the generative prior itself: rather than sampling from generic structural causal models, the authors construct a cfRNA-specific SCM calibrated to empirical measurements of dropout, overdispersion, tissue-mixture-driven latent factors, compositional variability, and sequencing noise, producing a synthetic task distribution whose geometry mirrors real cfRNA. The model is evaluated on multiple cfRNA cancer classification benchmarks against tabular ICL models trained on generic synthetic data, with reported gains most pronounced in few-shot scenarios. The preprint is a single-version bioRxiv release (CC BY-NC-ND); specific architecture sizes, hyperparameters, and per-benchmark metrics should be confirmed against the full text.
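
The attention pattern described in this section can be written as a boolean mask over a sequence of context tokens followed by query tokens. This is a sketch of the general TabPFN-style layout as summarized here, under the assumption that queries attend only to context samples; the function name `pfn_attention_mask` is hypothetical, and cfRNA-ICL's exact masking is not confirmed by the source.

```python
import numpy as np

def pfn_attention_mask(n_context, n_query):
    """Boolean mask where mask[i, j] is True if token i may attend to token j.

    Tokens 0..n_context-1 are labeled context samples; the rest are queries.
    """
    n = n_context + n_query
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_context] = True   # every token attends to the context samples only
    return mask

mask = pfn_attention_mask(n_context=3, n_query=2)
```

Because no token attends to a query column, each query's prediction depends only on the context set, so queries can be batched freely without influencing one another.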

#Applications

cfRNA-ICL targets liquid-biopsy workflows where plasma cfRNA is profiled for cancer detection and classification. Because it learns in-context from a few labeled examples, it is well suited to settings with limited annotated samples, such as emerging assays, rare cancer types, or new cohorts where assembling large training sets is impractical. Beyond classification, its unsupervised representations could support exploratory analysis of cfRNA structure across oncology and other plasma-based applications, and the SCM-prior framework offers a template for building cfRNA models without large labeled datasets.

#Impact

cfRNA-ICL demonstrates that tailoring the synthetic prior of an in-context learning model to the statistics of a difficult biological data type can outperform generic tabular foundation models on that domain. It extends the TabPFN/PFN paradigm into liquid biopsy and articulates a practical route toward foundation-scale cfRNA models that are intrinsically adapted to plasma cfRNA rather than retrofitted from general-purpose architectures. As a recent single-version preprint from an industry group, its benchmarks await independent validation and peer review, and code and pretrained weights were not located in public repositories at the time of writing; nonetheless it contributes a concrete strategy for the persistent challenge of label-scarce cfRNA modeling.

Citation

DOI: 10.64898/2025.12.10.693604


Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 8 (Closed)
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 10
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

cancer_classification, cell_free_rna, early_cancer_detection, few_shot, foundation_model, in_context_learning, liquid_biopsy, representation_learning, transcriptomics, transformer

Resources

Research Paper · Official Website