bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.

EventHorizon

ARUP Laboratories / University of Utah

A self-supervised foundation model for clinical flow cytometry that produces unified, panel-agnostic specimen-level representations from heterogeneous multi-panel data.

Released: June 2026

Clinical flow cytometry is a cornerstone of hematopathology, immunology, and the diagnosis of leukemias and lymphomas, measuring the expression of many protein markers across millions of individual cells per specimen. Yet the data is notoriously hard to analyze at scale: different laboratories and assays use different antibody panels, so each panel measures a different, only partially overlapping set of markers. This heterogeneity has historically forced machine learning models to be trained one panel at a time, fragmenting effort and preventing the kind of unified, transferable representations that have transformed other areas of biology.

EventHorizon was developed by researchers at ARUP Laboratories' Institute for Research and Innovation and the University of Utah Department of Pathology and posted to bioRxiv in June 2026. To the catalog's knowledge, it is the first self-supervised foundation model for clinical flow cytometry. It learns a single shared latent space into which specimens from many different panels can be embedded, producing panel-agnostic, specimen-level representations directly from raw multi-panel data. Rather than being trained to predict a fixed set of diagnostic labels, it is pretrained without labels and then evaluated by probing the frozen representations.

The work is notable for translating the self-supervised "foundation model" recipe into a real, high-volume clinical setting, using a corpus of over 100,000 clinical specimens spanning 17 distinct antibody panels drawn from routine diagnostic workflows.

#Key Features

  • Panel-agnostic representations: A marker-aware tokenization scheme lets the model integrate cells measured under different antibody panels into one shared latent space, so specimens from heterogeneous assays become directly comparable.
  • Hierarchical two-stage transformer: The architecture aggregates information first at the cell level and then at the specimen level, producing a single embedding that summarizes millions of events per specimen.
  • Self-supervised pretraining: Training uses a DINO-inspired self-distillation objective with flow-cytometry-specific data augmentations, requiring no diagnostic labels during pretraining.
  • Biology-driven structure: Analysis suggests the learned space organizes specimens primarily by biological diagnosis rather than by technical panel identity, indicating it captures clinically meaningful signal rather than batch artifacts.
  • Strong frozen-probe performance: Simple k-nearest-neighbor probing of the frozen embeddings achieves accuracy comparable to fully supervised and panel-specific self-supervised baselines.
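One way to picture the marker-aware tokenization in the first bullet is to represent each cell as a set of (marker, intensity) tokens drawn from a vocabulary shared across panels. The sketch below is illustrative only: the panel definitions, the marker choices, and the `tokenize_event` helper are assumptions made for this example, not the authors' implementation, which had not been released at the time of writing.

```python
# Illustrative sketch only: panels, markers, and helper names are hypothetical
# and are not taken from the EventHorizon paper or code.

# Two hypothetical antibody panels with partially overlapping markers.
PANELS = {
    "panel_A": ["CD3", "CD4", "CD8", "CD45"],
    "panel_B": ["CD19", "CD20", "CD45", "CD10"],
}

# One shared vocabulary: every marker gets a single ID, regardless of panel.
MARKER_VOCAB = {
    marker: idx
    for idx, marker in enumerate(sorted({m for ms in PANELS.values() for m in ms}))
}


def tokenize_event(panel: str, intensities: list[float]) -> list[tuple[int, float]]:
    """Turn one cell (event) into a list of (marker_id, intensity) tokens."""
    markers = PANELS[panel]
    if len(markers) != len(intensities):
        raise ValueError("expected one intensity per marker in the panel")
    return [(MARKER_VOCAB[m], x) for m, x in zip(markers, intensities)]


cell_a = tokenize_event("panel_A", [0.9, 0.1, 0.8, 0.7])
cell_b = tokenize_event("panel_B", [0.2, 0.3, 0.6, 0.0])

# CD45 is measured by both panels and maps to the same token ID in each.
shared_ids = {i for i, _ in cell_a} & {i for i, _ in cell_b}
print(shared_ids == {MARKER_VOCAB["CD45"]})
```

Because the token ID depends on the marker rather than on its position within a panel, a downstream encoder can in principle treat cells from different panels as variable-length sets over the same vocabulary.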

#Technical Details

EventHorizon is built on a two-stage hierarchical transformer. Individual cells (events) are first tokenized in a marker-aware fashion, so that markers shared across panels map to a common representation while panel-specific markers are still accommodated. A first transformer stage builds cell-level representations, and a second stage aggregates them into a single specimen-level embedding. Pretraining follows a DINO-style self-distillation scheme, in which a student network is trained to match a momentum-updated teacher across augmented views, with augmentations tailored to the statistical structure of flow cytometry data. The model was pretrained on more than 100,000 clinical specimens spanning 17 distinct antibody panels. Representations were evaluated by freezing the backbone and applying lightweight k-nearest-neighbor probes, which performed comparably to fully supervised models and panel-specific self-supervised baselines on diagnostic classification tasks.
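The self-distillation objective described above can be sketched in a few lines. This is a generic DINO-style loss, not EventHorizon's actual training code: the temperatures and momentum are values typical of DINO-style setups and are assumed here, and the function names are hypothetical. The teacher's output is centered and sharpened, the student is trained to match it by cross-entropy, and the teacher's weights follow the student through an exponential moving average.

```python
import math


def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax, stabilized by subtracting the max."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher and the student.

    The temperatures are assumed defaults, not values reported for EventHorizon.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Momentum (EMA) update: the teacher slowly tracks the student."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


# Toy check: the loss is small when the student agrees with the teacher
# and large when it puts its mass on a different output dimension.
teacher_logits = [2.0, 0.5, -1.0]
center = [0.0, 0.0, 0.0]
agree = dino_loss([2.0, 0.5, -1.0], teacher_logits, center)
disagree = dino_loss([-1.0, 0.5, 2.0], teacher_logits, center)
print(agree < disagree)
```

In a full training loop the two sets of logits would come from different augmented views of the same specimen, and no gradient would flow through the teacher.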

#Applications

The model targets clinical diagnostic workflows in hematopathology and immunology, where flow cytometry is used to characterize and classify hematologic malignancies and immune phenotypes. Because a single embedding space spans many panels, EventHorizon could support scalable, reproducible diagnostic decision support that generalizes across laboratories and assay configurations, retrieval of similar historical cases, and downstream classifiers trained with comparatively few labels. Clinical laboratories, pathologists, and researchers analyzing large archives of heterogeneous cytometry data are the primary beneficiaries.
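Retrieval of similar historical cases, and the k-nearest-neighbor probing used to evaluate the model, both reduce to a similarity search over frozen specimen embeddings. The following is a minimal sketch under stated assumptions: the three-dimensional embeddings, case IDs, and labels are made up for illustration, cosine similarity is an assumed choice of metric, and the helper names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_cases(query, archive, k=3):
    """Return the k archived (case_id, label, similarity) most similar to query."""
    scored = [(case_id, label, cosine_similarity(query, emb))
              for case_id, label, emb in archive]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]


def knn_vote(neighbors):
    """Majority label among the retrieved neighbors (a simple k-NN probe)."""
    return Counter(label for _, label, _ in neighbors).most_common(1)[0][0]


# Hypothetical frozen specimen embeddings with made-up labels.
archive = [
    ("case-001", "B-ALL", [0.9, 0.1, 0.0]),
    ("case-002", "B-ALL", [0.8, 0.2, 0.1]),
    ("case-003", "normal", [0.0, 0.1, 0.9]),
    ("case-004", "normal", [0.1, 0.0, 0.8]),
]
query = [0.85, 0.15, 0.05]

neighbors = nearest_cases(query, archive, k=3)
prediction = knn_vote(neighbors)
print(prediction)
```

Because the backbone stays frozen, adding a new labeled case only means appending its embedding to the archive; nothing is retrained.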

#Impact

As an early foundation model purpose-built for clinical flow cytometry, EventHorizon demonstrates that self-supervised pretraining can unify one of the most panel-fragmented modalities in laboratory medicine into a single representation space, pointing toward more transferable and label-efficient diagnostic tooling. Its authorship by a high-volume clinical reference laboratory underscores a clear translational intent. The work is an early-stage preprint released under a CC BY-ND license; at the time of writing no public code or model weights had been released, and the reported results rest on retrospective probing rather than prospective clinical validation, so its real-world diagnostic utility remains to be established.

Citation

Grespan, M. M., et al. (2026) EventHorizon: A Foundation Model for Clinical Flow Cytometry. bioRxiv.

DOI: 10.64898/2026.06.18.733197

Citations

Total citations: 0
Influential: 0
References: 52

Openness

bio.rodeo openness: Closed · low usability and reproducibility
4 · Closed
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 0 (not reproducible)
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

cell_type_annotation, diagnostic_classification, flow_cytometry, foundation_model, hematopathology, representation_learning, self_distillation, self_supervised, transformer

Resources

Research Paper