Clinical flow cytometry is a cornerstone of hematopathology, immunology, and the diagnosis of leukemias and lymphomas, measuring the expression of many protein markers across millions of individual cells per specimen. Yet the data is notoriously hard to analyze at scale: different laboratories and assays use different antibody panels, so each panel measures a different, only partially overlapping set of markers. This heterogeneity has historically forced machine learning models to be trained one panel at a time, fragmenting effort and preventing the kind of unified, transferable representations that have transformed other areas of biology.

EventHorizon, developed by researchers at ARUP Laboratories' Institute for Research and Innovation and the University of Utah Department of Pathology and posted to bioRxiv in June 2026, is, to the catalog's knowledge, the first self-supervised foundation model for clinical flow cytometry. It learns a single shared latent space into which specimens from many different panels can be embedded, producing panel-agnostic, specimen-level representations directly from raw multi-panel data. Rather than being trained to predict a fixed set of diagnostic labels, it is pretrained without labels and then evaluated by probing the frozen representations.

The work is notable for translating the self-supervised "foundation model" recipe into a real, high-volume clinical setting, using a corpus of over 100,000 clinical specimens spanning 17 distinct antibody panels drawn from routine diagnostic workflows.

Key Features

Panel-agnostic representations: A marker-aware tokenization scheme lets the model integrate cells measured under different antibody panels into one shared latent space, so specimens from heterogeneous assays become directly comparable.
Hierarchical two-stage transformer: The architecture aggregates information first at the cell level and then at the specimen level, producing a single embedding that summarizes millions of events per specimen.
Self-supervised pretraining: Training uses a DINO-inspired self-distillation objective with flow-cytometry-specific data augmentations, requiring no diagnostic labels during pretraining.
Biology-driven structure: Analysis suggests the learned space organizes specimens primarily by biological diagnosis rather than by technical panel identity, indicating it captures clinically meaningful signal rather than batch artifacts.
Strong frozen-probe performance: Simple k-nearest-neighbor probing of the frozen embeddings achieves accuracy comparable to fully supervised and panel-specific self-supervised baselines.
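
The idea behind marker-aware tokenization can be illustrated with a small NumPy sketch. This is not the authors' implementation: the marker vocabulary, the embedding size, and the rule "token = marker embedding + measured value times a projection vector" are all assumptions chosen for illustration. The point it shows is that a marker shared by two panels yields the same token regardless of which panel it came from or which column it occupies.

```python
import numpy as np

# Hypothetical shared marker vocabulary (an assumption; the real one is not given here).
MARKER_VOCAB = {"CD3": 0, "CD4": 1, "CD8": 2, "CD19": 3, "CD45": 4}

def tokenize_cells(values, panel_markers, marker_emb, value_proj):
    """Map an (n_cells, n_markers) intensity matrix to (n_cells, n_markers, d) tokens.

    Each token is the embedding of the marker's identity plus the measured
    value scaled along a projection vector (an illustrative rule, not the paper's).
    """
    ids = np.array([MARKER_VOCAB[m] for m in panel_markers])  # (n_markers,)
    identity = marker_emb[ids]                                 # (n_markers, d)
    return identity[None, :, :] + values[:, :, None] * value_proj[None, None, :]

rng = np.random.default_rng(0)
d = 8
marker_emb = rng.normal(size=(len(MARKER_VOCAB), d))  # stands in for learned weights
value_proj = rng.normal(size=d)                        # stands in for learned weights

# Two panels with partially overlapping markers, in different column orders.
panel_a = ["CD3", "CD4", "CD45"]
panel_b = ["CD45", "CD19", "CD3", "CD8"]
cells_a = rng.random((5, len(panel_a)))  # synthetic intensities, not real data
cells_b = rng.random((7, len(panel_b)))

tokens_a = tokenize_cells(cells_a, panel_a, marker_emb, value_proj)
tokens_b = tokenize_cells(cells_b, panel_b, marker_emb, value_proj)
```

Because tokens carry their marker identity, a downstream transformer can treat each cell as a variable-length set of tokens, so panels of different widths need no padding to a fixed marker layout.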

Technical Details

EventHorizon is built on a two-stage hierarchical transformer. Individual cells (events) are first tokenized in a marker-aware fashion, so that markers shared across panels map to a common representation while panel-specific markers are still accommodated. A first transformer stage builds cell-level representations, and a second stage aggregates them into a single specimen-level embedding. Pretraining follows a DINO-style self-distillation scheme, in which a student network is trained to match a momentum-updated teacher across augmented views, adapted with augmentations tailored to the statistical structure of flow cytometry data. The pretraining corpus comprised more than 100,000 clinical specimens spanning 17 distinct antibody panels. Representations were evaluated by freezing the backbone and applying lightweight k-nearest-neighbor probes, which reached performance comparable to fully supervised models and panel-specific self-supervised baselines on diagnostic classification tasks.
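
For readers unfamiliar with DINO-style self-distillation, the following NumPy sketch shows the forward computation of the loss and the momentum update in their generic form. It is a sketch of the general recipe, not EventHorizon's training code: the temperatures, momentum values, and output dimension are illustrative assumptions, and a real training loop would use an autodiff framework with gradients flowing only through the student.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between centered, sharpened teacher targets and student predictions."""
    targets = softmax((teacher_logits - center) / t_teacher)  # treated as constants in training
    log_pred = log_softmax(student_logits / t_student)
    return float(-(targets * log_pred).sum(axis=-1).mean())

def ema_update(old, new, momentum=0.996):
    """Exponential moving average, used for both teacher weights and the center."""
    return momentum * old + (1.0 - momentum) * new

rng = np.random.default_rng(0)
student_out = rng.normal(size=(4, 16))  # student logits for one augmented view
teacher_out = rng.normal(size=(4, 16))  # teacher logits for a different view
center = np.zeros(16)

loss = dino_loss(student_out, teacher_out, center)
center = ema_update(center, teacher_out.mean(axis=0), momentum=0.9)

# The teacher's weights trail the student's slowly (toy 3-parameter example).
teacher_w = np.zeros(3)
student_w = np.ones(3)
teacher_w = ema_update(teacher_w, student_w)
```

Centering and the low teacher temperature work against each other by design: centering discourages one output dimension from dominating, while sharpening discourages a uniform output, which together help avoid collapsed representations.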

Applications

The model targets clinical diagnostic workflows in hematopathology and immunology, where flow cytometry is used to characterize and classify hematologic malignancies and immune phenotypes. Because a single embedding space spans many panels, EventHorizon could support scalable, reproducible diagnostic decision support that generalizes across laboratories and assay configurations, retrieval of similar historical cases, and downstream classifiers trained with comparatively few labels. Clinical laboratories, pathologists, and researchers analyzing large archives of heterogeneous cytometry data are the primary beneficiaries.
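
Case retrieval and few-label classification over a shared embedding space both reduce to nearest-neighbor search. The sketch below assumes specimen embeddings have already been exported as plain vectors; the embeddings are synthetic, the class names "A" and "B" are placeholders rather than diagnoses, and the function names are hypothetical, since no public API has been released.

```python
import numpy as np
from collections import Counter

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def retrieve(query, bank, k=3):
    """Indices of the k bank embeddings most cosine-similar to the query, best first."""
    sims = l2_normalize(bank) @ l2_normalize(query[None, :])[0]
    return np.argsort(-sims)[:k]

def knn_predict(query, bank, labels, k=3):
    """Majority vote among the labels of the k nearest archived specimens."""
    votes = Counter(labels[i] for i in retrieve(query, bank, k))
    return votes.most_common(1)[0][0]

# Synthetic stand-ins for frozen specimen embeddings (not real data).
rng = np.random.default_rng(0)
centers = {"A": np.array([5.0, 0.0, 0.0, 0.0]), "B": np.array([0.0, 5.0, 0.0, 0.0])}
bank = np.vstack([centers[c] + 0.1 * rng.normal(size=(10, 4)) for c in ("A", "B")])
labels = ["A"] * 10 + ["B"] * 10

query = centers["B"] + 0.1 * rng.normal(size=4)
neighbors = retrieve(query, bank, k=3)            # similar archived cases
prediction = knn_predict(query, bank, labels, k=3)
```

Because the backbone stays frozen, adding newly labeled cases only means appending rows to the bank; no retraining is involved in this setup.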

Impact

As an early foundation model purpose-built for clinical flow cytometry, EventHorizon demonstrates that self-supervised pretraining can unify one of the most panel-fragmented modalities in laboratory medicine into a single representation space, pointing toward more transferable and label-efficient diagnostic tooling. Its authorship by a high-volume clinical reference laboratory underscores a clear translational intent. The work is an early-stage preprint released under a CC BY-ND license; at the time of writing no public code or model weights had been released, and the reported results rest on retrospective probing rather than prospective clinical validation, so its real-world diagnostic utility remains to be established.

Citation

EventHorizon: A Foundation Model for Clinical Flow Cytometry
