bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

Pathology

GenBloom

Helmholtz Munich / LMU Munich

Genetically aligned foundation model for blood smear cytology that links single white-blood-cell morphology to chromosomal aberrations and mutations for AML/APL diagnosis.

Released: May 2026

GenBloom is a genetically aligned foundation model for peripheral blood smear cytology, developed by the Marr Lab at Helmholtz Munich together with LMU Munich and presented at MICCAI 2026. It targets the diagnosis of hematological malignancies, in particular acute myeloid leukemia (AML) and its acute promyelocytic leukemia (APL) subtype. In these diseases, the morphology of single white blood cells under the microscope is the entry point to diagnosis, but the definitive classification ultimately rests on the underlying genetics, such as recurrent chromosomal aberrations and somatic mutations.

The central idea is to bridge this gap between morphology and genetics directly inside the representation space. Rather than learning purely visual features, GenBloom aligns single-cell morphological embeddings with patient-level genetic labels using supervised contrastive learning, so that cells from patients sharing the same genetic alteration are pulled together in embedding space. Aggregating these aligned single-cell embeddings yields a patient-level representation that captures clinically meaningful structure, improving downstream diagnosis and enabling retrieval of patients by disease entity.
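
To make the alignment idea concrete, here is a minimal sketch of a supervised contrastive loss over single-cell embeddings, where each cell carries the genetic label of the patient it came from. It follows the commonly used supervised contrastive formulation as an assumption; the source does not state the exact loss, temperature, or batching GenBloom uses. The function name `supcon_loss`, the temperature of 0.1, the placeholder labels, and the toy 2-D vectors are all illustrative, and treating each patient as having a single label is a simplification.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    embeddings: one vector per cell.
    labels: the patient-level genetic label propagated to each cell (assumed
            to be a single label per patient in this sketch).
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other cell.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy data: two visually distinct groups of cells.
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = ["alteration_A", "alteration_A", "alteration_B", "alteration_B"]
misaligned = ["alteration_A", "alteration_B", "alteration_A", "alteration_B"]

loss_aligned = supcon_loss(cells, aligned)
loss_misaligned = supcon_loss(cells, misaligned)
```

On this toy data the loss is lower when cells that sit close together share a label, which is the pressure that pulls cells from genetically similar patients together during training.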

GenBloom is released in two variants. GenBloom-V is a vision-only encoder that learns cell morphology without genetic supervision, and GenBloom-G adds the genetic alignment objective on top. The GenBloom-V encoder doubles as a general-purpose single-cell embedding backbone for blood smear images that can be reused off the shelf without retraining.

#Key Features

  • Genetic alignment of morphology: Supervised contrastive learning aligns single white-blood-cell image embeddings with chromosomal aberrations and somatic mutations, injecting genetic structure into an otherwise purely visual representation.
  • Patient-level representations: Single-cell embeddings are aggregated into a patient-level representation that improves AML/APL diagnosis over morphology-only baselines.
  • Two complementary variants: GenBloom-V (vision-only) and GenBloom-G (genetically aligned) support both general-purpose embedding and genetics-informed diagnosis.
  • Zero-shot disease retrieval: The aligned embedding space supports retrieving patients by disease entity without task-specific retraining.
  • Reusable cell-embedding backbone: The GenBloom-V encoder serves as an off-the-shelf single-cell feature extractor for blood smear cytology, usable without fine-tuning.
  • Open weights and inference code: Model weights are released on HuggingFace under Apache 2.0, with a provided inference_genbloom.ipynb notebook.
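
The zero-shot retrieval feature listed above can be pictured as a nearest-neighbour search over patient-level embeddings. The sketch below ranks a gallery of patients by cosine similarity to a query patient. The similarity measure, the `retrieve` helper, the patient IDs, and the 3-D vectors are assumptions made for illustration; the source does not describe the retrieval procedure in detail.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, k=3):
    """Return the IDs of the k gallery patients most similar to the query."""
    ranked = sorted(
        gallery, key=lambda pid: cosine(query, gallery[pid]), reverse=True
    )
    return ranked[:k]


# Hypothetical patient-level embeddings keyed by a made-up patient ID.
gallery = {
    "patient_A": [0.9, 0.1, 0.0],
    "patient_B": [0.8, 0.2, 0.1],
    "patient_C": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
top = retrieve(query, gallery, k=2)
```

No classifier is trained here: retrieval quality depends entirely on how well the embedding space already groups patients by disease entity.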

#Technical Details

GenBloom builds on a self-supervised vision encoder pretrained with the DINOv2 framework, which combines the DINO self-distillation objective with the iBOT masked image modeling objective. The encoder embeds individual segmented white blood cells from peripheral blood smears. On top of these single-cell embeddings, the GenBloom-G variant applies a supervised contrastive objective that uses patient-level genetic annotations (chromosomal aberrations and somatic mutations) as the supervision signal, so that cells from patients sharing the same genetic alteration are mapped to nearby points. The model was pretrained on a cohort of more than 1,500 patients, and single-cell embeddings are pooled into patient-level representations for diagnosis and retrieval. Reported experiments compare the vision-only GenBloom-V against the genetically aligned GenBloom-G on AML and APL classification and on zero-shot disease retrieval, with the genetic alignment yielding improved patient-level diagnostic performance.
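
The pooling step can be sketched as follows. The source says only that single-cell embeddings are pooled into a patient-level representation, so mean pooling followed by L2 normalization is an assumption here, not a description of GenBloom's actual aggregator. The `pool_patient` function and the toy vectors are hypothetical.

```python
import math


def pool_patient(cell_embeddings):
    """Mean-pool one patient's cell embeddings, then L2-normalize the result.

    Mean pooling is an assumed aggregator chosen for simplicity.
    """
    if not cell_embeddings:
        raise ValueError("a patient needs at least one cell embedding")
    n = len(cell_embeddings)
    dim = len(cell_embeddings[0])
    mean = [sum(cell[d] for cell in cell_embeddings) / n for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Guard against an all-zero mean, which cannot be normalized.
    return [x / norm for x in mean] if norm > 0 else mean


# Two toy cell embeddings from the same hypothetical patient.
patient_vector = pool_patient([[1.0, 0.0], [0.0, 1.0]])
```

Mean pooling treats every cell equally; an attention-based or otherwise weighted aggregator would be a natural alternative when only a few cells carry the diagnostic signal.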

#Applications

GenBloom is aimed at computational hematopathology research, where peripheral blood smear analysis is a routine first step in working up suspected leukemia. Its patient-level representations can support AML and APL classification, while the zero-shot retrieval capability lets researchers find patients with similar disease entities directly from cell morphology — useful for cohort building and case review. The reusable GenBloom-V encoder provides a single-cell feature extractor that other groups can drop into downstream blood-cell classification or analysis pipelines without retraining, lowering the barrier to building cytology models.
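
As a rough illustration of reusing a frozen encoder downstream, the sketch below fits a nearest-centroid classifier on precomputed cell embeddings, leaving the encoder itself untouched. The classifier choice, the function names, the cell-type labels, and the 2-D feature vectors are all assumptions for illustration; in practice the features would come from the released encoder, whose loading code is not shown here.

```python
def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feature)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feature)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]] for label in sums
    }


def predict(centroids, feature):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], feature))


# Stand-in embeddings; real ones would be produced by the frozen encoder.
train_features = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["lymphocyte", "lymphocyte", "myeloblast", "myeloblast"]

centroids = fit_centroids(train_features, train_labels)
predicted = predict(centroids, [5.5, 4.8])
```

Only the centroids are learned, so a new task needs labelled embeddings but no gradient updates to the backbone.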

#Impact

GenBloom illustrates how patient-level genetic information can be folded into image foundation models so that learned representations reflect the genetics that ultimately define hematological disease entities, rather than morphology alone. By open-sourcing both code and Apache 2.0 weights together with an inference notebook, the Marr Lab makes the model readily reusable as a cytology backbone for the community. Because GenBloom is a research model, its diagnostic results require independent validation across laboratories, scanners, and staining protocols before any clinical application. Performance on genetic alterations that are underrepresented in the training cohort also remains to be established.

Citation

Preprint

DOI: 10.48550/arXiv.2605.29980


Openness

Unclassified
Missing required components

Tags

cell_type_annotation, contrastive_learning, cytology, disease_classification, foundation_model, hematology, representation_learning, vision_transformer, zero_shot

Resources

GitHub Repository, Research Paper, HuggingFace Model