Genetically aligned foundation model for blood smear cytology that links single white-blood-cell morphology to chromosomal aberrations and mutations for AML/APL diagnosis.
GenBloom is a genetically aligned foundation model for peripheral blood smear cytology, developed by the Marr Lab at Helmholtz Munich together with LMU Munich and presented at MICCAI 2026. It targets the diagnosis of hematological malignancies, in particular acute myeloid leukemia (AML) and its acute promyelocytic leukemia (APL) subtype. In these diseases, the morphological appearance of single white blood cells under the microscope is the entry point to diagnosis, but the definitive disease classification ultimately rests on the underlying genetics, such as recurrent chromosomal aberrations and somatic mutations.
The central idea is to bridge this gap between morphology and genetics directly inside the representation space. Rather than learning purely visual features, GenBloom aligns single-cell morphological embeddings with patient-level genetic labels using supervised contrastive learning, so that cells from patients sharing the same genetic alteration are pulled together in embedding space. Aggregating these aligned single-cell embeddings yields a patient-level representation that captures clinically meaningful structure, improving downstream diagnosis and enabling retrieval of patients by disease entity.
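The alignment objective can be illustrated with the supervised contrastive (SupCon) loss of Khosla et al. (2020). Each cell inherits the genetic label of the patient it came from, and every other cell in the batch with the same label counts as a positive. This is a minimal NumPy sketch under stated assumptions: the exact loss variant, temperature, and batch construction used by GenBloom are not given here, and the integer label encoding is hypothetical.

```python
import numpy as np


def supcon_loss(embeddings: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive loss over a batch of single-cell embeddings.

    embeddings: (N, D) cell embeddings from the vision encoder.
    labels:     (N,) integer genetic label of each cell's patient (hypothetical encoding).
    temperature: assumed value, not taken from the GenBloom setup.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    self_mask = np.eye(len(labels), dtype=bool)
    # An anchor never contrasts with itself: drop the diagonal from the softmax.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable log-softmax over all other cells in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Positives: other cells whose patient carries the same genetic label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    has_pos = pos_counts > 0  # anchors without any positive in the batch are skipped
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    # Average log-probability of the positives per anchor, then over anchors.
    return float(-(pos_log_prob[has_pos] / pos_counts[has_pos]).mean())
```

Because the labels live at the patient level, the loss never needs cell-level genetic annotation: it only asks that cells from genetically matching patients score higher against each other than against cells from other patients.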
GenBloom is released in two variants. GenBloom-V is a vision-only encoder that learns cell morphology without genetic supervision; GenBloom-G adds the genetic alignment objective on top of this vision encoder. The GenBloom-V encoder also serves as a general-purpose single-cell embedding backbone for blood smear images and can be reused off the shelf without retraining.
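Reusing a frozen encoder typically means extracting cell embeddings once and fitting a lightweight model on top of them. The sketch below assumes the embeddings have already been computed with GenBloom-V (no loading API is assumed, since none is described here) and uses a cosine k-nearest-neighbour vote as one simple downstream classifier. The function name `knn_predict` and the cell-type labels are hypothetical, and this is not the authors' evaluation protocol.

```python
from collections import Counter

import numpy as np


def knn_predict(train_emb: np.ndarray, train_labels: list, query_emb: np.ndarray, k: int = 5) -> list:
    """Label query cells by majority vote of their k most cosine-similar training cells.

    Only stored embeddings are compared, so the encoder itself is never updated.
    """
    def normalise(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    sims = normalise(query_emb) @ normalise(train_emb).T  # (Q, N) cosine similarities
    top_k = np.argsort(-sims, axis=1)[:, :k]              # k nearest training cells per query
    preds = []
    for neighbours in top_k:
        votes = Counter(train_labels[i] for i in neighbours)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Swapping the k-NN vote for a linear probe or any other classifier leaves the frozen backbone untouched, which is what makes an off-the-shelf encoder cheap to reuse.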
GenBloom builds on a self-supervised vision encoder pretrained with the DINOv2 framework, which combines the DINO self-distillation objective with the iBOT masked image modeling objective, to embed individual segmented white blood cells from peripheral blood smears. On top of these single-cell embeddings, the GenBloom-G variant applies a supervised contrastive objective that uses patient-level genetic annotations (chromosomal aberrations and somatic mutations) as the supervision signal, so that cells inheriting the same patient-level genetic label are mapped to nearby points. The model was pretrained on a cohort of more than 1,500 patients, and its single-cell embeddings are pooled into patient-level representations for diagnosis and retrieval. The reported experiments compare the vision-only GenBloom-V against the genetically aligned GenBloom-G on AML and APL classification and on zero-shot disease retrieval; the genetic alignment improves patient-level diagnostic performance.
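For reference, the DINO image-level self-distillation term can be sketched as a cross-entropy between a centred, sharpened teacher distribution and the student's distribution, computed only across different views of the same image. This NumPy sketch assumes the original DINO recipe (softmax centering with an exponential moving average, student temperature 0.1, teacher temperature 0.04, center momentum 0.9). DINOv2 replaces this centering with Sinkhorn-Knopp normalisation and adds the iBOT masked-token term among other changes, none of which is shown, and GenBloom's actual pretraining hyperparameters are not stated here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04) -> float:
    """Cross-view self-distillation loss (original DINO formulation, assumed values).

    student_out: (V_s, B, K) prototype scores for all student views.
    teacher_out: (V_t, B, K) prototype scores for the teacher's global views;
                 student view i is assumed to be the same crop as teacher view i.
    center:      (K,) running mean of teacher outputs, subtracted to avoid collapse.
    """
    # Teacher targets: centred, then sharpened by a low temperature (no gradient in practice).
    targets = softmax((teacher_out - center) / tau_t)
    student_logp = log_softmax(student_out / tau_s)
    total, n_pairs = 0.0, 0
    for t in range(targets.shape[0]):
        for s in range(student_logp.shape[0]):
            if s == t:
                continue  # skip pairs where student and teacher see the same crop
            total += float(np.mean(-np.sum(targets[t] * student_logp[s], axis=-1)))
            n_pairs += 1
    return total / n_pairs


def update_center(center, teacher_out, momentum=0.9):
    """Exponential moving average of the batch-mean teacher output."""
    batch_mean = teacher_out.reshape(-1, teacher_out.shape[-1]).mean(axis=0)
    return momentum * center + (1 - momentum) * batch_mean
```

The teacher's weights are themselves an exponential moving average of the student's, which is also omitted here; the sketch covers only the loss and the centering update.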
GenBloom is aimed at computational hematopathology research, where peripheral blood smear analysis is a routine first step in working up suspected leukemia. Its patient-level representations can support AML and APL classification, while the zero-shot retrieval capability lets researchers find patients with matching disease entities directly from cell morphology, which is useful for cohort building and case review. The reusable GenBloom-V encoder provides a single-cell feature extractor that other groups can drop into downstream blood-cell classification or analysis pipelines without retraining, lowering the barrier to building cytology models.
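Zero-shot patient retrieval can be sketched as pooling each patient's cell embeddings into one vector and ranking the other patients by cosine similarity, with no classifier trained. Mean pooling is an assumption made for illustration (the aggregation method is not specified here), and the patient IDs, `patient_embedding`, and `retrieve` are hypothetical names.

```python
import numpy as np


def patient_embedding(cell_embeddings: np.ndarray) -> np.ndarray:
    """Pool one patient's (n_cells, D) cell embeddings into a unit-norm (D,) vector.

    Mean pooling of normalised cell embeddings is an illustrative assumption.
    """
    z = cell_embeddings / np.linalg.norm(cell_embeddings, axis=1, keepdims=True)
    pooled = z.mean(axis=0)
    return pooled / np.linalg.norm(pooled)


def retrieve(query_cells: np.ndarray, database: dict, top_k: int = 3) -> list:
    """Rank database patients by cosine similarity to the query patient.

    database: maps a patient ID to that patient's (n_cells, D) cell embeddings.
    Returns (patient_id, similarity) pairs, most similar first.
    """
    q = patient_embedding(query_cells)
    scored = [(pid, float(q @ patient_embedding(cells))) for pid, cells in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

If the embedding space is genetically aligned, the top hits for a query patient should tend to share its disease entity, which is what makes retrieval usable for cohort building without any labels at query time.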
GenBloom illustrates how patient-level genetic information can be folded into image foundation models so that learned representations reflect the genetics that ultimately define hematological disease entities, rather than morphology alone. By open-sourcing the code and the Apache 2.0-licensed weights together with an inference notebook (inference_genbloom.ipynb), the Marr Lab makes the model readily reusable as a cytology backbone for the community. Because GenBloom is a research model, its diagnostic results require independent validation across laboratories, scanners, and staining protocols before any clinical application, and its performance on genetic alterations underrepresented in the training cohort remains to be established.