Hibou is a family of Vision Transformer foundation models for digital pathology, developed by HistAI and released in June 2024. The family comprises two variants — Hibou-B and Hibou-L — pretrained on a curated dataset of over 1 million whole-slide images (WSIs) using the DINOv2 self-supervised learning framework with additional register tokens for improved feature quality.
What distinguishes Hibou from competing pathology foundation models is its combination of training scale, stain diversity, and permissive licensing. The pretraining corpus spans both H&E-stained slides (936,441 WSIs) and non-H&E modalities (202,464 slides including immunohistochemistry, special stains, and cytology), exposing the model to the full breadth of tissue preparation techniques encountered in real clinical and research settings. Both variants are released under the Apache 2.0 license, enabling unrestricted commercial and research use without the restrictive gating common among competing pathology foundation models such as Prov-GigaPath and Virchow.
At the time of publication, Hibou-L achieved state-of-the-art average accuracy across six standard patch classification datasets and outperformed Prov-GigaPath on all three slide-level WSI classification benchmarks evaluated. Hibou-B, despite having roughly 13 times fewer parameters than Prov-GigaPath, matched or exceeded it on two of the three slide-level tasks, demonstrating strong parameter efficiency from the DINOv2 training strategy.
Both models are available on the Hugging Face Hub and can be loaded via the transformers library with a single AutoModel.from_pretrained call, simplifying integration into existing PyTorch pipelines.

Both Hibou variants are built on the DINOv2 Vision Transformer architecture, modified to incorporate register tokens: additional learnable tokens appended to the patch sequence that allow the model to offload global information processing away from local patch tokens, improving spatial feature quality. Hibou-B uses a ViT-B/14 backbone (85.7M parameters, 14-pixel patch size) and Hibou-L uses a ViT-L/14 backbone (~307M parameters, 14-pixel patch size). The choice of a 14-pixel rather than the more common 16-pixel patch size yields finer spatial resolution per token, which is advantageous for pathology images, where cellular-level features at high magnification are diagnostically relevant.
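The effect of the 14-pixel patch size can be sketched with simple arithmetic. The 224×224 input resolution and the register count of four below are assumptions based on typical DINOv2 configurations, not figures stated in the Hibou paper:

```python
def vit_token_count(image_size: int, patch_size: int, num_registers: int = 0) -> int:
    """Number of tokens a ViT produces: patch grid + CLS token + register tokens."""
    grid = image_size // patch_size          # patches per image side
    return grid * grid + 1 + num_registers   # +1 for the CLS token

# A 224x224 input with /14 patching yields a 16x16 grid (256 patch tokens),
# versus a 14x14 grid (196 patch tokens) with the more common /16 patching.
tokens_14 = vit_token_count(224, 14, num_registers=4)  # register count assumed
tokens_16 = vit_token_count(224, 16, num_registers=4)
print(tokens_14, tokens_16)  # 261 201
```

The roughly 30% increase in patch tokens at the same input resolution is what "finer spatial resolution per token" amounts to in practice: each token summarizes a smaller tissue region.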
The pretraining corpus totaled over 1.1 million WSIs: 936,441 H&E slides, 202,464 non-H&E slides, and 2,676 cytology slides, sourced from public and proprietary collections covering multiple human organ systems. Hibou-L trained on approximately 1.2 billion clean patches over 1.175 million iterations on 32 NVIDIA A100-40G GPUs; Hibou-B trained on 512 million patches over 500,000 iterations on 8 A100-80G GPUs. Standard DINOv2 solarization augmentation was deliberately excluded, as it degrades performance on stained tissue images; instead, RandStainNA stain normalization and color jittering were applied. On patch classification benchmarks using linear probing, Hibou-L achieved an average accuracy of 0.890 across six datasets (CRC-100K, PCAM, MHIST, MSI-CRC, MSI-STAD, TIL-DET), surpassing contemporaneous models including Phikon, Kaiko-B8, Virchow, RudolfV, Prov-GigaPath, and H-optimus-0.
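The linear-probing protocol behind these benchmark numbers is simple: freeze the backbone, extract one embedding per patch, and train only a linear classifier on top. A minimal sketch with scikit-learn, using random vectors in place of real Hibou features (the 768-dimensional embedding size matches a ViT-B backbone; the data here is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for frozen backbone embeddings (768-d, as for ViT-B) and binary
# labels, e.g. tumor vs. normal patches as in the PCAM benchmark.
X_train = rng.normal(size=(1000, 768))
y_train = rng.integers(0, 2, size=1000)

# Linear probe: the feature extractor stays frozen; only this classifier trains.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

X_test = rng.normal(size=(200, 768))
preds = probe.predict(X_test)
print(preds.shape)  # one predicted label per patch embedding
```

Because only the linear layer is trained, probe accuracy is a direct measure of how linearly separable the frozen features are, which is why it is the standard comparison protocol for foundation models.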
Hibou functions as a general-purpose feature extractor for digital pathology workflows. Downstream tasks include cancer subtyping from WSI patches (e.g., distinguishing IDC from ILC in breast cancer, or LUAD from LUSC in lung cancer), molecular biomarker prediction from H&E slides (microsatellite instability, mutation status), and survival analysis using slide-level aggregated embeddings. The companion CellViT-Hibou-L model — combining Hibou-L features with the CellViT segmentation framework — enables panoptic nuclei segmentation on the PanNuke benchmark, with improved performance over CellViT-SAM-H baselines for epithelial and dead cell categories. Because Hibou was pretrained on non-H&E stains, its representations transfer more reliably to IHC panels and special stain workflows than models trained exclusively on H&E, broadening applicability across clinical laboratory settings.
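Slide-level tasks such as subtyping and survival analysis require aggregating many patch embeddings into a single slide representation. Mean pooling is the simplest such aggregator; MIL-style pipelines typically use learned attention weighting instead, so treat this as an illustrative sketch rather than the paper's method:

```python
import numpy as np

def slide_embedding(patch_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool patch-level features of shape (n_patches, dim) into one slide vector."""
    return patch_embeddings.mean(axis=0)

# e.g. 500 patch embeddings of dimension 768 extracted from one WSI (sizes illustrative)
patches = np.random.default_rng(1).normal(size=(500, 768))
slide_vec = slide_embedding(patches)
print(slide_vec.shape)  # (768,)
```

The resulting slide vector can then feed any downstream classifier or survival model, which is what "slide-level aggregated embeddings" refers to above.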
Hibou addresses a recognized gap in the pathology foundation model landscape: the combination of open licensing, multi-stain pretraining, and competitive benchmark performance has made it one of the more practically accessible models in the field. Its Apache 2.0 release stands in contrast to the non-commercial or gated licensing of several higher-profile competitors, lowering barriers for both academic research and clinical product development. A notable limitation is that the pretraining WSI dataset is not publicly released, limiting reproducibility of the pretraining procedure. Hibou-L was also trained on approximately one-sixth of HistAI's full proprietary dataset at the time of publication, suggesting meaningful headroom for further performance improvement. As with all pathology foundation models, downstream applications require independent clinical validation before deployment in regulated healthcare settings.
Nechaev, D., et al. (2024). Hibou: A Family of Foundational Vision Transformers for Pathology. arXiv preprint arXiv:2406.05074. DOI: 10.48550/arXiv.2406.05074.