Hibou is a family of Vision Transformer foundation models for digital pathology, developed by HistAI and released in June 2024. The family comprises two variants — Hibou-B and Hibou-L — pretrained on a curated dataset of over 1 million whole-slide images (WSIs) using the DINOv2 self-supervised learning framework with additional register tokens for improved feature quality.
What distinguishes Hibou from competing pathology foundation models is its combination of training scale, stain diversity, and permissive licensing. The pretraining corpus spans both H&E-stained slides (936,441 WSIs) and non-H&E modalities (202,464 slides including immunohistochemistry, special stains, and cytology), exposing the model to the full breadth of tissue preparation techniques encountered in real clinical and research settings. Both variants are released under the Apache 2.0 license, enabling unrestricted commercial and research use without the restrictive gating common among competing pathology foundation models such as Prov-GigaPath and Virchow.
At the time of publication, Hibou-L achieved state-of-the-art average accuracy across six standard patch classification datasets and outperformed Prov-GigaPath on all three slide-level WSI classification benchmarks evaluated. Hibou-B, despite having roughly 13 times fewer parameters than Prov-GigaPath, matched or exceeded it on two of the three slide-level tasks, demonstrating strong parameter efficiency from the DINOv2 training strategy.
Both models are available on the Hugging Face Hub and can be loaded via the transformers library with a single AutoModel.from_pretrained call, simplifying integration into existing PyTorch pipelines.

Both Hibou variants are built on the DINOv2 Vision Transformer architecture, modified to incorporate register tokens: additional learnable tokens appended to the patch sequence that allow the model to offload global information processing away from local patch tokens, improving spatial feature quality. Hibou-B uses a ViT-B/14 backbone (85.7M parameters, 14-pixel patch size) and Hibou-L uses a ViT-L/14 backbone (~307M parameters, 14-pixel patch size). The choice of a 14-pixel rather than the more common 16-pixel patch size yields finer spatial resolution per token, which is advantageous for pathology images, where cellular-level features at high magnification are diagnostically relevant.
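The effect of the 14-pixel patch size can be sketched with simple arithmetic. The 224×224 input resolution and the register count of four below are assumptions based on typical DINOv2 configurations, not figures stated in the Hibou paper:

```python
def vit_token_count(image_size: int, patch_size: int, num_registers: int = 0) -> int:
    """Number of tokens a ViT produces: patch grid + CLS token + register tokens."""
    grid = image_size // patch_size          # patches per image side
    return grid * grid + 1 + num_registers   # +1 for the CLS token

# A 224x224 input with /14 patching yields a 16x16 grid (256 patch tokens),
# versus a 14x14 grid (196 patch tokens) with the more common /16 patching.
tokens_14 = vit_token_count(224, 14, num_registers=4)  # register count assumed
tokens_16 = vit_token_count(224, 16, num_registers=4)
print(tokens_14, tokens_16)  # 261 201
```

The roughly 30% increase in patch tokens at the same input resolution is what "finer spatial resolution per token" amounts to in practice: each token summarizes a smaller tissue region.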
The pretraining corpus totaled over 1.1 million WSIs: 936,441 H&E slides, 202,464 non-H&E slides, and 2,676 cytology slides, sourced from public and proprietary collections covering multiple human organ systems. Hibou-L trained on approximately 1.2 billion clean patches over 1.175 million iterations on 32 NVIDIA A100-40G GPUs; Hibou-B trained on 512 million patches over 500,000 iterations on 8 A100-80G GPUs. Standard DINOv2 solarization augmentation was deliberately excluded, as it degrades performance on stained tissue images; instead, RandStainNA stain normalization and color jittering were applied. On patch classification benchmarks using linear probing, Hibou-L achieved an average accuracy of 0.890 across six datasets (CRC-100K, PCAM, MHIST, MSI-CRC, MSI-STAD, TIL-DET), surpassing contemporaneous models including Phikon, Kaiko-B8, Virchow, RudolfV, Prov-GigaPath, and H-optimus-0.
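The linear-probing protocol behind these benchmark numbers is simple: freeze the backbone, extract one embedding per patch, and train only a linear classifier on top. A minimal sketch with scikit-learn, using random vectors in place of real Hibou features (the 768-dimensional embedding size matches a ViT-B backbone; the data here is purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for frozen backbone embeddings (768-d, as for ViT-B) and binary
# labels, e.g. tumor vs. normal patches as in the PCAM benchmark.
X_train = rng.normal(size=(1000, 768))
y_train = rng.integers(0, 2, size=1000)

# Linear probe: the feature extractor stays frozen; only this classifier trains.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

X_test = rng.normal(size=(200, 768))
preds = probe.predict(X_test)
print(preds.shape)  # one predicted label per patch embedding
```

Because only the linear layer is trained, probe accuracy is a direct measure of how linearly separable the frozen features are, which is why it is the standard comparison protocol for foundation models.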
Hibou functions as a general-purpose feature extractor for digital pathology workflows. Downstream tasks include cancer subtyping from WSI patches (e.g., distinguishing IDC from ILC in breast cancer, or LUAD from LUSC in lung cancer), molecular biomarker prediction from H&E slides (microsatellite instability, mutation status), and survival analysis using slide-level aggregated embeddings. The companion CellViT-Hibou-L model — combining Hibou-L features with the CellViT segmentation framework — enables panoptic nuclei segmentation on the PanNuke benchmark, with improved performance over CellViT-SAM-H baselines for epithelial and dead cell categories. Because Hibou was pretrained on non-H&E stains, its representations transfer more reliably to IHC panels and special stain workflows than models trained exclusively on H&E, broadening applicability across clinical laboratory settings.
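Slide-level tasks such as subtyping and survival analysis require aggregating many patch embeddings into a single slide representation. Mean pooling is the simplest such aggregator; MIL-style pipelines typically use learned attention weighting instead, so treat this as an illustrative sketch rather than the paper's method:

```python
import numpy as np

def slide_embedding(patch_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool patch-level features of shape (n_patches, dim) into one slide vector."""
    return patch_embeddings.mean(axis=0)

# e.g. 500 patch embeddings of dimension 768 extracted from one WSI (sizes illustrative)
patches = np.random.default_rng(1).normal(size=(500, 768))
slide_vec = slide_embedding(patches)
print(slide_vec.shape)  # (768,)
```

The resulting slide vector can then feed any downstream classifier or survival model, which is what "slide-level aggregated embeddings" refers to above.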
Hibou addresses a recognized gap in the pathology foundation model landscape: the combination of open licensing, multi-stain pretraining, and competitive benchmark performance has made it one of the more practically accessible models in the field. Its Apache 2.0 release stands in contrast to the non-commercial or gated licensing of several higher-profile competitors, lowering barriers for both academic research and clinical product development. A notable limitation is that the pretraining WSI dataset is not publicly released, limiting reproducibility of the pretraining procedure. Hibou-L was also trained on approximately one-sixth of HistAI's full proprietary dataset at the time of publication, suggesting meaningful headroom for further performance improvement. As with all pathology foundation models, downstream applications require independent clinical validation before deployment in regulated healthcare settings.
Nechaev, D., et al. (2024). Hibou: A Family of Foundational Vision Transformers for Pathology. arXiv preprint arXiv:2406.05074. DOI: 10.48550/arXiv.2406.05074.