Prov-GigaPath
Xu et al., "A whole-slide foundation model for digital pathology from real-world data", Nature, 2024. DOI: 10.1038/s41586-024-07441-w
What they do
Prov-GigaPath uses offline Otsu-based tissue segmentation, resolution normalization via pyvips, and tile extraction with a notably low occupancy threshold.
- Tile size: 256×256 at 0.5 µm/px (20×)
- Resolution normalization: all WSIs are resized to 0.5 µm/px using pyvips before tiling. "This step is necessary because some slides have higher resolution depending on the scanner settings."
- Tissue detection: Otsu thresholding on luminance (the simple mean of the R, G and B channels) at a downsampled resolution (e.g. 1,024 pixels). Foreground = pixels with luminance below the Otsu threshold. Implemented with skimage.filters.threshold_otsu() in the source code.
- Tissue threshold: 10%. Tiles with occupancy < 0.1 are discarded, where occupancy is the fraction of foreground pixels in the tile.
- Sampling: all remaining non-overlapping tiles are used for pretraining
- Normalization: ImageNet statistics, mean=(0.485, 0.456, 0.406) and std=(0.229, 0.224, 0.225). At inference, tiles are resized to 256 and center-cropped to 224.
- Architecture: GigaPath — ViT-g/14 tile encoder + LongNet slide encoder
- Training: DINOv2 for the tile encoder (standard settings; batch size 12 per GPU, 384 effective), LongNet for slide-level modeling
- Data: 1.38B tiles from 171K H&E and IHC slides (Providence health network, proprietary)
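The tissue-detection and occupancy steps above can be sketched in a few lines. This is a minimal illustration, not the Prov-GigaPath code. The paper's code calls skimage.filters.threshold_otsu(). This sketch uses a small numpy re-implementation of Otsu's method instead, so it has no scikit-image dependency, and exact threshold values may differ slightly. The helper names otsu_threshold, tissue_mask and keep_tiles are hypothetical. Everything runs at a single scale for simplicity. In a real pipeline, the mask would come from a thumbnail and tile sizes would be converted to thumbnail pixels.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> float:
    """Otsu's method on 0-255 values: pick the cut that maximizes between-class variance."""
    hist, edges = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                      # pixel count in bins 0..k
    w1 = w0[-1] - w0                          # pixel count in bins k+1..255
    s0 = np.cumsum(hist * centers)
    mean0 = s0 / np.maximum(w0, 1)
    mean1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
    k = int(np.argmax(between))
    return float(edges[k + 1])                # upper edge of the last "dark" bin


def tissue_mask(rgb: np.ndarray) -> np.ndarray:
    """Foreground = luminance (plain R, G, B mean) strictly below the Otsu cut."""
    lum = rgb.astype(np.float64).mean(axis=-1)
    return lum < otsu_threshold(lum)


def keep_tiles(mask: np.ndarray, tile: int, min_occupancy: float = 0.1):
    """(row, col) of non-overlapping tiles whose foreground fraction is >= min_occupancy."""
    rows, cols = mask.shape[0] // tile, mask.shape[1] // tile
    kept = []
    for r in range(rows):
        for c in range(cols):
            block = mask[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            if block.mean() >= min_occupancy:
                kept.append((r, c))
    return kept


# Synthetic 64x64 image: light background (240) with dark "tissue" (100).
img = np.full((64, 64, 3), 240, dtype=np.uint8)
img[:32, :32] = 100       # solid tissue covering tiles (0,0), (0,1), (1,0), (1,1)
img[48:52, 48:56] = 100   # 32 px in tile (3,3): occupancy 0.125, kept
img[48:52, 0:4] = 100     # 16 px in tile (3,0): occupancy 0.0625, discarded
print(keep_tiles(tissue_mask(img), tile=16))  # [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]
```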
wsistream approximation
Prov-GigaPath extracts all non-overlapping tissue tiles at 20× offline. This is fundamentally a GridSampler workflow. However, GridSampler yields patches in fixed scan order, so with a finite patches_per_slide it always selects the same top-left-biased subset. RandomSampler avoids that spatial bias, but samples with replacement and does not guarantee full coverage. Neither is a perfect online substitute for offline full-grid extraction.
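One way to get full coverage without scan-order bias is to shuffle the complete tile grid with a seeded RNG and then draw from that permutation without replacement. The sketch below is a hypothetical illustration; grid_coords, shuffled_grid and expected_coverage_with_replacement are not wsistream APIs. In practice, coords would be the tiles that pass the tissue filter. The sketch also quantifies the with-replacement gap. After N uniform draws from N tiles, the expected fraction of distinct tiles seen is 1 - (1 - 1/N)^N, which approaches 1 - 1/e, or about 63%.

```python
import random


def grid_coords(width: int, height: int, tile: int):
    """Top-left (x, y) of every non-overlapping tile that fits inside the slide."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def shuffled_grid(coords, seed: int):
    """Yield each tile exactly once, in a seeded random order (no replacement)."""
    order = list(coords)
    random.Random(seed).shuffle(order)
    yield from order


def expected_coverage_with_replacement(n_tiles: int, n_draws: int) -> float:
    """Expected fraction of distinct tiles hit after n_draws uniform draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n_tiles) ** n_draws


coords = grid_coords(2560, 1280, 256)   # hypothetical slide size: 10 x 5 = 50 tiles
epoch = list(shuffled_grid(coords, seed=0))
print(len(epoch), len(set(epoch)))      # 50 50: every tile once, no repeats
print(round(expected_coverage_with_replacement(1000, 1000), 3))  # 0.632
```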
The paper normalizes all WSIs to 0.5 µm/px with pyvips before tiling. wsistream instead selects the closest existing pyramid level via target_mpp, which avoids a full-slide resize but means the effective resolution may not be exactly 0.5 µm/px.
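To see how far the nearest level can be from 0.5 µm/px, consider a hypothetical 40× slide at 0.2527 µm/px whose pyramid has downsamples 1, 4 and 16. This slide has no native 20× level. The sketch picks the level by absolute µm/px difference. That criterion, the helper names and the slide values are assumptions, not wsistream's documented behavior. If the patch were read at that level without further rescaling, a 256 px tile would span about 64.7 µm instead of 128 µm. For comparison, a full-slide resize to 0.5 µm/px (as in the paper's pyvips step) scales each side by base_mpp / 0.5.

```python
def closest_level(base_mpp: float, downsamples, target_mpp: float = 0.5):
    """Index and µm/px of the pyramid level nearest target_mpp (hypothetical helper).

    "Nearest" is measured here as absolute µm/px difference, which is an assumption.
    """
    mpps = [base_mpp * d for d in downsamples]
    idx = min(range(len(mpps)), key=lambda i: abs(mpps[i] - target_mpp))
    return idx, mpps[idx]


def resize_scale(base_mpp: float, target_mpp: float = 0.5) -> float:
    """Per-side scale factor a full-slide resize would apply to reach target_mpp."""
    return base_mpp / target_mpp


# Hypothetical 40x slide (0.2527 µm/px) with pyramid downsamples 1, 4, 16.
level, mpp = closest_level(0.2527, [1.0, 4.0, 16.0])
print(level, round(mpp, 4), round(256 * mpp, 1))  # 0 0.2527 64.7
print(round(resize_scale(0.2527), 4))             # 0.5054
```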
```python
import glob

from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import OtsuTissueDetector
from wsistream.sampling import RandomSampler

# Hypothetical slide location; replace with your own list of WSI paths.
slide_paths = sorted(glob.glob("/data/slides/*.svs"))

pipeline = PatchPipeline(
    slide_paths=slide_paths,
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(
        patch_size=256,        # 256x256 at 20x
        num_patches=-1,
        target_mpp=0.5,        # 20x magnification (0.5 µm/px), nearest pyramid level
        tissue_threshold=0.1,  # 10% occupancy (notably lower than other papers)
    ),
    pool_size=8,
    patches_per_slide=100,
    cycle=True,
)
```
With multi-crop views
Prov-GigaPath trains the tile encoder with DINOv2 multi-crop on 256×256 tiles. The paper does not enumerate the DINOv2 crop configuration (scale ranges, crop counts, local output size). The crop scales and counts below follow DINOv2's default SSL config; the 98px local crop size follows the DINOv2 ViT-g/14 training config.
```python
from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import OtsuTissueDetector
from wsistream.sampling import RandomSampler
from wsistream.views import ViewConfig, RandomResizedCrop

# slide_paths: list of WSI file paths, as in the previous example
pipeline = PatchPipeline(
    slide_paths=slide_paths,
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=-1, target_mpp=0.5,
                          tissue_threshold=0.1),
    views=[
        ViewConfig(
            name="global",
            crop=RandomResizedCrop(size=224, scale=(0.32, 1.0)),
            count=2,  # global_0, global_1; DINOv2 default: 2 global crops
        ),
        ViewConfig(
            name="local",
            crop=RandomResizedCrop(size=98, scale=(0.05, 0.32)),  # DINOv2 ViT-g/14 config
            count=8,  # local_0 … local_7; DINOv2 default: 8 local crops
        ),
    ],
    pool_size=8,
    patches_per_slide=100,
    cycle=True,
)
```
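The crop settings translate into patch-token counts for a ViT with 14 px patches. The arithmetic below shows that 224 and 98 are whole multiples of 14, giving 16 and 7 patches per side. DINOv2's generic 96 px local size is not a multiple of 14, which is presumably why the ViT-g/14 config uses 98. The sketch also converts the area-scale ranges into approximate source-region sizes on a 256 px tile. This assumes that RandomResizedCrop's scale is a fraction of the tile area, as in torchvision, and it ignores aspect-ratio jitter. The helper names are hypothetical.

```python
import math

PATCH_SIZE = 14  # ViT-g/14 patch size in pixels


def tokens_per_side(crop_size: int, patch: int = PATCH_SIZE) -> int:
    """Patch tokens along one side; the crop must be a whole number of patches."""
    if crop_size % patch:
        raise ValueError(f"{crop_size}px is not a multiple of {patch}px")
    return crop_size // patch


def source_side_range(tile: int, scale):
    """Side length (px) of the sampled region for an area-scale range, square aspect assumed."""
    lo, hi = scale
    return tile * math.sqrt(lo), tile * math.sqrt(hi)


g_side, l_side = tokens_per_side(224), tokens_per_side(98)
print(g_side, l_side, 2 * g_side ** 2 + 8 * l_side ** 2)  # 16 7 904 patch tokens per tile
print([round(s) for s in source_side_range(256, (0.05, 0.32))])  # [57, 145]
print([round(s) for s in source_side_range(256, (0.32, 1.0))])   # [145, 256]
```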
Per-crop augmentations
Prov-GigaPath uses standard DINOv2 photometric augmentations (color jitter, Gaussian blur, grayscale, solarization, horizontal flip) applied per crop. To add them with view-asymmetric probabilities matching DINOv2 defaults, see the DINOv2-style multi-crop example.
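For reference, the per-view probabilities below are transcribed from memory of DINOv2's DataAugmentationDINO (dinov2/data/augmentations.py). They are not stated in the Prov-GigaPath paper, so check them against the upstream file before relying on them. The dictionary layout and the view_probs helper are hypothetical. The view names follow the global_0 … local_7 naming used in the views example.

```python
# Photometric/flip probabilities per view, recalled from DINOv2's augmentations.py
# (not from the Prov-GigaPath paper); verify against upstream before use.
DINOV2_AUG = {
    "shared": {
        "hflip": 0.5,
        "color_jitter": 0.8,  # brightness 0.4, contrast 0.4, saturation 0.2, hue 0.1
        "grayscale": 0.2,
    },
    "global_0": {"gaussian_blur": 1.0, "solarize": 0.0},
    "global_1": {"gaussian_blur": 0.1, "solarize": 0.2},
    "local": {"gaussian_blur": 0.5, "solarize": 0.0},
}


def view_probs(view: str) -> dict:
    """Merge shared and view-specific probabilities; local_0..local_7 share one entry."""
    key = "local" if view.startswith("local") else view
    return {**DINOV2_AUG["shared"], **DINOV2_AUG[key]}


print(view_probs("global_1"))
print(view_probs("local_3")["gaussian_blur"])  # 0.5
```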
Deviations from paper
| Step | Paper | wsistream | Match |
|---|---|---|---|
| Tissue detection | Otsu on luminance (RGB channel mean) via skimage.filters.threshold_otsu | OtsuTissueDetector (Otsu on weighted grayscale via cv2.cvtColor) | Approximate: both are Otsu-based, but the grayscale conversion differs (simple RGB mean vs. weighted 0.299R + 0.587G + 0.114B) |
| Tissue threshold | 10% occupancy | tissue_threshold=0.1 | Exact |
| Resolution normalization | pyvips resize of entire WSI to 0.5 µm/px before tiling | target_mpp=0.5 selects closest pyramid level | Approximate: wsistream reads from the nearest existing level rather than resampling the full slide |
| Tile size | 256×256 | patch_size=256 | Exact |
| Extraction | Offline, all non-overlapping tiles | Online random sampling (with replacement) | Different — random sampling does not guarantee full coverage |
| Normalization | ImageNet mean/std | Training code | Exact (not part of wsistream) |
| DINOv2 crop sizes / scales | Not enumerated in paper | DINOv2 default scales; local size 98 from DINOv2 ViT-g/14 config | Unverified — paper does not state crop sizes or scale ranges |
| DINOv2 crop counts | Not enumerated in paper | 2 global + 8 local (DINOv2 default) | Unverified |
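The tissue-detection row differs only in the grayscale conversion. For a neutral gray, the two formulas agree. Pink and purple stained pixels, however, have a relatively low green channel, and green carries the largest weight (0.587). The weighted form therefore returns darker values for them than the plain mean does. The pixel values below are illustrative, not measured, and the helpers are hypothetical. The weighted version also keeps floats, whereas cv2.cvtColor would round to uint8. Because Otsu recomputes its threshold on whichever distribution it receives, the resulting masks can still differ near faint tissue.

```python
import numpy as np

BT601 = np.array([0.299, 0.587, 0.114])  # weights cv2 uses for RGB -> gray


def gray_mean(rgb):
    """Paper: plain mean of the R, G, B channels."""
    return np.asarray(rgb, dtype=np.float64).mean(axis=-1)


def gray_weighted(rgb):
    """wsistream (per the table): 0.299R + 0.587G + 0.114B, kept as float here."""
    return np.asarray(rgb, dtype=np.float64) @ BT601


pixels = np.array([
    [230, 150, 200],  # pinkish, eosin-like (illustrative)
    [120, 80, 160],   # purplish, hematoxylin-like (illustrative)
    [240, 240, 240],  # near-white background
])
print([round(float(v), 2) for v in gray_mean(pixels)])      # [193.33, 120.0, 240.0]
print([round(float(v), 2) for v in gray_weighted(pixels)])  # [179.62, 101.08, 240.0]
```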