Phikon
Filiot et al., "Phikon-v2, A large and public feature extractor for biomarker prediction", NeurIPS, 2024.
Filiot et al., "Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling", 2023. medRxiv:2023.07.21.23292757
What they do
Phikon-v2 uses offline tissue segmentation with a proprietary U-Net, followed by tile extraction at a single magnification.
- Tile size: 224×224 at 20× (0.5 µm/px)
- Tissue detection: an in-house bi-directional U-Net segments tissue and discards background and artifacts at 2.5× magnification. This model is not open-sourced.
- Tissue threshold: 60% — tiles must have a minimal tissue matter proportion of 60%
- Normalization: ImageNet — mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)
- DINOv2 multi-crop: tiles are 224×224 and fed to DINOv2. From Extended Table 5 in the paper: 2 global crops (size 224, scale 0.32–1.0) and 8 local crops (size 96, scale 0.05–0.32). Local 96px = 6 × 16 for ViT-L/16. No registers.
- Architecture: ViT-L/16 (307M params), no registers
- Training: DINOv2 (DINO + iBOT + KoLeo). 250K iterations, batch size 4096, 32 nodes × 4 V100 GPUs (128 V100s total), 11K GPU hours. Released model taken at iteration 100K (400M tiles seen, ~93% of dataset).
- Data: PANCAN-XL — 456M tiles from 58K publicly available WSIs across 132 public datasets + 4 internal (TCGA, CPTAC, GTEx, and others), covering 30+ cancer sites
wsistream approximation
Phikon-v2's tissue detection uses a proprietary U-Net at 2.5× magnification that is not publicly available. We substitute OtsuTissueDetector as a heuristic. Note that Phikon-v2 extracts tiles at 224×224 directly (not 256 with a resize), matching the DINOv2 global crop size.
from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import OtsuTissueDetector
from wsistream.sampling import RandomSampler
pipeline = PatchPipeline(
slide_paths=slide_paths,
backend=OpenSlideBackend(),
tissue_detector=OtsuTissueDetector(),
sampler=RandomSampler(
patch_size=224, # 224x224 at 20x (no separate resize step)
num_patches=-1,
target_mpp=0.5, # 20x magnification (0.5 µm/px)
tissue_threshold=0.6, # 60% tissue required
),
pool_size=8,
patches_per_slide=100,
cycle=True,
)
With multi-crop views
Phikon-v2 feeds 224×224 tiles to DINOv2's multi-crop. Extended Table 5 of the paper states the crop configuration. Because the source tile (224px) equals the global crop output size (224px), global crops at scale < 1.0 involve upsampling — this is intentional and matches the paper's setup.
from wsistream.views import ViewConfig, RandomResizedCrop
pipeline = PatchPipeline(
slide_paths=slide_paths,
backend=OpenSlideBackend(),
tissue_detector=OtsuTissueDetector(),
sampler=RandomSampler(patch_size=224, num_patches=-1, target_mpp=0.5,
tissue_threshold=0.6),
views=[
ViewConfig(
name="global",
crop=RandomResizedCrop(size=224, scale=(0.32, 1.0)),
count=2, # global_0, global_1 — paper: 2 global crops (Extended Table 5)
),
ViewConfig(
name="local",
crop=RandomResizedCrop(size=96, scale=(0.05, 0.32)), # paper: 8 local, 96px = 6×16 for ViT-L/16
count=8, # local_0 … local_7 — paper: 8 local crops (Extended Table 5)
),
],
pool_size=8,
patches_per_slide=100,
cycle=True,
)
Per-crop augmentations
Phikon-v2 uses standard DINOv2 photometric augmentations (color jitter, Gaussian blur, grayscale, solarization, horizontal flip) applied per crop. To add them with view-asymmetric probabilities matching DINOv2 defaults, see the DINOv2-style multi-crop example.
Deviations from paper
| Step | Paper | wsistream | Match |
|---|---|---|---|
| Tissue detection | In-house bi-directional U-Net at 2.5× (not open-sourced) | OtsuTissueDetector |
Approximate — heuristic substitute for a learned model |
| Tissue threshold | 60% | tissue_threshold=0.6 |
Exact |
| Tile size | 224×224 | patch_size=224 |
Exact |
| Magnification | 20× (0.5 µm/px) | target_mpp=0.5 |
Exact |
| Extraction | Offline, all non-overlapping tiles | Online random sampling (with replacement) | Different — random sampling does not guarantee full coverage |
| DINOv2 global crop | size 224, scale (0.32, 1.0), count 2 (Extended Table 5) | size=224, scale=(0.32, 1.0), count=2 |
Exact |
| DINOv2 local crop | size 96, scale (0.05, 0.32), count 8 (Extended Table 5) | size=96, scale=(0.05, 0.32), count=8 |
Exact |
| Normalization | ImageNet mean/std | Training code | Exact (not part of wsistream) |
Original Phikon (Filiot et al., 2023)
The original Phikon uses a simpler pipeline: tiles are extracted at 224×224 and 20× using OpenSlide's DeepZoomGenerator (tile_size=224, overlap=0) with basic foreground filtering ("all matter tiles"). It was trained with iBOT (not DINOv2) on a ViT-B/16 using ~40M tiles from TCGA (16 cancer types). Phikon-v2 scales this up in both data (456M tiles from 58K WSIs) and model (ViT-L) while switching from iBOT to DINOv2 and adding a U-Net tissue segmenter.
The same wsistream approximation above applies to Phikon, except that Phikon's simpler foreground filtering is likely closer to OtsuTissueDetector than Phikon-v2's U-Net.