Getting Started

Installation

git clone https://github.com/RamonKaspar/wsistream.git
cd wsistream
pip install -e ".[openslide]"   # or ".[tiffslide]", ".[torch]", ".[all]", ".[dev]"

Available extras:

Extra	What it adds
`openslide`	OpenSlide backend (requires system-level OpenSlide library)
`tiffslide`	TiffSlide backend (pure Python, no C dependencies)
`torch`	PyTorch integration (`WsiStreamDataset`, `MonitoredLoader`, DDP utilities)
`all`	Both backends + torch + albumentations + matplotlib
`dev`	Everything in `all` + pytest, ruff, mypy, mkdocs-material

OpenSlide C library

The openslide extra installs openslide-bin, which bundles the OpenSlide C library on most platforms. If that fails, install the library through your system package manager:

Ubuntu/Debian: apt-get install openslide-tools
macOS: brew install openslide
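To confirm that a backend is usable after installation, a quick import probe can help. The sketch below is not part of wsistream; the helper name `probe_backends` is hypothetical. It only assumes that the two backends are importable as the `openslide` and `tiffslide` Python modules.

```python
import importlib


def probe_backends(candidates=("openslide", "tiffslide")):
    """Map each candidate module name to True if it imports cleanly."""
    status = {}
    for name in candidates:
        try:
            importlib.import_module(name)
            status[name] = True
        except (ImportError, OSError):
            # ImportError: the Python package is not installed.
            # OSError: typically a shared library that failed to load.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in probe_backends().items():
        print(f"{name}: {'available' if ok else 'not available'}")
```

If `openslide` is reported as not available even though the extra is installed, the C library is the likely culprit, and the manual install above is the next thing to try.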

Minimal example

from wsistream.pipeline import PatchPipeline
from wsistream.backends import OpenSlideBackend
from wsistream.tissue import OtsuTissueDetector
from wsistream.sampling import RandomSampler

pipeline = PatchPipeline(
    slide_paths="/path/to/slides",  # directory or list of files
    backend=OpenSlideBackend(),
    tissue_detector=OtsuTissueDetector(),
    sampler=RandomSampler(patch_size=256, num_patches=10),
    pool_size=1,
    patches_per_slide=10,
)

# Materialize the stream; only do this for a finite stream (cycle=True never ends).
results = list(pipeline)
for result in results:
    print(result.image.shape, result.coordinate.x, result.coordinate.y)

Each iteration yields a PatchResult with the following fields:

Field	Type	Description
`image`	`np.ndarray` or `None`	The patch pixels when `transforms` are used. Shape `(H, W, 3)`, dtype `uint8` (or `float32` after normalization). `None` when `views` are configured.
`views`	`dict[str, np.ndarray]` or `None`	Named multi-view outputs when `views` are configured.
`coordinate`	`PatchCoordinate`	Location in the slide: `x`, `y`, `level`, `patch_size`, `mpp`, `slide_path`.
`tissue_fraction`	`float`	Fraction of the patch region covered by tissue (from the tissue mask), in `[0, 1]`.
`slide_metadata`	`SlideMetadata` or `None`	Dataset-specific metadata (populated when a `DatasetAdapter` is configured).
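Because `image` and `views` are mutually exclusive, downstream code often needs to handle both cases. The following is a minimal sketch of one way to do that; `result_arrays` is a hypothetical helper, not a wsistream function, and the view names `global` and `local` are made up for illustration. It relies only on the `image` and `views` fields described in the table.

```python
from types import SimpleNamespace


def result_arrays(result):
    """Return a {name: array} dict for single-image or multi-view results."""
    views = getattr(result, "views", None)
    if views is not None:
        return dict(views)
    image = getattr(result, "image", None)
    return {} if image is None else {"image": image}


# Stand-in objects; a real PatchResult would carry NumPy arrays.
single = SimpleNamespace(image="pixels", views=None)
multi = SimpleNamespace(image=None, views={"global": "g", "local": "l"})
print(sorted(result_arrays(single)))  # ['image']
print(sorted(result_arrays(multi)))   # ['global', 'local']
```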

Key pipeline parameters:

Parameter	Description
`pool_size`	Number of slides kept open simultaneously. Patches are interleaved across the pool via round-robin.
`patches_per_slide`	Maximum patches extracted from one slide before it is closed and replaced by the next.
`patches_per_visit`	Patches read from one slide before advancing to the next in the pool. Higher values improve I/O throughput on network filesystems. Default `1`.
`cycle`	When `True`, slides are re-queued after processing, producing an infinite stream for step-based training.
`replacement`	`"with_replacement"` (default) or `"without_replacement"`. With `"without_replacement"`, each slide's grid coordinates are consumed at most once per cycle. See Sampling.
`seed`	Seed for all internal RNGs: slide-queue order, sampler, transforms, and crops. Set this instead of seeds on individual transforms for reproducibility.
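To make the interplay of `pool_size`, `patches_per_slide`, `patches_per_visit`, and `cycle` concrete, here is a toy model of the scheduling described in the table. It is not wsistream's implementation: the function name `interleave` and the refill policy (top up the pool from the front of the queue before each visit) are assumptions, and it yields `(slide, patch_index)` pairs instead of reading pixels.

```python
from collections import deque
from itertools import islice


def interleave(slides, pool_size=1, patches_per_slide=2,
               patches_per_visit=1, cycle=False):
    """Yield (slide, patch_index) in round-robin order over a slide pool."""
    queue = deque(slides)   # slides waiting to be opened
    pool = deque()          # open slides as [slide, patches_taken]
    while queue or pool:
        # Keep up to pool_size slides open at once.
        while queue and len(pool) < pool_size:
            pool.append([queue.popleft(), 0])
        slide, taken = pool.popleft()
        # Read at most patches_per_visit patches during this visit.
        n = min(patches_per_visit, patches_per_slide - taken)
        for i in range(n):
            yield slide, taken + i
        taken += n
        if taken < patches_per_slide:
            pool.append([slide, taken])   # budget left: back of the pool
        elif cycle:
            queue.append(slide)           # exhausted: re-queue for a later pass


order = list(interleave(["a", "b", "c"], pool_size=2, patches_per_slide=2))
print(order)
# [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('c', 0), ('c', 1)]

# With cycle=True the stream never ends, so take a fixed number of steps.
print(list(islice(interleave(["a"], patches_per_slide=1, cycle=True), 3)))
# [('a', 0), ('a', 0), ('a', 0)]
```

In this model, raising `patches_per_visit` to 2 in the first call would group the output by slide (`a, a, b, b, c, c`), which is the access pattern that favors I/O throughput.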

See Architecture for a full explanation of the pipeline flow.

Visualizing results

from wsistream.viz import plot_patch_grid

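# `results` comes from the minimal example above; skip results without a single image.
patches = [r.image for r in results if r.image is not None]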
plot_patch_grid(patches, ncols=5, save_path="my_patches.png")

Next steps

Online Patching -- understand the core concept
Architecture -- how the pipeline works internally