{ "cells": [ { "cell_type": "markdown", "id": "96bfc62e", "metadata": {}, "source": [ "# Loading patches/windows from masked regions of images with ZarrDataset" ] }, { "cell_type": "markdown", "id": "bf1a1211", "metadata": {}, "source": [ "Import the \"zarrdataset\" package" ] }, { "cell_type": "code", "execution_count": null, "id": "99af62f5", "metadata": {}, "outputs": [], "source": [ "import zarrdataset as zds\n", "import zarr" ] }, { "cell_type": "markdown", "id": "0db1dc86", "metadata": {}, "source": [ "Load data from S3 storage" ] }, { "cell_type": "code", "execution_count": null, "id": "87f76520", "metadata": {}, "outputs": [], "source": [ "# These are images from the Image Data Resource (IDR)\n", "# https://idr.openmicroscopy.org/ that are publicly available and were\n", "# converted to the OME-NGFF (Zarr) format by the OME group. More examples\n", "# can be found in \"Public OME-Zarr data (Nov. 2020)\"\n", "# https://www.openmicroscopy.org/2020/11/04/zarr-data.html\n", "\n", "filenames = [\"https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0073A/9798462.zarr\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "3eb2893d", "metadata": {}, "outputs": [], "source": [ "import random\n", "import numpy as np\n", "\n", "# For reproducibility\n", "np.random.seed(478963)\n", "random.seed(478965)" ] }, { "cell_type": "code", "execution_count": null, "id": "93d60e69", "metadata": {}, "outputs": [], "source": [ "z_img = zarr.open(filenames[0], mode=\"r\")\n", "z_img[\"0\"].info" ] }, { "cell_type": "code", "execution_count": null, "id": "56995806", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.imshow(np.moveaxis(z_img[\"4\"][0, :, 0], 0, -1))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "ed0ec99c", "metadata": {}, "source": [ "## Define a mask of the regions from which patches will be extracted" ] }, { "cell_type": "code", "execution_count": null, "id": "cf6f24d0", "metadata": {}, "outputs": [], "source": [ "mask = 
np.array([\n", "    [0, 0, 0, 0],\n", "    [0, 0, 1, 0],\n", "    [0, 1, 0, 0],\n", "], dtype=bool)" ] }, { "cell_type": "markdown", "id": "a4d9c652", "metadata": {}, "source": [ "ZarrDataset matches the size of the mask to the size of the image being sampled.\n", "\n", "For that reason, the mask does not need to have the same size as the image." ] }, { "cell_type": "code", "execution_count": null, "id": "04f1abea", "metadata": {}, "outputs": [], "source": [ "# Use the downsampled pyramid level \"4\" to display the whole image (axes TCZYX)\n", "_, d, _, h, w = z_img[\"4\"].shape\n", "m_h, m_w = mask.shape\n", "\n", "# Size of the image region covered by each mask pixel\n", "factor_h = h / m_h\n", "factor_w = w / m_w\n", "\n", "plt.imshow(np.moveaxis(z_img[\"4\"][0, :, 0], 0, -1))\n", "\n", "# Closed rectangle outlining one mask pixel's region, as (y, x) vertices\n", "sampling_region = np.array([\n", "    [0, 0],\n", "    [0, factor_w],\n", "    [factor_h, factor_w],\n", "    [factor_h, 0],\n", "    [0, 0]\n", "])\n", "\n", "# Draw the outline of every region where the mask is True\n", "for m_y, m_x in zip(*np.nonzero(mask)):\n", "    offset_y = m_y * factor_h\n", "    offset_x = m_x * factor_w\n", "    plt.plot(sampling_region[:, 1] + offset_x,\n", "             sampling_region[:, 0] + offset_y)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "4f545269", "metadata": {}, "source": [ "## Extract patches of size 512x512 pixels from masked regions of a Whole Slide Image (WSI)" ] }, { "cell_type": "markdown", "id": "fa72eceb", "metadata": {}, "source": [ "Sample the image uniformly in a square grid pattern using a `PatchSampler`" ] }, { "cell_type": "code", "execution_count": null, "id": "1bd5a11f", "metadata": {}, "outputs": [], "source": [ "patch_size = dict(Y=512, X=512)\n", "patch_sampler = zds.PatchSampler(patch_size=patch_size)" ] }, { "cell_type": "markdown", "id": "37f87569", "metadata": {}, "source": [ "Use the `ZarrDataset` class to extract samples from masked regions by specifying two modalities: images and masks.\n", "\n", "Set `shuffle=True` to sample patches from random locations." ] }, { "cell_type": "code", "execution_count": null, "id": "3674fdb6", "metadata": {}, "outputs": [], "source": [ "image_specs = 
zds.ImagesDatasetSpecs(\n", "    filenames=filenames,\n", "    data_group=\"0\",\n", "    source_axes=\"TCZYX\",\n", ")\n", "\n", "# Use the MasksDatasetSpecs to add the specifications of the masks.\n", "# `filenames` accepts different types of inputs; here it is a list with a\n", "# single mask for the only image in image_specs.\n", "masks_specs = zds.MasksDatasetSpecs(\n", "    filenames=[mask],\n", "    source_axes=\"YX\",\n", ")\n", "\n", "my_dataset = zds.ZarrDataset([image_specs, masks_specs],\n", "                             patch_sampler=patch_sampler,\n", "                             draw_same_chunk=False,\n", "                             shuffle=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "c85d3f9d", "metadata": {}, "outputs": [], "source": [ "ds_iterator = iter(my_dataset)" ] }, { "cell_type": "code", "execution_count": null, "id": "917d95bb", "metadata": {}, "outputs": [], "source": [ "sample = next(ds_iterator)\n", "type(sample), sample.shape, sample.dtype" ] }, { "cell_type": "code", "execution_count": null, "id": "4b555dda", "metadata": {}, "outputs": [], "source": [ "plt.imshow(np.moveaxis(sample[0, :, 0], 0, -1))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "2d57d73c", "metadata": {}, "outputs": [], "source": [ "samples = []\n", "for i, sample in enumerate(my_dataset):\n", "    # Move the channel axis last (YXC) and pad each patch with a 1-pixel border\n", "    samples.append(np.pad(np.moveaxis(sample[0, :, 0], 0, -1),\n", "                          ((1, 1), (1, 1), (0, 0))))\n", "\n", "    # Obtain only 5 samples\n", "    if i >= 4:\n", "        break\n", "\n", "grid_samples = np.hstack(samples)" ] }, { "cell_type": "code", "execution_count": null, "id": "99157b0d", "metadata": {}, "outputs": [], "source": [ "plt.imshow(grid_samples)\n", "plt.show()" ] } ], "metadata": { "execution": { "timeout": 600 }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }