{ "cells": [ { "cell_type": "markdown", "id": "38952883", "metadata": {}, "source": [ "# Labeled dataset loading with ZarrDataset" ] }, { "cell_type": "markdown", "id": "45b86cee", "metadata": {}, "source": [ "Import the \"zarrdataset\" package" ] }, { "cell_type": "code", "execution_count": null, "id": "390d9705", "metadata": {}, "outputs": [], "source": [ "import zarrdataset as zds\n", "import zarr" ] }, { "cell_type": "markdown", "id": "b31da04e", "metadata": {}, "source": [ "Load data stored in S3 storage" ] }, { "cell_type": "code", "execution_count": null, "id": "fcdf2a77", "metadata": {}, "outputs": [], "source": [ "# These are images from the Image Data Resource (IDR)\n", "# https://idr.openmicroscopy.org/ that are publicly available and were\n", "# converted to the OME-NGFF (Zarr) format by the OME group. More examples\n", "# can be found at Public OME-Zarr data (Nov. 2020)\n", "# https://www.openmicroscopy.org/2020/11/04/zarr-data.html\n", "\n", "filenames = [\"https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0073A/9798462.zarr\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "26c76379", "metadata": {}, "outputs": [], "source": [ "import random\n", "import numpy as np\n", "\n", "# For reproducibility\n", "np.random.seed(478963)\n", "random.seed(478965)" ] }, { "cell_type": "markdown", "id": "f4666dd4", "metadata": {}, "source": [ "## Extract pairs of 512x512-pixel patches and their respective labels from a labeled Whole Slide Image (WSI)" ] }, { "cell_type": "markdown", "id": "20213f75", "metadata": {}, "source": [ "ZarrDataset can retrieve the label associated with each extracted patch, so that each sample is a pair of input and target."
] }, { "cell_type": "code", "execution_count": null, "id": "168d7af3", "metadata": {}, "outputs": [], "source": [ "patch_size = dict(Y=512, X=512)\n", "patch_sampler = zds.PatchSampler(patch_size=patch_size)" ] }, { "cell_type": "markdown", "id": "71cf7ced", "metadata": {}, "source": [ "### Weakly labeled example" ] }, { "cell_type": "markdown", "id": "d6f1b0c1", "metadata": {}, "source": [ "Weakly labeled means that there are only a few labels (or just one) associated with the whole image.\n", "\n", "These labels can be loaded directly from a list or from arrays." ] }, { "cell_type": "code", "execution_count": null, "id": "6f8977d6", "metadata": {}, "outputs": [], "source": [ "image_specs = zds.ImagesDatasetSpecs(\n", "    filenames=filenames,\n", "    data_group=\"0\",\n", "    source_axes=\"TCZYX\",\n", ")\n", "\n", "# The LabelsDatasetSpecs class can be used as a guide to the minimum specifications needed to load the labels from the dataset.\n", "# This example uses a single label for the whole image.\n", "labels_specs = zds.LabelsDatasetSpecs(\n", "    filenames=[np.array([1])],\n", "    source_axes=\"L\",\n", ")\n", "\n", "my_dataset = zds.ZarrDataset([image_specs, labels_specs],\n", "                             patch_sampler=patch_sampler,\n", "                             shuffle=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "b34fc47d", "metadata": {}, "outputs": [], "source": [ "for i, (sample, label) in enumerate(my_dataset):\n", "    print(f\"Sample {i}, patch size: {sample.shape}, label: {label}\")\n", "\n", "    # Obtain only 5 samples\n", "    if i >= 4:\n", "        break" ] }, { "cell_type": "markdown", "id": "70799f79", "metadata": {}, "source": [ "### Densely labeled example" ] }, { "cell_type": "markdown", "id": "a4bfa8b2", "metadata": {}, "source": [ "Densely labeled images contain more spatial information about the image.\n", "\n", "This is the case when each pixel of the image is assigned to a specific class, as in object segmentation problems.\n", "\n", "The image label does not need to be the same size as the image, 
since ZarrDataset will match the coordinates of the image and the label." ] }, { "cell_type": "code", "execution_count": null, "id": "72814c5b", "metadata": {}, "outputs": [], "source": [ "from skimage import color, filters, morphology\n", "\n", "z_img = zarr.open(filenames[0], mode=\"r\")\n", "\n", "# Threshold a low-resolution level of the image pyramid (group \"4\")\n", "im_gray = color.rgb2gray(z_img[\"4\"][0, :, 0], channel_axis=0)\n", "thresh = filters.threshold_otsu(im_gray)\n", "\n", "# Pixels darker than the threshold are taken as foreground, then cleaned up\n", "labels = im_gray > thresh\n", "labels = morphology.remove_small_objects(labels == 0, min_size=16 ** 2,\n", "                                         connectivity=2)\n", "labels = morphology.remove_small_holes(labels, area_threshold=128)\n", "labels = morphology.binary_erosion(labels, morphology.disk(3))\n", "labels = morphology.binary_dilation(labels, morphology.disk(16))" ] }, { "cell_type": "markdown", "id": "564d61f4", "metadata": {}, "source": [ "The resulting label image looks something like the following" ] }, { "cell_type": "code", "execution_count": null, "id": "aeff76cb", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.subplot(1, 2, 1)\n", "plt.imshow(np.moveaxis(z_img[\"4\"][0, :, 0], 0, -1))\n", "plt.subplot(1, 2, 2)\n", "plt.imshow(labels)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "786c16a3", "metadata": {}, "source": [ "In this case, the labels are passed as a list of NumPy arrays, but they could also be stored in Zarr format, either locally or in a remote S3 bucket."
] }, { "cell_type": "code", "execution_count": null, "id": "818d4bd1", "metadata": {}, "outputs": [], "source": [ "image_specs = zds.ImagesDatasetSpecs(\n", "    filenames=filenames,\n", "    data_group=\"0\",\n", "    source_axes=\"TCZYX\",\n", ")\n", "\n", "# A list with one label image, for the single image in the dataset, is passed as the `filenames` argument.\n", "labels_specs = zds.LabelsDatasetSpecs(\n", "    filenames=[labels],\n", "    source_axes=\"YX\",\n", ")\n", "\n", "my_dataset = zds.ZarrDataset([image_specs, labels_specs],\n", "                             patch_sampler=patch_sampler,\n", "                             shuffle=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "a28ea4c1", "metadata": {}, "outputs": [], "source": [ "# 3x6 grid: each row shows three (image, label) pairs side by side\n", "fig, ax = plt.subplots(3, 6)\n", "for i, (sample, label) in enumerate(my_dataset):\n", "    print(f\"Sample {i}, patch size: {sample.shape}, label: {label.shape}\")\n", "\n", "    ax[i // 3, 2 * (i % 3)].imshow(sample[0, :, 0].transpose(1, 2, 0))\n", "    ax[i // 3, 2 * (i % 3)].set_title(f\"Image {i}\")\n", "    ax[i // 3, 2 * (i % 3)].axis(\"off\")\n", "\n", "    ax[i // 3, 2 * (i % 3) + 1].imshow(label)\n", "    ax[i // 3, 2 * (i % 3) + 1].set_title(f\"Label {i}\")\n", "    ax[i // 3, 2 * (i % 3) + 1].axis(\"off\")\n", "\n", "    # Obtain only 9 samples\n", "    if i >= 8:\n", "        break\n", "\n", "plt.show()" ] } ], "metadata": { "execution": { "timeout": 600 }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }