{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b813b87c",
   "metadata": {},
   "source": [
    "# Basic ZarrDataset usage example"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecd3e915",
   "metadata": {},
   "source": [
    "Import the \"zarrdataset\" package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55f81521",
   "metadata": {},
   "outputs": [],
   "source": [
    "import zarrdataset as zds\n",
    "import zarr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66b43467",
   "metadata": {},
   "source": [
    "Load data stored on S3 storage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "057d034d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# These are images from the Image Data Resource (IDR) \n",
    "# https://idr.openmicroscopy.org/ that are publicly available and were \n",
    "# converted to the OME-NGFF (Zarr) format by the OME group. More examples\n",
    "# can be found at Public OME-Zarr data (Nov. 2020)\n",
    "# https://www.openmicroscopy.org/2020/11/04/zarr-data.html\n",
    "\n",
    "filenames = [\"https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0073A/9798462.zarr\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a32081be",
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "import numpy as np\n",
    "\n",
    "# For reproducibility\n",
    "np.random.seed(478963)\n",
    "random.seed(478965)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d888d310",
   "metadata": {},
   "source": [
    "Inspect the image to sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b99506cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "z_img = zarr.open(filenames[0], mode=\"r\")\n",
    "z_img[\"0\"].info"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ff83a85",
   "metadata": {},
   "source": [
    "Display a downsampled version of the image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81eecf24",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.imshow(np.moveaxis(z_img[\"5\"][0, :, 0], 0, -1))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c56e97f9",
   "metadata": {},
   "source": [
    "## Retrieving whole images"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21ff2662",
   "metadata": {},
   "source": [
    "Create a ZarrDataset to handle the image dataset instead of opening all the dataset images by separate and hold them in memory until they are not used anymore."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1be221cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dataset = zds.ZarrDataset()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2698879",
   "metadata": {},
   "source": [
    "Start by retrieving whole images, from a subsampled (pyramid) group (e.g. group 6) within the zarr image file, instead the full resolution image at group \"0\".\n",
    "The source array axes should be specified in order to handle images properly, in this case Time-Channel-Depth-Height-Width (TCZYX)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa6ed3a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dataset.add_modality(\n",
    "  modality=\"image\",\n",
    "  filenames=filenames,\n",
    "  source_axes=\"TCZYX\",\n",
    "  data_group=\"6\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a54a6c84",
   "metadata": {},
   "source": [
    "The ZarrDataset class can be used as a Python's generator, and can be accessed by `iter` and subsequently `next` operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3f45216",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_iterator = iter(my_dataset)\n",
    "ds_iterator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2bc9b2cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "sample = next(ds_iterator)\n",
    "\n",
    "print(type(sample), sample.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f58ad14",
   "metadata": {},
   "source": [
    "Compare the shape of the retreived sample with the shape of the original image in group \"6\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65ee053d",
   "metadata": {},
   "outputs": [],
   "source": [
    "z_img[\"6\"].info"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30d9074b",
   "metadata": {},
   "source": [
    "## Extracting patches of size 512x512 pixels from a Whole Slide Image (WSI)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b7b40c1",
   "metadata": {},
   "source": [
    "The PatchSampler class can be used along with ZarrDataset to retrieve patches from WSIs without having to tiling them in a pre-process step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "577f4551",
   "metadata": {},
   "outputs": [],
   "source": [
    "patch_size = dict(Y=512, X=512)\n",
    "patch_sampler = zds.PatchSampler(patch_size=patch_size)\n",
    "\n",
    "patch_sampler"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40265520",
   "metadata": {},
   "source": [
    "Create a new dataset using the ZarrDataset class, and pass the PatchSampler as `patch_sampler` argument.\n",
    "Because patches are being exracted instead of whole images, the full resolution image at group \"0\" can be used as input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18becce5",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dataset = zds.ZarrDataset(patch_sampler=patch_sampler)\n",
    "\n",
    "my_dataset.add_modality(\n",
    "  modality=\"image\",\n",
    "  filenames=filenames,\n",
    "  source_axes=\"TCZYX\",\n",
    "  data_group=\"0\"\n",
    ")\n",
    "\n",
    "my_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a25a19f4",
   "metadata": {},
   "source": [
    "Create a generator from the dataset object and extract some patches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "107f41f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_iterator = iter(my_dataset)\n",
    "\n",
    "sample = next(ds_iterator)\n",
    "type(sample), sample.shape, sample.dtype\n",
    "\n",
    "sample = next(ds_iterator)\n",
    "type(sample), sample.shape, sample.dtype\n",
    "\n",
    "sample = next(ds_iterator)\n",
    "type(sample), sample.shape, sample.dtype"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89d742d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(np.moveaxis(sample[0, :, 0], 0, -1))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76b65598",
   "metadata": {},
   "source": [
    "## Using ZarrDataset in a for loop"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12ee7813",
   "metadata": {},
   "source": [
    "ZarrDatasets can be used as generators, for example in for loops"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "004dcda0",
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = []\n",
    "for i, sample in enumerate(my_dataset):\n",
    "    samples.append(np.moveaxis(sample[0, :, 0], 0, -1))\n",
    "\n",
    "    if i >= 4:\n",
    "        # Take only five samples for illustration purposes\n",
    "        break\n",
    "\n",
    "samples_stack = np.hstack(samples)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f2d8eac",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(samples_stack)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8eebaf4a",
   "metadata": {},
   "source": [
    "## Create a ZarrDataset with all the dataset specifications.\n",
    "\n",
    "Use a dictionary (or a list of them for multiple modalities) to define the dataset specifications.\n",
    "Alternatively, use a list of DatasetSpecs (or derived classes) to define the dataset specifications that ZarrDataset requires.\n",
    "\n",
    "For example, `ImagesDatasetSpecs` can be used to define an _image_ data modality. Other pre-defined modalities are `LabelsDatasetSpecs` for _labels_, and `MaskDatasetSpecs` for _masks_."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b42b3e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "image_specs = zds.ImagesDatasetSpecs(\n",
    "  filenames=filenames,\n",
    "  data_group=\"0\",\n",
    "  source_axes=\"TCZYX\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecd0869e",
   "metadata": {},
   "source": [
    "Also, try sampling patches from random locations by setting `shuffle=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e00b1347",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dataset = zds.ZarrDataset(dataset_specs=[image_specs],\n",
    "                             patch_sampler=patch_sampler,\n",
    "                             shuffle=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1c82bc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = []\n",
    "for i, sample in enumerate(my_dataset):\n",
    "    samples.append(np.moveaxis(sample[0, :, 0], 0, -1))\n",
    "\n",
    "    if i >= 4:\n",
    "        # Take only five samples for illustration purposes\n",
    "        break\n",
    "\n",
    "samples_stack = np.hstack(samples)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "278e0453",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(samples_stack)\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "execution": {
   "timeout": 600
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}