{ "cells": [ { "cell_type": "markdown", "id": "6d9fb03a", "metadata": {}, "source": [ "# Integration of ZarrDataset with TensorFlow Datasets" ] }, { "cell_type": "code", "execution_count": null, "id": "6468a31a", "metadata": {}, "outputs": [], "source": [ "import zarrdataset as zds\n", "import tensorflow as tf" ] }, { "cell_type": "code", "execution_count": null, "id": "b25034b5", "metadata": {}, "outputs": [], "source": [ "# These are images from the Image Data Resource (IDR)\n", "# https://idr.openmicroscopy.org/ that are publicly available and were\n", "# converted to the OME-NGFF (Zarr) format by the OME group. More examples\n", "# can be found at Public OME-Zarr data (Nov. 2020)\n", "# https://www.openmicroscopy.org/2020/11/04/zarr-data.html\n", "\n", "filenames = [\n", "    \"https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0073A/9798462.zarr\"\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "8fb5811b", "metadata": {}, "outputs": [], "source": [ "import random\n", "import numpy as np\n", "\n", "# For reproducibility\n", "np.random.seed(478963)\n", "random.seed(478965)" ] }, { "cell_type": "markdown", "id": "3ebb3cca", "metadata": {}, "source": [ "## Extracting patches of size 1024x1024 pixels from a Whole Slide Image (WSI)" ] }, { "cell_type": "markdown", "id": "5064cd2e", "metadata": {}, "source": [ "Sample the image randomly using [blue noise](https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/) sampling." ] }, { "cell_type": "code", "execution_count": null, "id": "16e6278d", "metadata": {}, "outputs": [], "source": [ "patch_size = dict(Y=1024, X=1024)\n", "patch_sampler = zds.BlueNoisePatchSampler(patch_size=patch_size)" ] }, { "cell_type": "markdown", "id": "38bdc472", "metadata": {}, "source": [ "Create a dataset from the list of filenames. 
All those files should store their image data within their respective group \"3\", matching the `data_group` argument below.\n", "\n", "Also, specify that the axes order in the image is Time-Channel-Depth-Height-Width (TCZYX), so the data can be handled correctly." ] }, { "cell_type": "code", "execution_count": null, "id": "e9b1137d", "metadata": {}, "outputs": [], "source": [ "image_specs = zds.ImagesDatasetSpecs(\n", "    filenames=filenames,\n", "    data_group=\"3\",\n", "    source_axes=\"TCZYX\",\n", ")\n", "\n", "# A list with one label array, for the single image in the dataset, is passed as the `filenames` argument.\n", "labels_specs = zds.LabelsDatasetSpecs(\n", "    filenames=[np.ones(1)],\n", "    source_axes=\"L\",\n", ")\n", "\n", "my_dataset = zds.ZarrDataset([image_specs, labels_specs],\n", "                             patch_sampler=patch_sampler,\n", "                             shuffle=True)" ] }, { "cell_type": "markdown", "id": "cdcb2ca5", "metadata": {}, "source": [ "## Create a TensorFlow Dataset from the ZarrDataset object" ] }, { "cell_type": "markdown", "id": "0f554555", "metadata": {}, "source": [ "When PyTorch is not installed in the system, ZarrDataset still works as a Python generator.\n", "\n", "This makes it easy to connect ZarrDataset with `tensorflow.data.Dataset` and create an iterable dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "3173e008", "metadata": {}, "outputs": [], "source": [ "my_dataloader = tf.data.Dataset.from_generator(\n", "    my_dataset.__iter__,\n", "    output_signature=(tf.TensorSpec(shape=(1, 3, 1, None, None),\n", "                                    dtype=tf.float32),\n", "                      tf.TensorSpec(shape=(1,),\n", "                                    dtype=tf.int64)))\n", "\n", "batched_dataset = my_dataloader.batch(4)" ] }, { "cell_type": "markdown", "id": "7ce458ae", "metadata": {}, "source": [ "This data loader can be used within TensorFlow training pipelines." 
] }, { "cell_type": "code", "execution_count": null, "id": "e9efb890", "metadata": {}, "outputs": [], "source": [ "samples = []\n", "for i, (sample, target) in enumerate(my_dataloader):\n", "    samples.append(np.moveaxis(sample[0, :, 0], 0, -1))\n", "\n", "    print(f\"Sample {i+1} with size {sample.shape}, and target {target}\")\n", "\n", "    if i >= 4:\n", "        # Take only five samples for illustration purposes\n", "        break\n", "\n", "samples_stack = np.hstack(samples)" ] }, { "cell_type": "code", "execution_count": null, "id": "633c34c1", "metadata": {}, "outputs": [], "source": [ "samples_stack.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "572112e9", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.imshow(samples_stack.astype(np.uint8))\n", "plt.show()" ] } ], "metadata": { "execution": { "timeout": 600 }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }