# **Deep learning for image analysis with PyTorch**

#### Fernando Cervantes, Systems Analyst I, Imaging Solutions, Research IT
#### fernando.cervantes@jax.org    (slack) @fernando.cervantes

## **1 Introduction**

### 1.1 _Artificial neural networks_

An artificial neural network (ANN) is a machine learning model composed of interconnected computing units, also called _neurons_.<br>
ANNs are commonly implemented as a linear operation performed by each neuron, followed by a nonlinear operation, known as the _activation function_ ($\sigma$), which produces the neuron's output.<br>
These models have been shown to be universal function approximators and have been used successfully in a wide variety of _artificial intelligence_ (AI) problems.

$N(x) = \sigma\left(\sum_{i} w^{N}_{i} x_{i} + b^{N}\right)$
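
As a minimal sketch of the formula above, the following plain-Python snippet computes the output of a single neuron. The logistic sigmoid is assumed here as the activation $\sigma$, and the input, weight, and bias values are made up for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function, used here as one possible choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, activation=sigmoid):
    """Compute sigma(sum_i w_i * x_i + b) for a single neuron."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return activation(z)


# Illustrative values (assumptions, not taken from the workshop material)
x = [1.0, 2.0, 3.0]   # inputs x_i
w = [0.5, -1.0, 0.5]  # weights w_i
b = 0.0               # bias

output = neuron(x, w, b)
print(output)  # the weighted sum is 0, so the sigmoid returns 0.5
```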

The most common ANN model is the _multilayer perceptron_. This model is an ensemble of neurons arranged in a layered architecture.<br>
Each layer receives the output of the previous layer as input and processes it through its corresponding neurons. <br>
This way, all neurons of a layer _see_ the same information but process it differently.<br>
In addition, the same activation function is usually applied to the outputs of all neurons in a layer.

More complex tasks have been addressed by using more complex models. In the case of multilayer perceptrons, the complexity of the model is increased by adding more layers.
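
The idea of adding layers can be sketched in PyTorch as follows. The helper `make_mlp` and all layer sizes are hypothetical and chosen only for illustration; ReLU is assumed as the activation of every hidden layer.

```python
import torch
from torch import nn


def make_mlp(in_features, hidden_features, out_features, num_hidden_layers):
    """Build a multilayer perceptron with the requested number of hidden layers."""
    layers = []
    width = in_features
    for _ in range(num_hidden_layers):
        # Each hidden layer is a linear operation followed by an activation
        layers += [nn.Linear(width, hidden_features), nn.ReLU()]
        width = hidden_features
    layers.append(nn.Linear(width, out_features))
    return nn.Sequential(*layers)


def count_parameters(model):
    """Total number of learnable weights and biases in the model."""
    return sum(p.numel() for p in model.parameters())


shallow = make_mlp(4, 8, 3, num_hidden_layers=1)
deep = make_mlp(4, 8, 3, num_hidden_layers=4)

x = torch.randn(5, 4)  # a batch of 5 samples with 4 features each
print(shallow(x).shape, count_parameters(shallow))
print(deep(x).shape, count_parameters(deep))
```

Both models map the same input to the same output shape; the deeper one simply has more layers, and therefore more parameters, between input and output.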

![Image](https://www.mdpi.com/applsci/applsci-09-05507/article_deploy/html/images/applsci-09-05507-g003-550.jpg)

### 1.2 _Deep learning models_

A model that constructs a hierarchical representation, where each layer builds on the output of the previous layers, is considered a deep learning model.<br>
It can consist of tens, hundreds, or even thousands of layers. <br>
Deep learning has been widely studied in recent years thanks to the increase in the computing capacity of computers and GPUs.

By using GPUs, operations such as the linear convolution can be used within the neurons of an ANN.<br>
These networks are referred to as _Convolutional Neural Networks_ (CNNs). <br>
For image analysis, CNNs are useful because they can keep the spatial context of images through the subsequent layers.

### 1.2.1 _Operations_

In deep learning models, each neuron can implement any operation over its respective input.<br>
The most common operations are:
1. Convolution
2. Pooling / Downsampling (average, min, max)
3. Linear / Fully connected (used on last layers)

#### 1.2.1.1 Convolution

This is a linear operation applied locally to the neighborhood of each location of the input image. <br>
Convolutions can be applied in multiple dimensions; however, in this workshop only two-dimensional convolutions are used for image analysis.<br>
A two-dimensional convolution applies the same _kernel_ (set of weights) to the neighborhood of every pixel in the image. <br>
The size of the kernel, the spacing between pixels, and the number of output channels generated are some of the customizable parameters of this operation.
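
A minimal sketch of these parameters using `torch.nn.Conv2d` is shown below. The image size, channel counts, and padding are assumptions made for illustration; `stride` is used here as one of the ways to control the step between the pixels where the kernel is applied.

```python
import torch
from torch import nn

# A random RGB "image": batch of 1, 3 channels, 32 x 32 pixels (illustrative)
image = torch.randn(1, 3, 32, 32)

# 16 output channels, 3 x 3 kernels, applied at every pixel (stride 1).
# padding=1 keeps the spatial size unchanged for a 3 x 3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
features = conv(image)
print(features.shape)     # torch.Size([1, 16, 32, 32])

# One 3 x 3 kernel per (output channel, input channel) pair
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3])

# With stride 2 the kernel is applied at every other pixel,
# which halves the height and width of the output
strided_conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
strided_features = strided_conv(image)
print(strided_features.shape)  # torch.Size([1, 16, 16, 16])
```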

#### 1.2.1.2 Pooling

These operations summarize the spatial information in each pixel's neighborhood into a single value.<br>
Pooling operations can be used to downsample the resolution of the latent feature maps (the layers' outputs). <br>
The operations most commonly applied to the pixel's neighborhood are average, min, and max pooling.
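
The three pooling variants can be sketched on a small hand-made feature map. The values are made up for illustration. PyTorch provides `MaxPool2d` and `AvgPool2d`; to my knowledge there is no dedicated min-pooling layer, so min pooling is computed here by negating the input of a max pooling, which is an assumption of this sketch rather than a standard layer.

```python
import torch
from torch import nn

# A single-channel 4 x 4 feature map with shape (batch, channels, height, width)
x = torch.tensor([[1.0, 2.0, 5.0, 6.0],
                  [3.0, 4.0, 7.0, 8.0],
                  [9.0, 10.0, 13.0, 14.0],
                  [11.0, 12.0, 15.0, 16.0]]).reshape(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)  # non-overlapping 2 x 2 neighborhoods
avg_pool = nn.AvgPool2d(kernel_size=2)

pooled_max = max_pool(x)    # [[4, 8], [12, 16]]
pooled_avg = avg_pool(x)    # [[2.5, 6.5], [10.5, 14.5]]
pooled_min = -max_pool(-x)  # [[1, 5], [9, 13]]

# Each output is downsampled from 4 x 4 to 2 x 2
print(pooled_max.shape)
print(pooled_max, pooled_avg, pooled_min, sep="\n")
```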

#### 1.2.1.3 Linear

This operation applies a _matrix-vector_ multiplication followed by the addition of a _bias_ term.<br>
The linear operation is used in the last layers of a CNN, once the spatial information has been condensed, to perform high-level computer vision tasks.
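
As a sketch, the snippet below checks that `torch.nn.Linear` computes the matrix-vector product plus bias, and then shows how feature maps are typically flattened before a linear layer. The weights, the feature map size, and the 10 output classes are hypothetical values chosen for illustration.

```python
import torch
from torch import nn

linear = nn.Linear(in_features=4, out_features=2)

# Overwrite the random initialization with known values (illustrative only)
with torch.no_grad():
    linear.weight.copy_(torch.tensor([[1.0, 0.0, 0.0, 0.0],
                                      [0.0, 1.0, 1.0, 0.0]]))
    linear.bias.copy_(torch.tensor([0.5, -1.0]))

x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])    # one sample with 4 features
y = linear(x)                               # [[1.5, 4.0]]
manual = x @ linear.weight.T + linear.bias  # same computation written out
print(y, manual)

# Feature maps with spatial dimensions are flattened into a vector first
feature_maps = torch.randn(1, 8, 2, 2)             # (batch, channels, height, width)
flat = torch.flatten(feature_maps, start_dim=1)    # shape (1, 32)
classifier = nn.Linear(in_features=32, out_features=10)
scores = classifier(flat)
print(scores.shape)  # torch.Size([1, 10])
```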

### 1.2.2 _Activation functions_

After each linear operation (or convolution), a nonlinear function can be applied to break the linearity of the neural network model.<br>
This gives the model the flexibility to address more complex tasks.<br>
The most widely used activation functions are based on the _Rectified Linear Unit_ (ReLU), which is differentiable everywhere except at zero and inexpensive to compute.<br>

![Image](https://pytorch.org/docs/stable/_images/ReLU.png)

Other nonlinear functions can be used, such as the sigmoid, the hyperbolic tangent, and their variants.<br>
However, the simplest activation functions are preferred when a large number of layers is used, to reduce the model's computational overhead.

![Image](https://pytorch.org/docs/stable/_images/Sigmoid.png)
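
To compare the activation functions mentioned above, the following sketch evaluates ReLU, sigmoid, and hyperbolic tangent on the same inputs. The input values are arbitrary and only meant to show the behavior on negative, zero, and positive inputs.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

relu_out = torch.relu(x)        # negatives are clipped to 0: [0, 0, 2]
sigmoid_out = torch.sigmoid(x)  # squashed into (0, 1): about [0.119, 0.5, 0.881]
tanh_out = torch.tanh(x)        # squashed into (-1, 1): about [-0.964, 0, 0.964]

print(relu_out)
print(sigmoid_out)
print(tanh_out)
```

ReLU only needs a comparison with zero, while the sigmoid and the hyperbolic tangent require evaluating exponentials, which is one reason ReLU-based functions are cheaper in very deep models.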

### 1.3 _Neural network architecture_

The performance of a deep learning model depends highly on the topology or _architecture_ of its network. <br>
The network's architecture comprises the information needed to build the neural network, such as the number of layers, the number of neurons, the size of the kernels for the convolution operations, etc.

In recent years, the architecture of a network also defines whether special operations (dropout, batch normalization, residual connections) are applied between layers.
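
The following sketch combines the three special operations in a single building block. The class `ResidualBlock`, its layer arrangement, and the dropout probability are hypothetical choices for illustration, loosely in the style of residual networks, and not the layout of any specific published model.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two convolutions with batch normalization, dropout, and a skip connection."""

    def __init__(self, channels, dropout_p=0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.dropout = nn.Dropout2d(dropout_p)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.dropout(out)          # randomly zeroes channels during training
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)       # residual connection: add the block's input


block = ResidualBlock(channels=8)
block.eval()  # dropout is disabled and batch norm uses running statistics

x = torch.randn(2, 8, 16, 16)
y = block(x)
print(y.shape)  # same shape as the input: torch.Size([2, 8, 16, 16])
```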

Research papers commonly include an illustration of the neural network's architecture.<br>
The following figure is a representation of the LeNet CNN architecture.<br>

![Image](https://pytorch.org/tutorials/_images/mnist.png)

In the LeNet illustration, what is actually shown are the inputs and outputs of each layer. <br>
It is also common to present the architecture as a table that lists, in order, the type of operation, the parameters, and the shape of the input/output of each layer.<br>
The following table is the architecture of the Inception v3 CNN.

![Image](https://pytorch.org/assets/images/inception_v3.png)
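
A table of this kind can also be generated for a small model by recording the output shape of each layer. The sketch below does this for a LeNet-style network; the layer sizes are assumed from the classic LeNet-5 description with a 32 x 32 grayscale input, and ReLU and max pooling are used here as modern stand-ins for the original activation and subsampling layers.

```python
import torch
from torch import nn

# LeNet-style network (layer sizes are assumptions for illustration)
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)

x = torch.randn(1, 1, 32, 32)  # one grayscale 32 x 32 image

# Pass the input through one layer at a time and record each output shape
rows = []
for layer in lenet:
    x = layer(x)
    rows.append((layer.__class__.__name__, tuple(x.shape)))

for name, shape in rows:
    print(f"{name:<10} {shape}")
```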

Other representations show the connections and the types of operations instead, which is useful when the output of each layer would be difficult to depict.<br>
The next illustration shows the same Inception v3 model.

![Image](https://production-media.paperswithcode.com/methods/inceptionv3onc--oview_vjAbOfw.png)