# **Deep learning for image analysis with PyTorch**

#### Fernando Cervantes, Systems Analyst I, Imaging Solutions, Research IT
#### fernando.cervantes@jax.org    (slack) @fernando.cervantes

## **4 Define the optimization problem**

The performance of an artificial neural network depends on the architecture of the network and the value of its parameters $\theta$.<br>
These parameters are fitted (optimized) through a process known as *training*.<br>
*Training* is posed as an optimization problem whose goal is to minimize a loss function $L$.<br>
For tasks where a set of examples ($X$) and their expected outputs, or ground truth ($Y$), are available, the fitting process is known as *supervised training*.<br>
In those cases, the target loss is the *training error*, which is computed as the average loss over the $N$ training examples as follows.

$\hat{\theta} = \text{arg}\,\min\limits_{\theta}\, L(X, Y, \theta) = \text{arg}\,\min\limits_{\theta}\, \frac{1}{N}\sum\limits_{i=1}^{N}L(x_i, y_i, \theta)$.
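As a small illustration of the formula above, the sketch below evaluates the training error for a fixed $\theta$. The toy data, the linear model `theta[0] * x + theta[1]`, and the squared-error per-example loss are all assumptions made for this example only.

```python
import torch

# Toy data (made up for illustration): N = 4 examples with scalar inputs and targets.
x = torch.tensor([0.0, 1.0, 2.0, 3.0])
y = torch.tensor([1.0, 3.0, 5.0, 7.0])

# Hypothetical model f_theta(x) = theta[0] * x + theta[1], with a fixed theta.
theta = torch.tensor([1.5, 0.5])


def per_example_loss(x_i, y_i, theta):
    """Squared error of a single example (an assumed choice of L)."""
    prediction = theta[0] * x_i + theta[1]
    return (y_i - prediction) ** 2


# Training error: the average of the per-example losses over the N examples.
losses = torch.stack([per_example_loss(x_i, y_i, theta) for x_i, y_i in zip(x, y)])
training_error = losses.mean()
print(training_error.item())
```

Training then amounts to searching for the $\theta$ that makes this number as small as possible.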

### 4.1 _Loss functions_

The two most common tasks for neural networks are regression and classification.<br>
Regression consists of predicting a real-valued target, while classification consists of predicting a categorical target.

Regression error $L(x, y, \theta) = \left(y - f_\theta(x)\right)^2$ [*Mean Squared Error (MSE)*](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html?highlight=mse#torch.nn.MSELoss)<br>
Classification error $L(x, y, \theta) = -\sum\limits_{g=1}^G y_g \log\left(f_{\theta,g}(x)\right)$ [*Cross-Entropy (CE)*](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html?highlight=cross%20entropy#torch.nn.CrossEntropyLoss)/*Kullback–Leibler divergence*<br>


For the special case of classification with two categories, $L(x, y, \theta) = -y \log\left(f_\theta(x)\right) - (1-y) \log\left(1-f_\theta(x)\right)$ [*Binary Cross-Entropy (BCE)*](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html?highlight=binary%20cross%20entropy)

In [1]:
import torch.nn as nn

In [2]:
criterion = nn.MSELoss() # Mean squared error

In [3]:
criterion = nn.CrossEntropyLoss() # Cross-Entropy (expects raw scores, not probabilities)

In [4]:
criterion = nn.BCELoss() # Binary Cross-Entropy
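The three criteria above expect differently shaped inputs, which the following sketch tries to make concrete. All tensor values are made up for illustration; the variable names (`prediction`, `logits`, `probability`, and so on) are not part of the original notebook.

```python
import torch
import torch.nn as nn

# Regression: predictions and targets are real-valued tensors of the same shape.
prediction = torch.tensor([2.0, 0.0])
target = torch.tensor([1.0, 2.0])
mse = nn.MSELoss()(prediction, target)  # mean of (y - f(x))^2 over the batch

# Multi-class classification: nn.CrossEntropyLoss takes raw scores (logits)
# of shape (batch, classes) and integer class indices of shape (batch,).
logits = torch.tensor([[2.0, 0.5, -1.0]])
class_index = torch.tensor([0])
ce = nn.CrossEntropyLoss()(logits, class_index)

# Binary classification: nn.BCELoss takes probabilities in [0, 1],
# so a sigmoid is applied to the raw score first.
probability = torch.sigmoid(torch.tensor([0.0]))
binary_target = torch.tensor([1.0])
bce = nn.BCELoss()(probability, binary_target)

print(mse.item(), ce.item(), bce.item())
```

Note that `nn.CrossEntropyLoss` applies the log-softmax internally, whereas `nn.BCELoss` assumes its input has already been mapped to a probability.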

### 4.2 _Validation loss_

The *training error* is not always a good estimate of the *test error*.<br>
For that reason, training a neural network is performed under a cross-validation scheme.<br>
In these schemes, the *validation error* is used to estimate the *test error*, since the *validation examples* are not used to train the model.<br>
Then, the best model is chosen as the one that minimizes the *validation error*.
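One way to put this into practice is sketched below with a single hold-out split, which is only the simplest such scheme. The synthetic data, the `nn.Linear` model, the learning rate, and the number of epochs are assumptions chosen to keep the example small.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data (made up): y = 2x + 1 plus a little noise.
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2 * x + 1 + 0.1 * torch.randn_like(x)

# Hold out every fifth example for validation and train on the rest.
val_mask = torch.arange(100) % 5 == 0
x_train, y_train = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

best_val_loss = float("inf")
best_state = None
val_history = []

for epoch in range(50):
    # Training step: uses the training examples only.
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    # Validation error: no gradients and no parameter updates.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    val_history.append(val_loss)

    # Keep a copy of the parameters with the lowest validation error so far.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())

# Restore the best model found during training.
model.load_state_dict(best_state)
```

The copy of `state_dict()` matters here: without `copy.deepcopy`, later updates would overwrite the saved parameters in place.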

### 4.3 _Optimization algorithms_

*Gradient-based* optimization algorithms are the preferred choice for fitting a neural network's parameters.<br>
*Gradient descent* (GD) is one of the simplest gradient-based algorithms, and is defined as follows

$\theta^{t+1} = \theta^t - \alpha \nabla L(X, Y, \theta^t)$.<br>
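The update rule above can be written out by hand using autograd. This is a minimal sketch on a hypothetical one-parameter loss, $L(\theta) = (\theta - 3)^2$, chosen only because its minimum is known; the step size and the number of iterations are likewise assumptions.

```python
import torch

# Hypothetical loss with a known minimum at theta = 3: L(theta) = (theta - 3)^2.
theta = torch.tensor(0.0, requires_grad=True)
alpha = 0.1  # step size (learning rate)

for t in range(100):
    loss = (theta - 3.0) ** 2
    loss.backward()                  # stores dL/dtheta in theta.grad
    with torch.no_grad():
        theta -= alpha * theta.grad  # theta^{t+1} = theta^t - alpha * gradient
    theta.grad.zero_()               # reset the gradient before the next step

print(theta.item())
```

The `torch.no_grad()` block keeps the parameter update itself from being recorded by autograd, and `theta.grad.zero_()` is needed because PyTorch accumulates gradients across calls to `backward()`.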

***
*Stochastic Gradient Descent* (SGD) makes it possible to train the network parameters *on-line* on large datasets.<br>
This means that the parameters are updated after processing each individual mini-batch instead of waiting until all training examples have been processed.<br>
This method updates the parameters of the model as follows<br>
$\theta^{t+1} = \theta^t - \alpha \nabla L(\{(x_i, y_i)\}_i^B, \theta^t)$,<br>
where <br>
- $\{(x_i, y_i)\}_i^B$ is a mini-batch of training examples and their corresponding ground-truth, <br>
- $\nabla L(\{(x_i, y_i)\}_i^B, \theta)$ is the gradient computed for the current $\theta$ at time $t$, and <br>
- $\alpha$ is the step size (*learning rate*).
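The mini-batch update described above might look as follows when written by hand. The noise-free synthetic data, the linear model with parameters `w` and `b`, the batch size, and the learning rate are all assumptions for this sketch.

```python
import torch

torch.manual_seed(0)

# Synthetic data (made up): y = 2x + 1, with N = 64 examples.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

# Parameters theta = (w, b) of a hypothetical linear model f(x) = w * x + b.
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
alpha = 0.1      # learning rate
batch_size = 8   # B

for epoch in range(100):
    order = torch.randperm(x.shape[0])         # shuffle the examples each epoch
    for start in range(0, x.shape[0], batch_size):
        idx = order[start:start + batch_size]  # indices of one mini-batch
        x_batch, y_batch = x[idx], y[idx]

        # Loss and gradient computed on this mini-batch only.
        loss = ((y_batch - (w * x_batch + b)) ** 2).mean()
        loss.backward()

        # Update right after the mini-batch, without waiting for the full dataset.
        with torch.no_grad():
            w -= alpha * w.grad
            b -= alpha * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())
```

With 64 examples and a batch size of 8, the parameters are updated 8 times per pass over the data, whereas full gradient descent would update them once.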

***
#### The difference between SGD and GD
In SGD the update occurs after each mini-batch, whereas in GD the update is computed only after all examples have been processed.

***
In PyTorch, there is a wide variety of [optimizers](https://pytorch.org/docs/stable/optim.html) including [*SGD*](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD) and [*Adam*](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam).<br>
All of these optimizers are found in the *torch.optim* module.

In [304]:
import torch.optim as optim

In [305]:
optim.SGD

torch.optim.sgd.SGD

In [306]:
optim.Adam

torch.optim.adam.Adam
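An optimizer is typically built from the model's parameters and then called once per training step. The sketch below shows that pattern; the `nn.Linear` model, the random data, and the learning rates are placeholders chosen for illustration, not values from this notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical model and data, used only to show the optimizer calls.
model = nn.Linear(3, 1)
criterion = nn.MSELoss()
x = torch.randn(16, 3)
y = torch.randn(16, 1)

# The optimizer receives the parameters to update and the learning rate.
optimizer = optim.SGD(model.parameters(), lr=0.01)
# optimizer = optim.Adam(model.parameters(), lr=0.001)  # drop-in alternative

initial_loss = criterion(model(x), y).item()

for step in range(100):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass and loss
    loss.backward()                # backpropagate to obtain the gradients
    optimizer.step()               # apply the update rule to the parameters

final_loss = criterion(model(x), y).item()
print(initial_loss, final_loss)
```

Switching between SGD and Adam only changes the line that constructs the optimizer; the `zero_grad()`, `backward()`, `step()` sequence stays the same.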