Convolutional Neural Networks (CNNs)¶
Materials¶
Neural Networks Part 8: Image Classification with Convolutional Neural Networks (CNNs)
Definition¶
Convolutional neural networks (CNNs) are designed to let a computer recognize patterns in images.
Why not use an ANN to understand images?¶
- An n × n image has n² pixels, so feeding it directly into an ANN requires a very large weight matrix
- If the image shifts by one or two pixels, the ANN's prediction can change dramatically. In other words, an ANN does not capture the collective meaning of neighboring pixels

Therefore, we want a new approach that has the following three properties:

- Reduce the number of input nodes
    a. Achieved by both filters and pooling layers
- Tolerate small shifts in where the pixels are in the image
- Take advantage of the correlations that we observe in complex images
    a. Accomplished by the filter (kernel)
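
To make the first point concrete, the sketch below compares how many weights a fully connected first layer needs with how many a bank of filters needs. The function names and the sizes (a 100 × 100 single-channel image, 64 hidden units, 64 filters of size 3 × 3) are assumptions chosen for illustration, and biases are ignored.

```python
def dense_weight_count(n: int, hidden_units: int) -> int:
    """Weights needed to connect every pixel of an n x n image to each hidden unit."""
    return n * n * hidden_units


def conv_weight_count(kernel_size: int, num_filters: int) -> int:
    """Weights in num_filters square kernels; does not depend on the image size."""
    return kernel_size * kernel_size * num_filters


# Hypothetical sizes, chosen only for illustration.
image_side = 100
print(dense_weight_count(image_side, hidden_units=64))   # 640000
print(conv_weight_count(kernel_size=3, num_filters=64))  # 576
```

Because the same small filter is reused at every position, the filter weight count stays fixed as the image grows, while the fully connected count grows with n².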
Steps in CNNs¶
1. Apply filters (kernels) to the input image.
    a. The intensity (value) of each pixel in a filter is determined by backpropagation.
2. Overlay the filter on a patch of the input and compute the dot product between the patch and the filter to get one value of the new input.
3. Repeat step 2 at every position in the image to get the new input image (the feature map).
4. Apply an activation function (ReLU in this example) to the feature map.
5. Apply pooling to the activated feature map.
    a. For example, max pooling simply picks the maximum value inside each pooling window.
    b. Where a region of pixels matches the filter (kernel), the max pooling result is high; otherwise it is lower.
6. Flatten the max-pooled output and feed it into an ANN, which then does its calculation as a normal ANN.
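
The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: it assumes a single-channel image, stride 1, no padding, and a 2 × 2 max pooling window, and the 6 × 6 diagonal image and the hand-written 3 × 3 kernel are made up for the example (in a real CNN the kernel values would be learned by backpropagation).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take dot products."""
    k = len(kernel)
    feature_map = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; leftover edge rows and columns are dropped."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def flatten(feature_map):
    """Turn the 2D pooled map into a 1D list of inputs for the ANN."""
    return [v for row in feature_map for v in row]


# Hypothetical input: a 6 x 6 image with ones on the main diagonal.
image = [[1 if r == c else 0 for c in range(6)] for r in range(6)]
# Hypothetical kernel that responds to the same diagonal pattern.
kernel = [[1, -1, -1],
          [-1, 1, -1],
          [-1, -1, 1]]

ann_inputs = flatten(max_pool(relu(convolve2d(image, kernel))))
print(ann_inputs)  # [3.0, 0.0, 0.0, 3.0]
```

The pooled values are high (3.0) in the regions where the image matches the kernel's diagonal and zero elsewhere, which is the behavior described in step 5b.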
Layers in a CNN¶
- Input layer
- Convolutional layer (convolution + ReLU)
- Pooling layer
- Fully Connected (FC) layer
- Softmax / logistic layer
- Output layer
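
As a rough sketch of the last three layers in this list, the code below passes a flattened feature vector through one fully connected layer, turns the resulting scores into probabilities with softmax, and picks the most likely class as the output. The function names, the input vector, and the weights and biases are all hypothetical values chosen for illustration; in practice the weights would be learned.

```python
import math


def fully_connected(inputs, weights, biases):
    """One score per class: weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened output of a pooling layer.
features = [3.0, 0.0, 0.0, 3.0]
# Hypothetical weights and biases for two classes.
weights = [[0.5, 0.0, 0.0, 0.5],
           [0.0, 0.5, 0.5, 0.0]]
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(scores, probabilities, predicted_class)
```

For a two-class problem, a single logistic (sigmoid) output can play the same role as the softmax layer.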
Hyperparameters in filters (kernels)¶
- Padding
    - Padding adds zeros symmetrically around the edges of an input.
    - Padding is often necessary when the kernel would extend beyond the edge of the activation map. It conserves data at the borders of activation maps, which tends to improve performance, and it can help preserve the input's spatial size, which allows an architecture designer to build deeper, higher-performing networks.
- Kernel Size
    - Often also referred to as filter size, kernel size is the dimensions of the sliding window that moves over the input.
    - Small kernels extract a larger amount of information in the form of highly local features from the input. In general, smaller kernel sizes tend to give better performance for image classification.
    - Large kernels are better suited to extracting features that are larger.
- Stride
    - Stride indicates how many pixels the kernel is shifted at a time.
    - The impact of stride on a CNN is similar to that of kernel size.
    - As stride decreases, more data is extracted and more features are learned, which also leads to larger output layers.
    - One responsibility of the architecture designer is to ensure that the kernel slides across the input symmetrically when implementing a CNN.
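
The way these three hyperparameters interact can be sketched with the commonly used output-size formula, floor((n + 2p - k) / s) + 1, for an input of side n, kernel size k, padding p, and stride s. The function names below are hypothetical, and `slides_evenly` encodes one assumed reading of "slides symmetrically": the kernel's last position ends exactly at the padded border, so no rows or columns are left uncovered.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Number of kernel positions along one dimension: floor((n + 2p - k) / s) + 1."""
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


def slides_evenly(input_size: int, kernel_size: int,
                  stride: int = 1, padding: int = 0) -> bool:
    """True when the kernel's last position ends exactly at the padded border."""
    return (input_size + 2 * padding - kernel_size) % stride == 0


print(conv_output_size(5, 3))             # 3: without padding the map shrinks
print(conv_output_size(5, 3, padding=1))  # 5: padding of 1 preserves the size
print(conv_output_size(5, 3, stride=2))   # 2: a larger stride gives a smaller output
print(slides_evenly(5, 3, stride=2))      # True
print(slides_evenly(6, 3, stride=2))      # False: the last column is never covered
```

A check like `slides_evenly` can be run before building a network to catch stride and padding combinations that silently ignore part of the input.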