Convolutional Networks
- Shin Yoonah
- August 8, 2022
- 3 min read

How CNNs Build Features
Adding Layers
Receptive Field
Pooling
Flattening and Fully Connected Layers
CNN for Image Classification
A CNN is a neural network with special layers
The model classifies an image by taking patches of the image

Each input image passes through a series of convolution layers with filters
Convolution and pooling layers = the first layers, used to extract features from the input
These can be thought of as the feature-learning layers; the fully connected layers are simply a neural network
How CNNs Build Features
If you recall, the H.O.G feature used Sobel kernels to detect vertical and horizontal edges
Looking at the kernels, we see the vertical edge detector kernel looks like a vertical edge

The horizontal edge detector looks like a horizontal edge
We can represent H.O.G with a diagram that looks similar to a neural network
We replace the linear function with a convolution

And we have the squaring & square-root operations
In a CNN, we have neurons, but the kernels are learnable parameters
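This HOG-style pipeline (convolve with Sobel kernels, then square and take the square root for the gradient magnitude) can be sketched in plain NumPy; the `conv2d` helper and the step-edge test image are illustrative, not from the course:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# Sobel kernels: vertical-edge detector and its transpose, the horizontal one
sobel_x = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
sobel_y = sobel_x.T

# A 5x5 image with a vertical step edge at column 2
img = np.zeros((5, 5))
img[:, 2:] = 1.0

gx = conv2d(img, sobel_x)           # responds strongly to the vertical edge
gy = conv2d(img, sobel_y)           # zero: there is no horizontal edge
magnitude = np.sqrt(gx**2 + gy**2)  # the squaring & square-root step
```

Only `gx` fires on this image, which is exactly the "vertical edge detector looks like a vertical edge" observation above.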

The activation function, in this case ReLU, is applied to each pixel
(Instead of a single activation, the output is an activation map or feature map, similar to a one-channel image)
Like HOG's Sobel kernels, each kernel of a CNN will detect a different property of the image

We use multiple kernels, analogous to multiple neurons; if we have M kernels, we will have M feature maps
For each map, we apply the convolution + ReLU
*Generally, we will use clear squares to represent the feature or activation maps
A grayscale image can be seen as a one-channel input; if we have M kernels, each feature map will be a channel
---> Therefore, we will have M output channels
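A minimal sketch of the M-kernels idea, with random values standing in for learned kernel parameters (sizes here are arbitrary choices for illustration):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation over a single-channel image."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((6, 6))         # grayscale = one-channel input
M = 4
kernels = rng.standard_normal((M, 3, 3))  # M learnable kernels

# convolution + ReLU per kernel -> M feature maps = M output channels
feature_maps = np.stack([np.maximum(conv2d(img, k), 0.0) for k in kernels])
```

With M = 4 kernels and a 6×6 input, the output is a stack of four 4×4 feature maps, one channel per kernel.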
Adding Layers
We can also stack convolutional layers; each output channel is analogous to a neuron
The input of the next layer is equal to the output of the previous layer
Here we have three outputs; the next layer will take these three outputs as inputs

If this layer has two kernels, it will output two feature maps
Input Channels
The next layer applies a convolutional kernel to each input channel, then adds the results together
Then it applies an activation function

The neurons are replaced with kernels
The previous layer has three output channels
For the first input channel, we apply the convolution to the output of the first channel
We repeat the process for the second input channel, adding the result to the running sum
The process is repeated for the last input channel, and the activation is applied to the sum
We can process color images by having three input channels
---> Just like a neural network, we can stack layers using the 3D representation
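The per-channel convolve-then-sum step above can be sketched as follows; the channel count and kernel size are illustrative assumptions, and random values stand in for learned weights:

```python
import numpy as np

def conv2d(x, kernel):
    """Valid cross-correlation over a single channel."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 6, 6))       # 3 input channels (e.g. an RGB image)
kernel = rng.standard_normal((3, 3, 3))  # one 3x3 kernel per input channel

# convolve each input channel with its kernel, add the results together,
# then apply the activation function to the sum
z = sum(conv2d(x[c], kernel[c]) for c in range(3))
out = np.maximum(z, 0.0)                 # ReLU -> one output feature map
```

Each output channel of a layer repeats this recipe with its own set of per-input-channel kernels.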

We can also represent the layers with yellow boxes indicating the kernel size and the number of channels
If you recall, the Sobel kernels looked like the vertical and horizontal edges they were trying to detect
Consider a CNN used to detect faces
1. The first layer looks like edges and corners

2. The second layer looks like parts of the face

3. The final layer looks like faces

Adding more layers builds more complex features
Receptive Field
The receptive field is the size of the region in the input that produces a pixel value in the activation map
The larger the receptive field, the more information the activation map carries
We can increase the receptive field by adding more layers
---> This requires fewer parameters than increasing the size of the kernel
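The parameter trade-off can be checked with simple arithmetic; this sketch assumes stride 1, no pooling, and a single channel:

```python
# Receptive field of stacked stride-1 convolutions grows by (k - 1) per layer
def receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

stacked = receptive_field([3, 3, 3])  # three 3x3 layers -> receptive field 7
single  = receptive_field([7])        # one 7x7 layer    -> receptive field 7

# Same receptive field, fewer weights (single-channel case):
params_stacked = 3 * (3 * 3)  # 27 weights across three layers
params_single  = 7 * 7        # 49 weights in one layer
```

Both designs see a 7×7 input region, but the stacked version uses roughly half the weights (and adds extra non-linearities between layers).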
Pooling
Pooling helps to reduce the number of parameters and increases the receptive field while preserving the important features
*Max pooling is the most popular type of pooling
- Pooling also makes CNNs more invariant to small changes in the image
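Max pooling over non-overlapping windows can be sketched like this; the 4×4 feature map values are made up for illustration:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling on a single feature map."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 9., 5.],
                 [3., 1., 7., 8.]])

pooled = max_pool2d(fmap)  # each 2x2 region keeps only its maximum
# pooled is [[6., 2.], [3., 9.]]
```

The output is 2×2 instead of 4×4, and a small shift of the maximum within each window leaves the output unchanged, which is the invariance mentioned above.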
Flattening and Fully Connected Layers
We simply flatten (reshape) the output of the feature-learning layers and use it as the input to the fully connected layers
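The flatten-then-classify step can be sketched as follows; the map count, map size, and class count are illustrative assumptions, with random values standing in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(2)
feature_maps = rng.standard_normal((8, 4, 4))  # M = 8 maps from the feature-learning layers

flat = feature_maps.reshape(-1)  # flatten to a single 128-element vector

# Fully connected layer: an ordinary neural-network layer on the flat vector
n_classes = 10
W = rng.standard_normal((n_classes, flat.size))
b = rng.standard_normal(n_classes)
logits = W @ flat + b            # one score per class
```

From here on it is just a standard neural network: the class with the largest score (or softmax probability) is the prediction.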
Copyright Coursera All rights reserved