Convolutional Networks
- Shin Yoonah
- August 8, 2022
- 3 min read

How CNNs Build Features
Adding Layers
Receptive Field
Pooling
Flattening and Fully Connected Layers
CNN for Image Classification
A CNN is a neural network with special layers
The model classifies an image by taking patches of the image

Each input image passes through a series of convolution layers with filters
Convolution and pooling layers = the first layers, used to extract features from the input
These can be thought of as the feature-learning layers; the fully connected layers are simply a neural network
How CNNs Build Features
If you recall, the H.O.G feature used Sobel kernels to detect vertical and horizontal edges
Looking at the kernels, we see the vertical edge detector kernel looks like a vertical edge

The horizontal edge detector looks like a horizontal edge
We can represent H.O.G with a diagram that looks similar to a neural network
We replace the linear function with a convolution

And we have the squaring & square-root operations
In a CNN, we have neurons, but the kernels are learnable parameters
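This HOG-style pipeline (convolve with Sobel kernels, then square and take the square root for the gradient magnitude) can be sketched in plain NumPy; the `conv2d` helper and the step-edge test image are illustrative, not from the course:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# Sobel kernels: vertical-edge detector and its transpose, the horizontal one
sobel_x = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
sobel_y = sobel_x.T

# A 5x5 image with a vertical step edge at column 2
img = np.zeros((5, 5))
img[:, 2:] = 1.0

gx = conv2d(img, sobel_x)           # responds strongly to the vertical edge
gy = conv2d(img, sobel_y)           # zero: there is no horizontal edge
magnitude = np.sqrt(gx**2 + gy**2)  # the squaring & square-root step
```

Only `gx` fires on this image, which is exactly the "vertical edge detector looks like a vertical edge" observation above.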

The activation function, in this case ReLU, is applied to each pixel
(Instead of a single activation, the output is an activation map or feature map, similar to a one-channel image)
Like HOG's Sobel kernels, each kernel of a CNN will detect a different property of the image

We use multiple kernels, analogous to multiple neurons; if we have M kernels, we will have M feature maps
For each map, we apply the convolution + ReLU
*Generally, we will use clear squares to represent the feature or activation maps
A grayscale image can be seen as a one-channel input; if we have M kernels, each feature map will be a channel
---> Therefore, we will have M output channels
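A minimal sketch of the M-kernels idea, with random values standing in for learned kernel parameters (sizes here are arbitrary choices for illustration):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation over a single-channel image."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((6, 6))         # grayscale = one-channel input
M = 4
kernels = rng.standard_normal((M, 3, 3))  # M learnable kernels

# convolution + ReLU per kernel -> M feature maps = M output channels
feature_maps = np.stack([np.maximum(conv2d(img, k), 0.0) for k in kernels])
```

With M = 4 kernels and a 6×6 input, the output is a stack of four 4×4 feature maps, one channel per kernel.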
Adding Layers
We can also stack convolutional layers; each output channel is analogous to a neuron
The input of the next layer is equal to the output of the previous layer
Here we have three outputs; the next layer will take these three outputs as inputs

If this layer has two kernels, it will output two feature maps
Input Channels
The next layer applies a convolutional kernel to each input channel, then adds the results together
Then it applies an activation function

The neurons are replaced with kernels
The previous layer has three output channels
For the first input channel, we apply the convolution to the output of the first channel
We repeat the process for the second input channel, adding the result to the running sum
The process is repeated for the last input channel, and the activation is applied to the sum
We can process color images by having three input channels
---> Just like a neural network, we can stack layers using the 3D representation
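The per-channel convolve-then-sum step above can be sketched as follows; the channel count and kernel size are illustrative assumptions, and random values stand in for learned weights:

```python
import numpy as np

def conv2d(x, kernel):
    """Valid cross-correlation over a single channel."""
    kh, kw = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 6, 6))       # 3 input channels (e.g. an RGB image)
kernel = rng.standard_normal((3, 3, 3))  # one 3x3 kernel per input channel

# convolve each input channel with its kernel, add the results together,
# then apply the activation function to the sum
z = sum(conv2d(x[c], kernel[c]) for c in range(3))
out = np.maximum(z, 0.0)                 # ReLU -> one output feature map
```

Each output channel of a layer repeats this recipe with its own set of per-input-channel kernels.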

We can also represent the layers with yellow boxes indicating the kernel size and the number of channels
If you recall, the Sobel kernels looked like the vertical and horizontal edges they were trying to detect
Consider a CNN used to detect faces
1. The first layer looks like edges and corners

2. The second layer looks like parts of the face

3. The final layer looks like faces

Adding more layers builds more complex features
Receptive Field
The receptive field is the size of the region in the input that produces a pixel value in the activation map
The larger the receptive field, the more information the activation map carries
We can increase the receptive field by adding more layers
---> This requires fewer parameters than increasing the size of the kernel
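The parameter trade-off can be checked with simple arithmetic; this sketch assumes stride 1, no pooling, and a single channel:

```python
# Receptive field of stacked stride-1 convolutions grows by (k - 1) per layer
def receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

stacked = receptive_field([3, 3, 3])  # three 3x3 layers -> receptive field 7
single  = receptive_field([7])        # one 7x7 layer    -> receptive field 7

# Same receptive field, fewer weights (single-channel case):
params_stacked = 3 * (3 * 3)  # 27 weights across three layers
params_single  = 7 * 7        # 49 weights in one layer
```

Both designs see a 7×7 input region, but the stacked version uses roughly half the weights (and adds extra non-linearities between layers).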
Pooling
Pooling helps to reduce the number of parameters and increases the receptive field while preserving the important features
*Max pooling is the most popular type of pooling
- Pooling also makes CNNs more invariant to small changes in the image
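Max pooling over non-overlapping windows can be sketched like this; the 4×4 feature map values are made up for illustration:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling on a single feature map."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 6., 1., 1.],
                 [0., 2., 9., 5.],
                 [3., 1., 7., 8.]])

pooled = max_pool2d(fmap)  # each 2x2 region keeps only its maximum
# pooled is [[6., 2.], [3., 9.]]
```

The output is 2×2 instead of 4×4, and a small shift of the maximum within each window leaves the output unchanged, which is the invariance mentioned above.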
Flattening and Fully Connected Layers
We simply flatten (reshape) the output of the feature-learning layers and use it as the input to the fully connected layers
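The flatten-then-classify step can be sketched as follows; the map count, map size, and class count are illustrative assumptions, with random values standing in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(2)
feature_maps = rng.standard_normal((8, 4, 4))  # M = 8 maps from the feature-learning layers

flat = feature_maps.reshape(-1)  # flatten to a single 128-element vector

# Fully connected layer: an ordinary neural-network layer on the flat vector
n_classes = 10
W = rng.standard_normal((n_classes, flat.size))
b = rng.standard_normal(n_classes)
logits = W @ flat + b            # one score per class
```

From here on it is just a standard neural network: the class with the largest score (or softmax probability) is the prediction.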
Copyright Coursera All rights reserved