
Convolutional Networks

  • Written by: Shin Yoonah, Yoonah
  • Aug 8, 2022
  • 3 min read

  1. How CNNs Build Features

  2. Adding Layers

  3. Receptive Field

  4. Pooling

  5. Flattening and Fully Connected Neural Layers


CNN for Image Classification

A CNN is a neural network with special layers


The model classifies an image by examining patches of the image

Each input image passes through a series of convolutional layers with filters (kernels)


Convolution and pooling layers = the first layers, used to extract features from the input

These can be thought of as the feature-learning layers; the fully connected layers that follow are simply a neural network


How CNNs Build Features

If you recall, the H.O.G. feature used Sobel kernels to detect vertical and horizontal edges


Looking at the kernels, we see the vertical edge detector kernel looks like a vertical edge

The horizontal edge detector kernel looks like a horizontal edge


We can represent H.O.G. with a diagram that looks similar to a neural network

We replace the linear function with a convolution

And we keep the squaring & square-root operations
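The convolution, squaring, and square-root steps can be sketched in NumPy. This is a minimal illustration, not code from the course: the toy image, the plain valid-mode convolution loop, and the kernel values are my own assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation (what deep learning calls 'convolution')."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Sobel kernels: the vertical-edge detector itself looks like a vertical edge
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # responds to vertical edges
sobel_y = sobel_x.T                             # responds to horizontal edges

# Toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 5))
img[:, 3:] = 1.0

gx = conv2d(img, sobel_x)
gy = conv2d(img, sobel_y)
magnitude = np.sqrt(gx**2 + gy**2)   # the squaring & square-root step
```

As expected, only the vertical-edge response `gx` fires on this image; `gy` is zero everywhere.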


In a CNN, we still have neurons, but the kernels are learnable parameters

The activation function, in this case ReLU, is applied to each pixel

(Instead of a single activation, the output is an activation map or feature map, similar to a one-channel image)


Like HOG's Sobel kernels, each kernel of a CNN will detect a different property of the image

We use multiple kernels, analogous to multiple neurons: if we have M kernels, we will have M feature maps


For each map, we apply the convolution + ReLU

*Generally, we will use clear squares to represent the feature or activation maps


A grayscale image can be seen as a one-channel input; if we have M kernels, each feature map will be a channel

---> Therefore, we will have M output channels
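A sketch of "M kernels, M feature maps" in NumPy, reusing a plain valid-mode convolution. The random 8x8 image and the choice M = 4 are hypothetical, only there to show the shapes.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

relu = lambda x: np.maximum(x, 0)       # applied to each pixel

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))     # one-channel (grayscale) input
M = 4                                   # hypothetical number of kernels
kernels = rng.standard_normal((M, 3, 3))

# One feature map per kernel: convolution followed by ReLU
feature_maps = np.stack([relu(conv2d(image, k)) for k in kernels])
print(feature_maps.shape)   # (4, 6, 6): M output channels
```

Each 3x3 kernel shrinks the 8x8 input to a 6x6 map, and stacking the M maps gives M output channels.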


Adding Layers

We can also stack convolutional layers; each output channel is analogous to a neuron

The input of the next layer is equal to the output of the previous layer

Here we have three output channels, so the next layer will take three channels as input

If this layer has two kernels, it will output two feature maps


Input Channels

The next layer will apply a convolutional kernel to each input channel, then add the results together

Then apply an activation function

The neurons are replaced with kernels


The previous layer has three output channels

  1. For the first input channel, we apply the convolution to the output of the first channel

  2. Repeat the process for the second input channel, adding the result to the running sum

The process is repeated for the last input channel; the activation is then applied to the sum
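The steps above — one kernel per input channel, sum the convolutions, then activate — can be sketched as follows. The channel counts (3 in, 2 out) and random weights are assumptions for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

relu = lambda x: np.maximum(x, 0)

rng = np.random.default_rng(1)
C_in, C_out = 3, 2                              # e.g. an RGB input, two kernels
x = rng.standard_normal((C_in, 8, 8))
w = rng.standard_normal((C_out, C_in, 3, 3))    # one kernel per (output, input) pair

out = []
for o in range(C_out):
    # convolve each input channel with its own kernel, then sum the results
    acc = sum(conv2d(x[c], w[o, c]) for c in range(C_in))
    out.append(relu(acc))                       # activation applied to the sum
out = np.stack(out)
print(out.shape)   # (2, 6, 6)
```

This is exactly why a color image is handled the same way as any other three-channel input.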


We can process color images by having three input channels

---> just like a neural network, we can stack layers using the 3D representation



We can also represent them with these yellow boxes indicating the kernel size and the number of channels


If you recall, the Sobel kernels looked like the vertical and horizontal edges they were trying to detect


Consider a CNN used to detect faces

  1. The first layer learns features that look like edges and corners

  2. The second layer looks like parts of the face

  3. The final layer looks like faces


Adding more layers builds more complex features


Receptive Field

  1. The receptive field is the size of the region in the input that produces a pixel value in the activation map

  2. The larger the receptive field, the more information the activation map captures

We can increase the receptive field by adding more layers

---> this requires fewer parameters than increasing the size of the kernel
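A quick way to check that claim: with stride-1 layers, each layer of kernel size k grows the receptive field by k − 1. Two stacked 3x3 layers therefore see a 5x5 region, like one 5x5 kernel, but cost fewer weights. The single-channel, no-bias parameter count below is a simplifying assumption.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 conv layers."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1          # each layer grows the field by k - 1
    return rf

print(receptive_field([3, 3]))   # 5
print(receptive_field([5]))      # 5

# Parameter count (one channel, no bias): two 3x3 kernels vs one 5x5 kernel
print(3*3 + 3*3, "vs", 5*5)      # 18 vs 25
```

Same receptive field, 18 parameters instead of 25 — and the gap widens with more layers and channels.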


Pooling

Pooling helps to reduce the number of parameters and increase the receptive field while preserving the important features


*Max pooling is the most popular type of pooling

- pooling also makes CNNs more invariant to small changes in the image
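Max pooling can be sketched in a few lines of NumPy; the 2x2 non-overlapping window and the toy feature map are my own choices for illustration.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling (stride == window size)."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]   # drop ragged edges, if any
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]], dtype=float)

pooled = max_pool2d(fm)
print(pooled)   # [[4. 2.] [2. 8.]]
```

Each 2x2 block collapses to its maximum, halving each spatial dimension while keeping the strongest responses — which is why small shifts in the input often leave the pooled output unchanged.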


Flattening and Fully Connected Neural Network

We simply flatten or reshape the output of the feature learning layers and use them as an input to the fully connected layers
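The flatten-then-classify step can be sketched as below. The feature shape (4 channels of 6x6 maps), the 10-class head, and the random weights are hypothetical, chosen just to show the reshape and the matrix multiply.

```python
import numpy as np

rng = np.random.default_rng(2)

# Output of the feature-learning layers: e.g. 4 channels of 6x6 maps
features = rng.standard_normal((4, 6, 6))

flat = features.reshape(-1)                 # flatten: 4 * 6 * 6 = 144 values
W = rng.standard_normal((10, flat.size))    # hypothetical 10-class classifier
b = np.zeros(10)

logits = W @ flat + b                       # one fully connected layer
print(logits.shape)   # (10,)
```

From here it is an ordinary neural network: the flattened vector is just the input to the fully connected layers.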


Copyright Coursera All rights reserved


