Fully Connected Neural Network Architecture
- Shin Yoonah
- August 6, 2022
- 2 min read

This covers how to arrange different numbers of hidden layers and neurons
Neural networks are usually represented without the learnable parameters

Hidden layer: 4
Output layer: 1
Make multi-class predictions using neural networks
--> add more neurons to the output layer
The process can be thought of as just replacing the output layer with a SoftMax function
If there are three neurons for three classes, we choose the class according to the index of the neuron with the largest value

In this case, we have to choose neuron 2
Use the following diagram

5 neurons in the output layer, one for each class
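Here is a minimal PyTorch sketch of that idea; the 5 output neurons match the 5 classes above, while the 4 input features and the hidden layer of 8 neurons are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative sizes: 4 input features, one hidden layer, 5 output neurons (one per class)
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sigmoid(),
    nn.Linear(8, 5),  # 5 output neurons, one for each class
)

x = torch.randn(1, 4)                 # a single sample
z = model(x)                          # raw values of the 5 output neurons
probs = torch.softmax(z, dim=1)       # SoftMax turns them into class probabilities
yhat = probs.argmax(dim=1)            # pick the index of the neuron with the largest value
print(probs, yhat)
```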
Then, add hidden layers
- with more than one hidden layer, the neural network is called a deep neural network
- more neurons or more layers may lead to overfitting
The output or activation of each layer has the same dimension as the number of neurons
Each neuron is like a linear classifier, therefore each neuron must have the same number of inputs as there are neurons in the previous layer
Let's see how the following neural network makes a prediction, layer by layer

- each neuron in the 1st layer has 4 inputs
As there are three neurons, the activation has a dimension of three
- each neuron in the next layer has an input dimension of three
As there are two neurons in the second layer, the output activation has a dimension of two (see the sketch below)
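A quick PyTorch sketch of this walkthrough, using the same sizes (4 inputs, 3 neurons in the first layer, 2 in the second; sigmoid activations are assumed):

```python
import torch
import torch.nn as nn

linear1 = nn.Linear(in_features=4, out_features=3)  # each of the 3 neurons takes 4 inputs
linear2 = nn.Linear(in_features=3, out_features=2)  # each of the 2 neurons takes 3 inputs

x = torch.randn(1, 4)            # one sample with 4 features
a1 = torch.sigmoid(linear1(x))   # activation of the first layer
a2 = torch.sigmoid(linear2(a1))  # activation of the second layer
print(a1.shape)  # torch.Size([1, 3]) -- dimension three, one value per neuron
print(a2.shape)  # torch.Size([1, 2]) -- dimension two, one value per neuron
```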
One way to select a fully connected neural network architecture is to use the validation data, as in the sketch below
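A rough sketch of that selection loop; the synthetic data, the candidate hidden sizes, and the training settings are all placeholder assumptions:

```python
import torch
import torch.nn as nn

# Placeholder data -- you would use your real train/validation split
torch.manual_seed(0)
X_train, y_train = torch.randn(200, 4), torch.randint(0, 3, (200,))
X_val, y_val = torch.randn(50, 4), torch.randint(0, 3, (50,))

best_acc, best_hidden = 0.0, None
for hidden in [2, 4, 8, 16]:  # candidate numbers of hidden neurons
    model = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(100):  # short training loop
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()
    if acc > best_acc:
        best_acc, best_hidden = acc, hidden
print(best_hidden, best_acc)  # keep the architecture that does best on validation data
```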
It turns out that deep networks work better, but are hard to train
If you recall, to perform gradient descent to obtain our learnable parameters
- We have to calculate the gradient
But the deeper the network, the smaller the gradient gets ---> vanishing gradient
As a result, it's harder to train the deeper layers of the network
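A small experiment sketch that shows the effect: stack many sigmoid layers (the depth of 10 and width of 10 are arbitrary choices) and compare gradient magnitudes near the input with those near the output:

```python
import torch
import torch.nn as nn

# A deep stack of sigmoid layers (depth and width are arbitrary choices)
layers = []
for _ in range(10):
    layers += [nn.Linear(10, 10), nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(1, 10)
model(x).sum().backward()  # compute gradients of all learnable parameters

# The gradient of the first layer tends to be much smaller than that of the last layer
print(model[0].weight.grad.abs().mean())   # near the input: small (vanishing)
print(model[-2].weight.grad.abs().mean())  # near the output: larger
```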
Sigmoid Function
- One of the main drawbacks of using the sigmoid activation function is the vanishing gradient
- An attractive alternative is the rectified linear unit function, or ReLU function for short
ReLU

It is only used in the hidden layers
The value of the ReLU function is 0 when its input is less than zero
If the input z is larger than 0, the output of the function equals its input

- if the input z equals 5, the output equals 5
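A one-line check of that definition in PyTorch:

```python
import torch

relu = torch.nn.ReLU()
print(relu(torch.tensor([-3.0, 0.0, 5.0])))  # tensor([0., 0., 5.]) -- negatives become 0, 5 stays 5
```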
*Some methods, like dropout, help prevent overfitting; batch normalization helps with training
*Skip connections allow you to train deeper networks by connecting earlier layers directly to deeper layers (see the sketch below)
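A sketch of how these pieces are typically wired into a fully connected block; the layer width of 16 and the dropout probability of 0.5 are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.bn = nn.BatchNorm1d(dim)  # batch normalization helps with training
        self.drop = nn.Dropout(p=0.5)  # dropout helps prevent overfitting

    def forward(self, x):
        out = self.drop(torch.relu(self.bn(self.linear(x))))
        return x + out                 # skip connection: the input bypasses the block and is added back

x = torch.randn(8, 16)   # a batch of 8 samples
print(Block()(x).shape)  # torch.Size([8, 16])
```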
The hidden layers of neural networks replace the kernels in SVMs
Training neural networks is more of an art than a science !!
Copyright Coursera