
Fully Connected Neural Network Architecture

  • Shin Yoonah
  • August 6, 2022
  • 2 min read

This post covers how to arrange different numbers of hidden layers and neurons


Neural networks are usually represented without their learnable parameters

Hidden layer: 4

Output layer: 1


Make multi-class predictions using neural networks

--> add more neurons to the output layer


The process can be thought of as just replacing the output layer with a SoftMax function


If there are three neurons for three classes, we choose the class according to the index of the neuron with the largest value


In this case, we would choose neuron 2


In the following diagram, there are 5 neurons in the output layer, one for each class
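As a minimal sketch of this idea in PyTorch (the layer sizes below are illustrative, not from the post): the output layer gets one neuron per class, and the predicted class is the index of the largest output value.

```python
import torch
import torch.nn as nn

# Illustrative network: 4 input features, one hidden layer, 5 classes
model = nn.Sequential(
    nn.Linear(4, 8),   # hidden layer
    nn.Sigmoid(),
    nn.Linear(8, 5),   # output layer: 5 neurons, one per class
)

x = torch.randn(1, 4)                  # a single sample
probs = torch.softmax(model(x), dim=1) # SoftMax turns the outputs into probabilities
pred = torch.argmax(probs, dim=1)      # index of the neuron with the largest value
print(probs, pred)
```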


Then, add hidden layers

- with more than one hidden layer, the neural network is called a deep neural network

- more neurons or more layers may lead to overfitting


The output (activation) of each layer has the same dimension as the number of neurons in that layer


Each neuron is like a linear classifier, so each neuron must have the same number of inputs as there are neurons in the previous layer


Let's see how the following neural network makes a prediction, layer by layer (sketched in code after these notes)

- each neuron in the 1st layer has 4 inputs

As there are three neurons, the activation has a dimension of three


- each neuron in the next layer has an input dimension of three

As there are two neurons in the second layer, the output activation has a dimension of two
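Here is a rough sketch of that dimension bookkeeping in PyTorch (layer sizes taken from the walkthrough above; the weights are random):

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(in_features=4, out_features=3)  # 3 neurons, each with 4 inputs
layer2 = nn.Linear(in_features=3, out_features=2)  # 2 neurons, each with 3 inputs

x = torch.randn(1, 4)              # one sample with 4 features
a1 = torch.sigmoid(layer1(x))      # activation of layer 1: shape (1, 3)
a2 = torch.sigmoid(layer2(a1))     # activation of layer 2: shape (1, 2)
print(a1.shape, a2.shape)          # torch.Size([1, 3]) torch.Size([1, 2])
```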


One way to select a fully connected Neural Network Architecture is to use the validation data
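A hedged sketch of that idea (the candidate hidden sizes, synthetic data, and short training loop below are placeholders, not from the post): train one model per candidate architecture and keep whichever scores best on the validation data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 4 features, 3 classes (placeholders for real data)
X_train, y_train = torch.randn(200, 4), torch.randint(0, 3, (200,))
X_val, y_val = torch.randn(50, 4), torch.randint(0, 3, (50,))

def make_model(hidden):
    return nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))

best_acc, best_hidden = 0.0, None
for hidden in [2, 8, 32]:                       # candidate architectures
    model = make_model(hidden)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(100):                        # short training loop
        opt.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        opt.step()
    acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()
    if acc > best_acc:                          # keep the best validation accuracy
        best_acc, best_hidden = acc, hidden
print(best_hidden, best_acc)
```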


It turns out that deep networks work better, but are hard to train

Recall that to perform gradient descent to obtain our learnable parameters:

- We have to calculate the gradient


But the deeper the network, the smaller the gradient gets ---> vanishing gradient


As a result, it's harder to train the deeper layers of the network
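A small illustration of the effect (an assumed toy example, not from the post): the sigmoid's derivative is at most 0.25, so chaining many sigmoid layers multiplies several numbers no larger than 0.25, and the gradient reaching the early layers becomes tiny.

```python
import torch

# Derivative of a single sigmoid at z = 0 is 0.25, its largest possible value
z = torch.tensor(0.0, requires_grad=True)
torch.sigmoid(z).backward()
print(z.grad)          # tensor(0.2500)

# Chaining 10 sigmoids: each backward step multiplies by a derivative <= 0.25
x = torch.tensor(0.0, requires_grad=True)
out = x
for _ in range(10):
    out = torch.sigmoid(out)
out.backward()
print(x.grad)          # roughly 4e-7: the gradient has all but vanished
```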


Sigmoid Function

- One of the main drawbacks of using the sigmoid activation function is the vanishing gradient

- One attractive alternative is the rectified linear unit, or ReLU, function


ReLU

It is only used in the hidden layers


The value of the ReLU function is 0 when its input is less than zero


If the input z is larger than 0, the output of the function equals its input

- if the input z equals 5, the output equals 5
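A minimal check of that behaviour (values chosen just for illustration):

```python
import torch

z = torch.tensor([-3.0, 0.0, 5.0])
print(torch.relu(z))   # tensor([0., 0., 5.]): negatives become 0, an input of 5 stays 5
```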


*Some methods, like dropout, help prevent overfitting, while batch normalization helps with training

*Skip connections allow you to train deeper networks by connecting earlier layers directly to deeper layers during training (sketched below)
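Here is a sketch of where these pieces typically sit in a PyTorch model (the layer widths are illustrative, and the skip connection is the simple "add the block's input back to its output" form):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One hidden block with batch normalization, dropout and a skip connection."""
    def __init__(self, width):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.bn = nn.BatchNorm1d(width)   # batch normalization helps with training
        self.drop = nn.Dropout(p=0.5)     # dropout helps prevent overfitting

    def forward(self, x):
        # "x + ..." is the skip connection: gradients can bypass the block
        return x + self.drop(torch.relu(self.bn(self.linear(x))))

model = nn.Sequential(nn.Linear(4, 16), Block(16), Block(16), nn.Linear(16, 5))
print(model(torch.randn(8, 4)).shape)     # torch.Size([8, 5])
```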


The hidden layers of neural networks replace the kernels in SVMs


Training neural networks is more of an art than a science!!


Copyright Coursera



 
 
 
