Fully Connected Neural Network Architecture
- Shin Yoonah
- August 6, 2022
- 2 min read

This covers how to arrange different numbers of hidden layers and neurons
Neural networks are usually represented without the learnable parameters

Hidden layer: 4
Output layer: 1
Make multi-class predictions using neural networks
--> add more neurons to the output layer
The process can be thought of as just replacing the output layer with a SoftMax function
If there are three neurons for three classes, we choose the class according to the index of the neuron with the largest value

In this case, we have to choose neuron 2
Use the following diagram

5 neurons in the output layer, one for each class
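Here is a minimal PyTorch sketch of that idea; the 5 output neurons match the 5 classes above, while the 4 input features and the hidden layer of 8 neurons are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative sizes: 4 input features, one hidden layer, 5 output neurons (one per class)
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sigmoid(),
    nn.Linear(8, 5),  # 5 output neurons, one for each class
)

x = torch.randn(1, 4)                 # a single sample
z = model(x)                          # raw values of the 5 output neurons
probs = torch.softmax(z, dim=1)       # SoftMax turns them into class probabilities
yhat = probs.argmax(dim=1)            # pick the index of the neuron with the largest value
print(probs, yhat)
```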
Then, add hidden layers
- with more than one hidden layer, the neural network is called a deep neural network
- more neurons or more layers may lead to overfitting
The output or activation of each layer has the same dimension as the number of neurons
Each neuron is like a linear classifier, therefore each neuron must have the same number of inputs as there are neurons in the previous layer
Let's see how the following neural network makes a prediction, layer by layer

- each neuron in the 1st layer has 4 inputs
As there are three neurons, the activation has a dimension of three
- each neuron in the next layer has an input dimension of three
As there are two neurons in the second layer, the output activation has a dimension of two (see the sketch below)
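A quick PyTorch sketch of this walkthrough, using the same sizes (4 inputs, 3 neurons in the first layer, 2 in the second; sigmoid activations are assumed):

```python
import torch
import torch.nn as nn

linear1 = nn.Linear(in_features=4, out_features=3)  # each of the 3 neurons takes 4 inputs
linear2 = nn.Linear(in_features=3, out_features=2)  # each of the 2 neurons takes 3 inputs

x = torch.randn(1, 4)            # one sample with 4 features
a1 = torch.sigmoid(linear1(x))   # activation of the first layer
a2 = torch.sigmoid(linear2(a1))  # activation of the second layer
print(a1.shape)  # torch.Size([1, 3]) -- dimension three, one value per neuron
print(a2.shape)  # torch.Size([1, 2]) -- dimension two, one value per neuron
```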
One way to select a fully connected neural network architecture is to use the validation data, as in the sketch below
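A rough sketch of that selection loop; the synthetic data, the candidate hidden sizes, and the training settings are all placeholder assumptions:

```python
import torch
import torch.nn as nn

# Placeholder data -- you would use your real train/validation split
torch.manual_seed(0)
X_train, y_train = torch.randn(200, 4), torch.randint(0, 3, (200,))
X_val, y_val = torch.randn(50, 4), torch.randint(0, 3, (50,))

best_acc, best_hidden = 0.0, None
for hidden in [2, 4, 8, 16]:  # candidate numbers of hidden neurons
    model = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(100):  # short training loop
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()
    if acc > best_acc:
        best_acc, best_hidden = acc, hidden
print(best_hidden, best_acc)  # keep the architecture that does best on validation data
```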
It turns out that deep networks work better, but are hard to train
If you recall, to perform gradient descent to obtain our learnable parameters
- We have to calculate the gradient
But the deeper the network, the smaller the gradient gets ---> vanishing gradient
As a result, it's harder to train the deeper layers of the network
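A small experiment sketch that shows the effect: stack many sigmoid layers (the depth of 10 and width of 10 are arbitrary choices) and compare gradient magnitudes near the input with those near the output:

```python
import torch
import torch.nn as nn

# A deep stack of sigmoid layers (depth and width are arbitrary choices)
layers = []
for _ in range(10):
    layers += [nn.Linear(10, 10), nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(1, 10)
model(x).sum().backward()  # compute gradients of all learnable parameters

# The gradient of the first layer tends to be much smaller than that of the last layer
print(model[0].weight.grad.abs().mean())   # near the input: small (vanishing)
print(model[-2].weight.grad.abs().mean())  # near the output: larger
```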
Sigmoid Function
- One of the main drawbacks of using the sigmoid activation function is the vanishing gradient
- An attractive alternative is the rectified linear unit function, or ReLU function for short
ReLU

It is only used in the hidden layers
The value of the ReLU function is 0 when its input is less than zero
If the input z is larger than 0, the output of the function equals its input

- if the input z equals 5, the output equals 5
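A one-line check of that definition in PyTorch:

```python
import torch

relu = torch.nn.ReLU()
print(relu(torch.tensor([-3.0, 0.0, 5.0])))  # tensor([0., 0., 5.]) -- negatives become 0, 5 stays 5
```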
*Some methods, like dropout, help prevent overfitting; batch normalization helps with training
*Skip connections allow you to train deeper networks by connecting earlier layers directly to deeper layers (see the sketch below)
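A sketch of how these pieces are typically wired into a fully connected block; the layer width of 16 and the dropout probability of 0.5 are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.bn = nn.BatchNorm1d(dim)  # batch normalization helps with training
        self.drop = nn.Dropout(p=0.5)  # dropout helps prevent overfitting

    def forward(self, x):
        out = self.drop(torch.relu(self.bn(self.linear(x))))
        return x + out                 # skip connection: the input bypasses the block and is added back

x = torch.randn(8, 16)   # a batch of 8 samples
print(Block()(x).shape)  # torch.Size([8, 16])
```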
The hidden layers of neural networks replace the kernels in SVMs
Training neural networks is more of an art than a science !!
Copyright Coursera