Number of Nodes in the Input Layer

Instead of n, shouldn't there be m nodes in the input layer?

Can you please explain?

Hey @mananaroramail,
n = number of features.
m = number of samples, i.e. the batch_size you take.

Take an example: you are learning to play cricket, specifically batting.
Your features would be things like how you hold the bat, your posture, etc. They are what actually improve your batting.
Whereas the number of times you play defines how you learn and improve yourself.

Similarly, you train your model to improve its performance on the features (n) each sample has, not on the number of samples (m) it is trained with. That is why the input layer has n nodes.

Say we take a batch_size (m) = 32 and we have number of features (n) = 10.
Each sample we feed into the model has n features, and we update its weights and biases after passing 32 such samples through it. That way the model learns better and converges more efficiently.
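Here is a minimal NumPy sketch of that idea (the hidden-layer size, learning rate, and data are made-up placeholders, not the actual course code): a batch X of shape (32, 10) goes through one linear layer, and the weights are updated once per batch.

```python
import numpy as np

m, n = 32, 10                      # batch_size (m) and number of features (n)
hidden = 4                         # hypothetical hidden-layer width, just for illustration

X = np.random.randn(m, n)          # one batch: 32 samples, each with 10 features
y = np.random.randn(m, hidden)     # dummy targets, only to make the update runnable

W = np.random.randn(n, hidden)     # the input layer has n nodes, so W has n rows
b = np.zeros((1, hidden))

out = X @ W + b                    # forward pass for the whole batch, shape (32, 4)

grad_out = (out - y) / m           # gradient of a simple MSE-style loss
W -= 0.1 * X.T @ grad_out          # one weight update per batch of 32 samples
b -= 0.1 * grad_out.sum(axis=0, keepdims=True)
```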

I hope this helps you understand the problem.
Thank You :slightly_smiling_face:.

Ok, that is clear to me.
Thanks !

One more thing: I am supplying the inputs to each neuron in the 1st hidden layer.
Each neuron in the hidden layer is a linear classifier.
Since I am feeding each neuron the same examples, wouldn't they all learn the same parameters (i.e. W and the bias)?

It depends upon how you initialize them.
By default we use random weights and biases; training then updates those values so that the network learns how to differentiate between different inputs.
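For instance, here is a small sketch of that default random initialization (the sizes are made up, not taken from the assignment): each column of W belongs to a different hidden neuron, and because every entry is a separate random draw, no two neurons start from the same weights.

```python
import numpy as np

input_size, layer0 = 10, 4                 # hypothetical sizes, just for illustration

W = np.random.randn(input_size, layer0)    # each column = the weights of one hidden neuron
b = np.zeros((1, layer0))

print(W[:, 0])                             # neuron 0's starting weights
print(W[:, 1])                             # neuron 1's starting weights -- different values
```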

But we are initializing like: np.random.randn(input_size, layers[0])

np.random.randn() draws samples from a standard Gaussian (normal) distribution with mean = 0 and sigma = 1.

Thus, all the weights and biases would be very close to each other. Wouldn't they then reach the same minimum of the loss function,
and hence learn the same parameters?

So, the training part is different, and not like how a single perceptron was trained?

A neural network is a group of many small, simple perceptrons. Even though the initialized weights come from a Gaussian distribution and may be close to each other, they are not identical.

Their training depends upon the inputs they are multiplied with and the biases that are added; after that we update the weights using an optimizer. The optimizer reshapes that initial Gaussian distribution of weights so that, whatever the input may be, the model can perform at its best.
So, while it is important how you initialize these parameters, it is also important how you tune them.
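As a rough illustration (a minimal sketch with made-up sizes, learning rate, and data, not the actual course code): two hidden neurons that start from different random weights receive different gradients, so they drift further apart during training instead of learning the same parameters.

```python
import numpy as np

np.random.seed(0)
n, hidden = 3, 2                        # made-up sizes: 3 features, 2 hidden neurons

X = np.random.randn(32, n)              # dummy input batch
y = np.random.randn(32, 1)              # dummy targets

W1 = np.random.randn(n, hidden) * 0.1   # each column = one hidden neuron's weights
b1 = np.zeros((1, hidden))
W2 = np.random.randn(hidden, 1) * 0.1
b2 = np.zeros((1, 1))

for _ in range(100):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2

    # backward pass for a mean-squared-error loss
    grad_out = (out - y) / len(X)
    grad_W2 = h.T @ grad_out
    grad_h = grad_out @ W2.T * (1 - h ** 2)   # tanh derivative
    grad_W1 = X.T @ grad_h

    # plain gradient-descent updates (the "optimizer" here)
    W2 -= 0.5 * grad_W2
    b2 -= 0.5 * grad_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * grad_W1
    b1 -= 0.5 * grad_h.sum(axis=0, keepdims=True)

print(W1[:, 0])   # weights learned by hidden neuron 0
print(W1[:, 1])   # weights learned by hidden neuron 1 -- not the same
```

If the two columns of W1 were initialized to exactly the same values, their gradients would also be identical and they would stay identical forever; the random initialization is what breaks that symmetry.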