Why do we generally take weights between -0.5 and 0.5?
Vanishing Gradient
Hey @shivani_jainEtW,
While working on neural networks, the way we choose the weights of our neurons strongly affects the model's training and performance.
If we take very large weight values, they push the model off its learning path and make training unstable. Closely related to this, another problem that occurs is the vanishing gradient: with saturating activations like the sigmoid, large weights produce large pre-activations, so the gradient shrinks toward zero as it flows backward through the layers.
So, a range found through experiments to work well is [-0.5, 0.5].
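To see this concretely, here is a minimal NumPy sketch (the fan-in of 100 and the two weight ranges are illustrative assumptions, not values from the course). With large weights, a sigmoid layer saturates, and the local derivative that backpropagation multiplies in at every layer collapses toward zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(100)                 # 100 inputs to a layer of 100 neurons
W_base = rng.uniform(-1.0, 1.0, (100, 100))  # same random draw, rescaled below

for scale in (0.5, 5.0):                     # weights in [-0.5, 0.5] vs [-5, 5]
    z = (scale * W_base) @ x                 # pre-activations of the layer
    a = sigmoid(z)
    local_grad = a * (1.0 - a)               # sigmoid'(z) = a(1 - a), at most 0.25
    print(f"weights in [-{scale}, {scale}]: mean sigmoid'(z) = {local_grad.mean():.4f}")
```

Backpropagation multiplies one such factor in per layer, so over many layers the tiny numbers from the large-weight case compound into a gradient that is effectively zero.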
But this mattered much more before ReLU was introduced; now that we can use ReLU, the exact weight range is not as important as it used to be.
So, to understand which weight values to choose, you first need to understand the activation function you are using and also its derivative. Once you have those two things right, you can easily estimate how the gradients will behave for different values of your inputs or nodes.
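For example, a quick scalar sketch (the sample points are arbitrary) comparing the derivatives of sigmoid and ReLU shows why the weight range matters less with ReLU:

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # peaks at 0.25 when z = 0, decays fast

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 on positive inputs

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:4.1f} | sigmoid'(z) = {sigmoid_grad(z):.6f} | relu'(z) = {relu_grad(z):.1f}")
```

The sigmoid's derivative is at most 0.25 and dies off quickly as |z| grows, so large weights (and hence large pre-activations) kill the gradient; ReLU's derivative stays at 1 for any positive input, so the gradient passes through unchanged regardless of the weight scale.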
I hope this helps.
Thank You and Happy Learning