Why do we generally take weights between -0.5 and 0.5?
Vanishing Gradient
Hey @shivani_jainEtW,
While working on neural networks, the way we choose the weights of our neurons strongly affects the model's training and performance.
If we take very large weight values, they push the model off its learning path and make training unstable. Closely related to this, another problem that occurs is the vanishing gradient: with saturating activations like the sigmoid, large weights produce large pre-activations, so the gradient shrinks toward zero as it flows backward through the layers.
So, a range found through experiments to work well is [-0.5, 0.5].
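To see this concretely, here is a minimal NumPy sketch (the fan-in of 100 and the two weight ranges are illustrative assumptions, not values from the course). With large weights, a sigmoid layer saturates, and the local derivative that backpropagation multiplies in at every layer collapses toward zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(100)                 # 100 inputs to a layer of 100 neurons
W_base = rng.uniform(-1.0, 1.0, (100, 100))  # same random draw, rescaled below

for scale in (0.5, 5.0):                     # weights in [-0.5, 0.5] vs [-5, 5]
    z = (scale * W_base) @ x                 # pre-activations of the layer
    a = sigmoid(z)
    local_grad = a * (1.0 - a)               # sigmoid'(z) = a(1 - a), at most 0.25
    print(f"weights in [-{scale}, {scale}]: mean sigmoid'(z) = {local_grad.mean():.4f}")
```

Backpropagation multiplies one such factor in per layer, so over many layers the tiny numbers from the large-weight case compound into a gradient that is effectively zero.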
But this mattered much more before ReLU was introduced; now that we can use ReLU, the exact weight range is not as important as it used to be.
So, to understand which weight values to choose, you first need to understand the activation function you are using and also its derivative. Once you have those two things right, you can easily estimate how the gradients will behave for different values of your inputs or nodes.
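For example, a quick scalar sketch (the sample points are arbitrary) comparing the derivatives of sigmoid and ReLU shows why the weight range matters less with ReLU:

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # peaks at 0.25 when z = 0, decays fast

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 on positive inputs

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:4.1f} | sigmoid'(z) = {sigmoid_grad(z):.6f} | relu'(z) = {relu_grad(z):.1f}")
```

The sigmoid's derivative is at most 0.25 and dies off quickly as |z| grows, so large weights (and hence large pre-activations) kill the gradient; ReLU's derivative stays at 1 for any positive input, so the gradient passes through unchanged regardless of the weight scale.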
I hope this helps.
Thank You and Happy Learning