Why are we taking the mean here? In the final update rule, our task is only to find the sum, so why do we divide by m and use the mean?
In the final update rule, why m?
Hey @sankalparora5,
We use the mean of the total error rather than the raw sum so that the model learns steadily and reaches convergence faster. Dividing by m keeps the size of the gradient independent of the number of training examples, so the same learning rate keeps working as the dataset grows.
If you remember the gradient descent curve, we want to reach the optimum value while learning as much as we can from the data.
It's not that we cannot skip the mean; we can. But with the plain sum, the weights will fluctuate a lot while updating, and sometimes the values can overshoot and end up far higher than required. There can also be cases where the overall error sum comes out very small, so there is almost no update.
To counter these things, we generally take the mean as a convention.
You can certainly try without it, and run as many experiments as you want.
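Here is one such experiment as a minimal sketch, not the course's actual code. It assumes plain linear regression with squared error on made-up, noise-free data; the function name `gradient_descent`, the `use_mean` flag, and the learning rates are all hypothetical choices for illustration. It compares the summed gradient and the mean gradient at the same learning rate, then shows that the sum behaves like the mean once the learning rate is divided by m.

```python
import numpy as np

def gradient_descent(X, y, lr, steps, use_mean):
    """Fit y ~ X @ w by gradient descent on squared error.

    use_mean=True divides the summed gradient by m (the number of examples);
    use_mean=False uses the raw sum.
    """
    m = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y)  # sum of the per-example gradients
        if use_mean:
            grad = grad / m       # the 1/m factor from the update rule
        w = w - lr * grad
    return w

# Made-up data (assumption): y = 2 + 3x with no noise, m = 1000 examples.
rng = np.random.default_rng(0)
m = 1000
x = rng.uniform(-1.0, 1.0, size=m)
X = np.column_stack([np.ones(m), x])  # first column is the bias term
y = 2.0 + 3.0 * x

# Same learning rate, mean vs. sum.
w_mean = gradient_descent(X, y, lr=0.5, steps=200, use_mean=True)
w_sum = gradient_descent(X, y, lr=0.5, steps=10, use_mean=False)

# Sum again, but with the learning rate shrunk by a factor of m.
w_sum_small_lr = gradient_descent(X, y, lr=0.5 / m, steps=200, use_mean=False)

print("mean, lr=0.5:  ", w_mean)
print("sum,  lr=0.5:  ", w_sum)
print("sum,  lr=0.5/m:", w_sum_small_lr)
```

With the mean, the weights settle near the true values (2 and 3). With the sum at the same learning rate, each step is about m times larger, so the weights overshoot and blow up within a few updates. The last run shows that the sum is not wrong in itself: it matches the mean once the learning rate is scaled down by m, which is why dividing by m is mostly a convenient convention.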
I hope this helped you understand.
Thank you.
