Significance of taking average in weight update formula

Hi
In the formula `weights = weights - learning_rate * dw / float(m)`, why has the instructor divided by `m`? While deriving this formula earlier, the instructor did not divide by m.

Hey @pasta, this is usually done in the case of Batch Gradient Descent, where the weights are updated only once after all data instances of the dataset have been processed. Specifically, during batch gradient descent, the gradient for each instance in the dataset is calculated and summed. At the end, the accumulated gradient is divided by the number of data instances `m`. In this way, we get an averaged gradient across all data instances in the dataset (see the short sketch after the list below).

  1. This technique is less computationally demanding, as no update is performed after each individual sample.
  2. Another advantage is that the convergence of the weights to the optimal weights is very stable. By calculating and averaging the individual gradients over every sample in the dataset, we get a very good estimate of the true gradient, which points in the direction of steepest increase of the loss function (we then step in the opposite direction).
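Here is a minimal sketch of what this averaging looks like for a simple linear model. This is illustrative, not the instructor's exact code: the names `X`, `y`, `batch_gradient_descent`, `learning_rate`, and `num_iterations` are assumptions for the example.

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, num_iterations=1000):
    m, n = X.shape                 # m = number of data instances
    weights = np.zeros(n)

    for _ in range(num_iterations):
        predictions = X.dot(weights)       # predictions for ALL m samples at once
        errors = predictions - y
        dw = X.T.dot(errors)               # gradients summed over all m samples
        # Divide by m to average the accumulated gradient before the update
        weights = weights - learning_rate * dw / float(m)

    return weights

# Toy usage example
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column acts as a bias term
y = np.array([2.0, 3.0, 4.0])
print(batch_gradient_descent(X, y))
```

Without the division by `m`, the size of the update would grow with the size of the dataset; averaging keeps the step scale independent of how many samples were summed.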

I hope this helps!
Please mark the doubt as resolved in your doubts section! :+1:
Happy Learning ! :slight_smile: