In the online theory video “MLP08 - Vectorizing Backpropagation for m examples”, the gradient of the loss with respect to the bias is given as dL/db = (1/m) * np.sum(delta, axis=0), i.e. we take the average gradient over all m examples.
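Just to state my understanding of the theory version, here is a minimal sketch of that computation for a single layer (the shapes, names, and random data are placeholders I picked, not the course code):

import numpy as np

# Minimal sketch of the averaged gradients from MLP08, assuming one layer:
# X has shape (m, n_in) and delta has shape (m, n_out). All values are placeholders.
m, n_in, n_out = 4, 3, 2
X = np.random.randn(m, n_in)
delta = np.random.randn(m, n_out)        # error term for this layer

dW = np.dot(X.T, delta) / float(m)       # averaged weight gradient, shape (n_in, n_out)
db = np.sum(delta, axis=0) / float(m)    # averaged bias gradient, shape (n_out,)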
In the online video “NN- Implementation BackPropogation”, this gradient was computed as:
dw1 = np.dot(X.T,delta1)
db1 = np.sum(delta1,axis=0)/float(m)
But in the video “NN-Training Your Model”, the above code was changed to:
dw1 = np.dot(X.T,delta1)
db1 = np.sum(delta1,axis=0)
When I kept the float(m), my loss curve fluctuated and accuracy was low, but when I removed the float(m), I got exactly the same result as in the “NN-Training Your Model” video.
Why did this happen? Why are we not taking the average gradient as explained in the theory, but just the sum of the errors?
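For reference, here is a minimal sketch of the two versions of db1 side by side (the shapes, data, and learning rate lr are all placeholders I made up); as far as I can tell they differ only by the constant factor 1/m:

import numpy as np

# Minimal sketch comparing the two versions of db1 I tried; m, delta1 and lr
# are placeholders, not values from the course code.
m, n_out = 4, 2
delta1 = np.random.randn(m, n_out)
lr = 0.1

db1_avg = np.sum(delta1, axis=0) / float(m)   # averaged gradient (theory video)
db1_sum = np.sum(delta1, axis=0)              # plain sum ("NN-Training Your Model")

# The two differ only by the constant factor 1/m, so with the same lr
# the sum version effectively takes an m-times larger update step.
print(np.allclose(db1_sum, m * db1_avg))      # prints True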