Implementation of backprop

In the video on implementing backpropagation, while calculating db1, db2, and db3, Prateek bhaiya wrote this:
db3 = np.sum(delta3, axis=0) / float(m)

and in the video "NN - Training your model", it suddenly changed to
db3 = np.sum(delta3, axis=0)
for all the db's, without the change being shown in the lecture.

My question is: why did he remove float(m) from the denominator? In the theory part he taught that we take the column-wise sum and divide by m (i.e., take the average), and even the formula he wrote divides by m. Then why is the / m removed in the implementation?
I tried dividing by m there, but then the loss behaves very unusually; it does not decrease steadily. What could be the reason for this?
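Here is a minimal sketch of what I mean (the batch size and delta3 values are made up just to show the shapes; the variable names follow the snippets above):

import numpy as np

m = 4                                 # assumed batch size
delta3 = np.random.randn(m, 3)        # assumed output-layer error term, shape (m, units)

db3_avg = np.sum(delta3, axis=0) / float(m)  # version from the backprop video (average)
db3_sum = np.sum(delta3, axis=0)             # version from the training video (plain sum)

print(np.allclose(db3_sum, m * db3_avg))     # True: they differ only by the factor m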

Hi @mohituniyal2010,
Good observation :wink:
The logic is simple: dividing the sum by m makes the gradient m times smaller, so you have to keep the learning rate higher to compensate for the division.
If you don't divide db3 by m, you can keep the learning rate smaller.

So it's just a matter of preference: if we divide by m, the learning rate should be higher;
if we don't divide db3 by m, the learning rate should be smaller compared to the first case.
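To make this concrete, here is a minimal sketch (the learning rate values are illustrative, not from the lecture) showing that the two conventions give exactly the same update step once the learning rate is rescaled by m:

import numpy as np

m = 4                           # assumed batch size
delta3 = np.random.randn(m, 3)  # assumed output-layer error term

lr_avg = 0.1         # learning rate used with the averaged gradient
lr_sum = lr_avg / m  # equivalent learning rate for the plain summed gradient

step_avg = lr_avg * (np.sum(delta3, axis=0) / float(m))  # update with division by m
step_sum = lr_sum * np.sum(delta3, axis=0)               # update without division by m

print(np.allclose(step_avg, step_sum))  # True: same bias update either way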

Yes, you got the point exactly.
:+1: