Sir, in gradient descent we need df(x)/dx to find the minimum. So why don't we find the minimum by simply setting df(x)/dx = 0? Why do we choose an iterative method?
Doubt related to gradient descent
Hey @shubhambarnwal02,
Setting df(x)/dx = 0 only works when you can actually solve that equation analytically. For most loss functions in machine learning you can't: the model has many parameters, so the gradient equation becomes a large system of nonlinear equations with no closed-form solution. Even in the cases where a closed form does exist (for example, the normal equations in linear regression), solving it directly can be too expensive on big datasets.
So instead of jumping straight to the point where the derivative is zero, we take small iterative steps in the direction of the negative gradient, getting the error as close to its minimum as we can with each step.
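To make this concrete, here is a tiny sketch (the function and the numbers are my own illustration, not from the course). For f(x) = (x - 3)^2 we *can* solve df/dx = 0 by hand (it gives x = 3), so we can watch the iterative method converge to the same answer:

```python
def f_prime(x):
    # derivative of f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0    # starting guess
lr = 0.1   # learning rate (step size)

for _ in range(200):
    x = x - lr * f_prime(x)  # step opposite the gradient

print(round(x, 4))  # converges towards 3.0
```

For a real model the only change is that `f_prime` becomes the gradient of the loss with respect to all the parameters, which we can compute at every step even when we can't solve "gradient = 0" directly.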
I hope this helps.