In the ML intro lecture, Prateek sir says that since y = |x| (the mod/absolute-value function) is not differentiable, we take the sum of squares of y_i - h(x_i) as the total error.
Why?
Why have we taken the total error as the sum of squared errors?
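For reference, the two candidate error functions from the question written out (notation assumed from the lecture: h(x_i) is the model's prediction for the i-th example):

$$
E(h) = \sum_{i=1}^{n} \bigl(y_i - h(x_i)\bigr)^{2}
\qquad\text{vs.}\qquad
E_{\text{abs}}(h) = \sum_{i=1}^{n} \bigl|\,y_i - h(x_i)\bigr|
$$

The absolute-error version is not differentiable wherever y_i - h(x_i) = 0, for the same reason that y = |x| is not differentiable at x = 0.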
There are several reasons for selecting the sum of squared errors as the loss function:
For gradient descent to work, the loss function should be differentiable, and the squared error is differentiable everywhere, unlike the absolute error, which has a kink wherever the residual is zero (see the sketch after this list).
It has a convex optimization surface (in simple terms, any minimum you reach is a global minimum, so there are no bad local minima for gradient descent to get stuck in).
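To make both points concrete, here is a minimal sketch (not the course's code; the toy data, variable names, and learning rate are assumptions for illustration) of gradient descent on a simple linear model h(x) = w*x + b with the sum-of-squared-errors loss. Its gradient exists at every (w, b), and the loss surface is convex in the parameters:

```python
import numpy as np

# Toy data (assumed for illustration): y is roughly 2*x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

def sse(w, b):
    """Sum of squared errors: sum_i (y_i - h(x_i))^2 with h(x) = w*x + b."""
    residual = y - (w * x + b)
    return np.sum(residual ** 2)

def sse_grad(w, b):
    """Analytic gradient of the SSE; it is defined for every (w, b),
    which is what lets plain gradient descent work."""
    residual = y - (w * x + b)
    dw = -2.0 * np.sum(residual * x)
    db = -2.0 * np.sum(residual)
    return dw, db

# Plain gradient descent; the SSE surface is convex in (w, b),
# so there are no bad local minima to get stuck in.
w, b = 0.0, 0.0
lr = 5e-4  # learning rate (assumed)
for step in range(2000):
    dw, db = sse_grad(w, b)
    w -= lr * dw
    b -= lr * db

print(f"learned w={w:.3f}, b={b:.3f}, final SSE={sse(w, b):.3f}")
```

Running this recovers parameters close to the true w = 2 and b = 1; if you tried the same update with the absolute error instead, the gradient would be undefined at points where a residual is exactly zero.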
Hope this helps!
I hope I've cleared your doubt. Please rate your experience here.
Your feedback is very important. It helps us improve our platform and hence provide you with the learning experience you deserve.
On the off chance that you still have some questions or do not find the answer satisfactory, you may reopen the doubt.