Gradient ascent

If the initial theta value is random, let's say the value is not in that concave curve, how will the update rule change? If the theta value lies in the curve, then with further updates it will reach the global maximum.

Hello @yaswanth.koravi,

First of all, welcome to the community :smile:

That’s a good question. If the function to be minimized is non-convex, then yes, the minimum you end up at will depend on the initial value of the parameter in the case of gradient descent. But this doesn’t mean we need to change the update rule in any way. Gradient descent has its advantages and disadvantages, one disadvantage being that it can get stuck at a local minimum for a highly non-convex function. Hence, as you said, finding a good initialization (meaning one that leads to the global minimum) is an active area of research. All we can expect from a random initialization is that we reach a reasonably good minimum of the function being minimized. In the case of simple linear regression, the loss is a convex function, so you don’t need to worry about a random initialization getting you stuck at a local minimum. The issue above occurs mainly with non-linear models, like neural networks.
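Just to make that concrete, here is a minimal sketch (using a made-up 1-D non-convex function purely for illustration): the update rule stays exactly the same in both runs, and only the starting point decides which minimum plain gradient descent settles in.

```python
# Toy non-convex function: f(x) = x**4 - 3*x**2 + x
# It has a shallow minimum near x = 1.13 and the global minimum near x = -1.30.
def grad(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)   # the same update rule, regardless of where we start
    return x

print(gradient_descent(x0=2.0))    # settles near 1.13 (the shallow local minimum)
print(gradient_descent(x0=-2.0))   # settles near -1.30 (the global minimum)
```

For gradient ascent the picture is identical, just with the sign of the step flipped and "minima" replaced by "maxima".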

We have different types of optimizers to mitigate this issue to some extent: Adam, Adagrad, Momentum, and Nesterov are some examples. Also, not-so-purely-random initialization schemes like Glorot uniform and He initialization are techniques we can use to reduce the effect of local minima. But as I said, this is still an active area of research.
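As a rough illustration (the hyperparameter values below are just placeholders, not tuned), here is the classic momentum update applied to the same toy function as in the sketch above. The accumulated velocity can carry the iterate over a shallow basin that plain gradient descent gets stuck in:

```python
def grad(x):
    # gradient of the same toy function: f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1

x, velocity = 2.0, 0.0          # same starting point that trapped plain gradient descent
lr, momentum = 0.01, 0.9
for _ in range(2000):
    velocity = momentum * velocity - lr * grad(x)  # running "memory" of past gradients
    x = x + velocity
print(x)   # on this toy example it rolls past the shallow minimum and ends near -1.30
```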

So, coming back to your question, the only problem with a random initialization is the presence of multiple minima/maxima. As long as you have a convex function, you are good to go with a random initialization.
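For example (again just a toy sketch), on a convex quadratic every starting point ends up at the same minimum:

```python
def grad_convex(x):
    # gradient of the convex function f(x) = (x - 3)**2
    return 2 * (x - 3)

for x0 in (-10.0, 0.0, 25.0):
    x = x0
    for _ in range(2000):
        x = x - 0.01 * grad_convex(x)
    print(x0, "->", round(x, 3))   # every start converges to the unique minimum at x = 3
```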

Happy Learning :smiley:
Thanks