Gradient Descent and Adam Optimizer

I’m confused about what optimizers such as Adam are used for when compiling a model.

I know gradient descent is used to update model parameters. Is it also an optimization technique? What is the difference between gradient descent and Adam?

If I compile a model with the optimizer set to Adam, and I also pass a batch size while training, am I using gradient descent or Adam to update the weights?

Adam and gradient descent are both optimization techniques. With plain gradient descent, if the learning rate is too large you can step past the global minimum, which increases your error; if you lower the learning rate, it will converge, but it will take a long time to do so.
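To make the learning-rate trade-off concrete, here is a minimal sketch (not from the original post) of plain gradient descent on the toy function f(w) = w², whose gradient is 2w. A small learning rate shrinks w toward the minimum at 0; a learning rate that is too large makes each step overshoot further than the last.

```python
# Minimal sketch: plain gradient descent on f(w) = w**2 (gradient 2*w).
# The learning rate alone decides whether the iterates converge or overshoot.
def gradient_descent(lr, steps=50, w=5.0):
    for _ in range(steps):
        grad = 2 * w          # gradient of f(w) = w**2
        w = w - lr * grad     # the basic gradient descent update
    return w

small = gradient_descent(lr=0.1)   # shrinks toward the minimum at 0
large = gradient_descent(lr=1.1)   # overshoots and diverges
```

With lr=0.1 each step multiplies w by 0.8, so it converges; with lr=1.1 each step multiplies w by -1.2, so the error grows without bound.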

Adam, on the other hand, uses momentum: instead of taking only the instantaneous gradient, it keeps track of the direction it has been moving in the form of a velocity. If the updates are bouncing back and forth because the gradient keeps changing sign, this momentum damps the oscillation into smaller, steadier steps. (Adam also keeps a running average of squared gradients, which gives each parameter its own adaptive step size.) This helps it converge faster and usually gives better results.
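The velocity and per-parameter scaling described above can be sketched for a single parameter as follows (my own illustration, using the standard default hyperparameters; real frameworks apply the same update to every weight):

```python
import math

# Sketch of the Adam update on f(w) = w**2 with the usual defaults.
# m is a running mean of gradients (the momentum / velocity term);
# v is a running mean of squared gradients, which scales the step size.
def adam(steps=200, w=5.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2 * w                               # gradient of f(w) = w**2
        m = beta1 * m + (1 - beta1) * g         # momentum: smoothed gradient
        v = beta2 * v + (1 - beta2) * g * g     # smoothed squared gradient
        m_hat = m / (1 - beta1 ** t)            # bias correction (early steps)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Because the step is m_hat divided by the root of v_hat, large noisy gradients are automatically scaled down, which is what keeps the back-and-forth steps small.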

Due to this, Adam is the preferred optimizer in most deep learning practice.
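On the batch-size part of the question: batch size and the optimizer are independent choices. The batch size only controls how many samples go into each gradient estimate; the optimizer (plain gradient descent, Adam, ...) is the rule that turns that gradient into a weight update. A hypothetical pure-Python training loop (not the actual Keras internals) makes the separation visible:

```python
import random

# Hypothetical 1-D linear model y = w * x trained by mini-batch gradient
# descent. The batch size shapes the gradient estimate; the single update
# line at the bottom is where the optimizer choice lives.
def train(data, batch_size=4, lr=0.01, epochs=50, w=0.0):
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # gradient of mean squared error over this mini-batch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w = w - lr * grad      # swap this line for an Adam update
    return w

# toy data generated from y = 3 * x
data = [(x, 3 * x) for x in range(1, 9)]
```

So when you compile with optimizer='adam' and pass a batch size to fit, you are doing mini-batch training where Adam, not plain gradient descent, applies each update.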


I hope this clears up your doubt. Please rate your experience here; your feedback is very important and helps us improve the platform so we can provide the learning experience you deserve.

On the off chance you still have questions or do not find the answers satisfactory, you may reopen the doubt.