optimizers (the Adam algorithm)

Sanika Nandpure

The Adam (Adaptive Moment Estimation) algorithm is an optimizer that improves on plain gradient descent. If it notices that gradient descent keeps taking small steps in the same direction, it increases the learning rate alpha so that it reaches the minimum faster. At the other extreme, if alpha is set too large and gradient descent is bouncing around the minimum, Adam automatically reduces alpha. The Adam optimizer thus serves as a balancer of sorts: it adjusts the learning rate alpha to match the current behavior of gradient descent, and it does so for each parameter of the model individually rather than using a single global step size.
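
To make that intuition concrete, here is a minimal NumPy sketch of the update rule behind Adam. The function name adam_step is mine, not TensorFlow's; the defaults beta1=0.9, beta2=0.999, and eps=1e-8 are the standard hyperparameters. This is an illustration of the mechanics, not the library implementation:

import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (first moment) and squared gradient (second moment)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, since m and v start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size: alpha / (sqrt(v_hat) + eps)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

When the gradients consistently point in the same direction, m_hat and sqrt(v_hat) have similar magnitude, so the step stays close to the full alpha even if the raw gradients are tiny; when the gradient keeps flipping sign, m_hat averages toward zero and the step shrinks.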

You specify the Adam optimizer in the compile stage of the neural network; just pass an extra parameter to the compile function in TensorFlow:

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
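
In a real model you would also pass a loss (and usually metrics) to compile so the model can be trained. Here is a minimal, self-contained sketch of how the optimizer argument fits into a full compile call; the two-layer architecture and the binary cross-entropy loss are placeholder choices for illustration, not part of the original example:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam with an explicit initial alpha
    loss=tf.keras.losses.BinaryCrossentropy(),
)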

Note that even though the Adam algorithm is more robust to the initial value of alpha you choose, it can still be useful to try a few different values and see which one converges fastest. Do not blindly rely on library algorithms to do the dirty work for you.
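
If you do want to experiment, a quick sweep over a few learning rates might look like the sketch below. The dummy data, the build_model helper, and the epoch count are all illustrative assumptions; swap in your own dataset and architecture:

import numpy as np
import tensorflow as tf

# Dummy data purely for illustration
X_train = np.random.rand(200, 10).astype("float32")
y_train = (X_train.sum(axis=1) > 5).astype("float32")

def build_model():
    # Placeholder architecture; replace with your own layers
    return tf.keras.Sequential([
        tf.keras.layers.Dense(25, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

for lr in [1e-4, 1e-3, 1e-2]:
    model = build_model()  # rebuild so trained weights from the previous run don't carry over
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=tf.keras.losses.BinaryCrossentropy())
    history = model.fit(X_train, y_train, epochs=10, verbose=0)
    print(f"alpha = {lr}: final training loss = {history.history['loss'][-1]:.4f}")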


Written by

Sanika Nandpure

I'm a second-year student at the University of Texas at Austin with an interest in engineering, math, and machine learning.