optimizers (the Adam algorithm)

Sanika Nandpure

The Adam (Adaptive Moment Estimation) algorithm is an optimizer that improves on plain gradient descent. If it notices that gradient descent keeps taking small steps in the same direction, it increases the learning rate alpha so that it reaches the minimum faster. At the other extreme, if alpha is set too large and gradient descent is bouncing around the minimum, Adam automatically reduces alpha. The Adam optimizer thus serves as a balancer of sorts: it adjusts the learning rate alpha to match the current behavior of gradient descent, and it does so for each parameter of the model individually rather than using a single global step size.
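
To make that intuition concrete, here is a minimal NumPy sketch of the update rule behind Adam. The function name adam_step is mine, not TensorFlow's; the defaults beta1=0.9, beta2=0.999, and eps=1e-8 are the standard hyperparameters. This is an illustration of the mechanics, not the library implementation:

import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (first moment) and squared gradient (second moment)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, since m and v start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size: alpha / (sqrt(v_hat) + eps)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

When the gradients consistently point in the same direction, m_hat and sqrt(v_hat) have similar magnitude, so the step stays close to the full alpha even if the raw gradients are tiny; when the gradient keeps flipping sign, m_hat averages toward zero and the step shrinks.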

You specify the Adam optimizer in the compile stage of the neural network; just pass an extra parameter to the compile function in TensorFlow:

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
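
In a real model you would also pass a loss (and usually metrics) to compile so the model can be trained. Here is a minimal, self-contained sketch of how the optimizer argument fits into a full compile call; the two-layer architecture and the binary cross-entropy loss are placeholder choices for illustration, not part of the original example:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam with an explicit initial alpha
    loss=tf.keras.losses.BinaryCrossentropy(),
)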

Note that even though the Adam algorithm is more robust to the initial value of alpha you choose, it can still be useful to try a few different values and see which one converges fastest. Do not blindly rely on library algorithms to do the dirty work for you.
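
If you do want to experiment, a quick sweep over a few learning rates might look like the sketch below. The dummy data, the build_model helper, and the epoch count are all illustrative assumptions; swap in your own dataset and architecture:

import numpy as np
import tensorflow as tf

# Dummy data purely for illustration
X_train = np.random.rand(200, 10).astype("float32")
y_train = (X_train.sum(axis=1) > 5).astype("float32")

def build_model():
    # Placeholder architecture; replace with your own layers
    return tf.keras.Sequential([
        tf.keras.layers.Dense(25, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

for lr in [1e-4, 1e-3, 1e-2]:
    model = build_model()  # rebuild so trained weights from the previous run don't carry over
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=tf.keras.losses.BinaryCrossentropy())
    history = model.fit(X_train, y_train, epochs=10, verbose=0)
    print(f"alpha = {lr}: final training loss = {history.history['loss'][-1]:.4f}")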


Written by

Sanika Nandpure

I'm a second-year student at the University of Texas at Austin with an interest in engineering, math, and machine learning.