What I learnt about LSTMs


LSTMs can learn from the input sequence when to rely on short-term dependencies and when to rely on long-term dependencies.

For short-term dependencies, the network decides to clear the cell state.

For long-term dependencies, it decides to preserve the cell state.

LSTMs contain several gates for learning different aspects of the input sequence (a code sketch of all four quantities follows the descriptions below).

Input Gate:

  1. It controls how much of the new information from the current input should be stored in the cell state.

  2. It allows the LSTM to decide which portion of the current input to remember and which to discard.

Forget Gate:

  1. It helps decide how much of the previous information should be discarded in light of the new input.

  2. It manages the cell state by determining which information should be removed from the previous cell state.

Output Gate:

  1. It decides how much of the cell state should contribute to the hidden state.

  2. It determines what part of the cell state is exposed as the hidden state for the next time step.

New Memory Cell:

  1. It holds the candidate information that can be added to the cell state.

  2. It represents the new information extracted from the current input.
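
As a minimal sketch of how these four quantities are typically computed (the parameter names W, U, and b and the toy dimensions are illustrative assumptions, not from this article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dimensions and random parameters, purely for illustration.
hidden, inputs = 4, 3
rng = np.random.default_rng(0)
x_t = rng.standard_normal(inputs)       # current input
h_prev = rng.standard_normal(hidden)    # previous hidden state
c_prev = rng.standard_normal(hidden)    # previous cell state
W = {g: rng.standard_normal((hidden, inputs)) for g in "ifoc"}
U = {g: rng.standard_normal((hidden, hidden)) for g in "ifoc"}
b = {g: np.zeros(hidden) for g in "ifoc"}

i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # new memory cell
```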

Final Memory Calculation

$$c_t = (f_t \odot c_{t-1}) + (i_t \odot \tilde{c}_t)$$

The first term \((f_t \odot c_{t-1})\) controls how much of the previous cell state is retained. The closer \(f_t\) is to 1, the more of the previous memory the network keeps; the closer it is to 0, the more it forgets.

The second term \((i_t \odot \tilde{c}_t)\) decides how much of the new input information is used to update the cell state. The closer \(i_t\) is to 1, the more of the candidate \(\tilde{c}_t\) is added to the cell state.
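
A tiny worked example (made-up numbers) shows how the gate values steer this update:

```python
import numpy as np

c_prev = np.array([0.8, -0.5])   # previous cell state (made-up values)
c_tilde = np.array([0.3, 0.9])   # candidate memory from the current input

# Forget gate near 1, input gate near 0: the old memory survives.
f_t, i_t = np.array([0.95, 0.95]), np.array([0.05, 0.05])
print(f_t * c_prev + i_t * c_tilde)   # [ 0.775 -0.43 ], close to c_prev

# Forget gate near 0, input gate near 1: the candidate takes over.
f_t, i_t = np.array([0.05, 0.05]), np.array([0.95, 0.95])
print(f_t * c_prev + i_t * c_tilde)   # [ 0.325  0.83 ], close to c_tilde
```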

Hidden State

$$h_t = o_t \odot \tanh(c_t)$$
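
Putting the pieces together, a one-step forward pass might look like this sketch (the parameter dicts W, U, b follow the same assumed layout as the earlier gate sketch):

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts keyed by gate ('i', 'f', 'o', 'c')."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f_t * c_prev + i_t * c_tilde   # final memory calculation
    h_t = o_t * np.tanh(c_t)             # hidden state
    return h_t, c_t
```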

Additionally, peephole connections add weights that let the gates read the previous cell state directly, so information from the previous final memory can leak into the gate activations.
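
As a sketch of that peephole variant (V_f is a hypothetical elementwise peephole weight vector, not a name from this article), the forget gate would become:

```python
import numpy as np

def peephole_forget_gate(x_t, h_prev, c_prev, W_f, U_f, V_f, b_f):
    """Forget gate with a peephole: it also reads the previous cell state.
    V_f is a hypothetical elementwise peephole weight vector."""
    z = W_f @ x_t + U_f @ h_prev + V_f * c_prev + b_f
    return 1.0 / (1.0 + np.exp(-z))
```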
