LSTM for Dummies

Understanding LSTM Internals: A Practical, Intuitive Walkthrough

LSTM (Long Short-Term Memory) networks are a type of recurrent neural network designed to learn long-term dependencies. But what makes them so effective?

In this post, I summarize key concepts I learned while revisiting LSTMs for my solar irradiance forecasting project.


πŸ”§ LSTM Core Structure

At each time step t, an LSTM cell receives:

  • xβ‚œ: current input (e.g., time and temperature)
  • hβ‚œβ‚‹β‚: previous output (short-term memory)
  • cβ‚œβ‚‹β‚: previous cell state (long-term memory)

And it produces:

  • hβ‚œ: current output
  • cβ‚œ: updated memory state

🧠 Gates and Their Roles

| Component | Formula | Purpose |
| --- | --- | --- |
| Forget gate | fₜ = sigmoid(W_f · [hₜ₋₁, xₜ]) | How much of the old memory to keep |
| Input gate | iₜ = sigmoid(W_i · [hₜ₋₁, xₜ]) | How much new information to let in |
| Candidate | gₜ = tanh(W_g · [hₜ₋₁, xₜ]) | What the new information is (positive or negative) |
| Cell state | cₜ = fₜ * cₜ₋₁ + iₜ * gₜ | Combines old and new memory |
| Output gate | oₜ = sigmoid(W_o · [hₜ₋₁, xₜ]) | How much of the memory to reveal |
| Hidden state | hₜ = oₜ * tanh(cₜ) | Final output of the step |
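
Putting the table together, a single LSTM step can be written out directly. This is a minimal NumPy sketch with random placeholder weights (biases omitted to match the formulas above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_g, W_o):
    """One LSTM time step, following the gate equations above."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]

    f_t = sigmoid(W_f @ z)              # forget gate: how much old memory to keep
    i_t = sigmoid(W_i @ z)              # input gate: how much new info to allow
    g_t = np.tanh(W_g @ z)              # candidate: the new info itself
    c_t = f_t * c_prev + i_t * g_t      # cell state: old memory + gated new info
    o_t = sigmoid(W_o @ z)              # output gate: how much memory to reveal
    h_t = o_t * np.tanh(c_t)            # hidden state: the step's output

    return h_t, c_t

# Toy sizes: 2 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 2, 4
W_f, W_i, W_g, W_o = (rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(4))

x_t = np.array([0.5, 0.8])              # e.g., time of day, temperature
h_prev, c_prev = np.zeros(n_hid), np.zeros(n_hid)
h_t, c_t = lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_g, W_o)
```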

πŸ” Intuition

  • Sigmoid gates (fₜ, iₜ, oₜ) act like knobs: they control how much flows through.
  • Tanh shapes the content: it can be positive, negative, or neutral.
  • The model learns when to forget, when to remember, and how much to speak out (see the quick numbers below).
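
A quick numerical illustration of that "knob × content" idea, with arbitrary example values:

```python
import numpy as np

gate = 1 / (1 + np.exp(-1.5))    # sigmoid output ≈ 0.82: "let most of it through"
content = np.tanh(-0.7)          # tanh output ≈ -0.60: "the signal itself is negative"
contribution = gate * content    # ≈ -0.49: a mostly-open knob passing negative content
print(gate, content, contribution)
```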

πŸ› οΈ Analogies That Help

πŸ“¦ Conveyor Belt Analogy (Cell State)

Imagine the cell state cβ‚œ as a conveyor belt moving through time.

  • It carries useful information forward, like facts or trends.
  • Gates decide what to keep on the belt, what to drop, and what to add.
  • Over time, the belt accumulates only relevant memory, efficiently managing what matters most.

🚰 Tap-and-Water Analogy (Gating)

  • Sigmoid = the tap knob: how much you open it (0 to 1)
  • Tanh = the water quality: whether it’s hot (+), cold (-), or neutral (0)
  • Final flow = "importance" Γ— "content" = what gets added to memory or sent out as output

πŸ” Example: Solar Irradiance Forecasting

Imagine the model observes a sequence like:

Morning β†’ Noon β†’ Afternoon β†’ Evening
(Rising irradiance β†’ Peak β†’ Falling)

Around 2:30 PM:

  • fβ‚œ is low (e.g., 0.3): forget old rising trend.
  • iβ‚œ is high (e.g., 0.9): accept new info.
  • gβ‚œ is negative (e.g., -0.6): indicates falling irradiance.
  • cβ‚œ becomes negative/slightly reduced β†’ hβ‚œ outputs a lower value.

At 6:00 PM:

  • fβ‚œ increases again: retain recent low trend.
  • iβ‚œ drops: less new change coming in.
  • The model stabilizes and outputs a lower irradiance prediction (continued in the sketch below).
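
Continuing the same sketch with assumed 6:00 PM gate values (not from the source, just illustrative):

```python
import numpy as np

c_prev = -0.30                  # memory carried over from the 2:30 PM step above
f_t, i_t, g_t = 0.9, 0.2, -0.1  # assumed: keep the trend, admit little new info

c_t = f_t * c_prev + i_t * g_t  # 0.9*(-0.30) + 0.2*(-0.1) = -0.29
h_t = 0.8 * np.tanh(c_t)        # ≈ -0.23: the prediction settles at a low value
print(c_t, h_t)
```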

🧩 Summary

  • LSTMs use learnable gate mechanisms to manage memory.
  • Negative values from tanh encode declining or inverse trends.
  • Forget gate doesn’t decay memory blindly β€” it learns context-specific importance (e.g., time of day + temperature).
  • Hidden state reflects both recent and past context, shaped by cell state.

This understanding was inspired by my project:
"Solar Irradiance Forecasting using LSTM with Varying Input Features"

Flowchart (MermaidJS format) is available in the README for easy understanding.


Stay tuned for more posts like this β€” I’ll be sharing intuitive breakdowns of other models I use in my research and personal projects.
