LSTM for Dummies


Understanding LSTM Internals: A Practical, Intuitive Walkthrough
LSTM (Long Short-Term Memory) networks are a type of recurrent neural network designed to learn long-term dependencies. But what makes them so effective?
In this post, I summarize key concepts I learned while revisiting LSTMs for my solar irradiance forecasting project.
LSTM Core Structure
At each time step t, the LSTM receives:
- xₜ: current input (e.g., time and temperature)
- hₜ₋₁: previous output (short-term memory)
- cₜ₋₁: previous cell state (long-term memory)

And it produces:
- hₜ: current output
- cₜ: updated memory state
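To make this interface concrete, here is a minimal PyTorch sketch of a single step. PyTorch and the sizes below are my choices for illustration only, not necessarily what the forecasting project uses:

```python
import torch
import torch.nn as nn

# Illustrative sizes: 2 input features (e.g., time of day, temperature),
# 16 hidden units, batch of 1. Placeholder values, not project settings.
cell = nn.LSTMCell(input_size=2, hidden_size=16)

x_t = torch.randn(1, 2)       # current input xₜ
h_prev = torch.zeros(1, 16)   # previous output hₜ₋₁ (short-term memory)
c_prev = torch.zeros(1, 16)   # previous cell state cₜ₋₁ (long-term memory)

# One step of the recurrence: the cell returns the new output and new memory.
h_t, c_t = cell(x_t, (h_prev, c_prev))
print(h_t.shape, c_t.shape)   # torch.Size([1, 16]) torch.Size([1, 16])
```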
Gates and Their Roles
| Gate | Formula | Purpose |
| --- | --- | --- |
| Forget gate | fₜ = sigmoid(W_f · [hₜ₋₁, xₜ]) | How much of the old memory to keep |
| Input gate | iₜ = sigmoid(W_i · [hₜ₋₁, xₜ]) | How much new info to allow in |
| Candidate | gₜ = tanh(W_g · [hₜ₋₁, xₜ]) | What the new info is (positive/negative) |
| Cell state | cₜ = fₜ * cₜ₋₁ + iₜ * gₜ | Combines old and new memory |
| Output gate | oₜ = sigmoid(W_o · [hₜ₋₁, xₜ]) | How much memory to reveal |
| Hidden state | hₜ = oₜ * tanh(cₜ) | Final output of the step |
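The table maps almost line-for-line to code. Below is a minimal NumPy sketch of one LSTM step following the simplified, bias-free formulas above; the weight shapes and toy inputs are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_g, W_o):
    """One LSTM step, following the simplified (bias-free) formulas above."""
    z = np.concatenate([h_prev, x_t])     # [hₜ₋₁, xₜ]

    f_t = sigmoid(W_f @ z)                # forget gate: how much old memory to keep
    i_t = sigmoid(W_i @ z)                # input gate: how much new info to allow
    g_t = np.tanh(W_g @ z)                # candidate: the new info itself
    c_t = f_t * c_prev + i_t * g_t        # blend old and new memory
    o_t = sigmoid(W_o @ z)                # output gate: how much memory to reveal
    h_t = o_t * np.tanh(c_t)              # final output of this step
    return h_t, c_t

# Toy dimensions: 2 input features, 4 hidden units.
rng = np.random.default_rng(0)
W_f, W_i, W_g, W_o = (rng.standard_normal((4, 6)) for _ in range(4))
h, c = np.zeros(4), np.zeros(4)
h, c = lstm_step(np.array([0.5, 0.8]), h, c, W_f, W_i, W_g, W_o)
```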
Intuition
- Sigmoid gates (fₜ, iₜ, oₜ) act like knobs: they control "how much" flows through.
- Tanh shapes the content: it can be positive, negative, or neutral.
- The model learns when to forget, when to remember, and how much to speak out.
Analogies That Help
Conveyor Belt Analogy (Cell State)
Imagine the cell state cₜ as a conveyor belt moving through time.
- It carries useful information forward, like facts or trends.
- Gates decide what to keep on the belt, what to drop, and what to add.
- Over time, the belt accumulates only relevant memory, efficiently managing what matters most.
Tap-and-Water Analogy (Gating)
- Sigmoid = the tap knob: how much you open it (0 to 1)
- Tanh = the water quality: whether it's hot (+), cold (-), or neutral (0)
- Final flow = "importance" Γ "content" = what gets added to memory or sent out as output
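In code, the tap-and-water picture is just an element-wise product of a sigmoid output and a tanh output. A tiny sketch with made-up numbers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

gate_logit = np.array([2.0, -2.0, 0.0])      # pre-activation "how much"
content_logit = np.array([1.0, 1.0, -1.5])   # pre-activation "what"

gate = sigmoid(gate_logit)        # ~[0.88, 0.12, 0.50]: open, nearly closed, half open
content = np.tanh(content_logit)  # ~[0.76, 0.76, -0.91]: positive, positive, negative
flow = gate * content             # "importance" x "content": what actually passes through
print(flow)                       # ~[0.67, 0.09, -0.45]
```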
Example: Solar Irradiance Forecasting
Imagine the model observes a sequence like:
Morning → Noon → Afternoon → Evening
(Rising irradiance → Peak → Falling)
Around 2:30 PM:
- fₜ is low (e.g., 0.3): forget the old rising trend.
- iₜ is high (e.g., 0.9): accept the new info.
- gₜ is negative (e.g., -0.6): indicates falling irradiance.
- cₜ becomes negative/slightly reduced, so hₜ outputs a lower value (worked through numerically below).
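Plugging the 2:30 PM numbers into the update equations makes the drop concrete. The previous cell state and output gate values below are assumptions for illustration only (the example specifies just fₜ, iₜ, and gₜ):

```python
import numpy as np

# Values from the 2:30 PM scenario above
f_t = 0.3      # forget most of the old rising trend
i_t = 0.9      # let the new information in
g_t = -0.6     # new info: irradiance is falling

# Assumed for illustration only (not given in the text)
c_prev = 0.5   # hypothetical positive memory of the morning's rising trend
o_t = 0.8      # hypothetical output gate value

c_t = f_t * c_prev + i_t * g_t   # 0.3*0.5 + 0.9*(-0.6) = -0.39
h_t = o_t * np.tanh(c_t)         # 0.8 * tanh(-0.39) ≈ -0.30
print(c_t, h_t)                  # the cell state flips negative and the output drops
```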
At 6:00 PM:
- fₜ increases again: retain the recent low trend.
- iₜ drops: less new change coming in.
- The model stabilizes and outputs a lower irradiance prediction.
Summary
- LSTMs use learnable gate mechanisms to manage memory.
- Negative values from tanh encode declining or inverse trends.
- The forget gate doesn't decay memory blindly: it learns context-specific importance (e.g., time of day + temperature).
- Hidden state reflects both recent and past context, shaped by cell state.
Related Project: LSTM Forecasting for Solar Irradiance
This understanding was inspired by my project:
"Solar Irradiance Forecasting using LSTM with Varying Input Features"
- Published at IEEE TENSYMP 2022, IIT Bombay
- GitHub Repository (includes diagrams created using Mermaid):
Solar-Irradiance-Forecasting GitHub Repo
Flowchart (MermaidJS format) is available in the README for easy understanding.
Stay tuned for more posts like this. I'll be sharing intuitive breakdowns of other models I use in my research and personal projects.