LSTM for Dummies

Understanding LSTM Internals: A Practical, Intuitive Walkthrough

LSTM (Long Short-Term Memory) networks are a type of recurrent neural network designed to learn long-term dependencies. But what makes them so effective?

In this post, I summarize key concepts I learned while revisiting LSTMs for my solar irradiance forecasting project.


πŸ”§ LSTM Core Structure

At each time step t, an LSTM cell receives:

  • xβ‚œ: current input (e.g., time and temperature)
  • hβ‚œβ‚‹β‚: previous output (short-term memory)
  • cβ‚œβ‚‹β‚: previous cell state (long-term memory)

And it produces:

  • hβ‚œ: current output
  • cβ‚œ: updated memory state

🧠 Gates and Their Roles

| Component | Formula | Purpose |
| --- | --- | --- |
| Forget gate | fₜ = sigmoid(W_f · [hₜ₋₁, xₜ]) | How much of the old memory to keep |
| Input gate | iₜ = sigmoid(W_i · [hₜ₋₁, xₜ]) | How much new information to let in |
| Candidate | gₜ = tanh(W_g · [hₜ₋₁, xₜ]) | What the new information is (positive or negative) |
| Cell state | cₜ = fₜ * cₜ₋₁ + iₜ * gₜ | Combines old and new memory |
| Output gate | oₜ = sigmoid(W_o · [hₜ₋₁, xₜ]) | How much of the memory to reveal |
| Hidden state | hₜ = oₜ * tanh(cₜ) | Final output of the step |
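
Putting the table together, a single LSTM step can be written out directly. This is a minimal NumPy sketch with random placeholder weights (biases omitted to match the formulas above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_g, W_o):
    """One LSTM time step, following the gate equations above."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]

    f_t = sigmoid(W_f @ z)              # forget gate: how much old memory to keep
    i_t = sigmoid(W_i @ z)              # input gate: how much new info to allow
    g_t = np.tanh(W_g @ z)              # candidate: the new info itself
    c_t = f_t * c_prev + i_t * g_t      # cell state: old memory + gated new info
    o_t = sigmoid(W_o @ z)              # output gate: how much memory to reveal
    h_t = o_t * np.tanh(c_t)            # hidden state: the step's output

    return h_t, c_t

# Toy sizes: 2 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 2, 4
W_f, W_i, W_g, W_o = (rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(4))

x_t = np.array([0.5, 0.8])              # e.g., time of day, temperature
h_prev, c_prev = np.zeros(n_hid), np.zeros(n_hid)
h_t, c_t = lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_g, W_o)
```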

πŸ” Intuition

  • Sigmoid gates (fₜ, iₜ, oₜ) act like knobs: they control how much flows through.
  • Tanh shapes the content: it can be positive, negative, or neutral.
  • The model learns when to forget, when to remember, and how much to speak out (see the quick numbers below).
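
A quick numerical illustration of that "knob × content" idea, with arbitrary example values:

```python
import numpy as np

gate = 1 / (1 + np.exp(-1.5))    # sigmoid output ≈ 0.82: "let most of it through"
content = np.tanh(-0.7)          # tanh output ≈ -0.60: "the signal itself is negative"
contribution = gate * content    # ≈ -0.49: a mostly-open knob passing negative content
print(gate, content, contribution)
```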

πŸ› οΈ Analogies That Help

πŸ“¦ Conveyor Belt Analogy (Cell State)

Imagine the cell state cβ‚œ as a conveyor belt moving through time.

  • It carries useful information forward, like facts or trends.
  • Gates decide what to keep on the belt, what to drop, and what to add.
  • Over time, the belt accumulates only relevant memory, efficiently managing what matters most.

🚰 Tap-and-Water Analogy (Gating)

  • Sigmoid = the tap knob: how much you open it (0 to 1)
  • Tanh = the water quality: whether it’s hot (+), cold (-), or neutral (0)
  • Final flow = "importance" Γ— "content" = what gets added to memory or sent out as output

πŸ” Example: Solar Irradiance Forecasting

Imagine the model observes a sequence like:

Morning β†’ Noon β†’ Afternoon β†’ Evening
(Rising irradiance β†’ Peak β†’ Falling)

Around 2:30 PM:

  • fβ‚œ is low (e.g., 0.3): forget old rising trend.
  • iβ‚œ is high (e.g., 0.9): accept new info.
  • gβ‚œ is negative (e.g., -0.6): indicates falling irradiance.
  • cβ‚œ becomes negative/slightly reduced β†’ hβ‚œ outputs a lower value.

At 6:00 PM:

  • fβ‚œ increases again: retain recent low trend.
  • iβ‚œ drops: less new change coming in.
  • The model stabilizes and outputs a lower irradiance prediction (continued in the sketch below).
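
Continuing the same sketch with assumed 6:00 PM gate values (not from the source, just illustrative):

```python
import numpy as np

c_prev = -0.30                  # memory carried over from the 2:30 PM step above
f_t, i_t, g_t = 0.9, 0.2, -0.1  # assumed: keep the trend, admit little new info

c_t = f_t * c_prev + i_t * g_t  # 0.9*(-0.30) + 0.2*(-0.1) = -0.29
h_t = 0.8 * np.tanh(c_t)        # ≈ -0.23: the prediction settles at a low value
print(c_t, h_t)
```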

🧩 Summary

  • LSTMs use learnable gate mechanisms to manage memory.
  • Negative values from tanh encode declining or inverse trends.
  • Forget gate doesn’t decay memory blindly β€” it learns context-specific importance (e.g., time of day + temperature).
  • Hidden state reflects both recent and past context, shaped by cell state.

This understanding was inspired by my project:
"Solar Irradiance Forecasting using LSTM with Varying Input Features"

Flowchart (MermaidJS format) is available in the README for easy understanding.


Stay tuned for more posts like this β€” I’ll be sharing intuitive breakdowns of other models I use in my research and personal projects.
