Inside Python Dictionaries: A Deep Dive into Hashing and Collision Handling

Open up any Python program, and chances are — you'll see a dictionary at work.
It might be mapping names to emails, product IDs to prices, or coordinates to sensor readings. On the surface, it's just a bunch of key–value pairs. But under the hood? A dictionary is one of Python's most powerful and elegant data structures, built on a foundation of smart engineering: hashing and dynamic arrays.
In this blog post, we’ll explore:
Why we need dictionaries in the first place
How all of this is powered by a beautifully orchestrated hash table
How they make lookups lightning fast using
hash()
How Python handles collisions, growth, and different data types behind the scenes
Whether you're learning Python or want to deeply understand one of its core structures, let’s unravel the magic that makes d['apple']
work in constant time.
🔍 Why Do We Even Need Dictionaries in the First Place?
🧠 The Problem: Searching and Matching Data
Imagine you're trying to:
Look up a student’s name given their ID
Find the price of a product by its name
Count how many times a word appears in a text
Cache results, so expensive calculations don’t run again
If you used a list or an array, you’d have to:
Loop through every element
Check if the ID or name matches
Return the result
That’s O(n)
time — slow for large datasets.
💡 The Need: Fast, Key-Based Access
What we want instead is:
“Given a key, instantly give me the value.”
And that’s exactly what a dictionary is built for.
prices = {'apple': 1.49}
print(prices['apple']) # ✅ Fast lookup
Behind the scenes, Python uses hashing to jump directly to the value — no loops.
⚡ Real-World Use Cases for Dictionaries
Problem | Why Dictionary Helps |
Phonebook lookup | Map names to phone numbers |
Word frequency count | Map words to their counts |
Caching results | Map input to previously computed output |
🗄️ Python Dictionaries Explained Visually: Think Lockers, Keys, and Hashing
Imagine a hallway of lockers, each with a number. You want to store values (like 10
) behind keys (like 'apple'
). These lockers represent the underlying array, and Python decides which locker to use using a hash function. This system is called a hash table, and it’s the core data structure behind dictionaries.
🔐 Step 1: Hashing = Picking the Locker Number
Python takes your key 'apple'
and runs it through the hash()
function:
'apple' → 🔄 hash() → 728193 → Locker 37
🧠 Under the Hood:
index = hash('apple') % array_size
📌 Locker number = index
in a dynamic array
📌 Hashing =
the math to compute that index
📦 Step 2: Storing the Value
Slot[37] = [hash('apple'), 'apple', 10]
The hash table is just an array of slots like [hash, key, value]
. This dynamic array grows and shifts as needed.
❗ Step 3: What If Two Keys Hash to the Same Locker?
If 'grape'
also hashes to index 37:
Python tries the next available slot
This is called open addressing (probing)
Python keeps searching until it finds an empty slot — no linked lists, just continued use of the array-based structure.
🚀 Step 4: Growing the Locker Room
When the table gets ~66% full:
Python allocates a bigger array
Re-hashes all existing keys
Spreads them out into new slots
new_index = hash(key) % new_array_size
The underlying dynamic array grows to maintain performance.
✅ Summary: How It All Connects
Analogy Concept | Python Internals |
Locker room | Hash table (array structure) |
Lockers | Array indexes |
Hashing machine | hash() function |
Locker number | hash % size index |
Stored value | [hash, key, value] |
Adding lockers | Dynamic array resize |
Duplicate keys | Open addressing (probing) |
🔍 Hashing in Python — Under the Hood
🔹 Python’s Built-in hash()
Function
Used internally by dict
, set
, and custom objects.
Fast and efficient — optimized for:
Strings: Uses a variation of FNV hash + random seed
Numbers:
hash(x) == x
Tuples: Combines hashes of all elements
hash('apple') # → 728193 (varies per run due to hash randomization)
🔹 How hash()
Works Differently for Each Type
🧪 Strings (FNV-Style Hashing)
hash("apple")
Internally:
hash_val = seed
for c in "apple":
hash_val = (hash_val ^ ord(c)) * 1000003
- Random seed ensures protection against DoS attacks
🔢 Integers
hash(42) == 42
hash(-99) == -99
- Simple and direct — hash of an integer is the number itself
🔗 Tuples
Python allows you to use tuples as dictionary keys and set elements — but only if all the items inside the tuple are hashable (like numbers, strings, or other tuples).
Let’s say you write:
hash((1, 2, 3))
Here’s how Python computes this under the hood:
Python starts with a base value:
base = 0x345678 # a fixed starting number
It loops through each item in the tuple and updates
base
like this:for item in (1, 2, 3): base = (base ^ hash(item)) * 1000003
After the loop, it adds the length of the tuple:
base += len((1, 2, 3))
The final
base
value becomes the hash of the tuple:return base
🚧 Important Note:
This only works if every item in the tuple is hashable.
Example:
hash((1, 2, 3)) # ✅ OK
hash((1, [2, 3])) # ❌ TypeError – list is not hashable
✅ Why This Matters
It ensures that (1, 2, 3) and (3, 2, 1) have different hashes
It allows Python to store tuples in hash tables — like dictionary keys
Hashing is deterministic as long as all elements are immutable and hashable
🧱 Collision Handling in Dictionaries
When Python stores key–value pairs in a dictionary, it uses hashing to decide where in memory to put each item. But sometimes, two different keys may hash to the same index — this is called a collision.
To resolve this, Python uses a strategy called open addressing — where all entries are stored directly in the internal array. When a collision occurs, Python looks for the next available index using a technique called perturbation probing.
🔄 Perturbation: How Python Finds the Next Free Slot
Instead of checking the next slot one-by-one (like linear probing), Python uses perturbation to spread out its search path. This helps prevent clustering and keeps lookups fast.
🧠 Under the Hood: Perturbation Probing Logic
When inserting a key:
Python calculates
index = hash(key) % table_size
It sets a variable
perturb = hash(key)
While the slot at
index
is full:
index = (5 * index + 1 + perturb) % table_size
perturb >>= 5 # Bit-shift to change search pattern
Each step adjusts the index and slightly modifies perturb
, leading to a more distributed probing pattern.
✅ Why This Works
Avoids long chains of filled slots
Reduces clustering
Maintains nearly O(1) performance for insertions and lookups
Python’s dictionaries are built for speed, and this probing technique is one of the reasons why they perform so well, even when collisions occur.
🧩 Final Thought
Dictionaries may feel like everyday tools, but behind the scenes they embody powerful ideas: dynamic arrays, hashing, probing, and collision resolution — all abstracted away to deliver blazing fast lookups with a simple dict[key]
.
Whether you're solving problems with caching, counting, or fast access — Python dictionaries are your trusty hash-powered allies.
Subscribe to my newsletter
Read articles from Madhura Anand directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
