Inside Python Dictionaries: A Deep Dive into Hashing and Collision Handling

Madhura AnandMadhura Anand
6 min read

Open up any Python program, and chances are — you'll see a dictionary at work.

It might be mapping names to emails, product IDs to prices, or coordinates to sensor readings. On the surface, it's just a bunch of key–value pairs. But under the hood? A dictionary is one of Python's most powerful and elegant data structures, built on a foundation of smart engineering: hashing and dynamic arrays.

In this blog post, we’ll explore:

  • Why we need dictionaries in the first place

  • How all of this is powered by a beautifully orchestrated hash table

  • How they make lookups lightning fast using hash()

  • How Python handles collisions, growth, and different data types behind the scenes

Whether you're learning Python or want to deeply understand one of its core structures, let’s unravel the magic that makes d['apple'] work in constant time.


🔍 Why Do We Even Need Dictionaries in the First Place?

🧠 The Problem: Searching and Matching Data

Imagine you're trying to:

  • Look up a student’s name given their ID

  • Find the price of a product by its name

  • Count how many times a word appears in a text

  • Cache results, so expensive calculations don’t run again

If you used a list or an array, you’d have to:

  • Loop through every element

  • Check if the ID or name matches

  • Return the result

That’s O(n) time — slow for large datasets.


💡 The Need: Fast, Key-Based Access

What we want instead is:

“Given a key, instantly give me the value.”

And that’s exactly what a dictionary is built for.

prices = {'apple': 1.49}
print(prices['apple'])  # ✅ Fast lookup

Behind the scenes, Python uses hashing to jump directly to the value — no loops.


⚡ Real-World Use Cases for Dictionaries

ProblemWhy Dictionary Helps
Phonebook lookupMap names to phone numbers
Word frequency countMap words to their counts
Caching resultsMap input to previously computed output

🗄️ Python Dictionaries Explained Visually: Think Lockers, Keys, and Hashing

Imagine a hallway of lockers, each with a number. You want to store values (like 10) behind keys (like 'apple'). These lockers represent the underlying array, and Python decides which locker to use using a hash function. This system is called a hash table, and it’s the core data structure behind dictionaries.


🔐 Step 1: Hashing = Picking the Locker Number

Python takes your key 'apple' and runs it through the hash() function:

'apple' → 🔄 hash() → 728193 → Locker 37

🧠 Under the Hood:

index = hash('apple') % array_size

📌 Locker number = index in a dynamic array
📌 Hashing = the math to compute that index


📦 Step 2: Storing the Value

Slot[37] = [hash('apple'), 'apple', 10]

The hash table is just an array of slots like [hash, key, value]. This dynamic array grows and shifts as needed.


❗ Step 3: What If Two Keys Hash to the Same Locker?

If 'grape' also hashes to index 37:

  • Python tries the next available slot

  • This is called open addressing (probing)

Python keeps searching until it finds an empty slot — no linked lists, just continued use of the array-based structure.


🚀 Step 4: Growing the Locker Room

When the table gets ~66% full:

  • Python allocates a bigger array

  • Re-hashes all existing keys

  • Spreads them out into new slots

new_index = hash(key) % new_array_size

The underlying dynamic array grows to maintain performance.


✅ Summary: How It All Connects

Analogy ConceptPython Internals
Locker roomHash table (array structure)
LockersArray indexes
Hashing machinehash() function
Locker numberhash % size index
Stored value[hash, key, value]
Adding lockersDynamic array resize
Duplicate keysOpen addressing (probing)

🔍 Hashing in Python — Under the Hood

🔹 Python’s Built-in hash() Function

Used internally by dict, set, and custom objects.

Fast and efficient — optimized for:

  • Strings: Uses a variation of FNV hash + random seed

  • Numbers: hash(x) == x

  • Tuples: Combines hashes of all elements

hash('apple')  # → 728193 (varies per run due to hash randomization)

🔹 How hash() Works Differently for Each Type

🧪 Strings (FNV-Style Hashing)

hash("apple")

Internally:

hash_val = seed
for c in "apple":
    hash_val = (hash_val ^ ord(c)) * 1000003
  • Random seed ensures protection against DoS attacks

🔢 Integers

hash(42) == 42
hash(-99) == -99
  • Simple and direct — hash of an integer is the number itself

🔗 Tuples

Python allows you to use tuples as dictionary keys and set elements — but only if all the items inside the tuple are hashable (like numbers, strings, or other tuples).

Let’s say you write:

hash((1, 2, 3))

Here’s how Python computes this under the hood:

  1. Python starts with a base value:

     base = 0x345678  # a fixed starting number
    
  2. It loops through each item in the tuple and updates base like this:

     for item in (1, 2, 3):
         base = (base ^ hash(item)) * 1000003
    
  3. After the loop, it adds the length of the tuple:

     base += len((1, 2, 3))
    
  4. The final base value becomes the hash of the tuple:

     return base
    

🚧 Important Note:

This only works if every item in the tuple is hashable.

Example:

hash((1, 2, 3))        # ✅ OK
hash((1, [2, 3]))      # ❌ TypeError – list is not hashable

✅ Why This Matters

  • It ensures that (1, 2, 3) and (3, 2, 1) have different hashes

  • It allows Python to store tuples in hash tables — like dictionary keys

  • Hashing is deterministic as long as all elements are immutable and hashable


🧱 Collision Handling in Dictionaries

When Python stores key–value pairs in a dictionary, it uses hashing to decide where in memory to put each item. But sometimes, two different keys may hash to the same index — this is called a collision.

To resolve this, Python uses a strategy called open addressing — where all entries are stored directly in the internal array. When a collision occurs, Python looks for the next available index using a technique called perturbation probing.


🔄 Perturbation: How Python Finds the Next Free Slot

Instead of checking the next slot one-by-one (like linear probing), Python uses perturbation to spread out its search path. This helps prevent clustering and keeps lookups fast.

🧠 Under the Hood: Perturbation Probing Logic

When inserting a key:

  1. Python calculates
    index = hash(key) % table_size

  2. It sets a variable
    perturb = hash(key)

  3. While the slot at index is full:

index = (5 * index + 1 + perturb) % table_size
perturb >>= 5  # Bit-shift to change search pattern

Each step adjusts the index and slightly modifies perturb, leading to a more distributed probing pattern.

✅ Why This Works

  • Avoids long chains of filled slots

  • Reduces clustering

  • Maintains nearly O(1) performance for insertions and lookups

Python’s dictionaries are built for speed, and this probing technique is one of the reasons why they perform so well, even when collisions occur.


🧩 Final Thought

Dictionaries may feel like everyday tools, but behind the scenes they embody powerful ideas: dynamic arrays, hashing, probing, and collision resolution — all abstracted away to deliver blazing fast lookups with a simple dict[key].

Whether you're solving problems with caching, counting, or fast access — Python dictionaries are your trusty hash-powered allies.

10
Subscribe to my newsletter

Read articles from Madhura Anand directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Madhura Anand
Madhura Anand