Hash Tables in Python: A Deep Dive
Hash Tables, sometimes referred to as Hash Maps or Dictionaries in other languages, are an essential data structure that allows us to store and retrieve data efficiently. They achieve this through a process of mapping keys to specific locations in an array, known as hashing.
In today's post, we'll embark on a detailed exploration of Hash Tables, using a custom-built Python version. The end goal? To grasp the intricacies of hashing, handle collisions, and employ this mighty data structure in real-world scenarios.
Structure of a Hash Table
At its core, a Hash Table consists of an array and a hashing function. This function takes a key as input and returns an index where the associated value should be stored or retrieved.
Here's a breakdown of our custom HashTable
class:
class HashTable:
def __init__(self, size = 7):
# Initializes an array with a default size of 7
self.data_map = [None] * size
def __hash(self, key):
# Computes an index for the given key
my_hash = 0
for letter in key:
# This formula ensures that the hash values are spread out over our array
my_hash =(my_hash + ord(letter) * 23) % len(self.data_map)
return my_hash
def print_table(self):
# Prints the contents of the hash table for debugging purposes
for i, val in enumerate(self.data_map):
print(i, ": ", val)
def set_item(self, key, value):
# Sets a new key-value pair into the hash table
index = self.__hash(key)
# If the index is empty (None), initialize it as an empty list
if self.data_map[index] == None:
self.data_map[index] = []
# Appends the key-value pair to the list at the computed index
self.data_map[index].append([key, value])
def get_item(self, key):
# Retrieves a value based on its key
index = self.__hash(key)
if self.data_map[index] is not None:
for i in range(len(self.data_map[index])):
# Searches for the key in the index's list
if self.data_map[index][i][0] == key:
return self.data_map[index][i][1]
# Returns None if the key wasn't found
return None
def keys(self):
# Returns all the keys present in the hash table
all_keys = []
for i in range(len(self.data_map)):
if self.data_map[i] is not None:
for j in range(len(self.data_map[i])):
all_keys.append(self.data_map[i][j][0])
return all_keys
Exploring the Intricacies
Hashing: Our hashing function
__hash()
transforms a key into an array index. By using theord()
function (which returns an integer representing the Unicode character), we ensure a good spread of values across the hash table. The multiplication by 23, a prime number, further randomizes this spread, reducing the chances of collisions. Finally, the modulo operation ensures that the resulting index remains within the bounds of the array.Collisions: Notice that our
set_item()
method can handle situations where two keys hash to the same index (a collision). This is achieved by storing key-value pairs in a list at the indexed location. If a collision occurs, the new key-value pair is simply appended to this list.Retrieval: The
get_item()
method calculates the key's hash, then checks if that index contains the required key-value pair. If it does, the associated value is returned. If not, the method returnsNone
.Key Enumeration: Our
keys()
method returns a list of all keys currently in the hash table. This is useful for various applications, from debugging to operations that require knowledge of all current keys.
Why Hash Tables?
Hash tables offer O(1) average time complexity for both insertions and lookups, making them incredibly efficient for tasks involving frequent data retrieval or updates. However, they're not without their limitations and challenges, such as handling collisions and ensuring a good distribution of values to maintain performance.
In our Python example, we've seen a rudimentary hashing function and collision resolution using chaining (storing a list of key-value pairs at each index). This is just one approach, and as you delve deeper into the world of data structures, you'll encounter more sophisticated methods.
As always, understanding the inner workings of such structures provides the foundation for leveraging them most effectively in real-world scenarios, be it in Python or any other language. As you move forward, experimenting with different hashing functions and collision resolution strategies will deepen your appreciation and mastery of this versatile data structure.
Subscribe to my newsletter
Read articles from Jose Ramirez directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by