Hash Data Structures
Intro
Hash Set vs Hash Map
Hash Set or unordered_set
No values, just keys
Used to check if a key exists or not
Hash Map or unordered_map
Value is also included
Used to check if a key exists and also get its value
Ordered Map vs Unordered map
Ordered or Tree map
maintains its elements in sorted order based on the keys
Underlying structure is binary search tree or a self-balancing tree (such as a Red-Black Tree).
search, insert, and delete operations have a time complexity of O(log n) due to the use of tree structures
Traversing an ordered map produces elements in sorted order.
Used to maintain a ranking system or when you need to find the minimum or maximum key efficiently.
Unordered or HashMap
No order
The average-case time complexity for operations (insert, delete, lookup) is O(1)
On collision, complexity can degrade to O(n).
Underlying structure is a hashtable
Data Structure | Java | C++ (STL) | Go (Builtin/Third-Party) | Rust | Python |
Hash Set | HashSet<E> | std::unordered_set | map[T]struct{} | HashSet<T> from std::collections | set |
Hash Map | HashMap<K, V> | std::unordered_map | map[K]V | HashMap<K, V> from std::collections | dict |
Ordered Map | LinkedHashMap<K, V> | std::map | Third-party packages (orderedmap ) | BTreeMap<K, V> from std::collections | OrderedDict from collections |
Unordered Map | HashMap<K, V> | std::unordered_map | map[K]V | HashMap<K, V> from std::collections | dict |
Hash function
Properties :
Deterministic: For the same key, the hash function must always return the same hash value.
Uniformity: The hash function should distribute keys uniformly across the hash table to minimize clustering.
Efficiency: The hash function should be fast to compute.
Integers :
- Can be simple modulus
Strings :
Weighted sum
( str[0]*primeNumber^0 + str[1]*primeNumber^1 ... )
Collisions
Mitigations :
Open addressing : Stored in same array itself
Linear probing
In case of collision, the value is inserted at next available free slot
The element is fetched by linearly searching
Increases chance of collision even more causing clustering
Clustering : the phenomenon where elements tend to group together in contiguous slots, causing performance degradation in hash table operations such as insertions and queries
Quadratic probing
- Use a quadratic function to search for the next slot.
Double hashing
- Use a second hash function to determine the step size for probing.
Separate chaining
Value is a linked list
In java, if linked list becomes too long, it will be made into a self balanced binary search tree
Load Factor and Resizing
The load factor :
number of elements stored / size of hashtable
Keys are rehashed and made into a new table (resizing) usually when load factor is above 0.75
Usually usually load factor is checked during insertion
Implementation of HashMap
- Hash function's modulus makes sure that index lies inside the array
type Node struct {
Key string // The original key used for the hash map
Value string // The associated value for this key
Hash string // The precomputed hash (optional, could store the int hash)
Next *Node // Pointer to the next node (for chaining in case of collision)
}
Node[] mp = new Node[7];
// Hash function to get an index from the key
func hash(key string, tableSize int) int {
hashValue := 0
for i := 0; i < len(key); i++ {
hashValue = (hashValue*31 + int(key[i])) % tableSize
}
return hashValue
}
// Insert a key-value pair into the hash table
func insert(mp [](*Node), key string, value string) {
// TODO: check load factor and resize
index := hash(key, len(mp)) // Get the hash index
head := mp[index] // Get the head of the linked list at that index
// Check if the key already exists in the linked list
current := head
for current != nil {
if current.Key == key {
current.Value = value // Update the value if key exists
return
}
current = current.Next
}
// If key does not exist, insert a new node at the head of the list
newNode := &Node{Key: key, Value: value, Hash: "", Next: head}
mp[index] = newNode // Insert the new node
}
// Fetch the value associated with a given key
func fetch(mp [](*Node), key string) (string, bool) {
index := hash(key, len(mp)) // Get the hash index
current := mp[index] // Get the head of the linked list at that index
// Traverse the linked list at the index
for current != nil {
if current.Key == key {
return current.Value, true // Return the value if the key is found
}
current = current.Next
}
// If key is not found, return empty string and false
return "", false
}
func resize(mp [](*Node)) [](*Node) {
newSize := len(mp) * 2 // Double the size of the table
newMp := make([](*Node), newSize)
// Rehash all nodes into the new table
for _, head := range mp {
current := head
for current != nil {
insert(newMp, current.Key, current.Value) // Reinsert into the new table
current = current.Next
}
}
return newMp
}
Subscribe to my newsletter
Read articles from Aswin Benny directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by