Hash Data Structures

Aswin BennyAswin Benny
5 min read

Intro

  • Hash Set vs Hash Map

    • Hash Set or unordered_set

      • No values, just keys

      • Used to check if a key exists or not

    • Hash Map or unordered_map

      • Value is also included

      • Used to check if a key exists and also get its value

  • Ordered Map vs Unordered map

    • Ordered or Tree map

      • maintains its elements in sorted order based on the keys

      • Underlying structure is binary search tree or a self-balancing tree (such as a Red-Black Tree).

      • search, insert, and delete operations have a time complexity of O(log n) due to the use of tree structures

      • Traversing an ordered map produces elements in sorted order.

      • Used to maintain a ranking system or when you need to find the minimum or maximum key efficiently.

    • Unordered or HashMap

      • No order

      • The average-case time complexity for operations (insert, delete, lookup) is O(1)

      • On collision, complexity can degrade to O(n).

      • Underlying structure is a hashtable

Data StructureJavaC++ (STL)Go (Builtin/Third-Party)RustPython
Hash SetHashSet<E>std::unordered_setmap[T]struct{}HashSet<T> from std::collectionsset
Hash MapHashMap<K, V>std::unordered_mapmap[K]VHashMap<K, V> from std::collectionsdict
Ordered MapLinkedHashMap<K, V>std::mapThird-party packages (orderedmap)BTreeMap<K, V> from std::collectionsOrderedDict from collections
Unordered MapHashMap<K, V>std::unordered_mapmap[K]VHashMap<K, V> from std::collectionsdict

Hash function

  • Properties :

    • Deterministic: For the same key, the hash function must always return the same hash value.

    • Uniformity: The hash function should distribute keys uniformly across the hash table to minimize clustering.

    • Efficiency: The hash function should be fast to compute.

  • Integers :

    • Can be simple modulus
  • Strings :

    • Weighted sum

    • ( str[0]*primeNumber^0 + str[1]*primeNumber^1 ... )

Collisions

  • Mitigations :

    • Open addressing : Stored in same array itself

      • Linear probing

        • In case of collision, the value is inserted at next available free slot

        • The element is fetched by linearly searching

        • Increases chance of collision even more causing clustering

        • Clustering : the phenomenon where elements tend to group together in contiguous slots, causing performance degradation in hash table operations such as insertions and queries

      • Quadratic probing

        • Use a quadratic function to search for the next slot.
      • Double hashing

        • Use a second hash function to determine the step size for probing.
    • Separate chaining

      • Value is a linked list

      • In java, if linked list becomes too long, it will be made into a self balanced binary search tree

Load Factor and Resizing

  • The load factor : number of elements stored / size of hashtable

  • Keys are rehashed and made into a new table (resizing) usually when load factor is above 0.75

  • Usually usually load factor is checked during insertion

Implementation of HashMap

  • Hash function's modulus makes sure that index lies inside the array
type Node struct { 
    Key string // The original key used for the hash map 
    Value string // The associated value for this key 
    Hash string // The precomputed hash (optional, could store the int hash) 
    Next *Node // Pointer to the next node (for chaining in case of collision) 
}

Node[] mp = new Node[7];


// Hash function to get an index from the key
func hash(key string, tableSize int) int {
    hashValue := 0
    for i := 0; i < len(key); i++ {
        hashValue = (hashValue*31 + int(key[i])) % tableSize
    }
    return hashValue
}


// Insert a key-value pair into the hash table
func insert(mp [](*Node), key string, value string) {

    // TODO: check load factor and resize

    index := hash(key, len(mp))  // Get the hash index
    head := mp[index]  // Get the head of the linked list at that index

    // Check if the key already exists in the linked list
    current := head
    for current != nil {
        if current.Key == key {
            current.Value = value  // Update the value if key exists
            return
        }
        current = current.Next
    }

    // If key does not exist, insert a new node at the head of the list
    newNode := &Node{Key: key, Value: value, Hash: "", Next: head}
    mp[index] = newNode  // Insert the new node
}


// Fetch the value associated with a given key
func fetch(mp [](*Node), key string) (string, bool) {
    index := hash(key, len(mp))  // Get the hash index
    current := mp[index]  // Get the head of the linked list at that index

    // Traverse the linked list at the index
    for current != nil {
        if current.Key == key {
            return current.Value, true  // Return the value if the key is found
        }
        current = current.Next
    }

    // If key is not found, return empty string and false
    return "", false
}


func resize(mp [](*Node)) [](*Node) {
    newSize := len(mp) * 2  // Double the size of the table
    newMp := make([](*Node), newSize)

    // Rehash all nodes into the new table
    for _, head := range mp {
        current := head
        for current != nil {
            insert(newMp, current.Key, current.Value)  // Reinsert into the new table
            current = current.Next
        }
    }
    return newMp
}
0
Subscribe to my newsletter

Read articles from Aswin Benny directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Aswin Benny
Aswin Benny