Intro

Hash Set vs Hash Map
- Hash Set or unordered_set
  - No values, just keys
  - Used to check if a key exists or not
- Hash Map or unordered_map
  - Value is also included
  - Used to check if a key exists and also get its value
Ordered Map vs Unordered map
- Ordered or Tree map
  - maintains its elements in sorted order based on the keys
  - Underlying structure is binary search tree or a self-balancing tree (such as a Red-Black Tree).
  - search, insert, and delete operations have a time complexity of O(log n) due to the use of tree structures
  - Traversing an ordered map produces elements in sorted order.
  - Used to maintain a ranking system or when you need to find the minimum or maximum key efficiently.
- Unordered or HashMap
  - No order
  - The average-case time complexity for operations (insert, delete, lookup) is O(1)
  - On collision, complexity can degrade to O(n).
  - Underlying structure is a hashtable

Data Structure	Java	C++ (STL)	Go (Builtin/Third-Party)	Rust	Python
Hash Set	`HashSet<E>`	`std::unordered_set`	`map[T]struct{}`	`HashSet<T>` from `std::collections`	`set`
Hash Map	`HashMap<K, V>`	`std::unordered_map`	`map[K]V`	`HashMap<K, V>` from `std::collections`	`dict`
Ordered Map	`LinkedHashMap<K, V>`	`std::map`	Third-party packages (`orderedmap`)	`BTreeMap<K, V>` from `std::collections`	`OrderedDict` from `collections`
Unordered Map	`HashMap<K, V>`	`std::unordered_map`	`map[K]V`	`HashMap<K, V>` from `std::collections`	`dict`

Hash function

Properties :
- Deterministic: For the same key, the hash function must always return the same hash value.
- Uniformity: The hash function should distribute keys uniformly across the hash table to minimize clustering.
- Efficiency: The hash function should be fast to compute.
Integers :
- Can be simple modulus
Strings :
- Weighted sum
- ( str[0]*primeNumber^0 + str[1]*primeNumber^1 ... )

Collisions

Mitigations :
- Open addressing : Stored in same array itself
  - Linear probing
    - In case of collision, the value is inserted at next available free slot
    - The element is fetched by linearly searching
    - Increases chance of collision even more causing clustering
    - Clustering : the phenomenon where elements tend to group together in contiguous slots, causing performance degradation in hash table operations such as insertions and queries
  - Quadratic probing
    - Use a quadratic function to search for the next slot.
  - Double hashing
    - Use a second hash function to determine the step size for probing.
- Separate chaining
  - Value is a linked list
  - In java, if linked list becomes too long, it will be made into a self balanced binary search tree

Load Factor and Resizing

The load factor : number of elements stored / size of hashtable
Keys are rehashed and made into a new table (resizing) usually when load factor is above 0.75
Usually usually load factor is checked during insertion

Implementation of HashMap

Hash function's modulus makes sure that index lies inside the array

type Node struct { 
    Key string // The original key used for the hash map 
    Value string // The associated value for this key 
    Hash string // The precomputed hash (optional, could store the int hash) 
    Next *Node // Pointer to the next node (for chaining in case of collision) 
}

Node[] mp = new Node[7];


// Hash function to get an index from the key
func hash(key string, tableSize int) int {
    hashValue := 0
    for i := 0; i < len(key); i++ {
        hashValue = (hashValue*31 + int(key[i])) % tableSize
    }
    return hashValue
}


// Insert a key-value pair into the hash table
func insert(mp [](*Node), key string, value string) {

    // TODO: check load factor and resize

    index := hash(key, len(mp))  // Get the hash index
    head := mp[index]  // Get the head of the linked list at that index

    // Check if the key already exists in the linked list
    current := head
    for current != nil {
        if current.Key == key {
            current.Value = value  // Update the value if key exists
            return
        }
        current = current.Next
    }

    // If key does not exist, insert a new node at the head of the list
    newNode := &Node{Key: key, Value: value, Hash: "", Next: head}
    mp[index] = newNode  // Insert the new node
}


// Fetch the value associated with a given key
func fetch(mp [](*Node), key string) (string, bool) {
    index := hash(key, len(mp))  // Get the hash index
    current := mp[index]  // Get the head of the linked list at that index

    // Traverse the linked list at the index
    for current != nil {
        if current.Key == key {
            return current.Value, true  // Return the value if the key is found
        }
        current = current.Next
    }

    // If key is not found, return empty string and false
    return "", false
}


func resize(mp [](*Node)) [](*Node) {
    newSize := len(mp) * 2  // Double the size of the table
    newMp := make([](*Node), newSize)

    // Rehash all nodes into the new table
    for _, head := range mp {
        current := head
        for current != nil {
            insert(newMp, current.Key, current.Value)  // Reinsert into the new table
            current = current.Next
        }
    }
    return newMp
}

Hash Data Structures

Intro

Hash function

Collisions

Load Factor and Resizing

Implementation of HashMap

Subscribe to my newsletter

Aswin Benny

Aswin Benny