java hashCode

Introduction

When creating a class in Java, there are three methods that are commonly overridden:

equals()
hashCode()
toString()

This article focuses on the hashCode() method.

How a hash table works

Java uses the value returned by the hashCode() method to determine the position of an object in a hash table. But what is a hash table ?

Hash tables are data structures designed to store and retrieve elements efficiently, avoiding the sequential order of traditional arrays.

Imagine a table with 19 slots. The position (or index) of each value in the hash table is determined by a function such as:

@Override
public int hashCode() {
    return value % tableSize;
}

A common and simple way to define the hashCode() method is to return the remainder of the division between the value to be stored and the table size.

For example, to store the value 20:

@Override
public int hashCode() {
    return 20 % 19;  // remainder of 20 / 19
}

To store the value 21:

@Override
public int hashCode() {
    return 21 % 19;
}

So, the value 20 is stored at index 1, also called first bucket; the value 21 at index 2, and so on.

This approach ensures that the hash code is always within the bounds of the table, distributing the objects across the available slots.

For example, if the table size is 19 and we need to store the first sixty natural numbers, the hash table might look like this:

graph TD
    subgraph Hash Table
        slot0["Slot 0"]
        slot1["Slot 1"]
        slot2["Slot 2"]
        slot3["Slot 3"]
        slot4["Slot 4"]
    end

    slot0 --> A1["19"]
    A1 --> A2["39"]
    A2 --> A3["59"]

    slot1 --> B1["20"]
    B1 --> B2["40"]
    B2 --> B3["60"]

    slot2 --> C1["21"]
    C1 --> C2["41"]

    slot3 --> D1["22"]
    D1 --> D2["42"]

    slot4 --> E1["23"]
    E1 --> E2["43"]

The first two slots has an associated linked list of three elements and the rest of the slots have linked lists of two elements.

Using prime numbers for the hash functions

We chose 19 as the table size because it is a prime number. The possible remainders (hash values) are all the integers from 0 up to 18, inclusive. This means every input value will be mapped to one of 19 possible slots. Choosing a prime number as the table size ensures that the hash values are distributed uniformly across the slots, minimizing collisions.

Collisions

A collision occurs when more than one value is mapped to the same slot or bucket.

The solution, as shown in the previous example, is using a linked list for each slot (a technique called separate chaining). This way, multiple values can coexist in the same bucket, and the hash table can still function correctly. However, the performance can degrade if too many collisions occur.

Other considerations for hash functions

Prime numbers are often used when the hash function is simple, but there are reasons why they are not always chosen:

Computational performance Many modern implementations prefer table sizes that are powers of 2 (e.g., 32, 64, 128) because they allow calculating the hash index using bitwise operations, which are faster than using the modulo operation. If the hash function distributes values well, primes may not be necessary.
Dynamic resizing Some resizing strategies (like doubling the table size) produce sizes that are powers of 2, not primes. This strategy simplifies memory management and reduces computational overhead.

Commonly used hash functions

The pattern of multiplying by 31 and combining field values is widely used and recommended in the Java community.

The use of 31 as a multiplier is standard because it is an odd prime, which helps produce a good distribution of hash codes.
This approach combines the values of all the class fields in a way that reduces the likelihood of collisions.
The general pattern is:

public int hashCode() {
    int result = 1;
    result = 31 * result + field1;
    result = 31 * result + field2;
    // ... and so on for more fields
    return result;
}

Since Java 8, the Objects.hashCode() method can also be used to generate hash codes:

@Override
public int hashCode() {
    return Objects.hash(firstField, secondField, ..., nField);
}

This method is more concise and automatically combines the hash codes of the provided fields.
Internally, it creates an array of the arguments and then calls Arrays.hashCode() on that array.
It handles null values gracefully, returning 0 for any null field.

Which one to use

Objects.hash() improves readability and reduces boilerplate, especially for classes with many fields.

Manual hash code gives you more control and may be preferred in performance-sensitive or very large-scale scenarios. Read this Spanish article for a very interesting case where a customized redefinition of hashCode() fixed a poor performance issue.

Hash Tables in the Java Collections Framework

The Java Collections Framework uses hash tables in several classes:

HashMap: The most widely used implementation of the Map interface, using a hash table internally.
HashSet: Implements the Set interface and is backed by a HashMap, using the hash table mechanism to ensure uniqueness of elements.

The hash code determines the bucket or slot where the object is stored in these hash-based colllections.

Equals() and hashCode()

If two objects are considered equal by the equals() method, they must have the same hash code as returned by hashCode(). This ensures that equal objects can be found in the same bucket in hash-based collections.
If two objects have the same hash code, they are not necessarily equal. Hash collisions can occur, so equals() is used as a second check to confirm actual equality.
If you override equals(), you must also override hashCode(). Failing to do so can lead to unexpected behavior in collections, such as not being able to retrieve objects that were added earlier.

Conclusion

The data structure called Hash Table was briefly introduced in this article. The most used strategies for hash functions minimize collisions and improve performance. Finally, the relationship between hashCode() and equals() is crucial, as they are used together to ensure correct behavior in hash-based collections.

Hash Tables and hashCode()

Table of contents