Core Java Data Structures and their performance implications (Big O Notation)

Mahock AbrahamMahock Abraham
5 min read

Part 1: Understanding Performance with Big O Notation

Before we dive into what the data structures are, we must understand how to measure them. Big O notation is the language we use to describe the efficiency of an algorithm as the input size (n) grows.

Notation

Name

Description

Example

O(1)

Constant

The operation takes the same amount of time, regardless of the input size.

Accessing an element in an array by its index (e.g., myArray[5]).

O(log n)

Logarithmic

The time taken increases logarithmically with the input size. Very efficient.

Finding an element in a sorted array using binary search.

O(n)

Linear

The time taken is directly proportional to the input size.

Iterating through all elements in a List.

O(n log n)

Log-linear

A common complexity for efficient sorting algorithms.

Java's Collections.sort() which uses Timsort.

O(n²)

Quadratic

The time taken is proportional to the square of the input size. Becomes slow quickly.

A naive nested loop to find pairs in a list.

Visualizing Big O Growth:

This chart shows why choosing the right data structure is critical. As the number of items (n) increases, O(n²) algorithms become unusable, while O(1) and O(log n) remain fast.


Part 2: The Java Collections Framework (JCF)

The JCF provides a set of high-performance, reusable data structures. They are primarily built around a few key interfaces.

1. The List Interface

A List is an ordered collection that allows duplicate elements. You can access elements by their integer index.

  • ArrayList

    • Underlying Structure: A dynamic array.

    • Performance:

      • get(index): O(1) - Extremely fast random access.

      • add(element) (at the end): O(1) on average (amortized).

      • add(index, element) / remove(index) (in the middle): O(n) - Slow, as it requires shifting all subsequent elements.

    • When to Use: The default choice for a list. Use it when you need fast, index-based access and do more reading than inserting/deleting from the middle.

  • LinkedList

    • Underlying Structure: A sequence of nodes, where each node holds its data and pointers to the next and previous nodes (a doubly-linked list).

    • Performance:

      • get(index): O(n) - Slow, as it must traverse the list from the beginning or end.

      • add(element) / remove(element) (at beginning or end): O(1) - Extremely fast.

      • add/remove in the middle (with an iterator): O(1).

    • When to Use: Use it when you have a high number of insertions and deletions, especially at the ends of the list. It's ideal for implementing stacks or queues.

2. The Set Interface

A Set is a collection that does not allow duplicate elements. It models the mathematical set abstraction.

  • HashSet

    • Underlying Structure: A hash table (internally, it's a HashMap).

    • Performance: O(1) average time for add(), remove(), and contains(). This is its primary advantage.

    • Ordering: No guaranteed order. The iteration order can change over time.

    • When to Use: When you need blazingly fast lookups (contains()) and don't care about the order of elements. Perfect for quickly checking for the existence of an item.

  • LinkedHashSet

    • Underlying Structure: A hash table combined with a linked list.

    • Performance: Same O(1) performance as HashSet.

    • Ordering: Maintains insertion order. You get the elements back in the order you added them.

    • When to Use: When you need the O(1) performance of a HashSet but also require the elements to be ordered by insertion.

  • TreeSet

    • Underlying Structure: A self-balancing binary search tree (Red-Black Tree).

    • Performance: O(log n) for add(), remove(), and contains(). Slower than HashSet, but still very efficient.

    • Ordering: Maintains sorted order. Elements are ordered according to their natural ordering or a Comparator provided at creation.

    • When to Use: When you need a collection that is always sorted.

3. The Map Interface

A Map is an object that maps unique keys to values.

  • HashMap

    • Underlying Structure: A hash table.

    • Performance: O(1) average time for put(), get(), and remove().

    • Ordering: No guaranteed order.

    • When to Use: The default choice for a map. Use it anytime you need to associate keys with values and don't need to maintain order.

  • LinkedHashMap

    • Underlying Structure: A hash table and a linked list.

    • Performance: Same O(1) performance as HashMap.

    • Ordering: Maintains insertion order.

    • When to Use: Useful for building caches (like an LRU cache) or when you need the speed of a HashMap but require a predictable iteration order.

  • TreeMap

    • Underlying Structure: A Red-Black Tree.

    • Performance: O(log n) for put(), get(), and remove().

    • Ordering: Maintains sorted order based on the keys.

    • When to Use: When you need to retrieve key-value pairs in a sorted order based on the key.


Part 3: Today's Task - Problem Solving

Now, apply this knowledge. The goal is to solve these problems and, more importantly, articulate the choice of data structure and its time complexity.

Problem 1: Two Sum

  • Statement: Given an array of integers nums and an integer target, return the indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice.

  • Example:

    • nums = [2, 7, 11, 15], target = 9

    • Output: [0, 1] (because nums[0] + nums[1] == 9)

  • Your Task:

    1. Think of the O(n²) brute-force solution (a nested loop).

    2. Now, solve it in O(n) time. Hint: As you iterate through the array, what data structure would give you an O(1) lookup to see if you've already encountered the complement (target - current_number)?

Problem 2: Valid Parentheses

  • Statement: Given a string s containing just the characters (, ), {, }, [ and ], determine if the input string is valid. An input string is valid if:

    1. Open brackets must be closed by the same type of brackets.

    2. Open brackets must be closed in the correct order.

  • Example:

    • s = "()[]{}" -> Output: true

    • s = "([)]" -> Output: false

  • Your Task:

    • What are the characteristics of this problem? When you see a closing bracket, you need to check the most recently seen opening bracket. What data structure provides this Last-In, First-Out (LIFO) behavior? Use this data structure to solve the problem in O(n) time.
0
Subscribe to my newsletter

Read articles from Mahock Abraham directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Mahock Abraham
Mahock Abraham