Demystifying Big-O notation

Samuel Adebayo
11 min read

During the first two years of my undergraduate degree, I never really understood computational complexity - even after learning about it in my Data Structures and Algorithms class in second year, it remained a foggy concept. Maybe that’s why I ended up with a solid B in the course - the most painful B I ever got (still haven’t forgiven myself). It was not until my third year, while studying for a project, that I truly grasped what all those O(n), O(n²), and O(log n) terms meant. And trust me, when the penny finally dropped, it changed the way I approach algorithms and, by extension, the way I write code.

While most lecturers and tutors treat this topic as abstract and uninteresting, computational complexity is more than just theory; it is the difference between an algorithm that executes in seconds and one that could take hours or days. In this brief post, I will dig into the essence of computational complexity, pairing the fundamental concepts with the real-world examples that finally made them click for me.

What is computational complexity, really?

Let’s start with an analogy. Imagine you are at a theme park, standing in a queue for a rollercoaster, trying to figure out how long it will take to get on the ride. That usually depends on two things:

  1. How many people are in the queue.

  2. How quickly the rollercoaster takes people on board.

This is a good analogy for computational complexity, which is all about how long it takes an algorithm (the rollercoaster) to process its input (the people in the queue). From this, we can deduce that:

  • If the rollercoaster can handle one person at a time, the wait will be as long as the number of people in the queue. This is called linear time and it is written as O(n). More on this later.

  • But if the rollercoaster takes multiple people at once - say it halves the queue each time - it is much faster. This could be something like logarithmic time, written as O(log n).

Thus, succinctly put:

💡
Computational complexity measures the resources required (such as time or memory) by an algorithm to solve a problem. Typically, we express it as a function of the input size, n, to illustrate the growth of the resource requirement as the input size increases.

To measure this, we need to take two technical details into account:

  1. Time complexity: how the runtime of an algorithm grows as n increases.

  2. Space complexity: how the memory usage of an algorithm grows with n.

For example, an algorithm that requires storing every item in a list in memory will have a higher space complexity than one that processes items one at a time. These two factors often trade off against each other - the most efficient algorithms balance both.
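
Here is a minimal sketch of that trade-off (the function names are my own, purely for illustration): both functions add up n numbers in O(n) time, but the first holds all n items in memory at once (O(n) space), while the second keeps only one item at a time (O(1) extra space).

    def sum_with_list(n):
        numbers = list(range(n))  # materialises all n items at once: O(n) space
        return sum(numbers)

    def sum_with_generator(n):
        return sum(x for x in range(n))  # holds one item at a time: O(1) extra space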

Big-O Notation

In computer science, we often like to complicate things - but only because it is fun and helps us think more abstractly. After all, where is the joy in understanding something complex without appreciating the beauty of its intrinsic patterns?

On the more technical side of things, the O in Big-O stands for “order”, reflecting the asymptotic behaviour of a function as its input size approaches infinity. In plain English, it tells us how the runtime or space requirements of an algorithm grow relative to the size of the input.

Big-O does not care about the exact runtime in seconds (that depends on the hardware and software); it focuses on the growth trend. For example, does the runtime double when the input doubles, or does it only increase slightly?
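
One rough way to see the growth trend for yourself is to time a linear operation at doubling input sizes (the absolute timings will vary with your machine, which is exactly why Big-O ignores them):

    import time

    for n in [1_000_000, 2_000_000, 4_000_000]:
        data = list(range(n))
        start = time.perf_counter()
        max(data)  # a linear-time operation: visits every element once
        print(f"n={n:,}: {time.perf_counter() - start:.4f}s")  # roughly doubles as n doubles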

Big-O Notation Classifications

So, we can say Big-O notation is more than just mathematical formality; it is a powerful tool for understanding the efficiency of algorithms across a spectrum of possibilities. From lightning-fast constant-time operations to the painfully slow crawl of exponential growth, Big-O gives us a common language to classify and compare algorithms. Here, we explore the most common classifications of Big-O notation, looking at what they mean and how they can shape the way we approach problem-solving in CS.

  1. Constant Time - O(1)

    In this case, an O(1) algorithm does not care about the input size; it always takes the same amount of time. A classic example is accessing an element in an array by its index. No matter how large the array is, this operation is effectively instantaneous because you don’t need to traverse the array or perform any calculations on the other elements. The index acts like a direct pointer to the location in memory, giving you immediate access - hence the constant “1” in O(1). This is what makes O(1) operations so desirable: their runtime doesn’t grow with the size of the input. Here is how it looks in code:

     def get_element(input_array, index_position):
         return input_array[index_position]
    

    In the above example, no matter how large input_array is, retrieving an element at a specific index will always take the same amount of time. Truth is, an O(1) operation is a dream: no matter how large the input is, the runtime remains the same.
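
    Array indexing isn’t the only everyday O(1) operation, either: looking up a key in a Python dictionary is also constant time on average, because the key hashes directly to a location in memory. A tiny sketch (the data here is made up):

     ages = {"ada": 36, "alan": 41, "grace": 85}
     print(ages["grace"])  # hash-based lookup: O(1) on average, however many keys the dict holds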

  2. Linear Time - O(n)

    An algorithm with O(n) time complexity grows linearly with the size of the input: if the input doubles, the runtime will also double. It is not as fast as O(1), but it is predictable and manageable. With O(n), things start to scale proportionally. A common example of O(n) is finding the maximum value in a list: to find the maximum, you have to look at every single element, from the very first to the last. This means the time taken will increase linearly as the size of the list grows.

    Here is a code implementation of maximum number search:

     def maximum_num_search(input_array):
         maximum_number = input_array[0]
         for number in input_array:
             if number > maximum_number: 
                 maximum_number = number
         return maximum_number
    

    In this example, the loop iterates through all n elements of input_array, so if you have 10 elements, it will take 10 steps, while 10,000 elements will take 10,000 steps. This is the very essence of linear time - growing in lockstep with the input size. A real-life example in ML is calculating the loss for a batch of data during training: the time required scales linearly with the batch size, since each sample is processed independently. For example, computing the mean squared error loss over n predictions (a minimal sketch, assuming y_prediction and y_groundtruth are NumPy arrays of n values each):

     import numpy as np
     loss = np.sum((y_prediction - y_groundtruth) ** 2) / n  # touches each of the n samples once: O(n)
    
  3. Quadratic Time - O(n²)

    Things start to get less efficient in quadratic time. Algorithms with O(n²) typically involve nested loops (a loop inside a loop): for every element in the input, the algorithm processes every other element, so the runtime grows quadratically as the input size increases. A good example is checking all pairs of elements in an array, where for each element the algorithm compares it with every other element, leading to n × n = n² iterations.

     def check_pairs(input_array):
         for i in range(len(input_array)):
             for j in range(len(input_array)):
                 print(input_array[i], input_array[j])  # runs n * n = n² times
    

    Likewise, if input_array has 10 elements, the code will run 100 iterations; for 1,000 elements, it will run 1,000,000 iterations. As you can imagine, this quickly becomes impractical for large datasets. A common real-world case is calculating the pairwise distance matrix in clustering or in dimensionality-reduction techniques like t-SNE: given n data points, you compute distances for all n² pairs (a minimal sketch, assuming X is an (n, d) array of n data points):

     from sklearn.metrics.pairwise import euclidean_distances
     distances = euclidean_distances(X, X)  # fills an n × n matrix: all n² pairwise distances
    

    Although O(n²) algorithms can sometimes be unavoidable, they should generally be considered a red flag for scalability.

  4. Logarithmic Time - O(log n)

    Let’s get a little clever. This time complexity represents algorithms that grow very slowly, even as the input size increases significantly. This is often achieved by repeatedly dividing the problem into smaller chunks rather than processing the entire input. Imagine you are searching for a word in a dictionary: you don’t start from the first page and go one by one. Instead, you flip to the middle, check whether the word falls earlier or later alphabetically, and eliminate half of the book in one step. Repeat until you find the word.

     def binary_search(input_array, target):
         left, right = 0, len(input_array) - 1
         while left <= right:
             mid = (left + right) // 2
             if input_array[mid] == target:
                 return mid
             elif input_array[mid] < target:
                 left = mid + 1
             else:
                 right = mid - 1
         return -1  # target not in the array
    

    This is binary search in action. Binary search is an ideal example: the input (a sorted list) is halved at each step until you reach the target value. This divide-and-conquer approach reduces the number of steps required to get to the final solution. If input_array has 1,000 elements, binary search might only take about 10 steps to find the target; for 1,000,000 elements, it might take just 20 steps. This right here is the power of logarithmic time - it scales gracefully, making it a favourite for algorithms dealing with large datasets. In data science, for instance, we can see logarithmic time at play when searching for a specific value in a sorted dataset using binary search:

     import bisect
     index = bisect.bisect_left(sorted_list, target)  # O(log n); sorted_list must already be sorted
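
    If you want to sanity-check those step counts, here is a quick back-of-the-envelope sketch: each step halves the search space, so the number of steps is roughly log₂(n).

     import math

     for n in [1_000, 1_000_000]:
         print(n, math.ceil(math.log2(n)))  # ≈ 10 and 20 halvings respectively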
    
  5. Exponential Time - O(2ⁿ)

    Exponential time is the real villain of algorithmic efficiency, and trust me, you don’t want to mess with it, especially when dealing with streams of data! These algorithms grow so quickly that even a small increase in input size can lead to astronomically high runtimes. They are usually a last resort for problems that cannot be solved efficiently. A good example is solving the Tower of Hanoi problem recursively: for each additional disk, the work required doubles, leading to exponential growth. The solution might feel elegant in theory, but in practice it can quickly become computationally infeasible.

    Tower of Hanoi: The Exponential Monster

    The Tower of Hanoi problem involves moving n disks from one rod to another, following these rules:

    1. Only one disk can be moved at a time.

    2. A larger disk cannot be placed on top of a smaller disk.

    3. An auxiliary rod can be used as a temporary holding space.

For n disks, the minimum number of moves required is 2ⁿ − 1. Here’s how it grows:

  • For 3 disks: 2³ − 1 = 7 moves.

  • For 10 disks: 2¹⁰ − 1 = 1,023 moves.

  • For 20 disks: 2²⁰ − 1 = 1,048,575 moves.

    def tower_of_hanoi(n, source, target, auxiliary):
        if n == 1:
            print(f"Move disk 1 from {source} to {target}")
            return
        tower_of_hanoi(n - 1, source, auxiliary, target)
        print(f"Move disk {n} from {source} to {target}")
        tower_of_hanoi(n - 1, auxiliary, target, source)  # O(2^n)

With each additional disk, the runtime doubles, making this algorithm impractical for anything but small inputs. A good real-life scenario is generating all possible transformations of an object in augmented reality, considering every possible rotation, scale, and translation for matching against a template in a scene. If you allow 10 possible rotations, 10 scales, and 10 translations, you already have 10 × 10 × 10 = 1,000 combinations, and every extra parameter multiplies that total again - a combinatorially crazy thing to do! 🙃
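
Here is a minimal sketch of that blow-up (the parameter grids are hypothetical, purely to mirror the example): enumerating every combination multiplies the options together.

    from itertools import product

    rotations, scales, translations = range(10), range(10), range(10)
    candidates = list(product(rotations, scales, translations))
    print(len(candidates))  # 10 × 10 × 10 = 1,000; a fourth parameter would multiply this again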

Exponential-time algorithms are like pouring petrol on a fire: the problem size grows out of control with each added element. While they may be unavoidable in some theoretical or NP-hard problems, they are computationally expensive and typically unsuitable for real-world applications.

  6. Linearithmic Time - O(n log n)

    Finally, O(n log n) is a happy medium between linear and quadratic time, striking a balance of efficiency for more complex problems. It’s commonly seen in divide-and-conquer algorithms, like merge sort and quicksort. A classic example is merge sort, where the list is repeatedly split in half (the logarithmic part) and then recombined in sorted order (the linear part).

     def merge_sort(arr):
         if len(arr) > 1:
             mid = len(arr) // 2
             left = arr[:mid]
             right = arr[mid:]
    
             merge_sort(left)
             merge_sort(right)
    
             i = j = k = 0
             while i < len(left) and j < len(right):
                 if left[i] < right[j]:
                     arr[k] = left[i]
                     i += 1
                 else:
                     arr[k] = right[j]
                     j += 1
                 k += 1
    
             while i < len(left):
                 arr[k] = left[i]
                 i += 1
                 k += 1
    
             while j < len(right):
                 arr[k] = right[j]
                 j += 1
                 k += 1
    
         return arr  # O(n log n)
    

    For a dataset of size n, merge sort splits it into log n levels, with each level requiring n operations to merge. This makes O(n log n) algorithms highly efficient for sorting and other large-scale tasks.
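
    In day-to-day Python you rarely implement merge sort by hand: the built-in sorted() uses Timsort, which is also O(n log n) in the worst case. A quick comparison using the merge_sort defined above:

     data = [38, 27, 43, 3, 9, 82, 10]
     print(merge_sort(data[:]))  # our implementation, run on a copy of the list
     print(sorted(data))         # Python’s built-in Timsort: also O(n log n)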

Although we love fancy and complicated terms in CS, we embrace them for a reason: they embody powerful concepts that guide us in building efficient and scalable systems. These terms are not academic jargon; they are the very backbone of smart problem-solving.

So, the next time you’re writing code - whether it’s something you plan to push to production, scale to millions of users, showcase as a proof of concept, or highlight in a research paper - pause and think about its computational complexity. Will your algorithm gracefully handle a growing workload? Does it scale efficiently? Your understanding of these intricacies can be the difference between something that works and something that thrives - and there is a big difference!

Write smart, scale boldly, and let the science behind your algorithms shine!
