Picking the Best Data Structure and Algorithm for Your Needs

Today's applications have gone far beyond mere CRUD operations. People worldwide have become more involved in the tech space, especially since the COVID-19 outbreak in 2019. There are theories stipulating that the human attention span has dropped drastically to 8 seconds, though in my research I found experts who don't seem to know where those numbers stem from. Still, hear me out: how do you feel whenever an application takes forever to load a response you expected right away? In my case, I easily get distracted, move on to the next distracting thing, and probably even forget about that application for a moment. So the claim feels true, considering how distracted we are by constant information, ads, notifications, and so on.
What does this mean for Backend Engineers?
They now face diverse problems: from search/indexing and sorting to caching, load balancing, scheduling, graph processing, rate limiting, streaming, deduplication, and recommendation systems. Billions of users, depending on the kind of application, mean extremely high volumes of data to process. How do they store that data (data structures)? How do they manipulate it in a way that solves the problem efficiently, helps the business grow, stand out, and deliver value?
How then do we go about choosing the right algorithm for the problem?
Let us imagine we need to build a caching mechanism for a backend service. I am using caching here because, in today's fast-paced world, speed and efficiency are crucial for any application, and caching helps enormously. Suppose we need an API gateway that caches user session data with limited memory. To maximize the cache hit rate for this workload, we first need to analyze usage patterns (e.g. are recently used sessions the most likely to be reused?).
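Analyzing usage patterns can start very simply. Here is a minimal sketch (the access log and session IDs are hypothetical) that measures how often a session ID repeats in a stream of requests, which hints at how much a recency-based cache could help:

```python
from collections import Counter

# Hypothetical access log: session IDs in the order the gateway sees them.
access_log = ["s1", "s2", "s1", "s3", "s1", "s2", "s4", "s1", "s2", "s5"]

counts = Counter(access_log)
# Every access beyond the first for a given session could have been a cache hit.
repeats = sum(c - 1 for c in counts.values())
reuse_rate = repeats / len(access_log)

print(f"distinct sessions: {len(counts)}, reuse rate: {reuse_rate:.0%}")
# → distinct sessions: 5, reuse rate: 50%
```

A high reuse rate like this suggests that an eviction policy favoring recently or frequently used sessions is worth considering.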
Understanding your Problem
It goes without saying that finding the right data structure or algorithm for a specific problem requires a deep understanding of that problem. You need to ask questions that help you gain context and clarity: What exactly is the problem? What is the objective? What are the inputs? What are the constraints?
These questions lead us to break the problem down into smaller, manageable parts. By clearly defining the task (a cache for session data, capacity limits, access patterns), we immediately narrow our focus to the appropriate algorithm types (eviction strategies) and data structures, and avoid irrelevant options.
Detailed Evaluation of Options
Once the problem is defined, survey candidate algorithms or paradigms that could solve it. For our cache, this leads us to eviction strategies such as LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In, First Out). How do these strategies differ?
LRU Cache (hash map + doubly linked list): Evicts the least recently used item. Offers O(1) access and update with the hash-plus-list combination. Well suited if recency predicts future access (common in web caches). However, it uses extra memory for the list and the hash map (≈O(n) space).
LFU Cache: Evicts the least frequently used item. Good if some items are accessed repeatedly over time, but usually more complex to maintain (often implemented with multiple lists or a priority queue) and not strictly O(1) per operation.
FIFO Cache: Objects are evicted in the same order they were added to the queue. It provides a simple, low-cost way to manage the cache, but even the most heavily used objects are eventually evicted once they are old enough.
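To make the LRU option concrete, here is a minimal sketch using Python's `collections.OrderedDict`, which plays the role of the hash map plus doubly linked list (the capacity and session values are hypothetical):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch: OrderedDict keeps keys in access order,
    giving amortized O(1) get/put."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("s1", "session-a")
cache.put("s2", "session-b")
cache.get("s1")               # s1 is now most recently used
cache.put("s3", "session-c")  # capacity exceeded: evicts s2
print(cache.get("s2"))  # → None
print(cache.get("s1"))  # → session-a
```

The same interface could be backed by an explicit hash map and doubly linked list in a language without an ordered dictionary.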
You should also familiarize yourself with common algorithmic paradigms such as brute force, divide and conquer, dynamic programming, and greedy algorithms. The paradigm you choose plays a vital role in which data structures make solving the problem ideal.
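As a small illustration of how the paradigm drives the data structure, here is the classic two-sum problem solved two ways: a brute-force pair scan needing no extra structure, and a single pass that trades O(n) space for a hash map lookup (the input list and target are hypothetical):

```python
def two_sum_brute(nums, target):
    # Brute force: check every pair. O(n^2) time, O(1) extra space.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    return None

def two_sum_hash(nums, target):
    # Hash-based: one pass with a dict. O(n) time, O(n) space.
    seen = {}  # value -> index
    for j, x in enumerate(nums):
        if target - x in seen:
            return (seen[target - x], j)
        seen[x] = j
    return None

print(two_sum_brute([2, 7, 11, 15], 9))  # → (0, 1)
print(two_sum_hash([2, 7, 11, 15], 9))   # → (0, 1)
```

Same answer, very different scaling: picking the hash-based approach is really a decision to pay memory for speed.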
Comparison and Evaluation
At this point, with a list of potential algorithms in hand, we need to establish criteria for evaluation. This involves considering factors such as time complexity (how the execution time of an algorithm grows with the size of the input) and space complexity (how much memory the algorithm uses to run). Other factors might include readability and maintainability, scalability, and how well the algorithm handles edge cases. We need to prioritize these criteria based on the specific needs of our problem.
Optimization and Enhancement
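Measurement keeps this kind of optimization honest: asymptotic complexity predicts the trend, but a quick micro-benchmark confirms it on real inputs. Here is a minimal sketch (the collection size and iteration count are hypothetical) contrasting an O(n) list scan with an average O(1) set lookup for a membership test:

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # worst case for the list: forces a full scan

# Time 100 membership tests against each structure.
t_list = timeit.timeit(lambda: missing in data_list, number=100)
t_set = timeit.timeit(lambda: missing in data_set, number=100)

print(f"list scan: {t_list:.4f}s, set lookup: {t_set:.6f}s")
```

On any realistic machine the set lookup wins by orders of magnitude, which is exactly the kind of evidence that should back a data-structure choice before you refactor around it.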
The last step in finding the right data structure or algorithm is to refactor and improve our solution. We can apply various techniques and principles to improve our code's quality, readability, maintainability, and reusability. Examples of such improvements include modularization, abstraction, naming conventions, documentation, comments, and coding standards. We can also use code review and feedback to get insights and suggestions from other developers, which is why being in developer communities and sharing your progress and findings is important.
Here is something else I deem important to consider.
These things do not just come to us out of the blue. A lack of intuition leads to not knowing the right questions to ask, and therefore to missing the important constraints of a problem. The best way to build that intuition is to work through many problems. Once we learn about various problems and the existing solutions in the space (e.g. existing data structures, how they differ in performance, their assumptions, etc.), we will be able to ask the right questions to understand the problem at hand.
Those are my points on how to choose the right data structure and/or algorithm for your problem. What else would you like to add?
Subscribe to my newsletter
Read articles from Ngoran Aristide directly inside your inbox. Subscribe to the newsletter, and don't miss out.
