Data structures

As a senior backend engineer, which data structures must I know?

All of them. And then some. I mean, it’s not really feasible to know everything about everything, but this is kind of what is expected from a senior backend developer. So we do the next best thing, which is to learn as much as humanly possible. The good thing is that you end up working with a lot of these on a daily basis, so you’ll learn a lot about it. The bad thing is that you won’t work with most of them regularly, so it’s easy to forget about them. That’s when a list of the most important data structures can come in handy :)

What are the most important data structures then?

The concept of data structure can be reused across programming languages. Some languages will even provide most structures by default. I’d say the most important ones are (in order of importance according to my personal opinion and background in Java):

  1. Array

  2. List

  3. Map

  4. Set

  5. Queue

  6. Tree

  7. Stack

  8. Graph

Arrays x Lists x… Arraylist?

In Java we have the great ArrayList, which mixes two different concepts and makes everyone crazy (okay, maybe not everyone, but this is confusing). The way I approach this is to understand very clearly the difference between an array and a list. I see the array as a very simple space. It’s like a box that you create and that you will put items there. Like this: [ ], [ ], [ ]. It’s a contiguous memory space where you will add elements of the same type. Or like a box where you can only put the same items inside. The main point is that it has a fixed size and the elements are exactly where you put them. So, if your box can fit 3 items, you can put the elements in position 0, 1 or 2. And you can access them exactly where you put them.

Cool, what about the list? Well, I see a list more like that snake from the game. It starts very small, but then it eats some more elements and all of a sudden it’s very big and hard to manage. The list data structure is like this… you have a starting element, and when you want to add a new element you create this new element in a random place, and then you go to the starting element and tell it where the new element is. So if you want to find the second element, you need to go to the first, elements and check where the second element is. The third element will be the same thing.

And the ArrayList is a list, but internally it uses an array. It uses the good characteristics of an array, like accessing the items real fast because you just jump directly to its position, while also providing the list functionality that allows the data structure to grow. To do that, the ArrayList creates an initial array of fixed size, like 10 elements, and once it’s full it creates a new array, with double the size, copies everything in there and, like magic, it can now handle more items. Pretty nifty, huh?

When should I use an Array and when should I use a List?

Arrays have a fixed size. So, if you are handling data that grows and shrinks you might face some trouble there. If you need to grow your array, you can create a new (bigger) one and copy the old one into that. This is a bit of overhead if you implement it on your own. On the other hand, a list can grow or shrink easily, as it is built into its definition. All you need is one element and a reference to the next element. It’s pretty fast to insert a new element at the beginning or end of a list. But if you need to access an object in the middle of a list, you need to traverse the list until you find that object, while an array you have a very fast access time if you just need to get the item in the specified position directly.

In general, an array is pretty good when you have a fixed size data set that you need to retrieve elements from the middle frequently. A list is good when the data set grows and shrinks dynamically, elements are added to the beginning or end and there isn’t a lot of retrieval of elements in the middle.

Should I go for ArrayList or for LinkedList then?

I would say that in most real life scenarios you would be fine using the one you are most comfortable with. Of course, there are efficiency considerations to be made, but these only make sense when you are developing an application that handles millions of operations in a short period. Most of the time you write a method that will hold 10 to 100 records and do some operation with them. The performance increase/decrease you’ll get isn’t that relevant in this case. With that in mind, if you are indeed working in an application that will handle a lot of data, then choosing the correct structure will have an impact. The main thing you need to keep in mind is how each structure handles adding, removing, and accessing records.

Both are lists with a difference in how they are built internally. An ArrayList is internally an array, which grows if it reaches the initial capacity. A LinkedList is internally an element that points to the next element (and the previous one). So if I’m looking for efficiency retrieving elements by index, the arraylist come out on top. On the other hand, if I’m looking for efficiency adding items to the beginning of the list, linkedlist will perfom better. In most cases, ArrayList will be the preferred choice, but if your use case involves frequent insertions and deletions at the ends or in the middle, LinkedList may be the better option.

In summary:

  • Arrays: Fixed-size, high-performance random access.

  • ArrayList: Dynamic size with efficient retrieval.

  • LinkedList: Optimal for frequent insertions/deletions at beginning/end of list.

0
Subscribe to my newsletter

Read articles from Just Another Dev directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Just Another Dev
Just Another Dev