Algorithms and Data Structures [00]: Get Started!

Rohith Akula
10 min read

Ever wondered how Google Maps finds the fastest route in seconds, or how your social media feed loads so quickly? The magic behind these everyday marvels often lies in something called Data Structures and Algorithms, or DSA. If you're new to the world of programming or computer science, these terms might sound intimidating. But fear not! This article will break down the fundamentals of DSA in a way that's easy to grasp, even if you've never written a line of code.

What Exactly is DSA?

  • Data Structures: Imagine you have a messy room full of clothes. It's hard to find anything, right? A data structure is like organizing your closet – it's a way to store and organize data in a computer's memory (RAM) so that it can be accessed and used efficiently. Just like you might have shelves for sweaters and hangers for shirts, there are different data structures for different types of data and tasks.

  • Algorithms: An algorithm is simply a sequence of well-defined steps or instructions to solve a specific problem. Think of it as a recipe: follow the steps, and you'll get your desired dish. In programming, algorithms tell the computer exactly what to do, and how to do it, to achieve a certain outcome.

Why Learn DSA?

  • Become a Better Developer: Understanding DSA helps you write code that is not only correct but also efficient. It allows you to choose the right tools (data structures) and the best methods (algorithms) for the job.

  • Efficient Problem Solving: Often, there are multiple ways to solve a programming problem. DSA equips you to analyze these different approaches and pick the one that performs best, saving time and computational resources.

  • Save Costs and Improve User Experience: Efficient programs require less processing power and memory, which can translate to lower hardware costs. More importantly, faster and smoother applications lead to a much better experience for the end-users.

  • Boost Your Career: DSA is a cornerstone of computer science and a key topic in technical interviews for software development roles.

Diving Deeper: Analysis of Algorithms

Imagine you have a problem, and you've come up with a few different algorithms to solve it. How do you decide which one is the "best" or most efficient? This is where the analysis of algorithms comes in.

An Example Problem: Sum of Natural Numbers

Let's say we want to write a program to calculate the sum of the first 'n' natural numbers.

  • If input n = 3, output should be 1 + 2 + 3 = 6.

  • If input n = 5, output should be 1 + 2 + 3 + 4 + 5 = 15.

Here are three different solutions (algorithms) to this problem:

Solution 1: The Mathematical Formula

int fun1(int n) {
  return n * (n + 1) / 2;
}

This solution directly uses the mathematical formula for the sum of the first 'n' natural numbers.
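
If the formula feels like magic, here is the classic pairing argument behind it (a short aside of mine, not part of the original notes): write the sum forwards and backwards, then add the two lines term by term.

S  = 1 + 2 + ... + (n-1) + n
S  = n + (n-1) + ... + 2 + 1
2S = (n+1) + (n+1) + ... + (n+1)    (n copies of (n+1))
2S = n * (n+1)
S  = n * (n+1) / 2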

Solution 2: The Iterative Approach

int fun2(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++) {
    sum = sum + i;
  }
  return sum;
}

This solution uses a loop to add each number from 1 to 'n'.

Solution 3: The Nested Loop Approach (Less Obvious for this Problem)

int fun3(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= i; j++) { // Inner loop runs 'i' times for each value of 'i'
      sum++;
    }
  }
  return sum;
}

This solution uses nested loops: for each value of 'i' in the outer loop, the inner loop runs 'i' times and increments sum by one each time. The final result is still the sum of the first 'n' natural numbers, but the function performs far more operations than the other two solutions to get there.
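
If you want to convince yourself that all three functions really compute the same value, a quick test program like the one below works. This is my own illustration: the fun1/fun2/fun3 bodies are copied from the solutions above, and main is just a hypothetical driver.

#include <stdio.h>

int fun1(int n) { return n * (n + 1) / 2; }

int fun2(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++)
    sum = sum + i;
  return sum;
}

int fun3(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)
      sum++;
  return sum;
}

int main(void) {
  // All three should print the same value for every n.
  for (int n = 1; n <= 5; n++)
    printf("n=%d: fun1=%d fun2=%d fun3=%d\n", n, fun1(n), fun2(n), fun3(n));
  return 0;
}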

Which is Most Efficient? Initial Thoughts

Looking at these, fun1 seems very direct. fun2 involves a loop, and fun3 involves nested loops, which intuitively feels like more work.

However, simply timing how long a program takes to run isn't a reliable way to compare algorithms. Why?

  • Machine Dependence: A program might run faster on a super-fast computer and slower on an older one, even if it's an inefficient program.

  • Programming Language: The implementation language matters too. Code compiled ahead of time to machine code (like C or C++) often runs faster than code executed by an interpreter or a virtual machine (like Python or Java).

  • System Load: Other programs running on your computer can affect the time your program takes.

So, we need a more standardized, theoretical way to analyze algorithms.

Introducing: Asymptotic Analysis

To overcome the limitations of time-based comparisons, we use Asymptotic Analysis. It's a mathematical way to judge an algorithm's efficiency, independent of the specific computer, programming language, or even test cases (we usually consider the worst-case scenario).

The main idea of asymptotic analysis is to measure how an algorithm's time (or space) requirements grow as the input size ('n') increases. We don't need to implement and run the code; we can analyze it theoretically by looking at the algorithm's steps.

How it Works: Counting Operations

We assume that basic operations (like arithmetic operations, comparisons, assignments) take a constant amount of time. Let's analyze our example functions:

  • fun1(int n):

      return n * (n + 1) / 2;
    

    This involves a multiplication, an addition, and a division – roughly 3 operations. The time taken is constant, say C1. It doesn't depend on how big 'n' is.

  • fun2(int n):

      int sum = 0; // 1 operation (assignment)
      for (int i = 1; i <= n; i++) { // Loop runs n times
        sum = sum + i; // 2 operations (addition, assignment) inside the loop
      }
      return sum; // 1 operation
    

    The line int sum = 0; runs once and return sum; runs once. The loop runs 'n' times, so sum = sum + i; also runs 'n' times. The total time is therefore of the form c2*n + c3, where c2 covers the work done inside the loop on each iteration and c3 covers the work done outside it (initializing sum and returning it).

  • fun3(int n):

      int sum = 0; // 1 operation
      for (int i = 1; i <= n; i++) { // Outer loop runs 'n' times
        for (int j = 1; j <= i; j++) { // Inner loop runs 1, then 2, ..., then 'n' times
          sum++; // 1 operation inside the inner loop
        }
      }
      return sum; // 1 operation
    

    The sum++ operation runs 1 + 2 + 3 + … + n times, which equals n*(n+1)/2 (the sum of the first 'n' natural numbers). Expanding that expression gives (1/2)*n^2 + (1/2)*n. So, the total time is of the form c4*n^2 + c5*n + c6.
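
To make this operation counting concrete, here is a small instrumented sketch of my own (countFun2, countFun3, and the main driver are hypothetical helpers, not part of the original notes) that tallies how many times each loop body executes:

#include <stdio.h>

// Count how many times the loop body of fun2 executes for a given n.
long countFun2(int n) {
  long ops = 0;
  for (int i = 1; i <= n; i++)
    ops++;          // one pass through the loop body
  return ops;       // equals n
}

// Count how many times sum++ executes inside fun3's nested loops.
long countFun3(int n) {
  long ops = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)
      ops++;        // one pass through the inner loop body
  return ops;       // equals n*(n+1)/2
}

int main(void) {
  for (int n = 10; n <= 1000; n *= 10)
    printf("n=%d: fun2 body runs %ld times, fun3 body runs %ld times\n",
           n, countFun2(n), countFun3(n));
  return 0;
}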

Understanding Order of Growth

The expressions we derived (a constant time like C1; a linear time like c2*n + c3; and a quadratic time like c4*n^2 + c5*n + c6) tell us about the order of growth of the time taken by each function as 'n' (the input size) increases.

  • fun1: Time is C1 (a constant). This is Constant Time. The time taken doesn't change with 'n'.

  • fun2: Time is c2*n + c3. As 'n' gets very large, the c2*n term dominates. We say this is Linear Time (grows linearly with 'n').

  • fun3: Time is c4*n^2 + c5*n + c6. As 'n' gets very large, the c4*n^2 term dominates. We say this is Quadratic Time (grows with the square of 'n').

Key Idea: Focus on the Dominant Term

In asymptotic analysis, we are interested in how the runtime behaves for large input sizes. So, we can:

  1. Ignore lower-order terms: In an expression like c4*n^2 + c5*n + c6, the c5*n and c6 terms become insignificant compared to c4*n^2 as 'n' grows very large.

  2. Ignore constant coefficients: The specific values of C1, c2, c3, c4, c5, and c6 depend on the compiler and machine, but the rate of growth (constant, linear, quadratic) is inherent to the algorithm.

So, we can say:

  • fun1 has a constant order of growth.

  • fun2 has a linear order of growth.

  • fun3 has a quadratic order of growth.

Why Order of Growth Matters

Let's compare:

  • Constant: 13 (e.g., fun1 takes 13 units of time, regardless of 'n').

  • Linear: 2*n + 5 (e.g., fun2 on some machine).

  • Quadratic: Imagine a fun3 that simplifies to n^2.

If n=4:

  • Constant: 13

  • Linear: 2*4 + 5 = 13. At n = 4, the constant and linear functions take roughly the same time.

But what if n=100?

  • Constant: 13

  • Linear: 2*100 + 5 = 205

  • Quadratic: 100^2 = 10000

And if n=1000?

  • Constant: 13

  • Linear: 2*1000 + 5 = 2005

  • Quadratic: 1000^2 = 1000000

As 'n' grows, an algorithm with a higher order of growth will always eventually take more time than an algorithm with a lower order of growth, regardless of constant factors or how fast the machine is. Quadratic functions grow much faster than linear functions, which in turn grow faster than constant time functions. This is why we want algorithms with the lowest possible order of growth for large inputs.
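
If you'd like to watch the crossover yourself, a tiny throwaway program like this one (my own sketch; the 13, 2*n + 5, and n^2 costs are just the illustrative numbers used above) prints the three expressions side by side:

#include <stdio.h>

int main(void) {
  long long sizes[] = {4, 100, 1000, 100000};
  printf("%10s %15s %15s %20s\n", "n", "constant", "linear", "quadratic");
  for (int k = 0; k < 4; k++) {
    long long n = sizes[k];
    long long constant_cost  = 13;          // e.g. fun1's cost
    long long linear_cost    = 2 * n + 5;   // e.g. fun2's cost on some machine
    long long quadratic_cost = n * n;       // e.g. a simplified fun3
    printf("%10lld %15lld %15lld %20lld\n", n, constant_cost, linear_cost, quadratic_cost);
  }
  return 0;
}

By n = 100000 the quadratic cost is already tens of millions of times larger than the constant cost, which is exactly why order of growth dominates every other consideration for large inputs.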

Best, Average, and Worst Cases

Sometimes, an algorithm's performance doesn't just depend on the size of the input ('n'), but also on the nature of the input itself.

Example 1: Consider an algorithm to sum all elements in an array:

int getSum(int arr[], int n) {
  int sum = 0;                            // Constant time
  for (int i = 0; i < n; i++) {         // Loop runs 'n' times
    sum = sum + arr[i];                 // Constant time operation
  }
  return sum;                             // Constant time
}

This algorithm will always perform a fixed number of operations for initialization and return, and the loop will always run 'n' times. The time taken is of the form C1*n + C2, which is linear order of growth, regardless of what numbers are in the array or their arrangement.

Example 2: Now consider a slightly modified version whose performance depends on the input:

int getSumModified(int arr[], int n) {
  if (n % 2 == 0) { // If 'n' is even
    return 0;       // Constant time
  } else {            // If 'n' is odd
    int sum = 0;
    for (int i = 0; i < n; i++) {
      sum = sum + arr[i];
    }
    return sum;     // Linear time operations
  }
}

Here, the performance depends on whether 'n' is even or odd:

  • If 'n' is even: The function returns 0 immediately. This takes constant time.

  • If 'n' is odd: The function calculates the sum using a loop. This takes linear time.

This is where the ideas of best, average, and worst case come in. For getSumModified, the best case (an even 'n') takes constant time, while the worst case (an odd 'n') takes linear time; the average case depends on how often each kind of input occurs. In practice, we usually analyze the worst case, because it gives a guarantee that holds no matter which input the algorithm receives.
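
As a quick sanity check of the two behaviors, here is a minimal sketch of my own (the sample arrays and the main driver are hypothetical) that calls getSumModified with an even-length and an odd-length input:

#include <stdio.h>

int getSumModified(int arr[], int n) {
  if (n % 2 == 0) {
    return 0;                      // best case: constant time
  } else {
    int sum = 0;
    for (int i = 0; i < n; i++)
      sum = sum + arr[i];
    return sum;                    // worst case: linear time
  }
}

int main(void) {
  int even[] = {1, 2, 3, 4};       // n = 4 (even) -> returns immediately
  int odd[]  = {1, 2, 3, 4, 5};    // n = 5 (odd)  -> loops over all elements
  printf("even n: %d\n", getSumModified(even, 4));  // prints 0
  printf("odd n:  %d\n", getSumModified(odd, 5));   // prints 15
  return 0;
}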

Wrapping Up

Understanding Data Structures and Algorithms, along with how to analyze their efficiency using Asymptotic Analysis (focusing on order of growth and worst-case scenarios), is fundamental to writing good, scalable, and efficient software. It's not just about making things run fast; it's about making smart choices in how you solve problems.

This was just a gentle introduction. The world of DSA is vast and fascinating. As you continue your journey, remember these core concepts, keep practicing, and you'll become a more proficient and thoughtful programmer!

Let's continue in the next articles with order of growth, a direct way to find and compare the growth of two algorithms, comparing terms, and the asymptotic notations (Big O, Theta, Omega).
