Are Your Database Queries Costing The Business Time and Revenue?

Godwin LawalGodwin Lawal
6 min read

Why Should You Care

The performance of a business website or application can determine if a company gains or loses customers. Customer expectations are rising, so every part, including the database design, must be optimized to meet these expectations.

For junior backend engineers, the difference between junior and senior roles often involves attention to detail, like asking: “Why is there a delay in data retrieval?”

The insights in this article can boost your professional standing and reduce negative feedback or job insecurity.

Common signs of performance issues in an application

The following inexhaustive list of symptoms indicates performance issues in an enterprise application.

  • Users Complaints

  • Slow page loading times

  • Error Messages

  • High churn rate

  • The interface is not responsive on a mobile device.

Several factors could contribute to the symptoms listed above. However, the main focus of this article is the fixes that can be carried out on the backend side of the infrastructure, as the database queries retrieving the data, if not painstakingly scripted, can also contribute to this side effect.

Identifying Inefficient Database Queries

This is where we go deeper. If you're not technically inclined, please bear with me as this is how I communicate with engineers.

My Engineers would say:

“But I don’t write raw SQL... I use ORMs.”

Yes, ORMs often work well for us. However, in an Agile environment, we frequently change the product based on user feedback. Some changes may need custom queries that ORMs can't handle, allowing us to embed custom scripts.

Manual processes can lead to errors and inefficiencies. That's why we're here to learn how to spot inefficient queries.

Here are some examples of poorly scripted queries:

  • Wasteful use of the SELECT statement: This is illustrated below

      -- A script that retrieves all records in the bookings table
      SELECT * FROM bookings
      -- --------------------------------------
      -- This query also retreives all columns each row fetched.
      -- Thus, increasing the required bandwidth for the interfaced applications
    
  • The use of Subqueries - While subqueries can be powerful, they hurt code readability. Each nested query becomes a nested loop, which can be inefficient as per Big O notation O(n^m).

    Example:

      -- A script that retrieves properties with average rating > 4.0 using a non-subquery:
      SELECT 
          p.property_id, 
          p.name, 
          p.location
      FROM 
          properties AS p
      WHERE 
          (SELECT AVG(rating) 
           FROM reviews AS r 
           WHERE r.property_id = p.property_id) > 4.0;
    
      -- This correlated subquery retreives the users who have made over 3 bookings on our platform:
      SELECT 
          u.user_id, 
          CONCAT(u.first_name, " ", u.last_name) AS `name`,
          u.email
      FROM 
          users AS u
      WHERE 
          (SELECT COUNT(*) 
           FROM bookings AS b 
           WHERE b.user_id = u.user_id) > 3;
    
  • Filtering a large table with a non-indexed column slows down queries. Indexed columns speed up search queries.

  • Large tables not partitioned can cause slow responses due to the extra work needed to fetch data.

How To Optimize SQL Database Queries

  1. Analyze Query Performance - Back to our bookings table

    Assuming we only need some columns and not all, we can save bandwidth by running the script below instead:

     -- A script that retrieves all records in the bookings table
     SELECT
         b.bookind_id,
         b.booking_date,
         b.status
     FROM bookings AS b
     -- --------------------------------------
     -- This query only retreives the needed columns each row fetched.
     -- Thus, saving bandwidth
    
  2. Refactor Inefficient Queries - A good place to start with refactoring is replacing all subqueries with JOINS and CTEs. A more readable and efficient option for the subquery above is shown below:

     -- A script that retrieves properties with average rating > 4.0 using a non-subquery:
     SELECT 
         p.property_id, 
         p.name, 
         p.location
      FROM properties AS p
     INNER JOIN reviews AS r 
         ON r.property_id = p.property_id)
     WHERE r.rating > 4.0;
    
     -- Both correlated and Non-correlated subquery can be improved using CTEs and Windows Fuction which is a topic for another day.
    
  3. Optimize Database Schema - Pay more attention to both the conceptual and logical design phases of the Database:

    Establish a foreign key relationship with related tables using the same data types for columns to improve JOINs operation, Normalise the data from First Normal Form(1NF) to at least Third Normal Form(3NF), Make use of the data types that meet the minimum storage requirement for data (Use CHAR data type where Possible Instead of VARCHAR)
  4. Continuously Monitor and Iterate - Database query optimization is an iterative process. For insights into how effective your applied optimization techniques are, you need to analyze the query before and after optimization. There are a plethora of tools at your disposal, which is beyond the scope of this article.

    The ones worthy of note in the MYSQL world are and EXPLAIN ANALYZE and `EXPLAIN`

    An example of the basic usage of these tools is shown below:

     -- Inspect the query to identify inefficiency
     -- EXPLAIN
     EXPLAIN ANALYZE
     SELECT 
         b.booking_id,
         b.start_date,
         b.end_date,
         b.total_price,
         b.status AS booking_status,
         b.created_at AS booking_created_at,
         u.user_id AS guest_id,
         CONCAT(u.first_name, ' ', u.last_name) AS guest_name,
         u.email AS guest_email,
         p.property_id,
         p.name AS property_name,
         p.location AS property_location,
         p.pricepernight,
         p.host_id,
         CONCAT(h.first_name, ' ', h.last_name) AS host_name,
         pay.payment_id,
         pay.amount AS payment_amount,
         pay.payment_date,
         pay.payment_method
     FROM 
         bookings b
     JOIN 
         users u ON b.user_id = u.user_id
     JOIN 
         properties p ON b.property_id = p.property_id
     JOIN 
         users h ON p.host_id = h.user_id
     LEFT JOIN 
         payments pay ON b.booking_id = pay.booking_id
     WHERE 
         b.status = 'confirmed' 
         AND b.start_date >= '2024-01-01' 
         AND b.end_date <= '2024-12-31'
     ORDER BY 
         b.created_at DESC\G;
    
-- Refactor the query to reduce execution time, by using indexing.

CREATE INDEX idx_booking_user ON bookings(user_id);
CREATE INDEX idx_booking_property ON bookings(property_id);
CREATE INDEX idx_booking_filter ON bookings(status, start_date, end_date);
CREATE INDEX idx_payment_booking ON payments(booking_id);
CREATE INDEX idx_property_host ON properties(host_id);


EXPLAIN ANALYZE
    SELECT 
        b.booking_id,
        b.start_date,
        b.end_date,
        b.total_price,
        b.status AS booking_status,
        b.created_at AS booking_created_at,
        u.user_id AS guest_id,
        CONCAT(u.first_name, ' ', u.last_name) AS guest_name,
        u.email AS guest_email,
        p.property_id,
        p.name AS property_name,
        p.location AS property_location,
        pay.payment_id,
        pay.amount AS payment_amount,
        pay.payment_date,
        pay.payment_method
    FROM 
        bookings b
    JOIN 
        users u ON b.user_id = u.user_id
    JOIN 
        properties p ON b.property_id = p.property_id
    LEFT JOIN 
        payments pay ON b.booking_id = pay.booking_id
    WHERE 
        b.status = 'confirmed'
        AND b.start_date >= '2024-01-01'
        AND b.end_date <= '2024-12-31'
    ORDER BY 
        b.created_at DESC;

The output of the EXPLAIN command gives a rough estimate of the number of rows to be traversed, the query type(An indicator of how the table was looked up), and more. More realistic and extensive information is provided by the output of the EXPLAIN ANALYZE command

Best Practices for SQL Query Optimization

  • Proper query structuring - Partitioning frequently accessed large tables based on an indexed column is also a good practice.

  • Indexing strategies: A good candidate column for indexing is the column used with the WHERE, JOIN, GROUP BY and aggregate functions

  • Use of query optimization tools and software: Two popular ones are EXPLAIN and EXPLAIN ANALYZE

  • Regularly updating and maintaining the database

  • Training for developers on efficient query practices

  • Lastly, never underestimate the power of that Database Administrator guy close to you. Consult him for better query analysis and recommendations.

What’s Next

Performance monitoring is not a one-off exercise, as the goal is to continuously monitor and refine database performance. Iteratively analyze query execution plans and make schema adjustments based on needs by following best practices.

0
Subscribe to my newsletter

Read articles from Godwin Lawal directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Godwin Lawal
Godwin Lawal

Godwin is an experienced professional with a strong background in Software Engineering, specializing in Database management, API design, and Implementation, Software delivery, and System Automation