Magical Merchandise: Analyzing Sales Data with PostgreSQL CTEs

Steve MeckmanSteve Meckman
13 min read

As data wizards know, the real magic happens not with wands and potions, but with powerful SQL queries. Today, I'll break down an enchanting PostgreSQL query that uses Common Table Expressions (CTEs) to transform raw sales data into quarterly and annual insights for a magical merchandise shop.

We are going to start out with a table that presents data like this:

   id   |    date    |        item         | sold 
--------+------------+---------------------+------
     56 | 2024-05-17 | Flying Carpets      |   28
    556 | 2024-09-29 | Invisibility Cloaks |   34
   1031 | 2025-01-17 | Magic Wands         |   73
...

and use a single query to turn it into a report that looks like this:

order |     label     |    date    |  balls  | bottles | carpets | cloaks  |  wands  
-------+---------------+------------+---------+---------+---------+---------+---------
     1 | q1 details    | 2024-03-23 |   18788 |   18441 |   18353 |   19421 |   18761
...
     2 | q1 totals     |            |  166617 |  165033 |  164572 |  167103 |  168202
     3 | q2 details    | 2024-04-01 |   18694 |   18974 |   18204 |   18270 |   18891
...
     4 | q2 totals     |            | 1703406 | 1697453 | 1698596 | 1696089 | 1690337
     5 | q3 details    | 2024-07-01 |   18149 |   19056 |   17847 |   18151 |   18466
...
     6 | q3 totals     |            | 1713328 | 1717496 | 1714871 | 1708107 | 1710860
     7 | q4 details    | 2024-10-01 |   19727 |   18224 |   18786 |   18316 |   19204
...
     8 | q4 totals     |            | 1716334 | 1709330 | 1705079 | 1711495 | 1706742
     9 | annual totals |            | 5299685 | 5289312 | 5283118 | 5282794 | 5276141

While it may at first look large and cumbersome, this query will create an entire report in a single query, using SQL to do what often becomes multiple database calls and iterating through collections in code to calculate aggregates. With proper indexing, the report can be created extremely quickly. This article will walk you through every section of the query, and hopefully help you see the patterns and help you come up with ways you can use some of these strategies in your own environment.

Getting Set Up

First let’s get a table set up. These examples are using PostgreSQL, and although CTEs are featured by all major RDBMS vendors, there may be some minor variations in the SQL code to make it work on other platforms.

CREATE TABLE sales (
    id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    date date NOT NULL,
    item text NOT NULL,
    sold integer NOT NULL
);

Additional resources

Here is a link to a compressed CSV file containing sample data that can be used for this exercise. Use your favorite tool to load it into the table, for example with PSQL you can simply do:

\copy sales FROM ./sales_data.csv DELIMITER ',' CSV HEADER

Once the data is loaded in the table, you’re ready to go.

The Query's Purpose

This SQL query analyzes sales data for five magical items:

  • Crystal Balls

  • Potion Bottles

  • Flying Carpets

  • Invisibility Cloaks

  • Magic Wands

It calculates daily, quarterly, and annual sales figures for 2024, presenting the data in a structured report.

Understanding CTEs: SQL's Secret Spell

Common Table Expressions (CTEs) act like temporary tables that exist only for the duration of the query. They make complex queries more readable by breaking them into logical sections, much like how a complex spell might be broken down into manageable incantations.

Step-by-Step Query Breakdown

1. Getting a List of Distinct Dates

First we want to obtain a list of all the dates for which there are sales figures. This could easily be done with SELECT DISTINCT:

WITH
dates AS (
    SELECT DISTINCT date
    FROM sales
    WHERE date >= '2024-01-01' AND date < '2025-01-01'
),

However SELECT DISTINCT is a very expensive operation and one that should be avoided if possible. Instead, the same range can be done with the PostgreSQL generate_series function. Other RDBMS vendors may provide similar functions, but the implementation may differ.

dates AS (
    SELECT t.date::date
    FROM generate_series(
        (SELECT MIN(date) FROM sales),
        (SELECT MAX(date) FROM sales),
        interval '1 day')
    AS t(date)

This CTE returns a list of all the dates in the year range. It will be used as a source of truth for every available date that there are sales numbers available for.

2. Gathering Item-Specific Daily Sales

The next five CTEs collect daily sales data for each product:

WITH balls AS (
    SELECT date, SUM(sold) AS daily_sales
    FROM sales
    WHERE item = 'Crystal Balls'
    GROUP BY date
),
/* Similar CTEs for bottles, carpets, cloaks, and wands */

Each CTE filters the sales table for a specific item and calculates daily sales totals.

3. Combining Daily Sales Across Products

daily_sales AS (
    SELECT d.date AS date, ba.daily_sales AS balls, bo.daily_sales AS bottles, ca.daily_sales AS carpets,
        cl.daily_sales AS cloaks, w.daily_sales AS wands
    FROM dates d
    LEFT JOIN balls ba ON ba.date = d.date
    LEFT JOIN bottles bo ON bo.date = d.date
    LEFT JOIN carpets ca ON ca.date = d.date
    LEFT JOIN cloaks cl ON cl.date = d.date
    LEFT JOIN wands w ON w.date = d.date
),

This CTE creates a consolidated view with one row per date and columns for sales of each product. The LEFT JOIN ensures we include all dates, including those where some products may not have had sales recorded.

Removing the comma from the line above the start of this CTE, and removing the CTE definition (daily_sales as ( and accompanying closing parenthesis at the end, yields the following result set:

    date    | balls | bottles | carpets | cloaks | wands 
------------+-------+---------+---------+--------+-------
 2024-03-23 | 18788 |   18441 |   18353 |  19421 | 18761
 2024-03-24 | 18915 |   18901 |   18370 |  18078 | 18951
 2024-03-25 | 17855 |   18741 |   18124 |  18867 | 18846
 2024-03-26 | 18732 |   17657 |   17669 |  17694 | 18329
 2024-03-27 | 18050 |   16989 |   18728 |  17356 | 18357
 2024-03-28 | 18520 |   19171 |   18162 |  18493 | 19038
 2024-03-29 | 18094 |   18251 |   18642 |  19380 | 19012
 2024-03-30 | 19143 |   18676 |   17133 |  18562 | 18254
 2024-03-31 | 18520 |   18206 |   19391 |  19252 | 18654
 2024-04-01 | 18694 |   18974 |   18204 |  18270 | 18891
 2024-04-02 | 18782 |   19615 |   18751 |  18398 | 18219
 2024-04-03 | 18129 |   19284 |   18899 |  18890 | 18679
...

You have taken vertical data where every item was on an individual column, and turned it horizonal, giving each item its own column.

4. Calculating Quarterly Totals

q1_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, 
           SUM(carpets) AS carpets, SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-01-01' AND date < '2024-04-01'
),
/* Similar CTEs for Q2, Q3, and Q4 */

These CTEs calculate the total sales for each product within each quarter of 2024.

5. Calculating Annual Totals

annual_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, 
           SUM(carpets) AS carpets, SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-01-01' AND date < '2025-01-01'
)

This CTE calculates the total sales for each product across the entire year.

6. Producing the Final Report

SELECT 1, 'q1 details', date, balls, bottles, carpets, cloaks, wands
    FROM daily_sales
    WHERE date >= '2024-01-01' AND date < '2024-04-01'
UNION ALL
SELECT 2, 'q1 totals', null, balls, bottles, carpets, cloaks, wands
    FROM q1_sales
/* Similar SELECT statements for other quarters and annual totals */

The final part of the query uses UNION ALL to combine:

  • Daily sales details for each quarter

  • Quarterly summary totals

  • Annual totals

The first column (1, 2, 3, etc.) creates a sort order, while the second column adds descriptive labels.

The Full Query

Here is the full query in all its glory:

WITH
dates AS (
    SELECT t.date::date
    FROM generate_series(
        (SELECT MIN(date) FROM sales),
        (SELECT MAX(date) FROM sales),
        interval '1 day')
    AS t(date)
),
balls AS (
    SELECT date, SUM(sold) AS daily_sales
    FROM sales
    WHERE item = 'Crystal Balls'
    AND date >= '2024-01-01' AND date < '2025-01-01'
    GROUP BY date
),
bottles AS (
    SELECT date, SUM(sold) AS daily_sales
    FROM sales
    WHERE item = 'Potion Bottles'
    AND date >= '2024-01-01' AND date < '2025-01-01'
    GROUP BY date
),
carpets AS (
    SELECT date, SUM(sold) AS daily_sales
    FROM sales
    WHERE item = 'Flying Carpets'
    AND date >= '2024-01-01' AND date < '2025-01-01'
    GROUP BY date
),
cloaks AS (
    SELECT date, SUM(sold) AS daily_sales
    FROM sales
    WHERE item = 'Invisibility Cloaks'
    AND date >= '2024-01-01' AND date < '2025-01-01'
    GROUP BY date
),
wands AS (
    SELECT date, SUM(sold) AS daily_sales
    FROM sales
    WHERE item = 'Magic Wands'
    AND date >= '2024-01-01' AND date < '2025-01-01'
    GROUP BY date
),
daily_sales AS (
    SELECT d.date AS date, ba.daily_sales AS balls, bo.daily_sales AS bottles, ca.daily_sales AS carpets,
        cl.daily_sales AS cloaks, w.daily_sales AS wands
    FROM dates d
    LEFT JOIN balls ba ON ba.date = d.date
    LEFT JOIN bottles bo ON bo.date = d.date
    LEFT JOIN carpets ca ON ca.date = d.date
    LEFT JOIN cloaks cl ON cl.date = d.date
    LEFT JOIN wands w ON w.date = d.date
),
q1_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
        SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-01-01' AND date < '2024-04-01'
),
q2_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
        SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-04-01' AND date < '2024-07-01'
),
q3_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
        SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-07-01' AND date < '2024-10-01'
),
q4_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
        SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-10-01' AND date < '2025-01-01'
),
annual_sales AS (
    SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
        SUM(cloaks) AS cloaks, SUM(wands) AS wands
    FROM daily_sales
    WHERE date >= '2024-01-01' AND date < '2025-01-01'
)
SELECT 1 AS order, 'q1 details' AS label, date, balls, bottles, carpets, cloaks, wands
    FROM daily_sales
    WHERE date >= '2024-01-01' AND date < '2024-04-01'
UNION ALL
SELECT 2 AS order, 'q1 totals' AS label, null, balls, bottles, carpets, cloaks, wands
    FROM q1_sales
UNION ALL
SELECT 3 AS order, 'q2 details' AS label, date, balls, bottles, carpets, cloaks, wands
    FROM daily_sales
    WHERE date >= '2024-04-01' AND date < '2024-07-01'
UNION ALL
SELECT 4 AS order, 'q2 totals' AS label, null, balls, bottles, carpets, cloaks, wands
    FROM q2_sales
UNION ALL
SELECT 5 AS order, 'q3 details' AS label, date, balls, bottles, carpets, cloaks, wands
    FROM daily_sales
    WHERE date >= '2024-07-01' AND date < '2024-10-01'
UNION ALL
SELECT 6 AS order, 'q3 totals' AS label, null, balls, bottles, carpets, cloaks, wands
    FROM q3_sales
UNION ALL
SELECT 7 AS order, 'q4 details' AS label, date, balls, bottles, carpets, cloaks, wands
    FROM daily_sales
    WHERE date >= '2024-10-01' AND date < '2025-01-01'
UNION ALL
SELECT 8 AS order, 'q4 totals' AS label, null, balls, bottles, carpets, cloaks, wands
    FROM q4_sales
UNION ALL
SELECT 9 AS order, 'annual totals' AS label, null, balls, bottles, carpets, cloaks, wands
    FROM annual_sales
;

And here is the result set it yields:

order |     label     |    date    |  balls  | bottles | carpets | cloaks  |  wands  
-------+---------------+------------+---------+---------+---------+---------+---------
     1 | q1 details    | 2024-03-23 |   18788 |   18441 |   18353 |   19421 |   18761
     1 | q1 details    | 2024-03-24 |   18915 |   18901 |   18370 |   18078 |   18951
     1 | q1 details    | 2024-03-25 |   17855 |   18741 |   18124 |   18867 |   18846
...
     1 | q1 details    | 2024-03-29 |   18094 |   18251 |   18642 |   19380 |   19012
     1 | q1 details    | 2024-03-30 |   19143 |   18676 |   17133 |   18562 |   18254
     1 | q1 details    | 2024-03-31 |   18520 |   18206 |   19391 |   19252 |   18654
     2 | q1 totals     |            |  166617 |  165033 |  164572 |  167103 |  168202
     3 | q2 details    | 2024-04-01 |   18694 |   18974 |   18204 |   18270 |   18891
     3 | q2 details    | 2024-04-02 |   18782 |   19615 |   18751 |   18398 |   18219
     3 | q2 details    | 2024-04-03 |   18129 |   19284 |   18899 |   18890 |   18679
...
     3 | q2 details    | 2024-06-28 |   18468 |   19168 |   18357 |   19566 |   17682
     3 | q2 details    | 2024-06-29 |   18601 |   18812 |   17888 |   19691 |   18614
     3 | q2 details    | 2024-06-30 |   18956 |   18633 |   18368 |   19517 |   17905
     4 | q2 totals     |            | 1703406 | 1697453 | 1698596 | 1696089 | 1690337
     5 | q3 details    | 2024-07-01 |   18149 |   19056 |   17847 |   18151 |   18466
     5 | q3 details    | 2024-07-02 |   18880 |   18118 |   17997 |   19045 |   18357
     5 | q3 details    | 2024-07-03 |   19764 |   18375 |   18685 |   19531 |   18106
...
     5 | q3 details    | 2024-09-28 |   19428 |   17799 |   18383 |   18848 |   18317
     5 | q3 details    | 2024-09-29 |   17926 |   19523 |   18332 |   16948 |   18225
     5 | q3 details    | 2024-09-30 |   17798 |   19186 |   18618 |   17637 |   18864
     6 | q3 totals     |            | 1713328 | 1717496 | 1714871 | 1708107 | 1710860
     7 | q4 details    | 2024-10-01 |   19727 |   18224 |   18786 |   18316 |   19204
     7 | q4 details    | 2024-10-02 |   18485 |   19082 |   18736 |   18077 |   18339
     7 | q4 details    | 2024-10-03 |   19107 |   18576 |   18504 |   19477 |   18026
...
     7 | q4 details    | 2024-12-29 |   17895 |   18489 |   18194 |   18457 |   19344
     7 | q4 details    | 2024-12-30 |   17740 |   19918 |   18230 |   18650 |   18533
     7 | q4 details    | 2024-12-31 |   18786 |   18871 |   18440 |   18704 |   18210
     8 | q4 totals     |            | 1716334 | 1709330 | 1705079 | 1711495 | 1706742
     9 | annual totals |            | 5299685 | 5289312 | 5283118 | 5282794 | 5276141

Why This Approach Works Wonders

This query structure offers several advantages:

  1. Modularity: Each CTE handles a specific calculation, making the query easier to understand and maintain.

  2. Readability: Breaking the complex logic into smaller pieces makes the query flow logically.

  3. Efficiency: PostgreSQL can optimize CTEs, potentially improving performance.

  4. Reusability: The intermediate CTEs can be referenced multiple times.

Practical Applications

This query pattern is ideal for:

  • Creating sales reports by product, time period, or category

  • Analyzing seasonal trends in product performance

  • Preparing data for visualization dashboards

  • Generating quarterly business reviews

Optimization

Using Explain and Analyze to obtain information that can help minimize the query’s running time is the subject of another post, but even just looking at what the CTEs are doing, it becomes clear that the query would benefit from indexing both date, and item and date as a composite index.

CREATE INDEX sales_idx_date ON "sales" ("date");
CREATE INDEX sales_idx_item_date ON "sales" ("item","date");

Because indexes are expensive, which ones to use or whether to even use them at all depends on the size of the dataset, the frequency that it gets added to, and other factors.

The index on the item and date columns can improve the performance of the previous query for several reasons:

  • Filtering Efficiency: The query involves filtering the sales table by specific item values (e.g., 'Crystal Balls', 'Potion Bottles', etc.) and grouping by date. An index on these columns allows the database to quickly locate the relevant rows for each item and date combination, reducing the amount of data that needs to be scanned.

  • Join Optimization: The query uses multiple CTEs that join on the date column. The index on date will speed up these joins by allowing the database to efficiently match rows based on the date.

  • Aggregation Speed: The query performs aggregation operations (e.g., SUM(sold)) grouped by date. The index on date helps the database quickly access and group the relevant rows, improving the speed of these operations.

  • Reduced I/O: By narrowing down the search space, the index reduces the amount of data that needs to be read from disk, leading to faster query execution times.

Overall, the index helps the database engine to quickly locate and process the relevant data, which is especially beneficial for large datasets like the one in the sales table.

Conclusion

Common Table Expressions are a powerful tool for transforming raw data into meaningful business insights. This query demonstrates how complex reporting requirements can be met with a well-structured SQL approach.

Whether you're tracking magical merchandise or mundane products, the principles remain the same: break down complex problems into manageable pieces, maintain a logical flow, and let your SQL do the heavy lifting.

0
Subscribe to my newsletter

Read articles from Steve Meckman directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Steve Meckman
Steve Meckman