Magical Merchandise: Analyzing Sales Data with PostgreSQL CTEs

As data wizards know, the real magic happens not with wands and potions, but with powerful SQL queries. Today, I'll break down an enchanting PostgreSQL query that uses Common Table Expressions (CTEs) to transform raw sales data into quarterly and annual insights for a magical merchandise shop.
We are going to start out with a table that presents data like this:
id | date | item | sold
--------+------------+---------------------+------
56 | 2024-05-17 | Flying Carpets | 28
556 | 2024-09-29 | Invisibility Cloaks | 34
1031 | 2025-01-17 | Magic Wands | 73
...
and use a single query to turn it into a report that looks like this:
order | label | date | balls | bottles | carpets | cloaks | wands
-------+---------------+------------+---------+---------+---------+---------+---------
1 | q1 details | 2024-03-23 | 18788 | 18441 | 18353 | 19421 | 18761
...
2 | q1 totals | | 166617 | 165033 | 164572 | 167103 | 168202
3 | q2 details | 2024-04-01 | 18694 | 18974 | 18204 | 18270 | 18891
...
4 | q2 totals | | 1703406 | 1697453 | 1698596 | 1696089 | 1690337
5 | q3 details | 2024-07-01 | 18149 | 19056 | 17847 | 18151 | 18466
...
6 | q3 totals | | 1713328 | 1717496 | 1714871 | 1708107 | 1710860
7 | q4 details | 2024-10-01 | 19727 | 18224 | 18786 | 18316 | 19204
...
8 | q4 totals | | 1716334 | 1709330 | 1705079 | 1711495 | 1706742
9 | annual totals | | 5299685 | 5289312 | 5283118 | 5282794 | 5276141
While it may at first look large and cumbersome, this query will create an entire report in a single query, using SQL to do what often becomes multiple database calls and iterating through collections in code to calculate aggregates. With proper indexing, the report can be created extremely quickly. This article will walk you through every section of the query, and hopefully help you see the patterns and help you come up with ways you can use some of these strategies in your own environment.
Getting Set Up
First let’s get a table set up. These examples are using PostgreSQL, and although CTEs are featured by all major RDBMS vendors, there may be some minor variations in the SQL code to make it work on other platforms.
CREATE TABLE sales (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
date date NOT NULL,
item text NOT NULL,
sold integer NOT NULL
);
Additional resources
Here is a link to a compressed CSV file containing sample data that can be used for this exercise. Use your favorite tool to load it into the table, for example with PSQL you can simply do:
\copy sales FROM ./sales_data.csv DELIMITER ',' CSV HEADER
Once the data is loaded in the table, you’re ready to go.
The Query's Purpose
This SQL query analyzes sales data for five magical items:
Crystal Balls
Potion Bottles
Flying Carpets
Invisibility Cloaks
Magic Wands
It calculates daily, quarterly, and annual sales figures for 2024, presenting the data in a structured report.
Understanding CTEs: SQL's Secret Spell
Common Table Expressions (CTEs) act like temporary tables that exist only for the duration of the query. They make complex queries more readable by breaking them into logical sections, much like how a complex spell might be broken down into manageable incantations.
Step-by-Step Query Breakdown
1. Getting a List of Distinct Dates
First we want to obtain a list of all the dates for which there are sales figures. This could easily be done with SELECT DISTINCT
:
WITH
dates AS (
SELECT DISTINCT date
FROM sales
WHERE date >= '2024-01-01' AND date < '2025-01-01'
),
However SELECT DISTINCT
is a very expensive operation and one that should be avoided if possible. Instead, the same range can be done with the PostgreSQL generate_series
function. Other RDBMS vendors may provide similar functions, but the implementation may differ.
dates AS (
SELECT t.date::date
FROM generate_series(
(SELECT MIN(date) FROM sales),
(SELECT MAX(date) FROM sales),
interval '1 day')
AS t(date)
This CTE returns a list of all the dates in the year range. It will be used as a source of truth for every available date that there are sales numbers available for.
2. Gathering Item-Specific Daily Sales
The next five CTEs collect daily sales data for each product:
WITH balls AS (
SELECT date, SUM(sold) AS daily_sales
FROM sales
WHERE item = 'Crystal Balls'
GROUP BY date
),
/* Similar CTEs for bottles, carpets, cloaks, and wands */
Each CTE filters the sales table for a specific item and calculates daily sales totals.
3. Combining Daily Sales Across Products
daily_sales AS (
SELECT d.date AS date, ba.daily_sales AS balls, bo.daily_sales AS bottles, ca.daily_sales AS carpets,
cl.daily_sales AS cloaks, w.daily_sales AS wands
FROM dates d
LEFT JOIN balls ba ON ba.date = d.date
LEFT JOIN bottles bo ON bo.date = d.date
LEFT JOIN carpets ca ON ca.date = d.date
LEFT JOIN cloaks cl ON cl.date = d.date
LEFT JOIN wands w ON w.date = d.date
),
This CTE creates a consolidated view with one row per date and columns for sales of each product. The LEFT JOIN
ensures we include all dates, including those where some products may not have had sales recorded.
Removing the comma from the line above the start of this CTE, and removing the CTE definition (daily_sales as (
and accompanying closing parenthesis at the end, yields the following result set:
date | balls | bottles | carpets | cloaks | wands
------------+-------+---------+---------+--------+-------
2024-03-23 | 18788 | 18441 | 18353 | 19421 | 18761
2024-03-24 | 18915 | 18901 | 18370 | 18078 | 18951
2024-03-25 | 17855 | 18741 | 18124 | 18867 | 18846
2024-03-26 | 18732 | 17657 | 17669 | 17694 | 18329
2024-03-27 | 18050 | 16989 | 18728 | 17356 | 18357
2024-03-28 | 18520 | 19171 | 18162 | 18493 | 19038
2024-03-29 | 18094 | 18251 | 18642 | 19380 | 19012
2024-03-30 | 19143 | 18676 | 17133 | 18562 | 18254
2024-03-31 | 18520 | 18206 | 19391 | 19252 | 18654
2024-04-01 | 18694 | 18974 | 18204 | 18270 | 18891
2024-04-02 | 18782 | 19615 | 18751 | 18398 | 18219
2024-04-03 | 18129 | 19284 | 18899 | 18890 | 18679
...
You have taken vertical data where every item was on an individual column, and turned it horizonal, giving each item its own column.
4. Calculating Quarterly Totals
q1_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles,
SUM(carpets) AS carpets, SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-01-01' AND date < '2024-04-01'
),
/* Similar CTEs for Q2, Q3, and Q4 */
These CTEs calculate the total sales for each product within each quarter of 2024.
5. Calculating Annual Totals
annual_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles,
SUM(carpets) AS carpets, SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-01-01' AND date < '2025-01-01'
)
This CTE calculates the total sales for each product across the entire year.
6. Producing the Final Report
SELECT 1, 'q1 details', date, balls, bottles, carpets, cloaks, wands
FROM daily_sales
WHERE date >= '2024-01-01' AND date < '2024-04-01'
UNION ALL
SELECT 2, 'q1 totals', null, balls, bottles, carpets, cloaks, wands
FROM q1_sales
/* Similar SELECT statements for other quarters and annual totals */
The final part of the query uses UNION ALL
to combine:
Daily sales details for each quarter
Quarterly summary totals
Annual totals
The first column (1, 2, 3, etc.) creates a sort order, while the second column adds descriptive labels.
The Full Query
Here is the full query in all its glory:
WITH
dates AS (
SELECT t.date::date
FROM generate_series(
(SELECT MIN(date) FROM sales),
(SELECT MAX(date) FROM sales),
interval '1 day')
AS t(date)
),
balls AS (
SELECT date, SUM(sold) AS daily_sales
FROM sales
WHERE item = 'Crystal Balls'
AND date >= '2024-01-01' AND date < '2025-01-01'
GROUP BY date
),
bottles AS (
SELECT date, SUM(sold) AS daily_sales
FROM sales
WHERE item = 'Potion Bottles'
AND date >= '2024-01-01' AND date < '2025-01-01'
GROUP BY date
),
carpets AS (
SELECT date, SUM(sold) AS daily_sales
FROM sales
WHERE item = 'Flying Carpets'
AND date >= '2024-01-01' AND date < '2025-01-01'
GROUP BY date
),
cloaks AS (
SELECT date, SUM(sold) AS daily_sales
FROM sales
WHERE item = 'Invisibility Cloaks'
AND date >= '2024-01-01' AND date < '2025-01-01'
GROUP BY date
),
wands AS (
SELECT date, SUM(sold) AS daily_sales
FROM sales
WHERE item = 'Magic Wands'
AND date >= '2024-01-01' AND date < '2025-01-01'
GROUP BY date
),
daily_sales AS (
SELECT d.date AS date, ba.daily_sales AS balls, bo.daily_sales AS bottles, ca.daily_sales AS carpets,
cl.daily_sales AS cloaks, w.daily_sales AS wands
FROM dates d
LEFT JOIN balls ba ON ba.date = d.date
LEFT JOIN bottles bo ON bo.date = d.date
LEFT JOIN carpets ca ON ca.date = d.date
LEFT JOIN cloaks cl ON cl.date = d.date
LEFT JOIN wands w ON w.date = d.date
),
q1_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-01-01' AND date < '2024-04-01'
),
q2_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-04-01' AND date < '2024-07-01'
),
q3_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-07-01' AND date < '2024-10-01'
),
q4_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-10-01' AND date < '2025-01-01'
),
annual_sales AS (
SELECT SUM(balls) AS balls, SUM(bottles) AS bottles, SUM(carpets) AS carpets,
SUM(cloaks) AS cloaks, SUM(wands) AS wands
FROM daily_sales
WHERE date >= '2024-01-01' AND date < '2025-01-01'
)
SELECT 1 AS order, 'q1 details' AS label, date, balls, bottles, carpets, cloaks, wands
FROM daily_sales
WHERE date >= '2024-01-01' AND date < '2024-04-01'
UNION ALL
SELECT 2 AS order, 'q1 totals' AS label, null, balls, bottles, carpets, cloaks, wands
FROM q1_sales
UNION ALL
SELECT 3 AS order, 'q2 details' AS label, date, balls, bottles, carpets, cloaks, wands
FROM daily_sales
WHERE date >= '2024-04-01' AND date < '2024-07-01'
UNION ALL
SELECT 4 AS order, 'q2 totals' AS label, null, balls, bottles, carpets, cloaks, wands
FROM q2_sales
UNION ALL
SELECT 5 AS order, 'q3 details' AS label, date, balls, bottles, carpets, cloaks, wands
FROM daily_sales
WHERE date >= '2024-07-01' AND date < '2024-10-01'
UNION ALL
SELECT 6 AS order, 'q3 totals' AS label, null, balls, bottles, carpets, cloaks, wands
FROM q3_sales
UNION ALL
SELECT 7 AS order, 'q4 details' AS label, date, balls, bottles, carpets, cloaks, wands
FROM daily_sales
WHERE date >= '2024-10-01' AND date < '2025-01-01'
UNION ALL
SELECT 8 AS order, 'q4 totals' AS label, null, balls, bottles, carpets, cloaks, wands
FROM q4_sales
UNION ALL
SELECT 9 AS order, 'annual totals' AS label, null, balls, bottles, carpets, cloaks, wands
FROM annual_sales
;
And here is the result set it yields:
order | label | date | balls | bottles | carpets | cloaks | wands
-------+---------------+------------+---------+---------+---------+---------+---------
1 | q1 details | 2024-03-23 | 18788 | 18441 | 18353 | 19421 | 18761
1 | q1 details | 2024-03-24 | 18915 | 18901 | 18370 | 18078 | 18951
1 | q1 details | 2024-03-25 | 17855 | 18741 | 18124 | 18867 | 18846
...
1 | q1 details | 2024-03-29 | 18094 | 18251 | 18642 | 19380 | 19012
1 | q1 details | 2024-03-30 | 19143 | 18676 | 17133 | 18562 | 18254
1 | q1 details | 2024-03-31 | 18520 | 18206 | 19391 | 19252 | 18654
2 | q1 totals | | 166617 | 165033 | 164572 | 167103 | 168202
3 | q2 details | 2024-04-01 | 18694 | 18974 | 18204 | 18270 | 18891
3 | q2 details | 2024-04-02 | 18782 | 19615 | 18751 | 18398 | 18219
3 | q2 details | 2024-04-03 | 18129 | 19284 | 18899 | 18890 | 18679
...
3 | q2 details | 2024-06-28 | 18468 | 19168 | 18357 | 19566 | 17682
3 | q2 details | 2024-06-29 | 18601 | 18812 | 17888 | 19691 | 18614
3 | q2 details | 2024-06-30 | 18956 | 18633 | 18368 | 19517 | 17905
4 | q2 totals | | 1703406 | 1697453 | 1698596 | 1696089 | 1690337
5 | q3 details | 2024-07-01 | 18149 | 19056 | 17847 | 18151 | 18466
5 | q3 details | 2024-07-02 | 18880 | 18118 | 17997 | 19045 | 18357
5 | q3 details | 2024-07-03 | 19764 | 18375 | 18685 | 19531 | 18106
...
5 | q3 details | 2024-09-28 | 19428 | 17799 | 18383 | 18848 | 18317
5 | q3 details | 2024-09-29 | 17926 | 19523 | 18332 | 16948 | 18225
5 | q3 details | 2024-09-30 | 17798 | 19186 | 18618 | 17637 | 18864
6 | q3 totals | | 1713328 | 1717496 | 1714871 | 1708107 | 1710860
7 | q4 details | 2024-10-01 | 19727 | 18224 | 18786 | 18316 | 19204
7 | q4 details | 2024-10-02 | 18485 | 19082 | 18736 | 18077 | 18339
7 | q4 details | 2024-10-03 | 19107 | 18576 | 18504 | 19477 | 18026
...
7 | q4 details | 2024-12-29 | 17895 | 18489 | 18194 | 18457 | 19344
7 | q4 details | 2024-12-30 | 17740 | 19918 | 18230 | 18650 | 18533
7 | q4 details | 2024-12-31 | 18786 | 18871 | 18440 | 18704 | 18210
8 | q4 totals | | 1716334 | 1709330 | 1705079 | 1711495 | 1706742
9 | annual totals | | 5299685 | 5289312 | 5283118 | 5282794 | 5276141
Why This Approach Works Wonders
This query structure offers several advantages:
Modularity: Each CTE handles a specific calculation, making the query easier to understand and maintain.
Readability: Breaking the complex logic into smaller pieces makes the query flow logically.
Efficiency: PostgreSQL can optimize CTEs, potentially improving performance.
Reusability: The intermediate CTEs can be referenced multiple times.
Practical Applications
This query pattern is ideal for:
Creating sales reports by product, time period, or category
Analyzing seasonal trends in product performance
Preparing data for visualization dashboards
Generating quarterly business reviews
Optimization
Using Explain
and Analyze
to obtain information that can help minimize the query’s running time is the subject of another post, but even just looking at what the CTEs are doing, it becomes clear that the query would benefit from indexing both date
, and item
and date
as a composite index.
CREATE INDEX sales_idx_date ON "sales" ("date");
CREATE INDEX sales_idx_item_date ON "sales" ("item","date");
Because indexes are expensive, which ones to use or whether to even use them at all depends on the size of the dataset, the frequency that it gets added to, and other factors.
The index on the item and date columns can improve the performance of the previous query for several reasons:
Filtering Efficiency: The query involves filtering the sales table by specific item values (e.g., 'Crystal Balls', 'Potion Bottles', etc.) and grouping by date. An index on these columns allows the database to quickly locate the relevant rows for each item and date combination, reducing the amount of data that needs to be scanned.
Join Optimization: The query uses multiple CTEs that join on the date column. The index on date will speed up these joins by allowing the database to efficiently match rows based on the date.
Aggregation Speed: The query performs aggregation operations (e.g., SUM(sold)) grouped by date. The index on date helps the database quickly access and group the relevant rows, improving the speed of these operations.
Reduced I/O: By narrowing down the search space, the index reduces the amount of data that needs to be read from disk, leading to faster query execution times.
Overall, the index helps the database engine to quickly locate and process the relevant data, which is especially beneficial for large datasets like the one in the sales table.
Conclusion
Common Table Expressions are a powerful tool for transforming raw data into meaningful business insights. This query demonstrates how complex reporting requirements can be met with a well-structured SQL approach.
Whether you're tracking magical merchandise or mundane products, the principles remain the same: break down complex problems into manageable pieces, maintain a logical flow, and let your SQL do the heavy lifting.
Subscribe to my newsletter
Read articles from Steve Meckman directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
