Performance Matters: A SQL Optimization Guide for Data Professionals

Table of contents
- Introduction
- Query optimization techniques
- 1. Select only necessary columns
- 2. Implement the correct indexes
- 3. Apply filters early
- 4. Query caching
- 5. JOINs done right
- 6. Use Common Table Expressions (CTEs)
- 7. Materialized Views
- 8. Substitute IN with EXISTS (when appropriate)
- 9. Prioritize Window Functions over Aggregation Functions
- 10. Avoid using LIMIT and OFFSET for pagination
- Using tools to analyze and improve queries
- Final considerations
- Conclusion

Introduction
As data professionals, we live and breathe SQL. Whether you're building dashboards or developing analytics pipelines, chances are you're constantly working with queries that run against large datasets. And when performance deteriorates, everything downstream suffers: users wait longer, jobs fail, and infrastructure costs spike.
SQL query optimization isn’t just about improving execution time — it's about building scalable, reliable systems that can handle growth sustainably and efficiently. In this post, we’ll go beyond the basics and dive into tactical, reliable strategies to improve your queries' efficiency and make them more robust to changes.
By the end of this article, you’ll learn:
- How to identify poorly written queries
- How to optimize queries
Query optimization techniques
1. Select only necessary columns
🔍 Rule of thumb: Explicitly select only necessary columns.
Using SELECT * retrieves all columns from a table, loading more data into memory than may be needed. This can lead to inefficient queries by increasing memory usage and I/O operations and by slowing down aggregations and transformations.
Instead, explicitly selecting only the columns required for your analysis reduces resource consumption and improves query performance. It ensures the database engine processes only the essential data, making your queries faster and more efficient.
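For example, a minimal before/after sketch, assuming a hypothetical orders table with more columns than the analysis needs:
-- Instead of pulling every column...
SELECT *
FROM orders;
-- ...select only the columns the analysis actually uses
SELECT order_id, customer_id, order_date, total_amount
FROM orders;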
2. Implement the correct indexes
🔍 Rule of thumb: Create indexes on frequently queried columns.
Indexing in a database involves creating a data structure that stores information about specific columns in a table. These indexes are used to speed up data retrieval by allowing the database engine to locate rows more efficiently — just like the index of a book helps you quickly find the page for a specific topic. Without indexes, the database may need to scan the entire table to find the relevant data (a full table scan), which can be slow for large datasets.
Best practices for creating indexes:
- Index columns used in joins, filters, or sorting.
- Avoid over-indexing, especially on large tables.
- Use partial and multicolumn indexes when appropriate for more targeted performance gains.
Keep in mind that:
- Indexes can slow down write operations (INSERT, UPDATE, DELETE, etc.), since each index must also be updated as the underlying data changes.
- Creating indexes requires additional storage, which can be significant for large tables or numerous indexes.
- Applying functions to indexed columns can prevent the index from being used efficiently, unless the index is created on the expression itself.
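For instance, the best practices above might translate into indexes like these (a sketch on a hypothetical orders table; the partial-index syntax shown is PostgreSQL-style and varies by engine):
-- Single-column index on a frequently filtered or joined column
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Multicolumn index supporting a filter on status plus sorting by order_date
CREATE INDEX idx_orders_status_date ON orders (status, order_date);
-- Partial index covering only the rows a hot query actually touches
CREATE INDEX idx_orders_open ON orders (order_date) WHERE status = 'open';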
3. Apply filters early
🔍 Rule of thumb: Apply filters as early as possible in your queries (especially in subqueries, common table expressions (CTEs), or joins) to significantly improve performance.
Early filtering reduces the amount of data the database engine needs to process, transfer, and store in intermediate steps.
Benefits of early filtering:
- Reduces memory and CPU consumption.
- Speeds up query execution by narrowing down data early in the pipeline.
- Helps avoid unnecessary computation on irrelevant rows.
💡 Pro tip: Most SQL engines process the query blocks in this logical order:
1. FROM (including joins, subqueries, referenced CTEs)
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. Window functions
7. QUALIFY (if supported by the system)
8. ORDER BY
9. LIMIT / OFFSET
Filtering as early as possible in the evaluation sequence can go a long way to improving your query's performance.
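A before/after sketch of the idea, assuming a hypothetical events table:
-- Before: the CTE scans all rows, and the filter is only applied at the end
WITH daily_events AS (
    SELECT user_id, event_date, event_type
    FROM events
)
SELECT *
FROM daily_events
WHERE event_date >= '2024-01-01';
-- After: the filter is pushed into the CTE, so less data flows downstream
WITH daily_events AS (
    SELECT user_id, event_date, event_type
    FROM events
    WHERE event_date >= '2024-01-01'
)
SELECT *
FROM daily_events;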
4. Query caching
🔍 Rule of thumb: Store the results of frequently executed queries to avoid repeated computations.
Many databases have built-in support for query caching. When a query is executed, the database stores the result in memory. If the same query is executed again, the database can return the cached result without running the query again.
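As an illustration, in Snowflake (used as an example later in this post) re-running the exact same query text can be served from the result cache when the underlying data hasn't changed, and RESULT_SCAN lets you reuse the previous result explicitly; this is a sketch, and behavior differs from one database to another:
-- First execution computes the result
SELECT customer_id, SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id;
-- Running the identical statement again may be answered from the result cache
-- Snowflake also allows reusing the last result explicitly
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));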
5. JOINs done right
🔍 Rule of thumb: Start with the most restrictive JOIN conditions and avoid including unnecessary tables in your query.
Use restrictive joins early: Begin with the JOIN that filters out the most rows to minimize data processing. This helps limit the number of rows the database needs to process in subsequent operations.
Avoid unnecessary tables: Only include tables that are needed for filtering, joining, or aggregating.
Watch out for join explosion: In many-to-many relationships, be cautious of resulting data growth, which can lead to excessive row multiplication. Always check your JOIN conditions and verify that they don’t unintentionally multiply rows.
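A sketch of these guidelines on a hypothetical sales schema (the optimizer may reorder joins, so treat this as a readability and sanity-check pattern rather than a guarantee):
SELECT o.order_id, c.customer_name, p.product_name
FROM orders AS o
JOIN customers AS c
    ON c.customer_id = o.customer_id
   AND c.country = 'BE'          -- restrictive condition applied with the join
JOIN order_items AS i
    ON i.order_id = o.order_id   -- check this key doesn't multiply rows unexpectedly
JOIN products AS p
    ON p.product_id = i.product_id;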
6. Use Common Table Expressions (CTEs)
🔍 Rule of thumb: Use CTEs to simplify complex queries and recursive operations, but test performance to ensure they don’t add unnecessary overhead.
When CTEs improve performance
- Simplify complex queries: Break down complicated queries into manageable parts, improving readability and maintenance.
- Avoid repeated calculations: CTEs prevent recalculating the same subquery multiple times.
- Recursive queries: For hierarchical data, recursive CTEs are often more efficient than alternatives like self-joins.
When CTEs might hurt performance
- Materialization: Some databases materialize all CTEs, which can add overhead if the result set is large.
- Limits on optimization: CTEs may prevent certain query optimizations like join reordering or filter pushing.
In short, CTEs can boost performance in the right situations, but it’s important to evaluate their impact on a case-by-case basis.
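For the hierarchical case, here is a minimal recursive CTE sketch, assuming a hypothetical employees table with a manager_id self-reference:
WITH RECURSIVE org_chart AS (
    -- Anchor: top-level managers
    SELECT employee_id, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive step: attach each employee to their manager's row
    SELECT e.employee_id, e.manager_id, o.level + 1
    FROM employees AS e
    JOIN org_chart AS o
        ON e.manager_id = o.employee_id
)
SELECT *
FROM org_chart;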
Check out this series for more on SQL Recursion!
7. Materialized Views
🔍 Rule of thumb: Store the results of expensive queries in materialized views.
Precompute and store the results of expensive queries in a materialized view. Unlike regular views, which are computed every time they are queried, materialized views store the query results physically, allowing faster access.
However, materialized views can add a layer of complexity to your solution, as they need to be periodically refreshed to remain up-to-date.
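A minimal sketch, assuming a hypothetical daily revenue aggregation (the exact syntax and refresh options vary by database):
-- Precompute an expensive aggregation once and store it physically
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT order_date, SUM(total_amount) AS revenue
FROM orders
GROUP BY order_date;
-- Queries read the stored result instead of re-aggregating the base table
SELECT order_date, revenue
FROM daily_revenue
WHERE order_date >= '2024-01-01';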
8. Substitute IN with EXISTS (when appropriate)
🔍 Rule of thumb: Use EXISTS when filtering based on the presence of related rows, especially with large or indexed subqueries. Use IN when dealing with small, static lists.
When filtering based on related rows, using EXISTS can often lead to better performance than IN for the following reasons:
- EXISTS short-circuits: it stops scanning as soon as it finds a match.
- EXISTS can leverage indexes in the correlated subquery.
- IN evaluates the entire subquery result, even if only one match is needed.
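A before/after sketch on hypothetical customers and orders tables, in line with the reasoning above (actual behavior depends on the optimizer):
-- Using IN: the subquery result is evaluated as a whole
SELECT c.customer_id, c.customer_name
FROM customers AS c
WHERE c.customer_id IN (
    SELECT o.customer_id
    FROM orders AS o
    WHERE o.order_date >= '2024-01-01'
);
-- Using EXISTS: the engine can stop at the first matching order per customer
SELECT c.customer_id, c.customer_name
FROM customers AS c
WHERE EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2024-01-01'
);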
9. Prioritize Window Functions over Aggregation Functions
🔍 Rule of thumb: Use Window Functions instead of Aggregation Functions when possible.
Window functions let you compute aggregates alongside row-level detail in a single pass over the data, avoiding the overhead of grouping and then joining the results back, which reduces processing time, especially with large datasets.
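A sketch of the pattern on a hypothetical orders table: attaching a per-customer total to each row with a window function instead of aggregating and joining back:
-- Aggregation + self-join approach: the table is read and joined twice
SELECT o.order_id, o.customer_id, o.total_amount, t.customer_total
FROM orders AS o
JOIN (
    SELECT customer_id, SUM(total_amount) AS customer_total
    FROM orders
    GROUP BY customer_id
) AS t
    ON t.customer_id = o.customer_id;
-- Window function approach: one pass, no extra join
SELECT order_id, customer_id, total_amount,
       SUM(total_amount) OVER (PARTITION BY customer_id) AS customer_total
FROM orders;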
10. Avoid using LIMIT and OFFSET for pagination
🔍 Rule of thumb: Use keyset pagination instead.
Using LIMIT and OFFSET for pagination can lead to performance issues with large datasets, as high OFFSET values cause the database to scan and skip many rows.
A more efficient solution is keyset pagination, which uses a reference point, such as the last retrieved record's ID. This method fetches the next set of rows by querying for records greater than the last seen value, improving performance by eliminating unnecessary row scans and providing more consistent results, especially with large or frequently updated datasets.
Before (LIMIT and OFFSET):
-- Fetching rows 1001 to 1010
SELECT *
FROM <table>
ORDER BY id
LIMIT 10 OFFSET 1000
After (keyset pagination):
-- Fetching rows 1001 to 1010
SELECT *
FROM <table>
WHERE id > 1000 -- last retrieved id from the previous query
ORDER BY id
LIMIT 10
Using tools to analyze and improve queries
Most database systems provide tools to monitor and troubleshoot query performance. For example, Snowflake offers the EXPLAIN command to visualize the query execution plan.
Use these tools to check if your query:
- Performs undesired full table scans.
- Includes unexpectedly expensive operations.
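For instance, prefixing a statement with EXPLAIN (shown here with the keyset pagination query from above; exact output and options vary by engine) returns the execution plan instead of the rows, so you can spot full table scans or expensive operators:
EXPLAIN
SELECT *
FROM <table>
WHERE id > 1000
ORDER BY id
LIMIT 10;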
You can also log queries that exceed a certain runtime threshold. This helps track performance trends over time—especially as your data grows. Monitoring this way allows you to identify bottlenecks early and take corrective action before they impact your application or users.
Final considerations
- Regularly review slow query logs to stay ahead of performance issues.
- Benchmark performance before and after changes to understand their real impact.
- Even small query optimizations can scale massively in production environments.
- Make performance tuning a habit—optimize early and often.
- Treat query performance like code quality—invest in profiling and observability from the start.
Conclusion
Query optimization isn’t just about shaving milliseconds—it's about writing smarter SQL that scales with your data, keeps systems responsive, and saves costs in compute-heavy environments.
By applying the right optimization strategies, like filtering early, indexing strategically, avoiding SELECT *, and leveraging tools like execution plans and window functions, you can drastically improve performance and maintainability.
🚀 The best-performing query is the one that does the least work to get exactly what you need.
Mastering these techniques not only improves your queries—it elevates your skills as a data professional. The more intentional you are with your SQL, the more control you have over how your systems perform.
Written by

Elie Fayad
I'm a data professional specializing in SQL and Snowflake, with a strong background in cloud migrations, data platform configuration, ETL/ELT pipeline development, data modeling, and workflow orchestration. I'm proactive, eager to learn, and passionate about tackling new challenges. I enjoy exploring emerging technologies and sharing knowledge with the community!