Database indexing
When it comes to database performance, optimized and fast queries are essential for efficient data retrieval and manipulation. In today's data-driven world, where applications often handle massive amounts of information, the speed at which a database can execute queries can significantly impact or break user experience and overall system performance.
Database indexing is a powerful technique for improving the performance of read queries. In this blog, we'll explore the most effective ways to use indexing and when it’s best to avoid it.
Anatomy of an Index
“An index makes the query fast”
This is the basic explanation of the index and describes the most important use case of an index.
An index is a distinct data structure in the database that requires its own space of the indexed table data. This means that indexing is a form of redundancy in terms of memory, similar to a book index: it references actual information located elsewhere and occupies its own space.
There are various types of indexing, but this blog will specifically focus on B-tree indexing.
The problem with indexing
While indexing can significantly boost query performance, it's not a silver bullet for all database performance issues. In fact, improper use of indices can sometimes lead to decreased performance or wasted resources.
Ideal scenarios for indexing
Don't add an index unless you actually need it.
When considering indexing in databases, it's essential to balance performance improvement with the potential overhead that indexes introduce. Indexes can significantly speed up data retrieval, but they also consume additional disk space and slow down write operations like INSERT
, UPDATE
, and DELETE
.
Let’s discuss some use cases where indexing could be useful.
1. Frequent SELECT Queries with Filtering
If your application frequently retrieves records using a WHERE
clause on a specific column (or set of columns), adding an index can speed up these queries. This is especially true for large datasets where scanning the entire table for every query would be inefficient.
Example:
SELECT * FROM users WHERE email = 'example@example.com';
Indexing the email
column would make this query significantly faster, as the database can quickly locate the row without scanning the entire table.
2. Sorting (ORDER BY)
If your queries often involve sorting rows by a specific column using ORDER BY
, creating an index on that column can make the sorting process faster. This is especially helpful for large datasets where sorting without an index could be slow.
Example:
SELECT * FROM products ORDER BY price DESC;
Creating an index on the price
column speeds up the sorting operation.
2.1 Sorting more than one column
This section is based on insights from SQL Performance Explained by Markus Winand
SQL queries with an order by clause do not need to sort the result explicitly if the relevant index already delivers the rows in the required order. That means the same index that is used for the where clause must also cover the order by clause.
Assume that we got a table sales that consists of three columns (sale_date, product_id, quantity)
Consider the following query that selects yesterday’s sales ordered by sale data and product ID:
SELECT sale_date, product_id, quantity
FROM sales
WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
ORDER BY sale_date, product_id;
Let’s try to create and index on the column in which we fetch in the WHERE clause
CREATE INDEX sales_date ON sales (sale_date);
The database must perform an explicit sort operation to satisfy the ORDER BY clause, let’s take a look on the execution plan:
The execution plan indicates index usage, however we can take an advantage of indexing properties to avoid explicit sorting operation.
Let’s drop the index and use multi-column index (Composite index) that contains both (sale_date, product_id)
DROP INDEX sales_date;
CREATE INDEX sales_dt_pr ON sales (sale_date, product_id);
The sort operation ORDER BY disappeared from the execution plan even though the query still has an order by clause. The database exploits the index order and skips the explicit sort operation, that’s because If the index order corresponds to the order by clause, the database can omit the explicit sort operation.
3. Range Queries
Range queries are situations where you need to retrieve a subset of data that falls within a specific range of values. Indexes can significantly speed up these queries by allowing the database to quickly locate the starting point and scan through the relevant rows.
Example:
SELECT * FROM products WHERE price BETWEEN 100 AND 500;
In this case, indexing the price
column allows the database to quickly find the range of prices without having to scan the entire products
table.
4. Full-Text Search
If your application requires searching through large text fields (like articles or comments), implementing a full-text index can significantly enhance search capabilities and performance.
Example:
SELECT * FROM articles WHERE MATCH(content) AGAINST('database indexing' IN NATURAL LANGUAGE MODE);
Full-text indexes are optimized for searching and can return results much faster than a regular index.
The drawbacks of indexing
Increased storage requirements: Each index requires additional storage space, as it essentially creates a separate data structure to enable faster lookups. The more indexes you add, the more storage space is consumed.
Slower write operations: When you insert, update, or delete data in a table, any indexes on that table must also be updated. This means that write operations become slower as the number of indexes increases, since the database has to maintain consistency across all indexes.
Impact on overall database performance: Too many indexes can cause the query optimizer to spend more time deciding which index to use, potentially slowing down query execution.
Given these drawbacks of indexing, several scenarios exist where implementing indexes may be unnecessary or even counterproductive:
Small tables: For tables with a small number of rows the overhead of maintaining indexes often outweighs their benefits. Sequential scans can be faster in these cases.
Tables with frequent updates: If a table undergoes frequent
INSERT
,UPDATE
, orDELETE
operations, the constant need to update indexes can significantly slow down these write operations. In such cases, the performance gain in read operations might be offset by the performance loss in write operations.Columns with low selectivity: Indexes are less effective on columns with low selectivity (i.e., columns with many duplicate values). For example, a boolean column or a column with only a few distinct values might not benefit much from an index.
When using full-text search capabilities: For full-text search operations especially in searching for text prefix, specialized full-text indexes are often more appropriate than standard B+ tree indexes.
In conclusion, while database indexing is a powerful tool for enhancing query performance, it must be used judiciously. Properly implemented indexes can significantly speed up data retrieval, especially in scenarios involving frequent SELECT
queries, sorting, range queries, and full-text searches.
However, the benefits of indexing come with trade-offs, including increased storage requirements and slower write operations. It's crucial to evaluate the specific needs of your application and database to determine when and where to apply indexing. By understanding the key do's and don'ts of indexing, you can optimize your database performance and ensure a balanced approach to data management.
Subscribe to my newsletter
Read articles from Omar Ashour directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by