Indexing in Databases
When working with databases, performance is often a key concern, especially as data grows in size. Efficient data retrieval becomes critical for applications that demand quick query results. One of the most effective ways to enhance database performance is through indexing.
In this blog, we'll delve into the concept of indexing in databases, understand how it works, explore different types of indexes, and walk through practical examples.
What is Indexing in Databases?
In simple terms, indexing in databases is a data structure technique that allows for quick data retrieval. Think of an index in a database like an index in a book. Instead of scanning every page of the book to find a particular word, you use the index to jump directly to the relevant page. Similarly, database indexes help in speeding up the query process by reducing the amount of data the system needs to scan.
Why Indexing?
Without indexes, the database engine scans the entire table (a full table scan) to fetch the required data. This can be slow, especially when dealing with millions or even billions of rows. Indexing helps in:
Improving query performance: By minimizing the number of rows that need to be scanned.
Optimizing sorting and filtering: Queries with ORDER BY, WHERE, and JOIN clauses benefit significantly from proper indexing.
Enforcing uniqueness: Unique indexes ensure that values in specific columns are unique (like the primary key).
However, while indexes can boost query performance, they come with trade-offs like additional storage space and slower write operations (inserts, updates, and deletes) since the index needs to be updated.
How Does Indexing Work?
Indexes are typically built using B-trees (balanced tree data structures), which enable efficient searching, inserting, and deleting of records. When a query is executed on a column with an index, the database engine navigates the index structure (like traversing a tree) instead of scanning every row in the table.
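To make the idea concrete, here is a minimal Python sketch. It does not implement a real B-tree; a sorted list searched with binary search stands in for the tree, and the rows and names are made up for illustration. The point is only to contrast checking every row with navigating a sorted structure.

```python
import bisect

# Hypothetical in-memory "table": rows stored in arbitrary order.
rows = [
    {"employee_id": 42, "name": "Dana"},
    {"employee_id": 7, "name": "Ali"},
    {"employee_id": 19, "name": "Mei"},
    {"employee_id": 3, "name": "Sam"},
]


def full_scan(rows, employee_id):
    """Check every row until a match is found (what happens without an index)."""
    for row in rows:
        if row["employee_id"] == employee_id:
            return row
    return None


# The "index": sorted (key, row position) pairs kept separately from the rows.
index = sorted((row["employee_id"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def index_lookup(rows, employee_id):
    """Binary-search the sorted keys, then jump straight to the matching row."""
    i = bisect.bisect_left(keys, employee_id)
    if i < len(keys) and keys[i] == employee_id:
        return rows[index[i][1]]
    return None


print(full_scan(rows, 19))
print(index_lookup(rows, 19))
```

Both functions return the same row, but the indexed lookup needs a number of comparisons that grows with the logarithm of the table size rather than with the table size itself.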
Types of Indexes
Different types of indexes serve different purposes, each with its advantages and trade-offs.
1. Primary Index
A primary index is automatically created when a primary key is defined on a table.
It ensures the uniqueness of records in the column(s) and accelerates data retrieval.
Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(50),
    position VARCHAR(50)
);
In the above example, employee_id is indexed automatically because it is the primary key. Queries searching by employee_id will be faster.
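If you want to see this for yourself, the following sketch uses Python's built-in sqlite3 module with an in-memory database. SQLite is my choice for the demo, not something this post depends on, and the automatic index name it reports is SQLite-specific; other databases name and store the primary key index differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT PRIMARY KEY,"
    " name VARCHAR(50),"
    " position VARCHAR(50))"
)

# List the indexes SQLite created on its own for this table.
auto_indexes = [row[1] for row in conn.execute("PRAGMA index_list(employees)")]

# Ask the planner how it would run a lookup by primary key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE employee_id = ?",
    (12345,),
).fetchall()
plan_detail = " ".join(row[3] for row in plan)

print(auto_indexes)
print(plan_detail)
```

The plan should describe a search through the automatically created index rather than a scan of the whole table.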
2. Secondary Index (Non-Clustered Index)
A secondary index is created manually on columns to speed up search queries. These indexes do not alter the physical order of data in the table but create a separate index structure to point to the data.
Example:
CREATE INDEX idx_position ON employees (position);
In this case, the index on position will make queries filtering by the position column faster, such as:
SELECT * FROM employees WHERE position = 'Manager';
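One way to check that the index is actually picked up is to compare the query plan before and after creating it. The sketch below does this in SQLite through Python's sqlite3 module; plan_for is a small helper I made up for the demo, and the exact plan wording varies between SQLite versions and between databases.

```python
import sqlite3


def plan_for(conn, sql):
    """Return SQLite's query-plan description for a statement as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT PRIMARY KEY,"
    " name VARCHAR(50),"
    " position VARCHAR(50))"
)
query = "SELECT * FROM employees WHERE position = 'Manager'"

plan_before = plan_for(conn, query)  # no index on position yet
conn.execute("CREATE INDEX idx_position ON employees (position)")
plan_after = plan_for(conn, query)  # the planner can now use idx_position

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan is a scan of the table; afterwards it names idx_position.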
3. Unique Index
A unique index ensures that all values in the indexed column(s) are unique. Unlike primary indexes, unique indexes can be applied to non-primary key columns.
Example:
CREATE UNIQUE INDEX idx_email ON employees (email);
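This guarantees that no two employees can have the same email address. Note that it assumes the employees table has an email column, which the CREATE TABLE statement shown earlier does not define.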
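Here is a small sketch of the constraint in action, again using SQLite from Python. The email column and the sample rows are assumptions added for the demo; the second insert reuses an address and should be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: an email column is added here; the earlier table has none.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT PRIMARY KEY,"
    " name VARCHAR(50),"
    " position VARCHAR(50),"
    " email VARCHAR(100))"
)
conn.execute("CREATE UNIQUE INDEX idx_email ON employees (email)")

conn.execute(
    "INSERT INTO employees VALUES (1, 'Alice', 'Developer', 'alice@example.com')"
)
try:
    # Same email as the first row, so the unique index should refuse it.
    conn.execute(
        "INSERT INTO employees VALUES (2, 'Alicia', 'Manager', 'alice@example.com')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print("rows stored:", row_count)
```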
4. Clustered Index
A clustered index defines the physical order of data in a table. Every table can have only one clustered index because data can only be sorted in one order.
- In several systems (for example SQL Server and MySQL's InnoDB engine), the primary key serves as the clustered index by default.
Example:
CREATE CLUSTERED INDEX idx_emp_id ON employees (employee_id);
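Here, the table's data will be stored in the order of employee_id. Note that CREATE CLUSTERED INDEX is SQL Server syntax, and if employee_id is already the primary key, the table will usually have a clustered index on it already.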
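SQLite has no CREATE CLUSTERED INDEX statement, so the sketch below uses the closest analogue I know of: a WITHOUT ROWID table, which SQLite stores as a B-tree keyed on the primary key. It is a stand-in for the idea, not the SQL Server statement above. Rows are inserted out of order and read back without an ORDER BY; the order you get is a consequence of how the rows are stored, not something SQL guarantees, so real queries should still sort explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes the primary key B-tree the table's own storage.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT PRIMARY KEY,"
    " name VARCHAR(50)"
    ") WITHOUT ROWID"
)

# Insert rows deliberately out of key order (sample data is made up).
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(30, "Cara"), (10, "Abe"), (20, "Bea")],
)

# A plain scan walks the storage B-tree, so rows come back in key order.
stored_order = [row[0] for row in conn.execute("SELECT employee_id FROM employees")]
print(stored_order)
```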
5. Composite Index
A composite index is an index that is created on more than one column. It is useful when queries often filter or search by multiple columns together.
Example:
CREATE INDEX idx_name_position ON employees (name, position);
This index will be used for queries that filter by both name and position (and, because name is the leading column, it can also serve filters on name alone), such as:
SELECT * FROM employees WHERE name = 'John' AND position = 'Manager';
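Column order matters in a composite index. The sketch below, using SQLite from Python, compares the query plans for three filters: both columns, the leading column only, and the second column only. The plan_for helper is made up for the demo, and other databases may behave differently (some can skip-scan an index on a non-leading column).

```python
import sqlite3


def plan_for(conn, sql):
    """Return SQLite's query-plan description for a statement as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT PRIMARY KEY,"
    " name VARCHAR(50),"
    " position VARCHAR(50))"
)
conn.execute("CREATE INDEX idx_name_position ON employees (name, position)")

plan_both = plan_for(
    conn, "SELECT * FROM employees WHERE name = 'John' AND position = 'Manager'"
)
plan_name_only = plan_for(conn, "SELECT * FROM employees WHERE name = 'John'")
plan_position_only = plan_for(
    conn, "SELECT * FROM employees WHERE position = 'Manager'"
)

print("both columns: ", plan_both)
print("name only:    ", plan_name_only)
print("position only:", plan_position_only)
```

In this setup the first two plans search idx_name_position, while the filter on position alone cannot seek into the index and falls back to a scan. If you often filter by position on its own, it needs its own index or a composite index that leads with position.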
6. Full-Text Index
A full-text index is designed for searching large text-based columns. It allows for more efficient text searching than using LIKE queries.
Example:
CREATE FULLTEXT INDEX idx_description ON products (description);
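This will speed up search queries that involve looking for specific terms or phrases in the description column. The FULLTEXT syntax shown here is MySQL's; other databases expose full-text search through different syntax.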
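Full-text indexes are commonly built on some form of inverted index, a map from each term to the rows that contain it. The following is a toy Python sketch of that idea, not how any particular database implements FULLTEXT. The product descriptions and function names are made up, and real engines add stemming, stop words, and relevance ranking on top.

```python
import re
from collections import defaultdict

# Hypothetical product descriptions keyed by product id.
descriptions = {
    1: "Wireless mouse with ergonomic design",
    2: "Mechanical keyboard with wireless receiver",
    3: "USB-C charging cable",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


inverted = build_inverted_index(descriptions)
print(search(inverted, "wireless"))
print(search(inverted, "wireless keyboard"))
```

A LIKE '%wireless%' query has to read every description, whereas the inverted index goes straight from the term to the matching ids.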
How to Choose the Right Index?
Choosing the right index depends on the types of queries run on the database. Here are a few guidelines:
Frequent Lookups: Index columns that are frequently used in WHERE clauses. If a column is often searched, an index will drastically improve lookup performance.
Sorting and Grouping: Index columns that are frequently used in ORDER BY, GROUP BY, or JOIN operations.
Avoid Indexing Small Tables: For small tables (with a few hundred rows), indexing might not be necessary since the database can quickly scan the entire table.
Avoid Over-indexing: Indexes take up disk space and slow down write operations (inserts, updates, and deletes). Adding too many indexes can result in diminishing returns.
Practical Examples of Indexing
Example 1: Basic Index
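Let’s assume we have a table employees with 1 million records and that, unlike the table defined earlier, employee_id is not declared as a primary key, so it has no index yet. The most frequent query is to fetch an employee by their employee_id.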
Without indexing:
SELECT * FROM employees WHERE employee_id = 12345;
The database performs a full table scan, checking each row to find a match. This is time-consuming as the database needs to inspect 1 million rows.
With indexing:
CREATE INDEX idx_employee_id ON employees (employee_id);
Now, when the same query is executed, the database searches the index instead of the table. Finding the key takes a number of steps that grows roughly with the logarithm of the row count, after which the matching row is fetched directly.
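You can measure the difference yourself with a rough sketch like the one below. It uses SQLite from Python with 100,000 generated rows rather than 1 million to keep the run short, and it assumes employee_id is a plain column with no primary key. The timed_lookup helper is made up for the demo, and the timings depend entirely on your machine, so treat them as an illustration rather than a benchmark.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Assumption: employee_id is a plain column, so no index exists at first.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INT, name VARCHAR(50), position VARCHAR(50))"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    ((i, f"name_{i}", "Developer") for i in range(100_000)),
)


def timed_lookup(conn, employee_id, repeats=20):
    """Run the same lookup several times; return the row and elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        row = conn.execute(
            "SELECT * FROM employees WHERE employee_id = ?", (employee_id,)
        ).fetchone()
    return row, time.perf_counter() - start


row_before, seconds_before = timed_lookup(conn, 99_999)
conn.execute("CREATE INDEX idx_employee_id ON employees (employee_id)")
row_after, seconds_after = timed_lookup(conn, 99_999)

print(f"without index: {seconds_before:.4f}s")
print(f"with index:    {seconds_after:.4f}s")
```

The query returns the same row both times; only the amount of work needed to find it changes.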
Example 2: Composite Index
Suppose you frequently run a query that filters by both name and position in a table:
SELECT * FROM employees WHERE name = 'Alice' AND position = 'Developer';
A composite index covering both columns is typically more efficient for this query than separate single-column indexes:
CREATE INDEX idx_name_position ON employees (name, position);
Indexing Trade-offs
While indexing can dramatically speed up query performance, it’s important to be aware of its trade-offs:
Storage Cost: Indexes require additional disk space.
Slower Write Operations: Every time data is inserted, updated, or deleted, the corresponding indexes must be updated. This adds overhead.
Index Maintenance: As the data changes, indexes can become fragmented and may need to be rebuilt or reorganized periodically to maintain performance.
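The storage cost is easy to observe. The sketch below writes a small SQLite database to a temporary file from Python and compares the file size before and after adding one index. The function name, row count, and generated data are all assumptions for the demo, and the exact sizes will differ by database and data.

```python
import os
import sqlite3
import tempfile


def index_storage_cost(row_count=50_000):
    """Return (bytes before, bytes after) adding one index to a file-backed table."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE employees ("
            " employee_id INT, name VARCHAR(50), position VARCHAR(50))"
        )
        conn.executemany(
            "INSERT INTO employees VALUES (?, ?, ?)",
            ((i, f"name_{i}", "Developer") for i in range(row_count)),
        )
        conn.commit()  # flush the inserted rows to the file
        size_before = os.path.getsize(path)

        conn.execute("CREATE INDEX idx_name ON employees (name)")
        conn.commit()  # make sure the index pages are on disk too
        size_after = os.path.getsize(path)

        conn.close()
        return size_before, size_after


before, after = index_storage_cost()
print(f"table only: {before} bytes, with index: {after} bytes")
```

Every additional index adds its own pages to the file, and each insert, update, or delete on an indexed column has to touch those pages as well.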
Conclusion
Indexing is a powerful technique that can greatly improve the performance of your database queries. However, like any tool, it should be used wisely. Over-indexing can lead to unnecessary overhead, while under-indexing can result in sluggish query performance. By carefully analyzing your query patterns and applying the appropriate indexes, you can strike a balance between query speed and system efficiency.
That's it for now. Did you like this blog? Please let me know.
Written by Saarthak Maini