Indexing in Databases: A Comprehensive Guide
1. What is Indexing?
Indexing is a technique used by databases (such as MongoDB, MySQL, PostgreSQL) to make data retrieval faster. Without indexing, the database must scan each row or document in the collection/table to find the data you requested. This is called a full table/collection scan and can be very slow for large datasets.
An index is a data structure (typically a B-Tree or a Hash Table) that stores a subset of the database's data in a structured way to make lookups faster. Think of it like the index at the back of a book, which helps you quickly find pages related to a particular topic instead of reading the whole book.
2. How Does Indexing Work?
Imagine you have a table of job postings with fields like job_title, company_name, location, salary, etc. Without an index, if you search for all jobs with a specific company_name, the database would have to check each row one by one.
When you create an index on the company_name column, the database stores the company_name values in sorted order, each with a pointer to the row it came from. Now, when you search for a particular company, the database can use the index to go directly to the rows that match your query, skipping irrelevant rows.
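To make the difference concrete, here is a minimal sketch in plain JavaScript of a full scan next to a lookup through a sorted index. It is a toy model, not how any particular database implements its indexes: the sample rows and the helper names (fullScan, buildIndex, lowerBound, indexLookup) are made up for illustration, and a sorted array stands in for a B-Tree.

```javascript
// Hypothetical sample rows; the field names follow the job postings example.
const jobs = [
  { job_title: "Backend Engineer", company_name: "Initech" },
  { job_title: "Data Analyst", company_name: "Acme" },
  { job_title: "Designer", company_name: "Globex" },
  { job_title: "QA Engineer", company_name: "Acme" },
];

// Without an index: check every row one by one.
function fullScan(rows, company) {
  return rows.filter((row) => row.company_name === company);
}

// Build a toy index: (value, row position) pairs sorted by value.
function buildIndex(rows, field) {
  return rows
    .map((row, pos) => ({ key: row[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.pos - b.pos));
}

// Binary search for the first index entry whose key is >= the target.
function lowerBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// With the index: jump to the first match, then read only matching entries.
function indexLookup(rows, index, key) {
  const matches = [];
  for (let i = lowerBound(index, key); i < index.length && index[i].key === key; i++) {
    matches.push(rows[index[i].pos]);
  }
  return matches;
}

const companyIndex = buildIndex(jobs, "company_name");
console.log(indexLookup(jobs, companyIndex, "Acme").map((job) => job.job_title));
// → ["Data Analyst", "QA Engineer"]
```

Both functions return the same rows. The difference is that the full scan touches every row, while the index lookup needs only a handful of comparisons to find where the matches start.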
3. Benefits of Indexing
Faster Data Retrieval: Indexing drastically speeds up queries, especially on large datasets.
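Efficient Range Queries: With a sorted index such as a B-Tree, queries that search within a range of values (e.g., all salaries between $50,000 and $100,000) are much faster, because the matching entries sit next to each other in the index.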
Unique Constraints: Indexes can enforce uniqueness, ensuring that no two rows have the same value for the indexed field(s).
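The range query benefit can be sketched in a few lines of JavaScript. This is a simplified illustration under stated assumptions: the salary values are invented, the helper names (firstAtLeast, rangeScan) are hypothetical, and a sorted array stands in for the sorted index.

```javascript
// Hypothetical salaries; a real index would also store a pointer to each row.
const salaries = [150000, 42000, 98000, 61000, 120000, 55000, 75000];
const sortedSalaryIndex = [...salaries].sort((a, b) => a - b);

// Binary search for the first position whose value is >= target.
function firstAtLeast(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan: seek to the lower bound once, then walk forward
// until the upper bound is passed.
function rangeScan(sorted, min, max) {
  const result = [];
  for (let i = firstAtLeast(sorted, min); i < sorted.length && sorted[i] <= max; i++) {
    result.push(sorted[i]);
  }
  return result;
}

console.log(rangeScan(sortedSalaryIndex, 50000, 100000));
// → [55000, 61000, 75000, 98000]
```

The scan stops as soon as it sees a value above the upper bound, so values outside the range are never read past that point.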
4. Drawbacks of Indexing
Extra Storage: Indexes consume additional disk space since the database must store the index structure alongside the actual data.
Slower Writes: Inserting, updating, or deleting data may be slower because the index needs to be updated whenever data changes.
Overhead: Every additional index has to be kept up to date on each write and competes for memory, so maintaining too many indexes can hurt overall performance.
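The write cost is easy to see in a toy model. The sketch below is only an illustration: ToyTable and its indexWrites counter are hypothetical names, and real databases do far more work per index update than pushing into a Map.

```javascript
// Toy table that keeps one Map-based index per indexed field.
// The class and its counter are hypothetical, for illustration only.
class ToyTable {
  constructor(indexedFields) {
    this.rows = [];
    this.indexes = new Map(indexedFields.map((field) => [field, new Map()]));
    this.indexWrites = 0; // number of index entries written by inserts
  }

  insert(row) {
    const pos = this.rows.push(row) - 1;
    // Every index on the table must be updated for every insert.
    for (const [field, index] of this.indexes) {
      const positions = index.get(row[field]) ?? [];
      positions.push(pos);
      index.set(row[field], positions);
      this.indexWrites += 1;
    }
    return pos;
  }
}

const lean = new ToyTable(["company_name"]);
const heavy = new ToyTable(["company_name", "location", "salary", "job_title"]);
const row = { job_title: "Designer", company_name: "Globex", location: "Berlin", salary: 70000 };

for (let i = 0; i < 1000; i++) {
  lean.insert({ ...row });
  heavy.insert({ ...row });
}

console.log(lean.indexWrites, heavy.indexWrites); // → 1000 4000
```

The same 1,000 inserts cost four times as many index writes on the table with four indexes, which is the trade-off to weigh before adding another index.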
5. Types of Indexes
Different databases support various types of indexes. The most common types are:
Single-Column Index: Index on a single field (e.g., company_name).
Multi-Column/Compound Index: Index on multiple fields (e.g., company_name, location), which can improve performance when queries involve multiple columns.
Unique Index: Ensures that all values in a column or group of columns are unique (e.g., ensuring no two job ads have the same job ID).
Full-Text Index: Used for searching large amounts of text (e.g., search in descriptions).
Hash Index: Useful for equality lookups (e.g., finding an exact match), but generally not for range queries or sorting, since hashed values are not stored in order.
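As a rough illustration of two of these types, the sketch below builds a hash-style index with a JavaScript Map and optionally enforces uniqueness while building it. The function name buildHashIndex, its unique option, and the sample rows are assumptions made for this example, not part of any database API.

```javascript
// Toy hash index: a Map from field value to the positions of matching rows.
// The function name and the `unique` option are hypothetical.
function buildHashIndex(rows, field, { unique = false } = {}) {
  const index = new Map();
  rows.forEach((row, pos) => {
    const key = row[field];
    if (unique && index.has(key)) {
      throw new Error(`duplicate value for unique index on ${field}: ${key}`);
    }
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(pos);
  });
  return index;
}

// Hypothetical sample rows.
const jobAds = [
  { job_id: 101, company_name: "Acme" },
  { job_id: 102, company_name: "Globex" },
  { job_id: 103, company_name: "Acme" },
];

// Equality lookup: one hash probe instead of a scan.
const byJobId = buildHashIndex(jobAds, "job_id", { unique: true });
console.log(byJobId.get(102)); // → [1]

// A unique index on company_name is rejected because "Acme" appears twice.
let rejected = false;
try {
  buildHashIndex(jobAds, "company_name", { unique: true });
} catch (err) {
  rejected = true;
}
console.log(rejected); // → true
```

Note that the Map has no useful ordering by value, which is why this kind of structure suits exact matches rather than range scans.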
6. How to Implement Indexing in MongoDB
Let’s focus on MongoDB, as it's a popular database for modern web apps.
In MongoDB, you can create an index on a field using the createIndex() method. Let’s say we want to index the company_name field of a jobAds collection.
db.jobAds.createIndex({ company_name: 1 });
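The 1 means ascending order. If you use -1, it would be descending. For a single-field index the direction rarely matters, because MongoDB can traverse the index in either direction.
This creates an index on the company_name field, speeding up searches for job ads by company name.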
7. Compound Indexes
If you frequently query by both company_name and location, you might want to create a compound index.
db.jobAds.createIndex({ company_name: 1, location: 1 });
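This will optimize queries that filter by both company_name and location. Because company_name is the leading field, the same index can also serve queries that filter on company_name alone, but it generally does not help queries that filter only on location.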
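Field order matters in a compound index because the entries are sorted by the first field, then by the second. The following JavaScript sketch models that ordering with a sorted array of tuples. The sample entries and the helper names (compareTuples, prefixSeek) are hypothetical, and the model ignores everything a real query planner does.

```javascript
// Hypothetical entries of a compound index on (company_name, location),
// kept sorted by company_name first and location second.
const compoundIndex = [
  ["Acme", "Berlin"],
  ["Acme", "London"],
  ["Globex", "Berlin"],
  ["Globex", "Paris"],
  ["Initech", "London"],
];

// Compare two tuples on their first n parts.
function compareTuples(a, b, n) {
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Seek to the first entry matching the given leading fields, then read forward.
function prefixSeek(index, prefix) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareTuples(index[mid], prefix, prefix.length) < 0) lo = mid + 1;
    else hi = mid;
  }
  const matches = [];
  while (lo < index.length && compareTuples(index[lo], prefix, prefix.length) === 0) {
    matches.push(index[lo++]);
  }
  return matches;
}

console.log(prefixSeek(compoundIndex, ["Globex"]));          // company only
console.log(prefixSeek(compoundIndex, ["Globex", "Paris"])); // company and location

// location alone is not a leading field: matching entries are scattered,
// so this sketch has to fall back to checking every entry.
console.log(compoundIndex.filter(([, location]) => location === "Berlin"));
```

Entries for one company sit together, so a seek works for company_name alone or for both fields. Entries for one location are spread across the index, which is why the last query cannot use the same shortcut.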
8. Best Practices for Indexing
Index Fields Used in Queries: Focus on fields that are frequently used in search queries or sorting.
Use Compound Indexes for Multi-Field Queries: If you often filter by multiple fields, use compound indexes instead of separate indexes for each field.
Don’t Over-Index: While indexes speed up reads, they slow down writes (inserts, updates, deletes). Only index fields that you query frequently.
Unique Indexes for Uniqueness Constraints: If a field must be unique (e.g., email in a user collection), use a unique index to enforce this at the database level.
Use Full-Text Indexing for Large Text Searches: When searching within text-heavy fields (like job descriptions), consider using full-text indexes.
9. Indexing in SQL Databases (e.g., MySQL, PostgreSQL)
In relational databases, indexing works similarly. Here’s how you would create an index in MySQL for a jobAds table on the company_name column.
CREATE INDEX idx_company_name ON jobAds(company_name);
For a compound index on company_name and location:
CREATE INDEX idx_company_location ON jobAds(company_name, location);
You can also create a unique index:
CREATE UNIQUE INDEX idx_job_id ON jobAds(job_id);
10. Monitoring Index Performance
Most databases provide tools to monitor and analyze how your indexes are being used. For example, MongoDB has the explain() method, which shows how a query is executed and whether an index is being used.
db.jobAds.find({ company_name: "Google" }).explain();
If the index is being used, you will see "IXSCAN" (Index Scan) in the execution plan. If not, you’ll see "COLLSCAN" (Collection Scan), meaning the database had to scan every document.
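If you want to check for these stages in code rather than by eye, a small helper can walk the plan tree. The sketch below is an assumption-heavy illustration: collectStages and usesIndex are hypothetical helpers, the two sample objects are simplified and hand-written rather than captured from a server, and the exact shape of explain() output varies between MongoDB versions.

```javascript
// Collect the stage names found in a plan tree.
// Follows the nested queryPlan / inputStage / inputStages fields;
// any other shape is ignored.
function collectStages(plan, stages = []) {
  if (!plan || typeof plan !== "object") return stages;
  if (plan.stage) stages.push(plan.stage);
  if (plan.queryPlan) collectStages(plan.queryPlan, stages);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  (plan.inputStages || []).forEach((child) => collectStages(child, stages));
  return stages;
}

// True when the winning plan contains an index scan stage.
function usesIndex(explainOutput) {
  const stages = collectStages(explainOutput.queryPlanner.winningPlan);
  return stages.includes("IXSCAN");
}

// Simplified, hand-written samples (assumed shape, not real server output).
const indexedPlan = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "company_name_1" },
    },
  },
};
const unindexedPlan = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
};

console.log(usesIndex(indexedPlan));   // → true
console.log(usesIndex(unindexedPlan)); // → false
```

In a shell session you could pass the result of the explain() call above to usesIndex, assuming the output on your server version has the queryPlanner.winningPlan structure used here.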
11. Optimizing Indexes for Performance
Use Indexes for Filtering and Sorting: If you are sorting data by a field, an index on that field lets the database return results in index order instead of sorting them at query time.
Index Cardinality: Fields with high cardinality (many unique values, like job_id) are more beneficial to index than fields with low cardinality (few unique values, like status fields).
Query Patterns: Look at your most frequent query patterns and optimize indexes based on those.
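One rough way to compare cardinality is to divide the number of distinct values in a field by the number of rows. The sketch below does this in JavaScript over a made-up sample. The selectivity function and the data are hypothetical, and the ratio is only a rule-of-thumb signal, not a substitute for checking real query plans.

```javascript
// Hypothetical sample: 8 job ads with unique ids and only two statuses.
const sampleAds = Array.from({ length: 8 }, (_, i) => ({
  job_id: i + 1,
  status: i % 2 === 0 ? "open" : "closed",
}));

// Distinct values divided by total rows: 1.0 means every value is unique.
function selectivity(rows, field) {
  if (rows.length === 0) return 0;
  const distinct = new Set(rows.map((row) => row[field]));
  return distinct.size / rows.length;
}

console.log(selectivity(sampleAds, "job_id")); // → 1
console.log(selectivity(sampleAds, "status")); // → 0.25
```

A value near 1 means an index lookup narrows the search to very few rows. A value near 0 means each lookup still returns a large share of the data, so the index saves much less work.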
Written by
Muhammad Sufiyan