# Complete Guide to Composite Indexes: From Theory to Practice

As developers, we often focus on writing clean code and building features, but database performance optimization is equally crucial. Today, I'll share my deep dive into composite indexes - a powerful technique that can transform your query performance from seconds to milliseconds.
## What Are Composite Indexes?
A composite index (also called a multi-column or compound index) is a database index that spans multiple columns, treating them as a single unit for optimization.
```sql
-- Traditional approach: separate indexes
CREATE INDEX idx_customer_id ON orders (customer_id);
CREATE INDEX idx_status ON orders (status);
CREATE INDEX idx_order_date ON orders (order_date);

-- Composite approach: single multi-column index
CREATE INDEX idx_orders_composite ON orders (customer_id, status, order_date);
```
Think of it like organizing a library. Instead of having separate filing systems for author, genre, and publication year, you create one system that organizes books by author-genre-year combinations.
## The Problem: When Individual Indexes Fall Short
Let's examine a realistic e-commerce scenario:
```sql
-- Our sample table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    status VARCHAR(20),
    order_date DATE,
    total_amount DECIMAL(10,2)
);

-- Sample data: 1,000,000 orders
-- customer_id: 1-10,000 (avg 100 orders per customer)
-- status: 'pending', 'shipped', 'delivered', 'cancelled', 'returned'
-- order_date: spread over the last 2 years
```
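If you want to reproduce the numbers in this article, here is a hypothetical PostgreSQL generator that roughly matches the distributions above (names follow the article's schema; adjust for your engine):

```sql
-- Hypothetical data generator (PostgreSQL) matching the sample distributions
INSERT INTO orders
SELECT g,                                          -- order_id
       1 + (random() * 9999)::int,                 -- customer_id: 1-10,000
       (ARRAY['pending','shipped','delivered',
              'cancelled','returned'])[1 + (random() * 4)::int],
       CURRENT_DATE - (random() * 730)::int,       -- within the last 2 years
       (random() * 500)::numeric(10,2)             -- total_amount
FROM generate_series(1, 1000000) AS g;
```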
Common query pattern:
```sql
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped'
  AND order_date >= '2024-01-01';
```
## Database Strategies for Multi-Column Queries
Let's analyze how different indexing strategies handle this query:
### Strategy 1: No Indexes (Full Table Scan)

With no indexes on any column, execution looks like this:

- Scan every single row (1,000,000 records)
- Check all three conditions for each row

Time Complexity: O(n), where n = total records
Estimated Time: 2-5 seconds

The query plan would show:

```
Seq Scan on orders  (cost=0.00..25000.00 rows=5 width=16)
  Filter: (customer_id = 7829 AND status = 'shipped' AND order_date >= '2024-01-01')
```
### Strategy 2: Individual Indexes (Single Index Usage)

```sql
CREATE INDEX idx_customer_id ON orders (customer_id);
CREATE INDEX idx_status ON orders (status);
CREATE INDEX idx_order_date ON orders (order_date);
```
Database's decision-making process:

1. Analyze the selectivity of each condition:
   - `customer_id = 7829`: ~100 records (high selectivity)
   - `status = 'shipped'`: ~200,000 records (low selectivity)
   - `order_date >= '2024-01-01'`: ~500,000 records (very low selectivity)
2. Choose the most selective index (customer_id)

Execution:

- Use the customer_id index → retrieve ~100 records
- Filter those 100 records for status and date

Time Complexity: O(log n) + O(k), where k = records returned by the first index
Estimated Time: 50-200ms
### Strategy 3: Index Intersection (Advanced)

Some database engines (PostgreSQL, SQL Server) can use multiple indexes simultaneously, with the same individual indexes as Strategy 2:

Execution:

- customer_id index lookup → record IDs: [245, 1001, 1567, 2103, ...]
- status index lookup → record IDs: [432, 1001, 1789, 2103, ...]
- Find the intersection of both sets → [1001, 2103, ...]
- Apply the date filter to the intersected results

Time Complexity: O(log n) + O(log n) + O(intersection)
Estimated Time: 20-100ms
Caveat: Not all databases support this efficiently, and intersection operations have overhead.
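In PostgreSQL you can see whether intersection was chosen: the plan contains a BitmapAnd node combining bitmap scans over the individual indexes. A rough illustration (plan shape only, costs omitted):

```sql
-- PostgreSQL: index intersection appears as a BitmapAnd of bitmap index scans
EXPLAIN SELECT order_id
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped';
-- Look for a plan shaped like:
--   Bitmap Heap Scan on orders
--     -> BitmapAnd
--          -> Bitmap Index Scan on idx_customer_id
--          -> Bitmap Index Scan on idx_status
```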
### Strategy 4: Composite Index ✅

```sql
CREATE INDEX idx_orders_composite ON orders (customer_id, status, order_date);
```

Execution:

- A single index traversal handles all three conditions simultaneously
- Direct navigation to the matching records
- No additional filtering required

Time Complexity: O(log n)
Estimated Time: 1-10ms
## Time Complexity Analysis

| Strategy | Index Lookups | Filter Operations | Total Time Complexity | Practical Performance |
|---|---|---|---|---|
| Full Scan | 0 | O(n) | O(n) | 2-5 seconds |
| Single Index | 1 | O(k) | O(log n) + O(k) | 50-200ms |
| Index Intersection | 2-3 | O(intersection) | O(log n) + O(intersection) | 20-100ms |
| Composite Index | 1 | 0 | O(log n) | 1-10ms |
Key Insight: Composite indexes eliminate the filtering phase entirely by incorporating all conditions into the index structure.
## Best Practices: Column Ordering

The order of columns in a composite index is crucial. A good starting point is the selectivity rule, with one important caveat for range predicates covered after the visualization below:
### Selectivity Analysis
```sql
-- Calculate selectivity for each column
SELECT
    'customer_id' AS column_name,
    COUNT(DISTINCT customer_id) AS unique_values,
    COUNT(*) AS total_rows,
    COUNT(DISTINCT customer_id) * 100.0 / COUNT(*) AS selectivity_percent
FROM orders
UNION ALL
SELECT
    'status',
    COUNT(DISTINCT status),
    COUNT(*),
    COUNT(DISTINCT status) * 100.0 / COUNT(*)
FROM orders
UNION ALL
SELECT
    'order_date',
    COUNT(DISTINCT order_date),
    COUNT(*),
    COUNT(DISTINCT order_date) * 100.0 / COUNT(*)
FROM orders;
```
Sample Results:

```
column_name  | unique_values | total_rows | selectivity_percent
-------------|---------------|------------|--------------------
customer_id  | 10000         | 1000000    | 1.0%
order_date   | 730           | 1000000    | 0.073%
status       | 5             | 1000000    | 0.0005%
```
### Optimal Column Ordering

```sql
-- ✅ OPTIMAL: most selective first
CREATE INDEX idx_orders_optimal ON orders (customer_id, order_date, status);

-- ❌ SUBOPTIMAL: least selective first
CREATE INDEX idx_orders_bad ON orders (status, order_date, customer_id);
```
Why order matters:

- B-tree indexes work like a multi-level directory
- The first column determines the primary organization
- Subsequent columns create sub-organizations within each primary group
Visualization:

```
Optimal Index Structure (customer_id, order_date, status):

Customer 7829
├── 2024-01-15 → [shipped, delivered]
├── 2024-01-20 → [pending, shipped]
└── 2024-02-01 → [shipped]

Suboptimal Index Structure (status, order_date, customer_id):

Status: shipped
├── 2024-01-15 → [customers: 1, 15, 29, 7829, ...]
├── 2024-01-20 → [customers: 3, 22, 67, 7829, ...]
└── ... (much more data to traverse)
```
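The caveat: the selectivity rule applies to columns compared with equality. Once the index reaches a column filtered by a range (such as `order_date >= '2024-01-01'`), it cannot seek on any column after it, so a common guideline is equality columns first, range columns last. That is why the earlier `idx_orders_composite` put `status` before `order_date`; as always, verify with EXPLAIN on your own data. A sketch under that assumption (hypothetical index name):

```sql
-- Equality columns (customer_id, status) first, range column (order_date) last,
-- for queries like: WHERE customer_id = ? AND status = ? AND order_date >= ?
CREATE INDEX idx_orders_eq_then_range ON orders (customer_id, status, order_date);
```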
## Verification with EXPLAIN Plans
Always verify your optimization works using database-specific EXPLAIN commands:
### PostgreSQL
```sql
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped'
  AND order_date >= '2024-01-01';
```
Without Composite Index:

```
Index Scan using idx_customer_id on orders  (cost=0.43..25.89 rows=5 width=16)
    (actual time=0.123..2.456 rows=5 loops=1)
  Index Cond: (customer_id = 7829)
  Filter: ((status = 'shipped') AND (order_date >= '2024-01-01'))
  Rows Removed by Filter: 78
  Buffers: shared hit=12
```
With Composite Index:

```
Index Scan using idx_orders_composite on orders  (cost=0.43..8.45 rows=5 width=16)
    (actual time=0.034..0.041 rows=5 loops=1)
  Index Cond: ((customer_id = 7829) AND (status = 'shipped') AND (order_date >= '2024-01-01'))
  Buffers: shared hit=3
```
### MySQL
```sql
EXPLAIN FORMAT=JSON
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped'
  AND order_date >= '2024-01-01';
```
Look for these indicators:

- ✅ `"using_index": true` - good
- ✅ `"key": "idx_orders_composite"` - using your composite index
- ⚠️ `"Extra": "Using where"` - additional filtering happening
- ❌ `"type": "ALL"` - full table scan
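If you prefer the classic tabular output, plain EXPLAIN works as well; check the `key` and `type` columns:

```sql
-- Classic tabular form; `key` should show idx_orders_composite,
-- and `type` should be 'range' rather than 'ALL'
EXPLAIN
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped'
  AND order_date >= '2024-01-01';
```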
### SQL Server
```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT order_id, total_amount
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped'
  AND order_date >= '2024-01-01';
```
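To inspect the plan without executing the query, SQL Server can also print an estimated plan; a sketch (the SET statement must be alone in its batch, hence the GO separators):

```sql
-- Estimated plan as text; the query is compiled but not executed
SET SHOWPLAN_TEXT ON;
GO
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 7829
  AND status = 'shipped'
  AND order_date >= '2024-01-01';
GO
SET SHOWPLAN_TEXT OFF;
GO
```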
## When NOT to Use Composite Indexes

### 1. Independent Column Queries
```sql
-- If you frequently query individual columns:
SELECT * FROM orders WHERE status = 'pending';         -- won't use (customer_id, status, order_date) efficiently
SELECT * FROM orders WHERE order_date = '2024-01-15';  -- won't use the composite index at all
```
Solution: Create both composite and individual indexes based on query patterns.
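The rule behind this is the leftmost-prefix rule: a composite index can only seek on a contiguous prefix of its columns, starting from the first. A few illustrative queries against (customer_id, status, order_date):

```sql
-- Can seek: uses the leftmost column
SELECT * FROM orders WHERE customer_id = 7829;

-- Can seek: uses a contiguous prefix (customer_id, status)
SELECT * FROM orders WHERE customer_id = 7829 AND status = 'shipped';

-- Cannot seek: skips the leftmost column, so at best the index is scanned
SELECT * FROM orders WHERE status = 'shipped' AND order_date >= '2024-01-01';
```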
### 2. High Write Frequency

```sql
-- Heavy INSERT/UPDATE workloads pay a price:
-- every write must update ALL indexes on the table,
-- so more indexes = slower writes
```
Benchmark Example:

```sql
-- With 5 individual indexes: ~1,000 INSERTs/second
-- With 1 composite index:    ~2,500 INSERTs/second
-- Trade-off: read performance vs. write performance
```
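Those numbers are illustrative; to measure the effect on your own hardware, here is a rough PostgreSQL sketch using psql's \timing (run once with only the primary key in place, then again after creating the indexes):

```sql
\timing on
-- Offset order_id past the existing rows to avoid primary-key collisions
INSERT INTO orders
SELECT 1000000 + g,
       1 + (random() * 9999)::int,
       'pending',
       CURRENT_DATE,
       (random() * 500)::numeric(10,2)
FROM generate_series(1, 100000) AS g;
```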
### 3. Low Selectivity Combinations

```sql
-- All columns have poor selectivity
CREATE INDEX idx_poor ON orders (status, payment_method, shipping_type);
-- If most orders are 'shipped', 'credit_card', 'standard',
-- the index won't provide a significant benefit
```
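A quick way to sanity-check a candidate combination is to compare the number of distinct combinations against the row count (PostgreSQL row-constructor syntax; assumes those columns exist on your table):

```sql
-- A low distinct_combos relative to total_rows signals a poor index candidate
SELECT COUNT(DISTINCT (status, payment_method, shipping_type)) AS distinct_combos,
       COUNT(*) AS total_rows
FROM orders;
```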
### 4. Unused Query Patterns

```sql
-- Creating indexes for queries that rarely run
-- wastes storage and slows down writes;
-- monitor query patterns before optimizing
```
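In PostgreSQL, the built-in statistics views make this easy; indexes whose `idx_scan` stays at zero are candidates for removal (statistics accumulate since the last reset, so judge over a representative period):

```sql
-- Indexes that have never been scanned since statistics were last reset
SELECT relname AS table_name,
       indexrelname AS index_name,
       idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```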
## Real-World Case Study

- **Company:** E-commerce platform
- **Problem:** Order dashboard queries timing out (30+ seconds)
- **Table Size:** 50 million orders
Original Query:

```sql
SELECT o.order_id, o.total_amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.merchant_id = 12345
  AND o.status IN ('shipped', 'delivered')
  AND o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY o.order_date DESC
LIMIT 100;
```
Before Optimization:

```sql
-- Individual indexes on each column
-- Query time: 35 seconds
-- Scanned: 15 million rows
-- Filtered: 99.9% of rows discarded
```
After Composite Index:

```sql
CREATE INDEX idx_orders_dashboard ON orders (merchant_id, status, order_date, customer_id);
-- Column order based on selectivity analysis
-- Query time: 180ms
-- Scanned: 15,000 rows
-- ~200x improvement!
```
Key Lessons:

- **Measure before optimizing** - use query profiling tools
- **Consider covering indexes** - include frequently selected columns
- **Monitor after deployment** - ensure production benefits match testing
- **Regular maintenance** - update index statistics periodically
## Advanced Tips

### Covering Indexes
```sql
-- Include frequently selected columns so queries can be answered
-- from the index alone, eliminating table lookups entirely
CREATE INDEX idx_orders_covering ON orders (customer_id, status, order_date)
    INCLUDE (total_amount, shipping_address);
```

Note: `INCLUDE` is PostgreSQL (11+) and SQL Server syntax; MySQL has no equivalent, so there you would append the extra columns to the index key instead.
### Partial Indexes

```sql
-- Index only the relevant rows
CREATE INDEX idx_orders_active ON orders (customer_id, order_date)
WHERE status NOT IN ('cancelled', 'returned');
-- Smaller index, better performance for queries on active orders
```

Note: partial indexes are supported by PostgreSQL (and, as "filtered indexes", by SQL Server); MySQL does not support them.
### Index Maintenance

```sql
-- PostgreSQL: update planner statistics
ANALYZE orders;

-- MySQL: rebuild the table and refresh index statistics
OPTIMIZE TABLE orders;

-- SQL Server: update statistics
UPDATE STATISTICS orders;
```
## Conclusion
Composite indexes are powerful tools for optimizing multi-column queries, but they require thoughtful implementation:
Key Takeaways:

- Use composite indexes when columns are frequently queried together
- Order columns by selectivity (most selective first), keeping range-filtered columns last
- Always verify with EXPLAIN plans - theory and practice can differ
- Consider the write-performance trade-off - more indexes mean slower writes
- Monitor query patterns before and after optimization
- Regular maintenance ensures continued performance
Remember: Database optimization is about understanding your data, your queries, and your database engine's behavior. Composite indexes are just one tool in your optimization toolkit.
Have you implemented composite indexes in your projects? What performance improvements did you see? Share your experiences in the comments below!
Tags: #Database #SQL #Performance #Optimization #Indexing #PostgreSQL #MySQL #SQLServer