Cracking the SQL Interview: Real Questions and PostgreSQL Internals You Should Know

Over the past few months, since the start of the new year, I’ve been actively interviewing for technical roles, and each conversation proved to be both challenging and rewarding. In this post, I’ll share some of the SQL questions I was asked, along with deep-dive PostgreSQL internals topics that stood out in interviews.
If you’re preparing for data engineering, backend, or database-focused roles, this guide is for you.
Real SQL Questions I Was Asked in Interviews
Here are some real SQL questions I encountered across multiple interviews. These test your ability to work with complex queries and understand database structures:
Recursive Hierarchy with Levels - Return employee ID, name, manager name, and depth in the org chart (Recursive CTE)
Find Managers with 2 or more employees
Departmental Top 3 Salaries
Customers who bought all products
Rolling sums - Calculate a 7-day rolling sum of sales for each product, ordered by date.
Find continuous date ranges when there was sales activity every day without gaps.
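As a sketch, here is how two of these might look — the recursive hierarchy and the rolling sum — assuming hypothetical `employee(id, name, manager_id)` and `sales(product_id, sale_date, amount)` tables (the interval-based window frame needs PostgreSQL 11+):

```sql
-- 1. Recursive hierarchy with levels: each employee, their manager's name,
--    and their depth in the org chart.
WITH RECURSIVE org AS (
    -- Anchor: top-level employees (no manager), depth 1
    SELECT e.id, e.name, CAST(NULL AS text) AS manager_name, 1 AS depth
    FROM employee e
    WHERE e.manager_id IS NULL
    UNION ALL
    -- Recurse: employees whose manager is already in the result set
    SELECT e.id, e.name, o.name, o.depth + 1
    FROM employee e
    JOIN org o ON e.manager_id = o.id
)
SELECT id, name, manager_name, depth
FROM org
ORDER BY depth, id;

-- 2. 7-day rolling sum of sales per product, ordered by date.
SELECT product_id,
       sale_date,
       SUM(amount) OVER (
           PARTITION BY product_id
           ORDER BY sale_date
           RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
       ) AS rolling_7d_sales
FROM sales
ORDER BY product_id, sale_date;
```

Note the `RANGE` frame in the second query: unlike `ROWS BETWEEN 6 PRECEDING`, it is correct even when some days have no sales rows.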
The advanced topics below came up in several interview rounds and often led to deep discussions.
1. PostgreSQL Indexing: Internals, Use Cases, and Performance
We had low-level discussions on index types along with their use cases.
| Index Type | Use Case |
| --- | --- |
| B-Tree | Equality, range queries |
| Hash | Equality only |
| GIN | Full-text, JSONB, array overlap |
| GiST | Geometric search, similarity search |
| BRIN | Large, naturally sorted tables (e.g., logs) |
| Partial | Index on filtered rows |
| Expression | Index on function output (e.g., `lower(name)`) |
Covering Indexes:
CREATE INDEX ON emp (deptid) INCLUDE (salary);
Multicolumn Indexes: Consider column order (leading column matters!)
Bloom Index: Used in specialized fuzzy match cases
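For the partial and expression rows in the table above, a quick sketch (the `orders` and `emp` tables and their columns are hypothetical):

```sql
-- Partial index: only indexes rows matching the predicate, so it stays
-- small and is used only for queries that include the same filter.
CREATE INDEX idx_orders_pending
    ON orders (created_at)
    WHERE status = 'pending';

-- Expression index: indexes the function output, so a query filtering on
-- lower(name) can use it (a plain index on name cannot).
CREATE INDEX idx_emp_lower_name
    ON emp (lower(name));
```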
Index Usage: how the planner chooses indexes; how ANALYZE and statistics affect index usage.
Index Maintenance: the need for VACUUM and/or REINDEX, index bloat, fill factor and tuning.
Other: index impact on write performance, OLTP vs. OLAP workloads, finding unused indexes, index usage metrics, indexes and concurrent writes, etc.
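On finding unused indexes: PostgreSQL tracks per-index scan counts in the `pg_stat_user_indexes` statistics view, so a rough check looks like this (counters accumulate since the last stats reset, so interpret with care):

```sql
-- Indexes that have never been scanned since the stats were last reset,
-- largest first: candidates for review, not automatic removal.
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Remember that an index backing a unique or primary-key constraint may show zero scans and still be essential.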
2. Reading PostgreSQL Execution Plans
How to interpret execution plans and, from them, reason about query performance — and what to look for in a plan.
EXPLAIN (ANALYZE, BUFFERS, COSTS, VERBOSE)
SELECT * FROM employee WHERE salary > 100000;
Things to know:
Seq Scan: no usable index, or the planner estimates a full scan is cheaper
Index Scan: Ideal for high-selectivity queries
Bitmap Heap Scan: Used for broader matches, combines index results
Rows Removed by Filter: Wasted work — suggests filter isn’t pushed down
Joins & Join Methods (Nested Loop, Merge Join & Hash Join): use-case scenarios, memory impacts, how sorting affects the join, hash table spills, etc.
Other Topics: Parallel Query, CTEs & subplans, cost estimates, watch for loops > 1 in nested loops, monitor buffers read/hit to spot I/O inefficiencies, etc.
3. Partitioning
I was asked: “How would you partition a 5B-row logs table by region and time?”
A few of the topics discussed during interviews:
Partitioning Methods - Range Partition, List Partition, Hash Partition, Composite Partition
How Partitioning Works - declarative partitioning vs. table inheritance, how the planner routes inserts/updates to partitions, default partitions.
Performance Implications - Partition Pruning, indexing on partitions vs global index (Postgres doesn’t support native global index)
Impact on Query Plans & when pruning fails.
Maintenance - Adding / Removing partitions, Detaching / Attaching partitions, Archiving, Table bloat, constraints behavior with partitioned tables etc.
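For the "5B-row logs table by region and time" question, one plausible answer is composite partitioning: list-partition by region, then range-partition each region by month. A minimal sketch (table and column names are hypothetical; needs PostgreSQL 11+):

```sql
-- Level 1: list partition by region.
CREATE TABLE logs (
    region   text        NOT NULL,
    log_time timestamptz NOT NULL,
    payload  jsonb
) PARTITION BY LIST (region);

-- Level 2: each region is itself range-partitioned by time.
CREATE TABLE logs_us PARTITION OF logs
    FOR VALUES IN ('us')
    PARTITION BY RANGE (log_time);

-- Leaf partitions: one per month; upper bound is exclusive.
CREATE TABLE logs_us_2024_01 PARTITION OF logs_us
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```

Queries filtering on both `region` and `log_time` can then prune down to a single leaf partition.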
4. Sharding PostgreSQL (Horizontal Scale)
Interviewer: “What happens when a single node isn’t enough?”
Sharding = horizontal partitioning → splitting data across multiple physical databases/servers, not just partitions in one DB.
Each shard holds a subset of rows.
Common shard keys: customer ID, tenant ID, geographic region, time.
Goal: keep each shard small & fast to query independently.
Manual Sharding Approaches:
Application-controlled sharding (based on user_id, region)
Foreign Data Wrappers (postgres_fdw)
Tools like Citus
Example Shard Strategy (Customer DB):
Users A–M → shard_1
Users N–Z → shard_2
Metadata service maps user → shard
You must handle: picking a good shard key, cross-shard joins, distributed transactions, data rebalancing, consistency & failover, replication lag, etc.
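As a sketch of the FDW approach above, this is roughly how one shard is wired up with `postgres_fdw` (hostnames, credentials, and the `users` schema are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Register the remote shard as a server.
CREATE SERVER shard_2
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.internal', port '5432', dbname 'app');

-- Map the local role to a remote role.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER shard_2
    OPTIONS (user 'app', password 'secret');

-- Expose the remote users table locally; queries against it are
-- pushed to the remote server where possible.
CREATE FOREIGN TABLE users_shard_2 (
    user_id bigint,
    name    text
) SERVER shard_2
  OPTIONS (schema_name 'public', table_name 'users');
```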
5. Change Data Capture (CDC)
Change Data Capture means continuously capturing row-level data changes (INSERT, UPDATE, DELETE) from a source database and delivering them to downstream systems.
“How do you build a real-time pipeline to stream changes?”
✅ PostgreSQL CDC Techniques:
| Technique | Latency | Setup Effort | Pros | Cons | When to Use |
| --- | --- | --- | --- | --- | --- |
| 🔹 Triggers + Audit Tables | Low | Easy | Simple to implement for small DBs | Performance overhead at high TPS, hard to scale | When you just need an audit trail (e.g., a small B2B SaaS) |
| 🔹 Logical Replication | Medium–Low | Native | Supports pub/sub; only committed changes; easy failover | Cannot replicate DDL; tables need a PK (or replica identity) | Multi-region read replicas, selective replication |
| 🔹 WAL Stream Decoding | Low | Medium–Hard | True real-time streaming; works well with Kafka, Debezium | Harder to manage slots, risk of WAL bloat | Enterprise streaming pipelines |
| 🔹 pgoutput (logical decoding plugin) | Low | Native, preferred | Default plugin for logical replication; Debezium uses it | Limited to logical replication; same WAL slot constraints | Best practice for Kafka CDC |
Logical Replication (Built-in)
PostgreSQL 10+ supports logical replication.
Publishes row-level changes as a stream.
Use CREATE PUBLICATION and CREATE SUBSCRIPTION.
Works via WAL (Write-Ahead Log): changes are encoded as logical changes, not raw blocks.
Good for replica clusters or feeding Kafka connectors.
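The two statements above can be sketched end-to-end like this (hostnames, credentials, and the `orders` table are hypothetical; the connection string belongs on the subscriber):

```sql
-- On the publisher:
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (the table must already exist with a matching schema):
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.internal port=5432 dbname=app user=repl password=secret'
    PUBLICATION orders_pub;
```

The subscriber first copies the existing data, then applies committed row-level changes as they arrive.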
Logical Decoding
The foundation for logical replication.
pgoutput is the default plugin.
Or use plugins like wal2json or decoderbufs:
wal2json: outputs changes as JSON, easy to consume from Kafka or Debezium.
decoderbufs: outputs changes in protobuf format.
Tools read the WAL stream via a replication slot.
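You can try logical decoding by hand with the built-in slot functions — a minimal sketch, assuming the wal2json plugin is installed and `wal_level = logical` is set:

```sql
-- Create a logical replication slot that decodes WAL via wal2json.
SELECT * FROM pg_create_logical_replication_slot('cdc_slot', 'wal2json');

-- Peek at pending changes as JSON without consuming them
-- (use pg_logical_slot_get_changes to consume).
SELECT lsn, data
FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);

-- Drop the slot when done; abandoned slots cause WAL bloat.
SELECT pg_drop_replication_slot('cdc_slot');
```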
Triggers-Based CDC
Implemented with AFTER INSERT/UPDATE/DELETE triggers.
Simpler for small systems, but high overhead under heavy writes.
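A minimal sketch of trigger-based CDC for a hypothetical `orders` table, capturing each change as JSONB in an audit table:

```sql
CREATE TABLE orders_audit (
    audit_id   bigserial   PRIMARY KEY,
    op         char(1)     NOT NULL,              -- 'I', 'U', or 'D'
    changed_at timestamptz NOT NULL DEFAULT now(),
    row_data   jsonb       NOT NULL
);

CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- For deletes, the OLD row is the only image available.
        INSERT INTO orders_audit (op, row_data)
        VALUES ('D', to_jsonb(OLD));
        RETURN OLD;
    ELSE
        -- For inserts and updates, record the NEW row image.
        INSERT INTO orders_audit (op, row_data)
        VALUES (left(TG_OP, 1), to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION orders_audit_fn();
```

Every write now does a second insert in the same transaction — which is exactly the overhead the table above warns about at high TPS.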
6. Other concepts:
A few other things I think are important to go over:
When to use BRIN over B-tree indexes
Index design tradeoffs in high-write OLTP systems
Join types and when to apply each
Timeseries partitioning and pruning validation
Postgres internals: MVCC, WAL, autovacuum
Locking, concurrency control, and deadlocks
Designing a hybrid search (full-text + semantic)
Data ingestion pipelines with Kafka and Postgres
Handling SCDs, surrogate vs natural keys
Real-time event guarantees: exactly-once vs at-least-once delivery
What indexing strategies do you use for analytical vs transactional workloads?
If you’re preparing for roles that involve SQL, data engineering, or backend systems, mastering both query skills and PostgreSQL internals can really set you apart. The questions span from hands-on SQL to system-level architecture — and each was an opportunity to demonstrate practical depth.
Let me know in the comments if you'd like to add more questions or topics to this list!