Cracking the SQL Interview: Real Questions and PostgreSQL Internals You Should Know

kiran sabne

Over the past few months, since the start of the new year, I’ve been actively interviewing for technical roles, and each conversation proved both challenging and rewarding. In this post, I’ll share some of the SQL questions I was asked, along with the deep-dive PostgreSQL internals topics that stood out in interviews.

If you’re preparing for data engineering, backend, or database-focused roles, this guide is for you.

Real SQL Questions I Was Asked in Interviews

Here are some real SQL questions I encountered across multiple interviews. These test your ability to work with complex queries and understand database structures:

  • Recursive Hierarchy with Levels - Return employee ID, name, manager name, and depth in the org chart (Recursive CTE)

  • Find Managers with 2 or more employees

  • Departmental Top 3 Salaries

  • Customers who bought all products

  • Rolling sums - Calculate a 7-day rolling sum of sales for each product, ordered by date.

  • Find continuous date ranges when there was sales activity every day without gaps.
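To make the first and fifth questions concrete, here is a sketch of how I would approach them. The schema is hypothetical (`employee(emp_id, name, manager_id)` and `sales(product_id, sale_date, amount)`); adjust names to the actual tables you are given.

```sql
-- Recursive hierarchy with levels: employee, manager name, and org-chart depth
WITH RECURSIVE org AS (
    SELECT emp_id, name, manager_id, 1 AS depth
    FROM employee
    WHERE manager_id IS NULL              -- anchor: top of the org chart
    UNION ALL
    SELECT e.emp_id, e.name, e.manager_id, o.depth + 1
    FROM employee e
    JOIN org o ON e.manager_id = o.emp_id -- recurse one level down
)
SELECT o.emp_id, o.name, m.name AS manager_name, o.depth
FROM org o
LEFT JOIN employee m ON o.manager_id = m.emp_id
ORDER BY o.depth, o.name;

-- 7-day rolling sum of sales per product (PostgreSQL 11+ RANGE frame)
SELECT product_id, sale_date,
       SUM(amount) OVER (
           PARTITION BY product_id
           ORDER BY sale_date
           RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
       ) AS rolling_7d_sales
FROM sales
ORDER BY product_id, sale_date;
```

Note the `RANGE ... INTERVAL` frame counts calendar days, not rows, so it stays correct even when some days have no sales row.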

The topics below are more advanced; they came up in several interview rounds and often led to deep discussions.

1. PostgreSQL Indexing: Internals, Use Cases, and Performance

Interviewers dug into low-level index internals along with their use cases:

| Index Type | Use Case |
| --- | --- |
| B-Tree | Equality, range queries |
| Hash | Equality only |
| GIN | Full-text, JSONB, array overlap |
| GiST | Geometric search, similarity search |
| BRIN | Large, naturally sorted tables (e.g., logs) |
| Partial | Index on filtered rows |
| Expression | Index on function output (e.g., lower(name)) |
  • Covering Indexes: CREATE INDEX ON emp (deptid) INCLUDE (salary)

  • Multicolumn Indexes: Consider column order (leading column matters!)

  • Bloom Index: Used in specialized fuzzy match cases

  • Index Usage: How the planner chooses indexes; how ANALYZE and table statistics affect index usage.

  • Index Maintenance: When VACUUM and REINDEX are needed, index bloat, fill factor and its tuning.

  • Other: Index impact on write performance, OLTP vs. OLAP workloads, finding unused indexes, index usage metrics, indexes and concurrent writes, etc.
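A few of these index types can be sketched in DDL. The table and column names below are hypothetical stand-ins:

```sql
-- Partial index: only index the rows a hot query actually filters on
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';

-- Expression index: matches queries that filter on lower(name)
CREATE INDEX idx_emp_lower_name ON emp (lower(name));

-- Covering index: salary is stored in the index leaf,
-- enabling index-only scans for SELECT salary ... WHERE deptid = ...
CREATE INDEX idx_emp_dept_incl ON emp (deptid) INCLUDE (salary);

-- BRIN: tiny index for a large, append-only, time-ordered log table
CREATE INDEX idx_logs_brin ON logs USING brin (created_at);
```

The partial and expression indexes only help when the query's predicate matches the index definition, which is exactly the kind of detail interviewers probe.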

2. Reading PostgreSQL Execution Plans

Interviewers asked how to interpret execution plans in order to understand query performance, and what to look for inside a plan.

EXPLAIN (ANALYZE, BUFFERS, COSTS, VERBOSE)
SELECT * FROM employee WHERE salary > 100000;

Things to know:

  • Seq Scan: No usable index, or the planner estimated a full scan to be cheaper

  • Index Scan: Ideal for high-selectivity queries

  • Bitmap Heap Scan: Used for broader matches, combines index results

  • Rows Removed by Filter: Wasted work — suggests filter isn’t pushed down

  • Joins & Join Methods (Nested Loop, Merge Join & Hash Join): when each is chosen, memory impact, how sorting affects merge joins, hash table spills to disk, etc.

  • Other Topics: Parallel Query, CTEs & subplans, cost estimates, watch for loops > 1 in nested loops, monitor buffers read/hit to spot I/O inefficiencies, etc.
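For orientation, the query above might produce a plan shaped roughly like the following. The numbers are purely illustrative (they depend on your data, statistics, and PostgreSQL version); what matters is knowing where to look:

```
Index Scan using idx_emp_salary on employee
    (cost=0.42..130.25 rows=480 width=64)
    (actual time=0.03..1.91 rows=452 loops=1)
  Index Cond: (salary > 100000)
  Buffers: shared hit=37 read=5
```

Compare `rows` estimated vs. actual (a large gap suggests stale statistics), check `loops` (work is multiplied by it), and watch the `Buffers` line: a high `read` count relative to `hit` points at I/O rather than CPU as the bottleneck.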

3. Partitioning

I was asked: “How would you partition a 5B-row logs table by region and time?”

A few of the topics discussed during interviews:

  • Partitioning Methods - Range Partition, List Partition, Hash Partition, Composite Partition

  • How Partitioning Works - Declarative partitioning vs. table inheritance, how inserts/updates are routed to partitions, default partitions.

  • Performance Implications - Partition pruning, per-partition indexes vs. a global index (Postgres doesn’t support native global indexes)

  • Impact on Query Plans & when pruning fails.

  • Maintenance - Adding / Removing partitions, Detaching / Attaching partitions, Archiving, Table bloat, constraints behavior with partitioned tables etc.
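One way to answer the 5B-row question is composite partitioning: list-partition by region, then range-partition each region by time. A minimal sketch, with hypothetical region values and column names:

```sql
CREATE TABLE logs (
    log_time timestamptz NOT NULL,
    region   text        NOT NULL,
    payload  jsonb
) PARTITION BY LIST (region);

-- One sub-partitioned table per region
CREATE TABLE logs_us PARTITION OF logs
    FOR VALUES IN ('us')
    PARTITION BY RANGE (log_time);

-- Monthly time slices within the region
CREATE TABLE logs_us_2024_01 PARTITION OF logs_us
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Catch-all for unexpected regions
CREATE TABLE logs_default PARTITION OF logs DEFAULT;
```

With this layout, a query filtering on both `region` and a `log_time` range prunes down to a handful of monthly partitions; old months can be detached and archived instead of deleted row by row.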

4. Sharding PostgreSQL (Horizontal Scale)

Interviewer: “What happens when a single node isn’t enough?”

Sharding = horizontal partitioning → splitting data across multiple physical databases/servers, not just partitions in one DB.

  • Each shard holds a subset of rows.

  • Common shard keys: customer ID, tenant ID, geographic region, time.

  • Goal: keep each shard small & fast to query independently.

Manual Sharding Approaches:

  • Application-controlled sharding (based on user_id, region)

  • Foreign Data Wrappers (postgres_fdw)

  • Tools like Citus

Example Shard Strategy (Customer DB):

  • Users A–M → shard_1

  • Users N–Z → shard_2

  • Metadata service maps user → shard

You must handle picking a good shard key, cross-shard joins, distributed transactions, data rebalancing, consistency and failover, replication lag, etc.
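The postgres_fdw approach can be sketched as follows: the coordinator node sees a remote shard's table as a local foreign table. Host names, users, and columns here are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Register the remote shard holding users N–Z
CREATE SERVER shard_2
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.internal', dbname 'app');

CREATE USER MAPPING FOR app_user SERVER shard_2
    OPTIONS (user 'app_user', password 'secret');

-- Local alias for the remote users table
CREATE FOREIGN TABLE users_n_z (
    user_id bigint,
    name    text
) SERVER shard_2 OPTIONS (schema_name 'public', table_name 'users');
```

Queries against `users_n_z` are pushed down to the remote server where possible, but cross-shard joins and transactions remain the application's problem, which is why tools like Citus exist.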

5. Change Data Capture (CDC)

Change Data Capture means continuously capturing row-level data changes (INSERT, UPDATE, DELETE) from a source database and delivering them to downstream systems.

“How do you build a real-time pipeline to stream changes?”

✅ PostgreSQL CDC Techniques:

| Technique | Latency | Setup Effort | Pros | Cons | When to Use |
| --- | --- | --- | --- | --- | --- |
| 🔹 Triggers + Audit Tables | Low | Easy | Simple to implement for small DBs | Performance overhead at high TPS, hard to scale | When you just need an audit trail (e.g., a small B2B SaaS) |
| 🔹 Logical Replication | Medium–Low | Native | Supports pub/sub; only committed changes; easy failover | Cannot replicate DDL; tables need a PK (or replica identity) | Multi-region read replicas, selective replication |
| 🔹 WAL Stream Decoding | Low | Medium–Hard | True real-time streaming; works well with Kafka, Debezium | Harder to manage slots; risk of WAL bloat | Enterprise streaming pipelines |
| 🔹 pgoutput (logical decoding plugin) | Low | Native, preferred | Default plugin for logical replication; Debezium uses it | Limited to logical replication; same WAL slot constraints | Best practice for Kafka CDC |

Logical Replication (Built-in)

  • PostgreSQL 10+ supports logical replication.

  • Publishes row-level changes as a stream.

  • Use: CREATE PUBLICATION and CREATE SUBSCRIPTION.

  • Works via WAL (Write-Ahead Log): changes are encoded as logical changes, not raw blocks.

  • Good for replica clusters or feeding Kafka connectors.
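The publish/subscribe setup is two statements, one on each side. The publication, subscription, and connection details below are illustrative placeholders:

```sql
-- On the publisher: expose row-level changes for one table
CREATE PUBLICATION emp_pub FOR TABLE employee;

-- On the subscriber (the table must already exist with matching columns):
CREATE SUBSCRIPTION emp_sub
    CONNECTION 'host=primary.internal dbname=app user=repl'
    PUBLICATION emp_pub;
```

Creating the subscription automatically creates a replication slot on the publisher and begins with an initial table copy before streaming ongoing changes.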

Logical Decoding

  • The foundation for logical replication.

  • pgoutput is the default plugin.

  • Or use plugins like wal2json or decoderbufs:

    • wal2json: output changes as JSON — easy for Kafka, Debezium.

    • decoderbufs: protobuf format.

  • Tools read the WAL stream via a replication slot.
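You can explore logical decoding by hand with the built-in `test_decoding` plugin before wiring up Debezium. The slot name below is arbitrary:

```sql
-- Create a logical replication slot (requires wal_level = logical)
SELECT pg_create_logical_replication_slot('cdc_slot', 'test_decoding');

-- Peek at pending changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);

-- Consume changes: advances the slot so the WAL can be recycled
SELECT * FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);

-- Drop the slot when done, or it will retain WAL forever
SELECT pg_drop_replication_slot('cdc_slot');
```

The last point is the classic operational trap: an abandoned slot pins WAL on disk indefinitely, which is the "WAL bloat" risk mentioned in the table above.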

Triggers-Based CDC

  • Implemented with AFTER INSERT/UPDATE/DELETE triggers.

  • Simpler for small systems, but high overhead under heavy writes.
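A minimal trigger-based audit sketch, capturing every change as JSON (table and function names are hypothetical; `EXECUTE FUNCTION` needs PostgreSQL 11+):

```sql
CREATE TABLE employee_audit (
    audit_id   bigserial PRIMARY KEY,
    op         text,                      -- INSERT / UPDATE / DELETE
    changed_at timestamptz DEFAULT now(),
    row_data   jsonb
);

CREATE OR REPLACE FUNCTION employee_audit_fn() RETURNS trigger AS $$
BEGIN
    -- NEW is NULL on DELETE, OLD is NULL on INSERT
    INSERT INTO employee_audit (op, row_data)
    VALUES (TG_OP, to_jsonb(COALESCE(NEW, OLD)));
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employee_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON employee
FOR EACH ROW EXECUTE FUNCTION employee_audit_fn();
```

Every write now costs an extra insert inside the same transaction, which is exactly why this approach stops scaling at high TPS.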

6. Other concepts:

A few other topics that I think are important to review:

  • When to use BRIN over B-tree indexes

  • Index design tradeoffs in high-write OLTP systems

  • Join types and when to apply each

  • Timeseries partitioning and pruning validation

  • Postgres internals: MVCC, WAL, autovacuum

  • Locking, concurrency control, and deadlocks

  • Designing a hybrid search (full-text + semantic)

  • Data ingestion pipelines with Kafka and Postgres

  • Handling SCDs, surrogate vs natural keys

  • Real-time event guarantees: exactly-once vs at-least-once delivery

  • What indexing strategies do you use for analytical vs transactional workloads?

If you’re preparing for roles that involve SQL, data engineering, or backend systems, mastering both query skills and PostgreSQL internals can really set you apart. The questions span from hands-on SQL to system-level architecture — and each was an opportunity to demonstrate practical depth.

Let me know in the comments if you'd like to add more questions or topics to this list!
