Cracking the SQL Interview: Real Questions and PostgreSQL Internals You Should Know

kiran sabne

Over the past few months, since the start of the new year, I’ve been actively interviewing for technical roles, and each conversation proved both challenging and rewarding. In this post, I’ll share some of the SQL questions I was asked, along with the deep-dive PostgreSQL internals topics that stood out in interviews.

If you’re preparing for data engineering, backend, or database-focused roles, this guide is for you.

Real SQL Questions I Was Asked in Interviews

Here are some real SQL questions I encountered across multiple interviews. These test your ability to work with complex queries and understand database structures:

  • Recursive Hierarchy with Levels - Return employee ID, name, manager name, and depth in the org chart (Recursive CTE)

  • Find Managers with 2 or more employees

  • Departmental Top 3 Salaries

  • Customers who bought all products

  • Rolling sums - Calculate a 7-day rolling sum of sales for each product, ordered by date.

  • Find continuous date ranges when there was sales activity every day without gaps.
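To make the first and fifth questions concrete, here is a sketch of how I would approach them. The schema is hypothetical (`employee(emp_id, name, manager_id)` and `sales(product_id, sale_date, amount)`); adjust names to the actual tables you are given.

```sql
-- Recursive hierarchy with levels: employee, manager name, and org-chart depth
WITH RECURSIVE org AS (
    SELECT emp_id, name, manager_id, 1 AS depth
    FROM employee
    WHERE manager_id IS NULL              -- anchor: top of the org chart
    UNION ALL
    SELECT e.emp_id, e.name, e.manager_id, o.depth + 1
    FROM employee e
    JOIN org o ON e.manager_id = o.emp_id -- recurse one level down
)
SELECT o.emp_id, o.name, m.name AS manager_name, o.depth
FROM org o
LEFT JOIN employee m ON o.manager_id = m.emp_id
ORDER BY o.depth, o.name;

-- 7-day rolling sum of sales per product (PostgreSQL 11+ RANGE frame)
SELECT product_id, sale_date,
       SUM(amount) OVER (
           PARTITION BY product_id
           ORDER BY sale_date
           RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
       ) AS rolling_7d_sales
FROM sales
ORDER BY product_id, sale_date;
```

Note the `RANGE ... INTERVAL` frame counts calendar days, not rows, so it stays correct even when some days have no sales row.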

The topics below are more advanced; they came up in several interview rounds and often led to deep discussions.

1. PostgreSQL Indexing: Internals, Use Cases, and Performance

Interviewers dug into low-level index internals along with their use cases:

| Index Type | Use Case |
| --- | --- |
| B-Tree | Equality, range queries |
| Hash | Equality only |
| GIN | Full-text, JSONB, array overlap |
| GiST | Geometric search, similarity search |
| BRIN | Large, naturally sorted tables (e.g., logs) |
| Partial | Index on filtered rows |
| Expression | Index on function output (e.g., lower(name)) |
  • Covering Indexes: CREATE INDEX ON emp (deptid) INCLUDE (salary)

  • Multicolumn Indexes: Consider column order (leading column matters!)

  • Bloom Index: Used in specialized fuzzy match cases

  • Index Usage: How the planner chooses indexes; how ANALYZE and table statistics affect index usage.

  • Index Maintenance: When VACUUM and REINDEX are needed, index bloat, fill factor and its tuning.

  • Other: Index impact on write performance, OLTP vs. OLAP workloads, finding unused indexes, index usage metrics, indexes and concurrent writes, etc.
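A few of these index types can be sketched in DDL. The table and column names below are hypothetical stand-ins:

```sql
-- Partial index: only index the rows a hot query actually filters on
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';

-- Expression index: matches queries that filter on lower(name)
CREATE INDEX idx_emp_lower_name ON emp (lower(name));

-- Covering index: salary is stored in the index leaf,
-- enabling index-only scans for SELECT salary ... WHERE deptid = ...
CREATE INDEX idx_emp_dept_incl ON emp (deptid) INCLUDE (salary);

-- BRIN: tiny index for a large, append-only, time-ordered log table
CREATE INDEX idx_logs_brin ON logs USING brin (created_at);
```

The partial and expression indexes only help when the query's predicate matches the index definition, which is exactly the kind of detail interviewers probe.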

2. Reading PostgreSQL Execution Plans

Interviewers asked how to interpret execution plans in order to understand query performance, and what to look for inside a plan.

EXPLAIN (ANALYZE, BUFFERS, COSTS, VERBOSE)
SELECT * FROM employee WHERE salary > 100000;

Things to know:

  • Seq Scan: No usable index, or the planner estimated a full scan to be cheaper

  • Index Scan: Ideal for high-selectivity queries

  • Bitmap Heap Scan: Used for broader matches, combines index results

  • Rows Removed by Filter: Wasted work — suggests filter isn’t pushed down

  • Joins & Join Methods (Nested Loop, Merge Join & Hash Join): when each is chosen, memory impact, how sorting affects merge joins, hash table spills to disk, etc.

  • Other Topics: Parallel Query, CTEs & subplans, cost estimates, watch for loops > 1 in nested loops, monitor buffers read/hit to spot I/O inefficiencies, etc.
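For orientation, the query above might produce a plan shaped roughly like the following. The numbers are purely illustrative (they depend on your data, statistics, and PostgreSQL version); what matters is knowing where to look:

```
Index Scan using idx_emp_salary on employee
    (cost=0.42..130.25 rows=480 width=64)
    (actual time=0.03..1.91 rows=452 loops=1)
  Index Cond: (salary > 100000)
  Buffers: shared hit=37 read=5
```

Compare `rows` estimated vs. actual (a large gap suggests stale statistics), check `loops` (work is multiplied by it), and watch the `Buffers` line: a high `read` count relative to `hit` points at I/O rather than CPU as the bottleneck.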

3. Partitioning

I was asked: “How would you partition a 5B-row logs table by region and time?”

A few of the topics discussed during interviews:

  • Partitioning Methods - Range Partition, List Partition, Hash Partition, Composite Partition

  • How Partitioning Works - Declarative partitioning vs. table inheritance, how inserts/updates are routed to partitions, default partitions.

  • Performance Implications - Partition pruning, per-partition indexes vs. a global index (Postgres doesn’t support native global indexes)

  • Impact on Query Plans & when pruning fails.

  • Maintenance - Adding / Removing partitions, Detaching / Attaching partitions, Archiving, Table bloat, constraints behavior with partitioned tables etc.
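One way to answer the 5B-row question is composite partitioning: list-partition by region, then range-partition each region by time. A minimal sketch, with hypothetical region values and column names:

```sql
CREATE TABLE logs (
    log_time timestamptz NOT NULL,
    region   text        NOT NULL,
    payload  jsonb
) PARTITION BY LIST (region);

-- One sub-partitioned table per region
CREATE TABLE logs_us PARTITION OF logs
    FOR VALUES IN ('us')
    PARTITION BY RANGE (log_time);

-- Monthly time slices within the region
CREATE TABLE logs_us_2024_01 PARTITION OF logs_us
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Catch-all for unexpected regions
CREATE TABLE logs_default PARTITION OF logs DEFAULT;
```

With this layout, a query filtering on both `region` and a `log_time` range prunes down to a handful of monthly partitions; old months can be detached and archived instead of deleted row by row.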

4. Sharding PostgreSQL (Horizontal Scale)

Interviewer: “What happens when a single node isn’t enough?”

Sharding = horizontal partitioning → splitting data across multiple physical databases/servers, not just partitions in one DB.

  • Each shard holds a subset of rows.

  • Common shard keys: customer ID, tenant ID, geographic region, time.

  • Goal: keep each shard small & fast to query independently.

Manual Sharding Approaches:

  • Application-controlled sharding (based on user_id, region)

  • Foreign Data Wrappers (postgres_fdw)

  • Tools like Citus

Example Shard Strategy (Customer DB):

  • Users A–M → shard_1

  • Users N–Z → shard_2

  • Metadata service maps user → shard

You must handle picking a good shard key, cross-shard joins, distributed transactions, data rebalancing, consistency and failover, replication lag, etc.
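The postgres_fdw approach can be sketched as follows: the coordinator node sees a remote shard's table as a local foreign table. Host names, users, and columns here are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Register the remote shard holding users N–Z
CREATE SERVER shard_2
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.internal', dbname 'app');

CREATE USER MAPPING FOR app_user SERVER shard_2
    OPTIONS (user 'app_user', password 'secret');

-- Local alias for the remote users table
CREATE FOREIGN TABLE users_n_z (
    user_id bigint,
    name    text
) SERVER shard_2 OPTIONS (schema_name 'public', table_name 'users');
```

Queries against `users_n_z` are pushed down to the remote server where possible, but cross-shard joins and transactions remain the application's problem, which is why tools like Citus exist.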

5. Change Data Capture (CDC)

Change Data Capture means continuously capturing row-level data changes (INSERT, UPDATE, DELETE) from a source database and delivering them to downstream systems.

“How do you build a real-time pipeline to stream changes?”

✅ PostgreSQL CDC Techniques:

| Technique | Latency | Setup Effort | Pros | Cons | When to Use |
| --- | --- | --- | --- | --- | --- |
| 🔹 Triggers + Audit Tables | Low | Easy | Simple to implement for small DBs | Performance overhead at high TPS, hard to scale | When you just need an audit trail (e.g., a small B2B SaaS) |
| 🔹 Logical Replication | Medium–Low | Native | Supports pub/sub; only committed changes; easy failover | Cannot replicate DDL; tables need a PK (or replica identity) | Multi-region read replicas, selective replication |
| 🔹 WAL Stream Decoding | Low | Medium–Hard | True real-time streaming; works well with Kafka, Debezium | Harder to manage slots; risk of WAL bloat | Enterprise streaming pipelines |
| 🔹 pgoutput (logical decoding plugin) | Low | Native, preferred | Default plugin for logical replication; Debezium uses it | Limited to logical replication; same WAL slot constraints | Best practice for Kafka CDC |

Logical Replication (Built-in)

  • PostgreSQL 10+ supports logical replication.

  • Publishes row-level changes as a stream.

  • Use: CREATE PUBLICATION and CREATE SUBSCRIPTION.

  • Works via WAL (Write-Ahead Log): changes are encoded as logical changes, not raw blocks.

  • Good for replica clusters or feeding Kafka connectors.
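The publish/subscribe setup is two statements, one on each side. The publication, subscription, and connection details below are illustrative placeholders:

```sql
-- On the publisher: expose row-level changes for one table
CREATE PUBLICATION emp_pub FOR TABLE employee;

-- On the subscriber (the table must already exist with matching columns):
CREATE SUBSCRIPTION emp_sub
    CONNECTION 'host=primary.internal dbname=app user=repl'
    PUBLICATION emp_pub;
```

Creating the subscription automatically creates a replication slot on the publisher and begins with an initial table copy before streaming ongoing changes.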

Logical Decoding

  • The foundation for logical replication.

  • pgoutput is the default plugin.

  • Or use plugins like wal2json or decoderbufs:

    • wal2json: output changes as JSON — easy for Kafka, Debezium.

    • decoderbufs: protobuf format.

  • Tools read the WAL stream via a replication slot.
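You can explore logical decoding by hand with the built-in `test_decoding` plugin before wiring up Debezium. The slot name below is arbitrary:

```sql
-- Create a logical replication slot (requires wal_level = logical)
SELECT pg_create_logical_replication_slot('cdc_slot', 'test_decoding');

-- Peek at pending changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);

-- Consume changes: advances the slot so the WAL can be recycled
SELECT * FROM pg_logical_slot_get_changes('cdc_slot', NULL, NULL);

-- Drop the slot when done, or it will retain WAL forever
SELECT pg_drop_replication_slot('cdc_slot');
```

The last point is the classic operational trap: an abandoned slot pins WAL on disk indefinitely, which is the "WAL bloat" risk mentioned in the table above.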

Triggers-Based CDC

  • Implemented with AFTER INSERT/UPDATE/DELETE triggers.

  • Simpler for small systems, but high overhead under heavy writes.
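A minimal trigger-based audit sketch, capturing every change as JSON (table and function names are hypothetical; `EXECUTE FUNCTION` needs PostgreSQL 11+):

```sql
CREATE TABLE employee_audit (
    audit_id   bigserial PRIMARY KEY,
    op         text,                      -- INSERT / UPDATE / DELETE
    changed_at timestamptz DEFAULT now(),
    row_data   jsonb
);

CREATE OR REPLACE FUNCTION employee_audit_fn() RETURNS trigger AS $$
BEGIN
    -- NEW is NULL on DELETE, OLD is NULL on INSERT
    INSERT INTO employee_audit (op, row_data)
    VALUES (TG_OP, to_jsonb(COALESCE(NEW, OLD)));
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employee_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON employee
FOR EACH ROW EXECUTE FUNCTION employee_audit_fn();
```

Every write now costs an extra insert inside the same transaction, which is exactly why this approach stops scaling at high TPS.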

6. Other concepts:

A few other topics that I think are important to review:

  • When to use BRIN over B-tree indexes

  • Index design tradeoffs in high-write OLTP systems

  • Join types and when to apply each

  • Timeseries partitioning and pruning validation

  • Postgres internals: MVCC, WAL, autovacuum

  • Locking, concurrency control, and deadlocks

  • Designing a hybrid search (full-text + semantic)

  • Data ingestion pipelines with Kafka and Postgres

  • Handling SCDs, surrogate vs natural keys

  • Real-time event guarantees: exactly-once vs at-least-once delivery

  • What indexing strategies do you use for analytical vs transactional workloads?

If you’re preparing for roles that involve SQL, data engineering, or backend systems, mastering both query skills and PostgreSQL internals can really set you apart. The questions span from hands-on SQL to system-level architecture — and each was an opportunity to demonstrate practical depth.

Let me know in the comments if you'd like to add more questions or topics to this list!
