Level 1 · 20 min

Query Optimization

PostgreSQL query optimization starts with reading EXPLAIN ANALYZE output. Understanding how the planner chooses between sequential and index scans, and which join algorithm it uses, is the foundation of PostgreSQL performance work.

Reading EXPLAIN ANALYZE

EXPLAIN shows the query plan. EXPLAIN ANALYZE actually executes the query and shows actual vs estimated row counts and timing. Key output fields: 'actual time' (ms per loop iteration), 'rows' (actual vs estimated), 'loops' (how many times this node was executed), 'buffers' (cache hits vs reads). When actual rows >> estimated rows, the planner made a bad estimate — usually means stale statistics. Run ANALYZE on the table to refresh. 'Planning time' vs 'Execution time' are both important: high planning time on complex queries suggests too many join permutations.

Sequential Scan vs Index Scan

PostgreSQL chooses Seq Scan when: the table is small (reading all pages is cheaper than random index lookups), the query has low selectivity (many rows match, index overhead not worth it), or no useful index exists. Index Scan is chosen for high-selectivity queries. Bitmap Index Scan is a hybrid: build a bitmap of matching TIDs, then fetch pages in heap order — efficient for medium selectivity (more matches than Index Scan handles well, fewer than Seq Scan). Index Only Scan avoids the heap entirely when all required columns are in the index. When the planner chooses Seq Scan and you expect Index Scan, check: is the index actually being used (pg_stat_user_indexes)? Is random_page_cost too high? A real EXPLAIN output from Obe & Hsu shows the anatomy clearly: GroupAggregate (cost=111.29..151.93 rows=1478 width=20) → Sort (cost=111.29..114.98 rows=1478) → Seq Scan on census.hisp_pop (cost=0.00..33.48). Each cost is startup..total in arbitrary planner units (1 unit ≈ 1 sequential page read). The planner key tunable is random_page_cost (default 4.0 for spinning disk, set to 1.1 for SSDs); reducing it makes the planner prefer index scans over seq scans by shrinking the estimated random I/O penalty. work_mem controls the memory granted per sort or hash operation — too low forces on-disk sorts (visible in EXPLAIN as 'Sort Method: external merge'). Parallel query divides a seq scan across multiple workers (max_parallel_workers_per_gather, default 2); look for 'Gather' or 'Gather Merge' nodes in EXPLAIN to confirm parallelism is active.

Join Algorithms

Nested Loop Join: for each row in the outer relation, scan the inner relation for matches. Efficient when the inner relation is small or has an index. Hash Join: build a hash table from the smaller relation, probe with the larger. Efficient for large equi-joins but requires memory (work_mem). Merge Join: sort both relations then merge. Efficient when both sides are already sorted (index scans) or for large relations where hash table doesn't fit in work_mem. The planner estimates cost using pg_class.reltuples (row count), pg_class.relpages (page count), pg_statistic (column statistics from ANALYZE). Stale statistics → wrong row estimates → wrong plan choice.

Key Takeaways

EXPLAIN ANALYZE shows actual vs estimated rows. Large discrepancy = stale statistics. Run ANALYZE to fix. Cost estimates drive plan choice: wrong estimates = wrong plan.
Bitmap Index Scan is the sweet spot between Index Scan (very few rows) and Seq Scan (many rows). If you see Bitmap Index Scan being used, the planner is working correctly for medium selectivity queries.
Hash Join requires work_mem to build the hash table. If a Hash Join spills to disk (Batches > 1 in EXPLAIN output), increase work_mem for the session or globally to avoid disk-based batching.

Code example

-- See query plan with timing and buffer stats\nEXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)\nSELECT * FROM orders WHERE user_id = 42 AND status = 'pending';\n\n-- Update statistics if planner estimates are wrong\nANALYZE orders;\n\n-- Check index usage\nSELECT indexrelname, idx_scan, idx_tup_read\nFROM pg_stat_user_indexes WHERE relname = 'orders';