Level 3 · 30 min

Performance

Elasticsearch performance tuning spans indexing throughput, query latency, and cluster resource utilization. Understanding shard sizing, query profiling, and bulk indexing patterns is essential for operating production clusters at scale.

Shard Sizing and Management

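A shard is a Lucene index and the unit of distribution in Elasticsearch. Each shard carries fixed overhead: JVM heap for its metadata (how much depends on the version and the size of the mappings), file descriptors, and thread pool slots. Over-sharding is a common mistake: 10,000 small shards on a cluster waste far more resources than 100 large ones holding the same data. Elastic's guidance is to keep shards between 10 and 50 GB; 50 GB is a recommended upper bound rather than a hard limit (the hard limit is Lucene's cap of roughly 2 billion documents per shard). To estimate a primary shard count, divide by a target size of about 30 GB and round up: target_shards = ceil(total_data_size_gb / 30). For time-series data, use ILM (Index Lifecycle Management) with rollover, for example rolling over when a primary shard exceeds 30 GB or the index is older than 30 days. Avoid shards smaller than 1 GB: they have disproportionate overhead.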
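
A minimal Python sketch of the sizing arithmetic and a matching rollover policy, assuming total_data_size_gb counts primary data only (replicas add copies, not primary shards). The function names are hypothetical. The policy body follows the ILM policy format; max_primary_shard_size needs a reasonably recent release (7.13 or later), and on older versions max_size, which measures the whole index's primary storage, is the closest substitute.

```python
import json
import math

# Figures from the text: aim for ~30 GB shards, keep them within 10-50 GB.
# Function names are hypothetical; sizes are primary data only (no replicas).
TARGET_SHARD_GB = 30
MIN_SHARD_GB, MAX_SHARD_GB = 10, 50


def target_shard_count(total_data_size_gb: float, target_gb: float = TARGET_SHARD_GB) -> int:
    """ceil(total / target), with a floor of one primary shard."""
    if total_data_size_gb <= 0:
        return 1
    return max(1, math.ceil(total_data_size_gb / target_gb))


def in_recommended_band(total_data_size_gb: float, shards: int) -> bool:
    """True if the average shard size falls inside the 10-50 GB band."""
    per_shard = total_data_size_gb / shards
    return MIN_SHARD_GB <= per_shard <= MAX_SHARD_GB


def rollover_policy(max_shard_gb: int = 30, max_age_days: int = 30) -> dict:
    """ILM policy body (for PUT _ilm/policy/<name>) that rolls over on size or age."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            # Whichever condition is met first triggers rollover.
                            "max_primary_shard_size": f"{max_shard_gb}gb",
                            "max_age": f"{max_age_days}d",
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    total_gb = 600  # example dataset size
    shards = target_shard_count(total_gb)
    print(shards, in_recommended_band(total_gb, shards))
    print(json.dumps(rollover_policy(), indent=2))
```

Rounding up rather than down keeps the average shard at or below the target, so the index has room to grow before shards drift past the 50 GB bound.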

Query Profiling and Slow Logs

The Profile API (add profile: true to your search request) returns a per-shard breakdown of where time was spent across query clauses, collectors, and aggregations. Each query node reports its total time_in_nanos, which includes the time of its children, plus low-level timings such as create_weight, build_scorer, next_doc (advancing to the next matching document), advance, and score (computing relevance); query rewriting is reported separately as rewrite_time. Use it to identify which clause dominates latency. Slow logs capture queries that exceed a per-index threshold, for example index.search.slowlog.threshold.query.warn: 5s. Each entry records the query source and the time the query took on that shard. Combine the Profile API (interactive debugging) with slow logs (production alerting). The _explain API shows whether a specific document matches a query and how its score was calculated.

A key insight from Elasticsearch: The Definitive Guide concerns how new documents reach disk. Lucene writes new documents to an in-memory buffer, then writes them out as immutable segment files. A refresh (every 1 second by default) turns the buffer into a new segment in the filesystem cache and makes it visible to search. Because a refresh skips fsync it is cheap, but the new segment is not yet durable on its own; in the meantime durability comes from the transaction log (translog), which by default is fsynced before each write request is acknowledged. A flush commits segments to disk with an fsync and lets the translog be trimmed. The guide describes flushes running every 30 minutes or when the translog grows too large; newer versions trigger them mainly by translog size through index.translog.flush_threshold_size (512 MB by default in the 6.x and 7.x releases). Lucene also merges small segments into larger ones automatically in the background. Merges consume I/O and CPU, and during heavy indexing they can saturate disk throughput. Older releases capped merge I/O with indices.store.throttle.max_bytes_per_sec; Elasticsearch 2.0 removed that setting in favor of Lucene's automatic merge throttling. This cost model explains the bulk-loading advice under Indexing Performance: disabling refresh stops Elasticsearch from producing a tiny new segment every second.
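
Because a node's time_in_nanos includes its children, the root of a profiled query is always the "slowest" node, which hides the clause that actually costs the time. A minimal Python sketch that ranks leaf clauses instead, assuming the search response has already been parsed from JSON into a dict. The response below is hand-written and abbreviated for illustration (the timings are invented, not measured), and the function names are hypothetical.

```python
from typing import Iterator

# Hand-written, abbreviated Profile API response used for illustration only;
# real responses carry many more fields (breakdown, collector, aggregations).
SAMPLE_RESPONSE = {
    "profile": {
        "shards": [
            {
                "id": "[node1][products][0]",
                "searches": [
                    {
                        "query": [
                            {
                                "type": "BooleanQuery",
                                "description": "name:gaming name:laptop",
                                "time_in_nanos": 900_000,
                                "children": [
                                    {"type": "TermQuery", "description": "name:gaming", "time_in_nanos": 250_000},
                                    {"type": "TermQuery", "description": "name:laptop", "time_in_nanos": 600_000},
                                ],
                            }
                        ]
                    }
                ],
            }
        ]
    }
}


def iter_nodes(node: dict) -> Iterator[dict]:
    """Depth-first walk over a profiled query node and all of its children."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def slowest_leaf_clauses(response: dict, top: int = 3) -> list[tuple[float, str, str, str]]:
    """Return (time_ms, shard_id, query_type, description) for the slowest leaves.

    A node's time_in_nanos includes its children, so ranking leaves points at
    the clauses doing the actual matching and scoring work.
    """
    leaves = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                for node in iter_nodes(root):
                    if not node.get("children"):
                        leaves.append((
                            node["time_in_nanos"] / 1_000_000,
                            shard["id"],
                            node["type"],
                            node["description"],
                        ))
    leaves.sort(key=lambda leaf: leaf[0], reverse=True)
    return leaves[:top]


if __name__ == "__main__":
    for time_ms, shard_id, query_type, description in slowest_leaf_clauses(SAMPLE_RESPONSE):
        print(f"{time_ms:.3f} ms  {shard_id}  {query_type}  {description}")
```

Ranking leaves is a simplification: a compound query can also spend real time on its own coordination work, so when a parent's time is much larger than the sum of its children, inspect its breakdown directly.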

Indexing Performance

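The Bulk API batches multiple index, update, and delete operations into one HTTP request; it is the single most important indexing optimization. Size batches by payload bytes, not document count: 5-15 MB per request is a common starting point, and the best value for your hardware is found by benchmarking. index.refresh_interval controls how often new documents become searchable (default 1s). For bulk loads, set refresh_interval to -1 (disabled) and number_of_replicas to 0 during the load, then restore both afterwards. Disabling refresh avoids creating a small new segment every second, which in turn reduces merge work, and zero replicas means each document is indexed once instead of once per copy. After the load, use PUT /index/_settings to put refresh_interval and number_of_replicas back to their normal values, and call POST /index/_refresh to make the new documents searchable. Concurrency is bounded by the write thread pool (called bulk before Elasticsearch 6.3), which is sized to the number of allocated processors; when its queue is full, further bulk requests are rejected with HTTP 429, so clients should back off and retry.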
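
A minimal Python sketch of byte-based batching, assuming documents arrive as (id, source) pairs. bulk_lines and bulk_payloads are hypothetical names; the only Elasticsearch-specific part is the NDJSON body the Bulk API expects: an action line followed by a source line, each ending in a newline. Sending each payload (for example as POST /_bulk with Content-Type: application/x-ndjson) is left to whatever HTTP client you use.

```python
import json
from typing import Iterable, Iterator

# ~10 MB, inside the 5-15 MB starting range; tune by benchmarking.
DEFAULT_MAX_BYTES = 10 * 1024 * 1024


def bulk_lines(index: str, doc_id: str, doc: dict) -> bytes:
    """One 'index' operation: action line plus source line, newline-terminated."""
    action = {"index": {"_index": index, "_id": doc_id}}
    return (json.dumps(action) + "\n" + json.dumps(doc) + "\n").encode("utf-8")


def bulk_payloads(
    index: str,
    docs: Iterable[tuple[str, dict]],
    max_bytes: int = DEFAULT_MAX_BYTES,
) -> Iterator[bytes]:
    """Yield NDJSON bodies whose encoded size stays within max_bytes.

    A single document larger than max_bytes is still emitted, alone in its
    own batch, rather than silently dropped.
    """
    batch: list[bytes] = []
    size = 0
    for doc_id, doc in docs:
        entry = bulk_lines(index, doc_id, doc)
        # Close the current batch before this entry would push it over the cap.
        if batch and size + len(entry) > max_bytes:
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield b"".join(batch)


if __name__ == "__main__":
    docs = ((str(i), {"name": f"laptop {i}", "price": i}) for i in range(1000))
    for body in bulk_payloads("products", docs, max_bytes=16 * 1024):
        print(len(body), "bytes,", body.count(b"\n") // 2, "operations")
```

Measuring encoded bytes rather than counting documents keeps request size stable when document sizes vary widely, which is the reason the text recommends sizing by payload.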

Key Takeaways

Keep shards between 10-50 GB. Over-sharding wastes JVM heap and file descriptors. Calculate target shard count from data size, not document count.
Profile API for interactive query debugging. Slow logs for production alerting. Both are essential — slow logs catch regressions you never ran Profile on.
For bulk indexing: disable refresh (refresh_interval: -1), zero replicas, batch at 5-15 MB. Restore settings after. This can improve throughput substantially; benchmark your own workload to see by how much.

Code example

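// Profile a slow query
POST /products/_search
{"profile": true, "query": {"match": {"name": "laptop"}}}

// Bulk indexing setup: disable refresh and replicas during the load
PUT /products/_settings
{"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

// After the load: restore settings (example values; use your own targets) and refresh
PUT /products/_settings
{"index": {"refresh_interval": "1s", "number_of_replicas": 1}}

POST /products/_refresh

// Slow log threshold
PUT /products/_settings
{"index.search.slowlog.threshold.query.warn": "5s"}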