Level 2 · 25 min

Query DSL

Elasticsearch Query DSL is a JSON-based language for defining searches. Understanding the difference between query context and filter context, and the semantics of term, match, and bool queries, is essential for building both relevant and performant search.

Bool Query Structure

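The bool query is the workhorse of Elasticsearch. It combines clauses under four occurrence types: must (the clause must match and contributes to the score), filter (the clause must match but does not affect the score, and is eligible for caching), should (the clause is optional and raises the score when it matches), and must_not (the clause must not match and contributes nothing to the score). When a bool query has no must or filter clause, at least one should clause has to match. Clauses under must and should run in query context, which computes relevance scores with BM25. Clauses under filter and must_not run in filter context, which is binary (a document either matches or it does not), and frequently used filters can be cached in the node query cache. Move non-scoring conditions to filter: this skips score computation and makes the clause eligible for caching.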
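As a rough illustration of how the four clause types fit together, the sketch below assembles a bool query body as a plain Python dict. The helper build_bool_query and the brand and status fields are hypothetical names chosen for this example; name, category, and price follow the request shown in the Code example section.

```python
import json


def build_bool_query(text, category, max_price,
                     preferred_brand=None, exclude_status=None):
    """Assemble a bool query body as a plain dict (hypothetical helper)."""
    bool_clause = {
        # Query context: scored full-text match on the name field.
        "must": [{"match": {"name": text}}],
        # Filter context: yes/no conditions that do not affect the score.
        "filter": [
            {"term": {"category": category}},
            {"range": {"price": {"lt": max_price}}},
        ],
    }
    if preferred_brand is not None:
        # Optional clause: raises the score of matching documents.
        # "brand" is an assumed field name.
        bool_clause["should"] = [{"term": {"brand": preferred_brand}}]
    if exclude_status is not None:
        # Exclusion: matching documents are dropped, nothing is scored.
        # "status" is an assumed field name.
        bool_clause["must_not"] = [{"term": {"status": exclude_status}}]
    return {"query": {"bool": bool_clause}}


body = build_bool_query("laptop", "electronics", 1000,
                        preferred_brand="acme",
                        exclude_status="discontinued")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Keeping the category and price conditions under filter rather than must leaves the score determined only by the match on name and the optional should clause.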

term vs match vs match_phrase

term queries match exact values without analysis: the search value is compared as-is to the indexed terms. Use term on keyword fields. match queries analyze the search string before matching. By default the query text goes through the same analyzer as the indexed field, which enables full-text relevance search on text fields. match_phrase requires all terms to appear in the same order with no intervening tokens. The slop parameter relaxes this: slop: N lets the terms sit up to N position moves away from the exact phrase, and transposing two adjacent terms costs a slop of 2. Avoid term on a text field: the indexed tokens have been analyzed (for example lowercased and split into words), so an unanalyzed query value usually will not match them.

Relevance scoring in Elasticsearch uses TF/IDF (older versions) or BM25 (default since ES 5.0). As the Definitive Guide explains, the score incorporates: "Term frequency: how often does the term appear in the field? The more often, the more relevant. Inverse document frequency: how often does each term appear in the index? The more often, the less relevant. Field-length norm: how long is the field?" (Clinton Gormley & Zachary Tong, Elasticsearch: The Definitive Guide).

A key point about bool clauses: only must and should are scored, while filter and must_not are not. Frequently used filter clauses can be cached as bitsets in the node query cache and reused across queries. Conditions that do not need to influence ranking (date ranges, status flags) belong in filter context, where they skip scoring and can be served from cache, unlike equivalent must clauses.
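The difference between term, match, and match_phrase can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not how Elasticsearch is implemented: analyze is a stand-in analyzer that only lowercases and splits on non-alphanumeric characters, the three query functions are hypothetical, and slop is left out.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def term_query(indexed_tokens, value):
    """term: the value is compared as-is, with no analysis."""
    return value in indexed_tokens


def match_query(indexed_tokens, text):
    """match: analyze the query text, then match if any token is present."""
    return any(tok in indexed_tokens for tok in analyze(text))


def match_phrase_query(indexed_tokens, text):
    """match_phrase (no slop): analyzed tokens must appear consecutively."""
    query_tokens = analyze(text)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(indexed_tokens[i:i + n] == query_tokens
               for i in range(len(indexed_tokens) - n + 1))


doc = "Quick Brown Fox"
text_field = analyze(doc)   # text field: stored as analyzed tokens
keyword_field = [doc]       # keyword field: stored as one unanalyzed value

print(term_query(text_field, "Quick Brown Fox"))     # False
print(term_query(keyword_field, "Quick Brown Fox"))  # True
print(match_query(text_field, "QUICK dog"))          # True
print(match_phrase_query(text_field, "brown fox"))   # True
print(match_phrase_query(text_field, "fox brown"))   # False
```

The first result shows why term on a text field tends to return nothing: the index holds the tokens quick, brown, and fox, none of which equals the raw string "Quick Brown Fox".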

Relevance and Boosting

BM25 (Best Match 25) is the default scoring algorithm. It considers term frequency (TF: how often the term appears in the field, with diminishing returns as the count grows), inverse document frequency (IDF: how rare the term is across the index), and field-length normalization (a match in a shorter field scores higher than the same match in a longer one). The boost parameter multiplies a clause's score contribution. The function_score and script_score queries enable custom scoring: decay functions for geo or time proximity, and field value factors for popularity weighting. Use the Explain API (the _explain endpoint) to debug why a document scores as it does.
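To make the three factors concrete, here is a minimal sketch of one common form of the per-term BM25 formula, using the default parameters k1 = 1.2 and b = 0.75. The function name and arguments are hypothetical, and Lucene's actual implementation differs in details (for example, it stores field lengths in a lossy encoded form), so the numbers will not exactly match the output of _explain.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75, boost=1.0):
    """Score contribution of one query term for one document (sketch)."""
    # IDF: terms found in fewer documents get a larger weight.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5)
                   / (docs_with_term + 0.5))
    # Length normalization: fields longer than average are penalized.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # Saturating term frequency: each extra occurrence adds less.
    tf_part = tf / (tf + k1 * length_norm)
    return boost * idf * tf_part


# Same term and frequency, different field lengths.
short_field = bm25_term_score(tf=1, doc_len=5, avg_doc_len=20,
                              doc_count=1000, docs_with_term=50)
long_field = bm25_term_score(tf=1, doc_len=80, avg_doc_len=20,
                             doc_count=1000, docs_with_term=50)
print(short_field > long_field)  # True: the shorter field scores higher

# Same document, rare term versus common term.
rare = bm25_term_score(tf=1, doc_len=20, avg_doc_len=20,
                       doc_count=1000, docs_with_term=5)
common = bm25_term_score(tf=1, doc_len=20, avg_doc_len=20,
                         doc_count=1000, docs_with_term=500)
print(rare > common)  # True: the rarer term carries more weight
```

In this sketch boost is a plain multiplier, which mirrors how the boost parameter scales a clause's contribution.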

Key Takeaways

Put non-scoring conditions in filter context: they skip BM25 score computation and are eligible for caching, which usually makes them faster.
term is for keyword fields (exact match). match is for text fields (analyzed, full-text). Using term on a text field typically returns no results because the analyzed tokens in the index differ from the raw query value.
BM25 rewards term frequency and penalizes common terms (IDF). Use boost to weight specific clauses. Use _explain to debug unexpected scores.

Code example

POST /products/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"category": "electronics"}},
        {"range": {"price": {"lt": 1000}}}
      ],
      "must": [
        {"match": {"name": {"query": "laptop"}}}
      ]
    }
  }
}