TechEarl

Elasticsearch Cheat Sheet

Practitioner reference for Elasticsearch 9.x: index and document operations, Query DSL, aggregations, vector / kNN search, ESQL, cluster management, version compatibility notes, and the gotchas that bite first-time operators.

Ishan Karunaratne

Quick reference

Elasticsearch 9.x Quick Reference

REST endpoints, Query DSL fragments, aggregations, and admin commands for Elasticsearch 9.x.

Core concepts

Cluster: One or more nodes sharing the same `cluster.name`, coordinating to store data and serve requests.
Node: A single Elasticsearch process. Nodes have roles such as `master`, `data`, `ingest`, `ml`, and `transform`; a node with no roles acts as a coordinating-only node.
Index: A logical namespace for documents. Conceptually like a database table; physically a set of one or more shards.
Document: A JSON object stored in an index, identified by an `_id`. Once a field is mapped, its type is fixed for that index.
Shard: A Lucene index. Primary shards hold the data; replica shards are copies for redundancy and read throughput.
Mapping: The schema. Defines field types (`keyword`, `text`, `long`, `date`, `dense_vector`, etc.) and how each is indexed.
Alias: A pointer to one or more indices. Use aliases for zero-downtime reindexing and to abstract away rolling indices.
Data stream: A managed alias over a sequence of backing indices, optimized for append-only time-series data with ILM rollover.

Index operations

`GET /_cat/indices?v`: List all indices with health, doc count, and store size.
`PUT /:index`: Create an index. Optional body: `settings`, `mappings`, `aliases`.
`DELETE /:index`: Delete an index. Irreversible. `action.destructive_requires_name: true` (the default since 8.0) blocks wildcard and `_all` deletes; keep it on in production.
`POST /:index/_close`: Close an index. Closed indices keep their files on disk but are not searchable.
`POST /:index/_open`: Reopen a closed index.
`PUT /:index/_settings`: Update dynamic settings (number of replicas, refresh interval). Static settings can only be changed on a closed index (close, update, reopen).
`GET /:index/_mapping`: Inspect the mapping. Run this before troubleshooting any "why didn't my query match" issue.
`PUT /:index/_mapping`: Add new fields to an existing mapping. Existing fields cannot be re-typed; you must reindex.
`POST /_reindex`: Copy documents from one index to another, optionally transforming them with a script. Use it for mapping changes or to change the number of primary shards.
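A zero-downtime mapping change usually combines the reindex row above with an alias. You create the new index, reindex into it, and then atomically repoint the alias. The sketch below only builds the JSON bodies for `POST /_reindex` and `POST /_aliases`. The index names, the alias name, and the Painless snippet are hypothetical.

```python
from typing import Optional


def reindex_body(source_index: str, dest_index: str,
                 painless: Optional[str] = None) -> dict:
    """Body for POST /_reindex (add ?wait_for_completion=false to get a task ID)."""
    body: dict = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if painless is not None:
        # Optional per-document transform, written in Painless.
        body["script"] = {"lang": "painless", "source": painless}
    return body


def swap_alias_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases: both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: products_v1 -> products_v2 behind the "products" alias.
step1 = reindex_body("products_v1", "products_v2",
                     painless="ctx._source.remove('legacy_field')")
step2 = swap_alias_body("products", "products_v1", "products_v2")
```

Because the remove and add run in one `_aliases` request, clients querying `products` never see the alias point at zero indices or at both.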

Document operations

`POST /:index/_doc`: Index a document with an auto-generated ID.
`PUT /:index/_doc/:id`: Create or replace a document with a specific ID. Returns `"result": "created"` or `"result": "updated"`.
`PUT /:index/_create/:id`: Create only. Fails with 409 if the ID already exists. Use when you must NOT overwrite.
`GET /:index/_doc/:id`: Retrieve a document by ID, including `_source` and metadata.
`POST /:index/_update/:id`: Partial update. Body: `{ "doc": { ... } }` to merge fields, or `{ "script": { ... } }` for a scripted update.
`DELETE /:index/_doc/:id`: Delete a single document by ID.
`POST /:index/_delete_by_query`: Delete every document matching a query. With `?wait_for_completion=false` it returns a task ID instead; poll it with `GET /_tasks/:task_id`.
`POST /_bulk`: Send many index/create/update/delete actions in one round trip. In the NDJSON body, each action line is followed by a source line (except `delete`, which has none), and the body must end with a newline.
`POST /_mget`: Multi-get: fetch many documents by ID in one request.
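Most hand-rolled clients go wrong on the bulk body. The sketch below uses hypothetical helper names and covers the two rules that matter: action and source lines alternate, and the body ends with a newline. It also checks per-item errors, because `_bulk` can answer HTTP 200 while individual items fail.

```python
import json
from typing import Iterable, Optional


def bulk_ndjson(index: str, docs: Iterable[dict],
                id_field: Optional[str] = None) -> str:
    """Build a POST /_bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def failed_items(resp: dict) -> list:
    """Return the per-item results that carry an error."""
    if not resp.get("errors"):
        return []
    failed = []
    for item in resp["items"]:
        # Each item is keyed by its action name: {"index": {...}}, {"delete": {...}}, ...
        result = next(iter(item.values()))
        if "error" in result:
            failed.append(result)
    return failed
```

Send the string to `POST /_bulk` with `Content-Type: application/x-ndjson`, then call `failed_items` on the parsed response before assuming everything was indexed.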

Search basics

`GET /:index/_search`: Default search. Returns the first 10 hits sorted by relevance score.
`GET /:index/_search?q=field:value`: URI query string. Convenient for ad-hoc queries; less expressive than the JSON DSL.
`POST /:index/_search { "query": { "match_all": {} } }`: Match every document. Equivalent to no query, but explicit.
`POST /:index/_search { "query": { "match": { "title": "engine" } } }`: Full-text match. Runs the query through the field's analyzer; matches if any token overlaps.
`POST /:index/_search { "query": { "term": { "status.keyword": "published" } } }`: Exact term match on a `keyword` field. No analysis. Use for IDs, enums, exact strings.
`POST /:index/_search { "query": { "range": { "price": { "gte": 10, "lt": 50 } } } }`: Range query. Works on numeric, date, and ip fields.
`POST /:index/_search { "query": { "prefix": { "name.keyword": "foo" } } }`: Prefix match. Expensive on text fields without a properly mapped `keyword` subfield.

Compound queries (bool)

`must`: All clauses must match. Contributes to the relevance score (acts like AND for ranking).
`filter`: All clauses must match. Does NOT contribute to the score, and results are eligible for caching. Use for yes/no conditions.
`should`: At least `minimum_should_match` clauses must match (default 0 if the query also has `must` or `filter` clauses, otherwise 1). Contributes to the score.
`must_not`: No clause may match. Does NOT contribute to the score. Use for exclusions.
Example: `{ "bool": { "must": [{"match":{"title":"phone"}}], "filter": [{"term":{"in_stock":true}}], "must_not":[{"term":{"discontinued":true}}] } }`

Search options

`size`: Number of hits to return per page. Default 10.
`from`: Pagination offset. `from + size` is hard-capped at `index.max_result_window` (default 10000).
`search_after`: Cursor-based pagination using the sort values of the last hit. The recommended way to paginate past 10000 results, usually together with a point in time (PIT).
`sort`: Sort by one or more fields. Sorting on `text` fields requires `fielddata: true` (expensive); sort on a `keyword` subfield instead.
`_source`: Filter which source fields are returned. `false` returns no source; an array includes only those fields; an object with `includes`/`excludes` gives fine control.
`highlight`: Returns matched fragments with HTML tags (`<em>` by default) around hits. Useful for snippet UIs.
`explain`: Adds `_explanation` to each hit, breaking down the score. Use when relevance ranking is mysterious.
`track_total_hits`: `true` for exact total counts (slower); a number to count accurately up to N (default 10000); `false` to skip counting entirely.
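`track_total_hits` changes the shape of `hits.total` in the response, which matters when a UI prints result counts. Below is a small sketch of rendering it. The function name is hypothetical.

```python
def describe_total(resp: dict) -> str:
    """Render hits.total from a _search response as a UI label."""
    total = resp["hits"].get("total")
    if total is None:
        # track_total_hits: false -> the response carries no total at all
        return "not counted"
    if total["relation"] == "gte":
        # Counting stopped at the track_total_hits threshold (10000 by default),
        # so "value" is only a lower bound.
        return f"{total['value']}+"
    return str(total["value"])  # relation "eq": exact count
```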

Aggregations

`terms`: One bucket per distinct value, like SQL GROUP BY: `{ "aggs": { "by_category": { "terms": { "field": "category.keyword", "size": 10 } } } }`.
`date_histogram`: Bucket by time: `{ "aggs": { "per_day": { "date_histogram": { "field": "@timestamp", "calendar_interval": "day" } } } }`.
`histogram`: Numeric buckets at a fixed interval. Useful for price brackets and latency distributions.
`range`: Custom buckets with `from`/`to` boundaries. Use when buckets are not equal-width.
`filters`: One bucket per named filter. Best for ad-hoc "slice the data N ways" queries.
`avg` / `sum` / `min` / `max` / `stats`: Metric aggs. `stats` returns count, min, max, avg, and sum in one pass.
`cardinality`: Approximate distinct count (HyperLogLog++). Cheap; counts are close to exact below `precision_threshold` (default 3000, maximum 40000) and approximate above it.
`percentiles`: Approximate percentile values (T-Digest). Use for latency reports.
Sub-aggregations: Aggs nest: a `terms` agg can contain an `avg` sub-agg to get "average price per category".

Vector and kNN search

`dense_vector` field type: Mapping: `{ "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }`. Required to query with kNN.
kNN search: `{ "knn": { "field": "embedding", "query_vector": [...], "k": 10, "num_candidates": 100 } }`. Returns the top-k nearest neighbors.
Hybrid search: Combine kNN with a text query via `bool` and per-clause `boost` to mix semantic and lexical relevance.
Quantization: Set `index_options.type` to `int8_hnsw` or `int4_hnsw` to quantize the vector index, cutting vector memory roughly 4-8x versus float32 at a minor recall cost. `"element_type": "byte"` is for vectors that are already 8-bit integers.
Semantic text: ES 8.15+ field type `semantic_text` runs embedding inference at index and query time; no client-side embeddings needed.

ESQL (8.11+)

`POST /_query`: Run an ESQL query. Body: `{ "query": "FROM logs | WHERE status >= 400 | STATS count() BY host" }`.
`FROM`: Source index, alias, or data stream. Comma-separated for multiple sources: `FROM logs-*,metrics-*`.
`WHERE`: Filter rows. Operators: `==`, `!=`, `>`, `<`, `LIKE`, `RLIKE`, `IN`, `IS NULL`. Equality is `==`, not SQL's `=`.
`STATS ... BY`: Aggregate. `STATS count() BY host, status` groups by each (host, status) pair, similar to a `multi_terms` or `composite` agg.
`EVAL`: Compute new columns: `EVAL latency_ms = duration / 1000`.
`SORT` / `LIMIT`: Order and cap results. `SORT @timestamp DESC | LIMIT 50`.
When to use: ESQL is preferred for ad-hoc analytics and dashboards; Query DSL is still preferred for search relevance tuning.

Cluster and admin

`GET /_cluster/health`: Cluster status: `green` (all primaries and replicas assigned), `yellow` (some replicas unassigned), `red` (at least one primary unassigned).
`GET /_cat/nodes?v`: List nodes with heap, CPU, load, and roles.
`GET /_cat/shards?v`: Every shard with its state (`STARTED`, `INITIALIZING`, `RELOCATING`, `UNASSIGNED`). Add `&h=index,shard,prirep,state,unassigned.reason` to see why a shard is unassigned.
`GET /_cluster/allocation/explain`: When you have unassigned shards, this tells you exactly why each one can't be placed. The single most useful debug endpoint.
`PUT /_cluster/settings`: Update cluster-wide settings. `persistent` settings survive a restart; `transient` settings do not and have been deprecated since 7.16, so prefer `persistent`.
`POST /_cluster/reroute`: Manually move or allocate shards. A last-resort tool; usually `allocation/explain` reveals a config issue to fix instead.
`GET /_nodes/hot_threads`: Snapshot of what each node's hottest threads are doing. The first stop when CPU is pinned.

Snapshots and backup

`PUT /_snapshot/:repo`: Register a snapshot repository (S3, GCS, Azure Blob, shared filesystem).
`PUT /_snapshot/:repo/:snapshot`: Take a snapshot. Add `?wait_for_completion=true` to block until it finishes (small clusters only).
`GET /_snapshot/:repo/:snapshot/_status`: Progress of a running snapshot.
`POST /_snapshot/:repo/:snapshot/_restore`: Restore a snapshot. Specify `indices` plus `rename_pattern` and `rename_replacement` to restore alongside live indices.
SLM: Snapshot Lifecycle Management automates snapshots on a schedule with retention. `PUT /_slm/policy/:name`.
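The repository, SLM, and restore rows above all take JSON bodies. The sketch builds them as Python dicts. The repository type (`fs`), the mount path, the cron schedule, and the retention numbers are illustrative assumptions to adapt. A shared-filesystem location must also be listed under `path.repo` in each node's elasticsearch.yml before registration succeeds.

```python
def fs_repository_body(location: str) -> dict:
    """Body for PUT /_snapshot/<repo> using a shared-filesystem repository."""
    return {"type": "fs", "settings": {"location": location}}


def slm_policy_body(repository: str) -> dict:
    """Body for PUT /_slm/policy/<name>: nightly snapshots with retention."""
    return {
        # Elasticsearch cron puts seconds first: 01:30:00 every day (UTC).
        "schedule": "0 30 1 * * ?",
        "name": "<nightly-snap-{now/d}>",  # date-math snapshot name
        "repository": repository,
        "config": {"indices": ["*"], "include_global_state": True},
        "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
    }


def restore_renamed_body(indices: str, prefix: str = "restored-") -> dict:
    """Body for POST /_snapshot/<repo>/<snapshot>/_restore next to live indices."""
    return {
        "indices": indices,
        "rename_pattern": "(.+)",              # capture the whole index name
        "rename_replacement": prefix + "$1",   # e.g. products -> restored-products
    }
```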

An Elasticsearch cheat sheet is a single-page reference of the REST endpoints, Query DSL fragments, and admin commands that get you from "I have a running cluster" to "I have a useful search experience" without re-reading the manual every time. This sheet covers Elasticsearch 9.x as of 2026, including the bits that newcomers trip over: index lifecycle, the difference between match and term, aggregations, the newer ESQL query language, and vector / kNN search for embedding-based retrieval.

I run Elasticsearch behind a few production search experiences and an analytics pipeline. The sections below are organized in the order you actually hit them: first connect to the cluster, then create an index and load documents, then search, then aggregate, then tune.

What this cheat sheet covers

Use this page as a reference card. Each section is a self-contained chunk: skip to Search DSL if you already have a populated index, skip to Aggregations if you need facet counts, skip to Vector / kNN search if you are wiring up semantic search. The end-of-page Common mistakes and FAQ sections answer the questions I get asked most.

What this sheet does NOT cover:

  • Elastic's commercial features (Watcher, machine learning UI, cross-cluster replication). Those need a paid subscription tier. The free / open AGPL+ELv2 stack covers everything below.
  • Kibana dashboards. Kibana is its own product with its own docs.
  • The legacy Transport client. It was removed in Elasticsearch 8.0. Use the REST clients (Java, Python elasticsearch, Node @elastic/elasticsearch, or raw HTTP).

Connecting to Elasticsearch

Every command in this sheet is an HTTP request. I show them in curl form because that is the universal denominator. The same calls work verbatim in Kibana Dev Tools (paste without the curl -X prefix), in httpie, or through any official client.

A local single-node 9.x cluster started fresh requires basic auth (the elastic user) and HTTPS by default. The first-time setup prints the password to the console; reset it any time with:

bash
bin/elasticsearch-reset-password -u elastic

A typical authenticated request:

bash
curl -k -u elastic:CHANGEME https://localhost:9200/_cluster/health

The -k skips self-signed cert verification. For production, use a real certificate and pass --cacert instead. Most of the URLs in the rest of this sheet are shown without the host prefix; assume https://localhost:9200 (or wherever your cluster lives).
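If you prefer a script to curl, you can make the same authenticated call with Python's standard library alone. This is a minimal stand-in, not the official `elasticsearch` client. The URL, the `CHANGEME` password, and the CA file path are placeholders. On a default 8.x/9.x install with auto-configured security, the generated CA certificate is typically `config/certs/http_ca.crt`.

```python
import base64
import json
import ssl
import urllib.request
from typing import Optional

ES_URL = "https://localhost:9200"  # placeholder host
ES_USER = "elastic"
ES_PASSWORD = "CHANGEME"           # placeholder, as in the curl examples


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request; no network I/O happens here."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    token = base64.b64encode(f"{ES_USER}:{ES_PASSWORD}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def es_call(method: str, path: str, body: Optional[dict] = None,
            cafile: Optional[str] = None) -> dict:
    """Send the request, verifying TLS against `cafile` (the --cacert equivalent)."""
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(build_request(method, path, body), context=ctx) as resp:
        return json.load(resp)


# Example (needs a running cluster):
# health = es_call("GET", "/_cluster/health", cafile="config/certs/http_ca.crt")
```

`urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses. Catch it if you want to read Elasticsearch's JSON error body.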

Search DSL: a worked example

The Query DSL is more verbose than the URI query string but composable. The idiom I reach for first is a bool query with must for text relevance and filter for binary conditions:

bash
curl -k -u elastic:CHANGEME -X POST "https://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "wireless headphones" } }
      ],
      "filter": [
        { "term":  { "in_stock": true } },
        { "range": { "price": { "lte": 200 } } }
      ]
    }
  },
  "sort": [
    "_score",
    { "rating": "desc" }
  ],
  "_source": ["name", "price", "rating", "image_url"]
}'

What each part does:

  • must runs the user query through the name field's analyzer and contributes to _score.
  • filter enforces "in stock and under $200" without touching the score; these clauses are eligible for caching.
  • sort ranks by relevance first, then by rating as a tiebreaker.
  • _source restricts the response to the four fields the UI actually renders, saving bandwidth.

Aggregations: facets and analytics

Aggregations turn search results into facet counts, time series, and summary statistics. A typical product-search facet request:

bash
curl -k -u elastic:CHANGEME -X POST "https://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 0,
  "query": { "match": { "name": "headphones" } },
  "aggs": {
    "by_brand":     { "terms": { "field": "brand.keyword", "size": 10 } },
    "price_bucket": { "histogram": { "field": "price", "interval": 50 } },
    "stats_price":  { "stats": { "field": "price" } }
  }
}'

"size": 0 skips the hits and just returns the aggregations. That is the right pattern when you only need counts.

Aggregations can nest. To get the average price by brand:

json
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand.keyword", "size": 10 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
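The response nests the same way as the request. Each `terms` bucket carries its `key`, its `doc_count`, and one object per sub-aggregation. The sketch below flattens the `by_brand` / `avg_price` example into a dict, assuming the request was sent unchanged. The function name is hypothetical.

```python
def avg_price_by_brand(resp: dict) -> dict:
    """Map brand -> (doc_count, average price) from the by_brand/avg_price aggs."""
    buckets = resp["aggregations"]["by_brand"]["buckets"]
    # An avg "value" is None when no document in the bucket has a price.
    return {b["key"]: (b["doc_count"], b["avg_price"]["value"]) for b in buckets}
```

If `sum_other_doc_count` on `by_brand` is greater than zero, more brands exist than the `size: 10` buckets returned.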

Vector / kNN search

Elasticsearch 8.0 added approximate kNN search on indexed dense_vector fields, and 9.x stabilized it as the recommended path for semantic retrieval. Define a dense_vector field in the mapping, then run a knn query with a query vector you compute client-side (or use semantic_text to skip the client step).

Mapping:

json
{
  "mappings": {
    "properties": {
      "title":     { "type": "text" },
      "embedding": { "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }
    }
  }
}

Query:

json
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.012, -0.041, ...],
    "k": 10,
    "num_candidates": 100
  }
}

num_candidates controls recall vs latency. A common ratio is 10× k. For hybrid search that combines lexical relevance with vector similarity, put both knn and a bool/match query into the same request:

json
{
  "query": { "match": { "title": "noise cancelling" } },
  "knn":   { "field": "embedding", "query_vector": [...], "k": 10, "num_candidates": 100, "boost": 0.7 }
}
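A helper that assembles these bodies can enforce the `num_candidates` rules: it must be at least `k` and at most 10,000. The sketch below uses hypothetical names. The 10× ratio is the starting point mentioned above, not a tuned value.

```python
from typing import Optional, Sequence

MAX_NUM_CANDIDATES = 10_000  # Elasticsearch rejects num_candidates above 10,000


def knn_search_body(field: str, query_vector: Sequence[float], k: int = 10,
                    candidate_ratio: int = 10, boost: Optional[float] = None,
                    text_field: Optional[str] = None,
                    text: Optional[str] = None) -> dict:
    """Build a _search body with top-level knn, optionally plus a match query (hybrid)."""
    if not 1 <= k <= MAX_NUM_CANDIDATES:
        raise ValueError("k must be between 1 and 10,000")
    # num_candidates must be >= k; more candidates raise recall and latency.
    num_candidates = min(max(k, k * candidate_ratio), MAX_NUM_CANDIDATES)
    knn: dict = {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
    }
    if boost is not None:
        knn["boost"] = boost
    body: dict = {"knn": knn}
    if text_field is not None and text is not None:
        # Hybrid: each hit's score is the sum of the (boosted) knn and query scores.
        body["query"] = {"match": {text_field: text}}
    return body
```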

For the broader context on how vector search fits with embeddings, document chunking, and a working app, see how to build RAG with embeddings and vector search and how to add semantic search to a MySQL app.

ESQL: the SQL-like query language

Introduced in 8.11 and refined through the 8.x and 9.x lines, ESQL gives Elasticsearch a piped query language that reads like SQL with a Unix-pipeline twist. For analytics, dashboards, and log exploration it is often easier than building the equivalent Query DSL by hand.

code
FROM logs-2026.05.*
| WHERE status >= 400 AND status < 500
| STATS count = count() BY host, status
| SORT count DESC
| LIMIT 20
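`POST /_query` returns results in a columnar shape: a `columns` array of `{name, type}` objects and a `values` array of rows. The sketch below turns that into records. The helper names are hypothetical, and any sample response you test it with is illustrative, not captured from a cluster.

```python
import json


def esql_request_body(query: str) -> str:
    """JSON body for POST /_query."""
    return json.dumps({"query": query})


def esql_rows(resp: dict) -> list:
    """Turn the columnar ES|QL JSON response into a list of dicts."""
    names = [col["name"] for col in resp["columns"]]
    return [dict(zip(names, row)) for row in resp["values"]]
```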

When to use which:

| Use case | DSL | ESQL |
|---|---|---|
| Search-relevance tuning, scoring | Yes | No (no relevance scoring) |
| Time-series analytics, log queries | Possible but verbose | Yes |
| Joins across multiple indices | No | Yes (LOOKUP JOIN, tech preview in 8.18/9.0, GA in 8.19/9.1) |
| Aggregations | Yes | Yes (cleaner syntax) |
| Vector / kNN search | Yes | Not yet |

Version compatibility

| Feature | Available since | Notes |
|---|---|---|
| Single type per index (no _type) | 7.0 | Multi-type indices are gone. References to types in older tutorials are obsolete. |
| Composable index templates | 7.8 | The legacy `_template` API is superseded by `_index_template` plus `_component_template`. |
| Searchable snapshots | 7.10 | Mount a snapshot as a read-only index in the frozen tier. |
| Runtime fields | 7.11 | Compute fields at query time from _source or other fields. Cheap mapping additions. |
| dense_vector indexed for kNN | 8.0 | Required for the knn query. |
| ESQL | 8.11 | Technical preview in 8.11, generally available since 8.14. Major addition. |
| semantic_text field type | 8.15 | Embedding inference at index and query time, no external embedding step. |
| AGPL license option | 8.16 (2024) | License returned to open source: AGPL added alongside ELv2, with the free Basic tier unchanged. |

For specific upgrade walkthroughs, the official Elasticsearch upgrade documentation is the source of truth. Always snapshot before upgrading.

Common mistakes

The bugs I have shipped or seen in code review.

Confusing match and term. A match runs the query value through the field's analyzer (for a text field: tokenize, lowercase); a term uses the value exactly as given. The classic bug is term against a text field: it looks for the exact analyzed token, so { "term": { "title": "Headphones" } } misses documents whose title was indexed as the lowercased token headphones. Use match on text fields and term on keyword fields. Also note that match does not rescue a keyword field: its keyword analyzer leaves the value untouched, so { "match": { "category": "footwear" } } still misses a stored Footwear.
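The difference is easiest to see with a toy model of analysis. This is not Elasticsearch code. The regex tokenizer below is a simplified stand-in for the standard analyzer, which actually uses Unicode text segmentation. All helper names are hypothetical.

```python
import re
from typing import Callable, List

Analyzer = Callable[[str], List[str]]


def toy_standard_analyzer(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def keyword_analyzer(text: str) -> List[str]:
    """keyword fields index the whole value as one unmodified token."""
    return [text]


def term_matches(indexed_tokens: List[str], value: str) -> bool:
    # term: the query value is used as-is, never analyzed.
    return value in indexed_tokens


def match_matches(indexed_tokens: List[str], value: str, analyzer: Analyzer) -> bool:
    # match (default OR operator): analyze the query with the field's analyzer,
    # then match if any query token was indexed.
    return any(tok in indexed_tokens for tok in analyzer(value))


title_tokens = toy_standard_analyzer("Wireless Headphones")  # ['wireless', 'headphones']
category_tokens = keyword_analyzer("Footwear")               # ['Footwear']
```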

Using from/size to paginate past the 10,000-result wall. The default index.max_result_window caps from + size at 10,000 because deep pagination forces every shard to build and return a priority queue of from + size hits. Raising the setting is almost always the wrong fix. Use search_after with a Point In Time (PIT) for stable cursor pagination; with a PIT, Elasticsearch adds an implicit _shard_doc tiebreaker to the sort. Without a PIT, add a unique keyword field of your own as the last sort key (sorting on _id is disabled by default since 8.0).
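The PIT flow works like this. Open a PIT with `POST /:index/_pit?keep_alive=1m`, which returns an `id`. Send each page to `POST /_search` (no index in the path) with that id. Close the PIT with `DELETE /_pit` when done. The loop below is a sketch: `search` is a hypothetical callable you supply (for example, one that POSTs to /_search and returns parsed JSON), and the `@timestamp` sort is an assumption about your data.

```python
from typing import Any, Callable, Iterator, List, Optional

SearchFn = Callable[[dict], dict]


def paginate_with_pit(search: SearchFn, pit_id: str, page_size: int = 1000,
                      sort: Optional[List[Any]] = None) -> Iterator[dict]:
    """Yield every hit using search_after + a point in time."""
    sort = sort or [{"@timestamp": "asc"}]
    search_after = None
    while True:
        body: dict = {
            "size": page_size,
            "pit": {"id": pit_id, "keep_alive": "1m"},  # extend the PIT each page
            "sort": sort,
        }
        if search_after is not None:
            body["search_after"] = search_after
        resp = search(body)
        pit_id = resp.get("pit_id", pit_id)  # the PIT id can change between pages
        hits = resp["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The last hit's sort values include the implicit _shard_doc tiebreaker.
        search_after = hits[-1]["sort"]
```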

Sorting on a text field. Text fields do not support sorting unless you enable fielddata: true (which loads every term into memory). The correct fix is to add a keyword subfield in the mapping (fields: { keyword: { type: "keyword" } }) and sort on name.keyword instead.

Forgetting to refresh after indexing in a test. Elasticsearch refreshes every second by default. If your test indexes a doc and immediately queries it back, the doc is not yet searchable. Append ?refresh=true to the index call, or call POST /:index/_refresh. Do NOT set refresh_interval to a tiny value in production; the cost in segment count is brutal.

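Letting unmapped fields run wild. Without an explicit mapping, Elasticsearch auto-detects each field's type from the first document that contains it. A field that holds "42" (a string) on doc 1 is mapped as text with a keyword subfield, so a numeric 42 on doc 2 is indexed as a string too. Range queries and sorts then compare values lexicographically ("100" sorts before "42"). Define mappings up front for anything that matters.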
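An explicit mapping avoids both this problem and the sort-on-text mistake above. The sketch builds a `PUT /products` body as a Python dict. The field set is hypothetical. `"dynamic": "strict"` is one design choice: it rejects documents that contain unmapped fields, which may be stricter than you want.

```python
def products_index_body() -> dict:
    """Body for PUT /products with an explicit mapping (hypothetical fields)."""
    return {
        "settings": {"number_of_shards": 1, "number_of_replicas": 1},
        "mappings": {
            # Reject documents carrying unknown fields instead of guessing types.
            "dynamic": "strict",
            "properties": {
                "sku": {"type": "keyword"},
                "name": {
                    "type": "text",
                    # keyword subfield: sort and aggregate on name.keyword
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "price": {"type": "double"},
                "in_stock": {"type": "boolean"},
                "created_at": {"type": "date"},
            },
        },
    }
```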

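Treating shards as cheap. Each shard is a Lucene index with overhead, and hundreds of small shards per node will starve the heap. The general guidance: 10-50GB per shard for search workloads, and fewer larger shards over more smaller ones. The often-quoted cap of roughly 20 × heap_in_GB shards per node comes from 7.x-era guidance; newer releases have lower per-shard overhead, so treat it as a conservative ceiling rather than a target.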
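The per-shard guidance reduces to simple arithmetic. This is a minimal sketch that assumes you know the index's expected primary data size and pick a target inside the 10-50GB band. The function names and the 40GB default are illustrative.

```python
import math


def primary_shard_count(total_data_gb: float, target_shard_gb: float = 40.0) -> int:
    """Smallest primary count that keeps each shard at or below target_shard_gb."""
    if total_data_gb <= 0:
        return 1
    return max(1, math.ceil(total_data_gb / target_shard_gb))


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every replica is a full copy of every primary."""
    return primaries * (1 + replicas)
```

For example, 500GB at a 40GB target needs 13 primaries (12.5 rounded up), and with one replica that becomes 26 shard copies spread across the data nodes.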

Running a single-node cluster in production. New indices default to number_of_replicas: 1, and Elasticsearch never places a replica on the same node as its primary, so on one node those replicas stay unassigned and cluster status sits at yellow. Either set replicas to 0 (knowing you have no redundancy) or run at least two data nodes. Don't ignore the yellow.

Snapshotting without a registered repository. Snapshots require a repository registered ahead of time; PUT /_snapshot/:repo once, then PUT /_snapshot/:repo/:snapshot. Many teams discover at recovery time that they never set this up. Test the restore path before you need it.

Frequently asked questions

See also

External references: Elasticsearch official documentation is the source of truth for endpoint behavior and version-specific changes. The Elastic Search Labs blog covers the newer ESQL, vector search, and semantic_text features in depth.


Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.
