TechEarl

Elasticsearch Cheat Sheet

Practitioner reference for Elasticsearch 9.x: index and document operations, Query DSL, aggregations, vector / kNN search, ESQL, cluster management, version compatibility notes, and the gotchas that bite first-time operators.

By Ishan Karunaratne · 13 min read

An Elasticsearch cheat sheet is a single-page reference of the REST endpoints, Query DSL fragments, and admin commands that get you from "I have a running cluster" to "I have a useful search experience" without re-reading the manual every time. This sheet covers Elasticsearch 9.x as of 2026, including the bits that newcomers trip over: index lifecycle, the difference between match and term, aggregations, the newer ESQL query language, and vector / kNN search for embedding-based retrieval.

I run Elasticsearch behind a few production search experiences and an analytics pipeline. The sections below are organized in the order you actually hit them: first connect to the cluster, then create an index and load documents, then search, then aggregate, then tune.

What this cheat sheet covers

Use this page as a reference card. Each section is a self-contained chunk: skip to Search DSL if you already have a populated index, skip to Aggregations if you need facet counts, skip to Vector / kNN search if you are wiring up semantic search. The end-of-page Common mistakes and FAQ sections answer the questions I get asked most.

What this sheet does NOT cover:

  • Elastic's commercial features (Watcher, machine learning, cross-cluster replication). Those need a paid subscription; the tier depends on the feature. Everything below works on the free Basic tier of the AGPL / ELv2 distribution.
  • Kibana dashboards. Kibana is its own product with its own docs.
  • The legacy Transport client. It was removed in Elasticsearch 8.0. Use the REST clients (Java, Python elasticsearch, Node @elastic/elasticsearch, or raw HTTP).

Connecting to Elasticsearch

Every command in this sheet is an HTTP request. I show them in curl form because that is the common denominator. The same calls work in Kibana Dev Tools (paste only the method, path, and JSON body, without the curl flags and host), in httpie, or through any official client.

A local single-node 9.x cluster started fresh requires basic auth (the elastic user) and HTTPS by default. The first-time setup prints the password to the console; reset it any time with:

bash
bin/elasticsearch-reset-password -u elastic

A typical authenticated request:

bash
curl -k -u elastic:CHANGEME https://localhost:9200/_cluster/health

The -k flag skips TLS certificate verification entirely, which is acceptable only against a local cluster with a self-signed certificate. For production, use a real certificate and pass --cacert instead. Most of the URLs in the rest of this sheet are shown without the host prefix; assume https://localhost:9200 (or wherever your cluster lives).
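
For readers who script against the cluster instead of using curl, here is a minimal Python sketch of the same authenticated request, using only the standard library. It builds the request without sending it, so it runs without a cluster. The helper name `build_request` and the `BASE_URL` constant are illustrative assumptions, not part of any Elasticsearch client.

```python
import base64
import urllib.request

# Assumption: the local single-node cluster used throughout this sheet.
BASE_URL = "https://localhost:9200"


def build_request(path, user, password, base_url=BASE_URL):
    """Build (but do not send) a GET request carrying an HTTP Basic auth header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url + path, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_request("/_cluster/health", "elastic", "CHANGEME")
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` together with an `ssl.create_default_context(cafile=...)` context that points at your cluster's CA certificate; that is the Python counterpart of curl's --cacert.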

Quick reference

Elasticsearch 9.x Quick Reference

REST endpoints, Query DSL fragments, aggregations, and admin commands for Elasticsearch 9.x.

Core concepts

Cluster
One or more nodes sharing the same `cluster.name`, coordinating to store data and serve requests.
Node
A single Elasticsearch process. Nodes have roles: master-eligible, data, ingest, coordinating, ML, transform.
Index
A logical namespace for documents. Conceptually like a database table; physically a set of one or more shards.
Document
A JSON object stored in an index, identified by an `_id`. Its fields are typed by the index mapping (schema); once a field is mapped, its type cannot change without a reindex.
Shard
A Lucene index. Primary shards hold the data; replica shards are copies for redundancy and read throughput.
Mapping
The schema. Defines field types (`keyword`, `text`, `long`, `date`, `dense_vector`, etc.) and how each is indexed.
Alias
A pointer to one or more indices. Use aliases for zero-downtime reindexing and to abstract away rolling indices.
Data stream
A managed alias over a sequence of backing indices, optimized for append-only time-series data with ILM rollover.

Index operations

GET /_cat/indices?v
List all indices with health, docs count, store size.
PUT /:index
Create an index. Optional body: `settings`, `mappings`, `aliases`.
DELETE /:index
Delete an index. Irreversible. `action.destructive_requires_name: true` (the default since 8.0) blocks wildcard and `_all` deletes; keep it on in production.
POST /:index/_close
Close an index. Closed indices keep their files on disk but are not searchable.
POST /:index/_open
Reopen a closed index.
PUT /:index/_settings
Update dynamic settings (number of replicas, refresh interval). Most static settings need close + reopen.
GET /:index/_mapping
Inspect the mapping. Run this before troubleshooting any "why didn't my query match" issue.
PUT /:index/_mapping
Add new fields to an existing mapping. Existing fields cannot be re-typed; you must reindex.
POST /_reindex
Copy documents from one index to another, optionally transforming via script. Use for mapping changes or to change the number of primary shards.

Document operations

POST /:index/_doc
Index a document with an auto-generated ID.
PUT /:index/_doc/:id
Create or replace a document with a specific ID. Returns `result: created` or `result: updated`.
PUT /:index/_create/:id
Create only. Fails with 409 if the ID already exists. Use when you must NOT overwrite.
GET /:index/_doc/:id
Retrieve a document by ID, including `_source` and metadata.
POST /:index/_update/:id
Partial update. Body: `{ "doc": { ... } }` for a merge, or `{ "script": { ... } }` for scripted.
DELETE /:index/_doc/:id
Delete a single document by ID.
POST /:index/_delete_by_query
Delete every document matching a query. With `?wait_for_completion=false` it runs asynchronously and returns a task ID; poll with `GET /_tasks/:task_id`.
POST /_bulk
Send many index/create/update/delete actions in one round trip. The NDJSON format alternates action lines and source lines.
POST /_mget
Multi-get: fetch many documents by ID in one request.
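
The NDJSON layout of `_bulk` is easier to see in code than in prose. The sketch below assembles a bulk body that indexes a few documents; the helper name `bulk_body` and the sample documents are made up for illustration. Each document becomes two lines, an action line and a source line, and the body ends with a newline, which the bulk API requires.

```python
import json


def bulk_body(index, docs):
    """Return an NDJSON _bulk body that indexes each (doc_id, source) pair."""
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The final newline is mandatory for the bulk API.
    return "\n".join(lines) + "\n"


body = bulk_body(
    "products",
    [("1", {"name": "Headphones"}), ("2", {"name": "Speaker"})],
)
print(body, end="")
```

Send the result to `POST /_bulk` with the `Content-Type: application/x-ndjson` header.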

Search basics

GET /:index/_search
Default search: returns the first 10 documents sorted by relevance score.
GET /:index/_search?q=field:value
URI query string. Convenient for ad-hoc; less expressive than the JSON DSL.
POST /:index/_search { "query": { "match_all": {} } }
Match every document. Equivalent to no query, but explicit.
POST /:index/_search { "query": { "match": { "title": "engine" } } }
Full-text match. Runs the query through the field's analyzer; matches if any token overlaps.
POST /:index/_search { "query": { "term": { "status.keyword": "published" } } }
Exact term match on a `keyword` field. No analysis. Use for IDs, enums, exact strings.
POST /:index/_search { "query": { "range": { "price": { "gte": 10, "lt": 50 } } } }
Range query. Works on numeric, date, ip fields.
POST /:index/_search { "query": { "prefix": { "name.keyword": "foo" } } }
Prefix match. Expensive on text fields without a properly mapped `keyword` subfield.

Compound queries (bool)

must
All clauses must match. Contributes to the relevance score (acts like AND for ranking).
filter
All clauses must match. Does NOT contribute to the score; cached aggressively. Use for binary filters.
should
At least one (`minimum_should_match`, default 0 if `must`/`filter` present, else 1) must match. Contributes to score.
must_not
No clause may match. Does NOT contribute to score. Use for exclusions.
Example
`{ "bool": { "must": [{"match":{"title":"phone"}}], "filter": [{"term":{"in_stock":true}}], "must_not":[{"term":{"discontinued":true}}] } }`

Search options

size
Number of hits to return per page. Default 10. `from` + `size` cannot exceed `index.max_result_window` (default 10000).
from
Pagination offset. Combined with `size`, hard-capped at `index.max_result_window` (default 10000).
search_after
Cursor-based pagination using sort values from the last hit. The recommended way to paginate past 10000 results; pair it with a Point In Time (PIT) for a consistent view across pages.
sort
Sort by one or more fields. Sorting on `text` fields requires `fielddata: true` (expensive) or use a `keyword` subfield.
_source
Filter which source fields are returned. `false` = no source. Array = include only these. Object with includes/excludes for fine control.
highlight
Returns matched fragments with HTML tags around hits. Useful for snippet UIs.
explain
Adds `_explanation` to each hit, breaking down the score. Use when relevance ranking is mysterious.
track_total_hits
`true` for exact total counts (slower); a number for accurate up to N; `false` to skip counting entirely.

Aggregations

terms
Bucket per distinct value: `{ "aggs": { "by_category": { "terms": { "field": "category.keyword", "size": 10 } } } }`. Like SQL GROUP BY.
date_histogram
Bucket by time: `{ "aggs": { "per_day": { "date_histogram": { "field": "@timestamp", "calendar_interval": "day" } } } }`.
histogram
Numeric buckets at fixed interval. Useful for price brackets, latency distributions.
range
Custom buckets with `from`/`to` boundaries. Use when buckets are not equal-width.
filters
One bucket per named filter. Best for ad-hoc "slice the data N ways" queries.
avg / sum / min / max / stats
Metric aggs. `stats` returns count, min, max, avg, and sum in one pass.
cardinality
Approximate distinct count (HyperLogLog++). Cheap; near-exact below `precision_threshold` (default 3000, maximum 40000) and approximate above it.
percentiles
Approximate percentile values (T-Digest). Use for latency reports.
Sub-aggregations
Aggs nest: a `terms` agg can contain an `avg` sub-agg to get "average price per category".

Vector and kNN search

dense_vector field type
Mapping: `{ "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }`. Required to query with kNN.
knn search
`{ "knn": { "field": "embedding", "query_vector": [...], "k": 10, "num_candidates": 100 } }`. Returns top-k nearest neighbors.
Hybrid search
Combine kNN with a text query via `bool` and per-clause `boost` to mix semantic + lexical relevance.
Quantization
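Set `index_options.type` to `int8_hnsw` or `int4_hnsw` to quantize float vectors in the index, shrinking vector memory by roughly 4x or 8x with minor recall cost. `"element_type": "byte"` is for vectors you have already quantized to 8-bit integers yourself.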
Semantic text
ES 8.15+ field type `semantic_text` runs embedding inference at index and query time; no client-side embeddings needed.

ESQL (8.11+)

POST /_query
Run an ESQL query. Body: `{ "query": "FROM logs | WHERE status >= 400 | STATS count() BY host" }`.
FROM
Source index or alias. Comma-separated for multi-source: `FROM logs-*,metrics-*`.
WHERE
Filter rows. SQL-like operators: `==`, `!=`, `>`, `<`, `LIKE`, `IN`, `IS NULL`. Equality is `==`; a single `=` is assignment (as in `EVAL` and `STATS`).
STATS ... BY
Aggregate. `STATS count() BY host, status` is equivalent to a `terms` agg on (host, status).
EVAL
Compute new columns: `EVAL latency_ms = duration / 1000`.
SORT / LIMIT
Order and cap results. `SORT @timestamp DESC | LIMIT 50`.
When to use
ESQL is preferred for ad-hoc analytics and dashboards; Query DSL is still preferred for search relevance tuning.

Cluster and admin

GET /_cluster/health
Cluster status: `green` (all primaries + replicas assigned), `yellow` (replicas missing), `red` (a primary is missing).
GET /_cat/nodes?v
List nodes with heap, CPU, load, role.
GET /_cat/shards?v
Every shard with state (`STARTED`, `INITIALIZING`, `RELOCATING`, `UNASSIGNED`) and reason if unassigned.
GET /_cluster/allocation/explain
When you have unassigned shards, this tells you exactly why each one can't be placed. Single most useful debug endpoint.
PUT /_cluster/settings
Update cluster-wide settings. `transient` (lost on restart) vs `persistent` (survives).
POST /_cluster/reroute
Manually move or allocate shards. Last-resort tool; usually `allocation/explain` shows a config issue to fix instead.
GET /_nodes/hot_threads
Snapshot of what each node's hottest threads are doing. The first stop when CPU is pinned.

Snapshots and backup

PUT /_snapshot/:repo
Register a snapshot repository (S3, GCS, Azure Blob, shared filesystem).
PUT /_snapshot/:repo/:snapshot
Take a snapshot. Add `?wait_for_completion=true` for synchronous (small clusters only).
GET /_snapshot/:repo/:snapshot/_status
Progress of a running snapshot.
POST /_snapshot/:repo/:snapshot/_restore
Restore a snapshot. Specify `indices` and `rename_pattern` to restore alongside live indices.
SLM
Snapshot Lifecycle Management automates snapshots on a schedule with retention. `PUT /_slm/policy/:name`.

Search DSL: a worked example

The Query DSL is more verbose than the URI query string but composable. The idiom I reach for first is a bool query with must for text relevance and filter for binary conditions:

bash
curl -k -u elastic:CHANGEME -X POST "https://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "wireless headphones" } }
      ],
      "filter": [
        { "term":  { "in_stock": true } },
        { "range": { "price": { "lte": 200 } } }
      ]
    }
  },
  "sort": [
    "_score",
    { "rating": "desc" }
  ],
  "_source": ["name", "price", "rating", "image_url"]
}'

What each part does:

  • must runs the user query through the name field's analyzer and contributes to _score.
  • filter enforces "in stock and under $200" without touching the score; these clauses are cached.
  • sort ranks by relevance first, then by rating as a tiebreaker.
  • _source restricts the response to the four fields the UI actually renders, saving bandwidth.
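
In application code the same request is usually built from user input instead of being hard-coded. A minimal sketch, assuming the `products` index and field names from the curl example above; the function name `product_search_body` is hypothetical.

```python
import json


def product_search_body(text, max_price, size=20):
    """Build the bool query from the example: scored text match plus unscored filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                # must: analyzed full-text match, contributes to _score.
                "must": [{"match": {"name": text}}],
                # filter: yes/no conditions, no effect on _score.
                "filter": [
                    {"term": {"in_stock": True}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        },
        # Relevance first, rating as the tiebreaker.
        "sort": ["_score", {"rating": "desc"}],
        "_source": ["name", "price", "rating", "image_url"],
    }


body = product_search_body("wireless headphones", 200)
print(json.dumps(body, indent=2))
```

Keeping user text in `must` and everything binary in `filter` means a change to the price ceiling or stock flag never changes the ranking, only which documents are eligible.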

Aggregations: facets and analytics

Aggregations turn search results into facet counts, time series, and summary statistics. A typical product-search facet response:

bash
curl -k -u elastic:CHANGEME -X POST "https://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' -d '{
  "size": 0,
  "query": { "match": { "name": "headphones" } },
  "aggs": {
    "by_brand":     { "terms": { "field": "brand.keyword", "size": 10 } },
    "price_bucket": { "histogram": { "field": "price", "interval": 50 } },
    "stats_price":  { "stats": { "field": "price" } }
  }
}'

"size": 0 skips the hits and just returns the aggregations. That is the right pattern when you only need counts.

Aggregations can nest. To get the average price by brand:

json
"aggs": {
  "by_brand": {
    "terms": { "field": "brand.keyword", "size": 10 },
    "aggs": {
      "avg_price": { "avg": { "field": "price" } }
    }
  }
}
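
Reading the nested result back is mostly a matter of knowing where the buckets live in the response. The sketch below flattens a `by_brand` / `avg_price` response into rows. The function name `brand_rows` is an assumption, and the sample response is invented data that only mirrors the shape Elasticsearch returns for this aggregation.

```python
def brand_rows(response):
    """Flatten a terms agg with an avg sub-agg into (brand, doc_count, avg_price) rows."""
    rows = []
    for bucket in response["aggregations"]["by_brand"]["buckets"]:
        rows.append((bucket["key"], bucket["doc_count"], bucket["avg_price"]["value"]))
    return rows


# Invented sample response, trimmed to the parts this function reads.
sample_response = {
    "aggregations": {
        "by_brand": {
            "buckets": [
                {"key": "Acme", "doc_count": 12, "avg_price": {"value": 89.5}},
                {"key": "Globex", "doc_count": 7, "avg_price": {"value": 149.0}},
            ]
        }
    }
}

for brand, count, avg_price in brand_rows(sample_response):
    print(f"{brand}: {count} products, average price {avg_price:.2f}")
```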

Vector / kNN search

Elasticsearch 8.x added native approximate kNN search; 9.x stabilized it as the recommended path for semantic retrieval. Define a dense_vector field in the mapping, then run a knn search with a query vector you compute client-side (or use semantic_text to skip the client step).

Mapping:

json
{
  "mappings": {
    "properties": {
      "title":     { "type": "text" },
      "embedding": { "type": "dense_vector", "dims": 768, "index": true, "similarity": "cosine" }
    }
  }
}

Query:

json
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.012, -0.041, ...],
    "k": 10,
    "num_candidates": 100
  }
}

num_candidates controls recall vs latency. A common ratio is 10× k. For hybrid search that combines lexical relevance with vector similarity, put both knn and a bool/match query into the same request:

json
{
  "query": { "match": { "title": "noise cancelling" } },
  "knn":   { "field": "embedding", "query_vector": [...], "k": 10, "num_candidates": 100, "boost": 0.7 }
}
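
A small builder makes the two knobs explicit: `k` and the `num_candidates` ratio. This is a sketch under the assumptions of the mapping above (a `title` text field and a 768-dimension `embedding` field); the function name `hybrid_search_body` and the 10x default are illustrative choices, not required values.

```python
def hybrid_search_body(text, query_vector, k=10, dims=768, knn_boost=0.7):
    """Build a request that combines a lexical match with a top-level knn clause."""
    if len(query_vector) != dims:
        # The query vector must have the same dimension count as the mapped field.
        raise ValueError(f"expected {dims} dimensions, got {len(query_vector)}")
    return {
        "query": {"match": {"title": text}},
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            # Candidates examined per shard; 10x k follows the ratio mentioned above.
            "num_candidates": 10 * k,
            "boost": knn_boost,
        },
        "size": k,
    }


# Tiny 3-dimension vector purely for demonstration.
body = hybrid_search_body("noise cancelling", [0.1, 0.2, 0.3], k=5, dims=3)
print(body["knn"]["num_candidates"])
```

Failing fast on a dimension mismatch in the client is cheaper than letting the cluster reject the request.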

For the broader context on how vector search fits with embeddings, document chunking, and a working app, see how to build RAG with embeddings and vector search and how to add semantic search to a MySQL app.

ESQL: the SQL-like query language

Introduced as a technical preview in 8.11, generally available since 8.14, and refined through the rest of the 8.x and 9.x lines, ESQL (officially written ES|QL) gives Elasticsearch a piped query language that reads like SQL with a Unix-pipeline twist. For analytics, dashboards, and log exploration it is often easier than building the equivalent Query DSL by hand.

code
FROM logs-2026.05.*
| WHERE status >= 400 AND status < 500
| STATS count = count() BY host, status
| SORT count DESC
| LIMIT 20
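
The query is sent as a JSON string to `POST /_query`, and the result comes back column-oriented: a `columns` list describing each column and a `values` list of rows. A minimal sketch of both directions follows; the helper names are assumptions and the sample response is invented data in that shape.

```python
import json

ESQL = """
FROM logs-2026.05.*
| WHERE status >= 400 AND status < 500
| STATS count = count() BY host, status
| SORT count DESC
| LIMIT 20
"""


def esql_request_body(query):
    """Wrap an ESQL query string in the JSON body expected by POST /_query."""
    return json.dumps({"query": query.strip()})


def rows_from_esql(response):
    """Turn the columns/values response into a list of dicts keyed by column name."""
    names = [column["name"] for column in response["columns"]]
    return [dict(zip(names, values)) for values in response["values"]]


# Invented sample response in the columns/values shape.
sample_response = {
    "columns": [
        {"name": "count", "type": "long"},
        {"name": "host", "type": "keyword"},
        {"name": "status", "type": "long"},
    ],
    "values": [[42, "web-1", 404], [7, "web-2", 403]],
}

print(esql_request_body(ESQL))
for row in rows_from_esql(sample_response):
    print(row)
```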

When to use which:

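| Use case | DSL | ESQL |
| --- | --- | --- |
| Search-relevance tuning, scoring | Yes | Limited (full-text functions and _score metadata are recent additions; DSL remains more flexible) |
| Time-series analytics, log queries | Possible but verbose | Yes |
| Joins across multiple indices | No | Yes (LOOKUP JOIN, technical preview in 8.18 / 9.0) |
| Aggregations | Yes | Yes (cleaner syntax) |
| Vector / kNN search | Yes | Not yet |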

Version compatibility

| Feature | Available since | Notes |
| --- | --- | --- |
| Single type per index (no _type) | 7.0 | Multi-type indices are gone. References to types in older tutorials are obsolete. |
| Composable index templates | 7.8 | The legacy _template API is superseded by _index_template + _component_template. |
| Searchable snapshots | 7.10 | Mount a snapshot as a read-only index in the cold or frozen tier. |
| Runtime fields | 7.11 | Compute fields at query time from _source or other fields. Cheap mapping additions. |
| dense_vector indexed for kNN | 8.0 | Required for approximate kNN search. |
| ESQL | 8.11 | Piped query language. Technical preview in 8.11, generally available since 8.14. Major addition. |
| semantic_text field type | 8.15 | Embedding inference at index and query time, no external embedding step. |
| Elasticsearch 9.0 | 2025 | Major release. Source is available under AGPL, SSPL, or ELv2 (the AGPL option was added in 2024, during 8.x), alongside the existing free Basic tier. |

For specific upgrade walkthroughs, the official Elasticsearch upgrade documentation is the source of truth. Always snapshot before upgrading.

Common mistakes

The bugs I have shipped or seen in code review.

Searching a keyword field with match, or a text field with term. A match runs the query value through the field's analyzer (lowercase, tokenize); a term does not. A keyword field has no analyzer (only an optional normalizer), so { "match": { "category": "footwear" } } against a keyword field behaves like an exact, case-sensitive lookup and misses documents stored as Footwear. Query keyword fields with term and the exact stored value. Conversely, term against a text field looks for the exact analyzed token (often lowercased), so { "term": { "title": "Headphones" } } misses Headphones if the analyzer lowercased it to headphones.
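
A toy model can make the term-versus-match asymmetry concrete. The sketch below is NOT how Elasticsearch is implemented: `toy_analyze` is a rough stand-in for the standard analyzer (split on non-alphanumerics, lowercase), and the three helper functions are hypothetical names that only model which side of the comparison gets analyzed.

```python
import re


def toy_analyze(text):
    """Rough stand-in for the standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def term_on_text(query_value, field_value):
    # term: the indexed text was analyzed, the query value is used as-is.
    return query_value in toy_analyze(field_value)


def match_on_text(query_value, field_value):
    # match: both sides are analyzed; any shared token is a hit (default OR).
    return bool(set(toy_analyze(query_value)) & set(toy_analyze(field_value)))


def term_on_keyword(query_value, field_value):
    # keyword: the whole value is stored as one unmodified term.
    return query_value == field_value


print(term_on_text("Headphones", "Wireless Headphones"))   # capitalized term vs lowercased tokens
print(match_on_text("Headphones", "Wireless Headphones"))  # both sides analyzed
print(term_on_keyword("Footwear", "Footwear"))             # exact value on a keyword field
```

The first call prints False because the index holds `headphones`, not `Headphones`; the other two print True.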

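Using from/size to paginate past the 10,000-result wall. The default index.max_result_window caps from + size at 10,000 because deep pagination forces every shard to maintain a deep priority queue. Raising the setting is almost always the wrong fix. Use search_after with a deterministic sort that ends in a unique tiebreaker, ideally inside a Point In Time (PIT), which keeps the view stable across pages and adds an implicit _shard_doc tiebreaker. Outside a PIT, use a keyword copy of the document ID as the tiebreaker, since sorting on _id itself is discouraged.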
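
The search_after loop itself is short. In this sketch, `search` is any callable that takes a request body and returns the parsed JSON response, so the pagination logic can be exercised without a cluster. `iter_hits`, `fake_search`, and the unique `sku` sort field are all assumptions made for the example.

```python
import copy


def iter_hits(search, body, page_size=1000):
    """Yield every hit by feeding the last hit's sort values back as search_after."""
    if "sort" not in body:
        raise ValueError("search_after needs an explicit, deterministic sort")
    request = copy.deepcopy(body)
    request["size"] = page_size
    request.pop("from", None)  # search_after replaces from-based offsets
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The cursor for the next page is the sort array of the last hit.
        request["search_after"] = hits[-1]["sort"]


def fake_search(request):
    """Stand-in for the HTTP call: five docs sorted by a unique, hypothetical sku field."""
    docs = [{"_id": str(i), "sort": [f"sku-{i}"]} for i in range(5)]
    after = request.get("search_after")
    if after is not None:
        docs = [doc for doc in docs if doc["sort"] > after]
    return {"hits": {"hits": docs[: request["size"]]}}


ids = [hit["_id"] for hit in iter_hits(fake_search, {"sort": [{"sku": "asc"}]}, page_size=2)]
print(ids)
```

In real use, replace `fake_search` with a function that POSTs the body to `_search`, and add a PIT to the body so every page reads from the same snapshot of the index.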

Sorting on a text field. Text fields do not support sorting unless you enable fielddata: true (which loads every term into memory). The correct fix is to add a keyword subfield in the mapping (fields: { keyword: { type: "keyword" } }) and sort on name.keyword instead.
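
As a sketch, this is what the mapping and the matching sort clause look like when built in code. The helper name `text_with_keyword` and the `price` field are illustrative; `ignore_above: 256` mirrors what dynamic mapping generates for strings and is an optional choice here.

```python
import json


def text_with_keyword(ignore_above=256):
    """A text field for full-text search plus a keyword subfield for sorting and aggs."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }


# Body for PUT /:index when creating the index.
create_index_body = {
    "mappings": {
        "properties": {
            "name": text_with_keyword(),
            "price": {"type": "float"},
        }
    }
}

# Sort on the keyword subfield, never on the analyzed text field itself.
sort_clause = [{"name.keyword": "asc"}]

print(json.dumps(create_index_body, indent=2))
```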

Forgetting to refresh after indexing in a test. Elasticsearch refreshes every second by default. If your test indexes a doc and immediately queries it back, the doc is not yet searchable. Append ?refresh=true to the index call, or call POST /:index/_refresh. Do NOT set refresh_interval to a tiny value in production; the cost in segment count is brutal.

Letting unmapped fields run wild. Without an explicit mapping, Elasticsearch auto-detects types from the first document. A field that holds "42" on doc 1 and 42 on doc 2 will end up text and queries with term fail in confusing ways. Define mappings up front for anything that matters.

Treating shards as cheap. Each shard is a Lucene index with overhead. Hundreds of small shards per node will starve heap. The general guidance: 10-50GB per shard for search workloads, fewer larger shards over more smaller ones, and, as a conservative rule of thumb, no more than roughly 20 shards per GB of heap on each node.
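
The arithmetic behind that guidance is simple enough to script. This sketch assumes a 30GB target shard size (any value in the 10-50GB band works) and uses the 20-shards-per-GB-of-heap figure as a ceiling; both function names are hypothetical.

```python
import math


def primary_shards_for(total_gb, target_shard_gb=30):
    """Smallest primary shard count that keeps each shard at or under the target size."""
    if total_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(total_gb / target_shard_gb))


def shard_budget_per_node(heap_gb, shards_per_heap_gb=20):
    """Rule-of-thumb ceiling on shards per node, derived from JVM heap size."""
    return heap_gb * shards_per_heap_gb


shards = primary_shards_for(400)          # a 400GB index
print(shards, round(400 / shards, 1))     # shard count and resulting GB per shard
print(shard_budget_per_node(30))          # ceiling for a node with a 30GB heap
```

For 400GB this yields 14 primaries of about 28.6GB each, and a 30GB-heap node tops out at 600 shards under the rule of thumb.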

Running a single-node cluster in production. Indices default to number_of_replicas: 1, and a one-node cluster cannot allocate the replica anywhere (a replica never shares a node with its primary), so cluster status sits at yellow. Either set replicas to 0 (knowing you have no redundancy) or run at least two data nodes. Don't ignore the yellow.

Snapshotting without a registered repository. Snapshots require a repository registered ahead of time; PUT /_snapshot/:repo once, then PUT /_snapshot/:repo/:snapshot. Many teams discover at recovery time that they never set this up. Test the restore path before you need it.
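
The two request bodies involved are small. This sketch builds a shared-filesystem repository registration and a restore body that renames indices so they can sit next to the live ones. The function names, the `/mnt/es-backups` path, and the `restored_` prefix are assumptions; a filesystem location must also be listed under `path.repo` in elasticsearch.yml.

```python
import json


def fs_repository_body(location):
    """Body for PUT /_snapshot/:repo using a shared filesystem repository."""
    return {"type": "fs", "settings": {"location": location}}


def restore_body(indices, prefix="restored_"):
    """Body for POST /_snapshot/:repo/:snapshot/_restore that renames restored indices."""
    return {
        "indices": indices,
        # Capture the whole index name and put the prefix in front of it.
        "rename_pattern": "(.+)",
        "rename_replacement": prefix + "$1",
    }


print(json.dumps(fs_repository_body("/mnt/es-backups")))  # hypothetical mount point
print(json.dumps(restore_body("products")))
```

Restoring under a new name is a convenient way to rehearse the restore path without touching production indices.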

Frequently asked questions

See also

External references: Elasticsearch official documentation is the source of truth for endpoint behavior and version-specific changes. The Elastic Search Labs blog covers the newer ESQL, vector search, and semantic_text features in depth.
