Two related but distinct features: restart policies say what happens when a container exits (crashes, host reboots), and health checks say whether the process inside the container is actually working. Pair them with Compose's depends_on: condition: service_healthy and your stack starts in the right order with the right resilience.
Restart policies
Set with --restart on docker run, or restart: in Compose. Four values:
| Policy | Behavior |
|---|---|
| no (default) | Never restart. Container exits, stays exited. |
| on-failure[:max] | Restart only on non-zero exit. Optional max-retries count. |
| always | Always restart, including after a host reboot. A container you stopped manually starts again the next time the daemon starts. |
| unless-stopped | Always restart, except when you explicitly stopped it. Same as always minus the surprise. |

    docker run -d --restart unless-stopped --name web nginx:alpine

Or in Compose:

    services:
      web:
        image: nginx:alpine
        restart: unless-stopped

unless-stopped is the right default for almost every long-running service. It restarts on crashes and host reboots (what you want), but stays stopped when you docker stop it explicitly (also what you want, since you stopped it on purpose).
always differs in exactly one place: after a docker stop, the next time the daemon starts (e.g., after a host reboot), always starts the container even though you'd stopped it. Surprising. Use unless-stopped instead.
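If a container is already running with always, you don't need to recreate it to change the policy. The sketch below reads the current policy with docker inspect and switches it with docker update. The function name fix_restart_policy is made up for this example, and it assumes the docker CLI is on your PATH.

```shell
# fix_restart_policy NAME
# Hypothetical helper: switch a container from "always" to "unless-stopped".
fix_restart_policy() {
    # .HostConfig.RestartPolicy.Name holds the policy name (no, always, ...).
    current=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1") || return 1
    if [ "$current" = "always" ]; then
        docker update --restart unless-stopped "$1" >/dev/null
        echo "updated"
    else
        echo "unchanged ($current)"
    fi
}

# Usage (assumes a container named "web" exists):
#   fix_restart_policy web
```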
What about exit codes?
on-failure only restarts on non-zero exit. The conventions:
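- Exit 0: normal, intentional exit. on-failure doesn't restart.
- Exit 1-126: application error. on-failure restarts.
- Exit 137: SIGKILL (128 + 9), most often the kernel OOM killer after the container ran out of memory. on-failure restarts (and the same OOM will probably happen again, so fix the memory limit).
- Exit 143: SIGTERM (128 + 15), usually from docker stop. The container stays stopped in that case, not because of the exit code but because Docker ignores the restart policy for a container you stopped manually.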
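To make the convention concrete, here is a small sketch that turns an exit code into a rough reading. The function name describe_exit is invented for this example; the only rule it relies on is that codes above 128 conventionally mean "killed by signal (code minus 128)".

```shell
# describe_exit CODE
# Hypothetical helper: print a rough reading of a container exit code.
describe_exit() {
    if [ "$1" -eq 0 ]; then
        echo "clean exit"
    elif [ "$1" -gt 128 ]; then
        # 137 -> signal 9 (SIGKILL), 143 -> signal 15 (SIGTERM)
        echo "killed by signal $(($1 - 128))"
    else
        echo "error exit"
    fi
}

# Usage (assumes a container named "web" exists):
#   describe_exit "$(docker inspect --format '{{.State.ExitCode}}' web)"
# To confirm an OOM kill specifically, check the separate flag:
#   docker inspect --format '{{.State.OOMKilled}}' web
```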
on-failure:5 retries up to 5 times before giving up. Useful when an app crashes on bad input and you want it to try a few times but not loop forever.
For services that should simply stay running, use unless-stopped. For workers that should give up after repeated failures instead of looping, use on-failure:N.
Health checks
A health check tells Docker "is this container actually doing its job?" — separate from "is the process alive?" A web server with the process running but returning 500 to every request is alive, but unhealthy.
In a Dockerfile:

    HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
      CMD curl -f http://localhost:3000/healthz || exit 1

- --interval=30s: check every 30 seconds.
- --timeout=5s: kill the check if it takes longer than 5 seconds.
- --start-period=10s: grace period after start during which failures don't count.
- --retries=3: flip to "unhealthy" only after this many consecutive failures.
- CMD ...: the command. Exit 0 = healthy, non-zero = unhealthy.
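The retries rule is easier to see when you run a sequence of check results through it. This is a simplified model, not Docker's implementation: it ignores the start period and timing entirely, and health_status is a name invented for this sketch. Each argument after the retry count is one check's exit code, in order.

```shell
# health_status RETRIES RESULT...
# Simplified model: print the status after a sequence of check exit codes.
health_status() {
    retries=$1
    shift
    status=starting
    streak=0                      # consecutive failures so far
    for result in "$@"; do
        if [ "$result" -eq 0 ]; then
            status=healthy        # any success resets the streak
            streak=0
        else
            streak=$((streak + 1))
            if [ "$streak" -ge "$retries" ]; then
                status=unhealthy
            fi
        fi
    done
    echo "$status"
}

# Usage:
#   health_status 3 0 1 1      two failures in a row, still below retries
#   health_status 3 0 1 1 1    third consecutive failure flips the status
```

In this model, two failures after a success still leave the container healthy; only the third consecutive failure flips it, and a later success flips it back.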
In Compose:

    services:
      web:
        image: my-app
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]
          interval: 30s
          timeout: 5s
          retries: 3
          start_period: 10s

Two test forms:
- ["CMD", ...]: exec form, runs directly without a shell.
- ["CMD-SHELL", "..."]: shell form, supports pipes and redirects.
docker ps then shows the status next to each container:
CONTAINER ID IMAGE ... STATUS NAMES
abc123def my-app ... Up 5 minutes (healthy) web
Healthcheck patterns by service type
Web app:

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]

Assumes the app has a /healthz route that returns 200 when healthy. Standard pattern.

Postgres:

    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]

pg_isready ships with the Postgres image and is purpose-built for this.
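
MySQL:

    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-p${MYSQL_ROOT_PASSWORD}"]

Compose substitutes ${MYSQL_ROOT_PASSWORD} when it parses the file, from the host environment or a .env file, so the variable has to be set there. The exec form runs no shell inside the container that could expand it.

Redis: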

    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

MongoDB:

    healthcheck:
      test: ["CMD", "mongosh", "--quiet", "--eval", "db.runCommand({ ping: 1 })"]

Elasticsearch:

    healthcheck:
      test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health || exit 1"]

depends_on: condition: service_healthy
This is the payoff. Without it, Compose's depends_on only controls start order:

    services:
      web:
        depends_on:
          - db   # web starts after db... but before db is ready

Postgres takes 5-30 seconds to initialize on first start. web boots in less than that, tries to connect, fails, and crashes. With the healthy condition:

    services:
      web:
        depends_on:
          db:
            condition: service_healthy
      db:
        image: postgres:17
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U postgres"]
          interval: 5s
          timeout: 3s
          retries: 5

Now Compose waits until the db healthcheck reports healthy before starting web. The dependency chain works the way the name suggests.
Three conditions:
- service_started: default, container is created/started.
- service_healthy: healthcheck reports healthy.
- service_completed_successfully: service exits with code 0 (for one-shot init containers).
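Outside Compose, you can approximate the service_healthy gate by polling the health status yourself. The sketch below is one way to do that with plain docker inspect. wait_healthy is a hypothetical helper, the 60-second default timeout is an arbitrary choice, and it assumes the container defines a health check.

```shell
# wait_healthy NAME [TIMEOUT_SECONDS]
# Hypothetical helper: return 0 once NAME reports healthy,
# 1 if it reports unhealthy or the timeout runs out.
wait_healthy() {
    name=$1
    timeout=${2:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Prints starting, healthy, or unhealthy; empty if inspect fails.
        status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
        case "$status" in
            healthy) return 0 ;;
            unhealthy) return 1 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage (assumes containers and images of your own):
#   wait_healthy db 30 && docker run -d --name web my-app
```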
Inspect health-check status

    docker inspect --format '{{json .State.Health}}' <container_name> | jq

That shows the full health-check history: status, last 5 results, exit codes, output. Useful when a container is unhealthy and you need to see why.
docker events --filter event=health_status streams health transitions in real time.
Common pitfalls
- restart: always fighting docker stop. Use unless-stopped instead.
- No healthcheck, depends_on still works in name. Without a healthcheck on the dependency, Compose can't wait for "healthy", only "started." Add the healthcheck.
- Healthcheck that uses curl on an image without curl. Alpine images don't ship curl by default. Use wget --spider, or install curl, or write a healthcheck in the app's own runtime (e.g., a Node script that hits its own server).
- Healthcheck interval too aggressive. Every 1s means the check itself is a load on the container. Default 30s is fine; for slow-starting services 60s or higher.
- No start_period. A web app that takes 20 seconds to compile a TypeScript bundle on first start will fail every health check for the first 20 seconds. Without start_period, those failures count as retries and the container is marked unhealthy from the start. Set start_period to roughly the slow-start duration.
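For the missing-curl pitfall, one option is a tiny check script that uses whichever HTTP client the image happens to have. This is a sketch under assumptions: check_http is a made-up name, the 3-second limits are arbitrary, and the wget branch relies on the image's wget supporting --spider.

```shell
# check_http URL
# Hypothetical helper: exit 0 if URL answers successfully, non-zero otherwise.
check_http() {
    if command -v curl >/dev/null 2>&1; then
        # -f turns HTTP errors (4xx/5xx) into a non-zero exit code.
        curl -fsS --max-time 3 -o /dev/null "$1"
    elif command -v wget >/dev/null 2>&1; then
        # --spider checks the URL without saving the response body.
        wget -q -T 3 --spider "$1"
    else
        echo "no HTTP client available" >&2
        return 1
    fi
}

# Usage inside a health-check script copied into the image:
#   check_http http://localhost:3000/healthz || exit 1
```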
What to do next
- Docker Compose: Getting Started — full Compose context.
- How to Write a Dockerfile — the HEALTHCHECK instruction in a Dockerfile.
- Docker Container Lifecycle — restart policies in context of stop/start.


