Two related but distinct features: restart policies say what happens when a container exits (crashes, host reboots), and health checks say whether the process inside the container is actually working. Pair them with Compose's depends_on: condition: service_healthy and your stack starts in the right order with the right resilience.
Restart policies
Set with --restart on docker run, or restart: in Compose. Four values:
| Policy | Behavior |
|---|---|
| no (default) | Never restart. The container exits and stays exited. |
| on-failure[:max-retries] | Restart only on a non-zero exit code. The optional max-retries value caps the number of attempts. |
| always | Always restart, including after a host reboot. A container you stopped manually comes back the next time the daemon starts. |
| unless-stopped | Always restart, except when you explicitly stopped it. Same as always minus the surprise. |
docker run -d --restart unless-stopped --name web nginx:alpine

services:
  web:
    image: nginx:alpine
    restart: unless-stopped

unless-stopped is the right default for almost every long-running service. It restarts on crashes and host reboots (what you want), but stays stopped when you docker stop it explicitly (also what you want: you stopped it on purpose).
always differs in exactly one place: after a docker stop, the next time the daemon starts (e.g., after a host reboot), always starts the container even though you'd stopped it. Surprising. Use unless-stopped instead.
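The one difference between always and unless-stopped is small enough to write down as a truth table. What follows is a minimal Python sketch of the behavior described here, not Docker's implementation: the function name starts_on_daemon_boot is made up for illustration, and on-failure is deliberately left out because this comparison doesn't cover it.

```python
def starts_on_daemon_boot(policy: str, manually_stopped: bool) -> bool:
    """Model: does the daemon start this container when it comes back up?

    Hypothetical helper that encodes the policy table; it is not a Docker API.
    """
    if policy == "no":
        return False
    if policy == "always":
        # Started even if you ran `docker stop` before the reboot.
        return True
    if policy == "unless-stopped":
        # A manual stop is remembered across daemon restarts.
        return not manually_stopped
    raise ValueError(f"policy not modelled here: {policy!r}")


# Print the full table for the three modelled policies.
for policy in ("no", "always", "unless-stopped"):
    for stopped in (False, True):
        print(f"{policy:15} manually_stopped={stopped!s:5} -> "
              f"{starts_on_daemon_boot(policy, stopped)}")
```

The only row where always and unless-stopped disagree is manually_stopped=True.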
What about exit codes?
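on-failure only restarts on a non-zero exit. The conventions:
- Exit 0: normal, intentional exit. on-failure doesn't restart.
- Exit 1-126: application error. on-failure restarts.
- Exit 137: SIGKILL (128 + 9), most often the kernel's OOM killer after the container ran out of memory. on-failure restarts (and the same OOM will probably happen again, so fix the memory limit).
- Exit 143: SIGTERM (128 + 15), usually from docker stop. on-failure doesn't restart after docker stop, because a manual stop suppresses the restart policy. The exit code itself is non-zero, so a SIGTERM from any other source does trigger a restart.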
on-failure:5 retries up to 5 times before giving up. Useful when an app crashes on bad input and you want it to try a few times but not loop forever.
For pure services I want to stay running, unless-stopped. For workers that should die hard when something goes wrong, on-failure:N.
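The exit-code rules for on-failure can be sketched as a small decision function. This is an illustrative Python model, not Docker's code. The names signal_name and on_failure_restarts are hypothetical, the 128 + signal-number mapping is the usual shell convention, and a manual docker stop is modelled as its own flag rather than inferred from the exit code.

```python
import signal
from typing import Optional


def signal_name(exit_code: int) -> Optional[str]:
    """Map an exit code above 128 to the signal it conventionally encodes."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


def on_failure_restarts(exit_code: int,
                        restarts_so_far: int = 0,
                        max_retries: Optional[int] = None,
                        manually_stopped: bool = False) -> bool:
    """Model of on-failure[:max-retries] after a container exits."""
    if manually_stopped:
        return False  # docker stop suppresses the restart policy
    if exit_code == 0:
        return False  # a clean exit is not a failure
    if max_retries is not None and restarts_so_far >= max_retries:
        return False  # on-failure:N has used up its attempts
    return True
```

With this model, on_failure_restarts(1, restarts_so_far=5, max_retries=5) is False, which is the "try a few times but not loop forever" behavior of on-failure:5.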
Health checks
A health check tells Docker "is this container actually doing its job?" — separate from "is the process alive?" A web server with the process running but returning 500 to every request is alive, but unhealthy.
In a Dockerfile:
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1

- --interval=30s: check every 30 seconds.
- --timeout=5s: kill the check if it takes longer than 5 seconds.
- --start-period=10s: grace period after start during which failures don't count.
- --retries=3: flip to "unhealthy" only after this many consecutive failures.
- CMD ...: the command. Exit 0 = healthy, non-zero = unhealthy.
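How --retries and --start-period interact is easier to see as a tiny state machine. The sketch below is a simplified Python model of those rules, not Docker's implementation. The function name health_status and the (seconds_since_start, passed) input shape are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def health_status(checks: Iterable[Tuple[float, bool]],
                  retries: int = 3,
                  start_period: float = 0.0) -> str:
    """Replay (seconds_since_start, passed) probe results; return the status."""
    status = "starting"
    failing_streak = 0
    for seconds_since_start, passed in checks:
        if passed:
            # Any success marks the container healthy and resets the streak.
            status = "healthy"
            failing_streak = 0
            continue
        # Failures inside the grace period are ignored, but only while the
        # status is still "starting".
        if status == "starting" and seconds_since_start < start_period:
            continue
        failing_streak += 1
        if failing_streak >= retries:
            status = "unhealthy"
    return status


slow_boot = [(5, False), (10, False), (15, False)]
print(health_status(slow_boot, retries=3))                   # no grace period
print(health_status(slow_boot, retries=3, start_period=20))  # with grace period
```

The same three failed probes yield "unhealthy" without a grace period and leave the container in "starting" when start_period covers them.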
In Compose:
services:
  web:
    image: my-app
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

Two test forms:
- ["CMD", ...]: exec form, runs directly without a shell.
- ["CMD-SHELL", "..."]: shell form, supports pipes and redirects.
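The difference between the two forms is the same one you see between running a program with an argument list and running a string through a shell. The Python sketch below uses subprocess as a stand-in for Docker, purely as an analogy. It assumes a POSIX system with echo, tr, and /bin/sh, and the helper names are hypothetical.

```python
import subprocess
from typing import List


def run_exec_form(argv: List[str]) -> str:
    """Like ["CMD", ...]: argv goes straight to the program, no shell."""
    return subprocess.run(argv, capture_output=True, text=True).stdout.strip()


def run_shell_form(command: str) -> str:
    """Like ["CMD-SHELL", "..."]: the string is handed to /bin/sh -c."""
    return subprocess.run(["/bin/sh", "-c", command],
                          capture_output=True, text=True).stdout.strip()


# In exec form the pipe is just more text passed to echo...
print(run_exec_form(["echo", "ok | tr a-z A-Z"]))
# ...while in shell form the shell interprets it as a pipeline.
print(run_shell_form("echo ok | tr a-z A-Z"))
```

The first call prints the pipe character literally; the second prints the upper-cased result. That is why a test that needs ||, pipes, or variable expansion inside the container has to use CMD-SHELL.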
docker ps then shows the status next to each container:
CONTAINER ID IMAGE ... STATUS NAMES
abc123def my-app ... Up 5 minutes (healthy) web
Healthcheck patterns by service type
Web app:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]

Assumes the app has a /healthz route that returns 200 when healthy. Standard pattern.
Postgres:
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]

pg_isready ships with the Postgres image and is purpose-built for this.
MySQL:
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-p${MYSQL_ROOT_PASSWORD}"]

Compose substitutes ${MYSQL_ROOT_PASSWORD} from the host environment or .env file when it reads the file; the exec form does no variable expansion inside the container.
Redis:
healthcheck:
  test: ["CMD", "redis-cli", "ping"]

MongoDB:
healthcheck:
  test: ["CMD", "mongosh", "--quiet", "--eval", "db.runCommand({ ping: 1 })"]

Elasticsearch:
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health || exit 1"]

depends_on: condition: service_healthy
This is the payoff. Without it, Compose's depends_on only controls start order:
services:
  web:
    depends_on:
      - db   # web starts after db... but before db is ready

Postgres takes 5-30 seconds to initialize on first start. web boots in less than that, tries to connect, fails, and crashes. With the healthy condition:
services:
  web:
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:17
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5

Now Compose waits until the db healthcheck reports healthy before starting web. The dependency chain works the way the name suggests.
Three conditions:
- service_started: default, container is created/started.
- service_healthy: healthcheck reports healthy.
- service_completed_successfully: service exits with code 0 (for one-shot init containers).
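Outside Compose (a deploy script or a CI job, say) the same wait has to be done by hand. The following is a rough Python sketch of that polling loop, not a description of how Compose implements it. The function names are hypothetical, docker_health assumes the docker CLI is on PATH and that the container defines a healthcheck, and the status getter is injectable so the loop can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str) -> str:
    """Read the current health status with the docker CLI."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_until_healthy(get_status: Callable[[], str],
                       timeout: float = 60.0,
                       poll_interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status is "healthy"; give up early on "unhealthy"."""
    waited = 0.0
    while waited <= timeout:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False
        sleep(poll_interval)
        waited += poll_interval
    return False  # still "starting" when the timeout ran out
```

A call such as wait_until_healthy(lambda: docker_health("db")) would block until the db container from the example reports healthy, reports unhealthy, or the timeout passes.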
Inspect health-check status
docker inspect --format '{{json .State.Health}}' <container_name> | jq

That shows the full health-check history: status, last 5 results, exit codes, output. Useful when a container is unhealthy and you need to see why.
docker events --filter event=health_status streams health transitions in real time.
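When jq isn't available, the .State.Health JSON from docker inspect is simple enough to summarize in a few lines. The sketch below is a hypothetical Python helper, not part of any Docker tooling. The field names (Status, FailingStreak, Log, Start, ExitCode, Output) are the ones Docker emits, while the sample values are made up for illustration.

```python
import json
from typing import List


def summarize_health(health_json: str) -> List[str]:
    """Turn .State.Health JSON into a status line plus one line per probe."""
    health = json.loads(health_json)
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:
        output = entry["Output"].strip().splitlines()
        last_line = output[-1] if output else ""
        lines.append(f"{entry['Start']} exit={entry['ExitCode']} {last_line}")
    return lines


# Illustrative sample: shaped like real output, but the values are invented.
sample = """
{"Status": "unhealthy", "FailingStreak": 3, "Log": [
  {"Start": "2024-01-01T00:00:00Z", "End": "2024-01-01T00:00:01Z",
   "ExitCode": 7, "Output": "curl: (7) Failed to connect to localhost port 3000"}
]}
"""
print("\n".join(summarize_health(sample)))
```

The last line of each probe's Output is usually where the actual error message sits, which is why the helper keeps only that line.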
Common pitfalls
- restart: always fighting docker stop. Use unless-stopped instead.
- No healthcheck, depends_on still works in name. Without a healthcheck on the dependency, Compose can't wait for "healthy", only "started." Add the healthcheck.
- Healthcheck that uses curl on an image without curl. Alpine images don't ship curl by default. Use wget --spider, or install curl, or write a healthcheck in the app's own runtime (e.g., a Node script that hits its own server).
- Healthcheck interval too aggressive. Every 1s means the check itself is a load on the container. Default 30s is fine; for slow-starting services 60s or higher.
- No start_period. A web app that takes 20 seconds to compile a TypeScript bundle on first start will fail every health check for the first 20 seconds. Without start_period, those failures count toward retries, and the container can be marked unhealthy before it has finished starting. Set start_period to roughly the slow-start duration.
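For the missing-curl pitfall, a healthcheck written in the app's own runtime can be very small. Here is a minimal sketch for an image that already ships a Python interpreter, using only the standard library. The function name probe and the 2xx-means-healthy rule are assumptions for illustration, not a Docker convention.

```python
import urllib.request


def probe(url: str, timeout: float = 3.0) -> int:
    """Return 0 if the URL answers with a 2xx status, 1 otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except Exception:
        # Connection refused, timeout, HTTP 4xx/5xx, malformed URL:
        # for a healthcheck, every failure mode means "not healthy".
        return 1
```

Saved as a script (healthcheck.py is a hypothetical filename) with a final line such as raise SystemExit(probe("http://localhost:3000/healthz")), it could be wired in as test: ["CMD", "python", "healthcheck.py"], so the exit code becomes the health result.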
What to do next
- Docker Compose: Getting Started — full Compose context.
- How to Write a Dockerfile — the HEALTHCHECK instruction in a Dockerfile.
- Docker Container Lifecycle — restart policies in context of stop/start.
FAQ
Which restart policy should I use?
unless-stopped. It does everything always does (restart on crash, restart after host reboot) except the one thing you don't want: it doesn't restart a container after you explicitly stopped it. always will start it back up the next time the Docker daemon starts, which is rarely what you want.
How long does Compose wait for a dependency to become healthy?
Until the dependency's healthcheck reports healthy, or until Docker marks it unhealthy. Each attempt is capped by timeout:, attempts run every interval:, and retries: consecutive failures flip the status to unhealthy; start_period: adds a grace period at start during which failures don't count. Together those settings bound how long Compose waits. If the dependency goes unhealthy instead, Compose fails the dependent service's startup.
Does a failing healthcheck restart the container?
By itself, no. HEALTHCHECK only changes the reported status. Restart policies react to the container exiting, not to its health status. To restart an unhealthy container, pair the healthcheck with something that watches health status and acts on it, such as Docker Swarm (which replaces unhealthy tasks) or a small monitoring tool. Kubernetes uses its own liveness probes rather than the image's HEALTHCHECK.
Why does my app start before its database is ready?
By default, depends_on only waits for the container to be started, not ready. Add a healthcheck to the dependency and use condition: service_healthy on the depends_on entry. That's the only way Compose knows when "ready" actually means ready.
Should the healthcheck go in the Dockerfile or in Compose?
The Dockerfile version travels with the image, so anyone who pulls and runs it gets the healthcheck. The Compose version is per-service config and overrides any Dockerfile healthcheck for that service. Put it in the Dockerfile if you own the image and want the healthcheck baked in; in Compose if you're using a third-party image and want to add or change the healthcheck.





