Docker Restart Policies and Health Checks

Two related but distinct features: restart policies say what happens when a container exits (crashes, host reboots), and health checks say whether the process inside the container is actually working. Pair them with Compose's depends_on: condition: service_healthy and your stack starts in the right order with the right resilience.

Restart policies

Set with --restart on docker run, or restart: in Compose. Four values:

Policy	Behavior
`no` (default)	Never restart. Container exits, stays exited.
`on-failure[:max]`	Restart only on non-zero exit. Optional max-retries count.
`always`	Always restart, including after host reboot. A container you stopped manually also comes back the next time the daemon starts.
`unless-stopped`	Always restart, except when you explicitly stopped it. Same as `always` minus the surprise.

bash

docker run -d --restart unless-stopped --name web nginx:alpine

yaml

services:
  web:
    image: nginx:alpine
    restart: unless-stopped

unless-stopped is the right default for almost every long-running service. It restarts on crashes and host reboots (what you want), but stays stopped when you docker stop it explicitly (also what you want — you stopped it on purpose).

always differs in exactly one place: after a docker stop, the next time the daemon starts (e.g., after a host reboot), always starts the container even though you'd stopped it. Surprising. Use unless-stopped instead.

What about exit codes?

on-failure only restarts on non-zero exit. The conventions:

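Exit 0: normal, intentional exit. on-failure doesn't restart.
Exit 1-126: application error. on-failure restarts.
Exit 137: SIGKILL (128 + 9). Often the OOM killer: the container exceeded its memory limit and was killed. on-failure restarts (and the same OOM will probably happen again, so fix the memory limit). docker inspect shows State.OOMKilled as true when that was the cause.
Exit 143: SIGTERM (128 + 15). Usually from docker stop. The exit code is non-zero, but Docker ignores the restart policy for a container you stopped manually, so on-failure doesn't bring it back. A process that exits 143 for any other reason is restarted like any other failure.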

on-failure:5 retries up to 5 times before giving up. Useful when an app crashes on bad input and you want it to try a few times but not loop forever.

For services that should stay running, use unless-stopped. For workers that should give up after repeated failures instead of looping, use on-failure:N.
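The rules above can be condensed into a small decision function. This is a simplified model for reasoning about the policies, not Docker's implementation; the function and parameter names are made up for illustration, and daemon-boot behavior for on-failure is deliberately left unmodeled.

```python
from typing import Optional


def restart_after_exit(policy: str, exit_code: int, *,
                       manually_stopped: bool = False,
                       restarts_so_far: int = 0,
                       max_retries: Optional[int] = None) -> bool:
    """Would Docker restart a container that just exited? (simplified model)"""
    if manually_stopped:
        # A manual `docker stop` suspends the restart policy,
        # whatever the exit code was.
        return False
    if policy == "no":
        return False
    if policy in ("always", "unless-stopped"):
        return True
    if policy == "on-failure":
        if exit_code == 0:
            return False
        # on-failure:N stops trying once N restarts have been used up.
        return max_retries is None or restarts_so_far < max_retries
    raise ValueError(f"unknown policy: {policy}")


def starts_on_daemon_boot(policy: str, manually_stopped: bool) -> bool:
    """The one place `always` and `unless-stopped` differ."""
    if policy == "always":
        return True
    if policy == "unless-stopped":
        return not manually_stopped
    if policy == "no":
        return False
    raise ValueError(f"not modeled here: {policy}")
```

For example, restart_after_exit("on-failure", 143, manually_stopped=True) is False because of the manual stop, while the same exit code without a manual stop counts as a failure and triggers a restart.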

Health checks

A health check tells Docker "is this container actually doing its job?" — separate from "is the process alive?" A web server with the process running but returning 500 to every request is alive, but unhealthy.

In a Dockerfile:

dockerfile

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/healthz || exit 1

--interval=30s: run the check every 30 seconds.
--timeout=5s: count the check as failed if it takes longer than 5 seconds.
--start-period=10s: grace period after start during which failures don't count toward retries.
--retries=3: flip to "unhealthy" only after this many consecutive failures.
CMD ...: the command. Exit 0 = healthy, exit 1 = unhealthy (exit code 2 is reserved, so don't use it).
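To see how retries and the start period interact, here is a rough simulation of the status transitions. It is a sketch of the documented behavior under simplifying assumptions (each check is reduced to a timestamp and a pass/fail flag); health_timeline is an illustrative name, not a Docker API.

```python
from typing import Iterable, List, Tuple


def health_timeline(checks: Iterable[Tuple[float, bool]], *,
                    retries: int = 3,
                    start_period: float = 0.0) -> List[str]:
    """Return the reported status after each check.

    checks: (seconds since container start, did the check pass?)
    """
    status = "starting"
    streak = 0  # consecutive counted failures
    timeline = []
    for elapsed, passed in checks:
        if passed:
            status = "healthy"
            streak = 0
        elif elapsed < start_period and status == "starting":
            # Failure inside the grace period: ignored.
            pass
        else:
            streak += 1
            if streak >= retries:
                status = "unhealthy"
        timeline.append(status)
    return timeline
```

With checks at 5, 10, and 15 seconds failing and the one at 20 seconds passing, retries=3 and no start period gives starting, starting, unhealthy, healthy. With start_period=20 the same sequence never reports unhealthy.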

In Compose:

yaml

services:
  web:
    image: my-app
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

Two test forms:

["CMD", ...] — exec form, runs directly without a shell.
["CMD-SHELL", "..."] — shell form, supports pipes and redirects.
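The difference between the two forms is the same as the difference between passing an argument list and passing a shell string to a process launcher. The sketch below mimics that with Python's subprocess module on a Linux host; it does not involve Docker, and the helper names are invented for the comparison.

```python
import subprocess
from typing import List


def run_exec_form(argv: List[str]) -> str:
    """Like ["CMD", ...]: arguments go straight to the program, no shell."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.stdout.strip()


def run_shell_form(command: str) -> str:
    """Like ["CMD-SHELL", "..."]: the string is handed to /bin/sh -c."""
    done = subprocess.run(["/bin/sh", "-c", command],
                          capture_output=True, text=True)
    return done.stdout.strip()


# The pipe is just literal text in exec form...
literal = run_exec_form(["echo", "ok | tr a-z A-Z"])
# ...but a real pipeline in shell form.
piped = run_shell_form("echo ok | tr a-z A-Z")
```

Here literal is the string "ok | tr a-z A-Z" and piped is "OK". The same applies to variables: $VAR is only expanded inside the container when a shell is there to expand it.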

docker ps then shows the status next to each container:

code

CONTAINER ID   IMAGE      ...   STATUS                       NAMES
abc123def      my-app     ...   Up 5 minutes (healthy)       web

Healthcheck patterns by service type

Web app:

yaml

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/healthz"]

Assumes the app has a /healthz route that returns 200 when healthy. Standard pattern.
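As a minimal sketch of what such a route might look like, assuming a Python app built on the standard library only: HealthHandler and make_server are illustrative names, and port 3000 simply mirrors the examples above. A real /healthz would usually also check the things the app depends on (database connection, cache) before answering 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"ok\n"
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep probe traffic out of the container logs.
        pass


def make_server(port: int = 3000, host: str = "0.0.0.0") -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), HealthHandler)
```

curl -f treats any 4xx or 5xx response as a failure, so returning a non-2xx status from the route is enough to make the check fail.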

Postgres:

yaml

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]

pg_isready ships with the Postgres image and is purpose-built for this.

MySQL:

yaml

healthcheck:
  # CMD-SHELL plus $$ so the variable is expanded inside the container, not by Compose on the host
  test: ["CMD-SHELL", "mysqladmin ping -h localhost -p\"$$MYSQL_ROOT_PASSWORD\""]

Redis:

yaml

healthcheck:
  test: ["CMD", "redis-cli", "ping"]

MongoDB:

yaml

healthcheck:
  test: ["CMD", "mongosh", "--quiet", "--eval", "db.runCommand({ ping: 1 })"]

Elasticsearch:

yaml

healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health || exit 1"]

depends_on: condition: service_healthy

This is the payoff. Without it, Compose's depends_on only controls start order:

yaml

services:
  web:
    depends_on:
      - db   # web starts after db... but before db is ready

Postgres can take anywhere from a few seconds to half a minute to initialize on first start. If web boots faster than that, its first connection attempt fails and, unless the app retries, it crashes. With the healthy condition:

yaml

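services:
  web:
    image: my-app
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:17
    environment:
      POSTGRES_PASSWORD: example   # placeholder; the image refuses to initialize without one
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5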

Now Compose waits until the db healthcheck reports healthy before starting web. The dependency chain works the way the name suggests.
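Conceptually, the wait is a polling loop on the dependency's health status. The sketch below is one way to reproduce it outside Compose, for example in a deploy script; it is not how Compose itself is implemented. docker_health shells out to the real docker inspect command, while wait_until_healthy and its parameters are illustrative.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str) -> str:
    """Return "starting", "healthy", or "unhealthy" for a container."""
    done = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True)
    return done.stdout.strip()


def wait_until_healthy(get_status: Callable[[], str],
                       timeout: float = 60.0,
                       poll: float = 1.0) -> bool:
    """Poll until healthy (True), unhealthy (False), or timed out (False)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False
        time.sleep(poll)
    return False
```

A call such as wait_until_healthy(lambda: docker_health("db")) would block until the db container's healthcheck settles one way or the other. Taking the status function as a parameter keeps the loop testable without a Docker daemon.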

Three conditions:

service_started — default, container is created/started.
service_healthy — healthcheck reports healthy.
service_completed_successfully — service exits with code 0 (for one-shot init containers).
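Read as predicates over the dependency's state, the three conditions look roughly like this. It is a simplified mental model rather than Compose's code; the function name and its parameters are invented for the example.

```python
from typing import Optional


def dependency_satisfied(condition: str, *,
                         running: bool,
                         health: Optional[str] = None,
                         exit_code: Optional[int] = None) -> bool:
    """Is the dependency in the state the condition asks for?"""
    if condition == "service_started":
        return running
    if condition == "service_healthy":
        return running and health == "healthy"
    if condition == "service_completed_successfully":
        # One-shot jobs: must have finished, and finished cleanly.
        return (not running) and exit_code == 0
    raise ValueError(f"unknown condition: {condition}")
```

A database that is running but still reports "starting" satisfies service_started and not service_healthy, which is exactly the gap that causes early connection failures.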

Inspect health-check status

bash

# replace web with your container's name
docker inspect --format '{{json .State.Health}}' web | jq .

That shows the full health-check history — status, last 5 results, exit codes, output. Useful when a container is unhealthy and you need to see why.

docker events --filter event=health_status streams health transitions in real time.
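If you want that output in a script rather than on screen, the JSON is easy to reduce to the fields that matter. A small sketch, assuming the Status, FailingStreak, and Log fields that docker inspect emits under .State.Health; summarize_health is an illustrative helper.

```python
import json


def summarize_health(raw: str) -> dict:
    """Condense `docker inspect --format '{{json .State.Health}}'` output."""
    # A container without a healthcheck yields JSON null.
    health = json.loads(raw) or {}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }
```

The last_output field is usually the quickest clue: it holds whatever the check command printed, such as a curl error message.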

Common pitfalls

restart: always fighting docker stop. Use unless-stopped instead.
No healthcheck on the dependency. depends_on still orders startup, but without a healthcheck Compose can only wait for "started," not "healthy." Add the healthcheck.
Healthcheck that uses curl on an image without curl. Alpine images don't ship curl by default. Use wget --spider, or install curl, or write a healthcheck in the app's own runtime (e.g., a Node script that hits its own server).
Healthcheck interval too aggressive. Checking every 1s makes the check itself a load on the container. The default of 30s is fine; for slow-starting services, 60s or higher.
No start_period. A web app that takes 20 seconds to compile a TypeScript bundle on first start will fail every health check for the first 20 seconds. Without start_period, those failures count toward retries, and with a short interval the container is marked unhealthy before it has finished starting. Set start_period to roughly the slow-start duration.
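For the curl-less case, a check written in the app's own runtime can be very small. Here is a sketch for a Python-based image using only the standard library; the probe function, the healthcheck.py file name, and the URL are assumptions for illustration.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, 4xx/5xx, or a malformed URL.
        return 1


# As a script (e.g. a hypothetical healthcheck.py), finish with:
#   raise SystemExit(probe("http://localhost:3000/healthz"))
```

The Compose side would then be test: ["CMD", "python", "healthcheck.py"], with the path adjusted to wherever the script lives in the image.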

What to do next

Docker Compose: Getting Started — full Compose context.
How to Write a Dockerfile — the HEALTHCHECK instruction in a Dockerfile.
Docker Container Lifecycle — restart policies in context of stop/start.

FAQ

always vs unless-stopped: which should I use?

unless-stopped. It does everything always does (restart on crash, restart after host reboot) except the one thing you don't want: it doesn't restart a container after you explicitly stopped it. always will start it back up the next time the Docker daemon starts, which is rarely what you want.

How long does Compose wait for a service_healthy condition?

Until the dependency's healthcheck reports healthy, or until Docker marks it unhealthy. timeout: caps each attempt, retries: sets how many consecutive failures are tolerated, and start_period: adds a grace period at startup; together with interval: they determine how long that takes. If the dependency goes unhealthy instead of healthy, Compose fails the dependent service's startup.

Does HEALTHCHECK affect container restart behavior?

By itself, no. HEALTHCHECK only changes the reported status. Restart policies trigger on the container exiting, not on health status. To restart an unhealthy container, you need something that watches health status and acts on it: Docker Swarm replaces unhealthy tasks on its own, and a small monitoring tool can do the same for standalone containers. Kubernetes ignores HEALTHCHECK and relies on its own liveness probes.

Why isn't depends_on waiting for my database to be ready?

By default, depends_on only waits for the container to be started, not ready. Add a healthcheck to the dependency and use condition: service_healthy on the depends_on entry. That's the only way Compose knows when "ready" actually means ready.

What's the difference between HEALTHCHECK in Dockerfile and healthcheck: in Compose?

The Dockerfile version travels with the image: anyone who pulls and runs it gets the healthcheck. The Compose version is per-service config and overrides any Dockerfile healthcheck for that service. Put it in the Dockerfile if you own the image and want the healthcheck baked in; put it in Compose if you're using a third-party image and want to add or change the healthcheck.

Ishan Karunaratne