Docker production patterns cheatsheet.

Restart policies

docker run --restart=no app                 # default
docker run --restart=on-failure app
docker run --restart=on-failure:3 app       # max 3 retries
docker run --restart=always app             # also on docker daemon restart
docker run --restart=unless-stopped app     # like always but respects manual stop

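Prefer unless-stopped for long-running services: it comes back after reboots and daemon restarts, but stays down after a manual docker stop.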
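A restart policy can hide a crash loop, so it helps to watch the restart counter. A minimal sketch, assuming a hypothetical helper name restart_loop_check and an arbitrary threshold of 5. RestartCount is a real field of docker inspect output for containers, and docker update changes the policy of an existing container in place:

```shell
# Hypothetical helper: flag a container whose restart policy may be masking a crash loop.
# Usage: restart_loop_check app 5
restart_loop_check() {
  local name="$1" max="${2:-5}" count
  # RestartCount is a top-level field in `docker inspect` output for a container.
  count=$(docker inspect --format '{{.RestartCount}}' "$name") || return 2
  if [ "$count" -gt "$max" ]; then
    echo "WARN: $name restarted $count times (threshold $max)"
    return 1
  fi
  echo "OK: $name restarted $count times"
}

# Change an existing container's policy without recreating it:
#   docker update --restart=unless-stopped app
```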

docker daemon config

/etc/docker/daemon.json (JSON allows no comments, so this label stays out of the file):
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" },
  "default-ulimits": { "nofile": { "soft": 65536, "hard": 65536 } },
  "live-restore": true,
  "userns-remap": "default",
  "dns": ["1.1.1.1", "8.8.8.8"]
}
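systemctl restart docker

The log settings only apply to containers created after the restart; existing containers keep their old log config until they are recreated. Enabling userns-remap moves Docker's storage to a separate subdirectory, so images and containers created before it are no longer visible.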
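A typo in daemon.json keeps dockerd from starting at all, which takes every container on the host down with it. A minimal sketch of validating before restarting; apply_daemon_config is a hypothetical name, and dockerd --validate assumes Docker Engine 23.0 or later:

```shell
# Hypothetical wrapper: validate daemon.json, then restart Docker only if it passes.
# Assumes Docker Engine 23.0+, where `dockerd --validate` checks the config file
# and exits without starting the daemon.
apply_daemon_config() {
  local cfg="${1:-/etc/docker/daemon.json}"
  if ! dockerd --validate --config-file="$cfg"; then
    echo "not restarting: $cfg failed validation" >&2
    return 1
  fi
  systemctl restart docker
}
# Usage (as root): apply_daemon_config /etc/docker/daemon.json
```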

live-restore

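Keeps containers running while dockerd itself is stopped or restarted, e.g. for a daemon.json change or a patch-level engine upgrade. On a running daemon it can be enabled with systemctl reload docker, no restart needed. Not supported in Swarm mode.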

Pull strategy in prod

Don’t deploy :latest. Pin one of:

  • Git SHA: myreg/app:abc1234.
  • Semver: myreg/app:v1.2.3.
  • Digest: myreg/app@sha256:...
docker run myreg/app@sha256:abc...

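A digest is immutable and reproducible. Tags, even SHA or semver ones, can be re-pushed unless the registry enforces tag immutability.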
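A pre-deploy guard for the rule above, plus a way to turn a tag into the digest it currently points at. A minimal sketch: is_pinned_ref and pin_to_digest are hypothetical names, and the check is a string heuristic, not a full image-reference parser:

```shell
# Hypothetical guard: succeed only for refs pinned by digest or by an explicit, non-latest tag.
is_pinned_ref() {
  local ref="$1" name
  case "$ref" in
    *@sha256:*) return 0 ;;      # digest pin
  esac
  name="${ref##*/}"              # drop registry/namespace; a registry port also contains ':'
  case "$name" in
    *:latest) return 1 ;;
    *:*) return 0 ;;
    *) return 1 ;;               # no tag means Docker falls back to :latest
  esac
}

# Hypothetical helper: pull a tag and print the repo digest it resolved to.
pin_to_digest() {
  docker pull -q "$1" >/dev/null || return 1
  docker inspect --format '{{index .RepoDigests 0}}' "$1"
}
```

The printed value (myreg/app@sha256:...) is what goes into the deploy, e.g. docker run "$(pin_to_digest myreg/app:v1.2.3)".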

docker compose for single-host

docker compose -f compose.prod.yml pull
docker compose -f compose.prod.yml up -d
docker compose -f compose.prod.yml up -d --no-deps web   # recreate only web

Combine a base file with an override (later files override earlier ones):

docker compose -f compose.yml -f compose.prod.yml up -d
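The steps above, in the order a deploy usually wants them (validate the merged files, pull, then recreate what changed), as a small script. A sketch; dc and deploy are hypothetical names, and the file names follow the examples above:

```shell
# Hypothetical helper: every compose call uses the same base + prod override files.
dc() { docker compose -f compose.yml -f compose.prod.yml "$@"; }

deploy() {
  dc config --quiet || return 1   # merge and validate both files, print nothing
  dc pull || return 1             # fetch new images before touching running containers
  dc up -d --remove-orphans       # recreate services whose config or image changed
}
```

Running dc config on its own prints the fully merged file, which is the quickest way to check what the override actually changed.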

Rolling updates (Swarm or scripts)

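Real rolling updates need Swarm, where docker stack deploy honors the deploy.update_config section of a compose file. Plain docker compose has none; the closest manual pattern recreates one service at a time, with a short gap while the old container stops and the new one starts: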

docker compose up -d --no-deps --force-recreate web
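Recreating is only half the job; a script should also wait for the new container to report healthy before moving to the next service. A minimal sketch, assuming a hypothetical update_service helper, one container per service, and an image that defines a HEALTHCHECK (without one, .State.Health does not exist). Compose v2 also offers docker compose up -d --wait, which blocks until services are running or healthy.

```shell
# Hypothetical helper: recreate one service, then poll its healthcheck.
# Assumes one container per service and a HEALTHCHECK in the image.
update_service() {
  local svc="$1" timeout="${2:-60}" cid status waited=0
  docker compose up -d --no-deps --force-recreate "$svc" || return 1
  cid=$(docker compose ps -q "$svc") || return 1
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$cid")
    case "$status" in
      healthy) echo "$svc healthy"; return 0 ;;
      unhealthy) echo "$svc unhealthy" >&2; return 1 ;;
    esac
    sleep 2                        # still "starting": poll again
    waited=$((waited + 2))
  done
  echo "$svc not healthy after ${timeout}s" >&2
  return 1
}
# Usage: for s in api web worker; do update_service "$s" 90 || break; done
```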

Blue-green

services:
  web-blue:
    image: myapp:v1
  web-green:
    image: myapp:v2
  proxy:
    image: nginx
    volumes: ["./active.conf:/etc/nginx/conf.d/default.conf"]

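Point active.conf at web-blue or web-green, then reload nginx, which picks up the new config without dropping in-flight requests: docker compose exec proxy nginx -s reload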
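The switch can be scripted. A minimal sketch, assuming a hypothetical switch_to helper and that both web-blue and web-green listen on port 8000 (the compose file above does not say which port). It runs nginx -t before the reload, and it rewrites active.conf in place: a single-file bind mount tracks the file's inode, so replacing the file with mv would leave nginx reading the old version. If nginx -t fails, switch back before the proxy is next restarted.

```shell
# Hypothetical helper: point the proxy at web-blue or web-green, then reload nginx.
# Assumes both colors listen on port 8000 inside the compose network.
switch_to() {
  local color="$1" conf="${2:-./active.conf}"
  case "$color" in
    blue|green) ;;
    *) echo "usage: switch_to blue|green [conf]" >&2; return 2 ;;
  esac
  # '>' truncates and rewrites the same inode, which the bind mount keeps seeing.
  cat > "$conf" <<EOF
server {
    listen 80;
    location / {
        proxy_pass http://web-$color:8000;
    }
}
EOF
  # -T: no TTY, so this also works from CI or cron.
  docker compose exec -T proxy nginx -t || return 1
  docker compose exec -T proxy nginx -s reload
}
```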

Watchtower (auto-pull)

services:
  watchtower:
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: --interval 300 --cleanup

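Every 300 s it checks the registry; when a container's tag points to a new image, it pulls it, recreates the container with the same options and (with --cleanup) deletes the old image. Risks: anything pushed to a watched tag ships untested, and mounting docker.sock gives the container root-equivalent control of the host. It only follows moving tags: a container on v1.2.3 changes only if that tag is re-pushed, and a digest-pinned one never does. Add --label-enable to restrict it to containers labelled com.centurylinklabs.watchtower.enable=true.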

docker login (registry auth)

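echo "$TOKEN" | docker login ghcr.io -u USER --password-stdin   # -p would leak the token into shell history and ps
docker login myreg.example.com                                   # prompts interactively

Credentials are stored base64-encoded, not encrypted, in ~/.docker/config.json unless a credential helper is configured (credsStore, or credHelpers per registry, e.g. pass or secretservice on Linux). Use one on shared hosts.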

Self-hosted registry

docker run -d -p 5000:5000 \
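  -v "$(pwd)/registry:/var/lib/registry" \
  --restart=unless-stopped \
  registry:2

For private images without a cloud registry. As run here it has no TLS and no auth: Docker trusts a plain-HTTP registry only at localhost, so other hosts would need it in insecure-registries in their daemon.json. Put TLS and authentication in front before exposing it beyond the host.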
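Pushing to it means retagging with the registry's host:port prefix. A sketch; push_local is a hypothetical name, and localhost:5000 matches the port published above:

```shell
# Hypothetical helper: retag an image for the local registry and push it.
# myreg/app:v1 -> localhost:5000/app:v1
push_local() {
  local src="$1" registry="${2:-localhost:5000}" dest
  dest="$registry/${src##*/}"     # keep only the last path component (name:tag)
  docker tag "$src" "$dest" || return 1
  docker push "$dest" && echo "$dest"
}
# List what the registry holds (Registry HTTP API v2):
#   curl -s http://localhost:5000/v2/_catalog
```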

docker context

docker context create remote --docker host=ssh://user@host
docker context use remote
docker ps                       # runs on remote host

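Manage remote Docker hosts from a laptop over SSH, without exposing the daemon's TCP port. For a one-off command without switching: docker --context remote ps. Switch back with docker context use default.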

Cron jobs (one-off containers)

0 2 * * * docker run --rm myreg/backup:v1

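Plain host cron starting a throwaway container. Alternatives: ofelia (job scheduler for Docker, configured via container labels or an INI file) or Cronicle (scheduler with a web UI).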
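Cron gives no overlap protection and discards output unless you redirect it. A minimal wrapper sketch: cron_job, the LOCK_DIR variable and the script path in the crontab comment are hypothetical. It uses mkdir as an atomic lock so it needs nothing beyond POSIX sh; a lock left behind by a killed run has to be removed by hand.

```shell
# Hypothetical cron wrapper: skip if the previous run is still active, log start/end.
cron_job() {
  local name="$1" lock rc
  shift
  lock="${LOCK_DIR:-/tmp}/$name.lock"
  if ! mkdir "$lock" 2>/dev/null; then        # mkdir is atomic: only one run gets the lock
    echo "$(date '+%F %T') $name: previous run still active, skipping"
    return 1
  fi
  echo "$(date '+%F %T') $name: start"
  "$@"
  rc=$?
  rmdir "$lock"
  echo "$(date '+%F %T') $name: exit $rc"
  return "$rc"
}
# crontab, assuming this is saved as an executable script that ends with: cron_job "$@"
# 0 2 * * * /usr/local/bin/cron-job backup docker run --rm myreg/backup:v1 >> /var/log/backup.log 2>&1
```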

Networking with reverse proxy

services:
  proxy:
    image: caddy:2
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
    networks: [public, app]
  
  web:
    image: myreg/app:v1
    networks: [app]
    expose: ["8000"]
    restart: unless-stopped

networks:
  public:
  app:

volumes:
  caddy_data:

Caddyfile:

example.com {
    reverse_proxy web:8000
}

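Caddy obtains and renews the certificate for example.com automatically (Let’s Encrypt or ZeroSSL), as long as DNS points at this host and ports 80 and 443 are reachable. The caddy_data volume keeps the certificates across restarts.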

Resource sizing per container

  • 0.1 CPU + 128 MB RAM minimum for any service (docker run --cpus=0.1 --memory=128m).
  • 0.5 CPU + 512 MB for typical web apps.
  • 2 CPU + 2 GB for a database.
  • These are starting points: load-test and watch docker stats to verify.

Backups

# DB
docker exec db pg_dump -U postgres myapp | gzip > backup-$(date +%F).sql.gz

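# Named volume (stop the DB container first: a tar of a live Postgres data directory is not a consistent backup)
docker run --rm -v pg_data:/data:ro -v "$(pwd)":/backup alpine \
  tar czf /backup/pg_data.tar.gz -C /data .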

Copy backups off-site (S3, Backblaze B2 or similar) and test a restore regularly.
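Backups only count if they restore. The restore counterparts of the commands above, plus a local retention rule. A sketch: the function names and the 14-day default are assumptions, while db, myapp and pg_data come from the original commands. Restore dumps into an empty database, and stop the database container before restoring the volume:

```shell
# Hypothetical helper: restore a pg_dump file; -i makes docker exec forward stdin to psql.
restore_db() {
  gunzip -c "$1" | docker exec -i db psql -U postgres myapp
}

# Hypothetical helper: unpack a volume tarball back into pg_data (db container stopped).
restore_volume() {
  docker run --rm -v pg_data:/data -v "$(pwd)":/backup alpine \
    tar xzf "/backup/$1" -C /data
}

# Hypothetical retention: delete local dumps older than N days (default 14).
# -maxdepth and -delete are GNU/BSD find extensions, not POSIX.
prune_backups() {
  find "${1:-.}" -maxdepth 1 -name 'backup-*.sql.gz' -mtime +"${2:-14}" -print -delete
}
```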

Monitor / alert

  • cAdvisor: per-container metrics.
  • Prometheus + node_exporter + cAdvisor: full stack.
  • Grafana: dashboards.
  • Uptime Kuma: external uptime monitoring.

Health endpoint convention

GET /health          # liveness — minimal
GET /health/ready    # readiness — DB, cache, etc.

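Load balancers and deploy scripts should gate traffic on /health/ready. Keep /health free of dependency checks, so a database outage doesn't mark every app container unhealthy at once.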
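Wiring both endpoints up. A minimal sketch, assuming the app listens on port 8000 and the image contains curl (both hypothetical); run_with_healthcheck and wait_ready are hypothetical names, while the --health-* flags are real docker run options. Plain Docker only marks a failing container unhealthy and does not restart it; Swarm replaces unhealthy tasks.

```shell
# Hypothetical helper. Liveness: Docker runs the cheap /health check inside the container.
# Assumes curl exists in the image and the app listens on 8000.
run_with_healthcheck() {
  docker run -d --name "$1" --restart=unless-stopped \
    --health-cmd 'curl -fsS http://localhost:8000/health || exit 1' \
    --health-interval=30s --health-timeout=3s --health-retries=3 \
    --health-start-period=20s "$2"
}

# Hypothetical helper. Readiness: a deploy script polls /health/ready before sending traffic.
wait_ready() {
  local url="$1" tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    curl -fsS -o /dev/null "$url" && return 0   # -f: HTTP status >= 400 counts as failure
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
# Usage: wait_ready http://localhost:8000/health/ready 30 && echo ready
```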

Disk pressure

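Docker keeps stopped containers, unused images, build cache and volumes until told otherwise. Check usage with docker system df, then on low disk:

docker system prune -a --filter "until=24h"
docker system prune --volumes    # also deletes volumes not used by any container; check first

Weekly cleanup from crontab (Sundays at 03:00, removes unused objects older than 7 days):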

0 3 * * 0 docker system prune -af --filter "until=168h"
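Instead of (or alongside) a fixed schedule, a cron job can prune only when the disk actually fills. A sketch; disk_used_pct, prune_if_full and the 80% threshold are assumptions, /var/lib/docker is the default data root, and df -P keeps the output format stable:

```shell
# Hypothetical helper: percentage in use on the filesystem holding $1 (df -P, column 5).
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: prune only when Docker's filesystem crosses a threshold.
prune_if_full() {
  local root="${1:-/var/lib/docker}" limit="${2:-80}" used
  used=$(disk_used_pct "$root") || return 1
  if [ "$used" -ge "$limit" ]; then
    docker system df                                # log what is using the space
    docker system prune -af --filter "until=168h"
  fi
}
```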

SSH keys via secrets

services:
  deployer:
    image: myreg/deploy
    secrets:
      - ssh_key

secrets:
  ssh_key:
    file: ~/.ssh/id_ed25519

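Compose mounts the secret file at /run/secrets/ssh_key in the container, so the key never ends up in an image layer. Alternatively, forward your SSH agent by bind-mounting the socket that $SSH_AUTH_SOCK points to.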

Common mistakes

  • :latest tag in prod.
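  • --restart=always on a container that crashes at startup → endless restart loop (Docker only increases the delay between attempts; on-failure:N gives up).
  • No log rotation → disk full.
  • Pulling a new image without recreating the container: it keeps running the old one.
  • Reusing a data volume across incompatible image versions (e.g. a postgres:15 data directory under postgres:16 refuses to start).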
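The "new image, old container" mistake is easy to detect: compare the image ID a container was created from with the ID its tag points to now. A sketch; is_stale is a hypothetical name, while .Config.Image, .Image and .Id are real fields in docker inspect output:

```shell
# Hypothetical check: exit 0 (stale) when the container runs an older image than its tag,
# 1 when up to date, 2 on lookup errors.
is_stale() {
  local name="$1" ref running current
  ref=$(docker inspect --format '{{.Config.Image}}' "$name") || return 2     # e.g. myreg/app:v1
  running=$(docker inspect --format '{{.Image}}' "$name") || return 2        # image ID at create time
  current=$(docker image inspect --format '{{.Id}}' "$ref") || return 2      # ID the tag points to now
  [ "$running" != "$current" ]
}
# Usage: is_stale web && docker compose up -d --no-deps web
```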

Read this next

If you want my prod compose templates and backup scripts, they’re at rajpoot.dev.

