K8s debugging cheatsheet.

Pod not starting

kubectl describe pod web | tail -n 50       # Events are at the end of describe output
kubectl get events --sort-by=.lastTimestamp
kubectl logs web
kubectl logs web --previous                 # logs from the previous (crashed) container
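In a busy namespace `kubectl get events` mixes every object together. Here is a minimal sketch that narrows the list to a single pod with field selectors. `pod_events` is a hypothetical helper name; the optional second argument is a namespace, and without it your context's namespace applies.

```shell
# pod_events POD [NAMESPACE]: events for one pod only, oldest first.
# pod_events is a hypothetical helper; the kubectl flags are standard.
pod_events() {
  kubectl get events ${2:+--namespace="$2"} \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Usage: `pod_events web prod`.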

Common statuses:

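Status              Meaning
Pending             Waiting to be scheduled, or still pulling images
ContainerCreating   Mounting volumes / pulling image
CrashLoopBackOff    Container keeps exiting; kubelet restarts it with growing back-off
ImagePullBackOff    Image pull failed; kubelet is backing off between retries
ErrImagePull        Pull failed: bad image name/tag or missing credentials
RunContainerError   Container failed to start (entrypoint, permissions)
Error               Container exited non-zero
Terminating         Pod is being deleted (grace period, finalizers)
OOMKilled           Out of memory (usually hit its memory limit)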
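Pod phase alone can mislead: a pod in CrashLoopBackOff usually still reports phase Running. A rough sketch that scans `kubectl get pods -A` and flags anything whose STATUS isn't Running/Completed, or whose READY count is short. It assumes the default column layout (NAMESPACE NAME READY STATUS ...); `not_ready_pods` is a hypothetical name.

```shell
# not_ready_pods: pods (all namespaces) that look unhealthy at a glance.
# Hypothetical helper; relies on the default columns: NAMESPACE NAME READY STATUS ...
not_ready_pods() {
  kubectl get pods -A --no-headers | awk '
    $4 == "Completed" { next }                 # finished Jobs are expected
    {
      split($3, ready, "/")                    # READY column, e.g. "0/1"
      if ($4 != "Running" || ready[1] != ready[2])
        print $1 "/" $2 "\t" $3 "\t" $4
    }'
}
```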

CrashLoopBackOff

kubectl logs web
kubectl logs web --previous
kubectl describe pod web                     # liveness probe failing?

Common causes:

  • App crash (check logs).
  • Bad command/args.
  • Missing config / env / volume.
  • Liveness probe too strict.
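The last exit code often points at one of the causes above. This sketch prints restarts and the last termination per container and gives a rough reading of the code. The interpretations follow common shell/signal conventions (128 + signal number), not guarantees, since the app chooses its own codes. `last_exit` and `explain_exit_code` are hypothetical names.

```shell
# explain_exit_code CODE: rough reading of a container exit code (conventions only).
explain_exit_code() {
  case "$1" in
    "")  echo "no previous termination recorded" ;;
    0)   echo "exited cleanly; with restartPolicy Always it is restarted anyway" ;;
    126) echo "command not executable (permissions)" ;;
    127) echo "command not found (bad command/args or missing binary)" ;;
    137) echo "SIGKILL (128+9): often OOMKilled; check the reason field" ;;
    143) echo "SIGTERM (128+15): told to stop, e.g. after failed liveness probes" ;;
    *)   if [ "$1" -gt 128 ] 2>/dev/null; then
           echo "killed by signal $(( $1 - 128 ))"
         else
           echo "application error; read kubectl logs --previous"
         fi ;;
  esac
}

# last_exit POD [NAMESPACE]: restarts, last exit code and reason per container (hypothetical helper).
last_exit() {
  local tmpl='{range .status.containerStatuses[*]}{.name}{" "}{.restartCount}{" "}{.lastState.terminated.exitCode}{" "}{.lastState.terminated.reason}{"\n"}{end}'
  kubectl get pod "$1" ${2:+--namespace="$2"} -o jsonpath="$tmpl" |
    while read -r name restarts code reason; do
      printf '%s restarts=%s exit=%s reason=%s: %s\n' \
        "$name" "$restarts" "${code:-none}" "${reason:-none}" "$(explain_exit_code "$code")"
    done
}
```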

ImagePullBackOff

kubectl describe pod web
# Events: Failed to pull image "..."

Causes:

  • Bad image name/tag.
  • Private registry, no imagePullSecrets.
  • No network path to the registry (egress rules, proxy, DNS).
  • Arch mismatch (arm64 image on amd64 node).
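For the name and credentials cases it helps to know which registry a reference actually resolves to. A sketch following the Docker reference convention: the first path component is a registry host only if it contains "." or ":" or is "localhost"; otherwise the image comes from Docker Hub. `image_registry` and `pod_images` are hypothetical names.

```shell
# image_registry REF: registry an image reference pulls from (Docker-style rule).
image_registry() {
  case "$1" in
    */*) ;;                        # has a path component, inspect it below
    *) echo docker.io; return ;;   # e.g. "nginx:1.25"
  esac
  local first=${1%%/*}
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;;
    *) echo docker.io ;;           # e.g. "bitnami/redis:7"
  esac
}

# pod_images POD [NAMESPACE]: each container image and the registry it needs (hypothetical helper).
pod_images() {
  for img in $(kubectl get pod "$1" ${2:+--namespace="$2"} -o jsonpath='{.spec.containers[*].image}'); do
    printf '%s\t%s\n' "$img" "$(image_registry "$img")"
  done
}
```

For the arch case, `kubectl get nodes -L kubernetes.io/arch` shows each node's architecture to compare against the platforms the image was built for.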

Stuck Pending

kubectl describe pod web
# Events: 0/3 nodes are available: 3 Insufficient cpu

Causes:

  • Insufficient resources → scale cluster.
  • No matching nodes (selector/affinity).
  • PVC not bound (wait for provisioning).
  • Taints with no tolerations.
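Two of these causes can be checked cluster-wide in one go. A sketch with hypothetical helper names: `pending_pvcs` relies on STATUS being the third column of `kubectl get pvc -A`, and `node_taints` lists taint keys to compare with the pod's tolerations.

```shell
# pending_pvcs: PVCs in any namespace still waiting to bind (hypothetical helper).
pending_pvcs() {
  kubectl get pvc -A --no-headers | awk '$3 == "Pending" { print $1 "/" $2 }'
}

# node_taints: taint keys per node (hypothetical helper).
node_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

`kubectl get events -A --field-selector reason=FailedScheduling` lists the scheduler's explanations across namespaces.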

kubectl debug (ephemeral container)

kubectl debug -it web --image=nicolaka/netshoot --target=web

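Adds an ephemeral container to the running pod. --target=web shares the process namespace of the web container, so you can see its processes; netshoot brings network tools (curl, dig, tcpdump) the app image may lack.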

kubectl run debug pod

kubectl run debug --rm -it --image=nicolaka/netshoot -- bash

# Inside:
nslookup web.prod
curl http://web.prod/health
ping db.prod.svc.cluster.local

Inspect ConfigMaps / Secrets

kubectl get cm app-config -o yaml
kubectl get secret app-secrets -o jsonpath='{.data.PASSWORD}' | base64 -d
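To see every key of a secret at once, a sketch that has kubectl print `key=base64value` lines through a go-template and decodes each one in the shell. `secret_dump` is a hypothetical name; it assumes single-line text values and prints plaintext secrets to your terminal.

```shell
# secret_dump SECRET [NAMESPACE]: every key of a Secret, base64-decoded (hypothetical helper).
# Assumes single-line text values; binary data will print as garbage.
secret_dump() {
  kubectl get secret "$1" ${2:+--namespace="$2"} \
    -o go-template='{{range $k, $v := .data}}{{$k}}={{$v}}{{"\n"}}{{end}}' |
    while IFS= read -r line; do
      key=${line%%=*}   # Secret keys can't contain "=", so split on the first one
      val=${line#*=}    # base64 value, may end in "=" padding
      printf '%s=%s\n' "$key" "$(printf '%s' "$val" | base64 -d)"
    done
}
```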

DNS check

kubectl run debug --rm -it --image=nicolaka/netshoot -- nslookup kubernetes
kubectl run debug --rm -it --image=nicolaka/netshoot -- nslookup web.prod

If DNS fails: check kube-dns/CoreDNS pods.

kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs deploy/coredns
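Short names resolve through the pod's search path, so it is worth testing the short, namespaced and fully qualified forms from inside the target namespace. A sketch with hypothetical helper names; `cluster.local` is the usual default cluster domain, but yours may differ.

```shell
# svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]: fully qualified Service name (hypothetical helper).
svc_fqdn() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

# dns_check SERVICE NAMESPACE: resolve three forms of the name from one throwaway pod.
dns_check() {
  kubectl run "dns-check-$$" --namespace="$2" --rm -i --restart=Never \
    --image=nicolaka/netshoot -- \
    sh -c 'for n in "$@"; do echo "== $n"; nslookup "$n"; done' sh \
    "$1" "$1.$2" "$(svc_fqdn "$1" "$2")"
}
```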

Service has no endpoints

kubectl get endpoints web
# If empty: the selector matches no pods, or the matching pods aren't Ready
kubectl get pods --show-labels
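Comparing selector and labels by eye is error-prone. A sketch that reads the Service selector with a go-template and reuses it as a `-l` query: no pods listed means the selector is wrong; pods listed but not Ready points at readiness probes. `svc_pods` is a hypothetical name.

```shell
# svc_pods SERVICE [NAMESPACE]: the Service's selector and the pods it matches (hypothetical helper).
svc_pods() {
  local sel
  sel=$(kubectl get svc "$1" ${2:+--namespace="$2"} \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  sel=${sel%,}   # drop the trailing comma
  if [ -z "$sel" ]; then
    echo "service $1 has no selector; its endpoints are managed manually"
    return 1
  fi
  echo "selector: $sel"
  kubectl get pods ${2:+--namespace="$2"} -l "$sel" -o wide
}
```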

Network policy blocking

kubectl get networkpolicy -A

Try temporarily deleting (in dev) to confirm. In prod, use kubectl describe netpol to inspect rules.

Port-forward not working

kubectl port-forward svc/web 8080:80

If it fails:

  • Service has no endpoints.
  • Pod listening on wrong interface (must be 0.0.0.0).
  • Local firewall, or local port 8080 already in use.

Image pull from private registry

kubectl create secret docker-registry ghcr-secret \
  --docker-server=ghcr.io \
  --docker-username=USER \
  --docker-password=TOKEN

# Attach to the namespace's default ServiceAccount (the secret must be in the pods' namespace)
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "ghcr-secret"}]}'
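ServiceAccount imagePullSecrets are copied into pods when the pods are created, so pods that already exist keep failing until they are recreated. A small check that the secret is really attached; `sa_has_pull_secret` is a hypothetical name.

```shell
# sa_has_pull_secret SECRET [SERVICEACCOUNT] [NAMESPACE]: succeed if the SA lists SECRET.
# Hypothetical helper; ServiceAccount defaults to "default".
sa_has_pull_secret() {
  local names n
  names=$(kubectl get serviceaccount "${2:-default}" ${3:+--namespace="$3"} \
    -o jsonpath='{.imagePullSecrets[*].name}')
  for n in $names; do
    if [ "$n" = "$1" ]; then return 0; fi
  done
  return 1
}
```

Once `sa_has_pull_secret ghcr-secret` succeeds, `kubectl rollout restart deploy/web` recreates the pods so they pick it up.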

Resource starvation

kubectl top pod
kubectl top node

kubectl describe node | grep -A 10 "Allocated resources"
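`kubectl top` needs metrics-server (or another metrics API provider) in the cluster. It shows live usage, while the scheduler decides on requests, which is what the Allocated resources section reports. A sketch that flags nodes above a usage threshold; `hot_nodes` and the 80% default are arbitrary choices.

```shell
# hot_nodes [PERCENT]: nodes whose CPU% or MEMORY% is >= PERCENT (default 80; hypothetical helper).
# Relies on the `kubectl top node` columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%.
hot_nodes() {
  kubectl top node --no-headers | awk -v t="${1:-80}" '
    {
      cpu = $3 + 0   # "85%" + 0 -> 85; "<unknown>" -> 0
      mem = $5 + 0
      if (cpu >= t + 0 || mem >= t + 0)
        print $1 "\tcpu=" $3 "\tmem=" $5
    }'
}
```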

OOMKilled

kubectl get pod web -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# OOMKilled → memory limit too low or memory leak
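The same check across the whole cluster: a sketch (hypothetical `oom_killed` helper) that prints every pod with a container whose last termination reason was OOMKilled.

```shell
# oom_killed: pods in any namespace with a container last terminated as OOMKilled (hypothetical helper).
oom_killed() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
    awk '/OOMKilled/ { print $1 }'
}
```

Compare against the configured limit with `kubectl get pod web -o jsonpath='{.spec.containers[*].resources.limits.memory}'`.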

Slow pod

Exec in and profile (the tool must exist in the image; py-spy usually needs the SYS_PTRACE capability):

kubectl exec -it web -- top
kubectl exec -it web -- py-spy top --pid 1
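A slow pod with a CPU limit may be throttled rather than busy. This sketch reads the container's cgroup CPU stats and reports what share of scheduling periods were throttled. It assumes cgroup v2, where the file is /sys/fs/cgroup/cpu.stat inside the container, and that `cat` exists in the image. On cgroup v1 the file lives under /sys/fs/cgroup/cpu/ instead. `throttle_ratio` is a hypothetical name.

```shell
# throttle_ratio POD [NAMESPACE]: share of CFS periods in which the container was throttled.
# Hypothetical helper; assumes cgroup v2. nr_periods typically stays 0 without a CPU limit.
throttle_ratio() {
  kubectl exec "$1" ${2:+--namespace="$2"} -- cat /sys/fs/cgroup/cpu.stat |
    awk '$1 == "nr_periods"   { p = $2 }
         $1 == "nr_throttled" { t = $2 }
         END {
           if (p > 0) printf "%d/%d periods throttled (%.1f%%)\n", t, p, 100 * t / p
           else print "no CFS periods recorded (no CPU limit?)"
         }'
}
```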

Drift / unexpected state

kubectl diff -f deployment.yaml
kubectl rollout history deploy/web
kubectl rollout undo deploy/web
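`kubectl diff` reports through its exit status: 0 means no differences, 1 means differences were found, and anything higher is an error. That makes it usable as a drift check in scripts or CI. A sketch; `drift_check` is a hypothetical name.

```shell
# drift_check FILE: compare a manifest with the live cluster state (hypothetical helper).
drift_check() {
  local rc=0
  kubectl diff -f "$1" >/dev/null || rc=$?   # capture the status without tripping `set -e`
  case "$rc" in
    0) echo "in sync: $1" ;;
    1) echo "drift: $1 differs from the cluster" ;;
    *) echo "error: kubectl diff failed for $1 (exit $rc)" >&2; return "$rc" ;;
  esac
}
```

To go back further than one step, pick a revision number from `rollout history` and pass it with `kubectl rollout undo deploy/web --to-revision=N`.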

Check audit log

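If audit logging is enabled, see who changed what. Events go to the file set by --audit-log-path on the API server (they appear in its pod logs only when that is set to "-"); on managed clusters, check the provider's log service: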

kubectl logs -n kube-system kube-apiserver-... | grep "resource: ..."

kubectl trace (eBPF)

kubectl trace run node/myhost -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args.filename)); }'

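Runs a bpftrace program on the node for eBPF-powered host debugging (needs the kubectl-trace plugin, e.g. kubectl krew install trace).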

stern (multi-pod logs)

brew install stern
stern web                          # tail logs from all pods whose name matches the regex "web"
stern -n prod -l app=web
stern --selector app=web --tail 50 --since 1h

Better than kubectl logs for multi-replica deploys.

k9s

brew install k9s
k9s

Interactive TUI. Browse pods, logs, exec, describe.

kubectl debug (node)

kubectl debug node/mynode -it --image=nicolaka/netshoot

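Starts a pod on the node that shares the host's network and PID namespaces, with the node's root filesystem mounted at /host (chroot /host to use host binaries). The pod is not privileged by default; --profile=sysadmin requests that. Delete the node-debugger pod when done.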

Common mistakes

  • Running kubectl logs without --previous after a crash (you get the restarted container's logs).
  • Forgetting to check the Events in kubectl describe.
  • Wrong namespace (default vs prod).
  • Wrong cluster (wrong kubectl context).
  • Assuming Ready means working; hit the actual endpoint.
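The last two mistakes are cheap to catch by printing where kubectl points before acting. A sketch (hypothetical `kube_where` helper) that prints the current context and its namespace, falling back to `default` when the context sets none.

```shell
# kube_where: print "context/namespace" for the current kubeconfig context (hypothetical helper).
kube_where() {
  local ctx ns
  ctx=$(kubectl config current-context) || return
  ns=$(kubectl config view --minify -o jsonpath='{..namespace}')
  echo "${ctx}/${ns:-default}"
}
```

Running it before `kubectl apply`, or putting it in your prompt, makes a wrong target obvious.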

Read this next

If you want my K8s debugging cookbook, it’s at rajpoot.dev.


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.