Kubernetes Cheatsheet 19 — Debugging in K8s

Quick reference for diagnosing pod startup, networking, and resource problems in Kubernetes.

Pod not starting

kubectl describe pod web | tail -50          # Events section
kubectl get events --sort-by=.lastTimestamp
kubectl logs web
kubectl logs web --previous                  # logs from the previous (crashed) container

Common statuses:

Status	Meaning
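Pending	Waiting to be scheduled, or images still being pulled
ContainerCreating	Scheduled; waiting on volume mounts or image pull
CrashLoopBackOff	Container keeps exiting; kubelet restarts it with increasing delay
ImagePullBackOff	Image pull failed; kubelet is backing off before retrying
ErrImagePull	Image pull failed (bad image name/tag or missing credentials)
RunContainerError	Container failed to start (entrypoint, permissions)
Error	Container exited with a non-zero code
Terminating	Pod is being deleted; cleanup in progress
OOMKilled	Container was killed for running out of memory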
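
To scan a namespace for pods in any of the states above, the STATUS column of kubectl get pods can be filtered. A minimal sketch: the function name unhealthy_pods is illustrative, and it assumes single-namespace output, where STATUS is the third column (with -A, a NAMESPACE column shifts it to the fourth).

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor Completed.
# Extra arguments (e.g. -n prod) are passed through to kubectl.
# Assumes single-namespace output, where STATUS is column 3.
unhealthy_pods() {
  kubectl get pods "$@" --no-headers | awk '$3 != "Running" && $3 != "Completed"'
}

# Usage: unhealthy_pods -n prod
```

A pod that is Running but not Ready passes this filter, so the READY column is still worth a glance.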

CrashLoopBackOff

kubectl logs web
kubectl logs web --previous
kubectl describe pod web                     # liveness probe failing?

Common causes:

App crash (check logs).
Bad command/args.
Missing config / env / volume.
Liveness probe too strict.
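
When the logs are empty, the exit code of the last terminated container helps narrow down which cause applies. It can be read with kubectl get pod web -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}'. The sketch below maps common codes to likely causes; explain_exit_code is a hypothetical helper, and the mapping follows the usual shell convention (128 + signal number) rather than anything specific to Kubernetes.

```shell
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "exited cleanly (main process finished instead of staying in the foreground)" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found (bad command/args or entrypoint)" ;;
    137) echo "killed by SIGKILL (often the OOM killer)" ;;
    143) echo "terminated by SIGTERM" ;;
    *)
      # Codes above 128 mean the process died from signal (code - 128).
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application error (check logs)"
      fi
      ;;
  esac
}

# Usage: explain_exit_code 139
```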

ImagePullBackOff

kubectl describe pod web
# Events: Failed to pull image "..."

Causes:

Bad image name/tag.
Private registry, no imagePullSecrets.
Network to registry.
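Arch mismatch: the image has no manifest for the node's platform (e.g. arm64-only image on an amd64 node). A single-arch image of the wrong architecture often pulls fine and then crashes with "exec format error" instead.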

Stuck Pending

kubectl describe pod web
# Events: 0/3 nodes are available: 3 Insufficient cpu

Insufficient resources → scale the cluster or lower the pod's requests.
No matching nodes (selector/affinity).
PVC not bound (wait for provisioning).
Taints with no tolerations.
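
On a busy namespace the scheduler's message is easier to find by filtering events than by reading the full describe output. A sketch, assuming the pod name is unique in its namespace; why_pending is a hypothetical helper name.

```shell
# Hypothetical helper: show only the FailedScheduling events for one pod.
# Extra arguments (e.g. -n prod) are passed through to kubectl.
why_pending() {
  pod="$1"; shift
  kubectl get events "$@" \
    --field-selector "involvedObject.name=$pod,reason=FailedScheduling" \
    --sort-by=.lastTimestamp
}

# Usage: why_pending web -n prod
```

The event message names the failing predicate (insufficient cpu, untolerated taint, unbound PVC), which maps onto the causes listed above.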

kubectl debug (ephemeral container)

kubectl debug -it web --image=nicolaka/netshoot --target=web

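Attaches an ephemeral debug container to the running pod. --target names the container whose process namespace the debug container shares, and the netshoot image supplies networking tools the app image may lack.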

kubectl run debug pod

kubectl run debug --rm -it --image=nicolaka/netshoot -- bash

# Inside:
nslookup web.prod
curl http://web.prod/health
ping db.prod.svc.cluster.local               # ClusterIPs often do not answer ICMP; a failed ping does not prove the service is down

Inspect ConfigMaps / Secrets

kubectl get cm app-config -o yaml
kubectl get secret app-secrets -o jsonpath='{.data.PASSWORD}' | base64 -d
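
To decode every key of a Secret at once instead of running one jsonpath query per key, the key/value pairs can be emitted as tab-separated lines and decoded in a loop. A sketch: decode_secret_data is a hypothetical helper, and the go-template in the usage comment assumes the values live under .data.

```shell
# Hypothetical helper: read "KEY<TAB>BASE64VALUE" lines on stdin, print "KEY=decoded".
decode_secret_data() {
  tab="$(printf '\t')"
  while IFS="$tab" read -r key value; do
    printf '%s=%s\n' "$key" "$(printf '%s' "$value" | base64 -d)"
  done
}

# Usage (prints secrets in clear text; avoid on shared terminals):
# kubectl get secret app-secrets \
#   -o go-template='{{range $k, $v := .data}}{{$k}}{{"\t"}}{{$v}}{{"\n"}}{{end}}' \
#   | decode_secret_data
```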

DNS check

kubectl run debug --rm -it --image=nicolaka/netshoot -- nslookup kubernetes
kubectl run debug --rm -it --image=nicolaka/netshoot -- nslookup web.prod

If DNS fails: check kube-dns/CoreDNS pods.

kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs deploy/coredns
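
Short names such as web.prod resolve only through the search domains in the pod's /etc/resolv.conf, so testing the fully qualified name helps separate a search-path problem from a CoreDNS problem. A sketch: svc_fqdn is a hypothetical helper and assumes the default cluster domain cluster.local.

```shell
# Hypothetical helper: build a Service's fully qualified DNS name.
# Assumes the default cluster domain "cluster.local"; pass a third argument to override.
svc_fqdn() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

# Usage, from inside a debug pod:
# nslookup "$(svc_fqdn web prod)"
# cat /etc/resolv.conf    # shows the search domains and the nameserver in use
```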

Service has no endpoints

kubectl get endpoints web
# If empty: the selector matches no pods, or the matching pods are not Ready
kubectl get pods --show-labels
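
Rather than comparing labels by eye, the Service's own selector can be fed back into kubectl get pods. A sketch: pods_for_service is a hypothetical helper, and it assumes the Service uses a plain label selector (.spec.selector).

```shell
# Hypothetical helper: list the pods a Service's selector actually matches.
# Extra arguments (e.g. -n prod) are passed through to both kubectl calls.
pods_for_service() {
  svc="$1"; shift
  # Render the selector map as "key=value,key=value,".
  selector="$(kubectl get svc "$svc" "$@" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"   # drop the trailing comma
  if [ -z "$selector" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  kubectl get pods "$@" -l "$selector" -o wide
}

# Usage: pods_for_service web -n prod
```

If this prints no pods, the selector and the pod labels disagree; if it prints pods that are not Ready, the readiness probe is the next thing to check.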

Network policy blocking

kubectl get networkpolicy -A

In dev, temporarily delete the suspect policy to confirm it is the cause. In prod, inspect the rules with kubectl describe netpol instead.

Port-forward not working

kubectl port-forward svc/web 8080:80

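Service has no endpoints (no Ready pods behind it).
Pod listening on the wrong port or interface (bind to 0.0.0.0 so both port-forward and Service traffic can reach it).
Local firewall blocking the forwarded port.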

Image pull from private registry

kubectl create secret docker-registry ghcr-secret \
  --docker-server=ghcr.io \
  --docker-username=USER \
  --docker-password=TOKEN

# Attach to the default ServiceAccount; only pods created afterwards pick it up
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "ghcr-secret"}]}'

Resource starvation

kubectl top pod                              # both commands need metrics-server
kubectl top node

kubectl describe node | grep -A 10 "Allocated resources"
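
To see which pods are consuming the memory, kubectl top can sort its own output. A sketch: top_memory_pods is a hypothetical helper, and like the commands above it only works when metrics-server is installed.

```shell
# Hypothetical helper: the N pods using the most memory (default 5).
# Extra arguments after N (e.g. -n prod or -A) are passed through to kubectl.
top_memory_pods() {
  n="${1:-5}"
  [ "$#" -gt 0 ] && shift
  kubectl top pod "$@" --no-headers --sort-by=memory | head -n "$n"
}

# Usage: top_memory_pods 10 -n prod
```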

OOMKilled

kubectl get pod web -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# OOMKilled → memory limit too low or memory leak
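
To judge whether the limit is simply too low, it helps to print each container's configured memory request and limit next to the observed usage from kubectl top. A sketch: container_memory is a hypothetical helper; empty columns mean no request or limit is set.

```shell
# Hypothetical helper: print each container's name, memory request, and memory limit.
# Extra arguments (e.g. -n prod) are passed through to kubectl.
container_memory() {
  pod="$1"; shift
  kubectl get pod "$pod" "$@" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}

# Usage: container_memory web -n prod
```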

Slow pod

Exec into the container and profile from the inside:

kubectl exec -it web -- top
kubectl exec -it web -- py-spy top --pid 1   # Python apps; py-spy must be in the image and usually needs SYS_PTRACE

Drift / unexpected state

kubectl diff -f deployment.yaml
kubectl rollout history deploy/web
kubectl rollout undo deploy/web
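
kubectl diff reports its result through the exit status (0 for no differences, 1 for differences, greater than 1 for an error), which makes it usable in scripts. A sketch of a drift check built on that; check_drift is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the live object differs from a manifest.
check_drift() {
  diff_out="$(kubectl diff -f "$1" 2>&1)"
  rc=$?
  case "$rc" in
    0) echo "in sync: $1" ;;
    1) echo "drift detected: $1"; printf '%s\n' "$diff_out" ;;
    *) echo "kubectl diff failed (exit $rc)" >&2; return "$rc" ;;
  esac
}

# Usage: check_drift deployment.yaml
```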

Check audit log

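If auditing is enabled, the audit log shows who changed what. The command below only finds audit events when the API server writes them to stdout (--audit-log-path=-); otherwise read the configured log file or webhook backend.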

kubectl logs -n kube-system kube-apiserver-... | grep "resource: ..."

kubectl trace (eBPF)

kubectl trace run node/myhost -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'

eBPF-powered host debugging. Requires the kubectl-trace plugin, which runs the bpftrace program on the target node.

stern (multi-pod logs)

brew install stern
stern web                          # tail logs from all pods matching
stern -n prod -l app=web
stern --selector app=web --tail 50 --since 1h

Better than kubectl logs for multi-replica deploys.

k9s

brew install k9s
k9s

Interactive TUI. Browse pods, logs, exec, describe.

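kubectl debug (node)

kubectl debug node/mynode -it --image=nicolaka/netshoot

Starts a debug pod on the node in the host's network, PID, and IPC namespaces. The node's root filesystem is mounted at /host; run chroot /host to work as if logged in to the node.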

Common mistakes

Reading kubectl logs without --previous after a crash.
Forgetting to check the Events section of kubectl describe.
Wrong namespace (default vs prod).
Working against the wrong cluster (wrong kubectl context).
Assuming Ready means working; test the actual endpoint.
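
The namespace and context mistakes are cheap to rule out before debugging anything else. A sketch: where_am_i is a hypothetical helper that prints the active context and namespace from the current kubeconfig.

```shell
# Hypothetical helper: print the active kubectl context and namespace.
where_am_i() {
  ctx="$(kubectl config current-context)"
  # --minify limits the view to the current context; empty means "default".
  ns="$(kubectl config view --minify -o jsonpath='{..namespace}')"
  echo "context=$ctx namespace=${ns:-default}"
}

# Usage: where_am_i
```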
