K8s debugging cheatsheet.
Pod not starting
kubectl describe pod web | tail -50 # Events section
kubectl get events --sort-by=.lastTimestamp
kubectl logs web
kubectl logs web --previous # logs from the previous (crashed) container
Common statuses:
| Status | Meaning |
|---|---|
| Pending | Scheduling / image pull |
| ContainerCreating | Volume mount / image pull |
| CrashLoopBackOff | Container exits, restarted, exits |
| ImagePullBackOff | Image pull keeps failing; kubelet is backing off between retries |
| ErrImagePull | Pull attempt failed (bad image name/tag or missing credentials) |
| RunContainerError | Failed to start (entrypoint, perms) |
| Error | Container exited non-zero |
| Terminating | Cleanup |
| OOMKilled | Out of memory |
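To see which of these statuses applies to each container without scrolling through the full describe output, a jsonpath query can pull the reasons directly. This is a minimal sketch: the function name pod_reasons is made up for illustration, and the pod name web follows the examples above.

```shell
# pod_reasons is a hypothetical helper name, not a kubectl subcommand.
# Prints one line per container: name, current waiting reason, last termination reason.
# Fields that do not apply (for example, no previous termination) print as empty.
pod_reasons() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Usage: pod_reasons web
```

Init containers are reported separately under .status.initContainerStatuses, so a pod stuck in an init step may need the same query against that field.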
CrashLoopBackOff
kubectl logs web
kubectl logs web --previous
kubectl describe pod web # liveness probe failing?
Common causes:
- App crash (check logs).
- Bad command/args.
- Missing config / env / volume.
- Liveness probe too strict.
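The exit code of the last crashed container often narrows down which of these causes applies. The sketch below maps a few well-known codes to hints. The helper names explain_exit_code and last_exit_hint are hypothetical, and the hints are rules of thumb rather than a diagnosis.

```shell
# explain_exit_code is a hypothetical helper. Codes above 128 follow the usual
# convention "killed by signal (code - 128)".
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; a container that exits 0 is still restarted under restartPolicy Always" ;;
    1)   echo "generic application error; read kubectl logs --previous" ;;
    126) echo "command found but not executable; check file permissions" ;;
    127) echo "command not found; check command/args and the image contents" ;;
    137) echo "SIGKILL (128+9); often the OOM killer or a kill after the grace period expired" ;;
    143) echo "SIGTERM (128+15); the container was asked to stop, e.g. after a failing liveness probe" ;;
    *)   echo "no hint for exit code $1; check the logs" ;;
  esac
}

# Looks up the last exit code of the first container in the pod and explains it.
last_exit_hint() {
  code="$(kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')"
  explain_exit_code "$code"
}

# Usage: last_exit_hint web
```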
ImagePullBackOff
kubectl describe pod web
# Events: Failed to pull image "..."
Causes:
- Bad image name/tag.
- Private registry, no imagePullSecrets.
- Network to registry.
- Arch mismatch (arm64 image on amd64 node).
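Most of these causes can be checked by comparing what the pod asks for with what the cluster offers. A rough sketch, with hypothetical helper names, that prints the images the pod wants, the pull secrets it references, and the architecture of each node:

```shell
# Hypothetical helpers; none of these names are kubectl subcommands.

# Images the pod is trying to pull, one per line (check for typos in name/tag).
pod_images() {
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}

# Names of the pull secrets referenced by the pod spec (empty output means none).
pod_pull_secrets() {
  kubectl get pod "$1" -o jsonpath='{.spec.imagePullSecrets[*].name}'
}

# CPU architecture reported by each node (amd64, arm64, ...).
node_archs() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,ARCH:.status.nodeInfo.architecture'
}

# Usage:
#   pod_images web
#   pod_pull_secrets web
#   node_archs
```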
Stuck Pending
kubectl describe pod web
# Events: 0/3 nodes are available: 3 Insufficient cpu
- Insufficient resources → scale cluster.
- No matching nodes (selector/affinity).
- PVC not bound (wait for provisioning).
- Taints with no tolerations.
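When the scheduler event alone does not say which constraint is blocking the pod, it can help to list the constraints side by side. The helpers below are an illustrative sketch with made-up names; they only read state and change nothing.

```shell
# Hypothetical helpers for a pod stuck in Pending.

# Taint keys on each node; the pod needs a matching toleration to land there.
node_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# The pod's own scheduling constraints: nodeSelector first, then tolerations.
pod_constraints() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.tolerations}{"\n"}'
}

# Phase of each PVC in the current namespace; one stuck in Pending deserves a describe.
pvc_status() {
  kubectl get pvc -o custom-columns='NAME:.metadata.name,STATUS:.status.phase'
}

# Usage:
#   node_taints
#   pod_constraints web
#   pvc_status
```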
kubectl debug (ephemeral container)
kubectl debug -it web --image=nicolaka/netshoot --target=web
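Adds an ephemeral debug container to the running pod. --target=web shares the process namespace of the container named web (assumed here to have the same name as the pod), and the netshoot image provides the networking toolkit.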
kubectl run debug pod
kubectl run debug --rm -it --image=nicolaka/netshoot -- bash
# Inside:
nslookup web.prod
curl http://web.prod/health
ping db.prod.svc.cluster.local
Inspect ConfigMaps / Secrets
kubectl get cm app-config -o yaml
kubectl get secret app-secrets -o jsonpath='{.data.PASSWORD}' | base64 -d
DNS check
kubectl run debug --rm -it --image=nicolaka/netshoot -- nslookup kubernetes
kubectl run debug --rm -it --image=nicolaka/netshoot -- nslookup web.prod
If DNS fails: check kube-dns/CoreDNS pods.
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs deploy/coredns
Service has no endpoints
kubectl get endpoints web
# If empty: selector doesn't match any pods, or the matching pods are not Ready
kubectl get pods --show-labels
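Rather than comparing labels by eye, you can feed the Service's own selector back into a pod query. A minimal sketch, assuming the Service uses a plain label selector; pods_for_service is a hypothetical helper name.

```shell
# pods_for_service is a hypothetical helper: it lists the pods that the
# Service's selector actually matches.
pods_for_service() {
  svc="$1"
  # Render the selector map as "k1=v1,k2=v2," and strip the trailing comma.
  selector="$(kubectl get svc "$svc" -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}' | sed 's/,$//')"
  if [ -z "$selector" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  kubectl get pods -l "$selector" -o wide
}

# Usage: pods_for_service web
```

If this prints no pods, the selector and the pod labels disagree. If it prints pods that are not Ready, the endpoints list stays empty until their readiness probes pass.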
Network policy blocking
kubectl get networkpolicy -A
Try temporarily deleting (in dev) to confirm. In prod, use kubectl describe netpol to inspect rules.
Port-forward not working
kubectl port-forward svc/web 8080:80
- Service has no endpoints.
- App not listening on the forwarded port, or bound to an interface the forward cannot reach (binding to 0.0.0.0 is the safe choice).
- Firewall locally.
Image pull from private registry
kubectl create secret docker-registry ghcr-secret \
--docker-server=ghcr.io \
--docker-username=USER \
--docker-password=TOKEN
# Attach to SA (default)
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "ghcr-secret"}]}'
Resource starvation
kubectl top pod
kubectl top node
kubectl describe node | grep -A 10 "Allocated resources"
OOMKilled
kubectl get pod web -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# OOMKilled → memory limit too low or memory leak
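To tell a limit that is too low from a leak, compare the configured memory limit with live usage over time. A rough sketch with hypothetical helper names; the usage query depends on metrics-server, just like kubectl top above.

```shell
# Hypothetical helpers for an OOMKilled pod.

# Memory request and limit per container, straight from the pod spec.
pod_memory_limits() {
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"\trequest="}{.resources.requests.memory}{"\tlimit="}{.resources.limits.memory}{"\n"}{end}'
}

# Live usage per container; run it repeatedly to see whether memory keeps growing.
pod_memory_usage() {
  kubectl top pod "$1" --containers
}

# Usage:
#   pod_memory_limits web
#   pod_memory_usage web
```

Usage that climbs steadily until the kill points toward a leak; usage that sits flat just under the limit suggests the limit is simply too tight.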
Slow pod
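Exec into the container and profile from inside (the tools must exist in the image):
kubectl exec -it web -- top
kubectl exec -it web -- py-spy top --pid 1 # Python apps only; py-spy usually needs the SYS_PTRACE capability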
Drift / unexpected state
kubectl diff -f deployment.yaml
kubectl rollout history deploy/web
kubectl rollout undo deploy/web
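kubectl diff also reports its result through the exit status (0 for no differences, 1 for differences, greater than 1 for an error), which makes it easy to script a drift check. A minimal sketch; check_drift is a hypothetical helper name and deployment.yaml is the file from the example above.

```shell
# check_drift is a hypothetical helper that turns kubectl diff's exit status
# into a one-line verdict.
check_drift() {
  rc=0
  kubectl diff -f "$1" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "in sync" ;;
    1) echo "drift detected" ;;
    *) echo "kubectl diff failed (exit $rc)"; return "$rc" ;;
  esac
}

# Usage: check_drift deployment.yaml
```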
Check audit log
If auditing is enabled and the API server writes audit events to stdout (--audit-log-path=-), you can see who changed what:
kubectl logs -n kube-system kube-apiserver-... | grep "resource: ..."
kubectl trace (eBPF)
kubectl trace run node/myhost -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'
eBPF-powered host debugging. Requires the kubectl-trace plugin; it is not part of kubectl itself.
stern (multi-pod logs)
brew install stern
stern web # tail logs from all pods matching
stern -n prod -l app=web
stern --selector app=web --tail 50 --since 1h
Better than kubectl logs for multi-replica deploys.
k9s
brew install k9s
k9s
Interactive TUI. Browse pods, logs, exec, describe.
kubectl debug (node)
kubectl debug node/mynode -it --image=nicolaka/netshoot
Creates a debug pod in the node's host network, PID, and IPC namespaces. The node's root filesystem is mounted at /host for inspection.
Common mistakes
- Looking at kubectl logs without --previous after a crash.
- Forgetting to check kubectl describe events.
- Wrong namespace (default vs prod).
- Pod deployed to the wrong cluster (wrong context).
- Assuming readiness = working; check the actual endpoint.
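The wrong-namespace and wrong-context mistakes are cheap to rule out before anything else. A small sketch that prints both; where_am_i is a hypothetical helper name.

```shell
# where_am_i is a hypothetical helper: prints the context and namespace that
# kubectl will use for the next command.
where_am_i() {
  ctx="$(kubectl config current-context)"
  # The namespace is empty when the context does not set one, which means "default".
  ns="$(kubectl config view --minify -o jsonpath='{..namespace}')"
  echo "context=${ctx} namespace=${ns:-default}"
}

# Usage: where_am_i
```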
Read this next
If you want my K8s debugging cookbook, it’s at rajpoot.dev.