Linux debugging cheatsheet.

Where to look first

journalctl -xe                       # latest errors
dmesg | tail                         # kernel
tail -f /var/log/syslog
systemctl status <unit>

strace

strace cmd
strace -p PID
strace -f -p PID                     # follow forks
strace -e %network cmd               # network syscalls
strace -e openat cmd
strace -c cmd                        # summary
strace -e %file cmd                  # file syscalls
strace -tt -T cmd                    # timestamps + duration

ltrace

ltrace cmd                           # library calls
ltrace -p PID

lsof

lsof -p PID                          # all open files
lsof -i :80                          # who's on port 80
lsof | grep deleted                  # deleted files still held
lsof +D /var/log                     # by directory

/proc inspection

cat /proc/PID/cmdline
cat /proc/PID/status
cat /proc/PID/environ                # env vars (NULs)
cat /proc/PID/limits
cat /proc/PID/io                     # I/O stats
ls -la /proc/PID/fd/                 # open file descriptors
cat /proc/PID/maps                   # memory map
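
cmdline and environ are NUL-separated, so plain cat prints every entry run together on one line. A minimal sketch that splits them for reading, assuming a Linux /proc; the helper names proc_env and proc_cmdline are hypothetical, not standard tools:

```shell
# Hypothetical helpers: make NUL-separated /proc files readable.

proc_env() {
    # $1 = PID; every VAR=value pair in environ ends with a NUL byte,
    # so turn each NUL into a newline
    tr '\0' '\n' < "/proc/$1/environ"
}

proc_cmdline() {
    # $1 = PID; arguments are NUL-separated, so join them with spaces
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}

# Try it on the current shell
proc_cmdline "$$"
proc_env "$$" | sort | head -n 5
```

Reading another user's environ needs root; environ shows the environment the process started with, not later changes.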

Stack trace

cat /proc/PID/stack                  # kernel stack
pstack PID                           # user stack

gdb

gdb -p PID                           # attach
gdb cmd
(gdb) run arg1 arg2
(gdb) break main
(gdb) continue
(gdb) bt                             # backtrace
(gdb) info threads
(gdb) thread 2
(gdb) print var
(gdb) detach
(gdb) quit

Core dumps

ulimit -c unlimited

# Where dumps go (systemd-coredump)
coredumpctl list
coredumpctl info PID
coredumpctl gdb PID

# Or set:
sysctl -w kernel.core_pattern="/tmp/core-%e-%p"

OOM debugging

dmesg -T | grep -i oom
journalctl -k --grep "killed process"

# Per-process OOM score
cat /proc/PID/oom_score
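
The kernel prefers to kill the process with the highest oom_score, so ranking every process by that value shows the likely next victim. A rough sketch, assuming a Linux /proc; oom_top is a hypothetical helper name:

```shell
# Hypothetical helper: rank processes by oom_score, highest first.
# Output columns: score, PID, command name.
oom_top() {
    for d in /proc/[0-9]*; do
        # Processes can exit while we scan; skip the ones that vanish
        score=$(cat "$d/oom_score" 2>/dev/null) || continue
        name=$(cat "$d/comm" 2>/dev/null) || continue
        printf '%s\t%s\t%s\n' "$score" "${d#/proc/}" "$name"
    done | sort -rn | head -n "${1:-10}"
}

oom_top 5
```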

Disk full

df -h
df -i                                # inodes
du -sh /* | sort -rh | head
ncdu /                               # interactive
lsof | grep deleted                  # holding deleted files

Deleted files that a process still holds open keep occupying disk space. Restart the process to free it.
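
If lsof is not installed, the same information can be read from /proc, because the fd symlink of a deleted file ends in " (deleted)". A small sketch under that assumption; deleted_fds is a hypothetical helper name, and it needs root to see other users' processes:

```shell
# Hypothetical helper: list file descriptors that still point at deleted files.
# Output: /proc/PID/fd/N -> original path (deleted)
deleted_fds() {
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or already-closed descriptors are skipped
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

deleted_fds
```

The PID in each path tells you which process to restart. Anonymous memory files (memfd) can also show up in this list and do not occupy disk.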

Network connection issues

ss -tlnp                             # listening
ss -tnp                              # established
ss -tn state syn-sent
nc -zv host 443                      # port reachable?
mtr host                             # path
curl -v https://host/
tcpdump -i any host 1.2.3.4

DNS issues

dig +trace example.com
nslookup example.com 1.1.1.1
resolvectl query example.com
cat /etc/resolv.conf
resolvectl statistics

High CPU

top -1                               # per-cpu
mpstat -P ALL 1
pidstat 1
perf top -p PID

High memory

top -o %MEM
ps -eo pid,rss,comm --sort=-rss | head
slabtop                              # kernel slab
cat /proc/meminfo

High I/O

iostat -xz 1
iotop -ao
biotop                               # eBPF

Slow process

strace -c -p PID                     # run ~30s, then Ctrl-C for the summary
perf record -F 99 -p PID -g -- sleep 30
perf report

Process stuck (D state)

ps aux | awk '$8 ~ /^D/'             # STAT can be D, D+, Ds, ...
cat /proc/PID/wchan                  # waiting where?
cat /proc/PID/stack

D state is uninterruptible sleep, usually I/O wait. Check disk / NFS.
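
To see every stuck process together with the kernel function it is waiting in, the state and wchan files can be combined. A minimal sketch, assuming a Linux /proc; proc_state and dstate are hypothetical helper names:

```shell
# Hypothetical helpers for spotting processes in uninterruptible sleep.

proc_state() {
    # $1 = PID; prints the one-letter state (R, S, D, Z, T, ...)
    awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

dstate() {
    # Output columns: PID, command name, kernel wait channel
    for d in /proc/[0-9]*; do
        pid=${d#/proc/}
        [ "$(proc_state "$pid")" = "D" ] || continue
        printf '%s\t%s\t%s\n' "$pid" \
            "$(cat "$d/comm" 2>/dev/null)" \
            "$(cat "$d/wchan" 2>/dev/null)"
    done
}

dstate
```

Run it a few times: a process that shows up in D state only once is normal, one that stays there across runs is the one to investigate.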

Service won’t start

systemctl status myapp
journalctl -u myapp -xe
sudo -u myapp /opt/myapp/bin/start   # run manually

Crashed binary

file core.PID
coredumpctl gdb PID

In gdb: bt to see where it crashed.

eBPF tools

# Install: apt install bpfcc-tools
# (Debian/Ubuntu add a -bpfcc suffix to the commands, e.g. opensnoop-bpfcc)
opensnoop                            # opens
execsnoop                            # execs
biotop                               # disk I/O by process
tcpconnect                           # outbound conns
tcplife                              # connection lifetimes
profile -F 99 -p PID 30              # CPU profile

# bpftrace
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @[comm] = count(); }'

Find which package owns a file

apt-file search /usr/bin/foo         # Debian/Ubuntu, any package
dpkg -S /usr/bin/foo                 # Debian/Ubuntu, installed packages
rpm -qf /usr/bin/foo                 # RPM-based distros

Container/namespace debug

nsenter -t PID -m -u -n -p           # enter mount, UTS, net, PID namespaces (-a for all)

When all else fails

  • Reproduce locally with docker run.
  • Compare working vs broken host: diff /etc/foo, package versions.
  • Increase log verbosity.
  • Bisect git history.
  • Run the failing service under strace, e.g. by temporarily prefixing its ExecStart with strace -f in a systemctl edit override.
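
For the "compare working vs broken host" step, one way to compare package versions is to dump the package list on each host and show the lines unique to each. A rough sketch; pkg_diff and the file names are hypothetical, and it assumes each list holds one package per line:

```shell
# Hypothetical helper: show entries that appear in only one of two package lists.
# Produce the lists on each host first, for example:
#   dpkg-query -W > good.pkgs      (Debian/Ubuntu: "name<TAB>version" per line)
#   rpm -qa > good.pkgs            (RPM-based distros)
pkg_diff() {
    a=$(mktemp)
    b=$(mktemp)
    # comm needs sorted input
    sort "$1" > "$a"
    sort "$2" > "$b"
    echo "only in $1:"
    comm -23 "$a" "$b"
    echo "only in $2:"
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage, once both lists have been copied to one machine:
#   pkg_diff good.pkgs bad.pkgs
```

A package installed at different versions shows up in both halves of the output, once per version.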

Common mistakes

  • Assuming “it worked before” — what changed?
  • Editing config without backup.
  • Restarting service before diagnosing.
  • Looking only at app logs (kernel/syslog has clues).
  • Treating symptom (restart) instead of cause.

Read this next

If you want my debugging cookbook, it’s at rajpoot.dev.