Linux Cheatsheet 17 — Performance Tuning

Linux performance cheatsheet.

USE method

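For every resource (CPU, memory, disk, network), check:

Utilization: how busy the resource is (e.g. percent of time busy).
Saturation: how much work is queued or waiting for it.
Errors: count of error events.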
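
As a rough illustration of the saturation check applied to one resource (CPU), the sketch below reads only /proc. The function name load_per_cpu and its optional arguments are made up for this example. Load average is only a proxy here, because on Linux it also counts tasks in uninterruptible sleep.

```shell
#!/bin/sh
# load_per_cpu [LOADAVG_FILE] [NCPU]
# Saturation proxy for CPU: 1-minute load average divided by CPU count.
# Values well above 1.0 suggest runnable tasks are queueing.
# Both arguments are illustrative hooks, not part of any standard tool.
load_per_cpu() {
    file=${1:-/proc/loadavg}
    ncpu=${2:-$(nproc)}
    LC_ALL=C awk -v n="$ncpu" '{ printf "%.2f\n", $1 / n }' "$file"
}

if [ -r /proc/loadavg ]; then
    load_per_cpu
fi
```

Treat the result as a hint for where to look next (vmstat's r column, per-CPU mpstat output), not as a verdict.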

Tool overview

top / htop / btop          # processes
vmstat 1                   # CPU/mem/IO/swap
iostat -xz 1               # disk I/O
mpstat -P ALL 1            # per-CPU
free -h                    # memory
sar 1                      # live samples; plain "sar" reads today's history
pidstat 1                  # per-process
dstat                      # combined
glances                    # all-in-one

CPU

top -1                     # per-CPU
mpstat -P ALL 1
pidstat -u 1
cat /proc/cpuinfo
nproc
lscpu

# Per-process CPU
ps -eo pid,pcpu,comm --sort=-pcpu | head

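In top's CPU line: %us = user-space time, %sy = kernel (system) time, %wa = idle while waiting on I/O, %si = softirq handling. High %sy points to kernel work (syscalls, networking, memory management).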
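
To show where those percentages come from, here is a minimal sketch that derives them from two samples of the aggregate "cpu" line in /proc/stat. The function name cpu_breakdown is hypothetical. It counts the fields user through steal as the total, which may differ slightly from how top rounds its own figures.

```shell
#!/bin/sh
# cpu_breakdown LINE1 LINE2
# Percentages of user, system, iowait and softirq time between two
# aggregate "cpu" lines from /proc/stat.
# Field order: cpu user nice system idle iowait irq softirq steal ...
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) total = 1   # avoid division by zero
            printf "us=%.1f sy=%.1f wa=%.1f si=%.1f\n",
                100 * d[2] / total, 100 * d[4] / total,
                100 * d[6] / total, 100 * d[8] / total
        }'
}

if [ -r /proc/stat ]; then
    s1=$(grep '^cpu ' /proc/stat)
    sleep 1
    s2=$(grep '^cpu ' /proc/stat)
    cpu_breakdown "$s1" "$s2"
fi
```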

Memory

free -h
free -m -s 1               # update every second
vmstat 1
cat /proc/meminfo
slabtop                    # kernel slab
ps -eo pid,rss,vsz,comm --sort=-rss | head

# OOM
dmesg | grep -i oom
journalctl -k --grep oom

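Judge headroom by MemAvailable, not MemFree: MemAvailable estimates how much memory can be allocated without swapping, including reclaimable page cache.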
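
A small sketch of that check, assuming a kernel that exports MemAvailable in /proc/meminfo. The function name mem_available_pct and its optional file argument are made up for this example.

```shell
#!/bin/sh
# mem_available_pct [MEMINFO_FILE]
# Prints MemAvailable as a percentage of MemTotal.
mem_available_pct() {
    LC_ALL=C awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%.1f\n", 100 * avail / total }
    ' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    mem_available_pct
fi
```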

Disk I/O

iostat -xz 1
iotop
ioping /var/lib/data       # latency
biotop                     # eBPF
nfsiostat                  # NFS
fio --name=test --filename=test --size=1G --bs=4k --rw=randread --direct=1   # --direct=1 bypasses the page cache

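%util near 100% = saturated for a single spinning disk. SSD, NVMe and RAID devices serve requests in parallel, so for them watch aqu-sz (queue depth) and latency instead. r_await / w_await = average read / write latency in ms, including queue time.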
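
For readers who want to see how these figures are derived, the sketch below computes them from two /proc/diskstats samples. The names disk_latency and snap, and the DEV variable, are hypothetical. The field positions follow the kernel's diskstats layout as I understand it, so treat the output as an approximation of what iostat reports.

```shell
#!/bin/sh
# disk_latency LINE1 LINE2 INTERVAL_MS
# Derives r_await, w_await (ms per completed I/O) and %util from two
# /proc/diskstats lines for the same device.
# Fields used: $4 reads completed, $7 ms reading, $8 writes completed,
# $11 ms writing, $13 ms spent doing I/O.
disk_latency() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C awk -v ms="$3" '
        NR == 1 { r = $4; rt = $7; w = $8; wt = $11; busy = $13; next }
        {
            dr = $4 - r; dw = $8 - w
            printf "r_await=%.2f w_await=%.2f util=%.1f\n",
                (dr ? ($7 - rt) / dr : 0),
                (dw ? ($11 - wt) / dw : 0),
                100 * ($13 - busy) / ms
        }'
}

# snap DEVICE: print the current diskstats line for one device.
snap() { awk -v d="$1" '$3 == d' /proc/diskstats 2>/dev/null; }

dev=${DEV:-sda}          # device name is an assumption; override with DEV=...
l1=$(snap "$dev")
if [ -n "$l1" ]; then
    sleep 1
    l2=$(snap "$dev")
    disk_latency "$l1" "$l2" 1000
fi
```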

Network

sar -n DEV 1
ip -s link show eth0
ifstat
nload
iftop                      # interactive
nethogs                    # per-process
ss -s                      # socket summary
tcpdump

Top by metric

# Top CPU
ps -eo pid,pcpu,comm --sort=-pcpu | head

# Top mem
ps -eo pid,rss,comm --sort=-rss | head

# Top I/O
iotop -ao                  # accumulated

# Top network (per process)
nethogs

sysctl tuning (network)

# /etc/sysctl.d/perf.conf
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time = 60
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

sysctl --system            # loads /etc/sysctl.d/*.conf (plain "sysctl -p" reads only /etc/sysctl.conf)
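
After loading, it can help to confirm that the running kernel actually holds the values in the file. The following is a minimal sketch, not a replacement for sysctl itself. The function name check_sysctl_conf and the ROOT override are made up for this example, and keys that contain dots inside an interface name are not handled.

```shell
#!/bin/sh
# check_sysctl_conf CONF [ROOT]
# Reports keys in CONF whose running value differs from the file.
# ROOT defaults to /proc/sys; the override exists only to make testing easy.
check_sysctl_conf() {
    conf=$1
    root=${2:-/proc/sys}
    # Strip comments and blank lines, then split each line on the first "=".
    sed -e 's/[#;].*//' -e '/^[[:space:]]*$/d' "$conf" |
    while IFS='=' read -r key want; do
        key=$(echo "$key" | tr -d '[:space:]')
        want=$(echo $want)                  # collapse whitespace
        path="$root/$(echo "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING $key"
            continue
        fi
        have=$(echo $(cat "$path"))         # tabs become single spaces
        [ "$have" = "$want" ] || echo "DIFF $key: running='$have' file='$want'"
    done
}

if [ -r /etc/sysctl.d/perf.conf ]; then
    check_sysctl_conf /etc/sysctl.d/perf.conf
fi
```

A MISSING line often means a module is not loaded or the kernel is too old for that key.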

sysctl tuning (file system)

fs.file-max = 2097152
fs.inotify.max_user_watches = 524288
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

ulimit

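# /etc/security/limits.conf or limits.d/ (applied by pam_limits at login)
* soft nofile 65536
* hard nofile 65536
* soft nproc 65535
* hard nproc 65535
# systemd services ignore these files; set LimitNOFILE= / LimitNPROC= in the unit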
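
To check whether a running process is actually close to its descriptor limit, something like the sketch below can be used. The function names soft_nofile and fd_usage are hypothetical. Reading another user's /proc/PID/fd usually requires root.

```shell
#!/bin/sh
# soft_nofile [LIMITS_FILE]
# Soft "Max open files" limit parsed from a /proc/PID/limits file
# (defaults to the current shell).
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "${1:-/proc/$$/limits}"
}

# fd_usage PID
# Open descriptors versus the soft limit for a running process.
fd_usage() {
    pid=$1
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(soft_nofile "/proc/$pid/limits")
    echo "pid=$pid open=$((used)) soft_limit=$limit"
}

if [ -r "/proc/$$/limits" ]; then
    fd_usage "$$"
fi
```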

CPU governor

cpufreq-info
cpupower frequency-info
cpupower frequency-set -g performance

For latency-sensitive servers, prefer the performance governor over ondemand / powersave.

I/O scheduler

cat /sys/block/sda/queue/scheduler
echo none > /sys/block/nvme0n1/queue/scheduler        # NVMe usually
echo mq-deadline > /sys/block/sda/queue/scheduler     # SATA
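
The scheduler file lists every available scheduler and marks the active one with brackets. A short sketch for listing the active scheduler per device follows; the function name active_scheduler is made up for this example.

```shell
#!/bin/sh
# active_scheduler FILE
# The kernel marks the scheduler in use with brackets,
# e.g. "none [mq-deadline] kyber bfq" -> mq-deadline.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    echo "${dev%%/*}: $(active_scheduler "$f")"
done
```

Values written with echo do not survive a reboot, so rerunning the loop after boot is a quick way to see whether a persistent setting took effect.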

Transparent Huge Pages

cat /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Some databases recommend disabling THP (Redis; MongoDB before 8.0). Check the vendor documentation for your version.

perf

perf top
perf stat -p PID
perf record -p PID -- sleep 30
perf report

eBPF tools (bpftrace, bcc)

apt install bpfcc-tools bpftrace

# bcc (Debian/Ubuntu packages add a -bpfcc suffix, e.g. opensnoop-bpfcc)
opensnoop
execsnoop
biotop
tcpconnect
tcptop
profile -p PID 30          # 30s CPU profile

# bpftrace
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @[comm] = count(); }'

eBPF is the modern, low-overhead approach to kernel and application tracing.

Flame graphs

perf record -F 99 -p PID -g -- sleep 30
perf script > out.perf
# https://github.com/brendangregg/FlameGraph
./stackcollapse-perf.pl out.perf | ./flamegraph.pl > flame.svg

strace

strace -p PID -c            # syscall summary
strace -fc cmd              # follow forks, summarize syscall counts
strace -e trace=network curl ...

ftrace

trace-cmd record -p function_graph -P PID
trace-cmd report

numactl (NUMA)

numastat
numactl --hardware
numactl --cpunodebind=0 --membind=0 cmd

ksoftirqd

High CPU on ksoftirqd → softirq overload (network or storage). Check the %soft column in mpstat -P ALL 1; mpstat -I SCPU 1 and /proc/softirqs break softirqs down by type per CPU.
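
To see which softirq type dominates, the per-CPU counters in /proc/softirqs can be summed per row. This is a minimal sketch; the function name softirq_totals is made up for this example. The counters are cumulative since boot, so compare two runs a few seconds apart to get a rate.

```shell
#!/bin/sh
# softirq_totals [FILE]
# Sums each softirq type across all CPUs and prints "TOTAL NAME",
# largest first. The first line of the file is the CPU header.
softirq_totals() {
    awk 'NR > 1 {
            name = $1; sub(/:$/, "", name)
            sum = 0
            for (i = 2; i <= NF; i++) sum += $i
            printf "%.0f %s\n", sum, name
         }' "${1:-/proc/softirqs}" | sort -rn
}

if [ -r /proc/softirqs ]; then
    softirq_totals | head -5
fi
```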

Tools per layer (Brendan Gregg)

applications: perf, eBPF, java/python profilers
languages:    pyperf, py-spy, async-profiler
syscalls:     strace, ltrace, bpftrace
kernel:       perf, ftrace, eBPF
devices:      iostat, biotop, blktrace
network:      tcpdump, ss, ip -s

Common bottlenecks

TCP listen backlog full (somaxconn).
File descriptor exhaustion (ulimit).
Swap thrashing (vm.swappiness).
Disk full / inodes full.
I/O scheduler mismatch (NVMe + mq-deadline).
Single-threaded app on multi-core box.
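
The first item on the list can be confirmed from kernel counters: ListenOverflows in the TcpExt group of /proc/net/netstat increases when a listen queue is full. Below is a sketch for reading one such counter. The function name netstat_counter is hypothetical, and it only looks at the TcpExt group.

```shell
#!/bin/sh
# netstat_counter NAME [FILE]
# Prints one TcpExt counter from /proc/net/netstat. The file holds a
# header line of names followed by a line of values for each group.
netstat_counter() {
    awk -v want="$1" '
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        $1 == "TcpExt:" && seen {
            if (want in col) print $(col[want])
            exit
        }
    ' "${2:-/proc/net/netstat}"
}

if [ -r /proc/net/netstat ]; then
    echo "ListenOverflows=$(netstat_counter ListenOverflows)"
    echo "ListenDrops=$(netstat_counter ListenDrops)"
fi
```

A value that keeps growing between runs suggests the application is not accepting connections fast enough or the backlog is too small.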

Common mistakes

Tuning without measuring.
Copy-pasting random sysctl recipes.
Ignoring wa (I/O wait) in top.
Setting swappiness too high, causing swap thrash.
Profiling debug builds instead of optimized production builds.
