K8s networking deep dive.

CNI plugins

CNI = Container Network Interface. The plugin wires up each pod’s network interface, assigns its IP, and sets up routing.

  • Calico: BGP + IPIP/VXLAN. NetworkPolicy support.
  • Cilium: eBPF-based. Strong performance + observability (Hubble). Can replace kube-proxy.
  • Flannel: simple overlay. No NetworkPolicy support on its own.
  • Weave: declining.
  • AWS VPC CNI: pods get real VPC IPs (EKS).
  • GCP / Azure: native cloud CNIs.

Pod networking model

  • Every pod gets a unique, cluster-wide IP.
  • Pods talk to each other directly, without NAT.
  • Containers in the same pod share a network namespace (they reach each other on localhost).

Service implementation (kube-proxy modes)

  • iptables: default on Linux. Random load balancing via probabilistic DNAT rules.
  • ipvs: hash-based lookups; scales to thousands of services.
  • eBPF (Cilium): not a kube-proxy mode; replaces kube-proxy entirely.
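
To make the “random load balance” of iptables mode concrete: kube-proxy writes one rule per endpoint and evaluates them in order, so rule i of n has to match with probability 1/(n-i) for every endpoint to get an equal share. A minimal sketch of that arithmetic (endpoint_probabilities is a hypothetical helper for illustration, not a kube-proxy command):

```shell
# Hypothetical helper: per-rule match probabilities for N endpoints.
# Rule i (0-based) is only reached if all earlier rules did not match,
# so it needs probability 1/(N-i) to give each endpoint a 1/N share.
endpoint_probabilities() {
  awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "%.4f\n", 1 / (n - i) }'
}

endpoint_probabilities 3
```

For 3 endpoints this prints 0.3333, 0.5000 and 1.0000: the first rule takes a third of the traffic, the second takes half of what is left, and the last rule catches the rest.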

Inspect node networking

kubectl get nodes -o yaml | grep podCIDR
ip a                                      # on node
ip route
iptables -L -t nat -n -v | less

DNS

CoreDNS runs in kube-system:

kubectl -n kube-system get pods -l k8s-app=kube-dns

Pods get /etc/resolv.conf:

nameserver 10.0.0.10
search myns.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Service DNS names

<svc>                              # from the same namespace
<svc>.<ns>                         # from any namespace
<svc>.<ns>.svc.cluster.local       # FQDN
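
A small sketch that prints the three forms for a given Service. The helper name service_dns_names and the db / prod values are made up for illustration, and cluster.local is assumed as the cluster domain unless a third argument is passed:

```shell
# Hypothetical helper: print the DNS names that resolve to a Service.
# Assumes the default cluster domain (cluster.local) unless overridden.
service_dns_names() {
  svc=$1; ns=$2; domain=${3:-cluster.local}
  printf '%s\n' "$svc"                   # resolves only from pods in the same namespace
  printf '%s\n' "$svc.$ns"               # resolves from any namespace
  printf '%s\n' "$svc.$ns.svc.$domain"   # fully qualified
}

service_dns_names db prod
```

The short forms only work because of the search line in the pod’s /etc/resolv.conf; the last form resolves without it.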

DNS gotchas

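options ndots:5 means any name with fewer than 5 dots is tried against each search domain first. Slow for external DNS: api.example.com has 2 dots, so with the search list above it costs 3 failed lookups before the real one, and the A + AAAA pair doubles that. Fix by lowering ndots in the pod spec (a trailing dot, api.example.com., also skips the search list):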

spec:
  dnsConfig:
    options:
      - { name: ndots, value: "2" }
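
To see what ndots changes, here is a minimal sketch of the order in which a glibc-style resolver tries names. query_order is a hypothetical helper; it only prints the candidates and does no DNS traffic, and a real resolver stops at the first name that resolves:

```shell
# Hypothetical helper: query_order NAME NDOTS SEARCH_DOMAIN...
# Prints the names a glibc-style resolver would try, in order.
query_order() {
  name=$1; ndots=$2; shift 2

  # A trailing dot marks the name as fully qualified: no search list.
  case $name in
    *.) printf '%s\n' "$name"; return ;;
  esac

  # Count the dots by comparing lengths with and without them.
  nodots=$(printf '%s' "$name" | tr -d '.')
  dots=$(( ${#name} - ${#nodots} ))

  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"                            # as-is first
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
  else
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
    printf '%s\n' "$name"                            # as-is last
  fi
}

query_order api.example.com 5 myns.svc.cluster.local svc.cluster.local cluster.local
```

With ndots 5 the real name comes last, after three cluster-suffixed candidates. Run it again with 2 instead of 5 and api.example.com is tried first.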

Headless service DNS

clusterIP: None

DNS returns one A record per pod IP, not a single VIP.

ExternalDNS

Manages real DNS records (Route53, Cloudflare) from K8s resources:

metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com

NodeLocal DNSCache

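Local DNS cache per node, reduces CoreDNS load. The upstream manifest contains __PILLAR__ placeholders (DNS server IP, local listen IP, cluster domain), so download it and substitute them before applying:

curl -LO https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
kubectl apply -f nodelocaldns.yaml     # after replacing the placeholders

Lookups then hit a cache on the same node, which typically lowers DNS latency.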

Dual-stack

# Cluster config (Cluster API-style; field names vary by installer)
clusterNetwork:
  pods:
    cidrBlocks: [10.0.0.0/16, fd00::/48]
  services:
    cidrBlocks: [10.96.0.0/12, fd00:96::/108]
# Service (the first family listed is the primary)
spec:
  ipFamilyPolicy: PreferDualStack
  ipFamilies: [IPv6, IPv4]

ClusterIP range

kubectl cluster-info dump | grep service-cluster-ip-range

Service traffic policy

spec:
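  internalTrafficPolicy: Local     # in-cluster traffic: same-node endpoints only
  externalTrafficPolicy: Local     # NodePort/LoadBalancer traffic: same-node endpoints only

externalTrafficPolicy: Local preserves the client source IP. With either policy, traffic that arrives at a node with no local endpoint is dropped rather than forwarded to another node.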

EndpointSlices

kubectl get endpointslices

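Replaces the Endpoints API. Scales better because a Service’s endpoints are split across multiple slices (100 endpoints per slice by default) instead of one large object.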

Ingress vs Gateway API

Both handle L7 (HTTP) routing. Gateway API is the successor: richer routing rules, role-oriented resources, and cross-namespace references. The Ingress API is frozen.

Service mesh

  • Istio: full-featured, complex.
  • Linkerd: lightweight, Rust-based proxy.
  • Cilium Service Mesh: eBPF, no sidecars.
  • Consul Connect.

Adds: mTLS, traffic shaping, observability, retries, circuit breaking — at the cost of complexity.

Sidecar pattern

spec:
  containers:
    - name: app
      ...
    - name: envoy
      image: envoyproxy/envoy

For service mesh, OPA sidecar, log collector, etc.

Init containers

spec:
  initContainers:
    - name: wait-for-db
      image: busybox
      command: ["sh", "-c", "until nc -z db 5432; do sleep 1; done"]
  containers: [...]

Init containers run to completion, in order, before the main containers start. Useful for setup and for waiting on dependencies.

hostNetwork / hostPort

spec:
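  hostNetwork: true                # pod shares the node's network namespace
  containers:
    - ports: [{ containerPort: 80, hostPort: 80 }]   # hostPort also works without hostNetwork

Use sparingly: both bind node ports, so port conflicts limit where the pod can be scheduled. Used by ingress controllers in some configs.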

MTU concerns

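Overlay networks reduce the usable MTU (VXLAN encapsulation costs 50 bytes over IPv4). Check:

ip link show eth0          # node NIC: typically 1500; pod interface with VXLAN: 1450

If the overlay MTU is misconfigured, oversized packets are dropped silently whenever the ICMP “fragmentation needed” replies are filtered. Symptom: small packets work, large ones fail (e.g. a TLS handshake hangs once the server sends its certificate).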
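
The MTU arithmetic can be sketched as follows. The overhead figures are the usual IPv4 values (VXLAN 50 bytes, IPIP 20 bytes); overlay_overhead, pod_mtu and max_ping_payload are hypothetical helper names:

```shell
# Hypothetical helpers for overlay MTU arithmetic (IPv4 underlay assumed).

# Bytes added by the encapsulation.
overlay_overhead() {
  case $1 in
    vxlan) echo 50 ;;   # outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14
    ipip)  echo 20 ;;   # one extra IPv4 header
    *)     echo "unknown encapsulation: $1" >&2; return 1 ;;
  esac
}

# Largest MTU a pod interface can use: node MTU minus encapsulation overhead.
pod_mtu() {
  overhead=$(overlay_overhead "$2") || return 1
  echo $(( $1 - overhead ))
}

# Largest ICMP payload that fits in one packet: MTU minus IPv4 (20) and ICMP (8) headers.
max_ping_payload() {
  echo $(( $1 - 28 ))
}

pod_mtu 1500 vxlan                          # 1450
max_ping_payload "$(pod_mtu 1500 vxlan)"    # 1422
```

To probe a path from inside a pod, send a non-fragmentable ping of that size (Linux iputils): ping -M do -s 1422 <pod-ip>. If 1422 fails but a smaller size gets through, the effective MTU on the path is lower than the configured one.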

eBPF debugging (Cilium)

cilium status
cilium connectivity test
cilium monitor                     # run inside a Cilium agent pod; shows all traffic

Common mistakes

  • DNS ndots:5 slowness for external lookups.
  • Service selector typo → no endpoints, mysterious 503s.
  • NetworkPolicy default-deny without DNS allow → app can’t resolve.
  • Pod-to-pod traffic blocked by node-level firewall.
  • MTU mismatch — connection hangs on large payloads.

Read this next

If you want my Cilium + NodeLocal DNS setup, it’s at rajpoot.dev.

