In 2026, eBPF is the cloud-native networking layer. Cilium is built in or offered as a supported CNI option on GKE, EKS, and AKS. The “service mesh” conversation has moved from sidecars to per-node eBPF programs. The CPU overhead is under 1%; the operational overhead has nearly vanished.
This post is the working knowledge: why Cilium won, what’s in the modern eBPF observability stack, when sidecars still win, and how to migrate.
What changed
Three years ago, eBPF was novel. The skepticism was reasonable: it’s literally a virtual machine inside the kernel. Today:
- Cilium is built in or a supported option on every major managed Kubernetes (GKE Dataplane V2, EKS with Cilium CNI, AKS Advanced Networking).
- Sidecarless service meshes (Cilium Service Mesh, Istio’s ambient mode) are mainstream.
- 67% of Kubernetes-at-scale teams in CNCF surveys use at least one eBPF observability tool.
The “scary kernel-level magic” reputation is gone. eBPF programs are signed, verified, sandboxed; the verifier rejects unsafe ones at load time.
The mental model
A sidecar mesh injects a proxy (Envoy) into every pod. Two containers, double the network hops, double the memory:
[ pod A: app + envoy ] ←→ [ pod B: envoy + app ]
A Cilium mesh runs one eBPF program per node that handles routing, policy, mTLS, and observability for every pod on that node. No sidecar:
[ pod A: app ] ── eBPF (kernel) ── eBPF (kernel) ── [ pod B: app ]
For L7 features (HTTP routing, gRPC), Cilium uses a per-node Envoy (one shared instance) instead of one per pod. The cost goes from N sidecars to 1 daemon.
Result: under 1% CPU overhead per node, no pod restarts to enable, no sidecar memory tax.
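To make the "N sidecars to 1 daemon" arithmetic concrete, here is a minimal sketch. The 50 MB per-sidecar figure is the one this post uses in its Cilium vs Istio table; the pod density and the shared proxy's footprint are hypothetical numbers picked only for illustration, not measurements.

```python
# Rough memory comparison: one Envoy per pod vs one shared Envoy per node.
# All inputs are illustrative assumptions, not measurements.


def proxy_memory_mb(pods_per_node, sidecar_mb, shared_proxy_mb):
    """Return (total MB with per-pod sidecars, total MB with one shared proxy)."""
    with_sidecars = pods_per_node * sidecar_mb  # cost grows with pod count
    with_shared = shared_proxy_mb               # cost is flat per node
    return with_sidecars, with_shared


if __name__ == "__main__":
    pods = 40            # hypothetical pod density per node
    per_sidecar = 50.0   # ~50 MB per sidecar, the figure used in this post
    shared = 300.0       # hypothetical footprint of a busy per-node Envoy

    sidecars_total, shared_total = proxy_memory_mb(pods, per_sidecar, shared)
    print(f"sidecars: {sidecars_total:.0f} MB, shared proxy: {shared_total:.0f} MB")
    print(f"difference per node: {sidecars_total - shared_total:.0f} MB")
```

The point of the sketch is the shape of the curve: the sidecar total scales with pod count, while the shared proxy is a fixed per-node cost.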
What Cilium does
CNI (the table stakes)
Pod-to-pod networking, IPAM, NodePort, LoadBalancer, ExternalIPs. Faster than kube-proxy because eBPF replaces iptables for service routing — important once your cluster has 1000+ services.
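Why the service count matters: kube-proxy in iptables mode evaluates service rules sequentially, while an eBPF datapath looks the service up in a hash map. The toy model below is plain Python, not Cilium or kernel code; the function names and addresses are made up for illustration, and it only counts comparisons for each approach.

```python
# Toy model of service lookup cost. Not real kernel or Cilium code.


def build_services(n):
    """Create n fake ((cluster IP, port), backend) entries."""
    return [((f"10.96.{i // 256}.{i % 256}", 80), f"backend-{i}") for i in range(n)]


def linear_lookup(rules, key):
    """iptables-style: scan rules in order; return (backend, comparisons made)."""
    comparisons = 0
    for rule_key, backend in rules:
        comparisons += 1
        if rule_key == key:
            return backend, comparisons
    return None, comparisons


def hash_lookup(table, key):
    """Hash-map-style: one keyed lookup, independent of table size."""
    return table.get(key), 1


if __name__ == "__main__":
    for n in (10, 1000, 5000):
        rules = build_services(n)
        table = dict(rules)
        last_key = rules[-1][0]  # worst case for the linear scan
        _, linear_cost = linear_lookup(rules, last_key)
        _, hash_cost = hash_lookup(table, last_key)
        print(f"{n:>5} services: linear={linear_cost} comparisons, hash={hash_cost}")
```

The linear cost grows with the number of services; the keyed lookup does not, which is why the difference only becomes visible on large clusters.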
Network Policy
Beyond Kubernetes’ built-in NetworkPolicy, Cilium supports L7 policies:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata: { name: api-allow-frontend }
spec:
  endpointSelector:
    matchLabels: { app: api }
  ingress:
    - fromEndpoints:
        - matchLabels: { app: frontend }
      toPorts:
        - ports: [{ port: "8080", protocol: TCP }]
          rules:
            http:
              - method: "GET"
                path: "/users(/|$)(.*)"
              - method: "POST"
                path: "/users$"
The frontend can GET /users (and anything beneath it) or POST to /users on the api, but it cannot DELETE. The rule is declared in YAML and enforced on the node, with no changes to either application.
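A quick way to sanity-check the two path patterns before shipping the policy is to run them against sample requests. This is a minimal sketch under two assumptions: that the pattern must match the whole request path, and that Python's `re` behaves like the proxy's regex engine for patterns this simple. Matching the method as an exact string is also a simplification. The `is_allowed` helper is hypothetical, not part of Cilium.

```python
import re

# (method, path regex) pairs copied from the policy above.
ALLOWED = [
    ("GET", r"/users(/|$)(.*)"),
    ("POST", r"/users$"),
]


def is_allowed(method, path):
    """Return True if any rule matches both the method and the whole path."""
    return any(
        method == rule_method and re.fullmatch(rule_path, path) is not None
        for rule_method, rule_path in ALLOWED
    )


if __name__ == "__main__":
    samples = [
        ("GET", "/users"),
        ("GET", "/users/42"),
        ("GET", "/usersx"),
        ("POST", "/users"),
        ("POST", "/users/42"),
        ("DELETE", "/users/42"),
    ]
    for method, path in samples:
        verdict = "allow" if is_allowed(method, path) else "deny"
        print(f"{method:6} {path:10} -> {verdict}")
```

Checking a near-miss such as `/usersx` is worth the extra line: it confirms the `(/|$)` group stops the GET rule from matching unrelated paths that merely share a prefix.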
Service mesh
mTLS between pods, traffic shifting, retries, circuit breaking, header rewrites — same features as Istio, no sidecar:
apiVersion: cilium.io/v2alpha1
kind: CiliumEnvoyConfig
metadata: { name: api-canary }
spec:
  services:
    - { name: api, namespace: default }
  resources:
    - "@type": type.googleapis.com/envoy.config.route.v3.RouteConfiguration
      virtualHosts:
        - name: api
          domains: ["*"]
          routes:
            - match: { prefix: "/" }
              route:
                weightedClusters:
                  clusters:
                    - { name: api-v1, weight: 90 }
                    - { name: api-v2, weight: 10 }
90/10 traffic split for canary. Lives in Cilium’s CRDs, not bolted on.
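The weighted split itself is simple to reason about. The sketch below illustrates weighted selection by walking cumulative weights; it is not Envoy's implementation, and `pick_cluster` is a hypothetical helper. A real proxy draws the point at random per request, so observed ratios only approach 90/10 over many requests.

```python
import random
from collections import Counter

# Cluster names and weights mirror the route above.
CLUSTERS = [("api-v1", 90), ("api-v2", 10)]


def pick_cluster(clusters, point):
    """Map a point in [0, total_weight) to a cluster via cumulative weights."""
    total = sum(weight for _, weight in clusters)
    if not 0 <= point < total:
        raise ValueError(f"point must be in [0, {total})")
    upper = 0
    for name, weight in clusters:
        upper += weight
        if point < upper:
            return name
    raise AssertionError("unreachable: weights cover the whole range")


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so repeated runs give the same counts
    counts = Counter(
        pick_cluster(CLUSTERS, rng.randrange(100)) for _ in range(10_000)
    )
    print(counts)
```

Shifting the canary from 10% to 25% is then just a change to the two weights, which is what makes this style of rollout easy to automate.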
Observability — Hubble
$ hubble observe --namespace api
Apr 29 09:42:01.234 default/frontend-7c4fb7 → default/api-6cd45b HTTP 200 GET /users
Apr 29 09:42:01.245 default/api-6cd45b → default/postgres TCP to:5432
Hubble is Cilium’s flow-level observability. Every connection is captured by the eBPF program — no app instrumentation. Pair with Hubble UI for a service map that auto-builds from real traffic.
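The idea of a service map that builds itself from traffic can be sketched in a few lines. The example below parses the simplified line format shown above and is an illustration only: real Hubble output carries more fields, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Two sample lines in the simplified format shown above.
SAMPLE = """\
Apr 29 09:42:01.234 default/frontend-7c4fb7 → default/api-6cd45b HTTP 200 GET /users
Apr 29 09:42:01.245 default/api-6cd45b → default/postgres TCP to:5432
"""


def build_service_map(text):
    """Return {source: set(destinations)} from 'src → dst ...' flow lines."""
    edges = defaultdict(set)
    for line in text.splitlines():
        if " → " not in line:
            continue  # skip blank or unrecognized lines
        left, right = line.split(" → ", 1)
        left_tokens, right_tokens = left.split(), right.split()
        if not left_tokens or not right_tokens:
            continue  # malformed line: nothing on one side of the arrow
        source = left_tokens[-1]        # last token before the arrow
        destination = right_tokens[0]   # first token after the arrow
        edges[source].add(destination)
    return dict(edges)


def callers_of(service_map, destination):
    """List every source that opened a flow to the given destination."""
    return sorted(src for src, dsts in service_map.items() if destination in dsts)


if __name__ == "__main__":
    service_map = build_service_map(SAMPLE)
    print(service_map)
    print(callers_of(service_map, "default/postgres"))
```

The same adjacency structure answers questions like "which workloads talked to Postgres?" without any application instrumentation.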
Runtime security — Tetragon
Tetragon hooks into the kernel for security events:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata: { name: detect-pod-shell }
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - { index: 0, type: "string" }
      selectors:
        - matchArgs:
            - { index: 0, operator: "Equal", values: ["/bin/sh", "/bin/bash"] }
This policy detects shell execution inside pods. Pair it with rules to alert on (or kill) the offending process. It covers what Falco was doing, and it is faster and more deterministic.
The 2026 stack
For a production Kubernetes cluster in 2026, the eBPF stack:
| Layer | Tool |
|---|---|
| CNI + service mesh | Cilium |
| Network observability | Hubble |
| App-level observability (auto) | Pixie |
| Runtime security | Tetragon |
| Custom traces | OpenTelemetry (see OpenTelemetry End-to-End) |
Pixie is worth special mention: it auto-instruments HTTP, gRPC, MySQL, Postgres traffic from kernel observation. No app code change. You get a service map and per-request traces “for free.”
Cilium vs Istio in 2026
| | Cilium | Istio (sidecar) | Istio Ambient (sidecarless) |
|---|---|---|---|
| Per-pod overhead | None | ~50 MB + 2% CPU | ~5 MB + small CPU |
| L7 features | Yes (per-node Envoy) | Yes (per-pod) | Yes (per-node) |
| Operational simplicity | Highest | Lowest | Mid |
| Multi-cluster mesh | Cilium Cluster Mesh | Yes | Yes |
| Maturity (2026) | Production-grade | Battle-tested | Stable, gaining ground |
| Default in managed K8s | Yes (GKE, EKS, AKS option) | Possible | Possible |
For new clusters in 2026: Cilium. For Istio shops happy with sidecars: stay where you are; the migration cost is real. For Istio shops struggling with sidecar overhead: ambient mode or Cilium.
Migration from Istio — high level
If you’re moving from Istio sidecar to Cilium, the rough sequence:
- Install Cilium as the CNI (or migrate from existing CNI). This is the heaviest step; managed clusters often need recreation.
- Run both in parallel for a phase. Cilium handles networking; Istio sidecars stay for mesh features.
- Migrate policies namespace by namespace from Istio AuthorizationPolicy to CiliumNetworkPolicy.
- Migrate mTLS — enable Cilium mutual auth; remove Istio mTLS.
- Migrate traffic management — VirtualService → CiliumEnvoyConfig. The hardest step; complex VirtualServices don’t all translate cleanly.
- Remove sidecars namespace by namespace.
- Migrate observability to Hubble + Tetragon.
Plan a quarter. Don’t try to do it in a sprint.
Network policy that’s not painful
NetworkPolicy gets a bad reputation because the default-deny model is unforgiving. The 2026 approach:
# Default deny in every workload namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: api }
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
Then explicit allows per service. Cilium’s L7 policies make this practical without exploding the policy count.
For day-to-day developer ergonomics, ship a default policy generator with each service template (Platform Engineering and IDPs). Developers don’t write policy by hand; they declare app→app dependencies and the platform generates the policy.
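A policy generator of that kind can start very small. The sketch below is one possible shape, not a reference implementation: the `DEPENDENCIES` format, the function names, and the `-allow-declared` naming scheme are all hypothetical, and the `app` label convention is assumed from the examples in this post. It emits only the CiliumNetworkPolicy fields already shown above.

```python
import json

# Hypothetical declaration format: each service lists its port and its callers.
DEPENDENCIES = {
    "api": {"port": "8080", "allow_from": ["frontend"]},
    "postgres": {"port": "5432", "allow_from": ["api"]},
}


def generate_policy(service, port, allow_from, namespace="default"):
    """Build one CiliumNetworkPolicy (as a dict) that admits the listed callers."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"{service}-allow-declared", "namespace": namespace},
        "spec": {
            "endpointSelector": {"matchLabels": {"app": service}},
            "ingress": [
                {
                    "fromEndpoints": [
                        {"matchLabels": {"app": caller}}
                        for caller in sorted(allow_from)
                    ],
                    "toPorts": [{"ports": [{"port": port, "protocol": "TCP"}]}],
                }
            ],
        },
    }


def generate_all(dependencies, namespace="default"):
    """Generate one policy per declared service, in a stable (sorted) order."""
    return [
        generate_policy(name, spec["port"], spec["allow_from"], namespace)
        for name, spec in sorted(dependencies.items())
    ]


if __name__ == "__main__":
    # Printed as JSON for inspection; a real pipeline would write manifests.
    print(json.dumps(generate_all(DEPENDENCIES), indent=2))
```

Sorting both the services and the callers keeps the generated output stable between runs, which matters once the manifests are committed and reviewed as diffs.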
mTLS without certs
Cilium mTLS uses SPIFFE identities (workload identities, not user certs). Each workload gets a SPIFFE ID tied to its Cilium security identity, and SPIRE, which Cilium integrates with, issues and rotates the certificates. Application code does not change.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata: { name: api-mtls }
spec:
  endpointSelector: { matchLabels: { app: api } }
  ingress:
    - authentication:
        mode: required   # mTLS required
      fromEndpoints:
        - matchLabels: { app: frontend }
Done. mTLS is enforced at the kernel; no app changes; no per-pod overhead.
Multi-cluster
Cilium Cluster Mesh federates services across clusters. A pod in cluster A can call api.default.svc.cluster.local and reach an instance in cluster B transparently:
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: default
  annotations:
    service.cilium.io/global: "true"   # makes it cluster-mesh aware
spec:
  selector: { app: api }
  ports: [{ port: 8080 }]
Failover, cross-cluster load balancing, and policy-aware routing all just work. Compare that with multi-cluster Istio meshes, which take real engineering effort.
eBPF observability: the part that surprises
The single biggest unlock from eBPF isn’t networking. It’s observability that needs no code change:
- Pixie auto-traces HTTP, gRPC, Kafka, MySQL, Postgres. Gives you a service map, per-request flame graphs, query latency breakdowns.
- Hubble captures every flow. You can ask “what services talked to Postgres in the last hour?” and the kernel knows.
- Tetragon captures every exec, file open, network connect. Forensics, compliance, intrusion detection.
For a team that hasn’t built application-level instrumentation, this is a lot of value for almost no work.
The 1% myth (and reality)
Cilium has measurable overhead. A few production data points:
- CNI vs kube-proxy: Cilium is faster on connection establishment (eBPF replaces iptables); slightly heavier on long-running connections.
- L7 policy: ~50 microseconds per packet for HTTP parsing. Negligible for HTTP traffic; can matter for very high RPS gRPC.
- Hubble flow capture: <1% CPU per node when sampled; ~3% when capturing full flow logs.
For 99% of clusters, this is below the noise floor. Compare to sidecar overhead: 50–100 MB per pod, 5–10% CPU per pod.
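To see where the L7 parsing cost starts to matter, multiply it out. This is back-of-the-envelope arithmetic under stated assumptions: the ~50 microsecond figure is treated as CPU time per parse, there is one parse per request, and costs add up linearly. The request rates are hypothetical.

```python
PARSE_COST_SECONDS = 50e-6  # ~50 microseconds, the figure quoted above


def cores_spent_parsing(parses_per_second, cost_seconds=PARSE_COST_SECONDS):
    """CPU cores consumed = (parses per second) x (CPU seconds per parse)."""
    return parses_per_second * cost_seconds


if __name__ == "__main__":
    for rate in (100, 1_000, 20_000):  # hypothetical request rates per node
        print(f"{rate:>6} parses/s -> {cores_spent_parsing(rate):.3f} cores")
```

Under these assumptions, a node handling a few hundred requests per second spends a small fraction of a core on parsing, while a node pushing tens of thousands of requests per second spends about a full core, which is why the cost only shows up on very high RPS services.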
When sidecars still win
A few honest cases where Istio sidecars (or per-pod Envoy) earn their cost:
- Heavy custom Envoy filter chains. If you’ve built domain-specific WASM filters and need them per-pod, sidecars deliver.
- Strict per-pod resource isolation. Per-node shared Envoy means one pod’s traffic affects another’s.
- Specific Istio features (e.g., very fine-grained AuthorizationPolicy) you don’t want to translate.
For everyone else, Cilium is the path of least resistance.
A pragmatic adoption path
- Start with CNI. Move to Cilium on a new cluster (managed K8s makes this easy).
- Add Hubble for visibility. Free win even before you adopt mesh features.
- Adopt L4 NetworkPolicy with default-deny. The platform team writes the templates.
- Add L7 policy for high-value services (auth, billing, admin).
- Enable mTLS per namespace. Audit, then enforce.
- Add Tetragon for runtime security signals.
- Add Pixie for app-level traces if your apps aren’t already OTel-instrumented.
Each phase ships value on its own.
Read this next
- Kubernetes for App Developers — the runtime.
- Platform Engineering and IDPs — where Cilium fits in the platform.
- GitOps with Argo CD and Flux Explained — deploy Cilium config the right way.
- OpenTelemetry End-to-End — pair with eBPF observability for the full picture.
If you want a Cilium + Hubble + Tetragon Helm chart with sane defaults and a migration runbook from Istio, it’s at rajpoot.dev.