The classic gRPC scaling story: “we added 5 more pods and load didn’t shift.” The cause is almost always load balancing at the wrong layer. This post is the working playbook.
The HTTP/2 problem
HTTP/1.1 carries one request at a time per connection, so clients open many connections and recycle them often. An L4 LB that spreads connections therefore spreads requests reasonably well.
HTTP/2 (used by gRPC) holds one long-lived connection and multiplexes many requests over it. An L4 LB pins that connection to one server, so all of a client's traffic lands on whichever pod it connected to first.
Adding pods doesn’t help. Existing connections stay where they were.
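To make the pinning effect concrete, here is a small simulation rather than real networking. The function names and the numbers (4 clients, 2 pods at connect time, 7 pods after scaling) are made up for illustration.

```go
package main

import "fmt"

// pinned models an L4 balancer: each client is assigned to a backend once,
// at connect time, and every later request follows that connection.
// Backends added after the connections were made never receive traffic.
func pinned(clients, requestsPerClient, backendsAtConnect, backendsNow int) []int {
	load := make([]int, backendsNow)
	for c := 0; c < clients; c++ {
		backend := c % backendsAtConnect // chosen once, never revisited
		load[backend] += requestsPerClient
	}
	return load
}

// perRequest models L7 or client-side balancing: every request picks the
// next backend in round-robin order.
func perRequest(clients, requestsPerClient, backendsNow int) []int {
	load := make([]int, backendsNow)
	for i := 0; i < clients*requestsPerClient; i++ {
		load[i%backendsNow]++
	}
	return load
}

func main() {
	// 4 clients connected while only 2 pods existed; 5 more pods were added later.
	fmt.Println(pinned(4, 700, 2, 7))  // [1400 1400 0 0 0 0 0]
	fmt.Println(perRequest(4, 700, 7)) // [400 400 400 400 400 400 400]
}
```

The five new pods stay idle in the pinned case because nothing ever re-evaluates the connection-to-pod assignment.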
Client-side load balancing
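conn, err := grpc.NewClient(
    // Kubernetes service DNS is <service>.<namespace>.svc.cluster.local;
    // the "default" namespace here is an assumption.
    "dns:///api.default.svc.cluster.local:50051",
    grpc.WithDefaultServiceConfig(`{
        "loadBalancingConfig": [{"round_robin":{}}]
    }`),
    grpc.WithTransportCredentials(insecure.NewCredentials()), // plaintext; use TLS outside trusted networks
)
if err != nil { log.Fatalf("create client: %v", err) }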
The dns:/// resolver returns all backend IPs, the client opens one connection per backend, and round_robin spreads requests across them.
Pair this with a headless Service in Kubernetes:
apiVersion: v1
kind: Service
metadata: { name: api }
spec:
  clusterIP: None # headless: DNS returns pod IPs directly
  selector: { app: api }
  ports: [{ port: 50051 }]
Now dns:///api returns every pod IP, and the client connects to each one.
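A quick way to confirm the headless Service behaves as expected is to resolve it from inside the cluster and count the addresses. The sketch below assumes a Service named api in the default namespace; the helper names are hypothetical, and the lookup function is injectable so the logic can be exercised without cluster DNS.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// lookupFunc has the same shape as net.LookupHost, so a fake can be swapped in.
type lookupFunc func(host string) ([]string, error)

// backendAddrs resolves host and returns the addresses in sorted order.
func backendAddrs(lookup lookupFunc, host string) ([]string, error) {
	addrs, err := lookup(host)
	if err != nil {
		return nil, fmt.Errorf("resolve %s: %w", host, err)
	}
	sort.Strings(addrs)
	return addrs, nil
}

func main() {
	// Assumed name: Service "api" in namespace "default".
	addrs, err := backendAddrs(net.LookupHost, "api.default.svc.cluster.local")
	if err != nil {
		fmt.Println(err)
		return
	}
	// A headless Service should list one address per ready pod;
	// a ClusterIP Service lists a single virtual IP.
	fmt.Printf("%d address(es): %v\n", len(addrs), addrs)
}
```

If this prints a single address while several pods are ready, the Service is probably not headless and the client will still be pinned.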
Service mesh balancing
In a service mesh, the proxy data plane parses gRPC at L7 and balances per RPC instead of per connection:
- Linkerd: built-in.
- Cilium Service Mesh: built-in.
- Istio: yes, with Envoy as data plane.
- AWS App Mesh: yes.
A mesh removes the load-balancing concern from clients. Recommended for any non-trivial gRPC fleet.
Subset balancing
For very large fleets, round-robin across 1000 backends isn’t practical (each client maintains 1000 connections). Subset balancing:
- Each client picks a stable subset (e.g., 20 of 1000 backends).
- Round-robins within subset.
- Periodically rotates subset to avoid hotspots.
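Support for this in Envoy, Cilium, and Linkerd varies by product and version, so verify it rather than assume it. Raw gRPC has no universally available subsetting switch in service config; there, subsetting usually means a custom resolver or LB policy that hands the channel only the chosen subset.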
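One way to pick a stable subset is rendezvous hashing: score every backend by a hash of the client ID and the backend address, then keep the highest scores. This is a minimal sketch of that idea, not what any particular mesh or gRPC release does. The function names and the client ID scheme are assumptions, and it leaves out the periodic rotation described above.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// makeBackends builds n placeholder addresses for the demo.
func makeBackends(n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = fmt.Sprintf("10.0.%d.%d:50051", i/250, i%250)
	}
	return out
}

// subset returns up to size backends for clientID. The score of a backend
// depends only on (clientID, backend), so removing a backend that was not
// selected leaves the subset unchanged.
func subset(backends []string, clientID string, size int) []string {
	type scored struct {
		addr  string
		score uint64
	}
	all := make([]scored, 0, len(backends))
	for _, b := range backends {
		h := fnv.New64a()
		h.Write([]byte(clientID))
		h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
		h.Write([]byte(b))
		all = append(all, scored{addr: b, score: h.Sum64()})
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].score != all[j].score {
			return all[i].score > all[j].score
		}
		return all[i].addr < all[j].addr // deterministic tie-break
	})
	if size > len(all) {
		size = len(all)
	}
	out := make([]string, size)
	for i := range out {
		out[i] = all[i].addr
	}
	return out
}

func main() {
	backends := makeBackends(1000)
	picked := subset(backends, "client-a", 20)
	fmt.Println(len(picked), "of", len(backends), "backends selected")
}
```

In a raw gRPC client, a custom resolver could apply subset to the full address list before passing it to the channel, with round_robin then operating within the subset.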
Custom resolvers
For non-DNS service discovery (Consul, etcd, custom):
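import "google.golang.org/grpc/resolver"
// Register a custom resolver, typically from init(). consulResolver must
// implement resolver.Builder, and its Scheme() must return "consul" to match the target.
resolver.Register(&consulResolver{})
conn, err := grpc.NewClient("consul:///api", ...)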
The resolver hands gRPC a list of addresses; gRPC handles the connection and balancing.
Outbound connection pooling
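A typical service calls many backends. Without care (for example, creating a new channel per request), you end up with many redundant connections to each one. Create channels once and reuse them; the snippet covers the rarer case where you deliberately want more than one connection to a backend: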
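// gRPC uses one connection per backend address by default.
// Service config has no field for extra connections to the same backend, so one
// workaround (for HOL avoidance) is a small pool of channels that callers rotate through.
conns := make([]*grpc.ClientConn, 0, 4)
for i := 0; i < 4; i++ {
    conn, err := grpc.NewClient(addr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
    )
    if err != nil { log.Fatalf("create client: %v", err) }
    conns = append(conns, conn)
}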
For most workloads, the default per-backend connection is fine. Multiple connections per backend matter only when head-of-line blocking on a single connection becomes the bottleneck.
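If you do end up needing several channels to one backend, callers need a cheap way to rotate through them. The sketch below uses a generic pool with an atomic counter. The pool type and its method names are hypothetical, and plain strings stand in for *grpc.ClientConn so the example runs without gRPC.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pool hands out its members in round-robin order and is safe for
// concurrent use. In real code T would be *grpc.ClientConn.
type pool[T any] struct {
	members []T
	next    atomic.Uint64
}

// newPool requires at least one member; pick panics on an empty pool.
func newPool[T any](members ...T) *pool[T] {
	return &pool[T]{members: members}
}

// pick returns the next member, wrapping around at the end of the list.
func (p *pool[T]) pick() T {
	n := p.next.Add(1) - 1
	return p.members[n%uint64(len(p.members))]
}

func main() {
	p := newPool("conn-0", "conn-1", "conn-2")
	for i := 0; i < 5; i++ {
		fmt.Println(p.pick()) // conn-0, conn-1, conn-2, conn-0, conn-1
	}
}
```

Each RPC would call pick() to choose a channel before creating its stub or invoking the method.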
Health checks
gRPC has a standard health check protocol (grpc.health.v1). Both K8s and meshes use it:
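import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/health"
    healthv1 "google.golang.org/grpc/health/grpc_health_v1"
)
s := grpc.NewServer()
healthsvc := health.NewServer()
healthv1.RegisterHealthServer(s, healthsvc)
// Status is tracked per service name; the empty name "" conventionally covers the whole server.
healthsvc.SetServingStatus("orders.OrderService", healthv1.HealthCheckResponse_SERVING)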
For probe configuration see Health Checks That Don’t Lie.
Common mistakes
1. ClusterIP service for gRPC
A ClusterIP service balances at L4, which pins each connection to one pod. Use a headless service or a mesh instead.
2. No retries on transient failures
gRPC calls can fail with UNAVAILABLE on connection blips. Add a retry policy to the service config:
"methodConfig": [{
"name": [{"service": "orders.OrderService"}],
"retryPolicy": {
"maxAttempts": 3,
"initialBackoff": "0.1s",
"maxBackoff": "1s",
"backoffMultiplier": 2,
"retryableStatusCodes": ["UNAVAILABLE"]
}
}]
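The fragment above has to sit inside a complete service config document. The sketch below shows one way to combine it with round_robin and works out the nominal backoff schedule the policy implies. The helper names are made up, and the computed values are upper bounds before gRPC applies its own randomized jitter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"time"
)

// serviceConfig is a complete document suitable for grpc.WithDefaultServiceConfig.
const serviceConfig = `{
  "loadBalancingConfig": [{"round_robin": {}}],
  "methodConfig": [{
    "name": [{"service": "orders.OrderService"}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}`

// configIsValidJSON catches stray commas or quotes before the config ships.
func configIsValidJSON() bool { return json.Valid([]byte(serviceConfig)) }

// backoffCaps returns the nominal delay before each retry:
// min(initial * multiplier^(n-1), max) for retry n. maxAttempts counts the
// original call, so there are maxAttempts-1 retries.
func backoffCaps(maxAttempts int, initial, max time.Duration, multiplier float64) []time.Duration {
	if maxAttempts < 2 {
		return nil
	}
	caps := make([]time.Duration, 0, maxAttempts-1)
	for n := 1; n < maxAttempts; n++ {
		d := time.Duration(float64(initial) * math.Pow(multiplier, float64(n-1)))
		if d > max {
			d = max
		}
		caps = append(caps, d)
	}
	return caps
}

func main() {
	fmt.Println("valid JSON:", configIsValidJSON())
	fmt.Println(backoffCaps(3, 100*time.Millisecond, time.Second, 2)) // [100ms 200ms]
}
```

With maxAttempts set to 3, a call is tried at most three times in total, so the worst case adds roughly 0.3s of backoff on top of the attempts themselves.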
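3. Connections that live forever
A connection opened at app start stays open indefinitely, so the client never rebalances when backends scale. Set MaxConnectionAge on the server:
s := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{MaxConnectionAge: 30 * time.Minute}))
The server then periodically closes connections gracefully; clients reconnect and re-resolve, which spreads them across the current set of backends. In-flight RPCs get until MaxConnectionAgeGrace to finish.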
4. Ignoring deadlines
Without deadlines, a slow backend ties up its callers, and the slowness cascades upstream. Always set one, for example with context.WithTimeout.
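The pattern looks like this in miniature. The slowCall function is a hypothetical stand-in for a unary RPC so the example runs without a server. A real gRPC call takes the same ctx argument and, on expiry, returns a status with code DeadlineExceeded rather than the raw context error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall simulates a backend that needs `work` to respond, but gives up
// as soon as the caller's context is done.
func slowCall(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// callWithDeadline bounds a single call with a per-call timeout.
func callWithDeadline(timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // always release the timer, even on success
	return slowCall(ctx, work)
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := callWithDeadline(50*time.Millisecond, 500*time.Millisecond)
	fmt.Println("slow backend:", err, "timed out:", timedOut(err))

	err = callWithDeadline(500*time.Millisecond, 10*time.Millisecond)
	fmt.Println("fast backend:", err)
}
```

Deriving the context from the incoming request's context instead of context.Background() lets the caller's remaining deadline propagate downstream.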
5. No subsetting at scale
10k clients × 1000 backends = 10M connections. Use subset balancing.
Read this next
- Go + gRPC + Protocol Buffers
- gRPC Streaming Patterns
- Cilium and eBPF in Production
- Health Checks That Don’t Lie
If you want a Go gRPC client with proper LB + retries + deadlines, it’s at rajpoot.dev.