Stateful workloads in K8s cheatsheet.
Should you run a database in K8s?
Pros:
- Single deployment pattern.
- Operators handle complex ops.
- GitOps for DB too.
Cons:
- More moving parts.
- Storage perf depends on CSI driver.
- Backups + failover require care.
Rule of thumb: small/mid scale and a good operator → yes. Massive scale or compliance constraints → consider managed (RDS, Cloud SQL).
Postgres operators
- CloudNativePG: simple, well-maintained.
- Zalando: mature.
- Crunchy Postgres: full-featured.
CloudNativePG example
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.23/releases/cnpg-1.23.0.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata: { name: pg }
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  storage:
    size: 20Gi
    storageClass: gp3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/pg
      s3Credentials: { ... }
    retentionPolicy: "30d"
  monitoring: { enablePodMonitor: true }
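The operator handles streaming replication and automated failover. The backup section enables WAL archiving to S3 and sets where base backups go (periodic base backups are scheduled with a separate ScheduledBackup resource), and enablePodMonitor creates a PodMonitor for the Prometheus Operator.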
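The backup section only configures the destination; a minimal sketch of the ScheduledBackup that actually triggers daily base backups, assuming the Cluster is named pg as above. The name pg-daily and the 02:00 schedule are placeholders.

```yaml
# Daily base backup for the "pg" Cluster, stored in its barmanObjectStore.
# CloudNativePG cron schedules have six fields; the first one is seconds.
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-daily               # hypothetical name
spec:
  schedule: "0 0 2 * * *"      # 02:00:00 every day
  backupOwnerReference: self   # Backup objects are owned by this ScheduledBackup
  cluster:
    name: pg
```

Restores only work if you have tried one, so pair this with a periodic test restore into a scratch Cluster.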
Redis (Bitnami chart)
helm install redis bitnami/redis \
--set auth.password=x \
--set replica.replicaCount=3 \
--set master.persistence.size=8Gi \
--set sentinel.enabled=true
Or use redis-operator for advanced setups.
Kafka
- Strimzi: CNCF project, mature.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata: { name: my-cluster }
spec:
  kafka:
    version: 3.7.0
    replicas: 3
    listeners:
      - { name: plain, port: 9092, type: internal, tls: false }
      - { name: tls, port: 9093, type: internal, tls: true }
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage: { type: persistent-claim, size: 10Gi }
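Modern Kafka runs in KRaft mode with no ZooKeeper (Kafka 4.0 removed ZooKeeper entirely). In Strimzi, KRaft clusters are defined with KafkaNodePool resources instead of a zookeeper section.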
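A KRaft sketch of the same cluster, assuming a Strimzi release with KafkaNodePool support. The two annotations gate node pools and KRaft in the 0.4x releases; check the Strimzi docs for your version, since these switches change over time. The pool name dual-role is arbitrary.

```yaml
# Node pool: 3 nodes that act as both KRaft controllers and brokers.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: dual-role
  labels:
    strimzi.io/cluster: my-cluster   # binds this pool to the Kafka resource below
spec:
  replicas: 3
  roles: [controller, broker]
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false           # keep PVCs if the pool is deleted
---
# Kafka resource: no zookeeper section; replicas and storage live in the pool.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 3.7.0
    listeners:
      - { name: plain, port: 9092, type: internal, tls: false }
      - { name: tls, port: 9093, type: internal, tls: true }
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Larger clusters usually split controllers and brokers into separate pools so broker load can't starve the metadata quorum.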
Elasticsearch / OpenSearch
ECK (Elastic Cloud on Kubernetes) operator. Resource-hungry; consider managed if budget allows.
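A minimal ECK sketch, assuming the operator is installed and a gp3 storage class exists. The Elasticsearch version and the 4Gi memory figure are placeholders; memory is where the resource cost shows up, so size it to your data.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es
spec:
  version: 8.14.0                        # placeholder: the Elastic Stack version you run
  nodeSets:
    - name: default
      count: 3
      config:
        node.store.allow_mmap: false     # avoids raising vm.max_map_count on hosts; costs some performance
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data     # ECK expects this claim name for the data volume
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: gp3
            resources:
              requests:
                storage: 100Gi
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests: { memory: 4Gi, cpu: "1" }
                limits: { memory: 4Gi }
```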
MongoDB
- MongoDB Community Operator: free.
- Percona MongoDB Operator.
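A minimal sketch for the Community Operator: a three-member replica set with SCRAM authentication. The user, database, and Secret names are hypothetical, and the MongoDB version is an assumption; pick one your operator release supports.

```yaml
# Password for the hypothetical user "app"; referenced via passwordSecretRef below.
apiVersion: v1
kind: Secret
metadata:
  name: app-user-password
type: Opaque
stringData:
  password: change-me                    # placeholder; source it from a secret manager
---
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongo
spec:
  members: 3                             # replica set size
  type: ReplicaSet
  version: "6.0.5"                       # assumed version
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: app
      db: admin                          # authentication database
      passwordSecretRef:
        name: app-user-password
      roles:
        - name: readWrite
          db: appdb                      # hypothetical application database
      scramCredentialsSecretName: app-scram   # Secret the operator generates
```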
RabbitMQ
helm install rabbit bitnami/rabbitmq --set auth.password=x
Or rabbitmq-cluster-operator.
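With rabbitmq-cluster-operator, the cluster is a single RabbitmqCluster resource. A minimal sketch assuming three nodes (so quorum queues can survive one node failure) and a gp3 storage class; the name rabbit just mirrors the Helm release above.

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbit
spec:
  replicas: 3                 # odd count for quorum queues
  persistence:
    storageClassName: gp3     # assumed storage class
    storage: 20Gi             # per-node PVC size
```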
Volumes for state
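- ReadWriteOnce (RWO) is fine for primary/replica databases: each instance gets its own PVC.
- Use SSD-backed storage classes (e.g. gp3 on AWS, pd-ssd on GCP).
- Set volumeBindingMode: WaitForFirstConsumer so each volume is provisioned in the zone where its pod is scheduled.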
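A sketch of the gp3 storage class the CNPG example refers to, assuming AWS with the EBS CSI driver installed. On GKE the provisioner would be pd.csi.storage.gke.io with type: pd-ssd instead.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com              # AWS EBS CSI driver
parameters:
  type: gp3                               # SSD-backed EBS volume type
volumeBindingMode: WaitForFirstConsumer   # provision in the zone the pod lands in
allowVolumeExpansion: true                # allows growing PVCs later
reclaimPolicy: Retain                     # keep the EBS volume if the PVC is deleted
```

Retain trades some cleanup work for safety: an accidental PVC deletion no longer destroys the data.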
Backups
Native operators usually have built-in backups (CloudNativePG → S3 via Barman). For others:
- Velero: cluster-wide backup including PVs (via snapshots or restic).
- App-level: pg_dump / mongodump to S3 via a CronJob.
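An app-level backup sketch for the Postgres cluster above: a nightly CronJob that streams pg_dump output to S3. Assumptions, all hypothetical: an image containing bash, pg_dump and the aws CLI (the image name is a placeholder), S3 access granted through IRSA/workload identity, and the pg-app Secret and pg-rw Service that CloudNativePG creates for a Cluster named pg with its default app database.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-dump
spec:
  schedule: "0 3 * * *"          # every day at 03:00
  concurrencyPolicy: Forbid      # never run two dumps at once
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: dump
              image: registry.example.com/pg-dump-s3:latest   # hypothetical image
              env:
                - name: PGHOST
                  value: pg-rw                                 # read-write Service of cluster "pg"
                - name: PGDATABASE
                  value: app
                - name: PGUSER
                  valueFrom: { secretKeyRef: { name: pg-app, key: username } }
                - name: PGPASSWORD
                  valueFrom: { secretKeyRef: { name: pg-app, key: password } }
              command: ["/bin/bash", "-c"]
              args:
                # pipefail makes the Job fail if pg_dump fails, not only if the upload fails
                - set -euo pipefail; pg_dump -Fc | aws s3 cp - "s3://my-bucket/dumps/pg-$(date +%F).dump"
```

The same pattern works for MongoDB with mongodump --archive, which writes the archive to stdout when no file name is given.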
StatefulSet basics
See cheatsheet 07. Operators usually create StatefulSets under the hood.
Anti-affinity for HA
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector: { matchLabels: { app: pg } }
        topologyKey: kubernetes.io/hostname
Spread replicas across nodes (and ideally AZs).
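For the zone half, a sketch of a pod-spec fragment using topologySpreadConstraints, assuming nodes carry the standard topology.kubernetes.io/zone label (managed clusters set it). It complements the hostname anti-affinity above; operators usually expose their own placement fields, so check the CRD before patching pods directly.

```yaml
# Pod spec fragment: no zone may run more than one extra "app: pg" pod
# compared with any other zone.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # stay Pending rather than stack replicas in one zone
    labelSelector:
      matchLabels: { app: pg }
```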
Disruption budgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: pg }
spec:
  minAvailable: 2
  selector: { matchLabels: { app: pg } }
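With 3 replicas, minAvailable: 2 lets node drains and cluster-autoscaler scale-down evict only one pod at a time. PDBs cover voluntary disruptions only, not a node crash.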
Connection pooling
PgBouncer / Odyssey in front of Postgres:
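# CloudNativePG built-in: a separate Pooler resource that points at the Cluster
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata: { name: pg-pooler-rw }
spec:
  cluster: { name: pg }
  instances: 3
  type: rw
  pgbouncer: { poolMode: transaction }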
Operator pattern
Operators encode operational knowledge:
- Detect primary failure → promote replica.
- Schedule backups.
- Rolling upgrade with consistency.
Anti-pattern: rolling your own Postgres StatefulSet for production.
When to use managed
- Compliance: SOC 2, HIPAA, etc.
- Massive scale.
- Limited ops bandwidth.
- High-availability SLAs.
Hybrid: app + caching in K8s, DBs managed (RDS/Cloud SQL).
Common mistakes
- Sharing one PVC between primary and replica (RWO volumes attach to a single node; each instance needs its own PVC).
- No anti-affinity → all replicas on one node → SPOF.
- No backups, or backups never tested.
- Upgrading the operator without reading the release notes (upgrades can change CRDs or require data migrations).
- Heavy I/O on slow storage class.
Read this next
If you want my stateful workload setup (CNPG + Redis + Kafka), it’s at rajpoot.dev.
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.