Kubernetes probes: liveness vs readiness vs startup (practical)

If a rollout drops traffic or your pod gets stuck in a restart loop, probes are usually involved.

The mental model is simple:

  • readinessProbe decides if the pod can receive traffic
  • livenessProbe decides if the container should be restarted
  • startupProbe gives slow-starting apps time to boot, so they don’t get killed too early

When to use each probe

readinessProbe (traffic)

Use it when your app needs time to become ready, or when you want to stop sending traffic during warmup.

If it fails:

  • the container keeps running
  • but the pod is removed from the Service endpoints, so it stops receiving new traffic
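
A minimal sketch of a readiness-only check, to make the timing concrete. The /ready path and port 8080 are assumptions for illustration; use whatever your app actually exposes.

```yaml
# Fragment of a container spec (hypothetical path and port).
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3   # marked NotReady after roughly 15s of consecutive failures (3 x 5s)
  successThreshold: 1   # one passing check puts the pod back behind the Service
```

The container is never restarted by this probe; it only moves in and out of rotation.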

livenessProbe (restart)

Use it when you want a basic self-healing mechanism for deadlocks/hangs.

If it fails:

  • the kubelet kills the container and restarts it according to the pod's restartPolicy

startupProbe (slow boot)

Use it when startup takes time (Java apps, caches warming up, migrations, etc.).

While the startup probe is running, the liveness and readiness probes are disabled, so the liveness probe cannot restart the container before the app is actually up.
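
As a rough sketch of how a startup probe pairs with a liveness probe (the path, port, and numbers are assumptions, not values from a real app):

```yaml
# Fragment of a container spec (illustrative values).
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # boot budget: up to 30 x 10s = 300s before the container is killed
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3    # only evaluated once the startup probe has succeeded
```

The boot budget is failureThreshold times periodSeconds, so size it for your slowest realistic startup rather than the typical one.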

Common mistakes

  • using liveness for everything (and creating restart loops)
  • probes too aggressive (short timeouts, low thresholds)
  • expensive probe endpoints (DB queries inside /health)
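
To illustrate the "too aggressive" case, here is a hedged before/after sketch. The path and port are assumptions; the commented-out block is the pattern to avoid.

```yaml
# Too aggressive: a single response slower than 1s triggers a restart.
# livenessProbe:
#   httpGet: { path: /health, port: 8080 }
#   periodSeconds: 2
#   timeoutSeconds: 1
#   failureThreshold: 1

# More tolerant: needs about 30s of consecutive failures (3 x 10s) before a restart.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```

The tolerant version trades slower detection of a real hang for far fewer restarts caused by a brief GC pause or a busy node.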

Quick debug checklist

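# Probe failures and restart counts show up in the Events section
kubectl describe pod -n <ns> <pod>
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -n 30
# Add --previous to read logs from the container instance that was just restarted
kubectl logs -n <ns> <pod> -c <container> --tail=200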

Look for:

  • Readiness probe failed
  • Liveness probe failed
  • Back-off restarting failed container

Copy/paste example (HTTP probes)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: nginx:1.27  # placeholder: stock nginx returns 404 for /health, so use your own image
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health
              port: 80
            periodSeconds: 5
            failureThreshold: 30

How I read this config:

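  • readiness has a short initial delay and polls every 5s (traffic is held back until it passes)
  • liveness has a longer initial delay (extra margin against restarts during boot)
  • startup allows ~150s to boot (30 * 5s), and the other two probes do not run until it succeeds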

Final tip

When in doubt:

  • readiness = “can I take traffic?”
  • liveness = “am I stuck and should restart?”

And keep /health cheap.

Easy peasy! :)