Kubernetes probes: liveness vs readiness vs startup (practical)
If a rollout drops traffic or your pod gets stuck in a restart loop, probes are usually involved.
The mental model is simple:
- readinessProbe decides if the pod can receive traffic
- livenessProbe decides if the container should be restarted
- startupProbe gives slow apps time to boot so they don’t get killed too early
When to use each probe
readinessProbe (traffic)
Use it when your app needs time to become ready, or when you want to stop sending traffic during warmup.
If it fails:
- the container keeps running
- but the pod is removed from the Service endpoints
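As a rough sketch of that traffic gate, a readiness probe on its own might look like the fragment below. It is one entry under spec.template.spec.containers. The /health path and port 80 are assumptions borrowed from the full example later in this post, and successThreshold: 2 is an optional extra that asks for two passing checks in a row before the pod goes back into the endpoints.

```yaml
# Fragment: one entry under spec.template.spec.containers
- name: api
  image: nginx:1.27          # placeholder image; use your app image
  readinessProbe:
    httpGet:
      path: /health          # assumed endpoint
      port: 80
    periodSeconds: 5
    timeoutSeconds: 2
    failureThreshold: 3      # 3 failures in a row (~15s) mark the pod not ready
    successThreshold: 2      # 2 passes in a row before traffic returns
```

Values above 1 for successThreshold are only allowed on readiness; liveness and startup probes require it to be 1.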
livenessProbe (restart)
Use it when you want a basic self-healing mechanism for deadlocks/hangs.
If it fails:
- kubelet restarts the container
startupProbe (slow boot)
Use it when startup takes time (Java apps, caches warming up, migrations, etc.).
While the startup probe has not yet succeeded, Kubernetes holds back the liveness and readiness probes, so liveness can’t kill the container before it is actually up.
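A minimal sketch of how that protection is configured, assuming an app that may need up to five minutes to boot. The 300s budget, the /health path and port 80 are illustrative assumptions, not recommendations.

```yaml
# Fragment: one entry under spec.template.spec.containers
- name: api
  image: nginx:1.27          # placeholder image; use your app image
  startupProbe:
    httpGet:
      path: /health          # assumed endpoint
      port: 80
    periodSeconds: 10
    failureThreshold: 30     # 30 x 10s = up to 300s to finish booting
  livenessProbe:
    httpGet:
      path: /health
      port: 80
    periodSeconds: 10
    timeoutSeconds: 2
    failureThreshold: 3      # only counted after the startup probe has passed
```

Once the startup probe passes, it stops running and liveness takes over. If it never passes within the budget, kubelet restarts the container.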
Common mistakes
- using liveness for everything (and creating restart loops)
- probes too aggressive (short timeouts, low thresholds)
- expensive probe endpoints (DB queries inside /health)
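One way to sidestep these mistakes, sketched under assumptions: split the checks into two endpoints and loosen the timings. The paths /livez and /readyz are hypothetical names your app would have to implement, and the numbers are starting points rather than tuned values.

```yaml
# Fragment: one entry under spec.template.spec.containers
- name: api
  image: nginx:1.27          # placeholder image; use your app image
  livenessProbe:
    httpGet:
      path: /livez           # hypothetical: answers 200 if the process can respond, no dependency checks
      port: 80
    periodSeconds: 10
    timeoutSeconds: 3
    failureThreshold: 3      # restart only after ~30s of consecutive failures
  readinessProbe:
    httpGet:
      path: /readyz          # hypothetical: reflects warmup state, still kept cheap
      port: 80
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3      # ~15s of failures take the pod out of rotation
```

Keeping dependency checks out of liveness means a slow backend can, at worst, take pods out of rotation instead of restarting them in a loop.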
Quick debug checklist
kubectl describe pod -n <ns> <pod>
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -n 30
kubectl logs -n <ns> <pod> -c <container> --tail=200
Look for:
- Readiness probe failed
- Liveness probe failed
- Back-off restarting failed container
Copy/paste example (HTTP probes)
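apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          # Placeholder image: stock nginx serves no /health route, so swap in your app image
          image: nginx:1.27
          ports:
            - containerPort: 80
          # Traffic gate: failing this removes the pod from the Service endpoints
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3
          # Self-healing: failing this makes kubelet restart the container
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
          # Boot budget: 30 failures x 5s = up to ~150s before the container is restarted
          startupProbe:
            httpGet:
              path: /health
              port: 80
            periodSeconds: 5
            failureThreshold: 30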
How I read this config:
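- startup runs first and allows ~150s to boot (30 * 5s); readiness and liveness wait until it succeeds
- readiness then checks every 5s, and 3 failures in a row (~15s) block traffic to the pod
- liveness checks every 10s, and 3 failures in a row (~30s) restart the container
- with the startupProbe covering boot, the initialDelaySeconds values matter little here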
Final tip
When in doubt:
- readiness = “can I take traffic?”
- liveness = “am I stuck and should restart?”
And keep /health cheap.
Easy peasy! :)