Kubernetes requests vs limits and QoS (what actually happens)

If your pod keeps getting OOMKilled or the app feels slow without a clear cause, misconfigured requests/limits are a common culprit.

The mental model is simple:

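  • requests = reservation the scheduler uses to decide where to place the pod (CPU requests also set the container's share of CPU when the node is busy)
  • limits = ceiling the container cannot exceed (CPU is throttled at the limit, the container is OOMKilled if memory goes over it)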

QoS classes (what Kubernetes uses internally)

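Guaranteed (every container sets CPU and memory, with requests == limits)

  • Last in line for eviction when the node runs short of resources
  • CPU is throttled only at the limit, which equals the request
  • OOMKill if memory exceeds limit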

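Burstable (at least one request or limit is set, but the pod does not qualify as Guaranteed)

  • Evicted before Guaranteed pods under node pressure, especially when using more than its requests (preemption is a separate mechanism driven by PriorityClass, not QoS)
  • CPU is throttled when a container hits its CPU limit; when the node is busy, CPU time is shared in proportion to requests
  • OOMKill if memory exceeds limit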

BestEffort (no requests or limits on any container)

  • First to be evicted if the node runs short of resources
  • CPU can drop to near zero when other pods are busy
  • First candidate for the kernel OOM killer if the node runs out of memory
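
To make the three classes concrete, here is a rough Python sketch of how a pod ends up in one of them. It is a simplification, not the kubelet's actual code: the function name qos_class is made up for this post, it looks only at regular containers, and it compares quantity strings literally (so "1" and "1000m" would wrongly count as different). A missing request is treated as equal to the limit, which is how Kubernetes defaults it.

```python
def qos_class(containers):
    """Classify a pod from a list of per-container `resources` dicts.

    Simplified sketch: only cpu and memory count toward QoS, and
    quantities are compared as plain strings.
    """
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        # A missing request defaults to the limit for that resource.
        requests = {**limits, **res.get("requests", {})}
        for name in ("cpu", "memory"):
            req, lim = requests.get(name), limits.get(name)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit on both resources, equal to the request.
            if lim is None or req != lim:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"requests": {"cpu": "100m", "memory": "128Mi"},
                  "limits": {"cpu": "500m", "memory": "256Mi"}}]))
print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))
```

The three calls cover a pod with nothing set, one with requests below limits, and one with limits only. To see what the cluster actually assigned, check the QoS Class field in kubectl describe pod.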

Quick debug checklist

kubectl top pod -A
kubectl describe pod -n <ns> <pod>
kubectl describe node <node>
kubectl get events -n <ns> --sort-by=.lastTimestamp | tail -n 30

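Look for:

  • Last State: Terminated with Reason: OOMKilled (exit code 137) in kubectl describe pod (OOM)
  • CPU usage sitting right at the limit in kubectl top pod (throttling does not show up in events; if you scrape cAdvisor metrics, check container_cpu_cfs_throttled_periods_total)
  • FailedScheduling events mentioning Insufficient cpu or Insufficient memory (no node has room for the pod's requests)
  • Evicted pods, or MemoryPressure in kubectl describe node (node-pressure eviction)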
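
If you would rather script the OOM check than read describe output, the same information is in the pod's JSON. The sketch below assumes the input has the shape of kubectl get pod -n <ns> <pod> -o json; the helper name oom_killed_containers and the sample data are made up for illustration.

```python
import json


def oom_killed_containers(pod):
    """Return names of containers whose current or last state is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break  # one hit per container is enough
    return names


# Trimmed, made-up sample in the shape of `kubectl get pod ... -o json`.
sample = json.loads("""
{"status": {"containerStatuses": [
  {"name": "api",
   "state": {"running": {}},
   "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
  {"name": "sidecar", "state": {"running": {}}, "lastState": {}}
]}}
""")

print(oom_killed_containers(sample))  # ['api']
```

To run it against a real pod, swap the sample for json.load(sys.stdin) and pipe the kubectl output in.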

Copy/paste example (Burstable)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: nginx:1.27
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"

How to read this YAML

  • requests.memory: reserves 128 MiB (scheduler uses this for placement)
  • limits.memory: ceiling of 256 MiB (OOMKill if exceeded)
  • requests.cpu: reserves 0.1 vCPU (100 millicores)
  • limits.cpu: ceiling of 0.5 vCPU (throttling if exceeded)
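
If the units are what trips you up, this small Python sketch turns the quantities above into plain numbers. The helper names are made up for this post and the parser handles only the common suffixes, not the full Kubernetes quantity grammar. The CPU limit is also converted to a CFS quota, assuming the default 100 ms period, which is how the limit is enforced on Linux nodes.

```python
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal
}


def memory_bytes(quantity):
    """Convert a memory quantity such as "256Mi" to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # no suffix: already bytes


def cpu_millicores(quantity):
    """Convert a CPU quantity such as "100m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def cfs_quota_us(cpu_limit, period_us=100_000):
    """CPU time (microseconds) the container may use per CFS period."""
    return cpu_millicores(cpu_limit) * period_us // 1000


print(memory_bytes("128Mi"))   # 134217728
print(memory_bytes("256Mi"))   # 268435456
print(cpu_millicores("100m"))  # 100
print(cfs_quota_us("500m"))    # 50000
```

So the 500m limit means the container gets at most 50 ms of CPU time in every 100 ms window. Once it has used that, it waits for the next window, and that wait is the throttling.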

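Final tips

  • Always set requests explicitly when you set limits (if you set only limits, Kubernetes copies them into requests, so you reserve more than you meant and may end up Guaranteed unintentionally)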
  • For critical apps, consider Guaranteed (requests == limits)
  • For batch or low-priority apps, Burstable is usually enough

Easy peasy! :)