Problem
You're running a pod in Google Kubernetes Engine (GKE), and it randomly restarts without showing any clear errors in the logs. Everything seems fine, but the pod keeps going down and coming back up.
What causes this error?
GKE pods can randomly restart due to Out of Memory (OOMKilled), node auto-scaling or preemptible node termination, GKE auto-upgrades, or failing liveness probes.
Solution
The most common reasons for this issue are:
- OOM Kills (Out of Memory)
- Check if the pod is running out of memory. Run:
kubectl describe pod <pod-name>
- If you see OOMKilled in the pod's events, increase the memory limits in your deployment:
resources:
  limits:
    memory: "512Mi"
  requests:
    memory: "256Mi"
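- To confirm the kill reason without scanning the full describe output, you can also read the last termination state directly (assuming a single-container pod; adjust the index for multi-container pods) and check current memory usage with kubectl top, which uses the metrics GKE collects by default:
# Prints OOMKilled if the last restart was caused by running out of memory
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Current memory usage of the pod
kubectl top pod <pod-name>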
- Node Auto-Scaling & Preemptible Nodes
- If you're using preemptible nodes, they are terminated after at most 24 hours (and can be preempted at any time before that), taking their pods down with them.
- Check if the node is disappearing:
kubectl get nodes -o wide
- If nodes are frequently changing, use standard nodes instead of preemptible ones.
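- To check whether the restarting pods are landing on preemptible capacity, you can filter nodes by the label GKE sets on them (cloud.google.com/gke-preemptible; newer Spot node pools use cloud.google.com/gke-spot instead):
# List only preemptible nodes in the cluster
kubectl get nodes -l cloud.google.com/gke-preemptible=true
# See which node the affected pod is scheduled on
kubectl get pod <pod-name> -o wide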
- GKE Auto-Upgrade Restarting Nodes
- Google Cloud might be auto-upgrading your cluster. Check upgrade history in GCP Console under Kubernetes Engine > Cluster History.
- If auto-upgrades are causing restarts, you can disable node auto-upgrade on the affected node pool:
gcloud container node-pools update <node-pool-name> --cluster <cluster-name> --no-enable-autoupgrade
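- A gentler alternative is to keep auto-upgrades but confine them to a maintenance window so restarts happen at a predictable time; one way is the daily maintenance window flag (note that node auto-upgrade cannot be disabled on clusters enrolled in a release channel):
# Allow automated maintenance, including upgrades, only in a daily window starting at 03:00 UTC
gcloud container clusters update <cluster-name> --maintenance-window=03:00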
- Liveness & Readiness Probes Causing Restarts
- If a liveness probe fails repeatedly, Kubernetes restarts the container. Probe failures appear as Unhealthy events in the output of kubectl describe pod <pod-name>.
- If the probe is too aggressive, increase the failure threshold:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 5
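- As a rough guide, a container is restarted only after about periodSeconds x failureThreshold of consecutive probe failures (roughly 50 seconds with the values above). To confirm probes are the culprit, filter the pod events for probe failures:
# Show only probe-failure events in the current namespace
kubectl get events --field-selector reason=Unhealthy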
Final Check
If the problem persists, run:
kubectl get events --sort-by=.metadata.creationTimestamp
This will show recent events and might give a clue about what’s causing the restarts.
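Since the current container often looks perfectly healthy, it is also worth pulling the logs from the previous instance of the restarted container:
# Logs from the container instance that ran before the most recent restart
kubectl logs <pod-name> --previous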
