
Why Your Pods Keep Crashing at 2 AM: 5 Resource Limit Mistakes That Undermine Deployment Peace of Mind

This comprehensive guide explores why containerized applications running on Kubernetes often crash during off-hours, specifically due to misconfigured resource limits. Drawing on common patterns observed across teams, we dissect five critical mistakes: setting CPU limits too low without understanding throttling, ignoring memory request-to-limit ratios, failing to account for burst versus steady-state workloads, neglecting namespace-level resource quotas, and overlooking how node-level resource pressure and eviction policies can take down pods that never exceeded their limits.

Introduction: The 2 AM Wake-Up Call

If you operate Kubernetes clusters in production, you have likely experienced the unsettling pattern: a notification pierces the silence at 2 AM, informing you that a critical pod has crashed. Your first instinct might be to blame the application code, a recent deployment, or external load. However, in many cases, the root cause is not a bug in your software but a subtle misconfiguration in how you declared resource limits. This article is written for platform engineers, DevOps practitioners, and site reliability engineers who want to move from reactive firefighting to proactive stability. We focus on five specific resource limit mistakes that consistently undermine deployment stability, based on patterns observed across numerous teams and environments. By the end, you will have a clear framework for diagnosing and preventing these issues, restoring the peace of mind that comes with knowing your infrastructure can handle the night shift without human intervention.

This overview reflects widely shared professional practices as of May 2026; verify critical details against current official Kubernetes documentation where applicable. The guidance here is general information only and does not substitute for professional advice tailored to your specific infrastructure.

Mistake 1: Setting CPU Limits Too Low—The Throttling Trap

One of the most common resource limit mistakes is setting CPU limits aggressively low, often based on the assumption that lower limits save money or prevent noisy neighbors. In practice, this approach frequently backfires. When a pod's CPU usage reaches its limit, Kubernetes enforces throttling by restricting the container's CPU quota over a given time window (typically 100ms). This throttling does not cause a crash outright—it causes latency spikes and timeouts, which can cascade into health check failures, leading to pod restarts by the kubelet. The problem is insidious because the pod appears to be within limits on average, but micro-bursts of CPU activity are punished, causing intermittent failures that are hard to reproduce.

The Mechanism of CPU Throttling

CPU limits in Kubernetes are implemented using the Completely Fair Scheduler (CFS) quota. When you set a CPU limit of 1 core, the container is allowed to use up to 100ms of CPU time per 100ms period. If the container burns through that quota early in the period, it is frozen until the next period begins. Many teams mistakenly believe that a limit of 1 core means the container can always use one full core, but sustained usage near the limit, or short multi-threaded spikes above it, lead to throttling. For example, a container fanning out work across several threads may consume its entire 100ms quota in the first 40ms of a period and then sit idle for the remaining 60ms, even though its average usage over a minute looks well below 1 core; the throttle mechanism does not smooth this out, it punishes the spike.
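
To make the mapping concrete, here is a minimal sketch of a pod with a 500m CPU limit; the comments note how the limit translates into a CFS quota. The pod name, container name, and image are illustrative placeholders, not values from the scenarios in this article.

apiVersion: v1
kind: Pod
metadata:
  name: cfs-quota-demo              # illustrative name
spec:
  containers:
  - name: worker
    image: registry.example.com/worker:latest   # placeholder image
    resources:
      requests:
        cpu: "500m"                 # scheduler reserves half a core on the node
      limits:
        cpu: "500m"                 # CFS quota: 50ms of CPU time per 100ms period;
                                    # any burst beyond that within a period is throttled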

Composite Scenario: The Batch Processor Crash

Consider a composite scenario: a team runs a data-processing pod that transforms incoming event streams. They set the CPU limit to 500 millicores (0.5 core) based on average usage, but the pod occasionally needs a full core or more for a few seconds during spike loads (e.g., parsing large payloads). At 2 AM, a large batch of events arrives, the pod bursts to 1.2 cores, gets throttled, and its processing latency climbs. The liveness probe, which expects a response within 1 second, starts failing, and the kubelet restarts the pod, but the new pod immediately hits the same throttle condition. The cluster eventually marks the deployment as unhealthy, triggering an alert. The fix was not to increase the application's capacity but to raise the CPU limit so it covers the observed 1.2-core burst with headroom, or to remove the limit entirely and rely on requests only, combined with proper node sizing.
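
A resources block reflecting that fix might look like the following sketch. The request matches the steady-state value from the scenario; the specific limit and memory figures are illustrative assumptions, and the comment notes the alternative of omitting the CPU limit.

resources:
  requests:
    cpu: "500m"        # steady-state P99 usage from monitoring
    memory: "256Mi"    # assumed value for illustration
  limits:
    cpu: "1500m"       # covers the observed 1.2-core burst with headroom
    memory: "512Mi"
# Alternatively, omit limits.cpu entirely and rely on requests plus node sizing.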

Actionable Advice: How to Correctly Set CPU Limits

To avoid the throttling trap, follow these steps: First, monitor actual CPU usage over at least one week using a tool like Prometheus, capturing both the average and the P99. Second, set the CPU request to the P99 of steady-state usage, and set the limit to at least 2x the request, or remove the limit if your cluster has adequate capacity and strict resource isolation is not required. Third, watch the cAdvisor throttling metrics (container_cpu_cfs_throttled_periods_total and container_cpu_cfs_throttled_seconds_total) to detect throttled pods. Fourth, consider using Vertical Pod Autoscaler (VPA) in recommendation mode to generate suggested values based on historical usage; a recommendation-only manifest is sketched below. Remember: CPU limits are not cost controls; they are behavior controls. Misusing them transforms a smooth deployment into a ticking time bomb.
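
A minimal sketch of VPA in recommendation mode follows, assuming the VPA custom resources are installed in your cluster and a Deployment named data-processor exists (the names are illustrative). With updateMode set to "Off", VPA only publishes recommendations and never restarts pods.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: data-processor-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor          # illustrative workload name
  updatePolicy:
    updateMode: "Off"             # recommendation mode only; nothing is changed automatically
# Inspect the recommendations with: kubectl describe vpa data-processor-vpa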

By addressing the throttling trap, you eliminate a major source of 2 AM crashes. However, CPU is only part of the story; memory limits introduce their own dangers, as we will see next.

Mistake 2: Ignoring the Memory Request-to-Limit Ratio

Memory limits are often treated as a safety net: a way to prevent a single pod from consuming all node memory. However, the relationship between memory requests and limits is more nuanced. When a pod exceeds its memory request but stays within its limit, it runs normally as long as the node has memory to spare. But when the node experiences memory pressure, the kernel's out-of-memory (OOM) killer does not care about limits; the victim is chosen based on actual memory usage combined with an adjustment derived from the container's request. If your pod's request is far below its limit, the pod may be using only 80% of its limit but 400% of its request, making it a prime OOM victim even though it has not violated its limit.

Understanding the OOM Score Adjustment

For Burstable pods, the kubelet sets each container's oom_score_adj based on its memory request relative to the node's capacity: the smaller the request, the higher the adjustment. The kernel then combines that adjustment with actual usage when choosing an OOM victim. A container using 500 MiB with a request of 100 MiB and a limit of 1 GiB therefore ends up with a higher effective OOM score (more likely to be killed) than a container using 500 MiB with a request of 500 MiB and the same 1 GiB limit, because the first container is consuming far more than the resources it was guaranteed. Many teams set requests conservatively low to "pack more pods" onto a node, but this backfires under memory pressure: the pods that appear to be comfortably within their limits are the first to die when the node runs low on memory.
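
The following sketch shows the two containers from the comparison above side by side; both have the same limit and may use roughly 500 MiB under load, and only the request differs. Pod names, container names, and images are illustrative placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: cache-low-request           # likelier OOM victim under node pressure
spec:
  containers:
  - name: app
    image: registry.example.com/cache:latest   # placeholder image
    resources:
      requests:
        memory: "100Mi"             # far below real usage, so a high oom_score_adj
      limits:
        memory: "1Gi"
---
apiVersion: v1
kind: Pod
metadata:
  name: cache-right-sized           # survives longer under the same pressure
spec:
  containers:
  - name: app
    image: registry.example.com/cache:latest
    resources:
      requests:
        memory: "500Mi"             # matches real usage, so a lower oom_score_adj
      limits:
        memory: "1Gi"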

Composite Scenario: The Web Server Night Shift

In a typical project, a team runs a web server pod with a memory request of 128 MiB and a limit of 512 MiB. During daytime, traffic is moderate, and the pod uses 200–300 MiB. At 2 AM, a cron job on the same node kicks off a data-sync, consuming node memory. The node enters memory pressure, and the kernel selects the web server pod as a victim because its usage (300 MiB) is 234% of its request (128 MiB), even though it is only at 58% of its limit. The pod dies, causing a brief outage. The team's dashboards show memory usage within limits, so they blame the cron job—but the real fix was to adjust the memory request to match the actual usage (around 256 MiB), making the pod less vulnerable to OOM killing.

Step-by-Step Guide to Setting Memory Requests

To avoid this mistake, implement the following process: 1) Collect memory usage data over two weeks, noting the P95 and P99 values during peak and off-peak hours. 2) Set the memory request to the P95 of actual usage, not the minimum. For the web server above, that would be 256 MiB. 3) Set the memory limit to 1.5x to 2x the request to allow for bursts, keeping the request at no less than half the limit. 4) Watch the kube_pod_container_status_last_terminated_reason metric from kube-state-metrics (reason="OOMKilled") to identify pods killed due to OOM. 5) Test under memory pressure by simulating node load with a tool like stress-ng in a staging environment. 6) Document the rationale for each service's request-to-limit ratio in your deployment manifests.
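
Applied to the web server from the scenario, the memory settings might end up like the sketch below. The numbers come from the scenario above; treat them as an illustration of the ratio, not a recommendation for your workload.

resources:
  requests:
    memory: "256Mi"   # aligned with the P95 of observed usage (was 128Mi)
  limits:
    memory: "512Mi"   # 2x the request; the pod is no longer a prime OOM victim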

By aligning requests with actual usage, you reduce the likelihood of OOM kills during off-hours. Next, we examine how burst and steady-state workload differences create hidden instability.

Mistake 3: Overlooking Burst vs. Steady-State Workload Profiles

Not all workloads are created equal. Some pods have predictable, steady resource consumption—a database cache, for example—while others experience dramatic bursts—a batch job, an image processing service, or a webhook handler. Teams often apply a one-size-fits-all resource limit strategy, leading to either over-provisioning (wasting capacity) or under-provisioning (causing crashes during bursts). The key insight is that burst workloads require higher limits relative to requests, while steady-state workloads can operate with tighter ratios. Failing to distinguish between these profiles is a recipe for instability.

Comparing Three Approaches to Resource Management

Below is a comparison of three approaches to managing container resources, each suited to different workload profiles. Consider this when designing your deployment strategies.

Static Limits (manual)
- Best for: Steady-state, predictable workloads
- Pros: Simple to configure; no external dependencies
- Cons: Requires manual tuning; brittle under load changes
- When to avoid: Bursty or unpredictable workloads; large fleets

Vertical Pod Autoscaler (VPA)
- Best for: Workloads with variable resource needs
- Pros: Automatically adjusts requests/limits; reduces manual work
- Cons: Requires restart to apply new recommendations; can cause churn
- When to avoid: StatefulSets that cannot tolerate restarts

Horizontal Pod Autoscaler (HPA) + resource profiles
- Best for: Stateless burst workloads (e.g., web APIs)
- Pros: Scales out during bursts; no restart needed
- Cons: Complexity with multiple metrics; requires good startup time
- When to avoid: Workloads with long startup or initialization

Composite Scenario: The Image Processing Pipeline

Consider an image processing service that transforms user-uploaded photos. The service runs as a Deployment with 10 replicas, each configured with a CPU request of 0.5 cores and a limit of 1 core. During normal operation, each pod uses 0.3–0.4 cores. However, when a user uploads a very large image (e.g., a 50 MB TIFF), the pod spikes to 1.5 cores for 2–3 seconds. The CPU limit of 1 core causes throttling, and the processing takes 10 seconds instead of 3 seconds. Other requests queue up, leading to timeouts and eventual pod restarts. The team had assumed steady-state usage would protect them, but the burst profile made static limits dangerous.

Actionable Framework: Matching Profiles and Controls

To handle burst workloads, use HPA with a target CPU utilization of 50% (leaving headroom for spikes) combined with a generous CPU limit (e.g., 2x the request). For steady workloads, use VPA to tune requests over time, and set limits to 1.2x the request to conserve capacity. Implement the following checklist: 1) Classify each workload as "burst," "steady," or "unknown" based on one week of metric history. 2) If burst, set CPU limit to at least 2x request; memory limit to 1.5x request. 3) If steady, set CPU limit to 1.2x request; memory limit to 1.2x request. 4) If unknown, use VPA in recommendation mode for one month before finalizing. 5) Review the classification quarterly, as usage patterns change.
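
For the burst case, a sketch of an HPA targeting 50% CPU utilization follows, assuming the image processing service from the scenario runs as a Deployment named image-processor and that a replica ceiling of 30 is acceptable (both are illustrative assumptions).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: image-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-processor         # illustrative workload name
  minReplicas: 10
  maxReplicas: 30                 # assumed ceiling; size to your cluster capacity
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50    # 50% of the CPU request, leaving headroom for spikes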

By adapting resource limits to workload profiles, you prevent the mismatch that causes midnight failures. Next, we look at how namespace-level quotas can silently sabotage your deployments.

Mistake 4: Neglecting Namespace Resource Quotas and Limit Ranges

Even if individual pods have correctly configured resource limits, the lack of namespace-level ResourceQuotas and LimitRanges can lead to chaotic resource consumption. Without quotas, a single team or namespace can consume all cluster resources, starving other namespaces and causing pods to be evicted or fail to schedule. Additionally, without LimitRanges, pods can be created with no limits at all—or extremely high limits—which contradicts the intention of cluster governance. This is especially dangerous during off-hours when automated systems may deploy new pods without human oversight.

How ResourceQuotas Work and Why They Matter

A ResourceQuota object in Kubernetes sets aggregate limits on resource consumption within a namespace, such as total CPU requests, memory requests, number of pods, or persistent volume claims. When a namespace approaches its quota, new pods will fail to create with an error message like "exceeded quota." This is a safety valve that prevents runaway resource consumption. However, many teams skip setting quotas because they assume individual pod limits are sufficient. In practice, a single misconfigured deployment that creates 100 replicas with high limits can drain a cluster's capacity, triggering evictions of other pods—including your critical 2 AM jobs.

Composite Scenario: The Unbounded Batch Job

Imagine a team deploys a batch job that processes a queue of messages. The job is configured to scale up to 50 replicas, each with a memory request of 256 MiB and a limit of 1 GiB (four times the request). Without a namespace quota, the job scales up to all 50 replicas, reserving 12.5 GiB of memory requests and up to 50 GiB of limits (though actual usage is lower). Another team's web application in the same cluster needs to deploy a critical update, but the scheduler cannot find a node with enough free memory to place the new pod. The web application experiences downtime at 2 AM when the batch job runs. The root cause is not the batch job itself but the absence of a namespace quota that would have limited its maximum memory consumption.

Step-by-Step Guide to Configuring Namespace Quotas

Follow these steps to implement namespace resource governance: 1) Calculate the total allocatable resources per node in your cluster. 2) Define ResourceQuota objects for each namespace, setting hard caps such as requests.cpu, requests.memory, limits.cpu, limits.memory, pods, and persistentvolumeclaims. Start with conservative values (e.g., 50% of total cluster capacity) and adjust based on usage. 3) Define LimitRange objects that enforce minimum and maximum resource limits per container, preventing pods from being created without limits (min: 50m CPU, 64Mi memory; max: 4 CPU, 8Gi memory). 4) Validate with kubectl describe quota -n <namespace> to see current usage against the quota. 5) Monitor quota usage and alert when a namespace reaches 80% of its quota. 6) Review quotas quarterly and adjust based on team needs. A sketch of a quota and limit range follows below.
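
The sketch below combines a ResourceQuota and a LimitRange for a hypothetical namespace named team-a. The min/max values mirror the figures in the steps above; the quota totals and default values are illustrative assumptions you would derive from your own cluster capacity.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a               # illustrative namespace
spec:
  hard:
    requests.cpu: "20"            # assumed share of cluster capacity
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "100"
    persistentvolumeclaims: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    min:
      cpu: 50m
      memory: 64Mi
    max:
      cpu: "4"
      memory: 8Gi
    defaultRequest:               # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:                      # applied when a container omits limits
      cpu: 500m
      memory: 512Mi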

By enforcing namespace-level controls, you prevent a single team's mistake from bringing down the entire cluster. Now, let's examine how node-level resource pressure exacerbates pod instability.

Mistake 5: Ignoring Node-Level Pressure and Pod Eviction Policies

The final mistake is considering pod resources in isolation from the node they run on. Kubernetes schedules pods based on requests, but the node's actual memory and disk pressure can cause the kubelet to evict pods that are well within their limits. This often happens at 2 AM because that is when batch jobs, backups, or system processes (like log rotation or node maintenance) run, temporarily increasing node resource usage. If your pods are not configured with appropriate priority classes or quality of service (QoS) guarantees, they become eviction candidates.

Understanding Pod QoS Classes

Kubernetes assigns a QoS class to each pod based on the relationship between requests and limits: Guaranteed (requests equal limits for every container), Burstable (at least one container has a request or limit, but the Guaranteed criteria are not met), and BestEffort (no requests or limits at all). Under node pressure, the kubelet first evicts pods whose usage exceeds their requests, ranked by priority and by how far usage exceeds the request; BestEffort pods, which have no requests, are therefore the first to go, while Guaranteed pods and Burstable pods running below their requests are reclaimed last. Many teams inadvertently create Burstable pods by setting different requests and limits, thinking it gives flexibility. However, when the node experiences memory pressure, Burstable pods that are using more than their request are evicted before Guaranteed pods. If your critical 2 AM job is Burstable and running above its request, it is at risk.
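
For reference, a pod lands in the Guaranteed class only when every container's requests equal its limits for both CPU and memory, as in this minimal sketch (names and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: critical-job                # illustrative name
spec:
  containers:
  - name: worker
    image: registry.example.com/worker:latest   # placeholder image
    resources:
      requests:
        cpu: "1"
        memory: 512Mi
      limits:
        cpu: "1"                    # requests equal limits for every resource in
        memory: 512Mi               # every container, so the pod is Guaranteed QoS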

Composite Scenario: The Log Aggregator Eviction

Consider a log aggregator pod that runs as a DaemonSet on each node. The pod has a memory request of 100 MiB and a limit of 500 MiB, making it Burstable. At 2 AM, the node's log rotation script runs, consuming additional disk I/O and memory (temp files). The node enters memory pressure, and the kubelet evicts the log aggregator pod because it is Burstable and using 200 MiB (double its request). The pod is rescheduled on another node, but that node also runs log rotation, causing a cascade of evictions. The team is puzzled because the pod never exceeded its limit of 500 MiB. The fix was either to set requests equal to limits for both memory and CPU (making the pod Guaranteed) or to use a priority class to protect it.

Actionable Advice: Protecting Pods from Eviction

To safeguard critical pods, implement the following: 1) For mission-critical services, set requests equal to limits for all containers to achieve Guaranteed QoS. 2) For less critical pods, use priority classes (e.g., priorityClassName: high-priority) to ensure they are evicted last. 3) Monitor node-level metrics like node_memory_MemAvailable_bytes and node_filesystem_avail_bytes to predict pressure. 4) Configure podDisruptionBudgets to ensure a minimum number of replicas remain available during voluntary disruptions. 5) Test eviction behavior in a staging cluster by simulating node pressure with stress tools. 6) Use the kubelet's --eviction-hard and --eviction-soft flags to fine-tune eviction thresholds, but be aware of the trade-off: softer thresholds reduce evictions but risk node instability.
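
The sketch below shows a PriorityClass and a PodDisruptionBudget along the lines of steps 2 and 4. The class value, names, and label selector are illustrative assumptions; the pod spec would reference the class with priorityClassName: high-priority.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000                      # higher value means preempted and evicted later
globalDefault: false
description: "For pods that must survive node pressure."
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: log-aggregator-pdb          # illustrative name
spec:
  minAvailable: 1                   # keep at least one replica during voluntary disruptions
  selector:
    matchLabels:
      app: log-aggregator           # assumed pod label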

By addressing node-level pressure, you add another layer of defense against 2 AM crashes. Now, let's answer some common questions that arise from these topics.

Frequently Asked Questions

This section addresses typical reader concerns about resource limits and pod crashes. The answers are based on common patterns and Kubernetes documentation; always test in your own environment.

What is the difference between a resource request and a resource limit?

A request is the amount of resources guaranteed to a container—the scheduler uses this to place the pod on a node with enough free capacity. A limit is the maximum amount of resources the container can use. If the node has spare resources, the container can burst above its request up to the limit. If the container exceeds its limit, it is throttled (CPU) or killed (memory). The key distinction: requests are for scheduling and guarantee, limits are for constraining usage. Setting requests too low can lead to OOM kills under pressure, while setting limits too low can cause throttling and timeouts.

Why does my pod restart even though it never exceeded its declared memory limit?

This often happens because the pod's memory usage relative to its request made it a target for OOM killing, or because the node experienced pressure and evicted the pod. Check the pod's last termination state with kubectl describe pod <pod-name> -n <namespace>. If the reason is "OOMKilled," the kernel killed it due to memory pressure, possibly because the request was too low. If the reason is "Evicted," the kubelet removed it due to node-level pressure. Also check whether the pod had a priority class lower than other pods on the same node. A common fix is to align the request with actual usage and set limits generously.

Should I set CPU limits at all?

There is ongoing debate. Setting CPU limits prevents a single container from monopolizing CPU and causing latency for neighbors, which is important in multi-tenant clusters. However, limits also cause throttling, which can degrade performance. The best practice is to set CPU limits only if you have measured burst behavior and set them at least 2x the request. For clusters with dedicated nodes per team or application, you can omit CPU limits and rely on requests plus node sizing. For shared clusters, use limits but monitor throttling. Many teams find that removing CPU limits entirely (using only requests) improves latency and reduces crashes, provided they have enough node capacity.

How do I detect resource-related crashes proactively?

Use Prometheus to alert on the following metrics: container_cpu_cfs_throttled_seconds_total (for CPU throttling), kube_pod_container_status_last_terminated_reason (for OOM kills), and node_memory_MemAvailable_bytes from node exporter or kube_node_status_condition{condition="MemoryPressure"} from kube-state-metrics (for node pressure). Set up alerts that trigger when throttling exceeds 1% of total CPU time over 5 minutes, or when any OOM kill occurs. Also, use tools like kube-ops-view for a visual overview of resource usage across nodes. Regular load testing (e.g., with locust or k6) can reveal bottlenecks before they cause production incidents.
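
One way to encode these alerts is a Prometheus rule file like the sketch below, assuming cAdvisor and kube-state-metrics are already scraped. It uses the ratio of throttled CFS periods as a proxy for the 1% threshold; the group name, labels, and thresholds are illustrative and should be tuned to your environment.

groups:
- name: resource-limit-alerts
  rules:
  - alert: ContainerCPUThrottlingHigh
    expr: |
      rate(container_cpu_cfs_throttled_periods_total[5m])
        / rate(container_cpu_cfs_periods_total[5m]) > 0.01
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} is throttled in more than 1% of CFS periods."
  - alert: PodOOMKilled
    expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
    labels:
      severity: critical
    annotations:
      summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOM killed."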

Can Vertical Pod Autoscaler (VPA) replace manual limit configuration?

VPA can recommend or automatically adjust resource requests and limits based on historical usage, which reduces manual tuning. However, VPA has limitations: it requires pod restarts to apply changes (which can cause brief unavailability), it does not handle burst workloads well unless configured with generous margins, and it does not address node-level pressure or eviction priorities. Use VPA as a tool for steady-state workloads and as a recommendation engine, but combine it with manual oversight for critical services. For burst workloads, prefer HPA with resource profiles. VPA is not a silver bullet; it is a component of a broader resource management strategy.

Conclusion: Reclaiming Nighttime Peace of Mind

Pods crashing at 2 AM is not inevitable. By addressing the five resource limit mistakes outlined in this guide—CPU throttling from overly tight limits, misaligned memory request-to-limit ratios, failure to match resource profiles to workload types, neglected namespace quotas, and ignorance of node-level pressure—you can dramatically reduce the frequency and severity of incidents. The path to peace of mind involves three steps: measure actual usage over time, set requests to match real needs (not arbitrary minimums), and ensure limits provide safe headroom for bursts. Implement namespace governance and priority classes to protect critical workloads. No single configuration fits all environments, so invest in monitoring and iterate based on observed behavior. Your nights can be quiet, your alerts meaningful, and your deployments truly stable. Remember: resource limits are not just numbers in a YAML file—they are the guardrails that keep your system safe, and they deserve careful, continuous attention.

About the Author

This article was prepared by the editorial team for this publication. We focus on practical explanations of infrastructure and DevOps practices, drawing on patterns observed across many teams and environments. We update articles when major practices change or when new Kubernetes features affect the advice given here.

Last reviewed: May 2026
