
Flexibility Matters When Setting Kubernetes Resource Limits


By Shane Sorbello | Jan 07, 2025


When it comes to allocating CPU and memory resources to workloads in Kubernetes, there's broad agreement on the importance of setting request values, but setting resource limits is a contentious topic. Requests are uncontroversial because it is widely understood that workloads without a guaranteed minimum of resources are at risk of CPU starvation or pod eviction. The situation is less clear-cut with limits.

The internet is full of opinions, of course. With a modest amount of searching, you’ll find these three suggestions:

  1. Don’t set limits at all
  2. Always set limits
  3. Always set limits for memory, but not for CPU

If you accept the second or third suggestion, you’ll encounter further division on how to set memory limits, with the prevailing camps being:

  1. Set memory limits equal to requests
  2. Set memory limits greater than requests

So what should you do? The short answer is: it depends on a variety of factors, including whether the resource in question is CPU or memory and the nature of your specific applications and use cases. Let's look at a few scenarios for each resource type to help guide your decision-making about whether and how to set resource limits.

Considerations for Setting CPU Limits 

The general consensus is to not set CPU limits, because workloads with requests set in alignment with expected usage will get the resources they need even under contention. As such, the risk of any workload over-consuming resources and starving its neighbors is mitigated by requests alone. Additionally, CPU limits can lead to unnecessary throttling. This is especially evident in multithreaded applications, where the CPU time allotted for a scheduling period can be consumed quickly when several threads in a container run at once. Once that allotment is spent, the container waits for the next period even if the node has idle cores, which leaves computing power unused until the container's job is done.
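To make the multithreading effect concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the commonly used default scheduling period of 100 ms, that every thread is CPU-bound for the whole period, and that the node has at least as many free cores as busy threads. The function name and the example numbers are illustrative only.

```python
def throttled_ms_per_period(cpu_limit_cores: float,
                            busy_threads: int,
                            period_ms: float = 100.0) -> float:
    """Estimate how long a container sits throttled in each scheduling period.

    Assumes all threads are CPU-bound and each runs on its own core.
    """
    # CPU time the container may consume per period under its limit.
    quota_ms = cpu_limit_cores * period_ms
    # N busy threads burn quota N times faster than wall-clock time passes.
    wall_ms_until_quota_spent = quota_ms / busy_threads
    # Whatever is left of the period is spent waiting.
    return max(0.0, period_ms - wall_ms_until_quota_spent)


# A limit of 1 CPU with 4 busy threads: quota is gone after 25 ms,
# so the container waits for the remaining 75 ms of every period.
print(throttled_ms_per_period(1.0, 4))  # 75.0
# A limit of 2 CPUs with 2 busy threads never exhausts its quota.
print(throttled_ms_per_period(2.0, 2))  # 0.0
```

Real workloads are rarely this uniform, so treat the result as an upper-bound intuition rather than a prediction.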

While those are all logical reasons to not set CPU limits, a blanket no-limits policy doesn't satisfy all use cases. Consider applications that are I/O bound and latency-sensitive. For example, ad tech platforms ingest and process millions of requests per second, calling out to multiple external sources and compiling a range of data to determine which ads to display. With all of this activity happening in milliseconds, latency must be minimized to ensure performance is consistent and predictable. As more workloads with different needs are provisioned on the same node, CPU limits help maintain predictable utilization and prevent the performance degradation that highly sensitive workloads might suffer due to unexpected bursts from noisy neighbors.

Another scenario where setting CPU limits is beneficial is when applications are being developed and tested. Understanding a workload’s resource requirements and performance limitations prepares you to configure the workload for real-world traffic. By setting CPU limits that are equal to CPU requests and allowing a workload to experience CPU throttling when usage tries to burst above those values, a developer can learn roughly how much CPU the workload requires before it faces the real world. 
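One rough way to observe that throttling during a test run is to read the container's cgroup `cpu.stat` file and compare throttled periods to total periods. The sketch below parses the text of such a file. The sample values are made up, and where the file lives (commonly `/sys/fs/cgroup/cpu.stat` under cgroup v2) is an assumption about your environment.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Return the fraction of scheduling periods in which the container was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # each line is "<key> <integer value>"
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no limit set, or nothing has run yet
    return stats.get("nr_throttled", 0) / periods


# Hypothetical sample contents, for illustration only.
sample = """nr_periods 1200
nr_throttled 300
throttled_usec 4500000
"""
print(throttle_ratio(sample))  # 0.25
```

A ratio that stays high under representative load suggests the request and limit are below what the workload needs.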

These are just a few examples of how a blanket policy of not ever setting CPU limits doesn’t always work. But what about setting limits for memory?

Considerations for Setting Memory Limits

While a workload's compute needs might be stretched out and satisfied over time (since CPU is a compressible resource), we're not so lucky with memory. Being an incompressible resource, memory can't be rationed out over time the same way as CPU. A workload either has the memory it needs to operate or it doesn't. If it doesn't, you end up with an out-of-memory (OOM) kill. Worse, if you're dealing with a memory leak, unbounded growth can provoke failure across the entire node. In those scenarios, setting memory limits helps avoid potential outages.

A common practice is to set memory limits equal to memory requests to avoid workloads taking more than their fair share. This forces excessively hungry workloads to be OOM killed when they hit their limits, so pods that are consuming less than their limit will have access to the memory they need. Additionally, node pressure eviction (from low memory availability) will be less likely.

The OOM kill also signals that the workload requires more memory, and adjustments can be made to raise the request and limit values accordingly. Monitoring memory usage relative to the request and limit values provides a window into how well those values are set. Alerts on OOM kill metrics can indicate that those workloads need more memory than initially assumed, and those alerts can trigger actions to remediate the problem. 
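As a minimal sketch of that kind of monitoring, the function below labels a workload by where its memory usage sits relative to its request and limit. The 90% warning threshold and the label names are hypothetical choices for illustration, not values from any particular tool.

```python
MiB = 1024 * 1024


def memory_status(usage_bytes: int,
                  request_bytes: int,
                  limit_bytes: int,
                  warn_ratio: float = 0.9) -> str:
    """Classify memory usage against the configured request and limit."""
    if usage_bytes >= limit_bytes * warn_ratio:
        return "near-limit"      # an OOM kill is likely if usage keeps growing
    if usage_bytes > request_bytes:
        return "above-request"   # relying on memory the scheduler did not reserve
    return "within-request"


# Limit equal to request: 480 MiB used of 512 MiB is already close to an OOM kill.
print(memory_status(480 * MiB, 512 * MiB, 512 * MiB))   # near-limit
# Limit above request: 600 MiB used with a 512 MiB request and a 1 GiB limit.
print(memory_status(600 * MiB, 512 * MiB, 1024 * MiB))  # above-request
```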

In these ways, setting static limit values is effective; however, it isn’t always the best practice. Workloads with needs that change across their life cycle are a good example of when a different approach is needed.  

What to Do About Workloads with Dynamic Requirements

There's an element of flux in all applications. Once real-world traffic comes into the picture, application resource needs change. Some workloads also experience startup CPU spikes far above their steady-state usage. JVM workloads are a well-recognized example of this.

While a fixed limit-to-request ratio might be beneficial in some cases, it will not necessarily work with JVMs. If a workload's requests are determined solely by its steady-state usage, and limits are set as a fixed multiple of requests, the workload may not have the resources it needs to start up in the first place. Static values will not always account for the changing needs of these workloads without introducing inefficiency and increasing cost through overprovisioning.
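The arithmetic behind that problem can be sketched as follows. The steady-state and startup figures are invented for illustration, and the function name is hypothetical.

```python
def limit_covers_startup(steady_state_cores: float,
                         startup_peak_cores: float,
                         limit_to_request_ratio: float) -> bool:
    """Check whether a ratio-derived CPU limit leaves room for the startup spike."""
    request = steady_state_cores               # request sized to steady state
    limit = request * limit_to_request_ratio   # limit as a fixed multiple
    return limit >= startup_peak_cores


# Steady state of 0.5 cores, startup peak of 2 cores.
print(limit_covers_startup(0.5, 2.0, 2.0))  # False: a 1-core limit throttles startup
print(limit_covers_startup(0.5, 2.0, 4.0))  # True, but the limit is 4x steady state
```

The second call shows the trade-off: a ratio large enough for startup is far more generous than the workload needs for the rest of its life.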

To further complicate things, internal resource configurations in the JVM are tied to limits in Kubernetes. For instance, unless it is set explicitly, the maximum heap size is calculated as a percentage of the configured memory limit. This means that if the limit is lowered, the heap size will also be lowered, potentially starving a workload of memory. Conversely, if the limit is too high, the amount of heap allocated will also be high, consuming unnecessary amounts of memory on the node and driving up cost. In this situation, it would be helpful to dynamically manage memory limit values to give the application the headroom it needs, without dipping below an undesirable threshold.
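A small worked example shows how directly the heap follows the limit. It assumes a container-aware JVM with no explicit `-Xmx`, using the default `-XX:MaxRAMPercentage` of 25, and it ignores the JVM's separate handling of very small memory sizes. The function name is illustrative.

```python
def default_max_heap_mib(memory_limit_mib: float,
                         max_ram_percentage: float = 25.0) -> float:
    """Approximate the JVM's maximum heap as a percentage of the container memory limit."""
    return memory_limit_mib * max_ram_percentage / 100.0


print(default_max_heap_mib(2048))        # 512.0 MiB of heap under a 2 GiB limit
print(default_max_heap_mib(1024))        # 256.0 MiB after halving the limit
print(default_max_heap_mib(1024, 75.0))  # 768.0 MiB if the percentage is raised to 75
```

Halving the limit halves the heap, which is why an automated change to the limit needs to account for the application's heap requirements.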

On the CPU side, with certain versions of Java (Java 8u191 and some versions of Java 11), the `-XX:ActiveProcessorCount` value defaults to the total number of cores available on the node, even when CPU limits are set. This could lead to throttling during resource contention, because the JVM sizes itself for cores that the container's limit does not actually grant. If your application is running on a Java version in one of those ranges and you are setting CPU limits on your workloads, you might need to override `-XX:ActiveProcessorCount` with a more reasonable value.
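One way to pick that override is to derive it from the container's CPU limit. The sketch below parses a Kubernetes-style CPU quantity (whole cores such as `"2"` or millicores such as `"1500m"`) and builds the JVM flag. Rounding up to the next whole core is an assumption made here for illustration, and the helper names are hypothetical.

```python
import math


def parse_cpu_quantity(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" into a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000.0  # millicores
    return float(quantity)


def active_processor_flag(cpu_limit: str) -> str:
    """Build a -XX:ActiveProcessorCount flag from a CPU limit, rounding up."""
    cores = max(1, math.ceil(parse_cpu_quantity(cpu_limit)))
    return f"-XX:ActiveProcessorCount={cores}"


print(active_processor_flag("1500m"))  # -XX:ActiveProcessorCount=2
print(active_processor_flag("500m"))   # -XX:ActiveProcessorCount=1
```

The resulting flag can be passed on the Java command line or through the `JAVA_TOOL_OPTIONS` environment variable in the container spec.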

Suffice it to say, workloads with dynamic requirements pose a particular challenge when it comes to setting limits.

Flexibility and Automation are Key

The reality is that there is no single axiomatic truth when it comes to setting Kubernetes limits. There are people on all sides of the debate with valuable advice gained from both real-world experience and theoretical knowledge. 

Since different best practices apply to different scenarios, platform engineers and application developers need to cover all the bases. And the more bases there are, the more time-consuming and error-prone manual operations can be. If you’re manually setting requests and limits throughout the life cycle of a workload, you’re incurring operational overhead, draining time and effort away from innovation. 

Automation informed both by engineering expertise and the real-time needs of an application is the only combination suited to address the challenges we’ve discussed. Some tools, like the Vertical Pod Autoscaler (VPA), work well for a subset of use cases, but they are limited when it comes to covering the gamut. 

For example, single replica workloads will inevitably be disrupted when the VPA makes changes to resource values, since the “old” pod will be evicted before the new one is created. Other tools, like the Horizontal Pod Autoscaler (HPA) or Kubernetes Event-Driven Autoscaling (KEDA), are great for supporting real-time scaling needs, but they work best in tandem with vertical pod scaling and cluster autoscaling. Having a solution that allows all three dimensions of autoscaling to work harmoniously drives a truly automated and flexible rightsizing practice. 

Conclusion: Pairing Diverse Practices with Diverse Needs

We've looked at a variety of application needs for different use cases that illustrate why the debate over setting Kubernetes limits is nuanced. The scenarios range from environments with mixed-priority workloads to applications that are sensitive to latency, workloads that consume more resources at startup than during steady-state usage, workloads with predictable or unexpected spikes at varying intervals, and any combination of the above. In all of them, a mechanism for automatically determining resource values and applying them safely takes engineers out of the grind. It also empowers them to pair diverse practices with diverse needs.

Intelligently setting (or not setting) limits (and requests) is just one piece of the Kubernetes resource management puzzle. Flexible configurations and guardrails support evolving workload requirements and engender necessary trust in automation. 

At StormForge, we designed Optimize Live with these needs in mind: to flexibly meet diverse demands and safely automate away the time-consuming, error-prone practice of manual Kubernetes resource management. 

See for yourself with a free trial, or play around in our sandbox environment.
