
Stop Setting CPU and Memory Requests in Kubernetes


By Reid Vandewiele | Nov 08, 2023


Originally published by The New Stack

Kubernetes is an amazing cloud native platform for delivering software. It elegantly combines the speed, portability and consistency benefits of containerized software development with the speed, risk reduction and cost efficiency advantages of declarative Infrastructure as Code.

The central theme here is speed. Platforms exist to reduce extraneous cognitive load on developers and IT staff, letting everyone focus on delivering better software to the business faster. The more toil you can take off people’s plates, the better their focus and the more energy they can spend on doing impactful work to the best of their capabilities.

Kubernetes helps by making it so that, at least most of the time, developers don’t have to think hard about infrastructure-related concerns. The scheduler handles finding hardware to run workloads on. So developers don’t have to think about it. Containerizing workloads isolates apps from library versioning or dependency conflicts with other apps. And developers don’t have to think about it. The platform ensures that what developers do think about is germane to their value stream and their application, automating and abstracting everything else away.

At least, that’s what it would do — if it were a perfect platform. #platformgoals 🤙

Have you, or has anyone you know, ever had to define resource requests for a workload in Kubernetes?

The answer is probably “yes” if you’ve ever run serious software on Kubernetes. Managing CPU and memory requests has been a front-and-center part of the Kubernetes developer experience since the earliest days, while understanding of resource management and confidence in dealing with it properly is elusive even today. How confident have you ever been that you’ve set CPU and memory requests right?

Wait, What Are CPU and Memory Requests Again?

As a quick refresher, recall that every workload in Kubernetes eventually boils down to running pods, and running pods boils down to running containers. Pods describe the container(s) you want to run in terms of image, arguments, environment variables and other declarative details germane to running the app. To deploy software on Kubernetes, developers have to know how to create pods.

Alongside the foundational stuff like images and arguments, containers in pods also have resource requests. Kubernetes asks developers to provide numerical request values for each container — a request value for CPU and a request value for memory. The values supplied are expected to accurately convey how much of each resource the container needs when it runs.
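
For illustration, here’s a minimal sketch of a pod manifest with requests set; the names and values are placeholders, not recommendations:

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app                      # hypothetical workload
    spec:
      containers:
      - name: my-app
        image: example.com/my-app:1.2.3 # placeholder image
        resources:
          requests:
            cpu: 500m                   # half a CPU core
            memory: 1Gi                 # one gibibyte of RAM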

In a production environment, these values aren’t optional. They have to be set and have to be accurate for optimal cluster operation. Kubernetes resource requests and resource management are critical to reliability, cost management and performance.

  • Where workloads run is based on CPU and memory request values.
    Request values affect how containers are scheduled to run on nodes. If your workload requests 500 millicores of CPU and 1 GiB of memory, Kubernetes is going to find and reserve 0.5 CPUs and 1 GiB of memory on a node somewhere in the cluster, setting aside that fraction of the node’s resources for your workload’s pseudo-exclusive use. (You can see this per-node accounting in the example after this list.)
  • Undersized request values can hurt performance or make apps crash.
    Requests affect what happens to apps when processes on the same node are all fighting for resources and there aren’t enough to go around. When CPU requests are too low, apps can become CPU-starved. When memory requests are too low, apps can get OOM-killed. The exact impact on performance varies widely according to many factors, leading to additional costs trying to identify and debug issues.
  • Oversized request values can cost the business money.
    Nodes have a finite amount of reservable CPU and memory resources. Once all of a node’s resources are claimed, no more pods will be scheduled on the node. When your nodes run out of room and you still have workloads to run, you need a bigger cluster. Oversized requests can quickly have you paying for more nodes than you might otherwise need.
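
You can see that accounting on any node. As an illustrative sketch (the numbers below are made up, not from a real cluster), kubectl’s node description tallies requested resources against what the node has:

    $ kubectl describe node <node-name>
    ...
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests      Limits
      --------           --------      ------
      cpu                1500m (75%)   2 (100%)
      memory             2560Mi (68%)  3Gi (81%)

Once the Requests percentages approach 100%, the scheduler has nowhere left to put new pods on that node, no matter how idle the node actually is.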

It seems like CPU and memory requests are kinda important, right? At least, it seems like they’re an important infrastructure-related concern.

But Developers Are Responsible for CPU and Memory Requests

If you’re a developer delivering an app via Kubernetes, you know which container image(s) you want to run. Which image you specify defines which workload you’re going to have running. The container image is germane, first-principle information about the workload. Declarative configuration at its finest. You get exactly what you ask for, and what you get is, by definition, correct.

That isn’t how CPU and memory requests work, though.

As important as they are to the infrastructure-related concerns of reliability, cost management and performance, developers don’t get to just declare whatever CPU and memory requests they want. Requests are not first-principle information about your application.

Instead, the specification of CPU and memory request values is a test.

CPU and memory requests are a test because there are right answers, there are less-right answers and there are outright wrong answers. How well you do might affect the reliability, cost or performance of your workload individually. There’s a bit of a grading curve imposed by various cluster-management decisions, as well as how well other developers in your organization have done on their own request value quizzes. Because Kubernetes is a complex system, you might be let off the hook even if you don’t get a good test score. But there is such a thing as a good test score.

Sometimes there’s a hint of prisoner’s dilemma thrown in for fun. The dilemma could be set up between you, the developer, and whatever platform or infrastructure team owns the budget for Kubernetes. You’ll have the option of receiving a reliability and performance benefit if you just set your request values really high. Really high requests aren’t necessarily right, but they can be an easy way to get the infrastructure reliability and performance you want, at the expense of the platform team’s Kubernetes cost management.

For the sake of argument, let’s say you’re a highly cooperative, motivated team player and you want to score well on your request values quiz for everyone’s mutual benefit: reliability, performance and good cost management, too. Here’s how you do that.

How to Set Requests Right

Study for the test.

Things that might influence the right request settings include:

  1. Which container(s) or application version(s) your workload is running
  2. Which software development life cycle (SDLC) environment the workload is running in
  3. How many users your app serves right now in that environment
  4. The kind of hardware or VMs your Kubernetes cluster’s nodes run on

The highlight here is that with these kinds of factors involved, you can’t expect developers to be able to set resource requests right without access to additional information outside the pod definition. It’s also really hard to simulate or anticipate all of these influences and interactions a priori.

The real-world solution is to experiment. Start with a guess, then run the app and watch. Run it in a realistic environment, observe how much CPU and memory it uses, tinker, iterate a few times if you need to, then write down what you learned as your app’s appropriate CPU and memory request values.
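
As a rough sketch of that observe-and-compare loop, assuming the metrics-server add-on is installed and using a hypothetical app label (the output shown is illustrative, and formatting varies by kubectl version):

    # What the app actually uses right now (requires metrics-server):
    $ kubectl top pod -l app=my-app
    NAME            CPU(cores)   MEMORY(bytes)
    my-app-6d5f7b   180m         410Mi

    # What the app currently requests:
    $ kubectl get pod -l app=my-app \
        -o jsonpath='{.items[*].spec.containers[*].resources.requests}'
    {"cpu":"500m","memory":"1Gi"}

If usage sits well below (or spikes well above) the request over a realistic observation window, adjust the request and repeat.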

Seems doable. Not glamorous, but doable. Unfortunately though, there are probably still a few other complications.

Typical User Journey Around Setting Requests

It turns out that setting requests right is not something you can just do once and forget about. It’s recurring toil and kind of a pain.

Because CPU and memory requirements vary between SDLC environments, right out of the “enterprise software” gate the problem morphs from having one right answer into having a permutation of right answers. That makes it a permutation of work. Even better, many of the factors that influence the right answer(s) are variable, so they change over time. How much load your app is under in dev, test and production environments matters, is different between them all and changes over time. How efficient your code is matters, and that changes over time too.

The reality is, many factors that influence the ideal CPU and memory request values for apps are dynamic. That means CPU and memory request values would ideally be elastic and adjusted over time in response to the environmental factors that influence them.

But since CPU and memory request values have been made the responsibility of developers to define — yeah, that elasticity’s not gonna happen. Developers have better things to do with their time than check in on and mess around with these infrastructure settings on a semi-regular basis. Instead, the typical Kubernetes user journey around resource management practices goes something like this:

  • Stage 1: Don’t bother setting requests – ends when performance problems become too frequent.
  • Stage 2: One-size-fits-all approach – ends when low resource utilization becomes too expensive.
  • Stage 3: Manually and irregularly tune every workload – grueling and poor use of engineering resources.

In the beginning of your journey, you don’t set requests, but then you start noticing the issues and realize you kind of need to. Next, you might try to pre-define standard sizes for everyone to use — think T-shirt sizing, like (S)mall/(M)edium/(L)arge. This works for a while, but eventually, there’s cost pressure because everybody is using M and L sizing everywhere when they probably don’t need to and that costs the business money. The most mature stage these days is a kind of resigned recognition that request values need to be managed, and the creation of processes or expert assistance teams to try and minimize developer toil while still managing requests well enough not to blow cost out of the water. Said another way, there isn’t a great final stage today.

Something Is Wrong with This Picture

The whole CPU and memory requests situation is perplexing. The existing incentives and structure don’t result in ideal outcomes, and it isn’t software developers’ fault.

What’s wrong here?

First and foremost, remember again that Kubernetes is fundamentally declarative. We have spec to capture user intent, and we have state to convey system observations: what should be (desired) vs. what is (observed).

CPU and memory requests might be presented as if they are germane spec for delivering your app, but they aren’t. Not really. No, your app’s resource usage is actually more like state, and you fill in request values based on observations about how much of each resource type your application consumes. It’s only spec because it isn’t Kubernetes that manages this data. It’s you. It’s me. It’s developers.

Kubernetes needs proper CPU and memory request information to properly schedule pods across nodes and to operate at high levels of reliability, performance and cost-effectiveness. No argument there. But it’s a suboptimal platform experience to burden developers with manually providing this important information and also to rely on them to spend time keeping it up to date.

Putting my product manager and technical UX designer hat on, the conclusion seems pretty clear to me. We shouldn’t have to think about CPU and memory resource requests. Knowing the exact right request values is not germane to our work. It is inarguably a very important infrastructure concern; it’s just not ours. The platform should be doing this for us.

What’s the Alternative?

Let me try and paint the picture of a more idealized experience.

Stop setting CPU and memory requests in Kubernetes. For the majority of workload types and profiles, it simply shouldn’t be necessary. Make CPU and memory request values more like state. You don’t set request values when you deploy your app; you go and look at them later if you’re curious about that observation-based infrastructure concern.

For a small fraction of your workloads, maybe you’ll want to configure general constraints on CPU and memory requests to address a business or app-specific reality. Constraints might be minimums that CPU or memory allocation should never go below or maximums they should never exceed.

But observing the app day to day and managing the specific numbers? That should be handled for you, by the platform. An infrastructure concern, abstracted. Automated. The cognitive load is reduced for you and for all other developers, making more time for you to focus on higher-output work that delivers value faster.
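
On the constraints point: one native way to express that kind of guardrail today is a LimitRange. This sketch (with made-up bounds and a hypothetical namespace) caps how small or large container resources in a namespace can be:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: resource-guardrails
      namespace: my-team        # hypothetical namespace
    spec:
      limits:
      - type: Container
        min:
          cpu: 50m              # never allocate below this
          memory: 64Mi
        max:
          cpu: "2"              # never allocate above this
          memory: 4Gi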

Where I work, this is the evolved Kubernetes developer experience we’re working hard to create.

The piece that’s missing is automation, specifically automation that can do the following:

  1. Observe and collect data about the CPU and memory requirements for every Kubernetes workload all the time.
  2. Using observation data and intelligent machine learning algorithms, calculate appropriate CPU and memory request settings for all of them.
  3. Automatically generate and apply perfectly tuned CPU and memory requests to all workloads on a regular recurring basis (see the sketch after this list).
  4. Provide easy control of resource management, through policy and exceptions, for clusters, namespaces and/or individual workloads.
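
Mechanically, step 3 amounts to something like patching the workload’s pod template. Here’s a hand-run sketch of the kind of change such automation would apply (the deployment name, container name and values are hypothetical):

    $ kubectl patch deployment my-app --type strategic -p '
      {"spec":{"template":{"spec":{"containers":[
        {"name":"my-app","resources":{"requests":{"cpu":"300m","memory":"512Mi"}}}
      ]}}}}'

Note that changing a deployment’s pod template rolls its pods, which is part of why doing this manually and frequently is such toil.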

Critically, this automation has to be ridiculously easy to set up. The whole point is to reduce cognitive load for developers and IT staff. The harder it is to achieve, the less of a benefit it actually is.

At StormForge, we think we’ve built a product that does it. Optimize Live is a plug-and-play Kubernetes SaaS operator that accomplishes all of the above. Combined with the prescriptive use of Limit Ranges (a lesser-known but native Kubernetes resource type that can fill in absent CPU and memory request values with placeholders when new workloads are created), installing Optimize Live on any Kubernetes cluster can instantly obviate the need for developers to set CPU and memory requests in their manifests at all.
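
Concretely, that placeholder behavior looks something like the following sketch; the values are illustrative starting points, not recommendations:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-requests
    spec:
      limits:
      - type: Container
        defaultRequest:         # applied when a container omits requests
          cpu: 100m
          memory: 128Mi
        default:                # applied when a container omits limits
          cpu: "1"
          memory: 1Gi

New workloads land with these defaults, and the operator takes over from there, adjusting requests based on what it observes.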

For those of us who’ve spent a lot of time working with or focusing almost exclusively on becoming expert users of the platform, Kubernetes feels pretty exciting. But we also know that Kubernetes is still a very complicated platform for developers to onboard to and operate. To make it better, we need to make the experience of using it simpler. We need to identify, abstract and automate away infrastructure-related concerns that are still part of that experience today.

In an ideal world, developers would stop setting CPU and memory requests in Kubernetes.

Optimize Live is StormForge’s answer to this problem. Kubernetes has become big enough and this problem is universal enough that we expect there will eventually be others. However anyone tries to handle it, though, the desired outcome is the same: developers shouldn’t be doing this work this way. The platform should be handling it for us.

P.S.

Most people easily agree that toil sucks. Like anything else in IT though, you shouldn’t take my word for it. It’s important to try and actually measure its impact. But figuring out how to measure return on toil reduction is hard.

It should come as no surprise that most organizations we’ve worked with at StormForge have tended toward having Kubernetes workloads that are mostly overprovisioned for CPU and memory requests, rather than underprovisioned. It makes sense. Reliability and performance come from having greater-than-or-equal-to enough resources, and for most developers reliability and performance concerns trump infrastructure cost management.

Since it’s not worth developer time to constantly assess and elastically adjust workload sizing, the in-use request values usually stabilize into that greater-than-enough range. This means that once StormForge starts automating and optimizing resource management across a cluster, we often see total resource requests cut in half. Halving those requests then leads to a reduction in required cluster size by a related margin.

Which means cluster costs go down.

It is hard to measure the exact return on taking toil off of developers’ plates. It’s not as hard to measure cloud costs. It does make me feel like we’re onto something to consistently see cloud costs go down, as a proxy signal for value, when we automate this toil away.

