Guide
Chapter 1 - Kubernetes Autoscaling
Autoscaling in Kubernetes is all about having the right resources at the right time: the goal is to balance cost and reliability through automation, fine-tuning resource allocation to keep applications reliable under varying load levels. Autoscaling enables efficient use of cloud resources, allowing elastic on-demand scalability and ensuring you pay only for the resources you need. This process is managed through three primary scaling dimensions: cluster scaling, horizontal scaling, and vertical scaling.
Autoscaling for each of these dimensions comes with its own set of challenges. Cluster autoscaling must balance cost, efficiency, and response times, often necessitating complex configurations. Horizontal scaling can be rapid and robust but may require careful setup to trigger based on metrics other than current resource consumption, which is needed to scale predictively rather than reactively. Vertical scaling is theoretically applicable to all workloads, but it’s tricky and can lead to service interruptions or reliability issues if attempted without sufficiently predictive intelligence driving each scaling decision. Using horizontal and vertical scaling together is ideal, but harmonization is required to prevent the two mechanisms from thrashing and ensure that pod counts and pod sizes scale effectively in tandem.
This article dives deep into Kubernetes autoscaling, exploring key components and practical challenges that Kubernetes administrators, DevOps engineers, and cloud architects encounter. It offers a comprehensive overview of advanced autoscaling strategies and tools.
To address the diverse scaling needs within a Kubernetes environment, autoscaling is typically approached in three dimensions: cluster scaling, horizontal scaling, and vertical scaling. Each of these dimensions has unique strengths and weaknesses related to resource optimization. The best resource management outcomes come from intelligently adopting and blending mechanisms that autoscale on all three.
Cluster scaling addresses the need to adjust the overall capacity of the Kubernetes cluster based on the sum of workload resource requirements across the whole cluster. The scale of a cluster is determined by the number of nodes and the allocatable CPU, memory, and other resources they contribute.
A cluster scaling strategy — whether manual or automated, static or elastic — is crucial for managing the cost and reliability of the cluster as a whole. If the cluster doesn’t have enough nodes to schedule all pods, some workloads simply won’t run. If the cluster’s nodes are predominantly idle or underutilized, the cluster will be costing money unnecessarily and with no real return.
All other scaling mechanisms rely on appropriate cluster scaling. The cluster must have enough node capacity to schedule all requested pods at their requested sizes. A cluster autoscaling solution automatically adds or removes nodes from the cluster to ensure sufficient node resources to run all pods while avoiding costly overprovisioning.
Horizontal scaling, primarily managed through the Horizontal Pod Autoscaler (HPA), dynamically adjusts the number of pod replicas in a deployment, stateful set, or other scalable workload resource. This adjustment is based on real-time metrics, such as CPU utilization or custom metrics that reflect or predict the application’s performance and demands.
For workloads that support it, horizontal autoscaling is ideally suited to enable rapid elastic capacity changes, quickly scaling up or scaling down pod replicas in response to fluctuating demand on a component or application. This rapid response characteristic often makes horizontal autoscaling essential for the purpose of ensuring application reliability in the face of unpredictable traffic or events.
Horizontal scaling increments or decrements a workload’s resource allocation in units of whole pods, so it is not well suited for workloads that don’t need, or don’t benefit from, running more than one pod instance.
Vertical scaling, often associated with Kubernetes’ built-in Vertical Pod Autoscaler (VPA), focuses on adjusting individual pods’ CPU and memory allocations to best match observed or predicted workload resource consumption.
Vertical scaling as a practice can apply to all workloads, regardless of whether they have one replica or many, making it more broadly applicable than HPA. However, vertical scale-up or scale-down activity usually requires restarting pods. This makes vertical scaling less appropriate than HPA for rapid response to unexpected spikes in demand.
Vertical scaling is best suited for recurring adjustments to resource allocations, ensuring each pod’s base resource allocation strikes an optimal balance between cost and reliability. A vertical scaling solution performs these tailored adjustments automatically, on a weekly or daily cadence, or even dynamically in response to reliability events.
Before diving into autoscaling configurations, assess your applications’ dynamic needs. This assessment will guide you in choosing the right mix of cluster, horizontal, and vertical autoscaling to meet your application’s specific requirements. Understanding workload patterns, peak usage times, and resource consumption trends allows you to tailor an autoscaling strategy that ensures optimal performance and cost efficiency.
Effectively autoscaling Kubernetes workloads requires a nuanced understanding of application requirements, the intelligent integration of scaling metrics, and the seamless operation of different autoscaling mechanisms. In the next section, we will explore key strategies with the standard tooling for autoscaling on each dimension, the associated challenges, and advanced solutions that improve upon or replace the standard tooling.
Cluster Autoscaler is the standard tool for managing cluster nodes dynamically, and has support for autoscaling Kubernetes across more than twenty different cloud providers.
Cluster Autoscaler is a reliable workhorse with predictable performance. When it comes to ensuring that enough nodes are deployed to schedule all of a cluster’s pods, there is little to complain about.
Dealing with multiple node configurations adds complexity, however. Unless there is a very good reason for workloads to request a specific node type (which they can do through a nodeAffinity spec), it’s simplest to define a single node type that Cluster Autoscaler deploys whenever it needs to provision additional capacity.
Additionally, while scaling up is very responsive, Cluster Autoscaler deprovisions nodes only one at a time, with a predefined delay before the next step-down. As a result, after a rapid scale-up of many nodes, the cluster may take a long time to return to its previous size once those nodes are no longer needed.
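For illustration, here is a minimal sketch of how that behavior is tuned through Cluster Autoscaler’s command-line flags in its deployment spec. The node-group name, image tag, and timing values are assumptions for the example, not recommendations:

```yaml
# Fragment of a cluster-autoscaler Deployment spec (AWS example).
# Node-group name, image tag, and timings below are illustrative assumptions.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=2:20:my-node-group         # min:max:node-group name (assumed)
      - --expander=least-waste             # prefer nodes that minimize wasted capacity
      - --scale-down-delay-after-add=10m   # wait after a scale-up before considering scale-down
      - --scale-down-unneeded-time=10m     # how long a node must be underutilized before removal
```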
Benefits:
Challenges:
Karpenter is an advanced project that replaces Cluster Autoscaler. Initially released by AWS in 2021 and donated to the CNCF in 2023, its stated goals are to take full advantage of cloud capabilities while remaining fast and simple to use.
Karpenter’s approach to cluster autoscaling is designed to improve how clusters respond to dynamic workloads by scaling faster and requiring less manual configuration for optimal outcomes. Unlike the traditional cluster autoscaler that requires users to specifically configure which node instance types to use and reacts slowly to reductions in workload demand, Karpenter automates the selection of specific instance types and responds quickly to opportunities to consolidate nodes, resulting in autoscaling that’s faster, simpler, and more efficient.
Here’s how Karpenter improves Kubernetes autoscaling:
At the time of writing, Karpenter is supported on AWS and Azure.
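As a rough sketch, a Karpenter NodePool that leaves instance-type selection to Karpenter and enables consolidation might look like the following. This uses the v1beta1 schema, which shifts between Karpenter versions, and the values shown are illustrative assumptions:

```yaml
# Minimal Karpenter NodePool sketch (v1beta1 schema; field names vary by version).
# Note that no instance types are specified: Karpenter chooses them automatically.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]     # allow spot capacity when available
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default                       # references a cloud-specific node class (e.g., EC2NodeClass on AWS)
  disruption:
    consolidationPolicy: WhenUnderutilized  # actively repack and remove underutilized nodes
  limits:
    cpu: "500"                              # cap the total CPU this pool may provision
```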
Benefits:
Challenges:
The HPA is a resource type built into the Kubernetes platform and is widely adopted for its simplicity and effectiveness. Horizontal scaling isn’t appropriate for all workloads, since not all workloads are designed for parallelism, but where the HPA can be used, it’s one of the best ways available to ensure automated elasticity on a per-workload basis.
HPA is simplest when using Kubernetes’ built-in resource metrics pipeline, setting targets for percent CPU or memory utilization. CPU and memory consumption are usually trailing indicators of load rather than leading ones, so HPA also supports custom or external metrics, such as work queue depth or upstream gateway connection count. Integrating custom metrics in this way is possible but requires a deeper dive into Kubernetes’ metrics APIs and the workload’s performance characteristics. The struggle some users face in plumbing external metrics points to the need for solutions that simplify this process.
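As a point of reference, a minimal HPA using the built-in resource metrics pipeline looks roughly like this; the Deployment name, replica bounds, and 70% CPU target are illustrative assumptions:

```yaml
# Minimal HPA sketch using the resource metrics pipeline (metrics-server).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # target average CPU usage as a percent of requests
```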
Notably, users often face an either/or choice between HPA and VPA when the two are considered in isolation, since both often use the same metrics when making scaling decisions. The net benefit of any adjustment by a VPA may be reversed or undone by an unharmonized HPA when the two systems react to the same signals, such as CPU or memory usage, in conflicting ways.
Benefits:
Challenges:
KEDA is an advanced project that builds on and enhances HPA, effectively replacing the HPA interface. KEDA was originally created to address a critical missing feature of the HPA: scaling on arbitrary indicators or metrics. It became an official CNCF project in 2020.
KEDA introduces a shift in Kubernetes horizontal autoscaling by enabling event-driven scaling. This approach allows applications to scale based on the occurrence of specific events, offering a more granular and responsive scaling mechanism.
A differentiator compared to HPA is KEDA’s wide selection of built-in “scalers,” or plugins for interfacing with external scaling indicators or event sources, which drastically simplifies the work required to autoscale on common leading indicators of load. Easily scaling on leading indicators or events helps developers ensure that applications can efficiently handle bursts of activity, scaling up rapidly to meet demand and scaling down to conserve resources when the activity decreases.
A distinctive feature of KEDA is its ability to scale workloads down to zero replicas. This capability is particularly beneficial for workloads that experience sporadic activity, ensuring that resources are not consumed when the workload is idle. Scaling to zero can lead to substantial cost savings, especially for applications with variable traffic patterns.
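As a sketch, a KEDA ScaledObject that scales a queue worker on queue depth and allows scale-to-zero might look like the following; the Deployment name, queue name, threshold, and RabbitMQ connection details are illustrative assumptions:

```yaml
# Sketch of a KEDA ScaledObject scaling a worker Deployment on queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    name: queue-worker            # Deployment to scale (assumed name)
  minReplicaCount: 0              # scale to zero when the queue is empty
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks
        mode: QueueLength
        value: "20"               # target messages per replica
        hostFromEnv: RABBITMQ_HOST  # connection string read from the workload's environment
```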
Benefits:
Challenges:
The VPA is the best-known tool for vertical autoscaling and is maintained in the autoscaler repository alongside Cluster Autoscaler.
VPA is used far less than either Cluster Autoscaler or HPA. The 2023 Datadog Container Report found that over half of the surveyed organizations used HPA, while less than 1% used VPA. This may be due to the high configuration effort required to set up VPA at the scale necessary to achieve significant ROI, or because few users trust that the standard VPA’s scaling recommendations won’t hurt their workloads’ reliability. The fact that the built-in HPA and VPA cannot easily be deployed together for the same workload almost certainly plays a role as well: if HPA is seen as improving reliability and VPA is seen as improving cost efficiency, reliability will almost always win out.
The simultaneous use of VPA and HPA poses a challenge primarily because their actions can conflict: HPA scales the number of pods based on usage metrics, while VPA adjusts pod resource requests, and those requests are in turn the denominator of the utilization calculation HPA relies on. This interplay can break the autoscaling strategy unless carefully managed.
Configuring horizontal and vertical scaling to work in concert has been a sought-after project goal for years, but it is not appropriate today to use the standard HPA and VPA implementations together on the same workload.
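For reference, a minimal VPA configuration looks roughly like this; the target Deployment and resource bounds are illustrative assumptions, and updateMode can be set to "Off" to only publish recommendations rather than apply them:

```yaml
# Minimal VPA sketch. "Auto" applies recommendations by evicting and
# recreating pods; "Off" only records recommendations for review.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
```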
Benefits:
Challenges:
StormForge Optimize Live is an advanced solution that replaces the use of VPA, solving for the limitations that have prevented VPA’s widespread adoption.
The recent rapid progress in AI and machine learning means these technologies are being applied to nearly every field of technology, including Kubernetes. In the context of Kubernetes autoscaling, ML models can anticipate workload demands by leveraging historical data and real-time metrics, moving autoscaling decisions toward more predictive and dynamically optimized scaling methods.
In the context of vertical scaling specifically, tools like StormForge use these technologies to address quality and harmonization issues that have historically held back the widespread adoption of tools like the standard VPA.
StormForge’s Optimize Live platform seamlessly integrates with Kubernetes to offer continuous vertical right-sizing of applications, ensuring that they run with the most efficient allocation of resources.
Here’s how StormForge improves Kubernetes autoscaling:
As Kubernetes environments grow increasingly dynamic and complex, integrating AI and ML in autoscaling solutions like StormForge represents a significant advance in vertical autoscaling capability for Kubernetes.
Incorporating best practices for successful autoscaling in Kubernetes environments is crucial for maintaining optimal application performance and resource efficiency. Here, we outline key considerations and practices, including insights on integrating various tools to guide you in refining your autoscaling strategy.
Autoscaling is about optimization, and success in optimization requires understanding. You need to be able to measure your performance as you adopt or tune your autoscaling strategy.
Use tools like Prometheus for monitoring and Grafana for visualizing Kubernetes metrics. Establish baseline performance metrics and identify key performance indicators (KPIs) that accurately reflect your cluster’s health and efficiency levels.
For cluster autoscaling, measure and track resource allocation efficiency at the node level and in aggregate. For horizontal autoscaling, pay attention to workloads idling at minReplicas or frequently topping out at maxReplicas. For vertical scaling, measure and track resource usage against resource requests at the pod level, as well as activity like OOMKills and throttling.
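As one way to track such KPIs, here is a sketch of Prometheus recording rules, assuming kube-state-metrics and cAdvisor metrics are already being scraped; the rule names and exact expressions are illustrative assumptions:

```yaml
# Sketch of Prometheus recording rules for autoscaling KPIs.
# Assumes kube-state-metrics and cAdvisor (kubelet) metrics are available.
groups:
  - name: autoscaling-kpis
    rules:
      # Cluster dimension: fraction of allocatable CPU actually requested
      - record: cluster:cpu_requests:allocatable_ratio
        expr: |
          sum(kube_pod_container_resource_requests{resource="cpu"})
          /
          sum(kube_node_status_allocatable{resource="cpu"})
      # Vertical dimension: per-pod CPU usage relative to requests
      - record: pod:cpu_usage:request_ratio
        expr: |
          sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
          /
          sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
```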
Tools like Kubecost can help provide some of these insights in a Kubernetes-specific packaged offering, as well as attaching cost estimates to the raw resource insights.
Each autoscaling mechanism provides value independently, and the Pareto principle commonly applies: 80% of the value is often delivered through 20% of the possible effort. Identify which elements of autoscaling are easy to implement while delivering broad impact, and prioritize adopting those elements first.
Cluster autoscaling is a good mechanism to adopt early due to the relative simplicity of the problem space. While optimization of either cluster autoscaler or Karpenter can go deep, getting even a fairly basic autoscaler config in place will immediately deliver the ability for the cluster to dynamically scale down nodes when not needed, directly impacting cluster costs.
Because horizontal autoscaling is implemented per workload, the cost/performance impact of using HPA depends on the size and load fluctuation of each potential workload to autoscale. Sometimes, applying an HPA to even one or a handful of workloads can make a big impact.
Vertical scaling with tools like StormForge can have a cluster-wide impact for very little time investment. Even on clusters with some workloads that you don’t want to vertically autoscale, the ability to target or exclude swaths of applications by namespace or by individual deployment lets you apply vertical autoscaling to the majority of workloads on the cluster without investing time in achieving autoscaling coverage for everything.
Simultaneous use of cluster, vertical, and horizontal autoscaling is required to achieve total elasticity and optimal cost-effectiveness while maintaining high reliability for applications on Kubernetes. The challenge lies in making all of these autoscaling dimensions operate in concert, ensuring that they complement each other to optimize resource allocation and application performance rather than conflicting.
Adopting advanced projects that simplify or enhance each autoscaling mechanism, and that naturally work in concert with each other, is key to achieving sustainable, maximized performance across all three. Identify and select appropriate tools early in the adoption process to accelerate time to value.
At the end of the day, Kubernetes is a platform supporting workloads owned by many different teams or developers, and spending time mastering Kubernetes is not the end goal or value-add in and of itself. Autoscaling should be made either as accessible or as invisible to developers as possible, reducing the cognitive load required for them to take advantage of it when building and deploying their applications.
Toward that end, create a knowledge base with resources for understanding your environment’s autoscaling behavior and choices. Provide case studies and tool documentation. Organize regular short training sessions and workshops, including hands-on labs, to help teams use, or simply understand, the Kubernetes autoscaling tools you’ve selected.
Encourage a culture of experimentation and learning where team members can safely explore current or new autoscaling mechanisms in sandbox environments.
In this article, we explored the three key dimensions of Kubernetes autoscaling: cluster, horizontal, and vertical. We discussed the challenges associated with autoscaling on each dimension, such as the need for efficient cluster autoscaling, the complexity of implementing custom metrics with HPA, and the fairly significant limitations of traditional VPA. To address some of these challenges, we highlighted advanced projects like Karpenter, KEDA, and StormForge, each enhancing Kubernetes autoscaling’s flexibility and efficiency for one of the three dimensions.
Finally, we outlined best practices for successful Kubernetes autoscaling and discussed how adopting them can help you optimize your Kubernetes deployments for peak performance, cost-efficiency, and resilience in the face of dynamic workloads.
As Kubernetes environments evolve, staying informed about the latest autoscaling strategies and tools, and leveraging advanced solutions like Karpenter, KEDA, and StormForge, will keep your deployments performant, cost-efficient, and reliable.