Chapter 2 - Kubernetes Cluster Autoscaler
The Cluster Autoscaler (CAS) is a tool that automatically adjusts the size of a Kubernetes cluster based on resource needs. It continuously watches the API server for pods that cannot be scheduled and creates new nodes to host them. It also identifies underutilized nodes and removes them after migrating their pods to other nodes.
Some key features of the Cluster Autoscaler include Kubernetes resource-conscious scaling, leveraging expanders for multiple node groups, and respecting Kubernetes’ pod disruption budgets (PDBs) and scheduling constraints. It can be set up using autodiscovery or through manual methods. In this article, we’ll review these concepts and demonstrate both setup processes step by step.
The CAS adds or removes nodes based on the resource needs of workloads, scaling out when it detects unschedulable pods. The goal is to maintain just enough capacity to handle demand without manual intervention. Node autoscaling is one of the three pillars of Kubernetes autoscaling, alongside horizontal and vertical pod autoscaling.
The Cluster Autoscaler works by monitoring the pending pods in the cluster. If there are pods that cannot be scheduled due to insufficient resources, it scales up the cluster. Conversely, it scales down when nodes are underutilized to save costs.
This tool is particularly useful in dynamic environments with fluctuating workloads, helping maintain the balance between performance and cost efficiency. The Cluster Autoscaler simplifies operations and enhances scalability by automating resource management.
The CAS looks for pods that can’t be scheduled, effectively monitoring the resource usage of your Kubernetes cluster. When it detects that pods cannot be scheduled due to insufficient resources, like memory or CPU, it automatically adds new nodes. This ensures that applications have the resources they need to run smoothly.
Conversely, if the Cluster Autoscaler identifies underutilized nodes, it removes them. This helps optimize resource usage and reduce costs. The process involves migrating pods from underutilized nodes before shutting them down.
The CAS continuously watches the API server for unschedulable pods, checking every 10 seconds by default. A pod is considered unschedulable when the Kubernetes scheduler cannot find a node with sufficient resources to accommodate it, which is reflected in the pod's PodScheduled condition being set to False. When such pods are found, the Cluster Autoscaler attempts to find or create new nodes that can host them.
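To see which pods the CAS would act on, you can list pods stuck in the Pending phase and inspect their PodScheduled condition. The commands below are a minimal illustration; the pod name my-app-7d4b9c is a placeholder:

kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl get pod my-app-7d4b9c -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")]}'

For an unschedulable pod, the returned condition shows "status":"False" with the reason "Unschedulable".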
The autoscaler operates under the assumption that all machines within a node group have identical capacities and labels. Scaling up adds a node similar to the existing ones in the node group; such a node initially hosts no user-created pods, only static pods from node manifests and DaemonSet pods. The CAS builds a template node for each node group and simulates whether the unschedulable pods would fit on such a new node, using a simplified scheduling process that may require multiple iterations.
Node creation speed depends on the cloud provider and the provisioning process, including TLS bootstrapping. The Cluster Autoscaler expects new nodes to register within 15 minutes; otherwise, it stops considering them and may attempt to scale up a different node group. This way, the CAS quickly adjusts to changing resource needs while minimizing delays in pod scheduling.
The CAS checks for unneeded nodes every 10 seconds when no scale-up is required; the check interval can be configured via the --scan-interval flag. By default, a node is considered unneeded when the sum of the CPU and memory requests of its pods is below 50% of the node's allocatable capacity, all of its pods can be moved to other nodes, and scale-down has not been disabled for that node.
If a node remains unneeded for over 10 minutes, it will be terminated. This interval is configurable, and the autoscaler only terminates one non-empty node at a time in order to minimize disruption. Empty nodes can be terminated in bulk, up to 10 at a time, which is also configurable.
When a non-empty node is terminated, its pods are drained, and the node is cordoned to prevent rescheduling. DaemonSet pods can be configured for eviction on both empty and non-empty nodes using specific flags. This careful process ensures efficient scaling down while maintaining cluster stability.
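These intervals and eviction behaviors are controlled by flags on the cluster-autoscaler container. Below is a minimal sketch of how they might appear in the deployment's command section; the values shown are simply the defaults described above, not recommendations:

command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --scan-interval=10s                           # how often unschedulable pods and unneeded nodes are checked
  - --scale-down-unneeded-time=10m                # how long a node must stay unneeded before removal
  - --scale-down-utilization-threshold=0.5        # request utilization below which a node is a removal candidate
  - --max-empty-bulk-delete=10                    # how many empty nodes can be removed at once
  - --max-node-provision-time=15m                 # how long to wait for a new node to register
  - --daemonset-eviction-for-empty-nodes=false    # whether DaemonSet pods are evicted from empty nodes
  - --daemonset-eviction-for-occupied-nodes=true  # whether DaemonSet pods are evicted from non-empty nodes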
Although the CAS is a node-centric autoscaler, it relies on pod scheduling mechanics. Its primary signal for scaling is whether every pod in the cluster can be scheduled; in addition, it aims to ensure that no underutilized nodes remain in the cluster.
When the Cluster Autoscaler detects unschedulable pods, it decides which node group to expand by using expanders, which determine the strategy for selecting the appropriate node group for scaling. You can specify the desired expander using the --expander flag, which provides different strategies for optimizing node selection.
The Cluster Autoscaler offers several expanders, each of which features unique advantages. The default expander is random, suitable when no specific node group needs priority. Other expanders include most-pods for maximizing pod scheduling, least-waste for efficient resource utilization, least-nodes for minimizing node count, price for cost efficiency, and priority for user-defined preferences.
Starting with version 1.23.0, multiple expanders can be used together. This allows you to create a hierarchy of expanders where the output of one feeds into the next. For example, combining priority and least-waste can produce optimal scaling decisions based on both user priorities and resource efficiency.
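For illustration, here is how chained expanders might be configured on the cluster-autoscaler command line, along with the ConfigMap that the priority expander reads its preferences from (the node group name patterns are placeholders; higher numbers mean higher priority):

- --expander=priority,least-waste

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*
    10:
      - .*on-demand.*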
The Cluster Autoscaler respects Kubernetes pod disruption budgets (PDBs) when scaling down nodes, making sure that critical pods are not disrupted. A PDB limits how many pods of a replicated application can be down at once due to voluntary disruptions, expressed as either a minimum number of available pods or a maximum number of unavailable ones, protecting essential workloads from being interrupted.
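For instance, a PodDisruptionBudget like the hypothetical one below tells the autoscaler it may not drain a node if doing so would leave fewer than two my-app pods running:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2        # at least two pods must stay available during voluntary disruptions
  selector:
    matchLabels:
      app: my-app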
The autoscaler also considers pod priority and preemption settings to determine which pods can be safely rescheduled or disrupted. This ensures that high-priority pods remain operational while low-priority ones are considered for eviction.
There are two primary ways to set up the Cluster Autoscaler: autodiscovery and manual. We will go through each of them step by step.
To enable autodiscovery, you need to tag your autoscaling groups (ASGs) with specific key-value pairs that the Autoscaler recognizes. This setup simplifies scaling by automatically identifying which groups to scale based on the tags.
The autodiscovery setup is particularly useful for dynamic environments where autoscaling groups may change frequently. By relying on tags, the CAS can automatically adapt to new groups or configurations without requiring manual intervention.
To enable autodiscovery in Cluster Autoscaler, follow these steps.
The IAM role needs a policy that includes the permissions to list, describe, and manage nodes in an autoscaling group:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": ["*"]
    }
  ]
}
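One way to create this policy and attach it to the role used by the Cluster Autoscaler (either the node role or an IRSA role) is with the AWS CLI. The file and policy names below are placeholders, assuming the JSON above is saved as cas-policy.json:

aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cas-policy.json
aws iam attach-role-policy \
  --role-name <CAS_ROLE_NAME> \
  --policy-arn arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:policy/ClusterAutoscalerPolicy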
Using the autodiscovery option requires adding tags. Tag your ASGs as follows:
k8s.io/cluster-autoscaler/enabled: ""
k8s.io/cluster-autoscaler/<CLUSTER_NAME>: ""
These tags tell the autoscaler which groups are part of your Kubernetes cluster and should be considered for scaling operations.
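If your ASGs were created outside of EKS managed node groups and are not tagged yet, you can add the tags with the AWS CLI. Here, <ASG_NAME> and <CLUSTER_NAME> are placeholders; the autoscaler only checks the tag keys, so the values can stay empty:

aws autoscaling create-or-update-tags --tags \
  "ResourceId=<ASG_NAME>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=,PropagateAtLaunch=false" \
  "ResourceId=<ASG_NAME>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<CLUSTER_NAME>,Value=,PropagateAtLaunch=false"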
Download the autodiscovery manifest file from the CAS GitHub repo and edit it as shown below:
Note: It's recommended that you use the same minor version of the Cluster Autoscaler as your Kubernetes version. In the deployment named cluster-autoscaler (line 145), change the image version to reflect the minor version of the Kubernetes cluster that you're using. In the example below, we are running Kubernetes version 1.30, so we'll use v1.30.0 of CAS.
containers:
- image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
In the deployment named cluster-autoscaler (line 165), change the --node-group-auto-discovery line by substituting your cluster name for <YOUR CLUSTER NAME>:
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
Install CAS by using this command:
kubectl apply -f cluster-autoscaler-autodiscovery.yaml
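After applying the manifest, you can check that the autoscaler (deployed to the kube-system namespace by the official manifest) is running and follow its scaling decisions in the logs:

kubectl -n kube-system get deployment cluster-autoscaler
kubectl -n kube-system logs -f deployment/cluster-autoscaler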
The manual setup option for Cluster Autoscaler involves explicitly specifying the names of the autoscaling groups you want to manage. This method requires you to list the desired groups in the Autoscaler’s deployment configuration.
The manual setup is ideal for static environments where autoscaling groups are stable and unlikely to change frequently. This way, only specific groups are managed by the Autoscaler, providing precise control over scaling operations. It offers more control over which groups are included but requires more maintenance effort compared to the autodiscovery approach, especially in dynamic environments.
Here are the steps to follow.
For the manual setup, the IAM role looks a bit different from what we saw earlier:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "eks:DescribeNodegroup"
      ],
      "Resource": ["arn:aws:autoscaling:${YOUR_CLUSTER_AWS_REGION}:${YOUR_AWS_ACCOUNT_ID}:autoScalingGroup:*:autoScalingGroupName/${YOUR_ASG_NAME}"]
    }
  ]
}
Download the Multi-ASG manifest file from the CAS GitHub repo and edit it as shown below:
Note: It's recommended that you use the same minor version of the Cluster Autoscaler as your Kubernetes version. In the deployment named cluster-autoscaler (line 145), change the image version to reflect the minor version of the Kubernetes cluster that you're using. In the example below, we are running Kubernetes version 1.30, so we'll use v1.30.0 of CAS.
containers:
- image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
Define your autoscaling groups in the Cluster Autoscaler configuration. To configure the manual setup, pass the --nodes flag with the format minNodes:maxNodes:ASGName on the Cluster Autoscaler command line. This flag defines the minimum and maximum node count for each specified autoscaling group.
Here is how it looks in the command line:
./cluster-autoscaler --cloud-provider=aws --nodes=1:10:<ASG_NAME_1> --nodes=2:20:<ASG_NAME_2>
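In the multi-ASG manifest, the same flags go into the cluster-autoscaler container's command section. A sketch with two placeholder node groups might look like this:

command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --nodes=1:10:<ASG_NAME_1>
  - --nodes=2:20:<ASG_NAME_2>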
Install CAS by using this command:
kubectl apply -f cluster-autoscaler-multi-asg.yaml
The Cluster Autoscaler may face challenges with scaling granularity, leading to potential issues such as overprovisioning or underprovisioning resources. When scaling up, the Cluster Autoscaler might add more nodes than necessary, resulting in unused capacity and increased costs. Conversely, during scale-down operations, it might not remove enough nodes, leaving the cluster with excess resources that are not optimally utilized.
Granular scaling decisions can be complex due to varying workloads and unpredictable demand patterns. The CAS uses predefined thresholds to determine when to add or remove nodes, but these thresholds may not always align perfectly with real-world application needs. As a result, finding the right balance between performance and cost efficiency can be challenging.
To address these granularity issues, it is crucial to configure the Cluster Autoscaler with accurate scaling parameters and limits. Regularly monitoring cluster performance and adjusting thresholds based on observed workload patterns can help improve scaling decisions. This proactive approach helps ensure that the Cluster Autoscaler operates efficiently while maintaining the desired level of resource utilization.
The Cluster Autoscaler may encounter limitations when managing diverse node groups with varying instance types. Different workloads require different types of nodes, such as high CPU, high memory, or GPU-enabled nodes. However, the CAS may struggle to effectively balance resource allocation across these heterogeneous groups.
One of the challenges is ensuring that the right node group is selected for scaling based on the specific needs of the unschedulable pods. This requires precise configuration and understanding of each node group’s capabilities and limitations. In some cases, the Cluster Autoscaler may not fully utilize the diverse resources available, leading to resource allocation inefficiencies.
To mitigate these node group limitations, organizations should carefully plan their node group configurations and scaling policies. Grouping similar workloads and defining clear scaling priorities can help the Autoscaler make more informed decisions. By aligning node group management with application requirements, it can optimize resource allocation and improve overall cluster performance.
Using the Cluster Autoscaler may introduce performance overhead that affects the efficiency of the cluster. It continuously monitors resource usage and makes scaling decisions, which can consume computational resources and network bandwidth. In clusters with a high number of workloads, this monitoring process might lead to increased latency and resource contention.
Another potential overhead comes from the time required to scale nodes up or down, particularly in large clusters. Adding or removing nodes is not instantaneous and can introduce delays, impacting the ability to respond quickly to sudden changes in demand.
To minimize performance overheads, it’s essential to optimize the configuration of the Cluster Autoscaler. Adjusting the frequency of resource checks and tuning scaling thresholds can help reduce unnecessary computations. Additionally, ensuring that scaling actions are well-aligned with workload demands can minimize delays and improve responsiveness.
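A few flags are the usual starting points for this kind of tuning; the values below are purely illustrative, not recommendations:

- --scan-interval=30s                  # check less frequently to reduce API and CPU overhead
- --scale-down-delay-after-add=10m     # wait after a scale-up before evaluating scale-down
- --scale-down-delay-after-failure=3m  # back off after a failed scale-down
- --max-nodes-total=100                # hard upper bound on cluster size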
While the Cluster Autoscaler offers a reliable way to manage resources in Kubernetes clusters, it might not fully address some advanced autoscaling features available in modern solutions. Newer autoscalers, like Karpenter, provide more sophisticated capabilities, such as faster scaling, improved resource utilization, and better integration with diverse workloads. While we have a separate comparison of Karpenter and Cluster Autoscaler, here is a summary of the major differences.
While the Cluster Autoscaler itself is easier to set up, it doesn't cover node provisioning, which needs to be configured separately. Specifically, managing node groups involves creating, tweaking, and tuning them outside of Kubernetes objects. This makes managing the full setup of CAS and node management with Terraform or other IaC tools more complex and harder to maintain than Karpenter.
A few more advanced features are also missing on the CAS side, such as the faster scaling, improved resource utilization, and broader workload integration mentioned above.
The Cluster Autoscaler is traditionally a go-to tool for automatically adjusting Kubernetes cluster size based on resource needs. It ensures applications have the necessary resources by monitoring pending pods and scaling up or down accordingly. The CAS’s key features include resource-conscious scaling, leveraging expanders for multiple node groups, and respecting Kubernetes’ PDBs and scheduling constraints.
In this article, we outlined how to install CAS using both the autodiscovery and manual setups. Even in autodiscovery mode, CAS requires all node groups to already exist, meaning they must be created and managed by external automation. The manual method makes this even harder because you must track those node groups and add them to the CAS configuration.
An alternative way to overcome these obstacles might be to consider Karpenter, the next-generation Kubernetes node autoscaler. In our next article, we outline a comprehensive comparison between CAS and Karpenter.