
Karpenter Consolidation

Consolidation is a key feature in Karpenter that sets it apart from the traditional cluster autoscaler. Understanding how it fits into the broader Karpenter framework is crucial for fully leveraging Karpenter’s capabilities to optimize workload management on Kubernetes clusters.

This article starts with the basics, explaining core concepts like node provisioning, scheduling, and disruption management. It then covers how consolidation works within Karpenter to make resource allocation more efficient and reduce waste. Finally, we have provided a real-life example to show how an AI startup can use these tools to its advantage.

Summary of Karpenter consolidation concepts #

  • Karpenter key concepts: Karpenter uses dynamic provisioning to optimize cloud resource usage. It relies on three key concepts: node provisioning using NodePools and NodeClasses, scheduling to automate decisions based on cluster demands and constraints, and disruption methods to control node shutdown cycles.
  • Disruption methods in Karpenter: Karpenter utilizes a combination of manual and automated methods to manage node lifecycles, prioritizing manual disruptions first and executing automated disruptions sequentially to optimize performance and minimize downtime.
  • Understanding consolidation in Karpenter: Karpenter uses consolidation actions like node deletion and node replacement to optimize Kubernetes cluster resource allocation. The three types of consolidation mechanisms are empty node, single node, and multi-node consolidation.
  • The mechanism behind consolidation: Karpenter consolidates underutilized nodes, taking into account workload, cost, lifespan, and compliance with pod disruption budgets (PDBs) to optimize resource usage and reduce costs.
  • Practical application scenario: We explore how a hypothetical AI startup optimizes its Kubernetes infrastructure using Karpenter’s consolidation features across three different NodePools, strategically balancing cost efficiency with performance.

Key Karpenter concepts #

Karpenter is an advanced Kubernetes node autoscaler that optimizes the use of cloud resources through dynamic provisioning. It is designed to respond quickly to changes in workload demands by adjusting the number of nodes in a cluster. In order to achieve that, Karpenter employs specific approaches to node provisioning, scheduling, and disruption. Let’s review them.

Node provisioning with Karpenter

Unlike traditional autoscalers, Karpenter proactively assesses application needs and cluster states to provision nodes. This dynamic provisioning can be configured using NodePools and NodeClass CRDs, which are essential elements in Karpenter’s architecture. 

NodePools allow you to define groups of nodes with specific characteristics and scaling behaviors. Some example scenarios may include:

  • Defining taints to limit the pods that can run on nodes Karpenter creates
  • Defining any startup taints to inform Karpenter that it should taint the node initially, but that the taint is temporary
  • Limiting node creation to certain zones, instance types, and computer architectures
  • Setting defaults for node expiration
  • Defining instance types
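
As a sketch, a NodePool combining several of these options might look like the following. This is illustrative only, based on the v1beta1 API used later in this article; the name, taint key, zones, and values are assumptions, not prescriptions:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: example-nodepool            # illustrative name
spec:
  template:
    spec:
      # Taint to limit which pods can run on nodes Karpenter creates
      taints:
        - key: example.com/special-workload   # hypothetical taint key
          effect: NoSchedule
      # Restrict node creation to certain zones and CPU architectures
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    expireAfter: 720h               # default node expiration (30 days)
```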

NodeClasses, on the other hand, define specific configurations, like instance storage (e.g., EBS volumes) and OS settings, for the nodes within a NodePool.
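
On AWS, for example, the NodeClass is an EC2NodeClass. A hedged sketch of storage and OS-level settings might look like the following; the name and the discovery tag value are illustrative assumptions:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: example-nodeclass           # illustrative name
spec:
  amiFamily: AL2                    # OS image family for the nodes
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster    # illustrative discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:              # instance storage, e.g., an EBS root volume
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
```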

Scheduling and disruption

Effective node orchestration requires mechanisms for scheduling and disruption to maintain cluster health and efficiency. Karpenter introduces several concepts to manage these aspects:

  • Scheduling: Karpenter automates the decision-making process about where and when to launch or terminate nodes based on current cluster demands and predefined constraints configured by the cluster administrator. Some of them may include:
    • Resource Requests: Reserves a specific amount of memory or CPU for a pod.
    • Node Selection: Specifies that a pod should run on a node with certain characteristics.
    • Node Affinity: Attracts a pod to run on nodes that match specific attributes, like instance type, availability zone, GPU, etc.
    • Topology Spread: Distributes pods across failure domains such as zones, hosts, or capacity types.
    • Pod Affinity/Anti-Affinity: Either attract pods to be scheduled near each other or keep them apart, based on the locations of other pods.
  • Disruption: Disruption management in Karpenter is designed to minimize the impact on applications during node upgrades, replacements, or scaling down operations. Karpenter offers several methods to manage these disruptions, which are categorized into manual and automated groups.
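
To make the scheduling constraints above concrete, here is a hedged sketch of a pod spec combining several of them. The labels, image, and zone values are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    app: example
spec:
  containers:
    - name: app
      image: example-image          # illustrative image
      resources:
        requests:                   # resource requests reserve CPU and memory
          cpu: "500m"
          memory: 512Mi
  nodeSelector:                     # node selection by label
    karpenter.sh/capacity-type: on-demand
  affinity:
    nodeAffinity:                   # attract the pod to specific zones
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["us-west-2a"]
  topologySpreadConstraints:        # spread replicas across zones
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: example
```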


Disruption methods in Karpenter #

Disruption is one of the most sophisticated features of Karpenter, so let’s dive into it. Karpenter employs a variety of disruption methods to manage node lifecycles within a Kubernetes cluster. These methods fall into two distinct categories: manual and automated.

Manual disruption methods

These involve direct administrator actions, such as manually deleting nodes or NodePools using kubectl:

  • Node Deletion: You can use kubectl to manually remove a single Karpenter node or node claim.
  • NodePool Deletion: In the same way, kubectl can be used to delete specific NodePool custom resources, which will lead to a deletion of the entire pool of nodes.

Additionally, Karpenter enhances the standard manual disruption process in Kubernetes by introducing additional features that ensure thorough cleanup and resource management. 

  1. Adding a Finalizer: Karpenter includes a finalizer to its node termination process. This means that it won't fully delete a node from the Kubernetes cluster until all associated cloud resources are confirmed to be cleaned up, ensuring no orphan resources are left behind.
  2. Resource Cleanup: When a node is manually terminated, Karpenter verifies that all resources related to that node in the cloud provider are properly cleaned up. 

These improvements make Karpenter's manual disruption methods more effective and reliable, providing a seamless and cleaner node management experience.

Automated disruption methods

Automated methods are managed by Karpenter itself. These automated processes ensure that nodes are cycled out in accordance with cluster policies and workload demands without constant human intervention. They include:

  • Expiration: Nodes are automatically terminated after a specified duration to ensure that the cluster uses updated and efficient resources.
  • Drift: Automatically detects and handles discrepancies between desired and actual node configurations, so that nodes conform to their specified parameters.
  • Consolidation: Optimizes resource usage by consolidating workloads onto fewer nodes, effectively reducing the cluster size without sacrificing performance.
  • Interruption: Proactively handles potential disruptions from cloud provider maintenance or spot instance termination, minimizing the impact on running applications.

Order of Disruption 

Now, let’s review how these methods are prioritized and applied within the system to maintain optimal cluster operations.

Initially, Karpenter evaluates nodes for any manual disruption commands; if none are found, it proceeds with automated disruptions based on the cluster’s configuration and current state. Then Karpenter disrupts nodes by executing one automated method at a time in this order: expiration, drift, and consolidation. 

If interruption handling is enabled, Karpenter will watch for upcoming involuntary interruption events that would cause disruption to workloads. These interruption events include:

  • Spot interruption warnings
  • Scheduled change health events (maintenance events)
  • Instance-terminating events
  • Instance-stopping events
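
On AWS, interruption handling is typically enabled by pointing Karpenter at an SQS queue that receives these events. A minimal sketch of the Helm chart values follows; the cluster and queue names are illustrative assumptions, and the exact settings key may vary by chart version:

```yaml
# Helm values for the Karpenter chart (names are illustrative)
settings:
  clusterName: my-cluster
  interruptionQueue: my-cluster-karpenter-events   # SQS queue receiving interruption events
```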


Understanding consolidation in Karpenter #

Karpenter uses consolidation as a strategy to reduce costs and optimize resource allocation within a Kubernetes cluster. This process involves assessing and reallocating nodes based on current demand to enhance efficiency. Within consolidation, Karpenter employs deletion and replacement techniques, prioritizing actions that least disrupt ongoing workloads.

Consolidation action types

  • Node Deletion: Karpenter safely deletes the node if all the pods running on it can be moved to other nodes that have free capacity.
  • Node Replacement: Karpenter replaces the old node with a new, cheaper one if all the apps can still fit onto the new instance type.

Consolidation mechanisms

Consolidation employs three different strategies designed to identify potential consolidation actions:

  • Empty node consolidation involves removing nodes that are completely unutilized, freeing up resources without affecting running applications. It’s the simplest form of consolidation, aiming to reduce waste by eliminating surplus capacity.
  • Single node consolidation focuses on individual nodes, determining if a node can be deleted or if it needs to be replaced with a more cost-effective option. If deletion disrupts less than replacement, the node is simply removed; otherwise, it is replaced.
  • Multi-node consolidation targets the removal of multiple nodes simultaneously. Karpenter evaluates whether the combined workloads from these nodes can be accommodated by fewer, possibly more cost-effective nodes, effectively condensing the cluster’s footprint.
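
These mechanisms are controlled through the NodePool's consolidation settings. As a sketch, WhenEmpty restricts Karpenter to empty node consolidation only, while WhenUnderutilized (used throughout this article) enables all three; the name below is illustrative:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: consolidation-example   # illustrative name
spec:
  disruption:
    # WhenEmpty: only remove completely empty nodes
    # WhenUnderutilized: also allow single-node and multi-node consolidation
    consolidationPolicy: WhenUnderutilized
```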

The mechanism behind consolidation #

Node consolidation in Karpenter assesses various aspects of the cluster’s current state to optimize resource usage and reduce cloud costs. 

Criteria for node consolidation

This decision-making process is influenced by several key criteria:

  • Node utilization: The primary factor in consolidation is the utilization level of each node. Karpenter examines metrics such as CPU and memory usage to determine if nodes are underutilized and could have their workloads transferred to other nodes without impacting performance.
  • Workload characteristics: The nature of the workloads running on the nodes plays a crucial role. Nodes hosting lightweight or less critical workloads are more likely to be consolidated to free up resources. Karpenter also considers the compatibility of workloads (like GPU) when deciding which nodes can be merged.
  • Cost efficiency: Nodes that are more expensive to operate and can be replaced by more cost-effective alternatives are prime candidates for consolidation.
  • Node lifespan and health: Nodes approaching the end of their lifecycle or those that exhibit signs of diminished performance or reliability are also considered for consolidation. 
  • Compliance with pod disruption budgets (PDBs): Karpenter respects Kubernetes PDB settings, which specify the minimum number of replicas that must remain running for a given application during voluntary disruptions. This ensures that consolidation actions do not compromise application availability.

Nodes with low utilization and fewer critical pods are prime candidates for consolidation. This ensures that resources are allocated where they are most needed, maintaining cluster efficiency.

Consolidation and disruption controls

Disruption Budgets

Pod disruption budgets define the minimum number of pods that must remain running during voluntary disruptions. Karpenter respects these budgets, carefully evaluating which nodes can be consolidated without violating these constraints. 
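
For reference, a standard PDB that keeps at least two replicas of an application running during voluntary disruptions might look like this; the names and label are illustrative assumptions:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 2            # at least 2 replicas must stay running
  selector:
    matchLabels:
      app: example-app       # illustrative label
```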

In addition to a standard PDB in Kubernetes, Karpenter introduces the extended tuning of disruption strategies, specifically targeting NodePools with its own disruption budget settings.

Disruption budgets within NodePools allow for more granular control over how nodes within these pools are disrupted. When Karpenter considers node termination, it evaluates the node’s association with its NodePool and adheres to the disruption budgets defined for that pool. This means that even if a node individually could be terminated without violating a pod-specific PDB, it might still be retained if its removal would breach the NodePool’s disruption budget.

The sections that follow show how disruption control works at the pod and node levels.

Pod level

To ensure that certain pods remain uninterrupted, you can apply the annotation karpenter.sh/do-not-disrupt: "true" to the pod. The example would look like this:

apiVersion: v1
kind: Pod
metadata:
  name: your-pod-name
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
  - name: your-container-name
    image: your-container-image

This informs Karpenter not to voluntarily terminate a node that hosts these marked pods, making it ideal for processes that must run from start to completion without interruption. The annotation is not limited to bare pods; it can also be applied to the pod templates of workload resources like Deployments and StatefulSets.
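
For a Deployment, the annotation goes on the pod template so that every replica inherits it. A sketch follows; the names and image are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
      annotations:
        karpenter.sh/do-not-disrupt: "true"   # applied via the pod template
    spec:
      containers:
        - name: app
          image: example-image                 # illustrative image
```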

Node level

Karpenter starts by counting the active nodes in a NodePool, ignoring nodes that are being deleted or are not ready. Karpenter won't disrupt any nodes if the number of nodes already being deleted or not ready exceeds the budget.

If a NodePool has multiple disruption limits, Karpenter uses the strictest one. We can use percentages or fixed numbers for setting limits on disruptions:

  • With a percentage, Karpenter figures out how many nodes it can disrupt based on the active node count. It subtracts any nodes already being deleted or not ready. 
  • If you use a fixed number instead, Karpenter just takes that number off the total active nodes. 

By default, Karpenter uses a single disruption budget of 10%. This can be configured using spec.disruption.budgets in the NodePool. Budgets take into account any nodes currently being terminated for any reason, and they limit only voluntary disruptions initiated by Karpenter: expiration, emptiness, drift, and consolidation.
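
As a minimal sketch, these rules translate into a budgets list like the following; with both entries present, the stricter one applies at any given time. The NodePool name is illustrative:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: budget-example      # illustrative name
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    budgets:
    - nodes: "10%"   # percentage of active nodes (matches the default)
    - nodes: "2"     # fixed cap: never disrupt more than 2 nodes at once
```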

Practical application scenario #

Let’s review how what we’ve discussed so far can be applied in practice.

Imagine an AI startup that is leveraging Karpenter to manage nodes for its Kubernetes environment. Based on the nature of the workloads, the architecture team created a design with three different node pools to cover the company’s needs. Each NodePool would leverage a different consolidation strategy to balance satisfying application needs and cost constraints.

Here’s a summary of the NodePools:

  • NodePool 1 (demand-driven consolidation): Perfect for unpredictable workloads, this method changes the number of resources available based on immediate needs. It allows the system to quickly adapt, increasing resources during busy times and decreasing them when demand drops, optimizing both performance and cost.
  • NodePool 2 (cost and performance metrics consolidation): This scenario adjusts resources in real-time, focusing on getting the best performance for the lowest cost. It ensures that only the most efficient and cost-effective resources are used, adjusting quickly to changes in costs and demand.
  • NodePool 3 (time-based consolidation for AI/ML model training): This approach plans resource use around the clock to match with training schedules, making sure resources are fully used when needed and saved when not. It adjusts resources to ramp up right before intensive computations and scales down after to cut costs.

Let’s now look at each of these in more detail.

Demand-driven consolidation NodePool

This configuration allows the NodePool to react quickly to changing demands, optimizing resource allocation quickly, which is crucial for maintaining high performance and operational efficiency in environments with highly variable data processing needs.

The layout of the NodePool manifest is as shown below:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: adaptive-demand-nodepool
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    budgets:
    - nodes: "25%"  # Allows flexibility to disrupt up to 25% of nodes under fluctuating demand
      reasons:
      - "Empty"
      - "Drifted"
    - nodes: "3"    # Ensures no more than 3 nodes can be disrupted at any one time
      reasons:
      - "Empty"
      - "Drifted"
    - nodes: "40%"  # Extended allowance during anticipated low-demand periods
      reasons:
      - "Underutilized"
      duration: 4h
      schedule: "0 4 * * *"  # Scheduled during typical low-demand hours at 4 AM
    - nodes: "0"
      duration: 8h
      schedule: "0 9 * * 1-5"  # Prevents disruptions during peak business hours from 9 AM to 5 PM on weekdays

Explanation:

  • Consolidation policy: Set to WhenUnderutilized to ensure that node resources are consolidated based on real-time utilization metrics, which helps maintain efficiency during unpredictable workloads.
  • Budgets: The first budget of 25% disruption allowance provides significant flexibility to adapt to sudden spikes or drops in demand. The second condition caps the disruption to three nodes, which helps maintain a baseline level of service regardless of demand fluctuations. 
  • Scheduled budgets: The third budget targets known low-demand hours, allowing further consolidation that can be reversed if demand increases unexpectedly. The last budget setting prevents any consolidation during core operating hours on weekdays, safeguarding critical operational times from potential disruptions.

Cost and performance metrics consolidation NodePool

This configuration maximizes the financial efficiency of the NodePool by adjusting to both performance needs and cost considerations, which is particularly useful in continuous, high-demand environments like AI/ML data processing.

The layout of the NodePool manifest is as shown below:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: cost-performance-optimized-nodepool
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    budgets:
    - nodes: "30%"  # Allows up to 30% of nodes to be disrupted during low-cost periods
      reasons:
      - "Empty"
      - "Drifted"
    - nodes: "5"    # Maximum of 5 nodes can be disrupted at any given time
      reasons:
      - "Empty"
      - "Drifted"
    - nodes: "40%"  # Extended budget during off-peak hours for further cost savings
      reasons:
      - "Underutilized"
      duration: "6h"
      schedule: "0 1 * * *"  # Scheduled during the lowest cost hours at 1 AM
    - nodes: "0"
      schedule: "@daily"
      duration: "10m"  # Prevents any disruptions during the first 10 minutes of each day

Explanation:

  • Consolidation policy: Set to WhenUnderutilized to actively consolidate nodes that are not fully utilized, ensuring efficient use of resources.
  • Budgets: The first budget allows for a flexible disruption of up to 30% of nodes, targeting times when it is most cost-effective. The second limit caps the number of nodes that can be disrupted at any time to five, preventing excessive downtime. 
  • Scheduled disruptions: The third budget takes advantage of off-peak hours (1 AM) when energy costs or usage charges might be lower, pushing for up to an additional 10% consolidation. The last budget blocks any disruptions during the first 10 minutes of the day, a critical period often used for routine checks or backups.

Time-based consolidation NodePool for model training

This setup is designed to optimize node usage around a predictable training schedule, reducing resources when they are not needed and preparing them just in time for scheduled training tasks, providing robust support during active training periods and minimizing costs during downtime.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: time-based-model-training-nodepool
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # Nodes expire after 30 days
    budgets:
    - nodes: "20%"  # Allows disruptions of up to 20% of nodes under normal operations
    - nodes: "0"
      duration: "1h"
      schedule: "55 23 * * *"  # Prevent disruptions 5 minutes before midnight, preparing for night tasks
    - nodes: "0"
      duration: "2h"
      schedule: "0 7 * * *"  # Prevent disruptions at 7 AM, post night tasks to retain results
    - nodes: "0"
      duration: "2h"
      schedule: "0 18 * * *"  # Prevent disruptions at 6 PM when evening tasks begin

Explanation:

  • Consolidation Policy: Set to WhenUnderutilized, aiming to consolidate resources when they’re not fully utilized according to the defined thresholds.
  • Budgets: The first budget allows for a general operational flexibility of up to 20% node disruption, which is suitable for non-peak hours. The subsequent budgets are specifically configured to ensure that no disruptions occur just before and during critical processing times identified: 5 minutes before midnight, 7 AM, and 6 PM. These times are likely aligned with the start and end of heavy computational tasks, ensuring that resources are fully available when most needed.
  • Expire after: Setting the node expiration to 30 days allows for regular cycling of nodes to utilize newer, more efficient resources or simply to adhere to compliance with security patches and updates.


Conclusion #

Karpenter’s consolidation feature is a powerful tool for managing Kubernetes clusters more efficiently, particularly when it comes to optimizing resource allocation and reducing unnecessary expenses. By integrating node provisioning, scheduling, and disruption management with advanced consolidation techniques, Karpenter offers a robust solution for dynamic resource management. 

The example with the AI startup illustrates just how impactful these capabilities can be in a practical setting, demonstrating that Karpenter is not just about maintaining performance but also about enhancing operational efficiency in real-world applications. This makes it an invaluable tool for any organization looking to intelligently optimize its cloud infrastructure.
