Chapter 3 - EKS Karpenter: Deep Dive
Karpenter is an open-source Kubernetes autoscaling project that was donated to the Cloud Native Computing Foundation (CNCF) by Amazon. The project aims to optimize how Kubernetes clusters scale their worker nodes to maximize resource efficiency and cost optimization, especially compared to traditional autoscaling tools like the Cluster Autoscaler.
Karpenter provides several valuable features for cluster administrators, such as supporting dynamic instance types, gracefully handling interrupted instances (like during spot termination), and enabling faster pod scheduling time. This article provides Kubernetes administrators with a comprehensive overview of Karpenter’s architecture and benefits, installation guidance, and best practices for effectively leveraging Karpenter.
Karpenter is a modern Kubernetes autoscaler developed to tightly integrate with cloud providers, enabling more intelligent node autoscaling for Kubernetes clusters. It is designed to overcome the limitations of traditional tools like the Cluster Autoscaler and provide a flexible and powerful solution for provisioning worker nodes while ensuring proper resource allocation.
A highlight of Karpenter is its direct integration with the APIs of cloud providers like AWS EC2, enabling it to make precise decisions quickly in response to cluster events. A backlog of pods stuck in the Unschedulable status will cause Karpenter to analyze hundreds of available instance types to launch a worker node matching exactly the requirements of the incoming pods. Examples of these requirements include resource capacity, availability zone selection, operating system types, and spot/on-demand instance mixtures.
Karpenter monitors for interruption events such as spot instance termination or upcoming EC2 maintenance via AWS APIs and then quickly launches replacement instances to ensure that pods are rescheduled with minimal downtime. This contrasts with the traditional Cluster Autoscaler project, which is limited to a static set of instance types, cannot gracefully handle instance interruption, and causes pod scheduling delays due to its reliance on AutoScalingGroup resources to handle instance creation instead of directly leveraging the EC2 API.
The dynamic instance support from Karpenter is a crucial feature for administrators struggling to rightsize their clusters and avoid costly wasted resources. Karpenter’s ability to launch instance types that precisely suit workload requirements enables a cluster to accurately rightsize its resource utilization, ensuring that there is no wasted compute capacity or cost inefficiencies. The project also allows workload consolidation. It will continuously analyze node capacity and pod requirements to dynamically terminate/replace instances, allowing it to rightsize the cluster and ensure that resources aren’t being wasted—all without impacting pod performance.
Karpenter is cloud provider agnostic and currently supports AWS and Azure.
Understanding how Karpenter works under the hood will help you better grasp how scaling and scheduling decisions are made, how to better leverage Karpenter’s features, and how to troubleshoot potential problems.
Karpenter is a Kubernetes controller; these are applications that monitor the Kubernetes API server to watch the state of the cluster’s objects and react to particular events. Karpenter watches a few specific objects—like NodePools and NodeClasses (which we will discuss in more detail shortly), as well as pod objects—to determine how to perform its autoscaling responsibilities.
To understand these objects and why Karpenter monitors them, let’s step through the workflow of some typical scaling operations. There are four primary components to Karpenter’s operations, which we explore in detail below.
When pods are deployed to a Kubernetes cluster, a component called the Kubernetes Scheduler decides which worker node will host each pod. This decision takes into account the pod’s requirements (like CPU/memory demands) and available node capacity. If no available node can host the pod—such as when there aren’t enough memory resources available—the pod will be stuck in an Unschedulable status. The trimmed snippet below shows a pod that is flagged as Unschedulable due to insufficient node memory:
kubectl get --output yaml pod pod-name
status:
  phase: Pending
  conditions:
  - type: PodScheduled
    status: "False"
    reason: Unschedulable # Pod is marked as Unschedulable.
    message: '0/2 nodes are available: 2 Insufficient memory. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.'
    lastTransitionTime: "2024-03-02T08:03:35Z"
    lastProbeTime: null
Karpenter continuously monitors the API server to find pods flagged as Unschedulable. It will then determine how to get these pods scheduled to a new node. This first step is similar to how the Cluster Autoscaler operates by observing Unschedulable pods to determine whether node scaling is necessary.
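One quick way to spot these pods in your own cluster is to filter on the Pending phase, since Unschedulable pods remain Pending until a node becomes available:
kubectl get pods --all-namespaces --field-selector=status.phase=Pending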
Karpenter’s goal is to launch enough worker nodes to satisfy the requirements of any Unschedulable pods. However, Karpenter must evaluate all constraints set in the pods’ attributes to determine what instance configuration to launch, taking into account factors such as resource requests (CPU, memory, and other resources like GPUs), node selectors and affinity rules, taints and tolerations, and topology spread constraints.
Karpenter collects all constraints specified by the Unschedulable pods to determine what kind of instance it can launch to fit the pod and in which availability zone. A common challenge here is ensuring that the CPU and memory requirements are configured correctly for the pod: Misconfigured values will cause Karpenter to select inaccurate instance configurations, leading to wasted resources and excessive costs.
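As a sketch (the pod name and values below are hypothetical), a single pod can carry several of these constraints at once, and Karpenter combines the resource requests with the node selector when choosing an instance type and availability zone:
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-app # hypothetical workload
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a # pin to a specific availability zone
    kubernetes.io/arch: amd64
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "1"      # Karpenter sizes the instance based on these requests
        memory: 2Gi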
Karpenter will now need to match the pod constraints with a set of available node configurations. There are two Karpenter-specific object types we’ll explore here to understand how nodes are configured.
NodePool objects define the desired configuration of worker nodes provisioned by Karpenter. The configuration includes settings like taints, labels, and instance attributes such as spot capacity and GPU hardware. Earlier versions of Karpenter called this object a “Provisioner,” which was deprecated when Karpenter graduated to beta.
Administrators can create multiple NodePool objects to organize their nodes neatly based on separate use cases. For example, nodes belonging to different teams in an organization may have team-specific taint and label configurations, which can then be leveraged by pod affinity rules to ensure that each team’s pods are only scheduled to their own NodePools. When Karpenter attempts to provision a node for Unschedulable pods, it will select an appropriate NodePool where the configuration of the NodePool and the pod match.
Here is an example of a NodePool object that specifies constraints on the operating system (Linux) and a taint:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose
  annotations:
    kubernetes.io/description: "General Purpose"
spec:
  template:
    metadata:
      labels:
        nodepooltype: general-purpose
    spec:
      taints:
      - key: example.com/taint
        effect: NoSchedule
      requirements:
      - key: kubernetes.io/os
        operator: In
        values: ["linux"]
      nodeClassRef:
        name: default
The NodeClass object is the second object administrators must configure for Karpenter—you can see in the example above that a NodeClass is referenced in the NodePool resource. The NodeClass object holds cloud-provider-specific constraints, which will be applied along with the NodePool constraints when Karpenter launches a Node. The schema for a NodeClass will vary depending on what cloud provider Karpenter is deployed to.
Here is an example of an AWS-specific NodeClass object that defines basic subnet, security group, and IAM role settings:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: "cluster-1"
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: "cluster-1"
  role: "KarpenterNodeRole-cluster-1"
The NodeClass object on AWS also supports configuring UserData, tags, BlockDeviceMappings, and MetadataOptions.
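For example, the default NodeClass from above could be extended along these lines to add instance tags, an encrypted root volume, and IMDSv2 enforcement (a sketch with illustrative values; consult the EC2NodeClass documentation for your Karpenter version):
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-cluster-1"
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: "cluster-1"
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: "cluster-1"
  tags:
    team: platform # propagated to the launched EC2 instances
  metadataOptions:
    httpTokens: required # enforce IMDSv2 on the instances
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      volumeSize: 100Gi
      volumeType: gp3
      encrypted: true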
Once Karpenter has matched the pod’s constraints with a particular NodePool, it will use the EC2 API to launch an EC2 instance with the constraints defined in the NodePool and NodeClass objects. EC2 instances are launched quickly by leveraging the EC2 API directly instead of the Cluster Autoscaler’s approach of updating AutoScalingGroup configurations, which involves extra steps that introduce provisioning delays. The Unschedulable pods will schedule to the new node once it joins the cluster, and the pods will no longer be flagged as Unschedulable. Karpenter will continuously poll the API server for Unschedulable pods and run through the above workflow whenever additional nodes are required.
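Each node that Karpenter provisions is tracked by a NodeClaim object, so you can follow provisioning progress directly from kubectl (the output shown below is illustrative):
kubectl get nodeclaims
# NAME                    TYPE        ZONE         NODE                                          READY   AGE
# general-purpose-8xk2p   c5.xlarge   us-west-2a   ip-192-168-50-xx.us-west-2.compute.internal   True    3m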
A significant benefit of the above Kubernetes objects is that the entire cluster’s worker node configuration is managed through Kubernetes YAML manifests. This approach allows administrators to leverage capabilities like version-controlling the node configuration, implementing a GitOps strategy, and using native Kubernetes security controls like role-based access control (RBAC) and API server audit logs.
Traditional tools like the Cluster Autoscaler require administrators to additionally set up AutoScalingGroups, launch templates, and manage node group objects to enable the autoscaling functionality. This introduces complexity for the administrator, who must deploy and manage several pieces of infrastructure to facilitate autoscaling instead of just relying on Karpenter to manage the setup centrally with a Kubernetes-native approach.
Once pods are in the Running state, Karpenter will continue looking for efficiency improvement opportunities. It will regularly evaluate active pods and node utilization across the entire cluster to determine if pods can be consolidated onto fewer nodes, enabling Karpenter to terminate unused nodes to reduce costs. Karpenter also continuously evaluates the instance size configuration across the cluster to determine whether instances can be replaced with smaller, cheaper ones based on real-time workload requirements. Consolidation performed by Karpenter is more intelligent than the Cluster Autoscaler’s because the former takes into account the entire cluster’s node utilization to determine scale-down actions. The Cluster Autoscaler only looks at individual node utilization, which is less effective for accurately bin-packing pods.
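Consolidation behavior can be tuned per NodePool through its disruption block. The snippet below is a minimal sketch of what that configuration might look like on the general-purpose NodePool shown earlier; available fields and defaults vary by Karpenter version:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general-purpose
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized # consolidate whenever pods can be repacked onto cheaper capacity
    expireAfter: 720h # additionally recycle nodes after 30 days
  # template: ... (node template omitted; see the NodePool example earlier)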
Recently, Karpenter has also rolled out a spot-to-spot consolidation feature. This functionality actively tracks spot market prices, enabling Karpenter to replace instances with more cost-efficient alternatives based on dynamic pricing data while carefully balancing the expected interruption rate. Karpenter further enhances spot support by monitoring AWS APIs to proactively detect spot interruption warnings (and other EC2 maintenance-related events), enabling it to launch replacement nodes to reschedule pods immediately before downtime occurs. Traditional projects like the Cluster Autoscaler require the setup of additional tools to allow this type of functionality.
In summary, the four components of Karpenter’s operations are as follows:
1. Watching the API server for pods stuck in the Unschedulable status
2. Evaluating those pods’ scheduling constraints and matching them to a NodePool and NodeClass
3. Launching right-sized EC2 instances directly via the EC2 API
4. Continuously consolidating, replacing, and retiring nodes to keep the cluster cost-efficient
Administrators can significantly reduce waste and optimize costs through Karpenter’s automatic instance rightsizing features during both initial scheduling and consolidation. However, the effectiveness of Karpenter’s features hinges on the accurate configuration of pod resource allocations, specifically CPU and memory, via the pod’s “Requests” fields. If these allocations are not set correctly, Karpenter’s node rightsizing analysis may lead to inefficient resource use through overallocation (resulting in unnecessary costs) or underallocation (leading to performance issues). Therefore, it’s crucial for administrators to meticulously configure pod resource allocations to fully benefit from Karpenter’s capabilities.
Karpenter’s architecture represents a significant advancement in Kubernetes autoscaling options, offering a more efficient and cost-effective solution than traditional tools like the Cluster Autoscaler. By leveraging Kubernetes-native objects like NodePools and NodeClasses, Karpenter simplifies the deployment and management of node resources and allows for precise node provisioning based on real-time workload demands.
This section will show you how to install Karpenter in an EKS cluster to enable experimentation and further learning. The tutorial will guide you through creating an EKS cluster, enabling Fargate support, installing the Karpenter tool, and testing the node autoscaling functionality. We also have a video tutorial, if you prefer.
There are a few tools we’ll use to follow this tutorial: the AWS CLI, eksctl, kubectl, and Helm. Refer to each tool’s documentation for installation instructions.
The tutorial will assume you have IAM permissions to create AWS resources.
First, we’ll define some important settings, such as the Karpenter version, Kubernetes version, and the region where the cluster will be deployed. Adjust these settings carefully based on your requirements:
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="0.35.0"
export K8S_VERSION="1.29"
export AWS_PARTITION="aws"
export CLUSTER_NAME="karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
The following script will deploy an EKS cluster using the eksctl tool and set up some Karpenter-related resources, like the EventBridge rules for monitoring spot interruption events. The script enables Fargate support, which is how we’ll run Karpenter itself with a serverless approach for simplicity: the Karpenter pods need somewhere to run, and leveraging Fargate simplifies this initial bootstrapping step. IAM roles are configured automatically to grant Karpenter the appropriate IAM permissions to manage our EC2 instances. No changes are required to the script below, but administrators are encouraged to review it to understand what is being executed by eksctl and how the ClusterConfig resource works:
# Download the CloudFormation template to create a KarpenterNodeRole
curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > "${TEMPOUT}"
# Create the KarpenterNodeRole
aws cloudformation deploy \
--stack-name "${CLUSTER_NAME}" \
--template-file "${TEMPOUT}" \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides "ClusterName=${CLUSTER_NAME}"
# Create a new EKS cluster
eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: karpenter
      namespace: "${KARPENTER_NAMESPACE}"
    roleName: ${CLUSTER_NAME}-karpenter
    attachPolicyARNs:
    - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
iamIdentityMappings:
- arn: "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes
fargateProfiles:
- name: karpenter
  selectors:
  - namespace: "${KARPENTER_NAMESPACE}"
EOF
# Retrieve the new cluster's details
export CLUSTER_ENDPOINT="$(aws eks describe-cluster --name "${CLUSTER_NAME}" --query "cluster.endpoint" --output text)"
export KARPENTER_IAM_ROLE_ARN="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
# Print the cluster details to the screen
echo "${CLUSTER_ENDPOINT} ${KARPENTER_IAM_ROLE_ARN}"
# Create the service-linked role for EC2 Spot if it doesn't already exist
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
The output of the above should indicate that cluster creation was completed successfully. If you encounter problems (such as IAM errors), refer to AWS’s documentation on troubleshooting cluster creation.
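Before continuing, you can verify that the cluster is active and that the Fargate profile used to host Karpenter exists (expected output is shown as comments):
aws eks describe-cluster --name "${CLUSTER_NAME}" --query "cluster.status" --output text
# ACTIVE
eksctl get fargateprofile --cluster "${CLUSTER_NAME}" --region "${AWS_DEFAULT_REGION}"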
We’re ready to deploy Karpenter to our new cluster; use the Helm command below to begin the installation. There are additional flags available to customize the installation configuration:
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set serviceAccount.create=false \
--set serviceAccount.name=karpenter \
--wait
You can see Karpenter deployed by running:
kubectl describe deployment karpenter --namespace "${KARPENTER_NAMESPACE}"
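You can also confirm that both controller replicas are running. The label selector below assumes the chart's default labels, and the output is illustrative:
kubectl get pods --namespace "${KARPENTER_NAMESPACE}" --selector app.kubernetes.io/name=karpenter
# NAME                         READY   STATUS    RESTARTS   AGE
# karpenter-xxxxxxxxxx-xxxxx   1/1     Running   0          2m
# karpenter-xxxxxxxxxx-yyyyy   1/1     Running   0          2m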
Next, we need to set up some NodePools and NodeClass resources to define our desired worker node configuration. Running the command below will create two NodePools and a NodeClass object, with each NodePool containing a team-based taint. The example will demonstrate how we can set up multiple NodePools for different teams and segregate each workload. The subnet and security groups associated with the EC2 instances are automatically selected based on existing tags, which were all created by eksctl in the previous steps.
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: team-1
spec:
  template:
    spec:
      taints:
      - key: team-1-nodes
        effect: NoSchedule
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        name: default
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: team-2
spec:
  template:
    spec:
      taints:
      - key: team-2-nodes
        effect: NoSchedule
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        name: default
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: "${CLUSTER_NAME}"
EOF
With the steps above completed, the cluster is ready to begin scheduling new pods. Let’s deploy some to validate that Karpenter is working correctly.
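You can quickly confirm that the objects were created before moving on:
kubectl get nodepools,ec2nodeclasses
# Both team NodePools and the default EC2NodeClass should be listed.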
So far, no EC2 instances have been deployed to our cluster since our Karpenter pods are running on serverless Fargate nodes. We can now test whether Karpenter can provision some EC2 nodes by deploying new pods.
We will deploy two pods with the below YAML. Each pod will have a separate toleration, allowing the pods to only schedule to one of the NodePools we defined above. When these two pods are deployed, we expect to see the “team-1-nginx” pod schedule to the “team-1” NodePool, and the “team-2-nginx” pod schedule to the “team-2” NodePool. The example demonstrates how we can configure multiple NodePools to segregate different types of workloads based on taints and tolerations, and rely on Karpenter to launch appropriate nodes to schedule the desired pods:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: team-1-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "0.5"
        memory: 300Mi
  tolerations:
  - key: "team-1-nodes"
    operator: "Exists"
    effect: "NoSchedule"
---
apiVersion: v1
kind: Pod
metadata:
  name: team-2-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "0.5"
        memory: 300Mi
  tolerations:
  - key: "team-2-nodes"
    operator: "Exists"
    effect: "NoSchedule"
EOF
Karpenter will see the pods above stuck in the Unschedulable status since there are no EC2 instances to host them. It will react by provisioning two new nodes to fit our new pods, which we can see as follows:
kubectl logs deploy/karpenter --namespace "${KARPENTER_NAMESPACE}"
# "message":"found provisionable pod(s)"
kubectl get nodes
# ip-192-168-4-xx.us-west-2.compute.internal Ready v1.29.0
# ip-192-168-8-xx.us-west-2.compute.internal Ready v1.29.0
kubectl get pods
# team-1-nginx Running
# team-2-nginx Running
The test above demonstrates that Karpenter is successfully provisioning new worker nodes to host our pods, while respecting the taint and tolerations configuration. You can continue experimenting by extending the NodePools and NodeClass with more granular settings and deploying new workloads with varying constraints (like pod affinity) to observe Karpenter’s provisioning behavior. Deleting the pods will result in Karpenter automatically terminating the excess nodes.
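For example, deleting the two test pods and watching the NodeClaim objects should show the now-empty nodes being removed after a few minutes:
kubectl delete pod team-1-nginx team-2-nginx
kubectl get nodeclaims --watch
# Both NodeClaims (and their nodes) should disappear once Karpenter detects the empty capacity.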
Administrators can implement a few key best practices to fully leverage the value of Karpenter.
Karpenter supports a Kubernetes-native feature called leader election, which allows multiple controller replicas to run in parallel without conflicting: Only one replica makes decisions while the other remains on standby. The standby replica enables high availability by taking over responsibilities if the active replica fails. Karpenter’s Helm chart enables two replicas by default, and in a production cluster, this minimum should be maintained.
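If you need to set the replica count explicitly, the Helm chart exposes it as a value. The sketch below assumes the chart's top-level replicas setting; verify the value name against your chart version:
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace "${KARPENTER_NAMESPACE}" \
  --reuse-values \
  --set replicas=2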
Karpenter exposes many Prometheus metrics by default, which can be scraped by any Prometheus-compatible monitoring tool. Administrators of production clusters should create dashboards and alerts for Karpenter’s metrics to allow for visibility into issues like node provisioning failures. The blast radius of broken autoscaling is significant, so enabling appropriate observability is critical in production. Metrics will also provide insight into whether the Karpenter pods need more CPU/memory resources to prevent autoscaling bottlenecks, especially in large clusters.
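A quick way to explore the available metrics is to port-forward the controller and scrape its metrics endpoint. This assumes the chart's commonly used default metrics port of 8000; adjust the port if your configuration differs:
# Forward the controller's metrics port locally, then list Karpenter metric names
kubectl port-forward --namespace "${KARPENTER_NAMESPACE}" deploy/karpenter 8000:8000 &
curl -s localhost:8000/metrics | grep '^karpenter_' | head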
For critical pods that must not be interrupted until their work is complete, Karpenter supports a valuable annotation that prevents it from terminating the pod’s worker node as part of consolidation efforts. Consider applying this annotation to pods that cannot tolerate interruption. The annotation can also be applied to a worker node object to ensure that the node won’t be consolidated, which is helpful when an administrator needs to keep a particular node online (for example, to gather logs or other data).
kubectl annotate pod pod-name karpenter.sh/do-not-disrupt='true'
kubectl annotate node node-name karpenter.sh/do-not-disrupt='true'
NodePool and NodeClass objects support many configuration parameters that will impact the shape of your cluster. Their values should be carefully evaluated to ensure that the desired node configurations are being deployed and are suitable for your workloads’ use cases.
A key feature of NodePool resources is the “weight” attribute. It is possible to set up multiple NodePools that are compatible with your workloads while setting priorities for Karpenter to respect. For example, a common use case is creating a high-priority NodePool configured for spot or reserved instances for cost optimization purposes. Setting that NodePool’s “weight” attribute to a high value tells Karpenter to prioritize it when provisioning nodes. If those instance types are unavailable or have reached a limit, Karpenter can fall back to a lower-weight NodePool, which might contain regular on-demand instances. Using multiple NodePools this way allows you to prioritize a desired configuration while still ensuring fallbacks are available to avoid blocking pod scheduling.
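Here is a minimal sketch of this pattern with two hypothetical NodePools (one preferring spot capacity with a high weight and an on-demand fallback with a lower weight), both referencing the default NodeClass from earlier:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-preferred # hypothetical name
spec:
  weight: 100 # higher weight is evaluated first
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      nodeClassRef:
        name: default
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand-fallback # hypothetical name
spec:
  weight: 10 # used when the spot NodePool cannot satisfy the pods
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      nodeClassRef:
        name: default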
Accurately configuring pod resource requests is crucial for enabling Karpenter to make correct instance rightsizing decisions. Without accurate resource values, Karpenter cannot select the right instance types, consolidate nodes to reduce waste, or optimize costs.
Administrators typically struggle to rightsize pod resource allocations, especially in large clusters running a wide variety of workloads. The challenge is analyzing each pod’s historical utilization to determine optimal resource allocations, continuously repeating this analysis to keep allocations up to date, and doing all of this at scale.
Oversized resources will cause waste, while undersized resources will create performance bottlenecks, so implementing an automation tool to reduce the operational overhead for administrators and improve the accuracy of resource allocation is recommended. StormForge Optimize Live can assist administrators in addressing these needs as an HPA-compatible automated rightsizing solution powered by machine learning algorithms, removing the need to manually set pod resource limits and requests.
Karpenter represents a significant advancement for Kubernetes autoscaling, offering instance configuration flexibility, cost optimization features, and the ability to manage instances with Kubernetes-native objects. By understanding the architecture and design of Karpenter, it is easy to see how its approach to autoscaling and consolidation can bring value to any EKS cluster. Following the installation tutorial described above allows administrators to get hands-on experience configuring Karpenter and will be a starting point for further experimentation.
Getting the most out of Karpenter requires following best practices around its configuration, NodePool constraints, observability, and accurately defined pod resource allocations to enable precise scaling decisions. Administrators leveraging Karpenter’s rightsizing capabilities can consider testing an automation solution for rightsizing pod CPU/memory resources with StormForge. By using StormForge and Karpenter together, administrators benefit from precise cluster-wide rightsizing and cost optimization, ensuring that resources aren’t wasted and that workload performance is maximized. You can try out StormForge for free.