Chapter 5 - Advanced Autoscaling in Kubernetes with KEDA
The Kubernetes Horizontal Pod Autoscaler (HPA) is a foundational component for autoscaling that can be enhanced when used with Kubernetes Event-Driven Autoscaling (KEDA). As an advanced open-source project, KEDA builds on the HPA to provide significantly more flexibility, easy-to-use options for various metrics out of the box, and the important ability to scale applications to zero.
KEDA configures the HPA to manage pod activity effectively, ensuring responsiveness remains consistent while accommodating dynamic environments. This makes KEDA particularly useful for workloads that see fluctuating traffic or spikes due to events.
KEDA offers a more nuanced approach to autoscaling, one that aligns with the demands of modern, event-driven applications. It was designed to extend the horizontal autoscaling capabilities of Kubernetes, enabling more precise and efficient scaling decisions. Its scaling actions are based on a variety of event sources and metrics, addressing critical challenges that the HPA faces in diverse operational contexts.
The HPA primarily scales applications based on CPU and memory metrics, adjusting the number of replicas to match demand. While the HPA also supports scaling on other metrics, such as network traffic or custom indicators, configuring them requires careful setup and a strong grasp of Kubernetes internals. This traditional approach also relies on static thresholds, which can lead to resource overprovisioning or shortages during unexpected demand spikes, affecting both application performance and cost efficiency.
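For comparison, a minimal CPU-based HPA manifest looks like this (the target deployment name and thresholds are illustrative):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # illustrative target deployment
  minReplicas: 1                   # the HPA keeps at least one replica running
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out when average CPU crosses 70%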
There is a growing need for more sophisticated scaling solutions capable of responding to external events and a broader array of metrics. For example, while scaling on CPU utilization is easy to set up, CPU often correlates poorly with the actual load on the application, so replicas aren't added or removed when they actually need to be.
In scenarios like flash sales or viral social media events, where traffic surges are sudden and unpredictable, the HPA struggles to scale effectively. These instances require a more responsive scaling mechanism, which is where an event-driven solution like KEDA comes into play. KEDA is designed to address these specific challenges by enabling Kubernetes to react not just to changes in CPU or memory metrics, but also to a multitude of external events, thereby providing a more fluid and efficient scaling response.
KEDA is a lightweight, open-source tool that enhances Kubernetes’ horizontal autoscaling capabilities by integrating event-driven scaling, allowing applications to respond more dynamically to real-world demands. KEDA functions as a bridge between Kubernetes workloads and various event sources, enabling more efficient resource management.
KEDA allows for scaling based on a multitude of external triggers, such as messages in a queue, workload in databases, or events in a stream. This capability extends beyond conventional CPU and memory metrics, facilitating scaling actions that are precisely aligned with actual usage patterns and demand spikes. By doing so, KEDA optimizes resource utilization and cost efficiency, which makes it crucial for managing cloud-native applications in variable traffic environments.
The core of KEDA’s functionality lies in its architecture, which includes custom resource definitions (CRDs) such as ScaledObjects and ScaledJobs. These CRDs allow developers to define scaling rules based on external events, seamlessly integrating with existing Kubernetes ecosystems. By leveraging a wide range of supported event sources, KEDA empowers developers to build more responsive and cost-effective applications, representing a significant advancement in cloud-native application scaling.
KEDA distinguishes itself from Kubernetes' Horizontal Pod Autoscaler (HPA) by introducing two crucial new components:
- The KEDA operator, an agent that connects to event sources, activates and deactivates workloads, and drives scaling decisions
- A metrics server that acts as an adapter, exposing event data to the HPA as external metrics
These components facilitate direct interaction with a broad array of event sources, streamlining the autoscaling process. Unlike the HPA, which may require additional setup and customization for similar capabilities, KEDA's built-in functionality offers a ready-to-use solution for event-driven scaling.
Consider the use case of scaling based on messages in an AWS SQS queue. With HPA, this scenario involves using an adapter like k8s-cloudwatch-adapter, which takes data from SQS and feeds it to the Kubernetes metrics-server. KEDA simplifies this by directly supporting AWS SQS as an event source, allowing for straightforward setup without the overhead of developing custom solutions.
KEDA does not operate its own standalone metrics server like the default Kubernetes Metrics Server; instead, it acts as a metrics adapter. KEDA introduces an additional layer that allows the Kubernetes HPA to scale applications based on a variety of external metrics not typically available to the HPA. This includes metrics from various event sources, which KEDA fetches and exposes via the external.metrics.k8s.io API. The metrics can be queried using kubectl commands that are targeted toward this custom API endpoint provided by KEDA.
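For example, after KEDA is installed you can inspect this API directly. The metric name below is illustrative; KEDA generates names such as s0-<scaler-type> per ScaledObject, and requires the ScaledObject name as a label selector:
# List the external metrics exposed through KEDA
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
# Query one metric, scoped to the owning ScaledObject
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/s0-aws-sqs-queue?labelSelector=scaledobject.keda.sh/name=sqs-scaledobject"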
To support a wide array of use cases, KEDA's metrics sources can be grouped into three primary categories based on their nature:
- Message queues, such as AWS SQS or RabbitMQ
- Databases, where scaling reacts to workload or pending records
- Event streams, such as Apache Kafka
KEDA employs distinct authentication mechanisms tailored to each category of metrics source: for example, cloud IAM integration for managed queues, connection strings for databases, and SASL or TLS credentials for Kafka streams.
All of these are managed securely within the Kubernetes environment, facilitating a seamless and secure connection to external metrics sources. KEDA provides a few secure patterns to manage authentication flows:
- Referencing Kubernetes Secrets or environment variables from the target workload
- Dedicated TriggerAuthentication and ClusterTriggerAuthentication resources, which can reference Secrets or pod identity providers such as AWS IAM or Azure Workload Identity
One of these patterns is sketched below.
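Here is a minimal sketch of the second pattern: a TriggerAuthentication resource that pulls a credential from a Kubernetes Secret (the Secret name and key are hypothetical), which a ScaledObject trigger can then reference through authenticationRef:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: queue-auth
  namespace: default
spec:
  secretTargetRef:
  - parameter: connection             # trigger parameter to populate
    name: queue-connection-secret     # hypothetical Secret in the same namespace
    key: connectionString             # key within that Secret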
KEDA enhances Kubernetes autoscaling with a few notable features:
- Scaling workloads all the way down to zero replicas when no events are pending
- A broad catalog of out-of-the-box scalers, plus support for custom and external scalers
- Schedule-based scaling through a cron trigger
We'll look at each of these in turn.
KEDA's scale-down-to-zero functionality manages resource allocation efficiently by deactivating idle pods when no active tasks are detected. This is enabled through event monitoring, which adjusts the pod count to zero when there is a lack of demand, conserving resources when they are not needed.
KEDA employs a unique method to handle scaling to zero, which involves bypassing the typical behavior of the HPA that it manages. When no events are detected that would trigger scaling, KEDA scales the deployment down to zero replicas. This is in contrast to the usual HPA behavior where at least one replica would typically remain active. Once events occur that meet the defined triggers, KEDA then instructs the HPA to scale up from zero accordingly, effectively reactivating the pods as needed.
This capability offers superior resource efficiency compared to the HPA, which only scales down to a "non-zero" minimum configured pod count. HPA's limitation in not scaling down completely means there is always some level of resource consumption, even when it's unnecessary. KEDA significantly reduces costs and optimizes cloud resource management by entirely eliminating unnecessary resource usage when idle.
An example of this functionality can be seen in event-driven applications, such as those processing queue messages. A system using KEDA can monitor a message queue and scale to multiple pods when messages are detected, scaling back down to zero when the queue is empty. This ensures that resources are only used when absolutely necessary, exemplifying KEDA’s efficient approach to scaling in modern cloud architectures.
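In practice, this behavior is controlled from the ScaledObject (the full spec appears later in this chapter); a minimal fragment with illustrative values:
spec:
  minReplicaCount: 0    # the default; lets KEDA deactivate the workload entirely when idle
  cooldownPeriod: 300   # seconds to wait after the last active trigger before scaling to zero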
Two distinct features that enhance KEDA’s flexibility in autoscaling are its ability to integrate custom scalers and its scheduling capabilities using a cron trigger.
KEDA allows the implementation of custom scalers, providing full control over the metrics that drive autoscaling decisions. While version 2 of the HPA can indeed scale on any custom metric, setting this up can be challenging and complex. In contrast, KEDA offers a wide range of out-of-the-box scaling options that simplify integration with various metrics sources. This makes KEDA particularly valuable for environments needing to scale based on non-standard or external metrics efficiently.
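As a sketch, a custom scaler is usually wired in through KEDA's external trigger type, which points at a gRPC service implementing KEDA's external scaler contract (the address below is hypothetical):
triggers:
- type: external
  metadata:
    scalerAddress: custom-scaler.default.svc.cluster.local:8080   # hypothetical gRPC endpoint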
KEDA also introduces scheduling capabilities, such as scaling with a cron trigger, which adds another layer of flexibility. This feature allows users to define scaling actions at specific times, accommodating predictable workload variations, such as increased load during business hours or special events. For instance, an e-commerce platform could schedule additional resources in anticipation of high traffic during a promotional campaign, ensuring optimal performance when it matters most.
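A cron trigger for such a schedule might look like the following sketch (timezone, schedule, and replica count are illustrative):
triggers:
- type: cron
  metadata:
    timezone: America/New_York   # IANA timezone name
    start: 0 8 * * *             # scale up at 8:00 AM
    end: 0 20 * * *              # scale back down at 8:00 PM
    desiredReplicas: "10"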
Before we look at a scenario to implement, it's useful to understand how KEDA lets you specify the Kubernetes Deployment or StatefulSet to scale based on specific triggers. This functionality is implemented through the ScaledObject custom resource definition (CRD).
The specification below uses a ScaledObject to define how KEDA should scale your application and what the triggers are:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true" # Optional. Use to transfer an existing HPA ownership to this ScaledObject
    validations.keda.sh/hpa-ownership: "true"           # Optional. Use to disable HPA ownership validation on this ScaledObject
    autoscaling.keda.sh/paused: "true"                  # Optional. Use to pause autoscaling of objects explicitly
spec:
  scaleTargetRef:
    apiVersion: {api-version-of-target-resource}        # Optional. Default: apps/v1
    kind: {kind-of-target-resource}                     # Optional. Default: Deployment
    name: {name-of-target-resource}                     # Mandatory. Must be in the same namespace as the ScaledObject
    envSourceContainerName: {container-name}            # Optional. Default: .spec.template.spec.containers[0]
  pollingInterval: 30                                   # Optional. Default: 30 seconds
  cooldownPeriod: 300                                   # Optional. Default: 300 seconds
  idleReplicaCount: 0                                   # Optional. Default: ignored, must be less than minReplicaCount
  minReplicaCount: 1                                    # Optional. Default: 0
  maxReplicaCount: 100                                  # Optional. Default: 100
  fallback:                                             # Optional. Section to specify fallback options
    failureThreshold: 3                                 # Mandatory if fallback section is included
    replicas: 6                                         # Mandatory if fallback section is included
  advanced:                                             # Optional. Section to specify advanced options
    restoreToOriginalReplicaCount: true/false           # Optional. Default: false
    horizontalPodAutoscalerConfig:                      # Optional. Section to specify HPA related options
      name: {name-of-hpa-resource}                      # Optional. Default: keda-hpa-{scaled-object-name}
      behavior:                                         # Optional. Use to modify HPA's scaling behavior
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
  triggers:
  # {list of triggers to activate scaling of the target resource}
More details on the spec can be found in the official KEDA documentation. Let's review how we can use it in a real-world scenario.
Let’s imagine a scenario where a data processing company is handling various workloads across different systems. The primary workload is a web application hosted on a Kubernetes deployment that monitors incoming orders through an AWS SQS queue. A second workload involves a StatefulSet maintaining customer analytics data, which needs to process incoming Kafka streams to keep dashboards up to date. Finally, for performance metrics and reporting, a batch job generates regular reports using Prometheus metrics that evaluate application health.
For this scenario, we will use a combination of ScaledObjects with standard Kubernetes workloads, as well as ScaledJobs. KEDA's ScaledObjects enable flexible scaling across various workloads, including:
- Deployments
- StatefulSets
- Any custom resource that implements the /scale subresource
With custom triggers, applications can be scaled efficiently based on event-driven metrics, such as message queue lengths or specific database conditions. This adaptability ensures optimal resource management for a wide range of architectures. Let’s review what the manifests would look like for such a scenario.
Here is how to install KEDA in the Kubernetes cluster with Helm:
# Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
# Update your Helm repository
helm repo update
# Install KEDA into your Kubernetes cluster
helm install keda kedacore/keda --namespace keda --create-namespace
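Once the chart is installed, you can confirm that the KEDA components are up before proceeding:
# The keda-operator and metrics API server pods should be Running
kubectl get pods --namespace keda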
First, ensure the IAM role for the Kubernetes service account is set up with the right trust relationship. This uses EKS IAM Roles for Service Accounts (IRSA), which federates against the cluster's OIDC provider.
Save the following trust policy to a file named trust-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/YOUR_OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/YOUR_OIDC_ID:sub": "system:serviceaccount:default:keda-sqs-sa"
        }
      }
    }
  ]
}
Then, create the IAM role using the AWS CLI:
aws iam create-role --role-name KedaSQSRole --assume-role-policy-document file://trust-policy.json
Attach a policy granting SQS read access to this role; here we use the AWS-managed AmazonSQSReadOnlyAccess policy:
aws iam attach-role-policy --role-name KedaSQSRole --policy-arn arn:aws:iam::aws:policy/AmazonSQSReadOnlyAccess
Next, define the service account annotated with the IAM role's ARN, and the deployment that uses it:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-sqs-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/KedaSQSRole"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      serviceAccountName: keda-sqs-sa
      containers:
      - name: order-container
        image: order-processor-image
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: order-processor
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/order-queue"
      awsRegion: "us-east-1"
      queueLength: "10"
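With the manifests applied, you can watch KEDA drive the scaling; the file name here is illustrative, and the HPA shown is the one KEDA creates behind the scenes (named keda-hpa-<scaled-object-name> by default):
kubectl apply -f order-processor.yaml
kubectl get hpa keda-hpa-sqs-scaledobject --namespace default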
For the second workload, the analytics StatefulSet consumes the Kafka stream that keeps the dashboards up to date:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: analytics-processor
  namespace: default
spec:
  selector:
    matchLabels:
      app: analytics-processor
  serviceName: "analytics-service"
  template:
    metadata:
      labels:
        app: analytics-processor
    spec:
      containers:
      - name: analytics-container
        image: analytics-processor-image
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: analytics-processor
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: "kafka-broker1:9092,kafka-broker2:9092"
      topic: "customer-analytics"
      consumerGroup: "analytics-consumer-group"
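You can verify that the Kafka trigger is healthy and currently firing by checking the ScaledObject's status columns:
# The READY and ACTIVE columns reflect trigger health and activity
kubectl get scaledobject kafka-scaledobject --namespace default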
Using a ScaledJob differs from the combination of a standard workload and a ScaledObject in that it allows for the scaling of batch jobs specifically, providing built-in parallelism and completions directly in the job configuration. This setup simplifies the management of repetitive or data-processing tasks that require dynamic scaling and completion control, making it ideal for batch processing or cron-like jobs.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: health-report-job
  namespace: default
spec:
  jobTargetRef:
    parallelism: 2
    completions: 2
    template:
      metadata:
        labels:
          app: health-report-job
      spec:
        containers:
        - name: health-container
          image: health-report-image
        restartPolicy: Never
  triggers:
  - type: prometheus
    metadata:
      serverAddress: "http://prometheus-server"
      metricName: "http_requests_total"
      threshold: "100"
      query: "sum(rate(http_requests_total[5m]))"
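Unlike a ScaledObject, each triggering event here results in Kubernetes Jobs being created rather than a long-lived workload being resized; the spawned Jobs can be observed directly:
kubectl get scaledjob health-report-job --namespace default
kubectl get jobs --namespace default --watch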
This scenario setup demonstrates how KEDA’s ScaledObjects and ScaledJobs can be used to scale a diverse set of workloads dynamically and efficiently.
StormForge’s machine learning complements KEDA’s event-driven autoscaling by providing precise recommendations for CPU and memory allocations based on real-world application usage data. While KEDA ensures horizontal scalability through a broad set of triggers and event-driven metrics, StormForge offers tailored vertical scaling adjustments, removing guesswork and ensuring optimal resource allocation. Together, KEDA and StormForge create a comprehensive scaling strategy that improves performance and cost efficiency, covering both horizontal and vertical scaling needs for dynamic Kubernetes workloads.
StormForge could be applied to the data processing scenario described above, providing precise recommendations for optimal resource allocation in each workload: right-sizing the SQS-driven order processor, the Kafka-fed analytics StatefulSet, and the Prometheus-triggered report jobs.
KEDA has markedly advanced Kubernetes autoscaling, addressing the complexities of dynamic, cloud-native environments through event-driven horizontal scaling. Its integration simplifies scaling based on real-time data, enhancing application responsiveness and efficiency.
As KEDA evolves, it's poised for further enhancements, potentially incorporating AI and ML for more predictive scaling capabilities. In tandem with solutions like StormForge and Karpenter, KEDA is crucial to a comprehensive scaling strategy. This ensures that Kubernetes resource management is more efficient and aligned with the demands of modern applications, setting the stage for future innovations in cloud infrastructure.