When scaling applications, relying on technical metrics like CPU or memory is always the easy choice. While effective in some cases, these metrics often miss the bigger picture, failing to align scaling with actual business needs. For example, traffic spikes may not always correlate with key transactions or user actions. This is where scaling based on business metrics becomes crucial: it allows a more precise response to the real demands of the application, optimizing both performance and cost. In today's post, I'll show you how to scale your Kubernetes application by taking advantage of KEDA and Datadog.
Solution overview
For today's lab, we have a basic Python application running on K8S. This application uses OpenTelemetry to export a metric called "number_of_sales" that increases every time the /sales path is called.
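To make the flow concrete, here is a minimal sketch of what such an app can look like. This is an illustration rather than the exact code from the repo: it assumes Flask, the standard OpenTelemetry SDK, and an OTLP/gRPC export to the Datadog agent (the agent service name and port are assumptions).
# sales_app.py - minimal sketch, not the repo's actual code
from flask import Flask
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Ship metrics over OTLP/gRPC to the Datadog agent (service name/port assumed)
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="datadog-agent.datadog:4317", insecure=True)
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("containscloud-demo")
number_of_sales = meter.create_counter("number_of_sales")

app = Flask(__name__)

@app.route("/sales")
def sales():
    # Increment the business metric on every sale
    number_of_sales.add(1)
    return "sale recorded\n"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)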
The Datadog agent receives the metric from the application and pushes it to Datadog. Meanwhile, KEDA queries the metric from Datadog and feeds the HPA.
Once the environment is set up, k6 will stress the sales API to simulate a spike of customers and force the scaling to act.
The app and test are available at: scaling-k8s-datadog-keda
Environment setup
Prerequisites:
- A Kubernetes cluster. In case you don’t have one, you can use the eks-lab-cluster available on my GitHub.
- An API key and an Application key for your Datadog account.
- k6 installed
Setup:
Once the cluster is up and running, let’s start by installing the tools. For KEDA, use the following commands:
## Installing KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
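Before moving on, it's worth checking that the KEDA operator pods are up (a quick sanity check, not part of the original walkthrough):
## Verify that the KEDA pods are running
kubectl get pods -n keda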
For the Datadog Operator and agent, run:
## Add environment variable
export DD_API_KEY="<YOUR_DATADOG_API_KEY>"
## Install datadog-operator using helm
helm repo add datadog https://helm.datadoghq.com
helm install datadog-operator datadog/datadog-operator -n datadog --create-namespace
kubectl create secret generic datadog-secret --from-literal api-key=$DD_API_KEY -n datadog
## Install the DatadogAgent with the OTLP endpoint enabled
kubectl apply -f https://raw.githubusercontent.com/diego7marques/scaling-k8s-datadog-keda/refs/heads/main/k8s/datadog/datadog-agent.yaml
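For reference, a DatadogAgent manifest that enables the agent's OTLP receiver looks roughly like the sketch below; the linked file in the repo is the authoritative version, so exact names and values may differ.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
  features:
    otlp:
      receiver:
        protocols:
          grpc:
            enabled: true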
With all the tools up and running, we must deploy the application and the autoscaling configuration:
## Add environment variables
export DD_API_KEY="<YOUR_DATADOG_API_KEY>"
export DD_APP_KEY="<YOUR_DATADOG_APP_KEY>"
export DD_SITE="datadoghq.com"
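## The secret below lives in the app namespace, so make sure it exists first
## (skip this if the namespace is created elsewhere, e.g. by app.yaml)
kubectl create namespace containscloud-demo --dry-run=client -o yaml | kubectl apply -f -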
## Create a secret for KEDA to authenticate with Datadog
kubectl create secret generic datadog-secrets \
--namespace=containscloud-demo \
--from-literal=apiKey=$DD_API_KEY \
--from-literal=appKey=$DD_APP_KEY \
--from-literal=datadogSite=$DD_SITE
## Apply the app yaml
kubectl apply -f https://raw.githubusercontent.com/diego7marques/scaling-k8s-datadog-keda/refs/heads/main/app/manifests/app.yaml
## Apply the autoscaling yaml
kubectl apply -f https://raw.githubusercontent.com/diego7marques/scaling-k8s-datadog-keda/refs/heads/main/app/manifests/autoscaling.yaml
The autoscaling is configured with a minimum of 1 pod and a maximum of 6 pods. Scaling is triggered whenever the number_of_sales metric reaches 30 sales in the last 90s. The configuration is specified through the ScaledObject resource, as you can see in the following example:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: containscloud-scaledobject
  namespace: containscloud-demo
spec:
  scaleTargetRef:
    name: containscloud-app
  # Minimum number of replicas
  minReplicaCount: 1
  # Maximum number of replicas
  maxReplicaCount: 6
  triggers:
    # Use Datadog as the trigger for the scaling
    - type: datadog
      # Whether the target value is global or average per pod
      metricType: "Value"
      metadata:
        # Datadog metric query
        query: "sum:number_of_sales{*}.as_count()"
        # Value that will trigger the scaling of the scaleTargetRef
        queryValue: "30"
        # The metric value to report to the HPA if no metric value is found
        metricUnavailableValue: "0"
      authenticationRef:
        name: keda-trigger-auth-datadog-secret
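The authenticationRef above points to a TriggerAuthentication resource that tells KEDA's Datadog scaler how to read its credentials from the datadog-secrets secret we created earlier. It isn't shown in the post, but given the secret keys it should look roughly like this sketch (the authoritative definition lives in autoscaling.yaml):
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-datadog-secret
  namespace: containscloud-demo
spec:
  secretTargetRef:
    # Map each scaler parameter to a key in the datadog-secrets secret
    - parameter: apiKey
      name: datadog-secrets
      key: apiKey
    - parameter: appKey
      name: datadog-secrets
      key: appKey
    - parameter: datadogSite
      name: datadog-secrets
      key: datadogSite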
The scaling test
For the scaling test, k6 plays the role of simulating the customers' load under the following conditions:
- The test has a limit of 2 minutes of execution
- The number of Virtual Users (VUs) is 10
- There is a sleep of 3s between iterations from the same VU, to simulate real-world usage
- The test ends when it reaches 300 iterations OR when the time limit is reached
The test code:
// stress_test.js
import http from 'k6/http';
import { sleep } from 'k6';

// Read the API path from an environment variable
const API_PATH = __ENV.API_PATH;
const BASE_URL = '<BASE_ENDPOINT>';
const API_URL = BASE_URL + API_PATH;

// Define the options for the load test
export let options = {
  duration: '2m',
  vus: 10, // Number of Virtual Users
  iterations: 300, // Total number of requests
};

export default function () {
  // Send a GET request to the API
  http.get(API_URL);
  // Optional: Add a sleep (3s) to simulate real-world usage
  sleep(3);
}
To run the test, perform the following command:
API_PATH=/sales k6 run stress_test.js
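While the test runs, you can watch the scaling react (optional, not part of the original walkthrough; each command below blocks, so use one terminal per command):
## Watch the HPA that KEDA created and the pods scaling out
kubectl get hpa -n containscloud-demo -w
kubectl get pods -n containscloud-demo -w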
Results
On Datadog, we can see the number_of_sales metric spike:
Conclusion
As we saw today, KEDA can be a great ally in the autoscaling journey, expanding the horizon of possible triggers and ensuring that you can use the most important metrics for your business.