When scaling applications, relying on technical metrics like CPU or memory is always the easy choice. While effective in some cases, these metrics often miss the bigger picture, failing to align scaling with actual business needs. For example, traffic spikes may not always correlate with key transactions or user actions. This is where scaling based on business metrics becomes crucial: it allows a more precise response to the real demands of the application, optimizing both performance and cost. In today's post, I'll show you how to scale your Kubernetes application by taking advantage of KEDA and Datadog.
Solution overview
For today's lab, we have a basic Python application running on K8S. This application uses OpenTelemetry to export a metric called "number_of_sales" that increases every time the /sales path is called.
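To make the flow concrete, here is a minimal sketch of what such an app can look like. This is an illustration rather than the exact code from the repo: it assumes Flask, the standard OpenTelemetry SDK, and an OTLP/gRPC export to the Datadog agent (the agent service name and port are assumptions).
# sales_app.py - minimal sketch, not the repo's actual code
from flask import Flask
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Ship metrics over OTLP/gRPC to the Datadog agent (service name/port assumed)
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="datadog-agent.datadog:4317", insecure=True)
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("containscloud-demo")
number_of_sales = meter.create_counter("number_of_sales")

app = Flask(__name__)

@app.route("/sales")
def sales():
    # Increment the business metric on every sale
    number_of_sales.add(1)
    return "sale recorded\n"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)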
The Datadog agent receives the metric from the application and pushes it to Datadog. Meanwhile, KEDA queries the metric from Datadog and feeds the HPA.
Once the environment is set up, k6 will stress the sales API to simulate a spike of customers and force the scaling to act.
The app and test are available at: scaling-k8s-datadog-keda
Environment setup
Prerequisites:
- A Kubernetes cluster. In case you don’t have one, you can use the eks-lab-cluster available on my GitHub.
- An API key and an Application key for your Datadog account.
- k6 installed
Setup:
Once the cluster is up and running, let’s start by installing the tools. For KEDA, use the following commands:
## Installing KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
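Before moving on, it's worth checking that the KEDA operator pods are up (a quick sanity check, not part of the original walkthrough):
## Verify that the KEDA pods are running
kubectl get pods -n keda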
For the Datadog Operator and agent, run:
## Add environment variable
export DD_API_KEY="<YOUR_DATADOG_API_KEY>"
## Install datadog-operator using helm
helm repo add datadog https://helm.datadoghq.com
helm install datadog-operator datadog/datadog-operator -n datadog --create-namespace
kubectl create secret generic datadog-secret --from-literal api-key=$DD_API_KEY -n datadog
## Install the DatadogAgent with the OTLP endpoint enabled
kubectl apply -f https://raw.githubusercontent.com/diego7marques/scaling-k8s-datadog-keda/refs/heads/main/k8s/datadog/datadog-agent.yaml
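For reference, a DatadogAgent manifest that enables the agent's OTLP receiver looks roughly like the sketch below; the linked file in the repo is the authoritative version, so exact names and values may differ.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
  features:
    otlp:
      receiver:
        protocols:
          grpc:
            enabled: true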
With all the tools up and running, we must deploy the application and the autoscaling configuration:
## Add environment variables
export DD_API_KEY="<YOUR_DATADOG_API_KEY>"
export DD_APP_KEY="<YOUR_DATADOG_APP_KEY>"
export DD_SITE="datadoghq.com"
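## The secret below lives in the app namespace, so make sure it exists first
## (skip this if the namespace is created elsewhere, e.g. by app.yaml)
kubectl create namespace containscloud-demo --dry-run=client -o yaml | kubectl apply -f -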
## Create a secret for KEDA to authenticate with Datadog
kubectl create secret generic datadog-secrets \
--namespace=containscloud-demo \
--from-literal=apiKey=$DD_API_KEY \
--from-literal=appKey=$DD_APP_KEY \
--from-literal=datadogSite=$DD_SITE
## Apply the app yaml
kubectl apply -f https://raw.githubusercontent.com/diego7marques/scaling-k8s-datadog-keda/refs/heads/main/app/manifests/app.yaml
## Apply the autoscaling yaml
kubectl apply -f https://raw.githubusercontent.com/diego7marques/scaling-k8s-datadog-keda/refs/heads/main/app/manifests/autoscaling.yaml
The autoscaling is configured with a minimum of 1 pod and a maximum of 6 pods. Scaling is triggered whenever the number_of_sales metric reaches 30 sales in the last 90s. The configuration is specified through the ScaledObject resource, as you can see in the following example:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: containscloud-scaledobject
  namespace: containscloud-demo
spec:
  scaleTargetRef:
    name: containscloud-app
  # Minimum number of replicas
  minReplicaCount: 1
  # Maximum number of replicas
  maxReplicaCount: 6
  triggers:
    # Use Datadog as the trigger for the scaling
    - type: datadog
      # Whether the target value is global or average per pod
      metricType: "Value"
      metadata:
        # Datadog metric query
        query: "sum:number_of_sales{*}.as_count()"
        # Value that will trigger the scaling of the scaleTargetRef
        queryValue: "30"
        # The metric value to report to the HPA if no metric value is found
        metricUnavailableValue: "0"
      authenticationRef:
        name: keda-trigger-auth-datadog-secret
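The authenticationRef above points to a TriggerAuthentication resource that tells KEDA's Datadog scaler how to read its credentials from the datadog-secrets secret we created earlier. It isn't shown in the post, but given the secret keys it should look roughly like this sketch (the authoritative definition lives in autoscaling.yaml):
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-datadog-secret
  namespace: containscloud-demo
spec:
  secretTargetRef:
    # Map each scaler parameter to a key in the datadog-secrets secret
    - parameter: apiKey
      name: datadog-secrets
      key: apiKey
    - parameter: appKey
      name: datadog-secrets
      key: appKey
    - parameter: datadogSite
      name: datadog-secrets
      key: datadogSite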
The scaling test
For the scaling test, k6 plays the role of simulating the customers' load under the following conditions:
- The test has a limit of 2 minutes of execution
- The number of Virtual Users (VUs) is 10
- There is a sleep of 3s between iterations from the same VU, to simulate real-world usage
- The test ends when it reaches 300 iterations OR when the time limit is reached
The test code:
// stress_test.js
import http from 'k6/http';
import { sleep } from 'k6';

// Read the API path from an environment variable
const API_PATH = __ENV.API_PATH;
const BASE_URL = '<BASE_ENDPOINT>';
const API_URL = BASE_URL + API_PATH;

// Define the options for the load test
export let options = {
  duration: '2m',
  vus: 10, // Number of Virtual Users
  iterations: 300, // Total number of requests
};

export default function () {
  // Send a GET request to the API
  http.get(API_URL);
  // Optional: Add a sleep (3s) to simulate real-world usage
  sleep(3);
}
To run the test, perform the following command:
API_PATH=/sales k6 run stress_test.js
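While the test runs, you can watch the scaling react (optional, not part of the original walkthrough; each command below blocks, so use one terminal per command):
## Watch the HPA that KEDA created and the pods scaling out
kubectl get hpa -n containscloud-demo -w
kubectl get pods -n containscloud-demo -w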
Results
On Datadog, we can see the number_of_sales metric spike:
Conclusion
As we saw today, KEDA can be a great ally in the autoscaling journey, expanding the horizon of possible triggers and ensuring that you can use the most important metrics for your business.