
Is the Amazon API Gateway cache feature a game changer?

When discussing API performance, caching often emerges as a key strategy. By storing responses from frequently accessed endpoints that are static or change infrequently, caching reduces server load and speeds up response times, making it a go-to solution for performance optimization.

There are many ways to implement caching, from the application layer (in the code) to the infrastructure layer. Today we are going to test one of the easiest ways to do it: enabling it directly in the API Gateway. This method requires little configuration effort and no code changes, but does it really make a difference? Let’s test it.

The test environment

The architecture of the application used for the test consists of a Lambda function that simulates the API and an API Gateway API with two stages: one called nocache, with caching disabled, and another named cache, with caching enabled (yeah, I know, pretty obvious xD).

The API (Lambda) always returns the same body and has a response time between 300ms and 600ms to approximate a real-world API. The API Gateway cache TTL (Time to Live) was configured to 180s (3 minutes).

Lambda code:

import time
import random

def lambda_handler(event, context):
    # Generate a random delay between 300ms (0.3s) and 600ms (0.6s)
    delay = random.uniform(0.3, 0.6)
    
    # Add the delay
    time.sleep(delay)
    
    # Return a success message
    return {
        'statusCode': 200,
        'body': 'Success',
    }

As the Terraform code that creates the resources is extensive, I’ve made it available on GitHub.
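The cache itself boils down to a couple of stage settings. As a reference, here is a minimal boto3 sketch of the same configuration applied to the cache stage (the API id, region and stage name are placeholders; in my setup these resources are created by the Terraform code):

import boto3

# Placeholder region and REST API id: replace with your own values.
apigw = boto3.client('apigateway', region_name='us-east-1')

apigw.update_stage(
    restApiId='abc123',
    stageName='cache',
    patchOperations=[
        # Enable the dedicated cache cluster for the stage (0.5 GB is the smallest size)
        {'op': 'replace', 'path': '/cacheClusterEnabled', 'value': 'true'},
        {'op': 'replace', 'path': '/cacheClusterSize', 'value': '0.5'},
        # Turn caching on for every method and set the TTL to 180 seconds
        {'op': 'replace', 'path': '/*/*/caching/enabled', 'value': 'true'},
        {'op': 'replace', 'path': '/*/*/caching/ttlInSeconds', 'value': '180'},
    ],
)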

The load test

First of all, let’s set the conditions of the test:

  • The test has a limit of 10 minutes of execution
  • The number of Virtual Users is 990 to match the Lambda provisioned concurrency configuration, so we don’t get throttled during the test (a sketch of that configuration follows this list)
  • There is a sleep of 0.2s between requests from the same VU, to try to simulate real-world usage
  • The test ends as soon as we reach 1M requests OR the time is over.
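The provisioned concurrency itself is created by the Terraform code linked above; just to illustrate what that setting looks like, here is a boto3 sketch (the function name, region and version are placeholders):

import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

# Provisioned concurrency must point to a published version or an alias, not $LATEST.
# 'fake-api' and version '1' are placeholders for the real function.
lambda_client.put_provisioned_concurrency_config(
    FunctionName='fake-api',
    Qualifier='1',
    ProvisionedConcurrentExecutions=990,  # matches the 990 k6 Virtual Users
)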

With the conditions settled, it’s time to write the test code. For this scenario, we are going to use k6, the open-source load testing tool from Grafana Labs. Here is the code:

import http from 'k6/http';
import { sleep } from 'k6';

// Read the API path from an environment variable
const API_PATH = __ENV.API_PATH;
const BASE_URL = 'https://<API_GATEWAY_ID>.execute-api.<REGION>.amazonaws.com';
const API_URL = BASE_URL + API_PATH;

// Define the options for the load test
export let options = {
    duration: '10m',
    vus: 990, // Number of Virtual Users
    iterations: 1000000, // Total number of requests
};

export default function () {
    // Send a GET request to the API
    http.get(API_URL);

    // Optional: Add a small sleep to simulate real-world usage
    sleep(0.2);
}
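The same script is run once per stage by passing the stage path through the environment variable, e.g. k6 run -e API_PATH=/nocache loadtest.js and then again with -e API_PATH=/cache (the loadtest.js file name is just a placeholder for wherever you saved the script).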

First test (without cache):

The first test took 9m17s to reach 1M requests with a throughput of ~1793 requests per second and an average response time of ~552ms. The requests were distributed in this way within the 9 minutes:

** Apparently, the lambda metrics reach CloudWatch faster than the API Gateway metrics, which gives the perception that Lambda started first, but it didn’t.

Second test (with cache):

The second test took 5m52s to reach 1M requests with a throughput of ~2923 requests per second and an average response time of ~338ms. The requests were distributed in this way within the 5 minutes:

** Apparently, the lambda metrics reach CloudWatch faster than the API Gateway metrics, which gives the perception that Lambda started first, but it didn’t.

Summary & conclusions

  • Throughput: 1792.42 requests/sec vs 2922.82 requests/sec ( ⬆️ ~63% higher with cache)
  • Average response time: 551.88ms vs 338.27ms ( ⬆️ ~39% faster with cache)
  • Backend API invocations (Lambda): 1M invocations vs 5 invocations ( ⬆️ ~99.99% fewer invocations with cache)

Based on the data, enabling the cache delivers substantial performance improvements with very little implementation effort. Another important takeaway is that caching does not necessarily mean an increase in costs: since far fewer calls are made to the API backend, it can actually reduce costs at that layer. Our test is a good example of this, since Lambda is charged per invocation and ~99.99% fewer invocations were made with cache enabled.
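To make the cost argument a bit more concrete, here is a rough back-of-the-envelope sketch of the Lambda side only. The memory size and the on-demand prices are assumptions on my part (check the current pricing page), and it deliberately ignores the cost of the provisioned concurrency used in the test and of the API Gateway cache itself, which is billed per hour based on its size and has to be weighed against these savings:

# Illustrative only: assumed 128 MB memory, ~0.45s average duration and
# standard on-demand Lambda prices (~$0.20 per 1M requests, ~$0.0000166667 per GB-second).
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667
MEMORY_GB = 128 / 1024
AVG_DURATION_S = 0.45

def lambda_cost(invocations):
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = invocations * AVG_DURATION_S * MEMORY_GB * PRICE_PER_GB_SECOND
    return request_cost + compute_cost

print(f"Without cache (1M invocations): ~${lambda_cost(1_000_000):.2f}")   # roughly $1.14
print(f"With cache (5 invocations):     ~${lambda_cost(5):.6f}")           # essentially zero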

