Blog

Advice on Performance from the AWS ‘Well-Architected Framework’


By Denise Schynol | Jun 01, 2021


The AWS Well-Architected Framework is a guideline for reviewing and improving cloud-based architectures. Amazon Web Services, or AWS for short, recommends “general design principles” (p. 4) and describes best-practice examples.

In this blog, we sum up some advice on load and performance testing.

First, let’s start with an explanation of the general design principles suggested by AWS followed by a closer look at some of the so-called “five pillars” of the AWS Well-Architected Framework (p. 6).

Security, Reliability, Performance Efficiency, Cost Optimization & Operational Excellence

In its definition of a well-architected framework, AWS describes how it helps customers make “architectural trade-offs as your designs evolve” (p. 2) and what can be learned about performance after deploying into a live environment. Based on this, AWS created the AWS Well-Architected Framework – a “set of questions you can use to evaluate how well an architecture is aligned to AWS best practices” (p. 2). Five topics (so-called “pillars”) are defined: security, reliability, performance efficiency, cost optimization, and operational excellence¹ (p. 2).

AWS notes that customers have to “make trade-offs between pillars based upon [their] business context” (p. 3) when they improve their architecture. For example, optimizing performance efficiency has a direct effect on your cost efficiency, so you always have to measure.

We agree with that statement and recommend analyzing performance and optimizing bottlenecks on a regular basis to improve cost efficiency as well.

General Design Principles

The definition of the AWS Well-Architected Framework is followed by a set of general design principles “to facilitate good design in the cloud” (p. 4). We pick out three points that concern load and performance testing. The first one deals with testing systems at production scale:

In the cloud, you can create a production-scale test environment on demand, complete your testing, and then decommission the resources. Because you only pay for the test environment when it is running, you can simulate your live environment for a fraction of the cost of testing on premises. (p. 5)

AWS makes it easy and affordable for you to test in the cloud. You can run your tests without a huge increase in cost.

Another point is AWS’s suggestion to do improvements through “game days”:

Test how your architecture and processes perform by regularly scheduling game days to simulate events in production. This will help you understand where improvements can be made and can help develop organizational experience in dealing with events. (p. 5)

An event might be the sending of a newsletter, another kind of promotion, or a business-related event like the launch of a new website or product. In this case, you can make use of spike testing, which is comparable to a load or stress test. If you are curious about the different types of testing, check out our blog post about types of performance testing or read our documentation.
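To make the shape of a spike test concrete, here is a minimal Python sketch. Everything in it is hypothetical – the target URL, the pool size, the load numbers – and real tooling such as StormForge handles load generation for you; the point is only the load profile: baseline traffic, a sudden jump, a hold at peak, and a recovery.

```python
# Minimal spike-test sketch: baseline load, a sudden jump to a peak,
# a hold at that peak, then a drop back to baseline.
from concurrent.futures import ThreadPoolExecutor
import time
import urllib.request

def spike_profile(baseline, peak, ramp_s, hold_s):
    """Return (second, requests-per-second) pairs describing a spike:
    `ramp_s` seconds at baseline, `hold_s` seconds at peak, then
    `ramp_s` seconds back at baseline."""
    profile = []
    for t in range(ramp_s):                                # warm-up
        profile.append((t, baseline))
    for t in range(ramp_s, ramp_s + hold_s):               # the spike
        profile.append((t, peak))
    for t in range(ramp_s + hold_s, 2 * ramp_s + hold_s):  # recovery
        profile.append((t, baseline))
    return profile

def run_spike(target_url, profile):
    """Fire the scheduled number of requests for each second of the profile."""
    with ThreadPoolExecutor(max_workers=64) as pool:
        for _, rps in profile:
            start = time.monotonic()
            for _ in range(rps):
                pool.submit(urllib.request.urlopen, target_url, timeout=5)
            # Sleep out the remainder of this one-second tick.
            time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))
```

A real spike test would also record response times and error rates during the spike and recovery phases; that observation is where the actual findings come from.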

One more appreciable point deals with allowing for evolutionary architectures:

(…) In the cloud, the capability to automate and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time so that businesses can take advantage of innovations as a standard practice. (p. 5)

It is common to introduce a new feature or a new technical product into your architecture, test it in an automated, cloud-based environment, and learn about the performance impact of the change. Test automation in particular is one of StormForge’s core topics, as we deliver all the tools you need to do continuous load and performance testing in the cloud.

In the following, we take a look at three of the mentioned pillars: Operational Excellence, Reliability, and Performance Efficiency.

Operational Excellence

With performance testing in mind, consider that the Operational Excellence pillar:

Includes the ability to support development and run workloads effectively, gain insight into their operations, and to continuously improve supporting processes and procedures to deliver business value. (p. 6)

One best practice AWS recommends is to “[f]ully automate integration and deployment” (p. 48):

Automate build, deployment, and testing of the workload. This reduces errors caused by manual processes and reduces the effort to deploy changes. (p. 48)

With StormForge, you can easily automate your test cases. You can also integrate it into your CI/CD workflow to make testing effortless and repeatable.
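As a generic illustration of such a pipeline gate (this is not StormForge’s actual API, and the latency budget is a made-up example), a CI step can run the load test, compute a percentile over the measured latencies, and fail the build when the objective is violated:

```python
# Generic sketch of a CI/CD performance gate: fail the pipeline when the
# observed p95 latency exceeds a budget. Budget and samples are examples.
import sys

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

def gate(latencies_ms, p95_budget_ms=250.0):
    """Return True (pass) when the observed p95 stays within budget."""
    return percentile(latencies_ms, 95) <= p95_budget_ms

def main(latencies_ms):
    """Exit non-zero so the CI system marks the build as failed."""
    if not gate(latencies_ms):
        sys.exit("p95 latency budget exceeded, failing the build")
```

In a pipeline, `main` would receive the latencies measured by the load-test run; the non-zero exit code is what makes the CI system stop the deployment, which is exactly the “reduces errors caused by manual processes” effect AWS describes.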

Reliability

For best reliability, AWS recommends that you “[r]egularly back up your data and test your backup files to ensure that you can recover from both logical and physical errors” (p. 24). One common practice is “automated testing of workloads to cause failure, and then observe how they recover” (p. 24). Once more, continuity and automation are key to managing this: “Do this on a regular schedule and ensure that such testing is also triggered after significant workload changes.” (p. 24)

When you have designed your workload to be “resilient to the stresses of production”, the white paper emphasizes that “testing is the only way to ensure that it will operate as designed, and deliver the resiliency you expect” (p. 64).

Resiliency testing is recommended to be integrated “as part of your deployment: Resiliency tests (as part of chaos engineering) are run as part of the automated deployment pipeline in a pre-prod environment” (p. 62).

The idea behind resiliency testing is to look at certain processes and behaviors under load and check whether your tests cover them.

Ask yourself: What about deployments? Ever-changing and ever-evolving infrastructure? Or automatic scaling environments?

The idea of giving this its own testing category is inspired by the Principles of Chaos Engineering. You have most probably checked these points in some form of manual or automated functional testing, but did you also check them under load?
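As a toy illustration of the idea, consider verifying a client’s retry behavior while failures are injected into a dependency. Everything here is simulated and the failure rate, retry count, and service are invented for the sketch; a real chaos experiment injects faults into actual infrastructure while a load test is running:

```python
# Toy resiliency check: inject failures into a simulated dependency while
# driving many requests, and verify retry logic still delivers responses.
import random

class FlakyService:
    """Simulated dependency that fails a given fraction of calls."""
    def __init__(self, failure_rate, rng):
        self.failure_rate = failure_rate
        self.rng = rng

    def call(self):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"

def call_with_retries(service, attempts=5):
    """Retry on injected failures; give up only if every attempt fails."""
    for _ in range(attempts):
        try:
            return service.call()
        except ConnectionError:
            continue
    raise ConnectionError("service did not recover")

def resiliency_run(requests=1000, failure_rate=0.3, seed=42):
    """Drive load against the flaky service; count successful responses."""
    service = FlakyService(failure_rate, random.Random(seed))
    ok = 0
    for _ in range(requests):
        try:
            ok += call_with_retries(service) == "ok"
        except ConnectionError:
            pass  # a real test would record which requests were dropped
    return ok
```

The assertion you care about is the same as in production chaos experiments: despite injected failures, the success count stays at or near the number of requests sent.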

Performance Efficiency

The white paper speaks of the “Performance Efficiency” pillar, which is a bit redundant, as performance already implies resource efficiency. 😉

On page 65, the white paper asks: “How do you select the best performing architecture?” AWS offers various kinds of support and services to help answer this question and to help you establish a good working architecture. But there is one thing to keep in mind: “data obtained through benchmark or load testing will be required to optimize your architecture” (p. 26). Of course, we totally agree: You just can’t avoid load and performance testing! 🤓

Load testing is also necessary to “understand how your trade-offs impact your workload” (p. 30). AWS recommends that you “[u]se a systematic approach, such as load testing, to explore whether the tradeoff improves performance” (p. 30).

Depending on your workload, different resource types and sizes may fit your performance requirements. With a defined test case, it is easy to rerun the same test against different infrastructure setups and gather the data and learnings you need.

Deploy your latest workload architecture on the cloud using different resource types and sizes. Monitor the deployment to capture performance metrics that identify bottlenecks or excess capacity. Use this performance information to design or improve your architecture and resource selection. (p. 65)

What AWS is suggesting here is what’s called configuration testing. Configuration testing covers not only your software configuration but also your entire environment. Rerunning performance tests is part of a review routine to ensure that you continue to use the most appropriate resource type. Approaches change and new technologies develop; use these progressions to refine your architecture and improve its performance.
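A sketch of what such a comparison might look like: the same test case is (notionally) rerun against several setups, and the resulting latency samples are ranked by p95. The instance names and numbers below are invented for illustration, not measurements or recommendations:

```python
# Configuration-testing sketch: rank infrastructure setups by the p95
# latency observed when rerunning the same test case against each.
import statistics

def p95(samples):
    """95th-percentile latency via statistics.quantiles (n=20 -> 5% steps)."""
    return statistics.quantiles(samples, n=20)[-1]

def rank_configs(results):
    """results maps config name -> list of latency samples (ms).
    Return config names sorted from best (lowest p95) to worst."""
    return sorted(results, key=lambda name: p95(results[name]))

# Identical test case, three hypothetical resource types (made-up data):
results = {
    "m5.large":  [120, 125, 130, 140, 200, 135, 128, 122, 133, 138,
                  121, 126, 131, 141, 190, 136, 129, 123, 134, 139],
    "c5.xlarge": [80, 82, 85, 90, 110, 84, 83, 81, 86, 88,
                  79, 81, 84, 91, 105, 85, 82, 80, 87, 89],
    "m6g.large": [100, 102, 105, 112, 160, 104, 103, 101, 106, 108,
                  99, 101, 104, 113, 150, 105, 102, 100, 107, 109],
}
```

Because the test case stays fixed, any difference in the ranked percentiles can be attributed to the configuration, which is exactly the comparison AWS’s advice calls for.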

It all boils down to this: performance testing is not a one-time thing. Continuity is key to success, and you should be set up to run performance tests whenever needed. In general, start testing early and do it on a regular basis. If possible, integrate performance analysis into your development and testing processes, ideally alongside the functional tests in your continuous integration system.

Conclusion

AWS has released a fine and valuable white paper worth reading: a practical guideline for designing and steadily improving a well-architected system. AWS also offers a lot of useful services you might check out. We regularly see most of the problems the paper targets in the wild, and we make the same recommendations.

We always recommend starting early with small, easy, and understandable test case scenarios. It is easy to set up your first load test with StormForge for free. Request a demo, learn more in our documentation, and get in touch with us for a personal onboarding.


1. In an earlier blog post from October 2015, the author Jeff Barr counts only four pillars. The later-added pillar “operational excellence” seems to have gained importance over time.
