Blog

Using Machine Learning to Find the ElasticSearch Performance Peak


By Steph Rifai | Oct 12, 2020

Mountain scape

How do you know where the highest peak of a mountain range is? Well, probably through a quick google search. But in the absence of the internet, you pick a random place to start, climb to the top and look around (Okay, yes you can use a helicopter, plane or maybe even a drone, but stay with me a second). Now that you are at the top, you see some of the mountains are still taller and make an informed decision on where to go next. Climb down, go to the next mountain and climb up again. Look around, see there are still taller peaks and try again. Tired yet?

Find the Tallest Mountain Peak Without Ever Climbing a Mountain

The truth is, no one has time to climb all of the mountains to find out which one is the tallest, and certainly no one has time to try different parameter combinations to reach the highest performance of their application (…you knew we’d get here). So how do you know when you’ve got the best parameters for your application? If you’re anything like us, most of the time you try a lot of different options, and once you find one that performs pretty well, you keep it. Behavioral economists call this satisficing. When the trade-off is more time spent searching for a better configuration, it’s better to stop when you’ve crossed a reasonable threshold. That’s why we built the StormForge Platform, to help you find the tallest mountain peak without ever climbing a mountain yourself.

When the trade-off is more time spent searching for a better configuration, it’s better to stop when you’ve crossed a reasonable threshold.

Putting Our Optimization to the Test

We put our machine learning (ML) to the test against the preset values in the ElasticSearch Helm chart and compared the performance and cost to the configurations our ML found (view our Elasticsearch example “recipe” here). Performance is measured as the duration  it takes to process a dataset configured with the rally benchmarking tool (in seconds). For our example, cost is specific to Google Kubernetes Engine, measured as the total cost of CPU and memory per month. Here’s what we got:

The above graphic shows the cost and duration measurements for each configuration the ML tried. The ML searched the parameter space, trying 80 different configurations of four parameters (memory, CPU, heap size, and replicas). The triangle represents the helm default, while the pink dots represent the best options of all the configurations the ML tried. Unstable configurations are parameter combinations that prevented the application from deploying, in this case it was 28% of trials. Each trial helped inform where to search next, ultimately giving us 16 best options to choose from. Since we are optimizing for more than one metric, there is not a single “peak”, but at a given cost, there is a single best duration.

Get Started with StormForge

Try StormForge for FREE, and start optimizing your Kubernetes environment now.

Start Trial

The helm configuration had a duration of 864 seconds and a cost of $85 per month. Point A has a duration of 628 seconds and cost of $76 per month. So that’s a 10% reduction in cost with a 8% boost in performance. Point B has a duration of 720 seconds and a cost of $81 per month, meaning for almost no change in cost you can get a 17% improvement in performance. If you are willing to increase your budget a bit, Point C costs $92/month and gives you 32% improvement of performance (589ms). Not bad options to have without ever changing a configuration yourself.

Want to see how StormForge does against your current configuration? Request a demo today!

Latest Posts

We use cookies to provide you with a better website experience and to analyze the site traffic. Please read our "privacy policy" for more information.