Noah: Thanks everyone for coming today, and welcome to today’s webinar. The topic, as you can see on your screen, is Automating Cloud Efficiency with the StormForge Application Performance Optimization Platform. Today we’ll go through a bit of a deep dive: we’ll do some demo work and we’ll talk about StormForge as a platform.
First thing I’d like to do is introduce your hosts today. I myself am Noah Abrahams, the Open Source Advocate here at StormForge. And my co-host, Brad…
Brad: So I’m Brad Ascar, Senior Solution Architect here. I’ve got a lot of years in the industry at places like Red Hat, Autotrader, and HP, and I’m based in Atlanta.
Noah: So let’s get started, I think. We want to talk about StormForge, the optimization platform, but I think it would be good to level set first.
Brad, if I am a user, a development team, a DevOps team, somebody out in the world that has an application, why do I care about optimization? Why is this an important thing?
Brad: It’s important for a lot of reasons. Number one, you want your applications to run as efficiently as possible. Not running efficiently has a lot of side effects, including driving your costs up and not giving you the performance you could be getting from your application. It’s ultimately about tuning your applications. It’s just like a car that’s not tuned: it’s not going to run well, it’s inefficient, and it might not actually meet your SLAs or your SLOs. So it’s very important to optimize your application so it can do everything you need it to do, including meeting your business objectives and reducing cost. And sometimes it’s not about reducing cost; sometimes you just want maximum performance out of an application, and you need to be able to do that as well.
Noah: So is there a particular application profile? What types of apps might I want to optimize? Does this apply to anything, or…
Brad: Yeah, so number one, our optimization platform runs in Kubernetes, right? So what we optimize is applications running in Kubernetes. Beyond that, there is no specific kind of application. In fact, it can be your custom-written applications or off-the-shelf applications, anything that runs inside of Kubernetes. But generally, the higher the performance, the more data it handles, and the more resources it uses in your environment, the better a target it is for optimization.
Noah: Okay, so if I’ve got a set of applications and now I have some sort of idea what I might be looking for, I’m looking for resources, I’m looking for that type of behavior, where do I go from there? I have an idea that I want to optimize, how do I get started with this?
Brad: Yeah, so there are several things you need to do. Number one, if you’re testing the performance of an application, it’s good to have a performance testing platform that actually allows you to send load against your application. That’s one of the areas we get into. So let me jump out of these slides here, show a little bit about our platform, and talk about it from an experimenting standpoint. But first I’ll start on our performance testing side of the house.

We’ve got a performance testing application. You need performance testing to be part of optimizing, because we’re measuring how the application behaves under load. We’ve got a great one. You don’t have to use ours, but there are a lot of interactions in the platform that make it easier if you’re using our load testing; ultimately, you need to be able to send load against your application to determine how it behaves. Some applications have their own load tests, different kinds of load tests, and you might have some that you’ve written in-house; we can utilize those as well. Performance testing platforms generally give you the outside look: how does it look to the system that’s sending load against the application? They give you the kinds of graphs and charts you expect to see, where the spikes are, things like the number of active clients, request rate, latency, 99th-percentile latency, when you’ve hit the wall, what kinds of errors you’re getting and where those errors show up. Ultimately, this is the outside view of whether the application can even handle the load you’re sending. In this particular case, it can’t handle the load being sent.

At the same time, on the inside of your application, you’re probably monitoring it with things like Prometheus, or Datadog, or whatever it is you use for monitoring, and you’re probably looking at dashboards. Everybody knows this from IT operations, right? They see a bunch of dashboards about how the application behaves. The challenge is, when I see these spikes and these problems on the outside, and on the inside in my APM I’m seeing problems at the same time, what do I actually do about that? If I’m seeing this regularly, how do I determine how to avoid it? Sometimes it’s just tribal knowledge: people know because they’ve seen what that spike looks like before, and they know, okay, that looks like this, or I need to go change that. Is it memory, or is it CPU? Is it the number of replicas? Is something happening in the storage subsystem? What is going on that’s causing this problem? Ultimately it requires a lot of knowledge to dig down into the application.

That’s really where we come in on the optimization side and say there’s a better way to do this: why don’t you create experiments that go against the application using a load test, determine how the application actually behaves under that kind of load, and then find what the better configuration might be? And we do that using machine learning. That’s really the core of our platform: real machine learning to determine how your application behaves.
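(For reference, the “inside view” Brad describes is typically graphed with queries like the following. These are generic PromQL examples that assume a conventional Prometheus histogram named http_request_duration_seconds exposed by the application; they are not part of the StormForge demo.)

```promql
# Request rate over the last 5 minutes (assumes the app exposes a Prometheus
# histogram named http_request_duration_seconds; substitute your own metric)
sum(rate(http_request_duration_seconds_count[5m]))

# 99th-percentile latency computed from the same histogram
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```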
Noah: So this doesn’t take away from the understanding of the application. It’s really supplementing it, is that correct?
Brad: That is correct. It’s another tool in your toolbox. I spent a lot of years on app dev teams building some really high-performance kinds of systems, and I wish I’d had something like this, because ultimately we spent a lot of time trying to figure out what to do and which knobs and levers needed to change to change the behavior of the application. This is where we really help DevOps teams: we give you a tool that does that, which lets you go build the business functionality rather than trying to figure out the performance-tuning part of what’s going on.
We do this through experimenting. So from here, I’m going to show you, from a demonstration standpoint, a little bit about what we do in our platform. In this case, I’m going to show you the cake after it’s baked, and then a little bit later I’ll show you how we actually bake the cake.
So here I’ve got several examples of experiments that I’ve run. One I like to show is the Voting Web App. This is the Docker voting web app, the cats-versus-dogs application, if you’re familiar with it. It’s basically a five-microservice application where users vote cats versus dogs. It stores the votes in an in-memory data store on the back end, a worker picks up that work and shoves it into a database, and then there’s another UI for the people who actually see the results of the voting. So it’s contrived, but a lot of people know the application, and it’s a good example of microservices.

In this case, we ran 159 trials in this experiment to determine the best configuration for this application. There are several things going on on the screen. This blue square is actually the original configuration of this application. So if you went into the configuration in the YAML files, or your Helm chart, depending on how you deployed this application, these are what it has configured for things like the database CPU, the voting CPU, the amount of memory, or the number of replicas. In this case we’ve got 10 different parameters that we’re testing against. The challenge is: what is the best configuration for this as you’re running load against it? And you’ve designed the experiment from the very beginning to do what you want it to do. The experiment says, I’m going to push this particular load, it has to have certain performance characteristics, and you can put minimums and maximums on what you consider to be allowable performance for the application.

Then we run the experiment loop. The experiment loop is interesting in that we run it inside of your cluster. This is an example of what we do during an experiment. Basically you create an experiment, and I’ll show you how we do that, and it reports to the machine learning and says, hey, I’m starting an experiment. Then what runs inside of your cluster goes to the machine learning and says, I need some assignments; I need to know what to change these various parameters to so that I can run. The machine learning makes a suggestion as to what those should be, and as you go through the loop more and more, it gets smarter and smarter about your application, because it treats your application as a black box but keeps testing against it. It creates a trial, sets it up, and basically patches those resources, waits for the application to stabilize, so it now has the new set of settings that we’re testing, and then it runs a trial. At the very end it measures how this thing behaved, cleans up what it set up, and reports the trial back and says, now I’ve measured this, machine learning, give me the next suggestion. And the machine learning gets smarter and smarter about what’s going on.
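(To make the “assignments” step concrete, here is a purely illustrative sketch of what one trial’s suggested parameter values conceptually look like. The parameter names and values are hypothetical, and this is not the exact schema of the resource the controller creates.)

```yaml
# Illustrative only: a conceptual view of one trial's suggested assignments.
# The in-cluster controller generates and applies these for you; names and
# values here are made up for the voting-app example.
assignments:
  - name: db-cpu          # hypothetical parameter for the database container's CPU
    value: 500m
  - name: db-memory
    value: 512Mi
  - name: vote-replicas
    value: 3
```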
All of these dots here are all of the various trials that were done during the experiment, but what you really care about is the optimal, because this is where it becomes useful to you. What is the optimal configuration based on what I’m measuring? What I’m measuring is designed into the experiment. In this case, we’re testing the cost of the application against the throughput: we’re trying to maximize the throughput and reduce the cost of this application. You’ll see these dots here and these diamonds here; this is the Pareto front. This is the machine learning weighing cost versus throughput, showing you everything from the minimum cost, which is also a very low throughput, up to the maximum cost, which is the highest throughput, and everything in between.

This one is actually the optimal configuration as picked by the machine learning. If you click on it, watch at the very bottom of the screen: you’ll see the recommendation that it makes. These are the changes necessary to get there. Now in this case, it’s just a little bit slower; as long as it falls within your parameters, that’s fine, and it reduces the cost by 72%. Maybe you’re saying, I want to keep the exact same performance level. Okay, so you choose this dot on the curve. Now it’s a little bit faster, and you still save 46%, right? This is useful information for determining how to configure your application, and you didn’t have to do any of this manually. This all runs in the background as it does its experimenting to determine how the application behaves best. This is of course done in a down-level environment, because you can’t do this kind of thing in production and break your application in a bunch of ways; one of the things that happens when you’re doing this kind of experimenting is we also find the settings that don’t work and actually crash the application. Makes sense?

So as we go from here, there are all kinds of applications. This is one example of a kind of application, but there are a lot of other kinds. Elasticsearch: a lot of people have Elasticsearch in their environment. As you’ll notice, the Pareto front actually looks a little bit different here, because it’s a different kind of application and behaves a little differently. Ultimately, you find out from the base configuration a better configuration for how the application behaves. In this case Java is involved, so things like heap percentage are important, just like other Java parameters in other Java applications. In this case I’ll show you the kinds of things that we tune for Java applications.

The beauty is, if it can be measured, and if it can be exposed in your application manifest or the design of your application, then it can be used as a way to tune. So it’s not just memory, CPU, and replicas, which is what people do on the Kubernetes side of things in the manifest; it’s actually all the knobs and levers of your actual software, and that’s where we have great capabilities to showcase real savings for your applications. It doesn’t matter what the application is, it doesn’t matter what the technology is: if there are knobs and levers that allow you to change the performance characteristics of your application, then we can tune it. It has to be measurable, of course, but then we can tune it and actually show it for your application.
Noah: So this would include things like environment variables and you’ve already mentioned Java configuration, anything we can expose?
Brad: Anything you can expose. So anything that you can pass in that’s used in an INI file, a configuration file, any environment variables, anything that can be used to tune the application, which makes it very powerful. As long as it’s a tunable thing for the application or technology stack, that’s something we can work with.
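(As a hypothetical illustration of exposing such a knob, a Java heap setting passed in as an environment variable in a container spec becomes one more tunable parameter. The container name, image, and value below are invented for the example.)

```yaml
# Hypothetical: a Java heap setting exposed as an environment variable, so an
# experiment can vary it like any other parameter (names/values are illustrative).
containers:
  - name: api
    image: example/api:1.0
    env:
      - name: JAVA_OPTS
        value: "-XX:MaxRAMPercentage=60"
```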
Noah: Awesome.
Brad: Yeah, so we’ve got a bunch of examples. Here’s one that we did with a Spark application that takes a data set and crunches the data. What’s interesting is the configuration that was the base configuration for this out of the box, and in this particular case we were tuning for just one thing: we’re trying to reduce the amount of usage, because this is an overnight, batch-style kind of thing, right? In this case, as we drill down and click on the optimal configuration, we found a configuration that processes this particular data set 48% faster. Of course every data set is different, because in things like Spark, things like shuffle file buffers and executor memory and memory fractions seriously change how the application behaves. But if I click on this again from the base, look how little difference there was in those settings.

Now what’s interesting is we’re working with a partner, and as they looked at this they weren’t surprised. Normally when we show results, our customers are very surprised: we didn’t know those were the things that really change things. This partner said, this actually wasn’t very surprising to me; what surprised me is that you did in five hours what took me two years of the school of hard knocks to figure out. Basically what he said is, you have an expert in the box, and that’s really where we’re getting to: it’s an expert in a box. It doesn’t have to know anything about your application up front. The experiment lifecycle teaches it; it uses the machine learning to learn about your application. So that’s a bit of showing what we’ve shown a lot of other places, which is the cake after it’s baked. Now I’m going to show you how we actually bake the cake.
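(For context before moving on to how the cake is baked: the Spark settings Brad mentions are standard Spark configuration properties. The values below are illustrative examples, not the numbers found by the experiment.)

```properties
# Illustrative spark-defaults.conf entries of the kind such an experiment tunes;
# these values are examples, not the ones from the demo.
spark.executor.memory        4g
spark.executor.cores         2
spark.memory.fraction        0.6
spark.shuffle.file.buffer    64k
```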
So this is a little different than some of the other demos that we’ve done in our webinars.
Noah: I want to make a note here that for this particular demo we’re going a little bit deeper; we’re going into the CLI and exposing some stuff under the hood. If you’re looking for something that’s more of a higher-level demo, something you could share with other people, we’re going to have one of those linked at the end of this session, so you can follow along with that and pass it along as well.
Brad: Yeah, and we just did a webinar recently on that, so if you’re not a command-line person, don’t worry, but this is great for digging in. So for the simplest case, if you’ve got a performance test for your application, then it’s pretty simple. You’ve got your application in your environment and you can literally use our ctl command. It’s called redskyctl, and you type redskyctl run and it goes interactive and starts asking you questions about what it is that you want to optimize in your environment.

In this case, I’m going to choose the StormForge performance test, but I can also choose Locust, which is one of the other ones that we support out of the box. If I choose StormForge, it’ll talk to my StormForge API and determine which performance test I want to use. So I’ll choose that, and down below it’ll ask, where is this application in the cluster that you’re currently connected to? In this case, I’m going to choose the voting app and hit enter. It’ll ask me to specify labels for what I’m testing in this namespace. We do have labels in the environment, so in this case, app equals voting-app. It’ll ask me about things like discovery of memory and CPU parameters, so this is the easy path, right? It’s going to do the easiest things: if I just hit enter, it’ll find all of them, and the same thing for the next one. Then, out of the box, this command will give you cost against various latencies: cost against p50 latency, p95, p99. I like p95 and p99 to give me an idea of the behavior of my application.

And then if I hit enter here, it will say the experiment is now ready to run. Here’s the name of the experiment, here are the parameters that we’re tuning for. All those parameters that you see at the bottom of the screen, it picked up out of the manifests running in your environment. This is the running application in a particular namespace; it found all the pieces, chose the parameter ranges it wanted, and then we’re going to calculate the metrics. Now I’m not going to hit run here, because this takes a lot longer than our webinar, but I’ll show you what it’s doing under the covers.
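(As a quick reference, the whole “easy path” Brad walks through starts from a single command; the interactive prompts then cover the load test, namespace, labels, parameter discovery, and objectives. This assumes the CLI is already installed and pointed at your cluster.)

```sh
# Start the interactive experiment setup described above; the prompts choose
# the load test, namespace, labels, parameters, and objectives for you
redskyctl run
```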
Under the covers, it’s actually doing this: it’s creating an experiment file, and I’ll show you in a bit the pieces of an experiment file, but this is really a very easy way to get started. If you just hit enter, it would run this experiment. Now let me dig in a little bit on what the experiment file looks like. In fact, let me bring this off the screen so I can share a couple of pieces.

First, I want to show you what an application file is. This is a Kubernetes object, a CRD in the Kubernetes environment. It’s a declarative way to do what we just did at the command line. At the command line we answered some questions; here, we answer those questions in a declarative way, in a YAML file. Your developers are going to be very familiar with doing this sort of thing. It has the parameters that you want to choose, the scenarios, including which load test you want to use for the experiment, and then what you’re measuring. In this case we’re doing p50, p95, and p99 latency, and cost, so all four of those objectives that you saw. Ultimately this can be fed in and it generates an experiment file.

In this case it’ll show you a sample experiment file. So what’s in an experiment file? This is for the people who really want the maximum level of control over what they’re doing in the experiment. This is where you get in and really get down to the nitty gritty, including some custom things you might be doing in your environment, when you want maximum control, right? So you have the easy button in redskyctl run, answer a few questions and we run, all the way down to full control, and you can actually export this when you do it from redskyctl run. Here is an example of one of those, where you go in and say, hey, this is an experiment: what are the labels, what’s the name of this thing, what are the metrics I’m going after, and where am I getting them? In some cases you’re going to be going against other kinds of systems. In this case we’re going against Prometheus in our environment, but you can use integrations like Datadog or anything else that collects metrics about what you’re doing, so that at the end of the trial we reach out to it and say, give us the numbers that go along with this, right? Go fetch the performance characteristics so that we can report back to the machine learning how this thing behaved.

You also get to define the parameters of what it is that I’m testing, including setting things like the baseline for this application. Remember that blue square from before? This is where it gets those values, and then the minimums and maximums. That’s really important, because it doesn’t do any good to test things like memory and then set memory values so high that they’re bigger than the nodes in your environment, or so high that you’d only fit one thing on every node, right? You get to put in the parameter ranges that make sense for your environment, including sensible minimums: if you know the application just will not run below a certain minimum, there’s no reason to test lower than that, so you put in a minimum that makes sense. Minimum numbers of things like replicas, utilization, whatever it is you’re doing in your environment. Then, ultimately, when this thing is run, how is it patching? What’s going on? This is where you get really deep into Kubernetes and doing Kubernetes patching.
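(To make that concrete, here is a trimmed, illustrative sketch of the kind of thing an experiment file contains: a bounded parameter with a baseline, a Prometheus-backed metric, and a patch against a Deployment. Field names and values are approximate; treat a file generated by redskyctl run as the authoritative reference for the exact schema.)

```yaml
# Illustrative only - not the exact schema; generate a real one with redskyctl run.
apiVersion: redskyops.dev/v1beta1      # API group/version approximate
kind: Experiment
metadata:
  name: voting-app-example             # hypothetical name
spec:
  parameters:
    - name: vote-memory                # hypothetical parameter (MiB)
      baseline: 256                    # the "blue square": the app's current setting
      min: 128                         # don't test below what the app can run with
      max: 2048                        # don't test above what your nodes can hold
  metrics:
    - name: p99-latency
      type: prometheus
      # PromQL fetched at the end of each trial and reported to the machine learning
      query: scalar(histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])))
  patches:
    - targetRef:
        kind: Deployment
        name: vote                     # hypothetical workload to patch each trial
      patch: |
        spec:
          template:
            spec:
              containers:
                - name: vote
                  resources:
                    requests:
                      memory: "256Mi"  # in a real file this value is filled in from
                                       # the trial's suggested assignment
```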
This is the same thing as when you make changes via kubectl: either you go into interactive mode and edit the YAML, or you change the YAML file and reapply it in the environment, changing things like the number of replicas. This is where we get to make those changes programmatically, under the covers, so that ultimately we can drive the behavior that we want in the experiment. So the experiment file gives you the maximum level of control, while the application file is more akin to redskyctl run, but from a declarative standpoint. People like this because they can check it into their code repos, right? So they know the versions of what they’ve tested and what they’re running, and ultimately they can go back into the UI and see how this thing behaved.

The other thing you can do, of course, is integrate with your CI/CD pipeline. In your CI/CD pipeline, you want to be able to take this, because ultimately you’re trying to figure out how your application behaves.
In this case, I’ll give you an example. For this particular application, say this is the point that you chose, the one that actually had the higher performance for what you’re doing: these are the parameters you want to put into your various configuration files for that application, whether it’s a YAML file or some other kind of file. In fact, from here you can export that, and it will show you the command to run so that you’ve got the files necessary. Of course you can automate this in the CI/CD pipeline, so you can just grab it out of there, and you can run all of this via our command line as well.
And so it gives you great capabilities.
So now you’ve optimized this application, but you’ve only optimized it one time. Ultimately you want to be able to do this again and again, and we want you to build it into what we call CI/CO/CD: adding continuous optimization to the pipeline, so that before you go to deployment, you actually make sure it’s still optimized. As you’re doing this in your CI/CD pipeline, you’re checking: does it still behave the same way? That’s important because your application is constantly changing. It may be that you just put some new business functionality into your application and it now behaves differently because you intended it to behave differently. But it could also be that something happened in your environment that you didn’t realize. Somebody checked in some code and, unbeknownst to you, some other library that you have nothing to do with, other than that it’s a dependency of yours, got fixed because of a security vulnerability. Maybe it now does the math differently, or does something else, and as you’re about to deploy, unbeknownst to you, that change means your application is actually 25% slower and no longer meets your SLAs.

Before you go to deployment, that would be a really good thing to know. Is my application still performant? Because if it’s not, that’s really important information before I push it out to my users. Even if you’re doing blue-green deployments or canary deployments, that still means a portion of your users are getting a bad experience until you decide to roll back. Or even worse, you push it out and the scenario that causes the problem isn’t present yet, because it only shows up under load. You decide everything’s running okay, push it all out, and now you’ve got a release; then the spike in traffic, or whatever causes the problem, rears its head and you’ve got to roll back, so now all of your users got a really bad experience based on how this thing crashed or caused problems. It would be really, really good to know ahead of time: is this thing still performant, and if not, what do I do about it? In this case you would actually set it up so that your CI/CD pipeline fails on this, so you can go take a look at it. It’s now 25% slower.
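(As a purely hypothetical sketch of that gate, a CI job might run the load test against a staging deployment and fail on regression. The trigger, job names, scripts, and threshold below are invented for illustration and are not StormForge commands.)

```yaml
# Hypothetical CI job (GitHub Actions-style syntax); scripts and thresholds are made up.
on: [pull_request]
jobs:
  performance-gate:
    runs-on: ubuntu-latest
    steps:
      - name: Run load test against the candidate build in staging
        run: ./scripts/run-load-test.sh --target https://staging.example.com
      - name: Fail the pipeline if p99 latency regressed past the allowed budget
        run: ./scripts/compare-to-baseline.sh --metric p99 --max-regression-pct 10
```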
Now, you may determine that you’re in a regulated industry: I actually have to roll this out, because I can’t ship unpatched versions of things that have known vulnerabilities. Okay, that’s fine, but now you may need to rerun this optimization, because the optimization is going to look a lot different with those components behaving very differently than they did before. At least you can run it, make sure the optimization is there for your application, and know that you still have the ability to meet your SLAs and SLOs. It might be more expensive, it might not be; it depends on what we find in the optimization, but at least it meets your SLAs and SLOs, which is really important. And that reduces operational risk and reputational risk, because aside from things costing too much, if you give your users bad service, ultimately that is a reputational risk and sometimes an operational risk as well.
So that gives you a flavor of the things that we do. You can do it for any kind of application, and that’s the beauty: homegrown applications are a great opportunity. We work with some very large customers, customers that have very large performance teams, entire teams whose whole job is to look at the application’s performance every single day and determine how it’s behaving. Even in scenarios like that, we’ve been able to go in with those customers and show them significant savings.
In the case of one very large travel vendor, we were actually able to show nearly 50% savings for their application. The feedback from them was, number one, shock that we were able to do that and show it in a few hours of running experiments, but number two, that the things that changed were not intuitive. That’s the other piece of running things with machine learning: the machine learning has no presuppositions about how your application behaves. It just knows that it’s working with numbers or strings, and that different combinations of those numbers and strings make the application behave in different ways. Then it plots it out and says, okay, let me find something more efficient; let me try something else that’s more efficient.

What hurts you when you’re trying to optimize an application is information. It’s not the information that you don’t know; it’s the information that you think you know that’s actually not true. I know how my application behaves because I’ve been doing this for a long time, and yet that may actually be false information. You may not have tested enough combinations of things, or you didn’t move enough of these things in conjunction with each other. We’re talking about billions of possible combinations when you get to this number of parameters, something that humans are not good at, but machine learning is really good at. So our message is, don’t try to do this by hand. There is a tool for doing this, and the way that we do it is very efficient, generally done in minutes or hours to give you a new optimization. You can take that time to work on the business functions of your application and do what you want to do from a business standpoint, and leave the machine learning to do the part that’s not very interesting to humans but really interesting to us, because of course we’re on the machine learning side of things: learning about your application and getting smarter and smarter about it.

So that is the end of talking about and showing the demos. I think we’ve got enough time here for some Q&A. Do we have some questions?
Noah: Yeah, we’ve got quite a few actually and there’s a couple that go hand in hand if we wanted to drop back down to the YAML level.
One is, how do we inject the application configuration parameters to automatically change them during the experiments? Another is a corollary about the minimum and maximum parameters: are those chosen automatically when this is created?
Brad: Great question. Yep, let me jump back out of here to get back into my demo.
So for these parameters in an experiment, number one, we’re doing normal Kubernetes patching sorts of behaviors. As you’re dealing with your application and patching various resources, it uses Kustomize under the covers. So this is the same thing as in Kubernetes generally: if you kubectl edit any kind of object that’s changeable, when you save, what Kubernetes does under the covers is inject the things that you went into the file and changed. If I were doing that manually, I’d be changing the numbers that are here, maybe changing a one to a five or something like that. This is how we change it, because this method of changing things is what Kubernetes itself uses: it uses Kustomize under the covers, and that’s how we inject those values. So anything that can be done in Kubernetes manually, or with programmatic methods via APIs, is the kind of thing that we can change. What was the other part of the question?
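(For reference, the kind of patch applied for each trial is plain Kubernetes patching; a strategic-merge patch against a Deployment looks like the following. The workload name and values here are illustrative.)

```yaml
# A plain strategic-merge patch of the kind a trial applies (illustrative values);
# kubectl apply/patch and Kustomize handle documents like this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vote                # hypothetical target workload
spec:
  replicas: 5
  template:
    spec:
      containers:
        - name: vote
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```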
Noah: The minimums and maximums. Are those generated automatically?
Brad: We generate them automatically. When you do a redskyctl run, we’ll take a look at the parameters that are there and then give you a space that’s wider than that. Otherwise, you have control over the minimums and maximums as you design your application YAML or your experiment: you can tell it what your requested mins and maxes are. You also get to do things like this for metrics: if I’m testing the performance of something and I’m trying to drive throughput, you can say the minimum throughput for me is a hundred thousand, and then we won’t explore things that don’t reach a hundred thousand. It allows us to explore the space that you really want explored, rather than showing you, hey, we can save you a lot of money down here at ten thousand, when that doesn’t meet your SLA.
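(Conceptually, that means bounding both the search space and the objectives. The snippet below is illustrative only; the field names are approximate rather than the exact schema.)

```yaml
# Illustrative only: bounding the search space and putting a floor under a metric.
parameters:
  - name: replicas
    min: 2          # the app is known not to run reliably below 2 replicas
    max: 10         # keep requests well within node capacity
metrics:
  - name: throughput
    minimize: false
    min: 100000     # don't explore configurations that miss the SLA floor
```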
Noah: So related to that, someone asked, based on those configurations, when we make those changes, is this applied as a rolling upgrade to the cluster itself, or is this something that has to be done manually?
Brad: No, that’s done automatically. The entire experiment lifecycle is done automatically, and so is the update. How it happens, whether it’s a rolling update or all at once, is based on the design of your application. Whatever scheme or strategy you have for your application in those files, all at once or rolling updates, it respects those same things. Ultimately, in the testing cycle it waits to make sure that it all happens and that the application has stabilized based on those changes before we start firing load at it.
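(For reference, the strategy being respected is simply the workload’s own declared update strategy; for example, a Deployment with a standard rolling update declares something like this.)

```yaml
# The Deployment's own declared strategy is what each trial's patch rolls out through.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # keep every replica serving while new pods roll in
      maxSurge: 1         # add at most one extra pod during the rollout
```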
Noah: Okay, so we have another question here that I think leads into something else that we didn’t cover yet. It was, do we have a sandbox environment to try the software? So I think talking a little bit about how this gets installed, how it gets used, and what the free tier looks like might be a good thing to cover right now.
Brad: Yeah, so from an installation standpoint, it’s really easy. That redskyctl command: literally, if you type redskyctl init, then in whatever Kubernetes environment you’re connected to, it installs all the pieces that are necessary for the controller and all the CRDs that are necessary for what we do. That’s pretty much it, and then the next thing you can do is a redskyctl run. It runs in any Kubernetes cluster, whether it’s on-premise, one of the hosted versions, or one of the opinionated hosted versions; it doesn’t really matter as long as it has core Kubernetes functionality, something that’s an actual Kubernetes distro. Then it works just fine in all of those places. So anywhere that you’re running Kubernetes, you can do this.
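(Putting those two steps together, and assuming the CLI is on your PATH and kubectl is pointed at the target cluster:)

```sh
# Install the in-cluster controller and CRDs into the currently connected cluster
redskyctl init

# Then start the interactive experiment setup
redskyctl run
```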
The free trial tier allows you to sign up and register. There are restrictions on the number of things that you can run and the number of parameters that you can test at the same time. Of course the more parameters ultimately the better, so we’re giving that as a free tier so you can kick the tires on it, but ultimately if you want to do more things, which is where it gets more interesting, then you have to go up a tier.
Noah: Speaking of the time and complexity, we had another question earlier of how much time would it take to test against something like postgres or [ __ ] and I think that’s probably dependent on the test that’s being run, right?
Brad: It’s dependent on the test that’s being run. As you’re designing it, and this is what we talk through with our customers when we’re doing POCs, you want a load test that’s representative of what you’re trying to measure, and as short a load test as you can manage, because if I’m going to run 150 or 160 of these trials, it’s wall-clock time, right? So 150 times 10 minutes is 1,500 minutes, which is about 25 hours. Now, there is an ability within the design of an experiment to run some of these in parallel, but you have to have the infrastructure that can do that, and you don’t want to make your own applications the noisy neighbor of the application whose performance you’re trying to test: if you ran four of these at one time and it made your environment really busy, you might actually slow down all four of them, right? Some people have the infrastructure to do that and run them in parallel. We work with the taints and tolerations inside a Kubernetes environment, so you can keep those workloads separated if running in parallel is what you want to do. But generally we’re talking about minutes or hours to run an experiment, as long as you keep the trials short. If every one of your experiment’s trials takes an hour because the load test is an hour, then of course it’s going to take more wall-clock time to get there.
Noah: Let’s see what else we have for questions here. Does StormForge work in an air gap environment?
Brad: Oh, that’s a good question. It is on our product roadmap to allow you to do that in a fully air-gapped environment; we’re very close on the capabilities there, and that is one of the challenges. Most people, when they’re talking about air-gapped environments, actually still have the ability to connect to the outside world. As long as you can have a one-way outbound connection to our machine learning API, that’s all that’s necessary. We’ve been in some very, very tight security kinds of environments, but they’re generally allowed to call out to an API that handles encrypted traffic, that sort of thing. The data that we’re passing when we do that is simply information about the experiment: we’re trying these numbers and we’re getting these measurements. There’s no data exposed about system names or anything else. It’s literally information about an experiment that runs for five or ten minutes, which is probably not going to be the best experiment anyway, so there’s not much you can make out of it other than names of parameters and numbers or strings.
Noah: Time for a couple more questions, I think. One is about comparing our solution to other solutions claiming to reduce cloud costs.
Brad: Yeah, so there are all sorts of cloud costs, right? Number one, there’s the generic cloud-cost work: things like reserved instances or spot instances, those kinds of behaviors. Those work at a very, very high level, but they don’t get down to the application. So if you’ve got a bunch of inefficient applications, you can save a little bit there, but the application itself is actually really important: how it’s behaving, how efficient it is at what you’re doing. You’re never going to be able to squeeze it all out using those kinds of systems, because they don’t make your applications any more efficient. That’s really where we work: at the application level. The other thing is, a lot of those tools run only in production and don’t work in down-level environments. They can’t do the what-if, right? Part of what the machine learning does is ask: what if we used a totally different set of numbers for these things? In a lot of cases that can’t be tested in production, because it actually breaks things, or it doesn’t scale in a good way in a production environment. So the other piece is getting optimizations from the what-ifs that you just can’t do by tweaking in production.
I think that’s probably the last one because I’m sure people are interested in the next steps.
Noah: Yeah, so let’s get on to thanking everyone for all the questions; we had some really good ones. Well, we do have one quick one left that I think we have time for, about configuring the experiment budget. Can we do more than 400, in case the user is trying to use StormForge for a perf test run?
Brad: Oh, that’s good. Good question. So 400, right now, is that… I would like to have a conversation with whoever asked that, to find out why there’s a need to go higher than that, because generally that suggests it may be the wrong testing scheme; you’re generally not going to need more than that based on the number of parameters that we support.
I think on screen we’ve got our next webinar, which is a Fireside Chat. See the information there.
Noah: You’ll be able to join us next week. We’ll be having a chat with Stephen Augustus, Head of Open Source at Cisco and the Chair of KubeCon, which many of you just attended. We’ll be talking about open source and culture.
Then we’ve got some follow-ups here. If anybody’s interested in learning more, you can schedule a demo. We’ve got some blog posts and articles with information supplementary to everything we just talked about. If you want to get started trying the product for free, there are free tiers of the StormForge optimization product and the testing product, and you can get started using those. You can find us online and on Twitter at @StormForgeio. That’s it, that’s all we’ve got for today. Thank you all for coming, thank you Brad for that fantastic demo, and everyone have a wonderful day!