
Learning from the Past

Jonathan Phillips: So first off, welcome everyone to today’s webinar, “Learning from the Past: The Parallels of Virtual Machines and Kubernetes.”

So today’s webinar will be different from most that you’ve attended recently. The goal is to interact and discuss your journey to cloud native deployment, the parallels of VMs and Kubernetes, along with new machine-learning-driven technology to significantly improve application performance and lower overall cloud spend.

In order to keep today’s session interactive, we have a couple of polls, as well as dedicated time at the end for questions and answers. We will also be doing a $100 Amazon gift card drawing for all attendees at the end, and the winner will be announced after the webinar.

If there are any questions, please add those to the Q&A within the Zoom.

All of the slides, along with this recording, will be sent out after the webinar directly to all of the attendees that signed up.

So let’s get started. Today’s presenters will be myself, as well as my colleague Erwin Daria, Principal Sales Engineer here at StormForge. Quick introductions of both myself and Erwin: I’m the enterprise account executive here, covering the Northeast territory, with over 15 years of experience in SaaS, cloud cost optimization, and security software, working at companies like CloudHealth Technologies, IBM, and VMware. I’m also based in Boston, which is our company headquarters as well, and I’ll pass it over to Erwin for intros.

Erwin Daria: Thanks JP. So like Jon said, my name is Erwin Daria, Principal Sales Engineer here at StormForge. My background is in IT, so a lot of the concepts we’re talking about are really relevant from my previous experience. 24 years now, which makes me feel old. Mainly in financial services, biotech, and then on the vendor side, at places like Juniper Networks and Tintri. I’m based in the Bay Area, California. I’m really excited to be here on this webinar.

Jonathan Phillips: Excellent, thanks Erwin, and it’s our pleasure to really have this open discussion today. Even though it is in webinar format, we will try to make sure that everybody that wants to provide input can do so in the Q&A section. If there’s anything that comes up throughout, please throw your questions into the Q&A section of the Zoom. So now I’ll pass it over to you to get started. Let’s go.

Erwin Daria: Alright, thanks JP. So like Jon said, you know, we want to start this off with a little bit of a laugh, right? So the title of this webinar is really about the parallels between virtual machines and Kubernetes, and how we can potentially not only learn from the problems of the past, but apply some new technologies to prevent those problems from repeating themselves.

But at the core of this problem is this notion of abstraction, and this is a funny way of representing it. But I think this is true, right? Abstraction, starting from virtualization and leading to some of the newer concepts we’re going to talk about today, is the cause of and solution to so many of our problems.

So, if we take a look at this chart really quickly, just to level set on some of the concepts we’re going to talk about: we start with bare metal, which is where I started my IT career, and then, as we move towards virtualization, cloud adoption, containerization, microservices, and eventually serverless frameworks, that complexity continues to increase. And as we continue to decompose these applications into their modularized parts, not only does it make things super complex, it also makes it really hard to understand where all the various important pieces are, how they’re being used, and what their cost is, especially in a cloud model. And we know this to be true, right? When we started with virtualization, we saw things like over-provisioning, sprawl, not understanding exactly where our applications were homed on the infrastructure, but also what the various components were that make up an application. That’s where things like APM started coming into play.

And then those problems continue to get worse. Whereas in virtualization, back when I started in the early and mid 2000s, you had a physical boundary to how much compute, how much storage, how much networking you had, because it was within the data center, maybe a couple of different colos, maybe a few different regions.

But the cloud really expands, and it really explodes that problem. We’re going to make the case, talking about how things like machine learning, and maybe some adaptations from the industry, will help us get our hands around it, maybe understand better how those applications are using the infrastructure, and then improve not only the cost function and the performance function of these applications, but also the well-being of the teams that are responsible for making sure this stuff keeps running.

Jonathan Phillips: Yeah, if anyone here is familiar with the CNCF landscape, the complexity around just the additional software and tooling to support this cloud native development can be tricky all in itself, right? And dealing with all of the supporting capabilities and functionality continues to drive that complexity higher. As we evolve, Erwin will talk a lot more about utilizing the tools that are in place today to help drive this type of accelerated development, but more importantly, what we can ultimately learn from the early days of public cloud adoption to where we are today.

So Erwin, I think this is a good slide that transitions nicely.

Erwin Daria: Yeah, so again, this is the problem statement, right? Very simply, we talked about how increasing abstraction leads to sprawl and over-provisioning, which requires us to not only amend our tooling, right? We have to have better tools, more of them, like JP said.

But we also have to change some of the methodologies that we have applied historically to how we interface with our line of business or business partners.

On the business side right, how do we reconstruct our infrastructure and developer teams? Obviously we’ve seen things like DevOps and we’re even seeing additional kind of adaptation. And we’ll talk about it a little bit in later slides, but we’re really seeing this shift in how we do business, and how technology continues to service the business.

But the funny thing is, as many tools as we’ve added to the tool chain, and as many new services as we try to implement to wrap our minds and our hands around some of the sprawl and over-provisioning, I think it’s still pretty impressive… well, maybe impressive is the wrong word to use; it’s still really surprising, that gap. Our customers still tell us that they see a tremendous amount of waste, right? And waste is the thing that we see from a billing perspective. It’s the thing that finance is continually on us about when we’re in the IT space, right? This waste is really pervasive and it’s really obvious, but then there are also additional places where we see a lot of waste: a lot of operational overhead, people that are wasting their time, which matters even more than the costs, because this kind of over-provisioning of our people leads to all kinds of other issues.

Jonathan Phillips: Yeah, and that’s a great point, right? Before we get into talking about burnout, I think it’s important to note that the mandates put on developers and Dev teams to get these applications up and running are immense. The SLAs that are tied to these types of programs can really create stress across the organization in order to get these applications stood up, stabilized, and ready for production. The challenge then becomes, okay, how do we make sure that we’re doing it in the most effective way possible while hitting our SLAs, but ultimately identifying when we’re over-provisioning, or getting to a point that we ultimately can’t go back from, in order to meet those SLAs and get those applications set up? That’s a huge cause of stress and, ultimately, it’s resulting in this type of burnout that we’re seeing.

Erwin Daria: Yeah, that’s right. I’m guessing that many people on this webinar have been that escalation point. As the infrastructure continues to get more complex, as we get more tools to, again, wrestle with that complexity, and as we have more pressure from the organization to continue to chase this velocity, the linchpin to this entire thing is the people that run the applications.

Jonathan Phillips: Exactly, and even when you think you’ve got everything as it should be, or ready for production, there are going to be times where failures occur, whether they’re code-based or resource-based. You never want to be caught in a situation where, ultimately, you have to backtrack, and what we’re seeing today is a lot of this manual process of going back and fine-tuning the application prior to deployment.

In which case, it hinders that development, as well as the velocity that you’re chasing in order to get these applications out.

So I’m sure some of us probably have this look on a daily basis, where you’re hitting these alerts that ultimately are stifling the progress. So, Erwin, I know you wanted to raise this poll, I’ll actually pull it up for everybody to go ahead and vote on.

Erwin Daria: You guys should see a poll pop up on the Zoom platform, and if you wouldn’t mind, take a second and vote… don’t be shy. We’re not going to call anybody out. We’re not going to report anybody to their boss.

Jonathan Phillips: Yeah, this is all done anonymously, and seeing a lot of “both” popping up already, which is to be expected, and when we go through the Q&A section at the end, we’d love to get a little bit more input on these polls. I’ll end the poll now.

Erwin Daria: So this is pretty universal, right? It’s either burnout or both, right? I think burnout is a very pervasive thing that we see in IT. It’s been there for a long time, so this is not new. I mean, my experience has been that it’s kind of an accelerator for the business, and then ultimately all this stuff rolls downhill. I won’t use the full saying there.

Stuff rolls downhill, and it hits the people that are in charge of making sure that those applications are running, and running performantly, with their appropriate SLAs, SLIs, SLOs, whatever those might be. But ultimately, these are people, right? And we give them more tools, right? We try to find time to implement the right tools within the tool set, but again, so many of these applications do rely on a small number of subject matter experts who end up being leashed to the escalation process. If applications don’t work well, then obviously they’re getting interrupted on the weekends and at night and things like that, and so the knee-jerk reaction, and JP, I’d love to hear if this is pretty consistent in your experience as well on the vendor side, the knee-jerk reaction is to over-provision everything, right? We over-provision everything so that we know that it’s not going to go down.

And then there’s that wrestling match between us and finance about cutting costs. What’s at stake is our ability, if I’m in the escalation hierarchy, my ability, to go home and enjoy the other parts of my life that are not work.

Jonathan Phillips: Exactly, and towards the end we’ll share some success cases and case studies with current customers and vendors that were dealing with these same challenges when it came to manually tuning and getting these applications to production: hitting a roadblock when it came to actually identifying where over-provisioning was occurring, and then ultimately how long it took for them to get the application stabilized and consistent with their expectations from a cost standpoint. So we’ll definitely talk more about that in future slides.

So what can be done about it? Erwin, I’m looking at you. What can be done?

Erwin Daria: Well, I think globally, or at a super high level, we’re seeing a couple of, I think, huge shifts in the way that we do business.

Partly in some of the technologies that we implement within that tool set, as well as in how we structure organizations, or how we structure cross-functional teams, to draw a finer point around how we can save, whether it’s operational costs or cloud costs, whatever that might be. So first up, I’m using this as a straw man for machine learning, but I think we talk about it as this kind of global category, AIOps. This is the notion that if you instrument all of your applications and infrastructure appropriately, all that log data can be collected in a singular place.

I’m skipping a whole lot of steps here, but basically you’ve got an AI or some level of machine learning that can infer specific events or conditions from that pool of data.

Then that inference can be, you know, maybe shipped to a human-in-the-loop validation, where a human being can say, yes, this AI has found a significant event, and then after that there’s a remediation plan, right? So either you engage a service desk where you can escalate to the appropriate resource and find that problem, and if you do that over and over again, the promise is that eventually the AI can find either the right scripts to resolve the issue, maybe find the exact right escalation point to solve the issue, or maybe even preemptively report on conditions so that you don’t run into the issue again. Now, the problem that I have here with this particular thing is that, as anybody that’s familiar with AI knows, you need a lot of data to build those inference models, which can have an impact on how long it takes for you to see value.
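To make that flow concrete, here is a minimal sketch of the human-in-the-loop pattern described above. The event model, the confidence gate, and the remediation playbook are all hypothetical stand-ins, not any particular AIOps product’s API:

```python
# Hypothetical human-in-the-loop AIOps flow; everything here is a stand-in.
from dataclasses import dataclass

@dataclass
class Event:
    source: str        # e.g. the service that emitted the anomaly
    signal: str        # e.g. "p99_latency_spike"
    confidence: float  # model's confidence that the event is significant

def infer_events(log_window):
    # Stand-in for the trained inference model; in practice this needs a
    # large pool of instrumented data before its inferences are useful.
    return [Event("payments-api", "p99_latency_spike", 0.92)]

def human_validates(event):
    # The human-in-the-loop gate: an operator confirms the finding.
    return event.confidence > 0.9  # placeholder for a real review step

PLAYBOOK = {"p99_latency_spike": "scale out the payments-api deployment"}

for event in infer_events(["...raw log lines..."]):
    if human_validates(event):
        # Remediation: run a known script, or escalate to the right SME.
        action = PLAYBOOK.get(event.signal, "escalate to on-call SME")
        print(f"{event.source}: {action}")
```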

The other thing is that, generally, it’s well understood, at least from my perspective, that AIOps is really meant to catch all these things in production, right? Because production is where you have all the instrumentation, etc., and so the issue that I have is that you’re not invoking the most intelligent part of your solution, if you can even reach this, until it’s already customer-impacting. I think there’s an opportunity, and this is what we believe here at StormForge, to use machine learning pre-production, and we’ll talk about how that works architecturally in later slides. But we believe that if you use machine learning in a pre-production environment, you can preempt a lot of the conditions that might flag a system like this.

Then we also have to give a nod to the fact that AIOps platforms are very domain-specific today. What I just described over the last five minutes is cross-domain AIOps, and that’s still kind of a vision that I don’t think anybody’s really approached. So you’ll see things like storage-specific AIOps from companies like HP, or Tintri, or Dell EMC, where basically you’ve got an AI that’s custom-tuned to look at just storage events and then autonomously react to them, but there’s not something that you can do across the entire stack just yet.

The next thing that is kind of interesting is this notion of FinOps, or financial operations, and I think FinOps is super important. I think we do need to have cross-functional stakeholders integrated into some kind of system where we’re all communicating, we all understand what the stakes are and what the trade-offs are between making changes on the infrastructure and making changes to application configurations, and we should have an entire formal reporting chain around making sure everybody knows how we’re making these changes, what that financial impact is, and what the performance or customer-facing impact might be.

But again, just like we saw in the 1990s with ITIL and all of these various people, process, and tools methodologies, the results are going to vary quite a bit depending on how mature the organization is and how committed they are to making these changes. And then, I don’t mean to sound sarcastic, but anytime you have complexity, I question whether having more people involved is actually a better solution. I don’t know, JP, if you’ve got another perspective on that, but people tend to be bottlenecks, is what I’m saying.

Jonathan Phillips: So I totally agree, and remembering back to the early days of public cloud adoption, the promise was that this consumption-based model would give you not only the ability to be elastic, but also significantly reduced overhead for purchasing hardware, right? It was meant to allow organizations to scale their application development in a way that was far more cost effective, but, more importantly, to give their developers free rein to utilize all of this power at their fingertips. What ended up happening was the developers just decided, okay, well, I’m going to need an m3.4xlarge for this virtual machine and this application, because, you know, that’s what it’s telling me, right? Over time, we saw the cost just start to absolutely explode.

For organizations that had done either a traditional lift and shift, or had slowly started migrating, they didn’t realize that sprawl was being created, or waste, or not even underutilized but outright unused infrastructure, and that was on the clock continuously driving up costs, right? So the need for FinOps is certainly there, and it should be utilized in a way that gives organizations visibility into trended cost analysis and an understanding of why their cost is increasing, or ultimately what’s driving the cost, right? But the ability to proactively go in and deploy these types of applications with the best configurations, tuned specifically for cost, allows the developers to get ahead of that. I know, Erwin, you’ll be talking a little bit more about that and StormForge as we go on.

Erwin Daria: So we’ve got another poll coming up, right? So ostensibly everybody has migrated to the cloud, because one of the big banner promises was that it’s going to save us money, like JP has already described, right? We’re getting some good feedback and interaction here. We’re going to keep going. It looks like it’s the same few folks that are voting, but we would love… don’t be shy, guys.

Jonathan Phillips: Yeah, this is an interesting one, right? I know for most organizations that I’m talking to, lowering cloud spend is number one, or at least at the top of the priority list.

And so it’s interesting to see some of the feedback that we’re getting here today, so I’m going to end the poll and share the results.

Erwin Daria: We’ve got a pretty even split here, so I’m going to add a wild card, because my experience has been that while cloud adoption can save over that traditional CapEx model by moving to that utility model, I don’t know any IT teams that are smaller than they were prior to cloud.

Generally, you need a different set of skills. The velocity that’s achieved means that developers require more bandwidth from, say, a DevOps engineering team or IT operations folks. And a lot of times, if you don’t have the built-in expertise and you can’t hire it in, you’ve got consulting and professional services engagements that are part of it as well, and I know that that’s been universal over time.

But my experience has been that as the infrastructure grows, including all of these various cloud service providers, and maybe you’ve got some legacy data center stuff or colo stuff as well, the operational cost goes up too, even exponentially in some cases.

Jonathan Phillips: Yeah, and just to piggyback on that, there’s not only the challenge of getting an application out to production, but a lot of the folks I’m talking to are relying on these out-of-the-box configurations that are really just meant to be recommendations.

And it’s not going to be cookie cutter for every organization, especially depending on what’s included in the namespace and how complex the application is.

So that’s where we start to see this reliance on what these recommendations are providing, and ultimately there’s just a lot of additional manual tuning that has to come into play once you start standing up the application and moving into production. So we’ll talk more about that as well.

So it’s actually a great introduction to StormForge.

Erwin Daria: Like the slide deck was designed well.

Jonathan Phillips: I know. It’s almost like we prepared for this.

Erwin Daria: I know, I know. So awesome. So let’s talk about StormForge because I think that’s where we’ve gotten a number of questions in the Q&A. We will try to hit all of those questions as part of the next few slides.

Real quick caveat here: we’re not going to do a demo of StormForge. We’re going to show you, at a high level, conceptually how all of this stuff works, and we’re going to show you how that impacts our customers. What I would say is, at the end of this, after the Q&A, you’re going to see a QR code where you can go to a next steps page, and in those next steps you can request a demo. We’ll be happy to set up a Zoom where we can actually do a 1:1-type demo, where you can ask some relevant questions and have something that’s a little bit more tailored to you, your organization, your particular use case, etc.

So moving on, I want to set the stage. We’ve been talking about this problem of over-provisioning: we don’t know what the application is going to use, we don’t want to get escalated to, and we just want to make sure that the application is going to run, right? So today we’re talking about Kubernetes, and again, the parallels between VMs and Kubernetes.

So here’s a really super high level model of how we’ve generally seen our customers work. You’ve got an application deployed, generally in prod. It’s fully instrumented, it is reporting into an APM, there are several dashboards, and those dashboards may have some alerting thresholds set up for them.

And then those get escalated to a team of people. Those people go through and take a look at the metrics that are being reported by the APM. They make a series of manual changes to either a manifest or some type of config map, or whatever that might be, and then at the end of the next sprint, they may push to prod. That’s a manual process, and when we talk about AIOps, which I was talking about in the last section, it’s really this notion that we can replace the group of people in the middle with an AI, right?

There’s a lot to be said about, you know, AGI, what decisions get made, what thresholds are set for an AI to make decisions, etc.

And this is where we’ve decided to approach things a little bit differently. We want to use real machine learning methods and techniques, but we want to apply them in a different way, so that we can avoid kind of this loop of people chasing down issues in prod.

So the way the StormForge platform works is you’ll notice that we’ve got a colored production Kubernetes icon on the right. We’ve got a grey one on the left.

And this is where we start talking about shift-left optimization. So what we do here in the StormForge platform is, in a downstream environment, we can scan an application and we can generate load against that application. As the application behaves and is instrumented by the APM, it will report a series of metrics to the APM, and our StormForge platform will read those metrics. It will make a series of recommendations, actually not recommendations, it will create a series of different sets of parameters, and parameters can be anything from CPU and memory reservations and limits for the pods that get created as part of the manifest, to things like JVM parameters, so heap size or garbage collection parameters. And then, as the machine learning modulates the settings on the manifest, StormForge will re-kick off the deployment of the application, regenerate load, re-read the metrics from the APM, and do that over and over again.
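As a rough sketch of that cycle (the parameter ranges and the deploy/load/measure steps below are illustrative placeholders, not StormForge’s actual API), each iteration tries one candidate configuration and records the metrics the machine learning would learn from:

```python
# Illustrative trial loop; the parameter ranges and the cost/duration
# model are invented, and a real optimizer would replace random sampling.
import random

PARAMETER_SPACE = {
    "cpu_millicores": (250, 4000),   # pod CPU request/limit
    "memory_mib":     (256, 8192),   # pod memory request/limit
    "jvm_heap_mib":   (128, 4096),   # e.g. -Xmx for a JVM workload
}

def suggest_config(history):
    # Stand-in for the ML: a real optimizer would use past trial results
    # in `history` to pick the next candidate instead of sampling randomly.
    return {k: random.randint(lo, hi) for k, (lo, hi) in PARAMETER_SPACE.items()}

def run_trial(config):
    # Placeholder for: patch the manifest, redeploy, generate load, then
    # read cost and transaction duration back from the APM.
    cost = config["cpu_millicores"] * 0.01 + config["memory_mib"] * 0.001
    duration_ms = 50 + 100_000 / config["cpu_millicores"]
    return {"config": config, "cost": cost, "duration_ms": duration_ms}

history = []
for _ in range(20):  # a series of trials makes up one experiment
    history.append(run_trial(suggest_config(history)))
```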

We call that entire cycle a trial, and then a series of trials becomes an experiment, and those experiments are meant to find the optimal configuration for an application given the metrics that you care about. So you set the metrics, and you set the threshold for the metrics. In the next slide, essentially what you’ll be provided with is the results of that experiment, and here you see a scatter chart; we can show you more in a demo.

The scatter chart represents each of the configurations. Each of these dots is meant to represent a configuration that was tested by the StormForge platform.

And the x and y axes in this particular case, it might be a little small for everyone to see, are performance, as a matter of duration, the duration of a transaction that gets created in the application, and then the cost of that application.

And the blue dot is the baseline. So we figure that the infrastructure team or the DevOps team has created this manifest for how they’re going to deploy your application in Kubernetes, and these are the baseline settings that were either entered by somebody or just set by default.

Now, as an end user of the StormForge platform, you can select a specific configuration and I think there’s an animation here.

And that configuration will be the ideal configuration. In this particular case, I’m choosing the exact elbow of what we call the Pareto front. That Pareto front, that line of peach-colored dots, represents how the machine learning has found optimal configurations on both axes, and the one that I’ve chosen is the one that is furthest left and furthest down on that curve. That’s the elbow; that is the most efficient you can get on both cost and duration without sacrificing one or the other. Now, the thing that we really push for our customers is that when we want to do optimization, it’s not a zero-sum game. We’re not trying to find the bottom of the pit for some of these things. Many times our customers use our platform to figure out where the trade-offs are, right?

Can I squeeze a little more performance, and how much does that cost? Can I squeeze more cost out of it, and what is the impact on performance? Now, once you’ve identified the ideal configuration, you can integrate the results of this… [inaudible]

There’s one more transition… yeah.
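For readers who want the idea behind that scatter chart in code, here is a small sketch, on made-up trial results, of how configurations get filtered down to a Pareto front on cost and duration; a real experiment would have far more points:

```python
# Toy Pareto-front filter over (cost, duration) pairs; the numbers are invented.
trials = [
    {"name": "baseline", "cost": 100.0, "duration_ms": 220},
    {"name": "cfg-a",    "cost":  55.0, "duration_ms": 240},
    {"name": "cfg-b",    "cost":  48.0, "duration_ms": 180},
    {"name": "cfg-c",    "cost":  90.0, "duration_ms": 160},
    {"name": "cfg-d",    "cost":  70.0, "duration_ms": 400},
]

def dominated(t, others):
    # A configuration is dominated if another one is at least as good on
    # both axes and strictly better on at least one.
    return any(
        o["cost"] <= t["cost"] and o["duration_ms"] <= t["duration_ms"]
        and (o["cost"] < t["cost"] or o["duration_ms"] < t["duration_ms"])
        for o in others
    )

front = sorted(
    (t for t in trials if not dominated(t, trials)),
    key=lambda t: t["cost"],
)
for t in front:  # the "elbow" is the front point with the best balance
    print(t["name"], t["cost"], t["duration_ms"])
```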

Now, I’ve gone over a lot of different things, right? We talked about the risks, again at a super high level. This notion that we can apply machine learning in a pre-prod environment so that we can work out all the kinks in the application configuration before it hits prod, right? We can design testing that is representative of what you’re doing, and we’ll probably work with you to design that stuff. We can use your performance testing if you’ve got something that you’re already running in house. And again, the key benefit is that the machine learning is replacing people, and it’s iterating without having to worry about impacting end-user customers.

Now the benefits are real benefits, and JP, would you mind, maybe covering some of our current customers and how they’re experiencing the benefits of our platform?

Jonathan Phillips: Yeah, definitely. So you talk about this shift left in optimization, and ultimately, what we’re achieving with our customers today is building continuous optimization into a CI/CD pipeline. For organizations that already have CI/CD set up and are stifled when it comes to the release and deploy stage, we have the ability to incorporate StormForge into both the build and the release sections of the CI/CD pipeline in order to proactively identify what the optimal configurations would be prior to deploying in production. So for customers today that are using StormForge, in this example it’s a large online travel company that is extremely advanced in their cloud native journey. They’re probably about four to five years into their deployment of Kubernetes in production, and they had spent so much manual time and process on manually tuning these applications prior to moving them to production.

Their current load tests were complex and ultimately required a lot of extensive setup. Products like JMeter or LoadRunner, for example, really aren’t built for cloud native development.

So utilizing StormForge Performance, they were able to create these performance or load tests that mimic their production environment and their production traffic, and incorporate that into building out these experiment files. The results were staggering. We identified over a 50% reduction in cost relative to what their current baseline was set at.

So this was the current configuration; we were able to identify the optimal configs based on what we were tuning. In this case, it was cost versus latency, and by getting the optimal configuration for this one particular application, we were able to distribute that across multiple applications. What we were able to achieve was not only the 50% reduction in cost; we were also able to really lower the latency and free up the internal compute resources to enable additional QA and pre-prod deployments.

So this is a prime example of an organization that had spent years standing up this environment and getting to a CI/CD process that they thought was fully optimized, or at least as optimized as they could get from a cost standpoint, and we were able to identify a 50% reduction across the board.

Erwin Daria: JP, real quick, before we move on to the next example: this company, I won’t share the name, but I did look them up, right, and this application, this is their main customer-facing application, and for them it generates over a billion dollars in revenue a year, right?

And so, if you think about the model of what that revenue looks like, even if it stays flat, but you’re taking advantage of a 50% cost savings on essentially the cost of revenue, the COGS of the platform, that’s pretty substantial.

Jonathan Phillips: Exactly.

Erwin Daria: We’re not talking small numbers.

Jonathan Phillips: Yeah, it’s a very good point, right, and this is their main customer facing application.

So the ability for us to actually run these experiments pre-production was incredibly important, right? They definitely didn’t want to optimize in production, they wanted to do it prior, right? So it was a great use case for us, and they ultimately turned into a premier customer for StormForge.

The next example is a SaaS platform provider that was looking to build continuous optimization and automation into their CI/CD pipeline and ultimately support thousands of customer sites utilizing StormForge. The end result was us building out these experiment files that allowed us to really dial in what the best configurations would be for both cost and latency. We identified the limits on latency improvements via the config changes, and in the overall proof of concept, we were able to identify a return on investment of $3 million annually based on the applications that we scoped for the proof of concept.

Today we’re working with them in production, in the sense that they’re using our product, the enterprise version, to distribute this across, I believe, upwards of 5,000 customer-facing sites; it might be more than that today. Already in the first three months we’ve seen that 50% reduction in costs across that entire environment, so obviously a significant amount of ROI that we’re providing. But more importantly, we’re getting their team to eliminate the time wasted going back in and manually tuning these applications one-off.

Erwin Daria: Real quick, sorry JP. You’ve given us two great examples, and I think we’ve got one more left, but we talked about the amount of time that gets spent by people in this really manual process. Give us an idea of how big these teams are, how much time they’ve spent, and then what the contrast is in how quickly our machine learning can find those same answers, or even better, improved outcomes, versus a manual process.

Jonathan Phillips: Yeah, in this particular example, it was a team of about 13 developers, and there were also SREs involved, and the time that it took them to get to the baseline that you see here was over a year and a half, right? What took them over a year and a half to get to, we were able to improve on within a matter of four days. Our trial runs for 14 days, and within the first four days we were able to identify a cost savings of 60% for one of their applications.

As we expanded it across the environment and started to narrow the scope on some of the more, I guess, troublesome applications, the ones that they knew were over-provisioned, we were able to get that achieved within the first week. So the time for us to build out these experiment files and get the trial up and running is actually a matter of days, and within that first week we’ll be able to show some really good output.

Then the second week is dedicated to refining the experiments, adding more parameters, or maybe just switching around metrics; beyond just latency and cost, we can look at throughput versus overall performance.

So I know, with time coming up, I wanted to make sure we addressed the questions and provided some answers as well. Erwin, I’ll direct these towards you; these are coming directly from our Q&A chat within Zoom.

So, how can StormForge help my organization during downtime and failures, and what are the products offered in order to help with that?

Erwin Daria: Yeah, so I think that the core value proposition of StormForge is to use machine learning in pre-prod so that when you push to prod you don’t have downtime and failures, right? There are certain failures, with regard to, like, node failures or anything in the infrastructure itself, where Kubernetes, I think, does a relatively decent job of redeploying the various pods and other components, but the notion of finding a failure in prod, I think, is antithetical to the value proposition of StormForge.

We want to do all these things in that kind of shift left methodology that JP has shown, so that we can tease out all those troublesome or problematic configurations, so we don’t have failures in prod.

Now, the platform itself is made up of two major components. One is the load testing as a service, right? JP alluded to it earlier, and I kind of alluded to it in my description of the platform. Basically, we’ve got a SaaS platform where you can design performance load tests and then instantiate those tests in any Amazon region. So if you’ve got a need for hyper-specific locality, you can direct that load test from any region of AWS. And as a service, obviously, you don’t have to worry about provisioning your own infrastructure to get that load test up and running, and you don’t have to worry about the load test itself being a noisy neighbor on the same cluster where you’re testing.
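As a very small illustration of what a load generator does (a generic sketch only; this is not the StormForge Performance product, and the target URL is a placeholder), you can drive concurrent requests at an endpoint and summarize the latencies:

```python
# Generic concurrent load-generation sketch; TARGET is a hypothetical endpoint.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "http://localhost:8080/healthz"  # placeholder, not a real service

def one_request(_):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(TARGET, timeout=5) as resp:
            resp.read()
        ok = True
    except OSError:  # covers URLError, timeouts, connection errors
        ok = False
    return ok, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=20) as pool:  # 20 concurrent "users"
    results = list(pool.map(one_request, range(200)))

latencies = sorted(lat for ok, lat in results if ok)
if latencies:
    p99 = latencies[max(0, int(len(latencies) * 0.99) - 1)]  # rough p99
    print(f"{len(latencies)} ok, p99 = {p99 * 1000:.1f} ms")
```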

And then the other component is the machine learning and controller part, right? So we have a controller that you deploy directly in the Kubernetes environment. That controller gets authenticated against our machine learning service, and that’s what creates the link for not only executing those trials and experiments, but also shipping that information back to our machine learning platform for processing and for the next set of trials that can be run during an experiment.

Jonathan Phillips: Awesome. Next question: how can StormForge integrate with Django plus AWS, and Helm?

Erwin Daria: So that’s an interesting question. I’m not familiar with Django in terms of AWS. We integrate directly into any Kubernetes cluster. It could be EKS, it could be any of the other cloud flavors, or it can be your own; we don’t have any kind of pre-requirement or prerequisite for that. And we certainly support Helm. If you’ve got Helm charts that dictate your manifests, or your configurations, or parameters, you can certainly use those to input the parameters into the experiment. I don’t know if there’s anything else?

Jonathan Phillips: Nope, that’s it. Then for larger enterprise customers, what is the average percentage of savings?

Erwin Daria: That can actually be a great one for you.

Jonathan Phillips: Yeah, I can take that on.

So, today, on average, we’re seeing between 50% and 65% annual savings across a current environment, right? And as Erwin had mentioned, we support any flavor of Kubernetes, whether it’s on-prem or you’re running it on EKS, GKE, or just Kubernetes on EC2, for example. It typically depends on the size of the environment. For our larger enterprise customers, who are running upwards of 200 to 300 applications, that’s where the ROI is higher based on their annual cloud spend, and certainly on the complexity of the number of applications within a namespace, as well as the parameters they’re looking to expose. We find that the more metrics and parameters we build into these experiment files, the higher the percentage of savings and the bigger the overall boost in performance we typically see.

And the last question is: I’m looking for my product site optimization, what are the features that are offered by StormForge? So maybe I’m not reading that correctly, but…

Erwin Daria: I don’t know if it’s product side or production side…

Jonathan Phillips: I think we mentioned it before, though, but it really doesn’t matter what type of application it is. We’ve got examples in our Git repository for JVM, Postgres, web applications, Spark applications, so… I’ll make sure that we follow up and provide anybody that has specific questions with our Git repository and the examples that are in there. We’ll make sure to share that out. For anyone that is interested in scheduling a demo or signing up for a free trial, we have our QR code here, or you can always email info@stormforge.io, and the links will be provided in the shared email that includes not only the recording of today’s session, but also the ability to follow up through either one of these channels.

So I think that’s it! Thank you so much, Erwin. It’s been a pleasure. To everybody that attended and stayed on a little longer, I know we went about five minutes over, we appreciate your time, and we’ll also make sure to reach out to the winner of the $100 Amazon gift card for the information we need in order to get that out to them.

So thank you all, and we look forward to the next session. Thank you again.

Erwin Daria: Thanks everyone.
