This blog originally appeared on The New Stack.

In industries characterized by fierce competition and escalating customer demands, velocity has become a key differentiator. With the ability to support the rapid development and deployment of applications, cloud has emerged as the holy grail to achieve this velocity – easy, on-demand capacity that can scale with a business, all in an OpEx model. Perfect! Public, private and hybrid cloud use soared, and containers and orchestration platforms, specifically Kubernetes, found their place in the development process. The global pandemic only served to accelerate cloud, container, and Kubernetes adoption as companies turned to off-premises solutions to evolve operations, support a new way of working, and enhance business resiliency.

And then reality set in.

The run to the cloud and, specifically, Kubernetes, resulted in both system and organizational complexity that was not well understood on multiple fronts. Kubernetes introduced unexpected and unwelcome challenges, with one study finding that 94% of organizations adopting Kubernetes say it’s a source of pain for their organizations.

The big learning as organizations struggle to operationalize Kubernetes? Velocity can create friction in the form of high cloud costs, and those added costs can actually slow momentum.

But expanding cloud costs are just one area of impact: the adoption of Kubernetes, and the complexity that results, also creates new burdens for the people who have to run it. It raises the question: Are you willing to trade long-term profitability, and risk operational burnout, for agility?

Attempts to simplify complexity didn’t fix the real problem

In an effort to address systems complexity, development teams adopted purpose-built observability platforms to make sense of the relationship between the components that make up an application (both software and hardware) along with how they serve the end user (how well the application works or doesn’t work). Unfortunately, reactive, performance-only observability platforms don’t solve the problem; they merely identify it. Then what?

So, organizations pursued other ways to try to solve the issues, which gave rise to CloudOps and Build/Run teams charged with making sense of the complexity resulting from the application moves and builds that need to be migrated to cloud. CloudOps organizations brought together people, processes and tools to focus specifically on how the cloud model impacts all areas of IT and the business. The goal of Build/Run is to give development teams responsibility for the day-to-day performance of applications and services, and empower developers to focus on products over projects – basically, you build it, you run it.

At the same time, organizations are beginning to implement FinOps frameworks to bring together cross-functional stakeholders. This pseudo app-level steering committee is designed to add financial accountability to the variable spend model of cloud.

That’s a lot of time, people, and processes put against the problem. And yet it still exists today.

We moved fast and broke things…now what?

While a once popular mantra urged software developers to move fast and break things, today the reality is that things are broken. And they need to be fixed in a way that allows developers to push the velocity envelope while also ensuring that we don’t get caught in the trap of breaking the same things over and over again.

Cloud, once seen by organizations as the holy grail of agility and speed, has become a source of uncontrolled costs and management complexity. This creates a problematic situation in two ways:

First, escalating cloud costs begin to erode margins by adding to the total cost of revenue (COR) or cost of goods sold (COGS), and
Second, as development teams are told to reduce cloud costs, they don’t know how to balance those cuts with the impact on SLAs promised to the business.

When the cost of cloud overtakes the business value it was designed to create, it results in what Sarah Wang and Martin Casado of Andreessen Horowitz call the cloud paradox: You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it.

AI and ML: How a new class of tooling does help

Fortunately, technologies like machine learning (ML) are making their way into the process and enabling improvements in the ability to optimize the trade-offs between performance and cost. Leveraging this technology and a new class of tooling, development teams can do what no individual human could – exhaustively understand and tune all of the variables available to ensure that performance and cost are optimized for each application.
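To make the cost-versus-performance trade-off concrete, here is a minimal sketch in Python. It is not how any particular product works: it uses a plain exhaustive grid search rather than machine learning, and the candidate values, prices, and latency formula are all made-up assumptions standing in for real billing data and real load-test results. It simply picks the cheapest configuration that still meets a latency target.

```python
from itertools import product

# Hypothetical candidate settings; a real application has many more tunable parameters.
CPU_MILLICORES = [250, 500, 1000, 2000]
MEMORY_MIB = [256, 512, 1024]
REPLICAS = [1, 2, 4]


def hourly_cost(cpu_m, mem_mib, replicas):
    """Toy cost model with invented prices per core-hour and per GiB-hour."""
    return replicas * (cpu_m / 1000 * 0.04 + mem_mib / 1024 * 0.005)


def p95_latency_ms(cpu_m, mem_mib, replicas):
    """Toy latency model (an assumption): a stand-in for measuring a real load test."""
    memory_penalty = 40.0 if mem_mib < 512 else 0.0
    return 20.0 + 200_000.0 / (cpu_m * replicas) + memory_penalty


def cheapest_meeting_slo(slo_ms):
    """Return (cost, cpu_m, mem_mib, replicas) for the cheapest config within the SLO, or None."""
    best = None
    for cpu_m, mem_mib, replicas in product(CPU_MILLICORES, MEMORY_MIB, REPLICAS):
        if p95_latency_ms(cpu_m, mem_mib, replicas) > slo_ms:
            continue  # this configuration would violate the latency target
        cost = hourly_cost(cpu_m, mem_mib, replicas)
        if best is None or cost < best[0]:
            best = (cost, cpu_m, mem_mib, replicas)
    return best
```

Even this toy has 36 combinations from three parameters. Add more parameters and more values per parameter and the space grows multiplicatively, which is why trial-and-error by hand stops being practical and smarter search methods become attractive.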

Artificial intelligence (AI) and machine learning have become integral to supporting the velocity mandate. Naturally, these new tools are making their way into the deployment process. As a result, organizations are starting to develop practices to manage the adoption and integration of AI and ML tools. AIOps tools now empower Ops teams to automate and improve operations by leveraging analytics and machine learning.

At the same time, DevSecOps works to automate the process of integrating security into all phases of software development. Finally, continuous optimization has found its place in the CI/CD pipeline between continuous integration and continuous delivery, and is using machine learning to optimize Kubernetes configurations prior to launching into production. This continuous optimization is key to addressing the bottlenecks that can slow down application delivery by identifying issues that need to be addressed and finding an ideal solution – this is where ML is far superior to human cognition.

By implementing these capabilities into the CI/CD process, developer and operations teams now have tools to combat the surprise cloud bills that slow down so many organizations, and to accelerate their transition to cloud.

StormForge optimizes for today’s cloud-first – and app-centric – development

Purpose-built for Kubernetes, StormForge is the only optimization solution that helps you proactively ensure efficiency and intelligent business trade-offs between cost and performance without time-consuming, ineffective trial-and-error.

Using ML, StormForge automates the discovery of optimal application configurations before deployment. Performance testing generates the load, tunes the application to meet the load, and then creates the ideal configuration for Kubernetes to deploy those apps into production. This helps engineers save time without increasing the cost of running applications or impacting app performance and reliability.
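As an illustration of what "a configuration for Kubernetes" can look like once values have been chosen, the sketch below builds a patch for a Deployment as a Python dictionary. This is not StormForge's output format; the function name and its parameters are hypothetical. The field names it fills in (replicas, and per-container resource requests and limits) are the standard Kubernetes Deployment fields.

```python
def deployment_patch(container_name, cpu_millicores, memory_mib, replicas):
    """Build a patch body for a Deployment (hypothetical helper, for illustration only)."""
    resources = {
        # Requests are what the scheduler reserves for the container.
        "requests": {"cpu": f"{cpu_millicores}m", "memory": f"{memory_mib}Mi"},
        # Only memory is capped here; leaving CPU uncapped is a design choice, not a rule.
        "limits": {"memory": f"{memory_mib}Mi"},
    }
    return {
        "spec": {
            "replicas": replicas,
            "template": {
                "spec": {
                    "containers": [{"name": container_name, "resources": resources}]
                }
            },
        }
    }
```

Keeping the chosen values in a small, reviewable structure like this means they can be checked into version control and applied by the same pipeline that deploys the application.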

With the ability to recommend configurations, StormForge minimizes wasted resources, giving developers the power to make decisions based upon business goals.

Get Started with StormForge

Try StormForge Optimize Live for FREE to start optimizing your Kubernetes environment now. Sign up for a free trial or play around in our sandbox environment.

Life is complex enough: How to keep Kubernetes complexity from stalling your cloud momentum
