By Patrick Tavares | Jan 04, 2022
This blog originally appeared on The New Stack.
Earlier in my career as an IT consultant, I was involved in performance testing for a large retailer before the integration of inventory, distribution and fulfillment systems for online channels. As you can imagine, the peak holiday season was critical, both in terms of maximizing revenue and meeting customer expectations for service.
The retailer had implemented a fairly new (at the time) omnichannel order management system and needed to ensure it could withstand the stress of a high volume of orders. Our team had to test the systems, identify areas to optimize and do it all with speed and urgency, as if the business depended on it, because it did.
Fast forward to now, and the challenge we faced then still exists today: how to ensure optimal performance and resource utilization without introducing duplicated effort, inefficiency, unnecessary cost and business risk.
Or, to put it simply: how can we better understand our critical business systems so that we can operate them with confidence and forecast risk?
We have two primary ways of acquiring knowledge: observation and experimentation. Either we observe how our systems behave, look at the data for clues and hope to reach accurate conclusions, or we experiment in a controlled environment where we can manipulate input values, record the outputs and draw conclusions with far more confidence.
With either method, the end game is the same: we want to be sure of our conclusions so we can make informed decisions based on knowledge we trust is accurate. However, unlike experimentation, an observational approach to learning increases your chances not only of being wrong, but also of worsening the very situation you set out to improve.
The process is at the mercy of the individual, their skill set and their ability to draw the correct conclusions. Observational learning doesn't scale; it is also inherently risky and can waste valuable time in the attempt to get it right. Controlled experimentation, on the other hand, enables you to learn quickly and cost-effectively without introducing personal bias.
As IT professionals, we participate in both types of learning during our day-to-day activities. For example:
- Reviewing dashboards and logs to troubleshoot a production incident (observational)
- Analyzing historical usage data to plan capacity (observational)
- Running load or performance tests against a staging environment (experimental)
- Trialing a configuration change under controlled conditions before rolling it out (experimental)
Each of these activities is an opportunity to add to an organization's collective knowledge base, but not all of them lead to undisputed outcomes, or at least not as quickly or with as little stress as you'd like. The key is determining the most effective approach to gaining knowledge, then using what you learn to improve the accuracy of the conclusions that come out of the activity.
Although the processes and tools differ, both observational and experimental research are established and accepted ways to gain knowledge about the world around us.
For IT teams, an experimental research approach greatly increases the likelihood of learning under preferred conditions. It brings the advantages of the scientific method to IT challenges: proactive, formal experiments with full control over the variables. While reactive, observational situations are unavoidable, they can and should be used to form hypotheses for future experimental research.
Part of the challenge for IT in the past was that the tools didn't exist to help teams easily adopt an experimental research approach to infrastructure. The sheer number of variables and the manual effort required to manipulate configurations kept the approach from scaling. Cloud platforms multiply the variables in play, making it nearly impossible, and far too costly, to apply experimental research to modern infrastructure by hand. Instead, IT teams overprovision resources to ensure performance and capacity goals are met, even if that sends costs skyrocketing.
Automation and new tools (such as our own here at StormForge) have changed the game. These solutions help IT proactively acquire the knowledge needed to drive outcomes such as efficiency and intelligent trade-offs between cost and performance, without time-consuming, ineffective trial and error. With automation in place of manual operation, multiple variables can be manipulated at once rather than one at a time, and machine learning can help sidestep logical fallacies and separate genuine causation from mere correlation.
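To make that concrete, here is a minimal sketch of automated, multi-variable experimentation. It is illustrative only, not how any particular product works: the run_load_test function is a hypothetical stand-in for whatever deploys a configuration, drives load and measures the results, and the option lists and latency goal are made-up values.

```python
import itertools

def run_load_test(cpu: float, memory_mib: int) -> tuple[float, float]:
    """Hypothetical stand-in: deploy the application with the given CPU (cores)
    and memory (MiB) requests, drive a representative load against it, and
    return the observed p95 latency (ms) and hourly cost (USD)."""
    raise NotImplementedError("replace with your own load test and cost model")

CPU_OPTIONS = [0.25, 0.5, 1.0, 2.0]      # cores to try
MEMORY_OPTIONS = [256, 512, 1024, 2048]  # MiB to try
LATENCY_SLO_MS = 200                     # business goal: p95 latency ceiling

def find_cheapest_viable_config():
    results = []
    # Sweep every combination of both variables in one automated pass,
    # instead of hand-tuning CPU and memory one at a time.
    for cpu, memory in itertools.product(CPU_OPTIONS, MEMORY_OPTIONS):
        latency_ms, cost_per_hour = run_load_test(cpu, memory)
        results.append((cpu, memory, latency_ms, cost_per_hour))
    # Keep only configurations that meet the latency goal, then pick the cheapest.
    viable = [r for r in results if r[2] <= LATENCY_SLO_MS]
    return min(viable, key=lambda r: r[3]) if viable else None
```

An exhaustive sweep like this gets expensive fast as variables are added, which is where machine learning earns its keep: it can steer the search toward promising configurations and reach a trustworthy answer with far fewer experiments.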
Observation is still imperative for IT, and observability tools play a vital role in collecting data. But tools like ours go beyond troubleshooting: they enable experiments that empower engineers to proactively make smart resource decisions, minimizing both the cost of running applications and the time and effort spent making those decisions, while ensuring business goals are met. IT can now get answers dramatically faster, and get them before problems arise. It doesn't take a scientist to see the value that brings to the IT equation, and to the business as a whole.