There is one source of performance problems that we encountered even before we started StormForge Performance Testing and that we still see all the time: Missing HTTP Keep-Alive. This article is about why this is still a relevant problem, and why performance testing is an important tool to uncover such issues.
With its 24 years, HTTP is a well-aged fellow among the web protocols. Today we mostly use HTTP/1.1 or HTTP/2, and if you have fully embraced the new HTTP/2 world in your entire system, this article is mostly an anecdote about past issues. But HTTP/1.1 is still alive and kicking in many systems. And even given its age, people still forget about a very important feature that previous versions did not provide: Keep-Alive (see the note at the end of this article).
To clarify, I’m not talking about TCP keep-alive (which is disabled by default). Nor am I talking about the keep-alive mechanisms of other protocols, which are equally important to keep an eye on. Today, we will focus on HTTP keep-alive.
HTTP (at least prior to HTTP/2) is a very simple protocol. For a given request to fetch data from a server, the following steps happen (simplified):

1. Resolve the server’s hostname via DNS
2. Establish a TCP connection to the server
3. Perform a TLS handshake (for HTTPS)
4. Send the HTTP request
5. Read the HTTP response
6. Close the connection
The last point is the topic of this article: Don’t close the connection!
HTTP/1.1 learned to reuse an existing connection: once the response has been read entirely, a new request can be sent over the same connection. This happens automatically if both parties understand it. Unless the client sets the Connection: close request header or the server actively closes the connection, it will be reused for subsequent requests. Sounds like a no-brainer, right?
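To make this concrete, here is a minimal Node.js sketch (my own illustration, with http://example.com/ as a placeholder target) showing both sides on the client: an agent that keeps connections open for reuse, and a request that explicitly opts out via the Connection: close header:

const http = require("http");

// An agent with keepAlive enabled pools connections and reuses
// them for subsequent requests to the same host.
const keepAliveAgent = new http.Agent({ keepAlive: true });

// Served over a pooled connection if one is already open; otherwise
// a new connection is opened and kept around for the next request.
http.get("http://example.com/", { agent: keepAliveAgent }, (res) => res.resume());

// Explicitly asks for the connection to be torn down after the
// response, forcing a fresh DNS/TCP(/TLS) round next time.
http.get("http://example.com/", { headers: { Connection: "close" } }, (res) =>
  res.resume()
);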
One issue is that we simply forget keep-alive can be a problem at all. Almost everyone is aware that the concept exists, but few actively check that it is working as expected. You might be surprised how often keep-alive is not configured properly!
The other issue is: Developers and operations people heavily underestimate the impact of doing a DNS lookup, establishing a TCP connection, and making a TLS handshake. Over and over again. For every single HTTP request. Every. Single. Time.
From our experience, we can tell that this overhead adds up very quickly. And it does not make a big difference what kind of system you are building: even for internal or local systems, there is usually nothing to gain from closing the connection. You don’t have to take our word for it – there are many resources out there supporting this.
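If you want to get a rough feeling for that overhead against your own endpoints, a few lines of Node.js are enough for an unscientific comparison (the URL and request count below are placeholders, pick your own):

const http = require("http");

// Fire `count` sequential requests through the given agent and
// report the average wall-clock time per request.
function measure(label, agent, url, count) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    let done = 0;
    const next = () => {
      const req = http.get(url, { agent }, (res) => {
        res.resume();
        res.on("end", () => {
          done += 1;
          if (done === count) {
            const ms = Number(process.hrtime.bigint() - start) / 1e6;
            console.log(label, (ms / count).toFixed(1), "ms per request");
            resolve();
          } else {
            next();
          }
        });
      });
      req.on("error", reject);
    };
    next();
  });
}

(async () => {
  const url = "http://example.com/"; // placeholder target
  await measure("keep-alive:   ", new http.Agent({ keepAlive: true }), url, 20);
  await measure("no keep-alive:", new http.Agent({ keepAlive: false }), url, 20);
})();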
What we and our customers observe when running tests against systems with missing keep-alive are slower response times, even under moderate load. As more and more requests take longer to process, more connections stay active, so more resources are consumed and blocked. In many cases, systems under test do not recover until traffic stops.
Here is a quick example I used a while back for a talk at the AWS User Group in Cologne. I used a simple StormForge test case to give you an idea of how TCP reconnects impact latency (find the test definition at the end of this article). The following image is a latency histogram over all requests made by this test (available in all StormForge reports):
You might have already guessed it: Left is with keep-alive, right is without. Same target, same request, same response.
Yes, this is a simple and somewhat artificial example, but not far from many of the setups our customers are testing. We see a clear bimodal distribution: one peak where new connections need to be established, and another where an existing connection is reused. The difference is rather significant.
The difference comes from multiple factors: every new connection has to pay for a DNS lookup, a TCP handshake and, for HTTPS, a TLS handshake, and the constantly closed connections also pile up in the TIME_WAIT state, tying up ports and kernel resources on whichever side closed them. If you want to learn more about TCP, sockets and TIME_WAIT, and how to optimize your servers, check out this great article by Vincent Bernat.
What makes overlooked keep-alive even worse is that its impact grows with some currently trending architectural approaches.
For example, take serverless or Function-as-a-Service (FaaS). With FaaS, you need to be stateless, but an application is usually not fully stateless. Most of the time you solve this by externalizing state to other components and services. And how do you access that state again? Quite often via HTTP. You should also check out Yan Cui’s article on HTTP keep-alive as an optimization for AWS Lambda.
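For Node.js-based functions, the common pattern is to create a single keep-alive agent outside the handler, so that warm invocations keep reusing the already open connection. A rough sketch, with https://state.internal.example/ standing in for whatever service holds the externalized state:

const https = require("https");

// Created once per container, outside the handler, so warm
// invocations keep reusing the same open connection(s).
const agent = new https.Agent({ keepAlive: true });

exports.handler = async () => {
  // state.internal.example is a placeholder for whatever service
  // holds the externalized state (cache, API, database proxy, ...).
  const body = await new Promise((resolve, reject) => {
    https
      .get("https://state.internal.example/item/42", { agent }, (res) => {
        let data = "";
        res.on("data", (chunk) => (data += chunk));
        res.on("end", () => resolve(data));
      })
      .on("error", reject);
  });
  return { statusCode: 200, body };
};

If the calls go through the AWS SDK, it can usually be pointed at such an agent as well (for the JavaScript v2 SDK there is also the AWS_NODEJS_CONNECTION_REUSE_ENABLED environment variable); Yan Cui’s article linked above covers the Lambda-specific details.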
This especially affects microservices, where HTTP is often the communication protocol of choice.
Again and again, we see our customers uncover these problems with performance tests and score rather quick wins in terms of latency, stability and general efficiency.
Use HTTP keep-alive. Always.
More importantly, don’t just assume it is used, check it. It can easily be tested with curl: run curl -v http://example.com and look for * Connection #0 to host example.com left intact at the end of the output. Testing it at a larger scale, and measuring the actual impact, is just as easy with a StormForge performance test. Catching a misconfiguration or an unintended configuration change with automated performance testing is even better, because you minimize the risk of the potential havoc.
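If you would rather verify it from code than eyeball curl output, Node.js exposes on each request whether it was sent over an already established connection; a small sketch, again with a placeholder URL:

const http = require("http");

const agent = new http.Agent({ keepAlive: true });

function check(url) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { agent }, (res) => {
      res.resume();
      // reusedSocket is true when the request went over a connection
      // that was kept alive from a previous request.
      res.on("end", () => resolve(req.reusedSocket));
    });
    req.on("error", reject);
  });
}

(async () => {
  const url = "http://example.com/"; // placeholder
  console.log("first request reused a connection: ", await check(url));
  console.log("second request reused a connection:", await check(url));
  agent.destroy();
})();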
I’ve been using a simple test case to showcase the impact of HTTP keep-alive. We have two scenarios, each weighted at 50%: one session runs a series of HTTP requests with keep-alive (which is the default with StormForge), the other runs 25 HTTP requests without keep-alive.
Note that our testapp does HTTP keep-alive by default:
definition.session("keep-alive", function(session) {
// Every clients gets a new environment, so the first
// request cannot reuse an existing connection.
context.get("http://testapp.loadtest.party/", { tag: "no-keep-alive", });
// HTTP Keep-Alive is the default, so for all the following
// requests in this loop, we can reuse the connection.
session.times(26, function(context) {
context.get("http://testapp.loadtest.party/", { tag: "keep-alive" });
context.waitExp(0.5);
});
});
definition.session("no-keep-alive", function(session) {
// Setting the "Connection: close" header, we signal our
// client to close the connection when the transfer has
// finished, regardless if the server offers to keep the
// connection intact.
session.times(25, function(context) {
context.get("http://testapp.loadtest.party/", {
tag: "no-keep-alive",
headers: { Connection: "close", },
});
context.waitExp(0.5);
});
});
A note on older HTTP versions: even before HTTP/1.1 made it the default, a client could send the Connection: keep-alive request header and check if the server responded with the same header. Only then (depending on the implementation) was the connection kept intact after a request.