I hit “publish” accidentally, when I meant to press “save draft” for publication in several weeks. Some of the supporting evidence hasn’t been blogged or fully researched yet. Until I remove this warning, this is a draft. Sigh. But the conclusions won’t change…
For the last decade at least, we have been frogs in slowly heating water, and have not jumped out, despite at least a few pokes from different directions that all was not well in our Internet pond. Lots of people have noticed individual problems caused by bufferbloat and how to mitigate them. To some extent, we’ve been engineering around this problem without full understanding, by throwing bandwidth at it, as gamer routers show. But RAM costs have dropped even faster than network speeds have risen, and rising network speeds encourage yet larger (currently unmanaged) buffers; throwing bandwidth at the problem has been a losing race.
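To make concrete why throwing bandwidth at the problem is a losing race: the worst-case latency a full drop-tail buffer adds is simply its size divided by the link’s drain rate. A minimal sketch (the buffer size and link rates below are illustrative, not measurements from any particular device):

```python
def queue_delay_ms(buffer_bytes: int, link_bits_per_sec: float) -> float:
    """Worst-case delay added by a full drop-tail buffer: size / drain rate."""
    return buffer_bytes * 8 / link_bits_per_sec * 1000

# Illustrative numbers: a 256 KB buffer in front of a 1 Mbps uplink adds
# roughly 2 seconds of latency when full; even at 10 Mbps it is ~210 ms.
print(queue_delay_ms(256 * 1024, 1e6))   # ≈ 2097 ms
print(queue_delay_ms(256 * 1024, 10e6))  # ≈ 210 ms
```

Note that a tenfold bandwidth increase only divides the delay by ten, while buffer sizes have been growing at least as fast.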
I’ve been losing sleep the last few months. Partially, I’m at an age where, for the first time, I’ve lost a number of older friends in a short period. Partially, it is serious worry about the future of the Internet, and that previous attempts to warn of the bufferbloat problem have failed. And partially, I just don’t sleep very well. I’m not quite to the point Bob Metcalfe reached in 1995, when he predicted web growth would cause the Internet’s collapse (we came close). But I’m close to such a prediction. That’s seriously bad news.
And no, I’m not going to offer to eat my words, the way Bob did. He looked much too green after consuming his column when I saw him backstage afterwards, and I’ve had enough stomach and weight problems recently that doing so would be unwise. If I do ever make that prediction, I might eat a small piece of cake if I’m wrong. But so far, it’s still just worry, not prediction. And my worries have grown as I discover more.
I base the worry on the following observations:
- the data and discussion in a previous post, as analyzed and confirmed by experts in TCP’s behavior, show that bufferbloat can and does destroy TCP’s congestion avoidance. And TCP is the touchstone for all congestion avoidance algorithms in the Internet: a common requirement for new protocol proposals in the IETF is that they be at least as network friendly as TCP.
- Windows XP is finally, mercifully, being retired (if more slowly than anyone, including Microsoft, would like), and everything else implements TCP window scaling. As it retires, the dominant TCP traffic will shift from being only partially able to saturate most Internet links to always being capable of link saturation. This is a fundamental shift in traffic character, affecting not only HTTP, but all TCP-based applications.
- There are many, many more link-saturating applications already deployed, and many more Internet services that saturate links.
- Browsers, by ignoring the RFC 2068 and RFC 2616 strictures against using many TCP connections, have in the last few years been diluting congestion signaling.
- Some major players appear to be reducing/defeating slow start. This is also really bad.
- By papering over problems, we repeatedly close off actually solving them. A good example is ECN, whose deployment has been delayed by broken home kit, possibly by a decade.
- There is a misguided and dangerous belief among almost everyone in the industry that dropping *any* packets is bad. In fact, dropping packets (or at least marking them with ECN) is essential; the trick is to drop enough, but not too much.
- Much of the consumer kit (home routers, cable modems, DSL CPE) is never properly maintained. Often, broken firmware is never touched after the hardware ships, and/or requires manual upgrade by customers incapable of performing it; only recently have consumer devices started to upgrade themselves automatically (sometimes to the detriment of consumers, when features are removed). This is already a serious security risk (I know of a way to wardrive through a major city and take down wireless networks right and left, to give a simple example). But it also means that quickly mitigating bufferbloat will be much harder. Often, even a trivial change of a constant here or there might reduce the magnitude of the problem by a factor of ten; but that option is not available. So a lot of gear needs to be scrapped, and that costs money. Who pays? Will it happen soon enough? Can/should it be rolled into IPv6 deployment, if that happens?
- Self-synchronized systems are commonplace, and time-based congestion problems have been observed in the Internet before. Some common network technologies have the property of bunching packets together into periodic bursts. My traces show stable oscillations, which may or may not remain stable once random loss is put into the system, and I do not know whether they would synchronize. This bursty behavior, caused by intermediate nodes collecting packets together, is well documented (though I don’t have the references handy).
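On the point above that dropping (or ECN-marking) some packets is essential: “enough, but not too much” is precisely the tradeoff the classic RED algorithm encodes. A toy sketch of its core decision (the thresholds and probability here are illustrative; real RED operates on an exponentially weighted moving average of queue length and includes further refinements):

```python
import random

def red_should_mark(avg_queue: float, min_th: float = 5.0,
                    max_th: float = 15.0, max_p: float = 0.1) -> bool:
    """Simplified RED drop/mark decision: never drop below min_th,
    always drop above max_th, and in between drop with a probability
    that rises linearly with the average queue length."""
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p
```

The randomness matters: it desynchronizes flows, so they do not all back off (and then all speed up again) in lockstep.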
First, some personal history: with Bob Scheifler, I started the X Window System in 1984. It was one of the very first large-scale distributed open source projects. Our team was split between MIT in Cambridge, Massachusetts, and Digital’s west coast facilities. At the height of X11’s development, the congestion collapse of NSFnet occurred. The path from Digital to MIT became so unusable that we were reduced to setting alarm clocks for 3am to rdist our source back and forth, and at times, when that would not succeed, to FedExing magnetic tapes to get enough bandwidth (our bandwidth was fine; our goodput zero). Additionally, Nagle’s algorithm had caused us problems with X (which does its own explicit buffering), and TCP_NODELAY was added specifically to help us. I was also the editor of the HTTP specification: one concern we had there was that many TCP connections could self-congest a customer’s line that had minimal buffering (dialup gear often had only one or two packet buffers per dialup line in that era). So I have both been directly scarred by application-generated Internet congestion and, as a network application developer, had reason to become much more familiar with its details than most.
The browser situation is also worrying; but I’ve not seen recent web traffic statistics, so this particular worry may be a red herring. What browsers are doing to latency is not a red herring: they are doing bad things to the jitter in your home network, as I’ll explain in detail in a future post. While the first decade of browser warfare was mostly about features, we now have a healthier situation of browser warfare on both features *and* performance. By using many, many TCP connections (6-15 is now commonplace, whereas the standard asked for no more than two connections), we’ve minimized the amount of congestion signalling going on and maximized the amount of traffic that is in slow start. And I recently caught wind of some major web sites messing with the initial congestion window. I haven’t had time to dig into this yet, so I won’t say more. While the original motivations for the rules against HTTP using many connections have clearly lapsed, we may now have new ones due to bufferbloat. When I was editor of the HTTP spec, I had hoped that pipelining would enable both highest performance and optimal TCP behavior: but it is now clear that, due to the ugly complexity of the HTTP protocol and the lack of a sequence number in HTTP, those hopes are in vain. Something like SPDY is in order. I’d sure like to see the HTTP protocol replaced entirely for the web; personally, I’m most excited by the CCNx project as a long-term path forward there, as it enables fundamentally better performance (and would save massive amounts of energy!), but events may force shorter-term band-aids. More when I blog again about browsers.
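To illustrate the scale of the change: each parallel connection opens with its own initial congestion window, so the aggregate first-round-trip burst a browser can inject grows linearly with the connection count. A back-of-the-envelope sketch (assuming the RFC 3390 initial window of roughly three 1460-byte segments; actual values vary by stack, and some sites are experimenting with far larger windows):

```python
def initial_burst_bytes(connections: int, init_cwnd_segments: int = 3,
                        mss: int = 1460) -> int:
    """Aggregate first-RTT burst from N parallel TCP connections, each
    starting with its own initial congestion window."""
    return connections * init_cwnd_segments * mss

# Two connections (as the HTTP spec asked) vs. 15 (now commonplace):
print(initial_burst_bytes(2))   # 8760 bytes
print(initial_burst_bytes(15))  # 65700 bytes
```

And because each connection carries its own congestion state, a loss on one of the 15 halves only 1/15 of the aggregate rate, which is part of how congestion signalling gets diluted.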
The Internet we had just learned to depend on became utterly unusable to many of us (at least on particular paths), and even then, it was scarring. In that era, IIRC, there were only about 100,000 hosts on the Internet, most of which were running Berkeley Unix, and all the systems were being managed by computer nerds. When Van Jacobson and Mike Karels published patches to the Berkeley TCP/IP stack for slow start and other algorithms, after maybe six months of serious pain, they were applied within weeks to most machines on the Internet, and the situation recovered quickly.

When I discussed mitigating the cable modem bufferbloat problem with my friend at Comcast, he thought mitigations were probably possible for DOCSIS 3 (which only started shipping last year), possibly possible for DOCSIS 2, and that there was no prayer of mitigating bufferbloat in DOCSIS 1 (though he hoped the buffers there may be small enough not to defeat congestion avoidance). I surmise that this opinion is based on the realities of which firmware still has maintenance teams for those devices. I expect a similar situation in other types of broadband gear.
The home router situation is probably much grimmer, from what I’ve experienced. We have a very large amount of deployed home network kit (hundreds of millions of boxes), much of which is no longer maintained, even for security updates (which is why the home router problem is so painful, and dangerous, in my opinion). It seems that within six months to a year, the engineers working on that firmware have moved on to new products (and/or new companies), and kit with serious problems (like those which have inhibited deployment of ECN) never, ever gets fixed.
There may be a way forward to replacing all this antique, unmaintained home kit as IPv6 deploys (if it really does); to deploy IPv6, almost all home routers (and much or most broadband CPE equipment) will have to be upgraded. These boxes aren’t all that expensive to replace; amortized over time, the ISPs can easily afford to do so if the customers do not. But I don’t think we want to be in a situation where we have to replace them overnight, particularly since it will take a year or two at minimum to engineer, test and qualify bufferbloat solutions. Replacing the old gear might be a concrete step in the war against global warming as well: new gear often (but not always) consumes less power; saving five watts would pay for a new home router in maybe five years, at my electric rates (though I’m not sure the new gear always consumes less power…)
Courtesy of malware, many, but far from all, users’ operating systems get security updates, so we can have some hope of updating end-user operating systems (if we can distribute the updates, that is; it may be hard to auto-update systems on non-functional networks; in the NSFnet collapse, all that had to get through were short source code patches that were applied by the recipients, and email went through in the middle of the night even at the worst of the collapse). I worry much less about 3G: those devices are still pretty new and centrally managed, with maintenance teams dedicated to the software and firmware, and there is even traffic classification around network control messages in that sort of gear. The phones are new enough that they get rapid updates, and get replaced quickly.
As I will blog about more completely shortly (I had intended to blog about a number of other topics before this entry, but what is once published on the Internet cannot be unpublished), Dave Reed was correct when he attempted to draw attention to bad bufferbloat in 3G wireless networks over a year ago. That is a different aspect of large buffers in the aggregate: you can have no packet loss, but very high delays, with bufferbloat scattered through a network, when that network becomes congested. The very lack of packet loss means that queue management algorithms such as RED are not triggered. Since no congestion is signalled, the end nodes (3G smartphones) do not slow their transmissions, the buffers bloat, and the whole network operates at high latency. These networks stay congested until (possibly) late at night, when their buffers finally drain; you see a daily pattern to their latency behavior: low latency of, say, 60 ms when the network is quiet and unloaded, increasing during the day, and then dropping again when load diminishes. I’ve seen up to 6 seconds myself, and Dave has observed up to 30 seconds. In 3G, telephony is also separately provisioned, and so has the same fairness issue I documented before; so long as no QOS services are provided to general data applications and bufferbloat sits in that infrastructure, we’ll never make low-latency non-carrier VoIP and teleconferencing possible. From one point of view, however, we’re already seeing congestion collapse on those networks (from the user’s perspective, anyway), if not the packet-loss form of congestion collapse warned about by Nagle and observed in the NSFnet collapse, which motivated the development of TCP’s congestion avoidance algorithms.
We’re clearly suffering from steady-state congestion already; bufferbloat in broadband and 3G has injected pain. But this illustrates another facet of the issue: mere aggregation of a problem can cause other problems to occur (e.g. diurnal 3G network congestion). Just because we’ve seen and understood one aspect of a problem does not mean we understand all of its consequences.
When I talked to Van Jacobson about why active queue management is not universally enabled (a coming post will discuss that), he pointed out that we must also be concerned about dynamic congestion effects in the network: not just the TCP oscillation you see in my traces, but the fact that much of the network gear being built, often in the name of bandwidth optimization, processes bunches of packets at once and bunches them together, to be re-emitted in bursts. Van wants there to be a way to schedule the transmission of outgoing packets, so that devices could defeat this bunching rather than letting the bursts travel through the network aimed at some bottleneck that might not be able to deal with them. It is part of what is good about the “random” in RED. Time-based behavior is more subtle to understand, but might be as troublesome as what we already see.
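The pacing idea Van describes can be sketched with simple arithmetic: rather than emitting a window’s worth of packets back-to-back at line rate, a sender spaces them at the interval that matches the intended sending rate, so no burst forms in the first place. A minimal sketch (the packet size and rate are illustrative):

```python
def pacing_interval_us(mss_bytes: int, rate_bits_per_sec: float) -> float:
    """Inter-packet gap needed to send MSS-sized packets smoothly at the
    given rate, instead of as a back-to-back burst followed by silence."""
    return mss_bytes * 8 / rate_bits_per_sec * 1e6

# Illustrative: pacing 1460-byte packets at 10 Mbps means emitting one
# packet roughly every 1.2 ms, spread evenly across the round trip.
print(pacing_interval_us(1460, 10e6))  # 1168.0 µs
```

The point is that evenly spaced packets give every queue along the path time to drain between arrivals, while a burst aimed at a bottleneck must be absorbed by a buffer all at once.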
So we have a number of different resonant frequencies in the network: some are timers used in network protocols; some are timers in various network gear, part of their internal implementation. And self-synchronizing behavior in large systems is more than a theoretical possibility; it has been observed before. Are we a bunch of soldiers marching in cadence across a bridge? Will oscillations form slowly enough that we can react? Or will the bridge fall quickly?
What’s going to happen over the coming five years? I don’t know. I do know that by messing with slow start and destroying congestion avoidance in TCP, we’re playing with fire.
So I worry. And these worries make it hard to get back to sleep. Or am I just an old Internet soldier suffering from post-traumatic stress disorder?
I do want to see a vigorous discussion of these fears; if you can dispel them, I can sleep better.