The Next Nightmare is Coming

BitTorrent was NEVER the Performance Nightmare

BitTorrent is a lightning rod on two fronts: it is used to download large files, which the MPAA sees as a nightmare for their business model, and it has been a performance nightmare for ISP's and some users. Bram Cohen has taken infinite grief for BitTorrent over the years, when the end user performance problems are not his fault.

Nor is TCP the performance problem, despite Bram Cohen's recent flame about TCP on his blog.

I blogged about this before, but several key points seem to have been missed by most: BitTorrent was never the root cause of most of the network speed problems it triggered when it deployed. The broadband edge of the Internet was already broken when BitTorrent arrived, with vastly too much uncontrolled buffering, which we now call bufferbloat. As my demonstration video shows, even a single simple TCP file copy can cause horrifying speed loss in an overbuffered network. Speed != bandwidth, despite what the ISP's marketing departments tell you.

But almost anything can induce bufferbloat suffering (filling bloated buffers) too: I can just as easily fill the buffers with UDP or other protocols as with TCP. So long as uncontrolled, single-queue devices pervade the broadband edge, we will continue to have problems.
But new nightmares will come….

The Bufferbloat Nightmare

We can set BitTorrent and Ledbat aside here for a bit: they are not the actual problem, nor the solution, to our performance problems, and never were. Bufferbloat, and the amazingly stupid edge of the broadband Internet, are.

Bufferbloat is fundamentally a different phenomenon from the Internet congestion we experienced in the 1990s, something most people, including myself, have not understood well.

The AQM algorithm called RED was invented in the 1990s by Sally Floyd and Van Jacobson to control congestion in Internet router queues, but history has shown that RED is fundamentally flawed, and it is usually left unused. RED cannot handle the variable bandwidth problem we have at the edge of the Internet. These results were never properly published, for humorous and sad reasons. At best, RED is a tool that may be helpful in large Internet routers until a better algorithm is available.

The article "Controlling Queue Delay" by Kathie Nichols and Van Jacobson, published last week, helps greatly in understanding queuing (particularly its section "Understanding Queues"), and introduces a novel AQM algorithm called CoDel ("coddle") to manage buffers, one that works radically better than RED ever did. A clocked window protocol such as TCP (or others) can end up with a standing queue, adding delay that never goes away and kills speed. That standing queue cannot dissipate; in fact (since TCP sees no timely loss), it slowly grows over time, as you can see in my original traces, and the delay grows and grows until any size buffer you care to name fills.
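To put rough numbers on that standing queue (a back-of-the-envelope sketch of my own, with made-up values, not figures from the article): once a TCP window grows beyond the path's bandwidth-delay product, the excess bytes simply sit in the bottleneck buffer, and the extra delay is those excess bytes divided by the link rate.

```python
# Back-of-the-envelope: delay added by a standing queue when a TCP
# window exceeds the bandwidth-delay product (BDP) of the bottleneck.
# All numbers below are illustrative assumptions.

link_rate_bps = 2_000_000                    # 2 Mbps uplink
base_rtt_s = 0.05                            # 50 ms RTT with empty queues
bdp_bytes = link_rate_bps / 8 * base_rtt_s   # bytes the path itself can hold

cwnd_bytes = 300_000                         # window the sender has opened up to

standing_queue_bytes = max(0, cwnd_bytes - bdp_bytes)
standing_delay_s = standing_queue_bytes * 8 / link_rate_bps

print(f"BDP: {bdp_bytes:.0f} bytes")
print(f"Standing queue: {standing_queue_bytes:.0f} bytes "
      f"adds {standing_delay_s:.2f} s of delay")
```

With those assumed numbers the standing queue alone adds over a second of delay, the same order as the 1.2 seconds of buffering discussed below.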

Worse yet, TCP's responsiveness to competing flows is quadratic: 10 times too much buffering means TCP gets out of the way 100 times more slowly. Buffer queues must be managed; that is what an AQM does, by signalling the endpoints that the buffers are filling. But it is hard to figure out when to signal, and no fixed buffer size can ever be correct, particularly at the edge of the Internet.
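To see why the penalty is quadratic rather than linear (a simplification of my own, not a formal model): TCP adapts over some number of round trips, and a bloated buffer both stretches every round trip and increases the amount of queue that has to be worked off, so the two factors multiply.

```python
# Illustration of quadratic responsiveness under a simplifying assumption:
# adaptation time ~ (number of RTTs needed) x (duration of each RTT),
# and both grow in proportion to how overbuffered the bottleneck is.

base_rtt_s = 0.05          # RTT with an empty queue (assumed value)
base_rtts_to_adapt = 100   # arbitrary baseline number of round trips

def adapt_time(overbuffer_factor):
    rtt = base_rtt_s * overbuffer_factor            # queue delay inflates each RTT
    rtts = base_rtts_to_adapt * overbuffer_factor   # more queue to work off
    return rtt * rtts

print(adapt_time(10) / adapt_time(1))   # -> ~100: 10x the buffering, 100x slower
```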

The fundamental problem is bufferbloat, and the amazingly stupid devices we have at the edge of the Internet. These devices and computers typically have but one horribly bloated queue, with no queue management at all. The CoDel algorithm attacks the fundamental problem of standing queues, which RED (an algorithm that also required manual tuning) never did. CoDel keeps buffers working the way they should: removing the standing queue and usually running nearly empty, so that buffers can absorb the bursts of packets that are inevitable in packet-switched networks. The amount of buffering then becomes (probably almost) irrelevant, other than possibly costing money and power. For reasons I'll cover shortly in another blog post, we think additional measures are also necessary; but the missing piece for solving bufferbloat has been a fully adaptive AQM algorithm that works well. The rest is engineering.
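To make the control law concrete, here is a heavily simplified sketch of the CoDel idea as I understand it from the article (the real Linux implementation handles ECN, byte limits, and state transitions far more carefully; the constants and structure here are illustrative only): packets are timestamped on enqueue, their sojourn time is checked on dequeue, and once the delay has stayed above a small target for a whole interval, packets are dropped at a rate that increases with the square root of the drop count.

```python
import math
import time
from collections import deque

TARGET = 0.005     # 5 ms of acceptable standing delay
INTERVAL = 0.100   # 100 ms: on the order of a worst-case RTT

class ToyCoDel:
    """Simplified sketch of CoDel's dequeue-side control law (not a real implementation)."""

    def __init__(self):
        self.q = deque()
        self.first_above = None   # when the sojourn time first stayed above TARGET
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def enqueue(self, pkt):
        self.q.append((time.monotonic(), pkt))   # timestamp on entry

    def dequeue(self):
        while self.q:
            ts, pkt = self.q.popleft()
            now = time.monotonic()
            sojourn = now - ts
            if sojourn < TARGET or not self.q:
                # Queue is draining fine; leave the dropping state.
                self.first_above = None
                self.dropping = False
                return pkt
            if self.first_above is None:
                self.first_above = now + INTERVAL   # give it one interval of grace
                return pkt
            if not self.dropping:
                if now < self.first_above:
                    return pkt
                # Delay stayed above TARGET for a whole INTERVAL: start dropping.
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL
                continue                            # drop pkt by not returning it
            if now < self.drop_next:
                return pkt
            # Still too much delay: drop again, a little sooner each time.
            self.count += 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
        return None
```

The essential property is that the decision is driven by how long packets actually sat in the queue, not by how many bytes happen to be queued, which is why the size of the buffer itself stops mattering.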

Without some mechanism to signal the endpoints to adjust their speed in the face of filling buffers (either by packet drop or ECN), we'll continue to have problems with everything, including HTTP and everything else built on top of TCP. CoDel, we believe, is the tool here. Knowing a queue is filling excessively and managing it is a fundamental improvement over trying to infer queue filling from delay.

Running CoDel code for Linux (dual licensed BSD/GPL2) is already staged in net-next for the next Linux merge window. Testing continues, but initial results match CoDel simulations. CoDel works, and works well.
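Once that code lands in a released kernel and the matching iproute2 ships, turning it on for an interface should amount to a single tc command; here is a hypothetical sketch (the interface name and option values are assumptions to check against whatever actually ships):

```python
import subprocess

# Hypothetical: install codel as the root qdisc on an uplink interface,
# assuming a kernel and iproute2 build that include codel support.
subprocess.run(
    ["tc", "qdisc", "replace", "dev", "eth0", "root",
     "codel", "target", "5ms", "interval", "100ms"],
    check=True,
)
```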

You can build a delay-based TCP, as was done with TCP Vegas, but it can lose out to conventional TCP's and has some other unsolved problems. No vendor is going to want to ship something that can make their systems work worse relative to competitors. Getting everyone to convert at once, all over the Internet, is a non-starter. I do not see a path forward in that direction.

The complete bufferbloat solution includes deploying CoDel in our operating systems, our home routers (which come from the factory with firmware based on code at least five years old), and our broadband gear (which comes with a single queue and no classification or "fair" queuing); all of these need upgrade or replacement. Bufferbloat is also hiding elsewhere in the Internet, including in our cellular wireless systems.

Back to BitTorrent and its history, and what we can learn from the incorrectly diagnosed nightmare it triggered.

Ledbat is engineering around bufferbloat, not solving it. It is a really clever idea: if you detect excess delay, you try to get out of the way of other traffic.
But it doesn't attack the fundamental problem, which is managing the buffers properly in your OS, your home routers, and your broadband gear, so that the delay never builds up in the first place. Then your network will always work at full speed and share resources among people without falling off a performance "cliff", as it does today.
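For reference, the heart of Ledbat's "get out of the way" behavior is a delay-based window controller roughly along these lines (a simplified sketch of the LEDBAT idea; the constants and names are mine, not from any particular implementation):

```python
TARGET_DELAY = 0.100   # Ledbat's queuing-delay target (the proposal moved from 25 ms to 100 ms)
GAIN = 1.0             # at most about one MSS of growth per RTT

def ledbat_cwnd_update(cwnd, mss, bytes_acked, queuing_delay):
    """Grow the window while measured queuing delay is below target, shrink it above."""
    off_target = (TARGET_DELAY - queuing_delay) / TARGET_DELAY
    cwnd += GAIN * off_target * bytes_acked * mss / cwnd
    return max(cwnd, mss)   # never let the window collapse entirely

# Example: a sender already seeing 150 ms of queuing delay backs off a little.
print(ledbat_cwnd_update(cwnd=20 * 1460, mss=1460, bytes_acked=1460, queuing_delay=0.150))
```

The catch the rest of this post turns on is visible right in the formula: the controller only backs off when it can see queuing delay building, which is exactly what a working AQM prevents.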

Netalyzr uplink data

My cable modem is more than 10 times overbuffered according to even the grossly flawed 100ms rule of thumb, and that is with a 2Mbps up-link. The Netalyzr data shows my modem is typical. My brother's DSL connection, rather than the 1.2 seconds of up-link buffering I see, has 6 seconds of buffering, at least 60 times the worse-than-useless traditional 100ms "rule of thumb". Tail drop is the worst of all possible worlds: you delay signalling the endpoints until the last possible instant.

At the time BitTorrent deployed, most cable customers had only a 256K to 768K uplink, with the same DOCSIS 2 modem; so rather than the 1.2 seconds I saw on a 2Mbps uplink, the buffering was correspondingly worse, comparable to my brother's current DSL service.
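The arithmetic is worth spelling out (a quick sketch; the buffer size is inferred from the 1.2 seconds I measured at 2 Mbps, and the uplink rates are the ones mentioned above): the same fixed-size buffer, measured in seconds of delay, gets proportionally worse as the uplink rate drops.

```python
# Delay a fixed-size modem buffer can add at different uplink rates.
# Buffer size inferred from ~1.2 s of observed buffering at 2 Mbps.
buffer_bytes = 1.2 * 2_000_000 / 8   # roughly 300 KB

for uplink_bps in (2_000_000, 768_000, 256_000):
    delay_s = buffer_bytes * 8 / uplink_bps
    print(f"{uplink_bps / 1000:6.0f} kbps uplink -> up to {delay_s:4.1f} s of queue delay")
```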

BitTorrent filled these buffers. It was one of the first applications routinely left running that would fill the uplinks. BitTorrent was damned by association, since it was often found running when the network engineers looked to see why the customer was complaining.

ISP's reduced their nightmare overnight with a configuration change: when Comcast, for example, upped their minimum upstream bandwidth to 784Kbps, the most bloated buffers became one third the size, measured in time, overnight. Many customers had long since bought the 2Mbps upstream service. The video demonstrates just how bad typical bufferbloat is at 2Mbps; 784Kbps (with the same cable modem) will be roughly three times worse!

BitTorrent may have problems that make it disliked by ISP’s (having to do with large scale traffic shifting), but ISP’s really hate customers calling up unhappy: this comes directly out of their bottom line profit, and is a competitive problem as well (to the extent there is competition between ISP’s; at my house, there is none).

There is one fly in the ointment for uTP/Ledbat, however. Since CoDel will allow us to finally attack the real problem and keep delays low all the time, Ledbat will no longer sense delay and will cease to be effective at keeping out of the way of TCP; Ledbat behaves like TCP Reno in that case. This is what diffserv and "fair" queuing techniques were invented for. Interactive HTTP traffic should have interactive priority, while downloads of all sorts (HTTP downloads, BitTorrent, scp, and other bulk transfers) should have lower priority. So if uTP/Ledbat also marks its traffic, we can deal with it at the edge of the network, where we deploy CoDel, and keep it from interfering with other traffic. We have to make our home routers and broadband gear less stupid; AQM is necessary, but not sufficient, to get us a "real time" Internet. More about this soon.
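For what it's worth, the marking itself is trivial for an application to do today; here is a sketch in Python (CS1 is a common choice of code point for background/bulk traffic, but whether your router actually honors it is exactly the deployment problem discussed above):

```python
import socket

# DSCP CS1 (value 8) shifted into the IP TOS byte; widely used to mean
# "background/bulk" traffic. Whether any given router honors it varies.
CS1_TOS = 0x20

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, CS1_TOS)
# ... connect and transfer as usual; a classifier in the home router or
# broadband gear can now steer this flow into a lower-priority queue.
```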

Our edge network devices and our computers are stupid and broken. Most ISP senior executives likely still think it was BitTorrent causing their nightmares. It wasn't BitTorrent's fault at all, in the end: it was bufferbloat. Anything else you do with TCP or other protocols can and does cause serious problems. So long as we are stupid enough to think memory is free and that how buffering is handled doesn't matter, we are doomed. So get off of Bram's case.

Network Neutrality and a Call for Transparency

Those who think they understand the network neutrality debate triggered by BitTorrent, but do not understand bufferbloat and its history, are wrong. Both sides of this debate need to step back and rethink what happened in its light. A new application, BitTorrent, deployed and caused ISP's real, severe operational nightmares. Their phones were ringing off the hook with customers unhappy about horrifying performance. I made the same service calls about terrible performance, but I wasn't using BitTorrent.

The ISP's bufferbloat nightmare was hidden: no ISP wanted to admit in public that they had a serious performance problem, and they misdiagnosed the real cause. BitTorrent is often left running for long periods, so it often happened to be present when ISP's would troubleshoot. In secret, measures were taken to try to control the nightmare. This lack of transparency was the root cause of the blow-up. Opacity contributed to half a decade of delay in diagnosing and understanding bufferbloat. It will take at least another half a decade to deploy fixes everywhere they are needed.

We can expect future problems like this unless there is much greater transparency into operational issues occurring in networks across the Internet. The Internet engineering community as a whole did not have enough eyes on the problem to diagnose bufferbloat properly when it first became severe. A very senior ISP engineer played the key role in bufferbloat's final diagnosis, handing me the largest number of the pieces needed to assemble my dismal puzzle and closing the loop to both ICSI and Dave Clark's warning to him about the "big buffer problem"; but the diagnosis could and should have happened five years earlier. Problems take much longer to solve when few people (even the very capable ones at that ISP) have access to the information needed for diagnosis.

When similar events happen in the future, what should we do? How do we quickly diagnose and fix problems, rather than blaming the mostly innocent and causing complete confusion about the root cause? What do we do while figuring out how to fix a problem and deploying the fix? Sometimes it will be a simple and quick fix. Sometimes the fix will be hard and lengthy, as with bufferbloat. Sometimes the fix may be in the application itself, if it is badly designed (and we should think about whether the network needs ways to protect itself). Sometimes it will make sense to manage traffic, in some (neutral) way, temporarily.

When will the next operational nightmare occur?  And how long will it take for us to figure out what is going on when it happens? Will the right people be contacted quickly? How and where do operational problems get raised, and to whom, with what expertise when they occur? How, when, where, and with whom is information shared for diagnosis? Should there be consequences for hiding operational problems? How do we best observe the operating Internet? The need for transparency is the fundamental issue.

We are flying on an Internet airplane in which we are constantly swapping the wings, the engines, and the fuselage, with most of the cockpit instruments removed and only a few new instruments reinstalled. It crashed before; will it crash again?

For the next growing nightmare is certainly already hidden somewhere.

It is when, not if, the next nightmare arrives to haunt us.

3 Responses to “The Next Nightmare is Coming”

  1. Bram Cohen Says:

    I’m struck by how similar ledbat and codel are. They’re both based on one-way delays. There are two different sets of issues comparing them, one is how realistic deployment is, and the other is what’s the better solution.

    In terms of deployment we already have widespread deployment of utp, and end users would be happy to deploy ledbat for all their TCP connections if only they were given the option, because they’re the ones suffering from the buffer bloat. Codel would have to be deployed by ISPs, who generally aren’t interested in fixing things.

    As a solution, ledbat maintains the simple and flexible behavior of internet routers, while codel would make their behavior much more specific and particular, which could cause problems for future innovation. Codel has the benefit that it knows the one way delay, rather than having to subtract out a minimum seen, but ledbat knows what the RTT is.

    In terms of future development, it should be possible to switch ledbat to be rate based rather than congestion window based, which would allow buffers to be much smaller than the RTT. (We’re working on that one). It should also be possible to rely almost exclusively on delays and ignore packet loss, which would be very good on mobile networks and wireless, because those have high non-congestive packet loss rates. That would currently break down when the bottleneck is on a very high bitrate connection, although explicit congestion notification should be able to fix that problem.

    • gettys Says:

      I have nothing against uTP/ledbat at all: I think it is goodness we have a “scavenging” protocol for transferring data in a non-interfering way. Whether this is better than diffserv marking isn’t something I’ve even thought about (though most people aren’t aware that diffserv marking actually is useful, as many/most home routers have implemented it for years, since it is implemented in the pfifo_fast queue discipline in Linux). Only game manufacturers and maybe VOIP vendors seem to have noticed.

      But I don’t see ledbat as a panacea: all it takes is one TCP session to ruin your whole day (sharing the same network path through a bottleneck), and by design, it means that any conventional TCP will take precedence over it. So Ledbat has the same problem that delay-based TCP algorithms such as TCP Vegas have: flag days in the internet can’t exist, and you can’t get everyone to convert at once.

      Deploying home routers and broadband gear with codel will avoid bufferbloat for all versions of TCP; so you don’t face the upgrade of (often impossible to upgrade) equipment.

      So I don’t think this is an “either/or” situation at all: both help, and both have their place.

      Just make sure that ledbat can deal with active AQM, and mark its packets with diffserv, and we’ll all win.

  2. Randell Jesup Says:

    Bram: I do have a problem with LEDBAT: it provokes standing queues of 100ms, and on top of that competes unfairly with any implementation trying to achieve a <100ms queuing delay (as seen by simulations of LEDBATs tuned for different delays).

    No one seems to have seriously looked at the impact of LEDBAT on VoIP or other need-to-see-minimal-delay services. Yes, persistent TCP flows will do worse damage to VoIP, but scavenger protocols are far more likely to be running constantly in the background. 100ms of extra delay is a serious impediment to high-quality VoIP (or interactive video) service.

    The initial 25ms proposal would have been mildly annoying, but manageable if it was achieved. 100ms (and the switch has no source I was able to find on the LEDBAT WG archives) is a real problem. I also worry about it being used in non-user-visible/controllable manners, like OS or application background updates, or for background cloud backups, or synchronizing pictures, etc.

    And, as mentioned, it doesn't respond well to AQM techniques – it devolves to roughly fair sharing with TCP/etc it appears. If it reacted more strongly to loss (and ECN) than TCP, it might remain a scavenger in an AQM environment, for example.

    diffserv is all well and good, but it is not an alternative for 'correct' protocol design (and I realize 'correct' is subjective). We can adjust endpoint protocols; we can't wave a wand and get rid of 100's of millions of WiFi/NAT/routers that no one will ever update the firmware of. Eventually (10 years?) most will have failed and be replaced, but I'd rather not wait until then.

    I hope you'll be coming to the IAB/IRTF Congestion Control Workshop right before the next IETF!
