“A committee is a life form with six or more legs and no brain.” – Lazarus Long


When I started on the bufferbloat quest, I was bothered by why RED (or other AQMs) was not universally deployed, and talked to Van Jacobson for some enlightenment; Van explained the issues with RED, and that the unpublished “RED in a Different Light” paper laid out RED’s problems along with a potential solution.

Today, while I was being a friendly nag to Van, trying to get a finished version of the paper, he told me the following story, which is (part of) why that paper was never published.

I just about fell out of my chair laughing…

One of the reviewer’s comments was that the issues with RED, and the proposed solution, could not possibly be true, and that the authors should go become familiar with the fundamental literature on RED and active queue management.

I wonder if the name of the reviewer will ever be known. But that would be cruel.

I hadn’t thought about this kind of danger of blinded reviews at refereed conferences. If you are ever on a program committee, please actually engage your brain.

P.S. Van is finishing up converting the paper to TeX and fixing up the text, since a bug discovered in the nRED algorithm had not yet been fully reflected in it. It should be available soon. He’s also sending me a pointer to some other recently published work that may be useful for 802.11 wireless bufferbloat.

20 Responses to ““A committee is a life form with six or more legs and no brain.” – Lazarus Long”

  1. Cyr Says:

    Hi,

    I’ve seen in the other replies that I wasn’t alone in not fully understanding your point. So I’ll dare to ask: could you recap the explanation of this “bufferbloat” problem in one (simple) post?

    Here are some additional questions that might help you see where I’m lost:

    – Does bufferbloat only occur when there are 2 (or more) flows, or can it also occur with a single flow?
    – What exactly happens when bufferbloat occurs (i.e., when an excessively big buffer fills up, right?)?
    – Is bufferbloat referring to (1) the fact that an excessively big buffer fills up, or to (2) the fact that a flow can experience excessive delay when an excessively large buffer fills up?
    – Is bufferbloat referring to excessively large buffers preventing TCP flow control?
    – Do the nefarious effects of bufferbloat only occur under network congestion?

    Thanks,

    Cyrille

    PS: I think one of the reasons why I don’t clearly understand is that I’m not a native English speaker.

    • gettys Says:

      No, you’ve done remarkably well for anyone, not just for a non-native English speaker. Bufferbloat can be confusing. I can’t tell you how much hair I’ve lost scratching my head over what I was seeing. I didn’t have much to begin with…

      – Does bufferbloat only occur when there are 2 (or more) flows, or can it also occur with a single flow?

      Demonstrating bufferbloat is easy to do with a single TCP connection, on all operating systems other than Windows XP (which does not implement window scaling by default). You can use any other protocol to show bufferbloat; it just may be more involved. The Netalyzr test for bufferbloat is UDP based, for example.
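
      If you want to see this yourself, here is a minimal sketch (Python; the host, port, and the netcat listener are illustrative assumptions, not a recommendation) of the single-flow demonstration: saturate the uplink with one bulk TCP flow while watching how long new connections take.

          import socket, threading, time

          HOST, PORT = "192.0.2.1", 9000  # illustrative; run e.g. `nc -lk 9000 > /dev/null` there

          def blast():
              # One bulk TCP flow is enough to fill the bottleneck buffers.
              s = socket.create_connection((HOST, PORT))
              chunk = b"x" * 65536
              while True:
                  s.sendall(chunk)

          threading.Thread(target=blast, daemon=True).start()
          while True:
              # TCP connect time serves as a crude latency probe; watch it balloon.
              t0 = time.time()
              socket.create_connection((HOST, PORT)).close()
              print(f"connect time: {(time.time() - t0) * 1000:.0f} ms")
              time.sleep(1)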

      – What exactly happens when bufferbloat occurs (i.e., when an excessively big buffer fills up, right?)?

      Buffers can easily fill from any traffic, from any protocol. Bufferbloat is when those buffers are not being managed, and therefore are oversized for extended periods, imposing excessive latency on traffic transiting them.

      When there is bufferbloat, your queues are excessively long. In a packet-switched network, queues should, on average, be very short, not running continually full. That is what AQM can do for you.
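
      To put a number on “excessive”: a full FIFO adds its entire drain time to every packet that transits it. A back-of-the-envelope sketch (Python; the buffer and link figures are illustrative, not taken from my traces):

          def queue_delay_ms(buffer_bytes, link_bits_per_sec):
              # Worst-case added latency: the time to drain a full buffer.
              return buffer_bytes * 8 / link_bits_per_sec * 1000

          # A 256 KB buffer ahead of a 1 Mbit/s uplink:
          print(queue_delay_ms(256 * 1024, 1_000_000))  # ~2097 ms added to every RTT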

      – Is bufferbloat referring to (1) the fact that an excessively big buffer fills up, or to (2) the fact that a flow can experience excessive delay when an excessively large buffer fills up?

      Both.

      – Is bufferbloat referring to excessively large buffers preventing TCP flow control?

      There is still flow control, but again, as there are long delays due to the buffering, the remaining flow control is poor. As you can see in my TCP traces, when the buffers are much larger than the normal RTT should be, you’ve destroyed the fast response of the servo system. So TCP flow control is also suffering from the imposed latency. This actually induces a certain amount of packet loss (significantly more than you would have had in a good network); overall bandwidth usage is not destroyed on modern TCP implementations, as SACK and fast retransmit can keep the pipe reasonably full.

      What bufferbloat causes is the absence of timely packet loss or ECN marking. This destroys congestion avoidance in TCP and other protocols, guaranteeing that the buffers fill.

      – Do the nefarious effects of bufferbloat only occur under network congestion?

      Yes, exactly. You only see bufferbloat just before a saturated bottleneck link. This is, however, common; it is therefore very common in our home networks, both due to the prevalence of problems in broadband gear, as ICSI’s Netalyzr work showed, and also in our home routers and computers (typically on 802.11). Similarly, we see aggregate forms of bufferbloat on both busy 802.11 networks and 3G networks. And some network operators (both corporate and ISP) fail to run with AQM, and therefore inflict pain on their customers.

      Thanks for the “FAQ list”; I’ll move this to a separate page.

      • John Burdick Says:

        For background on the problem, check out the paper “Sizing Router Buffers” by Appenzeller et al., 2004.

        (PDF: sigcomm-extended.pdf)

        The paper discusses formulas for estimating buffer size and states as rationale “Large buffers conflict with the low-latency needs of real time applications (e.g. video games, and device control). In some cases large delays can make congestion control algorithms unstable …”

        From the paper it is easy to see how a well-intentioned decision would make a buffer 200 times as large as it should be. There are a number of other good papers linked to this one by citation.
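
        To make the “200 times” point concrete, here is a minimal sketch (Python; the link figures are illustrative) contrasting the classic bandwidth-delay-product rule with the paper’s BDP/sqrt(n) result for n long-lived flows:

            import math

            def bdp_bytes(rtt_s, link_bps):
                # Classic rule of thumb: buffer = bandwidth-delay product.
                return rtt_s * link_bps / 8

            def appenzeller_bytes(rtt_s, link_bps, n_flows):
                # Appenzeller et al. 2004: BDP / sqrt(n) suffices for n long-lived flows.
                return bdp_bytes(rtt_s, link_bps) / math.sqrt(n_flows)

            # A 10 Gbit/s link, 250 ms RTT, 10,000 flows:
            print(bdp_bytes(0.25, 10e9))                  # ~312 MB under the old rule
            print(appenzeller_bytes(0.25, 10e9, 10_000))  # ~3 MB, a 100x reduction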

        • gettys Says:

          Yes, the buffers are now not just a little bit large, but grossly bloated.

          The term “bufferbloat” came about as I was struggling for a good name, consensus not having been reached on the end-to-end list.

          And thank you very, very much for the reference; I need to pull this all together for more formal publication, and having blundered into this area by accident, I don’t know the literature well enough to have such references at my fingertips.

          Relatively few pieces in this puzzle are new; mostly, I’ve been assembling a more coherent picture from other people’s pieces. I often don’t know who discovered and published the different puzzle pieces first.

          Anyone care to guess what the picture in the puzzle piece logo is? Don’t post spoilers here; just mail me, and I’ll credit whoever is first in a posting explaining the puzzle piece logo I use.

  2. Jonathon Duerig Says:

    After reading your discussions of bufferbloat, it seems very much related to work that some colleagues and I did in a paper at NSDI in 2009:

    (PDF: sanaga.pdf)

    We were looking at asymmetric links and buffer sizes in the context of path emulation, and came across these latency problems. We tracked a lot of the issues down and came up with some equations for how to set queue sizes properly. Hopefully, reading it can provide some further insight into what you are seeing.

    If you want to run controlled experiments with various levels of asymmetry and different queue sizes, you can also consider using the Emulab facility, which allows automatic creation of topologies using actual PCs with real network traffic.

    • gettys Says:

      Thanks for the reference.

      Note that bufferbloat is not a property of asymmetric links at all; my second exposure to bufferbloat was on a symmetric FiOS link, and I can reproduce it trivially on Ethernet.

      Its only relationship to asymmetric links is that the latency (for the same size bloated buffers) hurts more in one direction than the other. This is why Comcast’s and others’ BitTorrent pain dropped so much when they upped their lowest upstream bandwidth; IIRC, the typical upstream bandwidth of their lowest tier in that era was 384K, and with the size of buffers observed in the DOCSIS 2 hardware shipping then, the latencies must have been horrific (take the 1.4 seconds I observed and multiply by two or three). So while I know BitTorrent has other problems as a protocol, the pain customers suffered due to bufferbloat must have been extreme.

  3. Matt C Says:

    Excellent series of posts on this issue. I’ve run into this a few times (having to explain to managers why bigger buffers are not better), but I was not aware that it was such a widespread, real-world problem.

    I wonder if the problem can be solved, or at least helped, by modifying TCP endpoints to treat RTT increases as an indication of congestion in addition to packet loss. As you pointed out, routers and network equipment can be difficult to update, but end users’ computers as well as server nodes are easier. Have you heard of anyone pursuing this avenue?

    • Matt C Says:

      After I posted my comment, I did a little looking and found out I’m only 15 years late with this idea – TCP Vegas did this in 1995. Also, Microsoft’s most recent ‘Compound TCP’ takes delay into account to reduce the congestion window. I’m not sure why these haven’t been used much, but they seem to me like they might be part of the solution…
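
      For reference, the core of the Vegas idea fits in a few lines. This is a simplified sketch (Python, with packets as the unit; alpha/beta are typical textbook thresholds, not an exact reproduction of the 1995 algorithm):

          def vegas_update(cwnd, base_rtt, rtt, alpha=2, beta=4):
              # diff estimates how many of our packets are sitting in queues:
              # the gap between expected throughput (at base_rtt) and actual
              # throughput (at rtt), converted back into packets.
              expected = cwnd / base_rtt
              actual = cwnd / rtt
              diff = (expected - actual) * base_rtt
              if diff < alpha:
                  return cwnd + 1  # little queuing: probe for more bandwidth
              if diff > beta:
                  return cwnd - 1  # queue building: back off before any loss
              return cwnd          # in the sweet spot: hold steady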

      • gettys Says:

        Deploying new protocols takes a long time; most people are still using Windows XP these days. And game theory says that anything you do that puts you at a disadvantage is unlikely to be deployed. So I see all the suggestions to “change the protocols” as likely doomed from the start. That doesn’t mean I think work like what the IETF LEDBAT group is doing is useless at all; we’d clearly like a transport protocol that really tries hard to stay out of the way. But without queue management (and semi-sanely sized buffers in the first place), we’re just going to continue to suffer, I think.

        • Sam Stickland Says:

          It’s also possible to make the modifications at the server end, though, isn’t it? I.e., reducing the sender window in response to RTT.

          I’m not sure that your game theory comment applies here.

          This blog post http://www.cringely.com/2011/01/2011-predictions-one-word-bufferbloat-or-is-that-two-words/ describes how he could cause the issue using a single Xbox360 streaming from Netflix (but his older Roku box was fine).

          I think it’s fair to assume the buffers he was filling were on his router(s), or headend buffers dedicated to him. Otherwise the problem couldn’t have been caused by the Roku -> Xbox360 upgrade, could it?

          This is clearly a problem for Netflix. However, if Netflix’s TCP implementation reduced its sender window in response to RTT, the problem would be avoided.

        • Sam Stickland Says:

          In reply to my own comment (which isn’t up at the time I type this): I’ve checked, and Microsoft’s Compound TCP algorithm does indeed modify the /sender’s/ window in response to RTT.

    • gettys Says:

      Yes, but…

      The other guy’s traffic will kill you (your kid’s, your wife’s, your co-worker’s). So you’ve just fixed your contribution to the problem, and you get to suffer more… Game theory says this isn’t a stable solution.

      The technique is being explored in the IETF LEDBAT working group to allow for friendlier behavior for bulk transfer protocols like BitTorrent. But as I noted in one of my posts, much of the pain of BitTorrent only occurred because of bufferbloat in the first place.

      Fundamentally, the queues need to be managed. So we need to fix bufferbloat overall, not just try to work around it.

      • Matt C Says:

        I was naively thinking that someone running something like TCP Vegas would have an advantage over the other guy’s traffic, as the congestion control should be more stable, reacting to latency increases before actual packet drops. Then everyone would have an incentive to switch, and the more people who did, the better it would get, mitigating the need to update routers…

        However, in practice it looks like TCP Vegas does worse when competing against non-Vegas implementations because it slows down sooner. So much for that silver bullet!

        Thanks for raising awareness on this issue.

        • gettys Says:

          Not to mention the difficulty of getting a new TCP deployed, and the game theory that tells you that vendors (and people) don’t want to be at a disadvantage to others.

  4. PlanBForOpenOffice Says:

    Jim,
    here is a simple queue management algorithm to ponder.

    If the buffer is full, drop all packets.

    I know it is radical, and against the idea of never losing a packet.

    However, it will indicate congestion to all connections held in that buffer, which allows TCP congestion control to do its work.

    Also, let’s say your fat connection (rsync) has more than one packet in that buffer; it should notice that there is a larger problem (n packets lost) and back off more aggressively.

    An alternative would be to drop only half of all packets.

    I’m not at all a network specialist, so take this suggestion with a grain of salt.

    • gettys Says:

      Won’t help, or at least not much.

      Much of what we have today is doing tail-drop, which has been known to be a really bad idea for a very, very long time.

      Head drop is much better.

      And RED (the R stands for Random) is better yet.

      But with the buffers as amazingly bloated as they are now, it just doesn’t matter if you drop head, or tail, or all of the packets: the notification of congestion will be so delayed by the time the first congestion notification occurs that the hosts will just refill the buffers again. That’s exactly what the TCP traces showed in my initial investigations, which have the amazing spikes.

      These buffers aren’t merely big: they are bloated, to the point that congestion avoidance is dead. That is why you see the oscillatory behavior rather than a smooth flow, and why I lose sleep at night…
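
      For those who haven’t seen it, the classic RED decision is small enough to sketch (Python; the parameters are the usual textbook defaults, not a tuning recommendation, and this omits the count-based correction in the original paper):

          import random

          class RedQueue:
              def __init__(self, min_th, max_th, max_p=0.1, weight=0.002):
                  self.min_th, self.max_th = min_th, max_th
                  self.max_p, self.weight = max_p, weight
                  self.avg = 0.0  # EWMA of the queue length, in packets

              def should_drop(self, queue_len):
                  # Smooth the instantaneous length so short bursts pass untouched.
                  self.avg += self.weight * (queue_len - self.avg)
                  if self.avg < self.min_th:
                      return False  # short average queue: never drop
                  if self.avg >= self.max_th:
                      return True   # persistent overload: always drop
                  # In between, drop probability rises linearly with the average.
                  frac = (self.avg - self.min_th) / (self.max_th - self.min_th)
                  return random.random() < self.max_p * frac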

      • Arms Says:

        One thing to note is that RED was also designed with a TCP-like AIMD control loop in mind. Because of this, typical RED settings are rather sluggish and conservative about the actual extent of marking/dropping.

        The fact is that no matter how many drops are experienced in one cwnd’s worth of packets, there is only a single reduction (by a fixed amount, i.e., 1/2) per RTT.

        OTOH, simple algorithms that also react to the *extent* of congestion are at a disadvantage when flows with different RTTs share the bottleneck – the flow with the slower control loop (higher RTT) gets more bandwidth for some transient time.

        Nevertheless, I think getting rid of arbitrary sluggish control loops within control loops might be worthwhile – reducing RED-like schemes to their fundamental (instantaneous) state… and meddling with that feedback should reside completely in the end hosts. That includes reacting to the extent of the congestion, the observed RTT, and perhaps even taking the first derivatives into account (the change in congestion extent at the bottleneck).

        One-way delay variance measurement (which doesn’t need synchronized clocks, only comparable clock rates) could provide one more fine-grained feedback signal.
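
        To illustrate that last point, a minimal sketch (Python; my own illustration, not an existing implementation) of extracting queuing delay from one-way timestamps without synchronized clocks:

            class OwdTracker:
                # One-way delay samples include an unknown, constant clock offset
                # between sender and receiver; subtracting the minimum delay seen
                # so far cancels that offset, leaving (approximately) pure queuing
                # delay. A real implementation would also age out the minimum to
                # tolerate clock drift.
                def __init__(self):
                    self.base = None

                def queuing_delay(self, send_ts, recv_ts):
                    owd = recv_ts - send_ts
                    if self.base is None or owd < self.base:
                        self.base = owd  # best estimate of the empty-queue delay
                    return owd - self.base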

        Regards,

  5. Dave Täht Says:

    One of the things that really gets to me about all the press, website (Slashdot/Ars Technica), and blog coverage you’ve gotten so far (and congratulations for doing such a great job on that) is that nearly all of it still misses your core points, and few actually perform the experiments you did in order to truly grasp it.

    Three misconceptions going around:

    1) “It’s a TCP specific issue”

    The focus on TCP here, and in the academic papers and discussions elsewhere, ignores the fact that there are other kinds of mission-critical packets that are being intolerably delayed by bufferbloat.

    These include NTP, DNS, DHCP, ARP, routing, VoIP, and traffic encapsulated in UDP packets, such as IPv6 6in4 tunneling and VPNs, all of which are getting drowned out by bufferbloat.

    While many of these packet types are at the level of statistical noise, they are *mission-critical* noise, and often need to jump the TCP queues in order for a network to function properly.

    2) “Buffer size is a root cause of the problem”

    No, TWO *TX* buffer sizes are a root cause. While it is hard to dig into (I have personally dug into 5 different Ethernet/wireless drivers), the first (and hardest) thing that needs to get accomplished is reducing DMA TX queues to sane levels.

    I haven’t (sadly) seen any discussion on lkml or linux-wireless about TX queues yet.

    Reducing txqueuelen helps a lot, but when DMA buffer sizes are higher than, say, 16 (2 being optimal), the wheels lift from the road far sooner.

    Of the drivers I’ve poked into personally:

    OpenRD/SheevaPlug/Kirkwood: default DMA TX queue of 256, which can be reduced to 20 via ethtool -G eth0 tx 20

    ath5k: I don’t understand how this one works; my reading is that there are 16 queues of 200 entries each.

    ath9k: DMA queue length of 64

    Of course, all these share a txqueuelen of 1000 by default, and that needs to be reduced to sanity as well. I’m not sure what a good value is for this either; I’ve been using values as low as 4 (AFTER reducing the TX DMA buffers below 21). I imagine that wireless devices that can pack multiple packets into a single frame will want a larger txqueuelen than those that don’t.
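
    To see why these numbers matter so much, a back-of-the-envelope sketch (Python; the rates and sizes are illustrative) of the worst-case delay a full ring plus qdisc can impose:

        def tx_backlog_ms(ring_slots, txqueuelen, pkt_bytes, link_bits_per_sec):
            # Worst case: every slot holds a full-size packet that must drain
            # ahead of yours.
            backlog_bits = (ring_slots + txqueuelen) * pkt_bytes * 8
            return backlog_bits / link_bits_per_sec * 1000

        # A 256-slot TX ring plus txqueuelen 1000, 1500-byte packets,
        # on 802.11g at an effective ~20 Mbit/s:
        print(tx_backlog_ms(256, 1000, 1500, 20_000_000))  # ~754 ms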

    I have the same Intel Ethernet card and iwl wireless card you do, but I don’t have the figures handy for those right now.

    After taking these numbers down across my entire network, it has been amazing how far the level of user complaints has dropped.

    The amazing thing about these tweaks thus far is that they have had no measurable impact on cpu usage or performance at the high end of the bandwidth spectrum on the devices I’m using.

    3) “Traffic shaping fixes it”

    No, first you have to get the wheels back on the road.

    Even then, commonly used shapers such as the wondershaper have a set of assumptions built into them that date from the 2002-2005 DSL era. Now that download/upload speeds are over an order of magnitude higher, and the ratio between them is closer to 7:1 rather than 2:1 or 3:1, one of the wondershaper’s knobs for interactivity (putting TCP ACKs in the interactive class) now overdrives the servo mechanism and drowns out other critical (UDP) traffic.

    I’ve spent a lot of time poking into this this past month, and I’m getting close to having pleasing results. Although “RED in a Different Light” is a good paper, I don’t think at this point it’s optimal for asymmetric networks such as those in home gateways and wireless. Its focus on flows, rather than devices, makes sense at the core but not in these cases.

    Modifying the wondershaper to put a new “ACK” class below outgoing interactive traffic in priority appears to help a lot, but there are other issues too complex to go into here.

    • gettys Says:

      Yes, the experiments are very useful to perform.

      1) You are exactly correct that this really doesn’t have much to do with TCP; the only new thing here is that bufferbloat is destroying TCP’s (and other protocols’) congestion avoidance, and that this exhibits periodic behavior, which is scary by itself. This is inherent in inserting large amounts of delay into a servo loop of any sort.

      If you control the queue lengths properly, then you have much less need for classification, even for critical stuff. Unfortunately, when I get to blogging again about what’s happened on the web side of things, I also think we do want classification to get various stuff working properly again, due to the issues I’ll discuss then.

      But if you get the bloat out first, you’re in a fundamentally easier place to figure out what amount/kind of classification you will need.

      2) Exactly: we have a bunch of problems here.

      3) Also exactly true: the wondershaper won’t do you much good if you have driver bufferbloat; the transmit queue won’t have packets in it to classify.

      And yes, I should blog about the dangers of following most of the HOWTOs I’ve found on the net to “optimize” your system.

  6. Dave Täht Says:

    re: 1) If you can do an article talking about the IP/UDP underpinnings of the network and what happens when it gets congested without once using the word “TCP”… it would be an appreciated act of authorial legerdemain! I don’t think the new kids on the block really grok that stuff anymore.

    re: 3) I just sent you some email on that – I have a long analysis of 4 existing shapers and where they go right and wrong, but it’s taking me too long to adopt a calm and civil tone regarding them to publish any time soon.
