RED in a Different Light

Update May 8, 2012: “Controlling Queue Delay” describes a new AQM algorithm by Kathie Nichols and Van Jacobson.

Congestion and queue management in Internet routers have been a topic since the early years of the Internet and its first congestion collapse. “Congestion collapse” does not have the same visceral feel for most people using the Internet today as it does for a few of us older people. Large parts of the early Internet actually stopped functioning almost entirely, and a set of algorithms was added to TCP/IP to ensure that collapse would not happen again. These include slow start, congestion avoidance, fast recovery, and, at a later date, ECN (Explicit Congestion Notification), which has so far not seen wide use and is a subject of ongoing research to determine if it can be deployed.

Bufferbloat much larger than the RTTs of the paths destroys the fundamental congestion avoidance of the TCP protocol’s servo system, as I documented. We have destroyed congestion avoidance, and as I’ll discuss soon when I return to the topic of web browsers and servers, we are playing with fire. Even if nothing as bad as I fear ever happens, the bufferbloat situation today is bad, with multiple-second latencies being common.
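
To put the scale of the problem in perspective, here is a back-of-the-envelope sketch; the buffer size and link rate are hypothetical assumptions chosen only for illustration:

```python
# Illustrative arithmetic: the latency a full, unmanaged buffer adds
# at a given link rate, compared to the path RTT that TCP's control
# loop was designed around. Numbers are assumptions, not measurements.

def queue_delay_seconds(buffer_bytes: int, link_rate_bps: int) -> float:
    """Time to drain a full buffer at the link rate."""
    return buffer_bytes * 8 / link_rate_bps

# A hypothetical 256 KB buffer on a 1 Mbit/s broadband uplink:
delay = queue_delay_seconds(256 * 1024, 1_000_000)
print(f"queueing delay: {delay:.1f} s")        # ~2.1 seconds

# Compare with a typical ~100 ms path RTT: TCP's loss feedback now
# arrives roughly 20x too late for congestion avoidance to work.
print(f"delay / RTT: {delay / 0.100:.0f}x")
```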

Bufferbloat was first encountered in very early experiments with satellite networking, and active queue management has been an active area of research since the 1980s, continuing to this day. With the invention and wide deployment of algorithms such as RED and others, I had thought that the problem was solved. To my surprise (I am not in the field, but due to history have been a somewhat interested bystander), I was wrong: queue management is often not enabled even on significant routers in both enterprise networks and the Internet. The reasons why, and the limitations of existing AQM algorithms, shed light on this aspect of today’s problems.

Conclusions: Active Queue Management is often not enabled or tuned in today’s Internet and corporate networks. Broadband (and some networks’) performance is therefore often significantly worse than necessary, since your ISP may never have enabled AQM. If you are operating a network, check that you have correctly enabled and tuned AQM on all types of your gear. You can have happier customers and fewer service calls. Finally, we need better queue management algorithms than “classic RED” or closely related algorithms for today’s wireless routers and operating systems.

Read on for more detail…

Internet routers have to handle not only bursts of traffic within a flow, but also interference from other connections on other paths and changes in routing (and therefore in available bandwidth). So there has never been a “single right answer” for routers’ buffering. If congestion builds and is not signaled to the endpoint generating it, the queues will grow effectively without bound. Remember, such queue growth is orthogonal to any QoS that you may want to impose; a network link will remain congested until/unless the endpoints back off and slow their transmission. QoS may certainly be a tool in providing good quality of service for interactive applications, but by itself it cannot solve the overall congestion problem; queue management (and signaling congestion) is absolutely essential.
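
A toy discrete-time model makes the point; all the rates below are hypothetical:

```python
# Toy model: senders offer traffic faster than the link can drain it,
# and nothing (no drop, no ECN mark) tells them to slow down.

ARRIVAL_RATE = 120  # packets per tick offered by senders (hypothetical)
SERVICE_RATE = 100  # packets per tick the link can actually transmit

queue = 0
for tick in range(10):
    queue += ARRIVAL_RATE                # traffic keeps arriving
    queue -= min(queue, SERVICE_RATE)    # link drains what it can
    print(f"tick {tick}: {queue} packets queued")

# The backlog grows by (ARRIVAL_RATE - SERVICE_RATE) every tick.
# QoS scheduling can reorder this backlog, but only a congestion
# signal that makes the senders back off can make it shrink.
```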

My lunch with Comcast had alerted me that RED or other AQM was far from universally enabled, even though RED and some other algorithms have been deployed in production routers for well over a decade. In retrospect, I’ve occasionally seen links in the public Internet that showed symptoms of bufferbloat: e.g. 20-second RTTs via a satellite link to the Ministry of Education in Peru, and sometimes weird experiences at hotels (only partially explained by broadband bufferbloat). Over lunch we had covered several reasons why AQM might not be enabled:

  1. The availability of fiber in the late 1990s, along with the popping of the .com bubble, put a glut of bandwidth on the market and temporarily slackened growth rates; supply of bandwidth caught up with demand. Congestion eased.
  2. RED has a number of parameters that need tuning; it has to be set up by network operators and tuned to the circumstances (see the sketch after this list).
  3. Another generation of network operators grew up who had not been scarred by congestion collapse; they did not see AQM as essential, and feared what might happen if RED was enabled.
  4. Most of the Internet core is not congested at any given moment, but the congestion points are constantly shifting. Everyone agrees the “right answer” is more fiber: but it takes time to send trucks and ships to lay cable; there are backhoe events; and there is latency measured in months or years to change (some) provisioning. So some parts of even the Internet core will always be congested.
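
For concreteness, here is a minimal sketch of classic RED’s drop decision (after Floyd and Jacobson’s 1993 paper), just to show the knobs involved; the constant values are illustrative assumptions, not recommendations:

```python
import random

# Minimal sketch of classic RED. The four constants below are exactly
# the knobs operators must tune per link; the values here are
# illustrative assumptions only.

W_Q    = 0.002   # EWMA weight for the average-queue estimate
MIN_TH = 5       # packets: below this average, never drop
MAX_TH = 15      # packets: at or above this average, always drop
MAX_P  = 0.02    # drop probability as the average approaches MAX_TH

avg_queue = 0.0

def red_arrival(instantaneous_queue: int) -> bool:
    """Return True if RED says to drop (or ECN-mark) the arriving packet."""
    global avg_queue
    # Detection: exponentially weighted moving average of the queue.
    avg_queue = (1 - W_Q) * avg_queue + W_Q * instantaneous_queue
    if avg_queue < MIN_TH:
        return False
    if avg_queue >= MAX_TH:
        return True
    # Mapping: drop probability rises linearly between the thresholds.
    p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
    # Marking: a Bernoulli trial (the full algorithm also spaces drops
    # out using a count since the last drop, omitted here).
    return random.random() < p

# Every constant interacts with link rate, RTT, and traffic mix, which
# is why untuned or mistuned RED made operators nervous.
```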

After that lunch, I wondered how to go from anecdotal experience and hearsay to another possible piece of the bufferbloat puzzle. Certainly, Comcast looked clean whenever I looked at it (as I was told it should be), except for that horrid “last mile”.

I’d seen and experienced enough trouble over the years to take this topic very seriously, so I investigated further.

AQM and Broadband

But even where AQM is available to control bufferbloat, network operators today often fail to enable and configure it.

Lack of AQM at your broadband “head end” can easily hurt you in the downstream (ISP to your home) direction; there may be additional problems in the broadband gear itself. It is less likely you will see any buffering in your cable router or CPE directly, as usually they are connected by Ethernet to some other device (like your home router, where you may have bufferbloat problems as well). You may have to educate your ISP.

The paper Characterizing Residential Broadband Networks by Dischinger, Haeberlen, Gummadi and Saroiu is illuminating. In section 4.3.2 of that paper, they present strong evidence that a substantial fraction of ISPs apparently are not running any active queue management, at least on the broadband head ends. I see this paper as confirming the initially anecdotal data I had been given (along with my ad-hoc experiments on hotel networks). As I traveled this summer, I probed a number of networks at hotels; some were clearly suffering disastrously from broadband bufferbloat, but several hotels were connected over T1 and fractional T3 links: on them I also observed bad congestion and excessive buffering, but with the bottleneck links in the middle of their ISPs’ networks. Your mileage will vary, depending on the network.

AQM and Enterprise Networks

I also believe many/most enterprise networks have bufferbloat problems.

I installed Smokeping at work in June to monitor our corporate network, having noticed occasional periods of higher latency and bad jitter. It showed significant “spiky” latency of very short duration (minutes), sometimes like clockwork at particular times of day, and large enough to be of concern to our teleconferencing project. The magnitude of the spikes indicated that queue management was possibly not enabled on our corporate network; short-running scheduled applications were unlikely to inject that much latency over the bandwidth of the links involved.
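
As a back-of-the-envelope check of that inference (the spike size and link rate below are hypothetical, not our actual numbers):

```python
# Rough inference from a latency spike: the extra delay multiplied by
# the bottleneck link rate bounds how much data was sitting in a queue.

def implied_queue_bytes(spike_seconds: float, link_rate_bps: int) -> float:
    """Data that must have been queued to produce the observed delay."""
    return spike_seconds * link_rate_bps / 8

# A hypothetical 500 ms spike on a 100 Mbit/s link implies ~6 MB queued:
print(f"{implied_queue_bytes(0.5, 100_000_000) / 1e6:.2f} MB")  # 6.25 MB

# A short scheduled job can't keep that much data in flight for minutes;
# an unmanaged, oversized buffer at the bottleneck is the likelier cause.
```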

I eventually managed a discussion with a very clueful person in our IT department: we run with quite sophisticated QoS classification (e.g. VOIP classification, and several other priorities), but no AQM. I remind you: QoS is no substitute for AQM, which is needed whenever there is any possibility of congestion at all (and congestion is constantly shifting). The lack of AQM is, or will become, an increasing problem in your enterprise network as Windows XP is retired, since individual machines are then more likely to drive any path to saturation. You should ask your IT department if they have enabled AQM.

Why Some Network Operators Fear AQM

RED (the most widely deployed and available AQM) was developed by Sally Floyd and Van Jacobson. I saw Van at LCA 2006 for the first time in over a decade, when he gave a wonderful presentation on network channels. I called Van in August to understand the reasons why network operators distrust RED.

RED in a Different Light

Van told me then that:

  1. There are not one, but two bugs in classic RED, discovered and proven to him by Kathie Nichols, and the result has been a cottage industry that has published over 100 papers on RED tuning since then, along with justified mistrust by network operators.
  2. The finished “RED in a Different Light” paper was never published, in part because someone on the program committee was offended by the diagram explaining its behavior. You can inspect the draft yourself and conclude that some people lack any sense of humor; the mechanism in the diagram is familiar to us all, is a direct analogy to what is occurring, and provides insight into the algorithm’s behavior.
  3. The improved RED algorithm was never implemented by Cisco, since the paper was never published and there was therefore no demand from customers. Alternate algorithms have also languished, and have generally not been fully automatic.
  4. Van then changed jobs, and the finished paper was lost on a backup tape for a decade.
  5. At the time Van was working on that paper, he did not foresee that we’d have essentially the same problem in wireless home routers, and there was little outlet for that work other than Internet routers.

Van thinks the observed spotty deployment of AQM by network operators (both corporate and ISPs) is completely understandable (even if often very unwise). RED does require tuning, and can conceivably cause trouble if not tuned properly; many operators who have not been scarred by congestion are therefore gun-shy.

Another old Digital colleague of mine, Dave Oran, asserted when he heard the above history that operators should “Just turn it (RED) on”: “it’s almost always better than doing nothing and very unlikely to hurt, and very likely to help greatly”. Dave is right; but understanding how we got here is illuminating, and explains operators’ behavior. We should not wait for a better algorithm to fix the routers that we already can. There is more than a decade of operational experience with RED and other AQMs.

The Way Forward

We can mitigate some of today’s bufferbloat problems without any AQM (since the buffers are so huge), but to fully solve today’s broadband bufferbloat we minimally need RED or some other AQM. This seems immediately doable, since broadband devices’ operating points may be predictable enough for stable operation, at least if the devices are aware of the bandwidth provisioning.
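
To illustrate why knowing the provisioned bandwidth helps, here is a hedged sketch; the function, the 10/40 ms delay targets, and the example tier are all assumptions of mine, not anything from Van or the paper:

```python
# If a broadband device knows its provisioned rate, sizing AQM queue
# thresholds for a target queueing-delay band is simple arithmetic.

def red_thresholds_bytes(provisioned_bps, min_delay_s=0.010, max_delay_s=0.040):
    """Queue depths corresponding to a target delay band (assumed targets)."""
    bytes_per_second = provisioned_bps / 8
    return bytes_per_second * min_delay_s, bytes_per_second * max_delay_s

# For a hypothetical 10 Mbit/s downstream tier:
min_th, max_th = red_thresholds_bytes(10_000_000)
print(f"min_th ~ {min_th / 1e3:.0f} kB, max_th ~ {max_th / 1e3:.0f} kB")
# -> roughly 12 kB and 50 kB: a fixed operating point a head end could
#    be configured with, because the provisioned rate is known.
```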

But as I’ve shown in the fun with wireless, the glasse house, and home mitigation postings, we have bufferbloat in today’s operating systems and in home routers. Van is sure that classic RED algorithms cannot solve the wireless router or OS bufferbloat problem, which spans a much higher dynamic range and much more dynamic behavior than seen in the much more aggregated “classic” Internet router environment. Van thinks his nRED algorithm might. Any algorithm that misbehaves will not be trusted and not enabled by default; history shows it is often then forgotten. Manual tuning has its own pitfalls; it is good enough to demonstrate the problem, as I have done, and while we can mitigate the magnitude of current bufferbloat with tuning, I believe that achieving something we’d call a full solution for OS and wireless router bufferbloat is not possible without something better than classic RED.

An early draft of RED in a Different Light by Jacobson, Nichols, and Poduri escaped onto the Internet (caution: Van says that draft’s revised algorithm still has at least one issue, but the early draft is enlightening, offering a glimpse of the way forward and immediate insight into classic RED’s problems). Van told me this afternoon he has found his old files with the final draft of the paper and supporting material on backup tapes from ten years ago. Here is where you might be able to help with bufferbloat yourself:

  1. If you happen to have been on the program committee for SIGCOMM (?) in that era (1999?, 2000?) and have a PostScript copy of the finished paper, please send it to Van ASAP. It would save the hassle of item 2.
  2. If you have access to a copy of FrameMaker, we could use your help converting the paper to a modern format.
  3. One aspect of the simulation of the nRED algorithm was never completed, since the final draft was not accepted. If you have a research bent and are interested in completing those simulations, let Van know.
  4. Once the paper has been resurrected (or at least the final version of the nRED algorithm, from his talk materials), kernel and home router implementers can have a ball making wireless home routers and operating systems that actually might work well across the huge dynamic range and workloads we now face.
  5. If you have other ideas for fully automatic AQM algorithms that might handle this challenging queue management problem, now is the time to try them out.

Update: Van has managed to retrieve the later draft of “RED in a Different Light”, resurrect FrameMaker via VM technology, and is converting the paper to LaTeX. He says that the draft only partially reflects a bug fix found in the revised algorithm, and he is editing it to reflect the final version of the algorithm. It should be available soon. Van also pointed out Buffer Sizing for 802.11 Based Networks, for which there is running code, of particular interest to people working on 802.11. He believes there is an easier and more robust solution, but “they have running code, and I don’t”.

Home router and operating system bufferbloat is becoming much more common and problematic as ISPs’ uplink bandwidth has increased: for example, on FiOS I observed the bufferbloat problem more in the wireless home router part of the ISP-provided device than in the FiOS plant itself.

Conclusions: Active Queue Management is often not enabled or tuned in today’s Internet and corporate networks. Broadband (and some networks’) performance is therefore often significantly worse than necessary, since your ISP may never have enabled AQM. If you are operating a network, check that you have correctly enabled and tuned AQM on all types of your gear. You can have happier customers and fewer service calls. Finally, we need better queue management algorithms than “classic RED” or closely related algorithms for today’s wireless routers and operating systems.



39 Responses to “RED in a Different Light”

  1. Adam Williamson Says:

    On the Framemaker thing – maybe I’m missing something, but googling ‘Framemaker’ takes you to Adobe’s page for the current version of the software, complete with a page – http://www.runaware.com/clients/adobe/techsuite/ – offering various methods to get a trial version, for free, two of which offer ‘Access to full version software for 30 days.’ Wouldn’t that be enough to get the conversion done?

    • gettys Says:

      You are exactly correct: but Van has not messed with FrameMaker much (Kathie disliked LaTeX, according to Van) and I never have, nor does either of us necessarily have a Windows machine conveniently at hand to install current FrameMaker on.

      Free demoware is often/usually crippled in some way (often exactly on export or on reading old versions). Then there is the overhead of getting a software purchase approved in big companies: if we find we need the regular version, spending $999 for a piece of software to be used once often gives company bean-counters indigestion. So finding someone who can just run it through an existing installation may be a lot less effort. Plan B has always been that route.

      Plan C may be to resurrect the whole system image. Van was muttering about the amount of work it would take to resurrect the entire 10-year-old system image he has, if push came to shove. That seems like a lot of work indeed, so we’ll try Plans A and B first.

  2. Steven Blake Says:

    I’m confused: why can’t he generate postscript from the pdf file on citeseer?

    At any rate, I was an early reviewer of RED Light and was always disappointed that it was never published. Subsequently, there have been mountains of papers published on post-RED approaches to AQM (I have two 2″ binders full of papers on my shelf at work). The most promising one IMHO is BLUE. I would really love to see it get some experimental deployment.

    • gettys Says:

      The version on citeseer is a very early draft, and incorrect. The problem is resurrecting the version that was essentially completed (all but one set of simulations to determine the best value of a constant).

      There are two reasons why the RED in a Different Light paper is interesting: 1) for the discussion of the flaws in RED, and 2) as an indication that the problem is soluble; that an autotuning AQM is possible, if we just set our minds to it.

      If you think BLUE would handle the wireless problem, then there is a straightforward way to get experimental deployment: implement it as yet another queue management algorithm, and then the various open source router projects can take it for a spin. Some of those distributions are the basis for commercial products as well, and you could get immediate feedback and testing from substantial communities of savvy people to help shake it down.

  3. Steven Blake Says:

    One comment about why some operators might be hesitant to deploy RED. Van’s and Sally’s papers always referenced simulations running at T1 – 10 Mbps speeds. This was in an era when > 1 Gbps links were being deployed, with much greater numbers of flows. Granted, traffic on these types of links was much harder to simulate. But it is not at all clear that queue behavior is scale invariant. Comments in the RED Light paper like “When sampling is performed at fixed intervals, we recommend samples on the order of an MTU time to avoid aliasing problems.” don’t make sense on gigabit links, where the sampling interval would be on the order of a microsecond.

    • gettys Says:

      And the obvious counterexample is the observed fact that much/most of the net has been running queue management on any link that might conceivably congest, for more than a decade.

      In any case, we can add your comment to the list of reasons: no one need feel embarrassed about not having done so. But at least in the conventional router area we should be using AQM, at least until we have something good enough to “just work” and be ignored entirely because we trust it so much it’s just always on. The consequences of doing nothing, now that memory is cheap, are really bad: very high latency, bad jitter, up to and including destruction of congestion avoidance.

    • Nathaniel Smith Says:

      Obviously it’d be good to understand how these sorts of algorithms work on high-speed connections, but the specific issue you raise is one that they explicitly address — for high speed links, they say, one should use random sampling instead of fixed sampling. Makes sense to me.

      • Steven Blake Says:

        I was never convinced by their argument about random sampling intervals. It is hard to reason about the frequency response of an averaging filter when the sample intervals are random. In any case, picking a small fixed interval to match the minimum random interval would not perform worse (aliasing-wise).

        On fast links, fixed sample intervals on the order of 2-10 msec should be fine. That is an eternity relative to packet times. For slower links, fixed intervals on the order of an MTU-on-the-wire time should be fine.

  4. John Gilmore Says:

    Have you seen BEP 29, the BitTorrent UDP-based replacement for TCP that was specifically designed to detect buffer delays and avoid creating them? See http://www.bittorrent.org/beps/bep_0029.html

    Perhaps if these endpoint techniques actually work, TCP can be improved to do similarly (while we simultaneously work on getting routers to not bloat).

    • gettys Says:

      Well, it’s the other guys sharing the line that typically get you. So I’m not so sure how helpful this is. (Actually, I’m very pessimistic about that approach: game theory says it’s pretty likely to fail, and getting everyone updated is very, very problematic.)

      Having those algorithms for services like BitTorrent, to keep them out of the way of other services, is, of course, goodness.

      On the other hand, those techniques may be useful for working around broadband bufferbloat. Right now, the mitigation strategy defeats any temporary bandwidth boosts your ISP may give you (e.g. what Comcast does, which is very nice). So one could build routers that adapted to the available bandwidth possibly using those techniques.

  5. Luca Dionisi Says:

    Hi Jim
    I am following with much interest your posts about the problem of bufferbloat.
    I have a question. Imagine that A wants to send a big file to B. There are several paths from A to B, and a multipath mechanism is used by the routers between those nodes. Now congestion happens on one of the paths, but the other paths still have available bandwidth. What do you think a good congestion-avoidance algorithm should do?

    • gettys Says:

      No clue; I’ve never thought about multipath transport. Remember, I wandered into this area by accident recently, not by intent. Others are much more likely to have informed opinions than I. And even so, my attack on bufferbloat is that of a systems person, not a transport protocol expert.

    • Arms Says:

      @Luca:

      Nothing will happen, unless the congestion is so severe that the routing protocol is impacted.

      OSPF (the only open-standards protocol to ever feature a load-based metric; I won’t go into discussing proprietary EIGRP) was found to have “interesting” effects in the time domain (i.e. synchronized load waves across your domains), and thus load-based routing never became more than an academic exercise.

      Further note that routers typically work with a distinct separation between control and data plane. Unless something very serious happens (port down, route unavailable), the control plane is little concerned with the actual data flowing over an existing, established connection. Furthermore, reacting (very) dynamically to changing congestion would likely cause heavy reordering at the receiver, which in turn will severely reduce goodput (the sender may be fooled into fast retransmissions and spurious retransmission timeouts when the path RTT is highly variable).

      If you have multiple paths available which can actually be exploited (i.e. a load-balancing routing protocol, or distinct ingress/egress points into the network between your hosts), you may want to implement MPTCP (where the relative load between sub-flows can be adjusted by the end hosts, and overall bandwidth across the network is no longer necessarily limited to a single physical link…)

      Regards,

    • Chris E Says:

      In addition to the time-synced effects noted above, one other phenomenon affecting multipath networks is Braess’ paradox, where relative cost metrics become useless. And buffering will make this situation even harder to recover from.

      • gettys Says:

        Interesting. Certainly by removing timely loss notification, we’re removing all incentives for good behavior.

    • Alex Yuriev Says:

      If you have congestion inside your own network, you need to solve that problem well before addressing the others.

      The points of congestion are the interconnects between different networks, where as a network operator I can control only one side and am speaking BGP to my peer.

      • Luca Dionisi Says:

        I am talking about a mesh network where a protocol other than BGP is used to distribute paths, and where multipath routing is enabled all over the network, not just inside an autonomous system. So congestion can happen at any point.

        • gettys Says:

          Exactly.

          Even inside a much more static network, you have a mesh of connections, and given the variability of traffic (and transient failures), an ISP cannot always provision successfully in advance (nor is it generally economic to do so). While the goal of a well-run network may be to run uncongested as much as possible, without queue management you will have poor latency any time your path crosses a congested router, not just at the connections between autonomous systems.

          Also, Alex, please understand that any operating system more recent than Windows XP can quite easily saturate and congest links of > 1 Gbps (as they do TCP window scaling). Windows, however, by default shapes its traffic to remain below 100 Mbps (on a *single* TCP connection).

          We are moving into a world where congestion is “normal” for many operations at some point in the network path (primarily, but not necessarily, at the edge of the network). In fact, we’ve been living in this congested world from the beginning; it is just that with unbloated buffers, transport protocols could react as they were intended, and the latency disaster we are now experiencing was primarily confined to congested core routers failing to run AQM. Now we see this latency problem everywhere, including on the hosts (as they too have bloated buffers, which they did not a decade ago).

        • Alex Yuriev Says:

          No, congestion SHOULD NOT happen inside your own network unless you are experiencing a catastrophic failure of your equipment.

          This means that the congestion that you cannot resolve by adding capacity will only happen at interconnects – which is something we are seeing right now and have been seeing since the Internet got more than one distinct network. Of course that made no sense to the academics, so their solution was … Internet2, a network with no traffic and therefore no congestion.

          If you are not accepting this as a premise, you should stay away from network design.

          “Also, Alex, please understand that any operating system more recent than Windows XP can quite easily saturate and congest links of > 1Gbps (as they will do TCP window scaling). ”

          I do not need to “understand it” – some of my customers saturate 10 Gig ports daily while pushing traffic to 1 Gbit/sec or 100 Mbit/sec ports. Somehow none of them are having issues with other connections dropping.

          There should be no “congested core routers”. That is the solution to your problem. Even Cogent knows that. Yes. Cogent.

        • gettys Says:

          There shouldn’t be congested core routers, but there are (at least as I view it: multiple hops into a network run by a significant-size ISP). And congestion is certainly occurring at the interconnects between networks. And given RED’s problems, this is hardly surprising.

          This is at least partially the result of the reality of backhoes, trucks, and ships, and the different timescales on which these operate; and competence of the network operators is also an issue. Crystal balls are very hard to find, and accidents and acts of men and gods do happen.

          So “SHOULD NOT happen” I can agree with; but it “DOES happen” is the reality of the Internet.

  6. Allan Duncan Says:

    Sometimes one wanders into an area outside one’s expertise, and sees something that strikes a chord from a different song sheet.

    By chance I had a book on QoS I had been reading, and looked up RED to refresh its meaning, and noted that the reason given for RED was a phenomenon called “global synchronisation”, illustrated by a graph of Link Bandwidth Utilisation vs. Time with a nice scalloped shape to it. The reason given for this is cycles of congestion / backing off.
    What it looks like to me is a closed-loop control system with way too much gain at high frequencies – greater than unity gain at the frequency of the second pole. Or looked at another way, there is too much delay from the perturbation to the corrective action: Jim’s bufferbloat-derived latency. Remove those delays and the loop will stabilise, and there is no need for RED.
    To borrow another control-system term, RED is just adding Feed Forward to the mix.

    I wish you all success in reducing buffers. I tried Netalyzr on my ADSL1 (1500/256 kb/s) and got 3.0 seconds up, 3.6 down, which corresponds to 64 1500-byte packets up and 450 down, and boy, can my son clog it. Now if I can just persuade Netgear to tweak it. I note that the gigabit switch feeding it does tail drop, without the option to change that, so I think we have some educating to do.

    • gettys Says:

      Yes, the Random in “RED” is for the synchronization issue. Which, btw, has me bugged; bufferbloat has destroyed TCP congestion avoidance, and time-based phenomena are all too possible.

    • Ivan Barrera Says:

      Allan,

      Indeed, while RED works mainly on global synchronization, which may not have been an issue before (some claimed because of the heterogeneity of networks), it also tried to break the phase of the flows and minimize the tradeoff between throughput and latency. Now, with many servers deployed near homes for better customer experience, I’m afraid the flows are becoming a bit more homogeneous again.

      I’ve been working on this problem and trying to address several problems of RED, REM, and PI (which is the actual control-theory approach to the problem), among others. I found that the advantages of supporting AQM together with ECN (to mark packets instead of dropping them) are worthwhile, particularly now that routers have increased capacity for packet inspection (which means the processing capability for detecting and marking packets due to congestion is there).

      Moreover, one main issue is that packet losses in wireless networks occur not due to congestion but due to the medium. Congestion control algorithms, together with ECN, can help differentiate these two causes of packet loss and improve the performance of wireless networks as well.

      I’d be glad to discuss this with you, as my PhD thesis is actually an AQM algorithm to keep queue utilization low while maximizing throughput. One of my recent papers talks about addressing this issue with a couple of techniques, and I’m currently working on improving its performance to make it simpler and faster.

      Jim Gettys, it’s nice to see someone of your stature could bring this issue to light. I’d be glad to hear more from you on the subject.

  7. Rich Rostrom Says:

    As a veteran semi-nerd, I think I get about 3/4 of this discussion. The technical elements are beyond me (I recognize what is being discussed, but have no actual knowledge of how it works).

    Having established my mala fides, I do think I see one possibly hopeful point.

    The bufferbloat problem occurs in two classes of place: at ISP and backbone nodes, and in CPE routers, PCs, and handheld devices.

    If the ISP and backbone nodes are fixed to eliminate bufferbloat, then the problem will be “compartmentalized”. End users will suffer if their equipment is vulnerable, and if not, not.

    This will disaggregate the problem. Each end user’s problems can then be solved by updating his gear. That will also localize symptom and solution, which will create a powerful incentive for CPE makers to come out with updated gear, and advertise the update and the reasons for it, and for end users to buy it.

    Or am I far too sanguine?

  8. Andrew McMillan Says:

    Hi Jim,

    Van Jacobson actually gave that talk at LCA 2006, in Dunedin, where it justifiably made it into a best-of-conference set and deservedly received a standing ovation, rather than at LCA 2007 in Sydney.

    Cheers,
    Andrew McMillan.

    • gettys Says:

      You are exactly right, Andrew, both as to location and how good the talk was. As far as I’m concerned, it was the best talk I heard in the decade of the 2000s.

  9. Mohit P. Tahiliani Says:

    I have been working on RED-based AQMs and trying to address the parameter sensitivity of RED. Adaptive RED (ARED), designed by S. Floyd et al., requires only one parameter to be tuned by network admins, i.e. the target queuing delay (which I guess might not be so difficult to tune). I wonder why this variant of RED has not been considered for experimental deployment. What do you think about the effectiveness of ARED?

    • gettys Says:

      I’ve never worked on AQM. Asking me is of limited benefit…

      For ARED to be “interesting”, it needs to be implemented and tried out. I suggest you implement a Linux queue discipline. As the driver debloating work already underway for Linux (see the bloat-devel mailing list) shows, at some point over the next few months it will be time to move on to how to deal with routing queues. Simulation is one thing: trial by fire quite another (and I don’t know if there has even been recent simulation of ARED; note Sally Floyd’s reasonably famous statement to the effect that a simulator can be used to show anything you like).

      Note I don’t believe “one size fits all” is a likely outcome (pleasant as that might be if it turns out to be possible). We have a number of quite dramatically different environments where bloat is occurring, and it doesn’t follow in my mind that we’ll necessarily end up with a single solution. Beyond history having shown (and one of the authors of RED 93 having said) that classic RED 93 isn’t really adequate, I take no position.

      I do note that algorithms that primarily depend on output goodput, in my intuition, are most likely to succeed for wireless. The big challenge of wireless is the very large dynamic range and variability of goodput.

      Please do make it a contender! We’re in a world of hurt.

    • Dave Täht Says:

      I will gladly slam more AQMs into my tree if you have working – or even semi-working – code.

    • Ivan Barrera Says:

      Mohit,

      When I did research in AQM, I noticed there were many algorithms out there, mainly because anything you do will have an impact. The question is how big an impact. RED was conceived as an initial, almost empirical approach, where the piecewise-linear function had absolutely no theoretical support except that of being simple.

      Based on the idea that RED worked, many researchers started making modifications to improve its performance. You see, however, that something like Kelly/Gibbens VQ provides a first insight into marking instead of dropping, and it “works” with a simpler approach. Again, the issue is not whether it works, but how good it is and what computer processing it requires. RED is simple, but it has nonlinearities that are hard to account for. ARED is just a modification to actively adjust some of RED’s parameters.

      On the other hand, you see for example PI or REM, with a little more theoretical background, but making the algorithms a bit more complex. You see that linear solutions such as REM are more stable, yet a bit slow. In fact, if you carefully read REM and AVQ, you see both faced the same problems, and while REM added the backlog into its algorithm to account for instabilities, AVQ didn’t, which is why S-AVQ was later proposed.

      I initially worked on developing estimators of network parameters and modeling TCP. But I noticed that accurate modeling requires too many parameters, and estimators of such parameters are processor hungry.

      Thus, my focus turned to using nonlinear systems that would act faster under extreme changing conditions, but react smoothly when the change in conditions wasn’t significant, thereby avoiding the speed/stability trade-offs of linear systems.

      It certainly depends on what your goal is and what you are trading to achieve those goals. I broke the problem of AQM into three main parts: detection, mapping and marking. After RED proposed a Bernoulli-like approach for marking, everyone took marking for granted, and most people concentrated on the mapping scheme. For RED, detection is performed using the EWMA filter, mapping uses the piecewise-linear function, and marking uses the geometric/Bernoulli-like mechanism.

      Using statistical theory, I found that a non-standard likelihood detector could dramatically improve the detection mechanism, and a sigma-delta modulator could also improve the marking scheme in such a way that the mapping mechanism wasn’t required to be processing intensive.

      You can check a recently published journal paper, “Statistical approaches to congestion control in computer networks”, where I describe these blocks. The detector itself was proposed in another paper. But the main trade-off was processing power: the algorithm is still complex compared to the original RED.

      Note that Cisco proposed an alternative WRED based on base-2 values and coefficients to make it easier to calculate and faster on switches. If you think about it, with 48 1 Gbps ports, and assuming a worst-case scenario of 125 bytes per packet to make the math simple, this means roughly 48 million AQM operations per second. Processor-intensive AQMs translate those 48 Mops into heavy work for the switches/routers. And this is without taking into account using ECN, which requires modifying packets and recalculating checksums (depending on whether the ECN bits are in the TCP or IP header, of course).

      ARED doesn’t seem to provide a performance improvement worth implementing, given: 1. the impact on packet processing (and risk of bugs); 2. scalability problems for large cross-connects; 3. companies would rather sell expensive cards to telcos than provide them with improved software techniques.

      That’s just a brief explanation of my perspective, but of course I’m willing to elaborate a bit further on it.

  10. CeroWrt – Debloating « Les enfants terribles Says:

    [...] such as RED) including home routers: unfortunately, existing algorithms such as RED are unlikely to work correctly in today’s home network [...]

  11. TCP Sucks | Bram Cohen Says:

    [...] the intelligentsia have a plan, called RED to how to fix the internet, because uTP (using the LEDBAT congestion control [...]
