Jim Gettys' ramblings on random topics, and occasional rants.
Active Queue Management (AQM) FAQ
What is AQM? An Active Queue Management system is used to control the length of a queue so that it does not run full, adding its maximum (usually bloated) delay under load. Such management also enables TCP to do its job of sharing links properly, without which it cannot function as intended.
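To make the "maximum delay under load" concrete, here is a minimal sketch of the arithmetic: a full buffer adds the time it takes to drain at the link rate. The function name and the example figures (a 256 KiB buffer, a 1 Mbit/s uplink) are illustrative assumptions, not measurements from any particular device.

```python
def full_buffer_delay_ms(buffer_bytes: int, link_bits_per_sec: float) -> float:
    """Time needed to drain a completely full buffer at the link rate, in ms."""
    return buffer_bytes * 8 / link_bits_per_sec * 1000.0


# Hypothetical example: a 256 KiB buffer in front of a 1 Mbit/s uplink.
delay_ms = full_buffer_delay_ms(256 * 1024, 1_000_000)
print(f"a full buffer adds about {delay_ms:.0f} ms of delay")
```

With these assumed numbers the added delay is on the order of two seconds, which is the kind of latency an AQM is meant to prevent.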
Why are AQM algorithms essential? Buffers cannot be effective for their intended purpose of handling bursts of packets if they run full, and TCP cannot function correctly in the face of congestion unless it is signaled to slow down by packet drop or ECN in a timely fashion. The time TCP takes to respond when sharing a link grows quadratically with delay: 10 times the delay means it will respond 100 times more slowly to competing traffic, and “elephant” bulk data flows are truly elephantine.
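Taking the quadratic rule of thumb above at face value, the slowdown for a few delay ratios can be tabulated as follows. This is only a sketch of the stated relationship; the function name is hypothetical and no real TCP dynamics are modeled.

```python
def relative_response_slowdown(delay_ratio: float) -> float:
    """Slowdown in TCP's response to competing traffic, assuming it grows
    with the square of the delay (the rule of thumb stated in the text)."""
    return delay_ratio ** 2


for ratio in (1, 2, 10, 50):
    slowdown = relative_response_slowdown(ratio)
    print(f"{ratio:>3}x the delay -> responds {slowdown:g}x more slowly")
```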
What is the “correct” length of a queue? In the face of steady traffic, the correct length is nearly empty, just enough to keep utilization of a link at an “acceptable” level. More than that, and all you are adding is unneeded delay.
Are AQM’s by themselves sufficient to handle the current situation? No. While an AQM is necessary, it is not sufficient, given today’s web. Between browsers using many TCP connections simultaneously, “sharded” web sites, and hardware in data centers sending huge bursts of packets at line rate, a single class of service queue cannot provide good latency for real time (e.g. VOIP and teleconferencing) traffic. Other techniques such as “fair” queuing and QOS classification will also be necessary to provide really low latency services.
But I thought QOS classification can prevent these problems? No, at most it can determine who suffers, not prevent suffering. So you might protect your VOIP traffic, but all of your interactive and bulk TCP traffic might still suffer unless AQM manages TCP traffic. So classification is a useful adjunct, but not a solution to bufferbloat.
What about “Fair” queuing? “Fair” queuing can often be much more effective than standard QOS facilities in identifying transient flows that should be preferred to elephant bulk data transfers. For example, we are experimenting with Linux’s SFQ queuing discipline in CeroWrt to good effect. But “fair” queuing by itself cannot signal TCP properly for good TCP behavior; TCP’s servo system works best when delay is kept short. And fairness is in the eye of the beholder: SFQ won’t be the ultimate answer for fair queuing, as fairness among all flows will still allow BitTorrent an unregulated amount of bandwidth.
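The general idea behind this style of “fair” queuing can be sketched as: hash each flow into its own bucket, then serve the buckets round-robin so a short flow is not stuck behind a long backlog. The class below is a toy illustration under that assumption, not the Linux SFQ implementation; `ToyFairQueue`, its methods, and the integer flow identifiers are all hypothetical.

```python
from collections import deque


class ToyFairQueue:
    """Toy per-flow queuing: one FIFO per hash bucket, served round-robin."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [deque() for _ in range(num_buckets)]
        self.next_bucket = 0  # where the next round-robin scan starts

    def enqueue(self, flow_id: int, packet: str) -> None:
        # Flows that hash to the same bucket share a FIFO (a collision).
        self.buckets[hash(flow_id) % len(self.buckets)].append(packet)

    def dequeue(self):
        # Scan buckets starting after the one served last; skip empty ones.
        count = len(self.buckets)
        for offset in range(count):
            index = (self.next_bucket + offset) % count
            if self.buckets[index]:
                self.next_bucket = (index + 1) % count
                return self.buckets[index].popleft()
        return None  # every bucket is empty


queue = ToyFairQueue()
for i in range(5):
    queue.enqueue(0, f"bulk-{i}")   # flow 0: an "elephant" with a backlog
queue.enqueue(1, "voip-0")          # flow 1: a single latency-sensitive packet

order = []
while (packet := queue.dequeue()) is not None:
    order.append(packet)
print(order)
```

In this sketch the lone packet from flow 1 is sent second rather than last, even though it arrived behind five bulk packets. Note that nothing here drops or marks packets, which is why such a scheme alone does not signal TCP.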
What happens to algorithms such as “Ledbat” in the face of AQM’s? An effective AQM will keep delay low: so the very delay that enables Ledbat to “scavenge” bandwidth will go away. Ledbat itself will then compete on a roughly equal footing with TCP, and we’ll have to take other measures (e.g. diffserv marking) for an effective scavenging protocol to be able to stay out of the way of foreground traffic.
I thought AQM’s were only needed in Internet routers? No, they are needed anywhere a queue can grow, including in our home routers, laptops and smartphones, where AQM’s are not currently present. And since existing deployed algorithms are typically RED variants, they require tuning, and are often not configured and tuned even on routers where they would be of great benefit. RED may be the only tool you have available to you in an Internet router, until better dynamic AQM’s are available.
Do I need AQM algorithms everywhere?
To first order, yes. More precisely, you don’t need an AQM algorithm active on a buffer if it is impossible for a buffer to become a bottleneck under any circumstances, in which case excessive buffering just consumes power and may add cost, but will not add delay. The “RED Manifesto” had the principle right, even though RED itself is inadequate.
Are AQM’s available everywhere they are needed? Unfortunately, no: they are not available in current broadband gear, home routers, or in our operating systems, where they are needed, and even if they were available existing algorithms probably would not work adequately. Worse yet, our broadband gear typically provides a single, bloated queue, without any fair queuing or classification. This can be partially mitigated by careful knob twisting in home routers.
Does the buffer size matter if a dynamic AQM is effective? It can. When available bandwidth drops suddenly and dramatically, there comes a buffer size where it may be faster to have TCP recover via a full slow start than to wait for an adaptive AQM to signal TCP to adapt and for the buffers to drain to a small size. Whether this matters when fair queuing and QOS are also implemented awaits experimentation.
How do existing AQM algorithms fall short? They may require careful tuning to be effective, and do not handle the highly variable bandwidth case we now have due to wireless and features such as Comcast’s Powerboost commonly found in broadband. Any algorithm that requires human tuning for effectiveness is unlikely to be configured and enabled where needed.
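To illustrate why tuning is a burden, here is a simplified sketch of the drop decision in classic RED. It shows only the averaged queue length and the base drop probability, leaving out RED's spacing of drops by the count of packets since the last drop. The parameter values in the example are arbitrary assumptions; choosing them well for a given link is exactly the tuning problem.

```python
def ewma_queue_length(avg: float, sample: float, weight: float) -> float:
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg + weight * sample


def red_drop_probability(avg_queue: float, min_th: float,
                         max_th: float, max_p: float) -> float:
    """Base drop probability: 0 below min_th, rising linearly to max_p at
    max_th, and 1 (drop everything) at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Arbitrary example settings (in packets); all four knobs need tuning.
MIN_TH, MAX_TH, MAX_P, WEIGHT = 5.0, 15.0, 0.1, 0.002

avg = 0.0
for instantaneous in (20, 20, 20, 20):
    avg = ewma_queue_length(avg, instantaneous, WEIGHT)
print(f"average after 4 samples: {avg:.3f} packets, "
      f"drop probability: {red_drop_probability(avg, MIN_TH, MAX_TH, MAX_P)}")
```

With a small averaging weight the average lags far behind a queue that has suddenly grown, which is one way fixed settings can mismatch a link whose bandwidth varies.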
Ok, I now understand we need an adaptive AQM. Why is this hard?
The bandwidth/delay product variation a laptop or smartphone using WiFi or cellular wireless may face can easily top four orders of magnitude, and on wireless can change on the time scale of tens of milliseconds. This dynamic range and time variability make designing adaptive AQM’s suitable for those environments quite the challenge. Even existing adaptive algorithms such as Blue react slowly: they respond on a timescale that is slow relative to the size of the buffer, where the size of that buffer measured in time may itself be varying hugely. Previously known adaptive AQM algorithms do not adapt in a timely fashion.
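The dynamic range described above can be checked with a small worked example. The two operating points below (a poor cellular link with a short path, and a fast WiFi link with a long path) are hypothetical figures chosen for illustration, not measurements.

```python
import math


def bdp_bytes(bits_per_sec: float, rtt_sec: float) -> float:
    """Bandwidth/delay product: bytes in flight needed to fill the path."""
    return bits_per_sec * rtt_sec / 8


# Hypothetical extremes for one mobile device over the course of a day.
low = bdp_bytes(100_000, 0.020)        # 100 kbit/s cellular, 20 ms RTT
high = bdp_bytes(300_000_000, 0.200)   # 300 Mbit/s WiFi, 200 ms RTT

orders_of_magnitude = math.log10(high / low)
print(f"low BDP:  {low:,.0f} bytes")
print(f"high BDP: {high:,.0f} bytes")
print(f"range:    {orders_of_magnitude:.1f} orders of magnitude")
```

Under these assumptions the span comes out above four orders of magnitude, so a buffer or threshold sized for one extreme is badly wrong at the other.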
Is there hope for an adaptive AQM that may solve our current problems? Yes. See the article “Controlling Queue Delay” by Kathie Nichols and Van Jacobson in ACM Queue. Additionally, there is another paper making its way through the academic publishing process.