The Blind Men and the Elephant

February 11, 2018


Bufferbloat is responsible for much of the poor performance seen in the Internet today and causes latency (called “lag” by gamers), triggered even by your own routine web browsing and video playing.

But bufferbloat’s causes and solutions remind me of the old parable of the Blind Men and the Elephant, updated for the modern Internet:

It was six men of Interstan,
 To learning much inclined,
Who sought to fix the Internet,
 (With market share in mind),
That evil of congestion,
 Might soon be left behind.

Definitely, the pipe-men say,
A wider link we laud,
For congestion occurs only
When bandwidth exceeds baud.
If only the reverse were true,
We would all live like lords.

But count the cost, if bandwidth had
No end, money-men say,
But count the cost, if bandwidth had
No end, money-men say,
Perpetual upgrade cycle;
True madness lies that way.

From DiffServ-land, their tried and true
Solution now departs,
Some packets are more equal than
Their lesser counterparts.
But choosing which is which presents
New problems in all arts.

“Every packet is sacred!” cries
The data-centre clan,
Demanding bigger buffers for
Their ever-faster LAN.
To them, a millisecond is
Eternity to plan.

The end-to-end principle guides
A certain band of men,
Detecting when congestion strikes;
Inferring bandwidth then.
Alas, old methods outcompete their
Algorithmic maven.

The Isolationists prefer
Explicit signalling,
Ensuring each and every flow
Is equally a king.
If only all the other men
Would listen to them sing!

And so these men of industry,
Disputed loud and long,
Each in his own opinion,
Exceeding stiff and strong,
Though each was partly in the right,
Yet mostly in the wrong!

So, oft in technologic wars,
The disputants, I ween,
Rail on in utter ignorance,
Of what each other mean,
And prate about an Internet,
Not one of them has seen!

Jonathan Morton, with apologies to John Godfrey Saxe

Most technologists are not truly wise: we are usually like the blind men of Interstan. The TCP experts, network operators, telecom operators, router makers, Internet service operators, router vendors and users have all had a grip *only* on their piece of the elephant.

The TCP experts look at TCP and think “if only TCP were changed” in their favorite way, all latency problems would be solved, forgetting that there are many other causes of saturating an Internet link, and that changing every TCP implementation on the planet will take time measured in decades. With the speed of today’s processors, almost everything can potentially transmit at gigabits per second. Saturated (“congested”) links are *normal* operation in the Internet, not abnormal. And indeed, improved congestion avoidance algorithms such as BBR and mark/drop algorithms such as CoDel and PIE are part of the solution to bufferbloat.

And since TCP is innately “unfair” to flows with different RTT’s, and we have both nearby (~10ms) or (inter)continental distant (~75-200ms) services, no TCP only solution can possibly provide good service. This is a particularly acute problem for people living in remote locations in the world who have great inherent latency differences between local and non-local services. But TCP only solutions cannot solve other problems causing unnecessary latency, and can never achieve really good latency as the queue they build depend on the round trip time (RTT) of the flows.

Network operators often think: “if only I can increase the network bandwidth,” they can banish bufferbloat: but at best, they can only move the bottleneck’s location and that bottleneck’s buffering may even be worse! When my ISP increased the bandwidth to my house several years ago (at no cost, without warning me), they made my service much worse, as the usual bottleneck moved from the broadband link (where I had bufferbloat controlled using SQM) to WiFi in my home router. Suddenly, my typical lag became worse by more than a factor of ten, without having touched anything I owned and having double the bandwidth!

The logical conclusion of solving bufferbloat this way would be to build an ultimately uneconomic and impossible-to-build Internet where each hop is at least as fast as the previous link under all circumstances, and which ultimately collides with the limits of wireless bandwidth available at the edge of the Internet. Here lies madness. Today, we often pay for much more bandwidth than needed for our broadband connections just to reduce bufferbloat’s effects; most applications are more sensitive to latency than bandwidth, and we often see serious bufferbloat issues at peering points as well in the last mile, at home and using cellular systems. Unnecessary latency just hurts everyone.

Internet service operators optimize the experience of their applications to users, but seldom stop to see if the their service damages that of other Internet services and applications. Having a great TV experience, at the cost of good phone or video conversations with others is not a good trade-off.

Telecom operators, have often tried to limit bufferbloat damage by hacking the congestion window of TCP, which does not provides low latency, nor does it prevents severe bufferbloat in their systems when they are loaded or the station remote.

Some packets are much more important to deliver quickly, so as to enable timely behavior of applications. These include ACKS, TCP opens, TLS handshakes, and many other packet types such as VOIP, DHCP and DNS lookups. Applications cannot make progress until those responses return. Web browsing and other interactive applications suffer greatly if you ignore this reality. Some network operation experts have developed complex packet classification algorithms to try to send these packets on their way quickly.

Router manufacturers often have extremely complex packet classification rules and user interfaces, that no sane person can possibly understand. How many of you have successfully configured your home router QOS page, without losing hair from your head?

Lastly, packet networks are inherently bursty, and these packet bursts cause “jitter.” With only First In First Out (FIFO) queuing, bursts of tens or hundreds of packets happen frequently. You don’t want your VOIP packet or other time sensitive to be arbitrarily delayed by “head of line” blocking of bursts of packets in other flows. Attacking the right problem is essential.

We must look to solve all of these problems at once, not just individual aspects of the bufferbloat elephant. Flow Queuing CoDel (FQ_CoDel) is the first, but will not be the last such algorithm. And we must attack bufferbloat however it appears.

Flow Queuing Codel Algorithm: FQ_CoDel

Kathleen Nichols and Van Jacobson invented the CoDel Algorithm (described now formally in RFC 8289), which represents a fundamental advance in Active Queue Management that preceded it, which required careful tuning and could hurt you if not tuned properly.  Their recent blog entry helps explain its importance further. CoDel is based on the notion of “sojourn time”, the time that a packet is in the queue, and drops packets at the head of the queue (rather than tail or random drop). Since the CoDel algorithm is independent of queue length, is self tuning, and is solely dependent on sojourn time, additional combined algorithms become possible not possible with predecessors such as RED.

Dave Täht had been experimenting with Paul McKenney’s Stochastic Fair Queuing algorithm (SFQ) as part of his work on bufferbloat, confirming that many of the issues are caused by head of line blocking (and unfairness of differing RTT’s of competing flows) were well handled by that algorithm.

CoDel and Dave’s and Paul’s work inspired Eric Dumazet to invent the FQ_Codel algorithm almost immediately after the CoDel algorithm became available. Note we refer to it as “Flow Queuing” rather than “Fair Queuing” as it is definitely “unfair” in certain respects rather than a true “Fair Queuing” algorithm.

Synopsis of the FQ_Codel Algorithm Itself

FQ_CoDel, documented in RFC 8290, uses a hash (typically but not limited of the usual 5-tuple identifying a flow), and establishes a queue for each, as does the Stochastic Fair Queuing (SFQ) algorithm. The hash bounds memory usage of the algorithm enabling use on even small embedded router or in hardware. The number of hash buckets (flows) and what constitutes a flow can be adjusted as needed; in practice the flows are independent TCP flows.

There are two sets of flow queues: those flows which have built a queue, and queues for new flows.

The packet scheduler preferentially schedules packets from flows that have not built a queue from those which have built a queue (with provisions to ensure that flows that have built queues cannot be starved, and continue to make progress).

If a flow’s queue empties, and packets again appear later, that flow is considered a new flow again.

Since CoDel only depends on the time in queue, it is easily applied in FQ_CoDel to ensure that TCP flows are properly policed to keep their behavior from building large queues.

CoDel does not require tuning and works over a very wide range of bandwidth and traffic. Combining flow queuing with a older mark/drop algorithm is impossible with older AQM algorithms such as Random Early Detection (RED) which depend on queue length rather than time in queue, and require tuning. Without some mark/drop algorithm such as CoDel, individual flows might fill their FQ_CoDel queues, and while they would not affect other flows, those flows would still suffer bufferbloat.

What Effects Does the FQ_CoDel Algorithm Have?

FQ_Codel has a number of wonderful properties for such a simple algorithm:

  • Simple “request/response” or time based protocols are preferentially scheduled relative to bulk data transport. This means that your VOIP packets, your TCP handshakes, cryptographic associations, your button press in your game, your DHCP or other basic network protocols all get preferential service without the complexity of extensive packet classification, even under very heavy load of other ongoing flows. Your phone call can work well despite large downloads or video use.
  • Dramatically improved service for time sensitive protocols, since ACKS, TCP opens, security associations, DNS lookups, etc., fly through the bottleneck link.
  • Idle applications get immediate service as they return from their idle to active state. This is often due to interactive user’s actions and enabling rapid restart of such flows is preferable to ongoing, time insensitive bulk data transfer.
  • Many/most applications put their most essential information in the first data packets after a connection opens: FQ_CoDel sees that these first packets are quickly forwarded. For example, web browsers need the size information included in an image before being able to complete page layout; this information is typically contained in the first packet of the image. Most later packets in a large flow have less critical time requirements than the first packet(s).
  • Your web browsing is significantly faster, particularly when you share your home connection with others, since head of line blocking is no longer such an issue.
  • Flows that are “well behaved” and do not use more than their share of the available bandwidth at the bottleneck are rewarded for their good citizenship. This provides a positive incentive for good behavior (which the packet pacing work at Google has been exploiting with BBR, for example, which depends on pacing packets). Unpaced packet bursts are poison to “jitter”, that is the variance of latency, which sets how close to real time other algorithms (such as VOIP) need to operate.
  • If there are bulk data flows of differing RTT’s, the algorithm ensures (in the limit) fairness between the flows, solving an issue that cannot itself be solved by TCP.
  • User simplicity: without complex packet classification schemes, FQ_CoDel schedules packets far better than previous rigid classification systems.
  • FQ_CoDel is extremely small, simple and fast and scalable both up and down in bandwidth and # of flows, and down onto very constrained systems. It is safe for FQ_CoDel to be “always on”.

Having an algorithm that “just works” for 99% of cases and attacks the real problem is a huge advantage and simplification. Packet classification need only be used only to provide guarantees when essential, rather than be used to work around bufferbloat. No (sane) human can possibly successfully configure the QOS page found on today’s home router.  Once bufferbloat is removed and flow queuing present, any classification rule set can be tremendously simplified and understood.

Limitations and Warnings

If flow queuing AQM is not present at your bottleneck in your path, nothing you do can save you under load. You can at most move the bottleneck by inserting an artificial bottleneck.

The FQ_CoDel algorithm [RFC 8290], particularly as a queue discipline, is not a panacea for bufferbloat as buffers hide all over today’s systems. There has been about a dozen significant improvements to reduce bufferbloat just in Linux starting with Tom Herbert’s Linux BQL, Eric Dumazet’s TCP Small queues, TSO Sizing and FQ scheduler, just naming a few. Our thanks to everyone in the Linux community for their efforts. FQ_Codel, implemented in its generic form as the Linux queue discipline fq_codel, only solves part of the bufferbloat problem, that found in the qdisc layer. The FQ_CoDel algorithm itself is applicable to a wide variety of circumstances, and not just in networking gear, but even sometimes in applications.

WiFi and Wireless

Good examples of the buffering problems occurs in wireless and other technologies such as DOCSIS and DSL. WiFi is particularly hard: rapid bandwidth variation and packet aggregation means that WiFi drivers must have significant internal buffering. Cellular systems have similar (and different) challenges.

Toke Høiland-Jørgensen, Michał Kazior, Dave Täht, Per Hertig and Anna Brunstrom reported amazing improvements for WiFi at the Usenix Technical Conference. Here, FQ_CoDel algorithm is but part of a more complex algorithm that handles aggregation and transmission air-time fairness (which Linux has heretofore lacked). The results are stunning, and we hope to see similar improvements in other technologies by applying FQ_Codel.

WiFiThis chart shows the latency under load in milliseconds of the new WiFi algorithm, beginning adoption in Linux. The log graph is used to enable plotting results on the same graph. The green curve is for the new driver latency under load (cumulative probability of the latency observed by packets) while the orange graph is for the previous Linux driver implementation. You seldom see 1-2 orders of magnitude improvements in anything. Note that latency can exceed one second under load with existing drivers. And implementation of air-time fairness in this algorithm also significantly increases the total bandwidth available in a busy WiFi network, by using the spectrum more efficiently, while delivering this result! See the above paper for more details and data.

This new driver structure and algorithm is in the process of adoption in Linux, though most drivers do not yet take advantage of what this offers. You should demand your device vendors using Linux update their drivers to the new framework.

While the details may vary for different technologies, we hope the WiFi work will help illuminate techniques appropriate for applying FQ_CoDel (or other flow queuing algorithms, as appropriate) to other technologies such as cellular networks.

Performance and Comparison to Other Work

In the implementations available in Linux, FQ_CoDel has seriously outperformed all comers, as shown in: “The Good, the Bad and the WiFi: Modern AQMs in a residential setting,” by Toke Høiland-Jørgensen, Per Hurtig and Anna Brunstrom. This paper is unique(?) in that it compares the algorithms consistently using the identical tests on all algorithms on the same up to date test hardware and software version, and is an apple-to-apple comparison of running code.

Our favorite benchmark for “latency under load” is the “rrul” test from Toke’s flent test tool. It demonstrates when flow(s) may be damaging the latency of other flow(s) as well as goodput and latency.  Low bandwidth performance is more difficult than high bandwidth. At 1Mbps, a single 1500 byte packet represents 12 milliseconds! This shows fq_codel outperforming all comers in latency, while preserving very high goodput for the available bandwidth. The combination of an effective self adjusting mark/drop algorithm with flow queuing is impossible to beat. The following graph measures goodput versus latency for a number of queue disciplines, at two different bandwidths, of 1 and 10Mbps (not atypical of the low end of WiFi or DSL performance). See the paper for more details and results.


FQ_CoDel radically outperforms CoDel, while solving problems that no conventional TCP mark/drop algorithm can do by itself. The combination of flow queuing with CoDel is a killer improvement, and improves the behavior of CoDel (or PIE), since ACKS are not delayed.

Pfifo_fast previously has been the default queue discipline in Linux (and similar queuing is used on other operating systems): it is for the purposes of this test, a simple FIFO queue. Note that the default length of the FIFO queue is usually ten times longer than that displayed here in most of today’s operating systems. The FIFO queue length here was chosen so that the differences between algorithms would remain easily visible on the same linear plot!

Other self tuning TCP congestion avoidance mark/drop algorithms such as PIE [RFC 8033] are becoming available, and some of those are beginning to be combined with flow queuing to good effect. As you see above, PIE without flow queuing is not competitive (nor is CoDel). An FQ-PIE algorithm, recently implemented for FreeBSD by Rasool Al-Saadi and Grenville Armitage, is more interesting and promising. Again, flow queuing combined with an auto-adjusting mark/drop algorithm is the key to FQ-PIE’s behavior.

Availability of Modern AQM Algorithms

Implementations of both CoDel and FQ_Codel were released in Linux 3.5 in July 2012 and in FreeBSD 11, in October, 2016 (and backported to FreeBSD 10.3); FQ-PIE is only available in FreeBSD (though preliminary patches for Linux exist).

FQ_CoDel has seen extensive testing, deployment and use in Linux since its initial release. There is never a reason to use CoDel, and is present in Linux solely to enable convenient performance comparisons, as in the above results. FQ_CoDel first saw wide use and deployment as part of OpenWrt when it became the default queue discipline there, and used heavily as part of the SQM (Smart Queue Management) system to mitigate “last mile” bufferbloat. FQ_CoDel is now the default queue discipline on most Linux distributions in many circumstances.  FQ_CoDel is being used in an increasing number of commercial products.

The FQ_CoDel based WiFi improvements shown here are available in the latest LEDE/OpenWrt release for a few chips. WiFi chip vendors and router vendors take note: your competitors may be about to pummel you, as this radical improvement is now becoming available. The honor of the commercial product to incorporate this new WiFi code is the Evenroute IQrouter.

PIE, but not FQ-PIE was mandated in DOCSIS 3.1, and is much less effective than FQ_CoDel (or FQ-PIE).

Continuing Work: CAKE

Research in other AQM’s continue apace. FQ_CoDel and FQ-PIE are far from the last words on the topic.

Rate limiting is necessary to mitigate bufferbloat in equipment that has no effective queue management (such as most existing DSL or most Cable modems). While we have used fq_codel in concert with Hierarchical Token Bucket (HTB) rate limiting for years as part of the Smart Queue Management (SQM) scripts (in OpenWrt and elsewhere), one can go further in an integrated queue discipline to address problems difficult to solve piecemeal, and make configuration radically simpler. The CAKE queue discipline, available today in OpenWrt, additionally adds integrated framing compensation, diffserv support and CoDel improvements. We hope to see CAKE upstream in Linux soon. Please come help in its testing and development.

As you see above, the Make WiFi Fast project is making headway and makes WiFi much to be preferred to cellular service when available, much, much more is left to be done. Investigating fixing bufferbloat in WiFi made us aware of just how much additional improvement in WiFi was also possible. Please come help!


Ye men of Interstan, to learning much inclined, see the elephant, by opening your eyes: 

ISP’s, insist your equipment manufacturers support modern flow queuing automatic queue management. Chip manufacturers: ensure your chips implement best practices in the area, or see your competitors win business.

Gamers, stock traders, musicians, and anyone wanting decent telephone or video calls take note. Insist your ISP fix bufferbloat in its network. Linux and RFC 8290 prove you can have your cake and eat it too. Stop suffering.

Insist on up to date flow queuing queue management in devices you buy and use, and insist that bufferbloat be fixed everywhere.

Home products that fix/mitigate bufferbloat…

February 2, 2017

jigsawfish2Updated February 12, 2018 to incorporate the Evenroute WiFi support.

Bufferbloat is the most common underlying cause of most variable bad performance due to latency on the Internet;  latency is called “lag” by gamers.

Trying to steer anything the size of the Internet into a better direction is very slow and difficult at best. From the time changes in the upstream operating systems are complete to when consumers can buy new product is typically four years caused by the broken and insecure ecosystem in the embedded device market. Chip vendors, box vendors, I’m looking at you… So much of what is now finally appearing in the market is based on work that is often four years old. Market pull may do what push has not.

See What to do About Bufferbloat for general information. And the DSLReports Speedtest makes it easy to test for bufferbloat. But new commercial products are becoming increasingly available.  Here’s some of them.


The fq_codel & cake work going on in the bufferbloat project is called SQM – “smart queue management.” This SQM work is specifically targeted at mitigating the bufferbloat in the “last mile,” your cable/DSL/fiber connection, by careful queue management and an artificial bandwidth bottleneck added in your home router (since most modems do no perform flow control to the home router, unfortunately).

Modems require built in AQM algorithms, such as those just beginning to reach the market in DOCSIS 3.1. I just ordered one of these for my house to see if it functions better than the SQM mitigation (almost certainly not), but at least these should not require the manual tuning that SQM does.

To fix bufferbloat in WiFi requires serious changes in the WiFi driver in your home router (which typically runs Linux), and in your device (laptop/phone/tablet).  The device driver work was first released as part of the LEDE project, in January 2017 for initially just a couple of WiFi chip types.

Evenroute IQrouter

First up, I’d like call out the Evenroute IQrouter, which has a variant of SQM that deals with “sag”.

DSL users have often suffered more than other broadband users, due to bad bloat in the modems compounded by minimal bandwidth, so the DSL version of the IQrouter is particularly welcome.   ISP’s (particularly DSL ISP’s) often under provision their back haul, causing “sag” at different times of day/week.  This makes the static configuration techniques we’ve used in LEDE/OpenWrt SQM ineffective, as you have to give away too much bandwidth if a fixed bandwidth is used.  I love the weasel words “up to” some speed used by many ISPs. It is one thing for your service to degrade for a short period of days or weeks while an ISP takes action to provision more bandwidth to an area; it is another for your bandwidth to routinely vary by large amounts for weeks/months and years.

I sent a DSL Evenroute IQrouter to my brother in Pennsylvania and arranged for one for a co-worker, and they are working well, and Rich Brown has had similarly good experiences. Evenroute has been working hard to make the installation experience easy. Best yet, is that the IQrouter is autoconfiguring and figures out for you what to do in the face of “sag” in your Internet service, something that may be a “killer feature” if you suffer lots of “sag” from your ISP. The IQrouter is therefore the first “out of the box” device I can recommend to almost anyone, rather than just my geek friends.

The IQRouter is the first commercial product the very recent wonderful WiFi results of Toke and Dave (more about coming this in a separate post), and has the capability for over the air updates. The new WiFi stack is going upstream into Linux as I write this post. DSL users seldom have enough bandwidth for the WiFi hop to be the bottleneck; so the WiFi work is much more important for Cable and fiber users at higher bandwidth than for DSL users stuck at low bandwidth.

The Evenroute is therefore effective on all technologies, not just DSL. It is particularly important for DSL users, which suffer from sag more often than most, and the IQRouter is getting regular security updates and enhancements, which most home routers lack.

Ubiquiti Edgerouter

I’ve bought an Ubiquiti Edgerouter X on recommendation of Dave Taht but not yet put it into service. Router performance can be an issue on high end cable or fiber service. It is strictly an Ethernet router, lacking WiFi interfaces; but in my house, where the wiring is down in the basement, that’s what I need.  The Edgerouter starts at around $50; the POE version I bought around $75.

The Edgerouter story is pretty neat – Dave Taht did the backport 2? years back. Ubiquti’s user community jumped all over it and polished it up, adding support to their conf tools and GUI, and Ubiquiti recognized what they had and shipped it as part of their next release.

SQM is available in recent releases of Ubituiti’s Edgerouter firmware.  SQM itself is easy to configure. But the Edgerouter overall requires considerable configuration before it is useful in the home environment, however, and its firmware web interface is aimed at IT people rather than most home users. I intend this to replace my primary router TP-Link Archer C7v2 someday soon, as it is faster than the TP-Link since Comcast keeps increasing my bandwidth without asking me.  I wish the Ubiquiti had a “make me into a home router” wizard that would make it immediately usable for most people, as its price is low enough for some home users to be interested in it.   I believe one can install LEDE/OpenWrt on the Edgerouter, which I may do if I find its IT staff oriented web interface too unusable.

LEDE/OpenWrt and BSD for the Geeks

If you are adventurous enough to reflash firmware, anything runnable on OpenWrt/LEDE of the last few years has SQM available. You take the new LEDE release for a spin. If your router has an Ath9k WiFi chip (or a later version of the Ath10k WiFi chip), or you buy a new router with the right chips in them, you can play with the new WiFi goodness now in LEDE (noted above). There is a very wide variety of home routers that can benefit from reflashing. Its web UI is tolerably decent, better than many commercial vendors I have seen.

WiFi chip vendors should take careful note of the stupendous improvements available in the Linux mac802.11 framework for bufferbloat elimination and air time fairness. If you don’t update to the new interfaces and get your code into LEDE, you’re going to be at a great disadvantage to Atheros in the market.

dd-wrt, asuswrt, ipfire, all long ago added support for SQM. It will be interesting to see how long it takes them to pick up the stunning WiFi work.

The pcengines APU2 is a good “DIY” router for higher speeds. Dave has not yet tried LEDE on it yet, but will. He uses it presently on Ubuntu….

BSD users recently got fq_codel in opnsense, so the BSD crowd are making progress.

Other Out of the Box Devices

The Turris Omnia is particularly interesting for very fast broadband service and can run LEDE as well; but unfortunately,  it seems only available in Europe at this time.  We think the Netduma router has SQM support, though it is not entirely clear what they’ve done; it is a bit pricey for my taste, and I don’t happen to know anyone who has one.

Cable Modems

Cable users may find that upgrading to a new DOCSIS 3.1 modem is helpful (though that does not solve WiFi bufferbloat).  The new DOCSIS 3.1 standard requires AQM.  While I don’t believe PIE anywhere as good as fq_codel (lacking flow queuing), the DOCSIS 3.1 standard at least requires an AQM, and PIE should help and does not require manual upstream bandwidth tuning.  Maybe someday we’ll find some fq_codel (or fq_pie) based cable modems.  Here’s hoping…

Under the Covers, Hidden

Many home routers vendors make bold claims they have proprietary cool features, but these are usually smoke and mirrors. Wireless mesh devices without bufferbloat reduction are particularly suspect and most likely to require manual RF engineering beyond most users. They require very high signal strength and transfer rates to avoid the worst of bufferbloat. Adding lots more routers without debloating and not simultaneously attacking transmit power control is a route to WiFi hell for everyone. The LEDE release is the first to have the new WiFi bits needed to make wireless mesh more practical. No one we know of has been working on minimizing transmit power to reduce interference between mesh nodes. So we are very skeptical of these products.

There are now a rapidly increasing number of products out there with SQM goodness under the covers, sometimes implemented well, and sometimes not so well, and more as the months go by.

One major vendor put support for fq_codel/SQM under the covers of one product using a tradename, promptly won an award, but then started using that tradename on inferior products in their product line that did not have real queue management. I can’t therefore vouch for any product line tradename that does not acknowledge publicly how it works and that the tradename means that it really has SQM under the covers. Once burned, three times shy. That product therefore does not deserve a mention due to the behavior of the vendor. “Bait and switch” is not what anyone needs.

Coming Soon…

We have wind of a number of vendors’ plans who have not quite reached the market, but it is up to them to announce their products.

If you find new products or ISP’s that do really well, let us know, particularly if they actually say what they are doing. We need to start some web pages to keep track of commercial products.

Does the Internet need “Governance”?

November 12, 2014

Dave Reed just published a vital piece concerning network neutrality. Everyone interested in the topic should  carefully read and understand  Does the Internet need “Governance”?

One additional example of “light touch” help for the Internet where government may play a role is transparency: the recent MLAB’s report and the fact that Cogent’s actions caused retail ISP’s to look very badly is a case in point. You can follow up on that topic on the MLabs’s mailing list, if you are so inclined. If a carrier can arbitrarily delay/deprioritize traffic in secret, then the market (as there are usually alternatives in transit providers) cannot function well. And if that provider is an effective monopoly for many paths, that becomes a huge problem.

Bufferbloat and Other Challenges

October 6, 2014

Vint Cerf wrote a wonderful piece on the problems I’ve been wrestling with the last number of years, called “Bufferbloat and Other Internet Challenges“. It is funny how one thing leads to another; I started just wanting my home network to work as I knew it should, and started turning over rocks. The swamp we’re in is very deep and dangerous, the security problem the worst of all (and given how widespread bufferbloat is, that’s saying something). The “Other Challenges” dwarf bufferbloat, as large a problem as it is.

I gave a lunch talk at the Berkman Center at Harvard in June on the situation and recommend people read the articles by Bruce Schneier and Dan Geer you will find linked there, which is their takes on the situation I laid out to them (both articles were triggered by the information in that talk).

Dan Geer’s piece is particularly important from a policy perspective.

I also recommend reading “Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities“, by Clark, Fry, Blaze and Smith, which makes clear to me that our engineering processes need fundamental reform in the face of very long lived devices. Vulnerability discovery looks very different than normal bug discovery; good examples include heartbleed and shellshock (which thankfully does not affect most such embedded devices, since the ash shell is used in busybox).

In my analysis of the ecosystem, it’s clear that binary blobs are a real long term hazard, and do even short term damage by freezing the ecosystem for devices on old, obsolete software, magnifying the scale of vulnerabilities even on new equipment. But in the long term maintenance and security of devices (examples include your modems and home routers) is nigh impossible. And all devices need ongoing software updates for the life of the devices; the routing devices most of all (since if the network ceases to work, updates become impossible).

“Friends don’t let friends run factory firmware”.

Be safe.

Traditional AQM is not enough!

July 10, 2013

Note: Updated October 24, 2013, to fix some editorial nits, and to clarify the intended point that it is the combination of a working mark/drop algorithm with flow scheduling that is the “killer” innovation, rather than the specifics of today’s fq_codel algorithm.

Latency (called “lag” by gamers), once incurred, cannot be undone, as best first explained by Stuart Cheshire in his rant: “It’s the latency, Stupid.” and more formally in “Latency and the Quest for Interactivity,” and noted recently by Stuart’s 12 year old daughter, who sent Stuart a link to one of the myriad “Lag Kills” tee shirts, coffee bugs, and other items popular among gamers.lag_kills_skeleton_dark_tshirt

Out of the mouth of babes…

Any unnecessary latency is too much latency.

Many networking engineers and researchers express the opinion that 100 milliseconds latency is “good enough”. If the Internet’s worst latency (under load) was 100ms, indeed, we’d be much better off than we are today (and would have space warp technology as well!). But the speed of light and human factors research easily demonstrate this opinion is badly flawed.

Many have understood bufferbloat to be a problem that primarily occurs when a saturating “elephant flowis present on a link; testing for bufferbloat using elephants is very easy, and even a single elephant TCP flow from any modern operating system may fill any size uncontrolled buffer given time, but this is not the only problem we face. The dominant application, the World Wide Web, is anti-social to any other application on the Internet, and its collateral damage is severe.

Solving the latency problem requires a two prong attack.

Read the rest of this entry »

Best Practices for Benchmarking CoDel and FQ CoDel (and almost anything else!)

July 2, 2013

The bufferbloat project has had trouble getting consistent repeatable results from Some puzzle pieces of a picture puzzle.other experimenters, due to a variety of factors. This Wiki page at attempts to identify the most common omissions and mistakes. There be land mines here. Your data will be garbage if you don’t avoid them!

Note that most of these are traps for people doing network research in general, not just bufferbloat research.

Bufferbloat in switches/bridges

June 20, 2013

I received the following question today from Ralph Droms.  I include an edited version of my response to Ralph.

On Thu, Jun 20, 2013 at 9:45 AM, Ralph Droms (rdroms) <rdroms@yyy.zzz> wrote:
Someone suggested to me that bufferbloat might even be worse 
in switches/bridges than in routers.  True fact?  If so, can 
you point me at any published supporting data?

It is hard to quantify as to whether switches or routers are “worse”, and I’ve never tried, nor seen any published systematic data.  I
Some puzzle pieces of a picture puzzle.
wouldn’t believe such data if I saw it, anyway. What matters is whether you have unmanaged buffers before a bottleneck link.

I don’t have first hand information (to just point you at particular product specs; I tend not to try to find out whom is particularly guilty as it can only get me in hot water if I compare particular vendors). I’ve generally dug into the technology to understand how/why buffering is present to understand what I’ve seen.

You can go look at specs of switches yourself and figure out switches have problems from first principles.

Feel free to write a paper!

Here’s what I do know.

Ethernet Switches:

  • The simplest switch case is where you have a 10G or 1G switch being operated at 1G or 100M; you end up 10x or 100x over buffered. I’ve never seen a switch that cuts its internal buffering depending on line rate.   God forbid you happen to have 10Mbit gear still in that network, and Ethernet flow control can cause cascades between switches to to reduce you to the lowest bandwidth….
  • Thankfully, enterprise switch gear does not emit Ethernet pause frames (though honors them if received): but all the commodity switch chips used in cheap unmanaged consumer switches does generate pause frames, that I looked at.  Sigh…
  • As I remember, when I described this kind of buffering problem to a high end router expert at Prague, he started muttering “line cards” at me; it wouldn’t surprise me if the same situation isn’t present in big routers supporting different line rate outputs.  But I’ve not dug into them.
  • We even got caught by this in CeroWrt, where the ethernet bridge chip was misconfigured, and due to jumbo-grams, was initially accidentally 8x overbuffered (resulting in 80-100ms of latency through the local switch in a cheap router, IIRC; Dave Taht will remember the exact details.)
  • I then went and looked at the data sheets of a bunch of integrated cheap switch chips (around 10 of them, as I remember): while some (maybe half) were “correctly” buffered (not that I regard any static configuration as correct!), some had 2-4x more sram in the switch chips than were required for their bandwidth.  So even without the bandwidth switching trap, sometimes the commodity switch chips have too much buffering.  Without statistics of what chips are used in what products, it’s impossible to know how much equipment is affected (though all switches *should* run fq_codel or equivalent, IMHO, knowing what I know now)….
  • I hadn’t even thought about how VLAN’s interacted with buffering until recently. Think about VLAN’s (particularly in combination with Ethernet flow control), and get a further Excedrin headache…About 6 months ago I talked to an engineer who had had terrible problems getting decent, reliable, latency in a customer’s VOIP system. He tracked it down (miraculously) to the fact that the small business (less than 50 employees) was sharing an enterprise switch using VLAN’s for isolation from other tenants in a building.  The other tenants in the building sometimes saturated the switch, and the customer’s VLAN performance for their VOIP TRAFFIC would go to hell in a handbasket (see above about naive sysops not configuring different classes of service correctly).  As the customer was a call center, you can imagine, they were upset.

Ethernet is actually very highly variable bandwidth: we can’t safely treat it as fixed bandwidth! Yet switch designers make this completely unwarranted presumption routinely.

This is part of why I see conventional QOS as a dead-end; most of the need for classic QOS goes away if we properly manage buffers in the first place. Our job as Internet engineers is to build systems that “just work” that system operators can’t mis-configure, or even worse, come from the factory mis-configured to fail under load (which is never properly tested in most customer’ sites).

Enterprise Ethernet Switches

Some enterprise switches sell additional buffer memory as a “feature”! And some of those switches require configuration of their buffer memory across various QOS classes; if you foolishly do nothing, some of them leave all memory configured to a single class and disaster ensues.

What do you think a naive sysop does????  Particularly one who listens to the salesman or literature of the switch vendor about the “feature” of more buffering to avoid dropping packets, and buy such additional RAM?

So the big disasters I’ve heard of are those switches, where deluded naive people have bought yet more buffer memory, and particularly if they fail to configure the switches for QOS classes. That report came off the NANOG list, as I remember, but it was a couple of years ago and I didn’t save the message.

After reading that report I looked at the specs for two or three such enterprise switches and confirmed that this scenario was real, resulting in potentially *very* large buffering (multiple hundreds of milliseconds reaching even to seconds).  IIRC, one switch had decent defaults, but another defaulted to insane behavior.

So the NANOG report of such problems was not only plausible, but certain to happen, and I stopped digging further.  Case closed. But I don’t know how common it is, nor if it is more common than associated routers in the network.

Router Bufferbloat problems

I *think* the worst router problems are in home routers, where we have uncontrolled buffering (often 1280 packets worth) and highly variable bandwidth before the WiFI links and classic AQM algorithms such as WRED are both not present, and if were present, would not be of any use due to highly variable bandwidth. Home routers are certainly located where one of the common bottlenecks in the path are located and therefore are extremely common offenders.  Whether better or worse than broadband hop next to them is also impossible to quantify.

I’ve personally measured up to 8 second latency in my own home without deliberate experiments.  In deliberate experiments I can make latency as large as you like. That’s why we like CoDel (fq_codel in particular) so much: it responds very rapidly to changes in bandwidth, which are perpetual in wireless. Fixing Linux and Linux’s WiFi stack is therefore where we’ve focused (not to mention the code is available, so we can actually do work rather than try to persuade clueless people of their mistakes, which is a difficult road to hoe.  This one is the one we seem to see the most often, along with the hosts and either side of the broadband hop.

The depth and breadth of this swamp is immense. In short, there is bufferbloat everywhere: you have to be systematically paranoid…. 

 But which bufferbloat problem is “worst” is I think, unanswerable. Once we fix one problem, it’s whack-a-mole on the next problem, until the moral sinks home: Any unmanaged buffer is one waiting to get you if it can ever be at a bottleneck link. Somehow we have to educate everyone that static buffers are landmines waiting for the next victim and never acceptable.

TCP Small Queues

October 1, 2012

Some puzzle pieces of a picture puzzle.Linux 3.6 just shipped.  As I’ve noted before, bloat occurs in multiple places in an OS stack (and applications!). If your OS TCP implementation fills transmit queues more than needed, full queues will cause the RTT to increase, etc. , causing TCP to misbehave. Net result: additional latency, with no increase in bandwidth performance. TCP small queues reduces the buffering without sacrificing performance, reducing latency.

To quote the Kernel Newbies page:

TCP small queues is another mechanism designed to fight bufferbloat. TCP Small Queues goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) and < 8ms on 100Mbit (instead of 132 ms).

Eric Dumazet (now at Google) is the author of TSQ. It is covered in more detail at LWN.  Thanks to Eric for his great work!

The combination of TSQ, fq_codel and BQL (Byte Queue Limits) gets us much of the way to solving bufferbloat on Ethernet in Linux. Unfortunately, wireless remains a challenge (the drivers need to have a bunch of packets for 802.11n aggregation, and this occurs below the level that fq_codel can work on), as do other device types.  For example, a particular DSL device we looked at last week has a minimum ring buffer size of 16, again, occurring beneath Linux’s queue discipline layer.  “Smart” hardware has become a major headache. So there is much to be done yet in Linux, much less other operating systems.

I’m attending the International Summit for Community Wireless Networks

September 24, 2012

I will be giving a updated version of my bufferbloat talk there on Saturday, October 6.  The meeting is about community wireless networks (many of which are mesh wireless networks) on which bufferbloat is a particular issue.  It is in Barcelona, Spain, October 4-7.

We tried (and failed) to make ad-hoc mesh networking work when I was at OLPC, and I now know that one of the reasons we were failed was bufferbloat.

I’ll also be giving a talk at the UKNOF (UK Network Operator’s Forum) in London on October 9, but that is now full and there is no space for new registrants.

The First Bufferbloat Battle Won

August 6, 2012

Some puzzle pieces of a picture puzzle.Bufferbloat was covered in a number of sessions at the Vancouver IETF last week.

The most important of these sessions is a great explanation of Kathie Nichols and Van Jacobson’s CoDel (“coddle”) algorithm given during Tuesday’s transport area meeting by Van.  It is not to be missed by serious network engineers. It also touches on why we like fq_codel so much, though I plan to write much more extensively on this topic very soon. CoDel by itself is great, but in combination with SFQ (like) algorithms that segregate flows, the results are stunning; CoDel is the first AQM algorithm which can work across arbitrary number of queues/flows.

The Saturday before the IETF the IAB / IRTF Workshop on Congestion Control for Interactive Real-Time Communication took place. My position paper was my blog entry of several weeks back. In short,  there is no single bullet, though with CoDel we finally have the final missing bullet for its complete solution. The other, equally important but non-technical bullets will be market pressure fix broken software/firmware/hardware all over the Internet: so exposing the bloat problem is vital. You cannot successfully engineer around bufferbloat, but you can detect it, and let users know when they are suffering to enable them to vote with their pocket books. In one of the later working groups, someone coined the term “net-sux” index, though I hope we can find something more marketable.

In the ICCRG (Internet Congestion Control Research Group) meeting, I covered research related topics including global topics, algorithmic questions, data acquisition and analysis needs, and needed tools for diagnosis.

Thursday included the RMCAT BOF. With the on-going deployment of large scale real time teleconferencing systems, congestion avoidance algorithms are becoming of pressing concern. TCP has integrated congestion avoidance algorithms, but RTP does not currently have equivalent mechanism. So long as RTP’s useage is low in the Internet, this is not a major issue; but classic 1980’s congestion collapse could occur should those rise to dominate Internet traffic. I was asked to cover AQM and Bufferbloat to help set context for the ensuing discussion. I covered the current status in brief and then added a bit of heresy. With a slight amount of forethought, we could arrange that someday real time media and AQM algorithms interact in novel ways. Detection (and preferably correct assignment of blame) is key to getting bufferbloat cleaned up.

In short, we’ve won the firstbattle for the hearts and minds of engineers who build the Internet and the tools are present to build the weapons to solve bufferbloat; but the campaign to fix the Internet will long and difficult.