Does the Internet need “Governance”?

November 12, 2014

Dave Reed just published a vital piece concerning network neutrality. Everyone interested in the topic should carefully read and understand “Does the Internet need ‘Governance’?”

One additional example of “light touch” help for the Internet where government may play a role is transparency: the recent M-Lab report, and the fact that Cogent’s actions made retail ISPs look very bad, is a case in point. You can follow up on that topic on M-Lab’s mailing list, if you are so inclined. If a carrier can arbitrarily delay or deprioritize traffic in secret, then the market cannot function well, even though there are usually alternative transit providers. And if that provider is an effective monopoly for many paths, that becomes a huge problem.

Bufferbloat and Other Challenges

October 6, 2014

Vint Cerf wrote a wonderful piece on the problems I’ve been wrestling with for the last several years, called “Bufferbloat and Other Internet Challenges”. It is funny how one thing leads to another; I started out just wanting my home network to work as I knew it should, and started turning over rocks. The swamp we’re in is very deep and dangerous, with the security problem the worst of all (and given how widespread bufferbloat is, that’s saying something). The “Other Challenges” dwarf bufferbloat, as large a problem as it is.

I gave a lunch talk on the situation at the Berkman Center at Harvard in June, and I recommend people read the articles by Bruce Schneier and Dan Geer you will find linked there, which are their takes on the situation I laid out to them (both articles were triggered by the information in that talk).

Dan Geer’s piece is particularly important from a policy perspective.

I also recommend reading “Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities”, by Clark, Fry, Blaze and Smith, which makes clear to me that our engineering processes need fundamental reform in the face of very long-lived devices. Vulnerability discovery looks very different from normal bug discovery; good examples include Heartbleed and Shellshock (which thankfully does not affect most such embedded devices, since the ash shell is used in BusyBox).

In my analysis of the ecosystem, it’s clear that binary blobs are a real long-term hazard, and do even short-term damage by freezing devices on old, obsolete software, magnifying the scale of vulnerabilities even on new equipment. In the long term, they make maintenance and security of devices (examples include your modems and home routers) nigh impossible. And all devices need ongoing software updates for the life of the devices, the routing devices most of all (since if the network ceases to work, updates become impossible).

“Friends don’t let friends run factory firmware”.

Be safe.

Traditional AQM is not enough!

July 10, 2013

Note: Updated October 24, 2013, to fix some editorial nits, and to clarify the intended point that it is the combination of a working mark/drop algorithm with flow scheduling that is the “killer” innovation, rather than the specifics of today’s fq_codel algorithm.

Latency (called “lag” by gamers), once incurred, cannot be undone, as first and best explained by Stuart Cheshire in his rant “It’s the latency, Stupid” and more formally in “Latency and the Quest for Interactivity”, and as noted recently by Stuart’s 12-year-old daughter, who sent Stuart a link to one of the myriad “Lag Kills” tee shirts, coffee mugs, and other items popular among gamers.

Out of the mouth of babes…

Any unnecessary latency is too much latency.

Many networking engineers and researchers express the opinion that 100 milliseconds of latency is “good enough”. If the Internet’s worst latency (under load) were 100ms, indeed, we’d be much better off than we are today (and would have space warp technology as well!). But the speed of light and human factors research easily demonstrate that this opinion is badly flawed.
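To put rough numbers on the speed-of-light half of that argument, here is a minimal sketch of the round-trip time imposed by propagation alone. It assumes light in optical fiber covers about 200 km per millisecond (roughly two thirds of its speed in vacuum) and uses hypothetical straight-line distances; real routes are longer and add serialization and processing delays, so these figures are lower bounds.

```python
FIBER_KM_PER_MS = 200.0  # assumption: ~200,000 km/s in fiber, about 2/3 of c


def min_rtt_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time from fiber propagation delay alone."""
    return 2 * one_way_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    # Hypothetical straight-line distances; real fiber paths are longer.
    for label, km in [("same city", 50), ("across a continent", 4_000), ("intercontinental", 10_000)]:
        print(f"{label:>20}: at least {min_rtt_ms(km):.1f} ms round trip")
```

On a 10,000 km path, propagation alone consumes the entire 100 ms budget before any queuing delay is added.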

Many have understood bufferbloat to be a problem that primarily occurs when a saturating “elephant flow” is present on a link. Testing for bufferbloat using elephants is very easy, and even a single elephant TCP flow from any modern operating system may fill any size uncontrolled buffer given time, but this is not the only problem we face. The dominant application, the World Wide Web, is anti-social to any other application on the Internet, and its collateral damage is severe.

Solving the latency problem requires a two-pronged attack.


Best Practices for Benchmarking CoDel and FQ CoDel (and almost anything else!)

July 2, 2013

The bufferbloat project has had trouble getting consistent, repeatable results from other experimenters, due to a variety of factors. This wiki page attempts to identify the most common omissions and mistakes. There be land mines here. Your data will be garbage if you don’t avoid them!

Note that most of these are traps for people doing network research in general, not just bufferbloat research.

Bufferbloat in switches/bridges

June 20, 2013

I received the following question today from Ralph Droms.  I include an edited version of my response to Ralph.

On Thu, Jun 20, 2013 at 9:45 AM, Ralph Droms (rdroms) <rdroms@yyy.zzz> wrote:
Someone suggested to me that bufferbloat might even be worse 
in switches/bridges than in routers.  True fact?  If so, can 
you point me at any published supporting data?

It is hard to quantify whether switches or routers are “worse”, and I’ve never tried, nor seen any published systematic data. I wouldn’t believe such data if I saw it, anyway. What matters is whether you have unmanaged buffers before a bottleneck link.

I don’t have first-hand information (to just point you at particular product specs; I tend not to try to find out who is particularly guilty, as it can only get me in hot water if I compare particular vendors). I’ve generally dug into the technology to understand how and why buffering is present, to understand what I’ve seen.

You can go look at the specs of switches yourself and figure out from first principles that switches have problems.

Feel free to write a paper!

Here’s what I do know.

Ethernet Switches:

  • The simplest case is a 10G or 1G switch being operated at 1G or 100M: you end up 10x or 100x overbuffered. I’ve never seen a switch that cuts its internal buffering depending on line rate. God forbid you happen to have 10Mbit gear still in that network, as Ethernet flow control can cause cascades between switches that reduce you to the lowest bandwidth….
  • Thankfully, enterprise switch gear does not emit Ethernet pause frames (though it honors them if received); but all of the commodity switch chips used in cheap unmanaged consumer switches that I looked at do generate pause frames. Sigh…
  • As I remember, when I described this kind of buffering problem to a high-end router expert in Prague, he started muttering “line cards” at me; it wouldn’t surprise me if the same situation is present in big routers supporting different line rate outputs. But I’ve not dug into them.
  • We even got caught by this in CeroWrt, where the Ethernet bridge chip was misconfigured and, due to jumbo-grams, was initially accidentally 8x overbuffered (resulting in 80-100ms of latency through the local switch in a cheap router, IIRC; Dave Taht will remember the exact details).
  • I then went and looked at the data sheets of a bunch of cheap integrated switch chips (around 10 of them, as I remember): while some (maybe half) were “correctly” buffered (not that I regard any static configuration as correct!), some had 2-4x more SRAM in the switch chips than was required for their bandwidth. So even without the bandwidth switching trap, sometimes the commodity switch chips have too much buffering. Without statistics of what chips are used in what products, it’s impossible to know how much equipment is affected (though all switches *should* run fq_codel or equivalent, IMHO, knowing what I know now)….
  • I hadn’t even thought about how VLANs interact with buffering until recently. Think about VLANs (particularly in combination with Ethernet flow control), and get a further Excedrin headache… About 6 months ago I talked to an engineer who had had terrible problems getting decent, reliable latency in a customer’s VOIP system. He tracked it down (miraculously) to the fact that the small business (fewer than 50 employees) was sharing an enterprise switch, using VLANs for isolation from other tenants in the building. The other tenants sometimes saturated the switch, and the customer’s VLAN performance for their VOIP traffic would go to hell in a handbasket (see below about naive sysops not configuring different classes of service correctly). As the customer was a call center, you can imagine they were upset.
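The over-buffering arithmetic in the first bullet is simple: the worst-case delay a full FIFO adds is its size divided by the rate at which it drains. The sketch below assumes a hypothetical port with a fixed 1,000,000-byte output buffer (not taken from any particular switch chip) and shows the queuing delay that buffer can add at each line rate.

```python
def drain_time_ms(buffer_bytes: int, rate_bps: float) -> float:
    """Worst-case queuing delay: time to empty a full FIFO at rate_bps, in ms."""
    return buffer_bytes * 8 * 1000 / rate_bps


BUFFER_BYTES = 1_000_000  # hypothetical fixed per-port buffer size

if __name__ == "__main__":
    for name, rate in [("10G", 10e9), ("1G", 1e9), ("100M", 100e6), ("10M", 10e6)]:
        print(f"{name:>4}: {drain_time_ms(BUFFER_BYTES, rate):8.1f} ms")
```

The same buffer that adds under a millisecond at 10G adds 80 ms at 100M and 800 ms at 10M; if flow control cascades drag a path down to its slowest link, it is the last row that applies.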

Ethernet actually has highly variable bandwidth: we can’t safely treat it as fixed bandwidth! Yet switch designers make this completely unwarranted presumption routinely.

This is part of why I see conventional QOS as a dead end; most of the need for classic QOS goes away if we properly manage buffers in the first place. Our job as Internet engineers is to build systems that “just work”: systems that operators can’t misconfigure, and that don’t come from the factory misconfigured to fail under load (a condition that is never properly tested at most customers’ sites).

Enterprise Ethernet Switches

Some enterprise switches sell additional buffer memory as a “feature”! And some of those switches require configuration of their buffer memory across various QOS classes; if you foolishly do nothing, some of them leave all memory configured to a single class and disaster ensues.

What do you think a naive sysop does???? Particularly one who listens to the salesman or the literature of the switch vendor about the “feature” of more buffering to avoid dropping packets, and buys such additional RAM?

So the big disasters I’ve heard of involve those switches, where deluded, naive people have bought yet more buffer memory, particularly when they fail to configure the switches for QOS classes. That report came off the NANOG list, as I remember, but it was a couple of years ago and I didn’t save the message.

After reading that report I looked at the specs for two or three such enterprise switches and confirmed that this scenario was real, resulting in potentially *very* large buffering (several hundred milliseconds, even reaching seconds). IIRC, one switch had decent defaults, but another defaulted to insane behavior.

So the NANOG report of such problems was not only plausible, but certain to happen, and I stopped digging further. Case closed. But I don’t know how common it is, nor whether it is more common than in the associated routers in the network.

Router Bufferbloat problems

I *think* the worst router problems are in home routers, where we have uncontrolled buffering (often 1280 packets’ worth) and highly variable bandwidth before the WiFi links. Classic AQM algorithms such as WRED are not present, and even if they were present, they would be of no use due to that highly variable bandwidth. Home routers sit at one of the most common bottlenecks in the path, and are therefore extremely common offenders. Whether they are better or worse than the broadband hop next to them is also impossible to quantify.

I’ve personally measured up to 8 seconds of latency in my own home without deliberate experiments. In deliberate experiments I can make latency as large as you like. That’s why we like CoDel (fq_codel in particular) so much: it responds very rapidly to changes in bandwidth, which are perpetual in wireless. Fixing Linux and Linux’s WiFi stack is therefore where we’ve focused (not to mention that the code is available, so we can actually do work rather than try to persuade clueless people of their mistakes, which is a difficult row to hoe). The home router is the problem we seem to see most often, along with the hosts and either side of the broadband hop.
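A simplified sketch of why CoDel copes with changing bandwidth: it looks at how long each packet sat in the queue (its sojourn time), not at how many packets or bytes are queued, so a slower link shows up immediately as longer sojourn times. This loosely follows the published CoDel pseudocode with its usual 5 ms target and 100 ms interval, but it is an illustration only: it omits the minimum-queue-size check and the logic that reuses the previous drop count, and the class and method names are hypothetical.

```python
import math
from collections import deque
from dataclasses import dataclass

TARGET = 0.005    # 5 ms: acceptable standing queue delay
INTERVAL = 0.100  # 100 ms: how long delay must stay above TARGET before dropping


@dataclass
class Packet:
    enqueue_time: float  # seconds


class SimpleCoDel:
    """Simplified, illustrative CoDel queue; all times are in seconds."""

    def __init__(self):
        self.queue = deque()
        self.first_above_time = 0.0  # 0.0 means "delay currently below TARGET"
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.dropped = 0

    def enqueue(self, pkt):
        self.queue.append(pkt)

    def _do_dequeue(self, now):
        """Pop one packet; report whether delay has exceeded TARGET for INTERVAL."""
        if not self.queue:
            self.first_above_time = 0.0
            return None, False
        pkt = self.queue.popleft()
        sojourn = now - pkt.enqueue_time  # time this packet spent queued
        if sojourn < TARGET:
            self.first_above_time = 0.0
            return pkt, False
        if self.first_above_time == 0.0:
            self.first_above_time = now + INTERVAL
            return pkt, False
        return pkt, now >= self.first_above_time

    def _control_law(self, t):
        # Drops get closer together as count grows: INTERVAL / sqrt(count).
        return t + INTERVAL / math.sqrt(self.count)

    def dequeue(self, now):
        pkt, ok_to_drop = self._do_dequeue(now)
        if pkt is None:
            self.dropping = False
            return None
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False  # delay is back under control
            while self.dropping and now >= self.drop_next:
                self.dropped += 1  # drop pkt and look at the next one
                self.count += 1
                pkt, ok_to_drop = self._do_dequeue(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    self.drop_next = self._control_law(self.drop_next)
        elif ok_to_drop:
            self.dropped += 1  # first drop: enter the dropping state
            pkt, _ = self._do_dequeue(now)
            self.dropping = True
            self.count = 1
            self.drop_next = self._control_law(now)
        return pkt
```

Packets that leave within 5 ms are never dropped; once the delay has stayed above 5 ms for a full 100 ms, the queue starts dropping, and drops more often (every INTERVAL / √count) until the standing queue drains.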

The depth and breadth of this swamp is immense. In short, there is bufferbloat everywhere: you have to be systematically paranoid…. 

But which bufferbloat problem is “worst” is, I think, unanswerable. Once we fix one problem, it’s whack-a-mole on the next, until the moral sinks in: any unmanaged buffer is one waiting to get you if it can ever be at a bottleneck link. Somehow we have to educate everyone that static buffers are landmines waiting for the next victim, and are never acceptable.

TCP Small Queues

October 1, 2012

Linux 3.6 just shipped. As I’ve noted before, bloat occurs in multiple places in an OS stack (and applications!). If your OS’s TCP implementation fills transmit queues more than needed, the full queues increase the RTT, causing TCP to misbehave. Net result: additional latency, with no increase in bandwidth. TCP Small Queues (TSQ) reduces this buffering without sacrificing throughput, reducing latency.

To quote the Kernel Newbies page:

TCP small queues is another mechanism designed to fight bufferbloat. TCP Small Queues goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) and < 8ms on 100Mbit (instead of 132 ms).

Eric Dumazet (now at Google) is the author of TSQ. It is covered in more detail at LWN.  Thanks to Eric for his great work!
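A toy model of the idea (hypothetical names, not kernel code): a single bulk sender hands segments down to a NIC, optionally capped by a per-socket limit on how many of its bytes may sit in the qdisc and device queues at once. Transmitted bytes are treated as completed immediately, so the only thing the cap changes is how much the sender piles up below TCP.

```python
import math


def host_queue_peak(total_bytes, seg_bytes, limit_bytes, nic_bytes_per_tick):
    """Toy model of one bulk sender feeding a NIC (hypothetical, not kernel code).

    limit_bytes caps how many of this socket's bytes may sit in the
    qdisc/device queues at once (the TSQ idea); math.inf means no cap.
    Returns (peak bytes queued below TCP, ticks needed to send everything).
    """
    if limit_bytes < seg_bytes:
        raise ValueError("limit must allow at least one segment")
    queued = 0       # this socket's bytes in qdisc + device ring
    handed_down = 0  # bytes TCP has pushed into the stack so far
    peak = 0
    ticks = 0
    while handed_down < total_bytes or queued > 0:
        # TCP side: push segments only while under the per-socket limit.
        while handed_down < total_bytes and queued + seg_bytes <= limit_bytes:
            queued += seg_bytes
            handed_down += seg_bytes
        peak = max(peak, queued)
        # NIC side: transmit one tick's worth; TX completion frees budget.
        queued -= min(queued, nic_bytes_per_tick)
        ticks += 1
    return peak, ticks


if __name__ == "__main__":
    # 100 full-size segments; the NIC sends one segment per tick.
    print("no limit:", host_queue_peak(150_000, 1500, math.inf, 1500))
    print("TSQ-like:", host_queue_peak(150_000, 1500, 3000, 1500))
```

Both runs finish in the same 100 ticks, but the capped sender never has more than two segments queued below TCP instead of its whole backlog: less buffering at the same throughput, which is the trade-off the Kernel Newbies quote describes. In Linux the cap is a per-socket byte limit exposed as the `tcp_limit_output_bytes` sysctl; the 3000-byte limit here is arbitrary.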

The combination of TSQ, fq_codel and BQL (Byte Queue Limits) gets us much of the way to solving bufferbloat on Ethernet in Linux. Unfortunately, wireless remains a challenge (the drivers need to have a bunch of packets for 802.11n aggregation, and this occurs below the level that fq_codel can work on), as do other device types. For example, a particular DSL device we looked at last week has a minimum ring buffer size of 16, again occurring beneath Linux’s queue discipline layer. “Smart” hardware has become a major headache. So there is much yet to be done in Linux, let alone other operating systems.

I’m attending the International Summit for Community Wireless Networks

September 24, 2012

I will be giving an updated version of my bufferbloat talk there on Saturday, October 6. The meeting is about community wireless networks (many of which are mesh wireless networks), on which bufferbloat is a particular issue. It is in Barcelona, Spain, October 4-7.

We tried (and failed) to make ad-hoc mesh networking work when I was at OLPC, and I now know that one of the reasons we failed was bufferbloat.

I’ll also be giving a talk at UKNOF (the UK Network Operators’ Forum) in London on October 9, but that meeting is now full and there is no space for new registrants.

The First Bufferbloat Battle Won

August 6, 2012

Bufferbloat was covered in a number of sessions at the Vancouver IETF last week.

The most important of these sessions was Van’s great explanation of Kathie Nichols and Van Jacobson’s CoDel (“coddle”) algorithm, given during Tuesday’s transport area meeting. It is not to be missed by serious network engineers. It also touches on why we like fq_codel so much, though I plan to write much more extensively on this topic very soon. CoDel by itself is great, but in combination with SFQ-like algorithms that segregate flows, the results are stunning; CoDel is the first AQM algorithm that can work across an arbitrary number of queues/flows.
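To see why segregating flows matters so much, consider one VoIP packet arriving just behind a burst from a bulk flow. The sketch below compares a single shared FIFO with plain round robin over per-flow queues. It is a deliberately crude stand-in, not SFQ or fq_codel’s actual scheduler: it assumes equal-size packets that are all already queued, ignores the AQM half entirely, and the flow names and burst size are hypothetical.

```python
from collections import OrderedDict, deque


def fifo_order(packets):
    """A single shared FIFO: packets leave in arrival order."""
    return list(packets)


def round_robin_order(packets):
    """Per-flow queues served round robin, one packet per flow per turn."""
    queues = OrderedDict()
    for flow, seq in packets:
        queues.setdefault(flow, deque()).append((flow, seq))
    order = []
    while queues:
        for flow in list(queues):
            order.append(queues[flow].popleft())
            if not queues[flow]:
                del queues[flow]  # flow drained; drop it from the rotation
    return order


def packets_ahead(order, flow):
    """How many packets are sent before the first packet of `flow`."""
    return next(i for i, (f, _) in enumerate(order) if f == flow)


if __name__ == "__main__":
    # Hypothetical mix: a 100-packet bulk burst, then one VoIP packet.
    arrivals = [("bulk", i) for i in range(100)] + [("voip", 0)]
    print("FIFO:       ", packets_ahead(fifo_order(arrivals), "voip"))
    print("round robin:", packets_ahead(round_robin_order(arrivals), "voip"))
```

The voice packet waits behind 100 packets in the FIFO but only one under round robin. At 1 Mb/s, where a 1500-byte packet takes about 12 ms to send, that is roughly 1.2 s of waiting versus 12 ms.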

The Saturday before the IETF, the IAB/IRTF Workshop on Congestion Control for Interactive Real-Time Communication took place. My position paper was my blog entry of several weeks back. In short, there is no single bullet, though with CoDel we finally have the last missing technical bullet for a complete solution. The other, equally important but non-technical bullet will be market pressure to fix broken software, firmware and hardware all over the Internet, so exposing the bloat problem is vital. You cannot successfully engineer around bufferbloat, but you can detect it, and let users know when they are suffering, enabling them to vote with their pocketbooks. In one of the later working groups, someone coined the term “net-sux” index, though I hope we can find something more marketable.

In the ICCRG (Internet Congestion Control Research Group) meeting, I covered research related topics including global topics, algorithmic questions, data acquisition and analysis needs, and needed tools for diagnosis.

Thursday included the RMCAT BOF. With the ongoing deployment of large-scale real-time teleconferencing systems, congestion avoidance algorithms are becoming a pressing concern. TCP has integrated congestion avoidance algorithms, but RTP does not currently have an equivalent mechanism. So long as RTP’s usage in the Internet is low, this is not a major issue; but classic 1980s-style congestion collapse could occur should such traffic rise to dominate the Internet. I was asked to cover AQM and bufferbloat to help set context for the ensuing discussion. I covered the current status in brief and then added a bit of heresy: with a slight amount of forethought, we could arrange that someday real-time media and AQM algorithms interact in novel ways. Detection (and preferably correct assignment of blame) is key to getting bufferbloat cleaned up.

In short, we’ve won the first battle for the hearts and minds of the engineers who build the Internet, and the tools are present to build the weapons to solve bufferbloat; but the campaign to fix the Internet will be long and difficult.

The Internet is Broken, and How to Fix It

June 26, 2012


Many real-time applications, such as VOIP, gaming, teleconferencing, and performing music together, require low latency. These are increasingly unusable in today’s Internet, not because there is insufficient bandwidth, but because we’ve failed to look at the Internet as an end-to-end system. The edge of the Internet now often runs congested. When it does, bufferbloat causes performance to fall off a cliff.

Where once a home user’s Internet connection served a single computer, it now serves a dozen or more devices: smart phones, TVs, Apple TV/Roku devices, tablets, home security equipment, and one or more computers per household member. More Internet-connected devices are arriving every year, and they often perform background activities without the user’s intervention, inducing transients on the network. These devices need to share the edge connection effectively in order to keep each user happy. All can induce congestion and bufferbloat that baffle most Internet users.

The CoDel (“coddle”) AQM algorithm provides the “missing link” necessary for good TCP behavior and for solving bufferbloat. But CoDel by itself is insufficient to provide reliable, predictable low-latency performance in today’s Internet.

Bottlenecks are most common at the “edge” of the Internet, and there you must be very careful to avoid queuing delays of all sorts. Your share of a busy 802.11 conference network (or a marginal WiFi connection, or one in a congested location) might be 1Mb/second, at which speed a single packet represents 13 milliseconds. Your share of a DSL connection in the developing world may be similarly limited. Small businesses often support many people on limited bandwidth. Budget motels commonly share a single broadband connection among all guests.

Only a few packets can ruin your whole day!  A single IW10 TCP open has immediately blown any telephony jitter budget at 1Mbps (which is about 16x the bandwidth of conventional POTS telephony).
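The arithmetic behind these numbers is serialization delay: bytes × 8 divided by the link rate. This sketch assumes 1500-byte packets and ignores link-layer and 802.11 overheads, which is why it gives 12 ms per packet, slightly under the 13 ms figure above. A 10-packet initial window then occupies a 1 Mb/s link for about 120 ms, and 1 Mb/s works out to about 15.6 times a 64 kb/s POTS channel.

```python
def serialization_ms(nbytes: int, rate_bps: float) -> float:
    """Time to clock nbytes onto a link of rate_bps, in milliseconds."""
    return nbytes * 8 * 1000 / rate_bps


PACKET_BYTES = 1500   # assumed full-size packet, framing overhead ignored
LINK_BPS = 1_000_000  # the 1 Mb/s share discussed above
POTS_BPS = 64_000     # one conventional 64 kb/s telephone channel

if __name__ == "__main__":
    one = serialization_ms(PACKET_BYTES, LINK_BPS)
    burst = serialization_ms(10 * PACKET_BYTES, LINK_BPS)  # an IW10 burst
    print(f"one packet: {one:.1f} ms")
    print(f"IW10 burst: {burst:.1f} ms")
    print(f"link / POTS: {LINK_BPS / POTS_BPS:.3f}x")
```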

Ongoing technology changes make the problem more challenging. These include:

  • Changes to TCP, including the IW10 initial window changes and window scaling.
  • NIC Offload engines generate bursts of line rate packet streams at multi-gigabit rates. These features are now “on” by default even in cheap consumer hardware including home routers, and certainly in data centers. Whether this is advisable (it is not…) is orthogonal to the reality of deployed hardware and current device drivers and default settings.
  • Deployment of “abusive” applications (e.g. HTTP/1.1 using many (more than 2) TCP connections, sharded web sites, BitTorrent). As systems designers, we need to remove the incentives for such abusive application behavior, while protecting the user’s experience. Network engineers must presume software engineers will optimize their application performance, even to the detriment of other uses of the Internet, as the abuse of HTTP by web browsers and servers demonstrates.
  • The rapidly increasing number of devices sharing home and small office links.
All of these factors contribute to large line-rate bursts of packets crossing the Internet to arrive at a user’s edge network, whether at their broadband connection or, more commonly, in their home router.

The Bufferbloat Bandwidth Death March

May 23, 2012

Latency, much more than bandwidth, governs actual Internet “speed”, as best expressed in written form by Stuart Cheshire’s “It’s the Latency, Stupid” rant, and more formally in “Latency and the Quest for Interactivity”.

Speed != bandwidth, despite what an ISP’s marketing department will tell you. This misconception is reflected all the way up to and including FCC Commissioner Julius Genachowski; it is common even among technologists who should know better, and believed by the general public. You pick an airplane to fly across the ocean, rather than a ship, even though the capacity of the ship may be far higher.
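A crude model makes the airplane-versus-ship point concrete: a small transfer spends most of its time waiting on round trips, so cutting latency helps far more than adding bandwidth. The page size, round-trip count and link figures below are hypothetical, and the model simply lumps connection setup and TCP slow start into a fixed number of round trips.

```python
def fetch_time_ms(size_bytes: int, rtt_ms: float, bandwidth_bps: float, round_trips: int) -> float:
    """Crude page-fetch model: time spent on round trips plus raw transfer time."""
    return round_trips * rtt_ms + size_bytes * 8 * 1000 / bandwidth_bps


PAGE_BYTES = 100_000  # hypothetical 100 KB page
ROUND_TRIPS = 10      # hypothetical: DNS, handshakes, slow-start rounds, ...

if __name__ == "__main__":
    print("baseline (100 ms, 10 Mb/s):", fetch_time_ms(PAGE_BYTES, 100, 10e6, ROUND_TRIPS))
    print("double bandwidth:          ", fetch_time_ms(PAGE_BYTES, 100, 20e6, ROUND_TRIPS))
    print("halve latency:             ", fetch_time_ms(PAGE_BYTES, 50, 10e6, ROUND_TRIPS))
```

Under these assumptions, doubling bandwidth saves 40 ms (1080 ms to 1040 ms), while halving latency saves 500 ms.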
