Archive for the ‘Uncategorized’ Category

Bufferbloat in Action due to Covid-19

April 22, 2020

Four people now live and work in my home 24×7: my wife Andi, her mother, my daughter and myself. Many of you now live in similar situations.

Very occasionally, everyone will have network trouble, such as occurred to us this morning. Sometimes it is our “last mile” connection: it is easy to see these failures in our cable modem log (often available by looking at the address 192.168.100.1, which seems to be the default address for cable modems). Occasionally it can be the ISP (in our case, Comcast), either due to some routing failure or DNS failure. These can be harder to diagnose.

Bufferbloat, however, is insidious. It comes and goes, and most users have been “trained” to ignore temporary bad behavior over many years. When you go to diagnose your network, you usually stop the operation that is causing bufferbloat. This blog has recorded our efforts to fix bufferbloat. Now that there are many more people at home at the same time trying to use more demanding applications, this problem is much more common. Other people in your home can inflict the bufferbloat problem on you without anyone understanding what is happening.

Yesterday afternoon I was helping my wife learn how to edit videos that she now makes and uploads for her students, and one of her co-workers, whom we’ll call “Carleen”, called in by video for class planning. Those of you who wonder what teachers do when not teaching should understand that teachers not only grade materials and give other feedback to students, but also spend much of their time preparing future classes. The amount of “prep” time diminishes the more often a good teacher has taught a given subject, but never drops to anything close to zero. At the moment, learning to teach remotely is an extreme extra burden of preparation.

In the first phase of learning from home, assignments to students consisted of watching videos and completing online worksheets. None of these activities are latency sensitive. As of this week, the next phase of instruction includes teachers attempting more “conventional” instruction via online teleconferencing (Google Meet, in their case). Teleconferencing is by its nature very latency sensitive.

Carleen shares her home with her husband and two sons of age seven and ten. She reported that yesterday her class had been aborted entirely due to intermittent network problems caused by her kids playing games. She has now banished her kids from their video games during her class times, but this cure may be worse than the disease with two bored children in the house; only time will tell. Transient problems caused by bufferbloat now really matter to her, and to her classes, whether they know it or not.

In my early bufferbloat talks, I called bufferbloat the “The Internet is Slow Today, Daddy” problem. But anyone can inflict pain on others sharing a connection if bufferbloat is present; it is just as much a “Kids, quit what you are doing so I can teach” problem.

Thankfully, I can now make concrete recommendations on how you can solve bufferbloat in your environment. My “go to” recommendation for non-geeks is currently the EvenRoute IQRouter. This device will both mitigate bufferbloat in your “last mile”, and fix it in the WiFi link to your device, without any manual tuning. At this moment, I am not aware of any other home router that deals with bufferbloat both in the “last mile” and in WiFi; either or both can be a problem at any given instant.

Needless to say, I recommended that Carleen buy one of those routers.

We do not pay teachers anything like what they are worth. I wish that all ISPs would increase the upstream bandwidth of all connections to teachers for the duration of the Covid-19 crisis, which would help diminish the bufferbloat problem (bufferbloat is generally most severe in the upstream link, and many teachers cannot afford expensive tiers of Internet service). Those hurt most by bufferbloat are those with the most minimal service: impecunious teachers, and also the students whose parents are least able to afford higher tier internet service. The bufferbloat problem therefore affects children of both rich and poor, directly or indirectly.

A few network speed test sites have tests for bufferbloat, such as DSLReports Speedtest, but there are many ways to test explicitly for bufferbloat, as outlined here.
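The core idea behind such tests is to compare round-trip time on an idle link with round-trip time while the link is saturated. A minimal sketch of that comparison follows; the function name and the sample values are made up for illustration, and real tests gather the samples with their own measurement methods.

```python
from statistics import median


def added_latency_ms(idle_rtts_ms, loaded_rtts_ms):
    """Median RTT under load minus median RTT on an idle link, in ms."""
    if not idle_rtts_ms or not loaded_rtts_ms:
        raise ValueError("need at least one sample in each list")
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Hypothetical ping samples (ms): first with the link idle, then while a
# large upload is running on the same connection.
idle = [18.0, 20.0, 19.0, 21.0, 20.0]
loaded = [240.0, 310.0, 280.0, 295.0, 260.0]

print(f"latency added under load: {added_latency_ms(idle, loaded):.0f} ms")
```

If the difference is a few milliseconds, the link is in good shape; hundreds of milliseconds of added latency under load is the signature of bufferbloat.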

Does the Internet need “Governance”?

November 12, 2014

Dave Reed just published a vital piece concerning network neutrality. Everyone interested in the topic should carefully read and understand Does the Internet need “Governance”?

One additional example of “light touch” help for the Internet where government may play a role is transparency: the recent M-Lab report, and the fact that Cogent’s actions made retail ISPs look very bad, is a case in point. You can follow up on that topic on M-Lab’s mailing list, if you are so inclined. If a carrier can arbitrarily delay/deprioritize traffic in secret, then the market (as there are usually alternatives in transit providers) cannot function well. And if that provider is an effective monopoly for many paths, that becomes a huge problem.

Bufferbloat and Other Challenges

October 6, 2014

Vint Cerf wrote a wonderful piece on the problems I’ve been wrestling with for the last number of years, called “Bufferbloat and Other Internet Challenges”. It is funny how one thing leads to another; I started just wanting my home network to work as I knew it should, and started turning over rocks. The swamp we’re in is very deep and dangerous, the security problem the worst of all (and given how widespread bufferbloat is, that’s saying something). The “Other Challenges” dwarf bufferbloat, as large a problem as it is.

I gave a lunch talk at the Berkman Center at Harvard in June on the situation, and recommend people read the articles by Bruce Schneier and Dan Geer you will find linked there, which are their takes on the situation I laid out to them (both articles were triggered by the information in that talk).

Dan Geer’s piece is particularly important from a policy perspective.

I also recommend reading “Familiarity Breeds Contempt: The Honeymoon Effect and the Role of Legacy Code in Zero-Day Vulnerabilities”, by Clark, Fry, Blaze and Smith, which makes clear to me that our engineering processes need fundamental reform in the face of very long lived devices. Vulnerability discovery looks very different than normal bug discovery; good examples include Heartbleed and Shellshock (which thankfully does not affect most such embedded devices, since the ash shell is used in busybox).

In my analysis of the ecosystem, it’s clear that binary blobs are a real long term hazard, and do even short term damage by freezing the ecosystem for devices on old, obsolete software, magnifying the scale of vulnerabilities even on new equipment. In the long term, they make maintenance and security of devices (examples include your modems and home routers) nigh impossible. And all devices need ongoing software updates for the life of the devices; the routing devices most of all (since if the network ceases to work, updates become impossible).

“Friends don’t let friends run factory firmware”.

Be safe.

TCP Small Queues

October 1, 2012

Linux 3.6 just shipped. As I’ve noted before, bloat occurs in multiple places in an OS stack (and applications!). If your OS TCP implementation fills transmit queues more than needed, full queues will cause the RTT to increase, and so on, causing TCP to misbehave. Net result: additional latency, with no increase in bandwidth performance. TCP small queues reduces the buffering without sacrificing performance, reducing latency.

To quote the Kernel Newbies page:

TCP small queues is another mechanism designed to fight bufferbloat. TCP Small Queues goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) and < 8ms on 100Mbit (instead of 132 ms).

Eric Dumazet (now at Google) is the author of TSQ. It is covered in more detail at LWN. Thanks to Eric for his great work!

The combination of TSQ, fq_codel and BQL (Byte Queue Limits) gets us much of the way to solving bufferbloat on Ethernet in Linux. Unfortunately, wireless remains a challenge (the drivers need to have a bunch of packets for 802.11n aggregation, and this occurs below the level that fq_codel can work on), as do other device types. For example, a particular DSL device we looked at last week has a minimum ring buffer size of 16, again occurring beneath Linux’s queue discipline layer. “Smart” hardware has become a major headache. So there is much to be done yet in Linux, let alone other operating systems.

The First Bufferbloat Battle Won

August 6, 2012

Bufferbloat was covered in a number of sessions at the Vancouver IETF last week.

The most important of these sessions is a great explanation of Kathie Nichols and Van Jacobson’s CoDel (“coddle”) algorithm, given by Van during Tuesday’s transport area meeting. It is not to be missed by serious network engineers. It also touches on why we like fq_codel so much, though I plan to write much more extensively on this topic very soon. CoDel by itself is great, but in combination with SFQ-like algorithms that segregate flows, the results are stunning; CoDel is the first AQM algorithm which can work across an arbitrary number of queues/flows.

The Saturday before the IETF, the IAB / IRTF Workshop on Congestion Control for Interactive Real-Time Communication took place. My position paper was my blog entry of several weeks back. In short, there is no single bullet, though with CoDel we finally have the last missing technical bullet for a complete solution. The other, equally important but non-technical bullet will be market pressure to fix broken software/firmware/hardware all over the Internet, so exposing the bloat problem is vital. You cannot successfully engineer around bufferbloat, but you can detect it, and let users know when they are suffering to enable them to vote with their pocket books. In one of the later working groups, someone coined the term “net-sux” index, though I hope we can find something more marketable.

In the ICCRG (Internet Congestion Control Research Group) meeting, I covered research related topics including global topics, algorithmic questions, data acquisition and analysis needs, and needed tools for diagnosis.

Thursday included the RMCAT BOF. With the on-going deployment of large scale real time teleconferencing systems, congestion avoidance algorithms are becoming of pressing concern. TCP has integrated congestion avoidance algorithms, but RTP does not currently have an equivalent mechanism. So long as RTP’s usage is low in the Internet, this is not a major issue; but classic 1980’s congestion collapse could occur should RTP traffic rise to dominate Internet traffic. I was asked to cover AQM and Bufferbloat to help set context for the ensuing discussion. I covered the current status in brief and then added a bit of heresy. With a slight amount of forethought, we could arrange that someday real time media and AQM algorithms interact in novel ways. Detection (and preferably correct assignment of blame) is key to getting bufferbloat cleaned up.

In short, we’ve won the first battle for the hearts and minds of engineers who build the Internet, and the tools are present to build the weapons to solve bufferbloat; but the campaign to fix the Internet will be long and difficult.

The Bufferbloat Bandwidth Death March

May 23, 2012

Latency, much more than bandwidth, governs actual internet “speed”, as best expressed in written form by Stuart Cheshire’s It’s the Latency, Stupid rant and more formally in Latency and the Quest for Interactivity.

Speed != bandwidth, despite what an ISP’s marketing department will tell you. This misconception reaches all the way up to FCC Commissioner Julius Genachowski; it is common even among technologists who should know better, and is believed by the general public. You pick an airplane to fly across the ocean, rather than a ship, even though the capacity of the ship may be far higher.
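A back-of-the-envelope model makes the airplane-versus-ship point concrete. The sketch below is a deliberate simplification (a fixed number of protocol round trips plus the time to push the bytes through the link), and every number in it is hypothetical.

```python
def fetch_time_s(size_bytes, bandwidth_bps, rtt_s, round_trips):
    """Rough fetch time: protocol round trips plus serialization time."""
    return round_trips * rtt_s + (size_bytes * 8) / bandwidth_bps


# Hypothetical 100 KB object needing 4 round trips before it has fully arrived.
ship = fetch_time_s(100_000, 100e6, 0.200, 4)   # 100 Mbit/s link, 200 ms RTT
plane = fetch_time_s(100_000, 10e6, 0.020, 4)   # 10 Mbit/s link, 20 ms RTT

print(f"high bandwidth, high latency: {ship:.3f} s")
print(f"low bandwidth, low latency:   {plane:.3f} s")
```

Under these assumptions the link with one tenth of the bandwidth finishes several times sooner, because for small transfers the round trips dominate the total.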

(more…)

Apple Patents Portrait-Landscape Flipping: the patent system is broken…

July 18, 2011

I noticed with interest Slashdot’s article last week on Apple Patenting Portrait-Landscape flipping based on control of one or more accelerometers. As I work at Bell Labs these days, I don’t read patents, so I’ll just go on the summary I read there.

Here’s some prior art from June 2001. In that period, at Compaq/HP’s Cambridge Research Laboratory, we had ported Linux to the iPAQ handheld (with touch screen and expansion capability). Colleagues of mine, including Jamey Hicks, Andy Christian, Frank Bomba and Ben Curis, had built an expansion pack for the iPAQ called the BackPAQ (just as Apple has an “i” fetish, Compaq had a “paq” fetish and liked “i”s as well), with accelerometer, camera, and additional expansion capability including an additional battery, for our (and others’) research as part of “Project Mercury”; it was obvious that such devices would become standard in short order, but no device at the time had them integral. Quite a few BackPAQs were built (a small number of hundreds, if I remember correctly) and distributed to like-minded researchers around the world. We wrote some papers, demonstrated the code at the Usenix conference and elsewhere, and published all the code on handhelds.org (which seems down at the moment). It is an absolute certainty that Apple employees saw this device rotating its screen; not only did we show the BackPAQ off at numerous conferences, but we built significant numbers that were used at universities.

It was blindingly obvious to us that hooking up the accelerometer to be able to rotate the screen would be “a good idea”. Keith Packard and I wrote the xrandr X Window System extension specifically to support screen rotation for the iPAQ handheld, using his TinyX driver (the X extension then became a standard part of the X Window System releases in X.org). I wrote (in an hour or two) the first version of the xaccel daemon that took the accelerometer data and controlled the screen rotation. I first packaged it (in ipkg format, for the iPAQ Familiar Linux distribution) on June 11, 2001 to enable the code’s distribution. Ironically, I like what I remember of xaccel’s behaviour better than what I now see on the iPhone and the iPad I own.
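To give a flavour of how little is needed, here is a small sketch of one plausible heuristic of this kind. It is not the xaccel source; the function name, the axis convention, and the 0.3 g margin are all assumptions made for illustration.

```python
def next_orientation(current, ax, ay, margin=0.3):
    """Pick 'portrait' or 'landscape' from gravity readings, in units of g.

    ax is gravity along the screen's short edge, ay along its long edge
    (an assumed convention). The orientation changes only when one axis
    beats the other by `margin`, so a device held diagonally or lying
    flat keeps whatever orientation it already had.
    """
    if abs(ay) > abs(ax) + margin:
        return "portrait"
    if abs(ax) > abs(ay) + margin:
        return "landscape"
    return current


state = "portrait"
for ax, ay in [(0.1, -0.98), (0.7, -0.68), (0.95, 0.1), (0.0, 0.0)]:
    state = next_orientation(state, ax, ay)
    print(f"ax={ax:+.2f} ay={ay:+.2f} -> {state}")
```

The margin acts as hysteresis: without it, a hand-held device near the diagonal would flip back and forth on sensor noise.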

Since I can’t go reading Apple’s patent itself, I’ll just note:

  • This is a handheld device, with 802.11 wireless (later versions of the iPAQ became phones).
  • It has a touch screen
  • It has an accelerometer in the BackPAQ
  • It used the data from the accelerometer with simple heuristics to control the orientation (portrait or landscape) of the screen (in this case, running the X Window System)

Now, maybe you’d like to quibble and claim the idea of putting an accelerometer in a hand-held device is non-obvious. I think it was pretty obvious, myself, and the credit for doing that goes to the group working on Project Mercury. I don’t remember any patent being filed there. And having done so, it seemed obvious to hook it up to the screen. I know we did not file any patents. Are either of these ideas worth a patent? Personally, I think both ideas are pretty obvious, the first idea more original than the second.

But I’m sure the first handheld device with touch screen, with accelerometer, rotating the screen under control of that accelerometer was in my hand running my code below, sometime in the year 2000 or 2001 (I haven’t tried to excavate the exact date), and that it was widely published on the Internet and used by hundreds of people.

Since handhelds.org seems down at the moment, I spent 5 minutes digging around for the code itself elsewhere. It’s short enough I include it below (looks like the copyright notice got cut and pasted from the xrandr code); it was called xaccel.c, strangely.

Update 1:

Comments make it clear I fired before aiming carefully: the patent in question is apparently on multitouch gestural overrides to accelerometer screen flipping. If so, my apologies to Apple.

We have three problems here:

  1. Prior art, which may not apply to my example; certainly we did not have a multi-touch screen to play with and did not explore that area.
  2. Obviousness, which may be in the eye of the beholder, but certainly I’ve seen ideas which were non-obvious. The current broken patent system is encouraging filing of patents just for protection of every trivial idea, and to use as weapons against competitors, whether there is merit in them or not.
  3. The treble damages problem, which is why I did not go read the patent in the first place, and which stifles actual innovation (independent of whether you think software patents are a good or bad idea, being unable to know what is going on elsewhere defeats part of the original bargain of why patents were granted in the first place).

And I still like my algorithm better than what I experience on the iPad, which often flips the screen when I don’t want it to flip and cries out for overriding.

Update 2

Jaharks of CMU in a comment below notes that the Itsy folks did gesture based screen rotation on the Itsy. Quite a few Itsys (the Itsy being the spiritual predecessor to the iPAQ, to my knowledge the first handheld device to run Linux, and the inspiration/cause of our handhelds.org work) were built and distributed to universities, along with the source code.

(more…)

Rant warning: there is no single right answer for buffering, ever… (part 2)

July 9, 2011

It’s clear my previous post was ill formed. Let me clarify a bit.

  1. I’m really, really, really happy to see the work in Tomato USB w/reduced buffer bloat, as I am in the work going on to control buffering in DOCSIS (cable modems). Let me make this clear up front as somewhat of an apology to hechacker1. The enemy of the good is the perfect, and we can and should do what we can quickly to suffer less.
  2. Work to fix bufferbloat is going to be both a “do what we can right away” activity, as well as a long term fundamental redesign problem. I am deeply unhappy as to quite the depth of that redesign problem.

I’d been thinking of the buffer management problem as a two part problem: the OS level queuing and buffering has been divorced from the device drivers, and the question is how to better integrate the two. But hechacker1’s post made it clear it was yet more complex. Somehow, we need to get to a more intelligent unified view of queuing across all of these to handle the dynamic range found in today’s networks. Linux queue disciplines themselves may have independent buffering as well, as in this example. The integration problem is therefore deeper. I expect we’ll find similar issues elsewhere in other systems too.

Van Jacobson had warned me last fall of just how challenging the buffering problem was, and I had understood (part of) what he had told me; but it’s clear it has yet more dimensions than I had appreciated then. That’s my deep unhappiness.

Rant warning: there is no single right answer for buffering, ever…

July 6, 2011

I was just doing my usual google for bufferbloat, to see what’s going on out there….

I came across Tomato USB w/reduced buffer bloat; it left me deeply unhappy.

I’m unhappy as it’s missing a fundamental point to the bufferbloat mess: there is never any single right answer for buffering in a general purpose system. To give a simple example: on a loaded 802.11 network, even a single packet of buffering can be an issue and add significant latency (1 packet = 1500 bytes = 12000 bits = 12 ms/packet @ 1 Mbps); but when operating with that identical hardware at 802.11n speeds, we’ll need potentially many packets of buffering (and AQM to boot, to control background elephant flows that would otherwise fill those buffers). And bandwidth on wireless can change by an order of magnitude (or more) when you move a machine a few centimeters. So there isn’t a simple “tuning” point either; it’s dynamic.
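The arithmetic in that example generalizes: the worst-case delay a full buffer adds is simply its size in bits divided by the link rate. A short sketch follows; the 256-packet buffer and the 100 Mbit/s rate are made-up figures chosen only to show the spread.

```python
def queue_delay_ms(packets, link_bps, packet_bytes=1500):
    """Time in ms to drain `packets` full-size packets at `link_bps`."""
    return packets * packet_bytes * 8 * 1000 / link_bps


# One 1500-byte packet at 1 Mbit/s, as in the text above.
print(f"{queue_delay_ms(1, 1e6):.1f} ms")

# The same hypothetical 256-packet buffer at two very different rates.
for rate_bps in (1e6, 100e6):
    delay = queue_delay_ms(256, rate_bps)
    print(f"256 packets @ {rate_bps / 1e6:.0f} Mbit/s: {delay:.2f} ms")
```

A buffer that costs tens of milliseconds at the fast rate costs seconds at the slow one, which is why no fixed size can be right for a link whose rate varies this much.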

The buffer management problem here is really hard, and will be a real challenge to get right. While twisting knobs to mitigate bufferbloat may be helpful at times (particularly now, while buffering is often so grossly wrong), it’s not going to really solve this problem.

As a final aside to the above link, so long as the device drivers are radically over buffered, all our fancy QOS queueing and traffic shaping facilities are almost impossible to make work right (they can’t even be effective if the OS is dropping the packets into the device driver before noticing that the packets may need to be re-sorted); OS buffer management needs to be rethought end to end, and interact much better with today’s “smart” underlying hardware. It’s a really interesting, difficult problem that has been under the radar for many years, while the situation stewed.

All that being said, I’m happy to see people beginning to take action in ways that may help the home mess (which seems to be where bufferbloat is most serious).

Beware, there are multiple buffers!

April 19, 2011

Some people note that in my bufferbloat testing I set the transmit queue length (txqueuelen) to zero on Linux.

Note that this is *at best* a short term hack to reduce pain, and the wrong answer in general, and on some hardware will cause your system to go completely catatonic. Please, please, don’t just blindly go twisting knobs without understanding what you are doing…
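Looking is always safe, though. As a minimal, read-only sketch, Linux exposes each interface’s transmit queue length in sysfs as `/sys/class/net/<iface>/tx_queue_len`; the helper name and its `sysfs_root` parameter below are illustrative choices, and nothing here changes any setting.

```python
from pathlib import Path


def read_tx_queue_len(iface, sysfs_root="/sys/class/net"):
    """Return the txqueuelen of `iface` as reported by Linux sysfs."""
    path = Path(sysfs_root) / iface / "tx_queue_len"
    return int(path.read_text().strip())


# List every interface that reports a queue length (Linux only; on other
# systems the directory does not exist and nothing is printed).
root = Path("/sys/class/net")
if root.is_dir():
    for dev in sorted(root.iterdir()):
        try:
            print(f"{dev.name}: txqueuelen {read_tx_queue_len(dev.name)}")
        except (OSError, ValueError):
            pass  # not a network device entry, or value unreadable
```

Remember that this number describes only one of the buffers in the path; the driver rings and the hardware beneath it have their own.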

(more…)