Any network system with buffering shared among many users is much like a congested highway. We’ll call them
“big fat networks”. Two network technologies that show this problem are 802.11 (abgn) and 3g wireless. In 802.11, the buffers are distributed among the clients (and may also be in the access points and routers); in 3g, they may be in the clients and in the radio controllers they talk to, and possibly also in the backhaul networks.
You have suffered unusable networks at conferences. Wonder why no more. You can make your life less painful by mitigating your operating system’s and access point’s buffering.
Moral of the Story
Whether you call what we see on 802.11 and 3g networks “congestion collapse” as the 1980s NSFnet event was called (with high packet loss rates), or something different such as bufferbloat (exhibiting much lower, but still significant packet losses), the effect is the same: horrifyingly bad latency and the resulting application failures. Personally, I’m just as happy with “congestion collapse” as with bufferbloat.
The moral of the story is clear: when the network is running slowly, we need to minimize the amount of buffering to achieve anything like decent latencies on shared media. Yet when the network is unloaded, we want to fill a network pipe that may be a hundred megabits per second or more. On such a shared, variable-performance network, there is no single right answer for buffering. You cannot just “set it, and forget it”. Read on…
If you are familiar with network congestion 101 topics, skip this section.
What happens when the traffic exceeds the available bandwidth at a shared bottleneck in the network with excessive buffering?
Let’s examine the highway system. Some highways in congested areas have meters for both capacity and pacing reasons; but we all commonly suffer.
Once the highway’s capacity is filled the traffic jams get longer (the queues grow, and grow), unless you make provisions to avoid congestion. Once a bottleneck has reached capacity, adding more traffic makes arrival take longer, but also may make other intersecting highways (network links) back up. It takes longer and longer for you to get to work, or to home; but more capacity is the only real solution. Sometimes traffic jams can last hours, or all day, and only clear out at night. Or in the most extreme case, traffic jams go on for weeks. Traffic jam clearance time depends on the length of the queues and the output capacity. Building more highways takes time (lots of it).
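The claim that clearance time depends on the length of the queues and the output capacity can be put in a few lines of arithmetic. This is only a sketch under simplifying assumptions (constant rates, no drops); the function names and the example numbers are made up for illustration.

```python
def queue_after_overload(arrival_rate, capacity, duration):
    """Backlog left after `duration` time units of traffic arriving at
    `arrival_rate` into a bottleneck that serves `capacity` per time unit."""
    return max(0, (arrival_rate - capacity) * duration)


def drain_time(backlog, capacity, arrival_rate=0):
    """Time needed to clear `backlog` while traffic keeps arriving at
    `arrival_rate`. Returns infinity if there is no spare capacity."""
    spare = capacity - arrival_rate
    if spare <= 0:
        return float("inf")
    return backlog / spare


if __name__ == "__main__":
    # Hypothetical rush hour: 120 cars/minute try to use a road that
    # carries 100 cars/minute, for 60 minutes.
    backlog = queue_after_overload(120, 100, 60)
    print("cars queued:", backlog)
    # Afterwards demand falls to 80 cars/minute, leaving 20 cars/minute spare.
    print("minutes to clear:", drain_time(backlog, 100, 80))
```

Note that the jam only starts to clear once arrivals fall below capacity; with demand equal to capacity the backlog never drains, which is the "lasts all day and only clears at night" case.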
Preventing a car from entering such a highway is often better than trying to deal with the ensuing mess. Sometimes a ramp meter’s purpose is to avoid “clumps” of traffic, which smooths the flow (and avoids bursts of traffic arriving at intermediate intersections where they may wreak havoc with other traffic flows).
Timely arrival is important: if you miss a deadline, all the effort to drive to a destination is for naught. Not only has your trip been wasted, but you prevented someone else’s trip, effectively doubling the loss. Better that traffic not start out at all, than take forever in transit, and the ensuing waste.
The Internet today has a single way to signal congestion: dropping a packet (like throwing away a car); you notice the network may be congested simply because the packet (the car) never arrives and you notice the next car in sequence that set out does arrive. Rather than send another car into the mess, you wait a while until the traffic clears before sending the next car out (or that is the way it is supposed to work).
Another signalling mechanism exists: ECN (Explicit Congestion Notification). You can think of ECN as asking cars (packets), when they go through a congested intersection, to carry a note to their destination saying “I passed through a Malfunction Junction”, so you should wait a while before setting out again. This may avoid having to drop (as many) packets.
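To make the “note” concrete: ECN uses the two low-order bits of the IP TOS / traffic class byte, with the codepoints defined in RFC 3168. The sketch below shows the choice a congested queue faces for each packet. The codepoint values are from the RFC; the function name and the "drop"/"mark" return convention are invented for this illustration, and a real router makes this decision inside its queue management code, not per byte like this.

```python
# ECN codepoints from RFC 3168 (low two bits of the TOS / traffic class byte).
NOT_ECT = 0b00  # sender does not support ECN
ECT_1 = 0b01    # ECN-capable transport
ECT_0 = 0b10    # ECN-capable transport
CE = 0b11       # congestion experienced
ECN_MASK = 0b11


def signal_congestion(tos_byte):
    """Return (action, new_tos_byte) for a packet hitting a congested queue.

    Packets from ECN-capable senders are marked instead of dropped;
    everything else can only be told about congestion by being dropped.
    """
    if tos_byte & ECN_MASK == NOT_ECT:
        return "drop", tos_byte
    return "mark", tos_byte | CE


if __name__ == "__main__":
    print(signal_congestion(0x00))  # not ECN-capable
    print(signal_congestion(0x02))  # ECT(0)
```

The marked packet still arrives, so the receiver can pass the signal back to the sender without anything having been thrown away.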
By our extreme attempt to avoid ever dropping a packet, not only have we now built highways without congestion avoidance or signalling, we’ve built huge parking lots on the highways to hold yet more traffic, delaying the traffic’s arrival yet longer. In the extreme, the packets “time out”, and are discarded (abandoned cars, or food that spoils in transit). They arrive too late for their goods to be useful (the perishable cargo in the food trucks rots). We’d really like all the cargo the highway can carry to get there quickly, not have lots of bits rot and have to be discarded. The large buffers (parking lots) we have built lead everyone to think the highway is clear, and everyone piles onto the highway anyway, making the problem much worse. We are guaranteed to “fill the parking lots” we’ve been building. We’ve gone well beyond the 1990s by inserting such large buffers: we’ve destroyed the Internet’s congestion avoidance algorithms with bufferbloat; the very attempt to avoid losing packets has caused more packet loss, and some of the data arrives too late to be useful, either due to human impatience, or due to failures these long delays induce on higher level protocols.
There are at least three places where buffering may occur:
- any transmit queue in the OS (of the host or the nearby router) used for classification or other buffering: e.g. the transmit queue in Linux.
- the device driver and its transmit rings. (often hundreds of entries in size on many of today’s device drivers)
- sometimes the device itself may have additional buffering; examples include smart NIC’s like the Marvell wireless device we used on OLPC (which buffers 4 packets)
A fourth (so far unconfirmed) possibility may be in the buses and bus class drivers that connect the network interfaces to the CPU.
A simple concrete optimal example of such a busy network might be 25 802.11 nodes, each with a single packet buffer (no transmit ring, no device hardware buffer), trying to transmit to an access point. Some nodes are far away, and the AP adapts down to, say, 2Mbps. This is common. You therefore have 25 * 1500 bytes of buffering; at 2Mbps this is at least 0.15 seconds, excluding any overhead, if everything goes well; the buffers on the different machines have “aggregated” behavior. This is the optimal case for such a busy network. Even an 802.11g network with everyone running full speed will only be about 10 times better than this.
A simple less optimal example: OLPC’s have 4 packets of buffering in their wireless device; and the device driver has a fifth packet buffer as well to ease locking design in the driver. Even if OLPC eliminates the Linux transmit queue entirely and solely uses driver/NIC buffering, it will have 5 packets of buffering, so any new packet will suffer a minimum of 0.75 seconds on such a busy network of OLPC’s. Even if all machines are operating at 10Mbps, each machine will still suffer almost two hundred milliseconds latency.
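The arithmetic behind these two examples is just total buffered bits divided by link rate. Here is a small sketch; the function name is mine, and it counts only raw payload time, ignoring 802.11 framing, contention and retransmission overhead, so real latencies will be worse. That overhead is presumably why the 10Mbps case above is quoted as “almost two hundred milliseconds” while the raw figure is 0.15 seconds.

```python
PACKET_BYTES = 1500  # full-size packets, as assumed in the text


def aggregate_delay_seconds(nodes, packets_per_node, link_bps,
                            packet_bytes=PACKET_BYTES):
    """Time to transmit everything buffered across all nodes on a shared
    link, counting payload bits only (no protocol or radio overhead)."""
    buffered_bits = nodes * packets_per_node * packet_bytes * 8
    return buffered_bits / link_bps


if __name__ == "__main__":
    # 25 nodes, one packet each, link adapted down to 2Mbps.
    print(aggregate_delay_seconds(25, 1, 2_000_000))
    # 25 OLPC-like nodes with 5 packets of driver/NIC buffering each.
    print(aggregate_delay_seconds(25, 5, 2_000_000))
    # The same 5-packet nodes at 10Mbps.
    print(aggregate_delay_seconds(25, 5, 10_000_000))
```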
What happens if:
- You buffer 20 packets on each node? or 200? (this laptop’s driver buffers ~250 packets, and provides no way to reduce the buffering).
- You keep trying to retransmit packets in the name of “reliability”? (some wireless network interface devices are known to try to transmit up to 255 times; 8 times is common)
- And, in the name of “reliability”, any inherently unreliable multicast/broadcast traffic drops the radio bandwidth to minimum, as many access points do?
- You then try to run WDS or 802.11s, which both forward packets and/or respond to any multicast (e.g. ARP) with routing messages?
- Your OS and/or wireless router buffers up to 1000 packets?
If you do the math for many of these cases, you quickly exceed both human patience (and when were 10-year-olds patient?) and the timeouts in higher level protocols. Just like the highway, the traffic will move just fine until it begins to back up. But your transmissions can block others’ transmissions, so the other guy’s queues grow, not just your own (which may also grow). Buffering beyond the minimum required can be a recipe for congestion collapse and complete failure of protocols built on such a shared media network, whether based on TCP or other transports. It isn’t pretty. I’ll blog separately about the mayhem I believe ensued, though in OLPC’s case, I believe we were clever enough to compound the bufferbloat problem with additional mistakes.
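Doing the math for a few of the “what if” cases looks roughly like this. It is a sketch, not a model of real 802.11 behavior: it reuses the 25-node, 2Mbps scenario, counts payload bits only, and the `attempts` parameter pessimistically assumes every packet needs that many transmissions. The function name is invented for the illustration.

```python
PACKET_BITS = 1500 * 8  # full-size packets, as in the earlier examples


def worst_case_wait(nodes, packets_per_node, link_bps, attempts=1):
    """Seconds for everything already queued to be sent if every packet
    needs `attempts` transmissions (a deliberately pessimistic assumption)."""
    total_bits = nodes * packets_per_node * attempts * PACKET_BITS
    return total_bits / link_bps


if __name__ == "__main__":
    for depth in (20, 200, 250, 1000):
        seconds = worst_case_wait(25, depth, 2_000_000)
        print(f"{depth:5d} packets/node: {seconds:6.1f} s")
    # 20 packets per node with 8 transmission attempts per packet.
    print(worst_case_wait(25, 20, 2_000_000, attempts=8))
```

Even the 20-packet case comes to 3 seconds of queueing, and 1000 packets per node comes to 150 seconds, far beyond what either people or most protocol timeouts will tolerate.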
You have suffered unusable networks at conferences. Wonder why no more. You can make your life less painful by mitigating your operating system’s and possibly access point’s buffering. Note that for optimal results, both the end nodes and the routers need mitigation.
3G Network Bufferbloat
Please forgive me for any inaccuracies in the following explanation, relying on year-old memories of conversations with implementors of these systems.
In the 3G radio systems, the error rate of the radio channel can be high enough that were the IP packets to be transported as a single packet, a significant fraction of 1500 byte packets would be lost and the efficiency of the system could be low. These systems were, by and large, designed before data traffic was important. The systems were therefore engineered to fragment the IP packets, and perform error detection, retransmission and reassembly of damaged packet segments into complete packets in the radio systems. How is this done? Well, by “buffering” the packet fragments of course!
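A rough calculation shows why fragmenting helps on a noisy channel. The sketch assumes independent bit errors, which real radio channels do not have (errors come in bursts), and both the bit error rate and the 40 byte fragment size are hypothetical values picked for illustration, not figures from any 3G specification.

```python
def loss_probability(size_bytes, bit_error_rate):
    """Chance that a unit of `size_bytes` contains at least one bit error,
    assuming every bit fails independently with `bit_error_rate`."""
    return 1.0 - (1.0 - bit_error_rate) ** (size_bytes * 8)


if __name__ == "__main__":
    ber = 1e-4  # hypothetical channel bit error rate
    print("1500 byte packet damaged:", round(loss_probability(1500, ber), 3))
    print("  40 byte fragment damaged:", round(loss_probability(40, ber), 3))
```

Under these assumptions most whole packets would be damaged, while only a few percent of the small fragments are, so retransmitting just the damaged fragments is far cheaper. The cost is that the rest of the packet has to sit in a buffer while waiting for them.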
In September of 2009, Dave Reed reported very long RTT’s with low packet loss on 3g networks on the end-to-end interest mailing list. I’ve observed RTT’s of order 6 seconds on several different operators’ 3g networks; Dave reported seeing up to 30 second RTT’s. These RTT’s are so long that many operations “time out”, by boredom (and extreme frustration) of the user. You see terrible latency during the day in some geographic areas (presumably those which have insufficient capacity). At some time late at night the congestion clears and RTT’s drop to something sane (a hundred milliseconds or even less), just to repeat the next day. Dave was exactly correct: I have been able to confirm that many/most/all 3g systems have bufferbloat. As in the DSL and Cable case, telephony is independently provisioned from data, so you don’t have problems with carrier provided telephony; but you can currently give up trying to use VOIP over these data services anytime the systems are congested, unless you enjoy talking to people further away than the moon.
When the area served by an RNC is busy, you may have to wait a long time for your turn to try to retransmit the damaged packet fragment (or the RNC to retransmit it to you); so many fractions of packets to/from you and many other users may be buffered awaiting one or more sub packets for completion. Again, by never signalling congestion, the end-points never back off, and all available buffers will fill. The buffers will stay full until sometime that night when the load finally allows the buffers to empty. Similarly, the 3g devices themselves are performing a similar dance to the RNC’s, and have similar problems with buffering.
- We can drop some packets in a timely fashion. Arguably, we’ll end up dropping fewer packets (I know there are times I just give up with my smartphone; whatever TCP transfers I had in progress become orphaned). If TCP’s behavior over these links is even vaguely similar to what I see in cable, the buffering is already inducing much higher actual packet loss rates (measured by packets actually useful to the user) than would normally be required for proper congestion avoidance. Maybe someone would like to take some data and confirm this hypothesis?
- Having worked very hard to transport the bits, the radio guys are very reluctant to ever throw a packet away. ECN may allow us to usually have our cake and eat it too by signalling congestion when the RNC’s are busy. Steve Bauer, who works with Dave Clark at MIT, is currently researching whether ECN may be usable. Early results from Steve sound encouraging.
- Other mechanisms to dynamically manage the queue sizes are also possible.
But classic RED and friends won’t work in this case; the bandwidth is too variable, and the traffic too dynamic for RED tuning to be stable.
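One way to picture what “dynamically managing the queue size” could mean is to cap the queue by time rather than by packet count, and recompute the cap as the measured rate changes. The sketch below is only an illustration of that idea, not a mechanism proposed in this post and not a substitute for real queue management; the function name and the 20 millisecond target are hypothetical, and measuring the current rate on a shared radio link is itself a hard problem that the sketch ignores.

```python
def queue_limit_packets(measured_bps, target_delay_s,
                        packet_bytes=1500, minimum=1):
    """Packets that can be queued without exceeding `target_delay_s` of
    queueing delay at the currently measured link rate (at least `minimum`)."""
    packets = int(measured_bps * target_delay_s / (packet_bytes * 8))
    return max(minimum, packets)


if __name__ == "__main__":
    target = 0.02  # hypothetical 20 ms queueing budget
    for rate in (100_000_000, 2_000_000, 100_000):
        print(rate, "bps ->", queue_limit_packets(rate, target), "packets")
```

The same delay budget allows well over a hundred packets on an unloaded fast link and only a handful when the link has adapted down, which is why no single fixed buffer size can be right.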
One final warning: when people say “3g Networks”, you must consider them as large, complex systems: bufferbloat may also be hiding in places other than the RNC’s and smartphones; the backhaul networks may be failing to operate with any AQM enabled, for example. Look, and measure; take nothing on faith.