Home Router Puzzle Piece One – Fun with your switch

Now that enough pieces have fallen into place to make actual predictions (and the quarry’s spoor has been noticed), I decided to perform more deliberate experiments to see if I could capture more of the mastermind’s henchmen. I did.

I’ll start posting the puzzle pieces I’ve discovered here regularly, so you can help convict the criminal and repair the damage they have caused. Today’s simple experiments involve only your router’s switch; we’ll do some experiments on the wireless side of the router next.

Conclusion: something stinks in operating systems’ network stacks. Linux is often the worst, with two different but related problems, followed by Mac OSX; Microsoft Windows manages to hide many of its problems, but demonstrably suffers too. After mitigation, Linux may be able to perform much better than either.

Experiment Setup

If your home router has a gigabit switch (a few do, these days), you’ll want to find a 100Mbps switch for this experiment. You may be able to achieve the same effect by using “ethtool” to set your Ethernet link speed to 100Mbps. I presume your machines all have gigabit network interfaces; most have for a while.
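If you go the ethtool route, one option is to keep autonegotiation on but advertise only 100Mbps full duplex, so the switch settles on the same mode; forcing the speed with autonegotiation off on only one end can leave the switch falling back to half duplex. The sketch below only builds that command. The interface name “eth0” and the helper name are assumptions; the advertise bitmasks are the full-duplex values listed in the ethtool man page.

```python
import shlex

# Full-duplex twisted-pair advertise bits, as listed in the ethtool man page.
ADVERTISE_FULL_DUPLEX = {10: 0x002, 100: 0x008, 1000: 0x020}


def limit_link_speed_cmd(iface: str, mbps: int = 100) -> list[str]:
    """Build an ethtool command that keeps autonegotiation on but advertises
    only `mbps` full duplex, so both ends negotiate the same mode."""
    if mbps not in ADVERTISE_FULL_DUPLEX:
        raise ValueError(f"unsupported speed: {mbps} Mbps")
    mask = format(ADVERTISE_FULL_DUPLEX[mbps], "#05x")  # e.g. "0x008"
    return ["ethtool", "-s", iface, "autoneg", "on", "advertise", mask]


if __name__ == "__main__":
    # Prints the command only; run it yourself as root.
    print(shlex.join(limit_link_speed_cmd("eth0", 100)))
```

To undo it afterwards, re-advertise the modes your NIC originally listed under “Advertised link modes” in the output of “ethtool eth0”.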

Connect your laptop directly to one of the switch’s Ethernet ports, and connect a second machine to another port to act as your server. In case one of your computers is wimpy, let’s use nttcp for our testing. The point here is to transfer data over the link as fast as it will go.

Install “nttcp”. Run “nttcp -i” on the machine you designate as your server.

Experiment 1a:

Run “nttcp -t -D -n2048000 server & ping -n server” on your laptop.

What do you observe after, say, 20 seconds? Is this what you would expect, given that a 1500-byte packet takes only about 0.12 milliseconds to transmit at 100Mbps?
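Here is that back-of-the-envelope arithmetic as a small Python sketch. It assumes every queued entry holds one full-size frame and counts 38 bytes of Ethernet framing on the wire (14-byte header, 4-byte frame check sequence, 8-byte preamble and 12-byte inter-frame gap) on top of the 1500-byte packet. Real queues hold a mix of packet sizes, so treat the results as rough estimates rather than predictions of what ping will show.

```python
# Bytes Ethernet adds on the wire around a 1500-byte IP packet:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_OVERHEAD_BYTES = 38


def frame_time_ms(packet_bytes: int, link_bps: float,
                  overhead: int = ETH_OVERHEAD_BYTES) -> float:
    """Time to clock one frame onto the wire, in milliseconds."""
    return (packet_bytes + overhead) * 8 / link_bps * 1000


def queue_delay_ms(queued_frames: int, packet_bytes: int, link_bps: float) -> float:
    """Extra wait for a packet that arrives behind `queued_frames` full frames."""
    return queued_frames * frame_time_ms(packet_bytes, link_bps)


if __name__ == "__main__":
    link = 100e6  # 100 Mbps
    print(f"one 1500-byte frame: {frame_time_ms(1500, link):.3f} ms")
    # Hypothetical backlog: a full 1000-packet txqueuelen plus a 256-entry ring.
    print(f"1000 + 256 frames queued: {queue_delay_ms(1000 + 256, 1500, link):.0f} ms")
```

Setting “link” to 10e6 gives the 10Mbps figures, which are ten times larger per frame.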

Experiment 1b:

Issue the command “ifconfig eth0” and look at the txqueuelen value. On my laptop, Linux sets it to 1000.
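On Linux the same value is also exposed in sysfs, which is handy if you want to log it from a script alongside your ping measurements. A minimal sketch, assuming the standard /sys/class/net/<interface>/tx_queue_len file; the “sysfs_root” parameter is a hypothetical addition so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_txqueuelen(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the interface's transmit queue length (ifconfig's txqueuelen)."""
    return int((Path(sysfs_root) / iface / "tx_queue_len").read_text().strip())


if __name__ == "__main__":
    print(read_txqueuelen("eth0"))  # assumes the interface is named eth0
```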

Is the latency constant, or variable, as you manipulate the txqueuelen parameter?
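To answer that with numbers rather than by eye, you can summarize the round-trip times ping reports. The sketch below pulls the time= field out of each ping line and reports min, mean, max and a jitter figure. Taking “jitter” to mean the population standard deviation of the RTTs is an assumption, since the term is not defined here. It reads ping output on standard input, so bound the run with ping’s -c option, e.g. “ping -n -c 20 server | python3 rttstats.py” while nttcp runs (“rttstats.py” being a hypothetical name for wherever you save the script).

```python
import re
import statistics
import sys

# Matches "time=0.215 ms" (Linux, OS X) and "time<1ms" (Windows).
# A Windows "time<1ms" is recorded as 1 ms, i.e. as an upper bound.
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def rtts_ms(lines):
    """Extract round-trip times, in milliseconds, from ping output lines."""
    samples = []
    for line in lines:
        match = RTT_RE.search(line)
        if match:
            samples.append(float(match.group(1)))
    return samples


def summarize(samples):
    """Min/mean/max RTT plus jitter, here the population standard deviation."""
    if not samples:
        raise ValueError("no RTT samples found")
    return {
        "min": min(samples),
        "mean": statistics.fmean(samples),
        "max": max(samples),
        "jitter": statistics.pstdev(samples),
    }


if __name__ == "__main__":
    print(summarize(rtts_ms(sys.stdin)))
```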

Set the txqueuelen parameter to half of its initial size (e.g. “ifconfig eth0 txqueuelen 500”). What happens to the observed latency?

What do you observe?  How does it differ from Experiment 1a?

Try playing with different values of txqueuelen while continuing to observe the ping latency. On most current hardware, you can set the txqueuelen to zero; on some older hardware, you may have problems if you do so.
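To step through several txqueuelen values systematically, a sketch like the following prints a plan: for each value, the ifconfig command that sets it, followed by a bounded ping to measure it. It only prints commands; you run them as root while the nttcp transfer is going. The list of values, the 20-probe count and the host name “server” are illustrative assumptions.

```python
import shlex


def txqueuelen_sweep(iface, values, server, count=20):
    """Return alternating 'set txqueuelen' and 'measure with ping' commands."""
    plan = []
    for value in values:
        if value < 0:
            raise ValueError(f"txqueuelen must be non-negative, got {value}")
        plan.append(["ifconfig", iface, "txqueuelen", str(value)])
        plan.append(["ping", "-n", "-c", str(count), server])
    return plan


if __name__ == "__main__":
    # From the default down to zero; adjust the list to taste.
    for cmd in txqueuelen_sweep("eth0", [1000, 500, 100, 25, 10, 5, 2, 0], "server"):
        print(shlex.join(cmd))
```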

Experiment 1c:

Install “ethtool” if you don’t already have it.

Set the txqueuelen to the minimum operating value (0 on my laptop) for this experiment.

Execute the command “ethtool -g eth0” and note the current hardware settings for your Ethernet interface. Not all device drivers support this option. On my laptop, the transmit ring size is 256 by default.
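The “ethtool -g” report has a “Pre-set maximums” block and a “Current hardware settings” block. If you want to record ring sizes alongside your measurements, a small parser like this sketch will do. It assumes the layout classic ethtool versions print; newer versions add extra fields, which it skips when their values are not plain numbers.

```python
import sys


def parse_ring_params(text: str) -> dict:
    """Parse `ethtool -g IFACE` output into {"max": {...}, "current": {...}}."""
    params = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():  # skip fields such as "n/a" or "off"
                params.setdefault(section, {})[key.strip()] = int(value)
    return params


if __name__ == "__main__":
    # Usage: ethtool -g eth0 | python3 this_script.py
    rings = parse_ring_params(sys.stdin.read())
    print("TX ring:", rings["current"]["TX"], "of max", rings["max"]["TX"])
```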

Run “nttcp -t -D -n2048000 server & ping -n server” on your laptop. What do you observe? Why?

Try playing with different values for the ring parameters (e.g. “ethtool -G eth0 tx 64”), and observe the ping latency. Your hardware will probably have a minimum ring size that you cannot go below; on my laptop, this is 64 entries.
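The same plan-printing approach works for the ring. “ethtool -g” reports only the pre-set maximum, not a minimum, so this sketch takes the minimum as a parameter (defaulting to 64, the value on the laptop used here) and skips sizes outside the range. The helper name and the list of sizes are illustrative assumptions.

```python
import shlex


def ring_sweep(iface, sizes, tx_max, tx_min=64):
    """Build `ethtool -G IFACE tx N` commands, skipping sizes outside
    [tx_min, tx_max]. tx_max comes from `ethtool -g`; tx_min must be
    supplied by you, since ethtool does not report a minimum."""
    commands = []
    for size in sizes:
        if tx_min <= size <= tx_max:
            commands.append(["ethtool", "-G", iface, "tx", str(size)])
    return commands


if __name__ == "__main__":
    for cmd in ring_sweep("eth0", [4096, 1024, 256, 128, 64, 32], tx_max=4096):
        print(shlex.join(cmd))
```

Some drivers round a requested size to one they support, so re-check with “ethtool -g” after each change.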

Is the latency constant, or variable?  Why?

Experiment 1d:

You can perform similar experiments on Mac OSX and Windows, both of which behave much better than Linux “out of the box” (though Linux beats OSX once its transmit queue is truncated). The details of the hardware matter here: if possible, use the same hardware, or hardware with the same Ethernet chip.

For extra credit, explain why Windows’ default behavior is so much better than either Linux’s or OSX’s on 100Mbps Ethernet. (Hint: try setting the link speed on the Windows machine to 10Mbps, and search the Windows technical notes about multimedia playback.) Do you now believe Microsoft’s explanations? Or, given these experiments, does a different explanation make more sense?

7 Responses to “Home Router Puzzle Piece One – Fun with your switch”

  1. Jer Says:

    For those of us that don’t have the equipment or an hour to blow, perhaps you could just tell us the point of the exercise?

  2. gettys Says:

    I don’t think you’d like the results I get from Linux by default.

    In tabular form:

    txqueuelen  tx ring  RTT           jitter
    1000        256      220ms         ~1-2ms  (default)

    0           64       2ms           ~1-2ms
    2           64       4ms           ~2ms
    5           64       12ms          ~4ms
    10          64       50ms          ~15ms
    25          64       132ms         ~40ms
    1000        64       220ms         ~1-2ms
    10000       64       221ms         ~1-2ms

    0           256      8ms           ~3ms

    0           4096     up to 150ms   ~50ms
    2           4096     up to 155ms   ~40ms
    5           4096     up to 170ms   ~50ms
    10          4096     up to 190ms   ~20ms
    25          4096     220ms         ~1-2ms
    50          4096     219ms         ~1-2ms
    100         4096     219ms         ~1-2ms
    1000        4096     218ms         ~1-2ms

    For Mac OSX, the observed RTT is 11ms, slightly worse than the best performance I can get by tuning Linux.

    For Windows, you see times that are 1ms or less. As to why, I don’t want to spoil all the fun here, quite yet: more over the next few days.

  3. Home Router Puzzle Piece Two – Fun with wireless « jg's Ramblings Says:

    […] my wireless NIC does not support the “-g” and “-G” options we explored in Experiment 1. So I cannot try reducing the transmit ring. If yours does, I encourage you to try twisting that […]

  4. Wicher Says:

    Thanks. Now my (networked) pulseaudio server doesn’t skip a beat anymore when I copy files over NFS. In Gentoo, add something like:

    ifup_eth0="ip link set \$int txqueuelen 0"
    

    to /etc/conf.d/net .

  5. Riccardo Giuntoli Says:

    Dear Jim Gettys, nice to meet you. I’m Riccardo Giuntoli, writing from Spain, a simple networking addict 🙂

    I’m doing all the tests in my work network lab, after reading all your posts [I was already very interested in network throughput and latency because I’m working in a wireless 802.11 testing environment].

    About the tests you posted here, my results are the same as yours (I’m now preparing a spreadsheet with all the tests and results to share with you and the readers of your blog), but I’ve got a question about txqueuelen, net.core.netdev_max_backlog and the controller’s ring queue.

    I was digging a bit into the meaning of those values, and I found that many researchers simply recommend INCREASING them to obtain the best performance on gigabit networks.

    Here is a link from cern.ch:

    http://datatag.web.cern.ch/datatag/howto/tcp.html

    and another one related to it:

    http://www.hep.ucl.ac.uk/~ytl/tcpip/linux/txqueuelen/datatag-tcp/

    What do you think about this?

    Best Regards.

    RG.

    • gettys Says:

      I plan to blog about this. Most of the HOWTOs I came across on the web 1) were obsolete, and probably don’t apply very much to current systems, 2) were going to be wrong for most people in the world, and 3) were written for supercomputer people, who have fat pipes over long delays.

      There is no right answer, I believe. We have to manage queues, and make this all work so that good performance doesn’t require everyone to become a tuning expert (if indeed they can tune even a fish); in the meantime, it would be nice if the only people who had to tune were the people with the fat-pipe, long-delay problem that so few people actually have.

  6. Buffer Bloat: Experiment 1 « He Who Conquers the Left Side Says:

    […] since I first saw Jim Gettys’ post on testing bufferbloat, I wanted in. But at the time I was living with roommates, and I didn’t feel like taking the […]
