For at least the last three months, I’ve been suffering from poor Internet service (intermittent high packet loss). When my service worked, it worked pretty well, but stretches of minutes to an hour or so would go by when I’d see high loss rates (30-60%) and suffer accordingly.
Grrrrrr…
I need to give this story a bit of context; I won’t bore you with the shaggy dog story (though somewhat interesting) of getting my cable Internet service to actually work after moving into this house. To skip this history, it suffices to say that Comcast installed a new cable to the house, but the contractor they hired goofed and pulled the wrong cable for the length of the run, so I was warned that I might have trouble sometime in the future as a result of the cable not meeting their engineering standards. Some sort of amplifier was installed to try to paper over the problem. This was done with my permission, as I didn’t want my lawn torn up again, nor to waste Comcast’s money if it wasn’t really necessary.
This phase of the story started last summer, when we were struck by lightning. A bunch of gear in my house was damaged, including the Comcast-provided cable modem, my router box, one of my ethernet switches, ethernet cards in several machines, the irrigation system, and so on. The cable modem was the most fun: its power supply literally blew up.
So off to the store I went, and over a period of a few weeks, I got everything working again as time permitted and additional damage was discovered.
So far so good. Almost all of my networking gear (with the exception of a single switch) was new at the end of this process.
But I had to go grab a little commercial firewall/router box for my point of presence. (I want a system I can just unplug and replug in case of problems when I’m traveling, so I have avoided putting in a little Linux box to do this, and I want to save the planet some joules…) With no working Internet service, I had little time to see what was out there, and just grabbed the first router box (a Linksys BEFSR41) I could find at the store I went to.
It turned out to lack a feature I want: disabling the network for particular MAC addresses at particular times of day (we got very tired of arguing with our sleep-deprived kids to go to bed, etc…). So I went online and ordered the follow-on to the D-Link box I had originally owned, which had that feature, to replace the one I bought in my emergency. I installed it, and all seemed well (except D-Link had removed the other feature I liked, the ability to configure alternate name servers; at times I’ve been unimpressed by theirs….).
Somewhat later I noticed that our network had become intermittent; most of the time, it would work great, but other times, it would become unusable, possibly related to the amount of traffic we were generating, but this was never clear.
So I started the service call cycle.
After some number of pokes over a couple of months, Comcast noticed they had a seriously unhappy customer (and one who generally knows what he’s talking about) and escalated the issue to the supervisor in charge of network problems for my town and several adjacent towns. After several more conversations and visits by him and technicians, we were coming to the conclusion that either the cable from our house or something inside the house might be at fault, and he had verified what I had been told after the cable was installed: that it was the incorrect grade of cable for the length of the run. Replacing the cable to the house would be expensive for Comcast, would do another number on my lawn, and would have to wait until the ground thawed, so neither he nor I was a happy camper.
But he had asked one of his technicians to monitor my cable modem, and after at least some tweaking, Comcast was seeing no further problems from their end….
Strange. I thought about a scenario where the cable modem box might handle monitoring traffic at a higher priority than the customer’s traffic, and still smelled a rat. But at the request of the nice Comcast supervisor, I repeated my earlier experiments. This time, I also noted that running a traceroute while pinging the cable modem could induce a high packet loss rate just to the cable modem. Aha! I could finally induce the problem more quickly, and do so before the packets ever went over the suspect cable. After an afternoon of heavy debugging, it was clear the problem was in the ethernet between the cable modem and the router (either could be at fault).
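For anyone wanting to repeat this kind of test, here is a minimal Python sketch of measuring loss to a single host by running the system `ping` and reading its summary line. It is not the exact procedure I used. It assumes a Unix-style `ping` that accepts `-c` and prints an “N% packet loss” summary (Windows uses different flags and wording), and the function names are made up for illustration.

```python
import re
import subprocess

# Matches the summary printed by common Unix ping implementations, e.g.
# "10 packets transmitted, 6 received, 40% packet loss, time 9012ms".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output: str) -> float:
    """Return the packet loss percentage reported in ping's summary."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


def measure_loss(host: str, count: int = 20) -> float:
    """Ping `host` `count` times and return the reported loss percentage.

    Assumes a Unix-style ping that understands the -c option.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when packets are lost
    )
    return parse_loss(result.stdout)
```

Calling `measure_loss` on the cable modem’s address once on a quiet network and once while a traceroute is running in another window gives two numbers to compare, which is roughly the experiment described above.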
So I switched back to the Linksys BEFSR41 router I bought after the lightning strike.
End of high packet loss problem, and the culprit appeared to be the D-Link EBR-2310, except…
An hour or so after resetting all the network gear (mine, and the cable modem), the ping to the cable modem would mysteriously stop without warning, which meant my traceroute test could not be run. But over a nearly two-week period, we saw no other network trouble.
At the suggestion of my helpful Comcast service supervisor, I tried resetting my router without resetting the rest of my network gear; the pings returned immediately, and the cycle repeated. One last call to him, and he suggested that since the cable modem was so inherently dumb (he said it has no filtering features), I should go looking for trouble in the router.
Turns out he was right; there is a “Security” feature in the BEFSR41, Block Anonymous Internet Requests, that was “on” for some reason. Quite bizarrely, when on, it seems to have accounted for my returning ICMP ping responses evaporating after a while. I would have thought it would have been all or nothing.
So at this point, I’m going to declare success, and my thanks to Comcast.
However, there were a couple of other things I learned that seem unfortunate:
- I was told that inside Comcast’s network, they deliberately throw away ping/traceroute sorts of traffic, and that there was no place I could ping to help test the cable down my hill. This made it MUCH harder to debug. Ideally, there should be some place inside Comcast’s network I could ping (very close to my local cable end); then, by noticing no difference in packet loss between pinging across the cable down the hill and just pinging the cable modem, I would not have remained stuck on the possibility/probability of it being the cable itself (which I certainly had reason to suspect, given the history and the knowledge that the cable is not up to Comcast’s engineering standards).
- The Comcast technician told me they are not allowed to ping beyond their cable modem due to regulatory restrictions (even though I wanted them to); being able to ping my router box would have made it much easier for them to help me determine the problem was on my side of the cable modem. But given the “feature” of the Linksys, it isn’t clear this would have helped.
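The hop-by-hop comparison described above can be sketched as a small Python function: given loss percentages for hosts ordered from nearest to farthest, report the nearest one that already shows loss. The 5% threshold and the function name are arbitrary choices for illustration. The function only compares numbers you supply, however you obtained them.

```python
from typing import Optional, Sequence, Tuple


def first_lossy_hop(
    hops: Sequence[Tuple[str, float]], threshold: float = 5.0
) -> Optional[str]:
    """Return the nearest hop whose loss exceeds `threshold` percent.

    `hops` must be ordered from nearest to farthest. If a near hop
    already shows loss, the links beyond it are not needed to explain
    the problem, so the search stops at the first offender.
    """
    for name, loss_pct in hops:
        if loss_pct > threshold:
            return name
    return None  # no hop exceeded the threshold
```

If loss already shows up at the cable modem, the fault lies on the customer side of it, and the cable down the hill is off the hook. Without a pingable host just past that cable, the list stops at the modem and the cable cannot be tested this way.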
Conclusions:
- It seems both network engineering and regulation are making debugging yet more difficult, and I have no clue how regular people would ever get to the bottom of a situation like this… A bit of sanity on both fronts (regulatory, and ISP network design) would be an improvement.
- it isn’t always the ISP’s fault…. My thanks to Dave Dumais of Comcast for having the patience to get to the bottom of this with me.
- my experience with D-Link gear is now pretty bad (I had trouble with a wireless device of theirs dying as well a couple of years ago). They also removed two features I wanted from the previous generation of their product (I naively bought the successor product expecting the same functionality), and didn’t even fix their help files on the new model. “Value engineering” at its worst. I think I will avoid them in the future.