Comcast and debugging….

For the last several months (at least three), I’ve been suffering with poor Internet service (intermittent high packet loss rates).  When my service has worked, it has worked pretty well, but stretches of minutes to an hour or so would go by when I’d see high loss rates (30-60%) and suffer accordingly.

Grrrrrr…

I need to give this story a bit of context; I won’t bore you with the shaggy dog story (though somewhat  interesting) of getting my cable Internet service to actually work after moving into this house.  To skip this history, it suffices to say that Comcast installed a new cable to the house, but the contractor they hired goofed and pulled the wrong cable for the length of the run, so I was warned that I might have trouble sometime in the future as a result of the cable not meeting their engineering standards.  Some sort of amplifier was installed to try to paper over the problem.  This was done with my permission, as I didn’t want my lawn torn up again, nor to waste Comcast’s money if it wasn’t really necessary.

This phase of the story started last summer, when we were struck by lightning.  A bunch of my gear in my house was damaged, including the Comcast provided cable modem, my router box, one of my ethernet switches, a couple ethernet cards in several machines, the irrigation system, and so on.  The cable modem was the most fun: its power supply literally blew up.

So off to the store I went, and over a period of a few weeks, I got everything working again as time permitted and additional damage was discovered.

So far so good.  Almost all of my networking gear (with the exception of a single switch) was new at the end of this process.

But I had had to go grab a little commercial firewall/router box for my point of presence (I want a system I can just unplug and replug in case of problems when I’m traveling, so I have avoided setting up a little Linux box to do this, and I want to save the planet some joules…).  With no working Internet service, I had little time to see what was out there, and just grabbed the first router box (a Linksys BEFSR41) I could find at the store I went to.

It turned out to lack a feature I want: the ability to disable the network for particular MAC addresses at particular times of day (we got very tired of arguing with our sleep deprived kids to go to bed, etc…).  So I went online and ordered the follow-on to the dlink box I had originally had, which had that feature, to replace the one I bought in my emergency.  I installed it, and all seemed well (except dlink had removed the other feature I liked: the ability to configure alternate name servers: at times I’ve been unimpressed by theirs….).

Somewhat later I noticed that our network had become intermittent; most of the time, it would work great, but other times, it would become unusable, possibly related to the amount of traffic we were generating, but this was never clear.

So I started the service call cycle.

After some number of pokes over a couple of months, Comcast noticed they had a seriously unhappy customer (and one who generally knows what he’s talking about) and escalated the issue to the supervisor in charge of network problems for my town and several adjacent towns.  After several more conversations and visits by him and technicians, we were coming to the conclusion that either the cable from our house or something inside the house might be at fault, and he had verified what I had been told after the cable had been installed: that it was the incorrect grade of cable given the run.  Replacing the cable to the house would be expensive for Comcast, do another number on my lawn, and have to wait until the ground thaws, so neither he nor I was a happy camper.

But he had asked one of his technicians to monitor my cable modem, and after at least some tweaking, Comcast was seeing no further problems from their end….

Strange.  I thought about a scenario where the cable modem box might handle monitoring traffic at higher priority than customers’ traffic, and still smelled a rat.  But at the request of the nice Comcast supervisor, I repeated my earlier experiments.  This time, I also noted that running a traceroute while pinging the cable modem could induce a high packet loss rate just to the cable modem.  Aha!  I could finally induce the problem more quickly, and do so before the packets ever went over the suspect cable.  After an afternoon of heavy debugging, it was clear the problem was in the ethernet between the cable modem and the router (either could be at fault).
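
For anyone wanting to repeat this sort of test, here is a minimal sketch (not the exact procedure I used) of how one might measure the loss rate to a nearby device while some other traffic, such as a traceroute, is running. It assumes a Unix-like `ping` that accepts `-c` and prints the usual "packets transmitted ... received" summary line; the function names are made up for illustration.

```python
import re
import subprocess

# Matches the summary line printed by common ping implementations, e.g.
#   "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"
#   "10 packets transmitted, 7 packets received, 30.0% packet loss"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output: str) -> float:
    """Compute the packet loss percentage from a ping summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


def measure_loss(host: str, count: int = 20) -> float:
    """Run the system ping against host and return the loss percentage.

    Assumption: a Unix-like ping that understands the -c (count) option.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero when packets are lost
    )
    return loss_percent(result.stdout)
```

Calling `measure_loss` on the cable modem's address (many modems answer on 192.168.100.1, but that is an assumption to check for your own gear) once with the network idle and once with a traceroute running in another terminal should show whether the extra traffic provokes the loss.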

So I switched back to the Linksys BEFSR41 router I bought after the lightning strike.

End of high packet loss problem, and the culprit appeared to be the DLink EBR 2310, except…

An hour or so after resetting all the network gear (mine, and the cable modem), the ping to the cable modem would mysteriously stop without warning, which meant my traceroute test could not be run.  But over a nearly two week period, we saw no other network trouble.

At the suggestion of my helpful Comcast service supervisor, I tried resetting my router without resetting the rest of my network gear; the pings returned immediately, and the cycle repeated.  One last call to him, and he suggested that since the cable modem was so inherently dumb, I should go looking for trouble in the router, as he said the cable modem has no filtering features.

Turns out he was right; there is a “Security” feature in the BEFSR41:  Block Anonymous Internet Requests, that was “on” for some reason; quite bizarrely, if on, it seems to have accounted for my returning ICMP ping responses evaporating after a while.  I would have thought it would have been all or nothing.

So at this point, I’m going to declare success, and my thanks to Comcast.

However, there were a couple of other things that I learned that seem unfortunate:

  • I was told inside of Comcast’s network, they deliberately throw away ping/traceroute sorts of information, and that there was no place I could ping to help test the cable down my hill.  This made it MUCH harder to debug: ideally, there should be some way I could ping some place inside of Comcast’s network (ideally, very close to my local cable end); then, by noticing no difference in the packet loss problem going down the cable down the hill versus just pinging the cable modem, I would not have remained stuck on the possibility/probability of it being the cable itself  (which I certainly had reason to suspect, given the history and knowledge that the cable itself is not up to Comcast’s engineering standards).
  • The Comcast technician told me they are not allowed to ping beyond their cable modem due to regulatory restrictions (even though I wanted them to); being able to ping my router box would have made it much easier for them to help me determine the problem was on my side of the cable modem.  But given the “feature” of the Linksys, it isn’t clear this would have helped.
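
The reasoning in the first bullet amounts to comparing loss rates at successive points along the path and blaming the nearest point where loss first appears. A small sketch of that logic follows; the hop names, the sample numbers, and the 5% threshold are all hypothetical, not measurements from my network.

```python
from typing import List, Optional, Tuple


def first_lossy_hop(
    loss_by_hop: List[Tuple[str, float]], threshold: float = 5.0
) -> Optional[str]:
    """Return the nearest hop whose loss exceeds threshold, or None.

    loss_by_hop must be ordered from nearest to farthest. The threshold
    (in percent) is an arbitrary illustrative choice.
    """
    for name, loss in loss_by_hop:
        if loss > threshold:
            return name
    return None


# Hypothetical readings: loss already appears at the cable modem, so the
# buried cable beyond it cannot be the first point of failure.
readings = [("router", 0.0), ("cable modem", 45.0), ("ISP node", 47.0)]
suspect = first_lossy_hop(readings)
```

With a pingable target just past the far end of the cable, a reading that showed no extra loss beyond the modem would have cleared the cable much sooner.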

Conclusions:

  1. It seems both network engineering and regulation are making debugging yet more difficult, and I have no clue how regular people would ever get to the bottom of a situation like this…  A bit of sanity on both fronts (regulatory, and ISP network design) would be an improvement.
  2. it isn’t always the ISP’s fault….  My thanks to Dave Dumais of Comcast for having the patience to get to the bottom of this with me.
  3. my experience with dlink gear is now pretty bad (I had trouble with a wireless device of theirs dying as well a couple years ago).  They also removed two features I wanted from the previous generation of their product (I naively bought the successor product expecting the same functionality), and didn’t even fix their help files on the new model. “Value engineering” at its worst. I think I will avoid them in the future.

18 Responses to “Comcast and debugging….”

  1. Russ Says:

    I know you want to save some joules, but it’s just not worth the hassle. Get a low end embedded fanless system, such as a VIA EPIA based system, and put shorewall or some other network framework on it. It’s so much more functional. I get ssh, I get openvpn, I get a Bluetooth Piconet, I get both a WPA wireless network that is bridged with my ethernet network, and a non-encrypted public WLAN with limited bandwidth and access (eg, no smtp), etc, etc, etc.

  2. gettys Says:

    I just want something that works; I don’t want to have to configure it….. I really don’t want yet another system to take care of..

  3. Jon Smirl Says:

    Aren’t you in a FIOS area?

    Load dd-wrt onto your router. http://www.dd-wrt.com
    It has the MAC filter by time of day. Most common router hardware is supported unless it is really low end. I think I paid $12 for my current router. I lose one to lightning every couple of years.

    I have been using dd-wrt 6-7 years without a problem. The current version tracks your bandwidth usage to tell if you are getting near the 250GB traffic/month cap.

    • gettys Says:

      No FIOS here in Carlisle. In fact, before about 5 years ago, when Comcast redid the cable system, there was nothing…. And Verizon will not admit to any schedule for FIOS.

    • Martouf Says:

      I own a pair of Buffalo Tech WZR-HP-G300N which arrived with DD-WRT v24SP2-EU-US (08/19/10) std (SVN revision 14998) installed (linux kernel 2.6.24).

      I found the default txqueuelen of 1000 on its interfaces and suffered from all sorts of network “badness” among the household of game systems (sony and nintendo), NAS device, WDS link between WZRs and personal systems.

      I have had good success by storing the following commands in nvram rc_startup:
      ifconfig ath0.1 txqueuelen 8
      ifconfig ath0 txqueuelen 8
      ifconfig wifi0 txqueuelen 8
      ifconfig eth0 txqueuelen 20
      ifconfig eth1 txqueuelen 20

  4. sdf Says:

    or just get a wrt54gl and put one of the many open firmwares on it (dd-wrt etc)

  5. Mark Tearle Says:

    I am amazed that Comcast didn’t bother using conduit for your lead-in. Almost all lead-ins in Australia run through conduit.

    • gettys Says:

      As I said, Comcast subcontracted replacing my cable when I moved in: the contractor screwed up and put in the wrong cable for the length run. The supervisor of the cable plant for the area was very unhappy when he saw what was done. So, I could have probably gotten them to replace it again, and torn up the lawn again to do so, but so far, the cable has not been the problem….

  6. Maxo Says:

    Whenever I have Internet problems, the first thing I do is plug my laptop directly into the modem and test. If I still have problems I swap out the Ethernet cable, and if I still have issues I call my ISP and let them know I’ve tested multiple computers (I have more than one laptop in the house) and multiple Ethernet cables, all directly into the modem.

    • gettys Says:

      I did do such testing (direct to the cable modem); but remember, I had an intermittent problem: the net would work well for hours or days before returning to plague us. So it wasn’t an open and shut case, that would be easy to test for: it was only very late in the process that I figured out I could provoke the problem reasonably quickly by a traceroute and have a chance of debugging it in any sane fashion, and then I ran into the router throwing away the ping packets.

      And early on, there was a signal strength problem on the cable itself that was resolved by Comcast. Whether this was contributing to the problem is hard to say at this point.


  7. Maxo Says:

    Yeah, intermittent problems are always a PITA for everyone involved.

  8. sxpert Says:

    We tend to use lots of PCengines Alix and soekris 5501 at customers’ locations for a whole lot of things, including routing, voip, and even email server (the soekris’ case has space for a hard drive).
    The design is so that you install the OS (say, debian, or whatever else) in a CF card. They need about 5Wh or so, so in the same ballpark as those routers you’ve been using (I have some running on solar in isolated locations).

    • gettys Says:

      Yeah, I’ve considered such options. But part of my job is to sort of keep track of what the commercial state of the art is, so I am a bit loath to just throw a Linux box in.

      And I travel about 1/4 time: I’d like my wife to be able to swap out network giblets if I’m out of town.

  9. sxpert Says:

    as for the cable, did they direct bury it ??
    over here, code says a conduit must be buried and the cable pulled in

  10. KD Says:

    I have no specific knowledge of the regulations involved, but on general principles, I would not accept without proof the Comcast claim that they are prevented from pinging outside their network by regulation. That is the sort of excuse that large organizations sometimes use to justify not doing something that they do not want to do for their own reasons, but do not have a justification that the customer will easily accept. Blaming it on the government usually allows them to trick the customer into dropping the matter. So I am always suspicious of claims such as that.

    In this case, the claim might be true, but before you accept Comcast’s word about it and blame the regulators for contributing to making the sort of problem you had hard to debug, I think it would be prudent to find out exactly what regulation prevents Comcast from pinging customer equipment.

    I bring this up not particularly to bash Comcast — as I said, this is a practice many large organizations sometimes employ as part of their customer disservice. I bring it up only in the interest of properly identifying the source of the problems that make debugging harder.

    • gettys Says:

      Indeed. At a later date I met with a very senior Comcast technical management person, as described in the blog postings on bufferbloat, and he believes the field support chain is confused on this point, particularly when it is a customer with a support problem asking for diagnosis beyond the cable box. He thought they should have been able to monitor my router at my request. He said that he’d try to straighten out the field. I’ll try to remember to ask him when I see him next if that has happened.

      I do think that in general, pinging into someone else’s network is not good netiquette, and is often taken to be a hostile attack.

      Of course, I now think much of my problems were due to bufferbloat: and that saga is unfolding in this blog. I certainly know that many of the problems I’ve had are not Comcast’s fault, since bufferbloat is not only in the broadband gear, but in my home router and computers too.
