Wednesday, November 4, 2009

Don't blame the customer

Don't blame the customer when there's a better chance it's the company's problem, especially when confronted with information that demonstrates it's not the customer. Like this is new? But in our age of technology, communications and the Internet, it's very easy to always just blame the computer and its owner.

That's what my Internet Service Provider (ISP) has done with some recent problems. Since last Wednesday, the start of the World Series, and continuing into last night (election night), the ISP's servers and routers have not been working properly, often dropping connections or producing long lags in responses, and sometimes simply failing to finish loading a Web page, including pages from my own Website on their servers.

So, as a customer, I did what I could do: reboot and test everything on my end, the computer and the modem. Then I monitored the connection through the day and especially through the evening, and true to form, the problems happened. They were intermittent and insignificant during the day, and then after 5-6 pm far more often a problem than not, with short working periods in between longer periods of minor to major problems.

And so I started monitoring the connections to my Website, comparing load times from before to now and using traceroute to see where the packets were lagging, hanging up, or simply disappearing, and again true to form, it's their equipment. On a good day the traceroute looks like this:

Traceroute has started ...

traceroute to (, 64 hops max, 40 byte packets
1 ( 2.674 ms 0.393 ms 0.335 ms
2 ( 11.873 ms 11.100 ms 11.955 ms
3 ( 13.953 ms 11.398 ms 11.920 ms
4 ( 13.963 ms 13.292 ms 13.971 ms
5 ( 15.867 ms 15.698 ms 16.011 ms
6 ( 15.968 ms 15.733 ms 15.957 ms

This means the packets are leaving my machine, passing through their servers and reaching my Website, and the whole trace finishes in about 2-3 seconds.

On a not-so-good day, even this morning at 6 am, the traceroute looks like this:

traceroute to (, 64 hops max, 40 byte packets
1 ( 0.819 ms 0.486 ms 0.351 ms
2 ( 14.153 ms 11.605 ms 11.956 ms
- see note on time gaps to step 3
3 ( 11.989 ms 11.813 ms 15.802 ms
4 * ( 14.129 ms 16.637 ms
5 ( 15.200 ms 15.681 ms 15.956 ms
6 ( 15.990 ms 17.795 ms 15.987 ms

All told, 22 seconds, with an 8-second gap between hops 2 and 3.

Yes, I timed it. When it's busy in the evening, this multiplies several times over, and sometimes the page never finishes loading.

And on a really bad day:

Traceroute has started ...

traceroute to (, 64 hops max, 40 byte packets
1 ( 0.819 ms 0.486 ms 0.351 ms
2 ( 14.153 ms 11.605 ms 11.956 ms
3 ( 11.989 ms 11.813 ms 15.802 ms
4 * * *
5 * * *
6 * * *
and so on for several lines. Or

traceroute to (, 64 hops max, 40 byte packets
1 ( 0.829 ms 0.473 ms 1.281 ms
2 ( 12.705 ms 11.672 ms 11.991 ms
3 ( 11.931 ms 13.440 ms 11.979 ms
4 ( 13.851 ms 12.922 ms 13.850 ms
5 ( 15.877 ms 15.626 ms 15.868 ms
6 * * *
7 * * *
8 * * *
and so on for several lines.

Sometimes both of these will eventually move through and on to the Website, but the load will be slow, stalled or incomplete.

So, I dutifully sent this to the e-mail help desk and was told to call the help desk. Ok, I did last night and was on the phone for 20-30 minutes (didn't note when I started the call). And true to form, somewhat as expected, the technician did what they all do: blame the customer.

But before they do that, they have the customer reboot the modem (done) and the computer (done), run ping and telnet tests (done), and then test loading with different browsers. When I tried to explain the traceroute results, the technician dismissed them, and me, as irrelevant.

Except that ping and telnet only test whether the Website is there, alive and responding. It's simply a knock on the door, nothing more. A traceroute shows where the packets (or some of them, since it's not perfect either) are going, recording all the hops along the way. And in almost every case the lags and hangs are in steps 3-6, and all the drops are in steps 3 and 4.
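For anyone who wants to check their own traces the same way, here is a minimal sketch in Python of how the output could be scanned for the first hop that drops or lags. The function names (`parse_traceroute`, `first_trouble_hop`) and the 100 ms "slow" threshold are my own assumptions for illustration, not part of traceroute itself, and the sample is the not-so-good trace from this post exactly as printed, host names missing and all.

```python
import re

# A hop line starts with the hop number; header lines and notes do not.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
# Round-trip times are printed as a number followed by "ms".
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_traceroute(text):
    """Return a list of (hop, rtts_in_ms, lost_probes) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header, notes and blank lines
        hop, rest = int(match.group(1)), match.group(2)
        rtts = [float(value) for value in RTT.findall(rest)]
        lost = rest.split().count("*")  # each "*" is a probe with no reply
        if not rtts and not lost:
            continue  # a numbered line that is not a hop result
        hops.append((hop, rtts, lost))
    return hops


def first_trouble_hop(hops, slow_ms=100.0):
    """Return the first hop with a lost probe or a slow reply, else None.

    slow_ms is an arbitrary threshold chosen for this sketch.
    """
    for hop, rtts, lost in hops:
        if lost or any(rtt > slow_ms for rtt in rtts):
            return hop
    return None


sample = """\
traceroute to (, 64 hops max, 40 byte packets
1 ( 0.819 ms 0.486 ms 0.351 ms
2 ( 14.153 ms 11.605 ms 11.956 ms
3 ( 11.989 ms 11.813 ms 15.802 ms
4 * ( 14.129 ms 16.637 ms
5 ( 15.200 ms 15.681 ms 15.956 ms
6 ( 15.990 ms 17.795 ms 15.987 ms
"""

print(first_trouble_hop(parse_traceroute(sample)))
```

On that sample it reports hop 4, the one with the lost probe. Note that it only looks at the reply times traceroute prints, so it would not catch a pause between hops like the one I timed by hand.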

It's inconsistent when there isn't a lot of traffic (daytime) but very consistent when there is a lot of traffic (evenings). But even now it happens when it shouldn't, like at 6-7 am PST. But would they think to reboot their equipment to test if the problem is there and not me?

You can bet they don't and won't. I've been through this before with them over access to my Website and over uploading (ftp) new files for my Website. In that case they discovered problems with their computer, and checking things and rebooting solved it. But this time they blamed my browser, specifically Safari, and told me to call Apple (they even offered a number). The reason this isn't the problem is that both traceroute and e-mail experienced the same problems: slowdowns, lags and stalling. That's not a browser problem, but a network communications problem.
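One way to take the browser out of the picture entirely is to time a plain TCP connection to the Web server. The following is a rough sketch in Python, not anything my ISP or Apple provides; the function name `time_tcp_connect` and the 5-second timeout are assumptions I picked for illustration.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    No browser is involved, so a slow or failed result here points at the
    network or the server rather than at Safari or Firefox.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens the socket;
        # the with-block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections and failed name lookups.
        return None
```

Calling `time_tcp_connect(host, 80)` with the Website's host name, a few times during the day and again in the evening, would show whether the connection itself slows down or fails when no browser is running at all.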

Gee, it's kinda' like, let's shove this problem elsewhere so we don't have to do the real work to see if it's really us and not them. Ok, cruel and inappropriate, but in the middle of it, that's what people think. And the funny part is that after we finished, within a minute they sent me an e-mail asking to fill out a survey. And indeed I did, but what number on the scale of 1 to 5 is sucks?

So what's going to happen? Well, for one, I'm continuing to test things. I'm more than convinced it's not Safari, as I've duplicated it with Firefox, and nothing on my computer changed before or during this period. And during the World Series games I will continue to test it (hopefully even a game seven - not a Yankee fan). And when I get the results, I'll post them and send them to my ISP for some answers about a problem for which I'm not to blame.

Anyway, that's the story. If anyone has suggestions to sort this out, I'm listening. And if I'm partially to blame somewhere or somehow, I'll apologize, but until then, and until my ISP admits they can be partially or wholly to blame, I'll keep my view of things.
