Sunday, March 16, 2008

Now where did I put those packets...

Several months ago I started noticing some trouble with my outgoing connection. As my wife put it, "the internet is slow again..." Indeed. Initially, power-cycling the cable modem or router would sometimes do the trick, but about 3 weeks ago that stopped fixing things.

The first thing I checked was the first outbound hop, my upstream gateway. I was surprised to see pretty massive packet loss to that gateway. I ended up writing a script to ping this gateway every 5 minutes around the clock so I could get some picture of when the outages were occurring.

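#!/bin/bash -x
# Every INTERVAL seconds (default 300), ping the presumed upstream gateway 50 times
# and keep a gzipped, timestamped log of each run.
INTERVAL=${1:-300}
# "~" is not expanded inside double quotes, so use $HOME instead.
LOGDIR="${HOME}/work/suddenlink_logging"
mkdir -p "${LOGDIR}"
while true; do
    # Look up the public IP and strip the HTML around it.
    IP=$(wget -q checkip.dyndns.org -O - | awk -F': ' '{print $2}' | awk -F'<' '{print $1}')
    # Assume the gateway is the .1 address of the same /24.
    GATEWAY="${IP%.*}.1"
    LF="${LOGDIR}/$(date +%Y%m%d:%H:%M:%S).log"
    ping -c 50 "${GATEWAY}" | tee "${LF}"
    gzip "${LF}"
    sleep "${INTERVAL}"
done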
I just spent some time with gnuplot and have some images of the issues over the last month.
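The charts don't survive here, and I haven't said how the logs were turned into plots. As a minimal sketch (a hypothetical helper, not necessarily what produced the charts), something like this pulls the loss percentage out of each gzipped log. It assumes the log names follow the script's `YYYYmmdd:HH:MM:SS.log.gz` pattern and that each log contains ping's usual summary line, e.g. `50 packets transmitted, 47 received, 6% packet loss`.

```shell
#!/bin/bash
# Hypothetical helper: print one "timestamp loss%" line per gzipped ping log,
# in chronological order (the glob sorts the timestamped names).
loss_summary() {
    local logdir="${1:-$HOME/work/suddenlink_logging}"
    local f stamp loss
    for f in "${logdir}"/*.log.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing: no logs yet
        stamp=$(basename "$f" .log.gz)
        # Grab the number in front of "% packet loss" in ping's summary line.
        loss=$(gzip -dc "$f" | sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p')
        # No summary at all (e.g. ping couldn't resolve the gateway): count it as 100% loss.
        echo "${stamp} ${loss:-100}"
    done
}
```

Redirecting the output to a file (say, a hypothetical `loss.dat`) gives gnuplot something it can read with `set xdata time` and `set timefmt "%Y%m%d:%H:%M:%S"`, plotting `using 1:2`.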



Once I got openWRT working the way I wanted, I swapped out my router, hoping that would help me determine whether the router was causing the issues. I had seen a couple of times that power-cycling the router would help, but not always. After about a week, I ended up seeing the same issue on the new router.

One of the most frustrating parts of this issue has been that the Suddenlink techs' standard practices end up masking the problem. The first thing they tell you to do is power-cycle your router and modem. This "fixes" the issue without ever diagnosing why it happened in the first place. They also ask you to connect directly to the cable modem, taking the router "out of the picture." This, too, is problematic, because the cable modem has to be power-cycled to recognize the MAC address of the new ethernet device before it will let you get a DHCP lease from upstream. As I mentioned, this power-cycle tends to temporarily fix the issue.

Thanks to openWRT, though, I was able to clone the MAC address of my laptop, and when I did this, I observed that when the upstream DHCP server sees a new MAC address, you typically get placed in a different subnet than before. After another round of calls with Suddenlink, I eventually told them that I was having issues even when directly connected. That was the required step before they would send out a tech. I should also mention that it wasn't clear that sending out a tech would help, since Suddenlink can probe the modem remotely to see the various signal levels going to the device; they all looked fine.

In any case, the tech came out on March 10th and replaced the connectors on every cable run from the modem to the box outside my house, and even the filter in the combo box up the street. I'm also using a loaner cable modem. The current plan is to test for a week to see if I get any packet loss; if I see none, I'll swap back to my old cable modem and test for another week. That should determine whether the issue is with my cable modem or whether the wiring fixes made the difference.
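Comparing the two test weeks comes down to averaging the loss over each date range. Here's a rough sketch of a hypothetical helper for that; it assumes the per-run loss has already been extracted into lines of the form `YYYYmmdd:HH:MM:SS loss` (one line per ping run, timestamps matching the log names) and fed in on stdin.

```shell
#!/bin/bash
# Hypothetical helper: average packet loss for runs between two dates (inclusive).
# Args: start and end day as YYYYmmdd. Reads "YYYYmmdd:HH:MM:SS loss" lines on stdin.
avg_loss() {
    local from="$1" to="$2"
    awk -v from="$from" -v to="$to" '
        { day = substr($1, 1, 8) }                  # date part of the timestamp
        day >= from && day <= to { sum += $2; n++ } # only runs inside the window
        END {
            if (n) printf "%.2f%% over %d samples\n", sum / n, n
            else print "no samples"
        }'
}
```

Running it once for the loaner-modem week and once for the old-modem week (e.g. `avg_loss 20080310 20080316 < loss.dat`, where `loss.dat` is a hypothetical file of those lines) gives two numbers to compare directly.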

I'm hoping to get this issue resolved soon, but as you can see from the charts, one never knows when it's going to cause a problem, and there isn't a clear fix other than repeatedly power-cycling the devices until their system rights itself.

Saturday, March 8, 2008

A Quest Completed

If you've been looking for a computer upgrade recently, you've probably seen the stellar reviews of Intel's new 45nm Penryn-based CPU, the e8400. This dual-core, 3GHz processor with 6MB of shared L2 cache offers great price-to-performance value and excellent performance per watt, and it runs cooler than the E6600 Conroe both at idle and under load.

I certainly wasn't the only one to notice this. The processor went from being available everywhere online at roughly $190 to nowhere to be found except at shifty places at sky-high prices of $300.

Once I had decided to go ahead and pick one of these up, it was too late to find it at my favorite online computer store, newegg.com; it was out of stock there. I started searching far and wide. Eventually I ran across this Official Intel e8400 price thread.

One of the reasons I love the internet is that you will almost always find someone who has more time than you do, spends it finding information that is useful to everyone, and posts it for all. Such is the case with the e8400 shortage. An enterprising netizen discovered that the Intel Xeon e3110 is essentially the same processor as the e8400. It's 100% compatible, and in most cases when users run a utility like CPU-Z, it reports an e8400. I've not seen any reports of the e3110 failing in boards that are known to work with the e8400. Alas, I missed the e3110 boat as well, and that processor is now unavailable too.

But that thread keeps on giving. On Friday I read a post mentioning that a local Fry's had plenty in stock at a reasonable price, $225 + tax. That gave me the idea to visit my local Fry's on Saturday. Bingo! The salesman said they had 60 in stock. Make that 59.

Quest Reward: +5000 xp, +1 skill points, +3 attribute points

Friday, March 7, 2008

240 TB of pudding? I wanna dip my storage array in it![1]

Ha! I'll have to settle for something less "enterprise." How about 96TB? I'm game. There aren't that many low-end storage solutions that can scale to near 100TB without some serious cash and lots of racks. I'm sitting on about 2TB of data: 700G cobbled together by lvm2, and a more robust 1.4TB in a RAID5 setup backed by a 3ware 4-port 9650 RAID card. I can't bear to let drives go unused, so swapping out the 4 500G drives in my RAID array for 1TB drives isn't something I'm considering, even though that would double the RAID array to 3TB. Just what would I do with the 4 old 500G drives? I had initially hoped that 3ware's multi-card support would allow logical arrays spread across multiple cards. That is, keep my 4 drives on the current card, purchase a 12-port 3ware, and then build a single logical array of all 16 drives. The 3ware support team said that it is possible, but only using software RAID in the host. Well, duh! But then what's the point of the hardware RAID?
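The "double to 3TB" figure falls out of how RAID5 works: one drive's worth of space goes to parity, so usable capacity is (drives - 1) × drive size. A tiny sketch of that arithmetic (the function name is just for illustration, and it assumes all drives are the same size):

```shell
#!/bin/bash
# Usable RAID5 capacity: one drive's worth of space is consumed by parity.
# Args: number of drives, size of each drive in GB (all drives assumed equal).
raid5_usable_gb() {
    local drives="$1" size_gb="$2"
    if [ "$drives" -lt 3 ]; then
        echo "RAID5 needs at least 3 drives" >&2
        return 1
    fi
    echo $(( (drives - 1) * size_gb ))
}
```

`raid5_usable_gb 4 500` prints 1500 (about 1.36 TiB, which is roughly the 1.4TB the array shows), and `raid5_usable_gb 4 1000` prints 3000.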

So where does that leave us? Looking at SAS cards, that's where. SAS is the "enterprise" counterpart of SATA, with advanced features like dual ports, multi-pathing, support for large numbers of drives per adapter, etc. Two features of SAS are of particular interest. First, a single SAS card can address something in the range of 128 to 256 devices. Second, SAS controllers accept SATA drives. This means you can get "enterprise" scaling (SAS) with cheaper storage (SATA). In my example above, I'd be looking at multiple 16-port SATA RAID cards to get large numbers of drives into an array, and each of those cards goes for $1200-$1500 a pop. 3ware's SAS controller is $650. Yeah, that's right: a $650 card that can support up to 128 devices!

Hold your horses, because there is one other dirty secret about this gold mine of scalability. You can't directly attach that many drives to a SAS controller. Instead, SAS relies on your storage enclosure to include an expander, which lets the SAS controller reach all of these additional drives. The trouble here is that most storage enclosures with built-in expanders are rather pricey: typically $3000 to $6000 just for the enclosure, and you still have to buy your drives.

There is some light at the end of the tunnel, though. First, Adaptec makes a scalable enclosure, the SANBloc S50. With 12 3.5" hot-swap bays *and* support for daisy-chaining up to 7 S50s to a single SAS controller, we've got a big bowl of pudding here. Looking around, I see empty S50s going for around $1600. Add an Adaptec SAS adapter and, to start with, 12 1TB drives, and we're looking at about $5600. All told, that's about $465 per TB with 1 tray and a $270 drive, or $419 if you scale out to 7 trays. As drive prices go down, say to $200 by the end of 2008, that becomes $395 per TB for one tray, or $349 for all 7 trays. That's downright respectable scaling at a rock-bottom price.
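Here's a rough cost-per-TB calculator for playing with those numbers. The adapter price isn't stated above; subtracting the S50 ($1600) and twelve $270 drives from the $5600 total leaves roughly $760, so that's the assumed adapter price in the examples. The function name and argument order are made up for illustration.

```shell
#!/bin/bash
# Hypothetical calculator: dollars per TB for a build of N enclosures.
# Args: trays, drive price, drive size (TB), enclosure price, adapter price, bays per tray (default 12)
cost_per_tb() {
    awk -v trays="$1" -v drive="$2" -v tb="$3" -v encl="$4" -v card="$5" -v bays="${6:-12}" 'BEGIN {
        drives = trays * bays                          # every bay filled
        total  = card + trays * encl + drives * drive  # one adapter for all trays
        printf "%.0f\n", total / (drives * tb)         # round to whole dollars
    }'
}
```

With those inputs, `cost_per_tb 1 270 1 1600 760` prints 467 and `cost_per_tb 1 200 1 1600 760` prints 397, close to the one-tray figures above. The seven-tray runs print 412 and 342, a bit under $419 and $349, so treat all of these as ballpark numbers that depend on the exact adapter and enclosure prices.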

Now, before you choke on that initial $5600 outlay, let me introduce you to the other feature of Adaptec's unified SAS cards. Rather than just having ports for external enclosures, the Adaptec RAID 51245 has 12 internal ports and 4 external. This provides an even lower entry point: up to 12 drives internally AND any of those 7 external S50s (12 + 7 × 12 = 96 drives, or 96TB with 1TB drives). At the cheapest, the card will retail for around $900, and then you can add drives one at a time, up to 12 internally, before scaling out to external storage, all with the same card.

I don't know about you, but I'm going to go get some pudding.

1. Crude references to MTV's The State skits (Barry and Levon, Louie).

Saturday, March 1, 2008

Say Hello to Failsafe

Life can be very exciting on the bleeding edge, but as I read the other day in a forum post, sometimes it means that you get cut. Figuratively speaking, I got "cut" the other morning as I attempted to switch my main router over to the glorious wrt54gl using my hand-tuned openWRT install. This did not go as planned.

I had spent the last week or so tuning the openWRT installation to do exactly what I wanted. I used the second NIC in one of my servers to simulate a connection to the internet. Something like this:


Such a setup allowed client1 to connect to the internet just like Server could. In my previous post I also described splitting up the WIFI and LAN networks to keep them independent of each other. Now, what I didn't do was test the settings across a reboot of the device. Having skipped this crucial step, I went ahead and swapped out my dlink for the wrt54gl. At first, things went well. My WIFI connection to the wrt54gl worked just fine and I could connect to the internet. However, nothing on the LAN was getting a connection to the wrt54gl. I logged onto the wrt54gl (via the WIFI link) and looked at the config. After examining the routing table (route -n), I noticed that the LAN network wasn't listed. I had configured WIFI to use 10.23.24.X and LAN to use 10.23.23.X. At first, I figured this was just a hiccup and attempted to bring the LAN interface up (ifup eth0.0). No such device. Huh. ifup eth0. No such device. Booo. After about 10 minutes of nothing working, I figured I would revert the separation and bridge the WIFI and LAN together again. This was simple: just add the bridge option back to the config (/etc/config/network). Reboot.
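The check I did by eye, looking for the LAN subnet in `route -n`, is easy to script so it can be rerun after every reboot. A minimal sketch with a made-up function name; it assumes the usual two header lines of `route -n` output (BusyBox's `route` prints the same layout) and compares only the Destination column.

```shell
#!/bin/bash
# Hypothetical helper: succeed if `route -n` output on stdin has a route
# whose Destination column equals the given network address.
has_route() {
    # Skip the "Kernel IP routing table" line and the column-header line.
    awk -v net="$1" 'NR > 2 && $1 == net { found = 1 } END { exit !found }'
}
```

For example, `route -n | has_route 10.23.23.0 || echo "LAN route missing"` would have flagged the missing 10.23.23.X network right away.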

At this point the bleeding had started. The WIFI associated with the wrt54gl, but I couldn't get an IP. I couldn't get an IP via the LAN either. I manually configured both WIFI and LAN, and neither could connect to the wrt. After thinking for a bit, I suddenly realized my mistake. In the previous post, I mentioned some firewall rules. Specifically, I had entries to prevent the WIFI and LAN from passing traffic to each other, and now they were part of the same bridge. This meant that no traffic from either LAN or WIFI was getting into the device. The wrt54gl was bricked and I needed a tourniquet.

As hopeless as it seemed, I had to assume I wasn't the first guy to hose up a firewall configuration on the wrt and lock himself out of the device. And I was right. The smart folks at openWRT built in a failsafe mode.

I had already tried the reset button on my wrt54gl, hoping that holding it during bootup might reset to defaults. This is true, but as the link described, it has to be done after the DMZ light turns on. Thankfully, on my model, holding it down before the light came on didn't trash the device. On the second try, I got it into Failsafe mode, with the DMZ light flashing three times a second. A quick reconfiguration of my ethernet interface and I was telnetting to 192.168.1.1, greeted with the openWRT logo and a shell.

I chose the firstboot-and-sync method, and before rebooting, I ran 'passwd' to set a root password so the device would switch over to ssh. I then rebooted the device, and it came up with all of the defaults again. Whew! It was also good to know that even if the Failsafe method hadn't worked for me, there were still several other options: a UDP broadcast message and TFTP booting. Great job, openWRT!