Are you redundant? You should back it up!

So, the reason I haven’t blogged in quite a while is that I had my external HD crash on me; said HD containing, amongst other things, my scribbles on things to blog about that might hopefully be of value or interest to people. While it turns out I didn’t lose any data, thanks to some luck and the friendly IT guys at work, it still made me reluctant to plug the HD back in until I had sorted out a reasonably reliable backup solution. You see, I had been bad and, ehrm, well, didn’t have any backups for my PC hard disks! And I bet you too probably don’t have a backup solution for your machine, dear reader! Tsk, not good. Don’t tell me I didn’t warn you when your HD crashes. And it will crash, it’s just a matter of time!

In typical Christer fashion I wanted to solve this new problem of backing up data in the best possible way, so I started researching all this stuff that I never really paid attention to before. I started looking into external HDs, NAS boxes (because it would be cool to stream data to my PS3 too in addition to providing storage for my PC), RAID, and everything in between. But it probably took a week of research into these things before I realized the simple yet so very important distinction between data redundancy and data backup.

Redundancy is something you get e.g. by having two or more HDs in a RAID1 or RAID5 configuration. If one HD fails, you can recover your data from the other HDs, either due to mirroring (RAID1) or due to parity data that allows the missing contents to be recreated (RAID5). However, this is not a backup! Repeat: RAID is not a backup! If two disks fail, or your whole PC is fried in a lightning strike, or whatever it might be, your data is irreparably lost. A backup is something that should protect you from data loss even in the case of a severe hardware failure, or when you save a file after first having deleted all its important contents!
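To make the parity idea a bit more concrete, here is a minimal Python sketch of how XOR parity can recreate one lost disk. The three "disks" and their contents are made up for illustration, and real RAID5 works on striped blocks with the parity spread across all drives, so treat this as a toy model only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical "data disks", each holding one equal-sized block.
disks = [b"family", b"photos", b"backup"]

# The parity block is simply the XOR of all data blocks.
parity = xor_blocks(disks)

# Simulate losing disk 1: XOR the survivors with the parity to rebuild it.
survivors = [disks[0], disks[2]]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == disks[1])
```

Note what the sketch does not give you: lose two of the blocks and there is no longer enough information to rebuild either, and if you overwrite a file with garbage, the parity faithfully protects the garbage.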

Once I had reached this (in hindsight) obvious realization, I figured a good place to start was looking at online backup solutions. If my backup files are in, oh, New York while I’m in Los Angeles, odds are I’ll be able to recover my files even if LA falls into the ocean from a 12.0 earthquake (assuming I survive the ordeal).

Online backup services

There are quite a few online backup solutions out there.

Which one do you pick?! When you dig around a little it turns out that many of the plans that seem good actually have hidden gotchas like e.g. upload caps that could make your initial backup of, say, 10GB take days (literally). To get rid of the sometimes absurd limitations you often have to go for the business plans which suddenly don’t look like much of a bargain anymore.
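To see why an upload cap hurts so much, here is a rough back-of-the-envelope calculation in Python. The rates below are hypothetical, not the actual caps of any particular service, and the function assumes decimal gigabytes and a perfectly sustained transfer.

```python
def upload_days(size_gb, upload_kbit_per_s):
    """Days needed to upload size_gb gigabytes at a sustained rate in kbit/s."""
    bits = size_gb * 8 * 10**9                    # decimal gigabytes to bits
    seconds = bits / (upload_kbit_per_s * 1000)   # kbit/s to bit/s
    return seconds / 86400                        # 86400 seconds per day

# Hypothetical capped upload rates, for illustration only.
for rate in (128, 256, 1000):
    print(f"{rate:>5} kbit/s: {upload_days(10, rate):.1f} days")
```

At an assumed 256 kbit/s, a 10GB initial backup works out to roughly 3.6 days of continuous uploading.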

After looking long and hard I opted for Jungledisk for a few reasons. First, it utilizes Amazon’s S3 storage as a backend. This is a big advantage IMO, because Amazon isn’t likely to go away anytime soon (unlike some of these companies that are likely to fold in a year or two when their funding runs out, taking their servers with them to the grave). Also, Amazon has a good level of redundancy built into their S3 system, which they use themselves for their own data.

The other thing is that my single Jungledisk license allows me to install it on an unlimited number of computers (for the same Amazon S3 account). Some others (e.g. Mozy) would require me to buy an additional license for every other PC in the house.

I also like that the Jungledisk software maps my Amazon S3 storage to a network drive, so I can back up stuff with simple drag-and-drop. Some sites require you to go through webpage interfaces to upload your files and folders. Totally bizarre! Amazon’s S3 storage is also very reasonably priced, which is somewhat important too.

Finally, a tip: As you are paying each month for the files you’ve uploaded to your online storage, make sure you don’t have duplicate files under different names. I found the free program Easy Duplicate Finder a good tool to help me avoid storing and paying for multiple copies of the same files.
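If you would rather script the duplicate hunt yourself, the idea can be sketched in a few lines of Python: hash every file's contents and group the files whose hashes match. This is not how Easy Duplicate Finder works internally (I have no idea what it does); `find_duplicates` and `file_digest` are just names I made up for this sketch.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to save memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Group files under root by content; return groups with more than one file."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates` on the folder you are about to upload gives you lists of files with identical contents, whatever they happen to be named. Hashing everything is slow on big collections; grouping by file size first and only hashing the size collisions would be the obvious speed-up.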

Redundant storage and NAS appliances

So the online storage is what you use to back up the invaluable things: family photos and videos, financial documents (encrypted, natch), your four-year-in-the-making book manuscript, that sort of thing. For your MP3s, DVDs, PDFs, or whatever it is you’re hoarding, you want a different solution and here’s where the redundant storage comes in. It would be really annoying if you lost a harddisk of your ripped DVDs, but you probably wouldn’t be heartbroken. But with redundant storage you can somewhat safeguard yourself from having to spend hours re-ripping the DVDs in case of failure, as long as it’s just the single HD that fails.

And while you’re at it, because you’re a geek, you might want that redundant storage solution to be a NAS (network-attached storage) so your PS3 or Xbox 360 can access your DVDs and MP3s. Here, boy oh boy, there are so many options that your head’ll reel! In fact, I’ve been thinking about this for well over a month now and I’m still not sure what the best option is, but I’ve found several that are all pretty cool (but also have their own quirks and drawbacks).

First though, before I list the cool stuff, note that almost every NAS box out there is RAID5. Well, having done the research, I don’t like RAID5 and I wouldn’t recommend that anyone go with a (standard) RAID5 solution over the alternatives. What’s wrong with RAID5, you say? Several things, in fact! Let me list a few:

  • Due to the use of striping, if disaster strikes and two HDs were to die at the same time, then all your data is lost, because you’re left with partial files on the unaffected drives.
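  • The storage is limited by the size of the smallest drive (if it can handle differently-sized drives at all). E.g. in a RAID5 with 3*1TB + 1*250GB, only 250GB of each drive is used, so you get 4*250GB of raw storage, of which 3*250GB = 750GB is usable once parity is accounted for.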
  • When the RAID5 drives are full, we’re done. We cannot resize the system other than by copying all data off the full system onto a new, larger RAID5 setup.
  • A RAID5 system needs a lot of power and cooling because all drives will spin up (and down) at the same time due to the striping. This makes for wear and tear on the drives.
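The capacity limitation in the list above is easy to check with a little arithmetic. The sketch below assumes textbook RAID5 behavior (every drive is truncated to the size of the smallest one, and one drive's worth of space goes to parity); `raid5_usable_gb` is just a name for this example, and vendor variants like X-RAID may behave differently.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Usable capacity of a standard RAID5 array, in GB.

    Assumes each drive contributes only as much as the smallest drive,
    and that one drive's worth of space is consumed by parity.
    """
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)

drives = [1000, 1000, 1000, 250]                  # 3*1TB + 1*250GB
usable = raid5_usable_gb(drives)                  # 3 * 250
unused = sum(drives) - len(drives) * min(drives)  # space RAID5 cannot touch

print(f"usable: {usable} GB, unused: {unused} GB")
```

For the 3*1TB + 1*250GB example that comes to 750GB usable, with 2250GB of the big drives sitting idle.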

On the positive side, you do get additional speed from pulling data off multiple drives with RAID5, which is valuable in certain situations (e.g. video editing). However, the average home user is not going to stress the bandwidth of their NAS box, so this is of questionable value for most people.

BTW, for another (critical) look at RAID, but from a different perspective, see “Smart SOHOs Don’t Do RAID.”

So, what options does this leave us with then? Several actually, that you can either buy off-the-shelf, or build yourself. They include:

Infrant/Netgear ReadyNAS. The Infrant ReadyNAS, now known as Netgear ReadyNAS (RND4000 without disks, RND4450 with 4x500GB) is a pretty full-featured four-disk NAS box, which sports a custom RAID5 version dubbed X-RAID. X-RAID allows online capacity expansion by replacing one drive at a time with a larger drive (letting the RAID5-based algorithm rederive the data on the new disk using the parity calculation) until all disks have been replaced with larger disks and the new (larger) capacity becomes available. A major issue with the ReadyNAS is that it is pretty expensive. ReadyNAS is limited to 4 drives.

QNAP TS-409 PRO. QNAP has offered some best-of-breed RAID1 two-disk NAS boxes for a while, but only recently announced the TS-409 PRO which is a four-drive RAID5-type box, that just like the ReadyNAS supports “Online RAID Capacity Expansion & RAID Level Migration.” The TS-409 PRO is about $150 cheaper than the ReadyNAS with comparable (and possibly even better) features. Its kid brother, the QNAP TS-209 PRO, also comes with lots of NAS-y goodness, but as a RAID1 box it only supports mirroring.

Windows Home Server. If you want something that truly “just works” out of the box with your PC, Windows Home Server is possibly the best choice, regardless of what your anti-Microsoft bias is telling you. Or, rather, it would be the best choice if it wasn’t still (two months later) suffering from a critical data-destroying bug that the MHS team seems totally unable to fix, or at least outright refuses to post any status updates about on their blog, all while they’re bandying about how they like not to take themselves seriously. (I guess you don’t take your prospective customers too seriously either, MHS team. You certainly lost me as a customer by not giving any ETA on that bug getting fixed. Good job guys. Party hard!)

At this very moment you can either build your own server and install WHS yourself, or you can buy the prebuilt 4-bay HP MediaSmart Server (EX470 with 1x500GB, or EX475 with 2x500GB).

Drobo. Another “just works” solution seems to be the Drobo. Unlike the RAID5-based solutions like the ReadyNAS and the TS-409 PRO, Drobo is not a NAS; it must be attached to a networked computer, a router with a USB 2.0 data port, or Data Robotics’ own DroboShare router (which can connect two Drobos). Similar to RAID5, Drobo features some very slick hot-swapping, with no need to shut the machine down to replace a bad drive or to add storage. Like the ReadyNAS and TS-409 PRO, the Drobo can have drives of any size installed, but unlike the other two, Drobo actually uses all (well, most) of the space. A big drawback is that the Drobo uses a proprietary format, so if the Drobo fails, the data is useless until you buy another Drobo (and with Data Robotics being a small pre-IPO(?) company, the company could disappear if the product doesn’t take off or if their VC money runs out). Drobo is also limited to 4 drives (and always reports the unit as having 2TB regardless of actual capacity, which might be a problem in some cases). A Drobo review can be found at Tom’s Hardware.

unRAID. Perhaps the coolest solution is unRAID. It’s a software-only solution (although they will sell you a machine with the software installed if you want) very similar to RAID5, but without the striping. Running without striping gives up the speed benefits of RAID5 but buys a lot of cool features:

  • Because it’s not striped, only two disks spin up at a time (the data disk currently accessed and the parity disk), so an unRAID machine runs cooler and quieter than a RAID5 box.
  • With a two-drive failure, one or two drives’ worth of data are lost (two data disks or one data disk + parity), not all of the disks as in the corresponding RAID5 failure. unRAID uses standard ReiserFS so (given the appropriate drivers) the disks can be individually mounted under Linux or Windows after a crash for data recovery.
  • It supports any size disks and uses all space available (with the possible exception of an over-large parity disk) and you can also mix and match IDE and SATA drives.
  • You can add drives as you go, up to the current limit of 16 drives.
  • It’s free for up to 3 drives!
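Here is a small Python sketch of what the list above means in numbers. It is a simplified model and not unRAID's actual code: I'm assuming the largest drive is the one dedicated to parity, that every data disk is full, and that files live whole on a single data disk. Both function names are made up for the example.

```python
def unstriped_usable_gb(drive_sizes_gb):
    """Usable space when the largest drive is dedicated to parity (assumption)."""
    return sum(drive_sizes_gb) - max(drive_sizes_gb)

def lost_after_failures(data_disks_gb, failed_indexes, parity_failed=False):
    """GB of data lost when the given data disks (and maybe parity) fail.

    A single failure can be rebuilt from the remaining disks. With two or
    more failures only the failed data disks' contents are gone, because
    nothing is striped across the surviving disks.
    """
    failures = len(failed_indexes) + (1 if parity_failed else 0)
    if failures <= 1:
        return 0
    return sum(data_disks_gb[i] for i in failed_indexes)

drives = [1000, 1000, 1000, 250]       # same mixed drives, sizes in GB
print(unstriped_usable_gb(drives))     # one 1000GB drive becomes parity

data_disks = [1000, 1000, 250]
print(lost_after_failures(data_disks, [0]))                      # one failure
print(lost_after_failures(data_disks, [0, 2]))                   # two data disks
print(lost_after_failures(data_disks, [1], parity_failed=True))  # data + parity
```

Under these assumptions the mixed 3*1TB + 1*250GB set yields 2250GB of usable space, and even a two-drive failure leaves the contents of the surviving data disks intact.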

There are some drawbacks too, of course. It’s pretty much a one-man operation, so while support is good when you get it, it is also very spotty, with a high reliance on the unRAID user community. Also, the system does not seem to do hot-swapping; you have to shut down and start the system back up each time you change or add disks. It also doesn’t have media server capabilities, like WHS, TS-409 PRO and ReadyNAS, so if you want that you have to install e.g. Twonky alongside the unRAID software. unRAID is probably only for the power user, but if you are one, it’s one cool piece of software.

unRAID is also very easy to install (in principle). You put the software on a bootable USB drive, stick it in your old PC (that you’ve filled with lots of HDDs) and you have a server. However, unRAID isn’t very strong on hardware compatibility so it might not be quite that simple in practice.

NASLITE-2 (USB). Even though it’s not quite in the league of these other ones, I really need to mention NASLITE-2. Like unRAID it is a piece of software that you put on a bootable USB drive. Booting from the USB drive turns your old PC into a NAS. Unlike unRAID, NASLITE-2 supports just about any hardware and doesn’t demand much power from the machine, so you can take some truly ancient PCs and turn them into network storage (it only needs 64MB RAM). If you have a bunch of old PCs gathering dust, this is a cool way of making them useful again. Here’s a review, but the same reviewer also notes that NASLITE-2 is not a very expandable solution. Also, to get RAID capabilities you would have to rely on RAID hardware cards.

Linux with SAMBA. OK, so this is a possible solution too, I guess, but it’s way too hardcore and not something for normal people to attempt.

It’s still vaporware, but worth keeping an eye out for is iSCUBE from Kapsean, which sounds a bit like unRAID or NASLITE.

What I decided on

Given there are a number of building options above (WHS, unRAID, NASLITE, Linux), I decided to go that way. I opted to buy a refurbished computer instead of building from scratch. For less than $400 (including shipping) I picked up an HP XW6200 from LapkoSoft (P4 XEON at 3.6GHz, 3GB RAM, 74GB HDD, 500W power supply, and an old NVIDIA workstation card). Taking out the DVD-drive and fitting a Kingwin KF-3000-BK, using the two internal 3.5″ trays, and installing a SunbeamTech Wherever PCI Rack I can fit 6 HDDs in this box. I’m gonna try the free unRAID version first and if I like it, I’ll upgrade to the commercial version. Come hell or high water, I might even do WHS if they can actually sort their shit out this decade. For those (like me) who are curious about unRAID there are several success stories on the web.

The bad apple?

Oh yeah, the HD that crashed? A LaCie “Design by F.A. Porsche” 500GB external drive. I thought it was a nice product (good capacity, nice design for a good price, including a smallish fan which most external HDs in this category don’t have), but all it took to fry its backplane and some HD sectors was a fuse blowing in the house (from someone unnamed running a heater and a hair dryer in the same outlet). And there’s the second lesson: all your computer appliances should go through a surge-protected UPS! Don’t cheat just because you ran out of protected outlets!

3 thoughts on “Are you redundant? You should back it up!”

  1. Hey Christer,

    Your post serves as a timely reminder that I should sort my backups out too. I burn some of my important data on DVD from time to time (email, programming projects, photos etc – last backup about 3 months ago!).. but have no backup of things like MP3/AAC files, videos and other similarly trivial things. I do know I wouldn’t like to have to rip 40GB of music in iTunes again though! I’m wary of DVD backups of the important stuff too, as even with good quality DVDs I’ve experienced failure due to random events (wind direction, size of chicken being sacrificed etc).. so the online storage stuff you’ve covered sounds well worth checking out.

    Thanks for the pointers.

    Dean

  2. Hi Dean. The online backup solutions are, I think, really just for relatively modest size data sets. When you get up in the tens of GB of needed storage it starts becoming expensive (over time, as it is a monthly recurring fee). That’s why I recommend backing up only the irreplaceable stuff and using redundant storage for music, videos, and similar. With something like unRAID you do get pretty good redundant storage, plus it has the advantage of growing with you (you can just keep adding disks as needed, which is super cool). I still have to plug all the pieces together for my unRAID machine, but I’ll get there eventually. I mean, if you build (or buy) something like Lime Tech’s MD-1500 you can gradually add HDDs and swap up to 1TB HDDs (or 2TB HDDs when they become available) to eventually have 15TB (or 30TB) of redundant storage in one box. How nuts is that?!

  3. Hey Christer-

    Thanks a lot for journaling your backup adventures. I’m looking around for a similar solution and was glad to read about your findings (though sorry about your motivation!) I’d be very interested in reading your opinions once you’ve put your new local backup system through its paces.

    Best,
    Charles
