Home Server: Potential Problems

Tuesday, April 22, 2008

I've come up with a list of problems I will have, or could run into, as I embark on building my Solaris-based open server in the coming weeks:

  • Solaris/Linux Hardware Compatibility: I don't know whether my hardware is fully compatible (ex. NIC, GPU) with Solaris, and the Solaris Hardware Compatibility lists are incomplete. Let's hope everything goes ok!
  • Bootloading: I will have to configure a bootloader, probably GRUB, with all of my different OS's. I have previously experimented with GRUB, and I didn't like it. It seemed too complicated, and I was never able to understand how to add OS's. Maybe there is a better alternative than GRUB?
  • General *NIX Ineptitude: I have very limited CLI experience, mainly from light use of OS X's Terminal app. If it comes down to having to enter shell commands in order to enable or fix things, I won't know what to do unless I have a tutorial to follow.
  • Networking Protocols: Which networking protocol should I use to share my files? SMB/CIFS? NFS? Should I try to install AFP on Solaris? What about iSCSI? Or mDNS (Bonjour) for automatic discovery with my Macs?
  • Server Access: How will I go about remotely accessing my server to perform admin tasks and browse files and snapshots? This should be easy: more than likely, I'll just use whatever networking protocol I decide on. I may also enable some of the following: SSH; FTP; VNC; media serving?
  • rsync Script: I will need both a shell script and a launchd/cron job for automatic, hourly backups from my Mac to the file server. (What happens if I sleep my Mac while a backup is in progress? Can I get a notification while a backup is in progress, or when one completes successfully?)
  • ZFS Snapshot Creation Script: I will need a script, running preferably on the server, to create a snapshot using the current date/time as a name upon successful rsync backup.
  • ZFS Snapshot Management Script: I will need a script, running preferably on the server, to manage ZFS snapshots (deleting old snapshots)
  • Remote Backup: This isn't so much a problem as it is a curiosity. Which online backup service should I use once I get my server up and running, and should I back up from my Mac or from the server? (Some candidates: Mozy, Amazon S3, JungleDisk, etc. I'm currently using Mozy on my Mac, which is not available for Solaris.)
  • Server Sleep: Can the server enter either S1 or S3 sleep, and automatically be woken by attempted network access?
  • Windows+ZFS Compatibility: How do I get around Windows's incompatibility with the ZFS pool? I think I will use my proposed FAT or NTFS "Windows Storage" drive, and copy files back and forth when I know I will need them. Oh, can Solaris read NTFS? If not, I will have an additional problem. Another option would be to create a separate Solaris virtual machine inside Windows to use as a bridge.
I think the hardest aspects will be finding the necessary scripts (since I wouldn't know where to begin to write one myself) and the GRUB configuration. I will update this post when I come up with some answers.
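
To make the rsync and snapshot-creation items above a bit more concrete, here is a minimal Python sketch of the idea: run rsync, and only if it exits successfully, ask the server to take a ZFS snapshot named after the current date and time. Everything specific in it is an assumption for illustration: the host name "server", the dataset "tank/backup", the paths, and the choice of rsync flags (-aHAX assumes rsync 3.x on both ends). It would still need a cron or launchd entry to run it hourly, and it does not address the sleep or notification questions.

```python
import subprocess
from datetime import datetime

# Archive mode plus hard links, ACLs and extended attributes; --delete mirrors removals.
# These flags assume rsync 3.x on both the Mac and the server.
RSYNC_FLAGS = ["-aHAX", "--delete"]


def rsync_command(source, host, dest_path):
    """Build the rsync command that pushes `source` to `host:dest_path`."""
    return ["rsync", *RSYNC_FLAGS, source, f"{host}:{dest_path}"]


def snapshot_command(host, dataset, when):
    """Build the ssh command that snapshots `dataset`, named after `when`."""
    name = when.strftime("%Y-%m-%d-%H%M")
    return ["ssh", host, "zfs", "snapshot", f"{dataset}@{name}"]


def backup_then_snapshot(source, host, dest_path, dataset,
                         run=subprocess.run, now=datetime.now):
    """Run rsync; take a snapshot only if rsync reported success."""
    result = run(rsync_command(source, host, dest_path))
    if result.returncode != 0:
        return False  # leave the last good snapshot as the newest one
    run(snapshot_command(host, dataset, now()), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical names: adjust the user, host, path and dataset to the real setup.
    ok = backup_then_snapshot("/Users/me/", "server", "/tank/backup/", "tank/backup")
    print("backup + snapshot done" if ok else "rsync failed; no snapshot taken")
```

The `run` and `now` parameters exist only so the logic can be tried without a real server or a real clock.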

What about TimeMachine?

Monday, April 21, 2008

In my last post on backup strategies, I completely neglected to mention TimeMachine, so I have decided to devote an entire post to it. For those unfamiliar, TimeMachine is a backup feature in Apple's Mac OS X 10.5 "Leopard" operating system. TimeMachine has some good and interesting features. To start off with, it's heavily integrated into the OS, automatically running hourly, or whenever the backup disk is connected, to back up your entire disk to another drive. The application uses hard links to save space for incrementals, and keeps hourlies for a day, dailies for a month, and weeklies until the backup disk fills up. The system will automatically delete the oldest backup if it needs to recover space for the latest (after alerting the user). Furthermore, TimeMachine has the famous, and often criticized, "time warp" interface, which shows your present disk in the foreground, with your previous snapshots disappearing into an endless black hole. Essentially, underneath the unnecessary eye candy are instances of the regular Finder interface. You can flick back and forth between snapshots and the present, fully browse them, and even use Spotlight to search for a specific file. You can restore an entire snapshot or a specific file. Additionally, the Leopard start-up DVD allows restoration of a TimeMachine snapshot.
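
The hard-link trick is easy to demonstrate in a few lines of Python. This is only a sketch of the general technique, not Apple's implementation: the function names are made up, and "unchanged" is approximated by comparing size and modification time. Each snapshot is a complete directory tree, but files that have not changed since the previous snapshot are hard-linked to it instead of copied, so they take up no additional space.

```python
import os
import shutil
import tempfile


def _unchanged(path_a, path_b):
    """Treat two files as identical if size and whole-second mtime match."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def make_snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` may be None for the first snapshot. Returns (linked, copied).
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and _unchanged(src, old):
                os.link(old, new)       # same data on disk, second directory entry
                linked += 1
            else:
                shutil.copy2(src, new)  # new or changed file: real copy, mtime preserved
                copied += 1
    return linked, copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        for name, text in (("a.txt", "alpha"), ("b.txt", "beta")):
            with open(os.path.join(src, name), "w") as handle:
                handle.write(text)
        print(make_snapshot(src, None, os.path.join(tmp, "snap1")))
        with open(os.path.join(src, "b.txt"), "w") as handle:
            handle.write("beta changed")
        print(make_snapshot(src, os.path.join(tmp, "snap1"), os.path.join(tmp, "snap2")))
```

Deleting an old snapshot directory is safe with this layout, because the file data stays on disk as long as at least one snapshot still links to it.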

All of this seems fantastic, and indeed it generally is. However, TimeMachine lacks a few features and exhibits numerous bugs that keep me from wanting to use it. TimeMachine only supports HFS+ formatted target disks or disk images. It can only use an internal partition, an external USB disk, or a USB disk connected to an Airport Express (called an AirDisk). There is no backing up to another system or server, which means I can't use TimeMachine to back up to a ZFS-based storage pool (even if it has an HFS+ disk image residing on it). TimeMachine also doesn't support delta-transfer, which means that if a large file changes, it copies the entire file as opposed to just the bits that changed (like rsync does). One potential plus that TimeMachine has going for it is its use of Leopard's FSEvents API to dynamically see changes on the disk without having to scrub the entire filesystem at each backup looking at modification times, like every other backup solution does (including rsync). This should allow relatively instantaneous backups, cutting out the time a typical program takes to scrub. I use the word "potential" because it doesn't really work like it should. For unknown reasons, TimeMachine is incredibly slow. It takes forever once started, seemingly stuck on "preparing backup" before files even begin to copy (reminiscent of a scrub). Once files start copying, they go slowly, with frequent pauses in the backup stream. The initial backup of an 80GB disk connected via USB 2.0 could take upwards of 5-6 hours. I've had subsequent backups, even ones with less than 100MB to back up, take upwards of an hour (mostly in the "preparing" stage). Furthermore, I have had numerous problems where TimeMachine refuses to back up to its backup store, and the only fix is to delete all of your backups and start fresh. Unfortunately, deleting all of the files (including hard links) off of an HFS+ disk takes forever (hours and hours), so I recommend reformatting the entire partition/drive or deleting and recreating the disk image. In fact, I recommend just using disk images altogether!
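
To show why delta-transfer matters for large files, here is a deliberately simplified Python sketch. It is not rsync's actual algorithm (rsync uses rolling checksums so it can also cope with inserted or shifted data); it just compares fixed-size blocks by hash and counts how many bytes would need to be sent. The 4096-byte block size and the function names are arbitrary choices for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes per block; an arbitrary choice for this sketch


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]


def bytes_to_send(old, new, block_size=BLOCK_SIZE):
    """Bytes a block-level delta would transfer for the changed blocks."""
    return sum(
        len(new[i * block_size:(i + 1) * block_size])
        for i in changed_blocks(old, new, block_size)
    )


if __name__ == "__main__":
    old = bytes(1024 * 1024)   # a 1 MiB "file" of zero bytes
    changed = bytearray(old)
    changed[5000] = 1          # change a single byte
    new = bytes(changed)
    print("whole-file copy:  ", len(new), "bytes")
    print("block-level delta:", bytes_to_send(old, new), "bytes")
```

With a single changed byte in a 1 MiB file, the whole-file approach re-copies 1,048,576 bytes while the block comparison only needs to send one 4,096-byte block.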

Personally, I find it a pain to have to connect a disk to my laptop to do a backup. If I have to do that, I'm only likely to back up a few times a week (if that), which entirely defeats the purpose. To fix this, I've decided to connect my USB drive to my Airport and use it as an AirDisk. Recently, Apple added support for using an AirDisk like its TimeCapsule product. Again, this is tightly integrated. TimeMachine creates a sparse image on the AirDisk, automatically mounts it whenever a backup begins, and unmounts it when finished. This is fantastic! However, my PowerBook uses 802.11g (no n for me yet), so I'm stuck with extremely slow wireless transfers. This further exacerbates the general sluggishness of TimeMachine I mentioned earlier. Even when connecting via gigabit ethernet, backups are still slow. The worst problem lies in browsing snapshots via the TimeMachine interface, which is so slow over 802.11g that it's almost unusable. Finally, neither an AirDisk nor a USB drive is redundant, meaning a drive crash would destroy all backup data (making TimeMachine not an enterprise backup solution).

I really like TimeMachine, but because of its slowness and buggy operation, I'm left searching for another backup solution that offers me some more flexibility (thus, my last post). I do, however, think TimeMachine has a very bright future! Apple will be tweaking it with Leopard point updates, and may very well eradicate the bugs I've experienced and cure its general sluggishness. Hopefully, it will! One thing I'd really like to see is snapshot support in the Finder without having to open the TimeMachine interface (such as a "previous versions" item in the contextual menu). Furthermore, it's very possible that in future OS X versions, TimeMachine could support delta-transfers and even ZFS!!! With these feature additions, FSEvents used to its full potential, and greater scheduling and target drive flexibility (including network drives), TimeMachine would truly be a world-class backup product I would relish coming back to.

Possible Homeserver & Backup Solutions

Sunday, April 20, 2008

When I built my latest computer last December, I had the intention of using it as a backup and home file server for other computers on my network, mainly my PowerBook. Once built, however, I installed Windows on it (because I don't have an Intel-based Mac) and began playing with different operating systems. The file server idea got delayed, and I'm just about ready to pick it up again. The problem is, I will periodically need to use Windows, which won't be able to access my ZFS pool from my server. I can't get around this until ZFS for Windows is available, so I'm thinking about the following solution (barring any compatibility conflicts).


Current hardware:
  • 1 500GB SATA HD
  • 1 150GB PATA HD
  • 1 100GB PATA HD
Partition Idea:
  • Use the 100GB HD as the OS drive, partitioned into the following:
      • 10GB Solaris Partition - (File-server OS - either Solaris Nevada or OpenSolaris Indiana)
      • 5-10GB *nix Partition - (for Indiana or Ubuntu)
      • 10GB Windows Vista Partition
      • 20-25GB Test OS Partition
      • 45-50GB (approx.) Windows XP Partition
  • Use the 150GB drive as Windows-compatible storage (NTFS)
  • Use the 500GB SATA drive and additional to-be-purchased drives for the ZFS storage pool
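
A quick sanity check on the OS drive layout: adding up the low and high ends of the partition sizes listed above shows how much slack there is on the 100GB drive. The labels in this small Python snippet are just shorthand for the list entries, and it ignores the difference between marketed and formatted capacity.

```python
# Partition sizes in GB as (low, high) ranges, taken from the list above.
PARTITIONS = {
    "Solaris": (10, 10),
    "*nix": (5, 10),
    "Windows Vista": (10, 10),
    "Test OS": (20, 25),
    "Windows XP": (45, 50),
}
DISK_GB = 100


def total_range(partitions):
    """Sum the low ends and the high ends of all partition size ranges."""
    low = sum(lo for lo, _hi in partitions.values())
    high = sum(hi for _lo, hi in partitions.values())
    return low, high


low, high = total_range(PARTITIONS)
print(f"planned: {low}-{high} GB on a {DISK_GB} GB drive")
```

The plan adds up to 90-105GB, so the upper ends of the ranges cannot all be used at once; at least 5GB has to come out of one of the flexible partitions.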
Backup Strategy:
I have a couple of options when it comes to a potential backup strategy for my new fileserver, which I will explore below. Assume the backup sources are running either the HFS+ or NTFS filesystem, and the backup target is running ZFS.
  • rsync: Use rsync to make delta backups to the file-server (no incrementals or old data would be kept)
  • rsync + ZFS Snapshots: Use rsync to make delta backups to the file-server, and automatic ZFS snapshots to keep incrementals
  • rsyncsnapshot: Script uses rsync to make delta backups to the file-server, which keeps incrementals through rotated hard-links.
  • rdiff-backup: Makes delta backups to file-server, and keeps incrementals as diffs
  • ZFS send/receive: Use built in ZFS commands to send delta ZFS file-system snapshots from a ZFS source to either a ZFS target or an archive file (incrementals kept via snapshot on ZFS target; no incrementals kept in archive file?)
I would obviously like to take advantage of aspects of the ZFS filesystem, specifically snapshots, so this sways my choice a little bit, but here are some pros and cons for the differing solutions:
  • rsync: Pros (uses rsync engine, fast) Cons (no incrementals)
  • rsync + ZFS: Pros (uses rsync engine, fast, ZFS snapshots for incrementals) Cons (???)
  • rsyncsnapshot: Pros (uses rsync engine) Cons (slow?, hard-link incrementals?)
  • rdiff-backup: Pros (incrementals as diffs) Cons (slow, bad Mac meta-data support, must use rdiff-backup to browse/restore diffs)
  • ZFS Send/Receive: Pros (all ZFS) Cons (source filesystems not running ZFS)
So, I like the rsync engine. It performs very well, and has perfect Mac meta-data support. However, plain rsync won't work, since it doesn't keep incrementals. I still need to do some more testing with rsyncsnapshot, but my limited testing indicates it is dog slow. The most cons seem to be with rdiff-backup, and I can't use plain ZFS send/receive because the source to be backed up is not yet running ZFS.

This leaves me with one very promising option: rsync + ZFS. I get to use rsync's delta copies combined with ZFS's efficient snapshot feature. A seeming win-win! Here's the only problem I can foresee: I have to find some sort of script to either automatically create a ZFS snapshot on the target upon completion of an rsync backup, or have the target automatically take snapshots at certain times (and name them after the time). Furthermore, I will need a script that can manage my snapshots automatically, keeping a given number of snapshots (such as 24 hourlies, 7 dailies, and 4 weeklies) and deleting snapshots that exceed this. It is too complicated and time-consuming to manually manage the barrage of ZFS snapshots that would be created.
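
The retention rule described above (24 hourlies, 7 dailies, 4 weeklies) can be sketched in Python as a pure function over snapshot timestamps. This is one possible interpretation, not an existing tool: it keeps the newest snapshot from each of the 24 most recent hours, 7 most recent days, and 4 most recent ISO weeks that contain a snapshot, and marks everything else as expired. The function names are made up, and actually removing the expired snapshots (for example with "zfs destroy") is left out on purpose.

```python
from datetime import datetime, timedelta


def select_keepers(times, hourly=24, daily=7, weekly=4):
    """Return the set of snapshot times to keep under the retention rule."""
    rules = [
        (hourly, lambda t: (t.year, t.month, t.day, t.hour)),
        (daily, lambda t: (t.year, t.month, t.day)),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, ISO week)
    ]
    newest_first = sorted(times, reverse=True)
    keep = set()
    for limit, bucket_of in rules:
        seen = []
        for t in newest_first:
            bucket = bucket_of(t)
            if bucket in seen:
                continue          # a newer snapshot already represents this bucket
            if len(seen) == limit:
                break             # enough buckets kept for this rule
            seen.append(bucket)
            keep.add(t)
    return keep


def select_expired(times, **limits):
    """Return the snapshot times that fall outside the retention rule, oldest first."""
    keep = select_keepers(times, **limits)
    return sorted(t for t in times if t not in keep)


if __name__ == "__main__":
    newest = datetime(2008, 4, 22, 23)
    # Sixty days of hourly snapshots, ending at `newest`.
    snapshots = [newest - timedelta(hours=i) for i in range(24 * 60)]
    expired = select_expired(snapshots)
    print(len(snapshots) - len(expired), "kept,", len(expired), "expired")
```

Because a single snapshot can satisfy more than one rule (the newest one is an hourly, a daily, and a weekly at once), the number kept is usually smaller than 24 + 7 + 4.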

BackupBouncer Metadata tests

Thursday, April 03, 2008

I've recently been experimenting with different backup tools in order to find something more effective than Time Machine to use for my daily backup. Backup-Bouncer is an amazing test suite created by Nate Gray for the sole purpose of checking how well backup and file copy tools preserve all of the different types of metadata inherent in Mac OS X. Like most people, I consider a backup tool to be ineffective if metadata is not preserved or is handled incorrectly. After scouring the tubes for Backup-Bouncer results for common *nix backup tools, and not finding any, I decided to run my own tests and came up with some very interesting results:

Ditto (on 10.5):

Verifying:    basic-permissions ... ok
Verifying:           timestamps ...
   Sub-test:    modification time ... ok
ok
Verifying:             symlinks ... ok
Verifying:    symlink-ownership ... FAIL
Verifying:            hardlinks ... ok
Verifying:       resource-forks ... ok
Verifying:         finder-flags ... ok
Verifying:         finder-locks ... ok
Verifying:        creation-date ... FAIL
Verifying:            bsd-flags ... FAIL
Verifying:       extended-attrs ...
   Sub-test:             on files ... ok
   Sub-test:       on directories ... ok
   Sub-test:          on symlinks ... ok
ok
Verifying: access-control-lists ...
   Sub-test:             on files ... FAIL
   Sub-test:              on dirs ... ok
FAIL
Verifying:                 fifo ... ok
Verifying:              devices ... ok
Verifying:          combo-tests ...
   Sub-test:  xattrs + rsrc forks ... ok
   Sub-test:     lots of metadata ... FAIL
FAIL


rdiff-backup 1.1.15:

Verifying:    basic-permissions ... ok
Verifying:           timestamps ...
   Sub-test:    modification time ... ok
ok
Verifying:             symlinks ... ok
Verifying:    symlink-ownership ... ok
Verifying:            hardlinks ... ok
Verifying:       resource-forks ... ok
Verifying:         finder-flags ... FAIL
Verifying:         finder-locks ... FAIL (edited: originally ok)
Verifying:        creation-date ... ok
Verifying:            bsd-flags ... ok (edited: originally failed)
Verifying:       extended-attrs ...
   Sub-test:             on files ... ok
   Sub-test:       on directories ... ok
   Sub-test:          on symlinks ... FAIL (edited: originally ok)
ok
Verifying: access-control-lists ...
   Sub-test:             on files ... FAIL (edited: originally ok)
   Sub-test:              on dirs ... FAIL (edited: originally ok)
ok
Verifying:                 fifo ... ok
Verifying:              devices ... ok
Verifying:          combo-tests ...
   Sub-test:  xattrs + rsrc forks ... ok
   Sub-test:     lots of metadata ... FAIL
FAIL


Note: I have updated the above rdiff-backup test result to reflect some new results I've received. I think I must have made an error on the previous test, because the above results have been consistent through numerous trials. Additionally, I have discovered through some research that rdiff-backup doesn't (yet) support ACLs on Mac OS X with the pylibacl Python extension, which would explain the ACL failures. rdiff-backup uses the Python xattrs library, which seems to not fully support attributes on symlinks, or finder-flags and finder-locks (the finder-locks behavior may be on purpose, to allow file deletion by the app). It's possible I'm using the xattrs library wrong, so I will conduct further tests and report back. I would really like to use rdiff-backup, but wish its Mac support was stronger!


Rsync-3.0.1 pre2 (Patched):

Verifying:    basic-permissions ... ok
Verifying:           timestamps ...
   Sub-test:    modification time ... ok
ok
Verifying:             symlinks ... ok
Verifying:    symlink-ownership ... ok
Verifying:            hardlinks ... ok
Verifying:       resource-forks ... ok
Verifying:         finder-flags ... ok
Verifying:         finder-locks ... ok
Verifying:        creation-date ... ok
Verifying:            bsd-flags ... ok
Verifying:       extended-attrs ...
   Sub-test:             on files ... ok
   Sub-test:       on directories ... ok
   Sub-test:          on symlinks ... ok
ok
Verifying: access-control-lists ...
   Sub-test:             on files ... ok
   Sub-test:              on dirs ... ok
ok
Verifying:                 fifo ... ok
Verifying:              devices ... ok
Verifying:          combo-tests ...
   Sub-test:  xattrs + rsrc forks ... ok
   Sub-test:     lots of metadata ... ok
ok


rsync (Apple supplied leopard default with -E flag):

Verifying:    basic-permissions ... ok
Verifying:           timestamps ...
   Sub-test:    modification time ... ok
ok
Verifying:             symlinks ... ok
Verifying:    symlink-ownership ... ok
Verifying:            hardlinks ... FAIL
Verifying:       resource-forks ... ok
Verifying:         finder-flags ... ok
Verifying:         finder-locks ... FAIL
Verifying:        creation-date ... FAIL
Verifying:            bsd-flags ... ok
Verifying:       extended-attrs ...
   Sub-test:             on files ... ok
   Sub-test:       on directories ... ok
   Sub-test:          on symlinks ... FAIL
FAIL
Verifying: access-control-lists ...
   Sub-test:             on files ... ok
   Sub-test:              on dirs ... ok
ok
Verifying:                 fifo ... ok
Verifying:              devices ... ok
Verifying:          combo-tests ...
   Sub-test:  xattrs + rsrc forks ... ok
   Sub-test:     lots of metadata ... ok
ok


As these tests show, the standard Apple-provided and modified backup tools, rsync and ditto, fail the test! Ditto performed the worst, failing to copy 5 different types of metadata. rdiff-backup performs slightly better. However, the champion would have to be the patched rsync 3.0.1, which passed every single test thrown at it with flying colors - hands down the best performance I've seen from any backup or copy tool!
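
For anyone who wants to tally results like these without counting by hand, here is a small Python sketch that pulls the failed top-level tests out of a Backup-Bouncer report. It assumes the plain-text layout shown in this post ("Verifying: name ... result", with the overall result on its own line when a test has sub-tests); the function name is made up, and the sample data is the Ditto run from above.

```python
DITTO_REPORT = """\
Verifying:    basic-permissions ... ok
Verifying:           timestamps ...
   Sub-test:    modification time ... ok
ok
Verifying:             symlinks ... ok
Verifying:    symlink-ownership ... FAIL
Verifying:            hardlinks ... ok
Verifying:       resource-forks ... ok
Verifying:         finder-flags ... ok
Verifying:         finder-locks ... ok
Verifying:        creation-date ... FAIL
Verifying:            bsd-flags ... FAIL
Verifying:       extended-attrs ...
   Sub-test:             on files ... ok
   Sub-test:       on directories ... ok
   Sub-test:          on symlinks ... ok
ok
Verifying: access-control-lists ...
   Sub-test:             on files ... FAIL
   Sub-test:              on dirs ... ok
FAIL
Verifying:                 fifo ... ok
Verifying:              devices ... ok
Verifying:          combo-tests ...
   Sub-test:  xattrs + rsrc forks ... ok
   Sub-test:     lots of metadata ... FAIL
FAIL
"""


def failed_tests(report):
    """Return the names of top-level tests whose overall result is FAIL."""
    failed = []
    pending = None  # test whose overall result appears on a later line
    for raw in report.splitlines():
        line = raw.strip()
        if not line or line.startswith("Sub-test:"):
            continue  # blank lines and sub-test details are ignored
        if line.startswith("Verifying:"):
            name, _, result = line[len("Verifying:"):].partition("...")
            name, result = name.strip(), result.strip()
            if not result:
                pending = name
                continue
            pending = None
            if result.split()[0] == "FAIL":
                failed.append(name)
        elif pending is not None:
            if line.split()[0] == "FAIL":
                failed.append(pending)
            pending = None
    return failed


if __name__ == "__main__":
    names = failed_tests(DITTO_REPORT)
    print(len(names), "failed:", ", ".join(names))
```

Run against the Ditto report, this lists symlink-ownership, creation-date, bsd-flags, access-control-lists and combo-tests, which matches the count of 5 given above.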


Note: I didn't test duplicity or rsnapshot, but I assume duplicity would perform like rdiff-backup (it uses the same core) and rsnapshot like rsync (since it uses rsync - just make sure it's using rsync 3!). Box-backup is a little more advanced/complicated than I want to delve into at the moment, so I didn't test it.

New Backup Strategy?

Tuesday, April 01, 2008

I've been doing a lot of research recently to determine the best way to improve my current backup scheme, which consists of Mozy for online backups and Apple's Time Machine for local backups. I've documented my Mozy issues in a different post, so I will briefly comment on my problems with Time Machine: it's slow, resource intensive, and unreliable; and it's slow! I'm currently backing up to a USB drive connected as an AirDisk to my Apple Airport Express Basestation. It's very convenient (always available, and I don't have to constantly run my home server), but slow - not just over 802.11g, but also when connected via gigabit ethernet. However, my bigger problem lies with Time Machine (not AirDisk), although both could be a lot faster. Did I mention Time Machine was slow? Anyway, here are the backup apps I'm looking at:

  • rsync 3.0.1: The gold standard - delta encoding; perfect Mac OS meta-data support; SSH remote transfer; files browsable on the target system (Cons: no versioning; no built-in encryption or compression???)
  • rsnapshot: Configuration front-end to rsync; makes versioned incremental backups using hardlinks; browsable. (Cons: no encryption)
  • rdiff-backup: Incremental; versioned; SSH; uses librsync delta encoding; increments stored as diffs (Cons: no encryption; increments stored in proprietary diffs - not browsable & requires rdiff-backup for restore; bad Mac meta-data support)
  • duplicity: Uses librsync; incremental, encrypted (GPG), and compressed (tar); increments saved as diffs (like rdiff-backup); Amazon S3 support (Cons: encrypted and tar'ed on server - not browsable and requires duplicity to restore; no versioning!)
  • Box Backup: Continuous data protection; requires complex daemon setup?; file-system server
One problem I have is with restoration: I'm not a CLI guru, and would like to have to use it as little as possible. Therefore, backup programs that save files as a regular filesystem directory that can be browsed/restored with GUI tools (instead of the app itself) are preferable. (I'm not sure how easy restoring from an rdiff-backup diff file would be without using the app to combine the files.)

Now, all I have to do is find a good online storage/hosting provider that doesn't charge too much, but that will be saved for another post.