My Backup Process

A few people have asked me about the backup process I use, so this post covers how I manage my backups. Two things to note:

  • This post is rather long and boring, unless you’re into this sort of thing.
  • I’m generally happy with my approach, but I have no opinion at all about how you manage your backups. To each their own, and vive la différence.

A few updates to this old blog post:

  • In 2019, I installed a 12TB Synology NAS drive and added it to my backup process, but I'm still following the process outlined below. It's just another drive, but it's nice to have it instantly available from any of my laptops at home.
  • There are references below to 200,000 files in my backups and the use of 2-3 TB drives. As of 2021, I have nearly 500,000 files in my backups (about 95% photos), and I'm using drives in the 5-12TB range.

My Requirements

To understand why I use the approach that I use for managing backups, consider the scenario where I’m in a location with limited or non-existent internet access (our favorite cabin at Lake of the Woods, say, or on a long flight with no wireless available), and I want to take a look at photos I took at some point in the past. This could be because I’m working on a project that requires those photos, because the person sitting next to me asked about a place I’ve traveled to, or just because I feel like it; each of those has happened many times.

In fact, I am typing these words on Saturday afternoon at a cabin at Kalaloch on the Washington coast. As is often the case, I’m finding time to finish up something (this blog post) while off the grid. There is no internet connectivity available here, and for those who’d say “just use your cell phone’s data plan,” you guys really need to get out more. I have 0 or 1 bars of signal strength here on AT&T, and in any event it’s an old-fashioned Edge cellular network that doesn’t include data at all.

In these sorts of situations, I’d like to be able to put my hands on any of my photos pretty quickly. A delay of seconds is fine, but any delay of more than a minute is unacceptable to me. And … wait until I’m in a location with internet access? Or until I’m all the way back home? Are you kidding me? I haven’t had to do that for decades!

That scenario demonstrates what I’m trying to accomplish. And I’m willing to do a few things to make it happen:

  • I’m willing to be rigorously consistent in how I process and store photos (and other documents – photos make up most of my backup stream, but the basic principles covered here apply to everything I back up).
  • I’m willing to put in some time managing my backups (an hour a week, say).
  • I’m willing to take personal responsibility for rotating backup sets between physical locations. No, that’s too passive: I want to be the person responsible for that detail, because my backups matter far more to me than they matter to anyone else.

Some things never change

At a high level, I’ve been doing the exact same thing for 30 years in a row now:

  • I keep my content carefully organized in a folder hierarchy. (Back in CP/M and early DOS days, before folders, I would have stated that as “I keep my content carefully organized.”)
  • I use a temporary staging area for new content destined for my backup stream, and periodically (once a week or so) I move that into the backup stream.
  • I use the operating system’s native copy facility to copy my backup stream (or updates to it) to multiple copies of a complete backup set. I’ve had anywhere from 3 to 6 backup sets, and I’ve stored them at home, work, with friends or relatives, or in other locations.

This, in other words …

That’s it. That’s all I do. Note that there are a few things I don’t do, which some people see as being critical to backing up your stuff:

  • I never use “backup software” or “backup hardware” of any kind. I want the simplest approach possible, with the minimum number of components involved, since each component is a potential point of failure.
  • I never store diffs of anything: always complete backups. Diffs are another thing that can go wrong.
  • No compression, either, for the same reason.
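For readers who would rather script it, the routine above (staging area, then backup stream, then a complete copy to every backup set) can be sketched in Python. This is only an illustration under assumptions: the function names promote_staging and copy_to_backup_sets and the folder layout are hypothetical, and the post describes using the operating system's native copy facility rather than a script. The sketch copies plain files with no compression and no diffs.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def promote_staging(staging: Path, stream: Path) -> List[Path]:
    """Move every file from the staging area into the backup stream.

    The relative folder structure is preserved. Refuses to overwrite a
    file that already exists in the stream. Empty folders are left
    behind in the staging area.
    """
    moved = []
    # Materialize the file list first so moving files does not disturb the walk.
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        dst = stream / src.relative_to(staging)
        if dst.exists():
            raise FileExistsError(dst)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


def copy_to_backup_sets(stream: Path, backup_sets: Sequence[Path]) -> None:
    """Copy the whole stream, as ordinary files, onto each backup set.

    Every file is re-copied on each run, so this is a complete copy
    rather than an incremental one (requires Python 3.8+ for
    dirs_exist_ok).
    """
    for root in backup_sets:
        shutil.copytree(stream, root / stream.name, dirs_exist_ok=True)
```

A real run would pass the mount points of the external drives as backup_sets. Because the result on each drive is an ordinary folder tree, any device that can read the drive can read the backup.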

The devil is in the details, of course. For the curious, here are a few details …

Current backup hardware

I currently have four USB drives in rotation. They’re all 2TB or 3TB size, and my current favorite is the Western Digital Passport 2TB model, around $150. I always have one copy in my laptop bag with me, one copy in the car’s glove compartment, and two copies at home. I guess that means a bomb landing squarely on the house while I’m at home, with sufficient power to destroy the vehicles on the street as well as the house, would totally wipe out my backups. I’m comfortable with that risk.

I’ve always used whatever storage technology is mainstream at the time, including floppy disks (8”, then 5.25”, then 3.5”), Bernoulli cartridges, and ZIP disks. I’m currently using USB drives instead of FireWire because I want the flexibility of being able to use my backups from the widest possible range of devices.

Organizing and finding photos

The vast majority of what I back up is photographs, as mentioned earlier, and I have over 200,000 photos in my backup stream now. This includes not only photos from the digital era, but also scanned slides and negatives from film days, going all the way back to slides my Dad shot in the 1950s. A few years ago I did a big project to get to 100% digital in my photo backups, which meant having thousands of prints, slides, and negatives scanned by a scanning service.

So I have two groups of photos, each with different data available and therefore different approaches for organizing and finding them:

  • Photos taken with digital cameras are stored in a big folder hierarchy by year/month/date; for example, today’s photos will be stored in \photos\2013\04\01.
  • Scanned photos from other sources are stored in a set of folders by event, person, place, or other criteria. For most of those, I don’t know a specific date so they don’t fit into my folder hierarchy.
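The year/month/date convention for digital photos is easy to express in code. Here is a small Python sketch that builds the folder for a given shooting date. The function name photo_folder is hypothetical, and the \photos root is taken from the example above.

```python
from datetime import date
from pathlib import PureWindowsPath


def photo_folder(shot_on: date, root: str = r"\photos") -> PureWindowsPath:
    """Return the year\\month\\day folder for a photo taken on shot_on.

    Month and day are zero-padded to two digits so that folders sort
    chronologically by name.
    """
    return (
        PureWindowsPath(root)
        / f"{shot_on:%Y}"
        / f"{shot_on:%m}"
        / f"{shot_on:%d}"
    )


print(photo_folder(date(2013, 4, 1)))  # \photos\2013\04\01
```

The same function works in reverse for lookups: given a date found elsewhere, it yields the folder to open on the backup drive.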

So how do I ever find a photo of, say, the Eiffel Tower? Flickr is the key. I upload my best-of photos to my Flickr photostream, and tag them with whatever info I think will be handy for finding them later. When I want to find a photo of the Eiffel Tower, I search my photostream on Flickr for Eiffel Tower, and that gets me to my favorites from over the years. Then if I want other shots from those dates, or the original hi-res versions, I can use the date I’ve found on Flickr to drill down into my backup folders and get to those sorts of details.

With this approach, as with any approach, there are certain types of shots I can’t really search for. For example, suppose I want to look at “Seattle traffic” photos I’ve taken. I might have a few on Flickr, but they’re probably tagged inconsistently, and in any event they’re just a subset of those shots. That’s OK: I’m comfortable with my decisions about what to tag in the first place, and I can accept that some of my photos are difficult to find later.

The only way to avoid that entirely would be to tag every single photo with a variety of metadata, and I’m not willing to do that. Even if I could type the tags for each photo in an average of 10 seconds, that would add up to over three months of full-time work to tag all of my photos. Definitely not worth it to me; and keep in mind, I already have all of my favorites tagged, so that would just get me the non-favorites as well.
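The "over three months" figure can be checked with a quick calculation. The sketch below assumes the 200,000 photos and 10 seconds per photo mentioned in this post, plus a 40-hour work week, which is an assumption not stated above. The function name tagging_weeks is hypothetical.

```python
def tagging_weeks(photos: int, seconds_per_photo: float = 10,
                  hours_per_week: float = 40) -> float:
    """Return the full-time weeks needed to tag every photo."""
    hours = photos * seconds_per_photo / 3600  # total tagging time in hours
    return hours / hours_per_week


weeks = tagging_weeks(200_000)
print(f"{weeks:.1f} weeks")  # 13.9 weeks
```

At roughly 4.3 weeks per month, 13.9 weeks comes to a little over three months of full-time work.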

Outgrowing my laptops

One thing changed in 2012: the total size of my backup stream exceeded 1TB, the size of the hard drive in my personal laptop. So I no longer have a machine with all of my photos sitting on the hard drive, ready to go. It only takes a few seconds to plug in an external drive, so that’s not much of an issue when I want to retrieve something, but it complicates the backup process itself because I now have to copy my latest photos to four different drives. Not a big deal, but it does take a little longer than in the past.

I may set up a server at home to have a single “master copy” of my backups that I can clone to the backup drives, as I did up until late 2012. But I’m not in any hurry to do so, because what I’m doing works great for me, and I’ve become accustomed to the simplicity of living with only laptops and tablets. I spent 20 years building my PCs and tinkering with them, but since then I’ve gone over 10 years without a non-portable PC, and I’m not in any hurry to get another one. Vive la différence!

Footnote: two years after this blog post, I wrote a tool to do automated verification of all of my backup drives, so that I can be sure they’re 100% synchronized after each update, as covered in this blog post: