GitHub: davidsoergel

Twitter: @loraxorg

s3napback: Cycling, Incremental, Compressed, Encrypted Backups to Amazon S3

2008 May 07

In searching for a way to back up one of my Linux boxes to Amazon S3, I was surprised to find that none of the many backup methods and scripts I found on the net did what I wanted, so I wrote yet another one.

The design requirements were:

  • Occasional full backups, and daily incremental backups
  • Stream data to S3, rather than making a local temp file first (i.e., if I want to archive all of /home at once, there's no point in making a huge local tarball and doing lots of disk access in the process)
  • Break up large archives into manageable chunks
  • Encryption

As far as I could tell, no available backup script (including s3sync, backup-manager, and s3backup, among others) met all four requirements.

The closest thing is js3tream, which handles streaming and splitting, but not incremental backups or encryption. Those are both fairly easy to add, though, using tar and gpg, as suggested by the js3tream author. However, the s3backup.sh script he provides uses temp files (unnecessarily) and does not encrypt. So I modified it a bit to produce s3backup-gpg-streaming.sh.
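To make the streaming idea concrete, here is a minimal sketch of a tar-to-gpg-to-chunker pipeline. It is not the contents of s3backup-gpg-streaming.sh. The function name stream_backup, the recipient, and the chunk size are made up for illustration, and split stands in for the upload step, since js3tream's command line isn't shown here (so this version writes local chunk files where the real thing would send each chunk to S3).

```shell
# Sketch: archive a directory, encrypt it, and cut the stream into chunks,
# all through pipes, so no intermediate tarball is ever written.
# Usage: stream_backup SRC_DIR GPG_RECIPIENT OUT_PREFIX [CHUNK_BYTES]
stream_backup() {
    src_dir=$1
    recipient=$2                 # hypothetical GPG key ID or email address
    out_prefix=$3                # chunks are named OUT_PREFIXaa, OUT_PREFIXab, ...
    chunk_bytes=${4:-26214400}   # assumed default: 25 MiB per chunk

    # tar writes the archive to stdout, gpg encrypts stdin to stdout,
    # and split (a stand-in for the uploader) reads the result from stdin.
    tar -cf - -C "$(dirname "$src_dir")" "$(basename "$src_dir")" \
        | gpg --batch --encrypt --recipient "$recipient" \
        | split -b "$chunk_bytes" - "$out_prefix"
}
```

Restoring would run the same pipeline in reverse: concatenate the chunks in order, pipe them through gpg --decrypt, and then into tar -xf -.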

That's not the end of the story, though, since it leaves open the problem of managing the backup rotation. I found the explicit cron jobs suggested on the js3tream site too messy, especially since I sometimes want to back up a lot of different directories. Some other available solutions will send incremental backups to S3, but never purge the old ones, and so use ever more storage.
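One way to keep storage bounded is to write each backup into a fixed set of named slots that get overwritten cyclically. The sketch below only illustrates that idea and is not necessarily how s3napback names or schedules its backups. The function backup_slot and its three parameters are hypothetical.

```shell
# Sketch: map a day number onto a reusable backup slot.
# Each cycle is one full backup followed by DIFFS incremental ones,
# and only FULLS cycles are kept before the oldest is overwritten.
# Usage: backup_slot DAY_NUMBER DIFFS FULLS
backup_slot() {
    day=$1
    diffs=$2
    fulls=$3
    cycle_len=$((diffs + 1))
    cycle=$(( (day / cycle_len) % fulls ))   # which cycle's slots to reuse
    pos=$(( day % cycle_len ))               # position within the cycle
    if [ "$pos" -eq 0 ]; then
        echo "full-$cycle"
    else
        echo "diff-$cycle-$pos"
    fi
}
```

With 6 incrementals per full and 4 cycles kept, there are never more than 28 distinct slot names, so day 28 lands on the same slot as day 0 and replaces it. A day number could be derived from the clock with something like $(( $(date +%s) / 86400 )).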

Finally, I wanted to easily deal with MySQL and Subversion dumps.

The solution

I wrote s3napback, which wraps js3tream and solves all of the above issues by providing:

  • Dead-simple configuration
  • Automatic rotation of backup sets
  • Alternation of full and incremental backups (using "tar -g")
  • Integrated GPG encryption
  • No temporary files by default, only pipes and TCP streams (optionally, smallish temp files can be used to save memory)
  • Integrated handling of MySQL dumps
  • Integrated handling of Subversion repositories, and of directories containing multiple Subversion repositories.
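
For readers who haven't used "tar -g" before, here is a small self-contained sketch of how a full backup and a later incremental one relate. It assumes GNU tar (the -g option is short for --listed-incremental), and every path in it is a throwaway example, not something s3napback itself creates.

```shell
# Sketch: full and incremental archives with GNU tar's -g option.
work=$(mktemp -d)
mkdir "$work/data"
echo "first" > "$work/data/a.txt"

# Full backup: the snapshot file data.snar does not exist yet, so tar
# archives everything and records what it saw in the snapshot.
tar -g "$work/data.snar" -cf "$work/full.tar" -C "$work" data

sleep 1   # make sure the next file is clearly newer than the snapshot
echo "second" > "$work/data/b.txt"

# Incremental backup: the snapshot now exists, so tar archives only
# what is new or changed since the previous run, then updates it.
tar -g "$work/data.snar" -cf "$work/incr.tar" -C "$work" data

full_list=$(tar -tf "$work/full.tar")
incr_list=$(tar -tf "$work/incr.tar")
printf 'full:\n%s\nincremental:\n%s\n' "$full_list" "$incr_list"
```

The incremental listing should show the directory and b.txt but not a.txt. Deleting the snapshot file before a run is what turns the next backup back into a full one.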

It's not rocket science, just a wrapper that makes things a bit easier.

Check out the project page for more info and to download it!