Easy differential backups with rsync

I strongly believe that for any backup system to work, it needs to be automated and basically maintenance-free. People are just lazy, and having non-automated backups is a recipe for disaster. I believe Murphy’s Law applies to backups: the day you need your backups is the day after you forgot to make them.

This article deals with only one component of a backup system: keeping differential, or historical, copies of files. For example, /etc is a good directory to keep a few daily or weekly backup copies of. If you make a change that adversely affects the system, you can always go back a day or two.

Since keeping a bunch of full copies of /etc around would take up a lot of disk space, this approach uses rsync to make hardlinks to unchanged files (see also soft vs hard links). Only files that have changed are actually copied, yet each backup directory is a complete backup. The total size of the backups is the size of the source plus the size of every modified file kept over the span of time you retain backups for.
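To make the space accounting concrete, here is a minimal Python sketch (not part of the original script) that totals the space used by a set of snapshot directories, counting each inode only once. The function name `snapshot_usage` is hypothetical, and it ignores directory entries and filesystem block rounding, so it is an approximation rather than an exact disk-usage figure.

```python
import os


def snapshot_usage(backup_root: str) -> int:
    """Return the bytes of file data used by all snapshots under backup_root.

    Each file is counted once per unique (device, inode) pair, so a file
    hardlinked into many snapshots only contributes its size once.
    Directory entries and block overhead are ignored.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            # lstat so symlinks are counted as links, not followed.
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total
```

With two snapshots where one 100-byte file is shared by hardlink and a second file changed from 50 to 60 bytes, this reports 210 bytes rather than the 310 a naive per-file sum would give.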

I came up with this script many months ago, based on some work by Mike Rubel. It simply takes care of deleting old backups and invoking rsync. The first time it runs, rsync creates a full copy of the source directory. The next time it runs, rsync copies only the files that have been modified, and the rest are hardlinks to the files in the first backup. The benefit is that you can delete either backup directory, and the files will still exist in the other.
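The “deleting old backups” half of such a script could look something like the following Python sketch. The naming scheme (one directory per snapshot, named with a sortable ISO date such as 2024-03-01) and the helper `prune_old_backups` are assumptions for illustration, not the author’s actual script.

```python
import os
import shutil


def prune_old_backups(backup_root: str, keep: int) -> list[str]:
    """Delete all but the `keep` newest snapshot directories in backup_root.

    Assumes (hypothetically) that sorting snapshot names sorts them by age,
    e.g. ISO dates like "2024-03-01". Returns the names that were removed.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    snapshots = sorted(
        name for name in os.listdir(backup_root)
        if os.path.isdir(os.path.join(backup_root, name))
    )
    doomed = snapshots[:-keep]  # everything except the newest `keep`
    for name in doomed:
        # rmtree only unlinks these names; data that newer snapshots
        # still hardlink to stays on disk.
        shutil.rmtree(os.path.join(backup_root, name))
    return doomed
```

Because removal only unlinks names, pruning the oldest snapshot never loses a file that a newer snapshot still references.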

Every file you see in a *nix file system is a hardlink. A hardlink is basically a pointer to the actual data on the disk (the file’s inode), so it is possible to have multiple hardlinks to the same piece of data, though they are limited to the same filesystem. A file is ‘deleted’ only when no more hardlinks point to it and no process still has it open. That is why deleting is often called ‘unlinking’, and also why it’s often still possible to recover the data.
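The link-count bookkeeping is easy to observe from Python, whose `os.link`, `os.unlink`, and `os.stat(...).st_nlink` map directly onto the underlying system calls. This small demo (the function name `demo_link_counts` is hypothetical) creates a file, gives it a second name, then removes the first name:

```python
import os


def demo_link_counts(directory: str) -> tuple[list[int], bool, str]:
    """Show how st_nlink changes as names are added and removed.

    Returns (link counts observed, whether both names shared one inode,
    contents read back through the surviving name).
    """
    original = os.path.join(directory, "original")
    alias = os.path.join(directory, "alias")
    with open(original, "w") as f:
        f.write("hello\n")
    counts = [os.stat(original).st_nlink]      # 1: a single name
    os.link(original, alias)                   # second hardlink, same inode
    counts.append(os.stat(original).st_nlink)  # 2
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    os.unlink(original)                        # "delete" the first name
    counts.append(os.stat(alias).st_nlink)     # back to 1; data still there
    with open(alias) as f:
        data = f.read()
    return counts, same_inode, data
```

Run in a scratch directory, the counts go 1, 2, 1, and the data is still readable through `alias` after `original` is unlinked.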

Now, the great thing about hardlinks for rsync is that each new backup can simply create hardlinks that point to the same data as the previous backup’s hardlinks (if the file is unchanged), or create a new file otherwise. When an old backup copy is deleted, there are still hardlinks in the newer copies pointing to the data, so it isn’t really deleted. This continues until the file changes, at which point a new copy is made.
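The following sketch mimics, in plain Python, the process just described: for each file, hardlink to the previous snapshot’s copy if it is unchanged, otherwise make a real copy. It is an illustration only, not how rsync is implemented. `make_snapshot` is a hypothetical name; “unchanged” here means byte-for-byte identical, whereas rsync’s default quick check compares size and modification time and only links files whose preserved attributes (such as permissions) also match. Symlinks, special files, and directory metadata are not handled.

```python
import filecmp
import os
import shutil
from typing import Optional


def make_snapshot(source: str, new_snap: str, prev_snap: Optional[str]) -> dict[str, str]:
    """Copy `source` into `new_snap`, hardlinking files unchanged since `prev_snap`.

    Returns {relative path: "linked" or "copied"}. Only regular files and
    directories are handled; an illustration, not a replacement for rsync.
    """
    actions: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(new_snap, rel_dir)), exist_ok=True)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(source, rel)
            dst = os.path.join(new_snap, rel)
            old = os.path.join(prev_snap, rel) if prev_snap else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add another name for the existing data, no copy.
                os.link(old, dst)
                actions[rel] = "linked"
            else:
                # New or modified: store a real copy (copy2 keeps timestamps).
                shutil.copy2(src, dst)
                actions[rel] = "copied"
    return actions
```

After two runs where only one file changed, deleting the first snapshot leaves the second fully readable, since its unchanged files are hardlinks to the same data.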

Rsync has support for creating hardlinks in this manner, so my backup script is nothing more than some glue to piece things together.
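The rsync feature that provides this is its `--link-dest=DIR` option, which hardlinks files in the destination to unchanged copies found in DIR. The script itself isn’t shown here, so the following is only a sketch of what such glue might look like, assuming one snapshot directory per day named with an ISO date under a backup root. `rsync_snapshot_cmd` and `previous_snapshot` are hypothetical helpers that build the command rather than run it.

```python
import os
from typing import Optional


def rsync_snapshot_cmd(source: str, backup_root: str, today: str,
                       previous: Optional[str]) -> list[str]:
    """Build an rsync command that writes snapshot `today` under backup_root.

    Files unchanged since the `previous` snapshot become hardlinks to it
    via --link-dest; everything else is copied.
    """
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # A relative --link-dest is resolved against the destination
        # directory, so pass an absolute path to avoid surprises.
        cmd.append("--link-dest=" + os.path.abspath(os.path.join(backup_root, previous)))
    # Trailing slash: copy the *contents* of source, not the directory itself.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(os.path.join(backup_root, today))
    return cmd


def previous_snapshot(backup_root: str, today: str) -> Optional[str]:
    """Return the newest snapshot whose name sorts before `today`, if any."""
    older = sorted(
        name for name in os.listdir(backup_root)
        if name < today and os.path.isdir(os.path.join(backup_root, name))
    )
    return older[-1] if older else None
```

A daily cron job could then run something like `subprocess.run(rsync_snapshot_cmd("/etc", root, today, previous_snapshot(root, today)), check=True)` and afterwards delete snapshots older than the retention window. Excluding `today` when looking up the previous snapshot keeps a second run on the same day from linking a snapshot against itself.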