a reasonable way to create backups on Linux



recently, i ran into an issue with the software i was using to run automated, versioned backups of a couple of folders on my computer. the backups randomly became impossible to read or write. thankfully i didn't need them, but i had to figure out a way to replace them.

first, i had to decide what i was backing up and how i was going to do it. in my case, i only need to back up a couple dozen gigabytes of various personal files from my home directory, and i need them backed up in a way that lets me recover an old version in case i make a mistake, or in case my computer explodes.

there are a lot of places where i could have stashed the data, though the three most relevant are cloud storage, a local NAS, and an external USB drive. cloud storage offers nearly infinite scalability, but it comes at a monthly cost, and you don't always know if you can trust the provider not to snoop on you. a local NAS has the benefit of being under your full control, but is much more expensive to deploy and scale as needed. an external USB HDD or SSD is cheap and easy to get, but it could easily be destroyed along with your computer by malware or a power supply failure. i personally use a local NAS, but you can really go with whatever option you want.

software is also an important consideration. many backup solutions use a proprietary[1] format to store the data, so it can be annoying to get the data back. encrypted backups sound good for security, but if you forget the password you might as well have never had a backup in the first place. because all i wanted was a simple backup that could be read easily, i chose rsync as the tool for the job.

the next question is how to call upon rsync. rsync is one of those ubiquitous old tools with an excessive number of options, so figuring out which ones you actually care about can be difficult. thankfully, it provides the "archive" flag (-a), which automatically sets some sane defaults, so we don't have to do that part ourselves.

the only other important flag to note is the "link-dest" flag (--link-dest), an obscure, mysterious flag which magically solves all of the problems i had. when called with a folder, it checks that folder for any files which happen to match what you're trying to back up. if they're already there, it hardlinks them into the new folder instead of copying them again, so they take up almost no extra space. if they're different in any way, the new version is copied over. using this, we can construct a versioned backup with ease.
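to see what link-dest actually does, here's a small sketch you can run in a throwaway directory. everything in it (the temp folder, the file names, the "day1" and "day2" snapshot names) is made up for the demo, and it assumes GNU stat is available. it takes two snapshots and compares inode numbers: an unchanged file should share one inode across both snapshots, while a changed file should get a fresh copy.

```shell
#!/bin/sh
# demo of --link-dest in a throwaway directory; all paths here are made up.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo"; exit 0; }

work="$(mktemp -d)"
mkdir -p "$work/src" "$work/dest"
echo "never changes" > "$work/src/same.txt"
echo "version 1"     > "$work/src/changing.txt"

# first snapshot: a plain full copy
rsync -a "$work/src/" "$work/dest/day1"

# change one file (the size differs, so rsync notices the change)
echo "version 2, now longer" > "$work/src/changing.txt"

# second snapshot: files that match day1 are hardlinked instead of copied.
# a relative --link-dest path is resolved against the destination folder.
rsync -a --link-dest=../day1 "$work/src/" "$work/dest/day2"

# same inode number = same data on disk
same_day1="$(stat -c %i "$work/dest/day1/same.txt")"
same_day2="$(stat -c %i "$work/dest/day2/same.txt")"
chg_day1="$(stat -c %i "$work/dest/day1/changing.txt")"
chg_day2="$(stat -c %i "$work/dest/day2/changing.txt")"
echo "same.txt:     $same_day1 vs $same_day2"
echo "changing.txt: $chg_day1 vs $chg_day2"
```

both snapshots still look like complete folders of plain files, which is the whole appeal.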

the last piece of our puzzle is the date command. it allows you to print today's date, or even yesterday's date! by saving our backup into a folder named with today's date, and checking the folder of yesterday's date with the link-dest flag, we can create a versioned backup using a single command run daily by a cron job. no complex scripts, no fancy tools. and since it's all hardlinks, we don't have to worry about one link in a chain of delta backups breaking and ruining all of it.
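if you want to check what date will hand you before trusting it with folder names, a quick sketch like this prints both values. it assumes GNU date (the one that ships with most Linux distros), and the variable names are just for illustration.

```shell
# -I prints an ISO 8601 date (YYYY-MM-DD); -d takes a human-readable date string
today="$(date -I)"
yesterday="$(date -d yesterday -I)"
echo "today:     $today"
echo "yesterday: $yesterday"
```

the YYYY-MM-DD format has a nice side effect: sorting the folder names alphabetically also sorts them by date.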


alright, but how do we actually do this?

first, you need to create an initial backup. this can take a while, so be patient. make sure to replace the [SOURCE] and [DEST] with your source and destination folders respectively.

/bin/sh
# -a: archive mode, -zz: compress data during transfer, -v: verbose output
rsync -azzv [SOURCE] [DEST]/"$(date -I)"

note the date part at the end. this will put your backup in a subfolder which has the name of the current date. pretty nice, eh? next, once your backup is done, edit your crontab to add a line like this.

crontab -e
# every day at midnight: back up into today's folder, hardlinking unchanged files from yesterday's
0 0 * * * rsync -azz --link-dest=../"$(date -d yesterday -I)" [SOURCE] [DEST]/"$(date -I)"

if you're connecting to your backup server over SSH, you can replace $(date -d yesterday -I) with something like $(ssh me@nas.local 'ls backup | tail -1') (swapping in your own user, host, and backup folder) to ensure that you're linking to the last backup made, instead of making a full backup any time you miss a day.
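the same "link to the newest folder" idea works for a locally mounted destination too. here's a rough sketch, where latest_backup is a hypothetical helper name i made up and the snapshot folders are fake ones created in a temp directory. it filters for names that look like YYYY-MM-DD so that stray files or folders in the destination don't get picked by accident.

```shell
# hypothetical helper: print the newest YYYY-MM-DD folder name inside a directory
latest_backup() {
    # ISO dates sort correctly as plain text, so the last entry is the newest
    ls "$1" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort | tail -n 1
}

# throwaway demo with made-up snapshot folders
demo="$(mktemp -d)"
mkdir "$demo/2024-01-30" "$demo/2024-02-02" "$demo/2023-12-31" "$demo/notes"
latest="$(latest_backup "$demo")"
echo "$latest"
```

a function like this can't live in a crontab line, so it would have to go in a small script that cron calls. it also needs to run before today's folder is created, otherwise the newest folder is the one you're about to write into.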

other than that, the only problem i can see with this method is that it could stop working in approximately 178.1 years (65,000 days), as ext4 will run out of hardlinks per file if a file remains unchanged over that entire duration.
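if you're curious how close a file is to that limit, its current hardlink count is easy to check. this sketch assumes GNU stat, and the temp file and the ".day2" / ".day3" names are stand-ins for the same unchanged file showing up in three daily snapshots.

```shell
# one file plus two extra hardlinks, standing in for three daily snapshots
f="$(mktemp)"
ln "$f" "$f.day2"
ln "$f" "$f.day3"

# %h prints the number of hardlinks pointing at the file's data
links="$(stat -c %h "$f")"
echo "hardlinks: $links"

# at one new link per day, the 65,000 link limit lasts this many whole years
years=$((65000 / 365))
echo "years until the limit: $years"
```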

i guess i'll have to find a better solution when that happens.


1. this is not meant in the software licensing sense, but in the sense that you can only read the backup with the software itself, instead of being able to read it as plain old files.