Best backup practices

Andrius · August 8, 2017, 10:23pm

Hello,

I am getting tired of backing up my collection manually and am looking for a good open source backup solution.
Probably the majority of the community members use rsync but I am not sure if it is going to work for me.
My problem is that I do not just keep adding files to the collection or changing them but I also delete pictures constantly. So the only way for me to be sure that the backup is up to date is to delete the backup from an external HDD and copy the whole collection there again.

Is rsync capable of doing this? What would be my command?
Any tips would be greatly appreciated.
OS: openSUSE Tumbleweed; file systems: source - ext4, backup (external HDD) - NTFS

paperdigits · August 8, 2017, 10:33pm

When I need a snapshot using rsync, I use rsync -arvP --delete source-dir dest-dir. --delete removes files from the copy in dest-dir once they no longer exist in source-dir. Is that good practice, though? What if you accidentally delete a file?

I’ve had good luck in the past using a program called BorgBackup (borg), which supports deduplication and compression; that should help a bit with disk space.

DavidOliver · August 9, 2017, 1:06am

If you want to delete the backup copies of deleted pictures then the --delete option of rsync should work well, subject to the warning from @paperdigits.

If you want to retain the backup copies of deleted pictures but also maintain a mirror-like copy of the source directory, rdiff-backup could be a nice and simple option.

The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.

E.g.:

rdiff-backup --print-statistics /path/to/source-dir /path/to/backup-dir
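Because the backup directory holds both the mirror and the reverse increments, restoring an old version or thinning out old increments is one command each. A sketch, assuming the classic rdiff-backup 1.x/2.0 command line (later releases add backup/restore/remove subcommands and deprecate this form); the rdb_* helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers around the classic rdiff-backup command line.

# rdb_backup SRC DEST: refresh the mirror in DEST and keep reverse
# increments for everything changed or deleted since the previous run.
rdb_backup() {
    rdiff-backup --print-statistics "$1" "$2"
}

# rdb_list DEST: show which increments (points in time) are available.
rdb_list() {
    rdiff-backup --list-increments "$1"
}

# rdb_restore FILE_IN_DEST AGE OUT: restore a file as it was AGE ago,
# e.g. 10D (ten days) or 2W (two weeks); OUT must not exist yet.
rdb_restore() {
    rdiff-backup -r "$2" "$1" "$3"
}

# rdb_prune DEST AGE: delete increments older than AGE, e.g. 8W.
# --force is needed when more than one increment would be removed at once.
rdb_prune() {
    rdiff-backup --force --remove-older-than "$2" "$1"
}
```

Run rdb_prune after each backup to keep only "the increments for X weeks".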

floessie · August 9, 2017, 7:50am

I do basically the same. But I have two (LUKS-encrypted) backup disks, one at home and one at work, which I swap after an rsync. So if I’ve recently deleted something I still need, I still have it on the second disk. (And if my house burns down, at least I still have my data.)

Additionally I store my photos on 25GB M-Disc BluRays in my cellar.

There’s also rsnapshot which we use with two disks for our build server at work. It’s not our primary store for sources, but it has our self-compiled toolchains, which are valuable.

houz · August 9, 2017, 8:41am

I am doing my occasional backups with this script:

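#!/bin/sh

# Exit as soon as any command fails
set -e

# Paths to back up, separated by spaces; ${source} is left unquoted below
# so the shell splits it into one argument per path
source="/home /etc /opt/foo /opt/virtualbox"
# Backup root, relative to the current directory (run from the USB disk)
target="backup"
today=$(date +%Y-%m-%d)

# -R keeps each source's full path under the dated folder; files unchanged
# since the previous run are hard-linked from "last" instead of copied again
rsync -avR --delete ${source} --exclude-from exclude.txt "${target}/${today}" --link-dest="$(pwd)/${target}/last"
cd "${target}"
# Point "last" at today's snapshot so the next run links against it
ln -nsf "${today}" "last"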

with exclude.txt looking like this (there are some more directories with big stuff I don’t care about):

/home/houz/.cache

Both files live on an external USB disk, and each run produces a complete backup. However, files that haven’t changed since the last run are hard links to the previous copy, so they take no extra disk space. And because they are hard links, I can delete old folders without losing anything that newer ones still reference.

floessie · August 9, 2017, 8:50am

Didn’t know --link-dest. Thanks for the hint!

Morgan_Hardwood · August 9, 2017, 9:42am

Since this is for public eyes, and since the title of this thread includes the phrase “best practices”, I wanted to comment that the script in its current form is only good for paths without spaces and is broken/unsafe if the paths contain spaces or unusual characters:

$ source="/opt/foo $HOME/my broken --holiday"
$ echo rsync --foo ${source} --bar
rsync --foo /opt/foo /home/morgan/my broken --holiday --bar

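A simple and readable solution is to put the paths into a bash array (plain /bin/sh has no arrays, so the shebang has to become #!/bin/bash) and to pass it to rsync as "${sources[@]}", which expands to exactly one argument per element. The printf here only makes those argument boundaries visible: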

$ sources=("/opt/foo" "$HOME/my fixed --holiday")
$ echo rsync --foo $(printf '"%s" ' "${sources[@]}") --bar
rsync --foo "/opt/foo" "/home/morgan/my fixed --holiday" --bar

Jonas_Wagner · August 9, 2017, 11:15am

I would strongly recommend doing incremental backups. The day you wake up to find your files corrupted or gone, check your backup and realize that your backups mirror the disaster is a pretty bad day.

I documented my own backup setup on my website:

rasimo · August 9, 2017, 12:27pm

For a long time I used backups based on rsync and hard links: incremental backups with file-level deduplication. For a while now I have been running borg and only keep the old setup in place for legacy purposes. Borg does block-level deduplication (for example, changing the EXIF information in an image only stores the chunk(s) containing the change again, not the whole file) and supports encryption and compression. So every backup you make is a “full backup” but only takes the space of a traditional “delta backup”. There is a prune command to remove old backups, so if you only want one backup, just prune down to one. Backups can be done locally or remotely, and restoring can be done via the command line or by FUSE-mounting a backup.
Development isn’t the fastest because it is very thorough, which is a good thing for a backup tool. The first release candidate for version 1.1.0 has just been released (after 6 beta releases).
borgbackup.org
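A minimal sketch of the workflow described above (initialise once, create an archive per run, prune old ones), assuming Borg 1.x syntax; Borg 2 changed the command line. The borg_* helper names and the retention numbers are hypothetical choices:

```shell
#!/bin/sh
# Hypothetical helpers; Borg 1.x syntax ("borg create REPO::ARCHIVE PATH...").

# borg_setup REPO: one-time creation of an encrypted repository; "repokey"
# stores the passphrase-protected key inside the repository itself.
borg_setup() {
    borg init --encryption=repokey "$1"
}

# borg_snapshot REPO PATH...: add a new archive named after host and time.
# Chunks already present in the repository are not stored again.
borg_snapshot() {
    repo=$1
    shift
    borg create --stats --compression lz4 "$repo::{hostname}-{now}" "$@"
}

# borg_thin REPO: keep 7 daily, 4 weekly and 6 monthly archives, drop the rest.
borg_thin() {
    borg prune --list --keep-daily=7 --keep-weekly=4 --keep-monthly=6 "$1"
}
```

To browse or restore, borg mount REPO /some/mountpoint exposes the archives as ordinary directories via FUSE (borg umount releases them), and borg extract restores from the command line.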

Andrius · August 9, 2017, 7:29pm

Thank you for sharing the command.
I usually don’t accidentally delete images and think 2-3 times before deleting one, plus digiKam has an internal Trash and the OS has a Trash too, so I am feeling pretty safe here.

Andrius · August 9, 2017, 7:31pm

That sounds like a great option. I will take a look at rdiff-backup for sure! Thanks for sharing!

Jonas_Wagner · August 9, 2017, 7:51pm

Shit happens: corrupt disks, software bugs, ransomware and good old stupidity.

dutch_wolf · August 9, 2017, 8:36pm

I’m currently using a multi-tier setup. First, snapper on top of a btrfs RAID10 creates a timeline of snapshots; the snapshots cover deletions, bugs and basic ransomware (they are read-only), while the RAID plus btrfs checksums should protect against bad disks. Besides this, I use borg to back up to an external drive (which runs ext4).

In this setup the only thing missing is a backup to a remote location.

Andrius · August 9, 2017, 9:32pm

I know. I have already had a hard drive die. I was lucky to be able to recover 99.5% of my collection using Recuva (proprietary software for Windows), but if things had gone worse that would have been a very, very bad day (the pictures covered around 5 years of my family’s life).

Andrius · August 9, 2017, 9:46pm

I think I am sold on your solution using rdiff-backup. I like the idea of having the mirror backup as well as the increments for X weeks.

paperdigits · August 9, 2017, 10:38pm

I would consider this to be a very solid strategy: http://www.dpbestflow.org/backup/backup-overview

rasimo · August 9, 2017, 10:53pm

And while we’re throwing backup best practices around, whatever tool or process you use:
A backup that you have never restored successfully is no backup.

houz · August 10, 2017, 8:51am

I know and don’t care. But maybe others who want to make use of it will benefit.

houz · August 10, 2017, 8:55am

That is FUD.
It might be worthwhile to check whether backups that pack stuff into their own file formats actually work, but with all the rsync-based backups you can easily check the files copied over, so actually restoring a system from them as a test is not needed.

rasimo · August 10, 2017, 9:20am

That’s harsh
That’s one obvious big advantage of rsync based backups: Restoring and backing up is the same thing → If you can back it up, you can restore. If you are really paranoid, you could still say that the destination medium might be broken, so it is still good to check that backups are actually readable.
For any tool that doesn’t “just” recreate the original data, this is most certainly not FUD. Stories about it abound on the internet, and I can confirm personally that it can happen. In my case it was only mangled filenames and lost file timestamps on restore, but that was annoying enough.