Best backup practices

If you want to delete the backup copies of deleted pictures then the --delete option of rsync should work well, subject to the warning from @paperdigits.
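A minimal sketch of such a run (the paths are placeholders; a --dry-run first never hurts):

# preview what would be copied and what would be deleted from the backup
rsync -av --delete --dry-run /path/to/pictures/ /path/to/backup/
# then mirror for real
rsync -av --delete /path/to/pictures/ /path/to/backup/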

If you want to retain the backup copies of deleted pictures but also maintain a mirror-like copy of the source directory, rdiff-backup could be a nice and simple option.

The target directory ends up as a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files that were lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.

E.g.:

rdiff-backup --print-statistics /path/to/source-dir /path/to/backup-dir
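Restoring works with the same tool; for instance, to get a file back as it was two weeks ago (the path and the 2W time spec are only examples):

rdiff-backup -r 2W /path/to/backup-dir/some/file /path/to/restored-file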

I do basically the same, but I have two (LUKS-encrypted) backup disks: one at home, one at work, which I swap after an rsync. So if I’ve recently deleted something I still need, I still have it on the second disk. (And if my house burns down, I at least still have my data.)
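The per-disk routine is roughly the following sketch (device name, mapper name and mount point are placeholders):

# unlock and mount the encrypted backup disk
cryptsetup open /dev/sdX1 backupdisk
mount /dev/mapper/backupdisk /mnt/backup

# mirror the photo collection onto it
rsync -av --delete ~/Pictures/ /mnt/backup/pictures/

# unmount and lock it again before swapping the disk
umount /mnt/backup
cryptsetup close backupdisk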

Additionally, I store my photos on 25 GB M-Disc Blu-rays in my cellar.

There’s also rsnapshot which we use with two disks for our build server at work. It’s not our primary store for sources, but it has our self-compiled toolchains, which are valuable.
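The relevant bits of an rsnapshot.conf look roughly like this (paths and retention counts are made up, and rsnapshot insists on tabs, not spaces, between fields):

snapshot_root	/mnt/backupdisk/snapshots/
retain	daily	7
retain	weekly	4
backup	/home/	localhost/
backup	/opt/toolchains/	localhost/

The actual runs (rsnapshot daily, rsnapshot weekly) are then triggered from cron.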


I am doing my occasional backups with this script:

#!/bin/sh

set -e

source="/home /etc /opt/foo /opt/virtualbox"
target="backup"
today=$(date +%Y-%m-%d)

# copy all sources into a dated directory; files unchanged since the last run
# become hard links into the previous backup thanks to --link-dest
rsync -avR --delete ${source} --exclude-from exclude.txt "${target}/${today}" --link-dest="$(pwd)/${target}/last"
cd "${target}"
# repoint the "last" symlink at the newest backup for the next run
ln -nsf "${today}" "last"

with exclude.txt looking like this (there are some more directories with big stuff I don’t care about):

/home/houz/.cache

The two files are on an external USB disk, and each run results in a complete backup. However, files that didn’t change since the last run are hard links to the old version, so unchanged files don’t take extra disk space. And since they are hard links, I can just delete old folders without losing anything referenced from newer ones.


Didn’t know --link-dest. Thanks for the hint! :+1:


Since this is for public eyes, and since the title of this thread includes the phrase “best practices”, I wanted to comment that the script in its current form is only good for paths without spaces and is broken/unsafe if the paths contain spaces or unusual characters:

$ source="/opt/foo $HOME/my broken --holiday"
$ echo rsync --foo ${source} --bar
rsync --foo /opt/foo /home/morgan/my broken --holiday --bar

A simple and readable solution (bash, not plain sh) is to put the paths into an array and expand it quoted; "${sources[@]}" then yields exactly one argument per element, no matter what the paths contain:

$ sources=("/opt/foo" "$HOME/my fixed --holiday")
$ printf '<%s> ' rsync --foo "${sources[@]}" --bar; echo
<rsync> <--foo> </opt/foo> </home/morgan/my fixed --holiday> <--bar>

so the real call simply becomes rsync --foo "${sources[@]}" --bar.

I would strongly recommend doing incremental backups. The day you wake up to find your files corrupted or gone, check your backup, and realize that your backups mirror the disaster is a pretty bad day.

I documented my own backup setup on my website:


For a long time I used backups based on rsync and hard links for incremental backups with file-level deduplication. For some time now I have been running borg and only keep the old setup in place for legacy purposes. Borg does block-level deduplication (meaning that changing the EXIF information in an image only costs one block, a couple of kB, of extra data) and supports encryption and compression. So every backup you make is a “full backup” but only takes the space of a traditional “delta backup”. There is a prune command to remove old backups, so if you only want one backup, just prune to keep only one. Backups can be done locally or remotely, and restoring can be done via the command line or by fuse-mounting a backup.
The development isn’t the fastest, because it is very thorough, which is a good thing for a backup tool. The first release candidate for version 1.1.0 has just been released (after 6 beta releases).
borgbackup.org
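A rough sketch of the usual borg workflow (repository path, archive names and retention numbers are just examples):

# create an encrypted repository once
borg init --encryption=repokey /mnt/backupdisk/borg-repo

# every backup is a new archive; already-known blocks are not stored again
borg create --stats --compression lz4 /mnt/backupdisk/borg-repo::pictures-$(date +%Y-%m-%d) ~/Pictures

# thin out old archives
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backupdisk/borg-repo

# list archives and fuse-mount one to restore files
borg list /mnt/backupdisk/borg-repo
borg mount /mnt/backupdisk/borg-repo::pictures-2017-07-01 /mnt/borg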


Thank you for sharing the command.
I usually don’t accidentally delete images and think two or three times before deleting one. Plus, digiKam has an internal Trash and the OS has a Trash too, so I am feeling pretty safe here.

That sounds like a great option. I will take a look at rdiff-backup for sure! Thanks for sharing!

Shit happens. From corrupt disks, bugs in software and ransomware to good old stupidity.

Currently I am using a multi-tier setup: first, snapper on top of a btrfs RAID10 creates a timeline set of snapshots. The snapshots cover deletions, bugs and basic ransomware (they are read-only), while the RAID plus btrfs checksums should protect against bad disks. Besides this I use borg to back up to an external drive (which runs ext4).
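For reference, the timeline part of such a setup boils down to a snapper config along these lines (config name, subvolume path and limits are just examples):

# create a config for the photo subvolume
snapper -c photos create-config /data/photos

# in /etc/snapper/configs/photos, the timeline settings then look roughly like:
TIMELINE_CREATE="yes"
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="6"
TIMELINE_LIMIT_YEARLY="0"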

In this setup the only thing missing is a backup to a remote location.

I know. I have already had a hard drive die. I was lucky to be able to recover 99.5% of my collection using Recuva (a proprietary program for Windows), but if things had gone worse, that would have been a very, very bad day (the pictures covered around 5 years of my family’s life).

I think I am sold on your solution using rdiff-backup. I like the idea of having the mirror backup as well as the increments for X weeks.

I would consider this to be a very solid strategy: http://www.dpbestflow.org/backup/backup-overview

And while we’re throwing backup best practices around: whatever tool/process you use:
A backup that you have never restored successfully is no backup.


I know and don’t care. But maybe others who want to make use of it will benefit. :grinning:

That is FUD.
It might be worthwhile to check whether backups that pack everything into their own file formats actually work, but with all the rsync-based backups you can easily check the files copied over, so actually restoring a system from them as a test is not needed.
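For what it’s worth, that kind of check can itself be done with rsync: a checksum-based dry run lists everything that differs between source and backup (paths are placeholders):

rsync -rcn --delete --itemize-changes /path/to/source/ /path/to/backup/latest/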

That’s harsh :stuck_out_tongue:
That’s one obvious big advantage of rsync-based backups: restoring and backing up are the same thing → if you can back it up, you can restore. If you are really paranoid, you could still say that the destination medium might be broken, so it is still good to check that backups are actually readable.
For any tool that doesn’t “just” recreate the original data, this is most certainly not FUD. Stories abound on the interweb, and I can personally confirm that it is a potential issue. In my case it was only mangled filenames and lost filesystem timestamps on restore, but that was annoying enough.

I have just withdrawn my last (very hasty) post. I will try to provide a more complete post on backup strategies later on. The first point of saving your data is data organisation: you have to think about how to organise your assets. The second point is versioning: as you will change or edit pictures, you have to think about versioning of files (there are various methods for doing this).
Third, think about a backup medium. A small NAS is quite affordable and can be mounted via fstab at boot time so it always provides the same path (see the example fstab line below).
Last is doing the backup itself: here the technology is secondary; the primary goal is to have a simple way to recover single lost files.
Especially: if you back up first but modify your source later on (say by deleting low-quality images), there has to be a safe way to sync this change to your backup directory. GitHub - bashforever/safeback: Bash script to backup directories and save differences in target structure in SAVE directories is one approach to do this (this is beta; an explanation follows later on).
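To illustrate the fstab mount mentioned above: an NFS share on the NAS could be mounted with a line roughly like this (hostname, export and mount point are made up):

nas.local:/volume1/backup  /mnt/backup  nfs  defaults,_netdev,noatime  0  0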

So just wait a little bit and I will provide my thoughts in more detail.

Cheers

Immanuel.

I always use rsync for backing up data.

Rsync with the “--delete” switch will delete files at the remote end that have been deleted at the source. Without “--delete”, files at the remote end will not be deleted. That is the crux of your problem.

For example, as long as the drives are formatted as ext and not FAT/NTFS:
$ rsync -ax --delete /source_folder/ /remote_folder/

(FAT/NTFS-formatted drives will require a mish-mash of extra flags for omitting permissions, etc.)
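Such a mish-mash could look roughly like this (an untested sketch: --modify-window papers over FAT’s coarse timestamps, and the --no-* options skip ownership/permission handling the filesystem cannot store):

rsync -rtv --modify-window=1 --no-perms --no-owner --no-group --delete /source_folder/ /remote_folder/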

Best backup practice is likely to use an external FireWire/USB 2/USB 3 hard drive. Even better, put a NAS in your garage in case your house catches fire, but then you’re likely using a slower gigabit wired network.

The current cost of commercial remote storage probably makes it infeasible for the amount of data that raw images accumulate. But that’s just my opinion.