Backup Software

I took it to mean “the filesystem notices a change, and makes a backup of the file” (so it would be at a filesystem level, and work in all programs). But maybe I’m out cycling :slight_smile:


No backup program I know does this. The closest would be a file-sync program with a versioned trash. If properly configured (i.e. not the default config), Syncthing can do that, maybe with some caveats.

I agree with everybody saying filesync is not a backup in the absolute sense, but can be used as a temporary backup or an intermediate step.


Basically, that’s called an “incremental backup” or a “differential backup”.
See Incremental backup - Wikipedia and Differential backup - Wikipedia
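To make the distinction concrete, here’s a minimal sketch (with made-up file names and dates) of which files each strategy would select, assuming selection is done by comparing modification times:

```python
from datetime import datetime

# Hypothetical modification times for three files.
mtimes = {
    "a.txt": datetime(2024, 1, 1),
    "b.txt": datetime(2024, 1, 5),
    "c.txt": datetime(2024, 1, 9),
}

last_full = datetime(2024, 1, 2)         # time of the last full backup
last_backup = datetime(2024, 1, 7)       # time of the most recent backup of any kind

# Incremental: copy files changed since the last backup of any kind.
incremental = [f for f, t in mtimes.items() if t > last_backup]

# Differential: copy files changed since the last *full* backup.
differential = [f for f, t in mtimes.items() if t > last_full]

print(incremental)   # ['c.txt']
print(differential)  # ['b.txt', 'c.txt']
```

So a differential backup grows until the next full backup, while each incremental only captures changes since the previous run.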

Backup software I’ve used has generally had 3 different modes:

Backup: All new and changed files from source are copied over to destination. Deleted files on source do not get deleted on destination.

Synchronize: All new and changed files from source are copied over to destination. Deleted files on source get deleted on destination, but any changes on destination do not get changed on source.

Mirror: All new, changed and deleted files on either source or destination are changed on the other drive.
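The one-way modes above can be sketched as set operations over file listings. This is a toy illustration (it compares timestamps as a stand-in for the size/hash checks real tools use, and the file names are invented); Mirror would simply run the same comparison in both directions:

```python
def plan(mode, source, dest):
    """Return (copy_to_dest, delete_on_dest) given {path: mtime} dicts."""
    # Files that are new on source, or whose recorded state differs.
    copy = {p for p, t in source.items() if dest.get(p) != t}
    if mode == "Backup":
        delete = set()                    # deletions never propagate
    elif mode == "Synchronize":
        delete = set(dest) - set(source)  # deletions propagate source -> dest
    else:
        raise ValueError(mode)
    return copy, delete

src = {"IMG_001.raw": 1, "IMG_001.xmp": 2}
dst = {"IMG_001.raw": 1, "IMG_003.raw": 1}

print(plan("Backup", src, dst))       # ({'IMG_001.xmp'}, set())
print(plan("Synchronize", src, dst))  # ({'IMG_001.xmp'}, {'IMG_003.raw'})
```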

I almost always use Synchronize. So when I load new image files on source, synchronize copies them to destination. If I import them in darktable and make some edits, then run Synchronize again, only the xmp files are copied since those are the only files new or changed. If I edit in Capture One, only the catalogue and some other text files are copied when I Synchronize.

I do use one external portable drive and while on the road will edit photos - for this I use Mirror since I want the destination location (the portable drive) to copy changes over to the source drive (my hard drive).

Many people use full Backup mode…they like the fact that nothing gets deleted. But for me I’m ruthless on culling photos - if I delete, they are gone. Restoring from a full backup would be a nightmare if you don’t use starring or colour codes properly to easily see what you’ve deleted in the past.

I’m on Windows, and as someone else mentioned, I also used FreeFileSync at one point, but now only use SyncBackFree. It lets me set up multiple “profiles” that I choose to run when I want, depending on the situation.

Actually, a lot of the backup tools mentioned so far work this way.

I would point to Duplicati, Restic, Borg, and Kopia as good open source options to consider.

How does “sync” software determine when/which files have changed? It periodically scans the directory for changes to each file’s metadata (size, last-modified date, etc.) and then checks the hash of the file against the last synced copy.

How does backup software like the ones I mentioned above determine which files have changed? Basically the exact same method.
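That shared method is cheap because hashing only happens when the metadata looks different. Here’s a rough sketch of the idea (the `last_seen` index layout is my own invention, not any particular tool’s format):

```python
import hashlib
import os

def changed(path, last_seen):
    """Decide whether `path` needs re-copying.

    `last_seen` maps path -> (size, mtime, sha256 hex) from the previous run.
    """
    st = os.stat(path)
    prev = last_seen.get(path)
    # Fast path: size and mtime unchanged -> assume the content is unchanged.
    if prev and prev[0] == st.st_size and prev[1] == st.st_mtime:
        return False
    # Metadata differs (or the file is new): confirm by hashing the content.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return not prev or digest != prev[2]
```

The fast path is what makes a periodic full-tree scan so cheap: for an unchanged tree, the tool never reads file contents at all.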

The four I mentioned above are the ones I recommend because they fall into a category of backup software that offers the best of both worlds, “Full” and “Incremental” backups.

Each time you run the backup, they scan your files and only send the new or changed files to your backup destination. However, each time you run the backup they also create a full snapshot of the directory. As an added bonus, they can also check for corruption to either your backup archive or the filesystem you are backing up. The magic behind this is that they use a strategy called content-addressable storage.
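A toy sketch of content-addressable storage, to show why every snapshot can be “full” while only new data is stored. This uses naive fixed-size chunking for brevity; tools like Restic and Borg use content-defined chunking, but the dedup-by-hash idea is the same:

```python
import hashlib

store = {}  # chunk hash -> chunk bytes (the deduplicated backup archive)

def backup(data, chunk_size=4):
    """Store data as hashed chunks; return the snapshot (ordered chunk IDs)."""
    snapshot = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        cid = hashlib.sha256(chunk).hexdigest()
        store.setdefault(cid, chunk)  # identical chunks are stored only once
        snapshot.append(cid)
    return snapshot

def restore(snapshot):
    return b"".join(store[cid] for cid in snapshot)

snap1 = backup(b"AAAABBBBCCCC")
snap2 = backup(b"AAAABBBBDDDD")  # only the changed final chunk is added
print(len(store))                # 4 chunks total: AAAA, BBBB, CCCC, DDDD
```

Corruption checking falls out of the same design: re-hash any chunk and compare it to its ID. Both snapshots restore in full, yet the store holds each unique chunk exactly once.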

I’m happy to talk at length about how the algorithm works, and what its benefits are, but the part that’s relevant to this discussion is that you don’t have to choose between Full snapshots and Incremental updates.

If you’re looking for affordable off-site backup, Backblaze Personal has clients for Mac and Windows that work the same way, but use Backblaze’s servers as the backup destination. With Backblaze Personal, you pay a flat monthly rate regardless of the size of the data you’re backing up.

No, that’s not the only way they know something has changed. Both Syncthing (open source) and Resilio Sync (not) also rely by default on inotify to provide almost real time information on files that have changed, without crawling the whole tree (which could be huge). This allows the sync apps to transfer changed files as soon as they are modified, without waiting for the next scheduled time (what @aptille was looking for).


What you’re saying about inotify is mostly correct. But I quoted the last part, your parenthetical comment, because it shows exactly why inotify is not typically used by backup software.

While inotify is great, it’s also limited. By default, it’s typically limited to 8,192 paths watched, which is often too few for a large backup dataset. You can increase it, but it does incur a cost to your memory utilization and filesystem performance, which is why it’s limited by default.

It’s also arguably inappropriate if you have large mutable files, because each change to the file would trigger the sync/backup program to re-hash/chunk/upload the file. This is mostly a problem for VHDs, which may be continuously changing while in use. It can also be a problem for slower video rendering jobs where a file is being continuously written and therefore constantly triggering the sync/backup. Sync software solves this by rate-limiting the notifications, but then you are basically just backed off to a (short) periodic scan.
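One common way to implement that rate-limiting is a debounce: a path is only acted on once it has gone quiet for some period, so a file being continuously written never triggers a backup mid-write. A batch-style sketch (the event format and threshold are made up for illustration):

```python
def debounce(events, quiet_period):
    """Collapse bursts of (timestamp, path) change events.

    A path is emitted only if no further event for it arrived within
    `quiet_period` of the end of the burst; "hot" paths wait for a later pass.
    """
    last = {}
    for ts, path in events:
        last[path] = ts  # keep only the most recent event per path
    end = max(ts for ts, _ in events)
    return sorted(p for p, ts in last.items() if end - ts >= quiet_period)

events = [(0, "vm.vhd"), (1, "doc.txt"), (2, "vm.vhd"), (3, "vm.vhd")]
print(debounce(events, 2))  # ['doc.txt'] -- vm.vhd is still being written
```

Real sync tools do this with timers on a live event stream rather than in batch, but the effect is the same: a continuously changing file degrades to a short periodic scan.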

That said, if you really want to use inotify rather than scheduling a (very cheap) periodic metadata scan, it can be done with any of the four open source backup solutions I mentioned. You would just write a short script (Bash, Python, or whatever you’re familiar with) to watch the filesystem with inotify and trigger a backup job that skips the full scan and just updates the triggered path.

For Bash, you would use the inotifywait command from inotify-tools. For Python, there are several inotify modules to pick from in PyPI.