Backup Software

It sounds like synchronization is the better bet. I just want to be sure that a file I just edited is stored on an external drive as well as on my computer.

It depends on what you want. When synchronizing, if you accidentally delete a file and then the sync happens, the file is deleted in both places. Sync is not a backup.

These two points are worth reading a few times. Syncing is one thing, backing up is another.

I’m personally a fan these days of restic, but I know some others love borg.

I’ve got daily snapshots going back weeks that only take up as much space as the files I’ve changed. You could set it to run on a schedule every hour (or less). It’s fast at finding differences and only needs space to store the diffs.
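
If it helps, a minimal restic workflow is only a few commands; the repository path and source directory below are placeholders:

    restic -r /mnt/external/restic-repo init                # one-time: create the repository
    restic -r /mnt/external/restic-repo backup ~/photos     # each run creates a deduplicated snapshot
    restic -r /mnt/external/restic-repo snapshots           # list available snapshots
    restic -r /mnt/external/restic-repo restore latest --target /tmp/restore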

Good point. I really want copies of the files that I edit. I can’t afford to lose a synced file if I delete a working copy by mistake.

I use the btrfs filesystem and its incremental send/receive capability.
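
For anyone unfamiliar with it, here is a rough sketch of that workflow; the paths and snapshot names are made up, and the target drive must itself be formatted as btrfs:

    mkdir -p /data/.snap                          # snapshot directory on the source
    # take a read-only snapshot of the data subvolume
    btrfs subvolume snapshot -r /data /data/.snap/2024-01-02
    # first run: send the whole snapshot to the backup drive
    btrfs send /data/.snap/2024-01-01 | btrfs receive /mnt/backup
    # subsequent runs: send only the delta against the previous snapshot
    btrfs send -p /data/.snap/2024-01-01 /data/.snap/2024-01-02 | btrfs receive /mnt/backup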

My solution is:

    # The trailing / on the source directory means: don't create another A6300 directory under the destination A6300 directory
    /usr/local/bin/rsync -avE --exclude=.DS_Store "/Volumes/HackSSD/A6300/" "/Volumes/DevToshiba/A6300"
    /usr/local/bin/rsync -avE --exclude=.DS_Store "/Volumes/Data/A6300/" "/Volumes/DevToshiba/Data/A6300"

    # No trailing / after darktable means: create the darktable directory under the DarkTable Backup directory if not present, then copy
    /usr/local/bin/rsync -avE --exclude=.DS_Store "/Users/raj/.config/darktable" "/Volumes/DevToshiba/DarkTable Backup"

This is part of a script that runs automatically when I plug in my USB backup drive. It will not delete any files on the backup drive; it only backs up modified and new files. I am on macOS Big Sur.

I’m also a great fan of borg. I make an automatic daily snapshot of my data and store it encrypted “in the cloud” (my own server). No problems for several years.
In addition, I sync the data from time to time to a second computer in my home with rsync. Just to be sure. :wink:
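
In case it’s useful, the heart of such a setup is only a few commands; the server address, repo path, and retention policy below are just examples:

    borg init --encryption=repokey ssh://user@myserver/./backup.borg            # one-time: encrypted repo
    borg create ssh://user@myserver/./backup.borg::'{hostname}-{now}' ~/data    # daily snapshot
    borg prune --keep-daily=7 --keep-weekly=4 ssh://user@myserver/./backup.borg # thin out old snapshots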

To add another option:

For years and years I have used rsnapshot.
It is actually a script that uses rsync in the background.

Files can be read directly; only native Linux tools are used.
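
For anyone curious, a minimal rsnapshot setup looks roughly like this; the paths and retention counts are only examples, and note that fields in rsnapshot.conf must be separated by tabs:

    # excerpt from a hypothetical /etc/rsnapshot.conf (fields are TAB-separated)
    snapshot_root   /mnt/backup/snapshots/
    retain  hourly  6
    retain  daily   7
    retain  weekly  4
    backup  /home/  localhost/
    # then run it from cron, e.g.:  0 * * * *  /usr/bin/rsnapshot hourly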

Is the original question related to a specific operating system?
I personally like multiple backup methods:

  • Synced copy on a local drive (internal or USB)
  • Synced copy on one NAS drive
  • Synced copy on another NAS drive (different location)
  • Versioned backup (reverse incremental backup) on local drive

On Windows I am using FreeFileSync, Hardlinkbackup, AutoVer and Syncovery.

I use AutoVer with my XMP files… that way, if I screw up, I can go back in time so easily to any version of my edit.

I should give this a try. I use rsync to an external USB drive, but I also combine it with periodic Duplicati backups of my .xmp files and a cloud copy.

I use rsnapshot as well, but the original question was:

As far as I know, none of the tools mentioned recognise when a change to a file is made. That could be done (at least on Linux) with some kind of filesystem watcher/trigger, couldn’t it? I’m not aware of any backup program doing it that way, though.

I am not sure I am grasping the point here. Even if the filesystem could detect a file change, it would still need an action to make the snapshot (it does not snapshot by itself in a way that would let a user step back through every single transaction).

Since the user has to initiate the action (based on a schedule or manually), what is the difference between the filesystem approach and the backup approach?

So yes, IMO backup does provide the functionality described in the original question. It is rather a question of what tool works most elegantly for the user (and each has pros and cons).

I took it to mean “the filesystem notices a change, and makes a backup of the file” (so it would be at a filesystem level, and work in all programs). But maybe I’m out cycling :slight_smile:

No backup program I know of does this. The closest would be a file-sync program with a versioned trash. If properly configured (i.e. not the default config), Syncthing can do that, maybe with some caveats.

I agree with everybody saying file sync is not a backup in the absolute sense, but it can be used as a temporary backup or an intermediate step.

Basically, that’s called “incremental backup” or “differential backup”.
See Incremental backup - Wikipedia and Differential backup - Wikipedia

Backup software I’ve used has generally had 3 different modes:

Backup: All new and changed files from source are copied over to destination. Deleted files on source do not get deleted on destination.

Synchronize: All new and changed files from source are copied over to destination. Deleted files on source get deleted on destination, but any changes on destination do not get changed on source.

Mirror: All new, changed and deleted files on either source or destination are changed on the other drive.
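
For readers on Linux or macOS, the first two modes map roughly onto rsync invocations like these (paths are examples); a true two-way Mirror needs a bidirectional tool such as Unison:

    # "Backup": copy new and changed files, never delete on the destination
    rsync -av /photos/ /mnt/backup/photos/
    # "Synchronize": additionally remove files that were deleted on the source
    rsync -av --delete /photos/ /mnt/backup/photos/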

I almost always use Synchronize. So when I load new image files on source, synchronize copies them to destination. If I import them in darktable and make some edits, then run Synchronize again, only the xmp files are copied since those are the only files new or changed. If I edit in Capture One, only the catalogue and some other text files are copied when I Synchronize.

I do use one external portable drive and will edit photos on it while on the road - for this I use Mirror, since I want changes made at the destination (the portable drive) copied back over to the source drive (my hard drive).

Many people use full Backup mode… they like the fact that nothing gets deleted. But I’m ruthless about culling photos - if I delete them, they are gone. Restoring from a full backup would be a nightmare if you don’t use stars or colour codes consistently to easily see what you’ve deleted in the past.

I’m on Windows, and as someone else mentioned, I also used FreeFileSync at one point, but now I only use SyncBackFree. It lets me set up multiple different “profiles” that I can choose to run when I want, depending on the situation.

Actually, a lot of the backup tools mentioned so far work this way.

I would point to Duplicati, Restic, Borg, and Kopia as good open source options to consider.

How does “sync” software determine when/which files have changed? It periodically scans the directory for changes to each file’s metadata (file size, last-modified date, etc.) and then checks the hash of the file against the last synced copy.

How does backup software like the ones I mentioned above determine which files have changed? Basically the exact same method.
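
As a toy illustration of that scan (the paths and state directory are hypothetical, and any real tool is far more efficient), something like this captures the idea in shell:

    # Sketch: find files modified since the last run, then compare content
    # hashes to rule out metadata-only changes. Requires bash and GNU coreutils.
    SRC="$HOME/photos"; STATE="$HOME/.sync-state"
    mkdir -p "$STATE"
    [ -f "$STATE/last_run" ] || touch -d '1970-01-01' "$STATE/last_run"
    find "$SRC" -type f -newer "$STATE/last_run" -print0 |
    while IFS= read -r -d '' f; do
        key=$(printf '%s' "$f" | sha256sum | cut -d' ' -f1)   # per-path state key
        new=$(sha256sum "$f" | cut -d' ' -f1)                 # content hash
        if [ "$(cat "$STATE/$key" 2>/dev/null)" != "$new" ]; then
            echo "changed: $f"            # a real tool would copy/upload the file here
            printf '%s' "$new" > "$STATE/$key"
        fi
    done
    touch "$STATE/last_run"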

The four I mentioned above are the ones I recommend because they fall into a category of backup software that offers the best of both worlds: “full” and “incremental” backups.

Each time you run the backup, they scan your files and only send the new or changed files to your backup destination. However, each time you run the backup they also create a full snapshot of the directory. As an added bonus, they can also check for corruption to either your backup archive or the filesystem you are backing up. The magic behind this is that they use a strategy called content-addressable storage.
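
Here is a rough sketch of the content-addressable idea, with arbitrary chunk size and paths; real tools use content-defined chunking rather than fixed-size splits, but the storage principle is the same:

    # Split a file into chunks and store each chunk under its own hash.
    # Identical chunks (across files or across snapshots) are stored only once.
    mkdir -p repo
    split -b 1M photo.raw /tmp/chunk.
    for c in /tmp/chunk.*; do
        h=$(sha256sum "$c" | cut -d' ' -f1)      # the chunk's address is its content hash
        [ -e "repo/$h" ] || cp "$c" "repo/$h"    # dedup: skip chunks already in the repo
        echo "$h"                                # manifest: this file's ordered chunk list
    done > photo.raw.manifest

Because a chunk already present in the repo is never written twice, every snapshot is “full” (its manifest lists all chunks) while costing only the space of the new chunks.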

I’m happy to talk at length about how the algorithm works, and what its benefits are, but the part that’s relevant to this discussion is that you don’t have to choose between Full snapshots and Incremental updates.

If you’re looking for affordable off-site backup, Backblaze Personal has clients for Mac and Windows that work the same way, but use Backblaze’s servers as the backup destination. With Backblaze Personal, you pay a flat monthly rate regardless of the size of the data you’re backing up.

No, that’s not the only way they know something has changed. Both Syncthing (open source) and Resilio Sync (not) also rely, by default, on inotify to provide almost real-time information on files that have changed, without crawling the whole tree (which could be huge). This allows the sync apps to transfer changed files as soon as they are modified, without waiting for the next scheduled scan (which is what @aptille was looking for).

What you’re saying about inotify is mostly correct. But I highlighted the last part, your parenthetical comment, because it points to exactly why inotify is not typically used by backup software.

While inotify is great, it’s also limited. By default it’s typically capped at 8,192 watched paths, which is often too few for a large backup dataset. You can increase the limit, but that costs memory and filesystem performance, which is why it’s limited by default.
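
On Linux the limit can be inspected and raised with sysctl; the value below is just an example:

    sysctl fs.inotify.max_user_watches                    # inspect the current limit
    echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/90-inotify.conf
    sudo sysctl --system                                  # apply and persist across reboots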

It’s also arguably inappropriate if you have large mutable files, because each change to the file would trigger the sync/backup program to re-hash/chunk/upload the file. This is mostly a problem for VHDs, which may be continuously changing while in use. It can also be a problem for slower video rendering jobs where a file is being continuously written and therefore constantly triggering the sync/backup. Sync software solves this by rate-limiting the notifications, but then you are basically just backed off to a (short) periodic scan.

That said, if you really want to use inotify rather than scheduling a (very cheap) periodic metadata scan, it can be done with any of the four open source backup solutions I mentioned. You would just write a short script (Bash, Python, or whatever you’re familiar with) to watch the filesystem with inotify and trigger a backup job that skips the full scan and just updates the triggered path.

For Bash, you would use the inotifywait command from inotify-tools. For Python, there are several inotify modules to pick from on PyPI.
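
As a sketch of what such a script could look like (the watched directory, repo path, and event list are assumptions, and a real version would debounce rapid-fire events as noted above):

    #!/usr/bin/env bash
    # Hypothetical watcher: requires inotify-tools plus a restic repo as in the
    # earlier examples. Backs up only the path that changed, skipping a full scan.
    WATCH_DIR="$HOME/photos"
    REPO="/mnt/external/restic-repo"

    inotifywait -m -r -e close_write,create,moved_to --format '%w%f' "$WATCH_DIR" |
    while IFS= read -r changed; do
        restic -r "$REPO" backup "$changed"
    done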