Of course!
For many of the reasons listed below, but mostly I wanted to be able to take a drive full of my encrypted data off-site, not worry about adding data, then rotate the off-site data back to my apartment, update it, then rotate it out again. I was trying to follow the 3-2-1 backup strategy at first, but once I got going with git-annex, I realized that I could just add drives indiscriminately to my backup/parity routine… so I did.
I still don’t have them on different media and haven’t found a good way to do that; my current thinking is to make multi-part tar.7z files, then burn them to DVD. Does SSD count as a different medium from HDD? I somehow doubt it.
With git-annex, I’m only using it for my RAW files, so they only get committed once upon import and git-annex makes them read-only by default. Read-only RAW files have been working well for me: I can’t accidentally delete or otherwise mangle them, and RAW files really only need to be read anyway (not written or executed). Since git-annex can be used in tandem with regular git, I’m currently evaluating committing my metadata sidecar files (.xmp and .pp3) into git (not git-annex). That looks like it’ll work well too, but I’m super conservative when changing my workflow. I’m not sure if I’d call a hook from my editor or just do it on the CLI. Darktable does have some nice Lua scripting capability.
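For the sidecar idea, here is a rough Python sketch of what a CLI helper (or something called from an editor hook) might look like. Everything in it is an assumption for illustration: the names `find_sidecars` and `commit_sidecars`, the commit message, and the idea that plain `git add` stores these small files in git rather than the annex, which depends on how the repo is configured.

```python
import subprocess
from pathlib import Path

# Sidecar extensions mentioned in the post; adjust to taste.
SIDECAR_SUFFIXES = {".xmp", ".pp3"}


def find_sidecars(root):
    """Return sorted sidecar paths (relative to root), skipping .git."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # never touch git's (or git-annex's) own storage
        if path.is_file() and path.suffix.lower() in SIDECAR_SUFFIXES:
            found.append(rel)
    return sorted(found)


def commit_sidecars(root, message="Update sidecar files"):
    """Stage sidecars with plain git and commit them if any changed.

    Returns True if a commit was made, False otherwise.
    """
    paths = [str(p) for p in find_sidecars(root)]
    if not paths:
        return False
    subprocess.run(["git", "add", "--", *paths], cwd=root, check=True)
    # `git diff --cached --quiet` exits non-zero when something is staged.
    staged = subprocess.run(
        ["git", "diff", "--cached", "--quiet", "--", *paths], cwd=root
    )
    if staged.returncode == 0:
        return False
    subprocess.run(["git", "commit", "-m", message, "--", *paths],
                   cwd=root, check=True)
    return True
```

Run from the CLI, this would be something like `commit_sidecars("/path/to/photo/repo")` after an editing session; the RAW files themselves are left alone.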
Git annex also supports metadata and can show and hide files based on that metadata.
There are several advantages of git-annex over an rsync solution.
- Git annex repos are aware of one another. `git annex whereis file.ext` returns a list of repos that contain the file. I can use `annex.numcopies` to specify a minimum redundancy number for my files. Currently that is set to 3, so there are at least 3 copies of my files. I don’t have to manage that manually; it is done for me. On the flip side, when one of my drives starts to get full, I can `git annex drop file.ext` (which respects `annex.numcopies`), and it removes the file’s content from that repo, freeing space on the disk.
- I can add data to the repo on any disk. They’re all configured to talk to one another. While I do have a “master” disk, that is just a naming convention; git annex is just as distributed as git itself.
- File hashing comes free. Rsync makes sure your file got there intact, but rsync doesn’t safeguard against bit rot. You can `git annex fsck -q` and it’ll hash all the files and tell you which files don’t match their hash. If a file comes back bad, you can `git annex get file.ext` and pull a known good copy of that file (or replace the disk or whatever you need to do).
- PGP encryption is cheap and pretty easy to set up. You can then PGP encrypt files or the whole repo using git gcrypt. I use this for my offsite backups.
- Multiple cloud storage systems are supported (Amazon S3 & Glacier, Rackspace, etc.). If cloud providers are not your thing, you can set up git annex over ssh on your own server and use it that way. If your personal host doesn’t support git annex, you can use git annex to push just the files.
- It integrates with git hosting solutions; I’m using it with gitolite and like it, and the enterprise version of gitlab supports git annex.
- Even if git-annex disappears from the face of the planet, all your files are still there on the filesystem in .git/annex/objects, named by their file hash.
- Git annex is flexible, and I’m confident I can get what I need out of it now and in the future.
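To make the hashing and "files are still there" points a bit more concrete, here is a minimal Python sketch that checks annexed objects against their own file names without git-annex installed. It is not how `git annex fsck` is implemented. It assumes the SHA256E backend, whose keys look like `SHA256E-s<size>--<sha256 hex><.ext>`; other backends would need different handling. The function names are made up for this example.

```python
import hashlib
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def parse_sha256e_key(name):
    """Split a SHA256E key name into (size or None, hex digest).

    Returns None if the name does not look like a SHA256E key.
    """
    fields, sep, rest = name.partition("--")
    parts = fields.split("-")
    if not sep or parts[0] != "SHA256E" or len(rest) < 64:
        return None
    digest = rest[:64].lower()  # anything after the digest is the extension
    if not set(digest) <= HEX_DIGITS:
        return None
    size = None
    for field in parts[1:]:
        if field.startswith("s") and field[1:].isdigit():
            size = int(field[1:])
    return size, digest


def verify_object(path):
    """Return True/False for a hash match, or None for non-SHA256E names."""
    path = Path(path)
    parsed = parse_sha256e_key(path.name)
    if parsed is None:
        return None
    size, expected = parsed
    if size is not None and path.stat().st_size != size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large RAW files don't fill memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


def scan_objects(objects_dir):
    """Yield files under objects_dir whose content no longer matches."""
    for path in sorted(Path(objects_dir).rglob("*")):
        if path.is_file() and verify_object(path) is False:
            yield path
```

Something like `list(scan_objects(".git/annex/objects"))` would then list any objects that have rotted, which you could replace from another drive.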