Advice needed: Pro photographer workflow on Linux

Welcome to the free software world. Developers and your fellow community users appreciate bug reports; without them the problem might never get fixed. FWIW, coincidentally, at this very moment I’m working on some unfinished elements of the backup code in the latest alpha.


What exactly would be needed there? A button “add selected images to git annex” which runs git annex add foo.raw followed by a git push, and a button “commit sidecar to git” which runs git commit -m "snapshot from <datetime>" foo.raw.xmp and then git push? Maybe even automatically add images on import to git annex if they are not there yet? Anything else?
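For illustration, a minimal sketch of what those two buttons might run under the hood (the file name is a placeholder, and pushing the git branch alone does not transfer annexed content; git annex sync --content would):

    # “add selected images to git annex”
    git annex add foo.raw
    git push              # pushes the git branch; annexed content still needs a sync/copy

    # “commit sidecar to git”
    git commit -m "snapshot from $(date +'%F %T')" foo.raw.xmp
    git push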

PS: If this is becoming too off-topic we can move that elsewhere.

@damonlynch:

As I’m testing out different distros and different programs, I hadn’t considered filing a bug report yet.

It was on an Antergos system, with the AUR install. I had a backup set up over NFS, and the import and the backup got out of sync; after about 100 files it stalled completely and had to be cancelled manually.

I don’t know what an AUR install is, nor an Antergos system. However, if the version of Rapid Photo Downloader you were running is 0.4.x, don’t bother reporting a bug, because that code has been rewritten for the new 0.9.0 version, currently in alpha.

Going to try RPD again, this time under Fedora; last time was Antergos.

I definitely have to check out git for my photography workflow as well. I’m using it for my personal scripting and my website development, but not yet as a sort of backup solution for my photography.

That would be a solution, except that we are talking of literally hundreds of thousands of files. :frowning:

Going to check out whether that is feasible.

Antergos is an Arch Linux-based distribution, and the AUR is the Arch User Repository, Arch’s community package repository, from which the package was installed.

It was indeed version 0.4.x, so I’m going to try out 0.9. I don’t mind beta/alpha versions as long as they are not on my workstation; as I’m testing a new workflow, this should be OK. :wink:

Now, looking at it written down, it does not sound too complicated. Plus maybe automatic sidecar snapshots whenever the darkroom is left and something has changed, or whenever in the lighttable a threshold of tag/rating/style changes is reached, or whenever dt is exited. Plus a button “save film roll state to git” with a text field beside it, which generates a commit, with a meaningful comment, of all changes and added files of the current film roll.
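A hedged sketch of what such a “save film roll state” action could boil down to, assuming the film roll directory lives inside the git work tree (the path and the message format are placeholders):

    cd /path/to/filmroll                    # placeholder path
    git add -- *.xmp                        # stage changed/new sidecars of this film roll
    git commit -m "film roll snapshot $(date +'%F %T')"
    git push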

A more elaborate system could deal with the fact that with git-annex it may be necessary to transfer a file’s content to the local file system before it becomes available (which requires running a git-annex command), and this may even require attaching an external file system, e.g. an archive disc (see https://git-annex.branchable.com/location_tracking/). It would be good if dt did not get confused by this and provided the user with tools to deal with it. For example, if a raw file whose content is not available on the local file system is opened in the lighttable, git annex get file could be run automatically, and if a file system has to be made available first, a dialog could appear with this information, while still allowing the user to cancel the operation without messing up dt’s database or the XMP.
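A rough sketch of how such an availability check could look from the shell, assuming the raw file is a normal (locked) annexed symlink; this is not existing dt functionality:

    FILE=foo.raw                    # placeholder
    if [ ! -e "$FILE" ]; then       # broken symlink means the content is not present locally
        git annex whereis "$FILE"   # shows which repos/drives hold the content
        git annex get "$FILE"       # fetches it from an available remote, if any is reachable
    fi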

Agreed.

Cool. I suggest turning off backups in alpha 4. When alpha 5 is released, the underlying backup functionality will be complete.

The code I added tonight to Rapid Photo Downloader’s backup feature was the generation of thumbnails of RAWs and TIFFs for file managers like Nautilus. For example, in this screenshot the drive “stuff2” is an external drive I use for testing backups (and “raw samples” happens to be a large collection of many types of RAW that I used for testing):


I do use git-annex and I think it is fantastic!

I use it from the CLI; I guess GUI integration would be nice, but I don’t see it as necessary.

Currently I check all raw files into git-annex, then sync out to all my backup disks. I’m evaluating checking xmp files into git (not git-annex) and it seems reasonable.

Would you mind describing how you use git-annex with your photos? I would really appreciate it.

My workflow is approximately this:

I have six external hard drives that hold my git-annex repos. Three of them are unencrypted and stay in my apartment, the other three are gpg encrypted and get rotated off site.

  1. Mount camera medium on computer
  2. Change directory into the camera medium directory
  3. Run an exiftool script that renames files by their creation date and copies them into a date-stamped folder in my git-annex raw files repo
  4. Change directory into the git-annex raw files repo
  5. Run git-annex add . to add the files into the annex
  6. Run git commit -m "some message" to commit the changes to my git annex repo.
  7. Plug in other git annex external drives and mount them.
  8. Run git-annex sync --content to sync up content to all the connected drives.

This is used when I am importing. I usually import, then sync to at least one other drive for redundancy. I’ll then start darktable or RawTherapee, both of which are pointed at the git-annex raw file repo.
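Roughly, those steps could be scripted like this; the paths, date format, and commit message are assumptions, and the exiftool call is one common idiom for copying/renaming by creation date, not necessarily the exact script used above:

    #!/bin/sh
    CARD=/mnt/camera/DCIM            # camera medium (steps 1-2), placeholder path
    REPO=$HOME/photos/raw-annex      # git-annex raw files repo, placeholder path

    # step 3: copy into date-stamped folders, renamed by creation date
    # ("-o ." makes exiftool copy instead of move)
    exiftool -o . '-FileName<DateTimeOriginal' \
        -d "$REPO/%Y/%Y-%m-%d/%Y%m%d_%H%M%S%%-c.%%e" "$CARD"

    # steps 4-6: annex and commit
    cd "$REPO"
    git annex add .
    git commit -m "import $(date +%F)"

    # steps 7-8: with the other drives plugged in and mounted, sync content everywhere
    git annex sync --content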


That’s a great system.

I just have to script this so that it does most of it automatically; otherwise it will be too time-consuming in my case. But I do like the fact that the files are in a git system. The great thing is that I don’t have to specify where I’m backing up when I’m on the road. Definitely going to check out how to set all this up.

Thx

@damonlynch: Sadly enough, the alpha is really not working for me; too much is still missing, like the tab for renaming the files.

I’ll file a bug report for the things I found that are working but not behaving as expected.

Also, I submitted a feature request to add an extra option next to the job name; a client name/second user-entered text field would be good as well. This is because of my file structure; more info in the feature request.


@paperdigits Your setup sounds similar to mine except that I just rsync to the external drives (and remote servers). I’m familiar with git and git-annex (my photos are about the only thing I don’t keep in a git repo), but I’m curious what git-annex gets you in this case… can you give a little detail on the advantages of git-annex over rsync or why you ended up at this workflow? (Do you make commits with every photo edit? Call a hook from Darktable or something? Now that would be cool. Hmm).


Of course!

For many of the reasons listed below, but mostly I wanted to be able to take a drive full of my encrypted data off-site, not worry about adding data, then rotate the off-site drive back to my apartment, update it, then rotate it out again. I was trying to follow the 3-2-1 backup strategy at first, but once I got going with git-annex, I realized that I could just add drives indiscriminately to my backup/parity routine… so I did.

I still don’t have them on a different medium and haven’t found a good way to do such a thing; my current thinking is just making multi-part tar.7z files, then burning them to DVD. Does SSD count as a different medium from HDD? I somehow doubt it :stuck_out_tongue:
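For what it’s worth, a sketch of that multi-part idea, assuming 7z is installed and a volume size that fits a single-layer DVD (the path and archive name are placeholders):

    tar cf photos-2017.tar /path/to/photos            # pack the photo tree
    7z a -v4480m photos-2017.tar.7z photos-2017.tar   # splits into .7z.001, .002, ... volumes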

With git-annex, I’m only using it for my RAW files, so they only get committed once upon import, and git-annex makes them read-only by default. Read-only RAW files have been working well for me: I can’t accidentally delete them or otherwise mangle a file, and RAW files really only need to be read anyway (not written or executed). Since git-annex can be used in tandem with regular git, I’m currently evaluating committing my metadata sidecar files (.xmp and .pp3) into git (not git-annex). That looks like it’ll work well too, but I’m super conservative when changing my workflow. I’m not sure if I’d call a hook from my editor or just do it on the CLI. Darktable does have some nice Lua scripting capability.
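One way to get that raw/sidecar split automatically is git-annex’s annex.largefiles setting in .gitattributes, so a plain git annex add sends raws to the annex but leaves sidecars in regular git; the extensions below are just examples, and this assumes a reasonably recent git-annex:

    # .gitattributes in the repo root
    *.xmp annex.largefiles=nothing
    *.pp3 annex.largefiles=nothing

With that in place, git annex add . annexes the raws and adds the sidecars directly to git.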

Git annex also supports metadata and can show and hide files based on that metadata.
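For example (the field name and file are arbitrary):

    git annex metadata --set tag=iceland 20170102_081500.cr2   # attach metadata to a file
    git annex view "tag=*"                                      # switch to a branch grouping files by tag
    git annex vpop                                              # leave the view again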


There are several advantages of git-annex over an rsync solution.

  • Git annex repos are aware of one another; git annex whereis file.ext returns a list of repos that contain the file. I can use annex.numcopies to specify a minimum redundancy number for my files. Currently that is set at 3, so there are always at least 3 copies of my files. I don’t have to manage that manually; it is done for me. On the flip side, when one of my drives starts to get full, I can git annex drop file.ext (which respects annex.numcopies) and it removes the file’s content from that repo, freeing space on the disk (a short command sketch follows this list).

  • I can add data to the repo on any disk. They’re all configured to talk to one another. While I do have a “master” disk, that is just a naming convention, git annex is just as distributed as git itself.

  • File hashing comes free. Rsync means your file got there intact, but rsync doesn’t safeguard against bit rot. You can git annex fsck -q and it’ll hash all the files and tell you which files don’t match their hash. If a file comes back bad, you can git annex get file.ext and pull a known good copy of that file (or replace the disk or whatever you need to do).

  • PGP encryption is cheap and pretty easy to set up. You can then PGP-encrypt files or the whole repo using the gcrypt special remote (git-remote-gcrypt). I use this for my offsite backups.

  • Multiple cloud storage systems are supported (Amazon S3 & Glacier, Rackspace, etc, etc). If cloud providers are not your thing, you can set up git annex over ssh on your own server and use it that way. If your personal host doesn’t support git annex, you can use git annex to push just the files.

  • Integrates with git hosting solutions; I’m using it with gitolite and like it, and the enterprise version of gitlab supports git annex.

  • Even if git-annex disappears from the face of the planet, all your files are still there on the filesystem in .git/annex/objects, named by their file hash.

  • Git annex is flexible, and I’m confident I can get what I need out of it now and in the future.
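Putting a few of those points together, the day-to-day commands look roughly like this (the remote name, file name, and key ID are placeholders):

    git annex numcopies 3                    # never allow fewer than 3 copies to exist
    git annex whereis 20170102_081500.cr2    # list which repos hold this file's content
    git annex drop 20170102_081500.cr2       # free local space; refused if it would violate numcopies
    git annex fsck -q                        # re-hash content and report anything that doesn't match

    # encrypted off-site drive via the gcrypt special remote (needs git-remote-gcrypt installed)
    git annex initremote offsite type=gcrypt gitrepo=/mnt/offsite/photos.git keyid=0xDEADBEEF
    git annex sync --content offsite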


On the flip side, when one of my drives starts to get full, I can git annex drop file.ext (which respects annex.numcopies) and it removes the file’s content from that repo, freeing space on the disk.

Oh nice. That alone would be worth it for me. Okay, sold. Thanks for the info.

At least I would consider having the same files on drives from different manufacturers, so you can avoid e.g. firmware problems that affect a whole series of drives from one company. That said, I was too lazy about this point myself, but I will definitely do it when buying the next set of backup drives.

Git-annex supports encrypted storage at different cloud services; I guess this would really count.

I’m not very comfortable with cloud storage, so I don’t use it. My six drives are spread over three different manufacturers.

I’ve been eyeballing a Blu-ray burner for backup, but it isn’t quite cheap enough.


And the media are not big enough. The maximum you can get is 128 GB; I would need 10 discs for a full photo backup. With 128 GB discs hardly available and expensive (>10 €/disc), this is not feasible at the moment, I agree. The price tag might be OK for a half-year backup cycle, but handling 10 discs of that size at the write speeds provided …

I am not comfortable with cloud storage either (besides my own ownCloud/Nextcloud, but that is on-site as well and therefore doesn’t count).