Git annex use in workflow

Hey @chris, sorry, your initial post caught me at an awkward time: I had to travel this week for work and it was a working marathon.

I think defining some specific use cases would be really helpful here.

Say you have two repos: laptop and backup. laptop is your laptop with limited space; backup is the external disk that is large enough to hold your whole collection of photos for a while. I’ll also assume that the available space on laptop is less than the total size of your photo collection, so doing git annex sync --content isn’t feasible, since your disk will run out of space.

It seems like you have both repos set up correctly and they can “talk” to one another, e.g., when you issue a git annex command, the repos can sync the git state and transfer files (if this assumption is not correct, we’ll backtrack a little and get it configured correctly).

Workflow scenarios:

  • Sync the git state of both repos without transferring content
  • Get an arbitrary file onto laptop that isn’t there
  • Remove an arbitrary file from laptop
  • Keep 5-star images available on laptop
  • Import a day’s worth of shooting to laptop
  • Import a day’s worth of shooting to backup

Please let me know if there are any more workflows you’d like to add.


Hey @paperdigits, thanks for your support on my weird plans and fantasies on how things should work :smile:.

I still struggle with some basic concepts, e.g. which rule takes precedence, “numcopies” or the rule in “wanted”, or if I have to make “wanted” aware of the numcopies requirement. I know that your approach on my issue would be to avoid sync --content, but I still think there is a way to configure the repository in a way such that things work as expected automatically and that I cannot accidentally break it (e.g. by running sync --content).

Regarding workflow scenarios:

That’s the easy one: git annex sync, and it works as expected.

git annex get <file>

git annex drop <file/pathspec> or, if configured correctly, git annex drop -a

That’s a tricky one, and after some discussion in the git-annex forum, I think the best approach would be to sync the metadata (only the star rating or maybe other workflow related metadata) from sidecars to the git-annex metadata (always in one direction, from sidecar to git-annex) and have preferred content rules to reflect the personal ideas about which files to keep. I am working on a little shell script to parse the sidecars and set metadata accordingly. But this will take a while, not too much time recently …
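A minimal sketch of the sidecar-to-metadata idea, not the script mentioned above. It assumes the rating is stored in the sidecar as an attribute of the form xmp:Rating="3", that sidecars are named after the image plus ".xmp" (e.g. IMG_0001.CR2.xmp), and that "rating" is the git-annex metadata field name; all three are assumptions made for illustration.

```shell
# Print the star rating found in an XMP sidecar, or nothing if there is none.
# Assumption: the rating is written as an attribute, e.g. xmp:Rating="3".
xmp_rating() {
    sed -n 's/.*xmp:Rating="\(-\{0,1\}[0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Copy the rating from every sidecar below a directory into git-annex metadata.
# One direction only: sidecar -> git-annex. "rating" is a hypothetical field name.
sync_ratings() {
    find "${1:-.}" -name '*.xmp' | while IFS= read -r sidecar; do
        rating=$(xmp_rating "$sidecar")
        image=${sidecar%.xmp}
        # -L also accepts annexed symlinks whose content is not present locally
        if [ -n "$rating" ] && { [ -e "$image" ] || [ -L "$image" ]; }; then
            git annex metadata --set "rating=$rating" "$image"
        fi
    done
}
```

Running sync_ratings . inside the repository would then set the field for every image that has a rated sidecar. File names containing newlines are not handled.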

Besides adding files to git-annex, I would skip more complex settings such as having memory cards configured as git-annex with the “source” group setting, since the directory structure of memory cards in cameras is somehow restricted, and there are great tools available to do this without git-annex (e.g. rapid photo downloader).

Besides these, there would be a couple of workflows to add, but I am running out of time (family wants breakfast) … But an important one would be to drop jpeg duplicates of raw files when shooting raw+jpeg. I had some discussion over at the git-annex forum regarding this, see here, and this seems to be not (easily) possible automatically from git-annex side, so I am working on a shell script to set git-annex metadata accordingly, something such as

find . -name '*.CR2' | while IFS= read -r k; do
  # ${k%.CR2} strips the extension, so this checks for a same-named .JPG
  if [ -e "${k%.CR2}.JPG" ]; then
      : # <set metadata>
  fi
done

but a bit more robust and with the correct metadata setting.
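One way this could look, as a sketch under assumptions: the function below only lists the JPEGs that have a same-named .CR2 next to them, so the result can be inspected before any metadata is touched. The tag name "rawdup" in the usage comment is hypothetical.

```shell
# Print every .JPG that sits next to a same-named .CR2 (raw+jpeg pairs).
jpeg_duplicates() {
    find "${1:-.}" -name '*.CR2' | while IFS= read -r raw; do
        jpg="${raw%.CR2}.JPG"
        # -L also catches annexed symlinks whose content is not present locally
        if [ -e "$jpg" ] || [ -L "$jpg" ]; then
            printf '%s\n' "$jpg"
        fi
    done
}

# Usage inside a git-annex repository ("rawdup" is a made-up tag name):
#   jpeg_duplicates . | while IFS= read -r f; do
#       git annex metadata --tag rawdup "$f"
#   done
```

With such a tag in place, a preferred content expression along the lines of "not metadata=tag=rawdup" might express that these files are not wanted in a given repository; I have not verified how that interacts with other rules.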

Hm, and, of course, the other way round as well: drop rejected files and those with less than or equal to n stars, and I would start with n=1 as long as disk space permits …
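A hedged sketch of the "drop everything with n stars or fewer" step. It assumes the ratings already live in a numeric git-annex metadata field called "rating" (a hypothetical name) and uses the numeric comparison form of the --metadata matching option. The small run helper is only part of this sketch: with DRY_RUN set it prints the command instead of executing it.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Drop the local content of files rated n stars or lower (default n=1).
# Assumption: ratings are stored in a numeric metadata field named "rating".
drop_low_rated() {
    n=${1:-1}
    run git annex drop --metadata "rating<=$n"
}

# Usage:
#   DRY_RUN=1 drop_low_rated 1   # only prints the git annex command
#   drop_low_rated 1             # actually drops
```

To preview which files would be affected, the same matching option can be given to find, e.g. git annex find --in here --metadata 'rating<=1'. The drop itself should still refuse to remove content when not enough other copies exist.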

Sounds like you’ve got a solid idea of what you are doing. I didn’t mean for you to explain everything, I was just trying to clarify what the end goal is.

You should check out wanted = present: it keeps files that are already on the laptop, which would let you run git annex sync --content. It may keep you from using other preferred content settings, though. That part wasn’t super clear to me, and I only read the man page twice.

There was no complaint in my response; I am happy for each little hint that helps me better understand how things work. Over the next days I’ll try to better document what my setup should eventually look like.

Hm, similar on my side. I read the documentation many times over the last two weeks, but I still struggle with some details. Unfortunately, only some standard use cases are documented really well, but if you want to dive deeper, there’s a lot to discover by trial and error.

I think that’s the point: I did not understand the preferred content settings well enough to achieve my goal. And the git annex find command is still a mystery to me; I am still not able to mimic a --dry-run behaviour with it.

When I started to look into git annex, packages disappeared from the distributions I’m using. When I asked for the reason, the answer was to use git-lfs, and it looks like this will obsolete git annex.

Hm, for some applications they are similar, but git-annex can do so much more. I hope it will find its way back into distros. E.g., what I am working on here is impossible with lfs.

LFS and annex are similar and try to solve a similar problem, but:

  • with LFS, the files aren’t actually stored in your repo
  • LFS is centralized and requires hosting to use
  • annex has way more features and possibilities.

I still haven’t dared to feed my photo collection to git annex. I installed it way back and had a more complex setup of devices. During the experiments I shot myself in the foot quite a few times…

In theory it should save you from foot shooting, in practice not so much. I want to use it because as is I’m locked into editing on my desktop. I also can’t easily choose which files should be on my laptop.

I do use it for my rather vast collection of ideas and references. Many of them images added using addurl. I find it helpful how the url gets saved with the metadata.

Maybe one day I’ll dare to init my img folder.

My major recommendation for using git annex is to have numcopies=1 or more and to sync often. That way it is pretty hard to lose data. I think I have like seven full copies of my repos :slight_smile:

@chris can you describe what you’re trying to do with find? Have you tried whereis?

With

git annex find --want-drop --in .

and

git annex find --want-get --not --in .

I try to find out which files will be transferred or dropped, so I can check whether a certain rule for preferred or required content works as intended. Unfortunately, when I then run the sync command, most of the time the behaviour is different from what I expected.

Sorry for kicking an old topic, but this seems like a good place to ask:

Today I played a bit with git-annex in a separate temp folder and it looks like a promising way of managing my raws (and version the xmps while I’m at it). One problem I have is that I have my photo collection on an NTFS drive so I can also access it on Windows, but git-annex insists on having two copies of each file, in the annex and the working directory.

This effectively halves my disk space. What am I doing wrong? Or is NTFS just not supported properly?

NTFS isn’t well supported since it doesn’t support Unix symlinks. Usually when you issue the command git annex add <some file>, the hash of the file is computed, the file is moved into the annex in .git/<hash path>/<git address>, and a symlink that points to the file in the annex is placed in the working directory.
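To make those three steps concrete, here is a toy illustration in plain shell. It is not git-annex's real object layout or key format: the toy-annex directory and the bare SHA-256 file name are made up for this sketch, and it assumes sha256sum from GNU coreutils is available.

```shell
# Toy illustration only: NOT git-annex's real layout or key format.
# toy_annex_add FILE: move FILE into a "toy-annex" directory next to it,
# named after its checksum, and leave a symlink in its place.
toy_annex_add() {
    file=$1
    dir=$(dirname "$file")
    # 1. compute a checksum of the content; it becomes the object's name
    key=$(sha256sum "$file" | cut -d ' ' -f 1)
    # 2. move the content into the object store
    mkdir -p "$dir/toy-annex"
    mv "$file" "$dir/toy-annex/$key"
    # 3. leave a relative symlink where the file used to be
    ln -s "toy-annex/$key" "$file"
}

# Usage:
#   toy_annex_add photo.CR2
#   ls -l photo.CR2    # now a symlink into toy-annex/
```

Step 3 is the part that cannot work on a filesystem without Unix symlinks, which is why the content ends up being handled differently there.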

I’ve not really been following git annex but I thought there were changes that worked around the issue. V7 or whatnot? I might be mixing things up.

Dang it, NTFS doesn’t do symlinks and worse, doesn’t play nicely with read/write permissions. That’s annoying, because it is the file system that is most compatible with other systems and OSs. Or are there alternative FSs for usb-drives that I could be looking at?

git config annex.thin true

The above has safety implications; check the documentation to see if it’s worth it for you.
https://git-annex.branchable.com/tips/unlocked_files/


That’s actually quite a good idea! I’m not modifying the raw files anyway. Thanks!

Here is a nice video from SCALE about this specific workflow:

Now, somebody asks a question at 20:32 about some similar thing to git-annex called Adica, Attica or something similar.
It seems to be a Rust app or lib so can someone point me to it? I’d like to take a look at that app since I don’t know Haskell.

EDIT: Oh wow, I’ve just realized that this is you @paperdigits :smiley: Very cool talk man! :smiley: :smiley:

Hey thanks, that is me.

I think the Rust project you are asking about is GitHub - rdrsss/attic-redux: Attic...Again! and I actually went to that talk… It’s not production ready and a little bit more “cloudscale storage solution” than is necessary for my photo collection. Not a knock on that project, just that my photos aren’t web scale in size.