bulk import workflow advice

Hi there, long time no see!

I have a problem: I have a huge backlog of photos (~18 months, ~3000 shots) I need to process. I’m not satisfied with my current workflow because it’s kind of bespoke and incomplete.

Current workflow

I take pictures on my phone and (much less than before) with a Fuji X-T2 camera. I have about 1500 photos to process from each, so I guess an average of 100 a month. That doesn’t sound like much, but skip a few months and you’re kind of doomed… Anyways.

The current workflow goes a little like this:

  1. mount the SD card (or sync the photos from the phone with Syncthing)
  2. open rapid photo downloader (RPD) and browse to the right directory
  3. wait for thumbnails to render
  4. deselect all photos
  5. then, for each “roll”, select photos that go together, and import them as a named roll
  6. repeat step 5 a long time
  7. add everything into git annex (git -C ~/Photos annex add . && git -C ~/Photos commit -m 'yolo import'; see the script sketch after this list)
  8. open darktable
  9. import the most recent rolls, probably by reimporting the last month or year, depending on how much there is to import
  10. try to rate each photo from reject to 5
  11. normally there’s no time left by this point, but this is where some post-processing would typically happen (!!)
  12. export photos to some gallery, either sigal or pixelfed, not sure anymore
  13. print some photos (but the local store closed shop)
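
For reference, step 7 in script form, together with an off-site copy, is roughly this (untested sketch):

git -C ~/Photos annex add .                       # annex the freshly imported rolls
git -C ~/Photos commit -m "import $(date +%F)"    # record the import in git
git -C ~/Photos annex sync --content              # also copy the content to the other repositories (off-site)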

Problems with the workflow

  • Nowhere in the workflow do I remove the images from the original media (that feature was removed from RPD some time in the past). This causes multiple issues on its own:

    1. the next import run takes an undue amount of time as it needs to rescan all those old photos
    2. RPD can “forget” which photos it has already imported, especially if I am on a different workstation
    3. the storage on the media grows unbounded (AKA my phone is almost full now)

    I typically solve this by manually deleting photos from the medium after they are in git-annex (as they are now safe); see the sketch right after this list.

  • it’s slow, with lots of manual steps:

    • there are no hooks in RPD to fire up git-annex automatically
    • and it doesn’t automatically import into Darktable either
    • RPD doesn’t allow me to easily preview images in full size, or compare them
    • rating is separate from import, which means I process each image twice
    • I can’t just delete an image on import
  • where are you supposed to print photos now anyways, ugh
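
The manual deletion sketch mentioned above is roughly this (untested; it assumes the repository uses the default key backend, and the card path is made up):

# list the keys whose content is already present in the annex
git -C ~/Photos annex find --format='${key}\n' > /tmp/present-keys
# for every file on the card, only delete it if identical content is already annexed
find /media/sdcard/DCIM -type f | while read -r f; do
    key=$(git -C ~/Photos annex calckey "$f") || continue
    grep -qxF "$key" /tmp/present-keys && rm -v "$f"
done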

Ideas for a new workflow?

I’m open to considering other software than Darktable. I was previously using Shotwell to manage my photo collection, and it would do pretty neat tricks in regrouping images automatically, for example. But that was before I got a “real” digital camera and started shooting RAW.

… I’m not sure I have time to shoot RAW anymore, to be honest. It’s so much work just getting back up the “original JPG” hill that, more often than not, I just use Fuji’s excellent JPGs… but I can’t help but feel there’s always this corner case where I can get more out of the RAW, and I have successfully recovered some shots that way… So I guess RAW is here to stay?

I have tried using Darktable’s importer in the past, but it feels slow, and doesn’t really resolve the problems described here: it also copies images (instead of moving), doesn’t support hooks, and I’ve been kind of burned by it. Maybe it’s all better now and I should try again?

I heard Digikam is pretty good, could I use it to replace Darktable completely? I read the Synchronizing between digiKam and darktable thread which is quite promising in that aspect… I could just work with Digikam for most stuff and switch over to Darktable for more acute work? How is Digikam’s import story?

I also worked with a manual import script a few years ago, before using RPD. I’m not sure I want to go back there: I like having a GUI to see my photos, and no, fim > import-list && exiftool -@ import-list is not sufficient.

Finally, how do people connect their desktop workflow with online and offline medium? Do you post stuff in a gallery of some sort? Pixelfed anyone? Do you print photos?

I know how broad this sounds, but maybe it could be a “hey, what’s the state of the art in 2022 for you” kind of thread…

Thanks!

2 Likes

A lot of what you’ve described resembles my workflow. The biggest difference is that I’m not unhappy with mine!

I may not work with my photos after every time I’m out, but I do import them every time (*). I like to see if I got anything exciting or if there are some mistakes I can learn from. On the very odd occasion, I am pleasantly surprised.

  1. Open rapid photo downloader (RPD), which remembers the top directory of my photos.
  2. Insert SD card (I find things go better if RPD is up before the card goes in).
  3. Wait for thumbnails to render. The wait is largely because I only format my SD cards when they are nearly full.
  4. Deselect all photos. I only have to do this if not all the photos will go under the same “job code” (using RPD terminology). I try to make the job codes keyword-rich, so I can easily find photos via their directory name.
  5. For each “job code”, select photos that go together, and import them as a named job code.
  6. I do not need to repeat step 5 for a long time, because I only have to import the last outing.
  7. rsync my photos to a second computer (I could use RPD’s auto-backup or git annex, but I already had rsync set up; a one-liner example follows this list).
  8. Open geeqie and review the photos. Detonate the absolute crap, and keep the rest. I’ve never really gotten into the whole ratings thing.
  9. For any photo I want to edit, open the editor I want to use (dt, RT, ART) from geeqie. In effect, geeqie is acting as my dt light table or RT/ART file browser.
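
The rsync in step 7 is nothing fancy, essentially a one-liner along these lines (host and paths are placeholders):

rsync -av ~/Pictures/ backup-box:Pictures/   # mirror the photo tree to the second computer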

I don’t use dt’s database(**) because, for me, I don’t really find that it gives me anything I need. I keep sidecars from all three of the raw processors I may use.

For me, this workflow is relatively quick and lightweight without any bells and whistles that I don’t use.

(*) I don’t import every time from my phone, because most of my phone photos are “note to self” shots, not something worth editing. I would have hundreds of directories with one or two photos each. My workflow doesn’t really handle that case very well.

(**) I do use dt’s database (and no sidecars) if I am testing a different version of dt, so that the test version doesn’t write sidecars by the photos.

Edit: just before posting this reply, I looked at the thread name again and realized I may not be exactly answering your question, because I don’t normally import a year’s photos at a time. However, I did initially [mis]use RPD to organize 20 years of photos into a coherent directory structure, so I understand the strain that comes with that. To tell you the truth, I think the biggest obstacle in your workflow is not any of the tools you are using; it’s simply the large accumulation of photos from many outings.

1 Like

Thanks for your work on Sigal!

I hate post-processing, so I scripted most of it to minimize the effort. digiKam holds the whole collection, across cameras, film and digital, JPEG or RAW.

I do not manually sort by “collections” but strictly import by $camera/$date. What belongs together gets keyword-tagged. From digiKam I either open a RAW developer, or extract the camera JPEG with a script.

What I want to share has a specific filename pattern and is uploaded via rsync to date-based folders on a server, again with a script that also builds the website gallery with Sigal.

Best case scenario:

  • take some photos
  • plugin card
  • call import script
  • select good photos in digiKam
  • call export JPEG from RAW script with the selection
  • call upload script

The scripts are not generalized, so here are just the key commands.

Mount and import

udisksctl mount -b /dev/disk/by-label/foobar    # mount the card by its filesystem label
# download everything new into a per-day tree, leaving the files on the card
gphoto2 --skip-existing --keep --get-all-files --filename='/camera/%Y/%Y-%m-%d/%f.%C'
udisksctl unmount -b /dev/disk/by-label/foobar  # unmount the card when done

Export the JPEG from the RAW and copy the EXIF over. Copy any titles from the XMP for Sigal to display.

# extract the embedded camera JPEG from the RAW
dcraw -e "$RAW"
# copy the EXIF metadata from the RAW to the extracted JPEG
exiftool -q -tagsFromFile "$RAW" "$DEST"
# copy the XMP title/description into the IPTC fields that Sigal displays
exiftool -q -tagsFromFile "$XMP" '-XMP:Title>IPTC:ObjectName' '-XMP:Description>IPTC:Caption-Abstract' "$DEST"
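
Upload is roughly along these lines (one possible arrangement; the filename pattern, host and paths are placeholders):

# copy the selected exports into a date-based folder on the server
rsync -av --include='*_pub.jpg' --exclude='*' ./ "server:galleries/$(date +%Y-%m-%d)/"
# rebuild the website gallery with Sigal (assumes sigal.conf.py lives in galleries/)
ssh server 'cd galleries && sigal build'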

I could probably improve import speed with RPD, but it wasn’t really an issue for me.

1 Like

I would caution against getting any downloading software to delete the original images. If something goes wrong with the download, your images could be lost forever. I have also seen, on more than one occasion, a card become unrecognized when returned to the camera. However, when this occurred, I was able to reformat the card on a computer and then load it into the camera, where I reformatted it again. I recommend formatting the card in the camera when you are confident your pictures are safely backed up elsewhere.

I feel that’s actually riskier than the ideal workflow I’m thinking of. If I delete a file only once it’s actually added into git-annex, it’s pretty safe. The only thing safer would be to run git annex sync --content to actually copy it off-site as well.

If, instead, I delete all files when I think I have copied them all … somewhere, then I run the very real risk of forgetting to import a picture. I have tried that, and the net result is usually that I never wipe the card because I’m never sure I have everything on there.

That workflow also doesn’t work on phones, where I really don’t wipe the entire storage ever.

Hi @anarcat, I’m in a similar situation, with a huge backlog because in the past I used the raw developer of the manufacturer of my camera and this stopped working well for me when I got a different body. Perhaps you will find some of the following points useful.

  • On Linux, I am not aware of a more efficient culling solution than geeqie. Personally, I shoot a lot in continuous drive, so I typically keep only 5% of the shots or so. Geeqie is very snappy, and lets me quickly switch back and forth between subsequent images at any zoom level. I like to enable “show marks” mode and toggle mark 1 for the pictures that I’d like to keep. When all have been marked, I let geeqie select the marked ones and move them to a directory on the computer. I delete the rest directly with geeqie as well, right on the card. This way, the pictures on my SD cards correspond to the images that haven’t been culled yet.

  • I began using digiKam for viewing/searching my collection. I first wanted to use darktable for this, but digiKam seems better suited to the purpose. In particular, images do not need to be imported explicitly: new images simply show up, deleted ones disappear. In my opinion the workflow for tagging, rating and searching photos is somewhat better in digiKam. I configured digiKam so that its database only acts as a cache; modifications are only written to XMP files, both for RAWs and for JPEGs. I use the following shell alias to “transplant” XMP metadata into JPEGs whenever I want: alias xmp-transplant="parallel -X 'exiftool -tagsfromfile %d%f.%e.xmp -xmp:all -ignoreMinorErrors -overwrite_original {.} && rm {.}.xmp' ::: ". This way, the JPEGs and RAWs are only ever modified explicitly, and I can keep them in git-annex.

  • It took me a lot of time, but I finally succeeded (with what will be darktable 4.2) in creating a simple default style that gives good results for most of my photography. I have many images that I’d like to keep for the record, but that do not justify a lot of manual editing. The common advice in this case would be to simply keep the out-of-camera JPEGs, but I prefer it if all of my photos go through the same pipeline and have a consistent look. In addition, shooting RAW only is faster and uses less storage.

    In darktable, by default, I apply “hot pixels” and “lens correction” to all images, together with a custom “exposure” (for my camera, I set the exposure to +1 EV by default). This is a good starting point for further edits, but I now also have a “default” style that adds a slightly customized “filmic” (black point at -8 EV, white point at +3.5 EV), “local contrast” (set to defaults), “color balance” (set to “basic colorfulness: standard”), and “denoise” (set as suggested here: Possible to achieve good basic chroma noise reduction with a preset? - #20 by priort). I find that this gives good results in many cases, often already much better than the camera JPEG.
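
    For completeness, such a style can also be batch-applied outside the GUI with darktable-cli; a rough, untested sketch (the style name and paths are placeholders, and darktable itself must not be running at the same time):

    for raw in ~/Photos/2022-10-30/*.RAF; do
        # append the "default" style to each image's history and export a JPEG next to it
        darktable-cli "$raw" "${raw%.RAF}.jpg" --style default
    done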

Geeqie would let you address the last three of those points. As for the first, isn’t it sufficient to add JPEGs and RAWs to git-annex only from time to time? As for the second, darktable can be launched directly from digiKam or from the desktop file manager. It’s quick to launch it on a whole directory of fresh RAWs (select all and then “open in darktable”).
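
From a shell, pointing darktable at the folder does the same thing (the path is just an example):

darktable ~/Photos/2022-10-30/   # opens darktable with that folder imported as a film roll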

2 Likes

Ulrich’s script is a nice start to preprocessing as well… It’s a bit slow right now, but it runs the auto-picker for exposure and the filmic auto-pickers, and sets the tone equalizer mask so it’s ready to go… You can also tweak a few other things by default… If you couple this with a custom Quick Access panel and a few auto-applied presets, you could have a pretty good starting point… The script works from the lighttable or darkroom view… Could be a great template to modify…


1 Like

I have several large SD cards that I rotate through, and I only format them when they go back into the camera. That gives me a bit of time before a card is formatted. I’ve also stopped formatting the second card for cameras that have two slots, and I rotate those as well.

As I understood it, git-annex doesn’t copy/store the image files, but only keeps track of where they’re stored, etc… So it would not be a backup solution, nor protect against disk failure.

It definitely stores files and copies them. The main difference between files tracked by git and by git-annex is that files “annexed” by git-annex are not tracked in the git history, which allows you to more easily track large files. Older versions, for example, are marked “unused” and can be pruned.

But git-annex definitely stores files! And it keeps track of where files are and how many copies there are, and can make sure you have at least N copies. It even keeps a checksum of all files, not only for deduplication but also as a guardrail against bit rot.

So I definitely consider it a valid backup solution. But it goes a step further and provides you with an archival solution, a much harder problem to solve.
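
In concrete terms, that means commands like these (the file name is just an example):

git annex numcopies 2           # refuse any drop that would leave fewer than 2 copies
git annex whereis DSCF1234.RAF  # list which repositories hold this file's content
git annex fsck                  # re-verify checksums of the local content, catching bit rot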

It’s a great match for photo management, I would never roll that back.

1 Like

I definitely used the word “definitely” too often in that post, but you get the idea. :slight_smile:

OK, thank you for clearing that up. Seems I misread the git-annex page (or got the wrong variant?).
Back to Google (or rather Duckduckgo).

git-annex is not for the faint of heart… it’s a bit complicated and kind of assumes you already know git. There’s a web UI for it that is supposed to make it easier to use, but as a seasoned git user, it generally made my life more complicated… so yeah, it’s normal if you got confused. :slight_smile:

I agree and have been using it since 2013. I should publish my workflow article that’s been sitting around for (checks notes) 4 years…

3 Likes

@anarcat why not use git annex on your phone?

I tried that, with Termux or with a special remote, and it was quite complicated. Syncthing just works, and better and faster than git-annex for this. I use git-annex on the other end of that… different tools, different purposes…

1 Like

Looking again at this thread now that I am completely unable to use RPD because of a weird new bug, possibly caused by Wayland (see 1031557).

I have actually tried to use this, somewhat naively, to import photos from my phone, which are synced to my local disk with Syncthing, but that doesn’t work in my case because gphoto2 doesn’t find those pictures at all… It assumes an SD card or some camera is connected to the computer…

I think my next step is to just give up on the “roll name” workflow Darktable somehow imposed on me and just batch-import everything into a YYYY/MM/DD tree. The last photos I have properly archived at all are from April 2021 at this point, almost two years ago. I have 25GB of photos on my phone and an unknown quantity in the camera. I need a stopgap fix here…

This is just GUI parlance on darktable’s part; it doesn’t have much to do with the file organization.

This is what I do already with exiftool, though I don’t import video. In darktable, the film roll is just the date folder name.
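
For reference, the exiftool part is roughly a one-liner like this (source path and date format are placeholders):

# move each file into a per-day folder named after its EXIF date
exiftool -r '-Directory<DateTimeOriginal' -d ~/Photos/%Y-%m-%d /media/sdcard/DCIM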

Yeah, but it is featured quite prominently in the GUI. It’s also a quick way to find similar things… I don’t have time to individually tag each picture with everything that’s in it, but I can generally assign a tag to a roll, which also allows me to regroup shoots that happen over multiple days.

I used to find this super confusing in darktable, and quite annoying since I didn’t use a roll subdirectory before, but now I find that workflow quite useful. Assuming I have time to assign a darn roll…

so are you YYYY/MM/DD or YYYY-MM-DD?

This one. I have a flat folder structure.

1 Like