Weird, inconsistent tag problem (probably) with digikam

I’ve been working on a photo digitization project for several months now, and I think I’ve finally narrowed down an issue to digikam, but I can’t reproduce it consistently.

Here’s my workflow. I scan negatives or prints with Epson iScan into GIMP. There, I rotate, crop, clean up, and add “Description” metadata before exporting as a .png. The “Description” field in the GIMP metadata dialog saves to (at least) Exif.Image.ImageDescription (wrong, see below), and Iptc.Application2.Caption. I’m using thumbsup, and it will read the ImageDescription and display that with the image.

The problem is, after I finish scanning a batch, import the images into digikam, and add face tags so that thumbsup will create albums for people, about half of the exif descriptions stop showing up in thumbsup. Since thumbsup uses exiftool, I’ve been looking at the metadata with exiftool. For the images that no longer display a description in the thumbsup gallery, exiftool either doesn’t list Description any more, or it says it’s binary, and I need to use the -b option.

If I generate the html with thumbsup before adding face tags in digikam, all the Descriptions show up in the html gallery. After adding face tags, about half of them are gone, or binary according to exiftool and thumbsup. If I load the image into GIMP again, “Write Metadata” without making a change, and re-export the image, the Description works again in exiftool and thumbsup.
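One way to pin down exactly which files are affected is to snapshot the metadata before and after face tagging and diff the two. The sketch below assumes each snapshot was saved from exiftool's JSON output, that each record carries SourceFile and Description keys, and that a binary value shows up as a string starting with “(Binary data”. The function names and sample file names are made up for illustration.

```python
import json

BINARY_PREFIX = "(Binary data"  # assumed placeholder text for binary values


def index_by_file(records):
    """Map SourceFile -> Description (None when the tag is absent)."""
    return {rec["SourceFile"]: rec.get("Description") for rec in records}


def lost_descriptions(before, after):
    """Return files whose Description was readable before but not after."""
    old, new = index_by_file(before), index_by_file(after)
    problems = {}
    for name, text in old.items():
        if text is None:
            continue  # there was nothing to lose
        now = new.get(name)
        if now is None:
            problems[name] = "missing"
        elif str(now).startswith(BINARY_PREFIX):
            problems[name] = "binary"
        elif now != text:
            problems[name] = "changed"
    return problems


# Hypothetical snapshots; in practice these would be loaded from saved JSON files
before = json.loads(
    '[{"SourceFile": "a.png", "Description": "Grandma, 1962"},'
    ' {"SourceFile": "b.png", "Description": "Lake trip"},'
    ' {"SourceFile": "c.png", "Description": "Picnic"}]'
)
after = json.loads(
    '[{"SourceFile": "a.png", "Description": "Grandma, 1962"},'
    ' {"SourceFile": "b.png"},'
    ' {"SourceFile": "c.png",'
    '  "Description": "(Binary data 6 bytes, use -b option to extract)"}]'
)

print(lost_descriptions(before, after))
```

Running this on real snapshots would give a list of the affected files and how each one broke, which might make it easier to spot what they have in common.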

I started writing a Rust program with rexiv2 to just copy the caption to the description, and it shows that the caption is still there, and sometimes the description is still there. Sometimes it’s changed to “Created with GIMP”.
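The decision rule for that kind of repair can be sketched on a plain dict, shown here in Python rather than through rexiv2 so the logic is easy to follow. The keys are exiv2-style tag names. The choice of Xmp.dc.description as the target, the name restore_description, and treating “Created with GIMP” as a placeholder that is safe to overwrite are all assumptions for illustration, not what the actual Rust program does.

```python
CAPTION = "Iptc.Application2.Caption"
DESCRIPTION = "Xmp.dc.description"  # assumed target; pass another key if needed
PLACEHOLDER = "Created with GIMP"


def restore_description(tags, target=DESCRIPTION):
    """Return a copy of tags with target filled from the caption when needed."""
    fixed = dict(tags)
    caption = tags.get(CAPTION, "").strip()
    current = tags.get(target, "").strip()
    # Overwrite only when the description is gone or holds the placeholder,
    # so a description that differs from the caption on purpose is kept.
    if caption and (not current or current == PLACEHOLDER):
        fixed[target] = caption
    return fixed


# Hypothetical tag sets for three images
print(restore_description({CAPTION: "Lake trip"}))
print(restore_description({CAPTION: "Lake trip", DESCRIPTION: PLACEHOLDER}))
print(restore_description({CAPTION: "Lake trip", DESCRIPTION: "Hand-written"}))
```

The function leaves the input untouched and returns a new dict, so the same check can double as a dry run before anything is written back to a file.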

Note, I don’t use the metadata editor in digikam. I just show the face tags, and add face tags.

I think I’ve narrowed it down to something digikam is doing, but I can’t figure out if it’s a problem with my workflow, or a weird bug in digikam. I couldn’t find anything in digikam’s bug tracker.

So, is my workflow obviously stupid? Should it ever work? I still have lots more old photos and negatives to scan, so it would be nice to figure out what I’m doing wrong.

Edit: I lied. When I enter text in GIMP’s Description field in the Metadata dialog, it sets the Xmp.dc.description tag, and the Iptc.Application2.Caption tag, but NOT the Exif.Image.ImageDescription tag. Geeqie and exiv2 agree. exiftool just shows it once as “Description”

Have you tried with jpegs instead of PNG? PNG is weird about metadata still.

No, I really want a lossless format. And Webp is even weirder about metadata, but it’s smaller than png.

Not sure about JPEG-XL support in Gimp, but that’s what you want.

Then there’s always TIFF. Though that can be (much) larger than png (depending on bit depth, compression etc.)

As for digikam, you can tell it which Exif, IPTC, and XMP tags to use for various metadata items. Crucially, you also define the order in which the different fields are used: the first field found is used. You set that up in the configuration dialog. So in your case, you will want the Xmp.dc.description tag or Iptc.Application2.Caption tag first, then the other two. For writing, I think you can specify all three.
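To illustrate why the order matters: with a first-match rule, whichever configured tag comes first and holds a value wins, even if a later tag has the text you actually want. This is a small sketch of that rule only, not digikam's actual code; the function name and sample values are hypothetical.

```python
def first_found(tags, read_order):
    """Return (tag_name, value) for the first tag in read_order that has a value."""
    for name in read_order:
        value = tags.get(name)
        if value:  # skip missing or empty tags
            return name, value
    return None, None


# Hypothetical image where the Exif field holds a placeholder
tags = {
    "Exif.Image.ImageDescription": "Created with GIMP",
    "Xmp.dc.description": "Grandma, 1962",
    "Iptc.Application2.Caption": "Grandma, 1962",
}

exif_first = [
    "Exif.Image.ImageDescription",
    "Xmp.dc.description",
    "Iptc.Application2.Caption",
]
xmp_first = [
    "Xmp.dc.description",
    "Iptc.Application2.Caption",
    "Exif.Image.ImageDescription",
]

print(first_found(tags, exif_first))
print(first_found(tags, xmp_first))
```

With the Exif tag first, the placeholder is what gets read; with the XMP tag first, the real caption is.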

Are you using utf-8 characters outside the ascii set, by any chance?

GIMP 2.99 does support jxl, but thumbsup still doesn’t, so I’m not quite as certain as you are that it’s what I want.

For previous projects like this, I saved the original scans as TIFFs, then batch generated jpgs for sharing. That was a pain for LOTS of reasons, and I’m trying not to go back to that.

My (apparently) naive expectation was that digikam would read in the tags from the image, and then not delete or change any existing tags when it writes them back to the image, since the intersection of the set of tags written by GIMP and the set of tags I’m changing in digiKam is null. I still don’t understand why the order is important, and the docs just say:

This functionality is often used by advanced users to synchronize metadata between different software. Please leave the default settings if you are not sure what to do here.

I already have a script to read an .md file and write the contents to the correct tag. I might just do that at scan time, and after face tagging in digikam, run the script to set the caption.

No, I went down that rabbit hole for weeks, trying to figure out why some disappeared, and some didn’t. I had 17-character captions disappear, and 200+ character ones survive. But they’re all 7-bit ASCII clean.