Hello, when I export photos, I export tags in both Exif and IPTC. Keywords in Exif show properly; however, French accents in IPTC keywords do not, likely because the character set is not specified (see image for output of IPTC keywords).
I have not been able to find the proper way to set it in order to correct this issue. Can anyone guide me?
I am using darktable 4.2.1 on Windows 11. The tags are ones I typed directly in the darktable tagging module, with the Windows keyboard set to French Canadian. In the export module of darktable, I have them mapped to the IPTC keywords as per the image. The initial output was from the GeoSetter exiftool pane. I see the same result when importing the photos into PhotoPrism and looking at the image information.
I asked about the output, as the way the text is shown in your screenshots implies that darktable did store the IPTC keywords in utf-8 encoding. So that part seems to work as expected.
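To illustrate why garbled accents point to UTF-8 data being read with the wrong character set, here is a minimal Python sketch. The keyword "été" and both function names are made up for the example; they are not taken from the screenshots.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Latin-1 (the wrong charset)."""
    return text.encode("utf-8").decode("latin-1")


def looks_like_utf8_mojibake(shown: str) -> bool:
    """Return True if undoing a Latin-1 misread yields different, valid UTF-8 text."""
    try:
        repaired = shown.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text cannot be explained as UTF-8 bytes shown as Latin-1.
        return False
    return repaired != shown


garbled = misread_utf8_as_latin1("été")
print(garbled)                            # each accented letter becomes two characters
print(looks_like_utf8_mojibake(garbled))  # True
print(looks_like_utf8_mojibake("été"))    # False
```

If the keywords in a viewer show this two-characters-per-accent pattern, the stored bytes are most likely valid UTF-8 and only the interpretation is off.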
IPTC tags can be stored in two ways:
as IIM tags (afaik that way is not recommended anymore, but still possible for backwards compatibility); IIM uses an 8-bit encoding natively;
as XMP tags (recommended), basically XML, which should use utf-8 natively.
That leaves two options:
darktable doesn't indicate the character set used for the IPTC tags stored in IIM format;
the other programs you use expect an 8-bit encoding and interpret the tags that way. If your IPTC tags are stored as XMP data, that looks like a bug in those other programs (as XMP should be interpreted as utf-8, unless otherwise indicated!).
So you may want to check how your IPTC tags are stored in the exported files (or post a file here, so others can check).
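One rough way to check this without special tools is to scan the raw bytes of the exported file. The sketch below is a heuristic, not a real metadata parser: it only looks for the byte pattern of an IIM Keywords dataset (2:25), for the IIM CodedCharacterSet dataset (1:90) holding the UTF-8 escape sequence, and for the opening of an XMP packet. These short patterns could also occur by chance in image data, and the function names are made up for this example.

```python
# IIM dataset 1:90 (CodedCharacterSet) holding ESC % G, the escape sequence for UTF-8:
# tag marker 0x1C, record 1, dataset 90 (0x5A), length 3, then the three data bytes.
IIM_UTF8_MARKER = b"\x1c\x01\x5a\x00\x03\x1b%G"
# Start of an IIM dataset 2:25 (Keywords): tag marker, record 2, dataset 25 (0x19).
IIM_KEYWORD_PREFIX = b"\x1c\x02\x19"
# Opening element of an XMP packet.
XMP_PREFIX = b"<x:xmpmeta"


def inspect_metadata_bytes(data: bytes) -> dict:
    """Report which metadata markers appear anywhere in the given bytes."""
    return {
        "iim_keywords": IIM_KEYWORD_PREFIX in data,
        "iim_utf8_declared": IIM_UTF8_MARKER in data,
        "xmp_packet": XMP_PREFIX in data,
    }


def inspect_file(path: str) -> dict:
    """Read a whole file and run the same byte-level check on it."""
    with open(path, "rb") as handle:
        return inspect_metadata_bytes(handle.read())
```

Calling `inspect_file` on an exported JPEG and getting `iim_keywords` true but `iim_utf8_declared` false would match the first option above: IIM keywords are present, but nothing tells readers that they are UTF-8.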
Since I am not sure how to verify how the IPTC tags are stored, I will go with your suggestion to post files.
I am attaching an example of an exported photo and also its corresponding XMP file.
Interesting, on my (openSUSE Linux) system, both exiftool and exiv2 have no problem with the accented characters; everything shows up as expected (nice image, btw).
Darktable opens the image with no problems and shows the correct characters. exiftool -charset latin also gives correct output.
My binary editor (Okteta) shows this for the relevant (IIM IPTC start)
Perhaps some programs aren't completely Unicode-aware? The switch to Unicode was rather slow, iirc. (I still remember the headaches caused by different editors using different character sets...)
Iirc, EXIF fields are supposed to be ASCII only. That means utf-8 compatibility isn't guaranteed.
Better switch to XMP, imo: utf-8 is the native charset there, and there are no limits on field lengths (in theory). There is an IPTC namespace definition, so all IPTC fields are available.
A classic. This is from my notes on metadata handling and an exiftool command as reference:
IPTC by default uses Latin1 and chokes on UTF8. The character set option writes a tag to the IPTC section. This makes sure that other software can read the tags correctly. There is no conversion going on here, just identification. Make sure to put that option towards the end of the command.
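The command from those notes is not reproduced here, so the following is only a sketch of what such a call could look like, not the poster's original. It assumes exiftool is on the PATH and uses exiftool's documented `-IPTC:CodedCharacterSet=UTF8` assignment; the function name, the `other_options` parameter and the file name are made up for illustration.

```python
def build_charset_tag_command(image_path: str, other_options=()) -> list:
    """Build an exiftool argument list that marks existing IPTC data as UTF-8.

    The character set assignment is placed after any other options and right
    before the file name, following the advice to keep it towards the end.
    """
    return [
        "exiftool",
        *other_options,
        "-IPTC:CodedCharacterSet=UTF8",  # identification only, values are not converted
        image_path,
    ]


# Hypothetical file name, for illustration only.
print(build_charset_tag_command("exported_photo.jpg"))
```

Passing the resulting list to `subprocess.run(..., check=True)` would run it. By default exiftool keeps a backup copy of the file with `_original` appended to the name.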
That depends on what other programs you want to use. Why not start with just the default settings in darktable (i.e. without redefinitions) and see what is missing in the programs you use?
As per my post, you just need to tag your output files with the correct encoding.
And maybe file a feature request with darktable, that the tag should be written right away.
There are other possible encodings, too, so that would have to be customizable. I'd still set the default to UTF-8 just because it's what the sane world has agreed upon.