IPTC Keyword encoding question

Hello, when I export photos, I export tags in both Exif and IPTC. Keywords in Exif show properly; however, French accents in IPTC keywords do not, likely because the character set is not specified (see image for the output of the IPTC keywords).
I have not been able to find the proper way to set it in order to correct this issue. Can anyone guide me?

IPTC Character issue

Thank you
sm

What computer or device and what operating system?

I have a Windows PC and here are some letters with French accents:

çéêä

They are Unicode characters from my Windows 7 ‘Character Map’ pop-up. Copy and paste them into your IPTC and see if they work.

Is your IPTC keyword list in a pure text format?


And what was used to create the output you show?

The text looks like UTF-8 input displayed with an 8-bit encoding: the “Ã©” is the kind of thing you see then.
UTF-8 “non-ASCII” characters are encoded in more than one byte, and an 8-bit encoding shows each individual byte as a separate character.
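
To make the effect concrete, here is a minimal Python sketch. It is only an illustration of the byte-level mechanism, not a reproduction of what any of the tools in this thread do internally; the sample word is simply one of the keywords from the exported photo.

```python
# Illustration only: what UTF-8 text looks like when every byte is
# shown as a separate character of an 8-bit encoding.
word = "Québec"

utf8_bytes = word.encode("utf-8")                # 'é' becomes two bytes: 0xC3 0xA9
shown_as_latin1 = utf8_bytes.decode("latin-1")   # each byte mapped to one character
shown_as_cp1252 = utf8_bytes.decode("cp1252")    # the usual Windows 8-bit code page

print(utf8_bytes)        # b'Qu\xc3\xa9bec'
print(shown_as_latin1)   # QuÃ©bec
print(shown_as_cp1252)   # QuÃ©bec
```

The six-character word occupies seven bytes in UTF-8, and the extra byte is why one accented letter turns into two odd-looking characters.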


I am using darktable 4.2.1 on Windows 11. The tags are ones I typed directly in the darktable tagging module with the Windows keyboard set to French Canadian. In the export module of darktable, I have them mapped to the IPTC keywords as per the image. The initial output was from the GeoSetter ExifTool pane. I see the same result when importing the photos into PhotoPrism and looking at the image information.
Darktable export preference

@xpatUSA, I did the quick test you mentioned using darktable tagging, but the result is not good.
test accent

I asked about the output, as the way the text is shown in your screenshots implies that darktable did store the IPTC keywords in utf-8 encoding. So that part seems to work as expected.

IPTC tags can be stored in two ways:

  • as IIM tags (afaik that way is not recommended anymore, but still possible for backwards compatibility); IIM uses an 8-bit encoding natively;
  • as XMP tags (recommended), basically XML, which should use utf-8 natively;

That leaves two options:

  • darktable doesn’t indicate the character set used for the IPTC tags stored in IIM format;
  • the other programs you use expect an 8-bit encoding and interpret the tags that way. If your IPTC tags are stored as XMP data, that looks like a bug in those other programs (as XMP should be interpreted as utf-8, unless otherwise indicated!).

So you may want to check how your IPTC tags are stored in the exported files (or post a file here, so others can check).
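
For anyone who wants to see what such a check involves, below is a rough Python sketch that walks a raw IIM block and reports whether a character set is declared. It rests on a few assumptions: the input is the bare IIM data (extracting it from the JPEG's Photoshop segment is not shown), only standard-length datasets are handled, and the function names `make_dataset`, `iter_iim_datasets` and `read_keywords` are made up for this example. The Latin-1 fallback imitates a reader that assumes an 8-bit charset when nothing is declared; real programs may choose differently.

```python
import struct

ESC_UTF8 = b"\x1b%G"  # escape sequence IIM uses in dataset 1:90 to announce UTF-8

def make_dataset(record, number, value):
    """Build one IIM dataset: tag marker, record, dataset number, length, data."""
    return b"\x1c" + struct.pack(">BBH", record, number, len(value)) + value

def iter_iim_datasets(data):
    """Yield (record, number, value) for each standard-length dataset."""
    pos = 0
    while pos + 5 <= len(data) and data[pos] == 0x1C:
        record, number, length = struct.unpack(">BBH", data[pos + 1:pos + 5])
        if length & 0x8000:  # extended-length datasets: not handled in this sketch
            break
        yield record, number, data[pos + 5:pos + 5 + length]
        pos += 5 + length

def read_keywords(data):
    """Return (utf8_declared, keywords) for a raw IIM block."""
    charset, raw_keywords = None, []
    for record, number, value in iter_iim_datasets(data):
        if (record, number) == (1, 90):      # CodedCharacterSet
            charset = value
        elif (record, number) == (2, 25):    # Keywords
            raw_keywords.append(value)
    utf8_declared = charset == ESC_UTF8
    encoding = "utf-8" if utf8_declared else "latin-1"  # assumed fallback
    return utf8_declared, [k.decode(encoding, errors="replace") for k in raw_keywords]

keyword = make_dataset(2, 25, "Québec".encode("utf-8"))
print(read_keywords(keyword))                                # (False, ['QuÃ©bec'])
print(read_keywords(make_dataset(1, 90, ESC_UTF8) + keyword))  # (True, ['Québec'])
```

The keyword bytes are identical in both calls; only the presence of the 1:90 dataset changes how they are interpreted.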


Since I am not sure how to verify how the IPTC tags are stored, I will go with your suggestion to post files.
I am attaching an example of an exported photo and also its corresponding XMP file.


20230429_094227_0009.CR2.xmp (12.8 KB)

You can also see how Flickr treats the metadata here: le faucon et sa proie | Stéphane Morin | Flickr

The easiest option I see is to simply export Exif tags and forget about IPTC tags, but that doesn’t really fix the issue :wink:
Thank you all for your time!

I think there may be a general problem, not just with ‘darktable’.


I opened an image in XnView and pasted in the four French characters as an IPTC keyword. They showed correctly.

Then I opened the image in ExiftoolGUI. The characters showed correctly.

I do not have darktable, so I opened the image in RawTherapee and voilà:

test RT

As you can see, the same problem appears.

So both our editors are failing to display UTF-8 ‘special’ characters correctly. How to correct that, I have no idea.

Anyone else know?


Can you try exiftool -charset latin?

Sorry, I don’t understand the suggestion.

Is that a command-line and what does it do?

I only use ExiftoolGUI to view, not to change something.

Yes, it is a command-line option. Maybe there is a preference for it in the GUI.

Does your suggestion change the image EXIF or only how text appears in the exif tool? In my ExiftoolGUI the French text appears correctly.

Interesting, on my (openSUSE Linux) system, both exiftool and exiv2 have no problem with the accented characters; everything shows up as expected (nice image, btw).
Darktable opens the image with no problems and shows the correct characters.
exiftool -charset latin also gives correct output.

My binary editor (Okteta) shows this for the relevant part (the start of the IIM IPTC block):

ÿí.üPhotoshop 3.0.8BIM
Ñ
P
Stephane Morin
Animaux
Canada
Oiseaux
QuÃ©bec
SÃ©paq
faucon pÃ©lerin
parc national de Boucherville
pic mineur
province
province de QuÃ©bec
le faucon et sa proie.

where the “ÿí” bit (0xFF 0xED) is the JPEG APP13 segment marker that introduces the Photoshop/IPTC block, not text. Note the “Ã©” groups, which are the UTF-8 encoding of ‘é’ (0xC3 0xA9).
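
Because the two bytes 0xC3 0xA9 are still intact in the file, the damage is in the display only and can be undone by re-interpreting the characters. A small sketch, assuming the text was decoded as Latin-1 (the helper name `undo_mojibake` is invented for this example, and other 8-bit code pages would need a different `wrong_encoding`):

```python
def undo_mojibake(text, wrong_encoding="latin-1"):
    """Re-interpret text that was really UTF-8 but was decoded as an 8-bit charset."""
    try:
        # Turn the characters back into the original bytes, then decode properly.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of damage, so leave the text unchanged

print(undo_mojibake("province de QuÃ©bec"))  # province de Québec
print(undo_mojibake("faucon pélerin"))       # faucon pélerin (already fine)
```

The second call shows why the `try` is there: correctly decoded text does not form valid UTF-8 when pushed back through Latin-1, so it is returned as-is.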

Perhaps some programs aren’t completely unicode-aware? The switch to unicode was rather slow, iirc. (I still remember the headaches caused by different editors using different character sets.)

Iirc, EXIF fields are supposed to be ascii only. That means utf-8 compatibility isn’t guaranteed.
Better switch to XMP, imo: utf-8 is the native charset there, and there are no limits on field lengths (in theory). There is an IPTC namespace definition, so all IPTC fields are available.

Perhaps the exiftool faq about charsets is useful?


Better switch to XMP, imo: utf-8 is the native charset there, and there are no limits on field lengths

What would be the proper XMP field to use to export keywords? I think I will go that route


Thanks
sm

A classic. This is from my notes on metadata handling and an exiftool command as reference:

IPTC by default uses Latin1 and chokes on UTF8. The character set option writes a tag to the IPTC section, which makes sure that other software can read the tags correctly. There is no conversion going on here, just identification. Make sure to put that option towards the end of the command.

$ exiftool \
    '-IPTC:Keywords < XMP-dc:Subject' \
    '-IPTC:Caption-Abstract < XMP-dc:Description' \
    '-IPTC:CodedCharacterSet=UTF8' \
    FILENAME
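
If the same command has to be applied to many exported files, it could be wrapped in a short Python script. This is only a sketch: it assumes `exiftool` is installed and on the PATH, the function names `build_exiftool_args` and `tag_file` are hypothetical, and the tag arguments are the ones from the command above with the spaces around `<` dropped.

```python
import subprocess

def build_exiftool_args(path):
    """Return the exiftool command line as a list, charset option placed last."""
    return [
        "exiftool",
        "-IPTC:Keywords<XMP-dc:Subject",
        "-IPTC:Caption-Abstract<XMP-dc:Description",
        "-IPTC:CodedCharacterSet=UTF8",  # identifies the encoding of the IPTC block
        path,
    ]

def tag_file(path):
    """Run exiftool on one file; raises CalledProcessError if exiftool fails."""
    return subprocess.run(build_exiftool_args(path), check=True)

# Example (hypothetical file name); nothing is executed until tag_file is called:
print(build_exiftool_args("photo.jpg"))
```

Passing the arguments as a list avoids shell quoting problems with the `<` character.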

Here are some data points, in case it’s useful. Downloaded https://d2x313g9lpht1q.cloudfront.net/original/3X/0/2/02c02215e065b33238d3f012db8da0282ffbcb87.jpeg|xmp and viewed in some common software.

Geeqie image viewer:

Gwenview image viewer:
gwenview

Imported to darktable:

Exiftool:

Looking at XMP:

My locale:
konsole

This is all on Ubuntu Linux 23.04.


That depends on what other programs you want to use. Why not start with just the default settings in darktable (i.e. without redefinitions) and see what is missing in the programs you use?

Unless you need to use IPTC tags, in which case the IPTC specification for XMP might be of help.
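
As a quick way to see which keywords an XMP sidecar actually carries, the sketch below reads the `dc:subject` list, which is the XMP field the exiftool command earlier in this thread copies keywords from. The sample packet is made up from keywords mentioned in the thread and is much simpler than a real darktable sidecar; `xmp_keywords` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up, minimal XMP packet for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:subject>
    <rdf:Bag>
     <rdf:li>Québec</rdf:li>
     <rdf:li>faucon pélerin</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def xmp_keywords(xmp_text):
    """Return the dc:subject entries of an XMP packet as a list of strings."""
    root = ET.fromstring(xmp_text)
    return [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]

print(xmp_keywords(SAMPLE_XMP))  # ['Québec', 'faucon pélerin']
```

For a real sidecar, the file's bytes can be read and passed to the same function, since `ET.fromstring` also accepts bytes.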


As per my post, you just need to tag your output files with the correct encoding.

And maybe file a feature request with darktable, that the tag should be written right away.
There are other possible encodings, too, so that would have to be customizable. I’d still set the default to UTF-8 just because it’s what the sane world has agreed upon.


Thank you everyone for your help!
I’ve switched to XMP and it works fine now. I don’t have a specific use that warrants IPTC, so I’m good now.


Unfortunately, I don’t see how to add that tag with Darktable as it is not available to choose: