About darktable metadata values and management

mmv · December 16, 2024, 1:39am

Hi!

I configured my camera to save both the default JPEG and the NEF version of each picture. For historical reasons, those files were initially ingested in Lightroom version 6 (!, part of a process I am trying to escape from), which was used to insert keywords (tags in DT), to update the metadata in the JPEG file, and to export those keywords to the XMP sidecar file for the NEF version. As a test case, two such pictures and the Lightroom-generated XMP file were subsequently imported into DT 4.8.1. The following two screenshots show the metadata that darktable displays for each file:

Here are some questions:

Why does the “lens” item report “24.0-70.0 mm f/2.8” for the JPEG image and “Nikon AF-S Zoom-Nikkor 24-70mm f/2.8G ED” for the NEF version?
What is the meaning of the “zo” at the start of the “import timestamp” value?
What is the meaning of the “za” at the start of the “datetime” value?
Is it to be expected that the “width” and “height” of those pictures are not the same in the JPEG and NEF versions (I did not crop the JPEG)?
More generally, where can I find the documentation concerning each and every item in that metadata list? For instance, what process sets the “flags”, how many flags exist or can be set, in what format (e.g., do the dots in those flag values mean anything), etc.?
I found the “metadata settings” menu under the “presets” icon, and saw that I can mask some of the items in the image information panel, but is it possible to add other items, such as the “GPS Date/Time” reported by exiftool, which is the universal date and time of the picture, in this case “2024:04:13 08:28:11Z”?
In the planned process of moving from the old Lightroom to the current darktable, I’d like to update the keywords (tags) into a hierarchical structure, such as

“Varese” => “World|Europe|Italy|Lombardia|Varese”

or

using the open source classification of all living organisms at https://www.catalogueoflife.org/. This approach may be necessary to distinguish, for instance, between Milan (Kundera), Milan (city), Milan (province) and Milan (bird).

Question: Where/when/how should such conversions be applied, preferably programmatically as I have to process 80,000+ pictures? In Lightroom before exporting, as a separate processing of the XMP files (for NEF files), with the risk of messing up those files, or after ingestion in darktable?

Lastly, to what extent is the performance of darktable affected by the number and length of the tags? For instance, is it appropriate to manage that many pictures, in a single common library, each with between three and thirty tags, where the tag dictionaries may contain 1000 names of people, 1000 different locations, and 1000 living organisms, formatted as suggested above? Alternatively, would it be best to set up a small number of independent databases, say by year or decade, instead of merging all the information from all pictures of the last century in a single database? Or are synonyms better suited for efficiency in this case?

Thanks in advance for your suggestions.

martin.scharnke · December 16, 2024, 5:08am

Welcome to pixls.us, Michel!

Some possible help with some of your questions:

That will be down to Nikon as to how they embed metadata.

My guess is that these are weekday name abbreviations for your locale?

The raw sensor data is larger than the nominal JPG size - this is Nikon’s determination to crop a very few pixels from some or all of the edges.

The flags are a black magic item that I have also tried to translate without luck (I also shoot Nikon DSLRs)

The issue of which date/time to use has been debated a fair bit, I believe, but the bottom line is that darktable will not modify the embedded data in any event, only its own database and the XMP sidecar file it creates.

The tagging conversion seems likely to be difficult, short of directly manipulating darktable’s SQLite databases (library.db and data.db), and only after exiting darktable and backing them up.

As to performance impact - I have many hundreds of tags across some 60,000 images - I have no anecdotal evidence of significant performance impact. As always, however: Your mileage may vary.

paperdigits · December 16, 2024, 6:48am

Hey, welcome to the forum

Likely your JPEG has the string and your raw file has a numeric code that is interpreted by darktable’s underlying metadata library, exiv2.

Yes, your camera crops some pixels off of the JPEG. Raw processors like Lightroom do the same cropping. darktable generally does not.

You’ll want to look at the exiv2 docs, likely.

Not currently.

darktable has an internal database. I’d probably use a Lua script to manipulate the tags, and turn off XMP writing while you do it. Back up the database first; then, if you screw up, you can just replace the database. This will require some programming, though.
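The "back up the database before" step can be sketched as a plain file copy made while darktable is closed. This is only an illustration: the function name `backup_database` is hypothetical, and the path to the database file is an assumption that you must supply for your own installation.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir, stamp=None):
    """Copy a database file into backup_dir under a timestamped name.

    db_path is whatever database file you want to protect; its location
    is NOT hard-coded here because it depends on your installation.
    Run this only while darktable is not running.
    """
    db_path = Path(db_path)
    if not db_path.is_file():
        raise FileNotFoundError(f"no such database file: {db_path}")

    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name keeps earlier backups from being overwritten.
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{db_path.name}.{stamp}.bak"

    # copy2 also preserves the modification time of the original file.
    shutil.copy2(db_path, target)
    return target
```

To restore after a failed experiment, quit darktable and copy the `.bak` file back over the original. If you keep more than one database file, back up each of them at the same moment so that they stay consistent with each other.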

I’d think you should be OK, but you can always test it out!

mmv · December 16, 2024, 9:02am

Thanks a lot to @martin.scharnke and @paperdigits: your detailed comments are very useful indeed.

On the “zo” and “za” question: excellent idea. I currently live in Belgium, which has three official languages, including Dutch, which I don’t speak: probably “zondag” and “zaterdag”. The next question, then, is how do I request darktable (and/or macOS) to use French or English as my default language rather than Dutch?

Thanks again for your timely and exhaustive support.

rvietor · December 16, 2024, 9:24am

Those flags are set by darktable, and hold some information about the image. If you hover over the flags, you’ll get some more information (first appears to be the star rating, there’s one that indicates a raw image, and what was used to load the image). The dots mean there’s no relevant value, or the value is “unknown”.
It’s clear those “flags” aren’t binary flags with just an “on” and an “off” state.

Indeed, “zondag” and “zaterdag”.
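The weekday reading is easy to check against the one timestamp quoted in this thread, the exiftool “GPS Date/Time” value “2024:04:13 08:28:11Z”. The sketch below is not how darktable formats dates. It uses a hand-written table of Dutch weekday abbreviations, so it does not depend on the system locale, and the function name is made up for the example.

```python
from datetime import datetime

# Two-letter Dutch weekday abbreviations, Monday first
# (maandag, dinsdag, woensdag, donderdag, vrijdag, zaterdag, zondag).
DUTCH_WEEKDAY_ABBREVIATIONS = ["ma", "di", "wo", "do", "vr", "za", "zo"]


def dutch_weekday_prefix(exif_timestamp):
    """Return the Dutch weekday abbreviation for an exiftool-style UTC timestamp."""
    # The trailing "Z" in the format string is matched as a literal character.
    moment = datetime.strptime(exif_timestamp, "%Y:%m:%d %H:%M:%SZ")
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return DUTCH_WEEKDAY_ABBREVIATIONS[moment.weekday()]


print(dutch_weekday_prefix("2024:04:13 08:28:11Z"))  # prints "za"
```

13 April 2024 was a Saturday (“zaterdag”), which agrees with the “za” prefix on the “datetime” value. The GPS value is in UTC while the displayed time is local, but a mid-morning UTC time falls on the same calendar day in Western Europe.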

For the language: darktable’s “preferences” ==> “general” allows you to set the interface language. For macOS, I suppose there’s also a way to set such preferences, but that’s better asked from mac users…

Concerning the metadata: I understand you have not yet loaded all the images in darktable.

For simple keywords with no ambiguity as to where in the future tree they should be: you can manipulate the tags after the images are loaded in darktable: create the tree you want, select the images with the current flat tag, apply the “tree tag” and remove the flat tag. No need to mess with scripts and the database directly. I’m not sure a script helps a lot here, as it cannot decide where in the tree the flat tag should go…
If you have “ambiguous” tags (like the “Milan” example), same basic idea, but you’ll have to manually select the images depending on where they have to go. Not something a script can do.
If you have good backups, you can try to manipulate the xmp files, but then you’ll have to know which xml tag to use…
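The split between unambiguous and ambiguous keywords can be sketched independently of any tool. The mapping table has to be written by hand, because only the photographer knows where each flat keyword belongs, and ambiguous keywords are set aside for manual review rather than guessed. The function name and the sample keywords “Milan” and “Sunset” are illustrative. The “|” separator follows the notation used earlier in this thread.

```python
def convert_keywords(flat_keywords, hierarchy, ambiguous):
    """Map flat keywords to hierarchical ones.

    hierarchy: dict of flat keyword -> full "A|B|C" path (hand-written).
    ambiguous: set of flat keywords that cannot be placed automatically.
    Returns (converted, needs_review).
    """
    converted = []
    needs_review = []
    for keyword in flat_keywords:
        if keyword in ambiguous:
            # A person must decide: city, province, bird, or author?
            needs_review.append(keyword)
        elif keyword in hierarchy:
            converted.append(hierarchy[keyword])
        else:
            # No mapping known yet: keep the flat keyword unchanged.
            converted.append(keyword)
    return converted, needs_review


hierarchy = {"Varese": "World|Europe|Italy|Lombardia|Varese"}
ambiguous = {"Milan"}

converted, needs_review = convert_keywords(
    ["Varese", "Milan", "Sunset"], hierarchy, ambiguous
)
print(converted)
print(needs_review)
```

The sketch only transforms lists of strings. Reading keywords from, and writing them back to, Lightroom, XMP files, or darktable is a separate step that it deliberately leaves out.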

If lightroom can handle hierarchical keywords, you may be better off creating the trees there, as you are more familiar with that program.

Oh, and there’s an extra wrinkle with the tree for living organisms: there are always parts that are in flux (I’ve seen that with orchids, and with mushrooms). That means names can change, and organisms can even be moved between families. Not too bad in itself (it’s fairly rare that species are split or combined with others, though that does happen), but it can complicate searching for the organism (esp. in paper documentation…). And the tree can get rather deep. So perhaps you don’t need to use all levels for tagging (which, for me, is meant to find images, not to give a full taxonomy).

mmv · December 16, 2024, 4:04pm

Thanks a lot for your comments, @rvietor.

Yes, I did notice that the values of those flags can be visualized by hovering the mouse on the values (0…r…a…r in the NEF case). So where can I find the complete list of 13 flags currently implemented in DT? I did find a list of flags in the documentation for the Exiv2 utility manual at Exiv2 - Image metadata library and tools, but those appear unrelated to the DT flags.
Thanks for your suggestions about tags management: indeed, I have not made the switch over from LR to DT yet. I’ll make a few more tests first…
I am aware that the field of taxonomy is in a permanent state of flux, though for most of the common macroscopic plants and animals I have photographed, their classification should be relatively stable. I plan to specify only the main categorical levels anyway.

Thanks again for your help.

paperdigits · December 16, 2024, 4:20pm

I’d guess you probably need to read the source code.

mmv · December 18, 2024, 1:02am

Thanks, @paperdigits.

pehar · December 18, 2024, 8:43am

Maybe darktable/src/libs/metadata_view.c at master · darktable-org/darktable · GitHub lines 339ff

static void _metadata_get_flags(const dt_image_t *const img, char *const text, char *const tooltip, const size_t tooltip_size)
...

might answer your question regarding the flags.

mmv · December 19, 2024, 3:29am

Hi @pehar. Thanks for the pointer to the source code. My understanding of the C function “_metadata_get_flags” in “metadata_view.c” (between lines 339 and 470) is as follows:

“flag_descriptions” is a pointer to a string array of 11 predefined labels to be displayed on screen when the mouse is hovering over the string value defined next.
“tooltip_parts” is a pointer to the string of 15 bytes which is exhibited by DT in the “image information” panel. This array is initially defined as a null string and built by concatenating 13 characters (indices 0 to 12), one at a time; the 14th character (index 13) is set to a null character to end the string. The 15th char is unused.

Each character in that string has the following meaning:

char at index 0: number of stars assigned to the picture (range: 0 to 5).
char at index 1: either “.” (meaning “empty” or “false”) or “!” (meaning “true”).
char at index 2: either “.” (meaning “empty” or “false”) or “!” (meaning the image thumbnail is deprecated).
char at index 3: either “.” (meaning “empty” or “false”) or “l” (meaning low dynamic range (LDR) picture).
char at index 4: either “.” (meaning “empty” or “false”) or “r” (meaning raw image).
char at index 5: either “.” (meaning “empty” or “false”) or “h” (meaning high dynamic range (HDR) picture).
char at index 6: either “.” (meaning “empty” or “false”) or “d” (meaning image flagged for removal).
char at index 7: either “.” (meaning “empty” or “false”) or “a” (meaning “auto-applying presets applied”).
char at index 8: either “.” (meaning “empty” or “false”) or “c” (meaning “image local copy”).
char at index 9: either “.” (meaning “empty” or “false”) or “t” (meaning “image contains text”).
char at index 10: either “.” (meaning “empty” or “false”) or “w” (meaning “image has WAV”, presumably audio data).
char at index 11: either “.” (meaning “empty” or “false”) or “m” (meaning monochrome image).
char at index 12: either “.” (meaning “empty” or “unknown loader”) or “j” (meaning “JPEG” in this case).
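For anyone who wants to experiment, the list above can be turned into a small decoder. It is only a sketch of my reading as listed here, not a port of the darktable source. The function name and the sample string are made up, the meaning of index 1 is unknown, and any character at index 12 other than “j” is reported without interpretation.

```python
# Index -> (character that marks the flag as set, meaning as read in this thread).
FLAG_MEANINGS = {
    1: ("!", "flag at index 1 is set (meaning unknown)"),
    2: ("!", "image thumbnail is deprecated"),
    3: ("l", "low dynamic range (LDR) image"),
    4: ("r", "raw image"),
    5: ("h", "high dynamic range (HDR) image"),
    6: ("d", "image flagged for removal"),
    7: ("a", "auto-applying presets applied"),
    8: ("c", "image local copy"),
    9: ("t", "image contains text"),
    10: ("w", "image has WAV"),
    11: ("m", "monochrome image"),
}


def decode_flags(text):
    """Translate a 13-character flags string into a list of descriptions."""
    if len(text) != 13:
        raise ValueError(f"expected 13 characters, got {len(text)}")

    notes = []

    # Index 0: star rating, shown as a digit.
    first = text[0]
    if first.isdigit():
        notes.append(f"rating: {first} star(s)")
    else:
        notes.append(f"rating: unrecognized character {first!r}")

    # Indices 1 to 11: a "." means empty/false, so only set flags are reported.
    for index, (char, meaning) in FLAG_MEANINGS.items():
        if text[index] == char:
            notes.append(meaning)

    # Index 12: loader; only "j" is covered by the list above.
    loader = text[12]
    if loader == "j":
        notes.append("loader: JPEG")
    elif loader != ".":
        notes.append(f"loader: code {loader!r} (not covered above)")

    return notes


# Hypothetical value for a 3-star JPEG, for illustration only.
for note in decode_flags("3..l........j"):
    print(note)
```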

However, being new to DT, the following is still unclear to me:

what test generates a “true” or “false” response for the char at index 1? Is this option currently unused?
when is an image thumbnail considered deprecated? Is it garbled or simply out of sync with the most current processing of the raw image?
why are there two separate tests for LDR and HDR? Is it possible for a picture to be both simultaneously?
which presets are automatically applied to a picture (presumably on import), as reported by the char at index 7?
when is a picture declared “local copy”, as reported by the char at index 8?
what characters other than “j” are allowed at index 12?

Thanks again for pointing me to the right code fragment. I hope this may help anyone curious about those flags.

pehar · December 19, 2024, 7:28am

I must confess that I have never familiarized myself with these flags to the extent that I could answer all your questions. They are of no real importance for my daily work with the program. I don’t even display them in the “Image information” module (disabled by hamburger menu → preferences).
With this in mind, please consider my answers with caution.

out of sync

depends on the image (RAW / non RAW)

https://darktable-org.github.io/dtdocs/en/overview/sidecar-files/local-copies/

rvietor · December 19, 2024, 8:36am

And on criteria specified for user-defined presets, perhaps? It is possible to have such presets automatically applied when opening the image for editing.
To see which presets are applied to your images, just have a look at the history stack after opening an image…

Note that presets are auto-applied when the image is opened for editing the first time (and when you reset the history stack, iirc), not on import per se. If you always create thumbnails from raw, this will happen on import.

mmv · December 19, 2024, 10:17am

Thanks a lot, @pehar and @rvietor, for these additional points.

pehar · December 20, 2024, 7:58am

You spent some time and effort to extract this information from the source code. Have you thought about offering this information to the dtdocs (manual)? Maybe first as a draft; other users or developers could then complete or correct it. GitHub - darktable-org/dtdocs: darktable user manual, A plea for help writing the docs

rvietor · December 20, 2024, 8:18am

I wonder what the relevance of the information in those flags is for the end user. Most of it is either available elsewhere, like flag[0], which seems to be the rating (0…6, where 6 = “rejected”), or frankly irrelevant for daily use, like the loader used.
Debugging is a different matter, there the information can be relevant. But that should be a rare case for end users.

Having the flags documented somewhere outside the source code could be nice, but I don’t feel it should be part of the manual.

pehar · December 20, 2024, 8:59am

In principle, I agree. As I said, I have personally deactivated the display of the flags. However, the flags are presented in a prominent position with the default setting of the module (line 12). Questions from users are therefore to be expected…
The question should be a different one: Should these flags be shown at all, or do they only have a historical character? If the flags are displayed, then there should also be documentation in the manual.

rvietor · December 20, 2024, 1:49pm

Fair point. But then any field that’s not immediately clear should have an explanation in the manual (e.g. “lens” can just show a number in parentheses, and there are the different time stamps, export size, …). And atm, none of the fields are even mentioned.

mmv · December 22, 2024, 1:01am

Thanks again, @pehar and @rvietor, for your comments and suggestions. As mentioned earlier, I am totally new to darktable (DT): in fact, I’m in the process of evaluating whether this software tool will meet my needs, so I may not be the best person (yet) to start writing DT’s definitive documentation. However, I did have a look at the GitHub call for help (A plea for help writing the docs)…

Regarding the 13 DT flags, I note that
- flag 0 duplicates the information already provided by the stars at the bottom of each thumbnail in the “lighttable” view of DT.
- flag 1 appears to be unused.
- flag 2 (“thumbnail is deprecated”) could be implemented as a separate item in the “image information” panel, or perhaps as a warning, if there is a centralized mechanism for reporting potential and actual issues.
- flags 3 and 5 could be combined as a single item in the “image information” panel to indicate whether the picture is LDR or HDR (assuming these are exclusive of each other).
- flag 4 duplicates the information provided by the extension of the file containing the image, itself appearing in the “filename” and the “full path” items in the “image information” panel (e.g., r = raw = “NEF” extension in the case of Nikon hardware).
- flag 6 (image flagged for removal) could also be implemented as a filter option and/or a warning before deletion.
- flag 7 appears to duplicate the information in the “history stack”, if I understand correctly the comment of @rvietor.
- flag 8 is not entirely clear to me: does it indicate that the information provided concerns a “local copy” of an image whose master copy is located elsewhere than the address shown in “full path”? This could also be implemented as a warning, or a more explicit annotation in one of the “darkroom” panels.
- flags 9 (“image contains text”), 10 (“image has an associated WAV record”) and 11 (“image is monochrome”) could appear as separate items in the “image information” panel, instead of as flags (would that also make it easier to search for images with those characteristics?).
- flag 12 probably duplicates the information contained in the extension of the file: it would be useful to know why this flag was set up in the first place.

So, I second the comment of @rvietor: from a user point of view, most of these flags do not appear to provide essential information, and those that do could/should be implemented individually rather than as cryptic flags, and be clearly documented (as recommended by @pehar). However, I do not know whether those flags are used elsewhere in the code or fulfill another purpose.

On the more general question of the number of items shown in the “image information” panel, I don’t quite understand why only a small selection of the metadata items generated by Phil Harvey’s excellent “exiftool” software are made available to DT users. I doubt that the developers can anticipate any and all requirements of any and all existing and prospective users, nor is it their mandate to limit such information to what they like. Since it is already possible to select or deselect displayed items from the list of available items, the obvious solution is to offer the entire set of metadata (EXIF, IPTC and XMP) as an option and to set the initial default selection to something close to what most users would expect. This is similar in approach to providing a wide range of processing modules and letting the user decide which ones to use in a particular context.
Lastly, and as an aside, why is there not a “Documentation” category as a primary locus for discussion of that important topic on the PIXLS.US web site (next to “Processing”, “Software”, “Hardware”, etc.)? Documentation is often attributed a low priority in the software development process, even though it really conditions the usability of the software. And better documentation in general would reduce the burden on the community to answer questions from users.

Thanks again for your support.

paperdigits · December 22, 2024, 1:15am

What would we discuss there that doesn’t fit into an existing category?

Specifically for darktable, I’d want to not give any indication that things should be put anywhere other than the official docs.

Ehhh, we are certainly in the “crowdsource the answers if it isn’t the first YouTube video in the search result” era.

But also the docs are pretty thorough already.

g-man · December 22, 2024, 2:12am

I’m pretty sure they are used in the lighttable view to show the icons in the overlay views. I’m not near my PC at the moment, but I recall reading the code about the sound one (flag 10). Because dt maintains backwards compatibility with previous XMP files, it is likely that some of these are no longer used but are not easy to remove.