exiv2 data corruption when importing a folder with XMPs

Hi,

I’m working with dtlapse, and for that I often need to import large folders (600 to 900 pictures) of raw+XMP files that already contain an edit stack. Darktable corrupts a small fraction of the XMPs on import.

I’m using darktable --library :memory: SAM_*.SRW to load the files (the issue also happens with the regular SQLite library), loading from and saving to an NVMe drive on Linux with dt 5.2.1.

A few XMP files (between one and a dozen) get overwritten with empty edit stacks. This seems to be due to a race condition within exiv2, which is not sufficiently thread-safe. Darktable reads the XMP, something goes wrong, and it then writes a fresh “empty” XMP.
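To pin down which sidecars lose their edit stack, one option is to back up the XMPs before the import and compare afterwards. The following is only a rough sketch: it assumes that each history entry in a darktable XMP carries a darktable:operation attribute, and the function names (snapshot, find_shrunk) are made up for illustration.

```python
import re
import shutil
from pathlib import Path

# Assumption: every history entry in the sidecar has a darktable:operation
# attribute, so counting those approximates the size of the edit stack.
HISTORY_ENTRY = re.compile(rb'darktable:operation="')


def count_history_entries(xmp_path):
    """Return the number of history entries found in one XMP file."""
    return len(HISTORY_ENTRY.findall(Path(xmp_path).read_bytes()))


def snapshot(folder, backup_dir):
    """Copy every *.xmp in folder to backup_dir; return {name: entry count}."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    counts = {}
    for xmp in sorted(Path(folder).glob("*.xmp")):
        shutil.copy2(xmp, backup / xmp.name)
        counts[xmp.name] = count_history_entries(xmp)
    return counts


def find_shrunk(folder, before):
    """Return names of XMPs that now have fewer history entries than before."""
    shrunk = []
    for name, old_count in before.items():
        path = Path(folder) / name
        if path.exists() and count_history_entries(path) < old_count:
            shrunk.append(name)
    return shrunk
```

Call snapshot() before starting darktable and find_shrunk() after the import has finished. The backup directory then holds the intact copies of whatever got overwritten.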

I get the following in the debug logs for the corrupted files:

36.1024 [exiv2] XMP Toolkit error 101: Unregistered schema namespace URI
36.1025 [exiv2] Failed to decode XMP metadata.

These messages usually appear together, but sometimes only the first one shows up, and there are more of these messages than there are corrupt XMPs.

I’ve looked into the issue with strace, and it looks like there are multiple XMP files open at the same time from multiple darktable PIDs. I haven’t checked if there are also parallel read and write operations on different XMPs.

Hmm… Race conditions mean you have two “actors” (programs or threads) trying to write to the same resource (here: .xmp files). Darktable reading, modifying and writing an .xmp should not cause a race condition (the operations are strictly sequential), nor should operating on several sidecars in parallel cause issues.

You really need two different entities trying to write to the same .xmp. Candidates here are dtlapse and darktable. Do you by any chance have darktable running while dtlapse is working on a stack?

This needs significantly more detail. I saw your post in the Matrix channel.

  • What’s the original file naming scheme? filename.xmp or filename.raw.xmp (e.g. with dt edits in them)?

  • are you using copy and import or add to library?

  • what OS?

  • external drive?

I only have one instance of Darktable open, and I can reliably corrupt a few XMP files by just running an import into Darktable.

strace is showing two PIDs working on the XMP files, a parent and its child, created by the parent via clone3().

In the beginning, the parent starts opening and reading the XMP files (it is opening each XMP file six times, because why not, before switching to the next one). After this has gone on for a few files, the child starts reading the same XMPs twice, then overwriting them with the Darktable-processed metadata.

At the same time, the parent continues reading further files. Unfortunately, in a full strace most calls are split into a start and an end line, with futex (mutex) calls from other PIDs interleaved in between, like this:

930663 11:41:37 openat(AT_FDCWD, ".../SAM_5322.SRW.xmp", O_RDONLY <unfinished ...>
930661 11:41:37 futex(0x5613024c5e58, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
930659 11:41:37 futex(0x5613024c5e58, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
930661 11:41:37 <... futex resumed>)    = -1 EAGAIN (Resource temporarily unavailable)
930663 11:41:37 <... openat resumed>)   = 20

That makes following what’s happening rather tiresome, and there are a bunch of unrelated PIDs doing more I/O, opening and closing their own FDs.

I’m open to suggestions on how to debug this further.
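One way to make such a log easier to follow is to rejoin the split calls per PID and keep only the lines that mention an XMP file. The sketch below assumes the log format shown above (PID, timestamp, then the call, as produced by strace -f -t). The function names stitch and xmp_calls are made up for illustration.

```python
import re

# "PID TIME call(args <unfinished ...>"
UNFINISHED = re.compile(
    r"^(?P<pid>\d+)\s+(?P<ts>\S+)\s+(?P<call>.*?)\s*<unfinished \.\.\.>\s*$"
)
# "PID TIME <... call resumed>rest"
RESUMED = re.compile(
    r"^(?P<pid>\d+)\s+(?P<ts>\S+)\s+<\.\.\. (?P<name>\w+) resumed>\s?(?P<rest>.*)$"
)


def stitch(lines):
    """Yield strace lines with split calls rejoined per PID.

    A rejoined line keeps the timestamp of the call's start and is
    emitted at the position where the call resumed.
    """
    pending = {}  # pid -> (start timestamp, first half of the call)
    for raw in lines:
        line = raw.rstrip("\n")
        m = UNFINISHED.match(line)
        if m:
            pending[m["pid"]] = (m["ts"], m["call"])
            continue
        m = RESUMED.match(line)
        if m and m["pid"] in pending:
            ts, call = pending.pop(m["pid"])
            yield f'{m["pid"]} {ts} {call}{m["rest"]}'
            continue
        yield line  # already complete, or a resume without a known start


def xmp_calls(lines):
    """Return only the rejoined lines that mention an .xmp path."""
    return [line for line in stitch(lines) if ".xmp" in line]
```

Fed the five lines quoted above, xmp_calls() returns the single rejoined openat call of PID 930663. Later read, write and close calls only show the FD number, so they still have to be matched to the file by hand or by tracking the FD returned by openat. Restricting the trace up front, for example with strace -f -t -e trace=openat,read,write,close, also cuts down the futex noise.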

The filename scheme is SAM_1234.SRW.xmp as created by darktable, I’m passing the filenames as parameters when launching Darktable (but I’m very sure the same happens with “add to library”), and I’m using the internal NVMe running Debian experimental.

Can you try with add to library? If I recall correctly, the XMP is not used during copy and import. A new one is created, but I’ve not tested copy and import using the filename.raw.xmp scheme.

I can confirm that the problem also exists when doing “add to library”.

This is now pull request #19400 in darktable-org/darktable on GitHub: “Fix race-condition corruption of XMP files during large import” by ge0rg.
