Slow image import to a large library

fatman · January 5, 2021, 10:00pm

I am re-importing my photos in order to rename all my photo files to unique names. I found that as my library grows large, the import time increases significantly.

When the library has 183,000 photos, 128 photos take 1m53s to import.
So I removed library.db to test with an empty library; the same 128 photos need just 4s to import.
A 10K photo import takes about 2~3 hours to complete.

I am running Darktable 3.4 on Windows 10, AMD 2700X, 32 GB RAM. CPU and disk usage stay low during import.

AxelG · January 6, 2021, 4:03am

Welcome to this forum

I have a similar experience on Linux, only my collection is 85k pics.

Unfortunately there is no solution on the horizon, though.

I could imagine @johnny-bit telling us: “well, this is the nature of the beast”

johnny-bit · January 6, 2021, 11:14am

This should/could be made faster. It’s a very nasty and unforgiving combination of database transactions (every select takes its own micro-lock, whereas the whole import session could be done under one lock), thread locking (if you want to do anything during import, the lock should be per imported image and released ASAP), cache locking in threading…

It can be helped, but it requires loads of work in many parts of darktable's handling, including stuff like signal blocking/ignoring, thread synchronization etc.
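To illustrate only the database-transaction part of this, here is a minimal Python sketch using the standard sqlite3 module. It is not darktable code: the reduced `images` table and the `import_images` and `make_db` helpers are hypothetical, chosen only to show the difference between committing after every row and wrapping a whole import session in one transaction.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical, reduced images table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, film_id INTEGER, filename TEXT)"
    )
    return conn


def import_images(conn, rows, one_transaction=True):
    """Insert (film_id, filename) rows and return the number of commits issued.

    one_transaction=True commits once for the whole batch;
    one_transaction=False commits after every row.
    """
    commits = 0
    for film_id, filename in rows:
        conn.execute(
            "INSERT INTO images (film_id, filename) VALUES (?, ?)",
            (film_id, filename),
        )
        if not one_transaction:
            conn.commit()
            commits += 1
    if one_transaction:
        conn.commit()
        commits += 1
    return commits


rows = [(1, f"IMG_{i:04d}.RAW") for i in range(128)]
per_row = import_images(make_db(), rows, one_transaction=False)
batched = import_images(make_db(), rows, one_transaction=True)
print(per_row, batched)
```

On an on-disk database every commit usually costs a journal write and a sync, so fewer commits tends to help. Whether that is actually the dominant cost in darktable's import is not established here.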

HansBull · January 6, 2021, 3:05pm

How does adding filename as a second column to images_film_id_index affect import speed on large collections? Could someone with such a collection give it a test (perhaps using sqlitebrowser)?

fatman · January 6, 2021, 10:17pm

It is faster. After adding the field to the index images_film_id_index, importing the same 128 photos takes 1m19s.
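For anyone who wants to see what this index change does to a lookup, here is a rough Python sketch with the standard sqlite3 module. The reduced `images` table and the lookup query are assumptions for illustration only, not darktable's actual schema or SQL; try it on a copy of a library, never on the live library.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily reduced stand-in for the images table.
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, film_id INTEGER, filename TEXT)"
)

# Assumed lookup: find an image by film roll and file name.
QUERY = "SELECT id FROM images WHERE film_id = ? AND filename = ?"


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1, "IMG_0001.RAW")).fetchall()
    return " | ".join(row[3] for row in rows)


# Index on film_id only: filename still has to be checked row by row.
conn.execute("CREATE INDEX images_film_id_index ON images (film_id)")
plan_before = plan(QUERY)

# Rebuild the index with filename as a second column.
conn.execute("DROP INDEX images_film_id_index")
conn.execute("CREATE INDEX images_film_id_index ON images (film_id, filename)")
plan_after = plan(QUERY)

print(plan_before)
print(plan_after)
```

With the two-column index the plan should show both `film_id=?` and `filename=?` being resolved from the index, instead of only `film_id=?`.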

paka · January 6, 2021, 10:41pm

fwiw: with 185 images from the command line as:
darktable --library :memory: /data/photos/group/*
takes approx 10 seconds
all raw and all with accompanying xmp files
openSUSE Tumbleweed
10 year old i7 970 36gb nvidia GTS 450
all files are on local system rotating rust

same start with library containing >189k images
approx 161 seconds

paka · January 6, 2021, 10:52pm

note: not 189k images, >389k images

paka · January 6, 2021, 11:04pm

during the soccer season I normally work sets of 400-600 images, and it becomes much easier/quicker to work and export the images, upload them to my server, then import the worked images and accompanying xmp files into the library and later delete the rejected shots.

also doing the original work on ssd before moving to rust.

and sorry for I guess spamming/multi-posts, am not really familiar with the user-interface, but I will learn

johnny-bit · January 7, 2021, 7:55am

Cool! darktable db probably needs more indices… Can you make a PR for this change? plz

HansBull · January 7, 2021, 8:07am

@fatman Thank you, 40s shaved, but still too much.

@johnny-bit No experience with PRs, alas. There are indeed a couple more indices to alter or add. Here are mine, but since it was done on a secondary machine, it was never tested against a larger collection.

HansBull · January 7, 2021, 8:29am

The query in line 1410/1468 of image.c is another candidate. LIKE is bad. This could be solved by adding another column groupname to images, containing only the filename without the extension, and then adding an appropriate index over all three columns.
Plus, collection.c line 1006: do we really need LIKE in WHERE folder LIKE ..., and why?

@fatman Do you get any benefit from adding another index: CREATE INDEX 'images_film_id_id_fn' ON 'images' ('film_id','id','filename');

fatman · January 7, 2021, 11:28am

I also don’t have experience with PRs.

fatman · January 7, 2021, 11:38am

No extra benefit. It seems the query uses either images_film_id_index with filename added or images_film_id_id_fn.

What I tested:
1:53 (Original)
1:19 (images_film_id_id_fn)
1:19 (images_film_id_index with filename added)
1:19 (images_film_id_id_fn & images_film_id_index with filename added)

1:11 (tagged_images_position_index & images_film_id_id_fn & images_film_id_index with filename added)

I took a look at image.c; it seems SELECT MAX(position) FROM tagged_images is required for each new tagged_images row, so I added this index.
Without the index, the SELECT MAX(position) takes 180ms; after adding the index, it takes 4ms. My tagged_images table has 417K rows.
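A small Python sketch (standard sqlite3 module) of why an index helps this particular query. The reduced table layout below and the assumption that tagged_images_position_index covers just the `position` column are illustrative guesses, not taken from darktable's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reduced schema; the real tagged_images table may differ.
conn.execute(
    "CREATE TABLE tagged_images ("
    " imgid INTEGER, tagid INTEGER, position INTEGER,"
    " PRIMARY KEY (imgid, tagid))"
)
conn.executemany(
    "INSERT INTO tagged_images VALUES (?, ?, ?)",
    [(i, 1, i * 10) for i in range(1000)],
)

QUERY = "SELECT MAX(position) FROM tagged_images"


def plan():
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[3] for row in rows)


# Without an index on position, SQLite has to visit every row.
plan_before = plan()

# With the index, MAX(position) can be read from one end of the index.
conn.execute(
    "CREATE INDEX tagged_images_position_index ON tagged_images (position)"
)
plan_after = plan()

max_position = conn.execute(QUERY).fetchone()[0]
print(plan_before)
print(plan_after)
print(max_position)
```

The first plan should be a full scan of the table, the second a search using tagged_images_position_index, which fits the drop from a full pass over 417K rows to a single index lookup.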

HansBull · January 7, 2021, 11:50am

Thank you. Maybe someone will have mercy on us: [FR] log sqlite execution time in debug mode · Issue #7728 · darktable-org/darktable · GitHub

HansBull · January 9, 2021, 9:37pm

Do you use many tags?

fatman · January 11, 2021, 3:59pm

count(*) / count(distinct imgid) in tagged_images is 2.56908285621692
As Darktable inserts some internal tags automatically, I don’t think that is many.
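For anyone repeating this measurement: in SQLite, dividing two integer counts gives an integer, so a fractional tags-per-image figure needs a cast to REAL. A minimal Python sketch with made-up rows and a reduced, hypothetical tagged_images table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reduced table: 5 tag rows spread over 2 images.
conn.execute("CREATE TABLE tagged_images (imgid INTEGER, tagid INTEGER)")
conn.executemany(
    "INSERT INTO tagged_images VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11)],
)

# Integer division: the fractional part is dropped.
int_ratio = conn.execute(
    "SELECT COUNT(*) / COUNT(DISTINCT imgid) FROM tagged_images"
).fetchone()[0]

# Casting one operand to REAL gives the average tags per tagged image.
real_ratio = conn.execute(
    "SELECT CAST(COUNT(*) AS REAL) / COUNT(DISTINCT imgid) FROM tagged_images"
).fetchone()[0]

print(int_ratio, real_ratio)
```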

HansBull · January 14, 2021, 8:15am

Do you compile darktable yourself?

fatman · January 16, 2021, 1:58pm

No, I am using the official release Windows installer.

HansBull · February 27, 2021, 9:32am

If I could bother you with another test, I’d like to see if this yields some speedup:

CREATE INDEX images_filename_index_nc ON images (film_id,group_id,filename COLLATE NOCASE);

fatman · April 28, 2021, 5:43pm

Sorry for the late reply, I haven’t visited the forum in a while.
There is no difference after adding the index images_filename_index_nc.

BTW, I gave up on Darktable for cataloguing my photo library. I tried importing 180K photos into Digikam; it took 2 hours and 10 minutes (Darktable needs several days), and the tag management is much better.