Help me organize my collection(s)

LarsPoulsen · March 15, 2025, 9:06pm

I am fairly new to digiKam. I have about 50,000 images on my hard drive (NAS), about two thirds of which are my wife’s, and the rest are mine, taken over my lifetime (mostly since the year 2000, when film processors started giving out PhotoCD discs).
On my NAS, most images are in a folder hierarchy with years at the top level, under that months, then groups taken at a specific location during that month.
A subtree has thumbnails and “postcard-sized” “web images” of each image.

In my first import, I made the mistake of importing everything as one collection. This means that when scanning for faces and locations, most images come in as 3 separate versions: The thumbnail, the postcard and the full image.

I have been pleased to see how well the face recognition works. If I start again from scratch, do I have to lose my tags database and all the faces already known by now? I guess the answer is yes.

Finally, because my collections are on a NAS (actually they are on my Linux system, from where my Windows desktop gets them via a SAMBA share), I have a choice of where to run digikam:

natively on my Linux system, switching my monitor (via a KVM switch) directly to Linux
or ditto, projecting via TigerVNC to my Windows display
natively on Windows (at least I think digiKam has a Windows build)
in a Fedora Linux VM under Windows 10

Because my normal desktop is Windows (while keeping a terminal window or two open on Linux) my initial work has been on Linux-under-Windows. digiKam is fairly happy running like that.
I am guessing that the VNC solution will not be a happy one (too much pixel-recoding).

Your thoughts are invited.

Michmill · March 16, 2025, 2:13am

Hi Lars,
There’s lots to unpack here, and I’ll do my best.

I’m a little confused by your folder structure and what you are trying to accomplish with it. Can you give me a little more information?

I recommend you turn on .xmp sidecars, and configure digiKam to write all metadata to the .xmp sidecar. Then you can use Tools->Maintenance->Sync Metadata and Database to write all metadata from the database to the sidecars. You’ve now saved all the metadata and you can import it into digiKam by reversing the sync process to read from the sidecars.

digiKam will always run faster when the program and the storage are on the same computer. I would recommend either on your Linux system or Windows (yes, there is a windows version).

Cheers,
Mike

clinart · March 16, 2025, 4:53pm

Hi,

I have digiKam and my photos on the same computer, but on two disks.
System and programs are on the main disk: a 500 GB NVMe SSD
Photo storage is on an internal disk: a 4 TB HDD
I didn’t notice any significant slowdown when launching digiKam.

Edit: NAS storage will always be slower because of the network connection, and in any case slower than USB3 or a direct connection to the motherboard.

cedric · March 16, 2025, 5:45pm

Probably no help at all, but:

Long ago, I stopped trying to manage my image assets by means of the folder structure on my single hard drive.

Instead, I have been embedding IPTC metadata, keywords being especially useful.

I keep that activity separate from editing with a free app: XnView MP

XnView MP has the best search engine on the planet and, if so selected, can search my entire HD from the root down by various EXIF and IPTC fields and present the results in a thumbnail catalog format.

Michmill · March 16, 2025, 5:54pm

Hi @cedric,
I completely agree. Good digital asset management is all about the metadata. Using a good tagging strategy combined with a powerful search feature makes the folders where the images reside almost irrelevant.

As a digiKam dev, I’m curious what features do you like in XnView MP that don’t exist in digiKam? Maybe we can add them.

One of our big pushes for 8.7.0 is a new search engine, so please let me know what we can do better.

Cheers,
Mike

cedric · March 16, 2025, 6:14pm

Sorry Mike, I am not actually a digiKam user, so can’t really help in that regard.

Here are the search pages:

One can keep adding ‘conditions’ like this:

Hopefully you can see how deep and how specific one can be …

Michmill · March 16, 2025, 6:26pm

Hi @cedric,
Thanks for the feedback. This looks like most of the same things as the digiKam Advanced Search page. The only things missing are the negative search options like “is not” and “does not”.

This is great feedback for us to make the digiKam search experience better. Many thanks, sir!

Cheers,
Mike

LarsPoulsen · March 16, 2025, 7:18pm

Is IPTC basically the same as EXIF metadata, is it an extension, or is it an (incompatible) alternative?
I don’t see XnView as incompatible with digiKam - if they use the same embedded metadata in the image files.

LarsPoulsen · March 16, 2025, 7:27pm

Yes, I know they will run faster if everything is in one place. The files have to be on the Linux system, because that is my central file storage, and backups etc. are organized around that. Also, I have my various tools there for processing large batches of imports from the cameras (i.e. iPhones) and automatically rotating images taken in portrait/landscape mode, as well as a simple set of web scripts that can display folders in blocks of 50-60 images locally and remotely. These tools are all written in Perl and Perl CGI.
But my everyday desktop computer is a Windows system from where I branch out to my Linux system as well as to the systems at my workplace. It has a 42" monitor (4K, but reduced to 2560x1440). So in the normal course of events, everything I need is on one big screen, including email, browser(s), VNC projections and ssh windows into other systems. So leaving that for long periods will be a little disorienting for a while.

LarsPoulsen · March 16, 2025, 7:36pm

I have not heard of sidecars before. I know about metadata in image files and metadata in SQL database files, and I am trying to understand how those two optimally play together.
If I do a fresh import from image folder trees, I will obviously need to abandon the SQL databases at that point, and re-establish them again after the fresh import, but the embedded metadata will survive.

Are the sidecars “global” or within each folder subtree?
If they are global and they were exported from a large collection, but the new import is all done in subtrees, will digiKam be okay with that (and ignore/delete data relating to image files that are no longer in the collections)? If they are local to each folder that contains images, it seems I will have the best transition.

LarsPoulsen · March 16, 2025, 9:04pm

> I’m a little confused by your folder structure and what you are trying to accomplish with it. Can you give me a little more information?

The folder structure is a way to break a collection of 50,000 images into more manageable chunks.

2001/
....
2023/
    2023-01/
    ...
    2023-12/
        2023-12-KatesHouse/
        2023-12-LiveOakUUC/
        2023-12-PoulsenHome/
2024/
Imports/
     Bird/
     Lars/
     SlideScans/
small/
    HEIC/
    Thumb/
    Medium/
....

If I started over, knowing what I know today and with access to the tools I have today, I would do things differently in many areas, but at this point, a restructuring of the file tree would be painful.

Imports/ are files recently uploaded and not yet slotted into the structure.

small/ contains
(1) HEIC files from the iPhones pre-jpg-conversion
(2) Thumbnails (192 pixels on the longest side) for web display
(3) Web “postcard size” (768 pixels on longest side) for web “slide shows” etc

In my original import I made the mistake of taking in the whole filesystem.
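One way to avoid pulling in the derived copies on a fresh import is to add only the year folders as collection roots and leave `small/` and `Imports/` out. The following is a minimal Python sketch of that selection, assuming the top-level layout shown above; the function name `collection_roots` and the `EXCLUDED` set are made up for illustration, and the resulting list would still have to be entered by hand in digiKam's collection settings.

```python
from pathlib import Path

# Assumption: these top-level names come from the tree above and hold
# derived or not-yet-sorted files rather than the archive proper.
EXCLUDED = {"small", "Imports"}


def collection_roots(photo_root):
    """Return the top-level folders that look like year folders (e.g. 2023)."""
    roots = []
    for entry in sorted(Path(photo_root).iterdir()):
        # Skip plain files and the folders holding thumbnails or imports.
        if not entry.is_dir() or entry.name in EXCLUDED:
            continue
        # Keep only four-digit names such as 2001 or 2024.
        if entry.name.isdigit() and len(entry.name) == 4:
            roots.append(entry)
    return roots
```

Calling `collection_roots("/path/to/photos")` (a placeholder path) returns the year folders in sorted order, so thumbnails and postcard-sized copies never reach the face or location scans.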

LarsPoulsen · March 16, 2025, 9:33pm

The iPhone inserts EXIF metadata in its image files.
Windows displays and edits EXIF in File Explorer. digiKam reads and edits EXIF-style tags in the image files. As do my web scripts. They may not use all the same names for the tags, but the Perl libraries do a decent job of translating equivalent metadata item names, and they use the same mechanism for embedding named data items in the file header. That’s what I mean by compatible. So my question is: Is IPTC using the same mechanism for reading and writing tags in the image file header?

rvietor · March 17, 2025, 11:28am

I assume your scripts use a library to read and write metadata?
If that library can handle IPTC (exiftool and exiv2 do), you shouldn’t see much of a difference between using EXIF or IPTC (or XMP tags). But there are differences in allowed lengths and perhaps character sets.

Keep in mind that not all programs use all possible tags. That means a program may read all IPTC tags from an image, but it will not understand all of them. Those “not understood” tags will be ignored, and should not be changed by such a program or script.
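To make the point about allowed lengths concrete, here is a small Python sketch that flags keywords too long for the legacy IPTC (IIM) keyword field. It assumes a limit of 64 bytes per keyword and measures length in UTF-8 bytes; both are assumptions to check against the tools you actually use, and `oversized_keywords` is a made-up helper name.

```python
# Assumption: per-keyword limit of the legacy IPTC IIM Keywords field.
IIM_KEYWORD_MAX_BYTES = 64


def oversized_keywords(keywords, limit=IIM_KEYWORD_MAX_BYTES):
    """Return the keywords whose UTF-8 encoding exceeds the byte limit."""
    # Length is counted in encoded bytes, not characters, so accented
    # letters use up the budget faster than plain ASCII.
    return [k for k in keywords if len(k.encode("utf-8")) > limit]
```

XMP keywords have no comparable fixed limit, which is one reason the same long keyword can survive in a sidecar but be truncated or rejected in the IPTC block.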

LarsPoulsen · March 17, 2025, 5:47pm

Thank you!
My Perl scripts use the Image::ExifTool library to access tags. So it appears to be much the same thing for my limited point of view.
That is a great relief.

They may be defined by different groups, but they seem to have a very large overlap and those not in the overlap area seem not to get in each other’s way too badly.

Donatzsky · March 17, 2025, 10:18pm

Two different metadata standards: https://www.photometadata.org

A sidecar is any “extra” file that gets placed next to the main file. So filename.ext might have a filename.xmp or filename.ext.xmp with metadata that was added not by the camera (digiKam considers raw files read-only, and will not write to them by default) and/or darktable edit history. RawTherapee likewise uses a .pp3 sidecar for storing editing information.
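As a rough illustration of the two naming patterns just mentioned, the Python sketch below looks for an XMP sidecar next to a given image. The helper names are made up, and which pattern a given program writes is something to verify in that program's settings.

```python
from pathlib import Path


def sidecar_candidates(image_path):
    """Return the two sidecar paths to check for one image file."""
    image = Path(image_path)
    return [
        image.with_name(image.name + ".xmp"),  # filename.ext.xmp
        image.with_suffix(".xmp"),             # filename.xmp
    ]


def find_sidecar(image_path):
    """Return the first sidecar that exists next to the image, or None."""
    for candidate in sidecar_candidates(image_path):
        if candidate.is_file():
            return candidate
    return None
```

Because each sidecar sits beside its image, it travels with the folder when a subtree is moved or re-imported on its own.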

rvietor · March 18, 2025, 7:51am

If you seriously start working with metadata and sidecars, you’ll have to get familiar with the notion of XML namespaces. Those allow grouping of metadata tags to avoid collisions (tags with the same name, but different semantics). Digikam sidecars are XMP files, which contain XML. Both EXIF and IPTC tags have their proper namespace in XML, but there are several others. Try looking at an XMP file, they are plain text.
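For a quick look at which namespaces a sidecar actually uses, a short Python sketch with the standard library's XML parser is enough. The `SAMPLE` text is a hand-written, heavily trimmed stand-in for a real sidecar (the keyword is invented for illustration), and `namespaces_in_xmp` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written example; a real sidecar holds many more tags and namespaces.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
     xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:subject>
    <rdf:Bag>
     <rdf:li>KatesHouse</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def namespaces_in_xmp(xmp_text):
    """Return the set of namespace URIs used by elements and attributes."""
    uris = set()
    for element in ET.fromstring(xmp_text).iter():
        # ElementTree spells qualified names as "{namespace-uri}localname".
        for name in [element.tag, *element.attrib.keys()]:
            if name.startswith("{"):
                uris.add(name[1:].split("}", 1)[0])
    return uris


for uri in sorted(namespaces_in_xmp(SAMPLE)):
    print(uri)
```

To run it on a real file, read the sidecar's text first, for example with `Path("photo.jpg.xmp").read_text(encoding="utf-8")` (a placeholder file name), and pass that to `namespaces_in_xmp`.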

The downside of namespaces is that we ended up with multiple tags for the same concept (but in different namespaces, with possibly subtle differences in syntax). So you may want to be a bit “generous” in adding “overlapping” tags (unless or until you know what the programs you use read and write).

Digikam allows rather fine control over what’s read and written, esp. for sidecars (see the manual). If you want to add IPTC to raw files, you’ll have to use (XMP) sidecars…

LarsPoulsen · March 18, 2025, 5:05pm

Thank you.
I don’t use raw files. Apple’s .HEIC files seem to be composed of an original image (or a short video) with high dynamic range plus a set of edit instructions. Hence, the conversion to .jpg files (using the heif-convert program) may yield multiple .jpg files. I do save the HEIC files, but I think of them mostly as a second level of backup.

In the context of this topic, I do not think of myself as a photographer, but as the archivist of our family photos.