What is the right tool for indexing/cataloging specific file types?

I have an external HD with a lot of different files going back 15 years or so. The problem is, I’ve never organized it very well. I’d like to be able to find and keep track of all of the RAW files from my old Canon 5DmkII (.CR2 files) and be able to locate them by date, if nothing else. digiKam could probably do this, but it’s going to index everything, not just the RAW files. Also, I’ve already got dK pointed to a couple of different folders on that HD, and it doesn’t let me index the entire drive anyway.

How can I index only the .CR2 files on this drive, and keep them available for future reference without an agonizingly slow folder search each time? I’m running Win11. I’m looking for a FOSS solution, not necessarily one that’s featured on pixls.us. Thanks in advance for whatever help you can offer.

My methods may not work for you. (Or anyone!)

I have two indexes. One is text: one line per picture library directory, describing what is in that directory. So I can grep that file for keywords, which might be locations, events, cameras, and so on. I’m not diligent about updating this index, so I have a script that looks for directories that are not in the index, and adds those to the index with blank text. This takes a couple of seconds. When my conscience pricks me, I manually add the appropriate text.
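
For anyone who wants to try the same idea, here is a minimal sketch of that "add the missing directories" step. It is not my actual script: the function name `add_missing_dirs` and the tab-separated line format (`directory<TAB>description`) are assumptions made up for this example.

```python
import os
from pathlib import Path


def add_missing_dirs(library_root, index_path):
    """Append each directory under library_root that is not yet in the index.

    Assumed (hypothetical) index format: one line per directory,
    "<relative directory><TAB><free-text description>".
    Returns the list of directories that were added with blank text.
    """
    index_path = Path(index_path)
    known = set()
    if index_path.exists():
        with index_path.open(encoding="utf-8") as f:
            for line in f:
                # Everything before the first tab is the directory name.
                known.add(line.rstrip("\n").split("\t", 1)[0])

    added = []
    for dirpath, _dirnames, _filenames in os.walk(library_root):
        rel = os.path.relpath(dirpath, library_root)
        if rel != "." and rel not in known:
            added.append(rel)
    added.sort()

    # Append new entries with an empty description, to be filled in by hand.
    with index_path.open("a", encoding="utf-8") as f:
        for rel in added:
            f.write(f"{rel}\t\n")
    return added
```

Calling something like `add_missing_dirs(r"D:\Pictures", r"D:\picture_index.txt")` (both paths are placeholders) would leave existing descriptions alone and only append the new directories, so the file stays greppable.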

The other index is a directory of thumbnails, about 10,000 images. Each is a JPEG, maximum dimension of 600 pixels. Windows explorer shows thumbnails of those thumbnails, 100 per screenful. My memory is mostly visual, and I can quickly scan those screens to find the image I want. A script populates this directory from my picture library, running overnight. Some day, I’ll improve the script so it records what picture library directories it has indexed, and only refreshes from directories it hasn’t already indexed.
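
The "only refresh what hasn't been indexed" improvement could look roughly like the sketch below. It is an untested-in-anger outline, not the real script: the JSON state file, the function name `refresh_new_dirs`, and the caller-supplied `process_dir` callback (which would do the actual thumbnail generation) are all assumptions for illustration.

```python
import json
import os


def refresh_new_dirs(library_root, state_path, process_dir):
    """Run process_dir(dirpath) only for directories not processed before.

    state_path is a (hypothetical) JSON file holding the list of relative
    directory names that earlier runs have already handled.
    Returns the relative names of the directories processed in this run.
    """
    try:
        with open(state_path, encoding="utf-8") as f:
            done = set(json.load(f))
    except FileNotFoundError:
        done = set()  # first run: nothing has been indexed yet

    newly_done = []
    for dirpath, _dirnames, _filenames in os.walk(library_root):
        rel = os.path.relpath(dirpath, library_root)
        if rel in done:
            continue
        process_dir(dirpath)  # e.g. write 600-pixel JPEG thumbnails
        done.add(rel)
        newly_done.append(rel)

    # Record everything handled so far for the next run.
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f, indent=2)
    return newly_done
```

One known limitation of this approach: a directory that gains new pictures after it was recorded is skipped on later runs, so it suits libraries where finished directories are not touched again.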

digiKam will let you search through your collection by file extension. Would this work for you?

It doesn’t. I also have a few folders indexed on that same HD, and dK doesn’t let me also index the entire drive alongside those folders.

Over time I’ve ended up with all kinds of image files scattered all over my hard drive. At 84, I’m disinclined to learn about “indexing” and even less inclined to start re-arranging them into some semblance of order.

So I downloaded ‘XnView MP’ which has the best search engine on the planet and additionally lets me gradually add IPTC/XMP keywords when I get a round tuit.

The search engine can be set to look in all sub-folders of the selected folder, and yes, it can search for only *.CR2 files, for example. In other words, starting at C:\, it could find and show thumbnails for every CR2 file on my HD.

Thanks so much, I’ll check it out.

“Index” can mean wildly different things.
You mention “filename” and “date”.
Are those all the requirements you have for an index?
How would you want to work with such an index?
Textfile? Spreadsheet? Paper printout?

I would like to browse my .CR2 files by date, without waiting hours for File Explorer to find them all each time.
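
One do-it-yourself way to get that, sketched here with only Python's standard library: walk the drive once, store every .CR2 path with a date in a small SQLite file, and query that file afterwards instead of the drive. This is a rough outline under stated assumptions, not a finished tool. The function names and table layout are made up for the example, and the file modification time is used as a stand-in for the capture date (reading the real EXIF date would need an extra library).

```python
import os
import sqlite3
from datetime import datetime


def build_cr2_index(root, db_path):
    """Scan root once and store every .CR2 path with its modification time."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS cr2 (path TEXT PRIMARY KEY, mtime TEXT NOT NULL)"
    )
    # The index on mtime is what makes date-range lookups fast.
    con.execute("CREATE INDEX IF NOT EXISTS cr2_mtime ON cr2 (mtime)")

    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".cr2"):
                full = os.path.join(dirpath, name)
                # Assumption: modification time approximates the capture date.
                stamp = datetime.fromtimestamp(os.path.getmtime(full))
                rows.append((full, stamp.isoformat(timespec="seconds")))

    with con:  # commit all rows in one transaction
        con.executemany("INSERT OR REPLACE INTO cr2 VALUES (?, ?)", rows)
    return con


def files_between(con, start, end):
    """Return (path, mtime) rows with start <= mtime < end, oldest first.

    start and end are ISO date strings such as "2010-01-01"; ISO strings
    sort in chronological order, so plain text comparison is enough.
    """
    cur = con.execute(
        "SELECT path, mtime FROM cr2 WHERE mtime >= ? AND mtime < ? ORDER BY mtime",
        (start, end),
    )
    return cur.fetchall()
```

The slow part (the full walk) happens only when the index is rebuilt. After that, something like `files_between(con, "2010-01-01", "2011-01-01")` reads from the database file rather than touching the external drive.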

May I remind you about XnView MP?

I just searched a 26 GB folder for “.X3F” files. It took about 2 minutes to find 500 such files. When I clicked “Browse”, it took another couple of minutes to show a catalog of thumbnails. The catalog’s View menu has a “Sort by” entry with several kinds of date, and clicking any of them sorted the thumbnails instantly.

Also, in the Search conditions, one can enter “between” dates, again for several kinds of date.

How does that compare to “waiting hours for File Explorer”?

I think if it had a real index by type, that search would have taken a fraction of a second (generating and loading the thumbnails excluded).
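
To illustrate what I mean by an index by type, here is a small sketch using only the Python standard library. The names (`build_type_index`, `save_index`, `load_index`) and the JSON file are assumptions for the example, not how any particular program does it. The drive is walked once, and after that a lookup by extension is a single dictionary access.

```python
import json
import os
from collections import defaultdict


def build_type_index(root):
    """Walk root once and group file paths by lower-cased extension."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()  # e.g. ".cr2", ".x3f"
            index[ext].append(os.path.join(dirpath, name))
    return dict(index)


def save_index(index, path):
    """Persist the index so later sessions can skip the slow walk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Load a previously saved index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With a saved index, `load_index(...).get(".x3f", [])` returns the matching paths without scanning any folder. The price is that the index goes stale when files are added or moved, so it has to be rebuilt or updated now and then.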

No doubt.

I was comparing a general-purpose search engine with File Explorer.

P.S. I know little to nothing about indexing, so I wonder whether a general-purpose search engine can even build and keep indices for every possible query or combination of queries.

You can also have multiple digiKam databases with entirely separate configs. I do that regularly for different projects. I’m on my work computer now, so I don’t have the details since that’s all on my personal desktop. But sometimes, just knowing it can be done is half the battle.

IIRC, there’s a digiKam command line option to specify the config file, and it has all the paths in it. But I could be wrong.

You might find this interesting if you’re looking for alternative approaches. I haven’t had a chance to go through it fully, but I stumbled onto it on an interesting site and still need to poke around it.

It looked like there was some other interesting material there as well.

https://pyimagesearch.com/category/tutorial/

Eggstreemly interesting, Todd!

Thanks for the links.