What's the best type of database nowadays?

Hey Guys,

Things are very slow for me on the database front. The latest digiKam update from Fedora (6.1.0) updated the database schema and has been going for hours just to get to 86% on finding new items. (Earlier today it kept getting stuck at 83%, complaining about ancient files, at least a decade old, that it should have scanned in the past.) I’m using the internal MySQL, and right now I have two mysqld processes running (any way to figure out which is related to digiKam?) taking around 8% CPU each. I added about 214 Canon raw files, but that’s never made things this slow in the past.
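One way to tell the two mysqld processes apart: digiKam’s internal server is launched with a datadir under ~/.mysql.digikam, which shows up in its full command line. A minimal sketch (the `is_digikam_mysqld` helper name is made up for illustration):

```shell
# Each line from `pgrep -af mysqld` is "PID full-command-line"; digiKam's
# internal server passes a datadir under ~/.mysql.digikam, so match on that.
# (is_digikam_mysqld is a hypothetical helper, not a digiKam tool.)
is_digikam_mysqld() {
    case "$1" in
        *mysql.digikam*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Live check (prints nothing if no mysqld is running):
pgrep -af mysqld | while read -r line; do
    if is_digikam_mysqld "$line"; then
        echo "digikam's server: $line"
    fi
done
```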

For a while now the program has been unresponsive whenever writing metadata to files - for minutes at a time - so I usually have a book open while I’m adding metadata.

I switched to MySQL because I have hundreds of thousands of photos and videos and I thought that was supposed to be the faster option. Currently the database is on a spinning HD, not an SSD, but I don’t want to try to move it until after it’s done finding new items. Should I still be using MySQL, or should it be SQLite?

If it’s not a client-server architecture, SQLite is generally the best.

But anything that writes to disk frequently will have a very low operations-per-second limit on a hard drive.
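A rough way to feel that ceiling yourself: `dd` with `oflag=dsync` forces every block to be committed to disk before the next write starts, which is roughly what a database does on each transaction. A spinning disk manages on the order of 100–200 such writes per second; an SSD manages thousands. A sketch against a throwaway file:

```shell
# Time 50 fully-synchronous 4 KiB writes; each one waits for the disk.
# On an HDD this small loop alone takes a noticeable fraction of a second.
tmp=$(mktemp)
time dd if=/dev/zero of="$tmp" bs=4k count=50 oflag=dsync 2>/dev/null
rm -f "$tmp"
```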

My reaction is an outlier, but I rely on the file system for storing, finding, and labeling image groupings. I’m a retired programmer who specialized in database work: Oracle, MySQL, DB2, and Postgres, and even XML databases like Sleepycat and the open-source eXist. Actually I did a lot with XML databases. So my database reluctance has nothing to do with skills.

I don’t want my image processing life cycle to involve a database any more than I want to drive an 18 wheeler to the corner store for milk and beer.

I use the darkroom half of darktable religiously. With the lighttable half I only use the export button. There are sophisticated directory-structure ways of storing images that are a lot more accessible and easier to navigate than embedded database software.


I don’t use digiKam, but I’m not certain that’s a database issue (even millions of rows is nothing for a database). If it’s scanning for images (as in a comparison), it makes me think something is slow on the file system side.

Wanted to add that it’s now at 86%, three hours later. I went into the mysql directory and mysql.err had lots of entries like:

2019-06-24 18:57:29 0 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name. Please run mysql_upgrade

as well as:
2019-06-24 20:01:12 10 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

Although that last entry is from 3 hours ago.
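For what it’s worth, the first warning asks for `mysql_upgrade`, which would have to be pointed at digiKam’s internal server (not any system-wide one) via its socket. The socket path below is an assumption — digiKam’s database settings show the real one — so this sketch only composes the command rather than running anything blind:

```shell
# upgrade_cmd is a hypothetical helper that composes the invocation;
# the socket path passed to it is an assumption -- check digiKam's
# database settings dialog for the actual location before running it.
upgrade_cmd() {
    printf 'mysql_upgrade --socket=%s' "$1"
}
upgrade_cmd "$HOME/.mysql.digikam/mysql.socket"; echo
```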

The only two files in .mysql.digikam/db_data with a current timestamp are ibdata1 and ib_logfile0. Everything else has today’s date, but an older time.

In .mysql.digikam/db_data/digikam, the latest entries are:

-rw-rw----. 1 ermesa 37748736 Jun 24 23:43 ImageMetadata.ibd
-rw-rw----. 1 ermesa 17825792 Jun 24 23:43 ImageComments.ibd
-rw-rw----. 1 ermesa 573440 Jun 24 23:43 Tags.ibd
-rw-rw----. 1 ermesa 75497472 Jun 24 23:43 ImageTags.ibd
-rw-rw----. 1 ermesa 104857600 Jun 24 23:43 Images.ibd
-rw-rw----. 1 ermesa 31457280 Jun 24 23:43 ImageInformation.ibd
-rw-rw----. 1 ermesa 629145600 Jun 24 23:43 ImageCopyright.ibd

Not sure if that helps with figuring out what went wrong, but…

Meaning the file system the database resides on?

Well, either I suppose. But in light of your message about possible corruption that seems far less likely now! Missing log file is basically missing data…

Edit: but that doesn’t rule out file system corruption either

Looks like the first warning could just be about version problems (perhaps it happened before the upgrade was complete). The second error is a bit more worrying though; searches seem to suggest perhaps files were moved while it was running, or corruption. I hope you have backups!

Have you tried the AppImage?

…elaborating a bit about darktable having its own database: I have terabytes of raw images. I do a lot of focus stacking, so I have an order of magnitude more files than most. One issue is I don’t want the slices in any database, only stacked images, prior to processing (stack the raws into a TIFF, then edit that, then, as the final step, export a JPEG). Having the TIFFs and JPEGs in a database makes sense, but not the raw slices.

Using a database for attaching keywords and rating values on a 0-10 scale is useful within darktable, but it’s invisible and unavailable to other software.

If you use a well-thought-out file system way of storing and labeling images and/or image groups, then your efforts can be used by any and all software, not just darktable.

Why not start a post/guide about it? Seems like you had some interest already (and I suppose others use filesystem database) :slight_smile:

More likely files being moved than corruption. It’s not perfect, but I’ve got btrfs RAID-1 with a scrub going to protect against corruption/bit rot.

Don’t know if Giles still hangs around here, but I’m still curious whether I should go back to SQLite. At any rate, in the long term I should just move the database off of btrfs, which is horrible for databases. It’s just that at this moment that would require throwing another drive in there.
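On the btrfs point, one stopgap short of a new drive: marking the database directory NOCOW with `chattr +C` avoids most of the copy-on-write fragmentation that hurts InnoDB files. The flag only affects files created after it is set, so the existing .ibd files would have to be copied back in afterwards. A sketch against a throwaway directory (the flag is simply rejected on non-btrfs filesystems):

```shell
# Mark a directory NOCOW so files created inside it skip btrfs
# copy-on-write. On non-btrfs filesystems chattr +C just fails.
dbdir=$(mktemp -d)                  # stand-in for .mysql.digikam/db_data
chattr +C "$dbdir" 2>/dev/null || echo "chattr +C not supported here"
lsattr -d "$dbdir" 2>/dev/null || true
```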


I posted this once before.

/home/sandy/Camera/Birds/ is a main repository
/home/sandy/Camera/Insects might be another

/home/sandy/Camera/Birds/Costa_Rica/Savegre_Lodge/2011-02-16/raw1
might be a place to empty raw images off a card.

Inside raw1 might be the following:
raw1/DSC1234.NEF …perhaps this was a toucan photo

raw1/jpegs/DSC1234_Toucan-in-avocado-tree.jpg …now the jpeg can be matched to the raw it came from, usually one directory up

raw1/jpegs/web/up-DSC1234_Toucan-avocado-tree.jpg …700 pixel versions generated with imagemagick go in the web directory nested inside jpegs.
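The layout above can be stood up in one go; a sketch using a throwaway root in place of /home/sandy:

```shell
# Build the example tree under a temp root; one mkdir -p creates the
# whole chain, and the JPEG name embeds the raw frame number (DSC1234)
# so derived files can always be traced back to their raw.
root=$(mktemp -d)
raw1="$root/Camera/Birds/Costa_Rica/Savegre_Lodge/2011-02-16/raw1"
mkdir -p "$raw1/jpegs/web"
touch "$raw1/DSC1234.NEF" \
      "$raw1/jpegs/DSC1234_Toucan-in-avocado-tree.jpg" \
      "$raw1/jpegs/web/up-DSC1234_Toucan-avocado-tree.jpg"
find "$raw1" | sort
```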

Perhaps I want to add some search keywords to the date directory below Savegre-Lodge:

cd /home/sandy/Camera/Birds/Costa_Rica/Savegre_Lodge/2011-02-16

touch overcast

touch dawn

===========
sudo updatedb

Let’s say I know I have some toucan photos from Savegre_Lodge but I can’t remember the exact date. I do remember it was overcast at dawn (the touch keywords I frequently use are about time of day and weather conditions).

locate -i Costa | grep overcast | grep -i savegre | grep -i dawn | grep -i toucan

Boom, I’ve got it. Most important, the relevant files are on the file system and so is the metadata, so I can now send the right files to GIMP, RawTherapee, or darktable. The search system isn’t useful ONLY to darktable.
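The same AND-chain can be tried without updatedb (which needs root) by feeding `find` output through the same greps. A self-contained sketch over a throwaway tree with the touch-file keywords from above:

```shell
# Build a tiny tree with touch-file keywords, then AND the search terms
# the same way locate|grep does -- each grep narrows the candidate paths.
top=$(mktemp -d)
mkdir -p "$top/Birds/Costa_Rica/Savegre_Lodge/2011-02-16" \
         "$top/Birds/Costa_Rica/Savegre_Lodge/2011-03-01"
touch "$top/Birds/Costa_Rica/Savegre_Lodge/2011-02-16/overcast" \
      "$top/Birds/Costa_Rica/Savegre_Lodge/2011-02-16/dawn" \
      "$top/Birds/Costa_Rica/Savegre_Lodge/2011-03-01/dusk"

search() {   # find stands in for locate; the greps are identical in spirit
    find "$top" -type f | grep -i costa | grep overcast | grep -i savegre
}
search       # prints the path of the "overcast" keyword file,
             # which reveals the date directory we were after
```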

And it doesn’t take three days to import 4 terabytes of raw files, TIFFs, and JPEGs into darktable’s database.

I’m not saying my way is better than anything. It’s what works best for me.
I am an outlier, I admit.


Regarding your comment here, can you share thoughts on how to structure such a sophisticated file system using just a file explorer or similar, without going through an app like digiKam or SQL database structures? I am also retired, and over the years I have accumulated a collection of about half a million images and videos. I just want to be able to find something simply and quickly without having to go through databases like SQL or whatever. Also, I’ve never been a programmer, so I don’t know how to get involved with coding-type details, nor do I want to at this stage in life.

I notice that the Microsoft Windows 11 File Explorer accommodates some degree of metadata, like keywords, comments, etc. (Alt+Enter on a media item or group of items), but that of course has limitations, like if I want to change the creation date of a film-era scanned image from the 1970s, or if I want to apply keywords or a short description to video clips. Thanks.

I might add, I’ve been organizing data thus far in a chronological sequence of year, 12 month subfolders, and 31 day subfolders, with file names that follow a naming convention of YYYY-MMDD @ HHMMSS-###. But then subsequently finding something that way does not work. Usually when searching for media, more or less in order of relative importance, the 5 ‘W’ principle applies:
Who, What, Where, Why, When.
I’ve been making ‘W’ #5 the priority for organization when it is the least important, or at least less important…

Leaving aside that that’s a post from over 2 years ago, @pittendrigh’s argument that he’s not using an application-specific database is a red herring.
updatedb does exactly what it says: it UPDATEs a DataBase, one that’s specific to the locate command. Also, his solution is for Linux and other Unix-like OSes only (as far as I know), command-line based, and rather manual:

  • the user has to update the database explicitly when the data changes
  • the command to interrogate the database is rather sophisticated, and he only gives an example of an AND search (OR and NOT are not all that difficult, e.g. grep -iE "savegre|dawn", but they add extra complexity to the command set to master).
    There is nothing particularly sophisticated about his file structure, from what he posted.
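For what it’s worth, the OR and NOT forms mentioned above look like this, with sample paths inlined so the pipeline runs anywhere:

```shell
# OR via alternation (grep -E), NOT via grep -v; the three sample
# paths stand in for locate output.
search_demo() {
    printf '%s\n' \
        '/Camera/Birds/Costa_Rica/Savegre_Lodge/2011-02-16/overcast' \
        '/Camera/Birds/Costa_Rica/Savegre_Lodge/2011-02-16/dawn' \
        '/Camera/Insects/2012-05-01/dawn' |
        grep -iE 'savegre|dawn' |   # OR: keep paths matching either term
        grep -vi insects            # NOT: drop the insect shots
}
search_demo
```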

Not really the kind of system I would like to work with (and I am using Linux, and I am familiar with the grep command and such), but if it works for him, fine.

Then again, if you don’t want to use such a system, nor a program like Digikam or an SQL database, you’re running out of options fast…

I’m using both Digikam and darktable, with keywords organised in trees, which follow more or less your 5 "W"s (one tree per “W”). Once mastered, the searching is fast and easy. And I cannot forget to update the database when adding information.

A few other points:

  • if ever you realise your tagging structure needs reorganising (e.g. add a level to a tree), that’s going to be fun with a system based on file-names…
  • how do you even create multiple tag trees with a directory-based system? Hard links are an option (though afaik not possible under MS-Windows), but how do you maintain such a system of links?
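To the second bullet: one way to approximate a second tag tree on a filesystem is a parallel directory of symlinks back to the canonical files. A sketch against a throwaway root (keeping these links consistent by hand is exactly the chore a database removes):

```shell
# Canonical files live in one tree; a second "tag tree" is just
# symlinks. Renaming or moving the original silently breaks the links,
# which is the maintenance problem raised above.
root=$(mktemp -d)
mkdir -p "$root/by-place/Costa_Rica" "$root/by-subject/toucan"
touch "$root/by-place/Costa_Rica/DSC1234.NEF"
ln -s "$root/by-place/Costa_Rica/DSC1234.NEF" \
      "$root/by-subject/toucan/DSC1234.NEF"
ls -l "$root/by-subject/toucan"
```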

There are reasons databases are used…