I’m rewriting the facial detection and recognition engine in digiKam. As I tune the AI models, I have to decide which is better: More false-positives (mistakes) but the AI detects more actual faces, or fewer false-positives (mistakes) but also missing some actual faces.
Of course, we would all like no mistakes and finding all the faces, but that’s not realistic right now.
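To make the trade-off concrete, here is a minimal sketch of how a single confidence threshold moves a detector between "few mistakes, some missed faces" and "all faces found, more mistakes". The scores and labels are made-up illustration data, not digiKam output, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when detections scoring >= threshold are kept.

    scores: detector confidence per candidate region (0..1)
    labels: 1 if the region really is a face, 0 if it is a false positive
    """
    kept = [label for score, label in zip(scores, labels) if score >= threshold]
    true_positives = sum(kept)
    total_faces = sum(labels)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_faces if total_faces else 1.0
    return precision, recall


# Made-up example data: 8 candidate regions, 5 of which are real faces.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
labels = [1, 1, 1, 0, 1, 0, 1, 0]

for threshold in (0.75, 0.35):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

With this toy data, the strict threshold keeps only real faces but misses two of the five, while the loose threshold finds all five at the cost of two false positives.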
More false positives. It’s super annoying to wade through blurs identified as faces but it’s worse to miss a face that will then be forever lost in the data heap.
I’d bias it a bit towards more false positives as well.
But it’s probably an intractable problem. In a family gathering, you want to label every face. But when my kids run through a crowd, I’m only interested in the main few faces.
Can there be a middle ground? If there are many faces in a picture, select relatively fewer to only pick the main ones; but if there are only a few, lower the threshold and find them all?
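One way to picture that middle ground is a two-pass filter: keep everything above a low threshold, and only switch to a stricter threshold when the photo turns out to be crowded. This is only a sketch of the idea. The function name, the threshold values, and the crowd size are all assumptions chosen for illustration, and it says nothing about how digiKam actually selects faces.

```python
def select_faces(detections, base_threshold=0.5, crowd_threshold=0.8, crowd_size=6):
    """Pick face detections using a threshold that depends on how crowded the photo is.

    detections: list of (label, confidence) tuples from some face detector.
    All parameter defaults are illustrative assumptions, not tuned values.
    """
    # First pass: permissive threshold, so sparse photos keep every plausible face.
    candidates = [d for d in detections if d[1] >= base_threshold]
    if len(candidates) <= crowd_size:
        return candidates
    # Crowded photo: keep only the high-confidence (usually the main) faces.
    return [d for d in candidates if d[1] >= crowd_threshold]


family = [("a", 0.92), ("b", 0.55), ("c", 0.61)]
crowd = [("k1", 0.95), ("k2", 0.88)] + [(f"bg{i}", 0.6) for i in range(10)]

print(len(select_faces(family)))  # all three kept
print(len(select_faces(crowd)))   # only the two confident faces kept
```

Confidence is used as the stand-in for "main face" here; face size or sharpness would be equally plausible criteria.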
I find face detection pretty decent in general, though. It seems to me that it improved significantly in the last few years, which is highly appreciated!
Where I see more potential is the recognition part. It often mislabels people, partly in obvious and repeatable ways: In particular, it often seems to recognize faces correctly in the entire image, but then assign them to the wrong people. For instance, there might be a shot of my wife and kids, which it will identify correctly, but it’ll assign the kid to the wife and vice versa. Or it’ll assign three faces in an image to the same person.
It often plainly mislabels people as well, but that seems harder to fix than the former issues.
However, I want to stress again that I find the system in general to work well, and it has notably improved in the last years. So thank you for all the hard work spent on this topic!
Hello Bastian,
As you very astutely pointed out, there are 2 parts to “facial recognition” in any system. The first is facial detection, or to put it a different way, “Is this a face?” The second is facial recognition, or “Who is this?” I’ll be rewriting both parts. For the moment I’m working on the “Is this a face?” problem, but the “Who is this?” problem also has much room for improvement. I’ll tackle that next week.
My goal is to make both parts configurable by the user so you can adjust them based on current needs. This should allow you to easily use one set of settings for family gatherings and another to better identify the main faces in a crowd.
Hello Michael,
The face recognition in digiKam has much improved over the years! Thanks for that. I am using it all the time and it is a useful tool. However, I need to go and check every single detection to make sure it’s correct, which is cumbersome, especially when a huge backlog of faces must be checked. So my recommendation is clear: fewer false-positives.
Thanks again for the great work.
More false-positives, with emphasis on a good user interface that lets users quickly reassign a misidentified face to an already existing “person” in the database, or disregard it completely.
For example, I’m using Immich (highly recommended) and I find that it struggles with people wearing sunglasses or making strange faces, but it allows me to merge faces, hide them, etc.
I concur with Bastian’s points, both to have a dynamic detection threshold based on the number of faces in the photo, and that the recognition engine seems to continue to make incorrect recognitions even after you have spent a lot of time telling it which recognitions were correct and which were not.
One thing I would like, when working with “Unconfirmed” recognitions, is the ability to select several images and take the same action for all the selected ones. Another would be an “undo”: I often find myself having clicked too quickly and assigned the wrong person to a photo.
The facial recognition functionality is a great aspect of Digikam so a big thanks for working to improve it!
There is one thing that just occurred to me, which always bugs me when I tag faces in Digikam (and apologies if this has already been fixed, I’m not currently at my computer and can’t check):
On the happy path, I tag faces one by one, type a letter or two to select the correct name, perhaps an arrow key, then hit enter to lock in the name and move on to the next face. That’s cool.
But there’s also an unhappy path, where most faces are invalid or unknown. And then I have to abandon the keyboard and start clicking instead. I wish there were a keyboard shortcut to mark the current face as unknown, or remove it entirely. Or perhaps a possibility to cycle through all available faces with the tab key? That would be fantastic!
I’d prefer more false-positives too. I have more than 10 years’ worth of photos that I still want to organize in digiKam, and having the reassurance that faces weren’t missed will be worth the extra effort of fixing some mistakes.
And as @hatsnp has mentioned, a nice interface for fixing the mistakes will be very useful, especially if you’ll be able to do it all with keyboard shortcuts. Having to switch between your keyboard and mouse constantly for repetitive tasks really slows down your workflow.
Hi Mike,
thank you very much for the answer. That makes me excited about trying digiKam.
I assume that the AI model is not trained with the faces I would manually draw, so that it can better automatically detect the faces in future pictures?
You are correct. We use 2 neural network models and 1 machine learning (ML) model in the face engine in digiKam. The first neural network detects faces in an image, and the second extracts data from the detected face. These two models are pre-trained. The ML model is untrained and learns patterns in the extracted data by the user tagging and confirming faces.
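To illustrate the third stage, here is a minimal sketch of a classifier that starts empty and learns only from faces the user confirms. It assumes the pre-trained networks have already turned each face into a numeric vector (an embedding). The nearest-centroid approach, the class name, and the toy 3-dimensional vectors are all stand-ins chosen for brevity; the thread does not describe digiKam's actual ML model beyond the fact that it learns from tagging and confirming.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Hypothetical stand-in: one running average embedding per confirmed person."""

    def __init__(self):
        self._sums = {}    # name -> element-wise sum of confirmed embeddings
        self._counts = {}  # name -> number of confirmed embeddings

    def confirm(self, name, embedding):
        """Learn from a face the user has tagged or confirmed."""
        if name not in self._sums:
            self._sums[name] = list(embedding)
            self._counts[name] = 1
        else:
            self._sums[name] = [s + e for s, e in zip(self._sums[name], embedding)]
            self._counts[name] += 1

    def predict(self, embedding):
        """Return (best_name, similarity), or (None, -1.0) if nothing was confirmed yet."""
        best_name, best_score = None, -1.0
        for name, total in self._sums.items():
            centroid = [s / self._counts[name] for s in total]
            score = cosine_similarity(embedding, centroid)
            if score > best_score:
                best_name, best_score = name, score
        return best_name, best_score


clf = CentroidClassifier()
clf.confirm("Alice", [1.0, 0.0, 0.1])
clf.confirm("Alice", [0.9, 0.1, 0.0])
clf.confirm("Bob", [0.0, 1.0, 0.2])
print(clf.predict([0.95, 0.05, 0.05])[0])  # closest to Alice's confirmed faces
```

The point of the sketch is the division of labour: the embedding step never changes, while every confirmation updates only the small per-user model.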
Both of the neural network models (YuNet and SFace) are very accurate, although we are constrained by running the models on your computer rather than on a cloud service.
I’ve been able to optimize the results from the 2 neural networks even more, and I completely rewrote the ML model in the coming 8.6.0 release.
An idea to make the confirmation process of misclassified faces faster:
What is the actual output of the face recognition model? I assume it’s something like probabilities for all known persons? If yes, it should be possible to also show the second and third most probable person for a face in the UI, so they could be selected with a single click.
What is the actual output of the face recognition model?
Yes, the output from the face classifier is a probability ranging from 0 to 1, but it’s for the best match.
I like the idea, but returning multiple confidence scores would require a significant refactoring of the face classifier due to the ensemble method used. Additionally, it would slow down face recognition considerably.
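For reference, the UI side of the suggestion is simple if a per-person score were available, which the reply above says is not currently the case. The sketch below assumes a hypothetical dictionary of scores per known person and just ranks it; the names and numbers are made up.

```python
def top_candidates(scores, k=3):
    """Return the k highest-scoring (name, score) pairs, best first.

    scores: hypothetical mapping of person name -> match score for one face.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up scores for a single detected face.
scores_for_face = {"Anna": 0.62, "Ben": 0.21, "Clara": 0.11, "David": 0.06}
print(top_candidates(scores_for_face))
```

The expensive part is producing the full score table in the first place, which is where the refactoring and slowdown mentioned above would come in.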
I realize I am late in joining this topic; I hope it is still relevant. I would prefer having more false-positives, but not too many. You seem to have done a great job with it so far.
I do have a few suggestions based on options I found useful in other face detecting programs I used before I went Linux only. The first one involves photos with a large number of faces but only a few faces that I want to tag, like my twin grandchildren’s class photo where I only want to tag 2 faces. After tagging the faces you require, there was an option to ignore all of the unidentified faces in the photo. This was a great help in reducing the number of faces I had to look through just to eliminate them. The second option was a setting to reduce the number of faces presented for a person, to reduce the false-positives if there were a lot of them. Both of these helped speed everything along when I needed to work on a large number of photos in a short time period.
Thanks for all the great work.
Gord