Look at the wonderful can of worms we’ve opened here!
Finding some way to automate the analysis is definitely going to be useful. Obviously the scope of that has increased significantly and requires more thinking.
Seeing “AUTOEXP” for all of the local stuff seems kind of strange to me - maybe all of the local adjustments should have had their own category?
@XavAL definitely seems to think it’s problematic - although so far what seemed to be an orphaned entry was not actually orphaned. I think it is worth at least ATTEMPTING to do an orphaned-entry search both for now and the future, at least a partial semi-automated solution.
I can easily find strings outside of HISTORY_MSG_NNN that are orphaned, i.e. no longer referenced anywhere in the source code.
I should be able to find HISTORY_MSG_NNN orphans without too much trouble.
Limitations will be:
If it exists in source but is commented out, I won’t flag it with any currently planned approaches. Most likely the good reviewing @heckflosse and @floessie have been doing will prevent most if not all occurrences of this from happening
If it exists in source, but is in a code path that is somehow dead/inaccessible, an automated tool won’t detect this. Again, hopefully mostly if not entirely prevented by our code reviewers.
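A minimal sketch of the first kind of search described above (orphans outside of HISTORY_MSG_NNN), in Python. It is not the script from tools. It assumes that a language file holds one `KEY;Text` entry per line, that lines starting with `#` or `!` are not live entries, and that the source code refers to each key as a literal identifier. The function names are made up for illustration. It has exactly the limitations listed above: a key that only appears in commented-out or dead code still counts as used.

```python
import re
from pathlib import Path

# Assumed entry format: KEY;Text (one per line).
KEY_LINE = re.compile(r"^([A-Z0-9_]+);")
# Numbered history keys are skipped here; they are assumed to be
# looked up by number rather than by literal key, so they need
# a separate check against the events table.
HISTORY_NUMBERED = re.compile(r"^HISTORY_MSG_\d+$")


def parse_keys(language_text):
    """Return the set of keys found in a language file body."""
    keys = set()
    for line in language_text.splitlines():
        match = KEY_LINE.match(line.strip())
        if match:
            keys.add(match.group(1))
    return keys


def read_sources(root, suffixes=(".cc", ".h")):
    """Yield the text of every source file below root (hypothetical helper)."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            yield path.read_text(encoding="utf-8", errors="ignore")


def find_orphans(keys, source_texts):
    """Return the keys that never occur as a whole identifier in the sources."""
    identifiers = set()
    for text in source_texts:
        identifiers.update(re.findall(r"[A-Za-z0-9_]+", text))
    return sorted(
        key for key in keys
        if not HISTORY_NUMBERED.match(key) and key not in identifiers
    )
```

Run from a checkout, this would look something like `find_orphans(parse_keys(Path("rtdata/languages/default").read_text(encoding="utf-8")), read_sources("."))`, where both paths are assumptions about the repository layout. Anything it reports is only a candidate and still needs a manual look before removal.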
To be fair, I think my explanation made you miss my point: I’m not worried at all about duplicated, orphaned or obsolete entries. I know how to deal with them. The point is that translators also work hard in RT, and their (our) work is not something done in a few minutes, whenever we are bored and don’t know what else to do.
By the way, the brand new Spanish translation was ready a few months ago, with the invaluable help of @paco.lores. We are just waiting for a final (?) English version to re-check our translations.
And thus I think it’s our responsibility to try to reduce the effort required of you wherever possible.
Long term, that might be something like crowdin or weblate. My past familiarity is with crowdin, but weblate seems to be 100% FOSS and there’s already an instance hosted on the pixls infrastructure, so it may be the better long-term option. Translator input on the future roadmap would be beneficial here too!
Short term, if I can parse those tables and find obvious orphans with a few hours of work, it seems like I should.
Dupes are going to be harder, and so are obsolete entries that are still somehow present in the source code. But obvious orphans seem to be low-hanging fruit to me.
Edit: Side note, timing is slightly bad this week due to the upcoming Easter holiday.
It might be useful to translators to use something like Andy’s script on different commit levels and run the outputs through diff to see what has changed since they last checked.
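As a rough illustration of that idea, here is a Python sketch that compares one language file at two revisions and reports which keys were added, removed, or reworded. It does not use Andy’s script. It assumes the `KEY;Text` line format with `#` and `!` lines ignored, and the function names are hypothetical. `file_at_revision` relies on `git show <revision>:<path>`, so it has to be run inside a checkout.

```python
import subprocess


def parse_entries(language_text):
    """Map each key to its text, skipping blank, '#' and '!' lines."""
    entries = {}
    for line in language_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, text = line.partition(";")
        if separator:
            entries[key] = text
    return entries


def compare_entries(old_text, new_text):
    """Return (added, removed, changed) key lists between two file bodies."""
    old = parse_entries(old_text)
    new = parse_entries(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


def file_at_revision(revision, path):
    """Return the contents of path as stored at the given git revision."""
    result = subprocess.run(
        ["git", "show", f"{revision}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A translator could then call `compare_entries(file_at_revision(last_checked, path), file_at_revision("HEAD", path))`, where `last_checked` is the commit they last worked from and `path` is the language file of interest (for example the default one). The `changed` list is the part a plain key comparison would miss: keys that still exist but whose English text was reworded.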
It turns out that, buried in tools, there’s already a script that does some of what I was intending to do - see @Thanatomanic’s comment on Removal of obsolete strings · Issue #6457 · Beep6581/RawTherapee · GitHub. Traversing the history events table to find its orphans is new, though, so after poking at the legacy script I’ll likely de-scope mine to focus on events table maintenance.
@Thanatomanic also pointed out generateTranslationDiffs which I was already aware of - that appears to/should handle the task of flagging new untranslated strings for translators. The main issue is that it is dependent on default being in good shape, and reviewing that was one of the final tasks that M was going to do.
We already found a confirmed bug due to some cruft in default that should not be there.
I was thinking that myself (I mentioned it in one post) - can you maybe move some of the existing posts on the topic to a new thread so we don’t break the conversation flow?