OK, here’s a quick update on the progress made on G’MIC development over the last few days.
I wanted to start optimizing the mathematical expression evaluator integrated into G’MIC. As you probably know, this expression evaluator has become more complex with each new release of G’MIC, and since it is now quite widely used in the code for the stdlib commands and filters of the G’MIC-Qt plug-in, I naturally thought that if we could save a few percent off the machine time spent evaluating mathematical expressions, it could really be a good thing!
So I focused on the “design principle” that seemed most likely to be optimized in the math evaluator: the way in which successive operations (e.g., arithmetic operations +, -, *, /, … or calls to built-in functions such as cos(), sin(), …) were called during evaluation.
It is important to know that this math evaluator operates in two phases: the first phase involves parsing and compiling the mathematical expression into a sequence of bytecodes, which actually describes the set of “atomic” operations to be performed in order to evaluate the expression. Running the interpreter of this bytecode sequence is precisely the second phase of the whole evaluation process.
TBH, I didn’t see many ways to significantly speed up the compilation phase. First, because the current parsing/compilation code seems relatively well done to me (apart from the fact that it uses recursion, but I don’t feel we can gain much from “unrolling” the process into a more sequential algorithm). And second, because this compilation phase is executed only once per expression, so the machine-time savings are more easily found in interpreting the bytecode sequence, which is often run many times (typically when filling an image with a fill expression, such as fill “cos(x)*sin(u)”). To be precise, it runs once per pixel, so often thousands or millions of times! Saving even a tiny fraction of the time spent on each pixel would mean a lot of time saved in the end.
So I decided to try to optimize the interpretation of the expression bytecode once it had been compiled.
To do this, and to get some potentially interesting ideas, I even asked ChatGPT, which gave me some advice that I found interesting (and which actually corresponded to the idea I had in mind before asking): Try to optimize the “dispatch” of function calls.
When I evaluated the pre-compiled bytecode sequence, I was indeed using pointers to static functions (for example, the mp_add() function to add two values together), with the disadvantages that this entails: stacking arguments, jumping to functions (which cannot be inlined in my case), etc.
So trying to avoid these indirections and jumps, by dropping the function pointers and instead dispatching through inlined code directly in my evaluation loop, seemed like a pretty good idea!
To cut a long story short: in the end, no, it was a really bad idea.
But the only way to get to this conclusion was to code it and make it work.
It took me a whole day’s work, because I had to change quite a few things in the mathematical expression evaluator to make it possible. Basically, I had to replace all uses of function pointers with integer identifiers (an enum), and dispatch on these identifiers in a big switch(id) { ... } inside the evaluation loop, with the function bodies inlined.
That’s when I realized (a little too late, by the way) just how comprehensive my mathematical expression evaluator has become: it defines more than 350 built-in functions.
Because trying to inline all these functions into a single large evaluation loop turned out to be a pretty bad idea: what you gain in function dispatch time, you lose because the resulting bytecode evaluation loop no longer fits as well into the CPU cache. And after a good day’s work, I realized that I was now down 25% in performance on my mathematical expression evaluator. Argh!
So finally, I had to go back to the old bytecode evaluation model, which used function pointers in a loop that was very small and could therefore make better use of the CPU cache, despite the additional time required to go through function pointers.
The life of a developer also involves trying things that don’t work!
It could have ended there, with this big disappointment, but since I had already dug deep into the evaluator’s code, it seemed a shame not to take advantage of it while it was still fresh on my mind.
So I tried another optimization approach: searching for common patterns in the generated bytecode in order to define functions that perform several calculations at the same time corresponding to these patterns, but which only need to be called once.
Let me explain briefly: if you have an expression of the form A + B + C, where A, B, and C can be anything (e.g., parenthesized sub-expressions themselves), then instead of performing two successive additions, temp = B + C then A + temp, you can define a single function add_add that performs both additions and call it only once, with the three arguments A, B, and C.
And if you do this for all combinations of two successive operations involving the four most common operators (+, -, *, /), you end up with 4² = 16 possible functions that compute double operations.
And it turns out that these patterns occur very frequently: think of all the expressions containing things like a*b + c, a*b*c, a - b/c, etc. So in the end, we gain a small advantage by calling a single function instead of two successive ones.
Even better, if one of these double operations happens to involve two constants, as in the expression 3*cos(u)*2, it is easy to simplify it to 6*cos(u), replacing two simple operations with a single one (here, two multiplications with one).
Well, that’s what I’ve been doing these past few days on the CImg/G’MIC code, and what I wanted to talk to you about.
It’s not super sexy, it’s not super impressive to sell, but it’s still a lot of work. And it made me realize that this mathematical expression evaluator in G’MIC was already quite good!
The gain from these new optimizations is still pretty minimal “on average,” except when the expressions being evaluated contain the typical patterns that are now optimized.
Today I released binaries for version 3.7.0_pre, with these improvements to the mathematical expression evaluator. If you want to test it and report any bugs, I’m all ears!
Cheers,
David.