PhotoFlow: new caching mechanism - TESTING NEEDED!

Carmelo_DrRaw · September 19, 2019, 4:47pm

The image caching logic implemented so far in PhotoFlow was rather slow and unstable, and there was no easy way to control the total amount of memory used by the cache buffers. As a result, it could induce heavy swapping in some cases (ping @gadolf on this).

In the past couple of weeks I have been working hard to re-write the image caching code, and provide something which is more stable and has a tunable maximum memory footprint.

The new caching should provide faster processing of the preview images, while it might give longer export times with some images. At the moment the total amount of memory reserved for caching is approximately 1GB, but it will soon become user-configurable.
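To illustrate what a tunable maximum memory footprint means in practice, here is a minimal sketch in C of a tile cache with a byte budget. It is not the PhotoFlow implementation: all names (`Cache`, `Tile`, `cache_put`, `cache_get`) are hypothetical, the least-recently-used eviction policy is an assumption, and the lookup is a plain linear scan instead of a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not PhotoFlow's tile cache. */
typedef struct Tile {
    int key;                  /* tile identifier */
    size_t bytes;             /* size of the pixel buffer */
    unsigned char *pixels;    /* buffer owned by the cache */
    struct Tile *prev, *next; /* LRU list: head = most recently used */
} Tile;

typedef struct {
    Tile *head, *tail;
    size_t used;   /* bytes currently held by cached tiles */
    size_t budget; /* maximum number of bytes the cache may hold */
} Cache;

void cache_init(Cache *c, size_t budget) {
    c->head = c->tail = NULL;
    c->used = 0;
    c->budget = budget;
}

static void unlink_tile(Cache *c, Tile *t) {
    if (t->prev) t->prev->next = t->next; else c->head = t->next;
    if (t->next) t->next->prev = t->prev; else c->tail = t->prev;
    t->prev = t->next = NULL;
}

static void push_front(Cache *c, Tile *t) {
    t->prev = NULL;
    t->next = c->head;
    if (c->head) c->head->prev = t; else c->tail = t;
    c->head = t;
}

/* Evict least recently used tiles until `need` more bytes fit. */
static void make_room(Cache *c, size_t need) {
    while (c->tail && c->used + need > c->budget) {
        Tile *victim = c->tail;
        unlink_tile(c, victim);
        assert(c->used >= victim->bytes);
        c->used -= victim->bytes;
        free(victim->pixels);
        free(victim);
    }
}

/* Look up a tile; a hit moves it to the front of the LRU list. */
Tile *cache_get(Cache *c, int key) {
    for (Tile *t = c->head; t; t = t->next) {
        if (t->key == key) {
            unlink_tile(c, t);
            push_front(c, t);
            return t;
        }
    }
    return NULL;
}

/* Insert a new tile (the caller is expected to try cache_get first).
 * Returns NULL if the tile can never fit or if allocation fails. */
Tile *cache_put(Cache *c, int key, size_t bytes) {
    if (bytes == 0 || bytes > c->budget) return NULL;
    make_room(c, bytes);
    Tile *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->pixels = calloc(bytes, 1);
    if (!t->pixels) { free(t); return NULL; }
    t->key = key;
    t->bytes = bytes;
    push_front(c, t);
    c->used += bytes;
    return t;
}
```

With `cache_init(&c, 1024u * 1024u * 1024u)` the budget would be 1GB. The sketch is single-threaded; a cache shared between worker threads would additionally need locking around every list and counter update.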

I have been testing the new code for a few days now, and hopefully I have spotted and fixed all the bugs related to the new caching logic, but… who knows?

So I need your help to test this new feature, report crashes or glitches, and give me some feedback on whether or not you prefer the new code!

Packages will be available for download from the continuous release page (look for the new-caching label).

Thanks a lot in advance!

gadolf · September 20, 2019, 12:30am

Thanks!

I ran it for a couple of minutes and it was way, way faster, but then it crashed.

I already had an ongoing edit, which I resumed with this release.

How can I help you to identify the issue? Hard to reproduce what I did… Is there a debugging mode or something?

Carmelo_DrRaw · September 20, 2019, 8:16am

That’s good to hear! The speed-ups concern mostly the preview image, and therefore are more difficult to quantify…

I am not very surprised by the crash, unfortunately. The changes I made were quite substantial, and I suspect they introduced some thread synchronization bugs that are difficult to chase.

Could you remind me which operating system you are using?
The best option would be to use a version built in RelWithDebInfo mode and run the program through valgrind, but that’s only possible on some systems.
For the Windows case I am looking into adding support for Dr. Mingw, but that’s not ready yet…
Meanwhile, it would already be helpful if you could send me the pfi file that triggers the crash and a short description of what you were doing…

Thanks!

gadolf · September 20, 2019, 10:20am

I should have talked about memory, not speed, actually. Regarding memory use, from my perspective it seems to behave the same as your last simplified-pipeline release, which already included the changes to memory use. In summary, no swapping and no memory filling up.

Ubuntu 18.04

paulmiller · September 20, 2019, 12:30pm

The speed up on the preview image is really nice - the Shadows/Highlights layer is now almost keeping up with slider adjustments, whereas before I was typing numbers and waiting.

Threading bugs are the worst to find. Have you tried the various ‘sanitiser’ modes in clang? The address sanitiser and the thread sanitiser are good for tracking this sort of stuff down, and they don’t slow the application down too much.

I have a crash which I thought was reproducible, but it has gone away now!
I was loading a .pfi file which contains a Richardson-Lucy sharpening layer, then zooming to 1:1 and scrolling. This provokes an assertion failure which terminates photoflow (except when it doesn’t).

Thread 21 Crashed:: worker
0   libsystem_kernel.dylib        	0x00007fff724b82c6 __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff72573bf1 pthread_kill + 284
2   libsystem_c.dylib             	0x00007fff724226a6 abort + 127
3   libglib-2.0.0.dylib           	0x000000010acfe688 g_assertion_message + 423
4   libglib-2.0.0.dylib           	0x000000010acfe6e6 g_assertion_message_expr + 94
5   photoflow                     	0x0000000108773a07 phf_tile_cache_gen + 2039
6   libvips.42.dylib              	0x000000010a92e86d vips_region_prepare + 253
7   libvips.42.dylib              	0x000000010a91a78f vips_image_write_gen + 31
8   libvips.42.dylib              	0x000000010a92e86d vips_region_prepare + 253
9   photoflow                     	0x0000000108780708 vips_gmic_gen(_VipsRegion*, void*, void*, void*, int*) + 216

Log output looks like this:

....
OpParBase::build_many_internal(): adding tilecache for output image #0, padding=0
phf_tile_cache_init
phf_block_cache_build(): out ref count: 0x7fd2d91e3e20->1
phf_tile_cache_build(): out ref count: 0x7fd2d91e3e20->1
OpParBase::build_many_internal(): added tilecache for output image #0, padding=0
ColorCorrectionPar::build(): 3.79  0  1
ColorCorrectionPar::build(): 1.12  -0.008  0.92
phf_block_cache_dispose(): called, cache: 0x7fd2b90035b0  cache->tiles: 0x7fd2e8b2a360
  cache hash table size: 96
phf_block_cache_dispose(): after phf_block_cache_drop_all
phf_block_cache_dispose(): called, cache: 0x7fd2eab85210  cache->tiles: 0x7fd2d88f2b60
  cache hash table size: 0
phf_block_cache_dispose(): after phf_block_cache_drop_all
phf_block_cache_dispose(): called, cache: 0x7fd2e9132d80  cache->tiles: 0x7fd2bb802580
  cache hash table size: 96
phf_block_cache_dispose(): after phf_block_cache_drop_all
phf_block_cache_dispose(): called, cache: 0x7fd2d88f4ba0  cache->tiles: 0x7fd2e8aa48c0
  cache hash table size: 96
phf_block_cache_dispose(): after phf_block_cache_drop_all
phf_block_cache_minimise() called
  cache hash table size: 0
phf_block_cache_minimise() called
  cache hash table size: 182
phf_block_cache_minimise() called
  cache hash table size: 0
phf_block_cache_minimise() called
  cache hash table size: 98
phf_block_cache_minimise() called
  cache hash table size: 98
phf_block_cache_minimise() called
  cache hash table size: 0
**
ERROR:/Users/travis/build/aferrero2707/PhotoFlow/src/vips/tilecache_pf.c:794:phf_tile_unref: assertion failed: (tile->ref_count > 0)
zsh: abort      /Volumes/PhotoFlow/photoflow.app/Contents/MacOS/photoflow

(this is on macOS 10.14.6, photoflow d2c11)
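For anyone following along, the invariant behind that assertion can be sketched in a few lines of C. This is a hedged illustration, not the code in tilecache_pf.c: the `RcTile` type and the `rc_tile_*` functions are hypothetical, and the use of C11 atomics is an assumption about one possible way to keep the counter consistent across worker threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical tile type; not the struct used in tilecache_pf.c. */
typedef struct {
    atomic_int ref_count; /* number of users currently holding the tile */
    int data;             /* placeholder for the pixel payload */
} RcTile;

RcTile *rc_tile_new(void) {
    RcTile *t = malloc(sizeof *t);
    if (!t) return NULL;
    atomic_init(&t->ref_count, 1); /* the creator holds the first reference */
    t->data = 0;
    return t;
}

void rc_tile_ref(RcTile *t) {
    /* fetch_add returns the value before the increment */
    int old = atomic_fetch_add(&t->ref_count, 1);
    assert(old > 0); /* taking a reference on a released tile is a bug */
}

/* Returns 1 if this call dropped the last reference and freed the tile. */
int rc_tile_unref(RcTile *t) {
    /* The read-modify-write is a single atomic step, so two threads
     * releasing at the same time cannot both see the same old value. */
    int old = atomic_fetch_sub(&t->ref_count, 1);
    assert(old > 0); /* same invariant as the failing assertion above */
    if (old == 1) {
        free(t);
        return 1;
    }
    return 0;
}
```

An assertion like `ref_count > 0` fails when a tile is released more often than it was referenced, or when a plain (non-atomic) counter is updated by two threads at once. Note that an atomic counter only protects the counter itself: whatever table is used to look tiles up would still need a lock, otherwise one thread could fetch a tile just as another drops its last reference.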

paulmiller · September 20, 2019, 1:06pm

Richardson-Lucy deconvolution looks really bad in the preview - this is probably unavoidable if it is working from a low-resolution version of the image.

Maybe you could add an option for ‘calculate this layer at full resolution’ or a full resolution cache layer or something.

I’m trying to reproduce the capture sharpening feature in RawTherapee 5.7dev by using:

Optical Corrections
RL Sharpen (0.8-1.0 sigma, 10-30 iterations)
Raw Developer (with lens distortion correction turned off, but CA correction on)

This seems to be quite effective.

I can just turn sharpening off if viewing at small sizes.

Carmelo_DrRaw · September 20, 2019, 7:19pm

This is exactly the kind of bug I have been chasing (and I hoped to have fixed)…

I have started to check the code with Clang’s address sanitizer, and it reports some problems with the RL deconvolution code (which is not related to the new caching logic). Could you check whether you also get crashes without using RL sharpening?

Thanks!

Carmelo_DrRaw · September 20, 2019, 7:21pm

Would you be able to build the code from source?

gadolf · September 21, 2019, 11:19am

Yes. Is it just a matter of cloning the right branch from git and then following the steps described on the PF main page on GitHub? I believe I should add some extra settings while compiling; which ones?

Carmelo_DrRaw · September 21, 2019, 6:57pm

Here is what you should do:

git clone https://github.com/aferrero2707/PhotoFlow.git --branch new-caching --single-branch
cd PhotoFlow/build
BUILD_TYPE=RelWithDebInfo ./build-all.sh

The output binary will be installed as RELWITHDEBINFO/bin/photoflow. Run it with

valgrind --tool=memcheck RELWITHDEBINFO/bin/photoflow >& /tmp/phf.log

Then provide me the log file corresponding to the crashes.

To compile photoflow you will need some development packages. Here is a probably incomplete list:

sudo apt-get install libexiv2-dev lensfun-dev gtkmm-dev pugixml-dev libtiff-dev libpng-dev libjpeg-dev

I am quite frustrated, because I cannot reproduce the crashes that you and @paulmiller are experiencing… let’s try to shed some light.

Thanks!

afre · September 21, 2019, 7:09pm

Have some time. Downloading it now. What would you like me to check?

Carmelo_DrRaw · September 21, 2019, 7:14pm

Just use it like you normally do, possibly using tools that require caching (local contrast, shadows/highlights, relight, etc…). If it crashes, provide me the terminal output so that I can possibly see the reason.

Thanks!!!

afre · September 21, 2019, 7:16pm

Okay, the thing is I only use raw developer. LOL I will check RL as reported above.

Carmelo_DrRaw · September 21, 2019, 7:19pm

Try also the shadows/highlights, it requires quite some caching and should stress-test the new code…

afre · September 21, 2019, 7:40pm

My laptop is definitely lower end and older than the other two testers’. At 1:1, shadows/highlights definitely makes the scrolling refresh much slower. I cranked up local contrast, which was incredibly slow on my machine before; now, changing the settings is much more responsive.

Okay, just as I was typing this message, PF finally crashed. I have a feeling it has to do with cache memory allocation; i.e., no issues until I scrolled more at 1:1.

Carmelo_DrRaw · September 21, 2019, 7:43pm

Would you be able to provide the terminal output? I’d like to see if the crash is due to tile reference counting as in the other cases…

afre · September 21, 2019, 8:51pm

I did some scrolling – no crash – and then left PF running while doing something else on the laptop. Eventually, it crashed. Here is the log: log.txt (74.9 KB).

paulmiller · September 21, 2019, 8:51pm

I’ve tried with shadows/highlights - I got the same crash as before:

ERROR:/Users/travis/build/aferrero2707/PhotoFlow/src/vips/tilecache_pf.c:794:phf_tile_unref: assertion failed: (tile->ref_count > 0)
zsh: abort      /Volumes/PhotoFlow/photoflow.app/Contents/MacOS/photoflow

This was when scrolling at 1:1 scale. I’ve also seen the crash when changing the settings on shadows/highlights with the preview zoomed to fit the window.

Carmelo_DrRaw · September 21, 2019, 9:13pm

I have prepared a new version with more checks of the tile reference counting; this should give a more precise location for the reference counting error… The macOS package should be ready in less than one hour.

I am testing the code on MacOS as well, and so far I cannot trigger any crash when zooming, panning or moving sliders…

@afre the log you pasted corresponds to a clean program shutdown, not a crash… are you sure it is the right one?

afre · September 21, 2019, 9:22pm

Yes, that is what I read too… Maybe I did close it unintentionally. Anyway, my laptop stalled / crashed soon after, showing its age.

What I did was test RL, save the PFI, close PF, open PFI, zoom and scroll, open raw, close RL tab, test highlights/shadows, zoom and scroll THEN crash. (Perhaps, the cache didn’t clear between app or tab closures? Also, closing tab didn’t crash PF, which is new and good for me!)

Second time testing highlights/shadows with logging, I may have unintentionally closed the console but I am not sure if I did or it crashed by itself.