PhotoFlow: new caching mechanism - TESTING NEEDED!

@afre @Silvio_Grosso I have added both a tooltip for the relight button, and a close button to the soft-proofing dialog. It is true that this was already mentioned in the past, but went over my head. Thanks for the reminder!

The fixes are committed to the stable branch only.

@paulmiller @gadolf @afre I have introduced two changes, now that it is more or less clear that the issue is in the multi-threading part:

  • I have made all tile caches persistent, which means that tiles are never dropped at the end of the processing
  • I have added further refcount consistency check at the beginning of the tile processing, to see if the wrong refcounts are there from the very beginning or are introduced during the tile processing.

Once more, your feedback will be very much appreciated, because on my system I can scroll like hell and nothing bad happens!!!

The new MacOS package should be ready in a short whileā€¦

You should make a scroll_like_hell branch just for kicks. It isnā€™t only about scrolling rapidly or scrolling lots but also leaving PF open for a long time.

Is that 1c461?

Its still crashing (but with a slightly different stack trace - there are a couple of extra vips_* calls in there):

Thread 19 Crashed:: worker
0   libsystem_kernel.dylib        	0x00007fff5989e2c6 __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff59959bf1 pthread_kill + 284
2   libsystem_c.dylib             	0x00007fff598086a6 abort + 127
3   libglib-2.0.0.dylib           	0x0000000108070688 g_assertion_message + 423
4   libglib-2.0.0.dylib           	0x00000001080706e6 g_assertion_message_expr + 94
5   photoflow                     	0x0000000105aef8fc phf_tile_cache_gen + 2172
6   libvips.42.dylib              	0x0000000107ca486d vips_region_prepare + 253
7   libvips.42.dylib              	0x0000000107bd8639 vips_embed_base_gen + 121
8   libvips.42.dylib              	0x0000000107ca486d vips_region_prepare + 253
9   photoflow                     	0x0000000105af6264 vips_layer_gen(_VipsRegion*, void*, void*, void*, int*) + 196

photoflow_log6.txt (13.9 KB)

This is fast, even at 1:1.
I had this crash after inserting a gradient layer early into the pixel pipe (I probably shouldnā€™t, but I did it)
debug2.txt (128.3 KB)

Just tried b9f2f:

It still crashes, but it took longer. I ran it with PHF_TILECACHE_DEBUG=1. The resulting log is 25MB!

photoflow_log7.txt.zip (1.5 MB)

@gadolf: there is nothing wrong with inserting a gradient layer anywhere in the layers stack, this should just work. However, the crash you observed does not seem to be related to tile caching, and would require running the program through gdb or valgrind in order to catch the exact place where crash occursā€¦

@paulmiller: I am happy to see that the crash is still there when using the verbose logging. Looking at the referencing/dereferencing history of the tile that generates the crash, I see the following:

phf_tile_ref: tile: 0x10f3c7338  ref_count: 1
phf_tile_ref: tile: 0x10f3c7338  ref_count: 2
phf_tile_ref: tile: 0x10f3c7338  ref_count: 3
phf_tile_unref: tile: 0x10f3c7338  ref_count: 3
phf_tile_unref: tile: 0x10f3c7338  ref_count: 2
phf_tile_unref: tile: 0x10f3c7338  ref_count: 1
phf_tile_ref: tile: 0x10f3c7338  ref_count: 1
phf_tile_ref: tile: 0x10f3c7338  ref_count: 2
phf_tile_unref: tile: 0x10f3c7338  ref_count: 2
phf_tile_ref: tile: 0x10f3c7338  ref_count: 3
phf_tile_unref: tile: 0x10f3c7338  ref_count: 1
phf_tile_cache_gen[1]: tile 0x10f3c7338 wrong ref_count 0, state=0, region=0x7facc62225a0, cache=0x7fac850ac970

In the last two lines on sees a jump in reference counting of 2 (from 3 to 1), which is not expected. However, I think we are in a good track to pin-down the issue.
If you donā€™t mind, I have prepared a new package with even more tile-related messages in PHF_TILECACHE_DEBUG=1 mode, hopefully it will let me figure out where exactly the mismatch is taking placeā€¦

Thanks!

I also got the impression that my recent changes made the preview even more responsive. Good to know that I am not the only one having this impression :wink:

Unfortunately, I havenā€™t been able to get 53faf to crash in PHF_TILECACHE_DEBUG mode yet. It still crashes if I turn that off. Maybe the extra printfs have disturbed the timing enough to fix the problem (does printf have an internal mutex?). Iā€™ll keep trying.

That is what I was afraid ofā€¦ IN the version you have tested, each printf was followed by a flush of stdout in order to make sure to see the latest messages on the terminal. Maybe this is slowing down the processing too much, so I have prepared a new version without the flushing. Given that the crashes are not segfaults but are due to failed asserts, it should make no difference in terms of what is printed to the terminal.

I would suggest to give this new version a try. Also, what are your CPU specs? How many cores?

Thanks!

2,6GHz i7, 6 cores (12 threads).

3f04b is not cooperating either: it will crash with PHF_TILECACHE_DEBUG off, but not with it on.

Edit:
I managed to get 3f04b to crash:
photoflow_log8.txt.zip (6.1 MB)

I have an i5 2.9GHz with 2 cores, so this might explain why my system is not fast enough to trigger the crash.

I now suspect that the issue is due to concurrent increase/decrease of the tile reference counters (which seems to be confirmed by your latest log).

I have now protected the phf_tile_ref() and phf_tile_unref() functions with a global mutex, and also decreased again the verbosity level with PHF_TILECACHE_DEBUG=1 (you can still get more verbose output with PHF_TILECACHE_DEBUG=2).

As usual, I am not getting any crashes on my systemā€¦

The new package is ready for download.
Thanks!

Iā€™ve been running 9596d all day (about 45mins total) with PHF_TILECACHE_DEBUG=1, and it hasnā€™t crashed yet. Looks like you may have fixed the refCounting race condition :+1:

There are a few problems that still need looking at:

  • Some layers donā€™t update the image when you change some settings (e.g. ā€˜Gamut Warningsā€™ on the ColourSpace conversion layer)
  • The crop tool gets confused if you try to edit a crop (related to the above I think - you get the crop box displayed over the cropped image and offset)
  • The image ā€˜zoom to fitā€™ is not the same colour as at 1:1 or when exported. The preview looks more saturated (more obvious with some images than others).
1 Like

I really hope so! It has been a real PITAā€¦
Did you also try without debugging messages?

I have just started to work on a proper fix for this. If you could help me catch the controls that do not update the image, it would be great. I have already taken note of the gamut warning and the crop tool

Could you provide me a good example? Might be due to the resizing algorithm that I am usingā€¦

Thanks!

Here is an example of the colour shift I mentioned:

The exported image:

The view in PhotoFlow:


Note that the vegetation in the foreground looks reddish.

The view in Photoflow at 1:1 scale:


Note that the colour appears to match the exported jpg.

Here are the source files:
DSC_1018rl.pfi (42.1 KB)
DSC_1018.NEF (14.4 MB)
These files are made available under a CC BY-NC-SA 4.0 Licence.

Hope this helps.

1 Like

Probably the color shift is due to the CA correction in the raw developer module not updated in the ā€œfit to screenā€ resize

@paulmiller @age
I confirm the color shift, but I rather suspect the demosaicing. At zoom levels below 1:1 a simple fast demosaicing is used. With the past caching mechanism, the cached image was computed at 1:1 and the re-scaled, while not the caching happens at the zoom level requested by the preview. I need to find a proper fix for this, that is force the demosaicing at 1:1 and re-scale on the fly.

I will post an update once this is doneā€¦

Not a crash, but a freezing.
Follow gdb output.


phf.txt (115.7 KB)
EDIT: The freeze happened when I tried to move the linear range slider to the left, after moving it successfully to the right.

As usual, I am not able to reproduce the freeze, but I am not very surprisedā€¦ those seem to be quite subtle bugs, and difficult to reproduce.

Your gdb output shows however that the freeze was actually a consequence of a segmentation fault. However, the log only shows the latest function call in the stack, not the full backtrace. Next time please run the following command at the gdb prompt, to get the backtrace from all active threads:

thread apply all bt full

This is better described in the bug reporting documentation of RawTherapee: How to write useful bug reports - RawPedia
Most of what is described there also applies to PhFā€¦

Thanks!

1 Like

I did it and got a new freeze, although I was able to do more work this time.

phf.txt (669.6 KB)