Processing on GPU (with tiling) -- but high CPU(!) time

@hannoschwalm, is the high CPU time here due to tiling-related operations on the CPU?

591.8997 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse' for image with size 8288x5520 --> 8288x5520
591.8997 [default_process_tiling_cl_ptp] [export] (6x1) tiles with max dimensions 3640x5520, pinned=OFF, good 1592x3472 and overlap 1024
591.8998 [default_process_tiling_cl_ptp] [export] tile (0,0) size 3640x5520 at origin [0,0]
591.8998 [opencl memory] device 0: 321484800 bytes (306.6 MB) in use
...
592.5878 [opencl memory] device 0: 4888190976 bytes (4661.7 MB) in use
592.8815 [histogram] took 0.004 secs (0.008 CPU) scope draw
592.9508 [histogram] took 0.004 secs (0.008 CPU) scope draw
592.9784 [histogram] took 0.004 secs (0.008 CPU) scope draw
593.0067 [histogram] took 0.004 secs (0.008 CPU) scope draw
597.0167 [opencl memory] device 0: 4563197952 bytes (4351.8 MB) in use
...
597.2162 [opencl memory] device 0: 321484800 bytes (306.6 MB) in use
597.2174 [opencl memory] device 0: 0 bytes (0.0 MB) in use
597.2221 [default_process_tiling_cl_ptp] [export] tile (1,0) size 3640x5520 at origin [1592,0]
597.2221 [opencl memory] device 0: 321484800 bytes (306.6 MB) in use
...
602.4741 [opencl memory] device 0: 321484800 bytes (306.6 MB) in use
602.4754 [opencl memory] device 0: 0 bytes (0.0 MB) in use
602.4800 [default_process_tiling_cl_ptp] [export] tile (2,0) size 3640x5520 at origin [3184,0]
602.4800 [opencl memory] device 0: 321484800 bytes (306.6 MB) in use
...
607.6834 [opencl memory] device 0: 321484800 bytes (306.6 MB) in use
607.6845 [opencl memory] device 0: 0 bytes (0.0 MB) in use
607.6892 [default_process_tiling_cl_ptp] [export] tile (3,0) size 3512x5520 at origin [4776,0]
607.6892 [opencl memory] device 0: 310179840 bytes (295.8 MB) in use
...
612.6428 [opencl memory] device 0: 0 bytes (0.0 MB) in use
612.6475 [dev_pixelpipe] took 21.095 secs (18.691 CPU) [export] processed `diffuse' on GPU with tiling, blended on CPU

(An 8288x5520 image, about 46 MPx, with diffuse or sharpen using the local contrast preset, on a 6 GB Nvidia card.)
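For what it's worth, the tile origins in the log are consistent with each tile being shifted by the tile width minus twice the overlap (3640 - 2 x 1024 = 1592). The sketch below only reconstructs the layout from those logged numbers; the function name `tile_layout` and the stepping rule are assumptions made for illustration, not darktable's tiling code. The log announces a (6x1) grid but lists four tiles, and the sketch just reproduces the four listed ones without trying to explain the difference.

```python
# Sketch only: reconstructs the tile layout from the numbers in the log.
# Not darktable's tiling code; the stepping rule below is an assumption.

def tile_layout(image_width, tile_width, overlap):
    """Return (origin, width) pairs for a single row of horizontal tiles.

    Assumption: consecutive tiles are shifted by tile_width - 2 * overlap,
    and tiling stops once a tile reaches the right edge of the image.
    """
    step = tile_width - 2 * overlap
    if step <= 0:
        raise ValueError("overlap too large for this tile width")
    tiles = []
    origin = 0
    while True:
        width = min(tile_width, image_width - origin)
        tiles.append((origin, width))
        if origin + width >= image_width:
            break
        origin += step
    return tiles


image_width, image_height = 8288, 5520
tiles = tile_layout(image_width, tile_width=3640, overlap=1024)

# How many pixels are pushed through the module compared to the image size.
processed = sum(width * image_height for _, width in tiles)
redundancy = processed / (image_width * image_height)

for origin, width in tiles:
    print(f"origin {origin:5d}  width {width}")
print(f"pixels processed / image pixels = {redundancy:.2f}")
```

With these numbers, every pixel column is processed about 1.74 times on average, because a large share of each tile is overlap that gets discarded.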

FYI, the tiling is happening in OpenCL.

Yes, I know that. Or do you mean the CPU is not involved at all (it does not divide the image into tiles, sending them off one by one, then merging them)?

I just wanted to be clear that each tile is processed in OpenCL, but yes, the CPU is also involved. Each tile seems to be taking ~5 s. The output of -d perf might be easier to understand. 20 s does not seem too bad in my opinion.
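The per-tile figure can be read off the timestamps in the posted log. Here is a small sketch that does it, assuming the first column is elapsed seconds and that a tile ends when the next tile line (or the final dev_pixelpipe summary line) appears. The helper name `tile_durations` is made up for this example, and since the log is elided in places, the numbers are only approximate.

```python
import re

# Lines copied from the log: the four tile starts plus the summary line.
LOG = """\
591.8998 [default_process_tiling_cl_ptp] [export] tile (0,0) size 3640x5520 at origin [0,0]
597.2221 [default_process_tiling_cl_ptp] [export] tile (1,0) size 3640x5520 at origin [1592,0]
602.4800 [default_process_tiling_cl_ptp] [export] tile (2,0) size 3640x5520 at origin [3184,0]
607.6892 [default_process_tiling_cl_ptp] [export] tile (3,0) size 3512x5520 at origin [4776,0]
612.6475 [dev_pixelpipe] took 21.095 secs (18.691 CPU) [export] processed `diffuse' on GPU with tiling, blended on CPU
"""

TILE_RE = re.compile(r"^(\d+\.\d+) .*\btile \((\d+),(\d+)\)")
END_RE = re.compile(r"^(\d+\.\d+) \[dev_pixelpipe\] took")


def tile_durations(log_text):
    """Return [((col, row), seconds), ...] from tile start timestamps."""
    starts = []
    end = None
    for line in log_text.splitlines():
        m = TILE_RE.match(line)
        if m:
            tile = (int(m.group(2)), int(m.group(3)))
            starts.append((tile, float(m.group(1))))
            continue
        m = END_RE.match(line)
        if m:
            end = float(m.group(1))
    if end is None:
        raise ValueError("no dev_pixelpipe summary line found")
    stamps = [t for _, t in starts] + [end]
    return [(tile, stamps[i + 1] - stamps[i])
            for i, (tile, _) in enumerate(starts)]


durations = tile_durations(LOG)
for tile, secs in durations:
    print(f"tile {tile}: {secs:.2f} s")
print(f"sum: {sum(secs for _, secs in durations):.2f} s")
```

The last tile's figure also includes whatever happens between the end of that tile and the summary line, so treat it as an upper bound.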

Nope. I don’t know exactly what the CPU is doing, but while in ptp mode, which is required here, we do solve some linear equations, IIRC.

What’s ptp mode?

The log shows ptp tiling. When we tile, we sometimes have to take more care to stitch the tiles together precisely.
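To illustrate why stitching needs an overlap at all, here is a generic 1D sketch. It is not darktable code and has nothing to do with how diffuse or sharpen works internally. A box blur is run over overlapping tiles and only the interior of each tile is kept. The names `box_blur` and `blur_tiled` and the parameters `good` and `overlap` are made up for the example, loosely mirroring the terms in the log.

```python
# Generic illustration, not darktable code: a 1D box blur processed in
# overlapping tiles, keeping only each tile's interior when stitching.

def box_blur(values, radius):
    """Mean over a window of +/- radius, clamped at the array borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def blur_tiled(values, radius, good, overlap):
    """Blur in tiles of `good` useful samples padded by `overlap` per side."""
    n = len(values)
    out = [0.0] * n
    for start in range(0, n, good):
        stop = min(n, start + good)
        lo = max(0, start - overlap)   # padded tile start
        hi = min(n, stop + overlap)    # padded tile end
        tile = box_blur(values[lo:hi], radius)
        # Copy only the interior; the padded margins are discarded.
        out[start:stop] = tile[start - lo:stop - lo]
    return out


signal = [float((i * 7) % 13) for i in range(50)]
reference = box_blur(signal, radius=3)

enough = blur_tiled(signal, radius=3, good=10, overlap=3)
too_small = blur_tiled(signal, radius=3, good=10, overlap=1)

print("overlap 3 matches untiled:", enough == reference)
print("overlap 1 matches untiled:", too_small == reference)
```

As long as the overlap is at least the filter radius, the stitched result is identical to the untiled one; with a smaller overlap, samples near the seams come out different. A module with a large effective radius therefore needs a large overlap, which is one plausible reading of the 1024 in the log.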

Anyway, I’ve put it on my to-be-checked list.

Sorry, I didn’t mean to be annoying or demand that you check anything. I’m simply not familiar with the expression ptp tiling. It’s probably nothing I should concern myself with.

Not annoying at all! I’m just keeping a list of OpenCL stuff to be checked. So thanks for the question / input.