vkdt 6800u on Ubuntu 23 Profiling

Hi there,
first of all I’d like to say this is such a nice project, and everything builds so cleanly.
I went from a fresh Ubuntu install to a running vkdt in an evening, and built the AMDVLK drivers along the way.
I rebuilt vkdt this evening to check whether it was a package problem.

Looking at the debug logs, there seems to be quite a delay on my box:

[perf] [rawspeed] load /home/Pictures/DCIM/101_FUJI/only_raw/DSCF8016.RAF in 4034ms
[perf] upload source total: 55.704 ms
[perf] upload for raytrace: 0.338 ms
[mem] images : peak rss 3977.83 MB vmsize 4126.38 MB
[mem] buffers: peak rss 0 MB vmsize 0 MB
[mem] staging: peak rss 978.79 MB vmsize 978.79 MB
[perf] record cmd buffer: 598.918 ms
[perf] i-raw main : 19.872 ms
[perf] denoise noop : 19.328 ms
[perf] hilite half : 16.708 ms
[perf] hilite reduce : 37.899 ms
[perf] hilite reduce : 3.305 ms
[perf] hilite reduce : 0.848 ms
[perf] hilite reduce : 0.240 ms
[perf] hilite reduce : 0.070 ms
[perf] hilite reduce : 0.024 ms
[perf] hilite reduce : 0.008 ms
[perf] hilite reduce : 0.007 ms
[perf] hilite reduce : 0.007 ms
[perf] hilite reduce : 0.007 ms
[perf] hilite reduce : 0.005 ms
[perf] hilite reduce : 0.005 ms
[perf] hilite assemble: 0.002 ms
[perf] hilite assemble: 0.003 ms
[perf] hilite assemble: 0.003 ms
[perf] hilite assemble: 0.003 ms
[perf] hilite assemble: 0.003 ms
[perf] hilite assemble: 0.005 ms
[perf] hilite assemble: 0.013 ms
[perf] hilite assemble: 0.049 ms
[perf] hilite assemble: 0.279 ms
[perf] hilite assemble: 1.069 ms
[perf] hilite assemble: 4.204 ms
[perf] hilite assemble: 18.512 ms
[perf] hilite doub : 24.227 ms
[perf] sum hilite: 107.506 ms
[perf] demosaic down : 10.347 ms
[perf] demosaic gauss : 18.260 ms
[perf] demosaic splat : 29.503 ms
[perf] demosaic fix : 85.446 ms
[perf] sum demosaic: 143.557 ms
[perf] crop main : 62.358 ms
[perf] colour main : 56.485 ms
[perf] filmcurv main : 57.387 ms
[perf] llap curve : 126.798 ms
[perf] llap reduce : 115.105 ms
[perf] llap reduce : 28.915 ms
[perf] llap reduce : 7.265 ms
[perf] llap reduce : 1.823 ms
[perf] llap reduce : 0.461 ms
[perf] llap reduce : 0.118 ms
[perf] llap reduce : 0.022 ms
[perf] llap reduce : 0.003 ms
[perf] llap reduce : 0.003 ms
[perf] llap reduce : 0.002 ms
[perf] llap reduce : 0.001 ms
[perf] llap assemble: 0.008 ms
[perf] llap assemble: 0.007 ms
[perf] llap assemble: 0.007 ms
[perf] llap assemble: 0.006 ms
[perf] llap assemble: 0.016 ms
[perf] llap assemble: 0.052 ms
[perf] llap assemble: 0.225 ms
[perf] llap assemble: 0.870 ms
[perf] llap assemble: 3.379 ms
[perf] llap assemble: 13.376 ms
[perf] llap assemble: 57.890 ms
[perf] llap colour : 66.824 ms
[perf] sum llap: 423.175 ms
[perf] hist collect : 97.505 ms
[perf] hist map : 0.120 ms
[perf] total time: 987.521 ms
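
To see at a glance which modules dominate a log like this, the per-kernel lines can be summed by module. The following is only a minimal sketch: the function name `module_totals` and the regular expression are made up for illustration and simply assume the `[perf] <module> <kernel> : <time> ms` layout seen above. It is not part of vkdt.

```python
import re
from collections import defaultdict

# Matches per-kernel lines such as "[perf] llap reduce : 115.105 ms".
# Three-word entries ("record cmd buffer", "upload source total") do not match.
PERF_LINE = re.compile(r"^\[perf\]\s+(\S+)\s+(\S+)\s*:\s*([\d.]+)\s*ms$")

# "sum <module>" and "total time" are aggregates already printed in the log,
# so they are skipped to avoid counting the same time twice.
AGGREGATES = {"sum", "total"}

def module_totals(log_text):
    """Return {module: summed milliseconds} for the per-kernel [perf] lines."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = PERF_LINE.match(line.strip())
        if match is None:
            continue  # [mem] lines, upload lines, the rawspeed line, etc.
        module, _kernel, ms = match.groups()
        if module in AGGREGATES:
            continue
        totals[module] += float(ms)
    return dict(totals)

def slowest_first(totals):
    """Sort modules by total time, slowest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Feeding the saved log text to `slowest_first(module_totals(text))` lists the modules from most to least expensive, which makes it easier to compare runs after a driver or config change.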

The load itself is taking 4 s! I checked, and a disk benchmark shows 6 GB/s with 100 MB chunks, so I don’t think the SSD is the issue.
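
One way to separate disk I/O from decoding is to time a plain read of the raw file and compare it with the 4034 ms reported by rawspeed. This is a rough sketch only: `time_file_read` is a made-up helper name, the 100 MB chunk size just mirrors the benchmark setting mentioned above, and a file that is already in the OS page cache will read much faster than from a cold disk.

```python
import time

def time_file_read(path, chunk_size=100 * 1024 * 1024):
    """Read the whole file in chunks and return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            total += len(chunk)
    return total, time.perf_counter() - start

def throughput_mb_per_s(bytes_read, seconds):
    """Convert a measurement to MB/s (1 MB = 10**6 bytes)."""
    return bytes_read / 1e6 / seconds if seconds > 0 else float("inf")
```

If the plain read of the .RAF file takes only a small fraction of the reported load time, the remainder is spent decoding on the CPU rather than waiting on the SSD.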

Is there an option to aggressively pre-cache?

What is limiting performance? I was planning to build a dedicated rig for vkdt, doing joint processing and culling. Is this load time more to do with the SOC or is it inherent in vkdt?

If I’m in the wrong place, please redirect me.

Thanks again for any input.

Perhaps a better overview of your system would be helpful. Do you have a discrete graphics card? How much RAM and vRAM, etc etc?

no discrete GPU, just the SoC (680M, 12 CU)
32 GB RAM split 50:50 between graphics and applications

images are GFX100S compressed raws

vkdt needs certain vulkan features for good performance; the integrated GPU might lack those, or the vulkan driver doesn’t support them yet.

Check vulkaninfo for shaderImageFloat32AtomicAdd

	shaderBufferFloat32Atomics   = true
	shaderBufferFloat32AtomicAdd = false
	shaderBufferFloat64Atomics   = true
	shaderBufferFloat64AtomicAdd = false
	shaderSharedFloat32Atomics   = true
	shaderSharedFloat32AtomicAdd = false
	shaderSharedFloat64Atomics   = true
	shaderSharedFloat64AtomicAdd = false
	shaderImageFloat32Atomics    = true
	shaderImageFloat32AtomicAdd  = false
	sparseImageFloat32Atomics    = true
	sparseImageFloat32AtomicAdd  = false
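
For anyone checking their own machine, the same flags can be filtered out of saved `vulkaninfo` output with a few lines of Python. This is a minimal sketch that assumes the `name = true/false` layout shown above; `parse_feature_flags` and `missing_atomic_add` are made-up names for illustration, and on a system with several GPUs a later device's value overwrites an earlier one in this simple version.

```python
def parse_feature_flags(vulkaninfo_text):
    """Collect 'name = true|false' lines into a {name: bool} dict."""
    flags = {}
    for line in vulkaninfo_text.splitlines():
        name, sep, value = line.partition("=")
        value = value.strip()
        if sep and value in ("true", "false"):
            flags[name.strip()] = (value == "true")
    return flags

def missing_atomic_add(flags):
    """Return the sorted names of *AtomicAdd features reported as false."""
    return sorted(
        name for name, supported in flags.items()
        if name.endswith("AtomicAdd") and not supported
    )
```

With the output saved to a file (for example via `vulkaninfo > info.txt`), `missing_atomic_add(parse_feature_flags(open("info.txt").read()))` lists the float atomic-add features the driver does not expose.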

So the claim is that the missing feature, as indicated above, would explain the performance?


the 4 seconds load is rawspeed/disk io/cpu time, i can’t do much about it. i think some compressed raw file formats are not super optimised (i have seen similar behaviour on some fuji RAF). there are different raw loading libraries which might do better on this particular file format. @cytrinox wanted to expose some c bindings if i remember correctly :wink:

from what i understand the 6800u is a low-power integrated mobile gpu comparable to something like a GTX970 a few generations back? i think you’ll probably not get much more perf out of this device. histogram collection is using the float atomics (which the device doesn’t support), but even the local laplacian (llap) is incredibly slow here (and doesn’t use atomics).

your images have quite high megapixel count? at least the staging memory at 1G seems to indicate that the images are large. while you seem to have enough video ram on the device to work with it, this is still a limiting factor (memory bandwidth, need to go through all these pixels for every module). to work with such images you probably want a discrete GPU with a bit more memory bandwidth (well, as much as you can get) and at least 8G ram.
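
To put the memory bandwidth argument into numbers, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the pixel count, the 8 bytes per pixel (four 16-bit channels) and the bandwidth figure are placeholders rather than measured values for this device, and `min_pass_ms` is a made-up helper. It only gives a lower bound for a kernel that reads every pixel once and writes every pixel once.

```python
def min_pass_ms(pixels, bytes_per_pixel, bandwidth_bytes_per_s):
    """Lower bound in ms for one pass that reads and writes every pixel once.

    Real kernels usually touch more data (neighbourhoods, several inputs),
    so actual timings will be higher than this bound.
    """
    bytes_moved = 2 * pixels * bytes_per_pixel  # one read plus one write
    return bytes_moved / bandwidth_bytes_per_s * 1000.0

# Illustrative placeholder values only, not measurements:
EXAMPLE_PIXELS = 100_000_000       # roughly a 100-megapixel image
EXAMPLE_BYTES_PER_PIXEL = 8        # assumed: 4 channels x 16-bit float
EXAMPLE_BANDWIDTH = 50e9           # assumed: 50 GB/s effective bandwidth

if __name__ == "__main__":
    ms = min_pass_ms(EXAMPLE_PIXELS, EXAMPLE_BYTES_PER_PIXEL, EXAMPLE_BANDWIDTH)
    print(f"at least {ms:.1f} ms per full-resolution pass")
```

Since every module in the pipeline has to make at least one such pass, the bound scales with both the pixel count and the number of modules, which is why more bandwidth or a lower working resolution helps.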

when working with this GPU it’s probably a good idea to increase the LOD to work in low-res mode in the interactive gui session (in ~/.config/vkdt/config.rc, set intgui/lod:2 or 3 or higher…).
