processing that sucks less?

Update: apparently a driver was missing. After installing nvidia-vulkan-icd, the output is this:

anna@anna-pc:~/vkdt/bin$ optirun ./vkdt -d qvk /media/anna/WINDOWS/Bilder/2019-07-20/P7200083.ORF
[ERR] module shared has no connectors!
[gui] vk extension required by SDL2:
[gui]   VK_KHR_surface
[gui]   VK_KHR_xlib_surface
Xlib:  extension "NV-GLX" missing on display ":0".
Xlib:  extension "NV-GLX" missing on display ":0".
[qvk] dev 0: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] dev 1: Intel(R) HD Graphics (Ice Lake 8x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[ERR] device does not support requested feature shaderFloat64, trying anyways
[ERR] device does not support requested feature shaderInt64, trying anyways
[qvk] dev 2: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] dev 3: Intel(R) HD Graphics (Ice Lake 8x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[ERR] device does not support requested feature shaderFloat64, trying anyways
[ERR] device does not support requested feature shaderInt64, trying anyways
[qvk] picked device 0
[qvk] num queue families: 3
[qvk] validation layer: terminator_CreateDevice: Failed in ICD libGLX_nvidia.so.0 vkCreateDevicecall
vkdt: qvk/qvk.c:95: VkBool32 vk_debug_callback(VkDebugUtilsMessageSeverityFlagBitsEXT, VkDebugUtilsMessageTypeFlagsEXT, const VkDebugUtilsMessengerCallbackDataEXT *, void *): Assertion `0' failed.

However, vkdt does not start. Maybe that is because the wrong device was chosen?

If I recall correctly, vanilla optirun doesn't support Vulkan; you would have to use it with primus-vk.

I have installed primus-vk.

i don’t think that primus-vk is the right tool here. i suspect i’ll need to wire the vulkan setup code correctly and it’ll be just fine.

the part about NV-GLX missing in the second output (after installing a driver? maybe reboot/dkms would have been necessary?) does not sound great though.

Ok. So I just installed Lubuntu 19.10 on a pendrive, and the output is this; it is different from the one on Debian testing. Note, however, that Bumblebee is not installed here.

anna@anna-pc:~/vkdt/bin$ ./vkdt -d qvk /media/anna/WINDOWS/Bilder/2019-07-20/P7200083.ORF
[ERR] module shared has no connectors!
[gui] vk extension required by SDL2:
[gui]   VK_KHR_surface
[gui]   VK_KHR_xlib_surface
[qvk] dev 0: Intel(R) Iris(R) Plus Graphics (Ice Lake 8x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[ERR] device does not support requested feature shaderFloat64, trying anyways
[ERR] device does not support requested feature shaderInt64, trying anyways
[qvk] dev 1: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] dev 2: Intel(R) Iris(R) Plus Graphics (Ice Lake 8x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[ERR] device does not support requested feature shaderFloat64, trying anyways
[ERR] device does not support requested feature shaderInt64, trying anyways
[qvk] dev 3: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] picked device 0
[qvk] num queue families: 1
[qvk] num surface formats: 2
[qvk] available surface formats:
[qvk] B8G8R8A8_SRGB
[qvk] B8G8R8A8_UNORM
[qvk] colour space: 0
X Error of failed request:  BadDrawable (invalid Pixmap or Window parameter)
  Major opcode of failed request:  149 ()
  Minor opcode of failed request:  4
  Resource id in failed request:  0x2a0000c
  Serial number of failed request:  225
  Current serial number in output stream:  231

Without the Nvidia driver, vkdt starts.

I installed the following packages, which pulled in further packages; all in all, more than 150 packages were installed. I am not sure whether all of them are really necessary, but with them I could definitely compile vkdt.

cmake, clang, libpugixml-dev, libvulkan-dev, vulkan-tools, vulkan-utils, clang-tools, glslang-dev, libsdl2-dev, libjpeg-dev, libxml2-dev, glslang-tools
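
On Ubuntu that amounts to roughly this single command:

sudo apt install cmake clang clang-tools libpugixml-dev libvulkan-dev vulkan-tools vulkan-utils glslang-dev glslang-tools libsdl2-dev libjpeg-dev libxml2-dev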

cool, thanks for posting this. you really needed pugixml and libxml2? fwiw these aren't my dependencies, i blame rawspeed for them (also the cmake part), and it's only the raw input dso that links against them. i am using sdl2 and vulkan. also i think you need clang at this point because gcc doesn't agree with me on what's an integer constant and what isn't (in their defense, the c standard doesn't agree with me either).

i just blindly pushed a small hack that checks the vendor ids of your GPUs before choosing a device. apparently it's 0x8086 for intel (haha) and 0x10de for nvidia. if that works, it should pick your nvidia device #3 automatically after a pull/rebuild.
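
in essence the hack does something like this (a sketch, not the literal vkdt code):

// prefer an nvidia gpu (vendor id 0x10de) over intel (0x8086),
// fall back to device 0 if none is found.
#include <vulkan/vulkan.h>
#include <stdio.h>

VkPhysicalDevice pick_device(VkInstance inst)
{
  uint32_t cnt = 16;
  VkPhysicalDevice dev[16];
  vkEnumeratePhysicalDevices(inst, &cnt, dev);
  uint32_t picked = 0;
  for(uint32_t i = 0; i < cnt; i++)
  {
    VkPhysicalDeviceProperties prop;
    vkGetPhysicalDeviceProperties(dev[i], &prop);
    fprintf(stderr, "[qvk] dev %u: vendorid 0x%x\n", i, prop.vendorID);
    if(prop.vendorID == 0x10de) picked = i; // last nvidia device wins
  }
  return dev[picked];
}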

It seems to work. vkdt starts and appears to pick the Nvidia, at least according to the output. But I am not sure about the performance/speed; there is no subjective difference. As far as the performance counter is concerned, I am not sure I am reading the output correctly. If I run with -d perf, I see a lot of numbers, among them something like a total time, which is about 250 ms with the Intel and 150 ms with the Nvidia. But neither loading a single file nor generating the thumbnails in lighttable seems to be faster with the Nvidia.
I do not use Bumblebee/optirun/primusrun here on Ubuntu, but the Nvidia X Server Settings.
There is no vulkan-nvidia-icd package or anything similar in the Ubuntu repo, though; I only see a libnvidia-gl-435.

Loading a file consists of two parts: reading it from disk and decoding it. Both are expensive, and neither gains from using the GPU.

right. i interleave loading and processing in two threads, in an effort to at least be only limited by disk io/rawspeed decoding and not by processing at all. not sure i was very successful at this, need to trash all the kernel caches in between runs to get numbers.
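
for the record, dropping the kernel page cache between runs is something like:

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches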

cool, progress! probably 100ms out of these 150 are disk io, then the numbers start to make sense.

respect for fighting the bumblebee/nvidia madness btw. i tried to get my dual intel/nvidia laptop from 10 years ago to work with some forward-ported legacy driver… but completely failed to even start an x server at all without the nouveau driver (which, of course, does not support vulkan).

The folks at Debienna said the same thing; I replied that I did not find it so difficult to set up.
I think the situation has improved since then: the drivers are available and they are in the repo. But it is true that the computer crashes very often with the nouveau driver.
So far, the only program I actually need the Nvidia for is darktable.

Btw, vkdt still cannot use the Nvidia on Debian testing:

anna@anna-pc:~/vkdt/bin$ optirun ./vkdt /media/anna/WINDOWS/Bilder/2019-01-12/
Xlib:  extension "NV-GLX" missing on display ":0".
Xlib:  extension "NV-GLX" missing on display ":0".
vkdt: qvk/qvk.c:95: VkBool32 vk_debug_callback(VkDebugUtilsMessageSeverityFlagBitsEXT, VkDebugUtilsMessageTypeFlagsEXT, const VkDebugUtilsMessengerCallbackDataEXT *, void *): Assertion `0' failed.

But since it works on Ubuntu, I guess it is a bug in the driver or something similar. It does choose the right device, though:

anna@anna-pc:~/vkdt/bin$ optirun ./vkdt -d qvk /media/anna/WINDOWS/Bilder/2019-01-12/
[ERR] module shared has no connectors!
[gui] vk extension required by SDL2:
[gui]   VK_KHR_surface
[gui]   VK_KHR_xlib_surface
Xlib:  extension "NV-GLX" missing on display ":0".
Xlib:  extension "NV-GLX" missing on display ":0".
[qvk] dev 0: vendorid 0x10de
[qvk] dev 0: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] dev 1: vendorid 0x8086
[qvk] dev 1: Intel(R) HD Graphics (Ice Lake 8x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[ERR] device does not support requested feature shaderFloat64, trying anyways
[ERR] device does not support requested feature shaderInt64, trying anyways
[qvk] dev 2: vendorid 0x10de
[qvk] dev 2: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] dev 3: vendorid 0x8086
[qvk] dev 3: Intel(R) HD Graphics (Ice Lake 8x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[ERR] device does not support requested feature shaderFloat64, trying anyways
[ERR] device does not support requested feature shaderInt64, trying anyways
[qvk] picked device 2
[qvk] num queue families: 3
[qvk] validation layer: terminator_CreateDevice: Failed in ICD libGLX_nvidia.so.0 vkCreateDevicecall
vkdt: qvk/qvk.c:95: VkBool32 vk_debug_callback(VkDebugUtilsMessageSeverityFlagBitsEXT, VkDebugUtilsMessageTypeFlagsEXT, const VkDebugUtilsMessengerCallbackDataEXT *, void *): Assertion `0' failed.

Same here with a one-year-old laptop. Also, I thought Bumblebee was not maintained anymore and delivered degraded performance anyway.

I just read in the Arch Wiki that Bumblebee does not support Vulkan. I'll have a closer look at this.

hm. let me know how this goes. you can also try to run a config through vkdt-cli, the command line interface; it doesn't require the glx extension. i suppose the most terrible thing that could happen is that i'd need to code up something that explicitly inits the swapchain/ui rendering part on one GPU and the processing on the other, copying the result before it is displayed. since i don't have hardware for this, it's unlikely to work out though. also, if i understand correctly, this seems to be squarely the thing that primus-vk does in more generic ways.

Apparently you are right about primus-vk.

I was going to remove Bumblebee from Debian testing to check whether vkdt starts without it. But when I opened the package manager I saw the package primus-vk-nvidia and marked it for removal; before clicking apply, I thought I would check which files it contains, and there I saw /usr/bin/pvkrun. And I thought: what if I try this?

anna@anna-pc:~/vkdt/bin$ pvkrun ./vkdt -d qvk /media/anna/WINDOWS/Bilder/2019-01-12//P1120441.ORF
[ERR] module shared has no connectors!
[gui] vk extension required by SDL2:
[gui]   VK_KHR_surface
[gui]   VK_KHR_xlib_surface
PrimusVK: Searching for display GPU:
PrimusVK: 0x1254bc0: 
PrimusVK: 0x1254e90: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics (Ice Lake 8x8 GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x1254bc0.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce MX250
PrimusVK:   Type: 2
[qvk] dev 0: vendorid 0x10de
[qvk] dev 0: GeForce MX250
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] picked device 0
[qvk] num queue families: 3
PrimusVK: fetching dispatch for 0x155b410
PrimusVK: Creating display device finished!: 0
PrimusVK: fetching dispatch for 0x14974d0
PrimusVK: CreateDevice done
[qvk] num surface formats: 2
[qvk] available surface formats:
[qvk] B8G8R8A8_SRGB
[qvk] B8G8R8A8_UNORM
[qvk] colour space: 0
PrimusVK: Application requested 3 images.
PrimusVK: Creating Swapchain for size: 1920x1080
PrimusVK: MinImageCount: 3
PrimusVK: fetching device for: 0x14974d0
PrimusVK: FamilyIndexCount: 0
PrimusVK: Dev: 0x155b410
PrimusVK: Swapchainfunc: 0x7f1053e01680
PrimusVK: >> Swapchain create done 0;0x1617ef0
PrimusVK: Image aquiring: 3
PrimusVK: Selected render mem: 9;7 display: 0
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating image: 1920x1080
PrimusVK: Creating a Swapchain thread.
PrimusVK: Count: 3
[perf] [thm] ran graph in   3ms
[ERR] individual config /media/anna/WINDOWS/Bilder/2019-01-12//P1120441.ORF.cfg not found, loading default!
[rawspeed] load /media/anna/WINDOWS/Bilder/2019-01-12//P1120441.ORF in 214ms

And so vkdt actually started. Obviously, when started with pvkrun it only sees the Nvidia device, so it can only use the Nvidia. I also checked performance with -d perf: the total time is about 150 ms, as on Ubuntu when the Nvidia is used.

Apparently old darktable can be started with pvkrun as well.

wanted to share some new thoughts:

nvidia prime

i have access to a dual-gpu laptop now (nvidia 1650 max-q/intel something). apparently optirun/bumblebee/etc are legacy and should not be used; the way to do it is nvidia prime (finally they seem to take responsibility for this piece of software themselves, so that may be the reason why it works now). a newish driver (440.96 + a 5.4 kernel) works really well for me, including hybrid xorg with nvidia/intel as well as power management. battery life seems the same on windows and linux.

vkdt will just choose the nvidia gpu during startup in hybrid mode, and also render pictures through xorg using this one. no further tricks/envvars/prime-run/nonsense required. i was really happy.

colour management

as mentioned previously, vkdt renders to 30-bit displays if your xorg is configured accordingly. also, i implemented full-window colour management for imgui: the display profile is now read from a small text file (or srgb is used if not found) and applied to the full window in a final fragment shader. this manages thumbnails and rendered images at the same time, consistently (well, for now i crossed some wires so the thumbnails are actually wrong, but the issue is elsewhere and i just need to fix it).
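
for illustration, that final pass boils down to something like this (a sketch, not the actual vkdt shader; the matrix and gamma names are made up):

#version 450
// hypothetical final display-transform fragment shader:
// convert the linear rendered window to the display's profile.
layout(binding = 0) uniform sampler2D img;  // fully composited window
layout(push_constant) uniform push_t
{
  mat3  rec709_to_display; // 3x3 matrix from the display profile text file
  float gamma;             // display tone response curve, e.g. 2.2
} push;
layout(location = 0) in  vec2 tex_coord;
layout(location = 0) out vec4 frag_color;

void main()
{
  vec3 rgb = texture(img, tex_coord).rgb;                // linear input
  rgb = push.rec709_to_display * rgb;                    // primaries conversion
  rgb = pow(max(rgb, vec3(0.0)), vec3(1.0/push.gamma));  // encode for display
  frag_color = vec4(rgb, 1.0);
}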

thumbnail creation

benchmarked thumbnail creation. i can create 34 bc1-compressed thumbnails for fuji x100t raf (33MB/img) in about 1.8s (give or take; there's a lot of fluctuation in these numbers, so take them with a grain of salt). a directory with 202 canon cr2 (24MB/img) takes 22s. this is true for the GTX 1650 laptop as well as for an RTX 2080 Ti desktop (both with nvme ssd). this averages out at ~50ms per thumbnail (raf), out of which ~20-30ms are spent in disk io/rawspeed on the cpu (though roman's numbers seem to indicate it should be faster). the canon files average out at ~110-140ms/thumbnail, which is about the average rawspeed time here (cr2 decoding is single-threaded, so this is expected). the gpu transfers and compute are interleaved, so they are hidden by the cpu/disk latency (they would sum to about 13ms on the GTX). i didn't max out all possibilities (block compression during output is not parallel, for instance).

Does this imply that you recommend Ubuntu, Arch or openSUSE, since afaik Nvidia prime only works on those or similar systems? Isn't it closed source? I think Bumblebee is still needed for Debian, and maybe Fedora too.

I also have other questions/comments, but that is it for now.

i certainly don't recommend ubuntu or such… fwiw i'm using it on debian. i needed a more recent driver than what's in apt (and yes, it's the closed-source proprietary blob), so i made a mess with the .run directly from nvidia's website, that's all. also i have a custom xorg.conf, like so:

Section "ServerLayout"
	Identifier "layout"
	Option "AllowNVIDIAGPUScreens"
EndSection

Section "Device"
	Identifier "intel"
	Driver "modesetting"
EndSection

Section "Device"
	Identifier "nvidia"
	Driver "nvidia"
	BusID "PCI:2:0:0"
EndSection

Section "OutputClass"
	Identifier "intel"
	MatchDriver "i915"
	Driver "modesetting"
EndSection

Section "OutputClass"
	Identifier "nvidia"
	MatchDriver "nvidia-drm"
	Driver "nvidia"
EndSection

and i made sure all nvidia* modules are in my initramfs (so they are loaded early on).

also i have this small script:

$ cat /usr/local/bin/optimus.sh
#!/bin/sh
xrandr --setprovideroutputsource modesetting NVIDIA-G0
xrandr --auto

called from within my lightdm startup scripts (as recommended in the arch docs), but honestly i doubt it does anything; at least the setprovidersomething madness certainly goes wrong. the driver seems to automatically do the right thing without this. even hdmi output works. most of the work was to arm-wrestle debian into accepting the nvidia driver installation from the .run file.
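
for reference, the lightdm hook i use is the display-setup-script key, i.e. roughly this in /etc/lightdm/lightdm.conf:

[Seat:*]
display-setup-script=/usr/local/bin/optimus.sh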

vkdt decides by itself:

[gui] vk extension required by SDL2:
[gui]   VK_KHR_surface
[gui]   VK_KHR_xlib_surface
[qvk] dev 0: vendorid 0x8086
[qvk] dev 0: Intel(R) UHD Graphics (Comet Lake 3x8 GT2)
[qvk] max number of allocations -1
[qvk] max image allocation size 16384 x 16384
[qvk] max uniform buffer range 134217728
[ERR] device does not support requested feature shaderStorageImageReadWithoutFormat, trying anyways
[qvk] dev 1: vendorid 0x10de
[qvk] dev 1: GeForce GTX 1650 with Max-Q Design
[qvk] max number of allocations -1
[qvk] max image allocation size 32768 x 32768
[qvk] max uniform buffer range 65536
[qvk] picked device 1

and by magic driver/hybrid xorg it works, also displaying an image. and yes, i do get the tearing issues that people report with kernel 5.4; at this point i don't care, however.

Ok, thanks. This does give me a bit of a headache, but obviously it is possible. Maybe I will try it a second time eventually.

gui

as we were talking about gui concepts, here are some more screenshots of the “gui in layers” (like an onion. you need to peel it a lot to get to the core):

surface level, just your custom favourite gui elements (from config file):

all parameters of all modules which are currently connected to the output image in some way (this is very similar to darktable’s right panel in darkroom mode):

pipeline configuration, “node editor”:

these images are straight “scrot” captures (these aren’t mockups, it’s functional), so they are colour managed for my display. they may look dull on your screen, sorry.
