processing that sucks less?

I just compiled vkdt, but when I run it I get a segfault. I debugged it a bit and I think the reason is that the code enables some device features (sparseResidency*) which seemingly are not available for my GPU (GeForce GTX 960M) on driver version 430.26.
The debug messages: output.txt (6.4 KB)
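
For reference, Vulkan lets an application query which optional features a driver actually supports before enabling them at device creation; a minimal sketch (my illustration, not vkdt's actual code):

#include <stdio.h>
#include <vulkan/vulkan.h>

// query what the driver supports before asking for it:
void check_features(VkPhysicalDevice pd)
{
  VkPhysicalDeviceFeatures supported;
  vkGetPhysicalDeviceFeatures(pd, &supported);
  if(!supported.sparseResidencyBuffer)
    fprintf(stderr, "[qvk] no sparseResidencyBuffer support!\n");
  // requesting a feature the device lacks makes vkCreateDevice fail
  // with VK_ERROR_FEATURE_NOT_PRESENT rather than segfault, provided
  // the return value is checked.
}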

oh, thanks for diving into this. i’m not even sure i require the sparse residency features, could try and switch them off (or at least output some more helpful messages, but what’s the fun in that). not sure a 960M is much fun to use either though, sounds like it might be about as slow as my built-in intel. for such devices i should really start to do some basic optimisations.

fwiw it still works here switching off the sparseResidency features in qvk.c. the only one i’m a bit afraid of is the “Aliasing” one though, i’m heavily aliasing the memory of multiple vkImages.
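
to illustrate what i mean by aliasing, roughly this (a sketch, not the actual qvk.c code): several images bound to one allocation, of which only one holds valid pixels at a time:

// two images backed by the same allocation (valid if both fit):
void alias_images(VkDevice device, VkDeviceMemory mem,
                  VkImage img_a, VkImage img_b)
{
  vkBindImageMemory(device, img_a, mem, 0);
  vkBindImageMemory(device, img_b, mem, 0); // aliases img_a
  // only one image holds defined contents at a time; transition the
  // other from VK_IMAGE_LAYOUT_UNDEFINED before writing to it again.
}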

My absolute pleasure. I am very interested in your approach to the next generation raw processing software and can’t wait to try it!

I see :slightly_smiling_face:

Interesting! I get this error:

[ERR] graph does not contain suitable display node main!

More precisely, I disabled these:

.shaderResourceResidency = 0, 
.shaderResourceMinLod = 0,
.sparseResidencyBuffer = 0,
.sparseResidencyImage2D = 0,
.sparseResidencyImage3D = 0,
.sparseResidency2Samples = 0,
.sparseResidency4Samples = 0,
.sparseResidency8Samples = 0,
.sparseResidency16Samples = 0,
.sparseResidencyAliased = 0,

Enabling any one of them leads to a segfault.

progress! how are you running it? the gui does not require the -g parameter any more (it has instead a default-darkroom.cfg that is loaded for all images that don’t have their own history yet, which is currently all of them…). the command line interface (vkdt-cli) still takes the -g argument.

maybe you’re trying a .cfg that wasn’t updated with the code? i’m now looking for a display instance named “main” for the output in the cli, because a graph may have many outputs. in the gui, the “main” display is the center view and “hist” is the histogram window in the top right.

My bad! I was running it with vkdt-cli as:
./vkdt-cli -g examples/histogram.cfg -d all

I ran vkdt (without the -g parameter) with a raw photo as input but again got a segfault. This time it happens at line 1221 of pipe/graph.c, when n=99 and c=1.

I don’t get a segfault for a jpg input file but it only opens an empty (black) window.

hm, that sounds wrong. 100 nodes in your graph? maybe some memory corruption happened before this? hard to debug remotely… i’d maybe try “make sanitize” and rerun to see if memory gets corrupted before the crash, or look at the “-d all” output for anything unusual going on.
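
e.g. (assuming the sanitize target wires in -fsanitize=address, so the first bad access gets reported at its source instead of crashing later):

make clean
make sanitize
./vkdt-cli -g examples/histogram.cfg -d all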

And merge my PR :stuck_out_tongue:

oh, indeed. still not used to these fancy web things…

The llap module creates 89 nodes (there is a nested loop in its create_nodes()). Is this supposed to happen?

Tried. It doesn’t seem to be a memory corruption issue.

Nothing seems unusual in the log messages.

I couldn’t figure out what llap actually does. Is it possible to exclude it from the pipe? I just tried to do so in the cfg file (connected filmcurv to f2srgb and disconnected hist) but got a bunch of vulkan errors.

llap: local laplacian pyramids (local contrast), a notoriously expensive operation, used here as a speed test.

it’s possible to exclude it, not sure what errors you got… you’ll need to connect filmcurv->f2srgb as you did, and make sure the llap is not connected to the output nodes in any form (the graph is executed by pulling dependencies from the displays).
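
a sketch of the rewiring in the .cfg (the instance names here are made up, check your file for the real ones; i’m assuming the usual connect:source-module:instance:connector:sink-module:instance:connector pattern):

connect:filmcurv:01:output:f2srgb:01:input

and drop the llap and hist connections so nothing pulls them in from the display side.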

re: 100 nodes: i allocate memory for 100 modules, so number 99 makes me suspicious. the node count should be more like 300 though.
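
i.e. roughly this situation (hypothetical sketch, names made up):

dt_module_t module[100]; // fixed pool, valid indices 0..99
// touching module[99] is the last legal access; one element further
// corrupts whatever sits behind the pool in memory.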

Thanks, it finally worked!
There was nothing wrong with the code. I just needed to clean the old build: the modules had been built against an older version of module.h for some reason (I figured that out after some gdb debugging).

There are still two GPU features (shaderResourceResidency and shaderResourceMinLod) in the required_features that are not supported by the 960M, but the code works just fine without them.

ah, good to hear. reminds me i need to work on the module api, and clearly define the interface module/core (right now it’s api.h but should include at least the struct definitions of module and graph). still unclear to me which core functionality should be exposed, and then probably as struct of function pointers to avoid re-linking.
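
something in this direction, maybe (hypothetical names, just to illustrate the struct-of-function-pointers idea):

#include <stddef.h>
// the core hands each module a table of entry points at load time, so
// modules call back through pointers instead of linking core symbols:
typedef struct dt_core_api_t
{
  void *(*alloc)(size_t size);
  void  (*log)  (const char *fmt, ...);
}
dt_core_api_t;
// a module’s init would then receive the table:
// int init(void *module, const dt_core_api_t *core);

a module built against a slightly older header would then still call through the same table instead of crashing on stale symbols.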

until then, an occasional make clean is probably a very good idea (a full debug rebuild takes 1.5s here, an optimised one 5s).


quick update:

highlights

spent some time trying to improve highlight reconstruction using the global image (now that it’s cheap to process all of it)



not my picture. these are houz + a professional fire brigade, don’t try this at home, kids.

pipeline

we discussed input pixel formats and which types modules support etc. in this thread. turns out the texture units can nicely normalise this issue away and just hand you floats, whatever quantisation the input has. so now modules support “*” pixel formats, reading float16 or uint16 or whatever while using the very same code.
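
concretely, the VkFormat of the input image decides how the texture unit converts on read, and the sampling code never changes (a sketch under that assumption, not the actual module code; the formats here are examples):

#include <vulkan/vulkan.h>
// choose the format from the input data; everything downstream,
// including the shader, stays the same:
VkFormat format_for_input(int input_is_f16)
{
  return input_is_f16
    ? VK_FORMAT_R16G16B16A16_SFLOAT  // float16, sampled as-is
    : VK_FORMAT_R16G16B16A16_UNORM;  // uint16, normalised to [0,1] on read
}
// glsl side is identical in both cases: vec4 px = texture(img, uv);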

refactor

made the build system support parallel builds of modules, renamed the input and output modules to “i-*” and “o-*”. also tried to make module dependencies a bit more robust, but the module ABI itself is not any more robust yet.


Very nice, could you take a look at https://gitlab.freedesktop.org/mesa/mesa/issues/1987#note_279990

just for the record, i don’t need these shaderResource* features and have removed them from my code now. will push together with some more updates at some point.

but my understanding is that on your machine it still doesn’t work, even without these two features in the requested list?

Thanks. Now everything works as expected.

The real-life use I would see for a node editor in photo software would be the ability to perform advanced operations on masks (assuming each module would get an image input and an alpha mask input), without crowding the masking interface with lots of options (refine edges, drawn, parametric, keyframing, etc.). So one could pick up masking options and stack them, and also wire several modules to the same mask.

And, also, it would give users a visual sense of what the pixelpipe is.

heh, yes. doing experiments along the way, i’d really like to have a node editor already. it’s a bit of a trade-off though. for instance i’m expanding one module to something like this in fine-grained nodes:


which might be a bit cumbersome to work with.

also it’s very helpful that i can visually inspect temporary buffers in nvidia’s nsight gfx, so i don’t continuously need to attach a display node in various places while debugging.

in the long run i think i want a node editor as a side feature, probably not for everyday use.

just wanted to say hi and that i didn’t lose interest in GPU raw image processing.

learning more about the vulkan api every day. i introduced array-type connectors now, to simplify the above graph a fair bit. turns out that for my current purposes i can bind texture arrays to glsl shaders even without any extension (like nonuniformEXT()), so this still works on my 2014 intel GPU.
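
on the api side such an array connector can be a single binding with descriptorCount > 1; indexing it with a constant or dynamically uniform value is core vulkan, only truly non-uniform indexing would need nonuniformEXT(). a sketch (the count is made up):

#include <vulkan/vulkan.h>
// one binding that holds the whole connector array at once:
static const VkDescriptorSetLayoutBinding binding = {
  .binding         = 0,
  .descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER,
  .descriptorCount = 16, // e.g. all pyramid levels of one module
  .stageFlags      = VK_SHADER_STAGE_COMPUTE_BIT,
};
// glsl side: layout(binding = 0) uniform sampler2D img[16];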

comparison: old node graph:


new graph for same modules, but using array types:

as a side effect this makes everything faster, because the array can be worked on in parallel (better GPU utilisation, especially for small tasks). also i’m optimising here and there (mostly for sport, but bringing run times down from 20ms to 7ms for the full raw is fun. there’s also a lot of potential left, i didn’t go all-in on the optimisations yet).

generalised the highlight reconstruction a bit, i might post a comparison to darktable here at some point for a few different scenarios.

tested a few more heavy-weight processing graphs, also with multiple raws as inputs and multiple jpgs as outputs. starts to get clumsy on the intel (say 300-600ms/frame). ui responsiveness could be improved by using the same caching scheme as we do in dt (almost implemented, but not used so far because on a proper GPU it seems like wasted programming effort).
