frustrated with heavy dependencies and slow libraries, i’ve been experimenting with some game technology to render raw image pipelines. in particular, i’m using SDL2 and vulkan. to spur some discussion, here is a random collection of bits you may find interesting or not.
also please note this is just a rough prototype bashed together with very little care and lots of hardcoded things just to demonstrate what’s overall possible or not.
in case the video doesn’t play, here’s a still:
(thanks to andreas for the raw, i stole it from play raw here)
brute force processing the full raw (well, half size, because i don't have any real demosaicing yet), this runs with tearing: there is no vsync, and the pipeline is faster than the display refresh. these are some performance counters from a GTX 1080 (an intel HD 5500 is about 100x slower), but i don't trust the numbers:
[pipe] query demosaic: 0.0031 ms
[pipe] query exposure: 0.002 ms
[pipe] query filmcurv: 0.0031 ms
in any case it seems clear that this is mostly the time it takes to carry the image through the compute shader pipeline; the gpu is completely unimpressed by the actual compute being done.
so far this is implemented as a generic node graph, which can output dot files like this:
and every module is defined by a couple of text files, namely defining connectors:
and module parameters with annotations for gui generation:
x0:float:1:0.0:0.0:1.0
x1:float:1:0.2:0.0:1.0
x2:float:1:0.8:0.0:1.0
x3:float:1:1.0:0.0:1.0
y0:float:1:0.0:0.0:1.0
y1:float:1:0.2:0.0:1.0
y2:float:1:0.8:0.0:1.0
y3:float:1:1.0:0.0:1.0
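for the curious, i read the annotation format above as name:type:count:default:min:max (that's inferred from the example, don't quote me on the actual field order). a toy parser in python:

```python
# hypothetical parser for the param annotation lines above; the field order
# name:type:count:default:min:max is my guess from the example, not taken
# from the actual source
def parse_param(line):
    name, typ, cnt, dflt, lo, hi = line.strip().split(":")
    return {
        "name": name,
        "type": typ,            # only "float" appears in the example
        "count": int(cnt),      # number of elements
        "default": float(dflt),
        "min": float(lo),       # slider range for gui generation
        "max": float(hi),
    }

p = parse_param("x1:float:1:0.2:0.0:1.0")
```

this is all the metadata the gui needs to generate a slider per parameter.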
and a compute shader, which is then automatically compiled into a vulkan command buffer, with one compute pipeline per node. the gui is immediate mode and uses dear imgui for the slider widgets. in fact the image is drawn this way, too, so the output of the compute shaders never leaves the gpu. if you drag a slider, the raw stays on the device and only the rest of the pipeline is executed and the result displayed. added benefit: 30-bit/pixel setups should be straightforward to support.
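the "only the rest of the pipeline is executed" part is just a reachability walk over the node graph: everything downstream of the changed module is dirty, everything upstream keeps its cached result on the device. a rough python sketch (the adjacency list here is made up for illustration, not the actual data structure):

```python
from collections import deque

# sketch of the slider-drag optimisation: when a param on one module changes,
# only that module and everything downstream of it needs to re-run.
# graph maps a node to the list of nodes consuming its output.
def dirty_subgraph(graph, changed):
    dirty, queue = {changed}, deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dirty:
                dirty.add(nxt)
                queue.append(nxt)
    return dirty

# the example pipeline from this post:
graph = {
    "rawinput": ["demosaic"],
    "demosaic": ["exposure"],
    "exposure": ["filmcurv"],
    "filmcurv": ["display"],
}
# dragging a filmcurv slider leaves rawinput/demosaic/exposure untouched:
dirty_subgraph(graph, "filmcurv")  # {"filmcurv", "display"}
```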
the setup we’ve been looking at above comes from this config file:
module:rawinput:01
module:demosaic:01
module:exposure:01
module:filmcurv:01
module:display:01
connect:rawinput:01:output:demosaic:01:input
connect:demosaic:01:output:exposure:01:input
connect:exposure:01:output:filmcurv:01:input
connect:filmcurv:01:output:display:01:input
param:exposure:01:exposure:2.0
param:filmcurv:01:y2:0.8
in the video i’m using a fake demosaic module, exposure, and a fake filmic curve remotely similar to what aurelien has done for dt.
to give you an idea how advanced (or not) this is, here's a screenshot from debugging the parametric curve (a monotone hermite spline) with python:
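for reference, this is roughly what such a python model of the curve looks like: fritsch-carlson limited tangents keep the hermite interpolant monotone between the knots. this is my own sketch of the standard technique, not the actual filmcurv code:

```python
import bisect, math

# monotone cubic hermite spline through knots (xs, ys), with tangents
# limited per fritsch-carlson so the interpolant never overshoots
def monotone_hermite(xs, ys):
    n = len(xs)
    # secant slopes between neighbouring knots
    d = [(ys[i+1] - ys[i]) / (xs[i+1] - xs[i]) for i in range(n - 1)]
    # initial tangents: average of secants, zero at local extrema
    m = [d[0]] + [(d[i-1] + d[i]) / 2 if d[i-1] * d[i] > 0 else 0.0
                  for i in range(1, n - 1)] + [d[-1]]
    # limit tangents to the circle a^2+b^2 <= 9, sufficient for monotonicity
    for i in range(n - 1):
        if d[i] == 0.0:
            m[i] = m[i+1] = 0.0
        else:
            a, b = m[i] / d[i], m[i+1] / d[i]
            s = a*a + b*b
            if s > 9.0:
                t = 3.0 / math.sqrt(s)
                m[i], m[i+1] = t*a*d[i], t*b*d[i]
    def f(x):
        # find the interval containing x (clamped to the knot range)
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        h = xs[i+1] - xs[i]
        t = (x - xs[i]) / h
        # cubic hermite basis functions
        h00 = (1 + 2*t) * (1 - t)**2
        h10 = t * (1 - t)**2
        h01 = t*t * (3 - 2*t)
        h11 = t*t * (t - 1)
        return h00*ys[i] + h10*h*m[i] + h01*ys[i+1] + h11*h*m[i+1]
    return f

# knots matching the x0..x3/y0..y3 defaults from the param file above
curve = monotone_hermite([0.0, 0.2, 0.8, 1.0], [0.0, 0.2, 0.8, 1.0])
```

the same evaluation loop translates almost line by line into glsl for the shader.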
i really like the performance i can get out of this, and i also like how there’s only one code path (glsl shaders) as opposed to three (i386, sse, opencl). seems these 2D image processing things map extremely well to GPU shaders, even on my 5yo intel laptop. i mean this in contrast to how well our opencl code path would have worked on this device. i’m also quite happy to get rid of a ton of dependencies on the way.
let me know your thoughts, i’d be interested in anything related to new and faster pipeline/ui.
i’d attach my fake-filmic glsl code just so you could get an impression of what it would be like to write an iop in such a framework, but it seems shader code is not among the allowed file types. let me know if you’re interested and i’ll paste it inline.