Maybe it’s more visible at fit-to-screen view.
Apparently, the lens module makes some photos blurry. Not every photo. I don’t know what it depends on.
The first screenshot is without the lens module, the second is with the lens module, I did not change any settings in the module.
Edit: I think this module needs to be rather late in the pixelpipe/graph.
the lens corrections should be before crop if you’re trying to correct actual lens distortions, so i think your position in the pipeline is good.
the lens module does some rather careless resampling. i should use the derivative of the distortion function to compute the kernel size/smoothness and potentially use nearest neighbour resampling for minification. who cares about aliasing for still photography, right?
the crop module has a special case when no rotation is used, it’ll just copy pixels over instead of applying a resampling kernel.
Shouldn’t vkdt be in the AUR?
hm i don’t know anything about arch, so i would be surprised if aur had vkdt.
i just merged a branch that implements support for evaluating gmic’s gaussian/poissonian resnet as gpu shaders. i’ve been working together with @David_Tschumperle on this. really he’s done all the work and i just ported it over to the gpu, which wouldn’t have been very easy without his support in debugging my broken implementation.
a few initial observations: my implementation is stupid:
- it runs out of memory very quickly, does not attempt any tiling
- it fetches everything from global memory and is thus very slow
- it’s split into way too many small kernels
and currently it evaluates a 1080p image in ~300ms on a low-end laptop GTX 1650.
this is really just a starting point now, i need to do a faster implementation and then likely tweak the network architecture for real-time performance and temporal stability (video).
I was just going to ask which module is the most resource heavy, in order to run a “performance test” by stacking lots of instances on top of each other and seeing when vkdt slows down. I just tried it with 5 instances of deconv, but this Nvidia 1660 Super still does not really slow down. It takes about half a second to update the preview when I move a slider, with 5 instances of deconv.
heh, yeah if you want a dead slow one that probably crashes because memory, try
cnn, see instructions here: https://github.com/hanatos/vkdt/tree/master/src/pipe/modules/cnn
That’s fast! (compared to my CPU implementation).
Also, I’m thinking about retraining the network because I’m not completely happy with some of the results it renders in a few particular cases. Swapping in the new weights once it’s retrained should not be an issue, of course.
yes, and i’m expecting maybe it’ll make sense not only to change the architecture/retrain but also maybe implement something like your backpropagation on gpu as well, potentially even the whole solver.
so yes, not recommended to use in production as it is now. so far i didn’t push it to the obs packages either.
…and in other news, i just wired imgui’s gamepad navigation in vkdt:
edit: let’s try webm encoding instead
(sorry i had to squeeze the image into the top left corner of my screen so the x11grab device would allow me to capture this video fast enough).
this way i can do simple tasks from the sofa, without keyboard or mouse or table to support them. so far i think it’s useful for minor adjustments of parameters in an existing pipeline, and mostly for star rating/colour labeling of collections.
unacceptable. but i guess you asked for it, and it’s not really an actual use case.
OBS can build for Arch too.
I keep asking myself whether a convertible with touchscreen + pen isn’t the way to go for image editing on the go / on the couch (without mouse and keyboard). I guess it’d feel quite natural when the interface is so responsive. Does this run on Intel Xe graphics or a Ryzen Vega iGPU?
it’s just vulkan, so yes. but i didn’t try, and the weaker the gpu the less fun i suppose. though i thought @betazoid’s 1660 would be stretching it and apparently it’s workable.
about the gmic cnn for denoising:
i implemented a first version of a shmem/tiled megakernel instead of the layers in global memory with individual compute shader kernel calls.
runtimes for my 1080p test case go down to ~217ms with a 24x24 tile size, my laptop doesn’t run larger tiles than that. not sure i’m handling the tile overlap quite correctly, i hope i’m not (because i’m seeing tile artifacts even with a 2px border). well at least this version does not run out of memory.
if you like apples and oranges, here’s a comparison ~200ms cnn vs ~4ms wavelets on this image (1080 pfm input, something wrong with highlights and noise profile in the wavelet case because not a raw image):
the wavelets would clearly profit from some better treatment of edges vs smooth areas (there is special code that works better for one or the other, but codepath selection doesn’t work).
the network shows promise, is overall a little smooth and has strange ringing at edges. not sure this is worth the 50x cost. maybe a different arch (u-net, decimated memory) could be implemented faster.
How is it possible to change the font size of the gui? I guess it is necessary to add some text in config.rc?
it scales with the height of the window. why would you need to change it?
hmm right, it’s only read on first startup because i was too lazy to re-rasterise the font every time the window is resized.
wow, you filter by create date? does that work?