librtprocess - quo vadis

ggbutcher · January 26, 2019, 3:53pm

Fair enough, I was thinking of how to present librtprocess completely through librtprocess.h. It’s really a trade-off between how expressive the function signatures can be and how much has to go in the readme…

ggbutcher · January 26, 2019, 3:56pm

BTW, I want to thank @heckflosse publicly for his patience in dealing with me through this little endeavor. He spent a late night doing it, at that, staying up until 2am his time to entertain my ruminations.

darix · January 26, 2019, 4:10pm

So imagine we found a bug in librtprocess. As a packager, I would apply that bug fix to the librtprocess package and ship that to all the users. They restart their apps, and ALL apps using the library are fixed.

With a static library I have to ship the library (so a user can rebuild local apps against the fixed library) and every app using that library.

This makes things like golang/rust a lot of pain for distros. A golang security fix means shipping many, many packages.

You should see shared linking as the norm and static linking as the exception.

darix · January 26, 2019, 4:17pm

I was just wondering if we could use some of the code for opencl handling e.g. from darktable and move that into the library.

CarVac · January 26, 2019, 4:18pm

Could you explain how it works?

darix · January 26, 2019, 4:19pm

I am sure @houz or @hanatos are better suited for that

ggbutcher · January 26, 2019, 4:41pm

So, what I’d recommend is that this lib take on a slightly larger scope, that of a raw image library: a raw-specific data organization (float **) and the relevant operations on that organization, like demosaic, camera_whitebalance, denoise; the sort of things you’d want to do prior to demosaicing an RGGB (or other pattern) array to an RGB-pixel array. Those operations have to be organized around the mosaic anyway, distinct from their post-demosaic RGB equivalents.

I envision ‘dcraw-nextgen’, wherein a rawspeed-delivered unsigned short mosaic array is converted to float **, then that array is sent through whatever raw operations one would contemplate in whatever order desired, then demosaiced to a ‘regular’ RGB array for saving.
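The first step of that pipeline, turning an unsigned short mosaic into floats, could look roughly like the sketch below. The function name `mosaic_to_float`, the black/white level parameters, and the choice of a [0, 1] output range are all assumptions made for illustration; none of them come from rawspeed or librtprocess, and a real library may well expect a different range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a rawspeed or librtprocess function): convert a
// 16-bit mosaic, one sample per photosite in row-major order, to floats.
// 'black' and 'white' are assumed sensor levels; output is scaled to [0, 1]
// for samples between them.
std::vector<float> mosaic_to_float(const std::vector<std::uint16_t>& raw,
                                   std::uint16_t black, std::uint16_t white)
{
    assert(white > black);
    const float scale = 1.0f / static_cast<float>(white - black);
    std::vector<float> out(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        // Clamp to the black level so samples below it map to 0.
        const std::uint16_t v = raw[i] > black ? raw[i] : black;
        out[i] = static_cast<float>(v - black) * scale;
    }
    return out;
}
```

The result is still a mosaic, just in float, so the raw-domain operations can run on it in any order before the demosaic.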

FWIW…

CarVac · January 26, 2019, 5:27pm

Earlier on in the library’s life we had it install jaggedarray.h but decided against it when I modified Filmulator’s datatype to be accessed as a float**, thus making it compatible.

Do people think we should install jaggedarray in case someone just starting out with a new editor wants a compatible datatype?

hanatos · January 26, 2019, 5:38pm

i very much agree to the larger scope. fwiw i have friends in the aswf (https://www.aswf.io) who indicated they might be interested in supporting/using/contributing to an open source raw/colour library.

re: float** everything more complicated than a contiguous block of memory (i.e. float*) sounds suspiciously bloaty to me. of course there’ll need to be sidecar information about the format/strides/region of interest etc.

was that “how it works” question about opencl and you want me to say something about it? if so, at what level of detail?

CarVac · January 26, 2019, 5:50pm

At least in Filmulator’s implementation, it is in fact a contiguous block of memory but it’s accessed via float**.
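For anyone wondering how a contiguous block can also be reached through float**, here is a minimal sketch. `RowView` is a hypothetical name invented for this example; it is not Filmulator's type and not jaggedarray.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous pixel buffer plus a table of row
// pointers into it.
struct RowView {
    std::vector<float> pixels;  // height * width floats, row-major
    std::vector<float*> rows;   // rows[y] points at the first pixel of row y

    RowView(std::size_t width, std::size_t height)
        : pixels(width * height, 0.0f), rows(height)
    {
        assert(width > 0 && height > 0);
        for (std::size_t y = 0; y < height; ++y)
            rows[y] = pixels.data() + y * width;
    }

    // A copy would leave 'rows' pointing into the source buffer, so forbid it.
    RowView(const RowView&) = delete;
    RowView& operator=(const RowView&) = delete;

    // Usable wherever a float** is expected.
    float** data() { return rows.data(); }
};
```

With `RowView img(w, h); float** p = img.data();`, writing `p[y][x]` touches the same memory as `img.pixels[y * w + x]`, so the same image can be handed to code that wants either form.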

As far as how it works, I guess I want to know about a) how does it detect the available capabilities and b) would we have amaze_cpu and amaze_gpu or one that automatically chooses GPU if it’s available?

heckflosse · January 26, 2019, 5:52pm

I’m not against using float *. The float ** can be generated in librtprocess.

ggbutcher · January 26, 2019, 6:00pm

If the library were to include other ops besides demosaics (oh, I did see CA_correct()), one would benefit by doing float * → float ** in a separate step, so as not to have to go back and forth to float * in a chain of ops.

float ** is beneficial to performance within the ops. To walk a float * image with respect to the row-column relationship requires computation of a position for every pixel access, where float ** just refers to pixel[y] and the pointers just go there, no x+y*w math required. A small burden, but they add up over the total work.

hanatos · January 26, 2019, 6:14pm

in dt, there’s quite a bit of code for detection and dynamic loading of libopencl. i think using compute shaders in vulkan most of that would go away (vulkan has run time loading of symbols anyways and it will fall back to cpu/intel gpu silently in the backend).

in dt we have process, process_sse42 and process_cl which are hand written to suit the capabilities of the device. i’m kind of hoping for a single code path in the future that would be transparently compiled down to suit our needs. hence my statement above that i’d like to play around more with vulkan. the open question is whether it would be fast enough even if it runs pretty much software emulated or on a poor gpu (when compared to the custom cpu code paths).

hanatos · January 26, 2019, 6:17pm

i challenge the performance improvement. and i’m sure you can beat pointer walking by more clever index computation. usually there would be a p++ (or p+=4) in the innermost loop, without the multiplication (also languages like compute shaders, ispc, or opencl would do most of that for you when multithreading).
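To make the two access styles concrete, here is a small sketch on a plain float* image. Both function names are made up for the example, and it says nothing about which one is faster in practice; optimizing compilers often turn the indexed form into the pointer-walking form on their own.

```cpp
#include <cassert>
#include <cstddef>

// Index form: the source computes x + y * width for every pixel access.
float sum_indexed(const float* img, std::size_t width, std::size_t height)
{
    assert(img != nullptr);
    float sum = 0.0f;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            sum += img[x + y * width];
    return sum;
}

// Pointer-walking form: one multiplication per row, then p++ in the
// innermost loop.
float sum_walked(const float* img, std::size_t width, std::size_t height)
{
    assert(img != nullptr);
    float sum = 0.0f;
    for (std::size_t y = 0; y < height; ++y) {
        const float* p = img + y * width;
        const float* end = p + width;
        while (p != end)
            sum += *p++;
    }
    return sum;
}
```

Both visit the pixels in the same order and return the same value; the only difference is where the address arithmetic is written.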

heckflosse · January 26, 2019, 6:31pm

@hanatos

I agree that pointer calculations are not necessarily slower than float** access (the row-pointer table also needs space in cache).
But rewriting for example amaze code from float** to pointer calculations imho is very error prone and currently not an option. At least I don’t want to waste time on that.
On the other hand, keeping the interface simple (float* instead of float**) requires only some additional work inside librtprocess. That I am willing to do.

hanatos · January 26, 2019, 6:36pm

that all sounds very reasonable. thanks a lot!

ggbutcher · January 26, 2019, 6:38pm

But, float * requires a pointer calculation for a pixel at y,x. And, if I have it right, float ** would not, as pix[y] would directly refer to the location. Or, as is more frequent recently, am I missing something?

heckflosse · January 26, 2019, 7:07pm

In processing dominated by float calculations, the integer work to calculate pointers will most likely be done without impact on performance, because the integer unit runs independently of the float unit.

heckflosse · January 26, 2019, 10:44pm

though in the current first steps imho we should (at least I will) concentrate on making the current demosaicers and raw preprocessing steps (raw ca correction, maybe something else too) available in librtprocess.

That sounds good.

ggbutcher · January 27, 2019, 12:28am

In the librtprocess branch of rawproc, there is now a fairly functional incorporation of librtprocess demosaics:

The build instructions aren’t updated yet, but ./configure --enable-librtprocess … checks for the installed library and enables the GUI selectors for the algorithms. --enable-demosaic has been removed; rawproc now includes by default a demosaic tool with the toy “half” algorithm.
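For readers who haven't met a "half" demosaic, the usual idea is that each 2x2 mosaic block becomes one RGB pixel, so the output is half the width and height. The sketch below shows that idea only; it assumes an RGGB pattern and even dimensions, the names `RGB` and `half_demosaic_rggb` are invented for the example, and it is not the code in rawproc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RGB { float r, g, b; };

// Assumes an RGGB pattern: R at (even row, even col), G at the two mixed
// positions, B at (odd row, odd col). Output is (width/2) x (height/2),
// stored row-major.
std::vector<RGB> half_demosaic_rggb(const float* const* mosaic,
                                    std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    std::vector<RGB> out;
    out.reserve((width / 2) * (height / 2));
    for (std::size_t y = 0; y < height; y += 2) {
        for (std::size_t x = 0; x < width; x += 2) {
            const float r = mosaic[y][x];
            // Average the two green photosites of the block.
            const float g = 0.5f * (mosaic[y][x + 1] + mosaic[y + 1][x]);
            const float b = mosaic[y + 1][x + 1];
            out.push_back({r, g, b});
        }
    }
    return out;
}
```

No interpolation across blocks happens here, which is why it is cheap and why it only makes sense as a quick preview or fallback.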

Most of the librtprocess demosaic algorithms are now exposed in the demosaic tool, with the exception of xtransfast_demosaic() and lmmse_demosaic(). Also not yet exposed are GUI controls to enter algorithm-specific parameters, which are hard-coded for the time being.

In the code, one will find the librtprocess calls in src/gimage.cpp, in the gImage::ApplyDemosaic() method, about line 2191. Scroll past the half, half-resize, and color conditionals; there, I demonstrate three data marshalling schemes:

1. In-line malloc/free of float** arrays
2. JaggedArray declarations, using jaggedarray.h copied from librtprocess/include to /usr/include
3. RT_malloc() and RT_free() helper functions I wrote to do the memory management. Those functions can be found just prior to the gImage::ApplyDemosaic() method header/commentary.

I’m going to keep it in the librtprocess branch for a while, before merging it with master. I’m also working on porting the ahd_demosaic() routine from RawTherapee, about 80% complete. Edit: should point out, that work is in a fork of CarVac/librtprocess, so eventually I’ll, what, request a pull? push a patch? I’ve not done this before…