Possibly a new deconvolution module for Darktable

anon41087856 · November 12, 2017, 11:19pm

No, you really want to access the RAW data and apply it before anything else to get the best out of this method. It’s more low-level signal processing than a filter.

I’m currently working on a Cython (hybrid Python/C) implementation, and I already reduced the computing time by a quarter by just optimizing the total variation regularization term calculation, so I’m confident that with a tiled/multithreaded C FFT implementation, we could divide the global execution time by at least 2.

The 12 Mpx image took me an hour on 1 process because running it on 3 triggered a memory error. Playing with lower-level layers of code allows me to reduce both the multithreading overhead and the RAM footprint.
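For readers wondering what a total variation term looks like in practice, here is a minimal NumPy sketch. It assumes the 8-neighbour form described later in this thread; the function name `tv_8_neighbours`, the periodic borders and the `eps` smoothing are illustrative assumptions, not the code from the project.

```python
import numpy as np

def tv_8_neighbours(img, eps=1e-12):
    """Total-variation-like penalty built from differences to the 8 neighbours.

    Assumptions (not from the original post): periodic borders via np.roll,
    and diagonal differences divided by their squared distance (2).
    """
    img = np.asarray(img, dtype=np.float64)
    squared = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the pixel itself
            diff = np.roll(img, (dy, dx), axis=(0, 1)) - img
            squared += diff ** 2 / (dy * dy + dx * dx)
    # eps keeps the square root differentiable where the image is flat
    return np.sqrt(squared + eps).sum()
```

A flat image gives a value close to zero, while noise or ringing raises it, which is why penalising this term damps abnormal gradients.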

Magnade · November 12, 2017, 11:31pm

I wasn’t too worried about Python; its load time is a bit painful for certain types of things, but it can be fast if optimized correctly. It is used very often in scientific research, after all.

Magnade · November 12, 2017, 11:31pm

ah then yeah needs to be in the raw processor. does darktable do caching of the output of a filter if you modify things down the stack during editing?

anon41087856 · November 12, 2017, 11:33pm

Research is not engineering. Scientists try to make things possible, engineers try to make them usable.

Python is widely used because it’s fast to code, but a nightmare to optimize. So it’s more a prototyping language.

afre · November 12, 2017, 11:44pm

I disagree. Much of the research is multidisciplinary with partnerships, consultation and multi-talented individuals. Many scientists are themselves accomplished engineers and software devs. If the processing is this intensive, even with tiling and multi-threading, it would still be impractical. That is why researchers come up with fast approximations that build on existing research and efficient novel methods.

anon41087856 · November 12, 2017, 11:54pm

Yeeeahh, weeeell, from what I have seen, the guys let 255×255 px pictures run for 45-90 min. The result is amazing, but I won’t call that efficient.

heckflosse · November 13, 2017, 12:26am

Absolutely. See the original Amaze demosaic from Emil (Scientist) and the current implementation. Though the Amaze implementation from Emil was usable too!

afre · November 13, 2017, 12:55am

Sorry for being off-topic. I just think that it is inappropriate to generalize what people can do. We ought to recognize people’s efforts for what they are. If people had the time, energy and interest to refine what they started, I am sure they would.

Back on topic. @anon41087856 What resources are you working with BTW?

anon41087856 · November 13, 2017, 1:08am

I see nothing inappropriate in acknowledging that scientists and engineers have different jobs and priorities. No offense intended, but that’s just the way it is. The scientist “sells” papers, the engineer sells products to ever-complaining clients. Some guys do both, but that’s a minority.

As for generalizing, it’s a funny thing to say since that’s the only thing you do in science. Otherwise every model would be a look-up table and every curve fitting would become inappropriate.

You mean the papers or the gear? Both are referenced on the GitHub page of the project.

afre · November 13, 2017, 1:26am

Okay, I see it; just the CPU info: Intel® Core™ i7-2670QM CPU @ 2.20GHz. Would be nice to know the RAM, etc.

I won’t pursue the discussion on scientists v engineers anymore as it is not productive. There is too much elitism as is in both fields (in general). Also, there is no need to misrepresent what I meant by generalization.

anon41087856 · November 13, 2017, 1:34am

16 GB of RAM @ 1333 MHz and an SSD @ 440/460 MB/s measured R/W rate. Linux 4.13/Ubuntu 16.04. The laptop heat dissipation is regulated by Intel Thermald, so the CPU gets throttled from 85°C, which happens quickly.

agriggio · November 13, 2017, 6:01am

I think this is a wise decision. fwiw though, I’m fully with you.

houz · November 13, 2017, 9:55am

Yes.

anon41087856 · November 15, 2017, 3:16am

Some updates here, I have ported my code into Cython and optimized low-level routines and array operations.

I found out that 75 % of the computation time is spent in FFTs because the convolution is done at the Python level. Optimizing a C routine to do all the rfft/irfft and FFT shifts should give a good speed-up. In addition, about 13 % of the time is spent creating/copying intermediate NumPy arrays.

However, I get weird results out of my custom FFTW convolution product.
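As an illustration of the rfft/irfft convolution discussed above, here is a minimal NumPy sketch. It is not the code from the project; the function name `fft_convolve2d` is hypothetical, and the borders are assumed periodic (circular convolution). A slow but simple reference like this can be handy for cross-checking the output of a custom FFTW routine.

```python
import numpy as np

def fft_convolve2d(image, kernel):
    """Circular 2-D convolution of a real image with a small kernel via real FFTs."""
    h, w = image.shape
    kh, kw = kernel.shape
    # Zero-pad the kernel to the image size...
    padded = np.zeros((h, w), dtype=np.float64)
    padded[:kh, :kw] = kernel
    # ...then shift it so the kernel centre sits at index (0, 0).
    # Skipping this shift translates the whole output by half the kernel size.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    spectrum = np.fft.rfft2(image) * np.fft.rfft2(padded)
    # Passing s=(h, w) restores the original shape, including odd widths.
    return np.fft.irfft2(spectrum, s=(h, w))
```

Convolving with a centred delta kernel should return the input unchanged, which makes a quick sanity test for the padding and shifting steps.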

anon41087856 · November 15, 2017, 10:14am

Some newer results from a real motion + lens blur. The output could be tweaked more: as the blur is quite large, some more iterations would be required to get a perfectly sharp image. The point here is we didn’t add weird artifacts.

This took 2h45 but the CPU reached only 17 % of use, so I suspect some I/O issues.

hanatos · November 15, 2017, 1:12pm

very encouraging results! your deconvolution looks a lot cleaner than what i’ve seen so far using off the shelf algorithms, so that’s great.

a question about your blur kernel. do you estimate that as constant over the whole image or is it locally adaptive? i’m wondering because in your images the quality of the result seems to vary a bit over the image plane.

how large are your kernels for the convolution btw? sure it’s faster to do that in fourier space than in the usual spatial domain at all?

anon41087856 · November 15, 2017, 2:12pm

Thanks!

The blur kernel is estimated for each RGB channel separately and is constant over the image. The major issue with variable kernels is merging the image tiles properly afterwards, and I haven’t dug into that for now.

My masked variant lets me compute the blur on only a part of the picture; it’s faster and more reliable when several motion blurs are added.

In this image, the kernel size was 21 px.
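On the Fourier vs spatial question, a back-of-the-envelope operation count can help. The sketch below uses textbook estimates only: the constant of 5 operations per point per FFT and the count of 3 transforms per convolution are assumptions, and real timings depend heavily on memory access, padding and the FFT library.

```python
import math

def spatial_macs(n_pixels, kernel_size):
    """Multiply-accumulates for a direct convolution with a square kernel."""
    return n_pixels * kernel_size ** 2

def fft_ops(n_pixels, n_transforms=3, ops_per_point=5):
    """Rough operation count for an FFT convolution.

    Assumptions: ~ops_per_point * N * log2(N) operations per transform,
    and 3 transforms (image, kernel, inverse) per convolution.
    """
    return n_transforms * ops_per_point * n_pixels * math.log2(n_pixels)

n = 12_000_000  # 12 Mpx image, as mentioned earlier in the thread
k = 21          # kernel size quoted above
print(f"spatial: {spatial_macs(n, k):.2e} MACs, FFT: {fft_ops(n):.2e} ops")
```

With these assumptions the two estimates land in the same order of magnitude for a 21 px kernel, so the answer depends on implementation details; the FFT cost does not grow with kernel size, whereas the spatial cost grows with its square.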

sankos · November 24, 2017, 11:27pm

I’ve just watched your mini-lecture. Very informative, thank you very much, even with my rusty French and a general humanistic bias. Towards the end you suggest that for cameras without the AA filter the Sharpen module in darktable is not so good. I find that it gives me good results when I set it to Lab lightness blending (to avoid coloured haloes) or darken blending (to avoid light halos); and exclude extreme shadows and highlights from being sharpened by parametric masking. What do you think about that?

And kudos for the deconvolution work because it looks like that is the future of micro-contrast enhancement, but how come the RawTherapee implementation is so much faster than yours?

heckflosse · November 24, 2017, 11:40pm

these are completely different approaches.
RT deconvolution sharpening works on one channel (L from Lab) whereas the method from Aurélien (at least if I understood his comment correctly) works separately on the three RGB channels.

anon41087856 · November 25, 2017, 7:36pm

Hi @sankos and @heckflosse!

I have tested your hack on the sharpening module, with the Lab blending, but I still find it too harsh for my taste on the D810 pictures. Maybe just a matter of taste, but anyway, the unsharp mask remains a quick and dirty way to make the local contrast pop up.

I have never used the RT deconvolution, but if the deconvolution is performed as @heckflosse suggests, only on the L channel, well, you have one third of the computations I do. The benefit of performing the deconvolution on the RGB channels is that you often have a spatial phase change between the 3 channels, since light transmission and deviation inside a lens are wavelength-dependent. In particular, above f/11, you get more diffraction in the red (800 nm) than in the blue (400 nm). So dealing with the color signals helps correct that, but doing it on the L channel is definitely possible as an approximation.
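To put rough numbers on the wavelength dependence: for an ideal circular aperture, the Airy disk diameter (out to the first dark ring) is approximately 2.44 · λ · N, with λ the wavelength and N the f-number. The helper below is a hypothetical illustration using the two wavelengths quoted above; it ignores lens aberrations and the sensor's colour filter response.

```python
def airy_disk_diameter_um(wavelength_nm, f_number):
    """Airy disk diameter (first dark ring) in micrometres for an ideal lens."""
    wavelength_um = wavelength_nm * 1e-3
    return 2.44 * wavelength_um * f_number

for label, wavelength in (("blue", 400), ("red", 800)):
    d = airy_disk_diameter_um(wavelength, 11)
    print(f"{label} ({wavelength} nm) at f/11: {d:.1f} um")
```

Since the diameter scales linearly with wavelength, the diffraction blur at 800 nm is twice as wide as at 400 nm for the same aperture, which is one reason a single kernel for all channels is only an approximation.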

Also, what I do is a blind deconvolution with regularization, and I believe RT performs a regular Richardson-Lucy. The regularization performs 8 gradient evaluations per iteration (on the 8 neighbouring pixels) and damps the pixels having an abnormal gradient, which helps avoid noise amplification and ringing. The blind aspect means you don’t need to pass a blur profile to the algorithm (and possibly give a wrong one, which could only diverge after a certain number of iterations); the algorithm estimates the blur itself and refines it at every iteration. But that’s 2 additional heavy operations to perform. So that’s the cost of the accuracy.

And finally, my program is not fully optimized and is written in an interpreted language for prototyping purposes, whereas the RT version should be compiled, so much more efficient.