Exporting/importing photos in full 32-bit representation (to train a neural network for image denoising)

Not sharpened. The issue was colour space.

How many series of test sets will be enough? And 1 file for each ISO?

This is from my 7D. The software removed CA also.



Unfortunately I can’t say how many test sets it takes to have enough influence.

My initial experiments were satisfying with a total of about 35 sets but you should need much less than that to fine-tune with the existing data. Maybe 5? (I’ve also recently added 3 full-frame images with a Z6 that haven’t been part of a training yet, they would probably be helpful too.)

1 file per ISO indeed. You don’t need every ISO for every set, maybe >= 5 different values. Different low-ISO values are important too so that the network learns that it should leave clean parts mostly intact.

Is this a noisy-denoised pair or different ISO settings?

Noisy-denoised pair. ISO 800.

  1. Noisy picture
  2. NR in darktable (the best I could)
  3. Neural network

9 seconds with RTX 2080 8GB at 85% activity and half memory. My second card, GTX 1060 3GB was not used.


I like it :slight_smile: It seems to be doing a good job at keeping the scratches and dents while smoothing what needs to be. It hasn’t properly learned this shade of blue-gray though.

I will see what I can do during the week to take some pictures.


Not sure, but you may find some useful info in these threads: how you set the profiles, and which histogram profile you use, could affect the values output from the pipeline that you use as a visual reference when evaluating your histograms…

This was a big discussion here also


For anyone who wants to compare my first result with 7D

Is it possible to use two GPUs? I tried --cuda_device 0,1 without success.

No, it’s a PyTorch feature but I’ve never implemented it. (Only one Nvidia GPU on hand.)


thanks for making your data available! i think your results look really excellent. to make this practical i think it needs to be a lot faster though. do you think the process could be sped up?

it seems the u-net architecture you’re using isn’t all that different from a few scales of wavelet transform in terms of compute. you did replace the 3x3 by 5x5 convolutions though if i understand that correctly.

does the network really require all the 1024 feature channels in the lowest resolution? i mean these guys did it in under 100ms for a 720p image (2017) and use only ~100 feature channels (https://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf). does the quality decrease much?

would it make sense to try and train intel’s denoiser with your data, for increased portability? (https://www.openimagedenoise.org/)

your input is linear demosaiced rgb, black point already subtracted, right? wouldn’t it be easier for a denoising algorithm to work on data before clipping at the black point? i mean originally black noise has expectation at zero, but after clipping away the negative values the expectation is > 0 and you’ll have to guess that this was probably in fact still black…
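The bias described above can be checked numerically. The sketch below is illustrative only: it assumes zero-mean Gaussian noise with an arbitrary standard deviation on a truly black pixel, and `clipped_mean` is a hypothetical helper, not part of the denoiser's code.

```python
import math
import random


def clipped_mean(sigma, n=200_000, seed=0):
    """Mean of zero-mean Gaussian noise before and after clipping negatives to 0."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, sigma) for _ in range(n)]
    raw_mean = sum(samples) / n
    # Clipping at the black point discards the negative half of the distribution.
    clipped = sum(max(0.0, s) for s in samples) / n
    return raw_mean, clipped


raw_mean, clipped = clipped_mean(sigma=1.0)
# For Gaussian noise the clipped mean tends towards sigma / sqrt(2 * pi).
expected = 1.0 / math.sqrt(2.0 * math.pi)
print(f"mean before clipping: {raw_mean:+.4f}")
print(f"mean after clipping:  {clipped:+.4f} (theory: {expected:.4f})")
```

Before clipping the mean stays close to zero, while after clipping it sits well above zero, which is the offset a denoiser would have to guess away.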


Everything turned off in darktable. Would it be possible to train on this type of file and possible to demosaic an exported tif file after noise reduction?
_MG_3385.zip (21.5 MB)

I have been playing around with your software for a week now and I really like the results. ISO 1600 below

I have taken some raw files with my 6D and tried to train the network, but I got a lot of .pt files instead of one .pth like the one you have in the model folder. I used:
python3 nn_train.py --time_limit 14400 --g_network UNet --weight_SSIM 1 --batch_size 60 --train_data datasets/train/NIND_128_96
Another question about colour space. I denoised a picture with AdobeRGB and the output file had sRGB. User error from my side?

Raw files from my 6D CC0 https://drive.google.com/file/d/1lyYotdRbRbSDTfq6hVNB7rawFisqC7lm/view?usp=sharing

Edit: Worked with python3 run_nn.py --time_limit 120 --batch_size 70 --train_data datasets/train/NIND_128_96 --lr 3e-4


I haven’t had the time to explore this but it looks fun. :slight_smile: I share @hanatos’ thoughts.

Apologies for the delayed response, I wasn’t able to catch up on life for a while.


The U-Net architecture is the first architecture I found that gives such good performance. It’s in no way the most efficient one that can give these results, and it’s actually pretty wasteful in that a big chunk of the borders is thrown away.
Before that I tried DnCNN (just a stack of layers, good performance) and Red-Net (similar but with skip connections, better performance). All the experimentation I did then was based purely on rate-distortion and there is a lot of room for reduced complexity.
U-Net was designed for image segmentation and there are newer, lighter segmentation networks that could be tried. It’s also worth training a very light and simple network that sacrifices rate-distortion for lowered complexity. Moreover, any network can be trimmed / pruned and quantized to lower complexity a bit before getting into a final product.
Do you have a target runtime and/or complexity in mind?

I’m not sure what the black point subtraction involves :s Is that something I would change in processing with the exposure module (black level correction), “raw black/white point”, or … ? I’m open to reprocessing the data so that a trained model can target a more appropriate place in the pipeline. Only requirement I think is demosaic so that it can generalize to different sensors (more on that right below)

@hanatos and @Peter

The network can’t be trained with raw data directly, or rather it could, but with much greater effort and less generalization. The tooling would have to be adapted. The align_image_stack tool expects demosaiced images, so it would have to be adapted to take something like a numpy array and output that same format (because virtually nothing can edit and modify a raw file). The same goes for any preprocessing that’s currently done in darktable (e.g. normalizing the exposure between shots, though that shouldn’t be too difficult to achieve).

My biggest concern is with the specificities of every raw sensor and associated file type. Most of the training data comes from an X-Trans (Fujifilm) sensor, and while the trained model generalizes well to other APS-C sensors when demosaiced, it would be nearly impossible to share the training data with raw denoising. I believe that even Bayer RGGB data can come in different orders, which may make things incompatible.

It would most likely be more appropriate to change the training data to have minimal processing, so that denoising is performed very early in the pipeline (as it is now in dt) rather than on nearly fully processed images (as it is now provided). My only concern is that dark areas would then have very similar low values (whereas processing boosts dark areas significantly). The network would then not be penalized much when it destroys a dark patch (e.g. using mean square error), whereas those dark patches are exactly what we want to denoise best in order to recover details. I guess different loss functions might mitigate that; it needs some research and experimentation imo.
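The concern about dark areas and mean square error can be illustrated with a toy calculation. This is a sketch under assumptions: pixel values are floats in [0, 1], "destroying" a patch means replacing it with its mean, and a plain 1/2.2 gamma curve stands in for whatever tone processing darktable actually applies. All function names are hypothetical.

```python
def mse(a, b):
    """Mean square error between two equally sized lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flatten(patch):
    """Simulate a denoiser that wipes out all detail by returning the patch mean."""
    mean = sum(patch) / len(patch)
    return [mean] * len(patch)


def gamma(patch, g=1 / 2.2):
    """Stand-in for tone processing that boosts dark values (assumed, not dt's curve)."""
    return [v ** g for v in patch]


# A dark textured patch in linear light: detail swings between 0.002 and 0.006.
dark_linear = [0.002, 0.006] * 8

linear_penalty = mse(flatten(dark_linear), dark_linear)
# The same patch with its detail wiped out, measured after the boost.
boosted = gamma(dark_linear)
boosted_penalty = mse(flatten(boosted), boosted)

print(f"MSE for flattening, linear data:  {linear_penalty:.2e}")
print(f"MSE for flattening, boosted data: {boosted_penalty:.2e}")
```

With these made-up values the penalty for erasing the texture is far smaller on the linear data than on the boosted data, which is why a plain MSE loss may need reweighting if the training data moves earlier in the pipeline.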


Thank you for sharing more training data and the pretty results you’ve experimented with! I will integrate the Bark with the training data (on Wikimedia Commons) before I train the next model.

The network should work with either .pt or .pth files. I used .pth initially; they are more like a compiled model that contains its structure, while the .pt file is just the model’s weights (which should be more flexible to changes in PyTorch and in the network’s code). The loader is supposed to handle both.

Color space shouldn’t matter, I believe. From what I understand it’s only applied on the display side and doesn’t change the values that the network gets or outputs. The output is untagged, so it is treated as sRGB by default, but the network is completely agnostic of the color space the data is encoded with; an Adobe RGB tag should be applied to that output to display it correctly. (Ideally the denoising script should just copy over all the tags, including color space, from the input image.) However, since all the training data was sRGB, I don’t know if the denoising performance would be negatively impacted by data encoded with a color space it hasn’t seen.

I have a few improvements to make to the training loop, I’m focusing on it a lot right now and I should have updated code within a week or two. Will keep you posted.


Training it right now with your pictures, my pictures and some other I took that I am not allowed to share but that I am allowed to use just for training.

What is the point with ISOH1 and ISOH2 instead of for example to name them as ISO51200 and ISO102400?

Multi-GPU training is a PyTorch feature that should be pretty straightforward, but I haven’t implemented it in this training loop. (I usually don’t have access to multiple Nvidia GPUs on the same machine and when I do they are usually running different experiments.)

The Fujifilm X-T1 only gets up to ISO6400 so I increased the shutter speed further and called those H* (high) instead of arbitrary ISO values. It shouldn’t matter as long as the base value is the lowest since the actual value is not used in the learning process.
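To illustrate that only the ordering of the labels matters, the sketch below sorts the ISO labels of one set so that the base (lowest) ISO comes first and the H* labels come last. The label format and the helper name `iso_sort_key` are assumptions for illustration, not the dataset's actual tooling.

```python
import re


def iso_sort_key(label):
    """Sort key: numeric ISO labels by value, then H1, H2, ... after all of them."""
    match = re.fullmatch(r"ISO(H?)(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised ISO label: {label!r}")
    is_high, number = match.group(1) == "H", int(match.group(2))
    # False sorts before True, so plain ISO values precede the H* labels.
    return (is_high, number)


labels = ["ISO6400", "ISOH2", "ISO200", "ISOH1", "ISO800"]
ordered = sorted(labels, key=iso_sort_key)
# The first label is the cleanest image; the rest are its noisy counterparts.
ground_truth, noisy = ordered[0], ordered[1:]
print(ground_truth, noisy)
```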

I just realized that you made 8 whole sets (somehow I only saw Bark initially), thank you!

48 hours of training with your pictures and mine. It seems the next step will be to collect a lot more samples.
model_192.zip (52.5 MB)

Any idea of how to convert 100 tif files at the same time and with the same output file names?
I tried the following in bash but it didn’t work like I wanted it to

for i in *.tif
python3 denoise_image.py --cs 256 --ucs 192 --model_path "models/4/model_144.pth" -i in/${i%} -o out/$i
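One possible way to batch this is a small Python wrapper instead of a shell loop. The sketch below rests on assumptions: it reuses the `denoise_image.py` flags exactly as they appear in the command above, assumes the inputs live in `in/` and the results go to `out/`, and `build_commands` and `run_all` are hypothetical helpers rather than part of the project.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(in_dir, out_dir, model_path):
    """Return one denoise_image.py command per .tif file, keeping file names."""
    commands = []
    for src in sorted(Path(in_dir).glob("*.tif")):
        dst = Path(out_dir) / src.name  # same file name in the output folder
        commands.append([
            sys.executable, "denoise_image.py",
            "--cs", "256", "--ucs", "192",
            "--model_path", model_path,
            "-i", str(src), "-o", str(dst),
        ])
    return commands


def run_all(in_dir="in", out_dir="out", model_path="models/4/model_144.pth"):
    """Run the commands one after another, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(in_dir, out_dir, model_path):
        subprocess.run(cmd, check=True)


# run_all()  # uncomment to process every .tif file in in/
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces, and the files are processed sequentially so they do not compete for GPU memory.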