Here’s the dataset most denoising models seem to have been trained on: SIDD.
There’s also a paper describing their methodology for building the dataset.
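For anyone who wants to poke at it: SIDD ships scenes as noisy/ground-truth image pairs, and a first sanity check is the input PSNR of a noisy image against its ground truth. A minimal sketch below; the file names are hypothetical placeholders for wherever you extract the pairs:

    import numpy as np
    from PIL import Image

    def psnr(clean: np.ndarray, noisy: np.ndarray, max_val: float = 255.0) -> float:
        """Peak signal-to-noise ratio, the metric these benchmarks report."""
        mse = np.mean((clean.astype(np.float64) - noisy.astype(np.float64)) ** 2)
        return 10.0 * np.log10(max_val**2 / mse)

    # Hypothetical paths -- point these at an extracted noisy/ground-truth pair.
    noisy = np.asarray(Image.open("0001_NOISY_SRGB_010.PNG"))
    clean = np.asarray(Image.open("0001_GT_SRGB_010.PNG"))
    print(f"input PSNR: {psnr(clean, noisy):.2f} dB")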
It seems most public datasets are pretty bad; quoting the paper:
Denoising benchmark with real images: There have been, to the best of our knowledge, two attempts to quantitatively benchmark denoising algorithms on real images. One is the RENOIR dataset [2], which contains pairs of low/high-ISO images. This dataset lacks accurate spatial alignment, and the low-ISO images still contain noticeable noise. Also, the raw image intensities are linearly mapped to 8-bit depth, which adversely affects the quality of the images.

More closely related to our effort is the work on the Darmstadt Noise Dataset (DND) [25]. Like the RENOIR dataset, DND contains pairs of low/high-ISO images. By contrast, the work in [25] post-processes the low-ISO images to (1) spatially align them to their high-ISO counterparts, and (2) overcome intensity changes due to changes in ambient light or artificial light flicker. This work was the first principled attempt at producing high-quality ground truth images. However, most of the DND images have relatively low levels of noise and normal lighting conditions. As a result, there is a limited number of cases of high noise levels or low-light conditions, which are major concerns for image denoising and computer vision in general. Also, treating misalignment between images as a global translation is not sufficient for cases including lens motion, radial distortion, or optical image stabilization.
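On the RENOIR criticism (raw intensities linearly mapped to 8-bit depth), a quick back-of-envelope, assuming a 14-bit sensor, which is common in DSLRs; the exact bit depth varies by camera:

    # Linearly mapping 14-bit raw to 8 bits collapses many raw codes into one.
    raw_levels = 2**14   # 16384 distinct raw codes (14-bit assumption)
    out_levels = 2**8    # 256 codes after the linear 8-bit mapping
    step = raw_levels / out_levels
    print(f"{step:.0f} raw levels per 8-bit code")  # 64

    # A uniform quantizer adds noise with std dev step / sqrt(12),
    # so the mapping itself injects a noise floor in raw units:
    print(f"~{step / 12**0.5:.1f} raw-level std dev of quantization noise")

That quantization floor is exactly the kind of artifact you don't want baked into images that are supposed to serve as ground truth.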
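And to make the last point concrete: "treating misalignment between images as a global translation" means estimating a single (dy, dx) shift for the whole frame, e.g. via phase correlation. A sketch using scikit-image; this illustrates the translation-only model, not DND's actual pipeline:

    import numpy as np
    from scipy.ndimage import shift as nd_shift
    from skimage.registration import phase_cross_correlation

    def align_global_translation(reference: np.ndarray, moving: np.ndarray) -> np.ndarray:
        """Estimate one global shift and warp `moving` onto `reference`.

        Both inputs are single-channel float arrays of the same shape.
        """
        est_shift, _error, _phasediff = phase_cross_correlation(
            reference, moving, upsample_factor=10  # sub-pixel estimate
        )
        return nd_shift(moving, est_shift)

A single shift can't represent displacement that varies across the frame, which is why lens motion, radial distortion, or optical image stabilization breaks this model; you'd need a local or parametric warp (or dense flow) instead.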
If anyone is looking for a project, building a high-quality DSLR/mirrorless denoising dataset sounds like a splendid idea.