Ah, excellent. Thanks for the advice; I admit I'm flopping about in dt-lua. Spending a lot more time reading than I am writing. Considering everything I want to do is really simple, mostly I just feel kinda dumb. BUT I finally figured out where to grab an XMP path without making bad assumptions, so there's that.
In more recent (unpublished) renditions I spent a chunk of effort getting away from subprocess calls entirely, except for a Python entry point. And then I worked on dropping the darktable dependency entirely, which brings me to:
Rawnind is a colloquialism/shorthand for Dr. Brummer's follow-up paper (I think it's linked somewhere in this thread already). As the name implies, it involves working with a new dataset, thematically similar to the first, but full of raw images. Go figure. He ran a bunch of experiments with denoising, denoising with (implicit) demosaicing, and both of those combined with compression. The codebase has all kinds of tangentially related goodies that come with years of doctoral study, including his own raw development 'modules,' in both NumPy and PyTorch, no less.
I digress. Anyway, the big advantage to feeding the 'nets raw images isn't actually image quality, which is comparable. You get far more effective compression from images that have had noise removed, so there's definitely a business case there; very practical. The other thing you get is computational efficiency: it takes a lot less horsepower to feed the neural nets a raw Bayer pattern (smaller input dimensions), and as a result inference is much faster.
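To make the dimensions point concrete, the standard trick (just a sketch; I'm not claiming this is rawnind's exact loader) is to pack the mosaic into four half-resolution planes:

    import torch

    def pack_bayer_rggb(raw: torch.Tensor) -> torch.Tensor:
        """Pack a (H, W) Bayer mosaic into (4, H/2, W/2), assuming RGGB."""
        r  = raw[0::2, 0::2]  # red sites
        g1 = raw[0::2, 1::2]  # green sites on red rows
        g2 = raw[1::2, 0::2]  # green sites on blue rows
        b  = raw[1::2, 1::2]  # blue sites
        return torch.stack([r, g1, g2, b], dim=0)

    # A 24MP mosaic becomes a 4 x 2000 x 3000 input instead of a
    # 3 x 4000 x 6000 demosaiced one; half the spatial extent per axis,
    # so every convolution touches a quarter of the pixels.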
It seems a lot of his experimentation was with what processing to do before feeding to the net - hence all the modular raw development code.
The side effect is that his new codebase is way easier to adapt and much more flexible from a workflow perspective - which is why I've been sinking more time into wrapping my head around it than I have spent working on this one. It just seems like a better time investment.
I don't have a roadmap or anything; everything I've done so far has been exploratory. Mostly studying code and writing myself documentation. I've been thinking about how it might best be restructured to take advantage of its flexibility and make it more maintainable, with sort of a side focus on arriving at a structure where I can dynamically assemble contiguous traceable graphs.
TL;DR - rawnind makes it much easier to implement pure ex ante and post hoc tooling, so it could support denoise → import or export → denoise (and optionally compress) without having to split the history stack or anything. A standalone tool would basically be gratis, and it would be usable with the photo editing software of your choice.
The extra super cool workflow would be doing your edits in darktable (or whatever), and then using the XMP/style to assemble a full (compilable) ISP and feeding your image(s) through it. Kind of like a self-extracting executable? Taking nondestructive to the max, in any event.
i'm really interested in making this fast and streamlining the integration. i'm running prep_image_dataset.py today; i hope it'll finish. do you have any experience with training? i want to re-train on the most simplified network architecture to make it run faster. i suppose this can be a huge timesink… so any datapoints from prior experience on how many layers/channels are required for what quality trade-off would be very valuable. i'll start with the jddcnn u-net architecture that is already in vkdt. it yielded mixed results, but maybe because of the lack of a good training set.
Well, now that's a right proper query if I'd ever heard one.
There's probably more to answering this than can properly be said in a forum post. Lots of considerations, too. I can give you some broad hints to sort of set you on the right path, maybe. The first being that this is, broadly speaking, what I've been working on most for the last couple weeks, and I haven't actually started any training runs yet. You're not going to 'get lucky' and be set to go with a handful of experiments. Spend some time now putting together some scaffolding; neural architecture search is what you want to read about here. Second,
IMHO. That's probably what I'm going to look at first once I finish scaffolding. Third is that you are right, this is a huge time-sink, so it's important to temper excitement, but it's mostly active work only for your GPU/compute if you do it right. Fourth is to consider what you are targeting. It's probably not going to be "one size fits all." Unless you're Whisper.
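To make the scaffolding idea concrete, here's the sort of shape I have in mind; hypothetical names throughout, nothing from rawnind. Even a dumb restartable grid over depth and width beats ad hoc runs, and proper NAS methods build on the same skeleton:

    import itertools
    import json
    import pathlib

    RESULTS = pathlib.Path("sweep_results.jsonl")  # one JSON record per finished run

    def train_and_validate(depth, base_channels):
        # placeholder: run a short training job, return validation PSNR
        raise NotImplementedError

    def already_done(cfg):
        if not RESULTS.exists():
            return False
        return any(json.loads(line)["cfg"] == cfg for line in RESULTS.open())

    # Sweep depth x width; restartable because finished configs are skipped.
    for depth, base_channels in itertools.product([2, 3, 4], [16, 32, 48]):
        cfg = {"depth": depth, "base_channels": base_channels}
        if already_done(cfg):
            continue
        psnr = train_and_validate(**cfg)
        with RESULTS.open("a") as f:
            f.write(json.dumps({"cfg": cfg, "val_psnr": psnr}) + "\n")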
Take it all with a grain of salt; I am out of practice and have had to play catch-up so my info may or may not be totally out of date and irrelevant. I hope not, though.
I have some small learnings from small tests trying to optimize filter kernels.
- Use PixelShuffle instead of transposed convolution; a lot better!
- Always save your results to a file. It's a waste to always start from scratch, and it makes it easier to do the training scheduling manually sometimes.
- Clipping the gradients to something reasonable was helpful when my weights diverged from too high a learning rate; it made it possible to keep large training steps in the beginning, saving a LOT of time for me.
- Be careful about shuffling data between CPU and GPU, but I feel like you, if anyone, know that one already. (I could fit my entire patchified dataset in VRAM, so I ended up loading everything to the GPU once…)
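In PyTorch terms, the upsampling and the clipping look roughly like this (a sketch; channel counts and the max-norm value are made up):

    import torch
    import torch.nn as nn

    # PixelShuffle upsampling: the conv produces r^2 times the target
    # channel count, then the shuffle rearranges them into a 2x larger grid.
    up = nn.Sequential(
        nn.Conv2d(64, 32 * 4, kernel_size=3, padding=1),
        nn.PixelShuffle(2),  # (N, 128, H, W) -> (N, 32, 2H, 2W)
    )

    def train_step(model, optimizer, loss_fn, x, y, max_norm=1.0):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Clip so a large early learning rate can't blow up the weights.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        return loss.item()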
yeah so i have this condition. i don’t want to load/link bloatware, so i keep some custom cooperative matrix GEMM code around for the convolutions (as opposed to linking in gigabytes of runtime or even straight python). it’s quite a bit of work to make these run at speed. i really really don’t want to / don’t have the time to look at every fancy detour that results in an additional piece of code that needs to be written. i think this is also the reason why open image denoise uses simplistic nearest upsampling and then optimises the code to death. i suppose you can always fix whatever you oversimplified in this step with one more convolution after the upsampling. this is also the reason why i can’t just use the pretrained weights.
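for reference, the oversimplified path is easy to state in pytorch terms (a sketch with made-up channel counts; what i'd actually write is the shader equivalent):

    import torch.nn as nn

    # nearest upsampling is just an index copy, trivial to hand-roll in a
    # shader; the conv after it cleans up what the simplification loses.
    up = nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(64, 32, kernel_size=3, padding=1),
    )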
i was hoping to simply use the pytorch training scripts provided with rawnind. seems like these are "production proven"
in other news… the image preparation python script has now been running for 24h straight and has finished like 30%. there is a chance that i'll have to cut power due to work on the house before this thing finishes even preparing the input data for the training. i may or may not have the patience to restart it (if any of you knows a place where the readily prepared dataset can be downloaded, or knows of some way to speed up this process, let me know…).
Yeah, IMO that type of thing is the perfect application for AI coding. Limited in scope, very clear requirements. It's not like it's production code, anyway.
Edit: and yeah, I left the AI-produced summary in place because I wanted to be transparent about it lol
Edit2: I guess I should clarify. Yes, the actual code was authored by an LLM, but depending on what you mean by "vibe-coded," it may or may not fit that definition. If you mean did I just ask the AI to implement something for me without understanding what it was doing: no, that's not the case. It was more like I did the T part of TDD and passed the rest off to the LLM.
import os

def find_cached_result(ds_dpath, image_set, gt_file_endpath, f_endpath, cached_results):
    """Return the cached entry for this GT/noisy file pair, or None."""
    gt_fpath = os.path.join(ds_dpath, image_set, gt_file_endpath)
    f_fpath = os.path.join(ds_dpath, image_set, f_endpath)
    for result in cached_results:
        if result["gt_fpath"] == gt_fpath and result["f_fpath"] == f_fpath:
            return result
    return None  # no cached result for this pair
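(For the "T part" flavor: the tests I handed off looked something like this; a made-up example, not one of the real ones.)

    def test_find_cached_result():
        cached = [{
            "gt_fpath": os.path.join("ds", "set1", "gt.exr"),
            "f_fpath": os.path.join("ds", "set1", "noisy.exr"),
        }]
        assert find_cached_result("ds", "set1", "gt.exr", "noisy.exr", cached) is cached[0]
        assert find_cached_result("ds", "set1", "gt.exr", "other.exr", cached) is None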
EDIT: It’s probably not totally clear from what I linked. Here’s some more context:
# Check if matching GT coordinates exist
if coordinates in gt_file_coords:
    fn_gt = gt_file_coords[coordinates]
    crop = {
        "coordinates": list(coordinates),
        "f_linrec2020_fpath": os.path.join(search_dir, fn_f),
        "gt_linrec2020_fpath": os.path.join(prgb_gt_dir, fn_gt),
    }
    if is_bayer:
        f_bayer_path = os.path.join(
            bayer_image_set_dpath,
            "gt" if f_is_gt else "",
            fn_f.replace("." + HDR_EXT, ".npy"),
        )
        gt_bayer_path = os.path.join(
            bayer_image_set_dpath,
            "gt",
            fn_gt.replace("." + HDR_EXT, ".npy"),
        )
        crop["f_bayer_fpath"] = f_bayer_path
        crop["gt_bayer_fpath"] = gt_bayer_path
        # Use cached existence checks
        if not cached_exists(f_bayer_path) or not cached_exists(gt_bayer_path):
            logging.error(
                f"Missing crop: {f_bayer_path} and/or {gt_bayer_path}"
            )
            continue  # Skip instead of breaking
    crops.append(crop)
return crops
I was wrong; it was probably about two-thirds garbage, but it had good ideas. The FFT approach works fine and is fast. The GPU implementation is kinda wild and needed to be rewritten. A bunch of weird bugs, like it had missed the whole "align within a scene" concept and was trying to align everything to everything…
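For context, the core of the FFT alignment is plain phase correlation; roughly this (my paraphrase, not the actual code):

    import numpy as np

    def phase_correlate(a: np.ndarray, b: np.ndarray):
        """Integer (dy, dx) translation between two single-channel crops,
        up to sign convention, via the normalized cross-power spectrum."""
        cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
        cross /= np.abs(cross) + 1e-12  # keep phase, drop magnitude
        corr = np.abs(np.fft.ifft2(cross))
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        # shifts past the halfway point wrap around to negative offsets
        if dy > a.shape[0] // 2:
            dy -= a.shape[0]
        if dx > a.shape[1] // 2:
            dx -= a.shape[1]
        return int(dy), int(dx)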
haha nice, thanks for pointing this out. i couldn’t tell python from ai slop, all looks the same to me. i’ll let the slow original continue to run for a while in this case…
The script that reconstructs the directory structure after downloading the dataset did not properly handle the "TEST", "UNK", and "UNK_TEST" image sets, so they all ended up in the same directory and prep_image_dataset was trying to align completely unrelated test images (trying as far as MAX_SHIFT_SEARCH allows, 128 pixels around, and ultimately discarding the images).
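(For illustration only, the routing amounts to prefix matching, with the gotcha that UNK_TEST has to be checked before UNK; this is a sketch, not the actual fix:)

    import os
    import shutil

    def dest_subdir(image_set_name: str) -> str:
        # UNK_TEST must come before UNK, since both match the UNK prefix.
        for prefix in ("UNK_TEST", "TEST", "UNK"):
            if image_set_name.startswith(prefix):
                return prefix
        return "TRAIN"

    def route_image_sets(src_root: str, dst_root: str) -> None:
        for name in os.listdir(src_root):
            dst_dpath = os.path.join(dst_root, dest_subdir(name))
            os.makedirs(dst_dpath, exist_ok=True)
            shutil.move(os.path.join(src_root, name), os.path.join(dst_dpath, name))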
Unfortunately you will have to reprocess the dataset. On the bright side it should go much faster now (fewer image pairs to compare, and the neighborhood search should be relatively quick). It's been running on my ("humble" 6-core Ryzen 5 5600G w/ 128 GB of RAM) computer since last night and is so far 69% done.
(You will also need to update test_reserve: add the TEST_ and UNK_TEST_ prefixes (completes #2 and 3c8… · trougnouf/rawnind_jddc@b6c8159 · GitHub) before launching the training, so that the test images are not used for training. Also, I would recommend including the UNK images in the training data; they are not in the default config. And if you are training a linear RGB model, you could also include the X-Trans images, which are likewise not part of the training data by default since I did not use them in the paper.)
edit: the dataset preparation finished in less than 12 hours with no issue