amazing what a difference the language alone makes. seems encouraging.
the code indeed doesn't look very performance-oriented yet.
all the boilerplate image struct + image operations look like they would introduce a fair bit of cache thrashing and unnecessary memory allocations: each call makes its own full pass over the image, and some allocate a fresh temporary buffer. i mean things like this:
```c
rlucy_img_mean(image, mean, time);
rlucy_alloc_image(&img_tmp1, image->w, image->h, image->c);
rlucy_img_subtract_1(&img_tmp1, mean, time);
rlucy_img_norm(&img_tmp1, norm, time);
```
(also i disagree with the code comments that seem to suggest that filling this kind of api with calls into blas would make things better.)
re: tiling. i don’t think you want to rely on our tiling engine. the requirements for when buffers get tiled, and what then happens to the tiles, are very different from what you need here, and relying on them may lead to unexpected behaviour if the pixel pipeline engine changes in the future.