Incredibly Slow File Browser and RT Not Using CPU

I’ve heard the software can be slow, but this is absurd. I must be doing something wrong as a new user. When I open a folder with 3000 images, the file browser loads maybe one image per second. The same folder takes maybe 30 seconds in total to load in Windows Explorer, while RT takes literally hours. If I try to view a single 20 MB file in the image viewer while the file browser is still loading, the image takes a couple of seconds to appear, which makes working in the program impossible. All I want to do is browse images, rank them, and then apply batch processing to the favorites.

The entire time it’s doing this my CPU usage is less than 20%. Is there some setting I’m missing?

Here are details on my system:
RawTherapee 5.6
Windows 10 Home
Intel i7-8700 CPU
16 GB RAM
GTX 1070
Files stored on HDD

Several things.

  1. Update to 5.7.
  2. HDDs are slow; reading 3000 × 20 MB takes time.
  3. Windows Explorer does not use the raw data; it only shows the JPEG image embedded in each raw file.

If your workflow allows it, split the image collection into several folders with, say, 500 images in each folder.

If you open a folder in RT for the first time, it extracts the embedded JPEG from each raw file and generates a thumbnail, which is written into RT’s thumbnail cache. This means a lot of IO. That’s why your CPU usage is below 20%: you’re simply limited by IO speed.
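The IO explanation can be checked with back-of-envelope arithmetic. This Python sketch assumes each file is read in full and that the HDD sustains roughly 150 MB/s of sequential reads; that rate is an illustrative assumption, not something measured in this thread.

```python
# Back-of-envelope estimate of a first-time folder scan that is limited only
# by disk reads. ASSUMPTION: 150 MB/s is an illustrative sustained rate for a
# desktop HDD, not a figure measured in this thread.

def scan_seconds(n_files: int, mb_per_file: float, mb_per_second: float) -> float:
    """Seconds needed to read every file once at a sustained rate."""
    total_mb = n_files * mb_per_file
    return total_mb / mb_per_second


if __name__ == "__main__":
    rate = 150.0  # MB/s (assumed)
    secs = scan_seconds(3000, 20.0, rate)
    print(f"{3000 * 20 / 1000:.0f} GB at {rate:.0f} MB/s: about {secs / 60:.1f} min")
```

Under those assumptions, just reading 60 GB takes around 400 seconds (about 6.7 minutes), and seeking between thousands of files and writing thumbnails back to a spinning disk only adds to that. The CPU spends most of that time waiting, which fits the low CPU usage reported above.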

I just made a test:

I opened a folder with 6934 raw files (each of size 12 MB) for the first time in RT. That took 12 minutes.

Then I closed RT and opened the folder again in RT (now the thumbnails are in RT thumbnail cache on disk). That took less than 30 seconds which imho is fine for this number of files.
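The numbers from this test can be turned into rates with a few lines of Python. The MB/s figure divides total file size by elapsed time; since RT only extracts the embedded JPEG from each raw file, the bytes actually read from disk are probably fewer, so treat it as an effective rate rather than the disk’s raw throughput.

```python
def throughput_mb_s(n_files: int, mb_per_file: float, seconds: float) -> float:
    """Effective rate: MB of *file size* handled per second."""
    return n_files * mb_per_file / seconds


first_open = throughput_mb_s(6934, 12, 12 * 60)  # first visit: 12 minutes
# Second visit took "less than 30 seconds", so this is a lower bound on the speedup.
min_speedup = (12 * 60) / 30

print(f"first open: {6934 * 12 / 1000:.1f} GB of files in 12 min, about {first_open:.0f} MB/s")
print(f"second open: at least {min_speedup:.0f}x faster with a warm thumbnail cache")
```

The speedup is a lower bound because the second open took less than 30 seconds.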

I suspect that if the OP had been using a Linux system, they would have seen iowait pegged at 100%.

Unfortunately iowait isn’t easy to see on Windows…

(to @NinjaPhotographer - on Linux systems, the CPU utilization reporting metrics include an “iowait” percentage - this is the percentage of time that a task on a CPU COULD have been doing something but was instead waiting for an I/O operation to complete)
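On Linux, `top` and `vmstat` already show iowait (as `wa`). For anyone curious where that number comes from, here is a minimal Python sketch that samples the aggregate `cpu` line of /proc/stat twice and computes the iowait share of the elapsed CPU time. The field order follows the proc(5) man page; the sampling interval is an arbitrary choice.

```python
import os
import time
from typing import List

# Field order of the aggregate "cpu" line in /proc/stat (Linux):
# user nice system idle iowait irq softirq steal [guest guest_nice]
IOWAIT = 4


def parse_cpu_line(line: str) -> List[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: List[int], after: List[int]) -> float:
    """Share of CPU time spent in iowait between two samples."""
    # guest/guest_nice are already included in user/nice, so only the first 8 count.
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    return 100.0 * delta[IOWAIT] / total if total else 0.0


def sample_iowait(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.readline())
    return iowait_percent(before, after)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(f"iowait over the last second: {sample_iowait():.1f}%")
```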

@NinjaPhotographer

It may help to put the RT thumbnail cache folder on a different physical drive (if you have one). On Windows you can do this by defining an environment variable as described here:
http://rawpedia.rawtherapee.com/File_Paths#Custom_config_and_cache_folders
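If you would rather not change the variable system-wide, a small launcher can set it just for RT. This Python sketch assumes the variable is named `RT_CACHE` (check the RawPedia page above for the exact name and semantics); the executable and cache paths are hypothetical placeholders.

```python
import os
import subprocess
from typing import Mapping, Optional

# ASSUMPTION: RawTherapee reads its cache location from the RT_CACHE environment
# variable, as described on the RawPedia "File Paths" page.
CACHE_VAR = "RT_CACHE"


def rt_environment(cache_dir: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with the thumbnail cache redirected to cache_dir."""
    env = dict(os.environ if base is None else base)
    env[CACHE_VAR] = cache_dir
    return env


def launch_rawtherapee(executable: str, cache_dir: str) -> subprocess.Popen:
    """Start RT with its cache on another drive (both paths are hypothetical)."""
    os.makedirs(cache_dir, exist_ok=True)
    return subprocess.Popen([executable], env=rt_environment(cache_dir))
```

For example, `launch_rawtherapee(r"C:\Program Files\RawTherapee\rawtherapee.exe", r"D:\RTcache")`, with both paths adjusted to your system. To make the setting permanent instead, define the variable in the Windows environment-variable settings, or run `setx RT_CACHE D:\RTcache` from a command prompt.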

Regarding point 1, I will do that, thanks!
For 2 and 3: these are just JPEG files; I’m not even editing raw files yet. I was just testing the waters with RT. Why is RawTherapee literally 120x slower than Windows Explorer / Windows Photos at opening, viewing, and sorting them?

Thanks for the information! I tried limiting the thumbnail size, but it didn’t seem to increase the speed. I’m just surprised that other software like Windows Photos can create thumbnails and preview photos almost 100x faster.

Actually, I think I understand this now. Windows thumbnails are lower resolution, and Windows doesn’t decode the full JPEG data to create them. Even when I open the images initially, they look heavily compressed until I wait a second or two, which matches RawTherapee’s time. But a compressed version is perfect for loading quickly so I can decide whether a photo is worth working on. I guess I’ll just have to use different software than RawTherapee to filter through and tag the best photos, and then load only the best ones into RawTherapee later from an SSD for edits and batch processing.
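The guess about lower-resolution thumbnails points at a real technique, though this thread doesn’t establish exactly what Windows does. A JPEG decoder built on libjpeg can scale during decoding to 1/2, 1/4 or 1/8 of full size, skipping most of the inverse-DCT and color-conversion work. A minimal Python sketch of choosing that scale for a thumbnail, using a 5472 × 3648 (about 20 MP) image as an example:

```python
import math
from typing import Tuple

# Scale denominators supported by classic libjpeg DCT scaling: 1/8, 1/4, 1/2, 1/1.
DENOMINATORS = (8, 4, 2, 1)


def decoded_size(width: int, height: int, denom: int) -> Tuple[int, int]:
    """Output size when decoding at 1/denom; libjpeg rounds up."""
    return math.ceil(width / denom), math.ceil(height / denom)


def pick_denominator(width: int, height: int, target_long_edge: int) -> int:
    """Largest reduction whose output still covers the requested thumbnail size."""
    for denom in DENOMINATORS:
        w, h = decoded_size(width, height, denom)
        if max(w, h) >= target_long_edge:
            return denom
    return 1  # image smaller than the target: decode at full size


if __name__ == "__main__":
    w, h = 5472, 3648
    d = pick_denominator(w, h, 320)
    tw, th = decoded_size(w, h, d)
    print(f"decode at 1/{d}: {tw}x{th}, {w * h // (tw * th)}x fewer pixels")
```

Pillow exposes the same mechanism through `Image.draft()`. The decoder still has to read and entropy-decode the whole file, so this saves CPU time, not disk reads.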

Thank you for your help!

Another thing to try (since @heckflosse indicated this is a one-time penalty):

Load up the folder in RT, go grab a coffee, come back later. You should only need to do this once as long as something doesn’t invalidate the cache.
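The penalty is one-time because thumbnails are stored and reused. This thread doesn’t describe how RT decides whether a cached thumbnail is still valid, so this Python sketch uses a hypothetical key of absolute path, file size and modification time: the first lookup pays the full cost, repeat lookups are cheap, and changing the file produces a miss.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CacheKey:
    path: str
    size: int
    mtime_ns: int


def key_for(path: str) -> CacheKey:
    """A changed size or modification time yields a new key, i.e. a cache miss."""
    st = os.stat(path)
    return CacheKey(os.path.abspath(path), st.st_size, st.st_mtime_ns)


class ThumbnailCache:
    """In-memory stand-in for an on-disk thumbnail cache (hypothetical design)."""

    def __init__(self, make_thumbnail: Callable[[str], Any]):
        self._make = make_thumbnail
        self._store: Dict[CacheKey, Any] = {}  # stale keys are never evicted in this sketch
        self.misses = 0

    def get(self, path: str) -> Any:
        key = key_for(path)
        if key not in self._store:
            self.misses += 1                     # the slow, IO-heavy first visit
            self._store[key] = self._make(path)
        return self._store[key]                  # later visits are cheap lookups
```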

@NinjaPhotographer Updating is also important, as there have been some speedups for the file browser in RT 5.7, especially for folders with a huge number of files.

It’s also very slow jumping between images.

My workflow for shooting basketball games consists of cropping and rotating JPEGs from a 7DmkII (20 MP). I like RT over Darktable for this purpose because the cropping is done by drawing the bounds of the box rather than having to pull in from the edges; drawing the box is usually faster to do. For that matter, I’m also carrying a patch to have the crop tool not reset to the move tool (which seems useless anyway if you’re not zoomed in, but that’s another matter). I don’t actually export from RT, which is slow; I wrote a shell script with a perl helper to compute the appropriate crop and rotate and use ImageMagick on the command line to apply it (which lets me parallelize the operations). When you’re trying to process 500 images ASAP, things like that matter. Anyway.
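The perl helper that computes crop and rotation isn’t shown here, but the parallel ImageMagick step can be sketched in Python. Crop and rotation values are assumed to be already computed (the `Job` values are hypothetical), and the crop rectangle is assumed to be expressed in the coordinates of the rotated image, because `-rotate` enlarges the canvas for angles that aren’t multiples of 90°. `convert` is the ImageMagick 6 command name; ImageMagick 7 uses `magick`.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    src: str
    dst: str
    degrees: float                   # ImageMagick: positive angles rotate clockwise
    crop: Tuple[int, int, int, int]  # (x, y, width, height), ASSUMED in rotated-image coordinates


def command(job: Job, tool: str = "convert") -> List[str]:
    """ImageMagick command line: rotate, crop, then drop the virtual-canvas offset."""
    x, y, w, h = job.crop
    return [tool, job.src,
            "-rotate", f"{job.degrees:g}",
            "-crop", f"{w}x{h}+{x}+{y}",
            "+repage", job.dst]


def run_all(jobs: List[Job], tool: str = "convert", workers: Optional[int] = None) -> None:
    """Run one ImageMagick process per image, several at a time."""
    workers = workers or os.cpu_count() or 1
    # Each thread only waits on an external process, so threads are enough here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any CalledProcessError.
        list(pool.map(lambda j: subprocess.run(command(j, tool), check=True), jobs))
```

ImageMagick may also use several threads per image on its own, so a smaller `workers` value can be worth trying if the machine becomes oversubscribed.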

In addition to the file browser loading very slowly (which isn’t much of a problem, since it only happens once and I can do something else while I wait a few minutes), moving through the images (with F3 and F4) is also very slow: it takes a second or two (on an i7-6820HQ/6920HQ) to load the next/previous image. There’s no obvious reason why that should be the case, particularly for JPEG files, where there’s no processing (that I’m aware of) going on.

I have callgrind running on RT while it loads 100 files, to try to see where the time is going. Loading a 20 MP JPEG on a reasonably modern processor really shouldn’t take more than a few tens of milliseconds.

Oh yeah, this is on Linux, using gcc 7.5, on the tip of the dev branch plus two patches I’m carrying (one to avoid switching back to the hand tool, one to set the default crop ratio to not be fixed; neither has any performance implications).

It turned out to be a real headache to run under callgrind. Callgrind is of course slow, but this was borderline pathological. It looked like most of the time was being spent in LCMS transform code, even when I thought I had everything set (neutral profile, no monitor correction, cleared out the database and .pp3 files, and so forth).

I’m going to try again. I’d really like image loading to be faster, but I have enough FOSS projects on my plate that I can’t really add another.

If you’re using the OS package for LCMS, it may not be compiled with threading enabled. I compiled and installed my own LittleCMS2 in /usr/local, and my display transforms in rawproc sped up substantially.

I built it myself from source, and it does look like threading is enabled. The most relevant lines in CMakeCache.txt are:

//Build with OpenMP support
OPTION_OMP:BOOL=ON

//CXX compiler flags for OpenMP parallelization
OpenMP_CXX_FLAGS:STRING=-fopenmp

//CXX compiler libraries for OpenMP parallelization
OpenMP_CXX_LIB_NAMES:STRING=gomp;pthread

//C compiler flags for OpenMP parallelization
OpenMP_C_FLAGS:STRING=-fopenmp

//C compiler libraries for OpenMP parallelization
OpenMP_C_LIB_NAMES:STRING=gomp;pthread

//Path to the gomp library for OpenMP
OpenMP_gomp_LIBRARY:FILEPATH=/usr/lib64/gcc/x86_64-suse-linux/7/libgomp.so

//Path to the pthread library for OpenMP
OpenMP_pthread_LIBRARY:FILEPATH=/usr/lib64/libpthread.so

So it looks like part of my problem was that I had an old 5.7 version squirreled away in /usr/local/bin (how my carried patches appeared to work, I have no idea).

% rawtherapee -v
RawTherapee, version 5.7-381-g4fc28370a

Second, turning off every CMS option I could find greatly improves performance, although it’s far from instantaneous. I guess I really need a dedicated crop-and-rotate tool that simply writes out a sidecar that I can parse.
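RT already writes a sidecar for each image: the .pp3 processing profile, an INI-style text file. A crop-and-rotate script could read that directly. This Python sketch assumes the crop lives in a `[Crop]` section with `Enabled`, `X`, `Y`, `W` and `H` keys and the rotation in `[Rotation]` as `Degree`; those names should be checked against a .pp3 written by your own RT build.

```python
import configparser
from typing import NamedTuple, Optional, Tuple


class CropRotate(NamedTuple):
    degrees: float
    crop: Optional[Tuple[int, int, int, int]]  # (x, y, w, h), or None if cropping is off


def read_crop_rotate(pp3_text: str) -> CropRotate:
    """Extract rotation and crop from .pp3 text (section/key names are ASSUMED)."""
    cp = configparser.ConfigParser(interpolation=None, strict=False, delimiters=("=",))
    cp.optionxform = str  # keep key case as written in the file
    cp.read_string(pp3_text)
    degrees = cp.getfloat("Rotation", "Degree", fallback=0.0)
    crop = None
    if cp.getboolean("Crop", "Enabled", fallback=False):
        x, y, w, h = (cp.getint("Crop", k) for k in ("X", "Y", "W", "H"))
        crop = (x, y, w, h)
    return CropRotate(degrees, crop)
```

For example, `read_crop_rotate(Path("IMG_0001.JPG.pp3").read_text())` (with `pathlib.Path` and a hypothetical file name) returns the angle and crop box.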

Well, it turns out that LCMS was compiled without threads, apparently due to a linker bug. I tried removing --without-threads, and it indeed fails to link (undefined references to various pthread_mutex_* functions), even though I verified that -lpthread is on the link line. It doesn’t seem to matter which compiler I use, either. And I don’t think I want to try swapping out my binutils (2.33.1).

From the callgrind output, it looks like it’s making an awful lot of calls to cmsDoTransform (about 28000 from each of two functions, which are themselves each called about 160 times; that alone seems high, considering that I just loaded two photos and flipped back and forth about 4 times before I ran out of time). But as I said, I don’t have time to dig into the code, and I’m completely unfamiliar with this codebase. I have enough with Gutenprint and KPhotoAlbum to keep me busy.

This reminds me of a common xkcd theme:

[xkcd comic image] (xkcd comics are under the Creative Commons Attribution-NonCommercial 2.5 License)

A large number of cmsDoTransform calls is appropriate; the function is arranged so that a user can call it “per-stride” and do their own parallelization. However, 28000 does sound large, as the stride normally corresponds to a row of pixels in an image…
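To illustrate the per-stride pattern: the LCMS call is `cmsDoTransform(transform, input, output, n_pixels)`, which converts `n_pixels` pixels per call, so processing an image one row at a time costs one call per row (3648 calls for a 5472 × 3648 frame), and each row can be handed to a different thread. This Python sketch uses a hypothetical counting stand-in instead of LCMS, just to show how the stride size determines the call count.

```python
from typing import Callable, List

# Stand-in signature (input_pixels, output_pixels, n_pixels) -> None,
# loosely mirroring cmsDoTransform(transform, in_buf, out_buf, n_pixels).
TransformFn = Callable[[List[int], List[int], int], None]


def transform_image(rows: List[List[int]], do_transform: TransformFn,
                    rows_per_call: int = 1) -> List[List[int]]:
    """Convert an image stride by stride; each stride is rows_per_call rows."""
    width = len(rows[0]) if rows else 0
    out: List[List[int]] = []
    for start in range(0, len(rows), rows_per_call):
        chunk = rows[start:start + rows_per_call]
        src = [p for row in chunk for p in row]  # flatten the stride into one buffer
        dst = [0] * len(src)
        do_transform(src, dst, len(src))
        out.extend(dst[i * width:(i + 1) * width] for i in range(len(chunk)))
    return out


class CountingIdentity:
    """Hypothetical stand-in transform that copies pixels and counts its calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, src: List[int], dst: List[int], n: int) -> None:
        self.calls += 1
        dst[:n] = src[:n]
```

With one row per call the count equals the image height; larger strides mean fewer calls but coarser units of work to spread across threads.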