Export times on iMac and Mac mini with diffuse or sharpen

My 2019 iMac 27" i9 is showing its age and has been acting a bit strangely lately, so I have to think about a replacement for when it packs up (I hope it lasts as long as possible…).
I tested two photos, from a GFX100S and an X100VI (102 MP and 40 MP respectively), going quite hard on the diffuse or sharpen module to test export speeds to uncompressed TIFF.

On the iMac (40 GB RAM), the GFX file exports in 202 s and the X100VI file in 59 s.
On the 2023 Mac mini (base M2 model with 8 GB RAM), the GFX file exports in 923 s and the X100VI file in 185 s.
I am a bit surprised at the difference in export times; I suspect the difference in RAM is the key factor.
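To put the gap in numbers, here is a small Python sketch that turns the four timings above into a slowdown factor and seconds per megapixel. The dictionary layout and function names are just my own illustration, and the megapixel values are the rounded 102 and 40 quoted above; it says nothing about the cause of the difference.

```python
# Export times in seconds, as reported in this post.
times = {
    "GFX100S": {"imac": 202, "mac_mini": 923},
    "X100VI": {"imac": 59, "mac_mini": 185},
}
# Nominal sensor resolution in megapixels (rounded figures from the post).
megapixels = {"GFX100S": 102, "X100VI": 40}


def slowdown(camera):
    """How many times longer the Mac mini export took than the iMac export."""
    return times[camera]["mac_mini"] / times[camera]["imac"]


def seconds_per_mp(camera, machine):
    """Export time normalised by image size."""
    return times[camera][machine] / megapixels[camera]


for camera in times:
    print(
        f"{camera}: Mac mini is {slowdown(camera):.2f}x slower, "
        f"{seconds_per_mp(camera, 'imac'):.2f} s/MP on the iMac vs "
        f"{seconds_per_mp(camera, 'mac_mini'):.2f} s/MP on the Mac mini"
    )
```

The larger file is hit harder (about 4.6x versus about 3.1x), so the penalty grows with image size.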
I would be interested to hear what export times you get on your computers, to give me a clue about where to go next. I am not opposed to moving to Linux, but I am more reluctant about Windows.

darktable was launched from the terminal with:
/Applications/darktable.app/Contents/MacOS/darktable -d opencl -d perf
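For anyone who wants to repeat the test and keep the log in a file, here is a minimal Python sketch that wraps the same command. The helper names and the log file name are made up for illustration; the application path and the `-d` flags are the ones used above, and nothing here is specific to darktable beyond that.

```python
import subprocess

# Path taken from the command above; adjust it for your own installation.
DARKTABLE = "/Applications/darktable.app/Contents/MacOS/darktable"


def darktable_command(debug_flags):
    """Build the argument list, adding one '-d <flag>' pair per debug flag."""
    cmd = [DARKTABLE]
    for flag in debug_flags:
        cmd += ["-d", flag]
    return cmd


def run_and_log(debug_flags, log_path):
    """Start darktable and write everything it prints to log_path.

    Blocks until darktable is closed, then returns its exit code.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(
            darktable_command(debug_flags),
            stdout=log,
            stderr=subprocess.STDOUT,  # keep stdout and stderr in one file
        )
    return result.returncode
```

Calling `run_and_log(["opencl", "perf"], "darktable-export.log")` reproduces the run above, with the output saved to a file instead of shown in the terminal.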

I am including the raw files with xmp and the two log files from the terminal.

2024-06-19-11h15min55s (27) GFX100S.RAF (77.8 MB)
2024-06-19-11h15min55s (27) GFX100S.RAF.xmp (13.7 KB)

2024-06-19-09h51min16s (06) X100VI.RAF (40.9 MB)
2024-06-19-09h51min16s (06) X100VI.RAF.xmp (14.0 KB)

These files are licensed Creative Commons, By-Attribution, Share-Alike.

mac mini Terminal Saved Output.txt (16.3 KB)
imac Terminal Saved Output.txt (23.8 KB)

Well, the Mac Mini is less capable when it comes to OpenCL, at least as far as memory is concerned:

   DEVICE_TYPE:              GPU, unified mem
   GLOBAL MEM SIZE:          5461 MB

vs

   DEVICE_TYPE:              CPU, unified mem
   GLOBAL MEM SIZE:          40960 MB
   
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          8192 MB

If you also include -d tiling, you may get more details. For example, on my system (using an Nvidia 1060 with 6 GB):

    56.7530 process tiled             CL0 [export]         diffuse                (   0/   0) 7728x5152 scale=1.0000 --> (   0/   0) 7728x5152 scale=1.0000  34 IOP_CS_RGB
    56.7530 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse' for image with size 7728x5152 --> 7728x5152
    56.7530 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 3764x5152
    56.7530 [default_process_tiling_cl_ptp] [export] (5x1) tiles with max dimensions 3764x5152, pinned=OFF, good 1716x3104 and overlap 1024
    56.7530 [default_process_tiling_cl_ptp] [export] tile (0,0) size 3764x5152 at origin [0,0]
...
    64.4636 [default_process_tiling_cl_ptp] [export] tile (3,0) size 2580x5152 at origin [5148,0]
    66.2384 process tiled             CL0 [export]         diffuse.1              (   0/   0) 7728x5152 scale=1.0000 --> (   0/   0) 7728x5152 scale=1.0000  35 IOP_CS_RGB
    66.2384 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse.1' for image with size 7728x5152 --> 7728x5152
    66.2384 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 4993x5152
    66.2384 [default_process_tiling_cl_ptp] [export] (2x1) tiles with max dimensions 4993x5152, pinned=OFF, good 4865x5024 and overlap 64
    66.2384 [default_process_tiling_cl_ptp] [export] tile (0,0) size 4993x5152 at origin [0,0]
    77.9512 [default_process_tiling_cl_ptp] [export] tile (1,0) size 2863x5152 at origin [4865,0]

That was for the X100VI image, and export time was ~ 30 s.

    90.9632 process tiled             CL0 [export]         diffuse                (   0/   0) 11662x8744 scale=1.0000 --> (   0/   0) 11662x8744 scale=1.0000  34 IOP_CS_RGB
    90.9632 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse' for image with size 11662x8744 --> 11662x8744
    90.9632 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 5085x3813
    90.9632 [default_process_tiling_cl_ptp] [export] (4x5) tiles with max dimensions 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024
    90.9632 [default_process_tiling_cl_ptp] [export] tile (0,0) size 5084x3813 at origin [0,0]
...
   128.1086 [default_process_tiling_cl_ptp] [export] tile (3,3) size 2554x3449 at origin [9108,5295]
   129.1109 pipe cache get                [export]         diffuse.1              IOP_CS_RGB line  1( 2) at 0x75a931a4c040. hash=af0f78c8d1063851
   129.1112 process tiled             CL0 [export]         diffuse.1              (   0/   0) 11662x8744 scale=1.0000 --> (   0/   0) 11662x8744 scale=1.0000  35 IOP_CS_RGB
   129.1112 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse.1' for image with size 11662x8744 --> 11662x8744
   129.1112 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 5857x4392
   129.1112 [default_process_tiling_cl_ptp] [export] (3x3) tiles with max dimensions 5856x4392, pinned=OFF, good 5728x4264 and overlap 64
   129.1112 [default_process_tiling_cl_ptp] [export] tile (0,0) size 5856x4392 at origin [0,0]
...
   174.1183 [default_process_tiling_cl_ptp] [export] tile (2,2) size 206x216 at origin [11456,8528]

Export time for the GFX100S image was ~85 s.

Notice the messages with 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024. Of the ~19 MPx in a full-size tile, only ~5.4 MPx were useful; the rest is overlap that had to be recomputed over and over in neighbouring tiles. More GPU memory would have meant larger tiles and much faster processing.
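If you want to check this on your own logs, here is a rough Python sketch that pulls the numbers out of such a line and works out the useful fraction of a tile. The regular expression is only matched against the log format quoted in this post (other darktable versions may print it differently), and the function name is my own.

```python
import re

# Matches the tiling summary line as it appears in the log excerpts above.
TILE_RE = re.compile(
    r"\((\d+)x(\d+)\) tiles with max dimensions (\d+)x(\d+), "
    r"pinned=\w+, good (\d+)x(\d+) and overlap (\d+)"
)


def tile_stats(line):
    """Return tile geometry and the useful fraction, or None if no match."""
    m = TILE_RE.search(line)
    if m is None:
        return None
    nx, ny, tile_w, tile_h, good_w, good_h, overlap = map(int, m.groups())
    tile_px = tile_w * tile_h  # pixels processed in a full-size tile
    good_px = good_w * good_h  # pixels that end up in the output
    return {
        "tiles": nx * ny,
        "tile_mpx": tile_px / 1e6,
        "good_mpx": good_px / 1e6,
        "useful_fraction": good_px / tile_px,
        "overlap": overlap,
    }


line = (
    "90.9632 [default_process_tiling_cl_ptp] [export] (4x5) tiles with max "
    "dimensions 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024"
)
stats = tile_stats(line)
print(
    f"{stats['tiles']} tiles, {stats['tile_mpx']:.1f} MPx each, "
    f"{stats['good_mpx']:.1f} MPx useful ({stats['useful_fraction']:.0%})"
)
```

For the line above this gives 20 tiles with roughly 28% of each full-size tile being useful. Edge tiles are smaller, so treat it as an estimate for the full-size tiles only.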

As @kofa said, if you want darktable speed, a graphics card with as much VRAM as you can afford is the ticket.

I’m processing Z7ii images (45 MP) on a Haswell i5 with a 4 GB Nvidia 1050 Ti, and while it’s not fast, it is quite acceptable.

I have the same Nvidia card here, in an ancient PC from 2012 with an Ivy Bridge i5 and 16 GB of DDR3 RAM; such machines do not die easily. My photos are 24 MP, and processing speed in darktable is surprisingly acceptable, both on Windows and during my Ubuntu period. Even RT is still considerably faster than on a fancy three-year-old laptop with an i7, 16 GB DDR4, etc.

This might or might not be of interest: processing with just an i7 CPU, as I don’t have a graphics card. My i7 is several years old now: Intel Core i7-6700K CPU @ 4.00GHz × 8

Smaller file:
114 s to complete the export. The system used a maximum of 14.7 GB (watching System Monitor) and was using 4.5 GB after DT finished exporting.

Larger file:
315 s to complete the export. The system used a maximum of 26.0 GB (watching System Monitor) and was using 4.5 GB after DT finished exporting. I don’t think I’ve ever seen 26 GB used on my system before!

System: Ubuntu 22.04, CPU as above, 32 GB memory