Well, the Mac Mini is less capable when it comes to OpenCL, at least as far as memory is concerned:
DEVICE_TYPE: GPU, unified mem
GLOBAL MEM SIZE: 5461 MB
vs
DEVICE_TYPE: CPU, unified mem
GLOBAL MEM SIZE: 40960 MB
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 8192 MB
If you also include -d tiling, you may get more details. For example, on my system (using an Nvidia 1060 with 6 GB):
56.7530 process tiled CL0 [export] diffuse ( 0/ 0) 7728x5152 scale=1.0000 --> ( 0/ 0) 7728x5152 scale=1.0000 34 IOP_CS_RGB
56.7530 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse' for image with size 7728x5152 --> 7728x5152
56.7530 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 3764x5152
56.7530 [default_process_tiling_cl_ptp] [export] (5x1) tiles with max dimensions 3764x5152, pinned=OFF, good 1716x3104 and overlap 1024
56.7530 [default_process_tiling_cl_ptp] [export] tile (0,0) size 3764x5152 at origin [0,0]
...
64.4636 [default_process_tiling_cl_ptp] [export] tile (3,0) size 2580x5152 at origin [5148,0]
66.2384 process tiled CL0 [export] diffuse.1 ( 0/ 0) 7728x5152 scale=1.0000 --> ( 0/ 0) 7728x5152 scale=1.0000 35 IOP_CS_RGB
66.2384 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse.1' for image with size 7728x5152 --> 7728x5152
66.2384 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 4993x5152
66.2384 [default_process_tiling_cl_ptp] [export] (2x1) tiles with max dimensions 4993x5152, pinned=OFF, good 4865x5024 and overlap 64
66.2384 [default_process_tiling_cl_ptp] [export] tile (0,0) size 4993x5152 at origin [0,0]
77.9512 [default_process_tiling_cl_ptp] [export] tile (1,0) size 2863x5152 at origin [4865,0]
That was for the X100VI image, and export time was ~30 s.
90.9632 process tiled CL0 [export] diffuse ( 0/ 0) 11662x8744 scale=1.0000 --> ( 0/ 0) 11662x8744 scale=1.0000 34 IOP_CS_RGB
90.9632 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse' for image with size 11662x8744 --> 11662x8744
90.9632 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 5085x3813
90.9632 [default_process_tiling_cl_ptp] [export] (4x5) tiles with max dimensions 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024
90.9632 [default_process_tiling_cl_ptp] [export] tile (0,0) size 5084x3813 at origin [0,0]
...
128.1086 [default_process_tiling_cl_ptp] [export] tile (3,3) size 2554x3449 at origin [9108,5295]
129.1109 pipe cache get [export] diffuse.1 IOP_CS_RGB line 1( 2) at 0x75a931a4c040. hash=af0f78c8d1063851
129.1112 process tiled CL0 [export] diffuse.1 ( 0/ 0) 11662x8744 scale=1.0000 --> ( 0/ 0) 11662x8744 scale=1.0000 35 IOP_CS_RGB
129.1112 [default_process_tiling_cl_ptp] [export] **** tiling module 'diffuse.1' for image with size 11662x8744 --> 11662x8744
129.1112 [default_process_tiling_cl_ptp] [export] buffer exceeds singlebuffer, corrected to 5857x4392
129.1112 [default_process_tiling_cl_ptp] [export] (3x3) tiles with max dimensions 5856x4392, pinned=OFF, good 5728x4264 and overlap 64
129.1112 [default_process_tiling_cl_ptp] [export] tile (0,0) size 5856x4392 at origin [0,0]
...
174.1183 [default_process_tiling_cl_ptp] [export] tile (2,2) size 206x216 at origin [11456,8528]
Export time for the GFX100S image was ~85 s.
Notice the messages with 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024. Of the ~19 MPx in each full-size tile, only ~5.4 MPx were useful; the rest was overlap that had to be recomputed over and over. More GPU memory would have meant much faster processing.
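To make that arithmetic easy to repeat on your own logs, here is a small Python sketch. It is not part of darktable; the function name tile_efficiency and the regular expression are my own, and the pattern only assumes the line format shown in the -d tiling output above (max dimensions WxH, good WxH, overlap N).

```python
import re

# Matches the summary line printed by -d tiling, e.g.
# "... tiles with max dimensions 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024"
# (pattern is an assumption based on the log lines quoted in this post)
TILE_RE = re.compile(
    r"max dimensions (\d+)x(\d+), pinned=\w+, good (\d+)x(\d+) and overlap (\d+)"
)


def tile_efficiency(line):
    """Return pixel counts and the useful fraction for one tiling summary line.

    Returns None if the line does not look like a tiling summary.
    """
    m = TILE_RE.search(line)
    if m is None:
        return None
    max_w, max_h, good_w, good_h, overlap = map(int, m.groups())
    tile_px = max_w * max_h   # pixels processed per full-size tile
    good_px = good_w * good_h  # pixels that end up in the output
    return {
        "tile_px": tile_px,
        "good_px": good_px,
        "overlap": overlap,
        "useful_fraction": good_px / tile_px,
    }


if __name__ == "__main__":
    sample = (
        "90.9632 [default_process_tiling_cl_ptp] [export] (4x5) tiles with max "
        "dimensions 5084x3813, pinned=OFF, good 3036x1765 and overlap 1024"
    )
    r = tile_efficiency(sample)
    print(f"tile: {r['tile_px'] / 1e6:.1f} MPx, "
          f"good: {r['good_px'] / 1e6:.1f} MPx, "
          f"useful: {r['useful_fraction']:.0%}")
```

For the first diffuse instance on the GFX100S image this gives about 28% useful pixels per tile. Feeding it the diffuse.1 line (max dimensions 5856x4392, good 5728x4264, overlap 64) gives about 95%, which shows how much the overlap of 1024 costs. In all four summary lines above, the good size is simply the max size minus twice the overlap in each dimension.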