It depends on module settings: a large radius in diffuse or sharpen can force tiling even with 6 GB. I’ve already posted that above.
Yes, for sure 8MP and beyond. I had a system with an i5-2500 (non-K) CPU, which I replaced in 2018 due to general slowness. Anything you build today with 2nd-generation specs and 16GB of memory will be more than enough to handle darktable. Just place the build emphasis on the GPU and not the CPU.
Not all 4 GB are used. Depending on settings, darktable may use about 70% of it. If you know you won’t use other programs while using dt, you can adjust your settings to use more of the memory.
I found the reason for the tiling. My GeForce GTX 750, while supporting up to 4GB, only has 1GB of memory. So tiling is inevitable.
If I set ‘use all device memory’ in the OpenCL settings, I get a nice speedup. Without that option, the GPU is almost as slow as the CPU.
Maybe it’s a good time to upgrade that gpu.
I know I’m a little late here, but I recently built a new desktop and took a chance with an Intel Arc GPU in an AMD system (the “abomination” build).
I used an Intel A770, as it was the most economical 16GB card. Of the three GPU makers, Intel seems to support open-source Linux drivers the best. Once I figured out all the packages that needed installing on Fedora, everything has been fantastic.
The A770 may be overkill for darktable (I use it for the latest Flight Simulator*) but good Linux support was a strong motivation for trying an Intel GPU.
edit to add: Using Linux to play a Microsoft game just makes me happy.
Darktable still leaves about 600 MB unused (‘headroom’), so you’re only using about 400 MB of GPU memory, which is almost nothing. See darktable 4.6 user manual - memory & performance tuning
With a card that has 4 or 6 GB and the default settings, meaning 70% of (video RAM − 600 MB headroom), you’d have way more available. For example, on a GPU with 6 GB of memory, darktable will use approximately (6 − 0.6) × 700 / 1024, or about 3.7 GB of GPU RAM, when using the resource_default level.
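As a sketch, that arithmetic can be written out for a few card sizes. The fraction of 700/1024 for the default resource level is an assumption taken from the manual’s example; verify it against your darktable version:

```python
# Approximate how much GPU memory darktable will use, following the
# manual's formula: (VRAM - headroom) * fraction / 1024.
# The fraction (700 for resource_default here) is an assumption from the
# manual's example, not read from any darktable source.
HEADROOM_MB = 600  # default headroom darktable leaves unused

def usable_vram_mb(total_vram_mb, fraction=700):
    """Memory darktable will use, in MB; never negative."""
    return max(0.0, (total_vram_mb - HEADROOM_MB) * fraction / 1024)

for vram in (1024, 4096, 6144):
    print(f"{vram:5d} MB card -> {usable_vram_mb(vram):.0f} MB usable")
```

A 1 GB card comes out at under 300 MB usable, which matches the tiling experience above, while a 6 GB card gets roughly 3.7 GB.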
@kofa , what are your recommended settings for a 6 GB card?
For 6 MB, possibly not use the GPU at all?! *scnr*
But the manual gives some advice. You can play around and test the “large” profile, for example. If you don’t have too much other stuff running on the GPU, it may give better performance. It all depends on what else is going on…
Oh! that was a typo, corrected it to read 6 GB
I’ve set opencl_mandatory_timeout=20000 in darktablerc. My scheduling profile is ‘very fast GPU’.
Here are the OpenCL-related settings from my darktablerc. As you’ll see, most of them are the defaults; I’ve only changed a couple of them.
# I used to tune this, a few % can be gained; this below is the default value.
# Read the docs and experiment (measure) to tweak.
cldevice_v5_nvidiacudanvidiageforcegtx10606gb=0 250 0 16 16 128 0 0 0.000 0.000 0.250
cldevice_v5_nvidiacudanvidiageforcegtx10606gb_building=-cl-fast-relaxed-math
# I think this is the headroom, in MB;
# default is cldevice_v5_nvidiacudanvidiageforcegtx10606gb_id0=600
cldevice_v5_nvidiacudanvidiageforcegtx10606gb_id0=400
clplatform_amdacceleratedparallelprocessing=TRUE
clplatform_apple=FALSE
clplatform_intelropenclhdgraphics=TRUE
clplatform_nvidiacuda=TRUE
clplatform_openclon12=FALSE
clplatform_other=FALSE
clplatform_rusticl=FALSE
darkroom/mouse/middle_button_cycle_zoom_to_200_percent=TRUE
opencl=TRUE
# default is opencl_device_priority=*/!0,*/*/*/!0,*,
# but it's only taken into account
# if 'opencl_scheduling_profile=default', and my
# opencl_scheduling_profile=very fast GPU setting below
# is basically the same as +0/+0/+0/+0/+0 (everything on the first card)
opencl_device_priority=+0/+0/+0/+0/+0
opencl_library=
# default is opencl_mandatory_timeout=400, in units of 50 ms
# (max. time to wait for GPU, then fall back to CPU)
opencl_mandatory_timeout=20000
# default is opencl_scheduling_profile=default
opencl_scheduling_profile=very fast GPU
opencl_tune_headroom=false
Finally, I have resourcelevel=large (you can set that from preferences). I have 64 GB of RAM.
For that long, cryptic setting, I used to have 64 64 1024 1 and 16 16 1024 0 at some point (instead of the current defaults of 16 16 250 0). The 16 16 (or 64 64) means (see darktable 4.9 user manual - memory & performance tuning):
d. clroundup wh / e. clroundup ht
These parameters should be left at this default value – testing has not shown any benefit to using other values.
(I think 64 gave me a few percent improvement, but certainly not something one would actually feel; I thought at the time I was able to measure it.)
The 250 (or 1024) is:
f. number of event handles
default 128
[…] On most current devices and drivers you can expect a number of up to 1024 to be safe (for sure if your driver / card reports OpenCL V.2.0 or larger) leading to a slightly better OpenCL performance. If your driver runs out of free handles you will experience failing OpenCL kernels with error message CL_OUT_OF_RESOURCES, or even crashes or system freezes.
I never had issues with 1024, so I’ll restore that, thanks for asking for my settings.
Finally, the 0 or 1:
g. asynchronous mode
1 = use asynchronous mode; 0 = don’t use (default)
[…] For optimum latency set this to 1 […] If you experience OpenCL errors like failing kernels, reset the parameter to 0 […] Issues have been reported with some older AMD/ATI cards (like the HD57xx) which can produce garbled output if this parameter is set to 1. If in doubt, leave it at its default of 0.
Since these are easy to test: quite some time ago I set micronap to 0 from its default of 250. It gave a reasonable boost, and I have no crashes. There is some comment in the docs about keeping the GPU busy, and I think these pauses let it catch up if needed. I’m not sure, but you could try it and see if you notice any improvement. I had 4 or 5 standard images and stacks that I used for testing, ranging from 5–6 seconds to complete up to a torture test at 30 or so seconds, and all were faster with micronap at zero. For sure you would have to test that there are no issues, likely on a system-by-system basis. I have a 3060 Ti and a 12th-gen Intel CPU with 32 GB of DDR5 and NVMe drives, so maybe that also helps with performance and keeps this from being an issue.
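If you want to do the same kind of before/after comparison, a tiny harness like this can summarize the timings. All numbers here are placeholders, not real measurements; you would collect yours by timing exports on a fixed set of test images with each darktablerc setting:

```python
# Compare per-image export times before and after a darktablerc tweak
# (e.g. micronap 250 -> 0). The timing lists below are made-up
# placeholders, not measurements from any real system.
from statistics import mean

def speedup_percent(before_s, after_s):
    """Percent reduction of the mean 'after' time vs the mean 'before' time."""
    b, a = mean(before_s), mean(after_s)
    return (b - a) / b * 100.0

before = [5.8, 6.1, 5.5, 30.2]  # seconds per image, illustrative
after = [5.2, 5.6, 5.0, 27.1]
print(f"mean speedup: {speedup_percent(before, after):.1f}%")
```

Averaging over several images (including one torture test) smooths out run-to-run noise better than eyeballing a single export.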
On the subject of PC building, it seems rarer than hen’s teeth nowadays to find a decent full / midi tower ATX case with a decent complement of 5.25" and 3.5" bays for mounting optical DVD writer drives and bay-mounted USB card readers.
Most PC cases seem to be focusing on “cool” looks, for “1337 gamers”, rather than function…
My old Raijintek “Arcadia” case I’m finding somewhat limiting now, due to running out of space for drives, as well as the MSI B550M micro-ATX motherboard running out of SATA ports. But when I bought that case (shown below), a couple of motherboards and CPUs ago, it had a decent design for a functional case!
There seem to be quite a few.
https://www.digitec.ch/en/s1/tag/cases-524?filter=pt%3D77%2C5172%3D2_38|3_38|4_38|5_38|6_38%2C5173%3D7_38|6_38|5_38|4_38|3_38&filterGrid=expanded
My son has recommended this page to me: pcpartpicker.com. In particular, he’s looking at a smaller build for gaming, but you’ll probably find larger cases there: https://pcpartpicker.com/guide/NtFfrH/entry-level-amd-gaming-build
Their case filter: https://pcpartpicker.com/products/case/
There are many cases built for function rather than design, but external 5.25" and 3.5" bays seem to be slowly going extinct.
The case I currently use has many internal slots for 3.5" and 2.5" drives but not a single 5.25" nor 3.5" external slot. Instead, you can mount up to three fans in the front.
PCPartPicker says it knows of only 1360 cases with at least 2x 5.25" external bays and 4x 3.5" internal bays.
I did not say that there are none.
It appears to be a trend to not have them anymore.
I guess they need the space because modern GPUs can be very large and dissipate a lot of heat, thus you need a lot of fans as well.
If you are dissipating a lot of heat, you need to think about the airflow within the case. A common setup is to have the air intake at the bottom (fans blow in) and to evacuate the air at the back (fans blow out). But there needs to be a path for the air to move.
While a lot of cases allow multiple fans at the top, it does not always make sense, as they may just take the airflow from the fans at the back. Remember that airflow in has to equal airflow out.
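The in-equals-out rule can be sanity-checked with a toy balance calculation. The CFM figures below are illustrative, not from any real fan’s spec sheet, and static pressure is ignored entirely:

```python
# Toy check of a fan layout: compare summed rated intake airflow against
# exhaust. A positive result means an intake-biased (positive-pressure)
# case; negative means exhaust-biased. CFM values here are made up.
def net_pressure_cfm(intake, exhaust):
    """Net case pressure bias in CFM (positive = intake-biased)."""
    return sum(intake) - sum(exhaust)

intake_cfm = [55.0, 55.0]   # two front fans blowing in
exhaust_cfm = [60.0, 45.0]  # rear + top fan blowing out

net = net_pressure_cfm(intake_cfm, exhaust_cfm)
print(f"{'positive' if net > 0 else 'negative'} pressure ({net:+.0f} CFM)")
```

A slight positive bias is often preferred because it pushes dust out through the gaps rather than sucking it in, but that is a rule of thumb, not physics from this calculation.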
You can find a lot of resources about this online, but ultimately it makes sense to experiment (set up the case, put on a heavy load, measure temperatures; then change the setup and try again). I build PCs for numerical computations, and found that a bit of experimenting can easily make an 8–12 °C difference within the case and decrease fan noise significantly.
Also, remember to clean heat sinks from time to time. I do it once a year, approximately.
Yes, that’s what I was referring to. My guess is that case builders make space for air-paths and remove stuff that is in the way - such as optical drives.
There are indeed many resources online to learn about fan arrangement and also different types of fans. At the moment I have a front-to-back airflow, with two fans in the front and one in the back. For me, it works very well and CPU and GPU keep silent and cool.
Indeed some make claims about this, but my experience with them is mixed at best. I found no difference between plain-vanilla cases that just have empty space and those that imagine they control the internal air path (and ask a higher price because of it). YMMV. I think size is key (don’t crowd the case), along with more slow fans, so that flow tends to be laminar instead of turbulent.
5.25 bays are usually on top/front, where they are usually not obstructing much. I think it is simply demand falling for this type. That said, there are still a zillion cases that have them.