I’m processing 42MP raw images using Darktable on an older laptop with these specifications:
- CPU: Intel(R) Core(TM) i7 CPU Q 740 @ 1.73GHz
- GPU: NVIDIA Corporation GF108M [GeForce GT 425M] (rev a1)
Ubuntu Studio 18.04 installed the open-source Nouveau driver by default, but exporting images took a long time. So I installed the Ubuntu binary driver (nvidia-384 390.116-0ubuntu0.18.04.1). And guess what: exporting images took even longer, by about 50%. Why, how?
Maybe this CPU is relatively fast (I’m running 8 threads, although that throttles the chip considerably because of the heat it produces). Or is the GT 425M rather slow? Or is there too little dedicated video memory (1 GB onboard)? When exporting an image with OpenCL support, only a single thread runs on the CPU (except when a module is not supported in OpenCL).
$ darktable -d perf 2>/dev/null dsc00801.arw
12,245726 [dev] took 0,000 secs (0,000 CPU) to load the image.
12,390377 [export] creating pixelpipe took 0,143 secs (0,181 CPU)
12,438697 [dev_pixelpipe] took 0,048 secs (0,134 CPU) initing base buffer [export]
12,508381 [dev_pixelpipe] took 0,070 secs (0,056 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
12,552541 [dev_pixelpipe] took 0,044 secs (0,033 CPU) processed `white balance' on GPU, blended on GPU [export]
12,592998 [dev_pixelpipe] took 0,040 secs (0,031 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
13,527287 [dev_pixelpipe] took 0,934 secs (0,625 CPU) processed `demosaic' on GPU with tiling, blended on CPU [export]
14,221728 [dev_pixelpipe] took 0,694 secs (0,523 CPU) processed `base curve' on GPU with tiling, blended on CPU [export]
14,675627 [dev_pixelpipe] took 0,454 secs (0,424 CPU) processed `input color profile' on GPU with tiling, blended on CPU [export]
15,707754 [dev_pixelpipe] took 1,032 secs (0,840 CPU) processed `crop and rotate' on GPU with tiling, blended on CPU [export]
101,875460 [dev_pixelpipe] took 86,168 secs (58,746 CPU) processed `denoise (non-local means)' on GPU with tiling, blended on CPU [export]
102,680297 [dev_pixelpipe] took 0,805 secs (0,628 CPU) processed `sharpen' on GPU with tiling, blended on CPU [export]
103,370905 [dev_pixelpipe] took 0,691 secs (0,521 CPU) processed `output color profile' on GPU with tiling, blended on CPU [export]
103,474120 [dev_pixelpipe] took 0,103 secs (0,781 CPU) processed `gamma' on CPU, blended on CPU [export]
103,474459 [dev_process_export] pixel pipeline processing took 91,084 secs (63,343 CPU)
[export_job] exported to `dsc00801.jpg'
$ darktable --disable-opencl -d perf 2>/dev/null dsc00801.arw
21,905118 [dev] took 0,000 secs (0,000 CPU) to load the image.
22,053316 [export] creating pixelpipe took 0,146 secs (0,183 CPU)
22,102107 [dev_pixelpipe] took 0,049 secs (0,108 CPU) initing base buffer [export]
22,131318 [dev_pixelpipe] took 0,029 secs (0,076 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
22,158311 [dev_pixelpipe] took 0,027 secs (0,091 CPU) processed `white balance' on CPU, blended on CPU [export]
22,178494 [dev_pixelpipe] took 0,020 secs (0,120 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
22,848160 [dev_pixelpipe] took 0,670 secs (4,030 CPU) processed `demosaic' on CPU, blended on CPU [export]
23,051582 [dev_pixelpipe] took 0,203 secs (1,040 CPU) processed `base curve' on CPU, blended on CPU [export]
23,208922 [dev_pixelpipe] took 0,157 secs (1,164 CPU) processed `input color profile' on CPU, blended on CPU [export]
24,867515 [dev_pixelpipe] took 1,659 secs (13,088 CPU) processed `crop and rotate' on CPU, blended on CPU [export]
79,649894 [dev_pixelpipe] took 54,782 secs (430,211 CPU) processed `denoise (non-local means)' on CPU, blended on CPU [export]
80,092760 [dev_pixelpipe] took 0,443 secs (3,198 CPU) processed `sharpen' on CPU, blended on CPU [export]
80,409923 [dev_pixelpipe] took 0,317 secs (2,397 CPU) processed `output color profile' on CPU, blended on CPU [export]
80,515835 [dev_pixelpipe] took 0,106 secs (0,752 CPU) processed `gamma' on CPU, blended on CPU [export]
80,515892 [dev_process_export] pixel pipeline processing took 58,463 secs (456,276 CPU)
[export_job] exported to `dsc00801.jpg'
That is 91 seconds using the GPU and only 58.5 on the CPU, so the GPU run took about 56% longer. That is rather close to the 50% difference I found by looking at the file timestamps.
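To see at a glance which module dominates a run, the perf lines can be parsed with a few lines of Python. This is only a sketch: the regular expression is my own reading of the log format shown above, and `parse_perf` and `to_float` are names made up for this example, not part of Darktable.

```python
import re

# Matches perf lines such as:
#   13,527287 [dev_pixelpipe] took 0,934 secs (0,625 CPU) processed `demosaic' on GPU with tiling, ...
# The pattern is an assumption based on the logs in this post.
LINE_RE = re.compile(
    r"took (?P<wall>\d+[.,]\d+) secs \((?P<cpu>\d+[.,]\d+) CPU\) "
    r"processed `(?P<module>[^']+)' on (?P<device>CPU|GPU)"
)


def to_float(text):
    """Convert '86,168' or '86.168' to a float (the decimal mark depends on the locale)."""
    return float(text.replace(",", "."))


def parse_perf(log_text):
    """Return a list of (module, device, wall_seconds, cpu_seconds) tuples."""
    rows = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append(
                (m["module"], m["device"], to_float(m["wall"]), to_float(m["cpu"]))
            )
    return rows


# Three lines copied from the GPU run on the laptop.
SAMPLE = """\
13,527287 [dev_pixelpipe] took 0,934 secs (0,625 CPU) processed `demosaic' on GPU with tiling, blended on CPU [export]
101,875460 [dev_pixelpipe] took 86,168 secs (58,746 CPU) processed `denoise (non-local means)' on GPU with tiling, blended on CPU [export]
103,474120 [dev_pixelpipe] took 0,103 secs (0,781 CPU) processed `gamma' on CPU, blended on CPU [export]
"""

rows = parse_perf(SAMPLE)
slowest = max(rows, key=lambda row: row[2])  # row[2] is the wall-clock time
print(f"slowest module: {slowest[0]} on {slowest[1]}, {slowest[2]} s")
```

Fed with a complete log, the same function shows that `denoise (non-local means)` accounts for nearly all of the export time in both runs.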
Same tests on my development workstation (arie@quercus:/development/perftest/RAW.saves) with:
- CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
- GPU: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
37.724091 [dev] took 0.000 secs (0.000 CPU) to load the image.
37.789180 [export] creating pixelpipe took 0.064 secs (0.082 CPU)
37.819754 [dev_pixelpipe] took 0.030 secs (0.068 CPU) initing base buffer [export]
37.842631 [dev_pixelpipe] took 0.023 secs (0.055 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
37.863322 [dev_pixelpipe] took 0.021 secs (0.096 CPU) processed `white balance' on CPU, blended on CPU [export]
37.880317 [dev_pixelpipe] took 0.017 secs (0.109 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
38.336162 [dev_pixelpipe] took 0.456 secs (2.151 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
38.425399 [dev_pixelpipe] took 0.089 secs (0.471 CPU) processed `base curve' on CPU with tiling, blended on CPU [export]
38.492039 [dev_pixelpipe] took 0.067 secs (0.476 CPU) processed `input color profile' on CPU with tiling, blended on CPU [export]
39.098752 [dev_pixelpipe] took 0.607 secs (4.769 CPU) processed `crop and rotate' on CPU with tiling, blended on CPU [export]
64.361412 [dev_pixelpipe] took 25.263 secs (188.899 CPU) processed `denoise (non-local means)' on CPU with tiling, blended on CPU [export]
64.751435 [dev_pixelpipe] took 0.390 secs (2.610 CPU) processed `sharpen' on CPU with tiling, blended on CPU [export]
64.894855 [dev_pixelpipe] took 0.143 secs (1.058 CPU) processed `output color profile' on CPU with tiling, blended on CPU [export]
64.943112 [dev_pixelpipe] took 0.048 secs (0.335 CPU) processed `gamma' on CPU, blended on CPU [export]
64.943129 [dev_process_export] pixel pipeline processing took 27.154 secs (201.097 CPU)
Ah no, that run did not use the GPU at all, because OpenCL failed to initialise:
0.063212 [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
That is fixed now, so here is the GPU run:
9.071683 [export] creating pixelpipe took 0.063 secs (0.082 CPU)
9.100971 [dev_pixelpipe] took 0.029 secs (0.081 CPU) initing base buffer [export]
9.118881 [dev_pixelpipe] took 0.018 secs (0.017 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
9.125964 [dev_pixelpipe] took 0.007 secs (0.001 CPU) processed `white balance' on GPU, blended on GPU [export]
9.136018 [dev_pixelpipe] took 0.010 secs (0.002 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
9.168534 [dev_pixelpipe] took 0.033 secs (0.012 CPU) processed `demosaic' on GPU, blended on GPU [export]
9.205392 [dev_pixelpipe] took 0.037 secs (0.011 CPU) processed `base curve' on GPU, blended on GPU [export]
9.243288 [dev_pixelpipe] took 0.038 secs (0.021 CPU) processed `input color profile' on GPU, blended on GPU [export]
9.278088 [dev_pixelpipe] took 0.035 secs (0.022 CPU) processed `crop and rotate' on GPU, blended on GPU [export]
14.503898 [dev_pixelpipe] took 5.226 secs (3.976 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]
14.608538 [dev_pixelpipe] took 0.105 secs (0.024 CPU) processed `sharpen' on GPU, blended on GPU [export]
14.666160 [dev_pixelpipe] took 0.058 secs (0.028 CPU) processed `output color profile' on GPU, blended on GPU [export]
14.884968 [dev_pixelpipe] took 0.219 secs (0.377 CPU) processed `gamma' on CPU, blended on CPU [export]
14.885048 [dev_process_export] pixel pipeline processing took 5.813 secs (4.572 CPU)
Wow: 5.8 seconds with the GPU instead of 27 on the CPU alone. On this workstation the graphics card does make a difference!
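The two machines can be put side by side with a little arithmetic. The sketch below uses only the totals from the `[dev_process_export]` lines above; the dictionary layout and the name `gpu_vs_cpu` are just my choices for this illustration.

```python
# Total pixel pipeline times in seconds, copied from the
# "[dev_process_export] pixel pipeline processing took ..." lines.
runs = {
    "laptop (i7 Q 740 / GT 425M)": {"cpu": 58.463, "gpu": 91.084},
    "workstation (i7-4790K / GTX 970)": {"cpu": 27.154, "gpu": 5.813},
}


def gpu_vs_cpu(times):
    """Return GPU time divided by CPU time; above 1 means the GPU run was slower."""
    return times["gpu"] / times["cpu"]


for name, times in runs.items():
    ratio = gpu_vs_cpu(times)
    if ratio > 1:
        print(f"{name}: GPU export took {100 * (ratio - 1):.0f}% longer than CPU")
    else:
        print(f"{name}: GPU export was {1 / ratio:.1f}x faster than CPU")
```

On the laptop the GPU export comes out about 56% slower, on the workstation about 4.7 times faster.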
The same photo export on an HP EliteBook 8540w. I didn’t take the time to reinstall the NVIDIA drivers, so only the CPU-based export was run, on:
Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz (quad-core, 8 threads in total)
46,825088 [export] creating pixelpipe took 0,154 secs (0,197 CPU)
46,891087 [dev_pixelpipe] took 0,066 secs (0,234 CPU) initing base buffer [export]
46,929235 [dev_pixelpipe] took 0,038 secs (0,117 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
46,970545 [dev_pixelpipe] took 0,041 secs (0,186 CPU) processed `white balance' on CPU, blended on CPU [export]
47,007366 [dev_pixelpipe] took 0,037 secs (0,242 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
47,728570 [dev_pixelpipe] took 0,721 secs (4,019 CPU) processed `demosaic' on CPU, blended on CPU [export]
47,952034 [dev_pixelpipe] took 0,223 secs (1,196 CPU) processed `base curve' on CPU, blended on CPU [export]
48,120383 [dev_pixelpipe] took 0,168 secs (1,251 CPU) processed `input color profile' on CPU, blended on CPU [export]
50,021348 [dev_pixelpipe] took 1,901 secs (15,037 CPU) processed `crop and rotate' on CPU, blended on CPU [export]
153,420066 [dev_pixelpipe] took 103,399 secs (814,992 CPU) processed `denoise (non-local means)' on CPU, blended on CPU [export]
154,084554 [dev_pixelpipe] took 0,664 secs (4,269 CPU) processed `sharpen' on CPU, blended on CPU [export]
154,464895 [dev_pixelpipe] took 0,380 secs (2,916 CPU) processed `output color profile' on CPU, blended on CPU [export]
154,584284 [dev_pixelpipe] took 0,119 secs (0,894 CPU) processed `gamma' on CPU, blended on CPU [export]
154,584319 [dev_process_export] pixel pipeline processing took 107,759 secs (845,354 CPU)
So 108 seconds instead of 58 on a 1.6 GHz CPU versus a 1.73 GHz one, a chip that can step up to much higher clock speeds. The latter does drop its frequency seriously when all cores are active. BTW, the 1.6 GHz chip is rated at 25 W, the other at 35 W.
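One more number can be squeezed out of the logs: dividing the CPU seconds by the wall-clock seconds of the denoise step gives a rough idea of how many threads were busy. This assumes the "CPU" figure in the perf output is CPU time summed over all threads, which is my reading of the logs and not something I have checked in the Darktable source; `busy_threads` is a name invented for this sketch.

```python
# (wall seconds, CPU seconds) for `denoise (non-local means)' in the CPU-only runs above.
denoise = {
    "i7 Q 740": (54.782, 430.211),
    "i7 Q 720": (103.399, 814.992),
    "i7-4790K": (25.263, 188.899),
}


def busy_threads(wall, cpu):
    """CPU time divided by wall time: the average number of busy threads (assumed meaning)."""
    return cpu / wall


for chip, (wall, cpu) in denoise.items():
    print(f"{chip}: about {busy_threads(wall, cpu):.1f} threads busy on average")
```

All three chips come out between 7 and 8, so both laptops seem to keep their 8 threads equally busy. If that reading is right, the gap between them comes from how fast each thread runs rather than from how many are used.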