GPU Performance
The GPU implementation of the algorithm produces the same results as the CPU implementation, but it works differently internally. GPU hardware also has different strengths and weaknesses than CPU hardware, and each hardware vendor has its own pros and cons. For these reasons, it’s impossible to give an exact CPU-to-GPU performance ratio. However, some general guidelines can help you get the most out of your GPU.
General system performance
Just as on the CPU, the fractal flame algorithm is very taxing on the GPU. Because the GPU is also managing your display, the system can appear to freeze while rendering. Unlike on the CPU, there is no way to limit the number of threads or change their priority on the GPU. As a general rule, if you are going to do a heavy render on the GPU, let it run while you step away from your machine.
Running other programs that compete for GPU resources, such as a browser with many tabs open, will slow down rendering. It’s best to close all graphics-intensive programs before running a GPU render.
Hardware
The GPU implementation of the algorithm exploits features found on cards manufactured by AMD and Nvidia from 2011 onward, specifically post-GCN cards for AMD and post-Fermi cards for Nvidia. Older cards are not supported. Using one can give slower performance than the CPU, can crash the program and, in the worst case, can crash the entire operating system.
Some users have reported problems with laptops using Nvidia Optimus technology on Windows, whereas others have reported success with it on Linux. Because the results are inconsistent, this hardware is unsupported.
Compilation
The code that runs on the GPU is compiled at run-time, which causes the program to pause briefly before a render begins. Recompilation is triggered when a flame differs in certain ways from the previously rendered flame. Any of the following differences between consecutive renders will trigger it:
- Xform count
- Final xform presence
- Xaos presence
- Palette accumulation mode (step or linear)
- Xform post affine presence
- Variation count
- Variation type
Flames with more xforms or variations, or with a final xform, take longer to compile because more code needs to be processed.
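The recompilation rule above can be pictured as comparing a small "key" built from the listed properties. The following is only an illustrative sketch, not Fractorium's actual code: the `KernelKey` class, its field names, and `needs_recompile` are all hypothetical. Properties that are not in the list, such as variation weights, are deliberately left out of the key, so changing them would not force a recompile in this model.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class KernelKey:
    """Hypothetical summary of the flame properties listed above."""
    xform_count: int
    has_final_xform: bool
    has_xaos: bool
    palette_mode: str  # "step" or "linear"
    post_affine: Tuple[bool, ...]  # post affine presence, one flag per xform
    variations: Tuple[Tuple[str, ...], ...]  # variation names per xform (count and type)


def needs_recompile(previous: Optional[KernelKey], current: KernelKey) -> bool:
    """Return True when nothing was compiled yet or any keyed property changed."""
    return previous is None or previous != current


base = KernelKey(
    xform_count=2,
    has_final_xform=False,
    has_xaos=False,
    palette_mode="step",
    post_affine=(False, False),
    variations=(("linear",), ("spherical", "swirl")),
)
unchanged = replace(base)                          # same structure as before
with_final = replace(base, has_final_xform=True)   # a final xform was added
swapped = replace(base, variations=(("linear",), ("spherical", "julia")))

print(needs_recompile(None, base))        # first render: True
print(needs_recompile(base, unchanged))   # False
print(needs_recompile(base, with_final))  # True
print(needs_recompile(base, swapped))     # True
```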
Precision
While the difference in performance between single precision (SP) and double precision (DP) on the CPU is negligible (and thus DP should always be used there), the difference on the GPU is much more pronounced. Most GPU reviews and ratings report SP performance, since that is what most games require. DP performance is usually given as a fraction of SP performance. For example, 1/8 DP means that DP calculations are one eighth as fast as SP calculations. Check your hardware specs to see what this ratio is.
Some higher-end cards can render using DP without much of a performance loss, so in these cases DP is the obvious choice. However, if your card is much slower with DP, either switch it to SP or use DP on the CPU.
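To make the ratio concrete, here is a small arithmetic sketch. The function names and the SP figure of 8.0 are made up for illustration and do not describe any particular card. The result is only a rough upper bound on raw arithmetic speed; actual render times also depend on memory access and other factors.

```python
from fractions import Fraction


def estimated_dp_throughput(sp_throughput: float, dp_ratio: Fraction) -> float:
    """Scale a card's SP throughput by its published DP:SP ratio."""
    return sp_throughput * float(dp_ratio)


def dp_slowdown(dp_ratio: Fraction) -> Fraction:
    """How many times slower DP arithmetic is than SP arithmetic."""
    return 1 / dp_ratio


ratio = Fraction(1, 8)   # a "1/8 DP" card
sp = 8.0                 # hypothetical SP throughput, in any consistent unit

print(estimated_dp_throughput(sp, ratio))  # 1.0
print(dp_slowdown(ratio))                  # 8
```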
Density filtering
Larger supersample values slow down density filtering considerably on the CPU, and the slowdown is even more pronounced on the GPU. In low-quality renders, density filtering can take longer than the iteration phase.
Nvidia
Nvidia cards have severe problems with max density filter widths of 9 or greater. Such values will cause the render to abort, in which case you need to restart the render with a smaller value. There is no known fix for this problem. The only workarounds are to use a smaller filter width, or to set it to zero to turn density filtering off altogether. This is one of the reasons AMD cards are preferred for Fractorium.
Sub batch percent per thread per kernel launch in OpenCL
This specifies how much work, as a percentage of a sub batch, each GPU thread does per kernel launch. The default value is 0.025, which causes each thread to do 256 iterations per kernel launch when using the default sub batch size of 10k. Increasing this value to 0.2 or above can give a 1% speed increase, at the risk of crashing the GPU driver if too much work is requested. Adjust this value with caution, especially if you have a slower GPU.
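The arithmetic behind the default can be sketched as follows. This assumes that "10k" means 10,240 iterations and that the value is applied as a plain multiplier (so 0.025 corresponds to 2.5% of a sub batch). Both are assumptions made to reproduce the figure of 256, and the function name is hypothetical. The rounding used here is also an assumption about how a fractional result would be handled.

```python
def iters_per_thread_per_launch(sub_batch_size: int, sub_batch_percent: float) -> int:
    """Iterations each GPU thread performs in one kernel launch (illustrative)."""
    return int(round(sub_batch_size * sub_batch_percent))


DEFAULT_SUB_BATCH_SIZE = 10_240  # assumed meaning of "10k"

print(iters_per_thread_per_launch(DEFAULT_SUB_BATCH_SIZE, 0.025))  # 256
print(iters_per_thread_per_launch(DEFAULT_SUB_BATCH_SIZE, 0.2))    # 2048
```

Under these assumptions, raising the value from 0.025 to 0.2 makes each thread do eight times as much work per launch, which is why a slow GPU may hit the driver's time limit.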
Multi-GPU
Fractorium supports rendering with more than one GPU at a time. This will give a performance increase in most cases. However, there is some overhead associated with tasking multiple GPUs and combining their data once the process is done, so it’s best to use multiple GPUs only for high-quality renders. Using them for the low-quality interactive editor is not advised.