Iterating
Iteration is the portion of the algorithm where the most time is spent. Any possible optimization that can be taken, should be taken within the innermost loops. Ember optimized every piece to the maximum possible extent.
Incremental rendering
Ember is designed to be run from the command line, or from the interactive GUI, Fractorium. To facilitate the latter, the rendering process keeps state information about its progress. This is done so that it can be aborted in mid-render, and resumed later on. It also serves to ensure the minimum amount of processing is performed in response to a change in the parameters being rendered.
For example, if a render completes and the user only wants to change the vibrancy, then only final accumulation needs to be ran again. Another feature is that if a render completes and the user increases the quality, all previous iteration information is preserved in the histogram and the new iterations for the quality difference are simply added to them.
The downside of this design is that it admittedly butchers the structure of the main rendering function with numerous conditionals. The cost is worth it as the state-preserving design greatly facilitates interactive rendering from Fractorium.
Iterators
Depending on the parameters of the flame being rendered, different calculations take place in the main iteration loop. Rather than test for these conditions on every iteration, separate loops are implemented for the different combinations of cases.
The parameters that affect these loops are whether or not the following are present: xaos, 3D and final xform.
To account for all possible combinations of these, 8 different loops are present and the iterator will chose the correct one to use.
Fusing
Academic IFS descriptions generally state that the first 20 iterations of the process are not plotted in order to get a more concentrated image with less stray points. This is somewhat misleading, since different implementations use different values. Ember uses a default value of 15 like most implementations, but it differs in that it allows the value to be changed by the user and saved to Xml. In practice, this is almost never needed since 15 fuse iterations works well.
Further deviating from academic descriptions, fusing is not just done at the very beginning of the process. Rather, all iteration is broken up into chunks, or sub batches, of 10,000. At the beginning of each sub batch, the point trajectory is reset, and fused again. Assuming a fuse value of 15 and a sub batch size of 10,000, fusing takes place for 0.0015 of the total iterations.
Fractorium allows the user to adjust the sub batch size, although values other than 10,000 rarely make any difference.
For each iteration, flam3 checks to see whether fusing is done yet, if so, the point is plotted. This is wasteful to check for something every iteration that occurs so infrequently. In the interest of maximum efficiency, Ember splits all iteration up into two identical loops, one with fusing and one without. This reduces the number of conditional checks needed.
A further optimization opportunity is that when a final xform is present, it need not be applied during fusing because the computed point is not used.
Point assignment
The general structure of an IFS iteration loop looks like:
p2 = xform(p1) save p2 to temp buffer for plotting later p1 = p2
For the case of no 3D projection and no final xform, this was optimized to omit the assignment from p2 back to p1 for the start of the next iteration. The buffer is read from and written to directly, alleviating the need for temporary assignments. It roughly takes the form of:
i = 0; while (i less than SubBatchSize) { xform(buf[i], buf[i + 1]); i++; }
When 3D projection is used or final xforms are present, temporary assignments are used.
Xform selection
Before iterating begins, a buffer of 16,383 elements is created. All xform weights are normalized and the elements of the buffer are populated with xform indices whose distribution is proportional to the xform weights.
For example, given an Ember with 4 xforms, each with a weight of 1, their normalized weights would each become 0.25. The random selection buffer would then be populated like so:
buf[0..4095] = 0 buf[4096..8191] = 1 buf[8192..12287] = 2 buf[12288..16383] = 3
To select a random xform, retrieve the next random unsigned integer from ISAAC and perform a bitwise AND with 16,383. The value at that index in the buffer is the index of the next xform to use.
Note the usage of the number 16,383. This is an optimization that allows selecting a new index by using a bitwise AND which is much faster than using modulo, as is done in other implementations.
Since it is unlikely that more than 256 xforms will be used, the elements were made to be 1 byte each in an effort to make the buffer fit into the cache better.