In this post I'll be talking about:
* Discovered optimizations for Elm compiler and core library
* Result of smarter update function
* Bool vs custom type
* Size of new implementation
* Future work
Discovered optimizations for Elm compiler and core library
There are two functions which are vital for both the core implementation of Dicts, and my new LLRB implementation: balance and compare.
To improve the performance of balance, I tried manually inlining the Node type constructor in that function. Performance skyrocketed. I went from a 177% performance improvement for inserts, to 209%.
After talking with @ilias on Slack, he suggested that I try the result of manually inlining the A5 calls, so that instead of A5(_Skinney$elm_dict_exploration$LLRB$Node, ...) it would be _Skinney$elm_dict_exploration$LLRB$Node.func(...). I got the same result. Avoiding A* wrappers for function calls and instead going straight through .func can have a huge performance impact for code in general.
@justinmimbs raised a github issue and made me aware of improved performance when --not-- using the compare function in the get and insert functions, and instead using if-statements. This is a little counterintuitive, as compare only compares it's arguments once, while you potentially need to perform two comparisons when doing if-statements. It turns out that this performance increase (which can be significant) is only valid for basic types like ints and strings, while compound types like tuples have slower performance, as the overhead of calling the compare function is more expensive. In any case, I found a way to optimize the compare function by performing less allocation, which should bring a speed improvement no matter what kind of type one uses as keys.
Result of smarter update function.
In my last email, I mentioned a plan for improving the performance of the update function. As a quick reminder, the idea was to use the implementation of remove as a basis for the update function. While update performance became faster than the core dict implementation for cases when the update-fn returns Nothing, update performance for the case where the update-fn returns Just x tanked. Before the change, insert by update was about 10-15% slower than core, while remove by update was ~20% slower than core. After the change those numbers were about -50% and +8%, respectively. I decided not to include the change.
Bool vs custom type
Up til now, I've represented the color of a dictionary node as a boolean. This works fine, as there are only two colors: red & black. However, the code is a little more readable if you use a custom type for this. Making this change had no performance impact, so I kept it in.
Size of new implementation
This time, instead of looking at line size, I created two .js files of compiled output. Then I removed all the core Dict code from one of the files, and all the LLRB code from the other file. I then minified the files with uglify3 and compared the size. Before compiling, I also changed the type constructor names of LLRB so they matched what was in the core dict, so the comparison would be more fair.
core.min.js: 87 605 bytes
llrb.min.js: 83 199 bytes
total reduction: 4 406 bytes
I should mention though, that the core implementation does a lot more error reporting. Which is to say it does error reporting. There is not a single Debug.crash call in my LLRB code. Which means that there are fewer strings, and so it minifies better. That said, I still think the LLRB implementation is smaller, with or without error reporting.
I'm now turning my attention to optimizing set operations like union, intersect and difference. This requires adding a height counter to the data structure, which will increase memory use. Let's see if it's worth it.