* It is a Left-Leaning Red-Black (LLRB) tree.
* In Chrome: 8% faster reads, 171% faster inserts, 30% faster removals, and 15-25% slower updates.
* Less code (643 vs. 917 lines after compilation).
* Needs testing. Please try it out and report bugs.
Read on for more information on:
* Why a new implementation
* How is this different from today's implementation in core?
* What have I tried to speed up core?
* What will I continue experimenting on?
* Call for help
Why a new implementation?
A new implementation wasn't originally the goal. I started out aiming to optimize the current implementation in core, but as I'd never seen a red-black tree implementation before, I thought it wise to read some papers. After some searching I found a paper by Sedgewick explaining the benefits of Left-leaning red-black trees:
1) They're simple to implement (this is what initially caught my eye).
2) There are fewer cases to consider when balancing the tree (a potential performance increase).
3) An LLRB implementation is smaller (less code) than other red-black tree implementations.
4) LLRBs perform close to optimal balancing. Or, to quote the author himself: "In practical applications, however, the cost of a typical search is half that value, not perceptibly different from the cost of a search in a perfectly balanced tree."
The papers I found are listed here for reference:
There is also a Java reference implementation here:
I essentially converted the Java code to functional Elm code, and then inlined function after function until I couldn't inline much more. What I found validated Sedgewick's claims. You can find the results in the tl;dr section at the top of this e-mail.
How is this different from today's implementation in core?
There actually aren't that many differences. The Dict type has seen one change: color is now represented by a Bool instead of a Color type. Update now has a naive implementation (call the alter function with the result of a get, then insert or remove as needed). Insert and remove now have their own implementations, instead of being routed through update. Everything related to balancing is new. Everything else is left untouched (union, intersect, filter, foldl, foldr, size...).
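That naive update can be sketched in Python, using a plain dict as a stand-in for the tree (the function names and the None-for-Nothing convention here are mine for illustration, not the package's actual code):

```python
def update(key, alter, d):
    # Naive update: one get, then an insert or a remove as needed.
    # `alter` receives the current value, or None in the role of Elm's
    # Nothing, and returns the new value, or None to remove the key.
    result = alter(d.get(key))
    d2 = dict(d)              # copy, since Elm Dicts are persistent
    if result is None:
        d2.pop(key, None)     # alter asked for a removal
    else:
        d2[key] = result      # alter asked for an insert/replacement
    return d2
```

This costs one lookup plus one insert or remove, which is why update currently trails the core implementation.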
The balancing function is really simple, as in an LLRB there are only four cases to consider. This follows from an added restriction in LLRBs: there can never be a right-leaning red child. I recommend that the reader look at the balancing function in today's core implementation, and then check out the LLRB one.
Inserts are straightforward. You find the position where the new node belongs, insert the key-value pair as a red node, and then perform balancing on the way back up.
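As a concrete sketch of that shape (attach a red node at the bottom, balance on the way up), here is a minimal LLRB insert in Python following Sedgewick's scheme. It mutates nodes for brevity where the real Elm code is persistent, and all names are mine, not the package's:

```python
RED, BLACK = True, False

class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None
        self.color = RED                # new nodes always start out red

def is_red(h):
    return h is not None and h.color == RED

def rotate_left(h):
    # turn a right-leaning red link into a left-leaning one
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x

def rotate_right(h):
    # lean a red link to the right (used while splitting 4-nodes)
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x

def color_flip(h):
    # push redness up: both children become black, the parent red
    h.color = not h.color
    h.left.color = not h.left.color
    h.right.color = not h.right.color

def balance(h):
    # Sedgewick's fix-up transformations, applied bottom-up
    if is_red(h.right):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        color_flip(h)
    return h

def insert(h, key, val):
    if h is None:
        return Node(key, val)           # insert at the bottom, as red
    if key < h.key:
        h.left = insert(h.left, key, val)
    elif key > h.key:
        h.right = insert(h.right, key, val)
    else:
        h.val = val                     # existing key: replace the value
    return balance(h)                   # re-establish invariants going up

def put(root, key, val):
    root = insert(root, key, val)
    root.color = BLACK                  # the root is always black
    return root
```

The persistent Elm version does the same work, but builds fresh nodes instead of mutating.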
For removals, the easiest node to remove is a red one. So the remove function moves red down the search path, removes the node, and then fixes the tree on the way back up using the balancing function. This is by far the most difficult code to understand, and the portion that requires the most code, but it is still easier than what is in core today (IMHO).
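To illustrate "moving red down", here is a Python sketch of the simplest removal, delete-min, again following Sedgewick's scheme rather than the package's actual code; the standard helpers are included so the sketch stands on its own:

```python
RED, BLACK = True, False

class Node:
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None
        self.color = RED

def is_red(h):
    return h is not None and h.color == RED

def rotate_left(h):
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x

def rotate_right(h):
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x

def color_flip(h):
    h.color = not h.color
    h.left.color = not h.left.color
    h.right.color = not h.right.color

def fix_up(h):
    # the same balancing transformations used after an insert
    if is_red(h.right):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        color_flip(h)
    return h

def insert(h, key, val):
    # standard LLRB insert, included only to build a tree to delete from
    if h is None:
        return Node(key, val)
    if key < h.key:
        h.left = insert(h.left, key, val)
    elif key > h.key:
        h.right = insert(h.right, key, val)
    else:
        h.val = val
    return fix_up(h)

def put(root, key, val):
    root = insert(root, key, val)
    root.color = BLACK
    return root

def move_red_left(h):
    # ensure the left child (where the minimum lives) is not a 2-node:
    # borrow redness from the parent, or from the right sibling if it has any
    color_flip(h)
    if is_red(h.right.left):
        h.right = rotate_right(h.right)
        h = rotate_left(h)
        color_flip(h)
    return h

def delete_min(h):
    if h.left is None:
        return None                  # found the minimum; drop it
    if not is_red(h.left) and not is_red(h.left.left):
        h = move_red_left(h)         # push red down the search path
    h.left = delete_min(h.left)
    return fix_up(h)                 # repair the tree on the way back up

def delete_min_root(root):
    if root is None:
        return None
    if not is_red(root.left) and not is_red(root.right):
        root.color = RED             # temporarily redden the root
    root = delete_min(root)
    if root is not None:
        root.color = BLACK
    return root
```

A full remove on an arbitrary key adds a symmetric move-red-right for the other direction and a swap with the in-order successor, which is where most of the extra code goes.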
What have I tried to speed up core?
Very little. I did have an idea that the original implementation could be faster if insert and remove had their own implementations, instead of being implemented through update. This was based on Chrome's profiler showing that a lot of time was spent inside the update function, which performs some pattern matches to support both updating and removal. This idea proved to be false, or at least to yield very little gain. It seems that the most taxing code is the balancing act, and there is no easy way to change that without changing the implementation.
What will I continue experimenting on?
There is one failed experiment to note, and two main areas of interest remaining.
1) I've tried making a remove implementation which doesn't need to "move reds down" the tree. The motivation is that it would make a performant update implementation easier (see point 2), and likely a more performant implementation in general. I've read in several places that the reference implementation pushes red nodes down because that is the simplest approach, but I've yet to discover what a more complex, but potentially faster, implementation might look like.
2) To improve the performance of update (which is currently a naive implementation, and 15-25% slower than the core implementation), I could use the remove implementation as a basis for a new update implementation. There is a real chance, however, that this could make updates slower when update is used to insert elements (as pushing red nodes down requires more balancing).
3) The code performs several allocations. Since Nodes are not records, this results in a lot of function calls. I want to try inlining these function calls in the compiled code and see if it makes a difference. It might be worth mentioning the results to Evan.
Call for help
This needs testing, so please use it in your code. I'm fairly certain that the implementation works, as I've tested it with the core test suite, in addition to some fuzz tests I've written myself. But I could've made a mistake, so real-world testing is preferable. The package is available (link in the tl;dr section) and is API-compatible, so it should be fairly easy to use.
I'd also love some code review, or perhaps just someone looking over the tests and benchmarks (and the validation function) to let me know if they look good. I've spent too much time on this code to view it objectively anymore.
I believe I have a generally faster (except for updates) and smaller implementation of Dict for Elm. I'm in no rush, but I'd love to see this merged into elm-lang/core once it has been tested in the wild.
I'm open to questions and suggestions.
Thanks for reading =)