New Dict implementation

947 views
Skip to first unread message

Robin Heggelund Hansen

unread,
Oct 30, 2017, 6:59:18 PM10/30/17
to elm-dev
tl;dr;
* Is Left-Leaning Red-Black tree.
* For Chrome: 8% faster reads, 171% faster inserts, 30% faster removals and 15-25% slower updates.
* Less code. (643 vs 917 lines after compilation)
* Needs testing. Please try it out and report bugs.

Read on for more information on:
* Why a new implementation
* How is this different from todays implementation in core?
* What have I tried to speed up core?
* What will I continue experimenting on?
* Call for help

Why a new implementation?

A new implementation wasn't originally the goal. I started out with a goal of optimizing the current implementation in core, but as I've never seen a red-black implementation before, I thought it wise to read some papers. After some searching I found a paper by Sedgewik, explaining the benefits of Left-leaning red-black trees.

1) They're simple to implement (this is what initially caught my eye).
2) There are fewer cases to consider when balancing the tree (potential performance increase)
3) The implementation of LLRBs are smaller (less code) than other implementations.
4) LLRB performs close to optimal balancing. Or to quote the author himself: "In practical applications, however, the cost of a typical search is half that value, not perceptibly different from the cost of a search in a perfectly balanced tree."

The papers I found are listed here for reference:

There is also a Java reference implementation here:

I essentially converted the Java code to functional Elm code, and inlined function by function until I couldn't inline much more. What I found validated Sedgewick's claims. You can find the results in the tl;dr; section at the top of this e-mail.

How is this different from today's implementation in core?

There actually aren't that many differences. The Dict type has seen one change, and that is that color is now represented by a Bool instead of a Color type. Update now has a naive implementation (call alter-fn with result of a get, then insert or remove as needed). Insert and Remove now have their own implementation, instead of calling Update. Everything related to balancing is new. Everything else is left untouched (union, intersect, filter, foldl, size, foldr...)

The balancing function is really simple, as in a LLRB, there are really only four cases to consider. This comes from an added restriction in LLRBs, that there can never be a right-leaning red child. I recommend that the reader look at the balancing function in today's core implementation, and then check out the LLRB one.

Inserts are straight forward. You find the position where a new node is to be inserted, then insert the key-value pair as a red node. You then perform balancing on the way up.

For removals, the easiest thing is to remove a red node. So the remove function moves red down the search path, removes the node, and then fixes the tree on the way up again using balancing. This is by far the most difficult code to understand, and the portion that requires most code, but still easier than what is in core today (IMHO).

What have I tried to speed up core?

Very little. I did have an idea that the original implementation could be faster if insert and remove had their own implementations, instead of being implemented through update. This was based on Chrome's profiler signaling that a lot of time was spent inside the update function, which does perform some pattern matches to support both update and removal. This idea proved to be false, or at least have very little gain. It would seem that the most taxing code is the balancing act, and there is no easy way to change that without changing the implementation.

What will I continue experimenting on?

There is one failed experiment to note, and two main areas of interest remaining.

1) I've tried making a remove implementation which doesn't need to "move reds down" the tree. The reason is that it would be easier to have a performant update implementation with such a model (see point 2) and a more performant implementation in general. I've read several places that the reference implementation of pushing red nodes down is because that is the simplest implementation, but I've yet to discover what a more complex, but potentially faster implementation, might be.

2) To improve the performance of update (which is currently a naive implementation, and 15-25% slower than the core implementation) I could use the remove implementation as a basis for a new update implementation. There is a real chance however, that it could make updates slower when using update to insert elements (as pushing red nodes down requires more balancing).

3) The code does perform several allocations. Since Nodes are not records, this results in a lot of function calls. I want to try to inline these function calls in the compiled code and see if this makes a difference. Might be worth mentioning the results to Evan.

Call for help

So this needs to be tested. So please use it in your code. I'm fairly certain that the implementation works, as I've tested it with the core test-suite, in addition to some fuzz-tests I've written myself. But I could've made a mistake, so real-world testing is preferable. The package is available (link in the tl;dr section), and is api compatible, so it should be fairly easy to use.

I'd also love some code review, or perhaps just someone looking over the tests and benchmarks (and the validation function) and let me know if it looks good. I've spent to much time on this code to view it objectively anymore.

Conclusion

I believe I have a generally faster (except for updates) and smaller implementation of Dicts for Elm. I've no rush, but I'd love to see this merged in elm-lang/core once it has been tested in the wild.

I'm open for questions and suggestions.

Thanks for reading =)

Evan Czaplicki

unread,
Oct 30, 2017, 7:08:22 PM10/30/17
to elm-dev
Very exciting!

It may be worth checking out Haskel's Data.Map implementation which lists asymptotics of O(m*log(n/m + 1)) where m <= n for operations like union and intersect.

I don't know anything about the OCaml implementation, but I suspect it's worth checking out as well.

Also, when it comes to size, the thing that matters is the minified/gzipped size. The lines of Elm or JS code is not as predictive as you'd hope.

--
You received this message because you are subscribed to the Google Groups "elm-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elm-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elm-dev/f563f6e9-bac1-478c-9d0a-aa3005a3d36f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robin Heggelund Hansen

unread,
Oct 30, 2017, 9:42:26 PM10/30/17
to elm-dev
Yeah, I've looked at the Haskell implementation a couple of times. I've just been focusing on getting insert and removal working. Union and intersect do seem interesting, though.

Nice, didn't know about the OCaml implementation either.

Regarding size, I agree that lines of JS isn't the most definitive way to measure, I'll come back with better measurements for my next update.

Btw. Why does the core implementation use long names like "dict__elm__builtin" (or something like that) for type constructors? Is it so it can be easily identified in `Utils.eq`?
To unsubscribe from this group and stop receiving emails from it, send an email to elm-dev+u...@googlegroups.com.

Evan Czaplicki

unread,
Oct 30, 2017, 9:48:28 PM10/30/17
to elm-dev
About the weird names, it is used by toString and eq, but it will be removed in some later release for a better implementation. I would not worry about it during implementation exploration.

To unsubscribe from this group and stop receiving emails from it, send an email to elm-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elm-dev/09518945-b148-49b1-a780-71353119ecb7%40googlegroups.com.

Evan Czaplicki

unread,
Oct 30, 2017, 9:51:07 PM10/30/17
to elm-dev
Separately, I am curious if some of the optimizations you are doing could be done by the compiler instead.

For example, if there is a union type with exactly two things, can it be mapped to true and false at runtime? Or if there is a union type with 10 things and none have arguments, can they be mapped to integers at runtime? Etc. With 0.19 Debug.toString will only work in --debug, so we can be more creative with runtime representations in production code.

Point is, I'd be interested in any lessons you learn along those lines, so please keep an eye out!

Robin Heggelund Hansen

unread,
Oct 31, 2017, 4:09:10 AM10/31/17
to elm-dev
I'll do some experiments, and see what I come up with =)
Reply all
Reply to author
Forward
0 new messages