Nov 6. New Elm Dict update

375 views

Skip to first unread message

Robin Heggelund Hansen

unread,

Nov 6, 2017, 4:11:59 PM11/6/17

to elm-dev

tl;dr;

* Much faster set operations (union, intersect, diff)

* Code size update

* Interesting fromList experiments

Much faster set operations (union, interesect, diff)

Set operations were already faster in LLRB, because the easiest implementation is just a Dict.insert inside of a Dict.foldl, and insert performance in LLRB is much faster than the core implementation. However, by utilizing some of the features of trees, one can easily make this much faster.

In this update I've implemented a splitBy function, and a join function (non-exported fns). splitBy takes a key and a Dict, and returns two dicts where the first contains every key-value pair with a key that is less than the provided key, and the other contains every key-value pair greater than the provided key. join is essentially the reverse operation. It takes a key, value and two Dicts where the first contains key-value pairs lower than the provided key, and the other contains key-value pairs greater than the provided key. It then creates a new Dict where the key and two Dicts are joined.

The cool thing here, is that since Dicts are already sorted from least to greatest, the best case scenario for these functions can be very fast. In the case of a splitBy, the best case is that you're splitting by the same key in that resides in the root node. In that case you simply return the left and right subtrees. join, being the reverse operation, just creates a new node with the provided key and two Dicts.

By implementing Union, Intersect and Diff using splitBy and join, performance increased significantly.

Compared to core, the LLRB implementation of Union is 173% faster.

Compared to core, the LLRB implementation of Intersect is 505% faster.

Compared to core, the LLRB implementation of Diff is 686% faster.

* Tested with Dicts of equal size and on Chrome.

Code size update

Since new functions have been added, it's interesting to see how much code is generated. In short, LLRB is still less code than the current implementation.

The code size has been measured by compiling the benchmark project, then removing the core Dict implementation or LLRB implementation, then minifying and gzipping the file. The command for measuring code size is as follows: uglifyjs elm.js -mc 'pure_funcs="F2,F3,F4,F5,F6,F7,F8,F9",pure_getters=true,keep_fargs=false,unsafe_comps=true' | gzip -c | wc -c

Dict: 23 290

LLRB: 22 748

Interesting fromList experiments

An interesting property of LLRB is that Dict's with size 2 will always look the same, a black root with a red left child. I tried to use this fact with the new union implementation to increase fromList performance, but didn't have much luck.

One thing that did seem promising though, was to use union to join 28-sized dicts together in fromList. Said another way, build a series of smaller dicts, and then join these together using union. This could theoretically be faster, because inserts becomes slower the larger the dicts becomes due to more comparisons and balancing. By using union, we can reduce the total number of comparisons and balancing. This approach requires more experimentation though. It also only becomes interesting when the list to build from is large, so it might not be that useful in general.