I'm still looking for good examples of Racket code that heavily
exercise immutable hashes and/or sets. But I've put together:
- a set of micro-benchmarks for hash-ref, hash-set, and in-immutable-hash-keys
- a small, initial set of benchmarks from more realistic code
I've run the benchmarks against the following Racket builds:
- 7.4: a vanilla 7.4 Chez build, which uses a Patricia trie
- 7.4/hamt: a 7.4 Chez build that uses a HAMT (actually a variant
called a CHAMP[2])
- 7.4/C: a 7.3 build using the C runtime, which uses a HAMT
Both HAMTs incorporate a space optimization for nodes that contain
only #t values (as used by hash-sets). These nodes do not explicitly
store their values. In the graphs, "w/vals" means that the hash stores
values other than #t, whereas "w/o vals" means that all values are #t
(and therefore are not actually stored in the HAMTs). The Patricia
trie does not implement this space optimization.
All of the results in the graphs are normalized to the run time of the
vanilla 7.4 build. So the "time" for 7.4 is always 1.0 and the other
builds' times are factors of that one.
None of these benchmarks measure space usage, but the HAMTs are much
more space-efficient, except at very small sizes.
====
First, the micro-benchmarks:
(I don't have a repo for the micro-benchmarks right now. Of course,
I'm happy to make them available. They're what you'd expect: loops
that perform the operation that's being benchmarked.)
I. hash-set
There is a huge difference in the (relative) results when hash access
is in-order (by hash-code), as opposed to when it is random. This is
because the Patricia trie exhibits very different performance in these
cases. The HAMTs do not.
A. In-order insertion
When keys are inserted in order of hash-code[1], the Patricia trie is
the clear winner. It beats the Chez HAMT by a 2-1 margin in these
cases. The C-based HAMT's performance for in-order insertion is
curious. For eq-hashes, the C-based HAMT is almost as good as the
Patricia trie, but for equal-hashes, it's considerably slower than
both of the other builds. I don't know why that is.
The test runs use fixnum keys for both eq- and equal-based hashes, but
(as noted in [1]), these results are not peculiar to the key type.
See attachment: hash-set-in-order.png
B. Random insertion
Here, the Chez HAMT is the clear winner. The results are essentially
the opposite of the in-order case. The Chez HAMT beats the Patricia
trie (approximately) 2-1. The C-based HAMT performs better than the
Patricia trie on eq-hashes (and nearly as well as the Chez HAMT for
eq-hashes where values are implicit) and about the same as Patricia
for equal-hashes.
These tests similarly use fixnum keys, but again, the (relative)
performance results do not seem to be dependent on the type of keys.
See attachment: hash-set-shuffled.png
II. hash-ref
Results are quite similar to the hash-set results, except the in-order
vs. random distinction is even more pronounced in the random case,
where the Chez HAMT beats the Patricia trie by roughly 4-1.
See attachments:
- hash-ref-in-order.png
- hash-set-shuffled.png
III. in-immutable-hash-keys
Since iteration isn't dependent on the hash's equivalence function
these tests just use equal-based hashes. I did test the implicit vs.
explicit values cases as well as hashes that had been build from
sequentially-ordered keys vs. ones build from randomly-ordered
keys[2], but results are the same in all cases.
The C-based HAMT wins this category. The Chez HAMT comes in second,
and Patricia is worst. (Maybe the Chez HAMT could benefit from the
C-HAMT's approach to iteration?)
See attachment: in-immutable-hash-keys.png
IV. Effect of hash collisions
These tests use equal-based hashes that produce a large number of
collisions. (Roughly a third of the keys have colliding hash codes.)
The Chez HAMT is badly affected by hash collisions. It is about 2.5x
the Patricia trie for hash-ref and hash-set, and it's even worse at
iteration, at nearly 4x. (The C-based HAMT's hash-set is badly
affected, though it performs respectably on hash-set and very well on
iteration.)
The poor performance of the Chez HAMT on these tests might be due to
the fact that it forces all collision nodes to the lowest possible
level of the trie, and therefore requires more pointer-chasing and,
consequently, worse memory locality. This implementation approach
comes from the CHAMP[2] reference implementation[3] and simplifies the
task of maintaining a canonical form for the HAMT (modulo order of
keys in a collision node). It should be possible to keep collision
nodes at shallower levels and retain canonical form, though.
See attachment: many-collisions.png
====
Next, the less-contrived benchmarks.
There are six of these right now:
1. trie:
http://ccs.neu.edu/home/types/tmp/trie.tar.gz
2. build-graph: building the graph from
(
https://github.com/stchang/graph/blob/master/graph-test/tests/graph/timing-test-in-neighbors.rkt)
3. traverse-graph: iterating through the same graph as above
4. k-cfa:
https://github.com/bennn/gtp-benchmarks/tree/master/benchmarks/kcfa
5. acquire:
https://github.com/bennn/gtp-benchmarks/tree/master/benchmarks/acquire
6. mbta:
https://github.com/bennn/gtp-benchmarks/tree/master/benchmarks/mbta
The mbta benchmark actually uses the same graph library (though
possibly a different version of it) that build-graph and
traverse-graph use.
Unlike the micro-benchmarks, these show the different builds
performing roughly the same, with the notable exception of the k-cfa
benchmark, where the Patricia version is the big winner, and the
traverse-graph benchmark, where the C-based HAMT does poorly.[4]
See attachment: results.png
===
[1] Access does not need to be strictly in-order. Rather, the Patricia
trie is especially fast when the access pattern exhibits good locality
of reference. In-order access is just the simplest way to accomplish
this. For example, using strings generated by string/e from
consecutive naturals, even though the results do not have consecutive
hash-codes, nevertheless exhibits the same (relative) performance
results.
[2]
https://michael.steindorfer.name/publications/oopsla15.pdf
[3]
https://github.com/usethesource/capsule/blob/master/capsule-core/src/main/java/io/usethesource/capsule/core/PersistentTrieMap.java#L788
[4] This is somewhat odd, considering how well the C-based HAMT does
on the iteration micro-benchmarks, but there is code other than just
hash-iteration in this test, so the difference could be explained by
the general Chez vs. C runtime difference.