Interesting, and times do you get for 500 ets lookup on that data?
Oh, I really should have tested that as well:
timer:tc(fun () -> lists:map(fun(K) -> ets:lookup(t, K) end, Keys) end).
* using lookups: [set, {keypos, 2}]: 3148 us
* using lookup: [ordered_set, {keypos, 2}]: 3423 us
I have repeated the select/pattern runs to have comparable results:
* using pattern, [ordered_set, {keypos, 2}]: 3173 us
* using pattern, [set, {keypos, 2}]: 8091 us
Note that there was quite a high jitter on the measurements so I have compute the mean value of 3 measurements each.
So lookups and pattern/ordered_set were too close to really come to a winner here. I had the impression that the variance was slightly higher with lookup, but digging into this would require a better measurement setup than what I have been using.
One thing to be aware of when doing benchmarks on ETS set vs ordered_set is that ordered_set may get boosted cache performance if the keys are accessed in term order. The same internal tree nodes are visited over and over again as term adjacent key are most often close to each other in the tree.
So if that is not your typical traffic case, the benchmark should access the keys in some random order.
/Sverker
Hi,
that's a good hint, thanks. I indeed have used adjacent keys in my "benchmarks" and the distance between keys has indeed a big influence on the times.
In addition I did the lookup measurements in the shell, which was unfair since the select call was just a function call while the lookup loop was executed by the shell.
So the numbers using a compiled module are
* using lookups: [set, {keypos, 2}]: 295 us
* using lookup: [ordered_set, {keypos, 2}]: 221 us
Measurements with a varying number of keys and a varying distance
between keys:
nkeys skip pat/ordset pat/set lu/ordset
lu/set
100 1 261 390 128
63
100 10 2482 382 154 66
100 100 10348 488 100 76
200 1 222 948 128 146
200 10 3758 618 188 141
200 100 37582 666 163 149
500 1 1206 4570 221 295
500 10 22419 3967 265 232
500 100 224580 4062 445 205
1000 1 5489 15148 251 227
1000 10 89676 15077 654 370
1000 100 889965 15941 763 302
2000 1 19062 57493 526 439
2000 10 341417 57467 1063 457
2000 100 3542462 59253 1850 547
5000 1 114807 359162 1589 1477
5000 10 2136702 362625 2954 1853
5000 100 22301409 357902 4933 2424
nkeys: number of keys per query
skip: distance between subsequent keys ( Keys = lists:seq(1,
Skip * Nkeys, Skip)
pat: select with key in pattern
lu: lookup loop
set: ETS type set
ordset: ETS type ordered_set
Long story short: Follow Dan Gudmundsson suggestion for ETS based operations, it's much faster, scales better and suffers less from non-adjacent keys.
On the other hand, the results might be different with distributed Mnesia and disk-only tables.
/Jacob