I've just written some code to test the performance of the map type. I
am quite curious about its performance because, according to the article
from the Go blog[1], it seems that the hash_lookup function may be time
consuming under some circumstances.
However, according to the textbook, the time complexity of the search
operation on a hash table should be O(1) (assuming a good hash
function). At first, I thought the constant factor in the complexity
function might simply be large. But according to my recent test,
it seems that the map type behaves more like a tree structure. NOTE: I
didn't read the code (which should be in src/pkg/runtime/hashmap.c).
The test code is here: http://pastebin.com/W9me7bvc
It is very simple: allocate N items in a map, write N key-value pairs,
with every key the same size. Then read all values back in insertion
order. Divide the total time spent getting items by the number of items,
and we have the average time for each get operation.
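In case the pastebin link rots, here is a minimal sketch of what the benchmark does. This is my reconstruction, not the original code; the key format and sizes are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	for _, n := range []int{1000, 10000, 100000} {
		// Write n key-value pairs; every key has the same size (9 digits).
		a := make(map[string]int, n)
		keys := make([]string, n)
		for i := 0; i < n; i++ {
			keys[i] = fmt.Sprintf("%09d", i)
			a[keys[i]] = i
		}
		// Read every value back in insertion order and average the time.
		start := time.Now()
		sum := 0
		for _, k := range keys {
			sum += a[k]
		}
		usPerOp := float64(time.Since(start).Nanoseconds()) / float64(n) / 1000.0
		fmt.Printf("Map Seq. Access. iterations: %d. %v us/op (sum=%d)\n", n, usPerOp, sum)
	}
}
```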
Here is my result:
Map Seq. Access. iterations: 1000. 0.08400000000000002 us/op
Map Seq. Access. iterations: 10000. 0.1177 us/op
Map Seq. Access. iterations: 100000. 0.26981 us/op
Map Seq. Access. iterations: 1000000. 0.390944 us/op
Map Seq. Access. iterations: 10000000. 0.4742399 us/op
The average access time is actually an O(log(N)) function, not O(1). It
looks like a tree structure, not a hash table.
In comparison, I wrote similar Python code, and here are the results:
Dict Seq. Access. iterations: 1000 . 0.123 us/op
Dict Seq. Access. iterations: 10000 . 0.1137 us/op
Dict Seq. Access. iterations: 100000 . 0.14328 us/op
Dict Seq. Access. iterations: 1000000 . 0.163299 us/op
Dict Seq. Access. iterations: 10000000 . 0.1822816 us/op
That's more like a hash table.
So I wrote another test program in C++, using map from the STL. I got
results similar to Go's map (using g++ 4.6.2, with -O3):
Map Seq. Access. iterations: 1000. 0.143us/op
Map Seq. Access. iterations: 10000. 0.2117us/op
Map Seq. Access. iterations: 100000. 0.33286us/op
Map Seq. Access. iterations: 1000000. 0.559903us/op
Map Seq. Access. iterations: 10000000. 0.731576us/op
According to the Go language specification:
The comparison operators == and != ... must be fully
defined for operands of the key type
If it were a hash table, then rather than == and != we would need to
define a hash function on the key type. Such evidence led me to believe
that map is more like a tree than a hash table.
However, in the source code we have hashmap.c which, from the name,
looks like a hash table. I apologize that I didn't read the source code,
but the test results are a very strong argument (to me) that map is
actually a tree structure, not a hash table. Yet in Go for C++
Programmers[2], it says: Hash tables are provided by the language. They
are called maps.
All these pieces of evidence are, to me, inconsistent. Is it simply a
naming convention in Go (and among its developers) that any key-value
store is called a "hash table"?
Regards,
-Monnand
1. http://blog.golang.org/2011/06/profiling-go-programs.html
2. http://golang.org/doc/go_for_cpp_programmers.html
If you test again using the natural map iterator
for k := range a {
    sum += a[k]
}
Then you get results like this
before:
odessa(~/devel/src) % go run -v hmaptest.go
command-line-arguments
Map Seq. Access. iterations: 1000. 0.087 us/op
Map Seq. Access. iterations: 10000. 0.0863 us/op
Map Seq. Access. iterations: 100000. 0.19273000000000004 us/op
Map Seq. Access. iterations: 1000000. 0.270274 us/op
Map Seq. Access. iterations: 10000000. 0.32927910000000005 us/op
after:
odessa(~/devel/src) % go run -v hmaptest.go
command-line-arguments
Map Seq. Access. iterations: 1000. 0.12800000000000003 us/op
Map Seq. Access. iterations: 10000. 0.1097 us/op
Map Seq. Access. iterations: 100000. 0.12813000000000002 us/op
Map Seq. Access. iterations: 1000000. 0.145759 us/op
Map Seq. Access. iterations: 10000000. 0.20803340000000003 us/op
I think the last value is probably outside the L2/L3 (?) cache on this
host (a Core i5 Mac mini), so there is a lot of cache thrashing going on.
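For reference, the two access patterns being compared look like this. This is a self-contained sketch of the before/after loops, not the original hmaptest.go:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	n := 1000000
	a := make(map[int]int, n)
	for i := 0; i < n; i++ {
		a[i] = i
	}

	// "before": look keys up in key order, hopping around the table.
	start := time.Now()
	sum := 0
	for i := 0; i < n; i++ {
		sum += a[i]
	}
	keyed := time.Since(start)

	// "after": the natural map iterator walks the table in its own order.
	start = time.Now()
	sum2 := 0
	for k := range a {
		sum2 += a[k]
	}
	natural := time.Since(start)

	fmt.Printf("keyed:   %.4f us/op\n", float64(keyed.Nanoseconds())/float64(n)/1000)
	fmt.Printf("natural: %.4f us/op\n", float64(natural.Nanoseconds())/float64(n)/1000)
	if sum != sum2 {
		panic("sums differ")
	}
}
```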
You should take CPU cache misses into account: http://pastebin.com/pfz8zvFp
Map Seq. Access. iterations: 1000. 0.19 us/op, IPC 1.26
Map Seq. Access. iterations: 10000. 0.20 us/op, IPC 1.19
Map Seq. Access. iterations: 100000. 0.31 us/op, IPC 0.91
Map Seq. Access. iterations: 1000000. 0.44 us/op, IPC 0.60
Map Seq. Access. iterations: 10000000. 0.49 us/op, IPC 0.52
Remember that in Python we have a relatively constant CPI, hence a
constant miss rate.
Not exactly. But this can be derived from my previous analysis:
hash search time is a constant C;
average memory access time = hit time + miss rate * miss penalty
Then the time for each iteration is:
hit time + miss rate * miss penalty + C
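Plugging hypothetical numbers into that formula shows how a rising miss rate alone can drive up us/op even when the hash lookup itself stays O(1). All the constants below are made up for illustration:

```go
package main

import "fmt"

func main() {
	const (
		hitTime = 0.005 // us, hypothetical cache hit time
		penalty = 0.10  // us, hypothetical miss penalty
		c       = 0.08  // us, constant hash-lookup work
	)
	// As the map outgrows the cache, the miss rate climbs, and so does
	// the measured time per operation.
	for _, missRate := range []float64{0.0, 0.25, 0.5, 0.75, 1.0} {
		fmt.Printf("miss rate %.2f -> %.4f us/op\n",
			missRate, hitTime+missRate*penalty+c)
	}
}
```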
Here is the python benchmark result:
Dict Seq. Access. iterations: 1000 . 0.123 us/op
Dict Seq. Access. iterations: 10000 . 0.1137 us/op
Dict Seq. Access. iterations: 100000 . 0.14328 us/op
Dict Seq. Access. iterations: 1000000 . 0.163299 us/op
Dict Seq. Access. iterations: 10000000 . 0.1822816 us/op
The us/op stays (nearly) the same (~0.1x).
This means hit time + miss rate * miss penalty + C stays the same, so
the miss rate should be similar.
Regards,
-Monnand
Atom makes a good point: by iterating through the map in key order,
rather than natural order, you're taking a lot of cache misses. If you
test again using the natural map iterator
for k := range a {
    sum += a[k]
}
Thanks for your reply!
It is true that map[int]int is faster. But it still shows O(log(N))
complexity:
Map Seq. Access. iterations: 1000. 0.06 us/op
Map Seq. Access. iterations: 10000. 0.0643 us/op
Map Seq. Access. iterations: 100000. 0.13352000000000003 us/op
Map Seq. Access. iterations: 1000000. 0.297422 us/op
Map Seq. Access. iterations: 10000000. 0.3801063 us/op
here is the code:
http://pastebin.com/96Gh4P5k
Regards,
-Monnand
On Mar 19, 10:54 am, "monn...@gmail.com" <monn...@gmail.com> wrote:
> It is true that map[int]int is faster. But still, it shows an O(log(N))
> complexity:
>
> Map Seq. Access. iterations: 1000. 0.06 us/op
> Map Seq. Access. iterations: 10000. 0.0643 us/op
> Map Seq. Access. iterations: 100000. 0.13352000000000003 us/op
> Map Seq. Access. iterations: 1000000. 0.297422 us/op
> Map Seq. Access. iterations: 10000000. 0.3801063 us/op
The data you measured do _not_ indicate O(log(N)) behavior:
plot the data and you'll see it.
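One quick way to see this without plotting is to print the increase per decade of N; for t(N) = a*log10(N) + b the increment per decade would be roughly constant. The numbers are copied from the first post:

```go
package main

import "fmt"

func main() {
	ns := []int{1000, 10000, 100000, 1000000, 10000000}
	us := []float64{0.084, 0.1177, 0.26981, 0.390944, 0.4742399}
	// A clean O(log N) curve would add the same amount per decade of N.
	for i := 1; i < len(us); i++ {
		fmt.Printf("N=%-8d -> N=%-8d: +%.4f us/op\n",
			ns[i-1], ns[i], us[i]-us[i-1])
	}
}
```

The increments (roughly +0.034, +0.152, +0.121, +0.083) are far from constant, which is the point above.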
The O(1) in textbooks corresponds to how long it takes to compute the
hashcode of a key and to compare two keys for equality. The O(1)
includes any cache misses that may happen during the computation.
Comparing two strings for equality also corresponds to O(1),
irrespective of the length of the strings. Thus, O(1) is the average
time it takes to compute the hashcode and to compare two keys. You
can think of it as O(keyType) instead of O(1).
It is physically impossible to select/choose 1 element out of N
elements in less than O(log N) time. The best case is O(log N), so in
any case O(1) is completely invalid.
The O(1) mentioned in the textbook should be replaced with the actual
time complexity of hashing and comparing keys. So, in case of this
golang-nuts discussion about Go maps, the first approximation would be
to replace O(1) with O(log N). Ultimately, O(1) should be replaced
with a formula that closely matches the actual measured performance of
Go/Python/C++ maps.
sorry, incomplete post.
that was hashing both ways into a full table as in the original.
here it is in natural order (range over the hash table and using the values):
1024 0.0850
2048 0.0483
4096 0.0469
8192 0.0483
16384 0.0485
32768 0.0466
65536 0.0449
131072 0.0432
262144 0.0432
524288 0.0453
1048576 0.0484
2097152 0.0502
4194304 0.0515
8388608 0.0509
16777216 0.0537
the difference is the growth rate under discussion. can do no more
now. on the way to the airport.
On Mon, Mar 19, 2012 at 6:40 AM, Michael Jones <m...@google.com> wrote:
> 1024 0.1045
> 2048 0.1191
> 4096 0.1074
> 8192 0.1056
> 16384 0.1258
> 32768 0.1428
> 65536 0.1790
> 131072 0.2497
> 262144 0.3111
> 524288 0.3344
> 1048576 0.3434
> 2097152 0.3695
> 4194304 0.3819
> 8388608 0.4103
> 16777216 0.5523
> 33554432 0.5905
> 67108864 0.6256
> --
> Michael T. Jones | Chief Technology Advocate | m...@google.com | +1
> 650-335-5765
--
Michael T. Jones | Chief Technology Advocate | m...@google.com | +1
650-335-5765
Thank you all for your wonderful replies. I read them carefully and
reproduced the result with modified benchmarks.
On the cache-miss argument mentioned by 0xe2.0x9a.0x9b (I'm sorry, I
don't know how to write your name/symbols here) and by Dave: I totally
buy it. Yes, from the IPC it is quite clear that cache misses make the
IPC drop. 0xe2.0x9a.0x9b, your perf package is really helpful. Thanks a
lot!
Dave also mentioned that we can keep the access time constant if we
access the map in its natural order, i.e. using for k := range a. In
that case, sequential accesses mitigate the cache miss rate.
Dave's argument made me think of another question: why does Python have
a constant access time, and hence a constant miss rate?
Here is my theory: in my Python code, the elements are accessed in their
insertion order. The insertion order is the same as the order given by
the sum of the keys' ASCII character values (the first element has key
"000000000", the second "000000001", etc.). If Python's hash function
generates the hashes of the keys in that same order, then my code would
access the hash table in its natural order, reducing the miss rate and
keeping the access time constant.
So I changed my Python code to access the elements in a random order. If
my theory holds, the access time should no longer be constant, because
of cache misses.
Here is the code: http://pastebin.com/xrcYDmsw
Here is the result of python benchmark (I ran it 3 times):
$ vim test.py
$ python test.py
Dict Random Access. N = 1000 : 0.132 us/op
Dict Random Access. N = 10000 : 0.1317 us/op
Dict Random Access. N = 100000 : 0.41815 us/op
Dict Random Access. N = 1000000 : 0.505788 us/op
Dict Random Access. N = 10000000 : 0.6442865 us/op
$ python test.py
Dict Random Access. N = 1000 : 0.136 us/op
Dict Random Access. N = 10000 : 0.1361 us/op
Dict Random Access. N = 100000 : 0.41679 us/op
Dict Random Access. N = 1000000 : 0.497187 us/op
Dict Random Access. N = 10000000 : 0.6444111 us/op
$ python test.py
Dict Random Access. N = 1000 : 0.133 us/op
Dict Random Access. N = 10000 : 0.1393 us/op
Dict Random Access. N = 100000 : 0.41424 us/op
Dict Random Access. N = 1000000 : 0.501201 us/op
Dict Random Access. N = 10000000 : 0.6417045 us/op
From the result, my question is resolved: the reason my previous Python
benchmark behaves better may be attributed to the access sequence. It
seems to me that the hash function in Python does not randomize its
input well. This may be due to the pseudo-random generator on the
system.
Another person sent me a result from the same benchmark I used
previously, but run on a Windows machine; there the access time is no
longer constant.
Finally, here are my conclusions:
1. The reason the Python benchmark performs better than Go's is the
access pattern. If the access is random, the access time increases
because of cache misses.
2. Python's hash function may not be that "good" on some systems,
depending on the pseudo-random generator in the library (maybe the C
library?).
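Conclusion 1 can be reproduced in Go as well. This sketch (my own, not the original pastebin code) times the same lookups in insertion order versus a shuffled order:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// timeGets returns the average lookup time (us/op) when visiting the
// map's keys in the given order.
func timeGets(a map[int]int, order []int) float64 {
	start := time.Now()
	sum := 0
	for _, k := range order {
		sum += a[k]
	}
	_ = sum
	return float64(time.Since(start).Nanoseconds()) / float64(len(order)) / 1000
}

func main() {
	n := 1000000
	a := make(map[int]int, n)
	seq := make([]int, n)
	for i := 0; i < n; i++ {
		a[i] = i
		seq[i] = i
	}
	shuffled := rand.Perm(n) // the same keys, visited in a random order

	fmt.Printf("sequential: %.4f us/op\n", timeGets(a, seq))
	fmt.Printf("random:     %.4f us/op\n", timeGets(a, shuffled))
}
```

On a map large enough to spill out of the cache, the random order should show noticeably more time per lookup.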
Thanks again for your replies!
Regards,
-Monnand
Python's dictionaries are arrays (similar to slices) that are
reallocated on growth.
> It seems to me that the hash function in python is not randomized well on
> its input. This may be attribute to the psudo random generator on the
> system.
Python's hash function is not randomized at all. Integers hash as
themselves. This may change in Python 3.3 or 3.4 as randomization is
introduced.
> 2. Python hash function may be not that "good" on some system depending on
> the psudo random generator in the library (may be C library?).
As said, Python's hash function is not randomized.
Go hashmaps are represented as very flat tree-like structures: I guess
that minimizes the allocation of giant chunks of memory and possibly
reduces pressure on the garbage collector. The arity of the tree is
very large (~1000) so the logarithmic effect is very slight.
Thus, small hash maps (fewer than 16384 elements) are supposed to
behave like Python's hashmaps, and beyond that you start observing the
tree structure effect. [Maybe the runtime authors can correct me.]
The attached Go program introspects the top-level data structure that
underlies maps. You should really read the data structure definitions in
src/pkg/runtime/hashmap.c. The code is probably a bit harder to read
(I'm a bit too lazy to delve into the details), but the data structure
itself seems self-explanatory.
% go run map.go
{"Count":1,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":0,"Seed":1546380794,"SubtableLogSize":1}
{"Count":10,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":0,"Seed":1639877174,"SubtableLogSize":4}
{"Count":100,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":0,"Seed":1274387075,"SubtableLogSize":8}
{"Count":1000,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":0,"Seed":526629865,"SubtableLogSize":11}
{"Count":10000,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":0,"Seed":1053259730,"SubtableLogSize":14}
{"Count":100000,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":15185,"Seed":2106519460,"SubtableLogSize":12}
{"Count":1000000,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":32273,"Seed":1716741222,"SubtableLogSize":12}
{"Count":10000000,"DataSize":8,"MaxPower":12,"Indirect":0,"ValOffset":4,"NodeAllocs":47864,"Seed":1119210241,"SubtableLogSize":12}
--
Rémy.
> Python's dictionaries are arrays (similar to slices) that are
> reallocated on growth.
Thanks for pointing this out. I have read Python's code and it's a
classic implementation of an open-addressing hash table, as in the
textbook.
>> It seems to me that the hash function in python is not randomized well on
>> its input. This may be attribute to the psudo random generator on the
>> system.
>
> Python's hash function is not randomized at all. Integers hash as
> themselves. This may change in Python 3.3 or 3.4 as randomizatino is
> introduced.
Well... I was using fixed-length strings instead of integers. By
randomize, I mean the order of the hash values may differ from the
order of the real input values (taking the input as a big integer or
something).
To make things clear, here is some python code:
>>> hash("000000000")
-1389505745207213127
>>> hash("000000001")
-1389505745207213128
>>> hash("000000002")
-1389505745207213125
>>> hash("000000009")
-1389505745207213136
>>> hash("000000010")
-1389505745206213126
>>> hash("000000011")
-1389505745206213125
The hash values are close to each other, and my code accesses them
sequentially.
>
>> 2. Python hash function may be not that "good" on some system depending on
>> the psudo random generator in the library (may be C library?).
>
> As said, Python's hash function is not randomized.
>
> Go hashmaps are represented as very flat tree-like structures: I guess
> that minimizes the allocation of giant chunks of memory and possibly
> reduces pressure on the garbage collector. The arity of the tree is
> very large (~1000) so the logarithmic effect is very slight.
>
Dave told me this too. But if it were (only) the tree structure that
leads to a logarithmic effect, then the IPC should not change.
To me, the cache effect may contribute more than the tree structure,
considering the change in IPC (or CPI if you like).
Regards,
-Monnand
Well, I think I didn't make this clear.
I think the tree structure may be the reason we have cache misses, and
the cache misses lead to lower IPC.
You are right, the tree structure leads to a logarithmic effect.
Thanks Rémy!
Regards,
-Monnand