and
Size:
Collections.unmodifiableMap: 7.4% bigger than HashMap
google immutable map: 22.4% bigger than HashMap
Speed:
Collections.unmodifiableMap: 4.2% slower than HashMap
google immutable map: 26.0% slower than HashMap
is it expected?
> --
> guava-...@googlegroups.com.
> http://groups.google.com/group/guava-discuss?hl=en
> unsubscribe: guava-discus...@googlegroups.com
>
> This list is for discussion; for help, post to Stack Overflow instead:
> http://stackoverflow.com/questions/ask
> Use the tag "guava".
>
>
--
Robert Konigsberg
konig...@gmail.com
Firstly, you might also want to check this thread:
http://groups.google.com/group/google-collections-users/browse_thread/thread/9bf5db0f507f7fa7/2b8fa3f34b3e69b3
2010/1/11 Noble <noble...@gmail.com>:
> My tests show that HashSet is ~250% faster than immutableSet
>
> and
> Size:
> Collections.unmodifiableMap: 7.4% bigger than HashMap
You mean memory footprint? 7.4% doesn't make sense, unless you are
talking about very tiny HashMaps. Collections.unmodifiableMap adds a
negligible memory footprint (12 bytes from the top of my head)
> google immutable map: 22.4% bigger than HashMap
>
> Speed:
> Collections.unmodifiableMap: 4.2% slower than HashMap
> google immutable map: 26.0% slower than HashMap
>
>
> is it expected?
Not really :) but then, since you are not saying what exactly you are
measuring, all bets are off.
Dimitris
-
On Jan 11, 3:13 pm, Dimitris Andreou <jim.andr...@gmail.com> wrote:
> Hi
>
> Firstly, you might also want to check this thread:http://groups.google.com/group/google-collections-users/browse_thread...
>
> 2010/1/11 Noble <noble.p...@gmail.com>:
Weird, from a cursory look, it seems that ImmutableSet ought to
simulate exactly a HashSet, using the same hash, table size and all.
Also, there are some problems like including the time to do a println
(for HashSet) in the time measured for ImmutableSet, and that
currentTimeMillis() is used instead of nanoTime(), but anyway,
something is strange here.
Dimitris
2010/1/11 Noble <noble...@gmail.com>:
2010/1/11 Dimitris Andreou <jim.a...@gmail.com>:
I assume that this is because short tests don't let the JVM hotspot
optimizer do its thing. Is there more?
From what I've seen in real world testing, the recent JVMs are quite
impressive at all sorts of optimization, from code inline-ing, loop
unrolling, constant propagation to fancier stuff.
I'm interested in the larger scale optimizations and timings, I did a
fair amount of work in that area when I was in grad school.
A related question that I don't yet have an empirical answer to is:
what rule of thumb should be used in selecting between a HashSet-style
implementation and a TreeSet implementation. While the trivial answer
is that HashSet is O(1) and TreeSet is O(ln N), with 1 < ln N for all
meaningful cases, in practice the constants for the O(1) function seem
fairly high, and since the TreeSet, as a red/black tree is optimal, I
suspect that the naive assumption is not accurate in the real work.
I was planning to incorporate google collections to Apache Solr
because everyone claimed it faster and more efficient. (For us
performance is everything ). If the performance is comparable it is
acceptable because google collections offer a neat API.
when you say the test is inconclusive, is there a reason to suggest
what we observe in test will not be same in production as well?
On Jan 11, 11:10 pm, Kevin Bourrillion <kev...@google.com> wrote:
> Noble,
>
> I'm sorry to have to tell you the sad news that simple, common-sense test
> programs like this, for measuring performance of a Java library, are
> absolutely inconclusive. We will be publishing a statistically rigorous
> assessment of ImmutableSet performance shortly (for some definition of
> "shortly"). In fact, we did extensive testing of this before ImmutableSet
> was released, but rather than just refer to those ancient numbers, please be
> patient to hear fresh results.
>
> On Mon, Jan 11, 2010 at 4:13 AM, Dimitris Andreou <jim.andr...@gmail.com>wrote:
>
>
>
>
>
> > Sorry, I wrote the first sentence while I thought that it happened
> > that the particular example chosen was a bad one - only afterwards I
> > realized that these should behave equivalently since they used the
> > same hash, forgot to remove it.
>
> > 2010/1/11 Dimitris Andreou <jim.andr...@gmail.com>:
> > > It's hard to say since the test is very incomplete. Both sets have an
> > > underlying table of length=64, and use the same different hash, but
> > > still, in the example you picked:
> > > - In HashSet, "StrField.class" is discovered immediately
> > > - In ImmutableSet, it collides with others, and only with the 4th
> > > attempt is "StrField.class" discovered.
>
> > > Weird, from a cursory look, it seems that ImmutableSet ought to
> > > simulate exactly a HashSet, using the same hash, table size and all.
>
> > > Also, there are some problems like including the time to do a println
> > > (for HashSet) in the time measured for ImmutableSet, and that
> > > currentTimeMillis() is used instead of nanoTime(), but anyway,
> > > something is strange here.
>
> > > Dimitris
>
> > > 2010/1/11 Noble <noble.p...@gmail.com>:
> > >> This is a simple programhttp://gist.github.com/274175for Set.
>
> > >> -
>
> > >> On Jan 11, 3:13 pm, Dimitris Andreou <jim.andr...@gmail.com> wrote:
> > >>> Hi
>
> > >>> Firstly, you might also want to check this thread:
> >http://groups.google.com/group/google-collections-users/browse_thread...
>
> > >>> 2010/1/11 Noble <noble.p...@gmail.com>:
>
> > >>> > My tests show that HashSet is ~250% faster than immutableSet
>
> > >>> > and
> > >>> > Size:
> > >>> > Collections.unmodifiableMap: 7.4% bigger than HashMap
>
> > >>> You mean memory footprint? 7.4% doesn't make sense, unless you are
> > >>> talking about very tiny HashMaps. Collections.unmodifiableMap adds a
> > >>> negligible memory footprint (12 bytes from the top of my head)
>
> > >>> > google immutable map: 22.4% bigger than HashMap
>
> > >>> > Speed:
> > >>> > Collections.unmodifiableMap: 4.2% slower than HashMap
> > >>> > google immutable map: 26.0% slower than HashMap
>
> > >>> > is it expected?
>
> > >>> Not really :) but then, since you are not saying what exactly you are
> > >>> measuring, all bets are off.
>
> > >>> Dimitris
>
> > >>> > --
> > >>> > guava-...@googlegroups.com.
> > >>> >http://groups.google.com/group/guava-discuss?hl=en
> > >>> > unsubscribe: guava-discus...@googlegroups.com<guava-discuss%2Bunsubscribe@goog legroups.com>
>
> > >>> > This list is for discussion; for help, post to Stack Overflow
> > instead:
> > >>> >http://stackoverflow.com/questions/ask
> > >>> > Use the tag "guava".
>
> > >> --
> > >> guava-...@googlegroups.com.
> > >>http://groups.google.com/group/guava-discuss?hl=en
> > >> unsubscribe: guava-discus...@googlegroups.com<guava-discuss%2Bunsubscribe@goog legroups.com>
>
> > >> This list is for discussion; for help, post to Stack Overflow instead:
> > >>http://stackoverflow.com/questions/ask
> > >> Use the tag "guava".
>
> > --
> > guava-...@googlegroups.com.
> >http://groups.google.com/group/guava-discuss?hl=en
> > unsubscribe: guava-discus...@googlegroups.com<guava-discuss%2Bunsubscribe@goog legroups.com>
Hi Kevin ,
Here is a more elaborate and thorough test for the map .
https://issues.apache.org/jira/secure/attachment/12429895/TestPerf.java
I was planning to incorporate google collections to Apache Solr
because everyone claimed it faster and more efficient.
(For us
performance is everything ). If the performance is comparable it is
acceptable because google collections offer a neat API.
when you say the test is inconclusive, is there a reason to suggest
what we observe in test will not be same in production as well?
While I agree with what you say, could you comment on my questions
earlier in the thread? From a look, it seems that ImmutableSet ought
to have the same behavior as HashSet - same hash and same table size,
yet in the example, the same value loopup had different behavior - how
do you explain the discrepancy? Obviously some assumption I make is
wrong - but which one?
Regards,
Dimitris
2010/1/12 Kevin Bourrillion <kev...@google.com>:
Then typically ImmutableSet will have bigger tables than HashSet, so
the test (in which I think I saw the table sizes happened to be equal,
64) is completely unfair - I presume the situation would look quite
better if the ImmutableSet of the test had, say, 33 elements, rathen
than 24. Agreed though, this is still in the domain of very small
sets, and the advantages of cache locality are not obvious yet.
Thanks for explaining this through Kevin! Now I'm even more happy with
the ImmutableSet implementation we have.
For some further reading on performance testing, I'd highly recommend section 12.3 ('Avoiding performance testing pitfalls') of Java Concurrency in Practice (Goetz et al.)
As an exercise, I created a HashSet by embedding and dumbing down
HashMap source code, but it's tricky to actually make it lighter. This
is the entry object of HashMap:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
For a Set, we don't need the value field. But removing that field (I
have a typical enough machine) still leaves an Entry object at 24
bytes - no reduction. If I forgo the "hash" field too, then I get a
23% reduction in memory compared to java.util.HashSet.
I should have started with IdentityHashMap as a base :(
I've had the same thoughts.
> As an exercise, I created a HashSet by embedding and dumbing down
> HashMap source code, but it's tricky to actually make it lighter. This
> is the entry object of HashMap:
>
> static class Entry<K,V> implements Map.Entry<K,V> {
> final K key;
> V value;
> Entry<K,V> next;
> final int hash;
>
> For a Set, we don't need the value field. But removing that field (I
> have a typical enough machine) still leaves an Entry object at 24
> bytes - no reduction. If I forgo the "hash" field too, then I get a
> 23% reduction in memory compared to java.util.HashSet.
Removing the "value" field has some value of course,
depending on the JVM. One would expect a reduction
from 7-8 words per key to 6-7 words per key.
I expect that the "hash" field has to stay, especially
if our aim is to actually replace java.util.HashSet.
The maintenance cost of duplicated code keeps one
from starting on this project, especially while other folks
still have grand plans for improving HashMap.
I believe IdentityHashMap uses open addressing
because the cost of a few extra comparisons is much lower
when comparison involves "==" instead of ".equals".
But I am still tempted to explore an implementation
based on a pair of arrays with open addressing,
one to store elements and one to store hashes,
for a space cost of about 4 words per key.
Martin
I trusted too much Instrumentation#getObjectSize(), which shown that
the object remains at 24 bytes with or without a single field, but as
I read more carefully the javadocs, this is an approximation.
I was more or less following the discussion on a new HashMap, but I
haven't heard news on that for months. If HashMap itself changes, the
current HashSet will evolve automatically, but a HashSet replacement
would not do so as easily, and it would have to accommodate
LinkedHashSet somehow too. I was thinking more in terms of replacing
the implementation used in Sets#newHashSet() methods of guava (for
just the benefit of new code, ok).
I would think that open addressing does not mean more collisions if
the trade-off is striked at the right balance. If my rudimentary
calculations are right, linear probing would requite a load factor of
35% to have the same expected collisions as a chained hashtable with
load factor 75% (i.e. the default of HashMap). If one gets rids of
those Entry objects, it seems that the memory reduction will be just
about enough to compensate for the longer required tables (which are
going to be about twice the size), and then some more, so it should be
at least a slight win.
The easy solution obviously would be to ping Doug at some appropriate time. :)
Regards,
Dimitris
2010/1/13 Martin Buchholz <mart...@google.com>:
I did chat with Doug last week.
He has made some effort in the past to
make a much improved HashSet,
but he also did not come up with anything
sufficiently good to replace the current HashSet.
Martin
> Regards,
>
> Dimitris