import java.util.Hashtable;

public class Hashtbl {
    public static void main(String args[]) {
        Hashtable hashtable = new Hashtable();

        for (int i = 1; i <= 10000000; ++i) {
            double x = i;
            hashtable.put(x, 1.0 / x);
        }

        System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
    }
}
My guess is that this is because the JVM is boxing every floating point
number individually in the hash table due to type erasure whereas .NET
creates a specialized data structure specifically for a float->float hash
table with the floats unboxed. Consequently, the JVM is doing enormous
numbers of allocations whereas .NET is not.
Is that correct?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
> I'm having trouble getting Java's hash tables
I note you're using java.util.Hashtable. That's an ancient class which has
a few things wrong with it: i'd strongly suggest using java.util.HashMap
instead. I wouldn't expect it to be significantly faster in this test,
though.
> to run as fast as .NET's. [...] My guess is that this is because the JVM
> is boxing every floating point number individually in the hash table due
> to type erasure whereas .NET creates a specialized data structure
> specifically for a float->float hash table with the floats unboxed.
> Consequently, the JVM is doing enormous numbers of allocations whereas
> .NET is not.
>
> Is that correct?
I can't comment on what the CLR is doing in a comparable situation, but
that is certainly a reasonable description of what java is doing.
Could you try changing the put line to:
hashtable.put(Double.toString(x), Double.toString(1.0 / x));
And making the corresponding change in the C# or whatever version, and
making the comparison again? That eliminates boxing in java, so if the
difference is due to boxing, it will be significantly reduced, which will
give you some clues as to what's going on.
If you need optimal performance for a double->double map in java, you
would need to write one specifically for that case. Or rather, someone
would: you could have a google to see if someone already has. Also, since
converting doubles to longs is cheap and reversible, you could look for a
long->long hashmap, which strikes me as more likely to exist. I didn't
find one in two minutes of googling, but i did find some long->Object
hashmaps, which get you halfway there.
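For what it's worth, the double<->long conversion is exact: a minimal
sketch of the round trip (longMap here is a hypothetical long->long map):

    long k = Double.doubleToLongBits(x);       // cheap, reversible
    long v = Double.doubleToLongBits(1.0 / x);
    // longMap.put(k, v);
    double y = Double.longBitsToDouble(v);     // recovers the value bit-for-bit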
tom
--
Oh, well of course *everything* looks bad if you remember it
In addition to using HashMap, you might also specify an appropriate
initialCapacity and loadFactor. You may be seeing a lot of rehash
operations.
<http://java.sun.com/javase/6/docs/api/java/util/HashMap.html>
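For example, a sketch (the sizing numbers are illustrative, not tuned):
presizing so that capacity >= entries / loadFactor means the 10M-entry
test never rehashes:

    import java.util.HashMap;
    import java.util.Map;

    // 1 << 24 = 16,777,216 > 10,000,000 / 0.75, so no rehash ever occurs
    Map<Double, Double> map = new HashMap<Double, Double>(1 << 24, 0.75f);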
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
Yes.
And why do you ask when it is so easy to test and verify?
Arne
I would still consider it boxing - just to String instead of Double.
Arne
In case there is a faster alternative that I'm not aware of.
Hi Jon
You are using Hashtable instead of HashMap - probably the performance
loss you've observed is due to synchronization (though "fat"
synchronization might be optimized away in the single-threaded case, you
still pay a price, though a lower one). If you took a look at the JavaDoc,
you'd notice that Hashtable's methods are synchronized. As for boxing, you
are correct (though there is no type erasure in your example because you
did not specify type parameters at all), but I suspect that these costs
are not the largest contributing factor to the overall poor performance.
I'd blame synchronization in the first place.
I think there has to be something more to it than just the autoboxing.
My reasoning is that you never reuse a key, so every put call creates a
new Entry instance. Creating a Double from a double is about as simple
as object creation can be, so I don't see how the boxing could do more
than triple the time spent in object creation during an average put
call. That cannot, by itself, account for a 32x performance ratio.
Patricia
Right. In fact, because the conversion to a string is costlier, it
could obscure the real performance costs inherent in the hash data
structure.
One could do a similar "even the odds" change simply by forcing boxing
in .NET, by using "Object" as the type in the .NET data structure
instead of "Double". That way both versions are boxing, but in the most
efficient way available for the platform.
Pete
I think it can.
I did some test on my PC.
In Java, both HashMap and Hashtable are about 8 times slower than .NET's
Dictionary<double,double>, but .NET's Dictionary<object,object> and
Hashtable are both about the same speed as the Java versions.
Arne
> You are using Hashtable instead of HashMap - probably the performance
> loss you've observed is due to synchronization (though "fat"
> synchronization might be optimized away in the single-threaded case, you
> still pay a price, though a lower one). If you took a look at the JavaDoc,
> you'd notice that Hashtable's methods are synchronized. As for boxing, you
> are correct (though there is no type erasure in your example because you
> did not specify type parameters at all), but I suspect that these costs
> are not the largest contributing factor to the overall poor performance.
> I'd blame synchronization in the first place.
That does not match measurements.
Arne
> On 21 Lis, 19:33, Jon Harrop <j...@ffconsultancy.com> wrote:
>> I'm having trouble getting Java's hash tables to run as fast as .NET's.
>> Specifically, the following program is 32x slower than the equivalent
>> on .NET:
>>
>>   import java.util.Hashtable;
>>
>>   public class Hashtbl {
>>       public static void main(String args[]) {
>>           Hashtable hashtable = new Hashtable();
>>
>>           for (int i = 1; i <= 10000000; ++i) {
>>               double x = i;
>>               hashtable.put(x, 1.0 / x);
>>           }
>>
>>           System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>>       }
>>   }
>>
>> My guess is that this is because the JVM is boxing every floating point
>> number individually in the hash table due to type erasure whereas .NET
>> creates a specialized data structure specifically for a float->float hash
>> table with the floats unboxed. Consequently, the JVM is doing enormous
>> numbers of allocations whereas .NET is not.
>>
>> Is that correct?
>
> You are using Hashtable instead of HashMap - probably the performance
> loss you've observed is due to synchronization (though "fat"
> synchronization might be optimized away in the single-threaded case, you
> still pay a price, though a lower one). If you took a look at the JavaDoc,
> you'd notice that Hashtable's methods are synchronized. As for boxing, you
> are correct (though there is no type erasure in your example because you
> did not specify type parameters at all), but I suspect that these costs
> are not the largest contributing factor to the overall poor performance.
> I'd blame synchronization in the first place.
I'd be *very* surprised if that was true. In this simple program, escape
analysis could eliminate the locking entirely - and current versions of
JDK 1.6 do escape analysis. Even if for some reason it didn't, you'd only
be using a thin lock here, which takes two x86 instructions and one memory
access for each lock and unlock operation, far less than the boxing or
unboxing.
I modified the test code to look like this (yes, with no warmup - this is
very quick and dirty):
import java.util.Map;
import java.util.HashMap;
import java.util.Hashtable;
public class HashPerf {
    public static void main(String args[]) throws InterruptedException {
        for (int i = 1; i <= 100; ++i) {
            long t0 = System.nanoTime();
            test();
            long t1 = System.nanoTime();
            long dt = t1 - t0;
            System.out.println(dt);
            System.gc();
            Thread.sleep(200);
        }
    }

    private static void test() {
        Map<Double, Double> hashtable = new HashMap<Double, Double>();
        // Map<Double, Double> hashtable = new Hashtable<Double, Double>();
        for (int i = 1; i <= 1000; ++i) {
            double x = i;
            // synchronized (hashtable) {
            hashtable.put(x, 1.0 / x);
            // }
        }
    }
}
And then ran it with three variations on the comments: one as above, one
uncommenting the synchronization of the hashtable, and one switching the
HashMap to a Hashtable. I have java 1.5.0_19 on an elderly and ailing
PowerPC Mac laptop. I ran with -server and otherwise stock settings.
The timings for each show the usual power curve distribution: 80% of the
measurements are no more than 50% longer than the fastest, and 90% are no
more than twice as long, with the last 10% being up to 10 times longer. If
we say that the slowest 10% are artifacts of warmup, GC, the machine doing
other things, etc, and ignore them, then the average times i got were
(with standard error of the mean, which is broadly like a ~60% confidence
limit IIRC):
HashMap 933500 +/- 15006
sync HashMap 1003200 +/- 16187
Hashtable 868322 +/- 11602
That is, adding synchronization to the accesses adds a 7.5% overhead.
Although somehow, the old Hashtable comes out faster!
So, even with java 1.5, adding synchronization to HashMap.put() imposes
only a small performance penalty - i'd expect it to be less with 1.6. I
doubt very much that this is the major factor in the OP's performance
problem.
tom
--
... the gripping first chapter, which literally grips you because it's
printed on a large clamp.
I take the point that it's still creating an object. But it is simply,
undeniably, not boxing.
Moreover, the point is that i was suggesting adding the *same* object
creation to both the java and .NET versions, which would eliminate any
differences in language-level boxing behaviour.
tom
--
Ten years of radio astronomy have taught humanity more about the creation
and organization of the universe than thousands of years of religion
and philosophy. -- P. C. W. Davis
But you can force boxing in .NET just by using Object as the type for
the collection class type parameters instead of Double. So wouldn't a
better test be to do that? Rather than introducing yet another point of
possible incongruity? If you are concerned about "differences in
language-level boxing behavior", I'd think there'd be just as much
concern about "differences in language-level string conversion
behavior". I see no reason to think that boxing in .NET would be any
more or less different from boxing in Java than string conversion in
.NET would be from string conversion in Java.
In any case, I think Arne's point still stands though: "boxing" can be
thought of generally or specifically. Specifically, you're
right...conversion to String isn't strictly speaking boxing. But
generally, inasmuch as "boxing" is simply a way to convert a value type
(primitive) to a reference type (class), converting to String is as
valid a way to "box" a value type as converting to Object or Double.
It really just depends on how strictly you choose to define boxing, and
how literally you take the phrase "consider it boxing". In some sense,
converting to a String isn't "eliminating boxing in Java", but rather
just accomplishing the same thing as boxing but in a slightly different way.
A person who insists with absolute stubbornness that there is only one
possible interpretation of a particular term or phrase may of course
disagree with my analysis of the question. :)
Pete
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormous
> numbers of allocations whereas .NET is not.
>
> Is that correct?
No, not primarily. In my testing I found:
1. I think the JVM itself makes the most difference.
2. Next was synchronization.
3. Autoboxing appears to have an impact but it's less than either #1 or #2.
The first time I ran your program, it took over 79 seconds to execute.
During the course of testing the other algorithms, that time reduced to
about 13 seconds and stayed there. Very odd, but I assume that
something somewhere optimized the code.
Unlike a lot of the posters here, I'm running a 32-bit client JVM, which
is slow to optimize code. 64-bit systems run in -server mode by default,
which optimizes code before executing it.
The first place to look is your environment: memory, flags to the JVM,
which JVM version and vendor, other "system" variables, etc.
Somewhat surprising to me, just substituting a HashMap for the Hashtable
reduced the time to about half -- 6 seconds instead of 13. Both autobox;
the only difference is synchronization.
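(To isolate the locking cost alone, one could instead time the same map
type with and without a synchronized wrapper - a sketch using only
standard library calls:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    Map<Double, Double> plain = new HashMap<Double, Double>();
    Map<Double, Double> locked =
        Collections.synchronizedMap(new HashMap<Double, Double>());
    // run the identical put loop against each and compare

That keeps everything except the synchronization identical.)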
Lastly, removing the autoboxing by writing a specialized class saved
about 0.7 seconds over just using HashMap. The specialized version's time
was 5.3 seconds. Note that I allocated a new object myself -- the
"Entry" for the HashMap -- each time a new double is added. I suspect
that any reasonable C# hash object must do the same, so you're not
losing as much time as you think by Java's auto object creation.
I'm curious to see your C# version, and see actual times for both
programs on your system. The results might tell us what is going on.
"32x" doesn't really give us much of a clue.
Output of the following:
<output>
run-single:
HashTable time: 12844
hashtable(100.0) = 0.01
HashMap time: 6016
hashtable(100.0) = 0.01
HashTest time: 5373
hashtable(100.0) = 0.01
BUILD SUCCESSFUL (total time: 29 seconds)
</output>
<code>
package cljp;
import java.util.HashMap;
import java.util.Hashtable;
import static java.lang.System.out;
public class HashTest
{
    public static void main( String[] args )
    {
        Hashtbl.test();
        HashM.test();
        HashTest.test();
    }

    HashMap<Entry,Entry> hash = new HashMap<Entry,Entry>();

    private static class Entry {
        final double key;
        final double value;

        public Entry( double key, double value )
        {
            this.key = key;
            this.value = value;
        }

        @Override
        public int hashCode()
        {
            long bits = Double.doubleToLongBits( key );
            bits ^= bits >>> 32;
            return (int)bits;
        }

        @Override
        public boolean equals( Object obj )
        {
            if( !(obj instanceof Entry) ) {
                return false;
            }
            return key == ((Entry)obj).key;
        }
    }

    public void put( double key, double value ) {
        Entry e = new Entry( key, value );
        hash.put( e, e );
    }

    public double get( double key ) {
        Entry e = new Entry( key, 0.0 );
        Entry valueEntry = hash.get( e );
        if( valueEntry != null ) {
            return valueEntry.value;
        } else {
            throw new IllegalArgumentException("Not found: " + key);
        }
    }

    public static void test()
    {
        long start = System.nanoTime();
        HashTest hashTest = new HashTest();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashTest.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashTest time: " + (end-start)/1000000 );
        out.println( "hashtable(100.0) = " + hashTest.get( 100.0 ) );
    }
}

class HashM
{
    public static void test()
    {
        long start = System.nanoTime();
        HashMap<Double,Double> hashM = new HashMap<Double, Double>();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashM.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashMap time: " + (end-start)/1000000 );
        out.println( "hashtable(100.0) = " + hashM.get( 100.0 ) );
    }
}

class Hashtbl
{
    public static void test()
    {
        long start = System.nanoTime();
        Hashtable hashtable = new Hashtable();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashtable.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashTable time: " + (end-start)/1000000 );
        out.println( "hashtable(100.0) = " + hashtable.get( 100.0 ) );
    }
}
</code>
> Tom Anderson wrote:
>> On Sat, 21 Nov 2009, Arne Vajhøj wrote:
>>
>>> Tom Anderson wrote:
>>>> Could you try changing the put line to:
>>>>
>>>> hashtable.put(Double.toString(x), Double.toString(1.0 / x));
>>>>
>>>> And making the corresponding change in the C# or whatever version, and
>>>> making the comparison again? That eliminates boxing in java, so if the
>>>> difference is due to boxing, it will be significantly reduced, which will
>>>> give you some clues as to what's going on.
>>>
>>> I would still consider it boxing - just to String instead of Double.
>>
>> I take the point that it's still creating an object. But it is simply,
>> undeniably, not boxing.
>>
>> Moreover, the point is that i was suggesting adding the *same* object
>> creation to both the java and .NET versions, which would eliminate any
>> differences in language-level boxing behaviour.
>
> But you can force boxing in .NET just by using Object as the type for the
> collection class type parameters instead of Double. So wouldn't a better
> test be to do that?
Yes, absolutely. It just didn't occur to me to do that!
tom
--
NO REAL THAN YOU ARE -- Ego Leonard, The Zandvoort Man
> Jon Harrop wrote:
>
>> My guess is that this is because the JVM is boxing every floating point
>> number individually in the hash table due to type erasure whereas .NET
>> creates a specialized data structure specifically for a float->float hash
>> table with the floats unboxed. Consequently, the JVM is doing enormous
>> numbers of allocations whereas .NET is not.
>
> 3. Autoboxing appears to have an impact but it's less than either #1 or #2.
>
> Lastly, removing the autoboxing by writing a specialized class saved
> about 0.7 second from just using HashMap. The specialized version time
> was 5.3 seconds. Note that I allocated a new object myself -- the
> "Entry" for the HashMap -- each time a new double is added. I suspect
> that any reasonable C# hash object must do the same, so you're not
> losing as much time as you think by Java's auto object creation.
Hang on, if i've read your code right, then what you've done is replaced
automatic boxing with manual boxing. The point is that in .NET, there is
*no* boxing in this case - doubles are stored unwrapped in the map.
Although in your code, you put both doubles in a single box, so if boxing
is a slowdown, your code should roughly halve it.
I took the liberty of converting java's HashMap to work directly with
primitive doubles:
http://urchin.earth.li/~twic/Code/DoubleMap.java
I haven't tested that this implementation is correct, but on the benchmark
i posted a little earlier, it comes out 17% faster than a HashMap with
boxing.
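For anyone who doesn't want to click through, the core idea is something
like this (a compressed sketch, not the actual DoubleMap code: fixed
power-of-two capacity, linear probing, no resize or remove, NaN keys
unsupported):

    // Parallel primitive arrays: no boxing, no per-put Entry allocation.
    final class DoubleDoubleMap {
        private final double[] keys;
        private final double[] vals;
        private final boolean[] used;
        private final int mask;

        DoubleDoubleMap(int powerOfTwoCapacity) {
            keys = new double[powerOfTwoCapacity];
            vals = new double[powerOfTwoCapacity];
            used = new boolean[powerOfTwoCapacity];
            mask = powerOfTwoCapacity - 1;
        }

        private int slot(double key) {
            long bits = Double.doubleToLongBits(key);
            int i = (int) (bits ^ (bits >>> 32)) & mask;
            while (used[i] && keys[i] != key) {
                i = (i + 1) & mask;        // linear probing
            }
            return i;
        }

        void put(double key, double val) {
            int i = slot(key);
            used[i] = true;
            keys[i] = key;
            vals[i] = val;
        }

        double get(double key, double absent) {
            int i = slot(key);
            return used[i] ? vals[i] : absent;
        }
    }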
So, 7.5% for synchronization, 17% for boxing - we're still a good way off
this reported 32x!
tom
--
In my experience, there are two main ways of getting a 32x performance
ratio for the same job:
1. Different algorithm.
2. Memory issues.
In this case, I suspect possibly memory issues. If the .NET table is
more compact, because of using primitives, it might fit into a level in
Jon's computer's memory hierarchy that the Java Hashtable does not fit.
This sort of thing is configuration dependent, so the performance
difference might not be reproducible on a different computer, even using
the same programs.
Patricia
>
> Hang on, if i've read your code right, then what you've done is replaced
> automatic boxing with manual boxing. The point is that in .NET, there is
Yeah, I don't make any effort to conserve "Entries" that are already
made. I don't think it would matter. Jon's test makes new doubles, it
never repeats a value.
All I'm really trying to do is provide a close approximation to what I
think C# might be doing. So far the result doesn't seem to speed up
running time much, so I'm not sure of the value of pursuing this
further. Jon's 32x number seems bogus. I think there's some
environmental or JVM issue for him.
Yes. Creating custom hashing classes for primitives pays off if
performance needs to be very high. There are also alternative ways of
handling collisions that will eliminate common memory allocations in
put(). It's unfortunate that creating these classes is such a manual
process.
--
I won't see Google Groups replies because I must filter them as spam
Ah, my machine is rather slow, and i didn't have the patience to wait for
the full-size version to run, so i did significantly reduce the size of
the table in my tests to 1000 entries, which will easily fit in L3 caches
and have pretty good representation in higher ones. Thus, any effects
caused by data hugeness may be lost.
tom
--
Would you like to remember more?
> Yes. Creating custom hashing classes for primitives pays off if
> performance needs to be very high.
I've actually been playing around with "alternate ways of handling
collisions" and I agree that hashing a double is hard. The default
algorithm both Tom and I used:
long bits = Double.doubleToLongBits( key );
int hash = (int)(bits ^ (bits >>> 32));
provides terrible performance. Java Map implementations provide some
"hash mangling" that help poor hash algorithms, so if you roll your own
Map class you'll have to provide a better hash function or provide your
own mangle function.
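For reference, this is the supplemental hash that JDK 6's HashMap applies
on top of hashCode(); something like it is needed if you roll your own
table:

    // HashMap.hash(int) from JDK 6: spreads the bits of a weak hashCode
    static int mangle(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }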
Note: found this site. Haven't read it all but it seems cool:
First of all, escape analysis is turned off by default. The next thing
is that there is a subtle difference between a synchronized method and a
synchronized block. Hashtable has the former - escape analysis does not
help very much there, AFAIK. So the only benefit would be gained from
thin locks.
Even if for some reason it didn't, you'd only
> be using a thin lock here, which takes two x86 instructions and one memory
> access for each lock and unlock operation, far less than the boxing or
> unboxing.
>
Probably because it was off, and it wouldn't have eliminated the locks
anyway. I wonder - maybe something prevented taking the fast locking
path?
Boxing is fast as well; it is one simple monomorphic call resulting in
the creation of one Double (which may be cached anyway).
Interesting indeed - anyway, your synchronized block is more likely to
be optimized away with EA turned on than a synchronized method's lock on
this, though I am not quite sure, seeing your results :-)
Well, I am more inclined to believe that his VM somehow did not
perform lock optimization.
Yeah, I see. Well, what VM did Jon use? That might be crucial to know
what happened.
$ java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)
Quite recent. And were your measurements affected by using HashMap
instead?
Did you try setting the initial capacity to 1000?
I've had terrible performance from C programs that malloced for every
one of a huge number of little items they wanted to put on the heap. I'd
hope any JVM was better than that, but you never know.... Besides, it's
quite possible .NET allocates bigger chunks, since Windows memory
allocation was/is quite slow.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
It might help to show your .NET program as well. If people ran both it
and the Java version on a few different systems, we could see how much,
if any, of the performance difference is configuration-specific.
Patricia
The OP is not using a current version of Java 6.
Jon Harrop wrote:
>> $ java -version
>> java version "1.6.0_13"
>> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
>> Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)
According to
<http://java.sun.com/javase/6/webnotes/6u14.html>
> Optionally available are two new features -
> escape analysis and compressed object pointers.
Which implies strongly that escape analysis, being "new" in 6u14, was not
available in 6u13. Even then, as Marcin Rzeźnicki wrote:
> ... escape analysis is turned off by default.
...
> Well, I am more inclined to believe that his VM somehow did not
> perform lock optimization.
--
Lew
>I'm having trouble getting Java's hash tables to run as fast as .NET's.
>Specifically, the following program is 32x slower than the equivalent
>on .NET:
see http://mindprod.com/jgloss/hashmap.html#SPEED
--
Roedy Green Canadian Mind Products
http://mindprod.com
Finding a bug is a sign you were asleep at the switch when coding. Stop debugging, and go back over your code line by line.
Not affected.
I just tried setting it to the final size of the hash table (10M) and the
time actually got slightly worse.
> I've had terrible performance from C programs that malloced for every
> one of a huge number of little items they wanted to put on the heap. I'd
> hope any JVM was better than that, but you never know.... Besides, it's
> quite possible .NET allocates bigger chunks, since Windows memory
> allocation was/is quite slow.
I believe the JVM is doing orders of magnitude more allocations than .NET
here due to its type erasure approach to generics.
Sure, here's the F# code for .NET:
let n = 10000000
let m = System.Collections.Generic.Dictionary(n)
for i=1 to n do
  m.[float i] <- 1.0 / float i
printf "%d %g\n" m.Count m.[100.0]
That takes around 1s and the Java code takes around 32s here.
Type erasure has no effect on your original problem.
Type erasure has no effect on the number of allocations, let alone "orders of
magnitude".
It simply means that a parametrized type is treated as an 'Object' by the JVM.
It affects the number of casts, which may be affected by the HotSpot optimizer.
If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo' instances, you
have 1001 allocations plus however many allocation/copy operations are
necessary to grow the 'Set' to hold 1000 references (six with a typical
default initial size, zero if you made it big enough to hold 1000). Type
erasure has nothing to do with it - you're still creating only 'Foo' objects
and enough 'Set' slots to refer to them.
--
Lew
He wrote "type erasure approach to generics". He did not wrote that type
erasure itself was responsible.
Here, the extra allocations are due to the non-specialization of code at
runtime: the container code (e.g. HashMap) has no knowledge of the
contained element type. The root decision is for backward compatibility
and mixing of pre-generics source code and compiled code with newer code
within the same application. A consequence is that code cannot be
specialized at runtime depending on the type parameters, which forces
the Set or Map implementation to handle the float values as 'Object'
instances, implying boxing (which is a dynamic allocation, regardless of
how you wish to name it). Another consequence of that decision of
backward compatibility is the use of type erasure as a compilation
gimmick to make generic types available at the syntax level.
Naming this decision of backward compatibility the "type erasure
approach to generics" is not illegitimate; syntax is what is seen and
type erasure was duly advertised as the core technology which enabled
use of generic types while maintaining backward compatibility. As far as
terminology goes, "type erasure approach" is not bad here.
Now type non-erasure does not automatically imply code specialization.
It allows it. .NET uses code specialization (well, at least some
implementations of the CLR do). This is what is shown here: in some
corner cases, code specialization may provide a measurable performance
boost.
--Thomas Pornin
Note that an "Entry instance" is a value type on the CLR, something that the
JVM is incapable of expressing.
> Creating a Double from a double is about as simple
> as object creation can be,
Note that there is no object creation on the CLR and, indeed, I believe that
is precisely why it is so much faster.
> so I don't see how the boxing could do more
> than triple the time spent in object creation during an average put
> call. That cannot, by itself, account for a 32x performance ratio.
I believe the difference is that the CLR does no boxing whatsoever (not of
doubles and not of entries) whereas the JVM boxes everything.
What is the size of an F# float?
Patricia
Am I correct in thinking that escape analysis might result in unboxing of
local values in functions but it will never result in unboxed data
structures on the heap? For example, it cannot unbox the keys and values in
a hash table?
I don't think type erasure or generics have anything to do with it.
On a non-rehashing call, the JVM would do three allocations, one of
which, for the Entry, would have been required regardless of the types.
The other two are due to the lack of primitive-based implementations of the
java.util collections, causing two autobox conversions from double to
Double.
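(Concretely, the autoboxing the compiler inserts for the benchmark's put
is equivalent to:

    hashtable.put(Double.valueOf(x), Double.valueOf(1.0 / x));

two small allocations per call, since Double.valueOf does not cache in
practice.)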
The .NET implementation may have a different way of storing the hash
chains that avoids allocating an object for each put call.
In addition, there is the issue of the number of rehashes. Each Java
rehash call only involves one object allocation, but a lot of memory
access and computation.
That is, the issues I see as being candidates for the problem are the
non-support of primitive collections, and other issues in the design of
the F# and Java collections.
Patricia
The JVM cannot generate type-specialized data structures as the CLR does
because the types have been erased.
> Type erasure has no effect on the number of allocations, let alone "orders
> of magnitude".
Type erasure forces all values to be cast to Object which forces value types
(like double) to be boxed into objects which is an allocation. The JVM
boxes all 20,000,000 of the doubles generated by this program whereas the
CLR boxes none of them. That is the "orders of magnitude" difference I was
referring to.
> It simply means that a parametrized type is treated as an 'Object' by the
> JVM.
> It affects the number of casts, which may be affected by the HotSpot
> optimizer.
Casting a primitive type to Object incurs an allocation.
> If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo' instances,
> you have 1001 allocations plus however many allocation/copy operations are
> necessary to grow the 'Set' to hold 1000 references (six with a typical
> default initial size, zero if you made it big enough to hold 1000). Type
> erasure has nothing to do with it - you're still creating only 'Foo'
> objects and enough 'Set' slots to refer to them.
You are assuming that Foo is an object which is not true in general and is
not true in this case.
Are you sure Java is the right language for your current project? If I
were fighting the language to the extent you seem to be doing, I would
seriously reexamine my choice of language.
Patricia
That's not an issue of type erasure but of Java not supporting
collections of primitive types. It is the autoboxing of 'double' to
'Double' that you discuss here, not the erasure of 'Double' to
'Object'.
Type erasure has nothing to do with value types being boxed. The
erasure from 'Double' to 'Object' has no further effect on
performance.
Others in this thread have tried to duplicate the degree of slowdown
you report without being able to. No one else here seems to see a
32:1 ratio in performance.
--
Lew
That may change with the opportunity to run Jon's F# code - let's see
what other people who have F# installed get.
Patricia
That's not true, Jon. Type erasure does not force anything in your
example - you did not even use generics at all, so how can it affect
anything? The problem here is that Java lacks value types, which is
orthogonal to the generics implementation.
> > It simply means that a parametrized type is treated as an 'Object' by the
> > JVM.
> > It affects the number of casts, which may be affected by the HotSpot
> > optimizer.
>
> Casting a primitive type to Object incurs an allocation.
>
Not necessarily - the Java specification permits caching instances of
boxed numerics.
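(The permitted caching is narrow, though. A quick illustration:

    // Integer.valueOf caches -128..127; Sun's Double.valueOf caches
    // nothing, so every box in this benchmark is a fresh allocation.
    System.out.println(Integer.valueOf(100) == Integer.valueOf(100));   // true
    System.out.println(Double.valueOf(100.0) == Double.valueOf(100.0)); // false

Doubles are never cached in practice, so it does not help here.)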
> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo' instances,
> > you have 1001 allocations plus however many allocation/copy operations are
> > necessary to grow the 'Set' to hold 1000 references (six with a typical
> > default initial size, zero if you made it big enough to hold 1000). Type
> > erasure has nothing to do with it - you're still creating only 'Foo'
> > objects and enough 'Set' slots to refer to them.
>
> You are assuming that Foo is an object which is not true in general and is
> not true in this case.
>
It is true in general; furthermore, it is always true in F# or any
language implemented on top of the CLR, where there is no notion of
"primitive type". The thing we should discuss, after filtering the
half-truths out, is whether the difference between value types and
reference types might cause a 32x performance degradation.
Neither holds, because Java makes a distinction between primitives and
reference types. If a primitive value is boxed, then it means it is to be
used in a context where a reference type is expected; if a reference type
is unboxed to a primitive, then it means a primitive is expected. No
optimization mechanism can freely mix them, because that would result in
non-verifiable code which would be rejected by the Java bytecode
verifier, thus rendered incorrect.
But, correct me if I am wrong, it involves copying a value type. If so,
then I am not sure why copying would be better than an object
allocation.
I would back off even further to something like "What are the probable
causes of Jon's observations?".
Patricia
I profiled his example in NetBeans.
That's my JVM:
C:\Users\Rzeźnik\Documents\java>java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)
Here is the code I used:
package hashmapexample;
import java.util.HashMap;
/**
*
* @author Rzeźnik
*/
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        HashMap<Double, Double> hashtable = new HashMap<Double, Double>();
        // upper bound changed to 1m - sorry, no patience
        for (int i = 1; i <= 1000000; ++i) {
            double x = i;
            hashtable.put(x, 1.0 / x);
        }
        System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
    }
}
I used -Xms512m -Xmx512m to eliminate excessive collections.
The results of profiling are as follows:
54.2% of time spent in java.util.HashMap.put(Object, Object) (1m invocations),
of which:
    19.5% in java.util.HashMap.addEntry(int, Object, Object, int)
        11.1% in java.util.HashMap.resize(int) (17 invocations)  <--- !!!
        3.3% self-time
        1.4% in java.util.HashMap$Entry.<init>(int, Object, Object,
             java.util.HashMap.Entry)  <-- so the cost of allocating
             entries is negligible
    8.1% in java.lang.Double.hashCode()  <--- that's too much (?)
    ... rest of put omitted, circa 1%
Now, the interesting part:
30.3% of time was spent in java.lang.Double.valueOf(double) <--- that's boxing.
Furthermore, there were 2m + 1 calls to new Double, meaning that no
caching occurred.
> That takes around 1s and the Java code takes around 32s here.
32 seconds for the Java code you posted is almost certainly an outlier.
I can run the original code you posted on a laptop with a 32-bit JVM in
13 seconds. Clearly, something is up with your environment.
Oh yes, conclusions:
Starting from the 32s execution time Jon reported, he could have saved
around 3-4s had he preallocated the HashMap. He actually did that in his
F#, so this modification alone might have brought the comparable Java
time down to, let's say, 28s. He, of course, could not eliminate boxing,
which might have taken around 10s of his original execution time. So,
subtracting the costs of boxing from the implied theoretical F# execution
time, we end up with the conclusion that F# should have executed in ~18s
(which is an erroneous procedure in itself, because F# probably copies
values from the stack). Roughly 1:2 in favor of F#.
>I was considering the possibility of hand coding type-specialized hash
>tables in Java to work around this deficiency in generics on the JVM when I
>realised that it is actually impossible in the general case because the JVM
>lacks value types and, therefore, is incapable of expressing efficient hash
>tables.
Could you elaborate a bit on what value types would look like if they
existed in Java and how you would use them for efficient hash tables?
Correct.
> If it is so, then I am not sure why copying would be better than an object
> allocation
Copying a few bytes of data is a *lot* faster than allocating a few bytes of
data.
64 bits.
Interesting results. That is about what I would expect. If we were
trying to explain a 30% performance difference I would seriously
consider autoboxing as cause. If creating an Entry takes about as much
time as creating a Double, object creation could account for up to 45%
of the Java time.
The problem is explaining a 32x performance difference. The cause or
causes have to account for over 96% of the Java time.
Patricia
>I believe the JVM is doing orders of magnitude more allocations than .NET
>here due to its type erasure approach to generics.
I think you mean the extra overhead of boxing and unboxing, or perhaps
the extra overhead of allocating independent objects, rather than
plopping them into a single array the way you would with primitives.
You could find out the boxing/allocation overhead by preboxing your
elements, and saving the unboxing until you had waved the flag.
There is no need to allocate each entry separately because the hash table
can be an array of unboxed entries.
> The other two are due to the lack of primitive-based implementations of the
> java.util collections, causing two autobox conversions from double to
> Double.
Yes.
> The .NET implementation may have a different way of storing the hash
> chains that avoids allocating an object for each put call.
I believe the chains are stored in the same array: closed hashing.
>My guess is that this is because the JVM is boxing every floating point
>number individually in the hash table due to type erasure whereas
In the olden days of FORTRAN, you would have handled this by writing a
version of HashMap that took a floating point primitive. You would
hard-code it in. You have to hold your nose, but it gets the job done.
> printf "%d %g\n" m.Count m.[100.0]
>
>That takes around 1s and the Java code takes around 32s here.
How fast is it without the printf? Java's console I/O is painfully
slow.
Actually it takes much longer. I had 2m Double creations and 1m entry
creations - so I was expecting the ratio to be:
(time_to_create_all_doubles) = 2 * (time_to_create_all_entries). It is
not quite so, looking at the profiled call trace. It cost me about 6x
more to create a single Double than to create a single entry: 1m of Entry
<init> calls accounted for 1.4% of total time, while 2m of Double <init>
calls accounted for 16.4% of total time. I think that is because each
Double must super-call the Number constructor.
> The problem is explaining a 32x performance difference. The cause or
> causes have to account for over 96% of the Java time.
>
That could well be hidden in GC/heap-resizing costs if he did not size
the Java heap properly. I mostly prevented these effects from occurring
by running this example with -Xms512m -Xmx512m.
Yes, but I am talking about amortized cost. When Java creates a new
instance it does not necessarily allocate memory from the OS; the cost of
memory allocation is amortized across many object creations - which
cannot be done when copying.
It is certainly not THAT slow. According to the profiling session
results it took about 1ms, while the whole profiled program ran in 35s.
If you must use the Map interface, yes. If you can use a specific type,
then you can overload the put() and get().
class DoubleMap extends AbstractMap<Double,Double> {
    ...
    @Override public Double put( Double key, Double value )
    ...
    public double put( double key, double value )
    ...
}

I believe that the compiler will always choose the second put() method
when the parameter types involved are double instead of Double - if the
static type of the reference is DoubleMap, of course. If you use the
generic Map interface as your type, the compiler will have no choice but
to use the first method.
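With that sketch (assuming the class is completed as outlined), the
static type of the reference decides whether boxing happens:

    DoubleMap m = new DoubleMap();
    m.put(2.0, 0.5);               // resolves to put(double, double): no boxing
    Map<Double, Double> view = m;
    view.put(2.0, 0.5);            // only put(Double, Double) is visible: boxes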
Type erasure prevents generics from helping on the JVM as they do on the
CLR.
> The problem here is that Java lacks value types, which is orthogonal to
> the generics implementation
The non-generic F# solution is roughly as slow as the Java. Only the generic
F# solution is faster.
>> > It simply means that a parametrized type is treated as an 'Object' by
>> > the JVM.
>> > It affects the number of casts, which may be affected by the HotSpot
>> > optimizer.
>>
>> Casting a primitive type to Object incurs an allocation.
>
> Not necessarily - Java specification permits caching instances of
> boxed numerics.
Sure. That obviously isn't helping on this benchmark.
>> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo'
>> > instances, you have 1001 allocations plus however many allocation/copy
>> > operations are necessary to grow the 'Set' to hold 1000 references (six
>> > with a typical default initial size, zero if you made it big enough to
>> > hold 1000). Type erasure has nothing to do with it - you're still
>> > creating only 'Foo' objects and enough 'Set' slots to refer to them.
>>
>> You are assuming that Foo is an object which is not true in general and
>> is not true in this case.
>
> It is true in general,
In this case, Foo is double which is not an object and Lew's statements do
not apply.
> furthermore it is always true in F# or any
> language implemented on top of CLR where there is no notion of
> "primitive type".
My program is a counter-example. In general, the CLR's notion of value
types is equivalent to the JVM's primitive types.
> The thing we should discuss after filtering half-
> truths out is whether difference between value types and reference
> types might cause 32x performance degradation
When using reference types, F# is roughly as slow as Java. The difference
occurs when a generic data structure is parameterized over a value type.
> That could well be hidden in GC/heap resizing costs if he did not
> allocate Java heap properly. I prevented these effects mostly from
> occurring by running this example with -Xms512m -Xmx512m.
I ran my test (18 seconds for Jon's code on a 32 bit laptop) with
-Xmx800m. Which is why I keep saying that Jon should look at his JVM
flags before trying anything else.
>Could you elaborate a bit on what value types would look like if they
>existed in Java and how you would use them for efficient hash tables?
for a little background on them see
http://mindprod.com/jgloss/valuetype.html
I already posted results using preallocation: it was no faster.
> He actually did that in his F# so this
> modification alone might have caused F# version to run in, let's say,
> 28s.
Are you speculating that F# would take 28x longer had I not preallocated?
Measuring it, F# takes 1.24s if you don't preallocate and 0.945s if you do.
> He, of course, could not eliminate boxing which might have taken
> around 10s of his original execution time. So subtracting costs of
> boxing from implied theoretical F# version's execution time we end up
> with conclusion that F# should have executed in ~18s (which is
> an erroneous procedure in itself because F# probably copies values from
> stack). Roughly 1:2 in favor of F#.
Eh?
I'm more interested in modern JVM-based languages like Clojure and Scala. Do
you know if they have done anything to work around these deficiencies in
the JVM?
Also, some influential people on the JVM Languages mailing list were
discussing what they perceived to be the main issues with the JVM today and
they flagged lack of value types as a major issue. This benchmark seems to
have verified their findings.
Moreover, I am amazed that I watched a lecture where a guy from Azul Systems
described the enormous lengths they'd gone to in creating a scalable hash
table built upon the JVM for parallel machines. Their implementation would
have to scale across at least 32 cores (far more than any of today's
consumer-level machines) just to catch up with the CLR!
You cannot do that in Java because it cannot represent the array that
underpins the hash table because it requires different primitive types in
the same array.
Sure. They're basically just structs from C or C++ and they make the set of
primitive types (that don't get boxed) user extensible. They're used for
various things including hash table entries (hash, key and value) and
vertex data (texture coords, vertex coords, normal vectors) in a
GPU-friendly format.
Yes.
> You could find out the boxing/allocation overhead by preboxing your
> elements, and saving the unboxing until you had waved the flag.
Good idea!
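A sketch of that experiment (a hypothetical harness; n and the map are as
in the benchmark):

    import java.util.HashMap;
    import java.util.Map;

    // Box all keys/values before the timed region, so the loop times only
    // the map operations and the boxing cost is excluded.
    int n = 10000000;
    Double[] ks = new Double[n + 1];
    Double[] vs = new Double[n + 1];
    for (int i = 1; i <= n; i++) { ks[i] = (double) i; vs[i] = 1.0 / i; }

    Map<Double, Double> map = new HashMap<Double, Double>();
    long t0 = System.nanoTime();
    for (int i = 1; i <= n; i++) map.put(ks[i], vs[i]);
    System.out.println((System.nanoTime() - t0) / 1000000 + " ms");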
No, see my answer below
> > He, of course, could not eliminate boxing which might have taken
> > around 10s of his original execution time. So subtracting costs of
> > boxing from implied theoretical F# version's execution time we end up
> > with conclusion that F# should have executed in ~18s (which is
> > an erroneous procedure in itself because F# probably copies values from
> > stack). Roughly 1:2 in favor of F#.
>
> Eh?
>
I started from the 32s execution time of your Java program which you'd
reported, and subtracted all the costs that were suggested not to affect
the F# version (preallocation/boxing). This is quite inaccurate because
of different algorithms, environments, etc., but it let me estimate what
F#'s execution time could have been IF Java's was 32s as you'd observed.
The result, as you have seen, is that even if F# paid no additional cost
for value-type copying etc., it should execute about 2 times faster than
the Java version. So the only logical conclusion is that 32 times faster
is something unusual.
This is a 2x quadcore 2.0GHz Intel Xeon E5405. What is your CPU?
> with -Xmx800m. Which is why I keep saying that Jon should look at his JVM
> flags before trying anything else.
Doesn't seem to make any difference:
$ time java Hashtbl
hashtable(100.0) = 0.01
real 0m35.749s
user 1m53.287s
sys 0m3.208s
$ time java Hashtbl -Xms512m -Xmx512m
hashtable(100.0) = 0.01
real 0m36.336s
user 2m5.896s
sys 0m3.216s
$ time java Hashtbl -Xmx800m
hashtable(100.0) = 0.01
real 0m34.676s
user 1m55.723s
sys 0m3.332s
$ time java -server -Xmx800m Hashtbl
hashtable(100.0) = 0.01
real 0m37.198s
user 1m55.967s
sys 0m3.220s
I think the bottleneck must be the GC by a long long way. Look at this:
$ time java Hashtbl -Xmx800m
hashtable(100.0) = 0.01
real 0m34.676s
user 1m55.723s
sys 0m3.332s
The CPU time taken is actually ~100x worse than for F#. Given that this is a
serial benchmark, that parallelism could only have come from the GC.
Type erasure means that Java's generics cannot be used to improve the
performance of my benchmark.
> Type erasure has nothing to do with value types being boxed. The
> erasure from 'Double' to 'Object' has no further effect on
> performance.
>
> Others in this thread have tried to duplicate the degree of slowdown
> you report without being able to. No one else here seems to see a
> 32:1 ratio in performance.
Has anyone else here even benchmarked the F# code? If so, what performance
ratio did they get?
In your case I cannot see generics used at all.
> > The problem here is that Java lacks value types, which is orthogonal to
> > the generics implementation
>
> The non-generic F# solution is roughly as slow as the Java. Only the generic
> F# solution is faster.
>
So you are running a different program than the one you showed us. But
that's not the point. The point is that the only thing which may make it
run so fast is the presence of value types. Which still has nothing to do
with type erasure. Type erasure does not forbid the compiler from making
the kind of optimization you've been talking about.
> >> > It simply means that a parametrized type is treated as an 'Object' by
> >> > the JVM.
> >> > It affects the number of casts, which may be affected by the HotSpot
> >> > optimizer.
>
> >> Casting a primitive type to Object incurs an allocation.
>
> > Not necessarily - Java specification permits caching instances of
> > boxed numerics.
>
> Sure. That obviously isn't helping on this benchmark.
>
Yes, unfortunately.
> >> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo'
> >> > instances, you have 1001 allocations plus however many allocation/copy
> >> > operations are necessary to grow the 'Set' to hold 1000 references (six
> >> > with a typical default initial size, zero if you made it big enough to
> >> > hold 1000). Type erasure has nothing to do with it - you're still
> >> > creating only 'Foo' objects and enough 'Set' slots to refer to them.
>
> >> You are assuming that Foo is an object which is not true in general and
> >> is not true in this case.
>
> > It is true in general,
>
> In this case, Foo is double which is not an object and Lew's statements do
> not apply.
>
Ok, nevermind. It is not relevant to our discussion I suppose (his
assumption that "Foo is an object" IS true)
> > furthermore it is always true in F# or any
> > language implemented on top of CLR where there is no notion of
> > "primitive type".
>
> My program is a counter example. In general, the CLR's notion of value types
> are equivalent to the JVM's primitive types.
>
No, it is not. Value types can be heap-allocated while primitives
cannot.
> > The thing we should discuss after filtering half-
> > truths out is whether difference between value types and reference
> > types might cause 32x performance degradation
>
> When using reference types, F# is roughly as slow as Java. The difference
> occurs when a generic data structure is parameterized over a value type.
>
Could you please take a look at my benchmark (it suggests the slowdown
should be ~30%) and, if you have access to a machine with NetBeans (or
any Java profiler), perform similar profiling on your own? I suspect
there is something more here than meets the eye.
According to sun.com and Brian Goetz's articles on DeveloperWorks,
object allocation in Java (discounting GC) is an O(1) operation
involving 10-12 machine instructions.
<http://www.ibm.com/developerworks/java/library/j-jtp09275.html>
"The answer may surprise you -- allocation in modern JVMs is far
faster than the best performing malloc implementations. The common
code path for new Object() in HotSpot 1.4.2 and later is approximately
10 machine instructions (data provided by Sun; see Resources),"
--
Lew
When I was running mine, I observed that only ~6% of the time was marked
as spent in GC, and even that only if I allocated too little space for
the heap (then, right before OutOfMemory, it started to reclaim space
wildly but, of course, failed). When I set the heap to -Xms512m -Xmx512m,
GC time was negligibly small. I wonder what would happen if you repeated
your tests with -Xms800m -Xmx800m.
In the F#?
>> The non-generic F# solution is roughly as slow as the Java. Only the
>> generic F# solution is faster.
>
> So you are running different program than you showed us.
I showed you the generic F# program that is 32x faster than the Java
program.
> But that's
> not the point. The point is that the only thing which may make it run
> so fast is the presence of value types.
Perhaps you mean that without generics but with value types you could obtain
the same performance by writing a type specialized DoubleDoubleHashMap? I'd
agree with that. My point was that using a non-generic Hashtable in F#
destroys its performance.
> Which still has nothing to do
> with type erasure. Type erasure does not forbid compiler from making
> this kind of optimization which you've been talking about.
The VM cannot type specialize data structures if the type information has
been erased.
>> > furthermore it is always true in F# or any
>> > language implemented on top of CLR where there is no notion of
>> > "primitive type".
>>
>> My program is a counter example. In general, the CLR's notion of value
>> types are equivalent to the JVM's primitive types.
>
> No, it is not. Value types can be heap-allocated while primitives
> cannot.
Local value/primitive types are stack allocated. Heap allocated reference
types may contain value/primitive types.
>> > The thing we should discuss after filtering half-
>> > truths out is whether difference between value types and reference
>> > types might cause 32x performance degradation
>>
>> When using reference types, F# is roughly as slow as Java. The difference
>> occurs when a generic data structure is parameterized over a value type.
>
> Could you please take a look at my benchmark (it states that slowdown
> should be ~30%)
No, it shows that ~30% of the time is spent allocating in the mutator. It
says nothing about how much time is spent collecting in the GC. Given that
this benchmark spends half of its time burning all eight of my cores, I
think collection must be the bottleneck.
You didn't account for time spent garbage collecting.
FWIW, the F# performs only one collection of each generation. Verbose GC
information on the Java code produces pages and pages of collections.
Spoke too soon. This is 4x faster than before:
$ time java -Xms2000m -Xmx2000m Hashtbl
hashtable(100.0) = 0.01
real 0m8.198s
user 0m45.227s
sys 0m1.068s
In your program you have:
let m = System.Collections.Generic.Dictionary(n)
Maybe I am not reading this line correctly but where are generics used
here?
> > But that's
> > not the point. The point is that the only thing which may make it run
> > so fast is the presence of value types.
>
> Perhaps you mean that without generics but with value types you could obtain
> the same performance by writing a type specialized DoubleDoubleHashMap? I'd
> agree with that. My point was that using a non-generic Hashtable in F#
> destroys its performance.
>
More or less; I think that a specialized double map would have performed
~50% better.
> > Which still has nothing to do
> > with type erasure. Type erasure does not forbid compiler from making
> > this kind of optimization which you've been talking about.
>
> The VM cannot type specialize data structures if the type information has
> been erased.
>
Yes, but if value types were descendants of Object (as is done in the
CLR) then type erasure would not prevent the compiler from using this
optimization. The cost of type erasure would lie only in the casts that
would have to be performed.
> >> > furthermore it is always true in F# or any
> >> > language implemented on top of CLR where there is no notion of
> >> > "primitive type".
>
> >> My program is a counter example. In general, the CLR's notion of value
> >> types are equivalent to the JVM's primitive types.
>
> > No, it is not. Value types can be heap-allocated while primitives
> > cannot.
>
> Local value/primitive types are stack allocated. Heap allocated reference
> types may contain value/primitive types.
>
Sometimes :-) Here is a great article explaining this matter:
http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
> >> > The thing we should discuss after filtering half-
> >> > truths out is whether difference between value types and reference
> >> > types might cause 32x performance degradation
>
> >> When using reference types, F# is roughly as slow as Java. The difference
> >> occurs when a generic data structure is parameterized over a value type.
>
> > Could you please take a look at my benchmark (it states that slowdown
> > should be ~30%)
>
> No, it shows that ~30% of the time is spent allocating in the mutator. It
> says nothing about how much time is spent collecting in the GC. Given that
> this benchmark spends half of its time burning all eight of my cores, I
> think collection must be the bottleneck.
>
Well, yes, it does not show GC costs. If you happened to use the
NetBeans profiler you could easily see what they are.
>You cannot do that in Java because it cannot represent the array that
>underpins the hash table because it requires different primitive types in
>the same array.
Really, really holding your nose, you have to map them to longs,
stealing a low-order bit to encode the primitive type.
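A sketch of that tagging trick (hypothetical helpers; it sacrifices the
lowest significand bit of the double):

    // Pack a double or an int into one long key; the low bit is the tag.
    static long tagDouble(double d) { return Double.doubleToLongBits(d) & ~1L; }
    static long tagInt(int i)       { return (((long) i) << 1) | 1L; }

    static boolean isInt(long k)    { return (k & 1L) != 0; }
    static double asDouble(long k)  { return Double.longBitsToDouble(k); }
    static int asInt(long k)        { return (int) (k >> 1); }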
Right, so - the next conclusion is that the real costs are elsewhere, in
some place I did not take into account. That may be GC and heap resizing.
If you start with too small a heap (-Xms) then you'll have frequent
collections and, additionally, the cost of heap resizing will become more
and more noticeable.
Intel Core Duo T2600 (2.16 GHz).
2 Gigs of main memory installed. 667MHz - 2 DIMM Slots (2 x 1GB).
>
> $ time java Hashtbl
This obviously includes the start-up time of the JVM. That's not the
runtime of your algorithm.
Try this:
import java.util.HashMap;
import static java.lang.System.out;

class HashM
{
    public static void main( String... args )
    {
        long start = System.nanoTime();
        HashMap<Double,Double> hashM = new HashMap<Double, Double>();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashM.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashMap time: " + (end-start)/1000000 );
        out.println( "hashmap(100.0) = " + hashM.get( 100.0 ) );
    }
}
I'm also curious: what runs the F# runtime under Unix? Which version of
Unix are you using? Are you dual booting or running Cygwin?
I'm still suspecting that it's differences in the environment (drivers
under Unix/Cygwin vs. Microsoft) that make the most difference in your
observed times.
This number here is close to the real CPU run times I'm seeing using
HashMap, and about half the time I see using your Hashtbl benchmark. I
think we're closer to running with equivalent environments now. I'm
going to guess that your 64 bit JVM also needs a bit more memory than
mine (-Xmx800m) to prevent or at least reduce garbage collection.
> Could you please take a look at my benchmark (it states that slowdown
> should be ~30%) and, if you have access to a machine with NetBeans
> (or any Java profiler), and perform similar profiling on your own? I
> am suspecting there is something more than meets the eye.
As you observed above, Jon needs to specify the initialCapacity of his
Java data structure, just as he does in F# and I do in Ada. The
difference changes the Java profile percentages significantly. A larger
initial heap is helpful, too:
<http://sites.google.com/site/trashgod/hashmap>
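Something along these lines (a sketch; the capacity arithmetic is just
back-of-the-envelope, and HashMap rounds the requested capacity up to a
power of two internally):

import java.util.HashMap;

public class Presized {
    public static void main( String[] args ) {
        int expected = 10000000;
        float loadFactor = 0.75f;
        // Capacity must exceed expected/loadFactor so no rehash ever occurs.
        HashMap<Double, Double> map =
            new HashMap<Double, Double>( (int) (expected / loadFactor) + 1, loadFactor );
        for( int i = 1; i <= expected; ++i ) {
            double x = i;
            map.put( x, 1.0 / x );
        }
        System.out.println( "map(100.0) = " + map.get( 100.0 ) );
    }
}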
As others have observed, something else is awry.
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
>
>Spoke too soon. This is 4x faster than before:
Also keep in mind the way the JVM works. It runs code in interpreted
mode for a while, then takes time out to compile it to native machine
code, then picks up where it left off. You want to discount that
initial period.
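A rough way to do that (a sketch; runBenchmark is a stand-in for
whatever workload you are measuring):

public class Timed {
    static void runBenchmark() {
        // stand-in for the real workload
    }

    public static void main( String[] args ) {
        runBenchmark(); // warm-up pass: interpretation and JIT compilation happen here
        long start = System.nanoTime();
        runBenchmark(); // timed pass: mostly compiled code by now
        System.out.println( "time: " + (System.nanoTime() - start) / 1000000 + " ms" );
    }
}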
--
Roedy Green Canadian Mind Products
http://mindprod.com
I mean the word proof not in the sense of the lawyers, who set two half proofs equal to a whole one, but in the sense of a mathematician, where half proof = 0, and it is demanded for proof that every doubt becomes impossible.
~ Carl Friedrich Gauss
> import java.util.Hashtable;
>
> public class Hashtbl {
> public static void main(String args[]){
> Hashtable hashtable = new Hashtable();
>
> for(int i=1; i<=10000000; ++i) {
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
> }
> }
Some more datapoints:
java Hashtbl --> out of memory error
java -Xmx1000m Hashtbl (hashtable(100.0) = 0.01) --> 29 secs
Hashtbl.exe (JET statically compiled, hashtable(100.0) = null) --> 5 sec
(Why did this code fail? I have reported the bug to JET.)
Machine is an Athlon 64 X2 3800+, 2 GHz, 3 GB RAM.
Since you did not provide an initial size, the table will have to be
rehashed over and over as the buffer doubles in size.
Most of us now have dual-core machines. Perhaps with such large
datasets you could split the work between two CPUs working
simultaneously, e.g. two HashMaps, one for small numbers and one for
big ones.
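Something like this, perhaps (an untested sketch; the half-and-half
split and the pick-a-map-by-key-range lookup are my own assumptions):

import java.util.HashMap;

public class SplitLoad {
    public static void main( String[] args ) throws InterruptedException {
        final int N = 10000000;
        // Each thread touches only its own map, so no synchronization
        // of the maps themselves is needed.
        final HashMap<Double, Double> low = new HashMap<Double, Double>();
        final HashMap<Double, Double> high = new HashMap<Double, Double>();
        Thread worker = new Thread( new Runnable() {
            public void run() {
                for( int i = 1; i <= N / 2; ++i ) {
                    double x = i;
                    low.put( x, 1.0 / x );
                }
            }
        } );
        worker.start();
        for( int i = N / 2 + 1; i <= N; ++i ) {
            double x = i;
            high.put( x, 1.0 / x );
        }
        worker.join();
        // Lookups pick the map by key range:
        System.out.println( "low(100.0) = " + low.get( 100.0 ) );
    }
}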
F# has type inference so you don't specify the types in the source code. In
this case, it infers Dictionary<float, float> from its subsequent use.
>> > Which still has nothing to do
>> > with type erasure. Type erasure does not forbid compiler from making
>> > this kind of optimization which you've been talking about.
>>
>> The VM cannot type specialize data structures if the type information has
>> been erased.
>
> Yes, but if a value type were descendants of Object (as it is done in
> CLR) then type erasure would not prevent compiler from using this
> optimization.
Type erasure relies upon a uniform representation to work. Value types
cannot work with that. So you'd end up boxing your value types to get the
same uniform (reference) representation for all types.
> The cost of type erasure would lie only in casts that
> would have to be performed.
Casting a value type to Object will almost certainly incur an allocation.
That is the cost (and the subsequent collection, of course).
Copying a 64-bit word from one memory location to another should be a
single instruction, or two at most.
So, an allocation is 5 to 10 times more work, not even counting that
after the allocation, the instance needs initialization.
All of the above analysis (from Jon's post onward, including my own) is
completely bogus anyway, because a lot of the overhead here isn't
directly related to the instruction count, but rather to speed of RAM
and/or cache, and what happens to be in the cache, and a lot of other
stuff that's unpredictable from the information available and which may
be amortized over a large number of instructions executed anyway.
But if you're looking just at numbers of instructions, I'd say even based
on the reference you've provided, copying a 64-bit word from one place
to another is a lot faster than allocating a few bytes of data.
Pete
I agree that lack of user-defined value types in Java is a significant
missing feature. But, I don't see how that directly leads to the
performance problem here. That is, you can still have an array of
"double" instead of "Double", and that array has value type semantics
for each element.
So if you _need_ to avoid the boxing overhead, you can. It's just more
work, because you can't leverage an existing collection class to do it.
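For example (trivial, but it shows where the boxing disappears):

public class PrimitiveArray {
    public static void main( String[] args ) {
        // Ten million unboxed doubles in one contiguous allocation;
        // the stores below create no Double wrapper objects at all.
        double[] inverse = new double[10000001];
        for( int i = 1; i <= 10000000; ++i ) {
            inverse[i] = 1.0 / i;
        }
        System.out.println( "inverse[100] = " + inverse[100] );
    }
}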
(I'll also point out that your comments about generics might not be
subjected to so much nit-picking if your original code example had in
fact used a generic class. I realize that in the end, the performance
characteristics are going to be the same whether you use Hashtable in
its raw form or generic form, but we're all programmers here, and that
means a higher-than-normal proportion of nit-pickers :) ).
Pete
I see, thx for the explanation.
> >> > Which still has nothing to do
> >> > with type erasure. Type erasure does not forbid compiler from making
> >> > this kind of optimization which you've been talking about.
>
> >> The VM cannot type specialize data structures if the type information has
> >> been erased.
>
> > Yes, but if a value type were descendants of Object (as it is done in
> > CLR) then type erasure would not prevent compiler from using this
> > optimization.
>
> Type erasure relies upon a uniform representation to work. Value types
> cannot work with that. So you'd end up boxing your value types to get the
> same uniform (reference) representation for all types.
>
??
> > The cost of type erasure would lie only in casts that
> > would have to be performed.
>
> Casting a value type to Object will almost certainly incur an allocation.
> That is the cost (and the subsequent collection, of course).
>
I'll have to check that because I am not sure whether what you said is
true. I can certainly imagine a system where allocation is not needed
in such a case.
> Roedy Green wrote:
>> On Sat, 21 Nov 2009 18:33:14 +0000, Jon Harrop <j...@ffconsultancy.com>
>> wrote, quoted or indirectly quoted someone who said :
>>
>>> My guess is that this is because the JVM is boxing every floating point
>>> number individually in the hash table due to type erasure whereas
>>
>> In the olden days of FORTRAN, you would have handled this by writing a
>> version of HashMap that took a floating point primitive. You would
>> hard code it in. You have to hold your nose, but if you want just to
>> get the job done.
>
> You cannot do that in Java because it cannot represent the array that
> underpins the hash table because it requires different primitive types
> in the same array.
You could have separate arrays for keys and values (and chain pointers if
you want to use collision lists rather than reprobing). You lose the cache
locality between keys and values, but you would improve the cache density
of keys: for heavily-loaded tables, where the ratio of accesses to keys to
accesses to values is significantly greater than one, that might well be a
performance win.
Pain in the arse to code, though!
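To make that concrete, a minimal sketch (untested; no resizing or
deletion, the capacity must be a power of two, the table must never
fill up completely, and the key 0.0 is reserved as the empty-slot
marker purely for brevity):

class PrimitiveDoubleMap {
    private final double[] keys;
    private final double[] values;
    private final int mask;

    PrimitiveDoubleMap( int powerOfTwoCapacity ) {
        keys = new double[powerOfTwoCapacity];
        values = new double[powerOfTwoCapacity];
        mask = powerOfTwoCapacity - 1;
    }

    private static int hash( double key ) {
        long bits = Double.doubleToLongBits( key );
        return (int) (bits ^ (bits >>> 32));
    }

    void put( double key, double value ) {
        int i = hash( key ) & mask;
        while( keys[i] != 0.0 && keys[i] != key ) {
            i = (i + 1) & mask; // linear probing
        }
        keys[i] = key;
        values[i] = value;
    }

    double get( double key ) {
        int i = hash( key ) & mask;
        while( keys[i] != 0.0 ) {
            if( keys[i] == key ) return values[i];
            i = (i + 1) & mask;
        }
        return Double.NaN; // "not found" convention, also an assumption
    }
}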
tom
--
I don't know what the hell you should do. Try clicking on some shit
or somethin'.
> Also, some influential people on the JVM Languages mailing list were
> discussing what they perceived to be the main issues with the JVM today
> and they flagged lack of value types as a major issue. This benchmark
> seems to have verified their findings.
Yes, it's something i've been wanting for years. The trouble is that there
are a few little things in java's semantics that make adding value types
(by which i really mean inlined objects, since we're talking about
implementation technique rather than semantics at this point) under the
hood amazingly hard. Adding them as a language-level feature would get
round this, but would be a big change at this point in java's life cycle.
> Moreover, I am amazed that I watched a lecture where a guy from Azul
> Systems described the enormous lengths they'd gone to in creating a
> scalable hash table built upon the JVM for parallel machines. Their
> implementation would have to scale across at least 32 cores (far more
> than any of today's consumer-level machines) just to catch up with the
> CLR!
As long as you only want to store doubles in it. I imagine that if you're
using object keys, the CLR's advantage is rather smaller.
> On 21 Lis, 20:44, Tom Anderson <t...@urchin.earth.li> wrote:
>> On Sat, 21 Nov 2009, Marcin Rzeźnicki wrote:
>>> On 21 Lis, 19:33, Jon Harrop <j...@ffconsultancy.com> wrote:
>>>> I'm having trouble getting Java's hash tables to run as fast as .NET's.
>>>> Specifically, the following program is 32x slower than the equivalent
>>>> on .NET:
>>>
>>> You are using Hashtable instead of HashMap - probably the performance
>>> loss you've observed is due to synchronization (though "fat"
>>> synchronization might be optimized away in case of single thread you
>>> still pay the price, though lower).
>>
>> I'd be *very* surprised if that was true. In this simple program,
>> escape analysis could eliminate the locking entirely - and current
>> versions of JDK 1.6 do escape analysis.
>
> First of all, escape analysis is turned off by default.
Curses! I knew it was experimental when it first came into 1.6, but i
thought it was mature and on by default in the latest version.
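(If i remember rightly, on 6u14 and later it can be switched on
explicitly - worth checking against your JVM version:

java -XX:+DoEscapeAnalysis Hashtbl

though whether it actually elides these particular locks is another
question.)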
> The next thing is that there is subtle difference between synchronized
> method and synchronized block. Hashtable has the former - escape
> analysis does not help here very much afaik.
Could you explain why? I don't see why they should be any different. You
can rewrite:
synchronized void foo() {
    // ...
}
As:
void foo() {
    synchronized (this) {
        // ...
    }
}
With exactly the same semantics, no?
> Note: found this site. Haven't read it all but it seems cool:
>
> <http://www.partow.net/programming/hashfunctions/>
It lists neither Bob Jenkins' nor Paul Hsieh's hashes, and as such is of
only academic interest, since these are way better than the 20th-century
hashes. See:
http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html
> Kevin McMurtrie wrote:
>
>> Yes. Creating custom hashing classes for primitives pays off if
>> performance needs to be very high.
>
> I've actually been playing around with "alternate ways of handling
> collisions" and I agree that hashing a double is hard. The default
> algorithm both Tom and I used:
>
> long bits = Double.doubleToLongBits( key );
> int hash = (int)(bits ^ (bits >>> 32));
>
> provides terrible performance.
Interesting. I chose that function because it's what java.lang.Double
does, rather than because i thought it'd be good, but i am surprised to
hear it's terrible - doubles are quite complicated internally, so would
have thought that a parade of natural numbers would give reasonably
well-distributed hashes this way (whereas longs wouldn't, of course). How
did you conclude it's terrible? Can you recommend a better hash function
for doubles?
Writing my own hash table implementation, I noticed that I was getting
terrible performance with a ton of collisions and everything was heaped
up in a tiny spot in the table.
Inspecting the hash in hexadecimal, I realized that Jon's data keys --
the natural counting numbers 1, 2, 3, etc. -- are represented in a
double as a few bits in the uppermost bits of the word. The lower
bits are always 0, even after slicing the 64-bit double's bit pattern in
half and xoring the two halves.
This xoring results in regular hash bit patterns like:
0x20200000
0x40200000
0x40600000
0x60200000
etc. as the numbers count up
(bit patterns made up from memory, but you get the idea.)
i.e., hashes with very few bits different, and all in the uppermost
bits of the hash. This is exactly the opposite of what you want in a
good hash, which is lots of randomness in the lower bits of the hash code.
So I concluded: absent any other perturbation in the hash, it sucks.
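A quick way to see it for yourself (a sketch; the exact patterns
printed will differ in detail from the from-memory ones above):

public class DoubleHashDemo {
    public static void main( String[] args ) {
        for( int i = 1; i <= 8; ++i ) {
            double key = i;
            long bits = Double.doubleToLongBits( key );
            int hash = (int) (bits ^ (bits >>> 32));
            // For small whole numbers the low-order bits stay zero.
            System.out.printf( "%4.1f -> %08x%n", key, hash );
        }
    }
}

For what it's worth, java.util.HashMap runs every hashCode() through a
supplemental bit-spreading function precisely to defend against hashes
like these.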