import java.util.Hashtable;

public class Hashtbl {
    public static void main(String args[]) {
        Hashtable hashtable = new Hashtable();

        for (int i = 1; i <= 10000000; ++i) {
            double x = i;
            hashtable.put(x, 1.0 / x);
        }

        System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
    }
}
My guess is that this is because the JVM is boxing every floating point
number individually in the hash table due to type erasure whereas .NET
creates a specialized data structure specifically for a float->float hash
table with the floats unboxed. Consequently, the JVM is doing enormous
numbers of allocations whereas .NET is not.
Is that correct?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u
> I'm having trouble getting Java's hash tables
I note you're using java.util.Hashtable. That's an ancient class which has
a few things wrong with it: i'd strongly suggest using java.util.HashMap
instead. I wouldn't expect it to be significantly faster in this test,
though.
> to run as fast as .NET's. [...] My guess is that this is because the JVM
> is boxing every floating point number individually in the hash table due
> to type erasure whereas .NET creates a specialized data structure
> specifically for a float->float hash table with the floats unboxed.
> Consequently, the JVM is doing enormous numbers of allocations whereas
> .NET is not.
>
> Is that correct?
I can't comment on what the CLR is doing in a comparable situation, but
that is certainly a reasonable description of what java is doing.
Could you try changing the put line to:
hashtable.put(Double.toString(x), Double.toString(1.0 / x));
And making the corresponding change in the C# or whatever version, and
making the comparison again? That eliminates boxing in java, so if the
difference is due to boxing, it will be significantly reduced, which will
give you some clues as to what's going on.
If you need optimal performance for a double->double map in java, you
would need to write one specifically for that case. Or rather, someone
would: you could have a google to see if someone already has. Also, since
converting doubles to longs is cheap and reversible, you could look for a
long->long hashmap, which strikes me as more likely to exist. I didn't
find one in two minutes of googling, but i did find some long->Object
hashmaps, which get you halfway there.
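For what it's worth, the double<->long conversion is exact: a minimal
sketch of the round trip (longMap here is a hypothetical long->long map):

    long k = Double.doubleToLongBits(x);       // cheap, reversible
    long v = Double.doubleToLongBits(1.0 / x);
    // longMap.put(k, v);
    double y = Double.longBitsToDouble(v);     // recovers the value bit-for-bit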
tom
--
Oh, well of course *everything* looks bad if you remember it
In addition to using HashMap, you might also specify an appropriate
initialCapacity and loadFactor. You may be seeing a lot of rehash
operations.
<http://java.sun.com/javase/6/docs/api/java/util/HashMap.html>
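For example, a sketch (the sizing numbers are illustrative, not tuned):
presizing so that capacity >= entries / loadFactor means the 10M-entry
test never rehashes:

    import java.util.HashMap;
    import java.util.Map;

    // 1 << 24 = 16,777,216 > 10,000,000 / 0.75, so no rehash ever occurs
    Map<Double, Double> map = new HashMap<Double, Double>(1 << 24, 0.75f);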
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
Yes.
And why do you ask when it is so easy to test and verify?
Arne
I would still consider it boxing - just to String instead of Double.
Arne
In case there is a faster alternative that I'm not aware of.
Hi Jon
You are using Hashtable instead of HashMap - probably the performance
loss you've observed is due to synchronization (though "fat"
synchronization might be optimized away in the single-threaded case, you
still pay a price, though a lower one). If you took a look at the JavaDoc,
you'd notice that Hashtable's methods are synchronized. As for boxing, you
are correct (though there is no type erasure in your example because you
did not specify type parameters at all), but I suspect that these costs
are not the largest contributing factor to the overall poor performance.
I'd blame synchronization in the first place.
I think there has to be something more to it than just the autoboxing.
My reasoning is that you never reuse a key, so every put call creates a
new Entry instance. Creating a Double from a double is about as simple
as object creation can be, so I don't see how the boxing could do more
than triple the time spent in object creation during an average put
call. That cannot, by itself, account for a 32x performance ratio.
Patricia
Right. In fact, because the conversion to a string is costlier, it
could obscure the real performance costs inherent in the hash data
structure.
One could do a similar "even the odds" change simply by forcing boxing
in .NET, by using "Object" as the type in the .NET data structure
instead of "Double". That way both versions are boxing, but in the most
efficient way available for the platform.
Pete
I think it can.
I did some test on my PC.
In Java, both HashMap and Hashtable are about 8 times slower than .NET's
Dictionary<double,double>, but .NET's Dictionary<object,object> and
Hashtable are both about the same speed as the Java versions.
Arne
> You are using Hashtable instead of HashMap - probably the performance
> loss you've observed is due to synchronization (though "fat"
> synchronization might be optimized away in the single-threaded case, you
> still pay a price, though a lower one). If you took a look at the JavaDoc,
> you'd notice that Hashtable's methods are synchronized. As for boxing, you
> are correct (though there is no type erasure in your example because you
> did not specify type parameters at all), but I suspect that these costs
> are not the largest contributing factor to the overall poor performance.
> I'd blame synchronization in the first place.
That does not match measurements.
Arne
> On 21 Lis, 19:33, Jon Harrop <j...@ffconsultancy.com> wrote:
>> I'm having trouble getting Java's hash tables to run as fast as .NET's.
>> Specifically, the following program is 32x slower than the equivalent
>> on .NET:
>>
>>   import java.util.Hashtable;
>>
>>   public class Hashtbl {
>>       public static void main(String args[]) {
>>           Hashtable hashtable = new Hashtable();
>>
>>           for (int i = 1; i <= 10000000; ++i) {
>>               double x = i;
>>               hashtable.put(x, 1.0 / x);
>>           }
>>
>>           System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
>>       }
>>   }
>>
>> My guess is that this is because the JVM is boxing every floating point
>> number individually in the hash table due to type erasure whereas .NET
>> creates a specialized data structure specifically for a float->float hash
>> table with the floats unboxed. Consequently, the JVM is doing enormous
>> numbers of allocations whereas .NET is not.
>>
>> Is that correct?
>
> You are using Hashtable instead of HashMap - probably the performance
> loss you've observed is due to synchronization (though "fat"
> synchronization might be optimized away in the single-threaded case, you
> still pay a price, though a lower one). If you took a look at the JavaDoc,
> you'd notice that Hashtable's methods are synchronized. As for boxing, you
> are correct (though there is no type erasure in your example because you
> did not specify type parameters at all), but I suspect that these costs
> are not the largest contributing factor to the overall poor performance.
> I'd blame synchronization in the first place.
I'd be *very* surprised if that was true. In this simple program, escape
analysis could eliminate the locking entirely - and current versions of
JDK 1.6 do escape analysis. Even if for some reason it didn't, you'd only
be using a thin lock here, which takes two x86 instructions and one memory
access for each lock and unlock operation, far less than the boxing or
unboxing.
I modified the test code to look like this (yes, with no warmup - this is
very quick and dirty):
import java.util.Map;
import java.util.HashMap;
import java.util.Hashtable;
public class HashPerf {
    public static void main(String args[]) throws InterruptedException {
        for (int i = 1; i <= 100; ++i) {
            long t0 = System.nanoTime();
            test();
            long t1 = System.nanoTime();
            long dt = t1 - t0;
            System.out.println(dt);
            System.gc();
            Thread.sleep(200);
        }
    }

    private static void test() {
        Map<Double, Double> hashtable = new HashMap<Double, Double>();
        // Map<Double, Double> hashtable = new Hashtable<Double, Double>();
        for (int i = 1; i <= 1000; ++i) {
            double x = i;
            // synchronized (hashtable) {
            hashtable.put(x, 1.0 / x);
            // }
        }
    }
}
And then ran it with three variations on the comments: one as above, one
uncommenting the synchronization of the hashtable, and one switching the
HashMap to a Hashtable. I have java 1.5.0_19 on an elderly and ailing
PowerPC Mac laptop. I ran with -server and otherwise stock settings.
The timings for each show the usual power curve distribution: 80% of the
measurements are no more than 50% longer than the fastest, and 90% are no
more than twice as long, with the last 10% being up to 10 times longer. If
we say that the slowest 10% are artifacts of warmup, GC, the machine doing
other things, etc, and ignore them, then the average times i got were
(with standard error of the mean, which is broadly like a ~60% confidence
limit IIRC):
HashMap 933500 +/- 15006
sync HashMap 1003200 +/- 16187
Hashtable 868322 +/- 11602
That is, adding synchronization to the accesses adds a 7.5% overhead.
Although somehow, the old Hashtable comes out faster!
So, even with java 1.5, adding synchronization to HashMap.put() imposes
only a small performance penalty - i'd expect it to be less with 1.6. I
doubt very much that this is the major factor in the OP's performance
problem.
tom
--
... the gripping first chapter, which literally grips you because it's
printed on a large clamp.
I take the point that it's still creating an object. But it is simply,
undeniably, not boxing.
Moreover, the point is that i was suggesting adding the *same* object
creation to both the java and .NET versions, which would eliminate any
differences in language-level boxing behaviour.
tom
--
Ten years of radio astronomy have taught humanity more about the creation
and organization of the universe than thousands of years of religion
and philosophy. -- P. C. W. Davis
But you can force boxing in .NET just by using Object as the type for
the collection class type parameters instead of Double. So wouldn't a
better test be to do that? Rather than introducing yet another point of
possible incongruity? If you are concerned about "differences in
language-level boxing behavior", I'd think there'd be just as much
concern about "differences in language-level string conversion
behavior". I see no reason to think that boxing in .NET would be any
more or less different from boxing in Java than string conversion in
.NET would be from string conversion in Java.
In any case, I think Arne's point still stands though: "boxing" can be
thought of generally or specifically. Specifically, you're
right...conversion to String isn't strictly speaking boxing. But
generally, inasmuch as "boxing" is simply a way to convert a value type
(primitive) to a reference type (class), converting to String is as
valid a way to "box" a value type as converting to Object or Double.
It really just depends on how strictly you choose to define boxing, and
how literally you take the phrase "consider it boxing". In some sense,
converting to a String isn't "eliminating boxing in Java", but rather
just accomplishing the same thing as boxing but in a slightly different way.
A person who insists with absolute stubbornness that there is only one
possible interpretation of a particular term or phrase may of course
disagree with my analysis of the question. :)
Pete
> My guess is that this is because the JVM is boxing every floating point
> number individually in the hash table due to type erasure whereas .NET
> creates a specialized data structure specifically for a float->float hash
> table with the floats unboxed. Consequently, the JVM is doing enormous
> numbers of allocations whereas .NET is not.
>
> Is that correct?
No, not primarily. In my testing I found:
1. I think the JVM itself makes the most difference.
2. Next was synchronization.
3. Autoboxing appears to have an impact but it's less than either #1 or #2.
The first time I ran your program, it took over 79 seconds to execute.
During the course of testing the other algorithms, that time reduced to
about 13 seconds and stayed there. Very odd, but I assume that
something somewhere optimized the code.
Unlike a lot of the posters here, I'm running a 32-bit client JVM, which
is slow to optimize code. 64-bit systems run in -server mode by default,
which optimizes code before executing it.
The first place to look is your environment: memory, flags to the JVM,
which JVM version and vendor, other "system" variables, etc.
Somewhat surprising to me, just substituting a HashMap for the Hashtable
reduced the time to about half -- 6 seconds instead of 13. Both autobox;
the only difference is synchronization.
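(To isolate the locking cost alone, one could instead time the same map
type with and without a synchronized wrapper - a sketch using only
standard library calls:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    Map<Double, Double> plain = new HashMap<Double, Double>();
    Map<Double, Double> locked =
        Collections.synchronizedMap(new HashMap<Double, Double>());
    // run the identical put loop against each and compare

That keeps everything except the synchronization identical.)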
Lastly, removing the autoboxing by writing a specialized class saved
about 0.7 seconds over just using HashMap. The specialized version's time
was 5.3 seconds. Note that I allocated a new object myself -- the
"Entry" for the HashMap -- each time a new double is added. I suspect
that any reasonable C# hash object must do the same, so you're not
losing as much time as you think by Java's auto object creation.
I'm curious to see your C# version, and see actual times for both
programs on your system. The results might tell us what is going on.
"32x" doesn't really give us much of a clue.
Output of the following:
<output>
run-single:
HashTable time: 12844
hashtable(100.0) = 0.01
HashMap time: 6016
hashtable(100.0) = 0.01
HashTest time: 5373
hashtable(100.0) = 0.01
BUILD SUCCESSFUL (total time: 29 seconds)
</output>
<code>
package cljp;
import java.util.HashMap;
import java.util.Hashtable;
import static java.lang.System.out;
public class HashTest
{
    public static void main( String[] args )
    {
        Hashtbl.test();
        HashM.test();
        HashTest.test();
    }

    HashMap<Entry,Entry> hash = new HashMap<Entry,Entry>();

    private static class Entry {
        final double key;
        final double value;

        public Entry( double key, double value )
        {
            this.key = key;
            this.value = value;
        }

        @Override
        public int hashCode()
        {
            long bits = Double.doubleToLongBits( key );
            bits ^= bits >>> 32;
            return (int)bits;
        }

        @Override
        public boolean equals( Object obj )
        {
            if( !(obj instanceof Entry) ) {
                return false;
            }
            return key == ((Entry)obj).key;
        }
    }

    public void put( double key, double value ) {
        Entry e = new Entry( key, value );
        hash.put( e, e );
    }

    public double get( double key ) {
        Entry e = new Entry( key, 0.0 );
        Entry valueEntry = hash.get( e );
        if( valueEntry != null ) {
            return valueEntry.value;
        } else {
            throw new IllegalArgumentException("Not found: " + key);
        }
    }

    public static void test()
    {
        long start = System.nanoTime();
        HashTest hashTest = new HashTest();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashTest.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashTest time: " + (end-start)/1000000 );
        out.println( "hashtable(100.0) = " + hashTest.get( 100.0 ) );
    }
}

class HashM
{
    public static void test()
    {
        long start = System.nanoTime();
        HashMap<Double,Double> hashM = new HashMap<Double, Double>();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashM.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashMap time: " + (end-start)/1000000 );
        out.println( "hashtable(100.0) = " + hashM.get( 100.0 ) );
    }
}

class Hashtbl
{
    public static void test()
    {
        long start = System.nanoTime();
        Hashtable hashtable = new Hashtable();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashtable.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashTable time: " + (end-start)/1000000 );
        out.println( "hashtable(100.0) = " + hashtable.get( 100.0 ) );
    }
}
</code>
> Tom Anderson wrote:
>> On Sat, 21 Nov 2009, Arne Vajhøj wrote:
>>
>>> Tom Anderson wrote:
>>>> Could you try changing the put line to:
>>>>
>>>> hashtable.put(Double.toString(x), Double.toString(1.0 / x));
>>>>
>>>> And making the corresponding change in the C# or whatever version, and
>>>> making the comparison again? That eliminates boxing in java, so if the
>>>> difference is due to boxing, it will be significantly reduced, which will
>>>> give you some clues as to what's going on.
>>>
>>> I would still consider it boxing - just to String instead of Double.
>>
>> I take the point that it's still creating an object. But it is simply,
>> undeniably, not boxing.
>>
>> Moreover, the point is that i was suggesting adding the *same* object
>> creation to both the java and .NET versions, which would eliminate any
>> differences in language-level boxing behaviour.
>
> But you can force boxing in .NET just by using Object as the type for the
> collection class type parameters instead of Double. So wouldn't a better
> test be to do that?
Yes, absolutely. It just didn't occur to me to do that!
tom
--
NO REAL THAN YOU ARE -- Ego Leonard, The Zandvoort Man
> Jon Harrop wrote:
>
>> My guess is that this is because the JVM is boxing every floating point
>> number individually in the hash table due to type erasure whereas .NET
>> creates a specialized data structure specifically for a float->float hash
>> table with the floats unboxed. Consequently, the JVM is doing enormous
>> numbers of allocations whereas .NET is not.
>
> 3. Autoboxing appears to have an impact but it's less than either #1 or #2.
>
> Lastly, removing the autoboxing by writing a specialized class saved
> about 0.7 second from just using HashMap. The specialized version time
> was 5.3 seconds. Note that I allocated a new object myself -- the
> "Entry" for the HashMap -- each time a new double is added. I suspect
> that any reasonable C# hash object must do the same, so you're not
> losing as much time as you think by Java's auto object creation.
Hang on, if i've read your code right, then what you've done is replaced
automatic boxing with manual boxing. The point is that in .NET, there is
*no* boxing in this case - doubles are stored unwrapped in the map.
Although in your code, you put both doubles in a single box, so if boxing
is a slowdown, your code should roughly halve it.
I took the liberty of converting java's HashMap to work directly with
primitive doubles:
http://urchin.earth.li/~twic/Code/DoubleMap.java
I haven't tested that this implementation is correct, but on the benchmark
i posted a little earlier, it comes out 17% faster than a HashMap with
boxing.
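For anyone who doesn't want to click through, the core idea is something
like this (a compressed sketch, not the actual DoubleMap code: fixed
power-of-two capacity, linear probing, no resize or remove, NaN keys
unsupported):

    // Parallel primitive arrays: no boxing, no per-put Entry allocation.
    final class DoubleDoubleMap {
        private final double[] keys;
        private final double[] vals;
        private final boolean[] used;
        private final int mask;

        DoubleDoubleMap(int powerOfTwoCapacity) {
            keys = new double[powerOfTwoCapacity];
            vals = new double[powerOfTwoCapacity];
            used = new boolean[powerOfTwoCapacity];
            mask = powerOfTwoCapacity - 1;
        }

        private int slot(double key) {
            long bits = Double.doubleToLongBits(key);
            int i = (int) (bits ^ (bits >>> 32)) & mask;
            while (used[i] && keys[i] != key) {
                i = (i + 1) & mask;        // linear probing
            }
            return i;
        }

        void put(double key, double val) {
            int i = slot(key);
            used[i] = true;
            keys[i] = key;
            vals[i] = val;
        }

        double get(double key, double absent) {
            int i = slot(key);
            return used[i] ? vals[i] : absent;
        }
    }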
So, 7.5% for synchronization, 17% for boxing - we're still a good way off
this reported 32x!
tom
--
In my experience, there are two main ways of getting a 32x performance
ratio for the same job:
1. Different algorithm.
2. Memory issues.
In this case, I suspect possibly memory issues. If the .NET table is
more compact, because of using primitives, it might fit into a level in
Jon's computer's memory hierarchy that the Java Hashtable does not fit.
This sort of thing is configuration dependent, so the performance
difference might not be reproducible on a different computer, even using
the same programs.
Patricia
>
> Hang on, if i've read your code right, then what you've done is replaced
> automatic boxing with manual boxing. The point is that in .NET, there is
Yeah, I don't make any effort to conserve "Entries" that are already
made. I don't think it would matter. Jon's test makes new doubles, it
never repeats a value.
All I'm really trying to do is provide a close approximation to what I
think C# might be doing. So far the result doesn't seem to speed up
running time much, so I'm not sure of the value of pursuing this
further. Jon's 32x number seems bogus. I think there's some
environmental or JVM issue for him.
Yes. Creating custom hashing classes for primitives pays off if
performance needs to be very high. There are also alternative ways of
handling collisions that will eliminate common memory allocations in
put(). It's unfortunate that creating these classes is such a manual
process.
--
I won't see Google Groups replies because I must filter them as spam
Ah, my machine is rather slow, and i didn't have the patience to wait for
the full-size version to run, so i did significantly reduce the size of
the table in my tests to 1000 entries, which will easily fit in L3 caches
and have pretty good representation in higher ones. Thus, any effects
caused by data hugeness may be lost.
tom
--
Would you like to remember more?
> Yes. Creating custom hashing classes for primitives pays off if
> performance needs to be very high.
I've actually been playing around with "alternate ways of handling
collisions" and I agree that hashing a double is hard. The default
algorithm both Tom and I used:
long bits = Double.doubleToLongBits( key );
int hash = (int)(bits ^ (bits >>> 32));
provides terrible performance. Java Map implementations provide some
"hash mangling" that help poor hash algorithms, so if you roll your own
Map class you'll have to provide a better hash function or provide your
own mangle function.
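For reference, this is the supplemental hash that JDK 6's HashMap applies
on top of hashCode(); something like it is needed if you roll your own
table:

    // HashMap.hash(int) from JDK 6: spreads the bits of a weak hashCode
    static int mangle(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }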
Note: found this site. Haven't read it all but it seems cool:
First of all, escape analysis is turned off by default. The next thing
is that there is a subtle difference between a synchronized method and a
synchronized block. Hashtable has the former - escape analysis does not
help very much there, AFAIK. So the only benefit would be gained from
thin locks.
Even if for some reason it didn't, you'd only
> be using a thin lock here, which takes two x86 instructions and one memory
> access for each lock and unlock operation, far less than the boxing or
> unboxing.
>
Probably because it was off, and it wouldn't have eliminated the locks
anyway. I wonder - maybe something prevented taking the fast locking
path?
Boxing is fast as well; it is one simple monomorphic call resulting in
the creation of one Double (which may be cached anyway).
Interesting indeed - anyway, your synchronized block is more likely to
be optimized away with EA turned on than a synchronized method's lock on
this, though I am not quite sure, seeing your results :-)
Well, I am more inclined to believe that his VM somehow did not
perform lock optimization.
Yeah, I see. Well, what VM did Jon use? That might be crucial to know
what happened.
$ java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)
Quite recent. And were your measurements affected by using HashMap
instead?
Did you try setting the initial capacity to 1000?
I've had terrible performance from C programs that malloced for every
one of a huge number of little items they wanted to put on the heap. I'd
hope any JVM was better than that, but you never know.... Besides, it's
quite possible .NET allocates bigger chunks, since Windows memory
allocation was/is quite slow.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
It might help to show your .NET program as well. If people ran both it
and the Java version on a few different systems, we could see how much,
if any, of the performance difference is configuration-specific.
Patricia
The OP is not using a current version of Java 6.
Jon Harrop wrote:
>> $ java -version
>> java version "1.6.0_13"
>> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
>> Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)
According to
<http://java.sun.com/javase/6/webnotes/6u14.html>
> Optionally available are two new features -
> escape analysis and compressed object pointers.
Which implies strongly that escape analysis, being "new" in 6u14, was not
available in 6u13. Even then, as Marcin Rzeźnicki wrote:
> ... escape analysis is turned off by default.
...
> Well, I am more inclined to believe that his VM somehow did not
> perform lock optimization.
--
Lew
>I'm having trouble getting Java's hash tables to run as fast as .NET's.
>Specifically, the following program is 32x slower than the equivalent
>on .NET:
see http://mindprod.com/jgloss/hashmap.html#SPEED
--
Roedy Green Canadian Mind Products
http://mindprod.com
Finding a bug is a sign you were asleep at the switch when coding. Stop debugging, and go back over your code line by line.
Not affected.
I just tried setting it to the final size of the hash table (10M) and the
time actually got slightly worse.
> I've had terrible performance from C programs that malloced for every
> one of a huge number of little items they wanted to put on the heap. I'd
> hope any JVM was better than that, but you never know.... Besides, it's
> quite possible .NET allocates bigger chunks, since Windows memory
> allocation was/is quite slow.
I believe the JVM is doing orders of magnitude more allocations than .NET
here due to its type erasure approach to generics.
Sure, here's the F# code for .NET:
let n = 10000000
let m = System.Collections.Generic.Dictionary(n)
for i=1 to n do
  m.[float i] <- 1.0 / float i
printf "%d %g\n" m.Count m.[100.0]
That takes around 1s and the Java code takes around 32s here.
Type erasure has no effect on your original problem.
Type erasure has no effect on the number of allocations, let alone "orders of
magnitude".
It simply means that a parametrized type is treated as an 'Object' by the JVM.
It affects the number of casts, which may be affected by the HotSpot optimizer.
If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo' instances, you
have 1001 allocations plus however many allocation/copy operations are
necessary to grow the 'Set' to hold 1000 references (six with a typical
default initial size, zero if you made it big enough to hold 1000). Type
erasure has nothing to do with it - you're still creating only 'Foo' objects
and enough 'Set' slots to refer to them.
--
Lew
He wrote "type erasure approach to generics". He did not wrote that type
erasure itself was responsible.
Here, the extra allocations are due to the non-specialization of code at
runtime: the container code (e.g. HashMap) has no knowledge of the
contained element type. The root decision is for backward compatibility
and mixing of pre-generics source code and compiled code with newer code
within the same application. A consequence is that code cannot be
specialized at runtime depending on the type parameters, which forces
the Set or Map implementation to handle the float values as 'Object'
instances, implying boxing (which is a dynamic allocation, regardless of
how you wish to name it). Another consequence of that decision of
backward compatibility is the use of type erasure as a compilation
gimmick to make generic types available at the syntax level.
Naming this decision of backward compatibility the "type erasure
approach to generics" is not illegitimate; syntax is what is seen and
type erasure was duly advertised as the core technology which enabled
use of generic types while maintaining backward compatibility. As far as
terminology goes, "type erasure approach" is not bad here.
Now type non-erasure does not automatically imply code specialization.
It allows it. .NET uses code specialization (well, at least some
implementations of the CLR do). This is what is shown here: in some
corner cases, code specialization may provide a measurable performance
boost.
--Thomas Pornin
Note that an "Entry instance" is a value type on the CLR, something that the
JVM is incapable of expressing.
> Creating a Double from a double is about as simple
> as object creation can be,
Note that there is no object creation on the CLR and, indeed, I believe that
is precisely why it is so much faster.
> so I don't see how the boxing could do more
> than triple the time spent in object creation during an average put
> call. That cannot, by itself, account for a 32x performance ratio.
I believe the difference is that the CLR does no boxing whatsoever (not of
doubles and not of entries) whereas the JVM boxes everything.
What is the size of an F# float?
Patricia
Am I correct in thinking that escape analysis might result in unboxing of
local values in functions but it will never result in unboxed data
structures on the heap? For example, it cannot unbox the keys and values in
a hash table?
I don't think type erasure or generics have anything to do with it.
On a non-rehashing call, the JVM would do three allocations, one of
which, for the Entry, would have been required regardless of the types.
The other two are due to the lack of primitive-based implementations of the
java.util collections, causing two autobox conversions from double to
Double.
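(Concretely, the autoboxing the compiler inserts for the benchmark's put
is equivalent to:

    hashtable.put(Double.valueOf(x), Double.valueOf(1.0 / x));

two small allocations per call, since Double.valueOf does not cache in
practice.)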
The .NET implementation may have a different way of storing the hash
chains that avoids allocating an object for each put call.
In addition, there is the issue of the number of rehashes. Each Java
rehash call only involves one object allocation, but a lot of memory
access and computation.
That is, the issues I see as being candidates for the problem are the
non-support of primitive collections, and other issues in the design of
the F# and Java collections.
Patricia
The JVM cannot generate type-specialized data structures as the CLR does
because the types have been erased.
> Type erasure has no effect on the number of allocations, let alone "orders
> of magnitude".
Type erasure forces all values to be cast to Object which forces value types
(like double) to be boxed into objects which is an allocation. The JVM
boxes all 20,000,000 of the doubles generated by this program whereas the
CLR boxes none of them. That is the "orders of magnitude" difference I was
referring to.
> It simply means that a parametrized type is treated as an 'Object' by the
> JVM.
> It affects the number of casts, which may be affected by the HotSpot
> optimizer.
Casting a primitive type to Object incurs an allocation.
> If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo' instances,
> you have 1001 allocations plus however many allocation/copy operations are
> necessary to grow the 'Set' to hold 1000 references (six with a typical
> default initial size, zero if you made it big enough to hold 1000). Type
> erasure has nothing to do with it - you're still creating only 'Foo'
> objects and enough 'Set' slots to refer to them.
You are assuming that Foo is an object which is not true in general and is
not true in this case.
Are you sure Java is the right language for your current project? If I
were fighting the language to the extent you seem to be doing, I would
seriously reexamine my choice of language.
Patricia
That's not an issue of type erasure but of Java not supporting
collections of primitive types. It is the autoboxing of 'double' to
'Double' that you discuss here, not the erasure of 'Double' to
'Object'.
Type erasure has nothing to do with value types being boxed. The
erasure from 'Double' to 'Object' has no further effect on
performance.
Others in this thread have tried to duplicate the degree of slowdown
you report without being able to. No one else here seems to see a
32:1 ratio in performance.
--
Lew
That may change with the opportunity to run Jon's F# code - let's see
what other people who have F# installed get.
Patricia
That's not true, Jon. Type erasure does not force anything in your
example - you did not even use generics at all, so how can it affect
anything? The problem here is that Java lacks value types, which is
orthogonal to the generics implementation.
> > It simply means that a parametrized type is treated as an 'Object' by the
> > JVM.
> > It affects the number of casts, which may be affected by the HotSpot
> > optimizer.
>
> Casting a primitive type to Object incurs an allocation.
>
Not necessarily - the Java specification permits caching instances of
boxed numerics.
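(The permitted caching is narrow, though. A quick illustration:

    // Integer.valueOf caches -128..127; Sun's Double.valueOf caches
    // nothing, so every box in this benchmark is a fresh allocation.
    System.out.println(Integer.valueOf(100) == Integer.valueOf(100));   // true
    System.out.println(Double.valueOf(100.0) == Double.valueOf(100.0)); // false

Doubles are never cached in practice, so it does not help here.)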
> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo' instances,
> > you have 1001 allocations plus however many allocation/copy operations are
> > necessary to grow the 'Set' to hold 1000 references (six with a typical
> > default initial size, zero if you made it big enough to hold 1000). Type
> > erasure has nothing to do with it - you're still creating only 'Foo'
> > objects and enough 'Set' slots to refer to them.
>
> You are assuming that Foo is an object which is not true in general and is
> not true in this case.
>
It is true in general; furthermore, it is always true in F# or any
language implemented on top of the CLR, where there is no notion of
"primitive type". The thing we should discuss, after filtering the
half-truths out, is whether the difference between value types and
reference types might cause a 32x performance degradation.
Neither holds, because Java makes a distinction between primitives and
reference types. If a primitive value is boxed, then it means it is to be
used in a context where a reference type is expected; if a reference type
is unboxed to a primitive, then it means a primitive is expected. No
optimization mechanism can freely mix them, because that would result in
non-verifiable code which would be rejected by the Java bytecode
verifier, thus rendered incorrect.
But, correct me if I am wrong, it involves copying a value type. If so,
then I am not sure why copying would be better than an object
allocation.
I would back off even further to something like "What are the probable
causes of Jon's observations?".
Patricia
I profiled his example in NetBeans.
That's my JVM:
C:\Users\Rzeźnik\Documents\java>java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)
Here is the code I used:
package hashmapexample;
import java.util.HashMap;
/**
*
* @author Rzeźnik
*/
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        HashMap<Double, Double> hashtable = new HashMap<Double, Double>();
        // upper bound changed to 1m - sorry, no patience
        for (int i = 1; i <= 1000000; ++i) {
            double x = i;
            hashtable.put(x, 1.0 / x);
        }
        System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
    }
}
I used -Xms512m -Xmx512m to eliminate excessive collections.
The results of profiling are as follows:
54.2% of time spent in java.util.HashMap.put(Object, Object) (1m invocations),
of which:
    19.5% in java.util.HashMap.addEntry(int, Object, Object, int)
        11.1% in java.util.HashMap.resize(int) (17 invocations)  <--- !!!
        3.3% self-time
        1.4% in java.util.HashMap$Entry.<init>(int, Object, Object,
             java.util.HashMap.Entry)  <-- so the cost of allocating
             entries is negligible
    8.1% in java.lang.Double.hashCode()  <--- that's too much (?)
    ... rest of put omitted, circa 1%
Now, the interesting part:
30.3% of time was spent in java.lang.Double.valueOf(double) <--- that's boxing.
Furthermore, there were 2m + 1 calls to new Double, meaning that no
caching occurred.
> That takes around 1s and the Java code takes around 32s here.
32 seconds for the Java code you posted is almost certainly an outlier.
I can run the original code you posted on a laptop with a 32-bit JVM in
13 seconds. Clearly, something is up with your environment.
Oh yes, conclusions:
Starting from the 32s execution time Jon reported, he could have saved
around 3-4s had he preallocated the HashMap. He actually did that in his
F#, so this modification alone might have brought the comparable Java
time down to, let's say, 28s. He, of course, could not eliminate boxing,
which might have taken around 10s of his original execution time. So,
subtracting the costs of boxing from the implied theoretical F# execution
time, we end up with the conclusion that F# should have executed in ~18s
(which is an erroneous procedure in itself, because F# probably copies
values from the stack). Roughly 1:2 in favor of F#.
>I was considering the possibility of hand coding type-specialized hash
>tables in Java to work around this deficiency in generics on the JVM when I
>realised that it is actually impossible in the general case because the JVM
>lacks value types and, therefore, is incapable of expressing efficient hash
>tables.
Could you elaborate a bit on what value types would look like if they
existed in Java and how you would use them for efficient hash tables?
Correct.
> If it is so, then I am not sure why copying would be better than an object
> allocation
Copying a few bytes of data is a *lot* faster than allocating a few bytes of
data.
64 bits.
Interesting results. That is about what I would expect. If we were
trying to explain a 30% performance difference I would seriously
consider autoboxing as cause. If creating an Entry takes about as much
time as creating a Double, object creation could account for up to 45%
of the Java time.
The problem is explaining a 32x performance difference. The cause or
causes have to account for over 96% of the Java time.
Patricia
>I believe the JVM is doing orders of magnitude more allocations than .NET
>here due to its type erasure approach to generics.
I think you mean the extra overhead of boxing and unboxing, or perhaps
the extra overhead of allocating independent objects, rather than
plopping them into a single array the way you would with primitives.
You could find out the boxing/allocation overhead by preboxing your
elements, and saving the unboxing until you had waved the flag.
There is no need to allocate each entry separately because the hash table
can be an array of unboxed entries.
> The other two are due to the lack of primitive-based implementations of the
> java.util collections, causing two autobox conversions from double to
> Double.
Yes.
> The .NET implementation may have a different way of storing the hash
> chains that avoids allocating an object for each put call.
I believe the chains are stored in the same array: closed hashing.
>My guess is that this is because the JVM is boxing every floating point
>number individually in the hash table due to type erasure whereas
In the olden days of FORTRAN, you would have handled this by writing a
version of HashMap that took a floating point primitive. You would
hard-code it in. You have to hold your nose, but it gets the job done.
> printf "%d %g\n" m.Count m.[100.0]
>
>That takes around 1s and the Java code takes around 32s here.
How fast is it without the printf? Java's console I/O is painfully
slow.
Actually it takes much longer. I had 2m Double creations and 1m entry
creations - so I was expecting the ratio to be:
(time_to_create_all_doubles) = 2 * (time_to_create_all_entries). It is
not quite so, looking at the profiled call trace. It cost me about 6x
more to create a single Double than to create a single entry: 1m of Entry
<init> calls accounted for 1.4% of total time, while 2m of Double <init>
calls accounted for 16.4% of total time. I think that is because each
Double must super-call the Number constructor.
> The problem is explaining a 32x performance difference. The cause or
> causes have to account for over 96% of the Java time.
>
That could well be hidden in GC/heap-resizing costs if he did not size
the Java heap properly. I mostly prevented these effects from occurring
by running this example with -Xms512m -Xmx512m.
Yes, but I am talking about amortized cost. When Java creates a new
instance it does not necessarily allocate memory from the OS; the cost of
memory allocation is amortized across many object creations - which
cannot be done when copying.
It is certainly not THAT slow. According to the profiling session
results it took about 1ms, while the whole profiled program ran in 35s.
If you must use the Map interface, yes. If you can use a specific type,
then you can overload the put() and get().
class DoubleMap extends AbstractMap<Double,Double> {
    ...
    @Override public Double put( Double key, Double value )
    ...
    public double put( double key, double value )
    ...
}

I believe that the compiler will always choose the second put() method
when the parameter types involved are double instead of Double - if the
static type of the reference is DoubleMap, of course. If you use the
generic Map interface as your type, the compiler will have no choice but
to use the first method.
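With that sketch (assuming the class is completed as outlined), the
static type of the reference decides whether boxing happens:

    DoubleMap m = new DoubleMap();
    m.put(2.0, 0.5);               // resolves to put(double, double): no boxing
    Map<Double, Double> view = m;
    view.put(2.0, 0.5);            // only put(Double, Double) is visible: boxes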
Type erasure prevents generics from helping on the JVM as they do on the
CLR.
> The problem here is that Java lacks value types, which is orthogonal to
> the generics implementation
The non-generic F# solution is roughly as slow as the Java. Only the generic
F# solution is faster.
>> > It simply means that a parametrized type is treated as an 'Object' by
>> > the JVM.
>> > It affects the number of casts, which may be affected by the HotSpot
>> > optimizer.
>>
>> Casting a primitive type to Object incurs an allocation.
>
> Not necessarily - Java specification permits caching instances of
> boxed numerics.
Sure. That obviously isn't helping on this benchmark.
>> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo'
>> > instances, you have 1001 allocations plus however many allocation/copy
>> > operations are necessary to grow the 'Set' to hold 1000 references (six
>> > with a typical default initial size, zero if you made it big enough to
>> > hold 1000). Type erasure has nothing to do with it - you're still
>> > creating only 'Foo' objects and enough 'Set' slots to refer to them.
>>
>> You are assuming that Foo is an object which is not true in general and
>> is not true in this case.
>
> It is true in general,
In this case, Foo is double which is not an object and Lew's statements do
not apply.
> furthermore it is always true in F# or any
> language implemented on top of CLR where there is no notion of
> "primitive type".
My program is a counter-example. In general, the CLR's notion of value
types is equivalent to the JVM's primitive types.
> The thing we should discuss after filtering half-
> truths out is whether difference between value types and reference
> types might cause 32x performance degradation
When using reference types, F# is roughly as slow as Java. The difference
occurs when a generic data structure is parameterized over a value type.
> That could well be hidden in GC/heap resizing costs if he did not
> allocate Java heap properly. I prevented these effects mostly from
> occurring by running this example with -Xms512m -Xmx512m.
I ran my test (18 seconds for Jon's code on a 32 bit laptop) with
-Xmx800m. Which is why I keep saying that Jon should look at his JVM
flags before trying anything else.
>Could you elaborate a bit on what value types would look like if they
>existed in Java and how you would use them for efficient hash tables?
for a little background on them see
http://mindprod.com/jgloss/valuetype.html
I already posted results using preallocation: it was no faster.
> He actually did that in his F# so this
> modification alone might have caused F# version to run in, let's say,
> 28s.
Are you speculating that F# would take 28x longer had I not preallocated?
Measuring it, F# takes 1.24s if you don't preallocate and 0.945s if you do.
> He, of course, could not eliminate boxing which might have taken
> around 10s of his original execution time. So subtracting costs of
> boxing from implied theoretical F# version's execution time we end up
> with conclusion that F# should have executed in ~18s (which is
> an erroneous procedure in itself because F# probably copies values from
> stack). Roughly 1:2 in favor of F#.
Eh?
I'm more interested in modern JVM-based languages like Clojure and Scala. Do
you know if they have done anything to work around these deficiencies in
the JVM?
Also, some influential people on the JVM Languages mailing list were
discussing what they perceived to be the main issues with the JVM today and
they flagged lack of value types as a major issue. This benchmark seems to
have verified their findings.
Moreover, I am amazed that I watched a lecture where a guy from Azul Systems
described the enormous lengths they'd gone to in creating a scalable hash
table built upon the JVM for parallel machines. Their implementation would
have to scale across at least 32 cores (far more than any of today's
consumer-level machines) just to catch up with the CLR!
You cannot do that in Java because it cannot represent the array that
underpins the hash table because it requires different primitive types in
the same array.
Sure. They're basically just structs from C or C++ and they make the set of
primitive types (that don't get boxed) user extensible. They're used for
various things including hash table entries (hash, key and value) and
vertex data (texture coords, vertex coords, normal vectors) in a
GPU-friendly format.
Yes.
> You could find out the boxing/allocation overhead by preboxing your
> elements, and saving the unboxing until you had waved the flag.
Good idea!
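A sketch of that experiment (a hypothetical harness; n and the map are as
in the benchmark):

    import java.util.HashMap;
    import java.util.Map;

    // Box all keys/values before the timed region, so the loop times only
    // the map operations and the boxing cost is excluded.
    int n = 10000000;
    Double[] ks = new Double[n + 1];
    Double[] vs = new Double[n + 1];
    for (int i = 1; i <= n; i++) { ks[i] = (double) i; vs[i] = 1.0 / i; }

    Map<Double, Double> map = new HashMap<Double, Double>();
    long t0 = System.nanoTime();
    for (int i = 1; i <= n; i++) map.put(ks[i], vs[i]);
    System.out.println((System.nanoTime() - t0) / 1000000 + " ms");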
No, see my answer below
> > He, of course, could not eliminate boxing which might have taken
> > around 10s of his original execution time. So subtracting costs of
> > boxing from implied theoretical F# version's execution time we end up
> > with conclusion that F# should have executed in ~18s (which is
> > an erroneous procedure in itself because F# probably copies values from
> > stack). Roughly 1:2 in favor of F#.
>
> Eh?
>
I started from the 32s execution time of your Java program which you'd
reported, and subtracted all the costs that were suggested not to affect
the F# version (preallocation/boxing). This is quite inaccurate because
of different algorithms, environments, etc., but it let me estimate what
F#'s execution time could have been IF Java's was 32s as you'd observed.
The result, as you have seen, is that even if F# paid no additional cost
for value-type copying etc., it should execute about 2 times faster than
the Java version. So the only logical conclusion is that 32 times faster
is something unusual.
This is a 2x quadcore 2.0GHz Intel Xeon E5405. What is your CPU?
> with -Xmx800m. Which is why I keep saying that Jon should look at his JVM
> flags before trying anything else.
Doesn't seem to make any difference:
$ time java Hashtbl
hashtable(100.0) = 0.01
real 0m35.749s
user 1m53.287s
sys 0m3.208s
$ time java Hashtbl -Xms512m -Xmx512m
hashtable(100.0) = 0.01
real 0m36.336s
user 2m5.896s
sys 0m3.216s
$ time java Hashtbl -Xmx800m
hashtable(100.0) = 0.01
real 0m34.676s
user 1m55.723s
sys 0m3.332s
$ time java -server -Xmx800m Hashtbl
hashtable(100.0) = 0.01
real 0m37.198s
user 1m55.967s
sys 0m3.220s
I think the bottleneck must be the GC by a long long way. Look at this:
$ time java Hashtbl -Xmx800m
hashtable(100.0) = 0.01
real 0m34.676s
user 1m55.723s
sys 0m3.332s
The CPU time taken is actually ~100x worse than for F#. Given that this is a
serial benchmark, that parallelism could only have come from the GC.
Type erasure means that Java's generics cannot be used to improve the
performance of my benchmark.
> Type erasure has nothing to do with value types being boxed. The
> erasure from 'Double' to 'Object' has no further effect on
> performance.
>
> Others in this thread have tried to duplicate the degree of slowdown
> you report without being able to. No one else here seems to see a
> 32:1 ratio in performance.
Has anyone else here even benchmarked the F# code? If so, what performance
ratio did they get?
In your case I cannot see generics used at all.
> > The problem here is that Java lacks value types, which is orthogonal to
> > the generics implementation
>
> The non-generic F# solution is roughly as slow as the Java. Only the generic
> F# solution is faster.
>
So you are running a different program than the one you showed us. But
that's not the point. The point is that the only thing which may make it
run so fast is the presence of value types. Which still has nothing to do
with type erasure. Type erasure does not forbid the compiler from making
the kind of optimization you've been talking about.
> >> > It simply means that a parametrized type is treated as an 'Object' by
> >> > the JVM.
> >> > It affects the number of casts, which may be affected by the HotSpot
> >> > optimizer.
>
> >> Casting a primitive type to Object incurs an allocation.
>
> > Not necessarily - Java specification permits caching instances of
> > boxed numerics.
>
> Sure. That obviously isn't helping on this benchmark.
>
Yes, unfortunately.
> >> > If you allocate, say, a 'Set<Foo>' then fill it with 1000 'Foo'
> >> > instances, you have 1001 allocations plus however many allocation/copy
> >> > operations are necessary to grow the 'Set' to hold 1000 references (six
> >> > with a typical default initial size, zero if you made it big enough to
> >> > hold 1000). Type erasure has nothing to do with it - you're still
> >> > creating only 'Foo' objects and enough 'Set' slots to refer to them.
>
> >> You are assuming that Foo is an object which is not true in general and
> >> is not true in this case.
>
> > It is true in general,
>
> In this case, Foo is double which is not an object and Lew's statements do
> not apply.
>
Ok, nevermind. It is not relevant to our discussion I suppose (his
assumption that "Foo is an object" IS true)
> > furthermore it is always true in F# or any
> > language implemented on top of CLR where there is no notion of
> > "primitive type".
>
> My program is a counter example. In general, the CLR's notion of value types
> are equivalent to the JVM's primitive types.
>
No, it is not. Value types can be heap-allocated while primitives
cannot.
> > The thing we should discuss after filtering half-
> > truths out is whether difference between value types and reference
> > types might cause 32x performance degradation
>
> When using reference types, F# is roughly as slow as Java. The difference
> occurs when a generic data structure is parameterized over a value type.
>
Could you please take a look at my benchmark (it suggests the slowdown
should be ~30%) and, if you have access to a machine with NetBeans (or
any Java profiler), perform similar profiling on your own? I suspect
there is something more here than meets the eye.
According to sun.com and Brian Goetz's articles on DeveloperWorks,
object allocation in Java (discounting GC) is an O(1) operation
involving 10-12 machine instructions.
<http://www.ibm.com/developerworks/java/library/j-jtp09275.html>
"The answer may surprise you -- allocation in modern JVMs is far
faster than the best performing malloc implementations. The common
code path for new Object() in HotSpot 1.4.2 and later is approximately
10 machine instructions (data provided by Sun; see Resources),"
--
Lew
When I was running mine, I observed that only ~6% of the time was marked
as spent in GC, and even that only if I allocated too little space for
the heap (then, right before OutOfMemory, it started to reclaim space
wildly but, of course, failed). When I set the heap to -Xms512m -Xmx512m,
GC time was negligibly small. I wonder what would happen if you repeated
your tests with -Xms800m -Xmx800m.
In the F#?
>> The non-generic F# solution is roughly as slow as the Java. Only the
>> generic F# solution is faster.
>
> So you are running different program than you showed us.
I showed you the generic F# program that is 32x faster than the Java
program.
> But that's
> not the point. The point is that the only thing which may make it run
> so fast is the presence of value types.
Perhaps you mean that without generics but with value types you could obtain
the same performance by writing a type specialized DoubleDoubleHashMap? I'd
agree with that. My point was that using a non-generic Hashtable in F#
destroys its performance.
> Which still has nothing to do
> with type erasure. Type erasure does not forbid compiler from making
> this kind of optimization which you've been talking about.
The VM cannot type specialize data structures if the type information has
been erased.
>> > furthermore it is always true in F# or any
>> > language implemented on top of CLR where there is no notion of
>> > "primitive type".
>>
>> My program is a counter example. In general, the CLR's notion of value
>> types are equivalent to the JVM's primitive types.
>
> No, it is not. Value types can be heap-allocated while primitives
> cannot.
Local value/primitive types are stack allocated. Heap allocated reference
types may contain value/primitive types.
>> > The thing we should discuss after filtering half-
>> > truths out is whether difference between value types and reference
>> > types might cause 32x performance degradation
>>
>> When using reference types, F# is roughly as slow as Java. The difference
>> occurs when a generic data structure is parameterized over a value type.
>
> Could you please take a look at my benchmark (it states that slowdown
> should be ~30%)
No, it shows that ~30% of the time is spent allocating in the mutator. It
says nothing about how much time is spent collecting in the GC. Given that
this benchmark spends half of its time burning all eight of my cores, I
think collection must be the bottleneck.
You didn't account for time spent garbage collecting.
FWIW, the F# performs only one collection of each generation. Verbose GC
information on the Java code produces pages and pages of collections.
Spoke too soon. This is 4x faster than before:
$ time java -Xms2000m -Xmx2000m Hashtbl
hashtable(100.0) = 0.01
real 0m8.198s
user 0m45.227s
sys 0m1.068s
In your program you have:
let m = System.Collections.Generic.Dictionary(n)
Maybe I am not reading this line correctly but where are generics used
here?
> > But that's
> > not the point. The point is that the only thing which may make it run
> > so fast is the presence of value types.
>
> Perhaps you mean that without generics but with value types you could obtain
> the same performance by writing a type specialized DoubleDoubleHashMap? I'd
> agree with that. My point was that using a non-generic Hashtable in F#
> destroys its performance.
>
More or less; I think that a specialized double map would have performed
~50% better.
> > Which still has nothing to do
> > with type erasure. Type erasure does not forbid compiler from making
> > this kind of optimization which you've been talking about.
>
> The VM cannot type specialize data structures if the type information has
> been erased.
>
Yes, but if value types were descendants of Object (as is done in the
CLR) then type erasure would not prevent the compiler from using this
optimization. The cost of type erasure would lie only in the casts that
would have to be performed.
> >> > furthermore it is always true in F# or any
> >> > language implemented on top of CLR where there is no notion of
> >> > "primitive type".
>
> >> My program is a counter example. In general, the CLR's notion of value
> >> types are equivalent to the JVM's primitive types.
>
> > No, it is not. Value types can be heap-allocated while primitives
> > cannot.
>
> Local value/primitive types are stack allocated. Heap allocated reference
> types may contain value/primitive types.
>
Sometimes :-) Here is a great article explaining this matter:
http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
> >> > The thing we should discuss after filtering half-
> >> > truths out is whether difference between value types and reference
> >> > types might cause 32x performance degradation
>
> >> When using reference types, F# is roughly as slow as Java. The difference
> >> occurs when a generic data structure is parameterized over a value type.
>
> > Could you please take a look at my benchmark (it states that slowdown
> > should be ~30%)
>
> No, it shows that ~30% of the time is spent allocating in the mutator. It
> says nothing about how much time is spent collecting in the GC. Given that
> this benchmark spends half of its time burning all eight of my cores, I
> think collection must be the bottleneck.
>
Well, yes, it does not show GC costs. If you happened to use the
NetBeans profiler you could easily see what they are.
>You cannot do that in Java because it cannot represent the array that
>underpins the hash table because it requires different primitive types in
>the same array.
Really, really holding your nose, you have to map them to longs,
stealing a low-order bit to encode the primitive type.
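A sketch of that tagging trick (hypothetical helpers; it sacrifices the
lowest significand bit of the double):

    // Pack a double or an int into one long key; the low bit is the tag.
    static long tagDouble(double d) { return Double.doubleToLongBits(d) & ~1L; }
    static long tagInt(int i)       { return (((long) i) << 1) | 1L; }

    static boolean isInt(long k)    { return (k & 1L) != 0; }
    static double asDouble(long k)  { return Double.longBitsToDouble(k); }
    static int asInt(long k)        { return (int) (k >> 1); }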
Right, so - the next conclusion is that the real costs are elsewhere, in
some place I did not take into account. That may be GC and heap resizing.
If you start with too small a heap (-Xms) then you'll have frequent
collections and, additionally, the cost of heap resizing will become more
and more noticeable.
Intel Core Duo T2600 (2.16 GHz).
2 Gigs of main memory installed. 667MHz - 2 DIMM Slots (2 x 1GB).
>
> $ time java Hashtbl
This obviously includes the start-up time of the JVM. That's not the
runtime of your algorithm.
Try this:
import java.util.HashMap;
import static java.lang.System.out;

class HashM
{
    public static void main( String... args )
    {
        long start = System.nanoTime();
        HashMap<Double,Double> hashM = new HashMap<Double, Double>();
        for( int i = 1; i <= 10000000; ++i ) {
            double x = i;
            hashM.put( x, 1.0 / x );
        }
        long end = System.nanoTime();
        out.println( "HashMap time: " + (end-start)/1000000 );
        out.println( "hashmap(100.0) = " + hashM.get( 100.0 ) );
    }
}
I'm also curious: what runs the F# runtime under Unix? Which version of
Unix are you using? Are you dual booting or running Cygwin?
I'm still suspecting that it's differences in the environment (drivers
under Unix/Cygwin vs. Microsoft) that make the most difference in your
observed times.
This number here is close to the real CPU run times I'm seeing using
HashMap, and about half the time I see using your Hashtbl benchmark. I
think we're closer to running with equivalent environments now. I'm
going to guess that your 64 bit JVM also needs a bit more memory than
mine (-Xmx800m) to prevent or at least reduce garbage collection.
> Could you please take a look at my benchmark (it states that slowdown
> should be ~30%) and, if you have access to a machine with NetBeans
> (or any Java profiler), and perform similar profiling on your own? I
> am suspecting there is something more than meets the eye.
As you observed above, Jon needs to specify the initialCapacity of his
Java data structure, just as he does in F# and I do in Ada. The
difference changes the Java profile percentages significantly. A larger
initial heap is helpful, too:
<http://sites.google.com/site/trashgod/hashmap>
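Something along these lines (a sketch; the capacity arithmetic is just
back-of-the-envelope, and HashMap rounds the requested capacity up to a
power of two internally):

import java.util.HashMap;

public class Presized {
    public static void main( String[] args ) {
        int expected = 10000000;
        float loadFactor = 0.75f;
        // Capacity must exceed expected/loadFactor so no rehash ever occurs.
        HashMap<Double, Double> map =
            new HashMap<Double, Double>( (int) (expected / loadFactor) + 1, loadFactor );
        for( int i = 1; i <= expected; ++i ) {
            double x = i;
            map.put( x, 1.0 / x );
        }
        System.out.println( "map(100.0) = " + map.get( 100.0 ) );
    }
}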
As others have observed, something else is awry.
--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
>
>Spoke too soon. This is 4x faster than before:
Also keep in mind the way the JVM works. It runs code in interpreted
mode for a while, then takes time out to compile it to native machine
code, then picks up where it left off. You want to discount that
initial period.
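A rough way to do that (a sketch; runBenchmark is a stand-in for
whatever workload you are measuring):

public class Timed {
    static void runBenchmark() {
        // stand-in for the real workload
    }

    public static void main( String[] args ) {
        runBenchmark(); // warm-up pass: interpretation and JIT compilation happen here
        long start = System.nanoTime();
        runBenchmark(); // timed pass: mostly compiled code by now
        System.out.println( "time: " + (System.nanoTime() - start) / 1000000 + " ms" );
    }
}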
--
Roedy Green Canadian Mind Products
http://mindprod.com
I mean the word proof not in the sense of the lawyers, who set two half proofs equal to a whole one, but in the sense of a mathematician, where half proof = 0, and it is demanded for proof that every doubt becomes impossible.
~ Carl Friedrich Gauss
> import java.util.Hashtable;
>
> public class Hashtbl {
> public static void main(String args[]){
> Hashtable hashtable = new Hashtable();
>
> for(int i=1; i<=10000000; ++i) {
> double x = i;
> hashtable.put(x, 1.0 / x);
> }
>
> System.out.println("hashtable(100.0) = " + hashtable.get(100.0));
> }
> }
Some more datapoints:
java Hashtbl --> out of memory error
java -Xmx1000m Hashtbl (hashtable(100.0) = 0.01) --> 29 secs
Hashtbl.exe (JET statically compiled, hashtable(100.0) = null) --> 5 sec
(Why did this code fail? I have reported the bug to JET.)
Machine is an Athlon 64 X2 3800+, 2 GHz, 3 GB RAM.
Since you did not provide an initial size, the table will have to be
rehashed over and over as the buffer doubles in size.
Most of us now have dual-core machines. Perhaps with such large
datasets you could split the work between two CPUs working
simultaneously, e.g. two HashMaps, one for small numbers and one for
big ones.
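Something like this, perhaps (an untested sketch; the half-and-half
split and the pick-a-map-by-key-range lookup are my own assumptions):

import java.util.HashMap;

public class SplitLoad {
    public static void main( String[] args ) throws InterruptedException {
        final int N = 10000000;
        // Each thread touches only its own map, so no synchronization
        // of the maps themselves is needed.
        final HashMap<Double, Double> low = new HashMap<Double, Double>();
        final HashMap<Double, Double> high = new HashMap<Double, Double>();
        Thread worker = new Thread( new Runnable() {
            public void run() {
                for( int i = 1; i <= N / 2; ++i ) {
                    double x = i;
                    low.put( x, 1.0 / x );
                }
            }
        } );
        worker.start();
        for( int i = N / 2 + 1; i <= N; ++i ) {
            double x = i;
            high.put( x, 1.0 / x );
        }
        worker.join();
        // Lookups pick the map by key range:
        System.out.println( "low(100.0) = " + low.get( 100.0 ) );
    }
}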
F# has type inference so you don't specify the types in the source code. In
this case, it infers Dictionary<float, float> from its subsequent use.
>> > Which still has nothing to do
>> > with type erasure. Type erasure does not forbid compiler from making
>> > this kind of optimization which you've been talking about.
>>
>> The VM cannot type specialize data structures if the type information has
>> been erased.
>
> Yes, but if a value type were descendants of Object (as it is done in
> CLR) then type erasure would not prevent compiler from using this
> optimization.
Type erasure relies upon a uniform representation to work. Value types
cannot work with that. So you'd end up boxing your value types to get the
same uniform (reference) representation for all types.
> The cost of type erasure would lie only in casts that
> would have to be performed.
Casting a value type to Object will almost certainly incur an allocation.
That is the cost (and the subsequent collection, of course).
Copying a 64-bit word from one memory location to another should be a
single instruction, or two at most.
So, an allocation is 5 to 10 times more work, not even counting that
after the allocation, the instance needs initialization.
All of the above analysis (from Jon's post onward, including my own) is
completely bogus anyway, because a lot of the overhead here isn't
directly related to the instruction count, but rather to speed of RAM
and/or cache, and what happens to be in the cache, and a lot of other
stuff that's unpredictable from the information available and which may
be amortized over a large number of instructions executed anyway.
But if you're looking just at numbers of instructions, I'd say even based
on the reference you've provided, copying a 64-bit word from one place
to another is a lot faster than allocating a few bytes of data.
Pete
I agree that lack of user-defined value types in Java is a significant
missing feature. But, I don't see how that directly leads to the
performance problem here. That is, you can still have an array of
"double" instead of "Double", and that array has value type semantics
for each element.
So if you _need_ to avoid the boxing overhead, you can. It's just more
work, because you can't leverage an existing collection class to do it.
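For example (trivial, but it shows where the boxing disappears):

public class PrimitiveArray {
    public static void main( String[] args ) {
        // Ten million unboxed doubles in one contiguous allocation;
        // the stores below create no Double wrapper objects at all.
        double[] inverse = new double[10000001];
        for( int i = 1; i <= 10000000; ++i ) {
            inverse[i] = 1.0 / i;
        }
        System.out.println( "inverse[100] = " + inverse[100] );
    }
}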
(I'll also point out that your comments about generics might not be
subjected to so much nit-picking if your original code example had in
fact used a generic class. I realize that in the end, the performance
characteristics are going to be the same whether you use Hashtable in
its raw form or generic form, but we're all programmers here, and that
means a higher-than-normal proportion of nit-pickers :) ).
Pete
I see, thx for the explanation.
> >> > Which still has nothing to do
> >> > with type erasure. Type erasure does not forbid compiler from making
> >> > this kind of optimization which you've been talking about.
>
> >> The VM cannot type specialize data structures if the type information has
> >> been erased.
>
> > Yes, but if a value type were descendants of Object (as it is done in
> > CLR) then type erasure would not prevent compiler from using this
> > optimization.
>
> Type erasure relies upon a uniform representation to work. Value types
> cannot work with that. So you'd end up boxing your value types to get the
> same uniform (reference) representation for all types.
>
??
> > The cost of type erasure would lie only in casts that
> > would have to be performed.
>
> Casting a value type to Object will almost certainly incur an allocation.
> That is the cost (and the subsequent collection, of course).
>
I'll have to check that because I am not sure whether what you said is
true. I can certainly imagine a system where allocation is not needed
in such a case.
> Roedy Green wrote:
>> On Sat, 21 Nov 2009 18:33:14 +0000, Jon Harrop <j...@ffconsultancy.com>
>> wrote, quoted or indirectly quoted someone who said :
>>
>>> My guess is that this is because the JVM is boxing every floating point
>>> number individually in the hash table due to type erasure whereas
>>
>> In the olden days of FORTRAN, you would have handled this by writing a
>> version of HashMap that took a floating point primitive. You would
>> hard code it in. You have to hold your nose, but if you want just to
>> get the job done.
>
> You cannot do that in Java because it cannot represent the array that
> underpins the hash table because it requires different primitive types
> in the same array.
You could have separate arrays for keys and values (and chain pointers if
you want to use collision lists rather than reprobing). You lose the cache
locality between keys and values, but you would improve the cache density
of keys: for heavily-loaded tables, where the ratio of accesses to keys to
accesses to values is significantly greater than one, that might well be a
performance win.
Pain in the arse to code, though!
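To make that concrete, a minimal sketch (untested; no resizing or
deletion, the capacity must be a power of two, the table must never
fill up completely, and the key 0.0 is reserved as the empty-slot
marker purely for brevity):

class PrimitiveDoubleMap {
    private final double[] keys;
    private final double[] values;
    private final int mask;

    PrimitiveDoubleMap( int powerOfTwoCapacity ) {
        keys = new double[powerOfTwoCapacity];
        values = new double[powerOfTwoCapacity];
        mask = powerOfTwoCapacity - 1;
    }

    private static int hash( double key ) {
        long bits = Double.doubleToLongBits( key );
        return (int) (bits ^ (bits >>> 32));
    }

    void put( double key, double value ) {
        int i = hash( key ) & mask;
        while( keys[i] != 0.0 && keys[i] != key ) {
            i = (i + 1) & mask; // linear probing
        }
        keys[i] = key;
        values[i] = value;
    }

    double get( double key ) {
        int i = hash( key ) & mask;
        while( keys[i] != 0.0 ) {
            if( keys[i] == key ) return values[i];
            i = (i + 1) & mask;
        }
        return Double.NaN; // "not found" convention, also an assumption
    }
}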
tom
--
I don't know what the hell you should do. Try clicking on some shit
or somethin'.
> Also, some influential people on the JVM Languages mailing list were
> discussing what they perceived to be the main issues with the JVM today
> and they flagged lack of value types as a major issue. This benchmark
> seems to have verified their findings.
Yes, it's something i've been wanting for years. The trouble is that there
are a few little things in java's semantics that make adding value types
(by which i really mean inlined objects, since we're talking about
implementation technique rather than semantics at this point) under the
hood amazingly hard. Adding them as a language-level feature would get
round this, but would be a big change at this point in java's life cycle.
> Moreover, I am amazed that I watched a lecture where a guy from Azul
> Systems described the enormous lengths they'd gone to in creating a
> scalable hash table built upon the JVM for parallel machines. Their
> implementation would have to scale across at least 32 cores (far more
> than any of today's consumer-level machines) just to catch up with the
> CLR!
As long as you only want to store doubles in it. I imagine that if you're
using object keys, the CLR's advantage is rather smaller.
> On 21 Lis, 20:44, Tom Anderson <t...@urchin.earth.li> wrote:
>> On Sat, 21 Nov 2009, Marcin Rzeźnicki wrote:
>>> On 21 Lis, 19:33, Jon Harrop <j...@ffconsultancy.com> wrote:
>>>> I'm having trouble getting Java's hash tables to run as fast as .NET's.
>>>> Specifically, the following program is 32x slower than the equivalent
>>>> on .NET:
>>>
>>> You are using Hashtable instead of HashMap - probably the performance
>>> loss you've observed is due to synchronization (though "fat"
>>> synchronization might be optimized away in case of single thread you
>>> still pay the price, though lower).
>>
>> I'd be *very* surprised if that was true. In this simple program,
>> escape analysis could eliminate the locking entirely - and current
>> versions of JDK 1.6 do escape analysis.
>
> First of all, escape analysis is turned off by default.
Curses! I knew it was experimental when it first came into 1.6, but i
thought it was mature and on by default in the latest version.
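(If i remember rightly, on 6u14 and later it can be switched on
explicitly - worth checking against your JVM version:

java -XX:+DoEscapeAnalysis Hashtbl

though whether it actually elides these particular locks is another
question.)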
> The next thing is that there is subtle difference between synchronized
> method and synchronized block. Hashtable has the former - escape
> analysis does not help here very much afaik.
Could you explain why? I don't see why they should be any different. You
can rewrite:
synchronized void foo() {
    // ...
}
As:
void foo() {
    synchronized (this) {
        // ...
    }
}
With exactly the same semantics, no?
> Note: found this site. Haven't read it all but it seems cool:
>
> <http://www.partow.net/programming/hashfunctions/>
It lists neither Bob Jenkins' nor Paul Hsieh's hashes, and as such is of
only academic interest, since these are way better than the 20th-century
hashes. See:
http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html
> Kevin McMurtrie wrote:
>
>> Yes. Creating custom hashing classes for primitives pays off if
>> performance needs to be very high.
>
> I've actually been playing around with "alternate ways of handling
> collisions" and I agree that hashing a double is hard. The default
> algorithm both Tom and I used:
>
> long bits = Double.doubleToLongBits( key );
> int hash = (int)(bits ^ (bits >>> 32));
>
> provides terrible performance.
Interesting. I chose that function because it's what java.lang.Double
does, rather than because i thought it'd be good, but i am surprised to
hear it's terrible - doubles are quite complicated internally, so would
have thought that a parade of natural numbers would give reasonably
well-distributed hashes this way (whereas longs wouldn't, of course). How
did you conclude it's terrible? Can you recommend a better hash function
for doubles?
Writing my own hash table implementation, I noticed that I was getting
terrible performance with a ton of collisions and everything was heaped
up in a tiny spot in the table.
Inspecting the hash in hexadecimal, I realized that Jon's data keys --
the natural counting numbers 1, 2, 3, etc. -- are represented in a
double as a few bits in the uppermost bits of the word. The lower
bits are always 0, even after slicing the 64-bit double's bit pattern in
half and xoring the two halves.
This xoring results in regular hash bit patterns like:
0x20200000
0x40200000
0x40600000
0x60200000
etc. as the numbers count up
(bit patterns made up from memory, but you get the idea.)
i.e., hashes with very few bits different, and all in the uppermost
bits of the hash. This is exactly the opposite of what you want in a
good hash, which is lots of randomness in the lower bits of the hash code.
So I concluded: absent any other perturbation in the hash, it sucks.
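A quick way to see it for yourself (a sketch; the exact patterns
printed will differ in detail from the from-memory ones above):

public class DoubleHashDemo {
    public static void main( String[] args ) {
        for( int i = 1; i <= 8; ++i ) {
            double key = i;
            long bits = Double.doubleToLongBits( key );
            int hash = (int) (bits ^ (bits >>> 32));
            // For small whole numbers the low-order bits stay zero.
            System.out.printf( "%4.1f -> %08x%n", key, hash );
        }
    }
}

For what it's worth, java.util.HashMap runs every hashCode() through a
supplemental bit-spreading function precisely to defend against hashes
like these.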