Performance characteristics of mutable static primitives?
Charles Oliver Nutter  
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Wed, 02 Apr 2008 09:48:04 +0100
Local: Wed, Apr 2 2008 4:48 am
Subject: Performance characteristics of mutable static primitives?
I ran into a very strange effect when some Sun folks tried to benchmark
JRuby's multi-thread scalability. In short, adding more threads actually
caused the benchmarks to take longer.

The source of the problem (at least the source that, when fixed, allowed
normal thread scaling), was an increment, mask, and test of a static int
field. The code in question looked like this:

private static int count = 0;

public void pollEvents(ThreadContext context) {
    if ((count++ & 0xFF) == 0) context.poll();
}

So the basic idea was that this would call poll() every 256 hits,
incrementing a counter all the while. My first attempt to improve
performance was to comment out the body of poll() in case it was causing
a threading bottleneck (it does some locking and such), but that had no
effect. Then, as a total shot in the dark, I commented out the entire
line above. Thread scaling went to normal.
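
To make the "every 256 hits" behaviour concrete, here is a minimal
standalone sketch of the same increment, mask, and test. The class name
PollCounter and the polls field (standing in for context.poll()) are
hypothetical and not part of JRuby:

```java
// Illustrative sketch only; PollCounter and its members are hypothetical names.
final class PollCounter {
    private static int count = 0; // one counter shared by every caller, as in the post
    static int polls = 0;         // stands in for the call to context.poll()

    static void pollEvents() {
        // The low 8 bits of count are all zero once per 256 increments
        // (count = 0, 256, 512, ...), so the branch is taken on those calls only.
        if ((count++ & 0xFF) == 0) polls++;
    }

    // Resets the counters, calls pollEvents() the given number of times,
    // and returns how often the poll branch was taken.
    static int run(int calls) {
        count = 0;
        polls = 0;
        for (int i = 0; i < calls; i++) pollEvents();
        return polls;
    }

    public static void main(String[] args) {
        System.out.println(run(1024)); // prints 4 (count = 0, 256, 512, 768)
    }
}
```

Note that every call writes to the static field, whether or not the poll
branch is taken.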

So I'm rather confused here. Is a ++ operation on a static int doing
some kind of atomic update that causes multiple threads to contend? I
never would have expected this, so I wrote up a small Java benchmark:

http://pastie.org/173993
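
In case the pastie link is unavailable: the following is a rough sketch
of a benchmark with the same general shape (a shared static counter, a
"fired" counter, thread count taken from the command line, five timed
runs). It is not the original code. The class name TroubleSketch, the
method names, and the total of 10^9 calls are assumptions; the last one
is inferred only from the single-thread "fired" value of 3906250
reported in this message, which equals 10^9 / 256. It uses lambdas, so
it needs a newer Java than the ones benchmarked here.

```java
// Hypothetical reconstruction for illustration, not the code behind the pastie link.
final class TroubleSketch {
    static int count = 0; // shared and deliberately unsynchronized
    static int fired = 0; // side effect that keeps the JIT from discarding the loop

    static void hit() {
        if ((count++ & 0xFF) == 0) fired++;
    }

    // Splits `total` calls evenly across `threads` threads and
    // returns the elapsed wall-clock time in milliseconds.
    static long runOnce(int threads, long total) throws InterruptedException {
        count = 0;
        fired = 0;
        final long perThread = total / threads;
        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (long i = 0; i < perThread; i++) hit();
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        for (int run = 0; run < 5; run++) {
            System.out.println("time: " + runOnce(threads, 1_000_000_000L));
            System.out.println("fired: " + fired);
        }
    }
}
```

With more than one thread the unsynchronized updates race, so the
"fired" value is expected to vary from run to run.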

The benchmark does basically the same thing, with a single main counter
and another "fired" counter to prevent hotspot from optimizing things
completely away. I've been running this on a dual-core MacBook Pro with
both Apple's Java 5 and the soylatte Java 6 release. The results are
very confusing:

First on Apple's Java 5

~/NetBeansProjects/jruby ➔ java -server Trouble 1
time: 3924
fired: 3906250
time: 3945
fired: 3906250
time: 1841
fired: 3906250
time: 1882
fired: 3906250
time: 1896
fired: 3906250
~/NetBeansProjects/jruby ➔ java -server Trouble 2
time: 3243
fired: 4090645
time: 3245
fired: 4100505
time: 1173
fired: 3906049
time: 1233
fired: 3906188
time: 1173
fired: 3906134

Normal scaling here...1 thread on my system uses about 60-65% CPU, so
the extra thread uses up the remaining 35-40% and the numbers show it.
Then there's soylatte Java 6:

~/NetBeansProjects/jruby ➔ java -server Trouble 1
time: 1772
fired: 3906250
time: 1973
fired: 3906250
time: 2748
fired: 3906250
time: 2114
fired: 3906250
time: 2294
fired: 3906250
~/NetBeansProjects/jruby ➔ java -server Trouble 2
time: 3402
fired: 3848648
time: 3805
fired: 3885471
time: 4145
fired: 3866850
time: 4140
fired: 3839130
time: 3658
fired: 3880202

Don't compare the times directly, since these are two pretty different
codebases and they each have different general performance
characteristics. Instead pay attention to the trend...the soylatte Java
6 run with two threads is significantly slower than the run with a
single thread. This mirrors the results with JRuby when there was a
single static counter being incremented.

So what's up here?

- Charlie


Patrick Wright  
From: "Patrick Wright" <pdoubl...@gmail.com>
Date: Wed, 2 Apr 2008 10:59:25 +0200
Local: Wed, Apr 2 2008 4:59 am
Subject: Re: [jvm-l] Performance characteristics of mutable static primitives?
Have you tried looking at the generated code from Hotspot? See e.g.
http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html

Patrick


John Wilson  
From: "John Wilson" <tugwil...@gmail.com>
Date: Wed, 2 Apr 2008 10:10:52 +0100
Local: Wed, Apr 2 2008 5:10 am
Subject: Re: [jvm-l] Performance characteristics of mutable static primitives?
On 4/2/08, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

That is rather odd.

Shouldn't count be volatile?

If it's declared as volatile does that make any difference?

John Wilson


Charles Oliver Nutter  
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Wed, 02 Apr 2008 10:43:46 +0100
Local: Wed, Apr 2 2008 5:43 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

I had not tried it because I expected volatile would only make it
slower. And in this case, the code in question didn't really care about
perfect accuracy for the counter since it's just a rough guide. But
here's numbers with volatile on my machine:

Apple Java 5:

~/NetBeansProjects/jruby ➔ java -server Trouble 1
time: 9047
fired: 3906250
time: 9007
fired: 3906250
time: 9613
fired: 3906250
time: 9846
fired: 3906250
time: 10005
fired: 3906250
~/NetBeansProjects/jruby ➔ java -server Trouble 2
time: 22349
fired: 3540696
time: 27641
fired: 3341096
time: 26815
fired: 3546695
time: 26641
fired: 3542920
time: 26789
fired: 3534386

soylatte Java 6:

~/NetBeansProjects/jruby ➔ java -server Trouble 1
time: 9777
fired: 3906250
time: 9701
fired: 3906250
time: 9070
fired: 3906250
time: 8656
fired: 3906250
time: 9065
fired: 3906250
~/NetBeansProjects/jruby ➔ java -server Trouble 2
time: 24781
fired: 3464957
time: 23668
fired: 3758204
time: 21235
fired: 3783215
time: 22198
fired: 3761491
time: 22937
fired: 3752534

So as expected, volatile does slow things down a lot, but Java 6 does do
a little better here. Also interesting to see that volatile completely
obliterates any gain from running multiple threads on both Java 5 and
Java 6, and the total time is almost 3x slower than a single thread on
Java 5.

I tried another non-volatile run using i += 1 rather than i++ and the
numbers were almost identical, with Java 6 severely degrading with
multiple threads running and Java 5 improving.

Here's another set of numbers from Vladimir Sizikov, on a dual-core
Windows machine running Sun Java 5 and Sun Java 6:

D:\work>D:/re/java5/bin/java -server Trouble 2
time: 1666
fired: 3345620
time: 1453
fired: 4033604
time: 629
fired: 3592687
time: 569
fired: 3595772
time: 578

D:\work>D:/re/java6/bin/java -server Trouble 2
time: 1595
fired: 3896153
time: 1588
fired: 3900934
time: 2090
fired: 3896066
time: 2133
fired: 3891300
time: 2154
fired: 3892321

Again, significantly worse performance in Java 6 with multiple threads.

- Charlie


Charles Oliver Nutter  
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Wed, 02 Apr 2008 10:47:08 +0100
Local: Wed, Apr 2 2008 5:47 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

Patrick Wright wrote:
> Have you tried looking at the generated code from Hotspot? See e.g.
> http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html

No, I haven't...no access to a machine that can run debug builds at the
moment, but I may try it later if the answer to my riddle does not
present itself.

- Charlie


Attila Szegedi  
From: Attila Szegedi <szege...@gmail.com>
Date: Wed, 2 Apr 2008 12:35:30 +0200
Local: Wed, Apr 2 2008 6:35 am
Subject: Re: [jvm-l] Performance characteristics of mutable static primitives?

On 2008.04.02., at 10:48, Charles Oliver Nutter wrote:

> So I'm rather confused here. Is a ++ operation on a static int doing
> some kind of atomic update that causes multiple threads to contend?

No. You know your JVM bytecodes, Charlie - the only incrementing  
bytecode in existence is IINC and it only works on an integer local  
variable.

Pretty much the only sane implementation of ++ on a static field would  
be:

GETSTATIC someClass.someField
ICONST_1
IADD
PUTSTATIC someClass.someField

Not even volatility of the field will ensure atomic increment  
operations, as the value must be temporarily held on the thread  
operand stack.

If you want atomic updates, java.util.concurrent.atomic.AtomicInteger  
might give you what you need on Java 5 and above.
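
A minimal sketch of the difference, assuming a hypothetical class named
IncrementDemo: several threads increment both a volatile static int and
an AtomicInteger. The volatile field can lose updates because ++ is
still a separate read, add, and write; the AtomicInteger cannot.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; IncrementDemo is a hypothetical name.
final class IncrementDemo {
    static volatile int plain = 0; // volatile, but ++ is still read-add-write
    static final AtomicInteger atomic = new AtomicInteger();

    // Starts `threads` threads that each perform `perThread` increments
    // on both counters, then waits for all of them to finish.
    static void run(int threads, int perThread) throws InterruptedException {
        plain = 0;
        atomic.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    plain++;                  // GETSTATIC, ICONST_1, IADD, PUTSTATIC
                    atomic.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run(4, 1_000_000);
        // The volatile total may fall short of 4000000 when increments interleave.
        System.out.println("volatile int:  " + plain);
        // The atomic total is always threads * perThread.
        System.out.println("AtomicInteger: " + atomic.get());
    }
}
```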

Attila.


John Wilson  
From: "John Wilson" <tugwil...@gmail.com>
Date: Wed, 2 Apr 2008 11:36:14 +0100
Local: Wed, Apr 2 2008 6:36 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?
On 4/2/08, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

>  John Wilson wrote:
>  > That is rather odd.

>  > Shouldn't count be volatile?

>  > If it's declared as volatile does that make any difference?

> I had not tried it because I expected volatile would only make it
>  slower. And in this case, the code in question didn't really care about
>  perfect accuracy for the counter since it's just a rough guide. But
>  here's numbers with volatile on my machine:

Interesting. I thought that the runtime system might have inferred
that count should be volatile, but your numbers show that this is
not the case.
[snip]

>  I tried another non-volatile run using i += 1 rather than i++ and the
>  numbers were almost identical, with Java 6 severely degrading with
>  multiple threads running and Java 5 improving.

It would be interesting to try:

tmp = i;
tmp++;
i = tmp;

for tmp as a local variable and again for tmp as a public instance field.

(I'm sorry I can't do the tests for myself as I don't have access to
a multi-core machine here).

John Wilson


Charles Oliver Nutter  
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Wed, 02 Apr 2008 11:46:22 +0100
Local: Wed, Apr 2 2008 6:46 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

Attila Szegedi wrote:
> No. You know your JVM bytecodes, Charlie - the only incrementing  
> bytecode in existence is IINC and it only works on an integer local  
> variable.

Yes, I know that...but I hadn't dug into what code was actually being
generated for a field++. Looking now they don't appear to be any
different. Either way it doesn't give me any answers...

> Pretty much the only sane implementation of ++ on a static field would  
> be:

> GETSTATIC someClass.someField
> ICONST_1
> IADD
> PUTSTATIC someClass.someField

> Not even volatility of the field will ensure atomic increment  
> operations, as the value must be temporarily held on the thread  
> operand stack.

> If you want atomic updates, java.util.concurrent.atomic.AtomicInteger  
> might give you what you need on Java 5 and above.

I'm more interested in finding out why the performance is so bad with
multiple threads under Java 6. I don't need atomic updates, for which I
would certainly use AtomicInteger instead.

- Charlie


John Wilson  
From: "John Wilson" <tugwil...@gmail.com>
Date: Wed, 2 Apr 2008 12:01:53 +0100
Local: Wed, Apr 2 2008 7:01 am
Subject: Re: [jvm-l] Performance characteristics of mutable static primitives?
On 4/2/08, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

>  I ran into a very strange effect when some Sun folks tried to benchmark
>  JRuby's multi-thread scalability. In short, adding more threads actually
>  caused the benchmarks to take longer.

Was this work being done on Intel hardware?

I'm just wondering if this might be some issue with the cache on Intel
multi-core processors (i.e. the cache is constantly being
invalidated).

This could be tested by running the equivalent C code on Intel kit and
running the Java code on non Intel kit.
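
One way to probe the cache idea from within Java is to compare a counter
that all threads write to against a counter private to each thread. The
sketch below is only an assumption about how such a comparison could be
set up; the class CounterPlacement and its methods are hypothetical, and
the thread has not established that cache traffic is the actual cause.
The JIT may also optimize the local-variable loop far more aggressively,
so the timings are a rough indication at best.

```java
import java.util.function.IntFunction;

// Illustrative sketch; CounterPlacement and its members are hypothetical names.
final class CounterPlacement {
    static int sharedCount = 0;                          // written by every thread
    static final int[] firedPerThread = new int[64];     // sketch supports up to 64 threads

    // Runs one Runnable per thread and returns elapsed milliseconds.
    static long time(int threads, IntFunction<Runnable> body) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(body.apply(t));
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Variant 1: all threads increment the same static field.
    static Runnable shared(int slot, int calls) {
        return () -> {
            int fired = 0;
            for (int i = 0; i < calls; i++) {
                if ((sharedCount++ & 0xFF) == 0) fired++;
            }
            firedPerThread[slot] = fired; // single write at the end
        };
    }

    // Variant 2: each thread increments its own local counter.
    static Runnable local(int slot, int calls) {
        return () -> {
            int count = 0;
            int fired = 0;
            for (int i = 0; i < calls; i++) {
                if ((count++ & 0xFF) == 0) fired++;
            }
            firedPerThread[slot] = fired; // single write at the end
        };
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        int calls = 500_000_000 / threads;
        System.out.println("shared: " + time(threads, t -> shared(t, calls)) + " ms");
        System.out.println("local:  " + time(threads, t -> local(t, calls)) + " ms");
    }
}
```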

John Wilson


Charles Oliver Nutter  
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Wed, 02 Apr 2008 12:10:09 +0100
Local: Wed, Apr 2 2008 7:10 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

John Wilson wrote:
> On 4/2/08, Charles Oliver Nutter <charles.nut...@sun.com> wrote:
>>  I ran into a very strange effect when some Sun folks tried to benchmark
>>  JRuby's multi-thread scalability. In short, adding more threads actually
>>  caused the benchmarks to take longer.

> Was this work being done on Intel hardware?

> I'm just wondering if this might be some issue with the cache on Intel
> multi-core processors (i.e. the cache is constantly being
> invalidated).

> This could be tested by running the equivalent C code on Intel kit and
> running the Java code on non Intel kit.

In both cases (my benchmarking and "Sun guys" benchmarking) it was on
x86-based hardware, but my runs were on OS X Intel (Core Duo) and theirs
were on Solaris AMD x86-64 (Opteron). I have not yet heard back whether
my fix improved thread scaling for them.

- Charlie


Jochen Theodorou  
From: Jochen Theodorou <blackd...@gmx.org>
Date: Wed, 02 Apr 2008 17:27:44 +0200
Local: Wed, Apr 2 2008 11:27 am
Subject: Re: [jvm-l] Performance characteristics of mutable static primitives?
Charles Oliver Nutter wrote:

> I ran into a very strange effect when some Sun folks tried to benchmark
> JRuby's multi-thread scalability. In short, adding more threads actually
> caused the benchmarks to take longer.
[...]
> Instead pay attention to the trend...the soylatte Java
> 6 run with two threads is significantly slower than the run with a
> single thread. This mirrors the results with JRuby when there was a
> single static counter being incremented.

I think there is not enough data to see a trend. I modified your test
to run with 1 to 20 threads for 50 loops each, averaged the time it
took for all threads to finish, and plotted the results in a diagram.
I used a Q6600 quad-core Intel CPU with java 1.6.0_03-b05 on
Linux 2.6.22-14-generic #1 SMP x86_64 GNU/Linux. What I can see is that
the time goes up steadily until 4 threads are reached, which is my
number of CPUs. Using 5 threads takes less time than using 4, but after
that the time looks more or less constant.

This looks quite scalable to me.
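
The modified test is not included in this message. A sweep of this kind
could be sketched roughly as follows; the class ThreadSweep, its method
names, and the per-thread workload of 10,000,000 calls are assumptions,
as is the choice to give every thread the same fixed amount of work
rather than splitting a fixed total.

```java
// Illustrative sketch; ThreadSweep and its members are hypothetical names.
final class ThreadSweep {
    static int count = 0; // shared and unsynchronized
    static int fired = 0;

    // Runs `threads` threads, each doing `perThread` calls,
    // and returns the elapsed time in nanoseconds.
    static long timeRun(int threads, int perThread) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if ((count++ & 0xFF) == 0) fired++;
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    // Averages the elapsed time of `loops` runs, in milliseconds.
    static double averageMillis(int threads, int loops, int perThread) throws InterruptedException {
        long totalNanos = 0;
        for (int i = 0; i < loops; i++) totalNanos += timeRun(threads, perThread);
        return totalNanos / 1e6 / loops;
    }

    public static void main(String[] args) throws InterruptedException {
        // One line per thread count: thread count, then average time in ms.
        for (int threads = 1; threads <= 20; threads++) {
            System.out.println(threads + "\t" + averageMillis(threads, 50, 10_000_000));
        }
    }
}
```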

bye blackdrag

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/
http://www.g2one.com/


Charles Oliver Nutter  
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Wed, 02 Apr 2008 16:51:30 +0100
Local: Wed, Apr 2 2008 11:51 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?