Web Images Videos Maps News Shopping Gmail more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Performance characteristics of mutable static primitives?
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 26 - 41 of 41 - Collapse all  -  Translate all to Translated (View all originals) < Older 
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
John Cowan  
View profile  
 More options Apr 3 2008, 3:31 pm
From: "John Cowan" <johnwco...@gmail.com>
Date: Thu, 3 Apr 2008 15:31:56 -0400
Local: Thurs, Apr 3 2008 3:31 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

On Thu, Apr 3, 2008 at 1:43 PM, John Rose <John.R...@sun.com> wrote:
> Even better (if applicable) would be to have fast thread-local counters,
> with a slow background phase which occasionally tallies them into a trailing
> global counter.

That reminds me of an old question of mine.  In the case where a
thread has been created from an object whose class is a subclass of
Thread, is Thread.currentThread() guaranteed to give you back the same
object?  IOW, if I say:

class MyThread extends Thread { ... }
myThread = new MyThread();
myThread.start();
...

then in the new thread, is Thread.currentThread() always == to the
original value of myThread?  If so, that provides a different approach
to thread-local state, where the state is held in the instance
variables of MyThread objects, and the JVM is used to thread them
through :-) the code until the point where they're needed.

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Rose  
View profile  
 More options Apr 3 2008, 7:01 pm
From: John Rose <John.R...@Sun.COM>
Date: Thu, 03 Apr 2008 16:01:54 -0700
Local: Thurs, Apr 3 2008 7:01 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?
On Apr 3, 2008, at 12:31 PM, John Cowan wrote:

> That reminds me of an old question of mine.  In the case where a
> thread has been created from an object whose class is a subclass of
> Thread, is Thread.currentThread() guaranteed to give you back the same
> object?

Yes.  You can then cast it to the subclass and go on from there.

> If so, that provides a different approach
> to thread-local state, where the state is held in the instance
> variables of MyThread objects, and the JVM is used to thread them
> through :-) the code until the point where they're needed.

That's how Java thread locals are implemented in the JVM.

You can only use this approach if (a) you control the creation of the  
thread and can specify your own subclass for it, or (b) you can edit  
the rt.jar code for java.lang.Thread.  Plan (c) is to make a  
ThreadLocal.

You might be interested to know that Hotspot intrinsifies  
Thread.currentThread to a couple of instructions; it's cheap.  We did  
that to make things like thread locals efficient.

Best,
-- John


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Cowan  
View profile  
 More options Apr 3 2008, 7:26 pm
From: "John Cowan" <johnwco...@gmail.com>
Date: Thu, 3 Apr 2008 19:26:21 -0400
Local: Thurs, Apr 3 2008 7:26 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

On Thu, Apr 3, 2008 at 7:01 PM, John Rose <John.R...@sun.com> wrote:
>  Yes.  You can then cast it to the subclass and go on from there.

That's what I had hoped, but the JLS doesn't actually seem to guarantee this.

>  > If so, that provides a different approach
>  > to thread-local state, where the state is held in the instance
>  > variables of MyThread objects, and the JVM is used to thread them
>  > through :-) the code until the point where they're needed.

>  That's how Java thread locals are implemented in the JVM.

I don't understand how that can be, as I can create as many
ThreadLocals as I want.  They can't all fit in the Thread object,
surely.  And how can reference to them be fast if they have to
indirect through a Thread subclass object?

>  You can only use this approach if (a) you control the creation of the
>  thread and can specify your own subclass for it,

That's what I had in mind.

>  You might be interested to know that Hotspot intrinsifies
>  Thread.currentThread to a couple of instructions; it's cheap.  We did
>  that to make things like thread locals efficient.

Good.

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Marcelo Fukushima  
View profile  
 More options Apr 3 2008, 7:37 pm
From: "Marcelo Fukushima" <takesh...@gmail.com>
Date: Thu, 3 Apr 2008 20:37:06 -0300
Local: Thurs, Apr 3 2008 7:37 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?
On 4/3/08, John Cowan <johnwco...@gmail.com> wrote:

the source for ThreadLocal explains the first bit:

public void set(T value) {
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null)
            map.set(this, value);
        else
            createMap(t, value);
    }

apparently, every thread has a map that holds all ThreadLocal's registered

--
[]'s
Marcelo Takeshi Fukushima

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Rose  
View profile  
 More options Apr 3 2008, 7:39 pm
From: John Rose <John.R...@Sun.COM>
Date: Thu, 03 Apr 2008 16:39:10 -0700
Local: Thurs, Apr 3 2008 7:39 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

On Apr 3, 2008, at 4:26 PM, John Cowan wrote:

> They can't all fit in the Thread object, surely.

That's what collections are for.

http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/75fca0b0ab83/src/share/
classes/java/lang/Thread.java

class Thread { ...
     /* ThreadLocal values pertaining to this thread. This map is  
maintained
      * by the ThreadLocal class. */
     ThreadLocal.ThreadLocalMap threadLocals;
... }

-- John


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Rémi Forax  
View profile  
 More options Apr 4 2008, 4:46 am
From: Rémi Forax <fo...@univ-mlv.fr>
Date: Fri, 04 Apr 2008 10:46:07 +0200
Local: Fri, Apr 4 2008 4:46 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?
John Cowan a écrit :

yes, ThreadLocal is currently implemented has a field of  
java.lang.Thread storing a map,
so there is no synchronization.
>  and the JVM is used to thread them
> through :-) the code until the point where they're needed.

cheers,
Rémi

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
hlovatt  
View profile  
 More options Apr 7 2008, 6:02 am
From: hlovatt <howard.lov...@gmail.com>
Date: Mon, 7 Apr 2008 03:02:48 -0700 (PDT)
Local: Mon, Apr 7 2008 6:02 am
Subject: Re: Performance characteristics of mutable static primitives?
This is a very interesting discussion and reveals a lot about the
difficulties of programming for multiple cores.

I was trying to understand what was going on and was 'messing about'
with the code and I noticed that almost any change I made slowed the
code down considerably - more than the 1 or 2 thread options in Java 6
does. Therefore I am not sure how realistic the code is. If the code
did more than simply increment a variable or two than the problem
might go away (because contention would be less and because the
uncontested operations would be a bigger percentage).

Going back to the original code. If it is OK to miss a few increments
(hence non-volitile statics) then the following should be OK:

          threads[ j ] =
            new Thread() {
              public void run() {
                final int total = totalSize / threadCount;
                for ( int k = 0; k < total; k++ ) {
                  if ( ( k & 0xFF ) == 0 ) {  // New
                    i += 256;                     // New
                    firedCount++;              // New
                  }                                   // New
//                  if ( ( i++ & 0xFF ) == 0 ) { firedCount++; } //
Original
                }
              }
            };

The above version doesn't show the problem on Java 6 and is quicker
than the original.

On Apr 2, 6:48 pm, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
hlovatt  
View profile  
 More options Apr 7 2008, 6:03 am
From: hlovatt <howard.lov...@gmail.com>
Date: Mon, 7 Apr 2008 03:03:56 -0700 (PDT)
Local: Mon, Apr 7 2008 6:03 am
Subject: Re: Performance characteristics of mutable static primitives?
This is a very interesting discussion and reveals a lot about the
difficulties of programming for multiple cores.

I was trying to understand what was going on and was 'messing about'
with the code and I noticed that almost any change I made slowed the
code down considerably - more than the 1 or 2 thread options in Java 6
does. Therefore I am not sure how realistic the code is. If the code
did more than simply increment a variable or two than the problem
might go away (because contention would be less and because the
uncontested operations would be a bigger percentage).

Going back to the original code. If it is OK to miss a few increments
(hence non-volitile statics) then the following should be OK:

          threads[ j ] =
            new Thread() {
              public void run() {
                final int total = totalSize / threadCount;
                for ( int k = 0; k < total; k++ ) {
                  if ( ( k & 0xFF ) == 0 ) {  // New
                    i += 256;                     // New
                    firedCount++;              // New
                  }                                   // New
//                  if ( ( i++ & 0xFF ) == 0 ) { firedCount++; } //
Original
                }
              }
            };

The above version doesn't show the problem on Java 6 and is quicker
than the original.

On Apr 2, 6:48 pm, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Charles Oliver Nutter  
View profile  
 More options Apr 7 2008, 11:17 am
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Mon, 07 Apr 2008 10:17:51 -0500
Local: Mon, Apr 7 2008 11:17 am
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

hlovatt wrote:
> This is a very interesting discussion and reveals a lot about the
> difficulties of programming for multiple cores.

> I was trying to understand what was going on and was 'messing about'
> with the code and I noticed that almost any change I made slowed the
> code down considerably - more than the 1 or 2 thread options in Java 6
> does. Therefore I am not sure how realistic the code is. If the code
> did more than simply increment a variable or two than the problem
> might go away (because contention would be less and because the
> uncontested operations would be a bigger percentage).

> Going back to the original code. If it is OK to miss a few increments
> (hence non-volitile statics) then the following should be OK:

Yeah, clever, and basically the same as the eventual fix I had for JRuby
since it gives each thread its own counter. I think the primary rule
here is that don't expect any sort of unsynchronized, uncontrolled
updates to the same shared resource to either behave the way you want or
perform the way you want.

- Charlie


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David Pollak  
View profile  
 More options Apr 8 2008, 12:55 pm
From: "David Pollak" <feeder.of.the.be...@gmail.com>
Date: Tue, 8 Apr 2008 09:55:01 -0700
Local: Tues, Apr 8 2008 12:55 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

On Mon, Apr 7, 2008 at 8:17 AM, Charles Oliver Nutter <

It may be a bit off-topic, but it's interesting how this argues for
immutable data structures and Erlang's concurrency model.

> - Charlie

--
Scala lift off unconference, May 10th 2008, San Francisco
http://scalaliftoff.com
lift, the secure, simple, powerful web framework http://liftweb.net
Collaborative Task Management http://much4.us

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin Probst  
View profile  
 More options Apr 8 2008, 1:25 pm
From: Martin Probst <m...@martin-probst.com>
Date: Tue, 8 Apr 2008 19:25:53 +0200
Local: Tues, Apr 8 2008 1:25 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

>> Yeah, clever, and basically the same as the eventual fix I had for  
>> JRuby
>> since it gives each thread its own counter. I think the primary rule
>> here is that don't expect any sort of unsynchronized, uncontrolled
>> updates to the same shared resource to either behave the way you  
>> want or
>> perform the way you want.

> It may be a bit off-topic, but it's interesting how this argues for
> immutable data structures and Erlang's concurrency model.

I guess once you're into concurrency, everything argues for immutable  
data structures.

It's also nice to see how this fits with Gilad Brachas arguments  
against the Java 'static' keyword (which allows mutable structures to  
be shared implicitly between threads), as Charles already noted.

Regards,
Martin


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Matthias Ernst  
View profile  
 More options Apr 8 2008, 1:55 pm
From: "Matthias Ernst" <ernst.matth...@gmail.com>
Date: Tue, 8 Apr 2008 19:55:04 +0200
Local: Tues, Apr 8 2008 1:55 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?
I'm surprised you're surprised.

Whatever machinery involved: if multiple actors contend for the same
memory location they will suffer. The best strategy is to distribute
jobs, work on each of them independently on local resources, be it
processor registers, thread locals, local memory or disks and to
synchronize the results with others as infrequently as possible. It's
the basic principle behind fork-join in a multi-threaded process and
map-reduce on a large cluster. Nothing new that Gilad Bracha
discovered in 2008, is it?

Matthias


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Martin Probst  
View profile  
(1 user)  More options Apr 8 2008, 4:14 pm
From: Martin Probst <m...@martin-probst.com>
Date: Tue, 8 Apr 2008 22:14:35 +0200
Local: Tues, Apr 8 2008 4:14 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

> I'm surprised you're surprised.

Who's surprised?

> It's the basic principle behind fork-join in a multi-threaded  
> process and
> map-reduce on a large cluster.
> Nothing new that Gilad Bracha discovered in 2008, is it?

I was referring to Bracha's stuff about how static fields and global  
names in general harm software engineering, deployment (live code  
updating), concurrency, testability, etc. pp. I'm not sure that this  
stuff is novel, and it's certainly building on research of others. But  
it's definitely non-trivial, and it goes far beyond the basic insight  
that resource contention is a bottleneck in parallel/distributed  
systems.

The interesting part is that it indeed comes to similar solutions  
looking from a different angle and a different problem set. That's  
what I wanted to point out.

Regards,
Martin


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Matthias Ernst  
View profile  
 More options Apr 8 2008, 4:32 pm
From: "Matthias Ernst" <ernst.matth...@gmail.com>
Date: Tue, 8 Apr 2008 22:32:10 +0200
Local: Tues, Apr 8 2008 4:32 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

On Tue, Apr 8, 2008 at 10:14 PM, Martin Probst <m...@martin-probst.com> wrote:

>  > I'm surprised you're surprised.

>  Who's surprised?

Not you in particular. I just picked your message as the last in this
thread. I was just commenting on the overall sentiment that -to me-
sounds like "wow we may have discovered something new in concurrent
programming and great how all these pieces (Erlang, Bracha) finally
fit together now". Gilad's compilation was certainly well compiled but
global variables, hey, ...

Whatever. I'm not trying to provoke here.

Cheers
Matthias


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Rose  
View profile  
 More options Apr 8 2008, 6:05 pm
From: John Rose <John.R...@Sun.COM>
Date: Tue, 08 Apr 2008 15:05:14 -0700
Local: Tues, Apr 8 2008 6:05 pm
Subject: Re: [jvm-l] Re: Performance characteristics of mutable static primitives?

On Apr 8, 2008, at 10:25 AM, Martin Probst wrote:

> I guess once you're into concurrency, everything argues for immutable
> data structures.

Some ad lib observations on memory sharing, concurrency, and the JVM:

- Probably the hardest part about updating the Java memory model was  
making sure the rules for immutable objects (final fields) would work  
properly without synchronization.  It was necessary, because Java  
needs to work well on multiprocessors.

- Hotspot put in thread-local allocation buffers a decade ago, again  
to avoid avoid contention by memory partitioning.  We weren't quite  
as painfully cache conscious back then, but cache locality was also a  
consideration.  (Still, somebody needs to experiment more with cache-
conscious coloring for TLABs.  Sounds like a masters' project to me.)

- An immutable object is not immutable at first.  The trick is that  
the mutable part of its lifecycle is private, and so the system can  
cleverly protect it from contention, e.g., using TLABs.  (Occurs to  
me that "larvatus" = "hidden"; an immutable object has a larval stage  
during which it is constructed.)  This can be generalized to other  
controlled mutation phases; that's what GC safepoints do, for  
example:  Allow immutable objects to be temporarily mutated.  There  
might be an interesting "phased immutability" software abstraction to  
be investigated along these lines.

- The JVM needs to know about immutability in order to optimize it  
properly.  However, there are interesting (semi-)global analyses  
which can help it discover immutability even when it is not clearly  
marked in the source program.  (Optimistic ones are the coolest, and  
require a framework for dependency checking and deoptimization to  
recover.  Hotspot is good at that, and can get better.  I think there  
are PhD projects there.)

- In modern memory systems, main memory is usually many tens of  
cycles away from each CPU.  Caches try to hide this.  As a corollary,  
there are at least two protocols in the hidden network that connects  
memories, caches, and CPUs, and they differ greatly in intrinsic  
performance characteristics.  In essence, one protocol is (often)  
called RTS (read to share) and one is RTO (read to own).  RTO means  
your CPU & cache have to handshake with the rest of the world to make  
sure your have an exclusive right to a block of memory (cache line),  
before your can either read or write it.  Your program incurs this  
cost because it intends to write the memory.  But you want to use RTS  
when you can, because any number of CPUs can RTS the same block of  
memory (cache line) and happily work on the same data, as long as it  
remains immutable.  (I'm tempted to think RTS:RTO :: UDP:TCP; there  
are some basic similarities I suppose.)  In any case, when memory is  
shared, immutable memory is faster than mutable, and the JVM works  
hard to exploit the difference.

Best,
-- John


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
beelzabub  
View profile  
 More options Apr 15 2008, 1:04 pm
From: beelzabub <vlov...@gmail.com>
Date: Tue, 15 Apr 2008 10:04:24 -0700 (PDT)
Local: Tues, Apr 15 2008 1:04 pm
Subject: Re: Performance characteristics of mutable static primitives?
What you should do is run the thing through a profiler.  This will
tell you how many times poll is being called.  My guess is that
perhaps the thread contention is prolonging the amount of time
necessary to increment count - Threads 1-3 all load count into memory
(i.e 0), all add 1 to it, and then write it back out to memory, so
after 3 threads have "written" to count, it only contains 1.  This
seems to be the case - tested with linux jdk1.6, intel Q6600 (quad-
core):

$ java -server Trouble 1
time: 1995
fired: 3906250
time: 1966
fired: 3906250
time: 930
fired: 3906250
time: 916
fired: 3906250
time: 894
fired: 3906250
$ java -server Trouble 2
time: 734
fired: 3654450
time: 769
fired: 3802532
time: 821
fired: 3755766
time: 829
fired: 3775890
time: 800
fired: 3779483
$ java -server Trouble 3
time: 1836
fired: 3934611
time: 4518
fired: 3566843
time: 3102
fired: 3618666
time: 2886
fired: 3648195
time: 3844
fired: 3667866

On a side note, it's way faster to just optimize away the algorithm
(at least in your benchmark):
int numCalls = total / 0xFF;
i += total;
for (int k = 0; k < numCalls; ++k) doNothing();

Here's the same way, using AtomicIntegers:
int tmp = i.getAndAdd(total);
for (int k  = 0; k < numCalls; ++k)
{
   if ((k & 0xFF) == 0) doNothing();

}

However, none of this is related to the actual code, just the
benchmark.  The way to optimize the actual code would be to push count
into the ThreadContext & instead of doing a binary and, simply reset
the counter & do a comparison of 256 / numThreads.  Or you can keep
that code if you want every thread to poll once every 256 calls rather
than a thread getting polled once every 256 calls.

Other ways to speed it up:  Don't count & simply call pollEvents with
longer intervals in-between & pollEvents then simply calls doNothing
directly.

Summary: The problem is that multiple threads are overwriting the
global variable, so that it takes longer to actually increment it by
1.

Think of it this way:
count = 1
Threads A, B, C are running.
The CPU loads count (with a value of 1) into the register.
The CPU increments the values in its registers to 2
The CPU then writes out the values of each of the registers into
count.  So after all 3 threads increment count, the value is still 1.
Add in the overhead of switching between threads, and you can see why
you get a slowdown.  Adding in synchronization will help, provided you
do it intelligently - too often, and synchronization penalty takes
over.

On Apr 2, 1:48 am, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages < Older 
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google