another concurrency question: could there be contention for allocation (consing etc.)?

Lee Spector

Mar 27, 2010, 12:13:20 AM
to clo...@googlegroups.com

Is it possible that multiple threads all furiously generating list structure would have some sort of contention for the memory allocation state?

My losses of multicore utilization seem to be correlated with the generation of lots of random expressions in concurrent threads. I'd been worrying about contention for the random states, which I think I've now made thread local, but now I wonder if the contention might be in the allocation. Possible? If so, then is there a way around it?
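
For readers following along, here is a minimal Java sketch of what "thread-local random state" can look like on the JVM. It is not the code from this thread: the class name `ThreadLocalRandomSketch` and its methods are hypothetical, and the workload (threads building short-lived lists of random ints) only loosely imitates the expression generation described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical names throughout; this is not the code from the thread.
class ThreadLocalRandomSketch {
    // Each thread lazily gets its own Random, so generator state is never shared.
    static final ThreadLocal<Random> RNG = ThreadLocal.withInitial(Random::new);

    // Builds a list of n random ints using only the calling thread's generator.
    static List<Integer> randomList(int n) {
        Random r = RNG.get();
        List<Integer> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(r.nextInt(100));
        }
        return out;
    }

    // Starts `threads` workers that each build `lists` lists of `size` elements,
    // then returns the total number of elements built across all workers.
    static long run(int threads, int lists, int size) {
        long[] counts = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long c = 0;
                for (int i = 0; i < lists; i++) {
                    c += randomList(size).size(); // the list becomes garbage right away
                }
                counts[id] = c; // each worker writes only its own slot
            });
            workers[t].start();
        }
        long total = 0;
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join(); // join makes the worker's write visible here
                total += counts[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("elements built: " + run(4, 1000, 100));
    }
}
```

With the generators kept per thread like this, any remaining loss of multicore utilization has to come from somewhere other than the random state.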

Thanks,

-Lee

--
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspe...@hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438

Check out Genetic Programming and Evolvable Machines:
http://www.springer.com/10710 - http://gpemjournal.blogspot.com/

Chas Emerick

Mar 27, 2010, 6:21:48 AM
to clo...@googlegroups.com
If you're not using a parallel garbage collector (which is the case by
default), then generating significant garbage will result in
not-insignificant GC pauses. Allocation itself isn't a synchronous
operation, but the default GC is.

Most Java profilers have thread-related tools that allow you to see on
what, when, and for how long threads are blocking. It'd be worth it
to enable parallel GC just to see what happens, but beyond that, it
sounds like some data would help in determining specifically what code
is tripping you up.
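
Short of a full profiler, the JVM's own management API can give a first impression of which threads are blocking. The following is a rough sketch, assuming contention monitoring is supported on the VM in question; `BlockedThreadReport` is a hypothetical name. Note that these counters cover blocking on monitors (synchronized code), so time lost to GC pauses will not show up here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not something referenced in the thread.
class BlockedThreadReport {
    // Returns one line per live thread: its state, how often it has blocked
    // on a monitor, and for how long (-1 ms if timing is not available).
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            // Blocked-time accounting is off by default and only covers
            // the period after it has been switched on.
            mx.setThreadContentionMonitoringEnabled(true);
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            sb.append(String.format("%-30s state=%-13s blocked %d times, %d ms%n",
                    info.getThreadName(), info.getThreadState(),
                    info.getBlockedCount(), info.getBlockedTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling `report()` once early (to switch monitoring on) and again after the workload has run for a while gives the more useful numbers.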

If you can post/paste code, that might help, too.

- Chas

Daniel

Mar 27, 2010, 8:36:16 AM
to clo...@googlegroups.com
Not sure if this is going to help, but I recently tried to optimize
performance of my long running IDE process, and crawled through a lot
of JVM flags and benchmarks. I give you the more or less raw list
below (stripped of UI related stuff), which you might find useful.
Machine specs are MacBook, 2 GHz Core 2 Duo, 4 GB RAM

java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-9M3125)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)

Especially the parallel GC option below needs at least 1.6.0_10,
preferably _16+. It was a bit flaky before, and it's still an
experimental feature, so don't complain if your computer turns into
marshmallows or something :)

Also be aware that the following setup is going to make your startup
slower, but your subsequent execution faster.

-server
-Xms256m
-Xmx1024m (Note: give your process enough memory, but not too much;
otherwise you're competing with other processes on your machine, and
the OS will start to page out and subsequently thrash.)
-Djava.net.preferIPv4Stack=true
-XX:CompileThreshold=1500 (makes the server VM compile sooner. The server
VM compiles after 10k calls with better results, so if you're running in
a tight loop and/or have server-like uptime (i.e. multiple
days or longer), then it might make sense to omit this one. Also,
don't go lower than 1500 (the setting for -client): performance
deteriorates, because compilation adapts to runtime statistics.
More statistics -> faster code.)
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC (not sure if this flag is redundant with the one above)
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:MaxPermSize=250m
-XX:+UseAdaptiveSizePolicy
-XX:+AggressiveOpts
-XX:+UseFastAccessorMethods
-XX:+UseFastEmptyMethods
-XX:+UseFastJNIAccessors
-Xverify:none (very important flag, if you trust all executed code.
This will disable bytecode verification, speeding up loading of new
code. On further thought, this might be one of your bottlenecks, if
you're not only generating lists, but actually compiling them with
Clojure (-> new bytecode))
-XX:+UseCompressedOops
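
With a list this long it is easy for a launcher script to drop an option along the way. A small sketch for checking what the running JVM actually received, using the standard management API; `JvmFlagsCheck` is a hypothetical name, and the output will differ from machine to machine.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for inspecting the options of the current JVM.
class JvmFlagsCheck {
    // The options passed to the JVM itself (-X..., -XX:..., -D...),
    // not the arguments passed to main().
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // The maximum heap the JVM will try to use, in megabytes
    // (roughly what -Xmx asked for).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        for (String arg : jvmArgs()) {
            System.out.println(arg);
        }
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it with the same command line as the real application shows whether every flag made it through.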


For the sake of completeness: UI related
-Dsun.java2d.opengl=true
-Dsun.awt.keepWorkingSetOnMinimize=true (I think this is only relevant
on Win32, not sure. Deals with OS not swapping out the working set on
window minimize)
-Dawt.useSystemAAFontSettings=lcd

I didn't explain every single flag; Google is your friend. There are
also quite a few interesting benchmarks around on the influence of the
flags depending on machine architecture (4 or more cores for
instance), and on OS (Win, Mac, *nix, 32/64bit). Be sure to constrain
your search to the last 2 years or so, a lot is old stuff and
irrelevant. 1.6.0_10 was the equivalent of a new JVM, because the JDK
process was (is) delayed.

Note: I just skimmed the messages and didn't see any discussion
before, so if this has already been discussed or if there are more
specific details on your process around, sorry for not reading them.

On my machine the above settings seemed to help keep things sane. I
would be interested to know whether you had any success with the
settings.

If I missed anything that anyone found worthwhile, I'd be interested
to hear as well.

Hope that helps

Cheers,
Daniel


On Sat, Mar 27, 2010 at 5:21 PM, Chas Emerick <ceme...@snowtide.com> wrote:
> If you're not using a parallel garbage collector (which is the case by
> default), then generating significant garbage will result in
> not-insignificant GC pauses.  Allocation itself isn't a synchronous

Lee Spector

Mar 27, 2010, 11:09:42 AM
to clo...@googlegroups.com

Thanks Chas and Daniel.

It's funny -- as I was waking up this morning, before being really awake, I literally thought "or maybe it's the GC" :-)

I didn't know about the parallel GC option and will have to try that out. The 8-core machine I'm using is still running Java 1.5.0_19, so I guess I'll have to upgrade that first.

BTW also, someone else previously commented on a different thread that maybe some of my slow-downs were GC related, and at the time I didn't understand the possible interactions between the GC and multithread timing issues... which I'm still not sure I completely understand, but all of this has now been cast in a new light.

Chas: is there a particular profiler/thread-tool that you'd recommend for looking at where the contention is happening? If you're really curious I've temporarily put the code -- warning: it's very crufty and interim and half hacked to pieces for the sake of looking into these issues -- at http://hampshire.edu/lspector/temp/clj-debug. If you run regression.clj that will load clojush.clj and then begin cranking.

-Lee

Peter Schuller

Mar 27, 2010, 12:26:24 PM
to clo...@googlegroups.com
> BTW also, someone else previously commented on a different thread that maybe some of my slow-downs were GC related, and at the time I didn't understand the possible interactions between the GC and multithread timing issues... which I'm still not sure I completely understand, but all of this has now been cast in a new light.

Using the -XX:+PrintGC and -XX:+PrintGCDetails JVM options will tell
you more about when and whether GC is happening, and what type of GC
is happening.
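
For a quick look from inside the process, the standard java.lang.management API also exposes per-collector counts and accumulated times. The sketch below is meant as a rough complement to the GC log output, not a replacement for it; the class name `GcTimeSketch` and the `churn` workload are made up for illustration, and the reported times are approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a sketch for comparing wall time with time spent in GC.
class GcTimeSketch {
    // Sum of collection counts over all collectors of this JVM.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long n = gc.getCollectionCount(); // -1 means undefined for this collector
            if (n > 0) {
                total += n;
            }
        }
        return total;
    }

    // Sum of approximate accumulated collection time, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long ms = gc.getCollectionTime(); // -1 means undefined for this collector
            if (ms > 0) {
                total += ms;
            }
        }
        return total;
    }

    // Allocates many short-lived lists; the checksum keeps the work observable.
    static long churn(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            List<Integer> list = new ArrayList<>();
            for (int j = 0; j < 100; j++) {
                list.add(j);
            }
            sum += list.size();
        }
        return sum;
    }

    public static void main(String[] args) {
        long count0 = totalGcCount();
        long ms0 = totalGcMillis();
        long start = System.nanoTime();
        long checksum = churn(2_000_000);
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + checksum
                + " wall=" + wallMs + " ms"
                + " collections=" + (totalGcCount() - count0)
                + " gc=" + (totalGcMillis() - ms0) + " ms");
    }
}
```

Wrapping the real workload in the same before/after measurement, once with the default collector and once with a different one, gives a first estimate of how much of the runtime is GC.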

Whether or not a GC pause is parallel is relevant to whether or not
the JVM will use all (or several depending on settings) available
cores during said pause. There is also the concept of "concurrent" GC,
which refers to the GC running concurrently (in one or more threads)
with your application.

Depending on your choice of GC you may have non-parallel/parallel GC
during stop-the-world pauses, or parallel/non-parallel GC running
concurrently with your application.

The '-Xincgc' option will turn on the concurrent mark-sweep (CMS) GC, which
makes collection of the old generation concurrent, and I believe the
default will also be for parallel GC in the young generation
(generations refer to classifications of data as young or old; the
distinction exists because GCs typically perform optimizations based
on the fact that most applications generate a lot of short-lived
data).
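
To see which generations a running JVM actually has, one option is to list its heap memory pools. This is a minimal sketch; `HeapPools` is a hypothetical name, and the pool names it prints depend on the JVM version and on which collector is selected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the heap pools (generations) of this JVM.
class HeapPools {
    // Names of all memory pools that belong to the garbage-collected heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + ": " + usedKb + " KB used");
        }
    }
}
```

Running it under different GC options shows how the young and old generation pools are named and split for each collector.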

If you are interested in how GCs work I can recommend the following papers:

"Uniprocessor garbage collection techniques", which is now fairly old
but gives a pretty good (IMO) overview of GC algorithms of different
types:

ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps

Actually if anyone has good suggestions for similar papers I'm all
ears. This was the first paper I ever read about GC and launched my
interest in it; I still think it's good, but then I'm biased.

"Garbage-First Garbage Collection" describes the upcoming GC which is
already available but not yet the default, but intended to be the
default replacement for CMS in JDK 1.7:

http://research.sun.com/jtech/pubs/04-g1-paper-ismm.pdf

That one in particular will cover several aspects that are relevant to
concurrency as multi-processor scalability was a design goal.

You may also be interested in what I believe to be the original paper
about the CMS collector:

http://research.sun.com/techrep/2000/smli_tr-2000-88.pdf

Of course both CMS and G1 will have changed since the original
publications of the papers, but they should offer good insight.

--
/ Peter Schuller

Lee Spector

Mar 27, 2010, 6:48:12 PM
to clo...@googlegroups.com

Thanks Peter. Just using -Xincgc seems to make a major difference for me, even under Java 1.5.0_19. This is very nice in terms of my runtimes and it also shows that GC is a major factor here.

-Lee
