jcarder running "forever"

47 views
Skip to first unread message

Todd Lipcon

unread,
Oct 6, 2009, 2:30:26 PM10/6/09
to JCarder
Hi,

I downloaded jcarder about a week ago and gave it a try on Hadoop
(http://hadoop.apache.org) to try to track down some difficult
deadlocks in the JobTracker. The runtime analysis seems to have gone
fine, but I've been running jcarder on the resulting database for
about a week on a reasonably fast machine and it has yet to terminate.

Here's a jstack:

Full thread dump Java HotSpot(TM) Server VM (14.2-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x08581000 nid=0x60c5 runnable
[0x00000000]
java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x0854e000 nid=0xdfb runnable
[0x00000000]
java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x0854c400 nid=0xdfa waiting on
condition [0x00000000]
java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x08549800 nid=0xdf9 waiting on
condition [0x00000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x08548000 nid=0xdf8 runnable
[0x00000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x08535800 nid=0xdf7 in Object.wait()
[0x580a6000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x62472ae8> (a java.lang.ref.ReferenceQueue
$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:
118)
- locked <0x62472ae8> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:
134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:
159)

"Reference Handler" daemon prio=10 tid=0x08534000 nid=0xdf6 in
Object.wait() [0x580f7000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x62472b70> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:
116)
- locked <0x62472b70> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x084e7000 nid=0xdf4 runnable [0xb7259000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.getEntry(HashMap.java:344)
at java.util.HashMap.containsKey(HashMap.java:335)
at java.util.HashSet.contains(HashSet.java:184)
at java.util.AbstractCollection.containsAll
(AbstractCollection.java:278)
at java.util.AbstractSet.equals(AbstractSet.java:78)
at com.enea.jcarder.analyzer.Cycle.equals(Cycle.java:115)
at java.util.HashMap.put(HashMap.java:376)
at java.util.HashSet.add(HashSet.java:200)
at com.enea.jcarder.analyzer.CycleDetector.analyzeEdge
(CycleDetector.java:121)
at com.enea.jcarder.analyzer.CycleDetector.analyzeNode
(CycleDetector.java:101)
at com.enea.jcarder.analyzer.CycleDetector.analyzeEdge
(CycleDetector.java:135)
at com.enea.jcarder.analyzer.CycleDetector.analyzeNode
(CycleDetector.java:101)
at com.enea.jcarder.analyzer.CycleDetector.analyzeEdge
(CycleDetector.java:135)
at com.enea.jcarder.analyzer.CycleDetector.analyzeNode
(CycleDetector.java:101)
at com.enea.jcarder.analyzer.CycleDetector.analyzeEdge
(CycleDetector.java:135)
at com.enea.jcarder.analyzer.CycleDetector.analyzeNode
(CycleDetector.java:101)
at com.enea.jcarder.analyzer.CycleDetector.analyzeEdge
(CycleDetector.java:135)
at com.enea.jcarder.analyzer.CycleDetector.analyzeNode
(CycleDetector.java:101)
at com.enea.jcarder.analyzer.CycleDetector.analyzeEdge
(CycleDetector.java:135)
at com.enea.jcarder.analyzer.CycleDetector.analyzeNode
(CycleDetector.java:101)
at com.enea.jcarder.analyzer.CycleDetector.analyzeLockNodes
(CycleDetector.java:75)
at com.enea.jcarder.analyzer.Analyzer.start(Analyzer.java:96)
at com.enea.jcarder.analyzer.Analyzer.main(Analyzer.java:70)

"VM Thread" prio=10 tid=0x08530000 nid=0xdf5 runnable

"VM Periodic Task Thread" prio=10 tid=0x08550000 nid=0xdfc waiting on
condition

JNI global references: 815


Here's the program output:

$ java -server -Xmx1400m -jar jcarder/jcarder.jar
Opening for reading: /root/jcarder_contexts.db
Opening for reading: /root/jcarder_events.db

Loaded from database files:
Nodes: 1722
Edges: 20602 (excluding 27418 duplicated)


I'm assuming week-long runtimes are unusual. Is it likely that this
will eventually terminate, or is it stuck in an infinite loop? Any
diagnostics I can do for you guys to help track it down? Happy to
upload the database files somewhere if that helps.

-Todd

Henning Moll

unread,
Oct 7, 2009, 3:16:04 PM10/7/09
to jca...@googlegroups.com, Todd Lipcon
Hi Todd,

On Dienstag, 6. Oktober 2009, Todd Lipcon wrote:
> Loaded from database files:
> Nodes: 1722
> Edges: 20602 (excluding 27418 duplicated)

I made similar observations in analyzing a Web-Application running on
Tomcat. In my case, the problem was not only the very long runtime of
the analyzing step, but also its memory consumption. It eats more then
1.6GB of data, finally failing with OOME.

One part of the problem is a poor implementation of a hash function in
class com.enea.jcarder.analyzer.LockEdge

You can try to replace the current implementation with the following:

public int hashCode() {
final int prime = 31;
int result = 1;
result = prime + mSource.getLockId();
result = prime * result + mSourceContextId;
result = prime * result + mTarget.getLockId();
result = prime * result + mTargetContextId;
result = prime * result
+ (int) (mThreadId ^ (mThreadId >>> 32));
return result;
}

Regards,
Henning

Todd Lipcon

unread,
Oct 7, 2009, 4:16:40 PM10/7/09
to jca...@googlegroups.com
Hi Henning,

I actually did make that fix yesterday (well, a very similar one), and it still ran for quite some time. In the end I actually purposefully "broke" the algorithm so that it didn't return all cycles. This yielded a couple potential deadlocks. Once I fixed those, I unbroke the algorithm and it now finishes its analysis in a matter of seconds.

I've also implemented cycle removal in the case of "gated locks". For example:

Thread A:
synchronized (gate) {
  synchronized (a) {
    synchronized (b) {
    ...
    }
  }
}

Thread B:
synchronized (gate) {
  synchronized (b) {
    synchronized (a) {
    ...
    }
  }
}

jcarder previously noted this as a potential deadlock, but since the "gate" lock precedes the b<->a cycle, it's not. In order to do this, I had to change around the instrumentation a fair amount.

I'm not a mercurial user, but you can find my changes on github: http://github.com/toddlipcon/jcarder/tree/cloudera
There are a couple other hacks in there that I did that you may or may not want to pull. If you're interested in the changes I can take some time to clean them up into more coherent patches.

Thanks
-Todd

DrScott

unread,
Oct 8, 2009, 7:31:42 AM10/8/09
to JCarder
Hi Todd

On Oct 7, 10:16 pm, Todd Lipcon <tlip...@gmail.com> wrote:
> I actually did make that fix yesterday (well, a very similar one), and it
> still ran for quite some time. In the end I actually purposefully "broke"
> the algorithm so that it didn't return all cycles. This yielded a couple
> potential deadlocks. Once I fixed those, I unbroke the algorithm and it now
> finishes its analysis in a matter of seconds.

I implemented a filter for "monitors of interesst" which solved that
runtime/memory problem for me. I will post the changes within the next
days.

> I've also implemented cycle removal in the case of "gated locks".

This is great!

> In order to do this, I had to
> change around the instrumentation a fair amount.

You mean that you now have to "listen" for MonitorExit, too?

> I'm not a mercurial user, but you can find my changes on github:http://github.com/toddlipcon/jcarder/tree/cloudera
> There are a couple other hacks in there that I did that you may or may not
> want to pull. If you're interested in the changes I can take some time to
> clean them up into more coherent patches.

Yes, i think you added some features which are usefull for others,
too. It would be waste if they don't get into the official release.
Just my opinion ;-)

Ulrik Svensson

unread,
Oct 8, 2009, 4:10:18 PM10/8/09
to jca...@googlegroups.com, Todd Lipcon
Hello Todd!

I would be glad to include your good work with "gated locks" in the
official releases!

As you found out, the agent in the current JCarder implementation are
not logging events for MONITOR EXIT and therefore the analyzer can not
differentiate between

synchronized(gate) {
synchronized(lock2) {
synchronized(lock3) { }
}
synchronized(lock3) {
synchronized(lock2) { }
}
}

and:

synchronized(gate) {
synchronized(lock2) {
synchronized(lock3) { }
}
}
synchronized(lock3) {
synchronized(lock2) { }
}

This is not a problem for the current algorithm but is needed for
finding gated locks.

As I said, It would be perfect to include your work in the official
JCarder releases, but there are some issues that I would like to
discuss before:


A. The runtime performance of the Agent was decreased a little bit
when also logging MONITOR EXIT, but not as much as I suspected. One
benchmark[*] indicated that a program analyzed with JCarder was slowed
down about 5 times instead of about 4 times as with the latest
official release.


B. The maximum length of a java method is 65536 bytes. If we don't
instrument more than necessarily we minimize the risk of exceeding
that limit.


C. The runtime performance is probably decreased a little due to the
logging of additional MONITOR ENTER events even for monitors already
entered. The latest official release avoid this because I think it is
common with programs like:

public synchronized void foo() {
...
bar();
}

public synchronized void bar() {
...
}


D. The gated locks analysis (during the analysis phase, not during
agent runtime) is using an additional data structure that requires
more memory.


E. Maybe there are some unusual cases where it happens to be a gated
lock in the unit tests or module test setup, but not in the real
application? Then there might be a potential deadlock in the real
application that is not found. Some people might also argue that lock
cycles are a sign of bad design even if there are no potential
deadlocks?

I thinking about:


1. One performance bottleneck of the JCarder agent is the writing of
events and contexts to the database files. If we are going to log
MONITOR EXIT events I see no reason for storing:

* Anything more than lock id and thread id for MONITOR EXIT events.

* lastTakenLockId or lastTakenLockingContextId for MONITOR ENTER

Am I right? This might minimize the impact of issue A and also issue B
because we don't have to add any context string parameters when
instrumenting MONITOR EXIT.


2. Avoid logging MONITOR ENTER for locks that already are held. We
should be able to call Thread.currentThread().holdslock() in both the
callback for MONITOR ENTER and for MONITOR EXIT and only log the event
if that method returns false. For this to work we must instrument the
code to call the beforeMonitorEnter callback BEFORE the lock is taken
and the afterMonitorExit callback AFTER the lock has been released.
This should solve issue C.


3. A parameter for enabling or disabling the gated lock algorithm
during the analyze. I don't think it needs to be configurable for the
runtime agent. This should solve issue E.


4. One way to handle issue D could be to avoid always storing the
extra information needed for the gated analysis in each node/edge and
instead parse the event database file again if, and only if, cycles
are found, and in that second phase only store the information needed
for finding gates for the already found cycles, not for all
nodes/edges.


What are your thoughts about all this?


[*] The benchmark that I've been using when trying to optimize the
runtime performance of the JCarder agent is the 2006-10 release of
DaCapo ( http://dacapobench.org/ ) with it's xalan-based benchmark. If
you have ideas about more suitable benchmarks, please tell!

/Ulrik

Ulrik Svensson

unread,
Oct 8, 2009, 4:14:57 PM10/8/09
to jca...@googlegroups.com
Hello Henning!

On Thu, Oct 8, 2009 at 1:31 PM, DrScott <news...@gmx.de> wrote:

> I implemented a filter for "monitors of interesst" which solved that
> runtime/memory problem for me. I will post the changes within the next
> days.

A filter like that sounds interesting, I'm looking forward to your post. :-)

/Ulrik

Ulrik Svensson

unread,
Oct 9, 2009, 3:51:24 AM10/9/09
to JCarder, Todd Lipcon
Hello again!

I'm correcting myself:

On Oct 8, 10:10 pm, Ulrik Svensson <ulri...@gmail.com> wrote:

>    synchronized(gate) {
>        synchronized(lock2) {
>           synchronized(lock3) { }
>        }
>        synchronized(lock3) {
>           synchronized(lock2) { }
>        }
>    }

Results in the events:

gate -> lock2
lock2 -> lock3
gate -> lock3
lock3 -> lock2


> and:
>
>    synchronized(gate) {
>        synchronized(lock2) {
>           synchronized(lock3) { }
>        }
>    }
>    synchronized(lock3) {
>       synchronized(lock2) { }
>    }
>

Results in the events:

gate -> lock2
lock2 -> lock3
lock3 -> lock2

The events are different and therefore the analyzer should be able to
differ between the two cases above.

Now I start believe it is possible to find gated locks also without
logging MONITOR EXIT events. Or am I missing something? Can any of you
provide a scenario where it is not possible?

/Ulrik

Todd Lipcon

unread,
Oct 9, 2009, 1:37:31 PM10/9/09
to Ulrik Svensson, jca...@googlegroups.com
Sorry - was that in previous testing or did you try out the code from my github repository? I did have to add MONITOR_EXIT logging, but in the process also removed the code which has to iterate over held locks to check if they're still held. I can try a benchmark of my repository if the results above are from some testing you did earlier.
 
Regarding performance, what are the goals of jcarder? In my view, no one would want to run a production system on top of it, so we should concern ourselves more with correctness and thorough analysis than minimizing runtime impact. Granted, if there are two equivalent methods and one is faster, we should choose the faster one :)



B. The maximum length of a java method is 65536 bytes. If we don't
instrument more than necessarily we minimize the risk of exceeding
that limit.

Well, I think this is a bit unfair - while adding more instrumentation makes the bug slightly more possible, the fact is the bug already exists. Is there any instrumentation trick to break a method into two? It seems like this would be pretty common for people using asm.
 


C. The runtime performance is probably decreased a little due to the
logging of additional MONITOR ENTER events even for monitors already
entered. The latest official release avoid this because I think it is
common with programs like:

     public synchronized void foo() {
        ...
        bar();
     }

     public synchronized void bar() {
        ...
     }

True - it logs them, but now doesn't need to look into a table of already-acquired locks, since that work is deferred to analysis time. I guess benchmarks will tell.
 


D. The gated locks analysis (during the analysis phase, not during
agent runtime) is using an additional data structure that requires
more memory.

Yep. As you mentioned it might be a good idea to do an option for this.
 


E. Maybe there are some unusual cases where it happens to be a gated
lock in the unit tests or module test setup, but not in the real
application? Then there might be a potential deadlock in the real
application that is not found. Some people might also argue that lock
cycles are a sign of bad design even if there are no potential
deadlocks?

This is a good point. Perhaps it would be nice to do output like:

Removing cycle gated by lock L_12345 (<lock context probably obviously inside JUnit>)
Removing cycle gated by lock L_12345 (<lock context>)
Removing cycle gated by lock L_12345 (<lock context>)
Removed 3 gated locks.

And then allow the user to specify flags of the form --ignore-gate=L_12345

 



I thinking about:


1. One performance bottleneck of the JCarder agent is the writing of
events and contexts to the database files. If we are going to log
MONITOR EXIT events I see no reason for storing:

 * Anything more than lock id and thread id for MONITOR EXIT events.

True, the context is currently used but not necessary. I can kill that.
 

 * lastTakenLockId or lastTakenLockingContextId for MONITOR ENTER

Am I right? This might minimize the impact of issue A and also issue B
because we don't have to add any context string parameters when
instrumenting MONITOR EXIT.

I did actually kill the lastTakenLockId completely for both enter and exit, since that's now available at analysis time.
 


2. Avoid logging MONITOR ENTER for locks that already are held. We
should be able to call Thread.currentThread().holdslock() in both the
callback for MONITOR ENTER and for MONITOR EXIT and only log the event
if that method returns false. For this to work we must instrument the
code to call the beforeMonitorEnter callback BEFORE the lock is taken
and the afterMonitorExit callback AFTER the lock has been released.
This should solve issue C.

This sounds correct. What's the performance impact of that function compared to just writing the 7 bytes to a buffered output stream?
 


3. A parameter for enabling or disabling the gated lock algorithm
during the analyze. I don't think it needs to be configurable for the
runtime agent. This should solve issue E.

Yep.
 


4. One way to handle issue D could be to avoid always storing the
extra information needed for the gated analysis in each node/edge and
instead parse the event database file again if, and only if, cycles
are found, and in that second phase only store the information needed
for finding gates for the already found cycles, not for all
nodes/edges.


What are your thoughts about all this?

Seems reasonable. If memory usage is also a concern, there are probably some more efficient datastructures that can be used. There are a lot of HashSets for sets that are likely to be very small - in these cases using an ArrayList is probably much more memory efficient, and since it fits in a single cache line probably faster as well. If memory usage is a problem I bet we can cut it by a significant fraction by careful use of data structures.

-Todd

Ulrik Svensson

unread,
Oct 11, 2009, 7:02:13 PM10/11/09
to JCarder, Todd Lipcon
Hello Todd!

On Oct 9, 7:37 pm, Todd Lipcon <tlip...@gmail.com> wrote:

> On Thu, Oct 8, 2009 at 1:10 PM, Ulrik Svensson <ulri...@gmail.com> wrote:
> > A. The runtime performance of the Agent was decreased a little bit
> > when also logging MONITOR EXIT, but not as much as I suspected. One
> > benchmark[*] indicated that a program analyzed with JCarder was slowed
> > down about 5 times instead of about 4 times as with the latest
> > official release.
>
> Sorry - was that in previous testing or did you try out the code from my
> github repository? I did have to add MONITOR_EXIT logging, but in the
> process also removed the code which has to iterate over held locks to check
> if they're still held. I can try a benchmark of my repository if the results
> above are from some testing you did earlier.

Now I did a new benchmark with DaCapo-9.10-beta0:

xalan without jcarder 5 sec
xalan with jcarder 1.0.1 13 sec (260%)
xalan with jcarder github 15 sec (300%)

lusearch without jcarder 8 sec
lusearch with jcarder 1.0.1 10 sec (125%)
lusearch with jcarder github 14 sec (175%)

tomcat without jcarder 11 sec
tomcat with jcarder 1.0.1 14 sec (127%)
tomcat with jcarder github 14 sec (127%)

(The numbers are averages of at least 6 repeated executions, excluding
the first "cache warmup execution", but unfortunately with some other
programs running in the background on the computer (my Laptop, Intel
Core 2 Duo) at the same time)

The performance probably varies between different applications and
depends a lot on how the application synchronizes, if the same
monitors are reused many times, if many locks are held at the same
time, if already held monitors are reentrant, the number of concurrent
threads, number of cores in the CPU, and so on. And I was using the
xalan benchmark when trying to optimize the 1.0.1 release so it is not
fair to compare only the xalan based benchmark.

Maybe it is possible to optimize a solution that logs MONITOR_EXIT so
it becomes as fast, or even faster than the current 1.0.1
implementation. It probably depends on which benchmark are used. ;-)


> Regarding performance, what are the goals of jcarder? In my view, no one
> would want to run a production system on top of it, so we should concern
> ourselves more with correctness and thorough analysis than minimizing
> runtime impact. Granted, if there are two equivalent methods and one is
> faster, we should choose the faster one :)

I agree that correctness is more important than performance.

But sometimes I am using JCarder to analyze large java GUI
applications by starting them with the JCarder Agent and then manually
clicking around in the GUIs and it becomes frustrating to manually use
the GUIs if they are not responsive. And also since I have put effort
into optimizing the performance of the runtime agent I feel bad about
doing changes that decreases the performance if it is not necessary.

> > B. The maximum length of a java method is 65536 bytes. If we don't
> > instrument more than necessarily we minimize the risk of exceeding
> > that limit.
>
> Well, I think this is a bit unfair - while adding more instrumentation makes
> the bug slightly more possible, the fact is the bug already exists. Is there
> any instrumentation trick to break a method into two? It seems like this
> would be pretty common for people using asm.

I don't know of any support from ASM for breaking one large method
into several smaller. If we write code for such functionality ourself,
I'm afraid the probability for introducing a new bug in that code is
larger than the probability for stumbling on such a large class.

One of my first priorities of JCarder is to avoid complexity in the
instrumentation since it is very difficult to test the instrumentation
for all possible inputs / classes.

There are some tests for the instrumentation here:

test/com/enea/jcarder/agent/instrument/
TestDeadLockInstrumentation.java
test/com/enea/jcarder/agent/TestMonitorEventListener.java

but those tests are not complete and are a little bit outdated (but
are still working!) since they were written a long time ago when the
cycle analysis were done in runtime and not as a separate step.

> > C. The runtime performance is probably decreased a little due to the
> > logging of additional MONITOR ENTER events even for monitors already
> > entered. The latest official release avoid this because I think it is
> > common with programs like:
>
> >      public synchronized void foo() {
> >         ...
> >         bar();
> >      }
>
> >      public synchronized void bar() {
> >         ...
> >      }
>
> True - it logs them, but now doesn't need to look into a table of
> already-acquired locks, since that work is deferred to analysis time. I
> guess benchmarks will tell.

If I remember correctly the major bottle neck was file I/O. But yes,
benchmarks will tell.

> > E. Maybe there are some unusual cases where it happens to be a gated
> > lock in the unit tests or module test setup, but not in the real
> > application? Then there might be a potential deadlock in the real
> > application that is not found. Some people might also argue that lock
> > cycles are a sign of bad design even if there are no potential
> > deadlocks?
>
> This is a good point. Perhaps it would be nice to do output like:
>
> Removing cycle gated by lock L_12345 (<lock context probably obviously
> inside JUnit>)
> Removing cycle gated by lock L_12345 (<lock context>)
> Removing cycle gated by lock L_12345 (<lock context>)
> Removed 3 gated locks.
>
> And then allow the user to specify flags of the form --ignore-gate=L_12345

Yes, such flag would be very powerful when running a analyze manually.

Maybe we also need a solution that can be used when running JCarder in
automatic regression tests? In such tests the id of the lock (L_12345)
would probably change for each execution?

> > 2. Avoid logging MONITOR ENTER for locks that already are held. We
> > should be able to call Thread.currentThread().holdslock() in both the
> > callback for MONITOR ENTER and for MONITOR EXIT and only log the event
> > if that method returns false. For this to work we must instrument the
> > code to call the beforeMonitorEnter callback BEFORE the lock is taken
> > and the afterMonitorExit callback AFTER the lock has been released.
> > This should solve issue C.
>
> This sounds correct. What's the performance impact of that function compared
> to just writing the 7 bytes to a buffered output stream?

That function can be called by any thread anytime without holding any
internal JCarder locks that would block concurrently executing
threads. And only one thread at a time can write to the buffered
output stream. And the alternative is not just writing 7 bytes to a
buffered output stream, but also accessing LockingContextIdCache and
LockIdGenerator which also needs synchronization.

I guess it would be possible to write a benchmark where the
performance would be improved significantly by using
Thread.currentThread().holdslock() on a multi core CPU.

> > 4. One way to handle issue D could be to avoid always storing the
> > extra information needed for the gated analysis in each node/edge and
> > instead parse the event database file again if, and only if, cycles
> > are found, and in that second phase only store the information needed
> > for finding gates for the already found cycles, not for all
> > nodes/edges.
>
> > What are your thoughts about all this?
>
> Seems reasonable. If memory usage is also a concern, there are probably some
> more efficient datastructures that can be used. There are a lot of HashSets
> for sets that are likely to be very small - in these cases using an
> ArrayList is probably much more memory efficient, and since it fits in a
> single cache line probably faster as well. If memory usage is a problem I
> bet we can cut it by a significant fraction by careful use of data
> structures.

Are we going to optimize for the worst case or the average case? And
what is the worst case? A lot of nodes, a lot of edges, a lot of
cycles?

One can argue that the performance and memory usage is already good
enough for the average case but both You and Henning has reported
memory and performance problems for some specific cases? Maybe we need
to focus on optimization for those cases?

As I wrote, I have put some effort in optimizing the runtime agent,
but no effort at all in optimizing the analysis step. It is probably
room for much improvements in that part.

/Ulrik

Ulrik Svensson

unread,
Oct 11, 2009, 7:07:50 PM10/11/09
to JCarder
Hello Henning!

Are you thinking about implementating a filter for "monitors of
interest" in the instrumentation step, during runtime or in the
analysing step? They are not mutually exclusive but I think there are
pros and cons with all the three alternatives.

/Ulrik


On Oct 8, 10:14 pm, Ulrik Svensson <ulri...@gmail.com> wrote:
> Hello Henning!
>

Henning Moll

unread,
Oct 13, 2009, 9:47:02 PM10/13/09
to jca...@googlegroups.com
Hi Ulrik,

On Montag, 12. Oktober 2009, you wrote:
> Are you thinking about implementating a filter for "monitors of
> interest" in the instrumentation step, during runtime or in the
> analysing step? They are not mutually exclusive but I think there are
> pros and cons with all the three alternatives.

My implementation works during runtime, so there are no differences in
the instrumentation. This is also due to the fact of absolute absence
of any knowledge in this area... ;-)

For 'unwanted' monitors i just return as far as possible
from 'beforeMonitorEnter'.

I am currently on holiday for about a week. I will post my changes if i
find the time for some 'clean up'. So it may happen not before the end
of my vacation... ;-)

Regards,
Henning

Henning Moll

unread,
Oct 13, 2009, 9:51:49 PM10/13/09
to jca...@googlegroups.com
On Montag, 12. Oktober 2009, Ulrik Svensson wrote:
> As I wrote, I have put some effort in optimizing the runtime agent,
> but no effort at all in optimizing the analysis step. It is probably
> room for much improvements in that part.

The memory problem i stumbled upon is, that im my case there are many
(!) duplicate cycles, which are reduced not before all cycles have been
collected. A quick test which checks for 'AlikeCycles' also during
cycle detection seems to solve this problem (99% of cycles removed).
But i am not sure if there are situations were it is wrong to do this.
And then there is the impact on execution time, which i did not pay
attention to. Again a candidate for introducing an option?

So i am a little bit afraid of introducing yet another structure in
class Cycle, Todd ;-)

Regards,
Henning

Ulrik Svensson

unread,
Oct 17, 2009, 7:31:36 AM10/17/09
to JCarder
Hello Henning!

I hope you enjoy your vacation! :-)

In order to avoid instrumenting certain classes, make the method
com.enea.jcarder.agent.instrument.ClassTransformer.instrument return
null for those classes.

In order to avoid instrumenting specific methods, make the method
com.enea.jcarder.agent.instrument.ClassAdapter.visitMethod call
super.visitMethod directy, without doing anything more for those
methods.

The two instrumentation alternatives above should be straight forward
to use as long as the filtering only needs to be done on basis of the
name of the package, class or method. I don't think there are any
apparent performance problems using regular expressions for matching
the names during the instrumentation.

A benefit of filter during the instrumentation, instead of during
runtime, is that there would be no performance penalty in runtime,
except while loading the classes.

I think the challenge would be to find a convenient way for the user
to configure the filter.

/Ulrik

Henning Moll

unread,
Oct 18, 2009, 3:50:48 PM10/18/09
to jca...@googlegroups.com
On Samstag, 17. Oktober 2009, Ulrik Svensson wrote:
> I hope you enjoy your vacation! :-)
Well, yes but it's over in a few hours... :-(

> In order to avoid instrumenting certain classes, make the method
> com.enea.jcarder.agent.instrument.ClassTransformer.instrument return
> null for those classes.

This sounds interesting, but as far as i understand, this filters for
classes/methods where no monitor objects are of interrest at all. My
intention is to filter out some kind of monitor objects, regardless of
where they are used. E.g.:

class A {

public func() {
Object monitor = getMonitor();
synchronized (monitor) {
...
}

ConcreteMonitor cMonitor = ...
synchronized (cMonior) {
...
}
}
}

With your suggested change during the instrumentation it would be
possible to ignore any synchronization within class A resp. A.func()
completely.

What i want to achieve is to filter out only synchronizations on objects
of type "ConcreteMonitor". Do you agree that this is something that
could not be done during instrumentation? In the above code example,
the concrete objects returned by "getMonitor()" can only be checked
during runtime...

I am not sure whether this can be done with instrumentation. I don't
have the neccessary insight of the current implementation, but i think
jCarder's agent is triggered by instrumented code within the class
which uses a monitor and not from the monitor instance itself, right?
Can you help?

Anyway, your suggested filter is also handy.

Regards,
Henning

Ulrik Svensson

unread,
Oct 19, 2009, 3:34:33 PM10/19/09
to JCarder
On Oct 18, 9:50 pm, Henning Moll <newsSc...@gmx.de> wrote:

> What i want to achieve is to filter out only synchronizations on objects
> of type "ConcreteMonitor". Do you agree that this is something that
> could not be done during instrumentation? In the above code example,
> the concrete objects returned by "getMonitor()" can only be checked
> during runtime...

Yes, you are right, in the general case such filter can not be applied
until runtime.

> I am not sure whether this can be done with instrumentation. I don't
> have the neccessary insight of the current implementation, but i think
> jCarder's agent is triggered by instrumented code within the class
> which uses a monitor and not from the monitor instance itself, right?

Yes, you are right!

/Ulrik

Henning Moll

unread,
Oct 21, 2009, 6:22:12 PM10/21/09
to jca...@googlegroups.com, Ulrik Svensson
Hi Ulrik,

Finally i found some time to finish my work on jcarder. Please find
attached a patch for the latest version on hg. I'm not sure how the
mailinglist handles attachments, so i send it also directly to you.

The patch includes three changes:

a) filter for class level handling of certain monitors
b) include filter for monitors
c) a better hashcode function for LockEdge

Both filters are configured with a comma separated list
of "expressions". These expressions are simply matched
by "String.startsWith". So the monitor instance of class

com.bla.fasel.Foo

is matched by each of

com.,com.bla.,com.bla.fasel.,com.bla.fasel.Foo

This model has some limitations, but i think it's sufficient in most
cases. The results of the filters ares cached.

Both filter types are defined via a system property:

-Djcarder.classLevel
-Djcarder.include

Example:

-Djcarder.classLevel=com.bla.fasel.Foo,com.foo.

This will handle instances of classes 'com.bla.fasel.Foo'
and 'com.bla.fasel.Fool' as well as all every class below 'com.foo.**'.

a) class level:

Monitors for which their class name matches one of the filter
expressions are not handled individually but on class level. The real
synchronization is still done on the individuals, but not the cycle
detection.

b) include filter:

Only monitors for which their class name matches one of the filter
expressions are analyzed by jCarder.
If the option is missing, every monitor is analyzed.

I am very interessted in the results of your benchmarks.

Best regards,
Henning

patch.txt

Ulrik Svensson

unread,
Nov 2, 2009, 2:02:22 PM11/2/09
to JCarder
Thank you very much Henning for your patch and contribution to
JCarder!

I'm sorry for not answering until now. Your patch is now in the
official hg-repository at http://hg.jcarder.org/jcarder/. I was using
your gmx.de email address as author when creating mercurial change
sets for your patch. I hope that was ok.

I also run two different benchmarks without setting any -D properties.
In one of the benchmarks the performance with your patch was on par
with the latest stable release. In another benchmark it even seemed to
be a little bit faster than the last stable release (still without any
activated filters)! It was a surprise for me. After I added a
synchronization block to the filter code that I believe was missing,
it was on par in both the benchmarks.

Ideally, I would like to see the filter code separated from
EventListener.java, into a separate class but that separation would
probably be easier to achieve if we can purge the enteredMonitors data
structure and instead log MONITOR EXIT events, like Todd suggested.
But there are both pros and cons with such a change and I have not
made up my mind yet.

/Ulrik
>  patch.txt
> 7KViewDownload

Todd Lipcon

unread,
Nov 2, 2009, 2:10:13 PM11/2/09
to jca...@googlegroups.com
On Mon, Nov 2, 2009 at 11:02 AM, Ulrik Svensson <ulr...@gmail.com> wrote:

Thank you very much Henning for your patch and contribution to
JCarder!

I'm sorry for not answering until now. Your patch is now in the
official hg-repository at http://hg.jcarder.org/jcarder/. I was using
your gmx.de email address as author when creating mercurial change
sets for your patch. I hope that was ok.

I also run two different benchmarks without setting any -D properties.
In one of the benchmarks the performance with your patch was on par
with the latest stable release. In another benchmark it even seemed to
be a little bit faster than the last stable release (still without any
activated filters)! It was a surprise for me. After I added a
synchronization block to the filter code that I believe was missing,
it was on par in both the benchmarks.

Ideally, I would like to see the filter code separated from
EventListener.java, into a separate class but that separation would
probably be easier to achieve if we can purge the enteredMonitors data
structure and instead log MONITOR EXIT events, like Todd suggested.
But there are both pros and cons with such a change and I have not
made up my mind yet.

I spent a few hours last week trying to get the MONITOR_EXIT logging up to speed. No matter what tricks I tried to pull, I couldn't get it to be as fast as the current release. I'm still of the opinion that I'd rather have the extra features than the speed for my use cases, though :)

-Todd

Todd Lipcon

unread,
Nov 13, 2009, 6:17:12 PM11/13/09
to jca...@googlegroups.com
On Mon, Nov 2, 2009 at 11:10 AM, Todd Lipcon <tli...@gmail.com> wrote:
On Mon, Nov 2, 2009 at 11:02 AM, Ulrik Svensson <ulr...@gmail.com> wrote:

Thank you very much Henning for your patch and contribution to
JCarder!

I'm sorry for not answering until now. Your patch is now in the
official hg-repository at http://hg.jcarder.org/jcarder/. I was using
your gmx.de email address as author when creating mercurial change
sets for your patch. I hope that was ok.

I also run two different benchmarks without setting any -D properties.
In one of the benchmarks the performance with your patch was on par
with the latest stable release. In another benchmark it even seemed to
be a little bit faster than the last stable release (still without any
activated filters)! It was a surprise for me. After I added a
synchronization block to the filter code that I believe was missing,
it was on par in both the benchmarks.

Ideally, I would like to see the filter code separated from
EventListener.java, into a separate class but that separation would
probably be easier to achieve if we can purge the enteredMonitors data
structure and instead log MONITOR EXIT events, like Todd suggested.
But there are both pros and cons with such a change and I have not
made up my mind yet.

I spent a few hours last week trying to get the MONITOR_EXIT logging up to speed. No matter what tricks I tried to pull, I couldn't get it to be as fast as the current release. I'm still of the opinion that I'd rather have the extra features than the speed for my use cases, though :)

Sorry to bump this back into your collective inboxes, but I'm still interested to hear what other people think about getting the MONITOR_EXIT logging back into trunk.

For my use case, jcarder is seomwhat useless without it, since it catches so many false positives. Granted, the cyclic lock ordering gated by a higher level lock is bad style and a code smell, but I'm dealing with a 300Kloc project. I understand there may be an idea how to do the same functionality without logging monitor exits, but the algorithm wasn't very clear to me.

Thanks
-Todd
 

Henning Moll

unread,
Nov 15, 2009, 5:37:34 AM11/15/09
to jca...@googlegroups.com
Hi Todd,

On Samstag, 14. November 2009, Todd Lipcon wrote:
> Sorry to bump this back into your collective inboxes, but I'm still
> interested to hear what other people think about getting the
> MONITOR_EXIT logging back into trunk.

Well, i am interested in that, too.

> For my use case, jcarder is seomwhat useless without it, since it
> catches so many false positives. Granted, the cyclic lock ordering
> gated by a higher level lock is bad style and a code smell, but I'm
> dealing with a 300Kloc project. I understand there may be an idea
> how to do the same functionality without logging monitor exits, but
> the algorithm wasn't very clear to me.

Sometimes, code is getting too complex. Gating monitors are an
alternative which may reduce this complexity. There are always pros and
cons... ;-)

Are there changes to keep both possibilities (current algorithm, Todd's
algorithm) side by side?

Regards,
Henning

Todd Lipcon

unread,
Nov 15, 2009, 11:53:27 AM11/15/09
to jca...@googlegroups.com
On Sun, Nov 15, 2009 at 2:37 AM, Henning Moll <news...@gmx.de> wrote:

Hi Todd,

On Samstag, 14. November 2009, Todd Lipcon wrote:
> Sorry to bump this back into your collective inboxes, but I'm still
> interested to hear what other people think about getting the
>  MONITOR_EXIT logging back into trunk.

Well, i am interested in that, too.

> For my use case, jcarder is seomwhat useless without it, since it
>  catches so many false positives. Granted, the cyclic lock ordering
>  gated by a higher level lock is bad style and a code smell, but I'm
>  dealing with a 300Kloc project. I understand there may be an idea
>  how to do the same functionality without logging monitor exits, but
>  the algorithm wasn't very clear to me.

Sometimes, code is getting too complex. Gating monitors are an
alternative which may reduce this complexity. There are always pros and
cons... ;-)
Agreed

Are there changes to keep both possibilities (current algorithm, Todd's
algorithm) side by side?


Increased code complexity? :)
 
Regards,
Henning



Ulrik Svensson

unread,
Nov 17, 2009, 1:45:08 PM11/17/09
to JCarder
Hello Todd!

After some thought (sorry for the delay) I decided that I'm willing to
accept a patch with logging of MONITOR EXIT events even if the runtime
performance would be degraded a little bit.

I still believe it would be possible to find gating monitors even
without logging MONITOR EXIT, but it would require more complex logic
and your argument, about focusing on correctness and and thorough
analysis instead of minimizing runtime impact, is good!

The unit tests would have to be updated, especially the tests in
com.enea.jcarder.agent.instrument.TestDeadLockInstrumentation when not
only MONITOR ENTER, but also MONITOR EXIT are logged. I also guess
that some (but not all!) of the unit tests in
com.enea.jcarder.agent.TestMonitorEventListener would be obsolete if
ThreadLocalEnteredMonitors isn't used anymore.

/Ulrik

Todd Lipcon

unread,
Nov 17, 2009, 1:46:32 PM11/17/09
to jca...@googlegroups.com
Hi Ulrik,

Thanks! I'll look into the unit tests this week and make sure that the code is well tested and that they pass. I'll let you know when I have a tree you can pull.

Did git work out OK for seeing my work last time? Or would you prefer a simple patch file?

-Todd

--

You received this message because you are subscribed to the Google Groups "JCarder" group.
To post to this group, send email to jca...@googlegroups.com.
To unsubscribe from this group, send email to jcarder+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jcarder?hl=.



Ulrik Svensson

unread,
Nov 18, 2009, 3:32:37 PM11/18/09
to JCarder
Perfect Todd!

Mercurial change-sets would be most convenient for me, but I can
probably also pull changes from your git repository if you prefer
that.

/Ulrik


On Nov 17, 7:46 pm, Todd Lipcon <tlip...@gmail.com> wrote:
> Hi Ulrik,
>
> Thanks! I'll look into the unit tests this week and make sure that the code
> is well tested and that they pass. I'll let you know when I have a tree you
> can pull.
>
> Did git work out OK for seeing my work last time? Or would you prefer a
> simple patch file?
>
> -Todd
>
> On Tue, Nov 17, 2009 at 10:45 AM, Ulrik Svensson <ulri...@gmail.com> wrote:
> > Hello Todd!
>
> > After some thought (sorry for the delay) I decided that I'm willing to
> > accept a patch with logging of MONITOR EXIT events even if the runtime
> > performance would be degraded a little bit.
>
> > I still believe it would be possible to find gating monitors even
> > without logging MONITOR EXIT, but it would require more complex logic
> > and your argument, about focusing on correctness and and thorough
> > analysis instead of minimizing runtime impact, is good!
>
> > The unit tests would have to be updated, especially the tests in
> > com.enea.jcarder.agent.instrument.TestDeadLockInstrumentation when not
> > only MONITOR ENTER, but also MONITOR EXIT are logged. I also guess
> > that some (but not all!) of the unit tests in
> > com.enea.jcarder.agent.TestMonitorEventListener would be obsolete if
> > ThreadLocalEnteredMonitors isn't used anymore.
>
> > /Ulrik
>
> > --
>
> > You received this message because you are subscribed to the Google Groups
> > "JCarder" group.
> > To post to this group, send email to jca...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > jcarder+u...@googlegroups.com<jcarder%2Bunsu...@googlegroups.com>
> > .

Todd Lipcon

unread,
Dec 28, 2009, 7:43:23 PM12/28/09
to jca...@googlegroups.com
Hi Ulrik,

Sorry this took so long, but finally got everything into shape:

http://github.com/toddlipcon/jcarder/tree/cloudera

I did manage to merge in the newest tree from hg and resolve those
conflicts. Unit tests still pass, and I tried the dining philosophers
example with success, but haven't used it for a real workload since
then, so it's worth considering "experimental".

If you like, I can also re-run the benchmarks with this newest
version. There was certainly a speed hit since it records locks both
entering and exiting, but for my use case there are way too many false
positives without.

Thanks!

-Todd

> To unsubscribe from this group, send email to jcarder+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jcarder?hl=en.
>
>
>

Todd Lipcon

unread,
Jan 27, 2010, 5:33:04 PM1/27/10
to jca...@googlegroups.com
Hey guys,

Any word on this? Do you need any further work from me? I'd love to
see this contributed upstream.

Thanks
-Todd

Ulrik Svensson

unread,
Feb 2, 2010, 3:08:44 PM2/2/10
to JCarder
Support for analyzing gated locks has finally been brought into the
official JCarder repository at: http://hg.jcarder.org/jcarder

Thank you very much for your excellent work Todd and thanks for
contributing your changes including all the test cases!

It was surprisingly convenient to pull your change sets from your git
repository into hg with the Hg-Git Mercurial Plugin. I included all
your change sets except the last commit to allow jcarder options as
arguments, because I would like to keep command line arguments that
might be used by any java aplication being tested, as separate from
options to the jcarder agent as possible, in order to minimize the
risk of interference between them. Therefore I prefer using only java
properties with a jcarder prefix for jcarder options and not ordinary
java command line arguments.

Once again, thank you very much Todd and I'm sorry it took me so long
to get your contribution into the main repository.

/Ulrik


On Jan 27, 11:33 pm, Todd Lipcon <tlip...@gmail.com> wrote:
> Hey guys,
>
> Any word on this? Do you need any further work from me? I'd love to
> see this contributed upstream.
>
> Thanks
> -Todd
>

> On Mon, Dec 28, 2009 at 4:43 PM, Todd Lipcon <tlip...@gmail.com> wrote:
> > Hi Ulrik,
>
> > Sorry this took so long, but finally got everything into shape:
>
> >http://github.com/toddlipcon/jcarder/tree/cloudera
>
> > I did manage to merge in the newest tree from hg and resolve those
> > conflicts. Unit tests still pass, and I tried the dining philosophers
> > example with success, but haven't used it for a real workload since
> > then, so it's worth considering "experimental".
>
> > If you like, I can also re-run the benchmarks with this newest
> > version. There was certainly a speed hit since it records locks both
> > entering and exiting, but for my use case there are way too many false
> > positives without.
>
> > Thanks!
>
> > -Todd
>

Todd Lipcon

unread,
Feb 2, 2010, 3:18:36 PM2/2/10
to jca...@googlegroups.com
Hey Ulrik,

Great news, and thanks a lot for your work integrating! I also found
the hg-git mercurial plugin really useful from the git side.

Regarding the last changeset, note that the "command line arguments"
there aren't actually Java command line arguments. Rather, they're the
javaagent arguments. The change allows me to do the following:

JAVA_TOOL_OPTIONS="-javaagent:/path/to/jcarder.jar=outputdir=/foo/jc-@TIME"
ant run-tests

the JAVA_TOOL_OPTIONS environment will get passed through to any
forked jvms underneath ant. This is how I'm able to run jcarder on the
full suite of unit tests for Hadoop, which is my primary use case.

Does the patch make sense given this explanation?

Thanks
-Todd

Todd Lipcon

unread,
Feb 17, 2010, 3:06:45 AM2/17/10
to jca...@googlegroups.com
Hi JCarderers!

Any thoughts on this below? I've written up some instructions on the
Hadoop Wiki for using JCarder with Hadoop:
http://wiki.apache.org/hadoop/HowToUseJCarder and unfortunately I have
to point people to my github repository for now rather than the
JCarder repo itself, since this feature is missing.

In other good news, I've been able to identify a bunch of really
tricky bugs in Hadoop using JCarder. So thanks again for this very
useful tool.

-Todd

Ulrik Svensson

unread,
Feb 21, 2010, 10:06:21 AM2/21/10
to JCarder
Hi Tod!

Ok, I see. You are right, the javaagent arguments make sense. I pushed
your change set to the main repository.

I’m glad JCarder was useful for identifying those tricky bugs in
Hadoop!

/Ulrik


On 17 Feb, 09:06, Todd Lipcon <tlip...@gmail.com> wrote:
> Hi JCarderers!
>
> Any thoughts on this below? I've written up some instructions on the

> Hadoop Wiki for using JCarder with Hadoop:http://wiki.apache.org/hadoop/HowToUseJCarderand unfortunately I have


> to point people to my github repository for now rather than the
> JCarder repo itself, since this feature is missing.
>
> In other good news, I've been able to identify a bunch of really
> tricky bugs in Hadoop using JCarder. So thanks again for this very
> useful tool.
>
> -Todd
>
>
>
> On Tue, Feb 2, 2010 at 12:18 PM, Todd Lipcon <tlip...@gmail.com> wrote:
> > Hey Ulrik,
>
> > Great news, and thanks a lot for your work integrating! I also found
> > the hg-git mercurial plugin really useful from the git side.
>
> > Regarding the last changeset, note that the "command line arguments"
> > there aren't actually Java command line arguments. Rather, they're the
> > javaagent arguments. The change allows me to do the following:
>
> > JAVA_TOOL_OPTIONS="-javaagent:/path/to/jcarder.jar=outputdir=/foo/jc-@TIME"
> > ant run-tests
>
> > the JAVA_TOOL_OPTIONS environment will get passed through to any
> > forked jvms underneath ant. This is how I'm able to run jcarder on the
> > full suite of unit tests for Hadoop, which is my primary use case.
>
> > Does the patch make sense given this explanation?
>
> > Thanks
> > -Todd
>

> >> For more options, visit this group athttp://groups.google.com/group/jcarder?hl=en.- Dölj citerad text -
>
> - Visa citerad text -

Reply all
Reply to author
Forward
0 new messages