How are you?
The attached code hangs quite a bit with the 1.3 jdk when jvmpi is used. It works fine on the Solaris 1.3 jdk. With the jit enabled it hangs much less often. I have never got it to run to completion (should take around 30 seconds) with java -Djava.compiler=NONE and have got it to complete once using java_g -Djava.compiler=NONE (I must have tried at least 20 times).
Here is how I am running the class:
/usr/java/IBMJava2-13/bin/java -Djava.compiler=NONE -Xrunhprof:cpu=samples,depth=6 gcing2
One significant point is the runhprof option. Without this it works. As I am sure you are aware, runhprof invokes some jvmpi profiling that dumps information to a file (by default). I notice that this profiling does not work properly when the jit is enabled. The jit seems to effectively cause some of the profiling not to be done so another question is: is it a documented limitation that jvmpi does not work properly with the jit enabled?
I have been unsuccessful in producing a javacore file for the hang - javacore hangs and CTRL-C and CTRL-Z won't get me out - something to do with gcing maybe (obviously whoever put javacore into the 1.3 jdk didn't test it properly! ;-). I have also tried gdb but even with java_g it is giving me no information on the threads (I'm using gdb v5.0).
I have attached a javacore file from another app running on the same system so you can see my setup.
cheers,
Allan.
-- Allan Boyd
al...@volantis.com Volantis Systems www.volantis.com tel: +44 (0) 1344 631828
> Hi Neil,
>
> How are you?
>
I'm fine, how are you?
>
> The attached code hangs quite a bit with the 1.3 jdk when jvmpi is
> used. It works fine on the Solaris 1.3 jdk. With the jit enabled it
> hangs much less often. I have never got it to run to completion
> (should take around 30 seconds) with java -Djava.compiler=NONE and have
> got it to complete once using java_g -Djava.compiler=NONE (I must have
> tried at least 20 times).
>
> Here is how I am running the class:
> /usr/java/IBMJava2-13/bin/java -Djava.compiler=NONE
> -Xrunhprof:cpu=samples,depth=6 gcing2
>
> One significant point is the runhprof option. Without this it works.
> As I am sure you are aware, runhprof invokes some jvmpi profiling that
> dumps information to a file (by default). I notice that this profiling
> does not work properly when the jit is enabled. The jit seems to
> effectively cause some of the profiling not to be done so another
> question is: is it a documented limitation that jvmpi does not work
> properly with the jit enabled?
>
I have not seen any such document, so I guess it's supposed to work.
>
> I have been unsuccessful in producing a javacore file for the hang -
> javacore hangs and CTRL-C and CTRL-Z won't get me out - something to
> do with gcing maybe (obviously whoever put javacore into the 1.3 jdk
> didn't test it properly! ;-).
Probably the same programmer who, just before he left to join Volantis,
reset the SEGV signal handler to give a javacore, ignoring the fact that
the MMI uses SEGVs to detect NULL objects, causing all our test cases to
fail. ;-)
> I have also tried gdb but even with java_g it is giving me no
> information on the threads (I'm using gdb v5.0).
All the threads end up suspended. It could be that jvmpi puts some extra
stress on the threading and signalling processing which has been causing
us problems. A lot of development work has been going on in this area, so
I'll run your testcase on the JVM when this work is completed.
Neil
Thanks for looking into that. I will look forward to the updated threading model.
I received an e-mail from one of the JProbe engineers (you may have heard of these guys since we had to do stuff in the JDKs so that their product would work). JProbe uses jvmpi. He said that the jit causes performance analysis problems because it does method inlining (hmm, maybe this can be disabled). FYI: JProbe tell me that they cannot support their product on Solaris because HotSpot breaks jvmpi even more than the IBM jit - at least you can switch the jit off!
As for SIGSEGV: I explained what the problem was to Tom before I left but I ran out of time. Anyway I knew you would be up to the task of fixing it - well done! ;-)
cheers,
Allan.
-- Allan Boyd Volantis Systems
> Neil,
>
> Thanks for looking into that. I will look forward to the updated
> threading model.
>
> I received an e-mail from one of the JProbe engineers (you may have
> heard of these guys since we had to do stuff in the JDKs so that their
> product would work). JProbe uses jvmpi. He said that the jit causes
> performance analysis problems because it does method inlining (hmm,
> maybe this can be disabled). FYI: JProbe tell me that they cannot
> support their product on Solaris because HotSpot breaks jvmpi even
> more than the IBM jit - at least you can switch the jit off!
>
> As for SIGSEGV: I explained what the problem was to Tom before I left
> but I ran out of time. Anyway I knew you would be up to the task of
> fixing it - well done! ;-)
Use export JITC_COMPILEOPT=NINLINING to disable method inlining.
Neil
--
(note: I don't speak for IBM,
they don't speak for me;
it's better that way. )
Neil, it sounds like you boys are busy banging on some Thread related
problems in the Linux JDK1.3 code. Does that mean I should not submit
my test cases for two fully repeatable bugs with the 11/24 build
relating to threads? These bugs are:
1) An issue that calling interrupt() on a thread object does NOT
interrupt the thread. The test case works great in all other JVMs,
except for IBM JVM JDK1.3.
2) The issue I sent you a while back relating to repeated XMLRPC calls
using multiple threads. The entire IBM JVM deadlocks, and all threads
(even threads doing nothing related to XMLRPC) freeze dead.
Should I not bother submitting these test cases until I can run them
myself with the new Threads code? When is that code going to be
released, and can/should I grab a pre-release version to see if the new
code resolves the two issues?
Unfortunately with these two bugs in the IBM JVM we can't really use
the IBM JVM for any production stuff, since it would crash every couple
of hours. The IBM JVM seems to run about twice as fast as Sun's JDK1.3
with HotSpot. IBM's done some really super work here.
Thanks,
-Joe
Joe,
All test cases are gratefully received. I have used your XMLRPC testcase
to verify that the threading has improved in the latest internal delivery.
It also highlighted a problem when trace is switched on which is being
worked on.
I'll let you have a pre-release if I can, but there are practical difficulties;
we aim to have a short turnaround once a product has entered testing,
and these threading changes have been integrated into the Service code
line at the last minute. There won't be much time to deliver a pre-release,
get your feedback and make any changes. The target refresh cycles of
6-8 weeks also make it difficult to delay a product without impacting
other delivery dates. I'll see what I can do though, because it is a good time
to raise problems while Development are still interested in the problem :-).
Thanks for your kind comments.
Please don't everybody ask for a pre-release - the real one will be along
soon!
Regards,
Neil Masson
IBM Java Technology Centre
Okay, I'm attaching a second test case. This one has to do with the
interrupt() signal not being received by threads. I've put a writeup at
the beginning of the java file. It works on other JVMs. It doesn't
matter if the JIT is on or off.
> I'll let you have a pre-release if I can, but there are practical difficulties;
> we aim to have a short turnaround once a product has entered testing,
> and these threading changes have been integrated into the Service code
> line at the last minute. There won't be much time to deliver a pre-release,
> get your feedback and make any changes. The target refresh cycles of
> 6-8 weeks also make it difficult to delay a product without impacting
> other delivery dates. I'll see what I can do though, because it is a good time
> to raise problems while Development are still interested in the problem :-).
Okay, send me an email if the pre-release works out, otherwise I'll hit
it next time around. I've got a suite of tests I bang a JVM against
before I allow it to make it to one of our production boxes. Most of
the stuff is internal, and (like the XMLRPC thing) a pain in the butt to
turn into a unit test.
-Joe
1. You should use while(!isInterrupted()) in your while loop. The 'try'
block catches only interrupts which are fired while the program is in the
'wait' call. I managed (but just once) to get a "non-busting" "java Worker
1" run to bust because of this problem.
2. When you start your program with "java Worker 0", you assume that all
threads are running before an interrupt signal occurs. This need not be
the case. With many threads started, it WILL not be the case. With
"java Worker 1" you wait until all threads are started and the program
functions properly.
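
A minimal sketch of the loop shape suggested in point 1, assuming a current JVM. The class name PollingWorker, its demo method and the timeout are hypothetical and are not part of the original Worker test case. The loop re-checks the interrupted status before every wait(), and the catch block handles an interrupt that arrives while the thread is waiting.

```java
// Sketch only: PollingWorker and demo() are illustrative names, not from the original test case.
class PollingWorker extends Thread {
    private volatile boolean exitedRunLoop = false;

    @Override
    public void run() {
        try {
            // Re-check the interrupted status on every pass through the loop.
            while (!isInterrupted()) {
                synchronized (this) {
                    wait(); // blocks until notified or interrupted
                }
            }
        } catch (InterruptedException ie) {
            // Interrupted while waiting: fall through and let the thread end.
        }
        exitedRunLoop = true;
    }

    /** Starts one worker, interrupts it, and reports whether it ended within the timeout. */
    static boolean demo(long timeoutMillis) {
        PollingWorker w = new PollingWorker();
        w.start();
        w.interrupt();
        try {
            w.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !w.isAlive() && w.exitedRunLoop;
    }

    public static void main(String[] args) {
        System.out.println("worker exited: " + demo(5000));
    }
}
```

The join() with a timeout is there so that a JVM which loses the interrupt shows up as a false result rather than as a hung test.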
Greetings,
Mark Scheffer.
"Joe Kislo" <ki...@athenium.com> wrote in message
news:3A75B122...@athenium.com...
--------------------------------------------------------------------------------
>
> /**
>  * Test case for IBM JDK1.3 Thread signaling failure under Linux.
>  *
>  * Threads don't seem to be signaled properly when interrupt() is called
>  * on them. As a result, you can interrupt a thread, but it will fail to
>  * actually interrupt. The symptom of this is when your application
>  * attempts to terminate, there will be unterminated threads, and the JVM
>  * will not quit.
>  *
>  * I noticed that if all of the threads are in the wait state (and not on
>  * the ready_queue), they appear to be signaled properly.
>  *
>  * If you run:
>  *   java Worker 0
>  *
>  * You will see that the IBM JVM does not terminate at the end of the
>  * test. You will also see that one or more of the threads did not print
>  * "Terminating!" to the screen. These are the threads which did not die,
>  * even though they were interrupted. You will notice that the threads
>  * which do not terminate *DID* print "Starting!" to the screen, meaning
>  * they *ARE* inside the exception handler for the interrupt signal.
>  * -- You might need to run this test a couple of times before it will
>  * happen.
>  *
>  * If you run:
>  *   java Worker 1
>  *
>  * You will see that the IBM JVM -does- function properly. This shows
>  * that the IBM JVM Thread signaling code isn't a total loss :)
>  * If you examine the code you will see why this mode does not crash the
>  * IBM_JVM (because all the threads should have safely made it to the
>  * wait(); line).
>  *
>  * If you run this test on any other JVM, everything works dandy.
>  *
>  * Email me if you need any help, ki...@athenium.com
>  *
>  * @author <a href="mailto:ki...@athenium.com">Joe Kislo</a>
>  * @version
>  */
>
> import java.util.*;
>
>
> public class Worker extends Thread {
>
>     private String workerID;
>     private boolean ready=false;
>
>     public Worker (String workerID){
>         this.workerID=workerID;
>     }
>
>     public boolean getReady() {
>         return ready;
>     }
>
>     public String getWorkerID() {
>         return workerID;
>     }
>
>     public void poke() {
>         synchronized(this) {
>             notifyAll();
>         }
>     }
>
>     public void run() {
>         try {
>             System.out.println(workerID+": Starting!");
>
>             while (true) {
>                 synchronized(this) {
>                     ready=true;
>                     wait();
>                 }
>                 System.out.println(workerID+": Ouch!");
>             }
>         } catch (InterruptedException ie) {
>             System.out.println(workerID+": Terminating!");
>         }
>     }
>     static public void main(String[] str) {
>         if (str.length==0) {
>             System.out.println("Usage: java Worker [0|1]");
>             System.out.println("");
>             System.out.println("State 0 busts IBM_JVM");
>             System.out.println("State 1 proves IBM_JVM Threads signal properly when they are not in the ready_queue");
>             System.exit(-1);
>         }
>         boolean bust_jvm = str[0].equals("0");
>         int numThreads=20;
>         Vector theThreads = new Vector();
>         System.out.println("Making threads!");
>         for (int i=0;i<numThreads;i++) {
>             Worker w = new Worker(Integer.toString(i));
>             theThreads.addElement(w);
>             w.start();
>         }
>         System.out.println("Poking threads!");
>         for (Enumeration e = theThreads.elements();e.hasMoreElements();) {
>             Worker w = (Worker) e.nextElement();
>             if (!bust_jvm) {
>                 while (!w.getReady()){}
>             }
>             w.poke();
>         }
>         System.out.println("Interrupting threads!");
>
>         for (Enumeration e = theThreads.elements();e.hasMoreElements();) {
>             ((Worker)e.nextElement()).interrupt();
>         }
>         System.out.println("I should terminate now");
>     }
> }// Worker
>
>
>
>
Hmm, I don't think there's a bug in the example. I will admit it's bad
code, but it still illustrates my point. When you run Worker 0, you
will get:
20 threads say "Starting!",
19 threads say "Terminating!"
It's that thread that didn't terminate which is the problem. So let's
figure out what it's doing. It said "Starting!", and has NOT said
"Ouch!", which means it is either BEFORE or INSIDE the wait(). Let's
take the case where it is INSIDE the wait(). If it is inside the wait,
it should throw the InterruptedException when interrupted, and say
"Terminating!". It does not. I think we agree here.
What I think we don't agree on is if the thread is NOT in the wait()
yet. So let's say it printed "Starting!" then yielded. It gets
interrupted. Then it waits. Should wait() at that point, since
the thread is already interrupted, immediately throw the
InterruptedException, or wait until the thread is interrupted again? I
don't have the JLS in front of me, so I'll have to do this by simply
seeing what happens in a JVM. If wait() only throws an
InterruptedException when the thread is interrupted during the wait(),
then:
Thread.currentThread().interrupt();
wait();
should ALWAYS HALT indefinitely (assuming nothing else interrupts
it).
However, attached is a quick piece of java code, which shows that if you
interrupt a thread, then wait(), wait throws the InterruptedException,
even though the thread was interrupted outside the wait().
So that ultimately closes the second possibility in the Worker example.
If the thread was interrupted BEFORE wait() was executed, it *still*
should have thrown an InterruptedException and terminated. Yet it did
not. Perhaps there is a race condition in your wait(): maybe it's
checking the current interrupted state, then pulling itself off the
ready queue, when that action actually needs to be atomic.
Lemme know if you have any other questions... My little worker example
illustrates a problem I have with almost all my thread pooled
applications. Except the weird thing is, usually all the workers in my
pool *ARE* in the wait()... Yet they still fail to terminate. And,
of course, all this works just fine and dandy on any other JVM.
And as for the threads not being started yet, since they have all
printed "Starting!", I know they are started. Yes there is a
possibility that they might not have started, and in a real application
there would need to be a test. But since each thread prints to the
screen when it starts, we know they're all started before the interrupt
signal comes.
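
For comparison, here is a rough sketch of how a test could know that every worker is inside wait() before interrupting, without a busy loop. It assumes Java 8 or later, and the names ReadyHandshake, workerBody and demo are hypothetical. Each worker increments a counter under the same lock it then waits on, so once the main thread holds that lock and sees the full count, no worker can still be between the increment and the wait().

```java
// Sketch only: ReadyHandshake is an illustrative name, not part of the original Worker test.
class ReadyHandshake {
    private final Object lock = new Object();
    private int waiting = 0;     // guarded by lock
    private int terminated = 0;  // guarded by lock

    private void workerBody() {
        synchronized (lock) {
            waiting++;
            lock.notifyAll(); // let the main thread re-check the counter
            try {
                while (true) {
                    lock.wait(); // releases the lock while waiting
                }
            } catch (InterruptedException ie) {
                terminated++;
            }
        }
    }

    /** Runs numThreads workers, interrupts them all, and returns how many terminated. */
    static int demo(int numThreads) {
        ReadyHandshake h = new ReadyHandshake();
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(h::workerBody);
            workers[i].start();
        }
        try {
            synchronized (h.lock) {
                // Holding the lock with a full count means every worker is inside wait().
                while (h.waiting < numThreads) {
                    h.lock.wait();
                }
            }
            for (Thread t : workers) {
                t.interrupt();
            }
            for (Thread t : workers) {
                t.join(5000);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (h.lock) {
            return h.terminated;
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated workers: " + demo(20));
    }
}
```

On a JVM that delivers interrupts correctly, the returned count should equal the number of workers; a lower count would point at the kind of lost interrupt described above.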
-Joe
--------------------------------------------------------------------------------
> public class Tester {
>
>     public Tester() {
>         Thread.currentThread().interrupt();
>         synchronized(this) {
>             try {
>                 wait();
>             } catch (InterruptedException ie) {
>                 System.out.println("Wait terminated due to interrupt");
>             }
>         }
>     }
>     static final public void main(String[] s) {
>         new Tester();
>     }
> }
>
Correct, it is a definite BUG. I'll give it one more try and then put my foot in
my mouth.
I tested your Worker example with only one Worker and the program
occasionally hangs. With some extra debug messages, I think I found the
reason why: the interrupted status flag does not seem to be volatile. The
debug messages show that the thread is waiting for a lock in the
"synchronized(this)" statement, it then gets an interrupt (isInterrupted()
returns true for the interrupting thread), but then when the lock is
acquired, isInterrupted() returns FALSE.
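
The scenario described here can be isolated in a small check. This is only a sketch, assuming Java 8 or later (it uses a lambda and Thread.getState(), neither of which exists in the 1.3 JDKs discussed); the class and method names are hypothetical. A thread is interrupted while it is blocked entering a synchronized block, and it then reports its interrupted status once it owns the lock.

```java
// Sketch only: InterruptWhileBlocked is an illustrative name, not the debug code used above.
class InterruptWhileBlocked {

    /** Returns the interrupted status seen by a thread that was interrupted while blocked on a monitor. */
    static boolean flagSurvivesMonitorEntry() {
        final Object lock = new Object();
        final boolean[] seen = new boolean[1];

        Thread t = new Thread(() -> {
            synchronized (lock) { // blocks here until the driver releases the lock
                seen[0] = Thread.currentThread().isInterrupted();
            }
        });

        try {
            synchronized (lock) {
                t.start();
                // Wait until the new thread is blocked on a monitor; sleep() keeps the lock held.
                while (t.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
                t.interrupt(); // delivered before the thread can acquire the lock
            }
            t.join(); // join() also makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("driver thread was interrupted", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("interrupted status after acquiring lock: " + flagSurvivesMonitorEntry());
    }
}
```

Acquiring a monitor is not supposed to clear the interrupted status, so a false result would match the behaviour seen in the debug messages.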
Hope this helps,
Mark.
We've fixed your interrupt problem in Service Release 7 (available in April).