Pitest hangs forever if slave VM initialization fails


S J Kissane

Dec 26, 2012, 10:12:17 PM12/26/12
to pitu...@googlegroups.com
Hi

I have been running pitest 0.29 on a moderately sized code base (nightly, via a
Hudson ANT job), and recently the job has started to hang. (Hours later it is
still running, and has shown no sign of progress in that time.) The last
output before it hangs is:

[pitest] stderr  : 10:05:23 AM PIT >> FINE : mutating method XXX
[pitest]
[pitest] stderr  : 10:05:23 AM PIT >> FINE : 3 relevant test for XXX
[pitest]
[pitest] stderr  : 10:05:23 AM PIT >> FINE : replaced class with mutant in 13 ms
[pitest]
[pitest] stderr  : 10:05:23 AM PIT >> FINE : Running 1 units
[pitest]
[pitest] stderr  : 10:05:23 AM PIT >> FINE : Finished
[pitest]
[pitest] stderr  : 10:05:23 AM PIT >> FINE : Mutation Mutation -> className=XXXX, mutator=org.pitest.mutationtest.engine.gregor.mutators.ReturnValsMutator detected = KILLED by XXX(XXX
[pitest] stderr  : XXX)
[pitest] 10:05:23 AM PIT >> FINE : processed mutation in 25 ms.
[pitest]
[pitest] stdout  : Error occurred during initialization of VM
[pitest] agent library failed to init: instrument
[pitest]
[pitest] stderr  : Error opening zip file or JAR manifest missing : /tmp/1356562714264015241973089113678414702599088100345.jar
[pitest]
[pitest] 10:05:23 AM PIT >> FINE : Mutation -> className=XXX, mutator=org.pitest.mutationtest.engine.gregor.mutators.ConditionalsBoundaryMutator KILLED by XXX(XXX)
[pitest] 10:05:23 AM PIT >> FINE : Mutation -> className=XXX, mutator=org.pitest.mutationtest.engine.gregor.mutators.VoidMethodCallMutator KILLED by XXX(XXX)
[pitest] 10:05:23 AM PIT >> FINE : Mutation -> className=XXX, mutator=org.pitest.mutationtest.engine.gregor.mutators.ReturnValsMutator KILLED by XXX(XXX)
[pitest] 10:05:23 AM PIT >> FINE : Mutation -> className=XXX, mutator=org.pitest.mutationtest.engine.gregor.mutators.ReturnValsMutator KILLED by XXX(XXX)
[pitest] 10:05:23 AM PIT >> FINE : Exit code was - OK
[pitest] 10:05:23 AM PIT >> FINE : Slave exited ok
[pitest] 10:05:23 AM PIT >> FINE : using port 8092

(I replaced references to the software being tested with XXX.)

Now at this point, the only processes left running, per jps output, are:
25652 MutationCoverageReport
31762 Jps
26284 LoopForever
26209 LoopForever
26328 LoopForever
25635 Launcher        [ANT script that launched pitest]

(The "LoopForever" processes are some dummy subprocesses the test suite creates
to test some subprocess management code. They get cleaned up by the time the
Hudson job finishes.)

A thread dump of the MutationCoverageReport process is included below. (This is
JDK 1.7.0_09 on Linux x64.)

What I think is happening is as follows:
1) The mutation coverage reporter launches the mutation slave, but the slave
   dies for some reason.
2) The reporter never detects that the slave has died, and thus hangs forever.

I guess these are separate problems; (2) is probably easier to diagnose and fix
than (1). In fact, reading MutationTestProcess.java, I see there is no code to
detect abnormal termination of the process: it waits for the communication
thread to finish, but does no monitoring of the slave process status. Maybe
there should be another thread that waits for the subprocess to finish, and
then interrupts the comms thread if it is still running?
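That watchdog idea could be sketched roughly as follows. SlaveWatchdog is a hypothetical name, not pitest code; note also that ServerSocket.accept() does not respond to Thread.interrupt(), so this sketch closes the socket instead, which makes the blocked accept() throw a SocketException and lets the comms thread exit:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch, not pitest's actual API: a daemon thread blocks on
// Process.waitFor(); when the slave dies it closes the server socket that
// the comms thread is blocked on, so the parent can fail fast instead of
// waiting forever in accept().
public class SlaveWatchdog {

    public static Thread watch(final Process slave, final ServerSocket socket) {
        Thread watchdog = new Thread(new Runnable() {
            public void run() {
                try {
                    slave.waitFor(); // blocks until the slave process exits
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                try {
                    // accept() ignores Thread.interrupt(), so close the
                    // socket: the blocked accept() then throws and the
                    // comms thread can detect the dead slave.
                    socket.close();
                } catch (IOException ignored) {
                }
            }
        });
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }
}
```

A timeout on the socket (setSoTimeout) would be a simpler, if cruder, alternative to a dedicated watchdog thread.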

Thanks for any help
Simon

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2012-12-27 13:34:05
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.5-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000000a04d800 nid=0x7c31 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"PIT Stream Monitor" daemon prio=10 tid=0x00002aaab8a71800 nid=0x7a45 waiting on condition [0x0000000041bb0000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.pitest.util.StreamMonitor.readFromStream(StreamMonitor.java:56)
        at org.pitest.util.StreamMonitor.run(StreamMonitor.java:45)

"PIT Stream Monitor" daemon prio=10 tid=0x00002aaab8a73000 nid=0x7a43 waiting on condition [0x0000000041dda000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.pitest.util.StreamMonitor.readFromStream(StreamMonitor.java:56)
        at org.pitest.util.StreamMonitor.run(StreamMonitor.java:45)

"pit communication" daemon prio=10 tid=0x00002aaab8a70800 nid=0x7a3f runnable [0x00000000426a3000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:522)
        at java.net.ServerSocket.accept(ServerSocket.java:490)
        at org.pitest.util.SocketReadingCallable.call(SocketReadingCallable.java:28)
        at org.pitest.util.SocketReadingCallable.call(SocketReadingCallable.java:13)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.lang.Thread.run(Thread.java:722)

"Thread-2" daemon prio=10 tid=0x000000000d6ef000 nid=0x649c waiting on condition [0x0000000040ca2000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007d8286880> (a java.util.concurrent.FutureTask$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:248)
        at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        at org.pitest.util.CommunicationThread.waitToFinish(CommunicationThread.java:60)
        at org.pitest.mutationtest.execute.MutationTestProcess.waitToDie(MutationTestProcess.java:49)
        at org.pitest.mutationtest.instrument.MutationTestUnit.waitForSlaveToDie(MutationTestUnit.java:157)
        at org.pitest.mutationtest.instrument.MutationTestUnit.runTestInSeperateProcessForMutationRange(MutationTestUnit.java:132)
        at org.pitest.mutationtest.instrument.MutationTestUnit.runTestsInSeperateProcess(MutationTestUnit.java:205)
        at org.pitest.mutationtest.instrument.MutationTestUnit.runTestsForMutations(MutationTestUnit.java:115)
        at org.pitest.mutationtest.instrument.MutationTestUnit.runTests(MutationTestUnit.java:94)
        at org.pitest.mutationtest.instrument.MutationTestUnit.execute(MutationTestUnit.java:83)
        at org.pitest.containers.UnContainer.submit(UnContainer.java:46)
        at org.pitest.Pitest$3.run(Pitest.java:148)
        at java.lang.Thread.run(Thread.java:722)

"Service Thread" daemon prio=10 tid=0x0000000009fc6800 nid=0x6442 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x0000000009fc4800 nid=0x6441 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x0000000009fb8800 nid=0x6440 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x0000000009fb6800 nid=0x643f runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0000000009f64000 nid=0x643e in Object.wait() [0x000000004229f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x0000000784cafc98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x0000000009f61800 nid=0x643d in Object.wait() [0x000000004219e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x0000000784caf818> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000009ed5000 nid=0x6435 in Object.wait() [0x0000000040709000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1266)
        - locked <0x0000000784d7c1a0> (a java.lang.Thread)
        at org.pitest.Pitest.processResultsFromQueue(Pitest.java:114)
        at org.pitest.Pitest.run(Pitest.java:77)
        at org.pitest.Pitest.run(Pitest.java:64)
        at org.pitest.mutationtest.MutationCoverage.runReport(MutationCoverage.java:164)
        at org.pitest.mutationtest.MutationCoverage.run(MutationCoverage.java:93)
        at org.pitest.mutationtest.MutationCoverageReport.runReport(MutationCoverageReport.java:95)
        at org.pitest.mutationtest.MutationCoverageReport.main(MutationCoverageReport.java:52)

"VM Thread" prio=10 tid=0x0000000009f5a000 nid=0x643c runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000009ee2800 nid=0x643a runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000009ee4800 nid=0x643b runnable

"VM Periodic Task Thread" prio=10 tid=0x0000000009fd1000 nid=0x6443 waiting on condition

JNI global references: 188


henry

Dec 27, 2012, 7:08:14 AM12/27/12
to pitu...@googlegroups.com
Hi Simon,

Thanks for the report.

I'll take a look at making the inter-process co-ordination a little more robust; the code in this area is in desperate need of a cleanup anyway. As for the underlying issue: for some reason the slave JVM is unable to read the temporary jar file that PIT generates. The log suggests that another JVM that PIT launched earlier was able to read it just fine.

Can you confirm whether the file /tmp/1356562714264015241973089113678414702599088100345.jar is still present? If it is not, can you confirm which distro you're running, and when it's set to clean files out of /tmp? Some distros clean temporary files via a cron job, so it's possible the jar was removed mid-run.

If it doesn't look like the temp file is being deleted, can you try running with the tests that launch those infinite loops disabled? I haven't got a coherent theory as to why these would cause a problem, but it would be good to be able to rule them out.

Thanks

Henry

S J Kissane

Feb 4, 2013, 7:13:58 AM2/4/13
to pitu...@googlegroups.com
Hi Henry

I worked it out in the end. The program being tested creates lots of temporary files. It tries to delete them, but on Windows it sometimes can't, because it or another process still has them open. As a result, I kept running out of disk space, and had countless thousands of files in my temporary directory (which took forever to delete). So I added some code which runs at start-up, scans java.io.tmpdir for old files, and deletes them. It knows which files it is safe to delete thanks to the particular naming conventions I use for temporary files. That solved the disk space problem. (The code is really only needed on Windows, but runs on all platforms anyway.)
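The cleanup code described above would look something like this. This is a hypothetical reconstruction: the class name, file prefix, and age threshold are all made-up values for illustration, not Simon's actual code.

```java
import java.io.File;

// Hypothetical sketch of a start-up temp-file cleanup: delete stale files
// in a directory that match our own naming convention. PREFIX and
// MAX_AGE_MS are assumed values, not from the original code base.
public class TempFileCleaner {

    private static final String PREFIX = "myapp-tmp-";            // assumed convention
    private static final long MAX_AGE_MS = 24L * 60 * 60 * 1000;  // one day

    // The "should this file be deleted?" test -- the check that a mutant
    // can turn into "always true", making the cleanup delete everything.
    static boolean shouldDelete(File f, long now) {
        return f.isFile()
            && f.getName().startsWith(PREFIX)
            && (now - f.lastModified()) > MAX_AGE_MS;
    }

    // Returns the number of files actually deleted.
    public static int cleanUp(File tmpDir) {
        File[] files = tmpDir.listFiles();
        if (files == null) {
            return 0; // not a directory, or an IO error
        }
        int deleted = 0;
        long now = System.currentTimeMillis();
        for (File f : files) {
            if (shouldDelete(f, now) && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Keeping the shouldDelete predicate in its own method makes the danger concrete: mutating that one boolean expression is exactly what turned a targeted cleanup into "delete everything in java.io.tmpdir".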

But here is what happened next: I had tests in my test suite which tested this functionality, and I was blindly pitesting my entire test suite. So pitest gets to these "delete old temp files" tests and starts to mutate my old-file deletion code. It mutates the "should this file be deleted?" check to always true, turning "delete only the files in java.io.tmpdir which match my naming conventions" into "delete all files in java.io.tmpdir". It then goes ahead and deletes the temp files that pitest itself needs (the agent JAR). Meanwhile, pitest is in the process of starting a slave. The slave JVM can't find the agent JAR, and exits. And then it hits the bug in pitest I mentioned, where it hangs if slave JVM startup fails.

So, what I need to do to fix this on my side: exclude the temp file cleanup tests from being run under pitest, and/or tell pitest not to mutate the classes/methods involved in temp file cleanup. Moral of the story: be careful about mutation testing any code which does dangerous things like deleting files. Who knows what the mutated code will end up deleting! I'm lucky it wasn't mutated into deleting something more important.
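In an ANT build like the one described above, such an exclusion might look roughly like the following. This is a hypothetical fragment: the class and method names are placeholders, and it assumes the pitest ANT task exposes the same excludedClasses/excludedMethods filters as the command-line options of that era; check the pitest documentation for the exact attribute names.

```xml
<!-- Hypothetical sketch: names are placeholders, and the attribute names
     assume the ANT task mirrors the command-line excludedClasses /
     excludedMethods options. -->
<pitest pitClasspath="pit.classpath"
        targetClasses="com.example.*"
        excludedClasses="com.example.io.TempFileCleaner*"
        excludedMethods="cleanUpOldTempFiles"
        reportDir="build/pit-reports"/>
```

Excluding the class from mutation is the safer of the two options: even if a cleanup test runs, no mutant of the deletion predicate will ever be loaded.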

(Maybe it would make sense for pitest to run the code under a security manager, to prevent the mutated code from doing anything truly dangerous, like deleting files it should not be deleting? One could tell the security manager "only allow modifications in these directories", and if mutated code tried to go outside those bounds, it would be blocked...)
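A minimal sketch of that suggestion on a Java 7-era JVM could look like the following. SandboxSecurityManager is a made-up name, a real sandbox would need to restrict far more than deletes, and SecurityManager is deprecated in recent JDKs:

```java
import java.io.File;
import java.security.Permission;

// Hypothetical sketch of the suggestion in the thread, not a pitest
// feature: a SecurityManager that only permits file deletes inside one
// allowed directory, so a mutant cannot delete files elsewhere.
public class SandboxSecurityManager extends SecurityManager {

    private final String allowedDir;

    public SandboxSecurityManager(File allowedDir) {
        // Normalise to an absolute path with a trailing separator so the
        // prefix check below cannot match sibling directories.
        this.allowedDir = allowedDir.getAbsolutePath() + File.separator;
    }

    @Override
    public void checkDelete(String file) {
        if (!new File(file).getAbsolutePath().startsWith(allowedDir)) {
            throw new SecurityException("mutant tried to delete " + file);
        }
    }

    @Override
    public void checkPermission(Permission perm) {
        // Allow everything else in this sketch; a real sandbox would be
        // far more restrictive (writes, exec, network, exit, ...).
    }
}
```

Installing it via System.setSecurityManager would then turn any out-of-bounds delete attempted by a mutant into a SecurityException, which the test harness sees as a killed mutation rather than lost files.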

Simon

Henry Coles

Feb 4, 2013, 8:23:16 AM2/4/13
to pitu...@googlegroups.com
Hi Simon,

Thanks for the update.

The security manager suggestion is a good one - I think the Apache Lucene team is doing something similar.

Ignoring the possibility of file system disaster, it's not a good idea to run tests that exercise code performing un-mocked file IO (or any other IO) through PIT. Any test exercising file IO is by definition not fully repeatable, so the mutation test results may be inaccurate. For example, if the disk is not full and no file is locked the first time PIT runs a test, but one of those conditions holds when PIT runs the test again with a mutation present, the mutation may be incorrectly flagged as killed.

Can you raise an issue for the security manager? I'll take a look when I get a chance to work on PIT again.

The other thing I have been considering for the next release is some sort of "test lint" phase to check the quality of tests prior to mutation testing. This would primarily look for order dependencies between tests, but could perhaps also detect and flag file system writes etc.

Thanks

Henry
