"SEVERE : Error calculating coverage. Process will exit."

Esko Luontola

unread,

Apr 10, 2012, 3:39:37 AM4/10/12

to pitu...@googlegroups.com

Recently I've been getting the following failure on one of our CI servers (Linux). On my desktop (Windows) there are no problems, and also on our other CI servers (Linux) it works. When that error happens, the build stops and waits indefinitely.

I'm guessing that some test in that module is problematic, or then it's a combination of some environmental issues. Running mutationCoverage for other (smaller) modules works fine - the problematic module has the most tests and code. Also the problematic CI server is slower than our other CI servers.

Do you have ideas that what the problem could be? Could it be a timing issue? How to debug it and find the real reason, because the console output shows only an "Connection refused" exception and nothing else?

[INFO] --- pitest-maven:0.26:mutationCoverage (default) @ batch-scheduler ---

10:20:58 AM PIT >> INFO : Mutating from .../batch-scheduler/target/classes

Running report with ReportOptions [isValid=, classesInScope=[^fi\.nls\.batch\..*$], targetClasses=[^fi\.nls\.batch\..*$], reportDir=.../batch-scheduler/target/pit-reports, sourceDirs=[.../batch-scheduler/src/main/java, .../batch-scheduler/src/test/java], classPathElements=[...], mutators=[INVERT_NEGS_MUTATOR, RETURN_VALS_MUTATOR, MATH_MUTATOR, VOID_METHOD_CALL_MUTATOR, NEGATE_CONDITIONALS_MUTATOR, CONDITIONALS_BOUNDARY_MUTATOR, INCREMENTS_MUTATOR, NON_VOID_METHOD_CALL_MUTATOR, CONSTRUCTOR_CALL_MUTATOR, EXPERIMENTAL_INLINE_CONSTANT_MUTATOR, INLINE_CONSTANT_MUTATOR, EXPERIMENTAL_MEMBER_VARIABLE_MUTATOR], dependencyAnalysisMaxDistance=-1, mutateStaticInitializers=true, showHelp=, jvmArgs=[], numberOfThreads=2, timeoutFactor=1.25, timeoutConstant=3000, targetTests=[], loggingClasses=[]]

10:21:02 AM PIT >> INFO : Found 74 classes that might define tests

stderr : 10:21:03 AM PIT >> SEVERE : Error calculating coverage. Process will exit.

java.net.ConnectException: Connection refused

at java.net.PlainSocketImpl.socketConnect(Native Method)

at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)

at java.net

stderr : .PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)

at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)

at java.net.Socket.connect(Socket.java:525)

at java.net.Socket.conne

stderr : ct(Socket.java:475)

at java.net.Socket.<init>(Socket.java:372)

at java.net.Socket.<init>(Socket.java:186)

at org.pitest.coverage.execute.CoverageSlave.main(CoverageSlave.java:61)

java.net.ConnectException: Connection refused

at java.net.PlainSocketImpl

stderr : .socketConnect(Native Method)

at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)

at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)

at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)

at java.net.SocksSocketIm

stderr : pl.connect(SocksSocketImpl.java:366)

at java.net.Socket.connect(Socket.java:525)

at java.net.Socket.connect(Socket.java:475)

at java.net.Socket.<init>(Socket.java:372)

at java.net.Socket.<init>(Socket.java:186)

at org.pitest.coverage.execute.CoverageS

stderr : lave.main(CoverageSlave.java:61)

Esko Luontola

unread,

Apr 10, 2012, 4:13:05 AM4/10/12

to pitu...@googlegroups.com

The problem has now happened at least once also in another module, which contains no tests. The server is the same as above.

11:07:14 AM PIT >> INFO : Found 0 classes that might define tests
stderr  : 11:07:15 AM PIT >> SEVERE : Error calculating coverage. Process will exit.
java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
	at java.net

henry

unread,

Apr 10, 2012, 7:28:11 AM4/10/12

to pitu...@googlegroups.com

Hi Esko,

I take it the problem is intermittent on the machines it occurs on?

If so I think your guess of a timing issue is probably correct - looking
at the code it launches the child process before creating the socket it
will listen on.

The error is occurring before any of your tests are run - as soon as the
child process is launched, so the only possible candidates are
environmental problems unrelated to your code or an issue in PIT itself.
If it's environmental the only likely cause that comes to mind would be
a firewall or similar running on the machine.

I'm aware of an issue in this area of the code - after finding a socket
to listen on, PIT releases the socket before creating the listening
process. So it is possible that another process could grab it before the
child is launched. I don't think that this would give the sypmtoms you
are seeing however.

I'll take a quick look now . . .

Henry

henry

unread,

Apr 10, 2012, 7:41:47 AM4/10/12

to pitu...@googlegroups.com

Whether or not it's the cause of your issue, there is definitely a
potential timing problem here.

I'll publish a snapshot with a small change that may help in a couple of
minutes (note this snapshot will also remove the inscopeclasses
parameter).

I'll look at a robust fix later this week.

Cheers

Henry

henry

unread,

Apr 12, 2012, 4:21:33 PM4/12/12

to pitu...@googlegroups.com

Hi Esko,

I pushed out a snapshot with what I believe should be a robust fix for the identified race last night. Does this resolve things for you?

Esko Luontola

unread,

Apr 13, 2012, 8:37:26 AM4/13/12

to pitu...@googlegroups.com

On Thursday, 12 April 2012 23:21:33 UTC+3, henry wrote:

I pushed out a snapshot with what I believe should be a robust fix for the identified race last night. Does this resolve things for you?

The same problem has not yet appeared, but I'm getting quite many exceptions like these. Is this dangerous?

2:07:48 PM PIT >> INFO : Running mutation populateItem : 34 -> removed call to org/apache/wicket/AttributeModifier::<init> (4)

2:07:48 PM PIT >> INFO : 25 relevant test for populateItem

stderr : 2:07:49 PM PIT >> INFO : Running 1 units

java.io.EOFException

at java.io.DataInputStream.readByte(DataInputStream.java:250)

at org.pitest.util.SafeDataInputStream.readByte(SafeDataInputStream.java:65)

at org.pitest.util.SocketReadingCallable.receiveResults(SocketReadingCallable.java:66)

at org.pitest.util.SocketReadingCallable.call(SocketReadingCallable.java:36)

at org.pitest.util.SocketReadingCallable.call(SocketReadingCallable.java:1)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at java.lang.Thread.run(Thread.java:619)

java.util.concurrent.ExecutionException: org.pitest.PitError: null

Please copy and paste the information and the complete stacktrace below when reporting an issue

VM : Java HotSpot(TM) 64-Bit Server VM

Vendor : Sun Microsystems Inc.

Version : 16.0-b13

Uptime : 1120308

Input ->

1 : -XX:MaxPermSize=512m

2 : -Dclassworlds.conf=/home/hudson/tools/maven-3.0.2/bin/m2.conf

3 : -Dmaven.home=/home/hudson/tools/maven-3.0.2

BootClassPathSupported : true

at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)

at java.util.concurrent.FutureTask.get(FutureTask.java:83)

at org.pitest.util.CommunicationThread.waitToFinish(CommunicationThread.java:55)

at org.pitest.mutationtest.execute.MutationTestProcess.waitToDie(MutationTestProcess.java:49)

at org.pitest.mutationtest.instrument.MutationTestUnit.waitForSlaveToDie(MutationTestUnit.java:164)

at org.pitest.mutationtest.instrument.MutationTestUnit.runTestInSeperateProcessForMutationRange(MutationTestUnit.java:136)

at org.pitest.mutationtest.instrument.MutationTestUnit.runTestsInSeperateProcess(MutationTestUnit.java:219)

at org.pitest.mutationtest.instrument.MutationTestUnit.runTestsForMutations(MutationTestUnit.java:112)

at org.pitest.mutationtest.instrument.MutationTestUnit.runTests(MutationTestUnit.java:93)

at org.pitest.mutationtest.instrument.MutationTestUnit.execute(MutationTestUnit.java:82)

at org.pitest.internal.TestUnitExecutor.run(TestUnitExecutor.java:40)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)

Caused by: org.pitest.PitError: null

Please copy and paste the information and the complete stacktrace below when reporting an issue

VM : Java HotSpot(TM) 64-Bit Server VM

Vendor : Sun Microsystems Inc.

Version : 16.0-b13

Uptime : 1120308

Input ->

1 : -XX:MaxPermSize=512m

2 : -Dclassworlds.conf=/home/hudson/tools/maven-3.0.2/bin/m2.conf

3 : -Dmaven.home=/home/hudson/tools/maven-3.0.2

BootClassPathSupported : true

at org.pitest.util.Unchecked.translateCheckedException(Unchecked.java:22)

at org.pitest.util.SafeDataInputStream.readByte(SafeDataInputStream.java:68)

at org.pitest.util.SocketReadingCallable.receiveResults(SocketReadingCallable.java:66)

at org.pitest.util.SocketReadingCallable.call(SocketReadingCallable.java:36)

at org.pitest.util.SocketReadingCallable.call(SocketReadingCallable.java:1)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

... 1 more

Caused by: java.io.EOFException

at java.io.DataInputStream.readByte(DataInputStream.java:250)

at org.pitest.util.SafeDataInputStream.readByte(SafeDataInputStream.java:65)

... 6 more

2:07:51 PM PIT >> WARNING : Slave encountered error or timeout

henry

unread,

Apr 13, 2012, 9:00:53 AM4/13/12

to pitu...@googlegroups.com

If you are only seeing these errors after a mutation has timed out, then this may be the issue I was about to investigate before I picked up the race. This was introduced in 0.26 when the way that the slave processes are terminated was changed - a thread in the main process is sometimes still trying to read from a socket that has now been terminated by the death of the child.

If it is the same issue then it is noisy, but harmless. The signature in the log is different from what I was about to look at, but the code in this area has now changed.

I'll take a look this weekend to confirm.

Thanks

Henry

henry

unread,

Apr 13, 2012, 4:56:15 PM4/13/12

to pitu...@googlegroups.com

Looking at this again I think it is something different.

It looks like the child process is dieing unexpectedly. I don't suppose there are any System.exit calls in the code?

Are you able to send through a complete log? Is this an open source code base?

Thanks

Henry

Esko Luontola

unread,

Apr 13, 2012, 5:51:21 PM4/13/12

to pitu...@googlegroups.com

On Friday, 13 April 2012 23:56:15 UTC+3, henry wrote:

Are you able to send through a complete log? Is this an open source code base?

I'll mail the log to you privately on Monday. It's closed source.

Esko Luontola

unread,

Apr 16, 2012, 9:05:53 AM4/16/12

to pitu...@googlegroups.com

I think I found why the build is crawling at very slow speeds - something leaves behind multiple PIT Stream Monitor threads which hog the CPU. I've attached a screenshot and thread dump. I ran that build using only one PIT thread.

Normally the PIT Stream Monitor threads exit when they are not anymore used, as can be seen from the bottom of the sceenshot, but sometimes the monitor threads don't exit. You can see multiple such threads in the top and near the bottom of the sceenshot. When I took this screenshot and thread dump, that process was hogging 100% of my 8 core CPU.

stale pit stream monitors.png

stale pit stream monitors.txt

Esko Luontola

unread,

Apr 16, 2012, 9:48:30 AM4/16/12

to pitu...@googlegroups.com

I kept an eye on the threads and the console output, and was able to find out that the Connection reset exception is related to the birth of those CPU hogging threads.

I've attached a screenshot showing three instances of those PIT Stream Monitor threads continuing running even after the pit communication thread stops. I've also attached one of the stack traces which is was printed when that happened. I'll mail the full log to you privately.

birth of hogger threads - part 1.txt

birth of hogger threads.png

Esko Luontola

unread,

Apr 16, 2012, 12:59:34 PM4/16/12

to pitu...@googlegroups.com

Continuing debugging... I've found out that the 100% CPU usage happens inside the native method java.io.FileInputStream.readBytes(). I've posted a question at Stack Overflow for this: http://stackoverflow.com/questions/10178162/infinite-100-cpu-usage-at-java-io-fileinputstream-readbytesnative-method

henry

unread,

Apr 17, 2012, 8:27:16 AM4/17/12

to pitu...@googlegroups.com

Ok, I have now managed to replicate the 100% CPU problem by forcing a child JVM crash using

        Object[] o = null;

        while (true) {
            o = new Object[] {o};
        }

henry

unread,

Apr 17, 2012, 8:38:22 AM4/17/12

to pitu...@googlegroups.com

Checking if the stream is available before reading appears to fix the 100% cpu issue. I'm not sure if this will be robust however - what happens if the child process crashes while actually reading data?

henry

unread,

Apr 17, 2012, 8:56:45 AM4/17/12

to pitu...@googlegroups.com

A new snapshot has now been pushed out with a change to how the streams
are read from the slave processes.

This appears to resolve the 100% cpu issue, but the fix may have some
performance implications. It does not address the underlying issue of
the child JVM crashing.

T Teswar

unread,

Jun 14, 2019, 1:24:35 PM6/14/19

to PIT Users

Hi Esko,

I'm facing the same problem. My build slaves have 6 Virtual processors but when running pitest the CPU utilization shows 100% and it is exited with the error "Minion exited abnormally due to MEMORY_ERROR". I have 32GB memory in machines. I keep threads as 4. Is it correct or Do I need to add any other parameters?

Reply all

Reply to author

Forward