Executable task hangs even after it's done

53 views
Skip to first unread message

Christian List

unread,
Nov 11, 2009, 10:03:25 AM11/11/09
to ccnet...@googlegroups.com
We are experiencing an intermittent hang problem when we use the ExecutableTask to run a .bat file, both in the Task section and in the PreBuild section of the ccnet.config.
The last line in our .bat file is EXIT 0, and we can see in the log that this line is executed.
But the task just hangs there and eventually times out.

In the debugger we can see that the thread appears to be waiting in the ProcessExecutor.cs at this line:

hasExited = WaitHandle.WaitAll(new WaitHandle[] { errorStreamClosed, outputStreamClosed, processExited }, processInfo.TimeOut, true);

We are guessing that maybe one of the events errorStreamClosed, outputStreamClosed, or processExited is not happening correctly, but how do we tell if they happened or not?
As stated above, we are fairly certain that the processExited event happened, since we can see EXIT 0 in the log, but what about the other 2 events?

This error seems to happen more frequently when we have several tasks running at the same time.
We almost never see it when we run 2 tasks at the same time, but about every other time when we run 10 tasks at the same time.

We have tried this both on 1.4.4 and 1.5.0 and the problem is the same on both versions.

This problem is preventing us from rolling out CC within our organisation, so any help on this issue would be greatly appreciated.
 
 

David Cameron

unread,
Nov 16, 2009, 7:13:59 PM11/16/09
to ccnet...@googlegroups.com
Hi Christian

I believe the code to be threadsafe, but it seems you might have uncovered a problem by having several tasks running. I'll add some TRACE level logging to the events themselves to see which ones do and do not fire. Which version of CC.Net would you prefer to debug with, 1.4.4 or 1.5.0.

I can also try to reproduce this locally. Are all of the tasks you are using in parallel <exec> tasks, or is it a mix of different types?

Christian List

unread,
Nov 16, 2009, 8:01:57 PM11/16/09
to ccnet...@googlegroups.com
Hi Dave,

We'd prefer to debug 1.5.0 if you have some ideas, we can try it out.

They are <exec> tasks in several projects running parallel (i.e. I'm not using the new ability to run several tasks parallel within a project).
Since my last post, we also tried to use the <powershell> task, and it has the same problem.

Christian List

2009/11/16 David Cameron <dav...@gmail.com>

David Cameron

unread,
Nov 16, 2009, 9:03:22 PM11/16/09
to ccnet...@googlegroups.com
Hi Christian

Good to hear this does not involve <parallel>. It likely does not have a problem, but it is a slightly easier debugging situation if it is not involved.

I've just committed some DEBUG level logging for the events that cc.net watches to see if the process is complete: the process exited event, the standard-output close event and the standard-error close event. A new build of 1.5 with the extra logging should be available in a moment from ccnetlive.

While setting up the test, a possible issue with your build occured to me. If your batch file starts other processes which do not exit, then the batch process will not appear to exit either. I think you can use "cmd /c" from your batch file to have the other processes start as "top-level" processes, rather than as "children" of your batch file. If there are child processes, the entire process tree has to exit before cc.net will recognize that the <exec> task is complete. Are the blocking batch files starting other long-running processes?

I also noticed while rummaging around the ProcessExecutor class that the allied ProcessMonitor class probably won't handle <parallel> correctly. I'll add a JIRA issue for that.

Christian List

unread,
Nov 17, 2009, 10:20:49 AM11/17/09
to ccnet...@googlegroups.com
Hi Dave,
 
We added a few more debug statements in ProcessExecutor.cs and prefixed our special ones with TDM.
Here's the relevant sections of a log file from yesterday when we reproduced the problem.
The two projects Modeling1A and Basic seemed to start at almost the same time:
 
2009-11-16 14:35:55,449 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] Starting process [C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe] in working directory [D:\Dispatcher\Tools] with arguments [D:\Dispatcher\Tools\RunTask.ps1]
2009-11-16 14:35:55,464 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] Starting process [C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe] in working directory [D:\Dispatcher\Tools] with arguments [D:\Dispatcher\Tools\RunTask.ps1]
2009-11-16 14:35:55,480 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:Calling process.WaitHandle.WaitAll
2009-11-16 14:35:55,495 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:    errorStreamClosed Handle:2576
2009-11-16 14:35:55,480 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:Calling process.WaitHandle.WaitAll
2009-11-16 14:35:55,511 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:    outputStreamClosed Handle:3076
2009-11-16 14:35:55,527 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:    processExited Handle:2656
2009-11-16 14:35:55,511 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    errorStreamClosed Handle:2680
2009-11-16 14:35:55,542 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    outputStreamClosed Handle:2792
2009-11-16 14:35:55,542 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    processExited Handle:3660
 
Then about an hour later, the Basic project was done and exited:
 
2009-11-16 15:42:30,785 [60:DEBUG] [2007.4 Plant EUS - Run QTP - Basic C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe] Done running task @ 11/16/2009 15:42:30
2009-11-16 15:42:30,785 [26:DEBUG] TDM: ExitedHandler calling processExited.Set() Handle:3660
 
But notice that it didn't close the errorStream or outputStream.
 
After 4.5 hours the Modeling1A project was finally done and exited:
 
2009-11-16 19:00:12,183 [29:DEBUG] [2007.4 Plant EUS - Run QTP - Modeling1A C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe] Done running task @ 11/16/2009 19:00:12
2009-11-16 19:00:12,198 [50:DEBUG] TDM: CollectOutput calling System.Threading.ManualResetEvent.Set() Handle:2680
2009-11-16 19:00:12,198 [49:DEBUG] TDM: CollectOutput calling System.Threading.ManualResetEvent.Set() Handle:2576
2009-11-16 19:00:12,198 [60:DEBUG] TDM: CollectOutput calling System.Threading.ManualResetEvent.Set() Handle:2792
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] process.WaitHandle.WaitAll returned True
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    errorStreamClosed Handle:2680
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    outputStreamClosed Handle:2792
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    processExited Handle:3660
2009-11-16 19:00:12,198 [29:DEBUG] TDM: CollectOutput calling System.Threading.ManualResetEvent.Set() Handle:3076
2009-11-16 19:00:12,198 [26:DEBUG] TDM: ExitedHandler calling processExited.Set() Handle:2656
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] process.WaitHandle.WaitAll returned True
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:    errorStreamClosed Handle:2576
2009-11-16 19:00:12,214 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:    outputStreamClosed Handle:3076
2009-11-16 19:00:12,214 [2007.4 Plant EUS - Run QTP - Modeling1A:DEBUG] TDM:    processExited Handle:2656
2009-11-16 19:00:12,245 [2007.4 Plant EUS - Run QTP - Basic:INFO] Integration complete: Success - 11/16/2009 7:00:12 PM
2009-11-16 19:00:12,245 [2007.4 Plant EUS - Run QTP - Modeling1A:INFO] Integration complete: Success - 11/16/2009 7:00:12 PM
 
Notice how the CollectOutput is getting called 4 times and is triggering errorStreamClosed and outputStreamClosed for both Modeling1A and Basic.
 
Eventually both projects are successfull, but the Basic project is being tied up for 3.5 hours longer than necessary.
Christian

David Cameron

unread,
Nov 17, 2009, 5:09:17 PM11/17/09
to ccnet...@googlegroups.com
Hi Christian

I see from the first two log lines that both builds are calling the same "D:\Dispatcher\Tools\RunTask.ps1" script. Do they perhaps end up running the same process? It seems possible that the powershell process that Basic starts is waiting for some signal from Modeling1A before it exits.

What is the Handle:#### included in your new log lines?

This also seems odd, errorStreamClosed is called twice with the same Handle, 4.5 hours apart:
2009-11-16 14:35:55,511 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    errorStreamClosed Handle:2680
2009-11-16 19:00:12,198 [2007.4 Plant EUS - Run QTP - Basic:DEBUG] TDM:    errorStreamClosed Handle:2680

Would you mind attaching a patch with your extra logging lines? That would give me the clearest picture about what the lines mean.

David Cameron

unread,
Nov 17, 2009, 5:20:37 PM11/17/09
to ccnet...@googlegroups.com
Hello again

Just thought of something else while looking at this and eating breakfast. It is probably worth using ProcessExplorer's tree view while the build is in the "hanged" state to see what the process tree looks like. ProcessExplorer is one of the SysInternals tools from Microsoft: http://technet.microsoft.com/en-us/sysinternals/bb842062.aspx

The question would be, are both powershell processes still displayed there? And further, which children do they each have.

Christian List

unread,
Nov 18, 2009, 3:46:32 PM11/18/09
to ccnet...@googlegroups.com
Dave,
 
I attached a patch to this mail, with the changes we made to the logging.
The patch also contains a proposal for a fix in the StartProcess function. We've run it once with this fix, and we didn't see the problem, we'll run it a few more times to make sure we don't see it again.
 
This fix came about after reading the section about Thread Safety for the Process Class here http://msdn.microsoft.com/en-us/library/system.diagnostics.process.aspx
MSDN states: "Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe."
It appears that CC is using Process as an instance member, and it therefore is not thread safe?
The attached patch is an attempt to make CC more thread safe.
 
Christian
2009/11/17 David Cameron <dav...@gmail.com>
ProcessExecutor.patch

Christian List

unread,
Nov 19, 2009, 9:35:31 AM11/19/09
to ccnet...@googlegroups.com
Dave,
 
Bad news, we are still seeing the bug. Even after using the fix I proposed in my last email.
 
We'll try to use ProcessExplorer next time we see the bug and we are present at the same time. But we run this thing mostly over night, so it could be a while before we get the chance.
 
Any other suggestions are welcome!
 
Christian

2009/11/18 Christian List <christi...@gmail.com>

David Cameron

unread,
Nov 19, 2009, 6:29:54 PM11/19/09
to ccnet...@googlegroups.com
Hello again

On Fri, Nov 20, 2009 at 1:35 AM, Christian List <christi...@gmail.com> wrote:
Bad news, we are still seeing the bug. Even after using the fix I proposed in my last email.

In a way that is almost a relief, as I couldn't see how the change would fix the bug. And bug fixes I don't understand make me at least as nervous as bugs I don't understand. 
 
We'll try to use ProcessExplorer next time we see the bug and we are present at the same time. But we run this thing mostly over night, so it could be a while before we get the chance.

Debugging process problems like this is always a bit of a pain. For a while, I've been thinking it would be nice to have a "process tree" as part of the build output. It could show all the processes executed as part of the build and a few details about them and their child processes. The child processes part will be a problem though: the .net APIs don't seem to give complete information about parent/child relationships. Perhaps I'll start some experiments with this. I think it will be especially useful with the new <parallel> task.

You may be able to do similar debugging without running it through cc.net. Opening up two command windows and executing the same powershell command ([D:\Dispatcher\Tools\RunTask.ps1]) from both should give you a similar set up. Then run process explorer and see what is happening. You might need to engage the developers of RunTask.ps1. I'm not sure if that is you as well, or if you are on a separate "build team"?

Let me know if there is any other way I can help. I'll do some experiments on the weekend to understand in what circumstances the Exited()[1] event would fire so far before the ErrorDataReceived(null)[2] and OutputDataReceived(null)[3] calls.

David Cameron

unread,
Nov 22, 2009, 9:13:10 AM11/22/09
to ccnet...@googlegroups.com
I just did some experiments with batch files to try to reproduce this. Various combos of call, cmd /c, cmd /k and plain file invocations couldn't produce the separation between the Exited and *DataReceived(null) events that we can see in your logs. My next guess, and it is only a guess, is that some Powershell behaviour produces this. I assume there is no more output for the first build after the command appears to complete?

Any developments on your end?

Christian List

unread,
Nov 24, 2009, 11:24:24 AM11/24/09
to ccnet...@googlegroups.com
We can reproduce it both with an <exec> task and a <powershell> task, so I doubt that it's anything particular to Powershell. There is no more output for the first build after the command appears to complete.
 
We've tried looking at ProcessExplorer while the problem was happening, but everything looks okay as far as we can see.
 
We've also tried to reproduce it in a smaller test case that runs faster (and that we can send to you for reproducing), but we have been unable to do that.
This makes me think if somehow the long running process has something to do with it?

Christian
 
2009/11/22 David Cameron <dav...@gmail.com>

David Cameron

unread,
Nov 24, 2009, 4:05:44 PM11/24/09
to ccnet...@googlegroups.com
Thanks for the additional info Christian. If you use a batch file, do you still invoke the "D:\Dispatcher\Tools\RunTask.ps1" script?

We could add a limit to the amount of time that ProcessExecutor will wait for additional input, and then consider the process closed despite not seeing the output and error channels closed. What do you think of that as a solution?

ProcessExplorer did not show any relationship or link between the two different RunTask invocations? Or did it not even show the second RunTask (or child processes) at all?

I think it would be valuable to understand more about what the script does. Is it something you can follow up on locally or send to me off-list?

Christian List

unread,
Nov 26, 2009, 1:00:55 PM11/26/09
to ccnet...@googlegroups.com
Dave,

We have a RunTask.bat that we use with <exec> task which essentially does the same as RunTask.ps1.

Adding a time limit that ProcessExecutor waits for additional input sounds like something that would solve our problem at least.
If you have an idea how to do this, then we can test it for you.

When we tried ProcessExplorer, we ran 10 tasks at the same time. We could see the bug was happening in the CC log file, but ProcessExplorer showed exactly 10 processes and each one had exactly 1 child process. I believe this is exactly what it is supposed to show?

I'll see what I can do about sending you the script after thanksgiving holiday.

Christian

2009/11/24 David Cameron <dav...@gmail.com>
Reply all
Reply to author
Forward
0 new messages