Tcl [exit] hanging?

Jeff Godfrey

Apr 9, 2009, 10:31:54 AM

Hi All,

I just spoke with a customer who's running a Tcl-based console
application (from a batch file under Windows). The app is an EXE
created using TclApp.

The termination code in the application looks like this:

proc appExit {} {
    # ... (some cleanup)
    # ... (some cleanup)

    # [trace] here is the app's own logging proc, not Tcl's built-in [trace]
    trace "\nExit status --> $retVal"
    exit $retVal
}

The program is called (via a BAT file) hundreds of times per night.
According to the customer, every so often, the process does not exit
properly.

He's activated a debugging flag in the app, which causes the above
[trace] call to write messages both to a file and to the CMD terminal
itself. When the process runs normally, the CMD terminal closes just
after the program exits via the above code.

Now, the interesting part. When it hangs, he's greeted with a
still-open CMD terminal, containing the above trace message, like:

Exit status --> 0

Looking in Windows task manager, the application is still running and
must be forcibly killed.

So, what could possibly cause the exit message to be printed to the
console, but not allow the application to successfully exit via a simple
call to [exit]? I assume there must be some *external* forces at work
here, but I'm at a loss as to how to debug this.

Thanks for any input.

Jeff Godfrey

Jeff Godfrey

Apr 9, 2009, 10:41:43 AM

Jeff Godfrey wrote:

> So, what could possibly cause the exit message to be printed to the
> console, but not allow the application to successfully exit via a simple
> call to [exit]? I assume there must be some *external* forces at work
> here, but I'm at a loss as to how debug this.
>
> Thanks for any input.
>
> Jeff Godfrey

A little more info...

- The app was built against Tcl 8.4.9
- It does not use any sockets
- It does use TclODBC
- It does run some external commands via code like:

if { [catch {open "|$cmd"} id] } {
    log "MyExe Error - $id"
    return 1
}

Does any of the above provide a clue? Maybe one of the external
commands is not exiting properly?
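One low-risk way to test the suspicion that an external command is involved is to log which channels are still open just before [exit]. The sketch below assumes a new helper can be called from appExit; the proc name logOpenChannels is hypothetical, and it relies only on the standard [file channels] and [fconfigure] commands, both of which exist in Tcl 8.4.

```tcl
# Hypothetical diagnostic: list every channel still open right before
# [exit], together with its -blocking setting.
proc logOpenChannels {} {
    set report {}
    foreach chan [lsort [file channels]] {
        # The standard channels are always present; skip them.
        if {[lsearch -exact {stdin stdout stderr} $chan] >= 0} {
            continue
        }
        # Per the [close] docs, closing a blocking command pipeline
        # waits for its child processes to exit.
        lappend report "$chan -blocking [fconfigure $chan -blocking]"
    }
    return $report
}
```

Each returned entry could be passed to the app's existing logging proc right before the exit call, so a hung run would leave a record of any pipe that was still open at that point.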

Jeff

Jeff Godfrey

Apr 9, 2009, 10:54:07 AM

One more thing. According to the customer, rerunning the application
with the same data that caused the original hang generally will not
cause it again. So, it's very sporadic and can't be reproduced on demand.

Jeff

George Peter Staplin

Apr 9, 2009, 3:27:37 PM

Jeff Godfrey wrote:

It sounds like it could be a thread race during finalization. Make sure you
join/exit any threads that are running other than the main thread before
exit.

The Tcl threading layer needs some work, and more money/time invested in
it, because in many cases, if you try to exit with more than one thread
active, Tcl will clean up the resources used by the other threads and cause
a segfault. I wrote a patch to work around that bug, and it's in the
tracker, but I'm not sure about applying and committing it.

The Tcl thread support also does strange things with the thread startup
function, and thereby ignores the void * return type of the startup
routine. The thread-specific data layer in Tcl is also not ideal, but it
usually seems to work on most common architectures. If it were possible,
I would rewrite Tcl to not use POSIX threads. We don't really have another
common layer though, and some people use threads. If every platform
provided a fork(), it might even be possible. As it is, several systems
actually make fork() return an error when used with more than one thread
active, and some system libraries use threads, so you're kind of dead in
the water if you want portability and concurrency and wish to avoid
threads.

-George

Jeff Godfrey

Apr 9, 2009, 4:13:51 PM

George Peter Staplin wrote:
> Jeff Godfrey wrote:
>
>> Jeff Godfrey wrote:
>> > Jeff Godfrey wrote:
>> >
>> > > So, what could possibly cause the exit message to be printed to the
>> > > console, but not allow the application to successfully exit via a simple
>> > > call to [exit]? I assume there must be some *external* forces at work
>> > > here, but I'm at a loss as to how debug this.
>
> It sounds like it could be a thread race during finalization. Make sure you
> join/exit any threads that are running other than the main thread before
> exit.

Hi George,

Thanks for the input. My application code is definitely not using any
threads. Looking at the packages it uses, it's fairly simple:

starkit
inifile
tclodbc
csv

This app hasn't been built in several years, and I don't even know where
the basekit it uses came from (it could have been from Equi4). I don't
even know whether it's a thread-enabled basekit. I still have the
basekit file; can I somehow tell interactively?
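A minimal check, assuming a script or interactive prompt can be run inside that basekit: Tcl sets tcl_platform(threaded) to 1 in thread-enabled builds, and a non-threaded build may not define the element at all, so this sketch treats a missing element as "not threaded". The proc name isThreadedBuild is hypothetical.

```tcl
# Hypothetical helper: report whether the running interpreter was built
# with thread support.
proc isThreadedBuild {} {
    # A braced [expr] short-circuits &&, so the array element is only
    # read when it exists.
    return [expr {[info exists ::tcl_platform(threaded)] && $::tcl_platform(threaded)}]
}

puts "Tcl [info patchlevel], threaded: [isThreadedBuild]"
```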

Any other ideas appreciated.

Jeff

George Peter Staplin

Apr 9, 2009, 4:36:19 PM

Jeff Godfrey wrote:

I have one more idea, though it may be a long shot. I noticed you mentioned
pipe usage with [open |]. I have seen pipes hang on [close] in some apps,
including my own, on exit, and in some cases it can lead to a deadlock.
It's a feature of [close]: it tries to flush any pending outgoing data
before closing, and it wouldn't surprise me at all if the Tcl channel layer
does the same during finalization. Does it behave any differently if you
run [fconfigure $chan -blocking 0] on all of the pipes before exit?

If that does resolve the issue, then some child process is probably leaving
a pipe full of unread data, or not reading all of its input, so your parent
process is actually trying to write() because of the [close], but blocks
forever because the child isn't reading. How much you can write to a
blocking pipe before write() blocks is limited by the pipe's buffer size,
often only about 4096 bytes (PIPE_BUF in C is the related limit on atomic
pipe writes). In non-blocking mode, the underlying write() instead fails
immediately with errno EAGAIN, and Tcl's channel layer buffers the data and
writes it in the background rather than blocking.
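George's suggestion can be sketched as follows, assuming the app routes every [open "|..."] through one place. The helper names openPipe and releasePipes, and the ::openPipes list, are hypothetical. Per the Tcl [close] documentation, closing a blocking command pipeline waits for the child processes to complete, while a non-blocking channel returns immediately and flushes any leftover output in the background. Whether that also sidesteps the hang during Windows finalization is exactly what would need testing.

```tcl
# Hypothetical registry of every command pipeline the app opens, so that
# all of them can be switched to non-blocking mode before [exit].
set ::openPipes {}

proc openPipe {cmd} {
    # Same call as the app's existing pipe open; errors still propagate,
    # so callers can keep wrapping this in [catch] as before.
    set chan [open "|$cmd"]
    lappend ::openPipes $chan
    return $chan
}

proc releasePipes {} {
    foreach chan $::openPipes {
        # In non-blocking mode, [close] neither blocks on unflushed output
        # nor waits for the child processes to exit.
        catch {fconfigure $chan -blocking 0}
        catch {close $chan}
    }
    set ::openPipes {}
}
```

In appExit, a call to releasePipes would go just before the exit call. Channels the app has already closed do no harm here, because both commands are wrapped in [catch].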

If that doesn't lead anywhere, I would try tracing the system calls, or
using gdb to step through the code with a breakpoint on Tcl_Exit. On
Windows, you could use the gdb from the MinGW package to do so.

-George

Jeff Godfrey

Apr 9, 2009, 4:47:46 PM

George,

Thanks for the ideas. I'll see what I can do with them. The main
problem is that the hang only happens rarely, and can't be reproduced
reliably. To further complicate matters, the application is running on
a production system and requires about a week's lead time and a whole
stack of paperwork in order to even *touch*. So, I can't just make
some willy-nilly changes and try them... ;^)

There is a "test" system available that I'm more free to experiment
with, though so far, the hang has not been replicated there.

Anyway, thanks again.

Jeff

rocket777

Apr 9, 2009, 5:13:40 PM

> but I'm at a loss as to how debug this.

You might want to get Sysinternals Process Explorer (procexp) so it's
available when the process hangs. It lets you look at a lot more than
Task Manager does.

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx


Is the exit status required? If you can kill the process, then maybe a
temporary workaround could be to use [twapi::end_process [pid]]. I've
added the twapi package to all my starkits on Windows by just
downloading the extension and plopping it into the lib directory for
my starkits.

http://twapi.magicsplat.com/

There's tons of stuff in twapi that you might find useful here.


Just curious, I thought [trace] took an option as the first arg, like
[trace variable] etc. I'm not sure I understand what the mentioned
trace command does - unless that was a typo.

rocket777

Apr 9, 2009, 5:29:40 PM

Just noticed that twapi::end_process also allows for an optional
exit code, too!!

http://twapi.magicsplat.com/process.html#end_process

Jeff Godfrey

Apr 9, 2009, 6:01:24 PM

rocket777 wrote:
>> but I'm at a loss as to how debug this.
>
> you might want to get sysinternals procexp to have available when the
> process hangs. You can look at lots more things than with task
> manager.
>
> http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Good idea. I'll see if the customer can grab that.

> Is the exit status required? If you can kill the process, then maybe a
> temporary workaround could be to use [twapi::end_process [pid]]. I've
> added the twapi package to all my starkits on windows by just
> downloading the extension and plopping it into the lib directory for
> my starkits.

Yes, the exit status is required. Regarding TWAPI, I already use it
fairly extensively in other applications, so I have access to it and I'm
familiar with it. That said, it hadn't crossed my mind to see if it
contained anything useful in this regard. I could easily add it to this
application if need be. Though, as I mentioned earlier, the trick will
be getting a copy of the updated application on the production server.

> Just curious, I thought [trace] took an option as the first arg, like
> [trace variable] etc. I'm not sure I understand what the mentioned
> trace command does - unless that was a typo.

Nope, you're right. This trace is not the built-in Tcl [trace] command.
It has been overridden by a local [trace] proc that's basically just a
simple logging facility. I wrote the app, so there's really no one to
cry "foul" to. My only excuse is that I wrote it a number of years ago
and didn't know better. In this case, it's not really an issue, but I
probably should rename the proc.
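If the logging proc were renamed, a minimal replacement might look like the sketch below. The name appLog and its optional log-file argument are hypothetical, since the original proc's body isn't shown in the thread; it mirrors the behavior described at the start of the thread by writing each message to the console and, when a file is given, appending it there too.

```tcl
# Hypothetical logger under a name that does not shadow Tcl's [trace].
proc appLog {msg {logFile ""}} {
    puts stdout $msg
    # Flush so the message appears immediately even when the batch file
    # redirects stdout to a file or pipe (Tcl buffers those fully).
    flush stdout
    if {$logFile ne ""} {
        # Append mode keeps messages from earlier runs.
        set f [open $logFile a]
        puts $f $msg
        close $f
    }
}
```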

Thanks for the great ideas.

Jeff

Jeff Godfrey

Apr 9, 2009, 6:02:39 PM

Great. I'll look into it. While I'd certainly like to get to the
bottom of the real issue, maybe that'll at least buy me some time.

Thanks.

Jeff

Troy

Apr 10, 2009, 11:26:15 AM

We've also seen this problem at work. We were using TclBlend, so we
worked around it similarly, by exiting with Java's System.exit()
instead of Tcl's [exit]. Seeing this today prompted me to ask the
coworker who investigated it (we had been plagued by this for years),
and he filed a bug
(http://sourceforge.net/tracker/?func=detail&aid=2750491&group_id=10894&atid=110894)
that contains more information on what he found and how to possibly
work around it.

Jeff Godfrey

Apr 10, 2009, 12:19:33 PM

That tracker definitely sounds quite similar to what I'm seeing. I'd be
interested to hear what others with more knowledge of Tcl internals have
to say. On my end, I've updated my app by replacing this:

exit $retVal

with this...

twapi::end_process [pid] -force -wait 1000 -exitcode $retVal

I've supplied the update to my customer, who is now going through the
pains of getting it into his production environment (the only place the
problem has been seen). I don't have any concrete evidence that this
change will actually circumvent the reported problem, though it seems
like it should.

Thanks again.

Jeff

George Peter Staplin

Apr 12, 2009, 6:24:44 AM

Troy wrote:

I suspect the commit below may have fixed it in 8.6. There was some
questionable code in tclWin32Dll.c. I recall we discussed it in the
Tcler's Chat at some length. Joe English in particular was quite against
the code pattern, because of the problems it could introduce, and the MSDN
documentation warned against what Tcl was doing.

2008-08-01  Jeff Hobbs  <je...@ActiveState.com>

        * doc/Exit.3:                  Do not call Tcl_Finalize implicitly
        * generic/tclEvent.c:          on DLL_PROCESS_DETACH as it may lead
        * win/tclWin32Dll.c (DllMain): to issues and the user should be
        explicitly calling Tcl_Finalize before unloading regardless. Clarify
        the docs to note the explicit need in embedded use.

So it's quite possible that the earlier code led to some races, or to parts
of the Windows DLL code causing a deadlock.

This is what the MSDN documentation has to say about what Tcl was doing:
"Calling functions that require DLLs other than Kernel32.dll may result in
problems that are difficult to diagnose. For example, calling User, Shell,
and COM functions can cause access violation errors, because some functions
load other system components. Conversely, calling functions such as these
during termination can cause access violation errors because the
corresponding component may already have been unloaded or uninitialized.

Because DLL notifications are serialized, entry-point functions should not
attempt to communicate with other threads or processes. *Deadlocks* may
occur as a result."

See also: http://msdn.microsoft.com/en-us/library/ms682583.aspx

The emphasis on "Deadlocks" was added by me.

As far as I know, Tcl uses multiple threads in Windows, and so does Tk for
Windows. The code that Tcl_Finalize uses manipulates and cleans up a lot
of global thread state, and that was unsafe to do from a DLL_PROCESS_DETACH
path. It was also using DLLs other than Kernel32.dll.

In fact, the developer who added it seems to have known some of that,
based on the comment I see in 8.5:

    * Side effects:
    *    Establishes 32-to-16 bit thunk and initializes sockets library. This
    *    might call some sycronization functions, but MSDN documentation
    *    states: "Waiting on synchronization objects in DllMain can cause a
    *    deadlock."

In other words, the code was flawed, and the person that added it really
shouldn't have.

-George

jeff_g...@pobox.com

Apr 14, 2009, 10:48:57 AM

On Apr 10, 11:19 am, Jeff Godfrey <jeff_godf...@pobox.com> wrote:

> I've supplied the update to my customer, who it now going through the
> pains of getting it into his production environment (the only place the
> problem has been seen).  I don't have any concrete evidence that this
> change will actually circumvent the reported problem, though it seems
> like it should.
>
> Thanks again.
>
> Jeff

Well, I just heard back from my customer. The updated code ran fine
in his test environment, but once installed in the production
environment, it (occasionally) hangs as before.

Here's my exit code now...

# --- close the database if it's still open
if {$::global(dbOpen)} {closeDB db}

trace "\nExit status --> $retVal"

twapi::end_process [pid] -force -wait 1000 -exitcode $retVal

#exit $retVal

When the process hangs, he gets the "Exit status --> 0" message, but
the process doesn't die. So, we're really down to just one line of
code.

The only difference with this update is that I'm now using
[twapi::end_process] instead of the (now commented out) Tcl [exit].

While the above was intended as more of a workaround than a real
solution, I *really* expected it to avoid the hang issue.
Unfortunately not.

I'm now having the customer investigate the possibility of installing
SysInternals "Process Explorer" as suggested earlier in this thread.
I don't know if that'll be possible on their production system or not.

Does the fact that [twapi::end_process] apparently can't abort the
hung process tell anyone anything?

Thanks,

Jeff


jeff_g...@pobox.com

Apr 14, 2009, 11:50:40 AM

Answering my own question, I'm betting on some sort of resource
exhaustion that causes general system instability and leads to the
process eventually hanging (as mentioned earlier, this process is
called in batch hundreds (thousands?) of times in succession).

It sounds like it's going to be possible to get Process Explorer
installed on the production box, so I'm hoping that sheds some real
light on the issue next time it happens.

Jeff

George Peter Staplin

Apr 14, 2009, 1:27:18 PM

jeff_g...@pobox.com wrote:

Jeff,

I have been discussing the issue in the Tcler's Chat with some other Tcl
developers. The code in tclWinPipe.c looks like it could hang on close if
there is any pending data to write.

The PipeClose2Proc waits an INFINITE amount of time for the writer thread to
complete, so if you have any child processes that aren't reading enough, it
could cause strange problems from what I understand. I am by no means a
Windows expert though.

There is also this problem that may occur in some cases:
Tcl_MutexLock(&pipeMutex);
/* BUG: this leaks memory */
TerminateThread(pipePtr->writeThread, 0);
Tcl_MutexUnlock(&pipeMutex);

During the chat it was suggested that someone rewrite much of the Windows
code to stop using tclWinProcs and simply assume that the wide (WCHAR)
interfaces are available. The current code uses a lot of indirection for
older Windows platforms. Tcl has officially dropped support for Windows
98, so it shouldn't be a problem.

Any thread exit handlers in Windows are a potential problem, and I also
suspect that the DllMain code I posted about elsewhere in this thread could
have something to do with it too. The code in question has been removed
from 8.6, but is still present in older releases.

-George

jeff_g...@pobox.com

Apr 14, 2009, 1:53:02 PM

On Apr 14, 12:27 pm, George Peter Staplin <georg...@xmission.com>
wrote:

> jeff_godf...@pobox.com wrote:
> > On Apr 14, 9:48 am, jeff_godf...@pobox.com wrote:
> >> On Apr 10, 11:19 am, Jeff Godfrey <jeff_godf...@pobox.com> wrote:
>
> >> Does the fact that [twapi::end_process] apparently can't abort the
> >> hung process tell anyone anything?
>
> > Answering my own question, I'm betting on some sort of resource
> > exhaustion that causes a general system instability and leads to the
> > process eventually hanging (as mentioned earlier, this process is
> > called in batch hundreds (thousands?) of times in succession).
>
> > It sounds like it's going to be possible to get Process Explorer
> > installed on the production box, so I'm hoping that sheds some real
> > light on the issue next time it happens.
>
> > Jeff
>
> Jeff,
>
> I have been discussing the issue in the Tcler's Chat with some other Tcl
> developers. The tclWinPipe.c seems like it could hang on close if there is
> any pending writable data.

[ ... other interesting info snipped ...]

> -George

George,

As always, thanks for your insight. If running Process Monitor on the
hung process reveals anything of interest, I'll report back.

Jeff
