In fact, Tcl calls waitpid() in two contexts:
- in Tcl_ReapDetachedProcs(), which is a poor man's substitute for a SIGCHLD handler. It does not actually use signals; instead it is called on pipeline creation and closure, which are reasonably frequent sampling points in fork-intensive scripts. In this case the call is waitpid(WNOHANG), and it will not raise any error, since it is in some sense a background task (and detached processes are by definition not to be tracked closely).
- in TclCleanupChildren(), which is the point where we want to know about the fate of the processes in a pipeline. Hence it is blocking, and cares about the returned status. Clearly this part depends on nobody having pulled the rug out from under it by calling waitpid() earlier: once a child has been reaped elsewhere, a later waitpid() on the same pid fails (typically with ECHILD).
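To make the difference between the two contexts concrete, here is a minimal standalone sketch in plain POSIX C. It is not Tcl's actual code: spawn_child(), reap_nonblocking() and wait_blocking() are hypothetical names made up for illustration. It only shows the two waitpid() styles side by side, and what the blocking one sees when somebody else has already reaped the child.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with exit_code.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t spawn_child(int exit_code)
{
    pid_t pid = fork();
    if (pid == 0) {
        _exit(exit_code);           /* child: terminate immediately */
    }
    return pid;
}

/* "Detached" style: poll with WNOHANG and never complain.
 * Returns 1 if the child was reaped, 0 if it is still running,
 * and -1 on any error (which a background reaper would just ignore). */
static int reap_nonblocking(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) {
        return 1;
    }
    return (r == 0) ? 0 : -1;
}

/* "Pipeline cleanup" style: block until the child ends and report
 * its exit code. Returns -1 with errno set if waitpid() fails,
 * e.g. ECHILD when the child was already reaped by someone else. */
static int wait_blocking(pid_t pid)
{
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   /* retry if interrupted */
    if (r == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If a handler of your own calls waitpid() on a pipeline child first, a later wait_blocking() on the same pid returns -1 with errno set to ECHILD, which is the same kind of failure the blocking cleanup runs into.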
To address your concern without an overhaul of core APIs, I can only imagine inserting some abstraction layer between Tcl_WaitPid and your handlers.
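One way such a layer could look, purely as a sketch: whoever reaps a child first records its status in a small table, and a waitpid() wrapper consults that table before asking the kernel. All names here (remember_status(), cached_waitpid(), MAX_CACHED) are hypothetical, and how the wrapper would actually be wired in place of Tcl_WaitPid is left open. The table is neither thread-safe nor async-signal-safe as written, so calling remember_status() directly from a signal handler would need extra care.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CACHED 64               /* arbitrary capacity for the sketch */

struct reaped {
    pid_t pid;
    int   status;
    int   used;
};

static struct reaped cache[MAX_CACHED];

/* To be called by whoever reaped the child first (e.g. your handler),
 * with the status that waitpid() gave them.
 * Returns 0 on success, -1 if the table is full. */
static int remember_status(pid_t pid, int status)
{
    for (int i = 0; i < MAX_CACHED; i++) {
        if (!cache[i].used) {
            cache[i].pid = pid;
            cache[i].status = status;
            cache[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/* Stand-in for waitpid(pid, ...) with pid > 0: return a remembered
 * status if there is one (consuming the entry), otherwise fall
 * through to the real waitpid(). */
static pid_t cached_waitpid(pid_t pid, int *status, int options)
{
    for (int i = 0; i < MAX_CACHED; i++) {
        if (cache[i].used && cache[i].pid == pid) {
            if (status != NULL) {
                *status = cache[i].status;
            }
            cache[i].used = 0;
            return pid;
        }
    }
    return waitpid(pid, status, options);
}
```

With this in place, the early reaper and the later blocking caller both get the same status, instead of the second one failing with ECHILD.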
In an ideal world, Tcl would (1) use signal handlers more readily and (2) expose some API to override its own handlers, such as the one for SIGCHLD. Unfortunately, the bad interactions between threads and signals (from a portability standpoint) made us drift pretty far from this.
So, it would seem reasonable to say that your best bet is to ignore the error raised by TclCleanupChildren. Sorry.
-Alex