Per,
I fully agree for the case where you have full control over the code architecture ... in such a case the only way is to stop digging and get out of the hole. Or have a party leadership contest instead ... just [j|ch]oking.
In this particular situation I don't have control over the Gomega TDD session package and how it deploys Wait in a separate goroutine. On its own, I consider Gomega's design to be fine here. My issue only comes up when bringing file descriptor checks to a session/exec.Cmd, where I can change neither Gomega's session Wait nor Go's exec.Cmd. The fd leak checks will potentially be called multiple times, at seemingly arbitrary times, for the session under test, while the session under test might throw its tantrum whenever it pleases.
After some more pondering, I opted for Python's "try first, beg forgiveness later" strategy: I threw out the problematic check and now simply try to read the PID's fd directory anyway. If the PID is still valid, that will succeed; if it's gone at that moment, it'll fail. Catching that error and reporting a friendlier error to the test user instead makes this more robust.
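To illustrate what I mean, here is a minimal sketch, not the actual code from my package. The names fdNames and ErrProcessGone are made up for this post, and I'm assuming that a vanished process shows up as either ENOENT or ESRCH when reading its /proc fd directory on Linux.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strconv"
	"syscall"
)

// ErrProcessGone is a hypothetical sentinel error, reported in place of the
// raw file system error when the process disappeared before or while we
// looked at it.
var ErrProcessGone = errors.New("process has already terminated")

// fdNames (hypothetical helper) returns the fd numbers, as strings, that are
// currently open in the process with the given PID. There is deliberately no
// "is the PID still alive?" check up front: we just try to read the fd
// directory and translate the failure afterwards.
func fdNames(pid int) ([]string, error) {
	dir := "/proc/" + strconv.Itoa(pid) + "/fd"
	entries, err := os.ReadDir(dir)
	if err != nil {
		// Assumption: ENOENT means the /proc entry is gone, ESRCH means
		// the process went away while the directory was being read.
		if errors.Is(err, fs.ErrNotExist) || errors.Is(err, syscall.ESRCH) {
			return nil, fmt.Errorf("cannot list fds of PID %d: %w", pid, ErrProcessGone)
		}
		return nil, fmt.Errorf("cannot list fds of PID %d: %w", pid, err)
	}
	names := make([]string, 0, len(entries))
	for _, entry := range entries {
		names = append(names, entry.Name())
	}
	return names, nil
}

func main() {
	// Our own process certainly exists, so this should succeed on Linux.
	names, err := fdNames(os.Getpid())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("open fds:", len(names))
}
```

A caller can then use errors.Is(err, ErrProcessGone) to tell "the session already exited" apart from other failures such as missing permissions.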
I like your idea of pidfd_open, but after pondering it for a long time I don't think that it helps in this peculiar use case. It even complicates fd checking in several ways, as I would need to exclude the pidfd from leak checking while at the same time taking care not to leak it myself. This might quickly turn against me for little gain. On top of this,
https://man7.org/linux/man-pages/man2/pidfd_open.2.html points out that when the child gets reaped, the pidfd becomes invalid. Nevertheless, I really appreciate you bringing up pidfd_open!
On a side note: I still cannot get over the fact that the fd returned by a pidfd_open call can be used with setns, as someone else pointed out to me.
-- TheDiveO