What I think I've narrowed it down to (my initial assessment in my earlier post was wrong) is a race between eio_handle_mainloop and eio_signal_shutdown. If eio_handle_mainloop exits before eio_signal_shutdown is called, shutdown() is never called on the socket. That leaves data that srun sent to the socket still pending: stepd never dequeued it and fed it to sleep because, well, sleep doesn't read anything from stdin. When stepd exits, the kernel closes the socket with that data still unread, and the next read() attempted by srun fails with ECONNRESET.
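For illustration, here is a minimal standalone sketch of the socket behavior described above. It is not Slurm code. It assumes a TCP connection over loopback can stand in for the srun/stepd socket, and the function name peer_read_result and the "sender"/"receiver" roles are made up for this example. It compares what the sending side sees on its next read when the receiving side closes with unread data queued against what it sees when the receiving side calls shutdown() first.

```python
import select
import socket


def peer_read_result(shutdown_first: bool) -> str:
    """Return what the sending side sees on its next read.

    Hypothetical helper for illustration only; "sender" plays the role of
    srun and "receiver" plays the role of stepd.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()
    sender.settimeout(5.0)
    try:
        sender.sendall(b"stdin data that nobody reads\n")
        # Wait until the data is queued on the receiver, without consuming it.
        select.select([receiver], [], [], 5.0)
        if shutdown_first:
            # shutdown() sends a FIN while the socket is still open.
            receiver.shutdown(socket.SHUT_RDWR)
        else:
            # Closing with unread data queued; TCP stacks typically answer
            # this with a reset instead of an orderly close.
            receiver.close()
        try:
            data = sender.recv(1024)
        except ConnectionResetError:
            return "ECONNRESET"
        return "EOF" if data == b"" else "DATA"
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print("close with unread data:", peer_read_result(shutdown_first=False))
    print("shutdown() before close:", peer_read_result(shutdown_first=True))
```

On Linux I would expect the first call to report ECONNRESET and the second to report EOF, though the exact behavior depends on the TCP stack, so treat this as a way to poke at the problem and not as a description of what stepd does internally.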
I've come up with a seemingly functional fix: if eio_handle_mainloop exits before eio_signal_shutdown is called, it makes sure the eio objects are marked as shut down and that shutdown() is then called. The proposed fix is here:
I've attempted to run the regression test suite but haven't had much luck. My SLURM VM is being a little squirrelly. I also plan to write a regression test for this problem but I haven't made it that far yet.
-Aaron