Hello,
I'm trying to use libuv on a heavily overloaded
Windows system and am running into an issue with the way my application
sends data. I had expected I could queue uv_write requests to the library and,
at least for TCP, the library would queue the data to be sent and call back once the
send had completed, offering the same in-order transmission guarantee as TCP with a nice
asynchronous interface. In doing this under load I was unfortunately running
into a scenario where my application was getting write failures. In trying to
fix this I discovered the error codes I was getting were non-fatal. I inspected
the implementation of uv_write for TCP connections on Windows and it appears to
call WSASend internally and then, if WSASend returns an error other than WSA_IO_PENDING,
it will post the error response to the event loop. This behavior was added about 5 years
ago and seems consistent with the Unix implementation. My
concern is that when WSASend returns an error, uv_write returns 0 and the event is posted asynchronously
even if the error does not indicate the connection has been lost. This would be
fine if I weren't using a stream, and I could be reading the queuing of errors incorrectly,
but given that the queued error might be temporary, the next write might succeed even
though the previous request failed. I fear this could result in missing
segments or, in the case of application retries, octets being transmitted out
of order in cases where multiple writes are queued to the same stream. In addition,
it causes somewhat unfortunate behavior for me on Windows where I expected to
be able to queue as many messages as I had memory for and receive callbacks
only when the send succeeds or when the connection has failed. I managed a few
workarounds: ensuring I only ever have one message in flight at a time and
queuing the rest in my application, or, relying on the library's implementation
making WSASend the last Winsock call in uv_write, calling WSASetLastError(0)
before invoking uv_write and then calling WSAGetLastError() on return to see
if I should queue the message on my end. In looking for existing solutions I found
this message from 2013 indicating it was known the API could produce
retriable failures and that a possible but unimplemented solution would be to queue the requests
internally, but this seems not to have been considered when the API was changed
to defer the failures asynchronously.
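
A minimal sketch of the first workaround (one message in flight, the rest
queued in the application) might look like the following. None of it comes
from libuv: submit_fn, out_queue, and demo_submit are hypothetical names.
submit_fn stands in for a wrapper around uv_write, and out_queue_on_complete
is meant to be called from the write callback. The sketch also assumes a
failed request transmitted nothing, so resending it first keeps the octets in
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook standing in for uv_write(): starts one asynchronous send
 * and returns 0 if the request was accepted. */
typedef int (*submit_fn)(void *ctx, const char *data, size_t len);

typedef struct out_msg {
    struct out_msg *next;
    size_t len;
    char data[];              /* payload copy (C99 flexible array member) */
} out_msg;

typedef struct {
    out_msg *head, *tail;     /* head is the oldest message not yet confirmed */
    int in_flight;            /* 1 while head has been handed to submit() */
    submit_fn submit;
    void *ctx;
} out_queue;

static void out_queue_init(out_queue *q, submit_fn submit, void *ctx) {
    q->head = q->tail = NULL;
    q->in_flight = 0;
    q->submit = submit;
    q->ctx = ctx;
}

/* Hand the head message to submit() unless a send is already outstanding.
 * If submit() fails synchronously the message stays queued for a later pump. */
static void out_queue_pump(out_queue *q) {
    if (q->in_flight || q->head == NULL) return;
    if (q->submit(q->ctx, q->head->data, q->head->len) == 0)
        q->in_flight = 1;
}

/* Copy the payload, append it, and start sending if the stream is idle. */
static int out_queue_push(out_queue *q, const char *data, size_t len) {
    out_msg *m = malloc(sizeof *m + len);
    if (m == NULL) return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, data, len);
    if (q->tail) q->tail->next = m; else q->head = m;
    q->tail = m;
    out_queue_pump(q);
    return 0;
}

/* Call from the write callback. status is the callback's status; retriable is
 * the application's own judgment (an assumption here) that the error is
 * temporary and the same message should be sent again. */
static void out_queue_on_complete(out_queue *q, int status, int retriable) {
    assert(q->in_flight && q->head != NULL);
    q->in_flight = 0;
    if (status != 0 && retriable) {
        out_queue_pump(q);            /* resend the same head, order preserved */
        return;
    }
    do {                              /* success: drop head; fatal: drop all */
        out_msg *m = q->head;
        q->head = m->next;
        free(m);
    } while (status != 0 && q->head != NULL);
    if (q->head == NULL) q->tail = NULL;
    out_queue_pump(q);
}

/* Demo stand-in for the transport: records what it was asked to send. */
typedef struct { char log[64]; size_t used; } demo_sink;

static int demo_submit(void *ctx, const char *data, size_t len) {
    demo_sink *s = ctx;
    if (s->used + len > sizeof s->log) return -1;
    memcpy(s->log + s->used, data, len);
    s->used += len;
    return 0;
}
```

In a real program the submit hook would call uv_write on the stream and the
uv_write_cb would pass its status to out_queue_on_complete. Deciding which
status values count as retriable is left to the caller in this sketch.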
If anyone has any insight on how to better handle non-fatal uv_write errors on Windows I would appreciate it. Thanks in advance!
-Kaylin