--
You received this message because you are subscribed to the Google Groups "nxweb" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nxweb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi,
here is my investigation:
Overall error-to-request ratio: 2.5% in a sample of 2M requests.
The errors are conditions that the server treats as faults: it logs them and then actively cuts off the connection via nxweb_http_server_connection_finalize.
I am worried that this cut-off is triggered in cases where it is not warranted.
The cases that concern me are when:
rc = 0 (98% of all errors)
state = HSP_RECEIVING_BODY (there are a few logs with HSP_HANDLING and HSP_SENDING_HEADERS, but they seem OK to me)
For i, the distribution is NXE_ERROR: 53%, NXE_RDHUP: 36%, NXD_HSP_READ_TIMEOUT: 10%, NXE_RDCLOSED: 0.3%.
Let’s consider NXD_HSP_READ_TIMEOUT and NXE_RDCLOSED to be valid reasons for cutting off the connection.
I am dubious about NXE_RDHUP. This would suggest a remote hangup, but I have my suspicions that this might be flawed as that number is rather high.
so: I look at i = NXE_ERROR and NXE_RDHUP
For errno, the distribution is EAGAIN: 84%, ECONNRESET: 16%. There is no correlation between i and errno, as errno is spread out evenly.
I have the impression that the errno value that is logged is of no use, as it is the errno at the moment of the log, not at the moment of the error detection. So that value just indicates the last error that has occurred somewhere else in the thread, potentially not related to the situation that finally will provoke the connection cutoff.
so: I ignore errno
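
To illustrate the errno remark, here is a minimal sketch in plain POSIX C, not nxweb code. The names read_and_capture, unrelated_failing_call and demo_stale_errno are hypothetical, made up for this example. It shows that errno read at log time can differ from errno at the moment read() failed, and that taking a snapshot right after the failing call avoids this.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical record, not an nxweb type. */
struct read_result {
    ssize_t rc;       /* return value of read() */
    int saved_errno;  /* errno captured right after read(), 0 on success */
};

/* Read once and capture errno before any other call can overwrite it. */
static struct read_result read_and_capture(int fd, void *buf, size_t len) {
    struct read_result r;
    errno = 0;
    r.rc = read(fd, buf, len);
    r.saved_errno = (r.rc < 0) ? errno : 0;
    return r;
}

/* Put a descriptor into non-blocking mode, preserving its other flags. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Stand-in for unrelated work in the same thread that fails. */
static void unrelated_failing_call(void) {
    (void)close(-1);  /* fails and sets errno to EBADF */
}

/* Returns 1 if the snapshot kept EAGAIN while the late errno became EBADF,
 * 0 if not, -1 if the setup failed. */
static int demo_stale_errno(void) {
    int fds[2];
    char buf[16];
    if (pipe(fds) != 0) return -1;
    if (set_nonblocking(fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    /* The pipe is empty and its write end is open, so read() fails with EAGAIN. */
    struct read_result r = read_and_capture(fds[0], buf, sizeof buf);
    unrelated_failing_call();
    int late_errno = errno;  /* what a log statement placed here would report */
    close(fds[0]);
    close(fds[1]);
    return r.rc < 0 && r.saved_errno == EAGAIN && late_errno == EBADF;
}
```

If the server log recorded a value captured like saved_errno instead of reading errno at log time, the errno column could be correlated with i again.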
br is irrelevant to this case, as it indicates the size of the client-provided headers.
NXE_ERROR:
there is 1 case in sock_data_recv_read(): read() < 0 and errno != EAGAIN (but note that I see many cases of i=NXE_ERROR, errno=EAGAIN logged, but then again, see the remark about errno)
I see 3 cases in nx_event.c, all related to EPOLLERR. I do not (yet) understand how these cases are triggered.
NXE_RDHUP:
I see 3 cases in nx_event.c, all related to EPOLLRDHUP. I do not (yet) understand how these cases are triggered.
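
As a starting point for understanding the EPOLLRDHUP cases, here is a minimal Linux-only sketch, not taken from nx_event.c. It shows one way the kernel raises EPOLLRDHUP: the peer shuts down its writing half, possibly right after sending data. It uses a local AF_UNIX socket pair rather than TCP, and the function name events_after_peer_half_close is hypothetical. Whether any of the three nxweb cases corresponds to this situation is an assumption that still has to be checked.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the epoll event mask reported on one end of a socket pair after
 * the other end writes payload_len bytes (capped at 64) and shuts down its
 * writing half. Returns -1 if the setup fails or no event arrives. */
static int events_after_peer_half_close(size_t payload_len) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLRDHUP;  /* EPOLLRDHUP must be requested explicitly */
    ev.data.fd = sv[0];

    int result = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0) {
        char payload[64] = {0};
        int ok = 1;
        if (payload_len > sizeof payload) payload_len = sizeof payload;
        if (payload_len > 0 &&
            write(sv[1], payload, payload_len) != (ssize_t)payload_len) {
            ok = 0;
        }
        /* The "client" is done sending: half-close, like shutdown(SHUT_WR). */
        if (ok && shutdown(sv[1], SHUT_WR) == 0) {
            struct epoll_event out;
            if (epoll_wait(ep, &out, 1, 1000) == 1) result = (int)out.events;
        }
    }

    close(ep);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

In this sketch the mask for a non-empty payload contains both EPOLLIN and EPOLLRDHUP, so the hangup is reported while the sent bytes are still unread. One possibility worth ruling out is that a client which half-closes right after sending its body gets cut off before the remaining bytes are read.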
I am trying to create a reproducible case with netem, but have not found a good case yet. Will continue to try, but here you have the first feedback.
Hans