On Mon, Aug 12, 2013 at 3:51 AM, ErnieOnTheRun <spanger...@gmail.com> wrote:
>> You should at least consider upgrading to the latest v0.10 release
>> (v0.10.13 as of this writing) because you're a number of bug fixes
>> behind.
>>
> --> Thank you for the advice. I will discuss this. I will use this mainly on
> Linux (and Windows). Do you have any decisive argument (key bug fix) to help
> me convince others to move from v0.10.3 to v0.10.13? I looked over the notes
> in the different releases but it was hard to judge how critical those fixes
> are. I would appreciate your comment on this.
There were some fixes for older Linux* kernels in v0.10.11 that you
probably want. Said kernels report errors in an unusual way and libuv
didn't handle that correctly, resulting in a busy loop.
Another busy loop when hitting the file descriptor limit while
listening for incoming connections was fixed in v0.10.6.
* I say 'Linux' but I could only reproduce it on CentOS systems. The
delta between RHEL kernels and mainline is huge so it's possible it's
some kind of RHEL/CentOS-only regression.
>> >> I test on a number of platforms - but primarily x86_64 Linux 3.9+ and
>> >> amd64 FreeBSD 8 and 9 - and a range of gcc releases: gcc 4.2.1 to 4.8
>> >> (4.2.1 only because that's the gcc that ships with OS X and the BSDs.)
>> >>
>> > --> From this I understand that all your testing is done on 64-bit
>> > platforms. This might be related to the root cause of
>> > the unit test failures. Am I missing something ? I might need to do more
>> > testing on a 64-bit OS to compare ?
>>
>> Sorry, what I mean is that I _personally_ mostly test on 64-bit
>> platforms. Our Jenkins setup tests on a matrix of 32-bit and 64-bit
>> platforms.
>>
> --> Thanks for the additional info. That is good to know. So I assume the
> timing issues must be related to some particularities of the CentOS system
> and hardware I am working on (?)
That sounds plausible.
>> Going back to the failing tests, loop_stop and
>> tcp_close_while_connecting are probably timing issues.
>>
>> That last one we could probably address. The test sets a 50 ms timer,
>> then tries to connect to 1.2.3.4. What happens when you set the
>> timeout to zero? You can find it in
>> test/test-tcp-close-while-connecting.c.
>
> --> When I ran this with the timeout as 0, it always passed. What exactly is
> this timeout and why does it pass with the timeout being set to 0 ?
The test tries to connect to a non-routable (or at least unreachable)
address. It usually takes seconds if not minutes for a connection to
time out but if you're in an environment where an upstream router or
firewall rejects the connection immediately, then the connect attempt
fails before the test's 50 ms timer fires, which is sooner than the
test expects. That's why the zero timeout fixes it (for certain values
of 'fix').
If you open an issue, I'll look into fixing it properly.
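
To see which case your network falls into, here is a rough sketch using
plain POSIX sockets rather than libuv. It is not the code from the test;
probe_connect and the PROBE_* names are made up for illustration. It starts
a non-blocking connect and reports whether the attempt is still pending
once the given timeout has elapsed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical outcome codes for this sketch; they are not libuv names. */
enum probe_result {
  PROBE_SETUP_ERROR = -2,  /* socket(), fcntl() or poll() failed */
  PROBE_BAD_ADDRESS = -1,  /* ip is not a valid IPv4 literal */
  PROBE_PENDING = 0,       /* connect still in progress after timeout_ms */
  PROBE_CONNECTED = 1,     /* connect succeeded within timeout_ms */
  PROBE_FAILED = 2         /* connect failed within timeout_ms */
};

/* Start a non-blocking connect and wait at most timeout_ms for an outcome.
 * On PROBE_FAILED or PROBE_SETUP_ERROR, *err_out holds the errno value. */
enum probe_result probe_connect(const char *ip, int port, int timeout_ms,
                                int *err_out)
{
  struct sockaddr_in addr;
  struct pollfd pfd;
  enum probe_result result;
  socklen_t len;
  int fd, flags, err;

  assert(ip != NULL && err_out != NULL);
  *err_out = 0;

  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons((unsigned short) port);
  if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
    return PROBE_BAD_ADDRESS;

  fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) {
    *err_out = errno;
    return PROBE_SETUP_ERROR;
  }

  flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
    *err_out = errno;
    close(fd);
    return PROBE_SETUP_ERROR;
  }

  if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0) {
    result = PROBE_CONNECTED;          /* completed synchronously */
  } else if (errno != EINPROGRESS) {
    *err_out = errno;                  /* rejected synchronously */
    result = PROBE_FAILED;
  } else {
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    switch (poll(&pfd, 1, timeout_ms)) {
    case 0:                            /* nothing happened before the timeout */
      result = PROBE_PENDING;
      break;
    case -1:
      *err_out = errno;
      result = PROBE_SETUP_ERROR;
      break;
    default:                           /* outcome known: read it via SO_ERROR */
      err = 0;
      len = sizeof(err);
      if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        err = errno;
      *err_out = err;
      result = err == 0 ? PROBE_CONNECTED : PROBE_FAILED;
      break;
    }
  }

  close(fd);
  return result;
}
```

Calling probe_connect("1.2.3.4", 80, 50, &err) (the port is an arbitrary
choice here) and getting PROBE_FAILED would suggest that something upstream
rejects the connection within 50 ms. PROBE_PENDING is the outcome the test
is counting on.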
>> I don't know why spawn_setuid_setgid is failing for you. Try running
>> it in gdb, the system error will be in handle->loop->last_err.
>
> --> The system error I get is: "{code = UV_EACCES, sys_errno_ = 13}", which
> looks like an access (permissions?) issue. Any idea why I would see such an
> error if I run as root?
The test changes user to 'nobody', then tries to spawn
`path/to/run-tests some_helper`. I suspect that the permissions of
run-tests prevent user nobody from executing it.
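
One way to check that suspicion is to walk the path and look at the mode
bits. The following is only a sketch: the function names are hypothetical,
it ignores ACLs, supplementary groups and security modules such as SELinux,
and it only accepts absolute paths. Note that EACCES can also come from a
parent directory that user nobody is not allowed to search (a build tree
under a home directory with mode 0700, for example), so the sketch checks
every directory component as well as the file itself.

```c
#include <assert.h>
#include <pwd.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if the owner/group/other mode bits grant execute (or, for a
 * directory, search) permission to the given uid/gid, 0 otherwise. */
int mode_allows_exec(mode_t mode, uid_t file_uid, gid_t file_gid,
                     uid_t uid, gid_t gid)
{
  if (uid == 0)  /* root may execute if any execute bit is set */
    return (mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
  if (uid == file_uid)
    return (mode & S_IXUSR) != 0;
  if (gid == file_gid)
    return (mode & S_IXGRP) != 0;
  return (mode & S_IXOTH) != 0;
}

/* Checks each directory prefix of an absolute path and the final component.
 * Returns 1 if every component grants x to the user, 0 if one denies it,
 * and -1 if the path is not absolute, too long, or a lookup fails. */
int user_can_reach_and_execute(const char *user, const char *path)
{
  char buf[4096];
  struct passwd *pw;
  struct stat st;
  size_t i, len;
  char saved;

  assert(user != NULL && path != NULL);
  len = strlen(path);
  if (path[0] != '/' || len >= sizeof(buf))
    return -1;

  pw = getpwnam(user);
  if (pw == NULL)
    return -1;

  /* The root directory itself is the first component. */
  if (stat("/", &st) != 0)
    return -1;
  if (!mode_allows_exec(st.st_mode, st.st_uid, st.st_gid,
                        pw->pw_uid, pw->pw_gid))
    return 0;

  memcpy(buf, path, len + 1);
  for (i = 1; i <= len; i++) {
    if (buf[i] != '/' && buf[i] != '\0')
      continue;
    saved = buf[i];
    buf[i] = '\0';  /* temporarily cut the path after this component */
    if (stat(buf, &st) != 0)
      return -1;
    if (!mode_allows_exec(st.st_mode, st.st_uid, st.st_gid,
                          pw->pw_uid, pw->pw_gid))
      return 0;
    buf[i] = saved;
  }
  return 1;
}
```

Passing "nobody" and the absolute path of your run-tests binary should
return 0 if one of the components is the culprit. If it returns 1, the
cause is probably something the sketch does not look at.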