epoll bug with connect

1,378 views
Skip to first unread message

ja...@eggnet.com

unread,
Jul 24, 2014, 1:55:40 AM7/24/14
to golan...@googlegroups.com
Hello,

I noticed a bug related to detecting the connection refused error on connect().

The net effect is, that a write() will eventually error out with "connection refused" which is quite odd of course, since a connection refused error should be caught on connect, not write.

Here is a simple program that demonstrates the problem: http://play.golang.org/p/vyLWv-DEjf

The catch is, the target address has to be roughly at least 9ms away. At least that's how I was able to replicate the problem. 127.0.0.1 will not work, please change that before testing.

The test is working (rather, exposing the failure) if you see more than one of these:

2014/07/23 21:51:20 write error: write tcp4 69.28.182.189:5678: connection refused

The rest of this post is based on strace information, collected by executing this:

strace -ttt -f -o strace.out ./test

The vast majority of the time, the connection refused error will be detected on connect, more specifically by getsockopt after epoll_wait notifies a thread that a particular socket is ready for writes:

20161 1406175892.160793 getsockopt(61, SOL_SOCKET, SO_ERROR, [111], [4]) = 0

Error 111 is "connection refused."

But every so often, the getsockopt check happens too early, well before the connection could possibly have failed. Of course the connection does fail but isn't noticed until the write.

The issue seems to relate to the interaction of different threads:

20161 1406175892.159823 getsockopt(50, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.159976 getsockopt(49, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.160128 getsockopt(48, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.160316 getsockopt(62, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.160483 getsockopt(60, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.160642 getsockopt(63, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.160793 getsockopt(61, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20159 1406175892.180376 getsockopt(34, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
20159 1406175892.188062 getsockopt(5, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.189218 getsockopt(33, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.189389 getsockopt(30, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.189563 getsockopt(29, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.189721 getsockopt(31, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.189901 getsockopt(27, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.190043 getsockopt(28, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.190189 getsockopt(26, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.190358 getsockopt(22, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.190498 getsockopt(25, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.190639 getsockopt(24, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
20161 1406175892.190785 getsockopt(23, SOL_SOCKET, SO_ERROR, [111], [4]) = 0


Here are what I believe are the relevant syscalls:

20161 1406175892.180009 epoll_wait(4,  <unfinished ...>

20159 1406175892.180165 epoll_ctl(4, EPOLL_CTL_ADD, 34, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=3362912440, u64=140405843818680}}) = 0

20161 1406175892.180246 <... epoll_wait resumed> {{EPOLLOUT|EPOLLHUP, {u32=3362912440, u64=140405843818680}}}, 128, 4294967295) = 1

20159 1406175892.180290 connect(34, {sa_family=AF_INET, sin_port=htons(5678), sin_addr=inet_addr("x.x.x.x")}, 16 <unfinished ...>

20161 1406175892.180329 epoll_wait(4,  <unfinished ...>

20159 1406175892.180359 <... connect resumed> ) = -1 EINPROGRESS (Operation now in progress)

20159 1406175892.180376 getsockopt(34, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

20159 1406175892.180745 write(34, "hi", 2) = -1 EAGAIN (Resource temporarily unavailable)

20161 1406175892.193698 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362912440, u64=140405843818680}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362912088, u64=140405843818328}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362918640, u64=140405843824880}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362911736, u64=140405843817976}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362917056, u64=140405843823296}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362917936, u64=140405843824176}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362915432, u64=140405843821672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362922160, u64=140405843828400}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362916880, u64=140405843823120}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362918112, u64=140405843824352}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362911560, u64=140405843817800}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362912616, u64=140405843818856}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362918288, u64=140405843824528}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362921632, u64=140405843827872}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362913672, u64=140405843819912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362919168, u64=140405843825408}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=3362912792, u64=140405843819032}}}, 128, 0) = 17

20161 1406175892.196158 write(34, "hi", 2) = -1 ECONNREFUSED (Connection refused)

Thanks,
 -Jason

Dmitry Vyukov

unread,
Jul 24, 2014, 3:34:55 AM7/24/14
to ja...@eggnet.com, golang-nuts
Hi,


Looks like some issue with linux. Go syscall sequence looks
reasonable. While kernel signals EPOLLHUP during epoll_wait, but then
returns no error from getsockopt(SO_ERROR).
I guess that a C program would have the same behavior.

Does it negatively affect you application?

ja...@eggnet.com

unread,
Jul 24, 2014, 1:03:00 PM7/24/14
to golan...@googlegroups.com, ja...@eggnet.com
Dmitry,

Thanks for looking at this. No it doesn't negatively affect my application. It might if it happened in a significant percentage of cases, but it doesn't appear to.

Incidentally I tried this on 2.6.18-128.el5, 3.2.0-29-generic, as well as 3.15.4-x86_64-linode45 which is the latest kernel I could easily test. The behavior is absolutely identical on all of those kernels.

If it's a bug in linux, it's very old.

Thanks,
 -Jason

branimir....@gmail.com

unread,
Jul 24, 2014, 2:46:17 PM7/24/14
to golan...@googlegroups.com, ja...@eggnet.com


On Thursday, July 24, 2014 7:55:40 AM UTC+2, ja...@eggnet.com wrote:
Hello,

I noticed a bug related to detecting the connection refused error on connect().

The net effect is, that a write() will eventually error out with "connection refused" which is quite odd of course, since a connection refused error should be caught on connect, not write.

 
Here is a simple program that demonstrates the problem: http://play.golang.org/p/vyLWv-DEjf

 Here are what I believe are the relevant syscalls:

20161 1406175892.180009 epoll_wait(4,  <unfinished ...>

20159 1406175892.180165 epoll_ctl(4, EPOLL_CTL_ADD, 34, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=3362912440, u64=140405843818680}}) = 0

20161 1406175892.180246 <... epoll_wait resumed> {{EPOLLOUT|EPOLLHUP, {u32=3362912440, u64=140405843818680}}}, 128, 4294967295) = 1

20159 1406175892.180290 connect(34, {sa_family=AF_INET, sin_port=htons(5678), sin_addr=inet_addr("x.x.x.x")}, 16 <unfinished ...>

20161 1406175892.180329 epoll_wait(4,  <unfinished ...>

20159 1406175892.180359 <... connect resumed> ) = -1 EINPROGRESS (Operation now in progress)

20159 1406175892.180376 getsockopt(34, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

20159 1406175892.180745 write(34, "hi", 2) = -1 EAGAIN (Resource temporarily unavailable)


Seems that golang does not wait for connection to finish and uses non blocking  connect.

20161 1406175892.196158 write(34, "hi", 2) = -1 ECONNREFUSED (Connection refused)

Yes this is error from connect which comes late. I guess that first golang have to wait for connection to finish.
I have tried different combinations of Dial and error seems consistent. You should
report bug.

 
Thanks,
 -Jason

ja...@eggnet.com

unread,
Jul 24, 2014, 3:15:31 PM7/24/14
to golan...@googlegroups.com, ja...@eggnet.com
Dmitry,

After speaking about this more thoroughly internally, I'm going to revise my position.

It does appear to be the case that at least some code bases use the same approach as go's runtime in terms of calling getsockopt in response to an epoll_wait event following connect. In particular, nginx.

However we have noticed that kernels, both linux and freebsd with epoll and kqueue, have a tendency to generate spurious events occasionally. Presumably it is difficult to prevent this.

In general that means the best approach is to just retry the previous system call, very similar to what go does when it thinks the connect has completed, it just tries the write() call again even though epoll_wait has signaled errors with the associated events.

Sure... getsockopt(SO_ERROR) should really have responded with EINPROGRESS instead of 0. But clearly that's an extremely old bug that, even if it was fixed today, would take a very long time to be generally deployed.

We have another pure C code base, which is event driven, deployed at significant scale. Calling connect() again is the approach we use and it works across both linux and freebsd.

Of course that code base does different things than the go code base that I discovered this problem in, and it is theoretically possible that calling connect() again is just as ineffective as calling getsockopt().

I'll see what I can do about proving that calling connect() again actually does fix this issue.

ja...@eggnet.com

unread,
Jul 24, 2014, 3:27:46 PM7/24/14
to golan...@googlegroups.com, ja...@eggnet.com, branimir....@gmail.com
I agree there is a problem here, which is why I posted this in the first place.

But it's not clear if this is a problem go can fix or not. getsockopt should have been good enough to detect the spurious event but it appears to not be the case. I'm going to try setting up a test that calls connect again instead.

20159 1406175892.180376 getsockopt(34, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

Go does try to wait for a successful connect. go calls getsockopt.

That above line should have shown [115] which is EINPROGRESS, but it showed [0] which basically implies that the connect succeeded. Obviously that is not the case.

I'm going to attempt a test that calls connect instead of getsockopt, to see if that works reliably.

Mikio Hara

unread,
Jul 24, 2014, 3:31:43 PM7/24/14
to ja...@eggnet.com, golang-nuts, branimir....@gmail.com
On Fri, Jul 25, 2014 at 4:27 AM, <ja...@eggnet.com> wrote:

> I'm going to attempt a test that calls connect instead of getsockopt, to see
> if that works reliably.

golang.org/issue/8276 ?

branimir....@gmail.com

unread,
Jul 24, 2014, 3:39:49 PM7/24/14
to golan...@googlegroups.com, ja...@eggnet.com


On Thursday, July 24, 2014 9:15:31 PM UTC+2, ja...@eggnet.com wrote:
Dmitry,

After speaking about this more thoroughly internally, I'm going to revise my position.

It does appear to be the case that at least some code bases use the same approach as go's runtime in terms of calling getsockopt in response to an epoll_wait event following connect. In particular, nginx.

In C/C++ I am waiting for EPOLLIN in order to get response from non blocking connect. It never failed me.


However we have noticed that kernels, both linux and freebsd with epoll and kqueue, have a tendency to generate spurious events occasionally. Presumably it is difficult to prevent this.

In general that means the best approach is to just retry the previous system call, very similar to what go does when it thinks the connect has completed, it just tries the write() call again even though epoll_wait has signaled errors with the associated events.

Sure... getsockopt(SO_ERROR) should really have responded with EINPROGRESS instead of 0. But clearly that's an extremely old bug that, even if it was fixed today, would take a very long time to be generally deployed.

We can;t see what is result from getsockopt, because getsockopt returns 0 on successfull call and  in case of SOL_SOCKET,SO_ERROR it returns error in variable. I guess
that there is no error, anyway, since no operation on that socket is finished by that time. There should be epoll_wait on *input*
in order to get error from getsockopt. It never fails me in C on Linux.

 


branimir....@gmail.com

unread,
Jul 24, 2014, 5:04:12 PM7/24/14
to golan...@googlegroups.com, ja...@eggnet.com
Hm, I cannot reproduce problem in go if ip connection address is in doted format. However if I place domain name that should
be resolved, problem occurs. Spurious wakeup DNS related? I have started torrents seed in order to make process really slow
so I can eliminate if DNS case is just speed, seems that there is definitely some problem.


On Thursday, July 24, 2014 7:55:40 AM UTC+2, ja...@eggnet.com wrote:

ja...@eggnet.com

unread,
Jul 24, 2014, 5:17:57 PM7/24/14
to golan...@googlegroups.com, branimir....@gmail.com
Wow heh. go was calling connect() but dragonfly broke it, now go calls getsockopt.

ja...@eggnet.com

unread,
Jul 24, 2014, 6:49:14 PM7/24/14
to golan...@googlegroups.com, branimir....@gmail.com, ja...@eggnet.com
I have a simpler test that shows this happening more reliably:


There is just one goroutine. It only shows a problem when run under strace, presumably due to timing. I can't really see what's happening when I'm not running strace.

The spurious epoll_wait event appears to be from a previous close on the same file descriptor. In this trace example ECONNREFUSED on write happens from the first connect.

As a side note, this validation step that appears to be performed for every connect seems like a waste? I can't figure out what the purpose of it is. It seems like some kind of test connect() when the real connect() would provide the same information. My guess is, it is this validation step that increases the likelihood of go code running into this problem at all.

The interesting thing about this code is, it is not controlled by epoll. In other words, this probing code is using FD 3 but it never added or removed it from epoll. It appears to be these non-epoll uses of a file descriptor followed by epoll using them, that cause the spurious epoll_wait events... but really I have no idea.

When the code later uses FD 3 with epoll, it sees the close(3) event.

27040 1406238721.390444 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
27040 1406238721.390467 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "far_end_ipv6", &sin6_addr), s
in6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
27040 1406238721.390505 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
27040 1406238721.390530 connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("far_end_ipv6")}, 16) = 0
27040 1406238721.390556 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(57305), inet_pton(AF_INET6, "::ffff:near_end_ipv4", &sin6_addr), sin6_flo
winfo=0, sin6_scope_id=0}, [28]) = 0

Here is the close that will trigger the epoll_wait event:

27040 1406238721.390604 close(3)        = 0

27040 1406238721.390715 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3

27040 1406238721.390775 epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=673338472, u64=139647240000616}}) = 0

27040 1406238721.390810 connect(3, {sa_family=AF_INET, sin_port=htons(3456), sin_addr=inet_addr("far_end_ipv4")}, 16) = -1 EINPROGRESS (Operation now in progress)

27040 1406238721.390877 epoll_wait(4, {}, 128, 0) = 0

I'm assuming this event is triggered by the close 300 nanoseconds ago on the same file descriptor, plus the fact that we are now monitoring FD 3 again:

27037 1406238721.390931 <... epoll_wait resumed> {{EPOLLOUT|EPOLLHUP, {u32=673338472, u64=139647240000616}}}, 128, 4294967295) = 1

getsockopt doesn't consider an EINPROGRESS connect any kind of failure:

27037 1406238721.390979 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0

Therefore time to write:

27037 1406238721.391156 write(3, "hi", 2) = -1 EAGAIN (Resource temporarily unavailable)

27037 1406238721.391209 epoll_wait(4,  <unfinished ...>
27037 1406238721.400103 <... epoll_wait resumed> {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=673338472, u64=139647240000616}}}, 128, 4294967295) = 1

Write fails:

27037 1406238721.400210 write(3, "hi", 2) = -1 ECONNREFUSED (Connection refused)


ja...@eggnet.com

unread,
Jul 24, 2014, 8:33:23 PM7/24/14
to golan...@googlegroups.com, branimir....@gmail.com
I tried using literal IP addresses and from what I can tell the main difference when that occurs, is this is skipped on every connect. I think this is the primary cause of the spurious close events that we are seeing.

I haven't figured out yet how Dialing with a normal hostname leads to these syscalls.

If anyone knows, please update this thread.

6408  1406246292.447895 socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
6408  1406246292.447932 connect(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ipv6_remote_addr", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
6408  1406246292.447972 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
6408  1406246292.447998 connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("ipv4_remote_addr")}, 16) = 0
6408  1406246292.448026 getsockname(3, {sa_family=AF_INET6, sin6_port=htons(60854), inet_pton(AF_INET6, "::ffff:ipv4_local_addr", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
6408  1406246292.448053 close(3)        = 0

ja...@eggnet.com

unread,
Jul 25, 2014, 5:18:37 AM7/25/14
to golan...@googlegroups.com, branimir....@gmail.com
Well, it turns out these extra syscalls aren't in go at all, they're caused by calling libc's getaddrinfo.

If I switch to using go's internal resolver, this problem vanishes.

I ran this from the go/src directory:

go install -a -tags netgo std

compiled the test code with the tag:

go build -tags netgo test.go

$ ldd test
        not a dynamic executable

And the problem vanishes for both test programs I posted.

It's not clear to me if this is the same issue but, this might be relevant:


To be clear I'm considering this a victory. Perhaps more importantly, with much more efficient and clear DNS resolution.

My personal vote is to make go's internal DNS resolution the default, for what it's worth.

ja...@eggnet.com

unread,
Jul 25, 2014, 8:32:06 PM7/25/14
to golan...@googlegroups.com
Wrong again... I'm pretty sure I've now identified the problem as a simple bug in go. It just takes more concurrency to trigger the issue than before.

go is calling epoll_ctl ADD before connect which I believe is incorrect. If another thread is in epoll_wait, that thread can (will?) get an event with EPOLLOUT|EPOLLHUP flags before connect is called, which is what has been happening.

I wrote a quick test in C, relevant syscalls highlighted below. The test was to make sure that epoll_wait was able to see the connect event even after a substantial delay between connect and epoll_ctl.

1406334260.600586 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000013>
1406334260.600629 setsockopt(4, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000006>
1406334260.600671 connect(4, {sa_family=AF_INET, sin_port=htons(3456), sin_addr=inet_addr("69.28.182.189")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000054>
1406334260.600808 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000005>
1406334260.600859 rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 <0.000005>
1406334260.600913 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000004>
1406334260.600946 nanosleep({5, 0}, 0x7fff83894450) = 0 <5.000141>
1406334265.601188 epoll_ctl(3, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=4, u64=18023070048452612}}) = 0 <0.000010>
1406334265.601237 epoll_wait(3, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=4, u64=18023070048452612}}}, 1, 10000) = 1 <0.000005>
1406334265.601310 getsockopt(4, SOL_SOCKET, SO_ERROR, [111], [4]) = 0 <0.000006>
1406334265.601350 close(4)              = 0 <0.000018>


Here's a clearer picture of what is happening in go:


This is with go's internal resolver.

Running the trace like this:

strace -ttt -T -ff -o a ./test hostname:3456

Find the write that returns ECONNREFUSED.

Determine which file descriptor it is, then determine which u64 ID is used with it in epoll_ctl. This should show all relevant syscalls that relate to the file descriptor or u64 ID.

The spurious epoll_wait event fires between epoll_ctl and connect.

egrep '[(, ]215($|[), ])|139969772241984' a.* | perl -pe 's/^a.(\d+):(\S+)/$2 $1/' | sort

1406320197.172769 26587 getsockopt(439, SOL_SOCKET, SO_ERROR, [111], [4]) = 0 <0.000005>
1406320197.172794 26587 epoll_ctl(4, EPOLL_CTL_DEL, 439, {0, {u32=4213032, u64=4213032}}) = 0 <0.000006>
1406320197.172817 26587 close(439)            = 0 <0.000010>
1406320197.507617 26587 socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 439 <0.000010>
1406320197.507645 26587 setsockopt(439, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0 <0.000006>
1406320197.507672 26587 epoll_ctl(4, EPOLL_CTL_ADD, 439, {EPOLLIN|EPOLLOUT|EPOLLET|0x2000, {u32=2368844064, u64=139971058053408}}) = 0 <0.000006>
1406320197.507823 26588 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368724616, u64=139971057933960}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368839488, u64=139971058048832}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368774368, u64=139971057983712}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368821712, u64=139971058031056}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368839136, u64=139971058048480}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757648, u64=139971057966992}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368839312, u64=139971058048656}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757296, u64=139971057966640}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757120, u64=139971057966464}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368757472, u64=139971057966816}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368756944, u64=139971057966288}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368722504, u64=139971057931848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368722328, u64=139971057931672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368791088, u64=139971058000432}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790912, u64=139971058000256}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368780528, u64=139971057989872}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790560, u64=139971057999904}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790384, u64=139971057999728}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790736, u64=139971058000080}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790032, u64=139971057999376}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368790208, u64=139971057999552}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789680, u64=139971057999024}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789328, u64=139971057998672}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789856, u64=139971057999200}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789152, u64=139971057998496}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368788976, u64=139971057998320}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368789504, u64=139971057998848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368761696, u64=139971057971040}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368788800, u64=139971057998144}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368788624, u64=139971057997968}}, {EPOLLOUT|EPOLLHUP, {u32=2368844064, u64=139971058053408}}}, 128, 0) = 31 <0.000010>
1406320197.507942 26587 connect(439, {sa_family=AF_INET, sin_port=htons(3456), sin_addr=inet_addr("x")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000031>
1406320197.508003 26587 getsockopt(439, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000005>
1406320197.508081 26587 getsockname(439, {sa_family=AF_INET, sin_port=htons(36051), sin_addr=inet_addr("x")}, [16]) = 0 <0.000005>
1406320197.508109 26587 getpeername(439, 0x7f4d8cdb4560, [112]) = -1 ENOTCONN (Transport endpoint is not connected) <0.000005>
1406320197.508141 26587 setsockopt(439, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000006>
1406320197.508262 26587 write(439, "hi", 2)   = -1 EAGAIN (Resource temporarily unavailable) <0.000006>
1406320197.518050 26588 epoll_wait(4, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368843888, u64=139971058053232}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368843712, u64=139971058053056}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368843536, u64=139971058052880}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844768, u64=139971058054112}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368754128, u64=139971057963472}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368806224, u64=139971058015568}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368782992, u64=139971057992336}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368761168, u64=139971057970512}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368806048, u64=139971058015392}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368833504, u64=139971058042848}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368761344, u64=139971057970688}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368834208, u64=139971058043552}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368805344, u64=139971058014688}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368723736, u64=139971057933080}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368803760, u64=139971058013104}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368834032, u64=139971058043376}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368765568, u64=139971057974912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368770848, u64=139971057980192}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831568, u64=139971058040912}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831392, u64=139971058040736}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831216, u64=139971058040560}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368815904, u64=139971058025248}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368831040, u64=139971058040384}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368830864, u64=139971058040208}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368830688, u64=139971058040032}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368830512, u64=139971058039856}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844592, u64=139971058053936}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844416, u64=139971058053760}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844240, u64=139971058053584}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368723384, u64=139971057932728}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368820656, u64=139971058030000}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368767856, u64=139971057977200}}, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x2000, {u32=2368844064, u64=139971058053408}}, {EPOLLOUT|EPOLLHUP, {u32=2368771200, u64=139971057980544}}}, 128, 0) = 34 <0.000010>
1406320197.607291 26587 write(439, "hi", 2)   = -1 ECONNREFUSED (Connection refused) <0.000006>
1406320197.607607 26587 epoll_ctl(4, EPOLL_CTL_DEL, 439, {0, {u32=4213032, u64=4213032}}) = 0 <0.000006>
1406320197.607631 26587 close(439)            = 0 <0.000010>

The test C program:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <errno.h>

int main(int argc, char *argv[]) {
        int sock;
        int efd;
        int one = 1;
        int port;
        int res;
        int optval;
        int optlen = sizeof(optval);

        struct sockaddr_in server;
        struct epoll_event ev;

        /* first argument is the destination addr as an ipv4 literal */
        /* second argument is the destination port */

        if (argc < 3) {
                printf("usage: %s <target_ipv4_address> <port>\n", argv[0]);
                return 1;
        }

        port = atoi(argv[2]);

        efd = epoll_create1(O_CLOEXEC);
        if (efd < 0) {
                perror("epoll_create1");
                return 1;
        }

        sock = socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP);
        if (sock < 0) {
                perror("socket");
                return 1;
        }
        if (setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &one, sizeof(one)) < 0) {
                perror("setsockopt");
                return 1;
        }

        server.sin_addr.s_addr = inet_addr(argv[1]);
        server.sin_family = AF_INET;
        server.sin_port = htons(port);

        res = connect(sock, (struct sockaddr *)&server, sizeof(server));

        if (res >= 0) {
                printf("connect succeeded, not interesting\n");
                return 1;
        }

        if (errno != EINPROGRESS) {
                perror("connect");
                return 1;
        }


        /* try to "miss" the epoll event */
        sleep(5);

        ev.events = EPOLLIN|EPOLLOUT|EPOLLET|EPOLLRDHUP;
        ev.data.fd = sock;

        if (epoll_ctl(efd, EPOLL_CTL_ADD, sock, &ev) < 0) {
                perror("epoll_ctl");
                return 1;
        }

        res = epoll_wait(efd, &ev, 1, 10000);
        if (res < 0) {
                perror("epoll_wait");
                return 1;
        }

        if (res == 0) {
                printf("no events\n");
                return 1;
        }

        if (res > 1) {
                printf("impossible number of events: %d\n", res);
        }

        if (getsockopt(ev.data.fd, SOL_SOCKET, SO_ERROR, &optval, &optlen) < 0) {
                perror("getsockopt");
        }
        close(ev.data.fd);

        printf("socket fd %d has errno %d\n", ev.data.fd, optval);
}

ja...@eggnet.com

unread,
Jul 26, 2014, 12:48:56 AM7/26/14
to golan...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages