I'm writing a streaming socket server that needs to handle 10K concurrent
sockets. I wrote a passable version using just
::select() and non-blocking IO. I figured that IOCP should be able to
handle the job better, however I'm having some problems.
When the server starts I create the server socket and an io completion port.
When accepting new clients I just add the new client to the io completion
port and wait for the next.
I am having a problem that I don't understand: when I hit around 500
sockets, the call to CreateIoCompletionPort fails and GetLastError()
returns 6. I cannot find the meaning of error code 6 anywhere.
Have I reached some sort of limit I am unaware of regarding how many
sockets can be attached to an IOCP?
Secondly, what is the best way to detect/handle a disconnected socket? I
noticed an earlier post advising that calling closesocket is a bad idea when
using IOCP?
Right now I keep a context associated with every socket that contains the
total number of outstanding WSASend and WSARecv calls; when that count drops
to 0 I remove the client. If at any point WSARecv or WSASend
returns an error, I call 'closesocket' to get all outstanding calls to
return with errors.
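The bookkeeping described above could look roughly like the sketch below. It is only a model of the counting logic in portable C11, not Winsock code: the names conn_ctx, io_begin and io_end are hypothetical, and the actual WSASend/WSARecv calls appear only in comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-connection context; the field name is illustrative. */
typedef struct conn_ctx {
    atomic_int pending;   /* overlapped sends/receives posted but not yet completed */
} conn_ctx;

void conn_init(conn_ctx *c)
{
    atomic_init(&c->pending, 0);
}

/* Call immediately before posting a WSASend or WSARecv. */
void io_begin(conn_ctx *c)
{
    atomic_fetch_add(&c->pending, 1);
}

/* Call once for every dequeued completion, successful or not. If posting
 * the operation failed outright (an error other than WSA_IO_PENDING), no
 * completion will arrive, so the poster must call io_end itself.
 * Returns true when the count has just dropped to 0, meaning the caller
 * may remove the client. */
bool io_end(conn_ctx *c)
{
    int before = atomic_fetch_sub(&c->pending, 1);
    assert(before > 0);   /* more completions than posts would be a bug */
    return before == 1;
}
```

Because the decrement and the "was that the last one?" check are a single atomic step, two worker threads finishing at the same moment cannot both decide to remove the client.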
It sounds like in the normal shutdown case WSARecv will simply return 0
bytes read?
Any insight would be helpful,
Gordon
gscot...@hotmail.com
"Gordon Scott" <gscot...@hotmail.com> wrote in message
news:O2JazxZI...@TK2MSFTNGP11.phx.gbl...
> Hi All,
>
> I'm writing a streaming socket server that needs to handle 10K concurrent
> sockets. I wrote a passable version using just
> ::select() and non-blocking IO. I figured that IOCP should be able to
> handle the job better, however I'm having some problems.
>
> When the server starts I create the server socket and an io completion
> port.
> When accepting new clients I just add the new client to the io completion
> port and wait for the next.
>
> I am having a problem that I don't understand, when I hit around 500
> sockets, the call to CreateIoCompletionPort returns
> a failure and GetLastError() returns 6. I can not find anywhere the
> meaning of error code 6.
> Have I reached some sort of limit I am unaware of as regards to how many
> sockets can be attached to an IOCP?
Error code 6 is ERROR_INVALID_HANDLE.
http://msdn.microsoft.com/library/en-us/debug/base/system_error_codes__0-499_.asp?frame=true
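For quick reference while debugging, a tiny lookup along these lines can save a trip to the documentation. It is a sketch covering only a handful of Win32 and Winsock codes that tend to show up in IOCP servers; the function name win_error_name is made up for this example, and on Windows itself FormatMessage or the VS Error Lookup tool does the same job for every code.

```c
#include <assert.h>
#include <string.h>

/* Map a few numeric Win32/Winsock error codes to their symbolic names.
 * Deliberately incomplete: anything else comes back as "unknown". */
const char *win_error_name(int code)
{
    switch (code) {
    case 6:     return "ERROR_INVALID_HANDLE";   /* "The handle is invalid" */
    case 64:    return "ERROR_NETNAME_DELETED";  /* "The specified network name is no longer available" */
    case 10054: return "WSAECONNRESET";          /* connection reset by peer */
    case 10057: return "WSAENOTCONN";            /* socket is not connected */
    default:    return "unknown";
    }
}
```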
>
> Secondly, what is the best way to detect/handle a disconnected socket? I
> noticed an earlier post advising that calling closesocket is a bad idea
> when using IOCP?
> Right now I keep a context associated with every socket that contains the
> total number of outstanding WSASend and WSARecv calls, when that count
> drops to 0 I remove the client. If at any time during the way WSARecv or
> WSASend returns an error I am calling 'closesocket' to get all outstanding
> calls to return with errors.
>
> It sounds like in the normal shutdown case as WSARecv will simply return 0
> bytes read?
>
There is more than one way to detect a disconnected socket. Using
WSAAsyncSelect to wait for the FD_CLOSE event is one of them, and it will
probably work well for you when you use IOCP.
http://msdn.microsoft.com/library/en-us/winsock/winsock/wsaasyncselect_2.asp?frame=true
IOCP should be able to handle thousands of connections with no problem. I
based mine on this example:
http://www.codeproject.com/internet/iocp.asp
I create a preset/configurable number of sockets (10 by default) and set all
of them immediately to an AcceptEx state. That way if several sockets
connect rapidly, there are plenty of sockets ready and waiting. I also added
WSAEventSelect + FD_ACCEPT to the maintenance loop so that if it ever needs
more sockets, I can create an additional batch while keeping resource
consumption minimal. Of course I have to worry about denial-of-service
attacks, so I decrease the idle timeouts until the socket count becomes
reasonable.
In the above example (and my implementation), the callback will receive a
zero-byte packet when the peer closes the connection. Closing the socket from
the callback doesn't seem to have any side effects. I have experimented
with an asynchronous closesocket (closing from a separate thread) but the
outcomes are the same.
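As a rough illustration of that zero-byte rule, the decision a receive completion handler makes can be written as a small pure function. This is a sketch with made-up names (recv_outcome, classify_recv); error_code and bytes stand for the two values a completion routine is handed, and it assumes the receive was posted with a non-empty buffer.

```c
#include <assert.h>

typedef enum {
    RECV_DATA,         /* bytes arrived, keep the connection going */
    RECV_PEER_CLOSED,  /* success with 0 bytes: the peer closed its side */
    RECV_FAILED        /* the operation itself failed, e.g. 64 or 10054 */
} recv_outcome;

/* error_code: 0 on success, otherwise the Win32 error for the operation.
 * bytes:      number of bytes transferred by the completed receive. */
recv_outcome classify_recv(unsigned long error_code, unsigned long bytes)
{
    if (error_code != 0)
        return RECV_FAILED;
    if (bytes == 0)
        return RECV_PEER_CLOSED;
    return RECV_DATA;
}
```

Both RECV_PEER_CLOSED and RECV_FAILED normally lead to the same cleanup path; keeping them apart mostly helps with logging.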
My problem on the other hand comes from closing the socket from another
socket's callback (hence the async close attempt). In my case, if a socket
closes itself, everything is okay. When a sibling socket closes a socket,
regardless of how I do it, well +o(.
Brendan
ps. Error 6 = "The handle is invalid", look under Tools, Error Lookup in
VS.
"Gordon Scott" <gscot...@hotmail.com> wrote in message
news:O2JazxZI...@TK2MSFTNGP11.phx.gbl...
It turns out -I- was getting an error code of 6 when I was closing a socket.
I was calling CloseHandle() on the handle that CreateIoCompletionPort
returned when I associated a newly accepted socket with the existing
completion port (i.e. NOT the main completion port handle). However, doing so
seemed to invalidate the main IOCP handle, and any further calls to
GetQueuedCompletionStatus failed with error code 6. How are you
terminating your sockets? Right now I am just calling shutdown() and
closesocket() when there is an error condition on a read or write.
So far I haven't gotten my server stable enough to worry about DOS.
My basic architecture seems rather similar. My main thread spawns 10 sockets
and calls AcceptEx, it then blocks on an Event to create more.
Every time a socket is accepted by one of the worker threads, it signals the
event and the main thread issues more AcceptEx calls until it gets back up
to 10 sockets waiting for connection.
My worker threads just call GetQueuedCompletionStatus and handle reads or
writes, nothing fancy going on there. I'm really trying to develop a
messaging type server here that spits out messages to (hopefully) thousands
of persistent connections. What I am seeing NOW is that I connect up say 1k
clients and begin simultaneously transmitting roughly 300 byte data packets
to each client every 2 seconds. Seems to work beautifully for a few minutes
and then all of a sudden I get:
Error code 64 'The specified network name is no longer valid'
and ALL socket read or writes subsequently fail. It SEEMS like I am
throwing more at IOCP than it can handle but I find this hard to believe. I
am baffled at the moment on how to proceed tracking this down. I am running
all 1k clients on a separate box (in the same network segment) but perhaps
I'm flooding TCP on THAT end if they can't keep up? Would that cause this
issue?
"Brendan Rempel" <bre...@n0sp4msonicmobility.com> wrote in message
news:e7KaZazI...@TK2MSFTNGP10.phx.gbl...
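struct linger li = { 0, 0 }; // l_onoff = 0: linger off (the default, same as SO_DONTLINGER)
setsockopt(s, SOL_SOCKET, SO_LINGER, (char *)&li, sizeof(li));
shutdown(s, SD_BOTH); // stop further sends and receives
closesocket(s); // release the socket handle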
I have changed the linger options to just about everything and it hasn't made
any difference either way. My problems aren't here anyway; mine come from
being a middle tier with one socket sending data and state to another.
Adding a while(WSARecv()) before the closesocket also made no difference for
me. Other examples show using shutdown + WSAEventSelect(FD_CLOSE) to detect
when the peer has read all data and closed its side:
One major difference in our designs is that I used BindIoCompletionCallback,
which handles the multithreading for me, with the side effect that I must run
on Windows 2000 and up. It simplified my class design a lot; hopefully
that's not my bug. Same with that example on codeproject. That example was
a little overcomplicated since the business logic was built into the socket
classes, but it cleans up pretty well.
Brendan
"Gordon Scott" <gscot...@hotmail.com> wrote in message
news:enw4K%237IEH...@TK2MSFTNGP09.phx.gbl...
<truncate>
"Gordon Scott" <gscot...@hotmail.com> wrote in message
news:enw4K%237IEH...@TK2MSFTNGP09.phx.gbl...
> Brendan, I looked at a couple of examples on codeproject, not sure I
> checked out this one, but I'll take a look at it and see if there is
> something else I am doing differently.
>
> It turns out -I- was getting an error code of 6 when I was closing a socket.
> I was calling CloseHandle() on the handle that CreateIoCompletionPort
> returned when I associated a newly accepted socket with the existing
> completion port (i.e. NOT the main completion port handle).
According to MSDN:
"the return value is the handle to the I/O completion port that is
associated with the specified file"
This implies that the handle returned is THE handle to the IOCP.
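To make that concrete, here is a hedged sketch of the association step. The helper names association_ok and attach_socket are invented for this example; the Windows-only part is guarded so the rule itself (on success the returned handle is the existing port, NULL means failure) can be read and checked anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* When an existing port is passed in, success means the very same handle
 * comes back; NULL means the call failed. */
bool association_ok(const void *returned, const void *existing_port)
{
    return returned != NULL && returned == existing_port;
}

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>

/* Hypothetical helper: attach an accepted socket to an existing port.
 * Returns 0 on success, otherwise the GetLastError() value. */
unsigned long attach_socket(HANDLE port, SOCKET s, ULONG_PTR key)
{
    HANDLE h = CreateIoCompletionPort((HANDLE)s, port, key, 0);
    if (h == NULL)
        return GetLastError();
    assert(association_ok(h, port));
    /* h is the port itself, not a new object: do NOT CloseHandle(h) here,
     * or the port is closed and later calls fail with error 6. */
    return 0;
}
#endif
```

The port handle is closed exactly once, at server shutdown, after the worker threads have stopped using it.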
> However doing so
> seemed to invalidate the main iocp handle and any further calls to
> GetQueuedCompletionStatus failed with error code 6. How are you
> terminating your sockets? Right now I am just calling shutdown() and
> closesocket() when there is an error condition on a read or write.
The correct way is shutdown / closesocket (or, for an abortive close, just closesocket).
> So far I haven't gotten my server stable enough to worry about DOS.
>
> My basic architecture seems rather similar. My main thread spawns 10
> sockets and calls AcceptEx, it then blocks on an Event to create more.
> Every time a socket is accepted by one of the worker threads, it signals the
> event and the main thread issues more AcceptEx calls until it gets back up
> to 10 sockets waiting for connection.
>
> My worker threads just call GetQueuedCompletionStatus and handle reads or
> writes, nothing fancy going on there. I'm really trying to develop a
> messaging type server here that spits out messages to (hopefully)
> thousands of persistent connections. What I am seeing NOW is that I
> connect up say 1k clients and begin simultaneously transmitting roughly
> 300 byte data packets to each client every 2 seconds. Seems to work
> beautifully for a few minutes and then all of a sudden I get:
>
> Error code 64 'The specified network name is no longer valid'
What function do you call that gives you this error? This message typically
shows up in explorer when you unplug your network cable.
> and ALL socket read or writes subsequently fail. It SEEMS like I am
> throwing more at IOCP than it can handle but I find this hard to believe.
> I am baffled at the moment on how to proceed tracking this down. I am
> running all 1k clients on a separate box (in the same network segment)
> but perhaps I'm flooding TCP on THAT end if they can't keep up? Would
> that cause this issue?
300 bytes x 1000 clients / 2 seconds is not even a lot of data (~150 kB/s).
On decent hardware, you should have no problems.
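The arithmetic behind that estimate, as a throwaway helper. The function name payload_rate is made up here, and the figure covers payload only, ignoring TCP/IP header overhead.

```c
#include <assert.h>

/* Aggregate payload rate in bytes per second for a server that sends
 * msg_bytes to each of `clients` connections every interval_s seconds. */
unsigned long payload_rate(unsigned long msg_bytes, unsigned long clients,
                           unsigned long interval_s)
{
    assert(interval_s > 0);
    return msg_bytes * clients / interval_s;
}
```

payload_rate(300, 1000, 2) gives 150000 bytes per second, i.e. about 1.2 Mbit/s of payload, and scaling the same load to 10,000 clients gives about 12 Mbit/s.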
What does this mean? What is "a sibling socket closes a socket"? Do you mean
the completion callback routine of socket A tries to close socket B?
What's the outcome?
For closesocket, it should work as long as the following conditions are
satisfied:
1. When closesocket() is called on a socket, there must be no un-returned
calls pending on that socket. Note that an un-returned pending call is
different from a pending call that has returned control from winsock to you
but has not been completed yet. For example, suppose you have two threads
working on the same socket. In thread A, you call WSASend() and this call
hasn't yet returned control to you. At exactly the same time, you call
closesocket() from thread B. This is a race condition that can cause an AV
on this socket handle.
2. Closesocket is the last operation you can perform on a socket.
3. You must call closesocket once and only once on a socket.
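One way to enforce the "once and only once" rule when several threads might decide to close the same connection is an atomic claim flag. The sketch below is portable C11 with invented names (close_guard, close_guard_claim); it addresses condition 3 only, so condition 1 still needs separate care, for instance by serializing calls on a socket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard embedded in each per-socket context. */
typedef struct close_guard {
    atomic_flag closed;
} close_guard;

#define CLOSE_GUARD_INIT { ATOMIC_FLAG_INIT }

/* Returns true for exactly one caller, however many threads race here.
 * Only the thread that gets true goes on to call closesocket(); every
 * other caller must leave the handle alone from then on. */
bool close_guard_claim(close_guard *g)
{
    return !atomic_flag_test_and_set(&g->closed);
}
```

Typical use would be `if (close_guard_claim(&ctx->guard)) closesocket(ctx->s);`, where ctx and its fields are placeholders for whatever context structure the server keeps per socket.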
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples (if any) are subject to the terms specified
at http://www.microsoft.com/info/cpyright.htm"
"Brendan Rempel" <bre...@n0sp4msonicmobility.com> wrote in message
news:e7KaZazI...@TK2MSFTNGP10.phx.gbl...
> What function do you call that gives you this error? This message
> typically shows up in explorer when you unplug your network cable.
GetQueuedCompletionStatus() returns 0 with the bytes tx'd/recv'd = 0.
I call GetLastError() at that point and get 64.
Something happens such that from that point on GetQueuedCompletionStatus
ALWAYS returns 0 for all subsequent calls.
> 300 bytes x 1000 clients / 2 seconds is not even a lot of data (~150 kB/s).
> On decent hardware, you should have no problems.
Running on a quad processor Win2k box. Machine doesn't even use more than 1
or 2 % cpu during the test.
"Mike" <m@b.c> wrote in message
news:eC7Dt9%23IEH...@TK2MSFTNGP11.phx.gbl...
"Gordon Scott" <gscot...@hotmail.com> wrote in message
news:z_mdnYvUDKr...@adelphia.com...
> Mike,
>
> > What function do you call that gives you this error? This message
> > typically shows up in explorer when you unplug your network cable.
>
> GetQueuedCompletionStatus() returns 0 with the bytes tx'd/recv'd = 0.
> I call GetLastError() at that point and get 64.
>
> Something happens such that from that point on GetQueuedCompletionStatus
> ALWAYS returns 0 for all subsequent calls.
The error is unhelpful - are you sure that this is the correct error code?
It sounds as though the handle is being closed. If there is even a subtle
error in your logic, on a system such as this, it will likely manifest
itself as a series of strange issues.
> > 300 bytes x 1000 clients / 2 seconds is not even a lot of data (~150 kB/s).
> > On decent hardware, you should have no problems.
>
> Running on a quad processor Win2k box. Machine doesn't even use more than
> 1 or 2 % cpu during the test.
What kind of network hardware? (NIC, switch)
The problem is when this happens. I do get notified that the server has
closed my outgoing connection and I can close my incoming connection.
However, the client only gets partially notified of the socket close. A test
program like telnet doesn't care, but Internet Explorer will abort and show
an error. According to a packet sniffer, only about half of the TCP ACKs
are being transmitted either way, which tricks clients into thinking
there's a network error. All the payload data does get transmitted.
So far, what hasn't worked:
- with/without so_linger - closesocket
- while WSARecv/ReadFile until 0, inside/outside worker threads
- shutdown - WSAEventSelect + FD_CLOSE in worker thread
- using a worker thread to sleep() and closesocket
All above options have identical results. But the problem is only when one
socket closes another. If I immediately return data and close the
connection to the client (so no connecting to a server), everything is ok.
Would it be better if my incoming sockets were part of one CP and outgoing
in another? Or is it possible for the outgoing socket to wake up the
incoming socket so it can close itself?
Brendan
"Stanley Feng (MSFT)" <sf...@online.microsoft.com> wrote in message
news:uVPUKdAJ...@TK2MSFTNGP11.phx.gbl...
"Mike" <m@b.c> wrote in message
news:e3ggr6VJ...@TK2MSFTNGP09.phx.gbl...
struct linger li = { 0, 0 }; // Default: SO_DONTLINGER
int err;
err = setsockopt(m_Socket, SOL_SOCKET, SO_LINGER, (char *)&li, sizeof(li));
- err returns 0, success
err = shutdown(m_Socket, SD_BOTH);
- err returns 10057 (WSAENOTCONN). Although this means that the socket is not
connected, it really is. The client side has not received notification that
the connection has been closed.
closesocket(m_Socket);
- the client immediately receives a network error and closes its connection
as this line executes.
Removing the setsockopt call changes the return value of shutdown to -1
(SOCKET_ERROR, so the actual reason has to come from WSAGetLastError()).
Adding while(Read) at any stage, as previously suggested, also has no effect.
I found I can trick the incoming socket to start its completion routine by
calling WriteFileEx with zero bytes, however this hasn't made any change on
the outcome of the bug. If there are any suggestions, I would appreciate them
greatly.
Brendan
"Brendan Rempel" <bre...@n0sp4msonicmobility.com> wrote in message
news:u0r7K3hJ...@TK2MSFTNGP12.phx.gbl...
> Sibling as in part of the same list of sockets in the same list of IOCP
> threads. This is just a middle tier and when the server I connect to
> closes a socket, I must close the connection to the client that connected
> to me.
>
> The problem is when this happens. I do get notified that the server has
> closed my outgoing connection and I can close my incoming connection.
> However, the client only gets partially notified of the socket close. A
> test program like telnet doesn't care, but Internet Explorer will abort
> and show an error. According to a packet sniffer, only about half of the
> TCP ACKs are being transmitted either way, which tricks clients into
> thinking there's a network error. All the payload data does get transmitted.
<clip>
I connect 1k clients to my server and all connect without problems, I then
start publishing my 300 byte messages every two seconds.
After about 2 minutes the server reports a remote client connection close,
10054. What's odd is that my client code is not closing its socket, and is
still waiting on a recv() call.
The client never gets a terminated connection notice. The server continues
to receive 10054 on most of its connections; it's not until I actually kill
the server exe that the clients detect a disconnected socket. Still looks
like something is getting fouled down in the OS or TCP layers...
It still doesn't seem clear to me what the problem is. Are you concerned
about "the client only gets partially notified of the socket close"? Why is
the fact that IE aborts and shows an error an issue here? (The connection is
being closed by the server, and it isn't wrong for IE to show an error
message for this.)
What do you mean by "gets partially notified"? What led you to this
conjecture/conclusion?
The fact that you are seeing half of the TCP ACKs is due to TCP delayed ACK
(one ACK for every two TCP segments) and is not related to socket closure.
This would not "trick clients into thinking there is a network error."
What is your goal here? - To completely close your connection with your
client as soon as the connection with the server is closed?
"Brendan Rempel" <bre...@n0sp4msonicmobility.com> wrote in message
news:u0r7K3hJ...@TK2MSFTNGP12.phx.gbl...
Please email me (minus the n0 sp4m part) if you can help. Thanks for your
help...
Brendan
"Stanley Feng (MSFT)" <sf...@online.microsoft.com> wrote in message
news:%23IIHCim...@TK2MSFTNGP10.phx.gbl...
This is a web proxy. It follows all the rules of one. A script on a web
server may not know the exact length of the content because it's partially
interpreted. A web server could chunk the output thereby allowing
keep-alive connections but if not, the web browser can only know when the
data is completely transmitted when the TCP socket is closed.
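The three cases just described (chunked output, a known length, or neither) can be summed up as a small decision function. This is only a sketch with invented names (body_framing, response_framing); it ignores responses that never carry a body, such as replies to HEAD requests or 204/304 statuses.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    BODY_CHUNKED,         /* end of body is marked by the final zero-size chunk */
    BODY_CONTENT_LENGTH,  /* end of body is known from the declared length */
    BODY_UNTIL_CLOSE      /* end of body is signalled only by the TCP close */
} body_framing;

/* chunked:            the response uses chunked transfer encoding.
 * has_content_length: the response carries a Content-Length header.
 * Chunked encoding is checked first because it takes precedence when
 * both are present. */
body_framing response_framing(bool chunked, bool has_content_length)
{
    if (chunked)
        return BODY_CHUNKED;
    if (has_content_length)
        return BODY_CONTENT_LENGTH;
    return BODY_UNTIL_CLOSE;
}
```

Only the BODY_UNTIL_CLOSE case forces a proxy to close the browser-side connection in order to signal the end of the document.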
In my proxy, I accept connections, connect to a web server, make a request.
The web server is happy to oblige and sends me content. I know the states
and requests and I know when the web server has sent me all the data because
it closes the connection.
Then the chaos begins. The outgoing socket is closed by the web server. I
have to do some checks to make sure I've transmitted everything. I need to
close the connection to the web browser or it will sit there indefinitely
with a spinning 'e'. When closesocket() is called, all browsers error, IE
with "DNS Error", Mozilla with "The document contains no data", and various
others that return such as "400". Telnet however *does* work (same headers
transmitted). Calls like shutdown() have no effect and all html is verified
to be transmitted.
The TCP ACK stuff is what the packet sniffer found, that's still being
investigated. I've tried every possible way to close that socket, every TCP
option, synchronously and in threads. I can reproduce this over and over
again but I can't see a bug. I've tried every suggestion, every
possibility, and the outcome is always identical.
Brendan
"Stanley Feng (MSFT)" <sf...@online.microsoft.com> wrote in message
news:%23IIHCim...@TK2MSFTNGP10.phx.gbl...
Finally acquired some new boxes to perform my tests on and my server works
just fine. My problems must have been hardware or network related.
I'm not sure whether it's the NIC, cable, switch, router, etc., but I'm
checking into that.
Thanks for all the help.
"Brendan Rempel" <bre...@n0sp4msonicmobility.com> wrote in message
news:OsHvHTwJ...@TK2MSFTNGP09.phx.gbl...
> K. Had some rest, I'll try to explain the problem.
>
> This is a web proxy. It follows all the rules of one. A script on a web
> server may not know the exact length of the content because it's partially
> interpreted. A web server could chunk the output thereby allowing
> keep-alive connections but if not, the web browser can only know when the
> data is completely transmitted when the TCP socket is closed.
<clip>