Thanks
Timmy
I had that problem once a long time ago. It was with WSAAsyncSelect and
a heavily hammered listening socket that would stop listening until it
was closed and restarted. The best I could surmise was that an internal
queue in Winsock was losing references until it became depleted.
I'd monitor things like the non-paged pool to check that it isn't growing
excessively. I'd also look at the output of netstat to confirm that
sockets are indeed closing properly.
Google for "tcpip registry" and look at the MSDN docs. You might be
exceeding the max sockets setting, which defaults to 5000. It should be at
its maximum of 32768 for a "real" server if you want it to take any
serious load.
Let us know how it turns out.
The most likely? You've got a leak somewhere.
You didn't post a concise-but-complete code sample that reliably
demonstrates the problem, so where that leak might be, who knows? But 300
concurrent clients should be a walk in the park for an IOCP-based server,
and the "degrades over time" behavior is strongly suggestive of a leak of
some sort.
Pete
Do you mean a memory leak? A memory leak seems unlikely, since only a
few leaks are reported at program exit, and they are the same whether
or not a load test was run.
> Do you mean a memory leak? A memory leak seems unlikely, since only a
> few leaks are reported at program exit, and they are the same whether
> or not a load test was run.
There's no way to reliably have "memory leaks shown at exit of the
program", so I really don't know what you mean there.
In any case, all I'm telling you is what decades of experience have taught
me. Again, you didn't post a concise-but-complete code example, so the
best anyone can do is speculate.
In this situation, my speculation is that there's a leak of some sort.
Whether it's memory allocations in your own program, or in the network
driver, or somewhere else, I can't say. But that's the direction I think
you ought to be looking.
Pete
You forgot the symptom!
How do you determine when to accept new connections? Do you get
indications of new connections? If so, what happens when you try to
accept them?
You left out the most important thing -- what goes wrong? Why isn't
your program accepting the new connections?
DS
Timmy, are you replacing the used AcceptEx calls for all logic paths in
your completionProc? Read through mine if it helps:
I use AcceptEx to accept new connections. When the problem occurs I
do get indications of new connections, and the accept routine sends an
initial message to the client, but that send never seems to complete.
By "complete", do you mean that the message never arrives at the peer,
that WSASend() on the host never returns WSA_IO_PENDING, or something
else? Or is the client the host of this as a module library?
It is possible that you are sending too early in your logic, for example
before your call to setsockopt(...,SO_UPDATE_ACCEPT_CONTEXT,...) or
before the socket is associated with the completion port.
No setsockopt is called in the first place, and association with the IOCP
is done before WSASend().
> No setsockopt is called in the first place, and association with the IOCP
> is done before WSASend().
Post your code. Your answers are too short.
setsockopt(SO_UPDATE_ACCEPT_CONTEXT) is supposed to be used. That's bug #1.
Again, what do you mean by completion? Did WSASend return
WSA_IO_PENDING, yet the completion never came back to the completionProc?
Or did the data never arrive at the peer?
My IOCP routines follow MSDN's IOCP example. Unlike the example, though,
each session involves an exchange of multiple packets between client and
server, which I suspect will saturate the IOCP queue (considering tens of
thousands of simultaneous connections doing a huge amount of data
transfer).
switch (io_type) {
case SockAccept:
    if (fd == INVALID_SOCKET)
        break;
    ret = setsockopt(
        fd,
        SOL_SOCKET,
        SO_UPDATE_ACCEPT_CONTEXT,
        (char *)&connctx->socket_listen,
        sizeof(connctx->socket_listen)
    );
    if (ret == SOCKET_ERROR) {
        logalert(log, 1, "setsockopt failed to update accept socket %d (#%d).",
                 fd, WSAGetLastError());
        break;
    }
    ret = connctx->do_accept(connctx, dwIoSize, dwKey);
    ....
}
int do_accept(struct conn_ctx *connctx, int recvbytes, void* key)
{
    int ret;
    struct conn_ctx *new_connctx = add_conn_ctx(&g_ctx_list, 0);

    new_connctx->fd = connctx->fd;
    ret = UpdateCompletionPort(new_connctx, 0);
    ret = send_init_msg(new_connctx);
    if ( create_accept_socket(connctx, 0, 0) ) {
        logalert(log, 1, "Error accepting connections.");
        return -1;
    }
    return ret;
}
When the problem occurs, send_init_msg() is successful every time a
client connects, because the client shows the message from the server.
However, the IOCP delivers no further packets sent by the client after
that (a reply to the init message, for example).
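One thing that may be worth checking in the snippets above: the two early `break` paths in the SockAccept case leave without reaching create_accept_socket(). Unless something after the elided part reposts, every failure on those paths would use up one outstanding AcceptEx without replacing it. Below is a minimal, portable sketch of that bookkeeping. It is a model only, with no Winsock calls in it, and `listener_model`, `repost_accept`, `handle_accept_leaky` and `handle_accept_fixed` are hypothetical names made up for the illustration.

```c
#include <assert.h>

/*
 * Hypothetical model: pending_accepts stands for the number of AcceptEx
 * calls currently outstanding on the listening socket.
 */
typedef struct {
    int pending_accepts;
} listener_model;

/* Stand-in for posting a fresh AcceptEx (create_accept_socket in the thread). */
static void repost_accept(listener_model *l)
{
    l->pending_accepts++;
}

/*
 * Leaky variant: the error path returns early, so the accept that just
 * completed is never replaced.
 */
static void handle_accept_leaky(listener_model *l, int setup_ok)
{
    assert(l->pending_accepts > 0);
    l->pending_accepts--;          /* the completed accept is consumed */
    if (!setup_ok)
        return;                    /* early exit, nothing reposted */
    repost_accept(l);
}

/*
 * Fixed variant: the repost happens on every path, whether or not the
 * per-connection setup worked.
 */
static void handle_accept_fixed(listener_model *l, int setup_ok)
{
    assert(l->pending_accepts > 0);
    l->pending_accepts--;          /* the completed accept is consumed */
    if (setup_ok) {
        /* per-connection work (post receives, send greeting) goes here */
    }
    repost_accept(l);              /* always replace the consumed accept */
}
```

With three accepts posted and three failed setups, the leaky handler is left with none outstanding while the fixed one still has three. In a real server the listening socket would go quiet at that point, even though nothing was reported as an error.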
/* post IOCP_INITIAL_RECV_COUNT recvs. */
for (i = 0; i < IOCP_INITIAL_RECV_COUNT; i++) {
    newBufPtr = GetBufferObj(new_connctx, 0);
    if ((WSAerr = PostOverlappedRecv(new_connctx, newBufPtr,
            0 /*useburst*/, 0 /*forcepost*/)) != NO_ERROR) {
        /*
         * The new connection is not valid. Do not alert
         * Tcl about this new dud connection. Clean it
         * up ourselves.
         */
        new_connctx->flags |= IOCP_CLOSING;
        PostOverlappedDisconnect(new_connctx, newBufPtr);
        goto replace;
    }
}
> ret = send_init_msg(new_connctx);
replace:
> if ( create_accept_socket(connctx, 0, 0) ) {
> logalert(log, 1, "Error accepting connections.");
> return -1;
> }
>
> return ret;
> }
>
> When the problem occurs, send_init_msg() is successful every time a
> client connects, because the client shows the message from the server.
> However, the IOCP delivers no further packets sent by the client after
> that (a reply to the init message, for example).
I don't see a call to WSARecv to set it up for receiving. You need to
post at least a zero-byte buffer to get notification.
DWORD
PostOverlappedRecv (
    SocketInfo *infoPtr,
    BufferInfo *bufPtr,
    int useBurst,
    int ForcePostOnError)
{
    WSABUF wbuf;
    DWORD bytes = 0, flags, WSAerr;
    int rc;

    bufPtr->WSAerr = NO_ERROR;

    if (infoPtr->flags & IOCP_EOF || infoPtr->flags & IOCP_CLOSING)
        return WSAENOTCONN;

    /* Recursion limit */
    if (InterlockedIncrement(&infoPtr->outstandingRecvs)
            > infoPtr->outstandingRecvCap) {
        InterlockedDecrement(&infoPtr->outstandingRecvs);
        /* Best choice I could think of for an error value. */
        return WSAENOBUFS;
    }

    bufPtr->operation = OP_READ;
    wbuf.buf = bufPtr->buf;
    wbuf.len = bufPtr->buflen;
    flags = 0;

    /*
     * Increment the outstanding overlapped count for this socket.
     */
    InterlockedIncrement(&infoPtr->outstandingOps);

    if (infoPtr->proto->type == SOCK_STREAM) {
        rc = WSARecv(infoPtr->socket, &wbuf, 1, &bytes, &flags,
                &bufPtr->ol, NULL);
    } else {
        rc = WSARecvFrom(infoPtr->socket, &wbuf, 1, &bytes,
                &flags, bufPtr->addr, &infoPtr->proto->addrLen,
                &bufPtr->ol, NULL);
    }

    /*
     * There are three states that can happen here:
     *
     * 1) WSARecv returns zero when the operation has completed
     *    immediately and the completion is queued to the port (behind
     *    us now).
     * 2) WSARecv returns SOCKET_ERROR with WSAGetLastError() returning
     *    WSA_IO_PENDING to indicate the operation was successfully
     *    initiated and will complete at a later time (and possibly
     *    complete with an error or not).
     * 3) WSARecv returns SOCKET_ERROR with WSAGetLastError() returning
     *    any other WSAGetLastError() code to indicate the operation was
     *    NOT successfully initiated and completion will NOT occur.
     */
    if (rc == SOCKET_ERROR) {
        if ((WSAerr = WSAGetLastError()) != WSA_IO_PENDING) {
            bufPtr->WSAerr = WSAerr;
            if (ForcePostOnError) {
                PostQueuedCompletionStatus(IocpSubSystem.port, 0,
                        (ULONG_PTR) infoPtr, &bufPtr->ol);
                /*
                 * We cannot process the error now, but it is posted,
                 * so don't return an error.
                 */
                return NO_ERROR;
            } else {
                InterlockedDecrement(&infoPtr->outstandingOps);
                InterlockedDecrement(&infoPtr->outstandingRecvs);
                /* return the error. */
                return WSAerr;
            }
        }
    } else if (bytes > 0 && useBurst) {
        BufferInfo *newBufPtr;

        /*
         * The WSARecv(From) has completed now, *AND* is posted to the
         * port. Keep giving more WSARecv(From) calls to drain the
         * internal buffer (AFD.sys). Why should we wait for the time
         * when the completion routine is run if we know the connected
         * socket can take another right now? IOW, keep recursing until
         * the WSA_IO_PENDING condition is achieved.
         *
         * The only drawback to this is the amount of outstanding calls
         * will increase. There is no method for coming out of a burst
         * condition to return the count to normal. This shouldn't be
         * an issue with short lived sockets -- only ones with a long
         * lifetime.
         */
        newBufPtr = GetBufferObj(infoPtr, wbuf.len);
        if (PostOverlappedRecv(infoPtr, newBufPtr, 1 /*useBurst*/,
                1 /*forcepost*/)) {
            /*
             * simple states for burstCap and !connected shall be
             * ignored and the buffer recycled.
             */
            FreeBufferObj(newBufPtr);
        }
    }

    return NO_ERROR;
}
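For anyone who wants to reason about the bookkeeping in the routine above without a Windows box, here is a rough, portable sketch of two pieces of it: the cap on outstanding receives, and the three-way classification of the WSARecv result. It is a model under assumptions, not the routine itself. C11 atomics stand in for InterlockedIncrement/InterlockedDecrement, the two `SK_` constants are stand-ins so that it compiles without <winsock2.h>, and `recv_budget`, `reserve_recv_slot`, `classify_recv` and `settle_recv` are hypothetical names. It only models the path where ForcePostOnError is zero.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in values; on Windows use SOCKET_ERROR and WSA_IO_PENDING. */
#define SK_SOCKET_ERROR   (-1)
#define SK_WSA_IO_PENDING 997L

typedef enum {
    RECV_COMPLETED_NOW,   /* state 1: finished immediately, completion queued */
    RECV_PENDING,         /* state 2: started, will complete later */
    RECV_NOT_STARTED      /* state 3: failed to start, no completion follows */
} recv_outcome;

/* rc is the WSARecv return value; last_error only matters on SOCKET_ERROR. */
static recv_outcome classify_recv(int rc, long last_error)
{
    if (rc != SK_SOCKET_ERROR)
        return RECV_COMPLETED_NOW;
    if (last_error == SK_WSA_IO_PENDING)
        return RECV_PENDING;
    return RECV_NOT_STARTED;
}

typedef struct {
    atomic_long outstanding_recvs;     /* receives currently counted */
    long        outstanding_recv_cap;  /* upper bound on that count */
} recv_budget;

/* Reserve one slot; returns 1 on success, 0 if the cap would be exceeded. */
static int reserve_recv_slot(recv_budget *b)
{
    /* atomic_fetch_add returns the old value, so add 1 for the new one. */
    long now = atomic_fetch_add(&b->outstanding_recvs, 1) + 1;
    if (now > b->outstanding_recv_cap) {
        atomic_fetch_sub(&b->outstanding_recvs, 1);  /* undo the reservation */
        return 0;
    }
    return 1;
}

/*
 * After the call: only a receive that never started gives its slot back
 * here. The other two outcomes are assumed to be released later, when
 * their completion is processed.
 */
static recv_outcome settle_recv(recv_budget *b, int rc, long last_error)
{
    recv_outcome outcome = classify_recv(rc, last_error);
    if (outcome == RECV_NOT_STARTED)
        atomic_fetch_sub(&b->outstanding_recvs, 1);
    return outcome;
}
```

The point of the sketch is that the count only balances if every reserved slot is released exactly once. If the third outcome is treated like the second, the slot is never returned, and a connection can reach its cap with nothing actually in flight.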
> My IOCP routines follow MSDN's IOCP example.
Could you provide a link to that example?