
Overlapped WSASend and IOCP


Kürsat

Jul 3, 2008, 10:48:17 AM
Hi,

My question is about using WSASend with an overlapped socket and IOCP.

In the software I am developing, it is possible to send messages to a
particular client from different threads simultaneously. The order of the calls
can be guaranteed by using sequence numbers, but another, more serious problem
exists:

WSASend is not guaranteed to consume all of the data in the user-supplied buffer
at once. For example: if I supply a 10-byte buffer, it is not guaranteed that
GetQueuedCompletionStatus will give me 10 as lpNumberOfBytes on
completion. If I get 6 from GetQueuedCompletionStatus then I should retry
sending the last 4 bytes. For this reason I came up with a design like this:

I first put the data into a temporary buffer. On completion, I check whether all
of the data has been sent. If so, I discard the temporary buffer; otherwise I
grab the unsent data from the temporary buffer and retry sending it. If a thread
tries to call WSASend while another call is still pending then that thread
just appends its data to the temporary buffer and returns. When the pending
call completes it checks the temporary buffer and sends the remaining data.
This means I post only one send per client.
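
Roughly, the idea looks like this (just a sketch to illustrate the design; the
names are made up and error handling is omitted):

#include <winsock2.h>
#include <mutex>
#include <vector>

#pragma comment(lib, "ws2_32.lib")

struct Connection
{
    SOCKET            sock = INVALID_SOCKET;
    WSAOVERLAPPED     sendOverlapped{};
    std::mutex        lock;
    std::vector<char> inFlight;   // handed to WSASend; must stay put until completion
    std::vector<char> queued;     // data waiting for the next WSASend
    bool              sendPending = false;

    // Any thread may call this to send data on the connection.
    void Send(const char* data, size_t len)
    {
        std::lock_guard<std::mutex> guard(lock);
        queued.insert(queued.end(), data, data + len);
        if (!sendPending)
            PostNextSend();
    }

    // Called from the IOCP worker thread when the pending WSASend completes.
    void OnSendComplete(DWORD bytesSent)
    {
        std::lock_guard<std::mutex> guard(lock);
        // drop what was actually sent; a partial completion leaves the tail in place
        inFlight.erase(inFlight.begin(), inFlight.begin() + bytesSent);
        sendPending = false;
        if (!inFlight.empty() || !queued.empty())
            PostNextSend();
    }

private:
    void PostNextSend()   // caller holds 'lock'
    {
        // fold newly queued data onto any unsent tail, then post a single WSASend
        inFlight.insert(inFlight.end(), queued.begin(), queued.end());
        queued.clear();

        WSABUF buf;
        buf.buf = inFlight.data();
        buf.len = static_cast<ULONG>(inFlight.size());
        ZeroMemory(&sendOverlapped, sizeof(sendOverlapped));

        const int rc = WSASend(sock, &buf, 1, nullptr, 0, &sendOverlapped, nullptr);
        if (rc == 0 || WSAGetLastError() == WSA_IO_PENDING)
            sendPending = true;
        // else: a real implementation would treat this as fatal and abort the connection
    }
};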

Is this design correct?

Thanks in advance.


iunknown

Jul 3, 2008, 10:02:52 PM

I also deal with the partial write this way. For more detail, please refer to
http://www.codeproject.com/KB/IP/IOCP_Server_Framework.aspx

Malachy Moses

Jul 4, 2008, 12:50:01 PM

The design seems correct, but since it limits you to a single
pending send, it might be unduly restrictive if indeed you
believe that your application can benefit from multiple pending sends.

The situation you are concerned about is a partial send. Obviously,
there would be a problem in the following scenario:

post a first WSASend with 10 bytes 0123456789
post a second WSASend with 26 bytes a through z
completion of first WSASend but with only 6 bytes
completion of second WSASend with any amount

In the above scenario, as you have stated, the recipient would receive
"012345abcde...", i.e., he has received data out of the intended order.
The situation cannot be undone, since the data has already been sent.
You have two choices:
1. Prohibit multiple pending sends, as you have proposed.
2. Implement a "back-up" command in your protocol that advises the
recipient to discard data and start over. It might be as dramatic as
closure of the connection and a complete start-over.

If you believe you can benefit from multiple pending sends, then maybe
you should try option 2 instead of option 1.

In making your decision, you should consider the practical experience
of experts in this field. One of them (Len Holgate) has experimented
in an attempt to force a partial send, but he was unable to ever see
even a single instance of a partial send. His experimentation showed
that WSASend completed most of the time with all of the buffer
consumed. In very rare circumstances, where he exhausted the non-paged
pool or exceeded the locked-pages limit, he reports that he
occasionally sees a WSASend complete with none of the buffer consumed
(i.e., lpNumberOfBytes == 0). See his post here, dated from 2005:
"TCP/IP Server Failures" at http://www.lenholgate.com/archives/000570.html

See also this thread from 2002, which talks about the same issues that
you have raised: "WSASend() - Multithreading Options" at
http://groups.google.com/group/microsoft.public.win32.programmer.networks/browse_frm/thread/e4942983a5d8610a/
. There's a post in there from Len Holgate which states that he has
never seen a partial send.

So, as a practical matter, since partial sends might never occur, it
might be worthwhile to allow multiple pending sends, check for the
rare instances of an error (i.e., an unexpected number in
lpNumberOfBytes), and if an error occurs, then implement a protocol
that tears down the connection and instructs the client to start over
at the beginning.
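
As a rough sketch of that approach (the type and helper names here are purely
illustrative, not from any particular framework), the send-completion handler
would just compare the completed byte count with what was requested and tear
the connection down on a mismatch:

#include <winsock2.h>

#pragma comment(lib, "ws2_32.lib")

// Illustrative per-operation context; the real layout is up to you.
struct PerIoData
{
    WSAOVERLAPPED overlapped;     // first member, so the OVERLAPPED* maps back to this
    WSABUF        wsaBuf;         // the buffer that was handed to WSASend
    DWORD         bytesRequested; // how much we asked WSASend to send
};

// Illustrative: reset the connection so the peer sees a hard close and starts over.
void AbortConnection(SOCKET s)
{
    LINGER l{};
    l.l_onoff  = 1;
    l.l_linger = 0;               // zero linger timeout => abortive close (RST)
    setsockopt(s, SOL_SOCKET, SO_LINGER,
               reinterpret_cast<const char*>(&l), sizeof(l));
    closesocket(s);
}

void OnSendCompletion(SOCKET s, PerIoData* io, DWORD bytesTransferred)
{
    if (bytesTransferred != io->bytesRequested)
    {
        // The rare partial send: the byte stream is now damaged from the
        // receiver's point of view, so tear the connection down.
        AbortConnection(s);
        return;
    }
    // normal path: release 'io', account for the completed send, etc.
}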

Len Holgate

Jul 6, 2008, 5:43:43 AM
> I also deal with the partial write this way. For more detail, please refer to
> http://www.codeproject.com/KB/IP/IOCP_Server_Framework.aspx

Personally, I think you're solving the wrong problem.

I've done a lot of work with IO completion port based servers and
async IO over the past few years and I've simply never seen this
problem occur for real except when machine resources are very scarce.
Treating 'partial send completion' as a problem that you need to solve
is, IMHO, not sensible. I've found that it's much better to simply
make sure that you don't get yourself into the situations that are
likely to cause it to occur, because if you DO get yourself into those
situations then the fact that some of your sends are failing is
probably the least of your problems.

So, the first thing I'm curious about is what you're doing to cause
this to be an actual problem for you. I assume you can cause this
problem repeatably?

Possibly the easiest way to use up non-paged pool using async IO is to
continue issuing sends when the TCP receive window is full. This
causes your send data to be buffered in the TCP/IP stack and often
that uses non-paged pool... If left unchecked then you can use up all
the non-paged pool that you're allowed and then socket calls may start
to fail. Rather than adding code to cripple the performance of all
your sends it's better, IMHO, to monitor how many outstanding sends
you have and stop sending before you get to a point where you exhaust
resources...
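
One hypothetical way to do that monitoring (a sketch only, not the framework
code) is to keep a per-connection count of outstanding sends and buffer
anything beyond a configured limit:

#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical sketch: cap the number of overlapped sends outstanding on a
// connection and buffer anything beyond that until earlier sends complete.
class SendThrottle
{
public:
    explicit SendThrottle(size_t maxPendingSends) : limit_(maxPendingSends) {}

    // Ask for permission to post another WSASend. If the limit has been
    // reached the data is queued instead and false is returned.
    bool AcquireSendSlot(std::vector<char>&& data, std::vector<char>& sendNow)
    {
        std::lock_guard<std::mutex> guard(lock_);
        if (pendingSends_ < limit_)
        {
            ++pendingSends_;
            sendNow = std::move(data);
            return true;                        // caller posts WSASend with 'sendNow'
        }
        queued_.push_back(std::move(data));     // too much in flight; buffer it
        return false;
    }

    // Called when a send completes. If queued data exists the freed slot is
    // reused: 'next' receives the buffer and the caller posts it immediately.
    bool OnSendComplete(std::vector<char>& next)
    {
        std::lock_guard<std::mutex> guard(lock_);
        if (!queued_.empty())
        {
            next = std::move(queued_.front());
            queued_.pop_front();
            return true;                        // pendingSends_ count is unchanged
        }
        --pendingSends_;
        return false;
    }

private:
    std::mutex                    lock_;
    std::deque<std::vector<char>> queued_;
    size_t                        pendingSends_ = 0;
    size_t                        limit_;
};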

I've just made this easy for users of my framework by building some
reusable code that clients can use to monitor the number of writes
that are pending on a connection and which can be configured to buffer
data until the pending writes complete... See here,
http://www.lenholgate.com/archives/000788.html for more details. Of
course there are other ways that non-paged pool can be used up, so
this isn't the only thing that you need to keep an eye on...

Len
http://www.lenholgate.com
Free IOCP server framework available here: http://www.lenholgate.com/archives/000637.html

Kürsat

Jul 7, 2008, 8:22:01 AM
Hi Len,

Many thanks for sharing your experience with us.

I have never run into the "partial send completion" situation. I read an article
about it and decided to take precautions. Since our server will communicate
with existing clients via a predefined protocol, we would be unable to correct
any damage caused by a partial send at the client side. But after reading your new
article about your flow control design, I changed my mind. You are right,
controlling resource usage is much more meaningful than trying to handle every
side effect caused by low-resource situations for as long as possible.

Well, I have another question:

- What is the best way to disconnect a client from the server? In my server,
I simply call shutdown() and closesocket() on that client's socket, which
causes that client's pending I/Os to complete immediately. When the last
pending I/O completes, I release the resources related to that client. Is
this correct?

Thanks in advance.

Kürsat

Jul 7, 2008, 10:19:39 AM
Hi Len,

In the FAQ section of the article below you say: "Having more than one
thread write to the socket at the same time will ALWAYS give unexpected
results." Why?

http://www.codeproject.com/KB/IP/jbsocketserver2.aspx

Len Holgate

Jul 7, 2008, 3:53:41 PM
Before we start, please be aware that the answer to this question is
in no way related to the rest of the thread...

The answer is quite simple really. If you have two threads that write
data to the same socket and you have no explicit synchronisation
between them then you have no control over the sequence of events that
will lead to each thread writing to the socket. This means that the
data that each thread writes will be written to the socket in an order
that depends on how the threads are scheduled. Each actual write to
the socket will be atomic (though I haven't checked this with scatter/
gather style writes that use multiple WSABUFs in a single write) and a
series of writes from one thread will be sequential with regard to
other writes from that thread, but if you were to write 100 'A' bytes in
one write call from one thread and 100 'B' bytes from a write call on
another thread then the data that you read at the other end of the
connection could either be 100 'A's followed by 100 'B's OR, just as
likely, 100 'B's followed by 100 'A's.

Now this might not matter to you at all if each thread is sending
distinct messages in each call to write. But if you have a thread that
sends a protocol-level message using multiple calls to write and
you have another thread doing the same, then it's only luck if the
client can read complete and correct messages from its end of the
connection.
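
If a single logical message really must be written with several calls, the
usual fixes are either to build the whole message into one buffer and send it
with one call, or to serialise the whole sequence yourself. A trivial sketch of
the latter (illustrative only; blocking send() is used just to keep it short):

#include <winsock2.h>
#include <mutex>
#include <string>

#pragma comment(lib, "ws2_32.lib")

std::mutex g_writeLock;   // in real code this would be one lock per socket

// Both parts go out back-to-back; no other thread can write in between.
// Error handling omitted.
void SendFramedMessage(SOCKET s, const std::string& header, const std::string& body)
{
    std::lock_guard<std::mutex> guard(g_writeLock);
    send(s, header.data(), static_cast<int>(header.size()), 0);
    send(s, body.data(),   static_cast<int>(body.size()),   0);
}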

Len
http://www.lenholgate.com


Len Holgate

Jul 7, 2008, 3:57:26 PM
Best in what situation?

If you want a graceful shutdown then you should call shutdown on the
send side of your connection when you have no more data to write and
then wait for the other end of the connection to do the same. Once
both ends are shut down you know that all your data has got to the
other side and you can clean up your resources.

If you want to terminate the connection with no regard to losing data
then you might want to turn off lingering and do an 'abortive' close,
the advantage of that is that you reset the connection and you don't
end up in a TIME_WAIT state.
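
In Winsock terms the two options look roughly like this (a sketch, not a
complete shutdown sequence):

#include <winsock2.h>

#pragma comment(lib, "ws2_32.lib")

// Graceful: stop sending, then wait for the peer to finish and close.
void BeginGracefulClose(SOCKET s)
{
    shutdown(s, SD_SEND);          // our FIN goes out after any queued data
    // keep reading; a later zero-byte recv completion tells us the peer has
    // shut down its side too, and only then do we call closesocket(s)
}

// Abortive: reset the connection immediately; no TIME_WAIT on our side.
void AbortiveClose(SOCKET s)
{
    LINGER l{};
    l.l_onoff  = 1;                // linger enabled...
    l.l_linger = 0;                // ...with a zero timeout => RST on close
    setsockopt(s, SOL_SOCKET, SO_LINGER,
               reinterpret_cast<const char*>(&l), sizeof(l));
    closesocket(s);
}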

Without knowing what you're trying to achieve with the shutdown it's
hard to say what's 'best'.

Len
http://www.lenholgate.com


Kürsat

Jul 7, 2008, 5:29:28 PM
I was actually trying to ask about connection closure while some pending I/O exists
on the socket. To clean up per-I/O resources we have to force those completions
to happen early. AFAIK, there are two ways to achieve this: closing the socket or
calling CancelIoEx(). Since CancelIoEx() depends on the OS version, the
only remaining option is socket closure. What I don't completely grasp is the
closure flow on the IOCP infrastructure from beginning to end.

In my server there is an administrative interface by which an administrator
can send a command to the server to disconnect a client that he or she
selects. What should I do when a disconnect command is received, for both the
graceful and the abortive case? Should I call shutdown(SD_SEND) and wait to call
closesocket() until GetQueuedCompletionStatus() returns with lpNumberOfBytes
set to zero? Or should I call closesocket() just after the
shutdown(SD_SEND) returns? Does graceful closure harm performance?


Kürsat

Jul 7, 2008, 5:30:46 PM
Many thanks.

Len Holgate

Jul 8, 2008, 11:44:19 AM
> I was actually trying to ask about connection closure while some pending I/O exists
> on the socket. To clean up per-I/O resources we have to force those completions
> to happen early. AFAIK, there are two ways to achieve this: closing the socket or
> calling CancelIoEx(). Since CancelIoEx() depends on the OS version, the
> only remaining option is socket closure. What I don't completely grasp is the
> closure flow on the IOCP infrastructure from beginning to end.

I wouldn't use CancelIoEx(), partly because it's not always available
and partly because I'd then have to keep some book-keeping data that
told me which I/O I wanted to cancel.

You call shutdown() when you're finished with one or both sides of the
connection. You call closesocket() when you're done with the connection
and want to clean up the resources that it uses. It shouldn't matter
that you have pending IO on a connection. You either want to allow
that IO to complete normally or you want to abortively close the
connection and cancel the IO. If you want to allow it to complete
normally then call shutdown on the sides of the connection that you're
done with (so, most likely, shutdown(both)). If you want to abort the
connection and potentially cancel some pending IO you would set the
linger options on the socket to not linger and then call
closesocket().

Your IO will, eventually, complete at which point (if you're using
reference counting on the data that you've attached to your socket
your references on that will drop to zero and...) you know that
everything is done and you can clean things up. If the socket isn't
closed at this point, close it.

> graceful and the abortive case? Should I call shutdown(SD_SEND) and wait to call
> closesocket() until GetQueuedCompletionStatus() returns with lpNumberOfBytes
> set to zero? Or should I call closesocket() just after the
> shutdown(SD_SEND) returns? Does graceful closure harm performance?

I wouldn't recommend ever doing things differently with
GetQueuedCompletionStatus() just because you're shutting down a
connection; there's no need for special cases. Oh, and if you're
talking about lpNumberOfBytes being 0 when a READ returns, well, that
means that the client has shut down its send side to you. Do whatever
you feel is appropriate in that case, but in no way is that occurrence
required to be linked to you shutting down the send side on your end of the
connection... The client can do what it likes, when it likes...

How would a graceful close harm performance? It could leave you in the
TIME_WAIT state if you initiate it, but that may or may not be
desirable. You can, it seems, avoid TIME_WAIT by doing an abortive
close (i.e. resetting the connection). Alternatively, the connection will continue
to use resources until it has been cleanly shut down, but
you either want a clean shutdown or you don't...

Len
http://www.lenholgate.com

Kürsat

Jul 8, 2008, 4:03:37 PM
Well, the puzzle is almost complete now. I will try to summarize what I
understand:

For graceful closure, I should first call shutdown(send). Now, there are
three cases:

   1. There is some data queued: in this case a normal completion will
occur. After I process the received data I will post another receive. This will
go on this way until I have received all of the queued data and eventually
WSARecv() will fail.

   2. There is no data queued and the client closes its connection in
response to my shutdown(send): now a zero-byte completion will occur for
every receive posted for that client.

   3. There is no data queued and the client fails to close its connection
in response to my shutdown(send): after a period of time a zero-byte
completion will occur for every receive posted for that client.

In any case, I should decrease the ref-count after each receive completion and
increase it after I post a successful receive. If WSARecv() fails or a
zero-byte receive completion occurs then I will not increase the ref-count. The
ref-count will eventually fall to zero and I will call closesocket() at
that point.

Would you please correct my mistakes, if you haven't got bored yet :)

Len Holgate

Jul 10, 2008, 3:56:34 AM
> For graceful closure, I should first call shutdown(send). Now, there are
> three cases:

No, 1 is the same as 2, which is the same as 3. The WSARecv will work
and will return 0 bytes in a successful completion to indicate that
the connection has been shut down. When all data has been received, any
pending WSARecvs will return 0.
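
So the read completion handling collapses to a single path, something like this
sketch (the connection type and its members are illustrative, not my actual
code):

#include <winsock2.h>

// Illustrative connection type; the member functions are stand-ins.
struct Connection
{
    void OnData(DWORD bytes) { (void)bytes; /* consume the received bytes   */ }
    void PostRead()          {              /* issue the next WSARecv       */ }
    void OnPeerClosed()      {              /* begin our own clean-up       */ }
};

// One path for every read completion: data and remote shutdown look the same
// except for the byte count.
void OnReadCompletion(Connection& conn, DWORD bytesTransferred)
{
    if (bytesTransferred == 0)
    {
        // all queued data arrived in earlier completions; the peer's shutdown
        // has now been seen, so don't post another read
        conn.OnPeerClosed();
        return;
    }
    conn.OnData(bytesTransferred);
    conn.PostRead();
}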

> In any case, I should decrease the ref-count after each receive completion and
> increase it after I post a successful receive. If WSARecv() fails or a
> zero-byte receive completion occurs then I will not increase the ref-count. The
> ref-count will eventually fall to zero and I will call closesocket() at
> that point.

There are no special cases... You need to increase the reference count
on your socket and your overlapped structure BEFORE you post the recv
(and decrement it if the WSARecv() fails in such a way that a
completion won't occur). Then in the completion handler you decrement
the reference count. If you do anything else then you might have a
race condition between the reference being increased and the recv
completing...
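
A sketch of that ordering, using a plain atomic counter as an illustrative
reference count (again, not the framework's actual mechanism):

#include <winsock2.h>
#include <atomic>

#pragma comment(lib, "ws2_32.lib")

struct Connection
{
    SOCKET            sock = INVALID_SOCKET;
    WSAOVERLAPPED     readOverlapped{};
    char              buffer[4096]{};
    std::atomic<long> refCount{ 1 };            // the connection itself holds one

    void AddRef()  { refCount.fetch_add(1); }
    void Release()
    {
        if (refCount.fetch_sub(1) == 1)         // last reference just went away
        {
            closesocket(sock);
            delete this;                        // assumes heap-allocated connections
        }
    }

    bool PostRead()
    {
        AddRef();                               // take the reference BEFORE posting
        WSABUF buf;
        buf.buf = buffer;
        buf.len = static_cast<ULONG>(sizeof(buffer));
        ZeroMemory(&readOverlapped, sizeof(readOverlapped));
        DWORD flags = 0;
        const int rc = WSARecv(sock, &buf, 1, nullptr, &flags,
                               &readOverlapped, nullptr);
        if (rc != 0 && WSAGetLastError() != WSA_IO_PENDING)
        {
            Release();                          // no completion will ever arrive
            return false;
        }
        return true;                            // the completion handler calls Release()
    }
};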

When the reference on the overlapped goes to 0 you can clean that up.
When the reference on the socket goes to 0 you can clean that up. Take
a look at my free code; it does all of this.

Len
http://www.lenholgate.com

Kürsat

Jul 11, 2008, 4:10:23 AM
Thank you, I will study your free code.

Kürsat

Jul 13, 2008, 5:06:18 PM
I have a question about your free code. Why didn't you issue reads and
writes directly instead of posting IO requests to the server's IO queue?


Len Holgate

Jul 14, 2008, 3:41:36 AM
Have you read the CodeProject articles?

On platforms prior to Vista all outstanding async IO is cancelled when
the thread that issued it exits. There are various ways around this,
but if you read the original articles that came with the source they
explain why I selected that one.



Kürsat

Jul 14, 2008, 8:05:54 AM
Well, understood, but I think your threadpool maintenance mechanism
introduces a race condition.
Consider the execution path below:

Thread-1 calls HandleDispatch, resets dispathCompleteEvent, posts a dispatch
packet,
Thread-2 wakes up, grabs the dispatch packet, sets dispathCompleteEvent,
Thread-3 calls HandleDispatch, resets dispathCompleteEvent,
Thread-1 waits for the event which has just been reset by Thread-3, runs into the
time-out, and creates a new worker thread, which is unnecessary...

Now we have lost Thread-1's event status. This will cause an unnecessary
wait / create-thread / terminate-thread (in the context of the dormant-thread
clean-up) sequence. I didn't stress-test this; it is possible that I am
overlooking something.


Len Holgate

Jul 14, 2008, 2:02:25 PM
There's only ever one thread running on the dispatch port... So where
does thread 3 come from?


Len Holgate

Jul 14, 2008, 2:04:13 PM
Oh, and this has probably now got very boring for anyone who's
interested in networking issues rather than just "my framework's"
design issues. Continue as comments on my blog or via private mail?

