
Re: Socket switch delay


Alun Jones [MSFT]

Aug 20, 2004, 7:33:34 PM
"mil" <m...@discussions.microsoft.com> wrote in message
news:9DA32266-A6D8-4B98...@microsoft.com...
> I have a client and a server application communicating through TCP/IP.
> When I connect to the server using 2 sockets (one for writing and one for
> reading) I get a 200ms delay switching from one socket to the other.
> Sending data to the server takes 200ms and receiving takes ~0ms

Whenever you hear "200 ms delay", you should automatically start thinking "I
wonder if this is an interaction between my software, the Nagle algorithm,
and delayed ACK?"

> When I am using a single socket to communicate with the server I get ~0ms
> delay for the communication both ways.

Sounds reasonable.

> I have tried everything I could think of to make this delay go away. I am
> using multiple threads in the client master thread to send, slave to
> receive.
>
> In the server I am using Completion I/O with multiple ports (different one
> for recv and another for send). I even tried switching off the Nagle
> algorithm.

Yeah, Nagle is not generally a good thing to disable. At best, disabling
Nagle allows a poorly-performing application to continue performing poorly,
but to spread its poor performance throughout the local network.

> My conclusion so far is that Windows XP has a predefined delay of ~200ms
> when one process uses 2 sockets and switches from one to another. If that
> is true, is there any parameter I can change to avoid/reduce this delay?

Your conclusion is false. Windows XP has a predefined delay of ~200ms, yes,
but it's for the time to delay an ACK by. Here's the way it all works:

Think of it as if the Nagle algorithm affects senders, and the delayed ACK
algorithm affects the receiver.

The Nagle algorithm aims to cut down on short TCP segments, by collecting
them all together when the network is busy handling your previous segments.
It does this by sending only if one of the following is true:
1. All previous data has been acknowledged.
2. There is more than a full segment's worth of data to send.

The delayed ACK algorithm says that ACKs should be sent only under one of
the following situations:
1. We are sending other data that we can "piggyback" onto.
2. We have received two segments of data.
3. 200ms has elapsed since the first piece of unacknowledged data was
received.

The classic example of Nagle and delayed ACK interaction is of a sender
issuing two small send()s, and then waiting for a recv() of data that is a
response to the second send(). As you can see from checking the above, the
first send() goes immediately onto the network, because there is no
unacknowledged data preceding it (item 1 of the Nagle algorithm's list).
The data makes its way to the receiver, who runs down his list, and
determines that he can't send an ACK.

The sender, then, queues up another send(), but this is sitting in a local
buffer, waiting for an ACK, because there is previous data that hasn't yet
been acknowledged, and there is not a full segment to send. The receiver is
similarly waiting, as we said, and after 200ms will finally send the ACK,
that wakes up the sender.

Note that we haven't got to the point of generating any data that the
initial sender could receive using its recv() call, so what we've discussed
so far is very applicable to your situation.

Disabling Nagle doesn't help, as you've discovered, because the Nagle
algorithm isn't the only place along the route that is allowed to coagulate
data.

Perhaps we need to ask a different question: Why are you set on using two
sockets where one will do?

Alun.
~~~~


mil

Aug 21, 2004, 5:43:01 AM
Thank you for writing such a detailed reply. Before I answer your points, I
will answer your question first:

The server (as I mentioned) makes full use of CompletionIO (multiple threads
using ‘GetQueuedCompletionStatus’). It uses it for socket input/output and
file read/write.

When a client wants to read a file, it sends a “resolve/open” request to the
server and then multiple “read-segment” requests. These requests are queued
and processed asynchronously like the rest of the packets. So a sckt-input
completion packet generates a file-read/write completion packet, then that
generates a sckt-output completion packet.

Using this method I am not “touching” the files, the OS does all the work,
schedules extra threads etc (as you know) and I can say it really works! MS
has done such a good job with the CompletionIO that I was amazed by the
speed, smoothness and scalability of the whole thing.

When using slow connections and the server is heavily loaded, I want the
client to be able to execute multiple read/write requests. For example when
writing a file it can issue 3 write requests for 64KB then wait for the
replies (i.e. It sends one segment, the server starts writing the segment to
the file asynchronously, while the client is sending another segment).
If the server completes the file segment write, then it sends a reply, but
the client is not blocked waiting for each reply. At a given point (after it
has sent X segments) it blocks and waits for all of them to be completed
by getting all the replies from the server.

What I am achieving essentially is a send/write concurrency, thanks to the
CompletionIO.

If I use one socket I cannot send and receive at the same time (or am I
doing something wrong in the sockets setup???), so I will have to send a
segment, wait for the reply, send another. So essentially the client is
waiting for the server to write the segment to the HD before it gets the
reply.

If I use 2 sockets then the client can be sending segments with the master
thread, while another thread is receiving replies. Then the master thread
blocks till all the replies are gathered.


> The delayed ACK algorithm says that ACKs should be sent only

So essentially with 1 socket, because it is used for sending and receiving,
I “cancel” the delayed ACK 200ms wait, by simply using the same socket to
send the reply.

Based on that observation, if I was using 2 sockets and I was sending &
receiving to/from both, I would not get “penalized” with this delay (?)


When I am sending data, I use the multiple-buffers-send capabilities of the
WSASend function and a 0 bytes send-socket-buffer. When receiving I have let
the socket use a big buffer of its own to hold the whole message (most of the
time).

But most of the requests the client sends are small packets, so unless the
receiver uses the same socket to also send replies back, it will keep waiting
for more data (for up to 200ms).

I assume there is no way to change that, or “inform” the recv that I know
how much data I am waiting for. So essentially, if I want to create the
concurrency scenario I described at the beginning, I will have to redesign
the whole thing: something like having the server postpone sending the
replies, so the client can send >1 requests etc…

mil

Alexander Nickolov

Aug 23, 2004, 1:07:43 PM
What is the problem with concurrent send/receive on the same
socket?

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

"mil" <m...@discussions.microsoft.com> wrote in message

news:8502BB0F-25AC-4FB8...@microsoft.com...



mil

Aug 23, 2004, 1:43:02 PM
If both applications (client and server) try to send data through the same
socket at the same time, it just generates an error, loses the connection and
it takes a few minutes for the OS to unlock the socket. If I send, then wait
to receive and then send again, everything works fine.

I thought that bidirectional sockets were supposed to be used in the above
way and although it seems ok in loopback scenarios (127.0.0.1) it doesn’t
work with remote computers. So I assume that I am not supposed to use the
socket that way, and I introduced the second socket to be able to send/recv
through one and recv only through the second for asynchronous replies from
the server.

Do you think I set up my sockets wrongly, or in general am I doing something
wrong to get this behavior?

mil

"Alexander Nickolov" wrote:

> What is the problem with concurrent send/receive on the same
> socket?


Alun Jones [MSFT]

Aug 23, 2004, 4:41:26 PM
"mil" <m...@discussions.microsoft.com> wrote in message
news:8502BB0F-25AC-4FB8...@microsoft.com...

> If I use one socket I cannot send and receive at the same time (or am I
> doing something wrong in the sockets setup???), so I will have to send a
> segment, wait for the reply, send another. So essentially the client is
> waiting for the server to write the segment to the HD before it gets the
> reply.

You should be able to use the same socket handle in two different threads,
one to send, the other to receive. If you want to use them in the same
thread without blocking, then you can set them up as non-blocking sockets
using the ioctlsocket(...FIONBIO...) call, or WSAAsyncSelect, or
WSAEventSelect, depending on how you wish to be notified when socket events
have occurred. You can also use WSASend / WSARecv to use overlapped I/O and
completion ports.

> If I use 2 sockets then the client can be sending segments with the master
> thread, while another thread is receiving replies. Then the master thread
> blocks till all the replies are gathered.

You are apparently comfortable marshalling threads, yet not comfortable with
non-blocking sockets. Non-blocking sockets are not that hard, and as you've
noticed, blocking socket use can get tedious. What do you do if the data
connection is unplugged while the master thread is waiting for a reply?

> So essentially with 1 socket, because it is used for sending and
> receiving, I "cancel" the delayed ACK 200ms wait, by simply using the same
> socket to send the reply.

Yes. The ACK flag is a bit in every TCP packet that is either on or off.
Since it takes 40 bytes to send an empty TCP packet, it's best to hold onto
the ACK until it can be packed in with data that also needs to go the other
way - from an overhead of 40 bytes, you go to zero overhead. It's a saving
worth making.

> Based on that observation, if I was using 2 sockets and I was sending &
> receiving to/from both, I would not get "penalized" with this delay (?)

Correct.

> When I am sending data, I use the multiple-buffers-send capabilities of
> the WSASend function and a 0 bytes send-socket-buffer. When receiving I
> have let the socket use a big buffer of its own to hold the whole message
> (most of the times).

You may be over-thinking things a little. Why are you using a zero-byte
send buffer? It begins to look as though you have been trying to optimise
for performance without knowing where your performance bottlenecks truly
are. Rather like fine-tuning your octane rating, without first checking if
all your tires are adequately inflated.

> But most of the requests the client sends are small packets, so unless the
> receiver uses the same socket to also send replies back, it will keep
> waiting for more data (for up to 200ms).

That's correct - here's where good protocol design wins out over bad. TCP
should be used only where reliability is necessary, and in such
circumstances, most protocols resolve down to send/recv/send/recv, where a
command is sent and a response is received, and that pattern repeats until
the socket has to be closed.

> I assume there is no way to change that, or "inform" the recv that I know
> how much data I am waiting for etc. so essentially if I want to create the
> concurrency scenario I described at the beginning I will have to redesign
> the whole thing, something like having the server postpone sending the
> replies or something, so the client can send >1 requests etc.

Always ask to receive as much data as you can. Handle the possibility that
you might get less than you asked for, even to the point that your
application can handle one byte at a time. It won't get that, of course,
but you will have written a robust program. Look at how other protocols
have been designed to fit the structure of the network.

Alun.
~~~~


mil

Aug 23, 2004, 5:43:01 PM

Alun, thanks again for the input.


> You should be able to use the same socket handle in two different threads,
>one to send, the other to receive. If you want to use them in the same


I was referring to the client sending data on this socket while the server
was also sending data at the same time. Which seems to be a big no-no.


My client already uses non-blocking sockets, in combination with select, so
it will never “lock” while waiting to send or receive.

The server uses blocking sockets just because I am also using Overlapped IO
structures to send the packets. Thus I don’t need to care if it blocks or not
since the Overlapped IO will never block me.


>You may be over-thinking things a little. Why are you using a zero-byte
>send buffer?


You are right, I always do so, when it comes to client/server stuff :)

The reason I am using a 0 bytes send buffer in my socket (i.e. I set up the
socket not to use any buffering, which actually works on XP but not on
Windows 2000 with non-blocking sockets) is because all the communication is
done in packets already. Each packet is 32KB max and I can send one or more
of those in one go. The server and client save all the data they want to
send in these packets, those get compressed (if possible/necessary) and
encrypted (through the MS Crypto API) and then I execute a WSASend with
multiple buffers. So I don’t need the protocol to buffer the data for a
second time, since I let it use my buffers for as long as it likes.

The receiver will always get a 256 byte header that describes what “is
coming” from the other end and will load the number of blocks with their
proper size, decrypt/decompress and pass them to the rest of the app.

After our conversations about the delay, I changed my client code so it uses
2 connections again but now:

The first connection is used to send and receive packets “synchronously”, by
sending the data, then blocking on an “Event”, which another thread “Sets”
when it receives the reply. When the client wants to send a number of
requests without conflicting with server replies (i.e. wants delayed
replies), it simply tells the server to use the second socket (channel) to
send the replies. Another thread handles the receive again from this second
channel and then the client (after it has finished with all the
send-requests) blocks on “multiple events”.

Since the second socket is used to send file data, or feedback messages
generated by the server, I don’t care about the 200ms delay. Actually in this
case it really helps when there are many small packets generated by the
“server feedback messages”, for the reasons you explained to me already.

The whole thing now works so nicely that you almost cannot tell the
difference when the server is running on a dual PIII or a dual Opteron 248.
0% CPU usage when idle, instant responses on requests (and that includes a
round trip to Access through OLEDB on the server side).

mil

Alun Jones [MSFT]

Aug 23, 2004, 8:07:32 PM
"mil" <m...@discussions.microsoft.com> wrote in message
news:1EF5AB37-064F-441F...@microsoft.com...

> I was referring to the client sending data on this socket while the server
> was also sending data in the same time. Which seems to be a big no no.

What you're talking about is commonly referred to as "asynchronous
operation" - each end does not need to wait its turn before sending. TCP
_definitely_ supports asynchronous operation, so it would be impossible for
sockets to provide TCP support without also supporting asynchronous
operation.

What _is_ a big no-no is when two threads vie for sending on the same
socket. A basic tenet of TCP is that the order of data going into the
stream is the same as that coming out, and two threads on the same machine
both trying to send would get in the way of that tenet.

> My client already uses non-blocking sockets, with the combination of the
> select statement, so it will never "lock" while waiting to send or
> receive.
>
> The server uses blocking sockets just because I am also using Overlapped
> IO structures to send the packets. Thus I don't need to care if it locks
> or not since the Overlapped IO will never block me.

So, if you're not blockable, why are you being blocked in a receive? If, as
you say, the overlapped I/O means you aren't blockable, then set the socket
to be non-blocking, so that your socket doesn't block in receive.

> The reason I am using a 0 bytes send buffer in my socket (i.e. I setup the
> socket not to use any buffering, which actually works on XP but not on
> Windows 2000 in the non-blocking sockets) is because all the communication
> is done in packets already. Each packet is 32KB max and I can send one or
> more of those in one go. The server and client, save all the data they
> want to send in these packets, those get compressed (if
> possible/necessary) and encrypted (through the MS Crypto API) and then I
> execute a WSASend with multiple buffers. So I don't need the protocol to
> buffer the data for a second time, since I let it use my buffers for as
> long as it likes.

Trying to do "packets" on TCP is usually a bad idea. TCP is a stream, it's
going to ignore your packets. If it gets half a chance, it will chop your
packets up, and assemble them so that parts of two of your packets go out in
the same IP packet on the wire. Don't think of TCP as handling packets - it
doesn't.

As an example of why this might be a bad idea, note that there are cards
that offload TCP handling from the system to the card, handling TCP
buffering internally - setting the send buffer size to zero will mean that
your program suffers from worse performance on such a card. [A search on
"Winsock Direct" can give you more information on this than you ever wanted
to know].

> The receiver will always get a 256 byte header that describes what "is
> coming" from the other end and will load the number of blocks with their
> proper size, decrypt/decompress and pass them to the rest of the app.

Yes, this is a common cause of Nagle / delayed ACK interactions - "we send a
header, followed by data, and wait for a response" - you've got the
send/send/recv pattern encoded in text right there.

As long as you hold onto the header and the data, and send the two in one
call, you'll get better performance from that part of your code.

> After our conversations about the delay, I changed my client code so it
> uses 2 connections again but now:
>
> The first connection is used to send and receive packets "synchronously",
> by sending the data, then blocking on an "Event", which another thread
> receives the reply and "Sets". When the client wants to send a number of
> requests without conflicting with server replies (i.e. wants delayed
> replies), it simply tells to the server to use the second socket (channel)
> to send the replies. Another thread handles the receive again from this
> second channel and then the client (after it has finished with all the
> send-requests) blocks on "multiple events".

This sounds overly complicated. Simplify it. Your socket already contains
two asynchronous channels - one in each direction, inbound and outbound. To
interact well with Nagle, you can either be completely asynchronous, and
send and receive data willy-nilly, or do lock-step synchronous sends and
receives. What you must do, though, is to group all related data into a
single call to send() or WSASend().

> Since the second socket is used to send file data, or feedback messages
> generated by the server, I don't care about the 200ms delay, actually in
> this case it really helps when there are many small packets generated by
> the "server feedback messages", for the reasons you explained to be
> already.

If you're sending file data, you will largely overcome any problems with
Nagle / delayed ACK by keeping the stream full of data - you'll always have
an MTU's worth to send, so Nagle won't slow you down, and you'll keep
triggering the ACKs by having more than two segments' worth of data to
acknowledge.

> The whole thing now works so nicely that you almost cannot tell the
> difference when the server is running on a dual PIII or a dual Opteron
> 248. 0% CPU usage when idle, instant responses on requests (and that
> includes a round trip to the Access through OLEDB in the server side).

I still feel like you've got something that's a little more baroque than
necessary. Still, I could be wrong, so if it ain't baroque, don't fix it.

Alun.
~~~~


mil

Aug 24, 2004, 6:29:01 AM

"Alun Jones [MSFT]" wrote:
> What you're talking about is commonly referred to as "asynchronous
> operation" - each end does not need to wait its turn before sending. TCP

> …


> What _is_ a big no-no is when two threads vie for sending on the same
> socket. A basic tenet of TCP is that the order of data going into the


What I was saying was that when the client is sending data the server cannot
be sending data as well over the same socket. Am I right on that?

That is the reason I added the second socket, because I want the client,
occasionally, to keep sending requests to the server while the server sends
back replies whenever there are data available (i.e. read/write completion io
has generated the data).

> Trying to do "packets" on TCP is usually a bad idea. TCP is a stream, it's
> going to ignore your packets. If it gets half a chance, it will chop your


I am doing the packet splitting for internal use, not for any other reason.
When I send the packets to TCP/IP, I send all of them as multiple buffers,
with one call, not multiple. So since WSASend can do the “gather” operation
and send all the data the way it likes, it is like I send one huge flat
buffer.


> that offload TCP handling from the system to the card, handling TCP
> buffering internally - setting the send buffer size to zero will mean that
> your program suffers from worse performance on such a card.


I am aware of these cards but I never actually thought about the point you
raised here. Hmm, you are right, I will have to make this buffer adjustable
in the settings of the application, or else as you said, I am going to be in
trouble when those cards are used.


> A search on "Winsock Direct" can give you more information

Very interesting stuff.


> send/send/recv pattern encoded in text right there

> As long as you hold onto the header and the data, and send the two in one
> call, you'll get better performance from that part of your code.


No, actually I have a send/recv/recv pattern. All the blocks of the send
operation, header and all the body blocks, go out at the same time. Then
the receiver reads the header first and then the various blocks received. But
since I am also using a big buffer at the receiving end of the socket, I
don’t experience any delays.

If I was doing a send/send/recv then yes, I would have some serious delays. I
tried it years ago on another application and it was really slow.


> If you're sending file data, you will largely overcome any problems with
> Nagle / delayed ACK


Yep that is what the second socket is used for, so everything now works
without any delays.


> I still feel like you've got something that's a little more baroque than
> necessary.


I would agree with this statement, if I could send from the client while
the server was also sending data on the same socket. But because I cannot do
so, I have to achieve that through 2 sockets.

e.g. Client sends 100KB, sends 100KB, sends 100KB, waits for 3 replies.
Server writes (using Completion IO) 100KB, sends reply (no waiting), 3 times.

(The “sending” and “writing” of the file take place at the same time, i.e.
while the server is writing one block, it may be receiving the next one.)

…and again, the same operations loop till the whole file is sent, or there
was an error. The buffer size depends on the type of connection (for LAN it
is huge, for ADSL it is small).

Although I could use another method for delayed replies, the 2 sockets seem
to be easier to code than anything else that would involve postponed replies
etc.


mil

Alexander Nickolov

Aug 24, 2004, 12:47:18 PM
> What I was saying was that when the client is sending data the server
> cannot
> be sending data as well over the same socket. Am I right on that?

No, you are wrong. Show some code, you probably have a
subtle bug...

BTW, it may be my own ignorance, but overlapped I/O is
_designed_ to work with non-blocking sockets, no? I'll
strongly second Alun here that you should rethink your
code and work with non-blocking sockets (and possibly
a single thread for read/write).

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

"mil" <m...@discussions.microsoft.com> wrote in message
news:548B1494-19D1-4E30...@microsoft.com...



mil

Aug 24, 2004, 1:27:03 PM
"Alexander Nickolov" wrote:

> BTW, it may be my own ignorance, but overlapped I/O is
> _designed_ to work with non-blocking sockets, no? I'll
> strongly second Alun here that you should rethink your
> code and work with non-blocking sockets (and possibly
> a single thread for read/write).


I am using non-blocking sockets. But for the Overlapped IO (in the server),
although I’ve added the code to make them non-blocking, it doesn’t really
matter, because it was working fine without it. Unless you mean that because
I was using blocking sockets in the server before (even with the Overlapped
IO) it was causing the problem of not being able to send from client and
server at the same time.


> No, you are wrong. Show some code, you probably have a
> subtle bug...

I hope I do have a bug and you find it :)…but I did run lots of tests and
always, when the client was sending data while the server was also sending
data there was a problem.
(Although when I was running the tests the server side was using blocking
sockets).

Keep in mind also that when the client is sending data and the server is
sending replies, the client is receiving the replies through another thread.


Here it is…


THE SERVER CREATES THE LISTEN PORT:


//...Create the listening socket
m_scktListen = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
if(m_scktListen!=INVALID_SOCKET)
{
    si_addrlocal.sin_family = AF_INET;
    si_addrlocal.sin_port = htons(m_ushPort);
    si_addrlocal.sin_addr.s_addr = m_lIPAddressListen;

    nRet = bind(m_scktListen, (struct sockaddr *)&si_addrlocal, sizeof(si_addrlocal));
    if(nRet!=SOCKET_ERROR)
    {
        if(m_pRtmContextRef->dwMaxConnListenBacklog>SOMAXCONN)
        {
            m_pRtmContextRef->dwMaxConnListenBacklog=SOMAXCONN;
        }
        else if(m_pRtmContextRef->dwMaxConnListenBacklog<1)
        {
            m_pRtmContextRef->dwMaxConnListenBacklog=1;
        }

        nRet=listen(m_scktListen, (int)(m_pRtmContextRef->dwMaxConnListenBacklog));
        if(nRet!=SOCKET_ERROR)
        {
            int nScktBufferSize;

            if(!m_bDisableZeroSendBuffering)
            {
                nScktBufferSize=0;
                //...Disable send buffering on the socket. Setting SO_SNDBUF to 0 causes
                //...winsock to stop buffering sends and perform sends directly from our
                //...buffers, thereby reducing CPU usage.
                nRet=setsockopt(m_scktListen, SOL_SOCKET, SO_SNDBUF, (char *)&nScktBufferSize, sizeof(nScktBufferSize));
                if(nRet==SOCKET_ERROR)
                {
                    hr=OM8ERR_WSA_SO_SNDBUF;
                    m_errorLogHlpr.ReportErrorSTRWSA(hr,szActionCtx);
                }
            }

            if(SUCCEEDED(hr))
            {
                //...Set the receive buffer size
                nScktBufferSize=OM8_MAXIMUM_PACKET_SIZE+OM8_PACKETHEADER_SIZE;
                nRet=setsockopt(m_scktListen, SOL_SOCKET, SO_RCVBUF, (char *)&nScktBufferSize, sizeof(nScktBufferSize));
                if(nRet!=SOCKET_ERROR)
                {
                    LINGER lingerStruct;

                    lingerStruct.l_onoff = 1;
                    lingerStruct.l_linger = 0;

                    nRet = setsockopt(m_scktListen, SOL_SOCKET, SO_LINGER, (char *)&lingerStruct, sizeof(lingerStruct));
                    if(nRet!=SOCKET_ERROR)
                    {

THE SERVER SENDS A NUMBER OF PACKETS:

dwSendBytes=dwFlags=0;
ZeroMemory(&(pIOCtxPacketOUT->Overlapped),sizeof(WSAOVERLAPPED));

//...Need to call this function before we execute the OverlappedIO function,
//...because it may free the IO thread before we have registered the session
//...with the pendingIO sessions list and then we will be out of sync
IncrementIO();
IncrementSENDIOStats();

//...Always use the secondary connection for sending feedback messages
iRet=WSASend(m_scktConnectionSecondary,
             pIOCtxPacketOUT->wsabuf,
             pIOCtxPacketOUT->dwNumberOfBuffersUsed,
             &dwSendBytes,
             dwFlags,
             &(pIOCtxPacketOUT->Overlapped),
             NULL);

if(iRet==SOCKET_ERROR)
{
    int iLastWSError=WSAGetLastError();

    if(iLastWSError==ERROR_IO_PENDING)
    {
        //...Do not allow the caller to re-use this packet
        pIOCtxPacketOUT=NULL;
    }
    else
    {
        hr=OM8ERR_SEND;
        ATLASSERT(SUCCEEDED(hr));
        DecrementIO();
        DecrementSENDIOStats();
    }
}
else
{
    //...Do not allow the caller to re-use this packet
    pIOCtxPacketOUT=NULL;
}

THE SERVER RECEIVES DATA

if(pIOCtxPacket->operationState==OM8IO_OPST_SOCKET_RECV_PACKET_HDR)
{
    pIOCtxPacket->wsabuf[0].buf=(((char*)&(pIOCtxPacket->packetHeader))+pIOCtxPacket->dwBytesSoFar);
    pIOCtxPacket->wsabuf[0].len=pIOCtxPacket->dwBytesTotal-pIOCtxPacket->dwBytesSoFar;
    pIOCtxPacket->dwNumberOfBuffersUsed=1;

    //...Need to call this function before we execute the OverlappedIO function,
    //...because it may free the IO thread before we have registered the session
    //...with the pendingIO sessions list and then we will be out of sync
    IncrementIO();
    IncrementRECVIOStats();

    ZeroMemory(&(pIOCtxPacket->Overlapped),sizeof(WSAOVERLAPPED));
    iRet=WSARecv(m_scktConnectionPrimary,
                 pIOCtxPacket->wsabuf,
                 pIOCtxPacket->dwNumberOfBuffersUsed,
                 &dwRecvBytes,
                 &dwFlags,
                 &(pIOCtxPacket->Overlapped),
                 NULL);

    if(iRet==SOCKET_ERROR)
    {
        int iLastWSError=WSAGetLastError();
        if(iLastWSError!=ERROR_IO_PENDING)
        {
            hr=OM8ERR_RECEIVE;
            ATLASSERT(SUCCEEDED(hr));
            DecrementIO();
            DecrementRECVIOStats();
        }
    }
}
//...It is a bit more complex because we may receive more than one block at once
else if(pIOCtxPacket->operationState==OM8IO_OPST_SOCKET_RECV_PACKET_BODY)
{

THE CLIENT SETS UP THE CONNECTION

//...Create the socket
m_socketPrimary=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
if(m_socketPrimary==INVALID_SOCKET)
{
hr=OM8ERR_CREATESOCKET;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
else
{
if(dwConnectionFlags & OM8CLIENTSESSION_CONNECT_FLAGS_SINGLECONNECTION)
{
m_socketSecondary=m_socketPrimary;
m_bSingleConnection=TRUE;
}
else
{
m_socketSecondary=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
if(m_socketSecondary==INVALID_SOCKET)
{
hr=OM8ERR_CREATESOCKET;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
}
}

//...Set the sockets to non-blocking mode
if(SUCCEEDED(hr))
{
if(ioctlsocket(m_socketPrimary,FIONBIO,&ul)==SOCKET_ERROR)
{
hr=OM8ERR_FIONBIO;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
else if(!m_bSingleConnection)
{
ul=1;
if(ioctlsocket(m_socketSecondary,FIONBIO,&ul)==SOCKET_ERROR)
{
hr=OM8ERR_FIONBIO;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
}
}

//...Finally connect
if(SUCCEEDED(hr))
{
int iScktBufferSize;
OM8_OSVERSION osVersion;
OM8_OSTYPE osType;
OM8_OSSUITE osSuite;
DWORD dwMaxBufferSize=OM8_MAXIMUM_PACKET_SIZE+OM8_PACKETHEADER_SIZE;

//...Need to know which OS are we running on
OM8GetOSVersion(osVersion,osType,osSuite);

//...We need a different setting for different OS versions
if(osVersion<OM8OSVERSION_WINDOWS_XP)
{
//...It seems that Windows 2000 doesn't like a 0 send buffer in the client part
iScktBufferSize=dwMaxBufferSize;
}
else
{
//...Disable send buffering on the socket. Setting SO_SNDBUF to 0 causes winsock to stop
//...buffering sends and perform sends directly from our buffers, thereby reducing CPU usage.
iScktBufferSize=0;
}

iRet=setsockopt(m_socketPrimary, SOL_SOCKET, SO_SNDBUF, (char *)&iScktBufferSize, sizeof(iScktBufferSize));
if((iRet!=SOCKET_ERROR) && !m_bSingleConnection)
{
iRet=setsockopt(m_socketSecondary, SOL_SOCKET, SO_SNDBUF, (char *)&iScktBufferSize, sizeof(iScktBufferSize));
if(iRet==SOCKET_ERROR)
{
hr=OM8ERR_WSA_SO_SNDBUF;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
}
else if(iRet==SOCKET_ERROR)
{
hr=OM8ERR_WSA_SO_SNDBUF;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}

if(iRet!=SOCKET_ERROR)
{
//...Set the receive buffer size to 4 times the packet+header size, that way the server
//...can send data faster without having to rely on our receiver thread to process data
//...fast enough
iScktBufferSize=(OM8_MAXIMUM_PACKET_SIZE+OM8_PACKETHEADER_SIZE)<<2;

iRet=setsockopt(m_socketPrimary, SOL_SOCKET, SO_RCVBUF, (char *)&iScktBufferSize, sizeof(iScktBufferSize));
if((iRet!=SOCKET_ERROR) && !m_bSingleConnection)
{
iRet=setsockopt(m_socketSecondary, SOL_SOCKET, SO_RCVBUF, (char *)&iScktBufferSize, sizeof(iScktBufferSize));
if(iRet==SOCKET_ERROR)
{
hr=OM8ERR_WSA_SO_RCVBUF;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
}
else if(iRet==SOCKET_ERROR)
{
hr=OM8ERR_WSA_SO_RCVBUF;
m_errorLogHlpr.ReportErrorSTRWSA(hr,lpActionCtx);
}
}

THE CLIENT CONNECTS


HRESULT hr=OM8ERR_CONNECT;

if(connect(sck,(struct sockaddr*)pServer,sizeof(struct sockaddr_in))!=SOCKET_ERROR)
{
hr=S_OK;
}
else
{
switch(WSAGetLastError())
{
case WSAEINPROGRESS:
//...Pass through
case WSAEWOULDBLOCK:
{
struct timeval tmVal;
fd_set fdWrite;
fd_set fdExcept;

//...Wait for 'lSecsWait' seconds to establish a connection
tmVal.tv_sec = lSecsWait;
tmVal.tv_usec = 0;

FD_ZERO(&fdWrite);
FD_SET(sck,&fdWrite);

//...Winsock reports a failed non-blocking connect through the
//...exception set, so watch both sets and require writability
FD_ZERO(&fdExcept);
FD_SET(sck,&fdExcept);

if(select(0,NULL,&fdWrite,&fdExcept,&tmVal)>0 && FD_ISSET(sck,&fdWrite))
{
hr=S_OK;
}
}
break;

default:
break;
}
}

return hr;

THE CLIENT SENDS DATA

The client actually calls this routine with a number of data blocks, one for
the header and many for the body. Then it blocks (or not) on an event that is
signaled by the receiver thread.

while(IsConnectionAlive())
{
tmVal.tv_sec = PCKIN_TIMETWAIT;
tmVal.tv_usec = 0;

FD_ZERO(&fdWrite);
FD_SET(socket,&fdWrite);

iRet=select(0,NULL,&fdWrite,NULL,&tmVal);
if(iRet==SOCKET_ERROR)
{
if(WSAGetLastError()!=WSAEINPROGRESS)
{
m_bSendDataStatus=FALSE;
return OM8ERR_SEND;
}
}
else if(iRet==0)
{
dwTotalTimeOut+=PCKIN_TIMETWAIT;
if(dwTotalTimeOut>PCKSEND_TIMEOUT)
{
m_bSendDataStatus=FALSE;
return OM8ERR_SENDTIMEOUT;
}
}
else
{
dwNumberOfBytesSent=0;

#ifdef _DEBUG
ATLTRACE(_T("CLOCK: 'sendMultipleBuffers - select' total time %u\n"),clock()-tStart);
#endif

iRet=WSASend( socket,
lpBuffers,
dwBufferCount,
&dwNumberOfBytesSent,
flags,
NULL,
NULL);

if(iRet==SOCKET_ERROR)
{
int iLastError=WSAGetLastError();

switch(iLastError)
{
case WSAEINPROGRESS:
//...Pass through
case WSAEWOULDBLOCK:
continue;

default:
if(IsRunning())
{
ShutDown(false); //...close the connection since there is a problem
m_bSendDataStatus=FALSE;
return OM8ERR_SEND;
}
else
{
m_bSendDataStatus=FALSE;
return OM8ERR_CONNECTIONCLOSED;
}
break;
}

}
else
{
#ifdef _DEBUG
ATLTRACE(_T("CLOCK: 'sendMultipleBuffers - WSASend' total time %u\n"),clock()-tStart);
#endif
m_bSendDataStatus=FALSE;
return S_OK;
}
}
}

THE CLIENT RECEIVES DATA

This function is called repeatedly from the extra client thread (actually two
threads when two sockets are used) to receive the header block and any
subsequent blocks.

while(IsConnectionAlive())
{
tmVal.tv_sec = PCKIN_TIMETWAIT;
tmVal.tv_usec = 0;

FD_ZERO(&fdRead);
FD_SET(socket,&fdRead);

//...Wait for the socket to get some data
iRet=select(0,&fdRead,NULL,NULL,&tmVal);
if(iRet==SOCKET_ERROR)
{
if(WSAGetLastError()!=WSAEINPROGRESS)
{
m_bRecvDataStatus=FALSE;
return OM8ERR_RECEIVE;
}
}
else if(iRet==0)
{
//...just keep waiting
continue;
}
else if(iBytesRecvSoFar<len)
{
//...Read the remaining bytes in the buffer
iRet=recv(socket,buf+iBytesRecvSoFar,len-iBytesRecvSoFar,flags);
if(iRet==SOCKET_ERROR)
{
int iLastError=WSAGetLastError();

switch(iLastError)
{
case WSAEINPROGRESS:
//...Pass through
case WSAEWOULDBLOCK:
continue;

default:
break;
}

m_bRecvDataStatus=FALSE;
return OM8ERR_RECEIVE;
}
else
{
//...recv returned 0: the peer closed the connection gracefully
if(!iRet)
{
m_bRecvDataStatus=FALSE;
return OM8ERR_RECEIVE;
}

iBytesRecvSoFar+=iRet;
}

mil


Alexander Nickolov

unread,
Aug 24, 2004, 2:32:05 PM8/24/04
to
I went over all your code so far. My first impression is that it is
very inefficient, both at the client and at the server (and why
you would set the send buffer size to zero on a non-overlapped
socket escapes me...). One glaring error is that your client does
not check that it sent all data in one go (a synchronous send can
still accept less data than you offered it), though I have no
idea if this is your problem. So when you use a single socket,
what exactly happens with this code and on which side?

As far as design, do not attempt to force packet structure
on the socket I/O level. Use it as a stream and compose/
decompose your packets in your own buffers. Overlapped
send and receive (and I noticed you don't do overlapped
receive at the server!) means you queue buffers, you don't
wait for the socket with select. For sending, your buffers
will be used directly by the TCP driver bypassing the socket
layer buffer, so several queued overlapped send operations
at any time are essential to performance. If you want efficient
overlapped receive, you need to queue many receive buffers.
On the client, provide a large receive buffer. Use at least
8KB send buffer (this is the default), or performance may
degrade (though if your requests are small and/or relatively
rare this does not impact performance much).

Finally, I've never mixed overlapped with non-overlapped I/O
on the same socket. Could that be your problem?

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

"mil" <m...@discussions.microsoft.com> wrote in message

news:B411F92B-6991-494F...@microsoft.com...


> "Alexander Nickolov" wrote:
>
>>BTW, it may be my own ignorance, but overlapped I/O is
>>_designed_ to work with non-blocking sockets, no? I'll
>>strongly second Alun here that you should rethink your
>>code and work with non-blocking sockets (and possibly
>>a single thread for read/write).
>
>
> I am using non-blocking sockets. But for the Overlapped IO (in the
> server),
> although I've added the code to make them non-blocking, it doesn't really
> matter, because it was working fine without it. Unless you mean that
> because
> I was using blocking sockets in the server before (even with the
> Overlapped
> IO) it was causing the problem of not being able to send from client and
> server at the same time.
>
>
>>No, you are wrong. Show some code, you probably have a
>>subtle bug...
>

> I hope I do have a bug and you find it :)...but I did run lots of tests and


> always, when the client was sending data while the server was also sending
> data there was a problem.
> (Although when I was running the tests the server side was using blocking
> sockets).
>
> Keep in mind also that when the client is sending data and the server is
> sending replies, the client is receiving the replies through another
> thread.
>
>

> Here it is.

mil

unread,
Aug 24, 2004, 3:21:14 PM8/24/04
to

"Alexander Nickolov" wrote:

>(and I noticed you don't do overlapped receive at the server!)


The receive operation in the server is overlapped, first I receive a header
which describes what is coming (with one recv completion io) and then I set up
buffers and wait for all the data I should be receiving, through the
completion IO. If not all the data comes back, I calculate the right offsets
and wait again for the completion io.

I didn’t send you that part of the code because it is too long.


>would you set the send buffer size to zero on a non-overlapped
>socket escapes me...).

So I get better performance(?) Since I am sending all the data at once using
WSA buffers why do I need to copy them again? Of course I changed that in the
server (so the user can disable the 0 buffering, maybe I should make it
optional in the client too).


>One glaring error is your client does not check it sent all data in one go


Hmm, you are right. I thought when I send data it always sends it all or
it returns an error. I also thought that only the receive can return partial
buffers, whereas the send always sends the whole thing or nothing.

I think I got this whole thing wrong.

>though I have no idea if this is your problem. So when you use a
>single socket, what exactly happens with this code and on which
>side?

If the previous assumption of mine is wrong, then that must be the problem.
Because when the client sends data while the server also sends data, the
connection fails and the server socket stays locked for a minute or two.


>Finally, I've never mixed overlapped with non-overlapped I/O
>on the same socket. Could that be your problem?

I didn’t mix the two models, only the client is not using Overlapped IO. The
server is using it for all communication and file writing.


The problem must be the fact that I don’t “expect” the send command to send
partial buffers. And I am not taking into account that in both the client and
the server code. Where for the receive, I “obey” the rules.

So only when I am using a single socket and both client and server are
sending data, the send-operation does not send all the buffers in one go and
“asks me” to try again (and my code doesn’t handle this condition).

I am not saying that this cannot happen with the current code as well, but
it seems that it mostly happens when both sides try to talk over the same
socket.

I will change the code accordingly to take this into account at both sides.

Thanks for pointing out all these “omissions” (to put it kindly) in my code ;)

Now back to the drawing board for me :( I am going to add the checking for
the send operations in client and server then try again with a single socket.

mil

Alexander Nickolov

unread,
Aug 24, 2004, 4:17:31 PM8/24/04
to
Your server-side send is never partial - it's overlapped. All your
buffers are queued and will be later sent. You must preserve
them until notified (BTW, I didn't see any code of yours to
deal with the notification - you pass NULL event handle and
NULL APC...). Or at least that's the behavior with a non-blocking
socket, not sure about a blocking socket. (And I'm not even
sure it matters if the socket is blocking or not.)

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

"mil" <m...@discussions.microsoft.com> wrote in message

news:B70055F9-BDAD-4D2E...@microsoft.com...

Alexander Nickolov

unread,
Aug 24, 2004, 4:38:22 PM8/24/04
to
Sorry, I got mixed up. Your server-side socket was non-blocking.

Also, I didn't read your server receive code carefully. It is indeed
overlapped, but I'm not sure how you detect when the data
has arrived. Again you passed NULL event handle and NULL APC...

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

"Alexander Nickolov" <agnic...@mvps.org> wrote in message
news:eZSptdhi...@TK2MSFTNGP11.phx.gbl...

mil

unread,
Aug 24, 2004, 5:15:02 PM8/24/04
to

"Alexander Nickolov" wrote:

> Your sever side send is never partial - it's overlapped.

Ok…just to double check that. My WSASend in the server will NOT return
partial results, unless it comes back with an error. Unlike the WSARecv which
can come back with partial buffers from a completion io.

Does this apply to the File Write IO when using completion io? i.e. I should
NOT worry about partial writes either.


In the client side now, it seems to be working in the same “style” even
though I am not using Overlapped IO. Is it because I am using the
WSASend/non-blocking sockets, or I should add code to catch the case of a
partial send anyway?


> buffers are queued and will be later sent. You must preserve
> them until notified

The buffers are preserved until they are needed no more, or the Completion
port has been “closed” and the threads have exited.


> BTW, I didn't see any code of yours to deal with the notification –

> you pass NULL event handle and NULL APC...).


Passing events and using callback functions is too easy ;) I am using
threads instead with the completion port.

while(1)
{
//...Reset these (important!)
dwIoSize=0;
pSession=NULL;
pIOContext=NULL;

//...Continually loop to service io completion packets
bSuccess = GetQueuedCompletionStatus(pThis->m_hIOCompletionPortIO,
&dwIoSize,
(PDWORD_PTR)&pSession,
(LPOVERLAPPED*)&pIOContext,
INFINITE);

if(pSession)
{
//...Sanity check
_ASSERTE(pSession->GetRefCount()>0);

//...Add a reference before we do any processing
//...so we can be sure the session won't be deleted
pSession->AddRef();

//...One less IO
pSession->DecrementIO();

if(bSuccess)
{

The above code runs in one or more threads and uses the
m_hIOCompletionPortIO for receives, sends, file read/writes.

Now, earlier you said my recvs are inefficient. That is not really true,
because I read a header and then all the packets together, so the TCP/IP
layer will just read them as a stream. Given that I also use a big recv
buffer, it shouldn’t cause any delays and it really does not do so. I always
get ~0ms round trips now.

While we were having this conversation, I changed my code to use one socket
again and tested it to see if it will “hit” the partial send. It never does
in the client and server (testing with 3 computers here).

But the more interesting thing is that I also “allowed” the client code to
send while the server is also sending over the same socket. It seems that it
doesn’t have the problems it had before. Now, I don’t know what has changed
in the code and I am not getting this behavior again. The only thing I
remember changing in the server was the “non-blocking sockets” together with
the overlapped IO, but I disabled that and it still works…so I am
investigating.


mil

Alexander Nickolov

unread,
Aug 24, 2004, 6:12:39 PM8/24/04
to
Ah, hadn't realized the bit about the completion port...
Makes sense now.

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: agnic...@mvps.org
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

"mil" <m...@discussions.microsoft.com> wrote in message

news:A73A2096-4B03-4AC3...@microsoft.com...




Swami

unread,
Aug 1, 2005, 1:15:42 PM8/1/05
to
Is there a difference between receiving data and receiving a file at a
socket? I implemented a server that is expecting to receive some files from
a client (that I did not implement), but when I view the files, they appear
as a bunch a unreadable characters. I'm wondering if it is because I am
trying to read a data stream (using the TcpListener class) when actually the
client is sending me files. Is there a difference?

Swami.

"mil" wrote:

> I have a client and a server application communicating through TCP/IP. When I
> connect to the server using 2 sockets (one for writing and one for reading) I
> get a 200ms delay switching from one socket to the other. Sending data to the
> server takes 200ms and receiving takes ~0ms
>

> When I am using a single socket to communicate with the server I get ~0ms
> delay for the communication both ways.
>

> I have tried everything I could think of to make this delay go away. I am
> using multiple threads in the client master thread to send, slave to receive.
>
> In the server I am using Completion I/O with multiple ports (different one
> for recv and another for send). I even tried switching off the Nagle
> algorithm.
>

> My conclusion so far is that Windows XP has a predefined delay of ~200ms
> when one process uses 2 sockets and switches from one to another. If that is
> true, is there any parameter I can change to avoid/reduce this delay?
>

> mil
>

Arkady Frenkel

unread,
Aug 1, 2005, 3:38:00 PM8/1/05
to
You can see what socket send installing some sniffer ( netmon , ethereal
... ) and sending file with
TransmitFile()
Arkady

"Swami" <Sw...@discussions.microsoft.com> wrote in message
news:6E240129-C07A-4AD4...@microsoft.com...
