
SO_SNDBUF/SO_RCVBUF


Max Dubinsky
Jan 4, 2002, 4:31:19 AM
Hello All,
I have an application using overlapped I/O.
To increase performance I played a bit with the SO_SNDBUF and SO_RCVBUF options
and found out that:
- When they are set to zero (as recommended in Q214397 for "applications that
do bulk data transfer"), WSARecv returns no more than 3K. Without setting these
options it was 8K per call.
Of course this is not the way to transfer a bulk amount of data.
I used the following code:

int zero = 0;
dwSysErr = setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char *) &zero, sizeof(zero));
dwSysErr = setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (char *) &zero, sizeof(zero));

Maybe I missed something? What could be wrong?
Any help is appreciated.

Ed Astle
Jan 4, 2002, 6:47:21 AM
"Max Dubinsky" <Ma...@itos.eu.org> wrote in message
news:O92KdKQlBHA.2084@tkmsftngp04...

Normally data received by the TCP stack is held and aggregated until you
call recv(), which *copies* the data from the internal TCP buffers to your
user-supplied buffer.

By setting the send/receive buffers to zero and supplying your own buffers with
the I/O structure you prevent a memcpy for all the data received. When the
TCP stack receives data it is put directly into your buffers, not its own
internal buffers. This saves CPU time.
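As a minimal sketch of what that looks like (the names userBuf and
PostZeroCopyRecv are illustrative, and it assumes a connected TCP socket
created with WSA_FLAG_OVERLAPPED; link with ws2_32.lib, error handling
omitted):

#include <winsock2.h>

// The buffer and OVERLAPPED are used asynchronously, so they must outlive
// the call - hence static storage in this sketch.
static char          userBuf[1460 * 4];
static WSAOVERLAPPED ov;
static WSABUF        wsaBuf;

void PostZeroCopyRecv(SOCKET sock)
{
    // Tell Winsock not to buffer internally; completed receives then land
    // straight in userBuf instead of being copied out of stack buffers.
    int zero = 0;
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (char *)&zero, sizeof(zero));
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char *)&zero, sizeof(zero));

    wsaBuf.buf = userBuf;
    wsaBuf.len = sizeof(userBuf);

    DWORD bytes = 0, flags = 0;
    // Completion is picked up later via a completion port or ov.hEvent
    // (not shown here).
    WSARecv(sock, &wsaBuf, 1, &bytes, &flags, &ov, NULL);
}

The key point is that userBuf must stay valid until the overlapped receive
completes, because the stack writes into it directly.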

I would recommend you make your buffers an exact multiple of the network
packet size. I know TCP is stream-based and you shouldn't need to know
about packet boundaries, but I found it makes a huge difference. Here's a
comment straight from my code:

/* Although TCP is stream-oriented, I set the socket receive size to be an
   exact multiple of the TCP payload of an Ethernet frame (1460 bytes) so that
   the TCP stack doesn't have to keep data behind if, for example, we were to
   read only 1000 bytes - it would leave 460 bytes in the stack, which would
   require another receive to get it. Receiving 1460 bytes (or a multiple
   thereof) ensures minimum fragmentation of the stream. */
#define MAXIMUM_SOCKET_RECEIVE_SIZE_DEFAULT (1460 * 4)

Make the value above (1460*4) configurable and try different values when
receiving huge streams of data and you'll see the difference. The 1460
isn't actually the frame size, it's the amount of TCP data that can be held
in a network frame. I think you could get this programmatically by querying
for the max UDP data size.

As an example, let's say you read 500K (500*1024) of data and your buffer
size is 1460. The most efficient approach (i.e. the least number of receives)
would be 512000/1460 (for Ethernet) = 350.685, i.e. 350 full buffers of 1460
and 1 at the end of 1000 bytes.

If your buffer size was 1460*4 then the least number of receives could be 87
full buffers and 1 at the end of 3920 bytes. Fewer receives are good because
they mean less API overhead.

If your app is running reasonably efficiently then you shouldn't get a
backlog of data to read. If you keep reading 3K then it sounds like your app
can read data faster than it arrives. That's good. It also keeps memory
overhead down - say you had 1GB of data to read - you wouldn't want that all
in one buffer, would you?

Try to go one step further with your overlapped I/O - use two overlapped I/O
structures. Kick off the first one. When the buffer is filled with data,
immediately kick off the second one. Data can be placed asynchronously into
this second buffer while you are processing the first buffer. When you come
round to getting data from the second buffer, more often than not data is
already there waiting for you. Just keep alternating these two buffers.
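A rough sketch of that alternation, assuming the socket has been associated
with an I/O completion port and using an illustrative ProcessData() consumer
(none of these names come from real code; error handling trimmed):

#include <winsock2.h>
#include <windows.h>

const int BUF_SIZE = 1460 * 4;

struct RecvContext {
    WSAOVERLAPPED ov;
    WSABUF        wsaBuf;
    char          data[BUF_SIZE];
};

void ProcessData(const char *data, DWORD len);   // hypothetical consumer

void PostRecv(SOCKET sock, RecvContext *ctx)
{
    ZeroMemory(&ctx->ov, sizeof(ctx->ov));
    ctx->wsaBuf.buf = ctx->data;
    ctx->wsaBuf.len = BUF_SIZE;
    DWORD bytes = 0, flags = 0;
    WSARecv(sock, &ctx->wsaBuf, 1, &bytes, &flags, &ctx->ov, NULL);
}

void ReceiveLoop(SOCKET sock, HANDLE iocp)
{
    RecvContext bufs[2];
    PostRecv(sock, &bufs[0]);   // kick off the first receive
    PostRecv(sock, &bufs[1]);   // second buffer fills while we process the first

    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        LPOVERLAPPED pov = NULL;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &pov, INFINITE) ||
            bytes == 0)
            break;                               // error or connection closed

        // Receives on one socket complete in the order they were posted,
        // so this naturally alternates between the two buffers.
        RecvContext *ctx = CONTAINING_RECORD(pov, RecvContext, ov);
        ProcessData(ctx->data, bytes);           // consume this buffer
        PostRecv(sock, ctx);                     // then re-post it immediately
    }
}

If you ever want more than two outstanding receives, the same re-post pattern
applies - just make the array bigger.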

One more thing - if you are resending the received data out on the network
(as proxy/router code would), don't try to coalesce data before sending it -
sending multiples of the data size (1460 for Ethernet) also keeps the number
of sent packets on the wire to a minimum (exactly the same logic as for
receiving). This means a client app gets full packets for the entire
download (i.e. 1460 bytes in each one, rather than say 1000 bytes in each one)
and requires fewer receives to finish the download.

Hope this helps,
Ed.


Max Dubinsky
Jan 4, 2002, 7:49:35 AM
Thank you for the answer.
I think that setting SO_SNDBUF to zero really made sending slower - nothing
else was changed.
Your note about the buffer size is also reasonable - I will try that.
Also I think it would be good to use multiple outstanding sends (the same
method as you described for receiving).

"Ed Astle" <ed_dot...@tfeurope.com> wrote in message
news:3c359673$1...@primark.com...

Herb Stokes
Jan 4, 2002, 11:51:34 AM
> "Ed Astle" wrote:
> I think you could get this programmatically by querying for the max udp
data size.

I too have read Q214397 and considered setting buffers to 0 using
SO_SNDBUF/SO_RCVBUF. However, I am not performing bulk transfers - just
small, frequent datagrams (UDP only). Would I see an increase in
performance by doing this?

I use async I/O on completion ports. The size of the buffers I currently use
for recv() is no larger than SO_MAX_MSG_SIZE, which I query for after
binding.

Ed, when you said "I think you could get this programmatically by querying
for the max udp data size", were you speaking of SO_MAX_MSG_SIZE?

Herb


Ed Astle
Jan 4, 2002, 1:07:20 PM
"Herb Stokes" <n...@home.com> wrote in message
news:3c35d...@corp.newsgroups.com...

> > "Ed Astle" wrote:
> > I think you could get this programmatically by querying for the max udp
> data size.
>
> I too have read Q214397 and considered setting buffers to 0 using
> SO_SNDBUF/SO_RCVBUF. However, I am not performing bulk transfers - just
> small, frequent datagrams (UDP only). Would I see an increase in
> performance by doing this?
>
> I use async i/o on completion ports. The size of the buffers I currenly
use
> for recv() are no larger than SO_MAX_MSG_SIZE, which I query for after
> binding.
>
> Ed, when you said "I think you could get this programmatically by querying
> for the max udp data size", were you speaking of SO_MAX_MSG_SIZE?
>
> Herb
>

That's the fella!

I haven't done any UDP stuff for years (and that was IPX through sockets)
but, IIRC, having a buffer size larger than SO_MAX_MSG_SIZE made no sense, as
even if many UDP packets are backed up ready for reading, each recv will only
pull out one UDP packet at a time - Winsock will not aggregate them into one
big packet.
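For what it's worth, a minimal sketch of that query (SO_MAX_MSG_SIZE is a
get-only option on message-oriented sockets and reports the largest datagram
the socket can send; error handling omitted):

#include <winsock2.h>
#include <stdio.h>

unsigned int QueryMaxMsgSize(SOCKET udpSock)
{
    unsigned int maxMsgSize = 0;
    int optLen = sizeof(maxMsgSize);

    // Valid for datagram (e.g. UDP) sockets; not meaningful for TCP.
    getsockopt(udpSock, SOL_SOCKET, SO_MAX_MSG_SIZE,
               (char *)&maxMsgSize, &optLen);

    printf("Largest datagram this socket can send: %u bytes\n", maxMsgSize);
    return maxMsgSize;
}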

Ed.

Michal Zygmuntowicz
Jan 4, 2002, 6:57:01 PM
"Ed Astle" <ed_dot...@tfeurope.com> wrote in message
news:3c35ef85$1...@primark.com...

> but, IIRC, having a buffer size larger than SO_MAX_MSG_SIZE made no sense, as
> even if many UDP packets are backed up ready for reading, each recv will only
> pull out one UDP packet at a time - Winsock will not aggregate them into one
> big packet.
But if you have no buffer, incoming packets are silently discarded, I suppose,
and the retransmission rate (if retransmissions are needed) or the packet loss
increases. So a buffer is rather necessary, but it is your choice whether it
is the Winsock buffer or your application's buffer.

---
Michal Zygmuntowicz


arkadyf
Jan 6, 2002, 8:00:20 AM
You are correct, but why do you say 1460 (it really is the size of the data)?
The MTU is 1500; you forgot about the 20 bytes for the TCP header and 20 for
IP, so 1460 + 20 + 20 = 1500.
Arkady

Ed Astle <ed_dot...@tfeurope.com> wrote in message

news:3c359673$1...@primark.com...

arkadyf
Jan 6, 2002, 8:15:44 AM
1) 8K is the default Winsock buffer.
2) It is strongly recommended not to set that parameter to 0.
One exception: an application streaming data using overlapped I/O should set
the send buffer to zero, as stated for that TCP/IP issue.
You can see why it is not recommended in MSDN (MSDN Magazine, formerly the
Microsoft Systems Journal, October issue: "Windows Sockets 2.0: Write Scalable
Winsock Apps Using Completion Ports").
HTH
Arkady


Max Dubinsky <Ma...@itos.eu.org> wrote in message
news:O92KdKQlBHA.2084@tkmsftngp04...

Vadim Eydelman[MS]
Jan 6, 2002, 10:47:49 PM
Setting the socket receive (SO_RCVBUF) or send (SO_SNDBUF) buffer to an exact
multiple of the network packet payload size is unnecessary.

Winsock does not pre-allocate memory for either the receive or the send buffer
in advance; it buffers only when data arrives from the network or is posted by
the application. If the amount of buffered data is in excess of the SO_RCVBUF
or SO_SNDBUF setting, Winsock will go above the set value in order to avoid a
partial receive or send. Only the next receive or send (i.e. one issued after
the buffer setting has been reached or exceeded) is refused buffering.

Note, however, that making the size of the send request itself a multiple of
the network packet payload size does make sense.
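As a small illustration of that last point (the 1460 figure is an assumption
about Ethernet rather than something queried from the stack; a blocking send
is shown for brevity, and the function name is illustrative):

#include <winsock2.h>

const DWORD SEND_CHUNK = 1460 * 4;   // a multiple of the per-frame TCP payload

void SendAll(SOCKET sock, char *data, DWORD total)
{
    DWORD offset = 0;
    while (offset < total) {
        DWORD len = total - offset;
        if (len > SEND_CHUNK)
            len = SEND_CHUNK;        // full chunks until the final remainder

        WSABUF wsaBuf;
        wsaBuf.buf = data + offset;
        wsaBuf.len = len;

        DWORD sent = 0;
        // An overlapped WSASend would pass an OVERLAPPED structure instead
        // of the two NULLs here.
        if (WSASend(sock, &wsaBuf, 1, &sent, 0, NULL, NULL) != 0)
            break;                   // send failed
        offset += sent;
    }
}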

--
This posting is provided "AS IS" with no warranties, and confers no rights.


"Ed Astle" <ed_dot...@tfeurope.com> wrote in message
news:3c359673$1...@primark.com...

Ed Astle
Jan 7, 2002, 7:17:58 AM
"Vadim Eydelman[MS]" <vad...@online.microsoft.com> wrote in message
news:OrewU4ylBHA.604@tkmsftngp02...

> Setting socket receive (SO_RCVBUF) or send (SO_SNDBUF) buffer to exact
> multiple of the network packet payload size is unnecessary.
>

A slight misunderstanding, Vadim,

I was setting SO_RCVBUF and SO_SNDBUF to *zero*. I was setting my
user-supplied buffers (for the overlapped I/O structure) to multiples of the
network payload size.

That way every incoming packet can be put directly into my buffers without
any fragmentation.

Regards,
Ed.

arkadyf
Jan 8, 2002, 1:28:46 AM
IMHO, Vadim was talking about the variant where SO_RCVBUF is not set to 0,
because I don't know why Max, after reading Q214397, decided to do that.
All the MS documents say not to do it (except for one case: a media stream
with overlapped operations).

But if we are talking about SO_RCVBUF not set to zero, and Vadim said that it
is not necessary to arrange the buffers as a multiple of the MTU, then that is
correct from the point of view of the MS TCP stack, but from the user's point
of view you are absolutely correct. Why? Because such allocations reduce
interprocess/ring switching when time is critical. If my buffer is not big
enough, I need to do many recv() calls to take the data, but each recv() will
switch from my process (ring 3) to the system (the protocol driver in ring 0)
and back.
So you are absolutely correct in both cases (SO_RCVBUF = 0 and != 0).
Let's wait and see what Vadim says.
Arkady


Ed Astle <ed_dot...@tfeurope.com> wrote in message

news:3c399227$1...@primark.com...
