Linux networking buffers

Surinder

Mar 30, 2013, 11:59:30 AM

Hi,

On the receive side:
A) RX count as seen and set by ethtool
B) sysctl -w net.core.netdev_max_backlog=3000
C) sysctl -w net.core.rmem_max=256000 along with setsockopt(SO_RCVBUF)

On the transmit side:
D) TX count as seen and set by ethtool
E) txqueuelen as seen and set by ifconfig
F) sysctl -w net.core.wmem_max and setsockopt(SO_SNDBUF)

My guess is that:
A,D are stored in the NIC EEPROM and size NIC memory,
or do they belong to the NIC driver (host non-pageable memory)?
B,E are kept in the kernel, outside any device driver.
C,F are kept in the kernel networking stack (not part of the NIC driver).

A,D changes apply to that particular NIC.
B,E are system-wide queue sizes.
C,F allow per-socket buffer size tuning.
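
For item C, here is a minimal sketch of how a program might request a receive buffer and then ask the kernel what it actually granted. The helper name `request_rcvbuf` is hypothetical and not taken from the original ping program; the calls are the standard setsockopt()/getsockopt(). On Linux the value read back is normally twice the (capped) request, which would be consistent with a reported SO_RCVBUF that is exactly double rmem_max.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask for a receive buffer of `bytes` and return the size the kernel
 * reports afterwards, or -1 on error. The request is silently capped
 * at net.core.rmem_max, so the result may be smaller than hoped for.
 */
int request_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof granted;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Comparing the return value with the request is a cheap way to notice that rmem_max, not the application, decided the final size.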

I wrote a ping program in C on Linux that sends ICMP echo requests to
1000 hosts. I got around 170 drops in a single iteration, i.e. without retry.
These are my buffer values:
RX(200), backlog(1000), net.core.rmem_max = 131071, SO_RCVBUF: 262142

A packet sniffer saw all 1000 reply packets.
Packet size is about 100 bytes on the wire and 84 bytes including the IP
header, so 262142 bytes should easily hold 2600 packets without a drop.
Changing RX made no difference.
On increasing rmem_max, though, all 1000 replies are received.
There was no other traffic to the collector (as verified by a packet
capture tool).

TIA
- Surinder

Rick Jones

Apr 5, 2013, 5:39:45 PM

Surinder <Surinde...@example.net> wrote:
> These are my buffer values:
> RX(200), backlog(1000), net.core.rmem_max = 131071, SO_RCVBUF: 262142

> A packet sniffer saw all 1000 reply packets.
> Packet size is about 100 bytes on the wire and 84 bytes including the IP
> header, so 262142 bytes should easily hold 2600 packets without a drop.

Except that Linux socket buffer limits are applied not just to the
packet headers/data but also to the size of the buffer(s) used to hold
the data. So, if the NIC driver happens to post 2048-byte buffers to
the card, those 262142 bytes would be able to hold no more than 127
packets. So, my guess would be that your receiver wasn't keeping up
with all the replies coming in and lost the race a few times.
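
The arithmetic behind that 127 can be sketched as follows. This is only an illustration: `max_queued_packets` is a hypothetical helper, and the 2048-byte charge is the example buffer size from the paragraph above, not a measured value for any particular driver.

```c
#include <stddef.h>

/*
 * Upper bound on queued packets when every packet is charged a fixed
 * number of bytes against the socket's receive limit, regardless of
 * how little data it actually carries.
 */
size_t max_queued_packets(size_t rcvbuf_limit, size_t charge_per_packet)
{
    if (charge_per_packet == 0)
        return 0;
    return rcvbuf_limit / charge_per_packet;
}
```

With these assumptions, `max_queued_packets(262142, 2048)` gives 127, while charging only the 100 bytes seen on the wire would give 2621.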

> Changing RX made no difference.

Suggesting that the "queue" which overflows is not the NIC's RX
queue(s).

> On increasing rmem_max, though, all 1000 replies are received.
> There was no other traffic to the collector (as verified by a packet
> capture tool).

That would be consistent then with the increased rmem_max size being
sufficient to cover those cases where your receiver wasn't able to
keep up for a short length of time.

If your application is simply sending all 1000 requests and *then*
looking to receive/process replies, you should alter the logic of your
application to be able to also look for replies as you are sending
requests.
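
One way to follow that suggestion without threads is to empty the receive queue after every send. The sketch below is an assumption about how that could look, not the original program: `drain_replies` and `send_and_drain` are hypothetical names, and a real ping tool would parse each reply where the comment indicates.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Read every datagram already queued on fd without blocking.
 * Returns the number of datagrams consumed, or -1 on a real error.
 */
int drain_replies(int fd)
{
    char buf[2048];
    int count = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        if (n >= 0) {
            count++;            /* a real program would parse the reply here */
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return count;       /* queue is empty for now */
        return -1;
    }
}

/*
 * Send one request per destination, draining the receive queue after
 * each send so replies do not pile up behind the whole batch.
 * Returns the number of replies seen so far, or -1 on error.
 */
int send_and_drain(int fd, const void *req, size_t req_len,
                   const struct sockaddr_in *dests, size_t ndests)
{
    int replies = 0;

    for (size_t i = 0; i < ndests; i++) {
        if (sendto(fd, req, req_len, 0,
                   (const struct sockaddr *)&dests[i], sizeof dests[i]) < 0)
            return -1;
        int got = drain_replies(fd);
        if (got < 0)
            return -1;
        replies += got;
    }
    return replies;
}
```

After the last send, the caller would still need to wait (for example with poll() and a timeout) for replies that have not arrived yet.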

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Tauno Voipio

Apr 6, 2013, 12:37:26 PM

I'm not sure if it is wise to continue handholding the OP. The task
at hand smells a bit too much of the first stage of a spam-o-bot.

Luckily it seems that he's not too much up to the task.

--

Tauno Voipio

J G Miller

Apr 6, 2013, 3:36:41 PM

On Saturday, April 6th, 2013, at 19:37:26h +0300, Tauno Voipio observed:

> Luckily it seems that he's not too much up to the task.

It does seem a little strange that Thunderbird is up to version 17.0.4
but the original posting in this thread was made with a six-year-old
version from 2006:

User-Agent: Thunderbird 1.5.0.9 (X11/20061206)

So one does wonder which version of the kernel is in use?
2.6.20 or earlier?

Rick Jones

Apr 8, 2013, 4:23:33 PM

Oh, back when my primary workstation was an HP9000 J5600 running
HP-UX 11.11, I used the same version of browser/email client for years at a
time. I wasn't being force-upgraded to a new version every couple of
months.

rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?

Surinder

Apr 9, 2013, 12:50:10 AM

to Rick Jones

Rick Jones wrote:
> Surinder <Surinde...@example.net> wrote:
>> These are my buffer values:
>> RX(200), backlog(1000), net.core.rmem_max = 131071, SO_RCVBUF: 262142
>
>> A packet sniffer saw all 1000 reply packets.
>> Packet size is about 100 bytes on the wire and 84 bytes including the IP
>> header, so 262142 bytes should easily hold 2600 packets without a drop.
>
> Except that Linux socket buffer limits are applied not just to the
> packet headers/data but also to the size of the buffer(s) used to hold
> the data. So, if the NIC driver happens to post 2048-byte buffers to
> the card, those 262142 bytes would be able to hold no more than 127
> packets. So, my guess would be that your receiver wasn't keeping up
> with all the replies coming in and lost the race a few times.

Rick,
Thanks for the reply.
The SO_RCVBUF of 262142 probably comes into the picture after the kernel
has received the packet and is putting it onto the socket-specific queue. I
understand you to be suggesting that the total packet capacity is calculated
from some constant size per packet. But that would waste memory when
packets are small. For example, if the constant were the 65K maximum for
TCP or UDP, a buffer of 262142 could hold only 4 packets of 65K each.

>
>> Changing RX made no difference.
>
> Suggesting that the "queue" which overflows is not the NIC's RX
> queue(s).

Agreed.

>
>> On increasing rmem_max, though, all 1000 replies are received.
>> There was no other traffic to the collector (as verified by a packet
>> capture tool).
>
> That would be consistent then with the increased rmem_max size being
> sufficient to cover those cases where your receiver wasn't able to
> keep up for a short length of time.

I got my program working properly with a bigger buffer size; I just
want to optimise the buffer size. A huge rmem_max could mean more
non-pageable memory reserved for the kernel itself. Not that I am running
low on RAM, but I want to be conservative.

>
> If your application is simply sending all 1000 requests and *then*
> looking to receive/process replies, you should alter the logic of your
> application to be able to also look for replies as you are sending
> requests.
>
> rick jones

I could get the required reliability and performance with
all-sends-then-all-recvs, which keeps the code simpler and more readable.
If I hit the wall, I would run a sender thread and a receiver thread
in parallel, which would ease the load on the queues even though it
would add the complexity of synchronization.

I also wanted to understand the queue sizing mechanism between the Linux
socket sender and receiver.

This is my Linux info, though I intended to ask about Linux in general.
Linux khyber 2.6.13-15.18-default #1 Tue Oct 2 17:36:20 UTC 2007 i686
i686 i386 GNU/Linux

Thanks Again.
- Surinder


Surinder

Apr 9, 2013, 1:03:16 AM

Miller,

This is my Linux info, though I intended to ask about Linux in general.

2.6.13-15.18-default #1 Tue Oct 2 17:36:20 UTC 2007 i686 i386 GNU/Linux

Surely a lot has changed in the Linux kernel since then.

How would the Thunderbird make or version relate to my question? :)

- Surinder

Surinder

Apr 9, 2013, 1:24:12 AM

Tauno,

A tool has no intent other than doing its job.
It's the person who has intents.
And it is a different thing to guess at the intents of others,
and to guess at their capacity to follow through on them,
and to do so on a forum that is meant for understanding how tools are
made and how they work,
and to act as a wet blanket.

I hate spam more than I hate mosquitoes.
Spam has (despite all the anti-spam tools) caused a lot of irreversible
damage to the Internet, the worst affected being NNTP newsgroups.
Sometimes a valid email ends up in the spam box.

The application concerned is a ping check for a set of devices, just for
reachability.

- Surinder

J G Miller

Apr 9, 2013, 9:10:56 AM

On Tuesday, April 9th, 2013, at 10:33:16h +0530, Surinder wrote:

> This is my Linux info, though I intended to ask about Linux in general.
>
> 2.6.13-15.18-default #1 Tue Oct 2 17:36:20 UTC 2007 i686 i386 GNU/Linux
>
> Surely a lot has changed in the Linux kernel since then.

Your question was about networking and there were very important
changes in Linux kernel networking features in going from 2.4 to 2.6,
so it is important to know which version of the kernel you are using.

In going from 2.6.13, which is about 6 years old, to today's kernel
version of 3.{4,5} (depending on distribution) or the latest 3.7, there
have been some important changes as well.

Rick Jones

Apr 9, 2013, 1:19:16 PM

Socket buffers in general, and things like net.core.[rw]mem_max
specifically are not preallocations. They are limits. So, an
SO_RCVBUF of 256KB is not consuming 256 KB of memory unless there are
actually data/packets waiting therein.

For the inbound data path, the allocations are actually made by the
NIC driver when it allocates buffers to post to the NIC for inbound
DMA. The strategies employed for buffer sizes there will vary from
NIC to NIC, and will depend on the NIC's programming model.

rick jones
--
Process shall set you free from the need for rational thought.

Surinder

Apr 11, 2013, 12:45:23 PM

Rick Jones wrote:
> Socket buffers in general, and things like net.core.[rw]mem_max
> specifically are not preallocations. They are limits. So, an
> SO_RCVBUF of 256KB is not consuming 256 KB of memory unless there are
> actually data/packets waiting therein.
>
> For the inbound data path, the allocations are actually made by the
> NIC driver when it allocates buffers to post to the NIC for inbound
> DMA. The strategies employed for buffer sizes there will vary from
> NIC to NIC, and will depend on the NIC's programming model.
>
> rick jones

Good to hear that.

- Surinder

Surinder

Apr 13, 2013, 5:37:21 AM

Rick Jones wrote:
> Socket buffers in general, and things like net.core.[rw]mem_max
> specifically are not preallocations. They are limits. So, an
> SO_RCVBUF of 256KB is not consuming 256 KB of memory unless there are
> actually data/packets waiting therein.
>
> For the inbound data path, the allocations are actually made by the
> NIC driver when it allocates buffers to post to the NIC for inbound
> DMA. The strategies employed for buffer sizes there will vary from
> NIC to NIC, and will depend on the NIC's programming model.
>
> rick jones

The following code answers a few of my questions
(in line with what Rick said):
http://lxr.linux.no/#linux+v3.8.7/net/core/sock.c#L689

set_rcvbuf:
        sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
        /*
         * We double it on the way in to account for
         * "struct sk_buff" etc. overhead. Applications
         * assume that the SO_RCVBUF setting they make will
         * allow that much actual data to be received on that
         * socket.
         *
         * Applications are unaware that "struct sk_buff" and
         * other overheads allocate from the receive buffer
         * during socket buffer allocation.
         *
         * And after considering the possible alternatives,
         * returning the value we actually used in getsockopt
         * is the most desirable behavior.
         */
        sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
        break;
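
To make the effect of that doubling concrete, here is a small userspace model of the logic. It is a sketch and not kernel code: `effective_rcvbuf` is a hypothetical name, the cap at rmem_max is applied in the same kernel function just before the quoted excerpt, and `min_rcvbuf` stands in for SOCK_MIN_RCVBUF, whose value differs between kernel versions.

```c
#include <stdint.h>

/*
 * Model of the SO_RCVBUF path: cap the request at rmem_max, double it
 * to leave room for bookkeeping overhead, and enforce a minimum.
 */
uint32_t effective_rcvbuf(uint32_t requested, uint32_t rmem_max,
                          uint32_t min_rcvbuf)
{
    uint32_t val = requested > rmem_max ? rmem_max : requested;
    uint32_t doubled = val * 2;

    return doubled > min_rcvbuf ? doubled : min_rcvbuf;
}
```

With rmem_max = 131071, any request at or above that cap yields 262142, which matches the SO_RCVBUF value reported earlier in the thread.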

As my packets carry less than 100 bytes of data, the sk_buff overhead is
proportionally quite high. Looking over struct sk_buff for a 32-bit
system, its size appears to be around 200 bytes.
For a packet of 100 bytes on the wire, that means only one-third of rcvbuf
holds packet data; the other two-thirds is sk_buff overhead.

So for SO_RCVBUF: 262142,
262142/3 = 87380 bytes are available for packet data.
87380/100 = 873 is the number of packets that can be held in it.

1000 - 873 = 127 drops out of 1000 are expected.
What I got was 170, which is not far from the expectation.
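
The estimate above can be written out as a few helper functions. All names here are hypothetical, and the model assumes the receiver reads nothing until every reply has arrived and that each packet is charged its payload plus a fixed overhead; the real per-packet charge depends on the driver and kernel version.

```c
#include <stddef.h>

/* Packets that fit if each is charged payload plus a fixed overhead. */
size_t packets_held(size_t rcvbuf, size_t payload, size_t overhead)
{
    size_t charge = payload + overhead;

    return charge ? rcvbuf / charge : 0;
}

/* Drops expected when `sent` replies arrive before anything is read. */
size_t expected_drops(size_t sent, size_t rcvbuf,
                      size_t payload, size_t overhead)
{
    size_t held = packets_held(rcvbuf, payload, overhead);

    return sent > held ? sent - held : 0;
}

/* Per-packet charge implied by an observed drop count. */
size_t implied_charge(size_t rcvbuf, size_t sent, size_t dropped)
{
    return sent > dropped ? rcvbuf / (sent - dropped) : 0;
}

/* SO_RCVBUF value to request, given that the kernel doubles it. */
size_t rcvbuf_request(size_t packets, size_t charge)
{
    return (packets * charge + 1) / 2;
}
```

Under these assumptions, `expected_drops(1000, 262142, 100, 200)` reproduces the 127 above, and `implied_charge(262142, 1000, 170)` suggests the observed 170 drops correspond to roughly 315 bytes charged per packet. To hold all 1000 replies at 300 bytes each, `rcvbuf_request(1000, 300)` gives 150000, which rmem_max would have to allow.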

The following books/chapters were really helpful:
[1] Understanding Linux Network Internals by Christian Benvenuti,
Part III: Transmission and Reception

[2] Essential Linux Device Drivers by Sreekrishnan Venkateswaran,
Chapter 15: Network Interface Cards

Thanks to all the folks here.

- Surinder