[lwip-users] pbuf pool size / mss size in low memory environment and routing to slow link


fepgmbh

Nov 19, 2009, 4:15:35 PM
to lwip-...@nongnu.org

Good evening all,

I'm trying to find the best memory settings for my lwIP setup. I've been
reading around this evening, but I still have some questions left.

I use lwIP with FreeRTOS on an Atmel AT91SAM7X512, so with 128 KB my RAM
is relatively limited. In addition, I have to route packets from a fast
connection (Ethernet) to a slow connection (a radio link running PPP at
3.6 kbit/s).

In my first setup I had a window size of 2500 bytes with an MSS of 1200
bytes and 6 pool pbufs of 1500 bytes each. With only a few 1500-byte
pbufs, a whole buffer gets blocked by a small 100-byte packet.

First of all, 6 pbufs doesn't seem like very much, and as I understood
they can be chained together, I changed this to 36 pbufs of 256 bytes
each, noticing that now the TCP connections won't work with the 1200-byte
MSS.

The second thing is that the PPP link over the radio can lose packets.
With a 1200-byte MSS, a damaged packet means around 1200 bytes that must
be resent.

As the pbuf size apparently may not be smaller than the MSS (is that
right?), I also changed the MSS to 256 bytes, especially to keep the
retransmits over the radio link from getting too big.

But there are still some questions left for me ...

- Why can't the pool pbuf size be less than the MSS, if the documentation
says pbufs can be chained together?

- What is the ideal pbuf size relative to the MSS? Should it be the same,
or must the pbuf be MSS+40 bytes (to hold the segment plus the headers)?

- Will I still be able to send/receive e.g. a 1000-byte UDP datagram with
my 256-byte pbufs?

When my web server sends e.g. a 4 KB page over the radio link, where will
all the TCP segments be stored until they have been acknowledged? As this
is dynamically generated data (sent with the "copy" flag), this should
happen in the pool pbufs, right?

And finally, will a heap size of 8192 bytes for lwIP be OK in that case?
I really can't afford to waste memory and need the best compromise
between memory usage and performance.

Sorry for all these questions, but after some hours of reading I still
haven't found really satisfying answers to them ... maybe one of you
experts can give me a hint or two :-)

Thank you for your time, and a nice evening (for those who are also in
the European timezone ;)

Marco
--
View this message in context: http://old.nabble.com/pbuf-pool-size---mss-size-in-low-memory-environment-and-routing-to-slow-link-tp26421483p26421483.html
Sent from the lwip-users mailing list archive at Nabble.com.



_______________________________________________
lwip-users mailing list
lwip-...@nongnu.org
http://lists.nongnu.org/mailman/listinfo/lwip-users

Mike Kleshov

Nov 20, 2009, 1:52:58 AM
to Mailing list for lwIP users
> - Why can't the pool pbuf size be less than the MSS, if the
> documentation says pbufs can be chained together?

Here is a part of my lwipopts.h:

#define PBUF_POOL_SIZE 48
#define PBUF_POOL_BUFSIZE 96
...
#define TCP_MSS 512

And it works.
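For a slow link, the TCP window and send-buffer options interact with these pool values. The fragment below extends the idea; all numbers are illustrative assumptions for a low-memory, slow-link setup, not tested recommendations:

```c
/* Illustrative lwipopts.h fragment -- values are assumptions for a
 * low-memory, slow-link setup, not tested recommendations. */
#define PBUF_POOL_SIZE      48               /* number of pool pbufs */
#define PBUF_POOL_BUFSIZE   256              /* bytes per pool pbuf */

#define TCP_MSS             256              /* small MSS keeps retransmits cheap */
#define TCP_WND             (4 * TCP_MSS)    /* advertised receive window */
#define TCP_SND_BUF         (4 * TCP_MSS)    /* unacked send data allowed */
#define TCP_SND_QUEUELEN    (4 * TCP_SND_BUF / TCP_MSS)

#define MEM_SIZE            8192             /* the 8 KB heap from the question */
```

Keeping TCP_WND and TCP_SND_BUF as small multiples of TCP_MSS avoids advertising a window that the pbuf pool cannot actually back.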

> - Will I still be able to send/receive e.g. a 1000-byte UDP datagram
> with my 256-byte pbufs?

There is UDP in my setup. It works too. I definitely have packets
spanning multiple pbufs.

> - What is the ideal pbuf size relative to the MSS? Should it be the
> same, or must the pbuf be MSS+40 bytes (to hold the segment plus the
> headers)?

It's a trade-off between memory efficiency and performance. With small
pbufs you don't waste much memory on small packets, but you lose some
speed due to packet splitting. The performance loss depends heavily on
the application and the optimizations used. There is also the memory
overhead of the pbuf header to consider.
You probably want your pbufs to be at least large enough to hold an ARP
packet. There should be plenty of those on a LAN.

> When my web server sends e.g. a 4 KB page over the radio link, where
> will all the TCP segments be stored until they have been acknowledged?
> As this is dynamically generated data (sent with the "copy" flag), this
> should happen in the pool pbufs, right?

In lwIP, memory is allocated either from a pbuf pool or from the heap.
If I remember correctly, in a standard setup the memory for incoming
packets is allocated by the network interface driver from the pbuf pool;
all other allocations are done from the heap. It is possible to configure
lwIP to use pbuf pools everywhere and not use the heap at all.
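lwIP's opt.h exposes that choice. The option names below exist in lwIP 1.3.x; this is only a sketch of the two directions, not a complete configuration:

```c
/* Direction 1 (default): a byte heap of MEM_SIZE bytes backs mem_malloc(). */
#define MEM_SIZE              8192

/* Direction 2 (alternative): replace the heap allocator with fixed-size
 * pools. This requires providing a lwippools.h that lists the pool sizes. */
#define MEM_USE_POOLS         1
#define MEMP_USE_CUSTOM_POOLS 1
```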

> And finally, will a heap size of 8192 bytes for lwIP be OK in that
> case? I really can't afford to waste memory and need the best
> compromise between memory usage and performance.

8 KB would probably be OK for fast links. But with a slow link, you'll
want more unacked data to be 'in flight' for reasonable performance.
Besides, you shouldn't assume that heap size is the most important tuning
parameter. Yes, performance will suffer if there is not enough heap, but
heap can also be wasted in the wrong places, so it is important to tune
the other parameters to match your workload.

Kieran Mansley

Nov 20, 2009, 4:17:38 AM
to Mailing list for lwIP users
On Thu, 2009-11-19 at 13:15 -0800, fepgmbh wrote:
> First of all, 6 pbufs doesn't seem like very much, and as I understood
> they can be chained together, I changed this to 36 pbufs of 256 bytes
> each, noticing that now the TCP connections won't work with the
> 1200-byte MSS.

I wonder if your driver can handle that case correctly? It has to know
how to split the received packet across pbufs. A naive driver might
ignore pbuf chains (on send as well as receive).

Kieran

Chris Strahm

Nov 23, 2009, 2:01:31 AM
to Mailing list for lwIP users
> > First of all, 6 pbufs doesn't seem like very much, and as I
> > understood they can be chained together, I changed this to 36 pbufs
> > of 256 bytes each, noticing that now the TCP connections won't work
> > with the 1200-byte MSS.

> I wonder if your driver can handle that case correctly? It has to know
> how to split the received packet across pbufs. A naive driver might
> ignore pbuf chains (on send as well as receive).
> Kieran

Correct, I wrote my own driver for the ARM7 LPC23XX/24XX. I wrote the
driver so that it could handle PBUFs of different size than the EMAC DMA
buffers: equal, bigger, or smaller. Unless your driver is specifically
written to handle all of this data reorganization between the PBUFs and the
DMA bufs, it's best to assume they need to be the exact same size.

I was curious how much difference in speed there would be with various
buffer sizes, equal to the DMA buffers or not. I tested with a 4 MB
stream of TCP data; the data rate is currently around 2.2 MB/sec. I
tested buffer sizes from 128 bytes up to 1536 bytes, but in each case the
total RAM was the same, about 12 KB.

There wasn't much difference in speed, maybe +/- 10%. I expected a larger
penalty when the buffer sizes were not equal, and/or when they were
smaller, but it really didn't matter significantly. There was also little
difference between a large number of small buffers (24 x 256) and a small
number of large ones (4 x 1536). I could see no significant speed penalty
from chaining buffers either; the chaining appears very efficient.

With all that in mind, the most flexible combination is to use a larger
number of smaller buffers. 256 bytes seems about optimal and provides more
resources for higher frequency small packet traffic. Several of the other
TCP stacks I have used employed dual small/big buffers as a solution to this
problem. However, I have to give lwIP credit here, the PBUF chaining
approach gives all this small/big buffer size flexibility with virtually no
speed penalty.

My experience also suggests that lwIP [RAW] is about the fastest, smallest,
and most capable TCP stack solution for embedded systems with minimal RAM
resources. I am using this with FreeRTOS and my profiling shows only about
75% CPU utilization during the saturated TX/RX transfer. I think even more
speed is still possible with further optimizations and tuning, which is what
I am working on now. memcpy and chksum routines are very important, as well
as many other somewhat obscure areas. I am curious to see if I can
increase the speed further.

Chris.

Marco Jakobs

Nov 27, 2009, 9:05:21 AM
to Mailing list for lwIP users
Hi Kieran,

the 36 pbufs of 256 bytes each with a 1200-byte MSS did not work ;-)

Is there a recommended relation between pbuf size and MSS? I did not find
anything about this in the documentation ...

Marco



Kieran Mansley wrote:

Marco Jakobs

Dec 1, 2009, 7:28:54 AM
to Chris Strahm, Mailing list for lwIP users
Hi Chris,

Chris Strahm wrote:
> Correct, I wrote my own driver for the ARM7 LPC23XX/24XX. I wrote the
> driver so that it could handle PBUFs of different size than the EMAC
> DMA buffers: equal, bigger, or smaller. Unless your driver is
> specifically written to handle all of this data reorganization between
> the PBUFs and the DMA bufs, it's best to assume they need to be the
> exact same size.

In other words: for the standard Atmel EMAC driver in the FreeRTOS port
for the SAM7, which defines the buffer size as 128 bytes (with a big "do
not change"), I should have PBUF_POOL_BUFSIZE in lwIP defined to 128?


/* Number of receive buffers */
#define NB_RX_BUFFERS           ( 16 )

/* Size of each receive buffer - DO NOT CHANGE. */
#define ETH_RX_BUFFER_SIZE      ( 128 )

/* Number of Transmit buffers */
#define NB_TX_BUFFERS           ( 16 )

/* Size of each Transmit buffer. */
#define ETH_TX_BUFFER_SIZE      ( 128  )


Kind regards
Marco

Chris Strahm

Dec 1, 2009, 11:53:51 AM
to Mailing list for lwIP users
>> /* Number of receive buffers */
>> #define NB_RX_BUFFERS           ( 16 )
>> /* Size of each receive buffer - DO NOT CHANGE. */
>> #define ETH_RX_BUFFER_SIZE      ( 128 )
>> In other words: for the standard Atmel EMAC driver in the FreeRTOS
>> port for the SAM7, which defines the buffer size as 128 bytes (with a
>> big "do not change"), I should have PBUF_POOL_BUFSIZE in lwIP defined
>> to 128?
 
Yes, I would think so. But if you have the memory, it might be a good
idea to increase the number of buffers from 16 to much higher, say 32,
48, or 64. 16 x 128 is only 2048 bytes total, which is not a lot of
buffer space, either for the EMAC or for lwIP. But I do not know how the
memory is laid out or set up for that part.
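On the lwIP side, matching those 128-byte EMAC buffers could then look like the fragment below; the pool size of 64 is just one plausible value for a 128 KB part and needs checking against the rest of the application's RAM budget:

```c
/* lwipopts.h fragment matching the SAM7 EMAC driver's 128-byte DMA
 * buffers (illustrative values -- verify against your RAM budget). */
#define PBUF_POOL_BUFSIZE   128   /* same as ETH_RX_BUFFER_SIZE */
#define PBUF_POOL_SIZE      64    /* 64 * 128 = 8 KB of pool pbufs */
```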

Chris.


Marco Jakobs

Dec 2, 2009, 7:16:39 AM
to Chris Strahm, Mailing list for lwIP users
With the size of 128, I'm actually using 64 pbufs. Although my RAM is
nothing to waste, I think this number is necessary.

I'm just wondering how the PPP code in lwIP handles the pbufs; maybe
someone can drop a line about this ...

Marco




Chris Strahm wrote:

Simon Goldschmidt

Dec 2, 2009, 10:22:44 AM
to Mailing list for lwIP users
> I'm just wondering how the PPP of LWIP is handling the pbuf's, maybe
> someone can drop a line about this ...

I don't know that by heart, but I think I remember that the PPP code (as
it is ported from sources that don't know pbufs) needs the input packets
in contiguous memory and copies from PBUF_POOL to PBUF_RAM if the input
packet is a linked list of pbufs (i.e. p->len != p->tot_len).

You might therefore be faster setting the size of the pool pbufs high
enough ...

Simon

David Empson

Dec 2, 2009, 4:28:51 PM
to Mailing list for lwIP users

"Simon Goldschmidt" <gold...@gmx.de> wrote:
>"Marco Jakobs" m...@piciorgros.com wrote:
>> I'm just wondering how the PPP of LWIP is handling the pbuf's, maybe
>> someone can drop a line about this ...
>
> I don't know that by heart but I think I remember the PPP code (as it is
> ported from sources that don't know pbufs) needs the input packets in
> contiguous memory and copies from PBUF_POOL to PBUF_RAM if the input
> packet is a linked list of pbufs (i.e. p->len != p->tot_len).

I can't see any evidence of that.

The code appears to be correctly dealing with pbufs that are smaller than
the MTU.

On the transmit side, the IP packet supplied by lwIP is processed by
pppifOutput(). It steps through each pbuf in the supplied chain (relying
on pb->next == NULL to detect the end of the packet). The IP packet is
copied into a new chain of pbufs (of type PBUF_RAW), adding the PPP
framing and control-character escaping. The expanded PPP frame could be
up to twice as big as the MTU in the worst case. The transmit code then
steps through the pbufs holding the PPP frame and writes each of them in
turn.

The receive side reads one byte at a time, and a state machine allocates
new pbufs as required to hold the data, allocating a new pbuf each time
it has accumulated PBUF_POOL_BUFSIZE bytes. pbuf_cat() is used to attach
each pbuf to the existing chain.

gold...@gmx.de

Dec 3, 2009, 12:29:41 AM
to Mailing list for lwIP users
David Empson wrote:
> "Simon Goldschmidt" <gold...@gmx.de> wrote:
>
>> "Marco Jakobs" m...@piciorgros.com wrote:
>>
>>> I'm just wondering how the PPP of LWIP is handling the pbuf's, maybe
>>> someone can drop a line about this ...
>>>
>> I don't know that by heart but I think I remember the PPP code (as it is
>> ported from sources that don't know pbufs) needs the input packets in
>> contiguous memory and copies from PBUF_POOL to PBUF_RAM if the input
>> packet is a linked list of pbufs (i.e. p->len != p->tot_len).
>>
>
> I can't see any evidence of that.
>
The function pppSingleBuf() does that. However, I'm not really sure how
often that function is used. At least with PPPoE (which I'm currently
trying to get to work :), it seems to be called on every RX pbuf.

Simon

David Empson

Dec 3, 2009, 1:48:04 AM
to Mailing list for lwIP users

----- Original Message -----
From: <gold...@gmx.de>
To: "Mailing list for lwIP users" <lwip-...@nongnu.org>
Sent: Thursday, December 03, 2009 6:29 PM
Subject: Re: [lwip-users] pbuf pool size / mss size in low memory
environment and routing to slow link


> David Empson schrieb:
>> "Simon Goldschmidt" <gold...@gmx.de> wrote:
>>
>>> "Marco Jakobs" m...@piciorgros.com wrote:
>>>
>>>> I'm just wondering how the PPP of LWIP is handling the pbuf's, maybe
>>>> someone can drop a line about this ...
>>>>
>>> I don't know that by heart but I think I remember the PPP code (as it is
>>> ported from sources that don't know pbufs) needs the input packets in
>>> contiguous memory and copies from PBUF_POOL to PBUF_RAM if the input
>>> packet is a linked list of pbufs (i.e. p->len != p->tot_len).
>>>
>>
>> I can't see any evidence of that.
>>
> The function pppSingleBuf() does that. However, I'm not really sure how
> often that function is used. At least with pppoe (which I'M currently
> trying to get to work :), it seems to be called on every RX pbuf, though.

Ah, my mistake. I was looking at our port, which has been restructured and
doesn't do that. (Complete rewrite of the receive side.)

Looking at the standard PPP code in lwIP 1.3.2-rc1: pppSingleBuf() is
also in the PPP-over-serial code. It is used for received frames which
get passed on to internal protocol handlers in PPP (such as LCP and
IPCP) that expect a simple pointer and length, so they require a
contiguous block of data.

It is not used for TCP or IP frames; those are passed on to the standard
lwIP code as a pbuf chain.

I'm not familiar enough with PPPoE to analyse its usage there.

pppSingleBuf() is not used for transmit at all (in either PPPoS or PPPoE).

In summary, this still doesn't affect my earlier comments: the PPP code
will work fine with a pbuf size less than the MTU.

The only code that may have problems with a pbuf size less than the MTU
is applications or Ethernet drivers that aren't prepared to deal with
chains of pbufs.

Simon Goldschmidt

Dec 3, 2009, 5:34:53 AM
to Mailing list for lwIP users

> In summary, this still doesn't affect my earlier comments: the PPP
> code will work fine with a pbuf size less than the MTU.
>
> The only code that may have problems with a pbuf size less than the
> MTU is applications or Ethernet drivers that aren't prepared to deal
> with chains of pbufs.

I didn't want to question that. I merely wanted to point out that it
might be more efficient with PPP to have pbufs that can hold a full
packet, as otherwise some packets are duplicated (meaning double RAM
usage). Of course that doesn't account for wasted space, and the pbufs
might only have to be as big as the largest internally used messages ...

Anyway, you're right that there is no problem for PPP with pbuf chains.

Simon