[lwip-users] Problem with lwip 1.3.2

Magnus S

Sep 12, 2011, 5:25:20 AM
to lwip-...@nongnu.org

Hello,
I have experienced some problems when using lwip 1.3.2.

We have been using our embedded system successfully for a while, but we have now performed load tests/stress tests,
and we have discovered that in certain circumstances the system misbehaves, or hangs completely.

Our system connects to a server periodically, and exchanges some data.
In the tests, a 1 minute period has been used, so the system has been connecting to the server once a minute.

We have investigated how well the device handles unrelated ethernet traffic, so we have connected the device to an
old hub instead of a switch, which means that the PHY/MAC receives all the data present on the ethernet.

In the test two Linux machines were used, connected to the same hub as our device.
One of the Linux PCs acted as a web client, and the other as a web server.
A 6 MB file was repeatedly downloaded to the client, with a 1 second delay between each download.
This means that the ethernet will be completely saturated for a few seconds, and then almost idle for one second.

With this setup, the device will connect once a minute and this normally succeeds for a few minutes or so.
But after a while (normally less than 10 minutes), the system starts to misbehave, and hangs completely.

If a switch is used instead of a hub, the problem is not seen; the system works as expected.

We have investigated the cause of the problem, and have found that the application hangs on the following line in the lwip source code:
tcpip.c, function tcpip_apimsg():
    sys_arch_sem_wait(apimsg->msg.conn->op_completed, 0);

This call will never return, and the task will hang forever.

It seems that something should signal the semaphore "op_completed", but this never happens.
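
As background, a timeout of 0 passed to sys_arch_sem_wait() means "wait forever", so the caller is released only when the tcpip thread has handled the request and signalled the semaphore. Below is a minimal single-threaded model of that handshake. All names prefixed with model_ are hypothetical stand-ins, not the lwip 1.3.2 definitions, and the model only illustrates that a stalled tcpip thread leaves the caller blocked.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration; these are NOT the lwIP 1.3.2 definitions. */
typedef struct { int count; } model_sem_t;

typedef struct {
    model_sem_t *op_completed;  /* signalled when the request has been handled */
    int          done;          /* set by the worker, for illustration only    */
} model_apimsg_t;

#define MODEL_MBOX_SIZE 4

typedef struct {
    model_apimsg_t *slots[MODEL_MBOX_SIZE];
    size_t head, tail, used;
} model_mbox_t;

/* Caller side: queue a request for the tcpip thread.
 * Returns 0 on success, -1 if the mailbox is full. */
int model_post(model_mbox_t *mbox, model_apimsg_t *msg)
{
    assert(msg->op_completed != NULL);
    if (mbox->used == MODEL_MBOX_SIZE) {
        return -1;
    }
    mbox->slots[mbox->tail] = msg;
    mbox->tail = (mbox->tail + 1) % MODEL_MBOX_SIZE;
    mbox->used++;
    return 0;
}

/* tcpip-thread side: handle one queued request and signal its semaphore.
 * Returns 1 if a request was handled, 0 if the mailbox was empty. */
int model_tcpip_step(model_mbox_t *mbox)
{
    model_apimsg_t *msg;

    if (mbox->used == 0) {
        return 0;
    }
    msg = mbox->slots[mbox->head];
    mbox->head = (mbox->head + 1) % MODEL_MBOX_SIZE;
    mbox->used--;
    msg->done = 1;
    msg->op_completed->count++;
    return 1;
}

/* Nonzero while a wait-forever on the semaphore would still block. */
int model_caller_blocked(const model_sem_t *sem)
{
    return sem->count == 0;
}
```

In this model the caller stays blocked for as long as model_tcpip_step() is not run, which corresponds to the tcpip thread itself being stuck or never receiving the message.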

What could be the cause of this problem?

We believe that the problem happens when the ethernet is saturated, and lwip fails to send an ethernet packet.

The error handling routines should handle this gracefully, but it seems that this case is not handled properly.

Do you have any suggestions for how this should be solved? Has anyone else experienced something similar?

 

Regards

/Magnus

 

FreeRTOS Info

Sep 12, 2011, 5:43:17 AM
to Mailing list for lwIP users
Into the land of guessing and speculation here, but here goes:

1) Your network driver is failing under load. Are you checking the
error bits in the MAC status register, clearing them, and recovering
from errors? If the network interrupts stop, then there are no packets
being received, and lwIP will do nothing because there is nothing for it
to do.

2) You have your MAC in promiscuous mode. Otherwise, it would make no
difference if you were behind a hub or a switch. Turning promiscuous
mode off will mask, although not cure or fix, the issue.
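
The check in point 1 could be sketched roughly as follows. The status bits, the struct and every model_ name are hypothetical stand-ins, not the UC3A MACB register layout; the real bits, and the way they are cleared, have to come from the datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits, for illustration only (not the real MACB layout). */
#define MODEL_RX_DONE      (1u << 0)  /* a frame was received               */
#define MODEL_RX_OVERRUN   (1u << 1)  /* receiver could not store a frame   */
#define MODEL_RX_NO_BUFFER (1u << 2)  /* no free receive descriptor         */
#define MODEL_RX_ERRORS    (MODEL_RX_OVERRUN | MODEL_RX_NO_BUFFER)

typedef struct {
    volatile uint32_t status;  /* stand-in for a write-1-to-clear status register */
    uint32_t rx_resets;        /* how often the receiver had to be restarted      */
    uint32_t rx_frames;        /* frames handed on to the stack                   */
} model_mac_t;

/* Simulated write-1-to-clear; on hardware this would be a register write. */
static void model_clear_status(model_mac_t *mac, uint32_t bits)
{
    mac->status &= ~bits;
}

/* Hypothetical recovery hook; a real driver would reset its descriptors
 * and re-enable reception here. */
static void model_restart_receiver(model_mac_t *mac)
{
    mac->rx_resets++;
}

/* Interrupt handler body. Returns nonzero if a frame is ready for the stack. */
int model_mac_isr(model_mac_t *mac)
{
    uint32_t status;
    int frame_ready = 0;

    assert(mac != NULL);
    status = mac->status;

    if (status & MODEL_RX_ERRORS) {
        /* Recover instead of ignoring the error, so reception continues. */
        model_restart_receiver(mac);
    }
    if (status & MODEL_RX_DONE) {
        mac->rx_frames++;
        frame_ready = 1;
    }
    /* Acknowledge everything that was seen, so the next event can raise
     * a new interrupt. */
    model_clear_status(mac, status);
    return frame_ready;
}
```

The point of the sketch is only that error bits are both acted on and cleared; if they are left set, the interrupt may never fire again and the stack sees no more packets.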


Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for Microcontrollers.
More than 7000 downloads per month.

_______________________________________________
lwip-users mailing list
lwip-...@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users

Magnus S

Sep 12, 2011, 6:25:07 AM
to Mailing list for lwIP users
Hi Richard,
we are using the Atmel drivers as distributed with the EVK1100 development board for the UC3A0512 MCU.
There could be a problem in the network driver, but then everyone using the Atmel driver could experience similar problems.
However, I will investigate the error handling in the driver.
 
The MAC is not in promiscuous mode, the CAF bit (Copy All Frames) in the MAC layer for the UC3A0512 is not set.
We have also tried ping-flooding the device, sending as many ping requests to it as the ethernet can handle.
In this test we see similar behaviour: the device hangs.
 
Regards
/Magnus
 
 


Simon Goldschmidt

Sep 12, 2011, 7:10:37 AM
to Mailing list for lwIP users
Magnus S <magnu...@gmail.com> wrote:
> we are using the Atmel drivers as distributed with the EVK1100 development
> board for the UC3A0512 MCU.

Which version are you using? The one I just downloaded seems to include only lwIP 1.2.0 (which is about 5 years old now and has serious bugs). Also, from a quick look at the sources, the driver included there is probably not very stable either (as it calls into ARP from the wrong thread).

> There could be a problem in the network driver, but then everyone using the
> Atmel driver could experience similar problems.

A quick Google search turned up reports of such problems (plus a possible fix), so it might well be a problem in the Atmel port (or whatever port you are using, since you are using 1.3.2, not 1.2.0).

Simon

Magnus S

Sep 12, 2011, 9:50:09 AM
to Mailing list for lwIP users
Hi,
Atmel has several example applications, and they include different versions of lwip.
 
I use Atmel AVR32 Studio version 2.6.0, and this IDE includes a number of example applications from the
"File->New->AVR example project" menu.
For example, there is a sample called "Control panel demo" that includes lwip 1.3.0.
 
Also, there are example applications called "LWIP 1.3.2 example" and "LWIP 1.3.2 with DHCP" that include lwip 1.3.2.
 
In our case, we started with an lwip 1.3.0 sample. When 1.3.2 was released, we upgraded manually, instead of using the Atmel 1.3.2 example.
 
Simon, do you have more information about the Atmel driver problem?
Can you send a link to the possible fix? I did not find the right information when I searched.
 
Regards
/Magnus

 

Simon Goldschmidt

Sep 13, 2011, 2:22:02 AM
to Mailing list for lwIP users
Magnus S <magnu...@gmail.com> wrote:
> I use Atmel AVR32 Studio version 2.6.0

OK, got it.

> In our case, we started with an lwip 1.3.0 sample. When 1.3.2 was released,
> we upgraded manually, instead of using the Atmel 1.3.2 example.

Well, there seems to be a problem in the 1.3.0 driver (C:\Program Files\Atmel\AVR Tools\AVR32 Studio\plugins\com.atmel.avr32.sf.uc3_1.7.0.201009140900\framework\1.7.0-AT32UC3\SERVICES\LWIP\lwip-port-1.3.0\AT32UC3A\netif\ethernetif.c) that has been fixed in the 1.3.2 driver:

ethernetif_input() should call "netif->input(p, netif)" (which correctly obeys lwIP threading requirements) instead of calling "ethernet_input(p, netif)" directly (which violates threading requirements when called in multithreading configurations).

Also, you should pass "tcpip_input" as last parameter to "netif_add()" when using multithreading (NO_SYS defined to 0) and "ethernet_input" if NO_SYS is defined to 1.
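
A minimal sketch of the difference, using stand-in types so that it compiles on its own. Everything prefixed with model_ is hypothetical; in lwIP the callback stored in netif->input is whatever was passed as the last argument to netif_add().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration; these are NOT the lwIP definitions. */
struct model_pbuf { int len; };
struct model_netif;
typedef int (*model_input_fn)(struct model_pbuf *p, struct model_netif *netif);

struct model_netif {
    model_input_fn input;     /* chosen once, when the interface is added */
    int queued_for_tcpip;     /* packets handed over to the tcpip thread  */
    int processed_in_driver;  /* packets processed in the caller's thread */
};

/* Stand-in for tcpip_input(): only queues the packet for the tcpip thread. */
int model_tcpip_input(struct model_pbuf *p, struct model_netif *netif)
{
    (void)p;
    netif->queued_for_tcpip++;
    return 0;
}

/* Stand-in for ethernet_input(): processes the packet in the caller's context. */
int model_ethernet_input(struct model_pbuf *p, struct model_netif *netif)
{
    (void)p;
    netif->processed_in_driver++;
    return 0;
}

/* Stand-in for netif_add(): remembers which input function to use. */
void model_netif_add(struct model_netif *netif, model_input_fn input)
{
    netif->input = input;
    netif->queued_for_tcpip = 0;
    netif->processed_in_driver = 0;
}

/* Driver receive path: always goes through the registered callback and
 * never hard-codes one of the input functions. */
int model_ethernetif_input(struct model_netif *netif, struct model_pbuf *p)
{
    assert(netif->input != NULL);
    if (p == NULL) {
        return -1;
    }
    return netif->input(p, netif);
}
```

With this structure the driver code is the same for both configurations; only the function registered when the interface is added decides whether packets are queued for the tcpip thread or processed directly.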

Hope that helps.

> Can you send a link to the possible fix? I did not find the right
> information when I searched.

No, there was no link on the page I found.

Simon

Magnus S

Sep 26, 2011, 7:52:18 AM
to Mailing list for lwIP users
Hello,
we have now tried with the lwip 1.3.2 sample, but with the same result as before: the problem remains.
 
We have used the development board EVK1100 together with the example application "LWIP 1.3.2 with DHCP example".
The example application has been modified to act as a client instead of as a server.
 
The problem is that when a certain kind of network load is present in the system, we are able to make the system hang,
and the only way of getting the system to work again is to perform a reset.
 
Below follows a detailed instruction regarding how to reproduce the problem, including the source code.
 
In order to reproduce the problem, the EVK1100 board and two PCs have been connected to each other with an
ethernet network hub. Note that a switch should not be used; in that case the problem does not appear.
In our case we used an "Etherprime EP-1008m" ethernet hub. Another example is "D-Link DE-824TP".
 
The hub should also be connected to a DHCP server, we used an ethernet router for this ("D-link DIR-100").
A document that describes the setup can be found here:
https://sites.google.com/site/ethernetissue/
(See PDF-file in the first link on the page)
 
One of the PCs should be set up as a web server. In our case, we used Ubuntu Linux and the Apache web server.
The other PC should act as a web client, and should repeatedly download a large file.
In our case a 5 MB file was used. We configured the PC with Ubuntu Linux, and used the "wget" tool to download the file.
The following bash command was given:
 
    while [ "true" ]; do wget http://192.168.24.101/mylargefile.JPG; sleep 1; done
 
This command downloads a large file from the web server, which will take a few seconds.
Then it sleeps for one second, and after that the actions are repeated.
 
The EVK1100 board should run the test software. The source code for the application can be found at the web page:
https://sites.google.com/site/ethernetissue/
(See the second link on the page, use "File -> Download original" to download the original zip-file)

This test application is a slightly modified version of the LWIP 1.3.2 example shipped with AVR studio 2.6.0.
 
The test application will first request an IP address from the DHCP server.
When it has obtained an IP address, the application will repeatedly connect to a web server
and download the file in the root folder, with a 10 second delay
between the TCP requests.
 
In our case we used the built-in web server in the router as the server for the pages requested by the EVK1100 application.
However, it should be possible to use the Ubuntu web server as well for this.
 
In order to configure the IP address for the web server, change following line in main.c:
    char servername[] = "192.168.24.101";
 
The EVK1100 application prints its actions on the display,
so it is possible to follow what happens. The LEDs also indicate the progress.
 
When the EVK1100 test application is running without the Ubuntu Linux web client turned on,
the application will run forever, just as it should, without any problems.
 
However, when the Ubuntu Linux web client is started with the repeated wget calls,
the EVK1100 application will hang after a while. Note that the problem is intermittent,
so it can take a while to trigger, but we are usually able to reproduce it within about 10 minutes.
 
When the hang occurs the application stops, and no more progress is indicated on the display.
 
Note also that the device does not recover when the network load is turned off.
Once the application has hung, removing the power is the only way to recover.
 
I hope someone can give some advice on how to proceed from this point.
 
/Magnus


 

Kieran Mansley

Sep 26, 2011, 10:05:48 AM
to Mailing list for lwIP users
On Mon, 2011-09-26 at 13:52 +0200, Magnus S wrote:
> In order to reproduce the problem, the EVK1100 board and two PCs have been
> connected to each other with an ethernet network hub. Note that a switch
> should not be used; in that case the problem does not appear.

I suggest having a quick search through the mailing list archives for
problems involving a network hub as this sort of thing has cropped up
before. I'm not sure if it will apply to you (the earlier problems might
have been with a different port for example) or what the solution was,
but there's a good chance it could help.

Kieran

gold...@gmx.de

Sep 26, 2011, 2:32:12 PM
to Mailing list for lwIP users
From having a quick look at the lwIP port (and application) you provided, I can't see anything wrong. However, I don't have an EVK1100 board to test it. Also, our product is full-duplex only, so no hubs allowed, which means I cannot test it with our product.

What you describe looks like some kind of resource blocker, so you are probably best off *not* disabling asserts and trying to get the assert/error strings written to your LCD. You could also try to create a task that periodically dumps the lwip_stats somewhere...
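
The body of such a dump task might look roughly like this. The struct is a cut-down, hypothetical stand-in for the lwIP statistics (three counters, all names prefixed with model_); with LWIP_STATS and LWIP_STATS_DISPLAY enabled, the stack's own stats_display() serves the same purpose.

```c
#include <assert.h>
#include <stdio.h>

/* Cut-down stand-in for the lwIP statistics; NOT the real struct layout. */
struct model_stats {
    unsigned link_drop;   /* frames dropped at link level */
    unsigned pbuf_err;    /* failed pbuf allocations      */
    unsigned tcp_memerr;  /* TCP out-of-memory events     */
};

/* Returns 1 if any counter differs between two snapshots, else 0. */
int model_stats_changed(const struct model_stats *prev,
                        const struct model_stats *now)
{
    return prev->link_drop != now->link_drop ||
           prev->pbuf_err != now->pbuf_err ||
           prev->tcp_memerr != now->tcp_memerr;
}

/* Formats one status line that fits on a small display.
 * Returns the value of snprintf(), i.e. the length the full line needs. */
int model_format_stats(const struct model_stats *s, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "link.drop=%u pbuf.err=%u tcp.memerr=%u",
                    s->link_drop, s->pbuf_err, s->tcp_memerr);
}
```

A low-priority task could call model_stats_changed() every few seconds and write the formatted line to the LCD only when something moved; a counter that keeps growing just before the hang would point at the exhausted resource.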

Anyway, this doesn't look like a problem in lwIP core, but rather in your port or something :-(

Simon