Hello,
I have experienced some problems when using lwIP 1.3.2.
We have been using our embedded system successfully for a while, but we have now performed load tests/stress tests,
and we have discovered that in certain circumstances the system misbehaves, or hangs completely.
Our system connects to a server periodically, and exchanges some data.
In the tests, a 1 minute period has been used, so the system has been connecting to the server once a minute.
We have investigated how well the device handles unrelated ethernet traffic, so we have connected the device to an
old hub instead of a switch, which means that the PHY/MAC receives all the data present on the ethernet.
In the test two Linux machines were used, connected to the same hub as our device.
One of the Linux PCs acted as a web client, and the other as a web server.
A 6 MB file was repeatedly downloaded to the client, with a 1 second delay between each download.
This means that the Ethernet is completely saturated for a few seconds, and then almost idle for one second.
With this setup, the device will connect once a minute and this normally succeeds for a few minutes or so.
But after a while (normally less than 10 minutes), the system starts to misbehave, and hangs completely.
If a switch is used instead of a hub, the problem is not seen and the system works as expected.
We have investigated the cause of the problem, and have found that the application hangs on the following line in the lwIP source code:
tcpip.c, function tcpip_apimsg():
sys_arch_sem_wait(apimsg->msg.conn->op_completed, 0);
This call will never return, and the task will hang forever.
It seems that something should signal the semaphore "op_completed", but this never happens.
What could be the cause of this problem?
We believe that the problem happens when the ethernet is saturated, and lwip fails to send an ethernet packet.
The error handling routines should handle this gracefully, but it seems that this case is not handled properly.
Do you have any suggestions for how this should be solved? Has anyone else experienced something similar?
Regards
/Magnus
1) Your network driver is failing under load. Are you checking the
error bits in the MAC status register, clearing them, and recovering
from errors? If the network interrupts stop, then there are no packets
being received, and lwIP will do nothing because there is nothing for it
to do.
2) You have your MAC in promiscuous mode. Otherwise, it would make no
difference if you were behind a hub or a switch. Turning promiscuous
mode off will mask, although not cure or fix, the issue.
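As a rough illustration of point 1, a driver can check for receive errors, clear them, and restart the receiver each time it services the MAC. The sketch below is not taken from any Atmel driver. The register layout, the bit names (MAC_RX_OVERRUN, MAC_RX_NO_BUFFER) and the function mac_check_and_recover() are all hypothetical, and the MAC is modelled as a plain struct so that the logic can be run on a PC. On real hardware the bits, and the way they are cleared, come from the MAC datasheet.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical status bits; real names and positions come from the MAC datasheet. */
#define MAC_RX_OVERRUN    (1u << 0)  /* receiver overran and dropped a frame */
#define MAC_RX_NO_BUFFER  (1u << 1)  /* no free receive buffer was available */
#define MAC_RX_ERROR_MASK (MAC_RX_OVERRUN | MAC_RX_NO_BUFFER)

/* Hypothetical register view. On hardware this would be a volatile pointer
   to the peripheral; here it is a plain struct so the sketch can run. */
struct mac_regs {
    uint32_t rx_status;   /* sticky error bits */
    bool     rx_enabled;  /* receiver enable flag */
};

/* Counters that make failures under load visible during stress tests. */
struct mac_stats {
    uint32_t overruns;
    uint32_t no_buffer;
    uint32_t restarts;
};

/* Call from the RX interrupt handler or the driver task.
   Returns true if an error was found and the receiver was restarted. */
bool mac_check_and_recover(struct mac_regs *mac, struct mac_stats *stats)
{
    uint32_t errors = mac->rx_status & MAC_RX_ERROR_MASK;

    if (errors == 0) {
        return false;  /* nothing to recover from */
    }
    if (errors & MAC_RX_OVERRUN) {
        stats->overruns++;
    }
    if (errors & MAC_RX_NO_BUFFER) {
        stats->no_buffer++;
    }

    /* Clear only the bits handled above. Many real MACs clear sticky
       bits by writing 1 to them instead; check the datasheet. */
    mac->rx_status &= ~errors;

    /* Restart the receiver. A real driver would also reset its receive
       buffer descriptors between these two steps. */
    mac->rx_enabled = false;
    mac->rx_enabled = true;
    stats->restarts++;

    return true;
}
```

If the counters in mac_stats keep rising during the hub test while packets stop arriving, that would point at the driver rather than at lwIP.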
Regards,
Richard.
> _______________________________________________
> lwip-users mailing list
> lwip-...@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/lwip-users
Which version are you using? The one I just downloaded seems to include only lwIP 1.2.0 (which is about 5 years old now and has serious bugs). Also, from a quick look at the sources, the driver included there is unlikely to be very stable either (it calls into ARP from the wrong thread).
> There could be problem in the network driver, but then everyone using the
> Atmel driver could experience similar problems.
A quick Google search turned up reports of such problems (plus a possible fix), so it might well be a problem in the Atmel port (or whatever port you are using, since you are using 1.3.2, not 1.2.0).
Simon
OK, got it.
> In our case, we started with an lwip 1.3.0 sample. When 1.3.2 was released,
> we upgraded manually, instead of using the Atmel 1.3.2 example.
Well, there seems to be a problem in the 1.3.0 driver (C:\Program Files\Atmel\AVR Tools\AVR32 Studio\plugins\com.atmel.avr32.sf.uc3_1.7.0.201009140900\framework\1.7.0-AT32UC3\SERVICES\LWIP\lwip-port-1.3.0\AT32UC3A\netif\ethernetif.c) that has been fixed in the 1.3.2 driver:
ethernetif_input() should call "netif->input(p, netif)" (which correctly obeys lwIP threading requirements) instead of calling "ethernet_input(p, netif)" directly (which violates the threading requirements in multithreaded configurations).
Also, you should pass "tcpip_input" as the last parameter to "netif_add()" when using multithreading (NO_SYS defined to 0) and "ethernet_input" if NO_SYS is defined to 1.
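To show why the choice of input function matters, here is a small self-contained model of the two receive paths. It is not lwIP code. Every demo_ name is a hypothetical stand-in (demo_ethernet_input for ethernet_input, demo_tcpip_input for tcpip_input, demo_tcpip_thread_poll for the stack thread), and the mailbox is a plain array rather than a real RTOS queue. The point of the sketch is only that the driver always calls the function registered on the netif, and that in the multithreaded case this function merely queues the packet so that all protocol processing happens in the stack thread.

```c
#include <stddef.h>

struct demo_pbuf { int id; };
struct demo_netif;
typedef int (*demo_input_fn)(struct demo_pbuf *p, struct demo_netif *netif);

/* Model of a netif: it only stores the input function chosen at setup time. */
struct demo_netif { demo_input_fn input; };

#define DEMO_MBOX_SIZE 4
struct demo_pbuf *demo_mbox[DEMO_MBOX_SIZE];  /* model of the stack thread's mailbox */
size_t demo_mbox_count = 0;
int demo_processed = 0;                       /* packets handled by the "stack" */

/* Stand-in for ethernet_input(): processes the packet in the caller's context.
   Safe only when the caller is the one context that runs the stack. */
int demo_ethernet_input(struct demo_pbuf *p, struct demo_netif *netif)
{
    (void)p;
    (void)netif;
    demo_processed++;
    return 0;
}

/* Stand-in for tcpip_input(): does no protocol work, it only queues the packet. */
int demo_tcpip_input(struct demo_pbuf *p, struct demo_netif *netif)
{
    (void)netif;
    if (demo_mbox_count == DEMO_MBOX_SIZE) {
        return -1;  /* mailbox full: the caller must drop the packet */
    }
    demo_mbox[demo_mbox_count++] = p;
    return 0;
}

/* Stand-in for one pass of the stack thread: drains the mailbox and
   processes every queued packet in the stack thread's own context. */
void demo_tcpip_thread_poll(void)
{
    for (size_t i = 0; i < demo_mbox_count; i++) {
        demo_ethernet_input(demo_mbox[i], NULL);
    }
    demo_mbox_count = 0;
}

/* Driver receive path: always goes through the function stored on the netif,
   never straight to the protocol code. */
int demo_driver_rx(struct demo_netif *netif, struct demo_pbuf *p)
{
    return netif->input(p, netif);
}
```

With demo_tcpip_input registered, demo_driver_rx() returns without touching any stack state, which is the property the driver fix above relies on. With demo_ethernet_input registered, the packet is processed immediately in the driver's context, which corresponds to the NO_SYS defined to 1 case.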
Hope that helps.
> Can you send a link to the possible fix? I did not find the right
> information when i searched.
No, there was no link on the page I found.
Simon
I suggest having a quick search through the mailing list archives for
problems involving a network hub as this sort of thing has cropped up
before. I'm not sure if it will apply to you (the earlier problems might
have been with a different port for example) or what the solution was,
but there's a good chance it could help.
Kieran