hi all,
I would be really grateful if any one of you can throw a bit of time at this issue, because I think it could manifest in several boards and not just the nucleo h743.
I could be loosing time on something obvious that you will identify faster than me, because I've not looked at NuttX for a long time.
net init is like this:
nx_start -> up_initialize -> arm_netinitialize -> stm32_ethinitialize -> netdev_register/dev/console is initialized in: arm_serialinit,
which is called in up_initialize right before arm_netinitialize
And this is executed JUST BEFORE drivers_initialize
I am also seeing that all debug calls all end up in syslog()
Which
itself is by default a driver that will lead to
file_open("/dev/console"), am I right?
So any debug message emitted in arm_netinitialize/stm32_ethinitialize is executed while the driver subsystem is not initialized yet.
What does syslog do in that case? Is there an early syslog mechanism?
And this is a recent change, still visible in the current github display for nx_start.c:
https://github.com/apache/incubator-nuttx/commit/a594a5d7a8dcd756884ca7d3449f37ba77f96af3
I cant find a pull request for this change. Nor any discussion. And the commit message makes no sense, there are no clear explanations.
This is an important change that can have lots of consequences with drivers and chip init, specifically in the case where I am stuck.
I believe that nx_open in group_setupidlefiles expects to get fd 0 (because it's the first opened file?) but it does not, because syslog or something else has opened a file before and this breaks an assumption.
I might be seeing different issues. I'll probably put an early breakpoint in fopen to see which values are returned.
I have a workaround by removing the DEBUGASSERT, but I fear an importnat issue hides somewhere else, and I want to find the root cause of this instead of hiding a bug under the carpet.
Sebastien
Hi, Most probably we need to trace when network init is done. Usually in the early stages the "# define showprogress(c) arm_lowputc(c)" and "arm_lowputs" is used to trace the init process, so maybe you can instrument the code to find out what is going on. Best regards, Petro чт, 19 трав. 2022 р. о 22:15 Nathan Hartman <hartman...@gmail.com> пише:On Thu, May 19, 2022 at 12:40 PM Sebastien Lorquet <seba...@lorquet.fr> wrote:Hi, I have a nucleo-h743zi, in which I have enabled network. It seems to work so far. When I enable Network Info Debug, I get a crash. The crash does not happen with Network info disabled (Enabling network warnings and errors is ok) I am writing on the mailing list to get feedback and discussion. Here is the crash, pretty obvious cause: A▒DE stm32_ethinitialize: intf: 0 stm32_ifdown: Taking the network down netdev_register: Registered MAC: 2e:7e:27:a2:69:88 as dev: eth0 up_assert: Assertion failed at file:init/nx_start.c line: 703 task: Idle Task First of all I dont know where this MAC is obtained but whatever, thats not my problem. I will put the real mac later. Note that nx_start is a pretty general file for nuttx, not related to board or chip. (gdb) break nx_start.c:703 Breakpoint 1 at 0x8002b9a: file init/nx_start.c, line 703. Note: automatically using hardware breakpoints for read-only addresses. (gdb) mon reset init SWD DPIDR 0x6ba02477 target halted due to debug-request, current mode: Thread xPSR: 0x01000000 pc: 0x080002e0 msp: 0x240059fc (gdb) c Continuing. Breakpoint 1, nx_start () at init/nx_start.c:703 703 DEBUGVERIFY(group_setupidlefiles(&g_idletcb[i])); (gdb) s group_setupidlefiles (tcb=0x24001114 <g_idletcb>) at group/group_setupidlefiles.c:61 61 FAR struct task_group_s *group = tcb->cmn.group; (gdb) n 66 DEBUGASSERT(group != NULL); (gdb) n 74 fd = nx_open("/dev/console", O_RDWR); (gdb) n 75 if (fd == 0) (gdb) n 88 if (fd > 0) (gdb) n 98 return -ENFILE; =>assert This function group_setupidlefiles fails to open /dev/console and triggers an assert. Now please dont tell me that this code will work when I remove debug, I dont care. This is not the root cause. Does this code fails to open the console because some network debug has been emitted first? What to do, then? It is pretty common to have boards emit debug messages before the OS is started. I cant believe I'm the only one that will see this. It seems to happen before I have CONFIG_DEV_CONSOLE, which is the default, and is ok, since i'm just using a serial port with this board. If I disable CONFIG_DEV_CONSOLE, then nsh does not start, so I believe this setting is required. USB console is not enabled nor planned to be used. Can you please help me understand and fix this? It does not look to be recent code.Unfortunately I don't know the cause, but I have actually run into a rather similar problem just a few days ago. In my case it's on Tiva platform (armv7m-based) and there is an assertion failure related in some way to the console device. In my case, the board is a custom board that does not have a serial console (though it does have a UART which controls an another chip). For everything else, this board has only network connectivity. For the time being I have worked around the crash by disabling a bunch of Kconfigs. I didn't know it was the network debugging but I do remember that being one of the Kconfigs I turned off. I will try to look at the specific areas you mentioned and I'll post back if I figure out what's going on with that. It might take me a few days though. Inundated with other work at the moment... Nathan
hi all,
I would be really grateful if any one of you can throw a bit of time at this issue, because I think it could manifest in several boards and not just the nucleo h743.
I could be loosing time on something obvious that you will identify faster than me, because I've not looked at NuttX for a long time.
net init is like this:
nx_start -> up_initialize -> arm_netinitialize -> stm32_ethinitialize -> netdev_register/dev/console is initialized in: arm_serialinit, which is called in up_initialize right before arm_netinitialize
And this is executed JUST BEFORE drivers_initialize
I am also seeing that all debug calls all end up in syslog()
Which itself is by default a driver that will lead to file_open("/dev/console"), am I right?
So any debug message emitted in arm_netinitialize/stm32_ethinitialize is executed while the driver subsystem is not initialized yet.
What does syslog do in that case? Is there an early syslog mechanism?
And this is a recent change, still visible in the current github display for nx_start.c:
https://github.com/apache/incubator-nuttx/commit/a594a5d7a8dcd756884ca7d3449f37ba77f96af3
I cant find a pull request for this change. Nor any discussion. And the commit message makes no sense, there are no clear explanations.
This is an important change that can have lots of consequences with drivers and chip init, specifically in the case where I am stuck.
I believe that nx_open in group_setupidlefiles expects to get fd 0 (because it's the first opened file?) but it does not, because syslog or something else has opened a file before and this breaks an assumption.
I might be seeing different issues. I'll probably put an early breakpoint in fopen to see which values are returned.
--
This group has moved to the Apache Foundation:
http://nuttx.incubator.apache.org/community/
---
You received this message because you are subscribed to the Google Groups "NuttX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nuttx+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nuttx/53381675-ce05-cf21-818b-5b799c6c2eb1%40lorquet.fr.
--
This group has moved to the Apache Foundation:
http://nuttx.incubator.apache.org/community/
---
You received this message because you are subscribed to the Google Groups "NuttX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nuttx+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nuttx/53381675-ce05-cf21-818b-5b799c6c2eb1%40lorquet.fr.
Hi,
Complex message.
See below for specific answers.
What is WEIRD is that
-when I put a breakpoint on syslog and continue each time it is reached, the boad boots. So it sounds like a race condition.
-when the function group_setupidlefiles executes, the function inode_search does not find /dev/console. The tree only seems to contain /dev/null
Could that be a cache issue on the stm32h7?
On Fri, May 20, 2022 at 6:09 AM Sebastien Lorquet <seba...@lorquet.fr> wrote:
hi all,
I would be really grateful if any one of you can throw a bit of time at this issue, because I think it could manifest in several boards and not just the nucleo h743.
I could be loosing time on something obvious that you will identify faster than me, because I've not looked at NuttX for a long time.
net init is like this:
nx_start -> up_initialize -> arm_netinitialize -> stm32_ethinitialize -> netdev_register/dev/console is initialized in: arm_serialinit, which is called in up_initialize right before arm_netinitialize
could you ensure "/dev/console" correctly registers after arm_serialinit return?
It does:
register_driver (path=0x802fb18 "/dev/console", fops=0x803019c
<g_serialops>, mode=438, priv=0x24000068
<g_usart3priv>)
at driver/fs_registerdriver.c:72
72 ret = inode_semtake();
(gdb) n
73 if (ret < 0)
(gdb)
78 ret = inode_reserve(path, mode, &node);
(gdb)
79 if (ret >= 0)
(gdb)
85 INODE_SET_DRIVER(node);
(gdb)
87 node->u.i_ops = fops;
(gdb)
88 node->i_private = priv;
(gdb)
89 ret = OK;
(gdb)
92 inode_semgive();
(gdb)
93 return ret;
(gdb)
94 }
(gdb)
And this is executed JUST BEFORE drivers_initialize
I am also seeing that all debug calls all end up in syslog()
Which itself is by default a driver that will lead to file_open("/dev/console"), am I right?
So any debug message emitted in arm_netinitialize/stm32_ethinitialize is executed while the driver subsystem is not initialized yet.
What does syslog do in that case? Is there an early syslog mechanism?
syslog is independent from the file system. You can call syslog from any point(except some sched related code to avoid the infinite recursive).
I have checked that syslog indeed ends up calling low level serial functions without using the VFS.
And this is a recent change, still visible in the current github display for nx_start.c:
https://github.com/apache/incubator-nuttx/commit/a594a5d7a8dcd756884ca7d3449f37ba77f96af3
I cant find a pull request for this change. Nor any discussion. And the commit message makes no sense, there are no clear explanations.
This is an important change that can have lots of consequences with drivers and chip init, specifically in the case where I am stuck.
Here is the related PR:
OK...
I believe that nx_open in group_setupidlefiles expects to get fd 0 (because it's the first opened file?) but it does not, because syslog or something else has opened a file before and this breaks an assumption.
Kernel shouldn't use open/read/write/close, but call file_open/file_read/file_write/file_close instead.
No idea why it's made like that. It's old code.
To view this discussion on the web visit https://groups.google.com/d/msgid/nuttx/CAH2Cfb-v_EHB-JGYz3Ns-JH6LDaqzvfZPRLPE1yXyFDK22jOzA%40mail.gmail.com.
Not related to cache, disabling it changes nothing.
If I setup a breakpoint on syslog() just once, then delete the break point after it is hit, then continue, the board starts
When it boots like that, /dev is fully populated
But when group_setupidlefiles,
only /dev/null is present, so /dev/console is not found, and
then nx_open fails and asserts.
I'm loosing my mind!
Sebastien
To view this discussion on the web visit https://groups.google.com/d/msgid/nuttx/5fb373fd-2683-96a0-98c4-81325b377eff%40lorquet.fr.
new info found just now: the board boots with network info debug when I comment the call to netdev_register.
When I enable netdev_register here:
https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/stm32h7/stm32_ethernet.c#L4474
, the board hangs with a stack dump pointing to here:
https://github.com/apache/incubator-nuttx/blob/master/sched/init/nx_start.c#L703
But if I set a breakpoint on netdev_register, the board starts correctly after a "continue"
Has anyone ever seen something like that?
Could we have stack limitations with the initial boot stack?
Aaaaand the answer is yes: setting CONFIG_IDLETHREAD_STACKSIZE to
2048 fixes my problem. Same with 1536. But 1024 just hangs.
I did not bisect the exact limit.
I consider the root cause found.
1024 might be a bit too small for the idle thread. Might happen
to someone else...
Sébastien
To view this discussion on the web visit https://groups.google.com/d/msgid/nuttx/ed005028-ec01-d328-f2bc-ed25411b9379%40lorquet.fr.