OpenThread NCP / wpantund issues with a RS-422 based NCP

224 views
Skip to first unread message

Stuart Longland

unread,
Sep 15, 2017, 4:31:32 AM9/15/17
to openthread-users
Hi all,

Earlier I asked about using RS-485 as a serial link between the NCP and
wpantund… and it was pointed out that Spinel really does assume a full
duplex link.

For the NCP, we've just taken one of our nodes which are intended to
talk to Modbus energy meters, hence why it features two RS-485 ports.
It is built on the TI CC2538, the transceivers connect to UART0 via the
standard pins (PA0/PA1).

The same serial port code is used on the nodes themselves, and there it
has no issues with communications. Modbus frames are a similar length
to IEEE802.15.4 frames. For the purposes of the NCP, I have made the
buffer larger than what is used on the nodes: 2048 bytes, enough to
handle a full 1280-byte IPv6 datagram with room to spare.

At the RF end, the device has a CC2592 range extender IC like the TI
CC2538-CC2592EM boards. This is configured to be transparent to
OpenThread, the RF core in the CC2538 drives the PA_EN and LNA_EN
signals on the CC2592.

Apart from that, there are no real differences that should matter from
OpenThread's point of view. Some LEDs off I²C, that's about it.

Normally, only one transceiver would be used at a time and the
transmit/receive switching is handled in software. It's possible to
drive one permanently as a transmitter and the other a receiver. Thus
we can do RS-422 communications, giving us a full-duplex link.

Of course I'd prefer to be using RS-232, a USB to TTL-serial chip or USB
CDC-ACM; but time and cost constraints forbid us from changing the
circuit design, and I know of no working USB stack for the CC2538 to do
CDC-ACM (TI's usblib does not compile under gcc… and there's lots of
work porting LUFA).

So it is what it is, we're stuck with RS-422 for the time being. The
NCPs "RS-422" serial port connects to a RS-422 interface (Advantech
PCM-24D2R4) on an industrial PC which runs Debian Jessie (AMD64) and the
OpenThread Border Router.

When it works, it works well. There are no hardware flow control lines
however, and if there's a large surge of traffic, the NCP seems to get
overwhelmed and drops offline.

When it does this, `wpanctl reset` does not recover the unit, I have to
physically yank the power on the NCP and plug it back in again.
Obviously in production, this will not fly. Two different situations
that seem to cause `wpantund` to fall into this mode:

> Sep 15 17:14:56 wsg-74fe481fe117 wpantund[3433]: Experiencing extended insomnia. Resetting internal state.
> Sep 15 17:14:56 wsg-74fe481fe117 wpantund[3433]: wpantund[3433]: Experiencing extended insomnia. Resetting internal state.

> Sep 15 17:19:15 wsg-74fe481fe117 wpantund[3769]: SpinelNCPInstance-Protothreads.cpp:412: Requirement Failed (IS_EVENT_FROM_NCP(event) && GetInstance(this)->mInboundHeader == mLastHeader)
> Sep 15 17:19:16 wsg-74fe481fe117 wpantund[3769]: wpantund[3769]: SpinelNCPInstance-Protothreads.cpp:412: Requirement Failed (IS_EVENT_FROM_NCP(event) && GetInstance(this)->mInboundHeader
> == mLastHeader)

Due to hardware constraints, I am unable to wire up a hard reset signal,
and really I do not feel this should be necessary. Spinel uses a frame
separator byte to synchronise the two ends, it shouldn't matter if a
UART buffer overruns, the code should spot the frame separators, drop
the corrupted frame and move on to the next one, so I'm a little
confused why there's so much emphasis on having hardware reset control
in the Spinel docs.

The CC2538 has a watchdog timer which can hard-reset the device in the
event of a detected failure. Right now I have the NCP counting bytes on
the serial interface, and if it doesn't see data after a minute, it
stops resetting the watchdog.

This doesn't seem to help though, because once `wpantund` has decided
the NCP is dead, it refuses to talk to the NCP. No activity is seen on
the RS-422 link from either end, even when `wpantund` is supposedly
resetting the NCP.

The only thing I can think of is to have `wpantund` and the NCP perhaps
generate a BREAK signal on the serial link: `wpantund` sends BREAK to
reset the NCP, then NCP sends BREAK when it resets.

A second problem I'm having is that `wpantund` seems to want to join the
mesh as an end-device. I need it to join as a router. I see in
`wpanctl get` there's:

> Network:NodeType = "end-device"

but if I try to change that, it tells me:
> wpanctl:wpan0> set Network:NodeType router
> set failed with error 16. PropertyNotFound
> Error 16 (0x10)

Resetting again, doesn't help. It's stuck in end-device mode. How does
one force a NCP to become a router?

Regards,
--
_ ___ Stuart Longland - Systems Engineer
\ /|_) | T: +61 7 3535 9619
\/ | \ | 38b Douglas Street F: +61 7 3535 9699
SYSTEMS Milton QLD 4064 http://www.vrt.com.au

Stuart Longland

unread,
Sep 15, 2017, 4:51:51 AM9/15/17
to openthre...@googlegroups.com
On 15/09/17 18:31, Stuart Longland wrote:
> A second problem I'm having is that `wpantund` seems to want to join the
> mesh as an end-device. I need it to join as a router. I see in
> `wpanctl get` there's:
>
>> Network:NodeType = "end-device"
> but if I try to change that, it tells me:
>> wpanctl:wpan0> set Network:NodeType router
>> set failed with error 16. PropertyNotFound
>> Error 16 (0x10)
> Resetting again, doesn't help. It's stuck in end-device mode. How does
> one force a NCP to become a router?

For reference, these are the ./configure settings I use:

$ …/openthread/configure --host=arm-none-eabi
--prefix=/home/stuartl/vrt/projects/widesky/hub/hal/build/ncp-lib
--with-platform-info=WIDESKYHUBNCP --disable-docs --enable-ncp-app=ftd
--with-ncp-bus=uart --disable-raw-link-api --enable-tmf-proxy
--enable-border-router --enable-joiner

This worked last time I built NCP firmware.

Stuart Longland

unread,
Sep 17, 2017, 8:03:00 PM9/17/17
to openthre...@googlegroups.com
On 15/09/17 18:51, Stuart Longland wrote:
> On 15/09/17 18:31, Stuart Longland wrote:
>> A second problem I'm having is that `wpantund` seems to want to join the
>> mesh as an end-device. I need it to join as a router. I see in
>> `wpanctl get` there's:
>>
>>> Network:NodeType = "end-device"
>> but if I try to change that, it tells me:
>>> wpanctl:wpan0> set Network:NodeType router
>>> set failed with error 16. PropertyNotFound
>>> Error 16 (0x10)
>> Resetting again, doesn't help. It's stuck in end-device mode. How does
>> one force a NCP to become a router?
>
> For reference, these are the ./configure settings I use:
>
> $ …/openthread/configure --host=arm-none-eabi
> --prefix=/home/stuartl/vrt/projects/widesky/hub/hal/build/ncp-lib
> --with-platform-info=WIDESKYHUBNCP --disable-docs --enable-ncp-app=ftd
> --with-ncp-bus=uart --disable-raw-link-api --enable-tmf-proxy
> --enable-border-router --enable-joiner
>
> This worked last time I built NCP firmware.
>

Okay, so this morning, I had a look at the OpenThread commit logs and
saw this…

> commit 36f4f43adc55ef3ed411001915fd10d078c15721
> Author: Abtin Keshavarzian <abt...@google.com>
> Date: Mon Aug 28 21:16:38 2017 -0700
>
> [ncp-uart] fix the bug in uart encoder (#2145)
>
> This commit fixes an issue with the `NcpUart::EncodeAndSendToUart()`
> code where the last message can be removed from `mTxFrameBuffer`
> and if we ran out of buffer and cannot finalize the HDLC encoded
> frame, the final bytes would not be sent until next message is
> queued in `mTxBuffer`.

I wondered if this might be related to my issue, and so figured I'd try
building a firmware image based on this.

Re-building the NCP firmware based on the current OpenThread master
code, I now find I cannot form a mesh with the NCP:
> Sep 18 09:52:18 wsg-74fe481fe117 otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:52:18 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: [-NCP-]: NCP was reset (STATUS_RESET_POWER_ON, 112)
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: State change: "offline" -> "uninitialized"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: [-NCP-]: NCP was reset (STATUS_RESET_POWER_ON, 112)
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: State change: "offline" -> "uninitialized"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:18 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property IPv6:MeshLocalAddress changed.
> Sep 18 09:52:18 wsg-74fe481fe117 otbr-agent[937]: NCP property IPv6:MeshLocalAddress changed.
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: State change: "uninitialized" -> "offline"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: State change: "uninitialized" -> "offline"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: NCP is running "OPENTHREAD/0.01.00; CC2538; Sep 15 2017 15:17:11"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: Driver is running "0.08.00d (/038e8b0; Sep 15 2017 00:12:23)"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: NCP is running "OPENTHREAD/0.01.00; CC2538; Sep 15 2017 15:17:11"
> Sep 18 09:52:18 wsg-74fe481fe117 wpantund[486]: wpantund[486]: Driver is running "0.08.00d (/038e8b0; Sep 15 2017 00:12:23)"
> Sep 18 09:52:18 wsg-74fe481fe117 otbr-agent[937]: NCP property NCP:Channel changed.
> Sep 18 09:52:18 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property NCP:Channel changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:PANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:PANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property NCP:MACAddress changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property NCP:MACAddress changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:Name changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:Name changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:XPANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:XPANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property IPv6:LinkLocalAddress changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property IPv6:LinkLocalAddress changed.
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: Network is not joinable
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: Network is not joinable
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: Resetting interface(s). . .
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: Finished initializing NCP
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: Resetting interface(s). . .
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: ../util/netif-mgmt.c:530: Requirement Failed ((ret) == 0)
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: Finished initializing NCP
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property NCP:State changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property NCP:State changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:KeyIndex changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:KeyIndex changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:PANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:PANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:XPANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:XPANID changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property IPv6:MeshLocalAddress changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property IPv6:MeshLocalAddress changed.
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: State change: "offline" -> "associating"
> Sep 18 09:52:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: State change: "offline" -> "associating"
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property NCP:State changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property NCP:State changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property NCP:Channel changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property NCP:Channel changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Network:Name changed.
> Sep 18 09:52:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Network:Name changed.
> Sep 18 09:53:19 wsg-74fe481fe117 otWeb[489]: reply is NULL; error: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
> Sep 18 09:53:19 wsg-74fe481fe117 otWeb[489]: reply error: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
> Sep 18 09:53:19 wsg-74fe481fe117 otWeb[489]: wpan service error: 9
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-web[489]: otWeb[489]: reply is NULL; error: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-web[489]: otWeb[489]: reply error: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-web[489]: otWeb[489]: wpan service error: 9
> Sep 18 09:53:19 wsg-74fe481fe117 wpantund[486]: SpinelNCPTaskForm.cpp:404: Requirement Failed (ncp_state_is_associated(mInstance->get_ncp_state()))
> Sep 18 09:53:19 wsg-74fe481fe117 wpantund[486]: Form failed: 1
> Sep 18 09:53:19 wsg-74fe481fe117 wpantund[486]: State change: "associating" -> "offline"
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-agent[937]: NCP property NCP:State changed.
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.
> Sep 18 09:53:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: SpinelNCPTaskForm.cpp:404: Requirement Failed (ncp_state_is_associated(mInstance->get_ncp_state()))
> Sep 18 09:53:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: Form failed: 1
> Sep 18 09:53:19 wsg-74fe481fe117 wpantund[486]: wpantund[486]: State change: "associating" -> "offline"
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property NCP:State changed.
> Sep 18 09:53:19 wsg-74fe481fe117 otbr-agent[937]: otbr-agent[937]: NCP property Daemon:ReadyForHostSleep changed.

Stuart Longland

unread,
Sep 17, 2017, 11:42:55 PM9/17/17
to openthre...@googlegroups.com
On 18/09/17 10:02, Stuart Longland wrote:
> On 15/09/17 18:51, Stuart Longland wrote:
>> On 15/09/17 18:31, Stuart Longland wrote:
> Okay, so this morning, I had a look at the OpenThread commit logs and
> saw this…
>
>> commit 36f4f43adc55ef3ed411001915fd10d078c15721
>> Author: Abtin Keshavarzian <abt...@google.com>
>> Date: Mon Aug 28 21:16:38 2017 -0700
>>
>> [ncp-uart] fix the bug in uart encoder (#2145)
>>
>> This commit fixes an issue with the `NcpUart::EncodeAndSendToUart()`
>> code where the last message can be removed from `mTxFrameBuffer`
>> and if we ran out of buffer and cannot finalize the HDLC encoded
>> frame, the final bytes would not be sent until next message is
>> queued in `mTxBuffer`.
>
> I wondered if this might be related to my issue, and so figured I'd try
> building a firmware image based on this.
>
This might be useful for those like myself who are implementing their
own CC2538 UART driver and interfacing that to OpenThread… but two
things I found helped (time will tell if it actually fixes the problems):

1. Enable the RT interrupt. This fires if the data is received by the
UART and left in the FIFO but no further data appears after 32 bit
periods. It was not apparent that this should be necessary which is why
I left it off, but it would seem it helps in the NCP case.

2. *DON'T* manually clear RX and RT interrupts. They'll get cleared
once the FIFO is empty. Manually clearing it at the end of the
interrupt runs the risk that a byte slips in between you clearing the
FIFO and setting ICR.

Touch wood (well, probably chipboard in this case, who knows what they
make desks from these days) the NCP seems to be holding up.

Regards,
Reply all
Reply to author
Forward
0 new messages