Re: Ntb Transport Driver Problem After Power-up

Jon Mason

Nov 27, 2017, 11:39:16 AM
to ThanhTuThai, linux-ntb
(Ccing NTB mailing list)

On Sun, Nov 19, 2017 at 9:36 PM, ThanhTuThai <cruis...@gmail.com> wrote:
> Dear Jon,
>
> We are using the ntb_transport driver from Linux with Microsemi's NTB hardware.
> We hit a problem when one peer suddenly powers off without removing the
> drivers: after it powers up again, the good peer cannot reconnect to it
> until the good peer reloads its drivers.
> I guess the good peer needs to re-initialize something in order to catch up
> with the other one, but I don't know what it is.
>
> I know that when one peer starts, the driver sends out a message
> through the doorbell; when the other peer catches that message, I can announce
> the ntb_transport link-down ( ntb_link_event(&sndev->ntb); ).
>
> But in this case, when one peer powers down and up, the good peer doesn't
> receive any message from it in the interrupt (switchtec_ntb_message_isr),
> although I have checked that it already sent out the message.
>
> Do you have any idea about it?

It sounds like the link down/up isn't working properly. Is the
Microsemi NTB not able to detect a link down?

Thanks,
Jon

>
> Thank you very much !

Logan Gunthorpe

Nov 27, 2017, 11:47:25 AM
to Jon Mason, ThanhTuThai, linux-ntb


On 27/11/17 09:39 AM, Jon Mason wrote:
> It sounds like the link down/up isn't working properly. Is the
> Microsemi NTB not able to detect a link down?

It can detect a link down, but the link doesn't actually go down in a
lot of cases. If the host is just soft rebooted after a crash, I don't
think the link will go down.

Logan

Jon Mason

Nov 27, 2017, 12:00:10 PM
to Logan Gunthorpe, ThanhTuThai, linux-ntb
This is really not optimal :(

We can have a SW watchdog timer to poll it (aka heartbeat) to detect
this, but that's going to eat cycles and could allow for a window
where the link is down and the sender is writing into oblivion.
Thoughts?
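
A minimal sketch of the heartbeat idea, written as plain C rather than kernel code. Every name here (hb_monitor, hb_init, hb_poll, max_missed) is hypothetical and not part of the NTB API; it assumes the peer increments a counter that the local side can read on each timer tick, for example through a scratchpad or memory window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat monitor; none of these names come from the NTB API. */
struct hb_monitor {
	uint32_t last_seen;   /* last counter value read from the peer */
	unsigned missed;      /* consecutive polls with no change */
	unsigned max_missed;  /* unchanged polls tolerated before link down */
	bool link_up;         /* our software view of the link */
};

static void hb_init(struct hb_monitor *m, unsigned max_missed)
{
	assert(max_missed > 0);
	m->last_seen = 0;
	m->missed = 0;
	m->max_missed = max_missed;
	m->link_up = false;
}

/*
 * Called from a periodic timer with the peer's current counter value.
 * Returns true if the software link state changed on this poll.
 */
static bool hb_poll(struct hb_monitor *m, uint32_t peer_counter)
{
	bool was_up = m->link_up;

	if (peer_counter != m->last_seen) {
		/* The peer made progress, so it is alive. */
		m->last_seen = peer_counter;
		m->missed = 0;
		m->link_up = true;
	} else if (m->missed < m->max_missed) {
		/* No progress; declare link down once the budget is spent. */
		m->missed++;
		if (m->missed == m->max_missed)
			m->link_up = false;
	}
	return m->link_up != was_up;
}
```

The window mentioned above shows up directly in this sketch: after the peer dies, the local side still reports the link as up until max_missed polls have passed, so a sender can keep writing during that time. A shorter poll period narrows the window but costs more cycles.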

Logan Gunthorpe

Nov 27, 2017, 12:09:30 PM
to Jon Mason, ThanhTuThai, linux-ntb


On 27/11/17 10:00 AM, Jon Mason wrote:
> We can have a SW watchdog timer to poll it (aka heartbeat) to detect
> this, but that's going to eat cycles and could allow for a window
> where the link is down and the sender is writing into oblivion.
> Thoughts?

I think the easiest way is if we get a link up event, and we already
think the link is up, then we just put the link down before sending a
second link up event. I can probably look at doing something like that
shortly. However, unfortunately, my setup isn't suited to test this as
I'm actually looping back both partitions to a single host :(. I'll
submit a patch that others can test though.
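
A rough sketch of that idea in plain C, not the actual patch. The structure and function names (fake_ntb, notify, on_peer_link_msg_filtered, on_peer_link_msg_forced) are hypothetical stand-ins; in the real driver the notification would go through something like ntb_link_event(). The first handler drops a repeated link-up as "no change"; the second treats a link-up that arrives while the link is already considered up as a sign that the peer restarted, and reports down before up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum link_event { EV_LINK_DOWN, EV_LINK_UP };

#define MAX_EVENTS 8

/* Hypothetical driver state; a stand-in for the real device structure. */
struct fake_ntb {
	bool link_is_up;                 /* software view of the NTB link */
	enum link_event log[MAX_EVENTS]; /* events delivered to upper layers */
	size_t nlog;
};

/* Stand-in for notifying upper layers such as ntb_transport. */
static void notify(struct fake_ntb *dev, enum link_event ev)
{
	assert(dev->nlog < MAX_EVENTS);
	dev->log[dev->nlog++] = ev;
}

/* Naive handler: a repeated "up" is filtered out as "no change". */
static void on_peer_link_msg_filtered(struct fake_ntb *dev, bool up)
{
	if (dev->link_is_up == up)
		return;
	dev->link_is_up = up;
	notify(dev, up ? EV_LINK_UP : EV_LINK_DOWN);
}

/* Proposed handler: "up" while already up implies the peer restarted. */
static void on_peer_link_msg_forced(struct fake_ntb *dev, bool up)
{
	if (up && dev->link_is_up) {
		/* Tear the link down first so upper layers reset their state. */
		dev->link_is_up = false;
		notify(dev, EV_LINK_DOWN);
	}
	if (dev->link_is_up == up)
		return;
	dev->link_is_up = up;
	notify(dev, up ? EV_LINK_UP : EV_LINK_DOWN);
}
```

With two consecutive link-up messages, the filtered handler delivers a single up event, while the forced handler delivers up, down, up, which gives the surviving peer a chance to re-initialize without reloading its drivers. This only helps if the second link-up message actually reaches the surviving peer.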

Logan

Jon Mason

Nov 27, 2017, 12:38:37 PM
to Logan Gunthorpe, ThanhTuThai, linux-ntb
Sounds great. ThanhTuThai, can you test this?

ThanhTuThai

Nov 27, 2017, 6:33:45 PM
to Jon Mason, Logan Gunthorpe, linu...@googlegroups.com
No, it doesn't.

I've also thought about this strategy and implemented it in my setup.
For a soft reboot it works well: I receive the link-up event when the peer completes rebooting.
But for a power-on reset (in this case, the Switchtec is also power-reset), I don't receive any link-up message, so the good peer cannot reset the link status as mentioned above. If I reload the drivers on the good peer, however, they work perfectly.

Thanks!

Logan Gunthorpe

Nov 27, 2017, 6:35:24 PM
to ThanhTuThai, Jon Mason, linu...@googlegroups.com


On 27/11/17 04:33 PM, ThanhTuThai wrote:
> No, it doesn't.
>
> I've also thought about this strategy and implemented it in my setup.
> For a soft reboot it works well: I receive the link-up event when
> the peer completes rebooting.
> But for a power-on reset (in this case, the Switchtec is also
> power-reset), I don't receive any link-up message, so the good peer cannot
> reset the link status as mentioned above. If I reload the drivers on the
> good peer, however, they work perfectly.

That's because it's filtering out the link up because it already thinks
the link is up. Please wait until I send a patch and test that. It'll be
in the next day or two.

Thanks,

Logan

ThanhTuThai

Nov 27, 2017, 6:44:20 PM
to Logan Gunthorpe, jdm...@kudzu.us, linu...@googlegroups.com
OK, sounds good.

Thanks, Logan!

Karl Kao

Nov 27, 2017, 7:14:50 PM
to linux-ntb
Would Microsemi turn off the NTB PCIe link once the hardware chip goes through a reset?
We could have the chip disable the NTB link by default until the hardware driver is loaded, so that after either a reset or a power cycle the NTB link is guaranteed to be down.

Logan Gunthorpe

Nov 27, 2017, 8:42:52 PM
to Karl Kao, linux-ntb


On 27/11/17 05:14 PM, Karl Kao wrote:
> Would Microsemi turn off the NTB PCIe link once the hardware chip
> goes through a reset?
> We could have the chip disable the NTB link by default until the hardware
> driver is loaded, so that after either a reset or a power cycle the NTB
> link is guaranteed to be down.

The hardware chip won't be reset in this situation. That would kill the
other host. The "NTB link" is tracked in the driver as hardware only has
support for link events on each of the ports.

Logan

Karl Kao

Nov 27, 2017, 9:16:55 PM
to linux-ntb
I meant the "peer" that was suddenly powered off. Once that peer is reset or power cycled, its hardware chip is supposed to be reset as well, correct?

Logan Gunthorpe

Nov 27, 2017, 10:02:36 PM
to Karl Kao, linux-ntb


On 2017-11-27 7:16 PM, Karl Kao wrote:
> I meant the "peer" that was suddenly powered off. Once that
> peer is reset or power cycled, its hardware chip is
> supposed to be reset as well, correct?

There's only one hardware chip in an NTB configuration. If you reset it,
you reset all the peers (which you do not want to do).

Logan

Karl Kao

Nov 27, 2017, 10:24:03 PM
to linux-ntb
I don't understand why, when one system goes through a power cycle, the NTB chip in that system would not be reset.

Karl Kao

Nov 29, 2017, 6:44:49 PM
to linux-ntb
I just realized the current driver supports two partitions in a single Microsemi chip rather than crosslink. Sorry for the mix-up.
Given that, I am confused: the initial question is about a sudden power loss on one system while the other stays alive. What is the configuration of the systems with regard to the Microsemi NTB? Isn't it crosslink?

Thanks,
Karl