Kernel crash in ni_find_route_source with start_src=TRIG_EXT on NI 6251

Éric Piel

Dec 13, 2019, 11:15:22 AM
to Comedi: Linux Control and Measurement Device Interface
Hello,
While updating the kernel on our systems, I've found a bug that seems to have been introduced (most likely) by the new NI routing functionality that came in kernel 4.20.
Kernel 4.19 is not affected, and every kernel 4.20+ (up to 5.4.2, at least) is affected.

I'm attaching an example program that triggers the issue. Basically, on affected kernels, it will cause a kernel crash when calling prepare_command() with an AO command which has start_src=TRIG_EXT. All the start_arg values that I've tried caused a crash, although in our specific case we use the (old) NI_TRIG_AI_START1.
To run:
python wao-trig.py


(You'll need an NI board, and I don't know if this happens with all boards, as we only have NI 6251 boards available.) I'm attaching the output of comedi_board_info, in case that's relevant.

Please let me know if there is any way that I can help further with debugging or fixing this issue.

The kernel trace looks like this:
[   44.685167] BUG: unable to handle page fault for address: 0000000000008d18
[   44.685172] #PF: supervisor read access in kernel mode
[   44.685174] #PF: error_code(0x0000) - not-present page
[   44.685176] PGD 0 P4D 0
[   44.685180] Oops: 0000 [#1] SMP PTI
[   44.685184] CPU: 1 PID: 2102 Comm: python2.7 Tainted: G         C        5.4.2-050402-generic #201912042231
[   44.685186] Hardware name: Supermicro X11SSQ/X11SSQ, BIOS 2.3 09/17/2018
[   44.685196] RIP: 0010:ni_find_route_source+0x38/0x50 [ni_routing]
[   44.685199] Code: e5 81 fe d1 00 00 00 77 2e 69 f6 d2 00 00 00 48 8b 4a 08 83 cf 80 31 c0 eb 0a 83 c0 01 3d d2 00 00 00 74 13 8d 14 06 48 63 d2 <40> 38 3c 11 75 ea 05 00 80 00 00 5d c3 b8 ea ff ff ff 5d c3 0f 1f
[   44.685201] RSP: 0018:ffffac508146bd40 EFLAGS: 00010246
[   44.685204] RAX: 0000000000000000 RBX: ffffac508146bda8 RCX: 0000000000000000
[   44.685205] RDX: 0000000000008d18 RSI: 0000000000008d18 RDI: 00000000ffffff93
[   44.685207] RBP: ffffac508146bd40 R08: 0000000000000000 R09: 0000000000000003
[   44.685209] R10: 0000000000010000 R11: ffffffffc0760b80 R12: ffff9af1268e1100
[   44.685210] R13: 0000000000000000 R14: ffff9af12b970000 R15: ffffffffc07604c0
[   44.685213] FS:  00007f7e99fdb700(0000) GS:ffff9af132480000(0000) knlGS:0000000000000000
[   44.685215] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   44.685217] CR2: 0000000000008d18 CR3: 00000003e516c004 CR4: 00000000003606e0
[   44.685218] Call Trace:
[   44.685226]  ni_ao_cmdtest+0x1df/0x320 [ni_pcimio]
[   44.685232]  comedi_unlocked_ioctl+0x5bc/0xe10 [comedi]
[   44.685239]  do_vfs_ioctl+0x407/0x670
[   44.685243]  ? do_user_addr_fault+0x216/0x450
[   44.685246]  ksys_ioctl+0x67/0x90
[   44.685250]  __x64_sys_ioctl+0x1a/0x20
[   44.685254]  do_syscall_64+0x57/0x190
[   44.685259]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   44.685261] RIP: 0033:0x7f7f523675d7
[   44.685264] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   44.685266] RSP: 002b:00007f7e99fd9798 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   44.685268] RAX: ffffffffffffffda RBX: 00007f7f4002da60 RCX: 00007f7f523675d7
[   44.685270] RDX: 00007f7ef0185170 RSI: ffffffff8050640a RDI: 000000000000003a
[   44.685271] RBP: 00007f7ef0185170 R08: 00007f7f347f6d20 R09: 0000000000000000
[   44.685273] R10: 000000000000003a R11: 0000000000000246 R12: 00000000ffffffff
[   44.685275] R13: 00007f7f50742de0 R14: 00007f7f44090170 R15: 00007f7f440e6368
[   44.685277] Modules linked in: intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp kvm_intel kvm irqbypass snd_hda_intel snd_intel_nhlt snd_hda_codec crct10dif_pclmul snd_hda_core crc32_pclmul ghash_clmulni_intel snd_hwdep snd_pcm snd_seq_midi aesni_intel snd_seq_midi_event crypto_simd snd_rawmidi cryptd glue_helper snd_seq snd_seq_device intel_cstate i915 intel_rapl_perf snd_timer snd soundcore ni_pcimio(C) drm_kms_helper ni_tiocmd(C) mite(C) drm input_leds fb_sys_fops comedi_pci(C) cdc_acm mei_me syscopyarea ni_routing(C) mei sysfillrect ni_tio(C) intel_pch_thermal sysimgblt comedi_8255(C) comedi(C) acpi_pad mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid igb e1000e i2c_algo_bit dca ahci libahci pinctrl_sunrisepoint video pinctrl_intel
[   44.685316] CR2: 0000000000008d18
[   44.685319] ---[ end trace 0f7df4360d57c0dc ]---
[   44.728051] RIP: 0010:ni_find_route_source+0x38/0x50 [ni_routing]
[   44.728053] Code: e5 81 fe d1 00 00 00 77 2e 69 f6 d2 00 00 00 48 8b 4a 08 83 cf 80 31 c0 eb 0a 83 c0 01 3d d2 00 00 00 74 13 8d 14 06 48 63 d2 <40> 38 3c 11 75 ea 05 00 80 00 00 5d c3 b8 ea ff ff ff 5d c3 0f 1f
[   44.728054] RSP: 0018:ffffac508146bd40 EFLAGS: 00010246
[   44.728056] RAX: 0000000000000000 RBX: ffffac508146bda8 RCX: 0000000000000000
[   44.728057] RDX: 0000000000008d18 RSI: 0000000000008d18 RDI: 00000000ffffff93
[   44.728057] RBP: ffffac508146bd40 R08: 0000000000000000 R09: 0000000000000003
[   44.728058] R10: 0000000000010000 R11: ffffffffc0760b80 R12: ffff9af1268e1100
[   44.728059] R13: 0000000000000000 R14: ffff9af12b970000 R15: ffffffffc07604c0
[   44.728060] FS:  00007f7e99fdb700(0000) GS:ffff9af132480000(0000) knlGS:0000000000000000
[   44.728061] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   44.728062] CR2: 0000000000008d18 CR3: 00000003e516c004 CR4: 00000000003606e0



Best,
Éric
wao-trig.py
comedi_board_info.txt

Ian Abbott

Dec 16, 2019, 9:48:29 AM
to comed...@googlegroups.com, Spencer E. Olson

That's not good. From your "comedi_board_info.txt", I see the board name is "pcie-6251". The "ni_routing" module doesn't contain any routing information for that board, but it does for "pci-6251", "pxi-6251" and "pxie-6251". I guess this is related to the problem.

Do you see something in the kernel log similar to this?:

... ni_E_init: pcie-6251 device has no signal routing table.
... ni_E_init: High level NI signal names will not be available for this pcie-6251 board.

That would confirm the routing information is missing.

I think the crash in ni_find_route_source() is because tables->route_values is NULL, i.e. a null pointer dereference.
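
The register values in the oops appear to fit that reading. A minimal sketch of the arithmetic follows, under assumptions that do not come from the driver source: that RCX held the table base, RSI the destination row offset, and RAX the loop counter at the time of the fault, and that the row width is the 0xd2 multiplier visible in the `Code:` bytes.

```python
# Register values copied from the oops above.
CR2 = 0x8D18  # faulting address
RCX = 0x0     # assumed: base pointer of the route_values table
RSI = 0x8D18  # assumed: destination row offset into the table
RAX = 0x0     # assumed: loop counter (source index)

# Assumption: row width taken from the 0xd2 multiplier in the Code: bytes.
ROW_WIDTH = 0xD2


def fault_address(base, row_offset, src_index):
    """Address of a one-byte entry at base[row_offset + src_index]."""
    return base + row_offset + src_index


dest_index, remainder = divmod(RSI, ROW_WIDTH)
print(hex(fault_address(RCX, RSI, RAX)), dest_index, remainder)
# prints: 0x8d18 172 0
```

With a base of zero, the very first table entry examined for destination row 172 lands exactly on the CR2 address, which is what a NULL table pointer would produce.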

I guess ni_find_route_source() (and maybe some other functions) ought to return -EINVAL if the route_values pointer is NULL. That will still leave the TRIG_EXT sources non-working, but it should at least avoid the kernel crash!
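
The suggested guard can be modelled in user space roughly as follows. This is a Python toy rather than the kernel function: the table size, the 0x8000 name base, and the 0x80 "valid" marker are assumptions read off the oops `Code:` bytes, and the function signature is hypothetical.

```python
import errno

NUM_NAMES = 0xD2        # assumption: table is NUM_NAMES x NUM_NAMES bytes
NI_NAMES_BASE = 0x8000  # assumption: offset added to a source index on success
VALID_BIT = 0x80        # assumption: marks a table entry as a usable route


def find_route_source(src_sel_reg_value, dest_index, route_values):
    """Toy model: return the source name for a register value, or -EINVAL."""
    if route_values is None:
        # The guard proposed above: no table available, so fail cleanly
        # instead of indexing through a missing pointer.
        return -errno.EINVAL
    if not 0 <= dest_index < NUM_NAMES:
        return -errno.EINVAL
    wanted = src_sel_reg_value | VALID_BIT
    row = dest_index * NUM_NAMES
    for src in range(NUM_NAMES):
        if route_values[row + src] == wanted:
            return NI_NAMES_BASE + src
    return -errno.EINVAL


# Without a table the lookup reports an error rather than crashing.
print(find_route_source(0x13, 172, None) == -errno.EINVAL)
```

The caller still gets an error for TRIG_EXT sources in this case, matching the caveat above that the sources stay non-working until a table is available.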

I'd like some advice from Spencer on how to fix this (if he's still around). Obviously, ni_find_route_source() needs to check the route_values pointer. Currently, both the route_values pointer and the valid_routes pointer get set to NULL by ni_assign_device_routes() and ni_find_device_routes() if no board-specific routes are found. However, the route_values pointer only depends on the device family ("ni_mseries"), not on the board name. Perhaps ni_find_device_routes() should allow the route_values pointer to be filled in even if the valid_routes pointer cannot be filled in.

Best regards,

Ian

-- 
-=( Ian Abbott <abb...@mev.co.uk> || Web: www.mev.co.uk )=-
-=( MEV Ltd. is a company registered in England & Wales. )=-
-=( Registered number: 02862268.  Registered address:    )=-
-=( 15 West Park Road, Bramhall, STOCKPORT, SK7 3JZ, UK. )=-

Éric Piel

Dec 16, 2019, 10:09:29 AM
to comed...@googlegroups.com, Spencer E. Olson
Hi Ian,

On 16/12/2019 15:48, Ian Abbott wrote:
> That's not good. From your "comedi_board_info.txt", I see the board name
> is "pcie-6251". The "ni_routing" module doesn't contain any routing
> information for that board, but it does for "pci-6251", "pxi-6251" and
> "pxie-6251". I guess this is related to the problem.
That gives quite a bit of hope, because I think the pcie version is
identical to the other ones, so it'll hopefully be easy to fix :-)

>
> Do you see something in the kernel log similar to this?:
>
> ... ni_E_init: pcie-6251 device has no signal routing table.
> ... ni_E_init: High level NI signal names will not be available for this pcie-6251 board.
>
> That would confirm the routing information is missing.
Indeed:
[ 4.081277] comedi comedi0: ni_E_init: pcie-6251 device has no signal routing table.
[ 4.081278] comedi comedi0: ni_E_init: High level NI signal names will not be available for this pcie-6251 board.


Cheers,
Éric Piel

--
Software architect

DELMIC B.V.
Kanaalweg 4
2628EB Delft
The Netherlands

Office: +31 15 744 0158
Mobile: +31 6 2437 9135

Spencer Olson

Dec 16, 2019, 10:08:34 PM
to Éric Piel, Comedi: Linux Control and Measurement Device Interface
I'm still around.  On some travel right now.  I think I can give a better response either tomorrow night or at least by this weekend...

Éric Piel

Jan 17, 2020, 11:18:41 AM
to Spencer Olson, Comedi: Linux Control and Measurement Device Interface, Ian Abbott
Hi,
Any update on this issue?

I've looked at the ni_routing. It seems it should be quite
straightforward to create a pcie-6251.csv (and also a pcie-6259.csv?) as
a copy of their pci- counterpart. I'm just a little confused as to why
the pxi-6251 is a little different from the pci- and pxie- versions (i.e.,
it's missing CtrGate(1) and PauseTrigger). Is that a real difference in
the hardware? How did you find out? Is this routing documented somewhere
by NI in some public document? Or did you run some NI tool to see it?

If that's fine, I can send a patch to add the routing for the two pcie
boards. Ian, let me know if you're interested.

In parallel, it would still be good to fix the crash when no routing info is
available.

Cheers,
Éric

Ian Abbott

Jan 17, 2020, 12:55:52 PM
to Éric Piel, Spencer Olson, Comedi: Linux Control and Measurement Device Interface
On 17/01/2020 16:18, Éric Piel wrote:
> Hi,
> Any update on this issue?

I sent a couple of patches a few days ago. They are in the
"staging-linus" branch of Greg-KH's staging repo:

https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/log/?h=staging-linus

"staging: comedi: ni_routes: fix null dereference in
ni_find_route_source()":

https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/commit/?h=staging-linus&id=01e20b664f808a4f3048ca3f930911fd257209bd

"staging: comedi: ni_routes: allow partial routing information":

https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/commit/?h=staging-linus&id=9fea3a40f6b07de977a2783270c8c3bc82544d45

Depending on timing, these might end up in 5.5.0, but if not, they will
end up in 5.6.0. They are also marked for inclusion in "stable" kernels
4.20 onwards, so should end up in 5.4.x and 5.5.x (if not 5.5.0)
eventually (and maybe distro maintained 5.3.x and earlier kernels).

One of the patches fixes the null pointer dereference, returning an
error instead of crashing.

The other patch allows the "route_values" pointer to be set correctly
even if the "valid_routes" pointer cannot be set correctly, or vice
versa. The "route_values" pointer is the one that had the null
dereference bug and it only depends on matching the device family
("ni_mseries", "ni_eseries" or "ni_660x"). The "valid_routes" pointer
is the one that needs the board-specific routing information to be
available. Both pointers need to be set for the routing functionality
to be enabled.

The upshot is that as long as the device family information is available
(which it is), you should be able to set up comedi commands to use
external trigger sources, but unless the board-specific information is
available, you won't be able to configure the routing of the external
triggers.
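
The split between the two pointers can be modelled roughly as follows. This is a Python toy, not the kernel code: `RouteTables`, `assign_device_routes`, the helper predicates, and the table contents are hypothetical names for illustration. Only the pointer names `route_values` and `valid_routes` and the family and board names come from the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteTables:
    route_values: Optional[str] = None  # depends only on the device family
    valid_routes: Optional[str] = None  # needs board-specific information


# Placeholder contents; the real tables are generated C data.
FAMILY_ROUTE_VALUES = {"ni_mseries": "m-series register values"}
BOARD_VALID_ROUTES = {"pci-6251": "pci-6251 valid routes"}


def assign_device_routes(family: str, board: str) -> RouteTables:
    """Fill in each pointer independently; either one may stay unset."""
    return RouteTables(
        route_values=FAMILY_ROUTE_VALUES.get(family),
        valid_routes=BOARD_VALID_ROUTES.get(board),
    )


def can_use_external_trigger(tables: RouteTables) -> bool:
    # Looking up a trigger source only needs the family table.
    return tables.route_values is not None


def routing_enabled(tables: RouteTables) -> bool:
    # Configuring routes needs both tables.
    return tables.route_values is not None and tables.valid_routes is not None


tables = assign_device_routes("ni_mseries", "pcie-6251")
print(can_use_external_trigger(tables), routing_enabled(tables))
# prints: True False
```

In this model a pcie-6251 gets the family table but no board table, so external trigger sources resolve while route configuration stays disabled.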

> I've looked at the ni_routing. It seems it should be quite
> straightforward to create a pcie-6251.csv (and also a pcie-6259.csv?) as
> a copy of their pci- counterpart. I'm just a little confused as to why
> the pxi-6251 is a little different from the pci- and pxie- versions (i.e.,
> it's missing CtrGate(1) and PauseTrigger). Is that a real difference in
> the hardware? How did you find out? Is this routing documented somewhere
> by NI in some public document? Or did you run some NI tool to see it?

I am also confused by that. The pci-6251, pxi-6251 and pxie-6251 routes
are all subtly different, as are the pci-6733 and pxi-6733 routes. I
was hopeful that the pcie (or pxie) models with missing data could be
aliased to the matching pci (or pxi) models by adding an optional
"alternate routing name" to the board data, but it seems that would
result in incorrect routing data being used.

> If that's fine, I can send a patch to add the routing for the two pcie
> boards. Ian, let me know if you're interested.

If you can work out how to do it (or liaise with Spencer for clues), that
would be great. There are some tools within the Linux sources'
"drivers/staging/comedi/drivers/ni_routing/tools" directory to convert
the information in the .c files into .csv spreadsheets and back again
using Python. See the README in the "ni_routing" directory.

> In parallel, it would still be good to fix the crash when no routing info is
> available.

That part is in progress at least.

>
> Cheers,
> Éric

Kind regards,
Ian

Éric Piel

Jan 20, 2020, 11:12:42 AM
to Ian Abbott, Spencer Olson, Comedi: Linux Control and Measurement Device Interface
Hi Ian,

On 17/01/2020 18:55, Ian Abbott wrote:
> I sent a couple of patches a few days ago.  They are in the
> "staging-linus" branch of Greg-KH's staging repo:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/log/?h=staging-linus

That's great! Actually, I see they've just been picked up by Linus in
5.5-rc7, so that should help with a quick backport :-)

> If you can work out how to do it (or liaise with Spencer for clues), that
> would be great.  There are some tools within the Linux sources'
> "drivers/staging/comedi/drivers/ni_routing/tools" directory to convert
> the information in the .c files into .csv spreadsheets and back again
> using Python.  See the README in the "ni_routing" directory.
The README is very clear, so that was pretty straightforward.
Please find attached a patch that should add routing support for the
PCIe 6251 and 6259.
linux-staging-comedi-ni_routes-add-routes-for-NI-PCIe-6521.patch

Spencer Olson

Jan 20, 2020, 11:56:10 AM
to Éric Piel, Ian Abbott, Comedi: Linux Control and Measurement Device Interface
Sorry for my lack of correspondence recently. If I understood the
questions/confusions correctly, you both seemed to want to know about
the source of the information for the routing tables.
1) Source of info for route_values for particular device families is
primarily from any and all documentation that I could find on the web
for register-level programming or other various hard-to-find documents
and also the MHDDK from NI. I have collected quite a few of these as
I'm sure that you both have already done. I wish we could collect
these all into a repository for everyone to refer to, since they are
rather hard to track down.

2) Source of info for device routes:
The *only* source of this information that I've found is reliable is
the view that is presented from NI MAX. There does not seem to be any
way of tabulating this information programmatically in NIDAQmx and I
cannot find it specifically tabulated/documented in any other location
(spec sheets or whatever). This is the reason why, a while ago, I
asked for volunteers in the group to provide me screenshots of as many
devices as possible from NI MAX. I've kept a copy of all these
screenshots within a directory of related experimental timing software
I've been developing for some time
(https://github.com/afrl-quantum/arbwave/tree/master/python/arbwave/backend/drivers/nidaqmx/available-routes).
There were several devices that folks sent screenshots for that
required stitching images together into one table (that is why some of
the images look a bit edited--they were).
Using the NI MAX screenshots as the source of information is one of
the primary reasons for the formatting that I chose and the tools that
I wrote to convert c<->python: this way I could more easily visually
compare the extracted table from the screenshots.

It is amusing to me that, with this set of data that we are now
including in the ni_routing module, we are providing a capability that
is unique to comedi--NIDAQmx does not support programmatic access to
routing information (you have to try it before you know whether it
works). When my software uses NIDAQmx, I basically have to use
similar representations of the NI MAX screenshots to provide this info
to my NIDAQmx wrapper driver.

Finally: I would *very much* appreciate any other contributions to
the set of screenshots from NI MAX (both for comedi ni_routing and
also for my own software). I'll add any that I can collect to my
collection for later reference.

Hopefully this is helpful.

Ian Abbott

Jan 21, 2020, 5:43:17 AM
to Éric Piel, Spencer Olson, Comedi: Linux Control and Measurement Device Interface
On 20/01/2020 16:12, Éric Piel wrote:
> Hi Ian,
>
> On 17/01/2020 18:55, Ian Abbott wrote:
>> I sent a couple of patches a few days ago.  They are in the
>> "staging-linus" branch of Greg-KH's staging repo:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git/log/?h=staging-linus
>
>
> That's great! Actually, I see they've just been picked up by Linus in
> 5.5-rc7, so that should help with a quick backport :-)

They are in the 5.4 queue too, so they should appear in 5.4.14.

>> If you can work out how to do it (or liaise with Spencer for clues),
>> that would be great.  There are some tools within the Linux sources'
>> "drivers/staging/comedi/drivers/ni_routing/tools" directory to convert
>> the information in the .c files into .csv spreadsheets and back again
>> using Python.  See the README in the "ni_routing" directory.
> The README is very clear, so that was pretty straightforward.
> Please find attached a patch that should add routing support for the
> PCIe 6251 and 6259.

So that patch copies the routes as-is from PCI 6251 and PCI 6259. Do we
know if that is the correct thing to do, given the differences between
pxi-6251 and pxie-6251?

I'm happy to forward on the patch. I'd just like to know if it's
correct first.

Éric Piel

Jan 21, 2020, 8:26:33 AM
to Ian Abbott, Spencer Olson, Comedi: Linux Control and Measurement Device Interface
Hi,
On 21/01/2020 11:43, Ian Abbott wrote:
>> The README is very clear, so that was pretty straightforward.
>> Please find attached a patch that should add routing support for the
>> PCIe 6251 and 6259.
>
> So that patch copies the routes as-is from PCI 6251 and PCI 6259.  Do we
> know if that is the correct thing to do, given the differences between
> pxi-6251 and pxie-6251?
Looking a little bit more at these files, to me it seems that pxi-6251
is the odd one. pci-6251 and pxie-6251 are very similar, except for one
PXI signal missing on the PCI version. So I've assumed that the PCIe is
similar.

> I'm happy to forward on the patch.  I'd just like to know if it's
> correct first.
Fair enough ;-)

From yesterday's email by Spencer, it seems that the most reliable way
is to look at the matrix in NI MAX (Measurement & Automation eXplorer).
However, as far as I understand it's only available on Windows... and
our systems only have Ubuntu. I can try to find a system where I could
temporarily install Windows and the NI software, to check the PCIe
6251... but don't hold your breath ;-)

Spencer Olson

Jan 21, 2020, 9:59:18 AM
to Éric Piel, Ian Abbott, Comedi: Linux Control and Measurement Device Interface
I keep at least one install around for that purpose.  I acquired several older cards from our surplus organization and the contributions from others on the Comedi group were very helpful.

Éric Piel

Feb 6, 2020, 4:53:42 AM
to Ian Abbott, Spencer Olson, Comedi: Linux Control and Measurement Device Interface
On 21/01/2020 14:26, Éric Piel wrote:
:
>> I'm happy to forward on the patch.  I'd just like to know if it's
>> correct first.
> Fair enough ;-)
>
> From yesterday's email by Spencer, it seems that the most reliable way
> is to look at the matrix in NI MAX (Measurement & Automation eXplorer).
> However, as far as I understand it's only available on Windows... and
> our systems only have Ubuntu. I can try to find a system where I could
> temporarily install Windows and the NI software, to check the PCIe
> 6251... but don't hold your breath ;-)
Hello,
I've finally found some time to install Windows & NI MAX on a system.
You can find a screenshot of the reported routes attached.
(Spencer, you can add it to your collection if you want ;-) ). The
conclusion is that it's exactly as expected. So the patch I've sent
previously should be correct (for the PCIe 6251). Hopefully, for the
PCIe 6259, there is also no surprise, but as we don't have any such
card, I really won't be able to check.
Screenshot ni max pcie6251.png

Spencer Olson

Feb 6, 2020, 4:55:11 AM
to Éric Piel, Ian Abbott, Comedi: Linux Control and Measurement Device Interface
Great. Thanks

Ian Abbott

Feb 7, 2020, 9:12:06 AM
to Éric Piel, Spencer Olson, Comedi: Linux Control and Measurement Device Interface
On 06/02/2020 09:53, Éric Piel wrote:
> On 21/01/2020 14:26, Éric Piel wrote:
> :
>>> I'm happy to forward on the patch.  I'd just like to know if it's
>>> correct first.
>> Fair enough ;-)
>>
>>  From yesterday's email by Spencer, it seems that the most reliable
>> way is to look at the matrix in NI MAX (Measurement & Automation
>> eXplorer). However, as far as I understand it's only available on
>> Windows... and our systems only have Ubuntu. I can try to find a
>> system where I could temporarily install Windows and the NI software,
>> to check the PCIe 6251... but don't hold your breath ;-)
> Hello,
> I've finally found some time to install Windows & NI MAX on a system.
> You can find a screenshot of the reported routes attached.
> (Spencer, you can add it to your collection if you want ;-) ). The
> conclusion is that it's exactly as expected. So the patch I've sent
> previously should be correct (for the PCIe 6251). Hopefully, for the
> PCIe 6259, there is also no surprise, but as we don't have any such
> card, I really won't be able to check.

Thanks for checking that, Éric. Let's assume the PCIe-6259 routing info
is also identical to PCI-6259 for now.

Rather than adding the duplicate routing information, we can save memory
by allowing an alternate board name to be specified for routing purposes
in case the routing information for the actual board name cannot be
found. I'm about to send off some patches to the
de...@driverdev.osuosl.org mailing list (for the "staging" subsystem) to
implement that. I'll Cc: Éric and Spencer on the patch emails.
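
The fallback idea can be sketched like this. It is a Python illustration only: `find_board_routes`, the `alt_board_name` parameter, and the placeholder table contents are hypothetical and are not taken from the patches.

```python
from typing import Optional

# Placeholder table keyed by board name; the real data lives in generated C.
BOARD_ROUTES = {
    "pci-6251": "pci-6251 valid routes",
    "pci-6259": "pci-6259 valid routes",
}


def find_board_routes(board_name: str,
                      alt_board_name: Optional[str] = None) -> Optional[str]:
    """Try the real board name first, then the optional alternate name."""
    for name in (board_name, alt_board_name):
        if name is not None and name in BOARD_ROUTES:
            return BOARD_ROUTES[name]
    return None


print(find_board_routes("pcie-6251", alt_board_name="pci-6251"))
# prints: pci-6251 valid routes
```

Because the actual board name is tried first, a board that later gains its own table would use it automatically, and the alternate name only matters while that table is missing.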

Best regards,
Ian