
Bug#893393: linux-image-amd64: Kernel panic on active outgoing traffic through Huawei E173 modem in NDIS (CDC) mode


Горбешко Богдан

Mar 18, 2018, 10:50:02 AM

Package: linux-image-amd64
Version: 4.14+89
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

This has bothered me since November 2017, when wvdial broke and I moved to
NetworkManager. While wvdial uses only the serial interface (ttyUSB),
NetworkManager sometimes recognizes the modem as ttyUSB and sometimes as
cdc-wdm. So the bug may be much older, as I was not actively using the
huawei_cdc_ncm module before.

Since then, I have been experiencing strange system crashes. The only thing
they have in common is that HDD activity stops while the fan keeps running;
the system doesn't respond to anything, including REISUB. For the first few
weeks the screen image simply froze; later it started to become garbled
whenever a crash happened.

I was not sure whether this is a software problem or a hardware one. I
couldn't even determine exactly what conditions lead to it. The only mostly
common factor was that it happens during active outgoing traffic (file
uploads, torrent seeding and so on), though I'm not sure that holds every
time. Sometimes the issue went quiet and I could calmly upload large files
for several days or even several weeks, but then the crashes started
happening again.

People on a forum suggested that I install crash/kdump. Sometimes kdump
triggers on the kernel panic; sometimes it doesn't, and I still get an
unresponsive system with a garbled screen. When it does trigger, systemd
tries to start the whole set of services in a small amount of RAM, so it
proceeds very slowly and finally hangs or drops to maintenance mode because
of expired timeouts. Today I found out that in maintenance mode I can still
run the kdump service and successfully collect the kernel dump and dmesg.

[60103.825970] BUG: unable to handle kernel paging request at ffff9641f2004000
[60103.825998] IP: __memset+0x24/0x30
[60103.826001] PGD a6a06067 P4D a6a06067 PUD 4f65a063 PMD 72003063 PTE 0
[60103.826013] Oops: 0002 [#1] SMP NOPTI
[60103.826018] Modules linked in: iptable_filter option huawei_cdc_ncm cdc_wdm cdc_ncm usbnet usb_wwan usbserial mii lz4 lz4_compress zram zsmalloc cpufreq_userspace cpufreq_powersave cpufreq_conservative rtsx_usb_ms memstick uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media arc4 brcmsmac cordic brcmutil b43 mac80211 binfmt_misc cfg80211 fuse xfs ssb libcrc32c rng_core pcmcia pcmcia_core snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core kvm_amd snd_hwdep kvm snd_pcm_oss snd_mixer_oss joydev irqbypass pcspkr snd_pcm bcma serio_raw ideapad_laptop sparse_keymap rfkill k10temp sg snd_timer wmi snd shpchp sp5100_tco battery ac soundcore evdev acpi_cpufreq vboxdrv(O) squashfs loop parport_pc ppdev lp parport sunrpc binder_linux(O)
[60103.826105]  ashmem_linux(O) ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 uas usb_storage sr_mod sd_mod cdrom rtsx_usb_sdmmc mmc_core rtsx_usb mfd_core amdkfd radeon psmouse ohci_pci ahci libahci i2c_algo_bit ttm atl1c libata drm_kms_helper ohci_hcd ehci_pci ehci_hcd i2c_piix4 scsi_mod drm usbcore usb_common video button thermal
[60103.826158] CPU: 0 PID: 5990 Comm: Chrome_DevTools Tainted: G           O    4.14.0-3-amd64 #1 Debian 4.14.17-1
[60103.826162] Hardware name: LENOVO 20081                          /Inagua, BIOS 41CN28WW(V2.04) 05/03/2012
[60103.826166] task: ffff964193484fc0 task.stack: ffffb2890137c000
[60103.826171] RIP: 0010:__memset+0x24/0x30
[60103.826174] RSP: 0000:ffff964316c03b68 EFLAGS: 00010216
[60103.826178] RAX: 0000000000000000 RBX: 00000000fffffffd RCX: 000000001ffa5000
[60103.826181] RDX: 0000000000000005 RSI: 0000000000000000 RDI: ffff9641f2003ffc
[60103.826184] RBP: ffff964192f6c800 R08: 00000000304d434e R09: ffff9641f1d2c004
[60103.826187] R10: 0000000000000002 R11: 00000000000005ae R12: ffff9642e6957a80
[60103.826190] R13: ffff964282ff2ee8 R14: 000000000000000d R15: ffff9642e4843900
[60103.826194] FS:  00007f395aaf6700(0000) GS:ffff964316c00000(0000) knlGS:0000000000000000
[60103.826197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60103.826200] CR2: ffff9641f2004000 CR3: 0000000013b0c000 CR4: 00000000000006f0
[60103.826204] Call Trace:
[60103.826212]  <IRQ>
[60103.826225]  cdc_ncm_fill_tx_frame+0x5e3/0x740 [cdc_ncm]
[60103.826236]  cdc_ncm_tx_fixup+0x57/0x70 [cdc_ncm]
[60103.826246]  usbnet_start_xmit+0x5d/0x710 [usbnet]
[60103.826254]  ? netif_skb_features+0x119/0x250
[60103.826259]  dev_hard_start_xmit+0xa1/0x200
[60103.826267]  sch_direct_xmit+0xf2/0x1b0
[60103.826273]  __dev_queue_xmit+0x5e3/0x7c0
[60103.826280]  ? ip_finish_output2+0x263/0x3c0
[60103.826284]  ip_finish_output2+0x263/0x3c0
[60103.826289]  ? ip_output+0x6c/0xe0
[60103.826293]  ip_output+0x6c/0xe0
[60103.826298]  ? ip_forward_options+0x1a0/0x1a0
[60103.826303]  tcp_transmit_skb+0x516/0x9b0
[60103.826309]  tcp_write_xmit+0x1aa/0xee0
[60103.826313]  ? sch_direct_xmit+0x71/0x1b0
[60103.826318]  tcp_tasklet_func+0x177/0x180
[60103.826325]  tasklet_action+0x5f/0x110
[60103.826332]  __do_softirq+0xde/0x2b3
[60103.826337]  irq_exit+0xae/0xb0
[60103.826342]  do_IRQ+0x81/0xd0
[60103.826347]  common_interrupt+0x98/0x98
[60103.826351]  </IRQ>
[60103.826355] RIP: 0033:0x7f397bdf2282
[60103.826358] RSP: 002b:00007f395aaf57d8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff6e
[60103.826362] RAX: 0000000000000000 RBX: 00002f07bc6d0900 RCX: 00007f39752d7fe7
[60103.826365] RDX: 0000000000000022 RSI: 0000000000000147 RDI: 00002f07baea02c0
[60103.826368] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[60103.826371] R10: 00000000ffffffff R11: 0000000000000000 R12: 00002f07baea02c0
[60103.826373] R13: 00002f07bba227a0 R14: 00002f07bc6d090c R15: 0000000000000000
[60103.826377] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1
[60103.826442] RIP: __memset+0x24/0x30 RSP: ffff964316c03b68
[60103.826444] CR2: ffff9641f2004000



-- System Information:
Debian Release: buster/sid
  APT prefers oldoldstable-updates
  APT policy: (500, 'oldoldstable-updates'), (500, 'oldoldstable'), (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.14.0-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=ru_UA.UTF-8, LC_CTYPE=ru_UA.UTF-8 (charmap=UTF-8), LANGUAGE=ru_UA:ru (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages linux-image-amd64 depends on:
ii  linux-image-4.14.0-3-amd64  4.14.17-1

linux-image-amd64 recommends no packages.

linux-image-amd64 suggests no packages.

-- no debconf information
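
One way to read the register dump in the Oops above, offered as a rough sketch rather than a diagnosis: the faulting instruction marked `<f3> 48 ab` in the Code line is `rep stosq`, which stores RCX quadwords at RDI, and the preceding bytes split the length into RCX = length >> 3 and RDX = length & 7. Under that reading, the snippet below computes how many bytes memset still intended to write when it faulted. Treating RBX as the original 32-bit length is an assumption made purely for illustration; the helper names are hypothetical.

```python
# Register values copied from the Oops in the report.
RCX = 0x000000001FFA5000  # quadwords `rep stosq` still had to store
RDX = 0x0000000000000005  # trailing byte count (length & 7)
RDI = 0xFFFF9641F2003FFC  # destination of the faulting store
CR2 = 0xFFFF9641F2004000  # address that triggered the page fault
RBX = 0x00000000FFFFFFFD  # ASSUMPTION: may hold the original 32-bit length

PAGE_SIZE = 4096


def remaining_bytes(rcx, rdx):
    """Bytes memset had not yet written: 8 per quadword plus the tail."""
    return rcx * 8 + rdx


def bytes_written(length, rcx, rdx):
    """Bytes already stored, given a candidate original length."""
    return length - remaining_bytes(rcx, rdx)


remaining = remaining_bytes(RCX, RDX)
print(f"remaining: {remaining:#x} bytes (~{remaining / 2**30:.2f} GiB)")

# The 8-byte store at RDI runs into the next page, whose first byte is CR2.
print("store crosses into faulting page:",
      RDI // PAGE_SIZE + 1 == CR2 // PAGE_SIZE and CR2 % PAGE_SIZE == 0)

# If RBX really is the length, its low three bits should equal RDX.
print("RBX low bits match RDX:", RBX & 7 == RDX)
print(f"written before the fault (if RBX is the length): "
      f"{bytes_written(RBX, RCX, RDX):#x} bytes")
```

A remaining count close to 4 GiB would be consistent with a small negative 32-bit length being interpreted as unsigned, but the dump alone does not prove that.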

Bjørn Mork

Mar 18, 2018, 8:40:04 PM

Горбешко Богдан <bodqh...@gmail.com> writes:

> vboxdrv(O)
> binder_linux(O)
> ashmem_linux(O)

Can you reproduce the problem without these modules loaded?

AFAICS there is no way the only memset in cdc_ncm can be called with
crashing input parameters. Unless something is scribbling over the
driver's data.


Bjørn

Горбешко Богдан

Mar 19, 2018, 10:10:03 AM

On 3/19/18 2:18 AM, Bjørn Mork wrote:
> Горбешко Богдан <bodqh...@gmail.com> writes:
>
>> vboxdrv(O)
>> binder_linux(O)
>> ashmem_linux(O)
> Can you reproduce the problem without these modules loaded?
ashmem/binder were installed only 3 weeks ago, and VirtualBox VMs were
last run in July 2017; nothing else is expected to use its kernel
module. However, I'll try to blacklist it for now.
>
> AFAICS there is no way the only memset in cdc_ncm can be called with
> crashing input parameters. Unless something is scribbling over the
> driver's data.
Maybe inspecting the crashdump would shed some light on the possible
module conflict? If so, I'll try to upload it.
>
> Bjørn
>
>

Phil

Jun 27, 2018, 3:50:02 PM

Hi everybody,

I'm really grateful to have stumbled upon this report, because it
describes the exact same issue I've been experiencing for a while now.

Basically, whenever I upload files via rsync/Firefox/Chromium, my entire
Linux system crashes within several seconds. I've experienced this
issue on Debian 10, but it also shows up on ArchLinux. In my case the
modem in question is an M.2 module, a Huawei ME906s (USB ID 12d1:15c1).

I've also tried debugging via kdump and got different kernel errors
across multiple crashes. I've tried logging my debugging attempts in
this gist [0].

It doesn't matter if I'm uploading files from a ramfs (/tmp/) or my SATA
SSD.

I'm also using modemmanager and network-manager.

I switched ISP and thought the issue was resolved, but I've just tried
uploading a file again and it still crashes my Linux 4.17.2-1-ARCH
kernel (so I guess this is a Linux and not Debian only related issue).

[0]: https://gist.github.com/norpol/d5b043d6082ace9fc232527d4835f045 or
attachment
README.md

Bjørn Mork

Jun 29, 2018, 4:40:02 AM

This issue should be fixed by commit

49c2c3f246e2 ("cdc_ncm: avoid padding beyond end of skb")

which has been backported to v4.17.3, v4.16.18 and v4.14.52. Please
check again with one of those kernel versions (or newer).

I see now that the fix doesn't apply cleanly to v4.9 stable due to
unrelated context changes. I'll go fix that and resubmit a backport for
v4.9, so we get the fix into "stretch" too. Thanks for reminding me.



Bjørn
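
As a small illustration of the version check described above, here is a sketch that compares an upstream kernel version string against the three stable releases named in the message (v4.14.52, v4.16.18, v4.17.3). The function names are hypothetical, and treating every series newer than 4.17 as fixed is an assumption based on the "or newer" wording. On Debian, pass the package version (for example 4.14.17-1), not the ABI name printed by `uname -r` (for example 4.14.0-3-amd64), since the latter does not carry the stable patch level.

```python
import re

# First stable release per series that carries the fix, per the message above.
FIXED_IN = {(4, 14): 52, (4, 16): 18, (4, 17): 3}


def parse_release(release):
    """Extract (major, minor, patch) from strings like '4.17.2-1-ARCH'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel version: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_fix(release):
    """Return True/False when known, None when the series is not listed."""
    major, minor, patch = parse_release(release)
    series = (major, minor)
    if series in FIXED_IN:
        return patch >= FIXED_IN[series]
    if series > max(FIXED_IN):
        # ASSUMPTION: later series already include the mainline commit.
        return True
    return None  # series not mentioned in the message; cannot tell


if __name__ == "__main__":
    for version in ("4.14.17-1", "4.17.2-1-ARCH", "4.17.8"):
        print(version, has_fix(version))
```

A `None` result only means the series is not covered by the list above; it says nothing about whether a backport exists.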

Phil

Jul 6, 2018, 3:10:03 PM

On Fri, 29 Jun 2018 10:17:20 +0200 Bjørn Mork <bm...@telenor.net> wrote:
> This issue should be fixed by commit
> 49c2c3f246e2 ("cdc_ncm: avoid padding beyond end of skb")
> https://patchwork.kernel.org/patch/10453923/
> Please check again with one of those kernel versions (or newer).

Hi, thank you for your quick response.

I had to wait a bit for 4.17.3 to be released in the ArchLinux repos.

I've tested uploading again and couldn't reproduce the crash anymore.
I experienced a single crash, but I'm not sure if it was related to
uploading. I'll reach out to you if I experience any further crashes
related to the modem.

> [...] we get the fix into "stretch" too. Thanks for reminding me.

Thank you for your work, this issue has been super exhausting and I'm
really thankful that it appears finally to be fixed.

Best wishes,
Phil.

Горбешко Богдан

Aug 2, 2018, 5:30:03 PM

I upgraded the kernel to 4.17.8 and experienced the issue again. I'm not
sure if the bug is technically the same, but the symptoms are: I tried
to upload a 30 MB file, and midway through got a noisy screen. I will try
to catch it with kdump to get the backtrace again later.

Горбешко Богдан

Aug 4, 2018, 8:10:03 AM

Unfortunately, I messed with it for several hours but couldn't reproduce
the bug intentionally. Does anyone have any hints on how to do this more
reliably? I tried uploading several files simultaneously and filling the
memory with tmpfs partitions to emulate high memory pressure, but nothing
triggered the crash.

deb...@anm.c0mm.it

Aug 6, 2018, 6:10:02 PM

On 08/04/2018 02:04 PM, Горбешко Богдан wrote:
> Unfortunately, I messed with it for several hours but couldn't reproduce
> the bug intentionally.

I've just launched a long-running rsync job and streamed a video via mpv,
and managed to crash my system this way.
Before the crash I managed to upload for a couple of minutes while
watching a video via Firefox.

(mpv downloads the entire video at full speed until it's done, whereas
Firefox only buffers segments...)

So from my perspective crashes still appear to be showing up, but far
less frequently.

I'm on 4.17.11-arch1 x86_64 right now.

Phil

Aug 30, 2018, 6:40:03 PM

Hi, I hope you're all doing well.

Shall we/I maybe open a new issue?

I'm still affected by this and I could use some advice on how to debug
the issue a little bit better, especially since the kexec kernel
crashdumps appear not to be helpful. Can I maybe compile the module with
special debug flags and load it via dkms or something?

I don't see any actual changes in [cdc_ncm.c][cdc_ncm], besides the one
change in `cdc_ncm_unbind`.

Also, I'm confused why this is happening again now. I managed to do an
rsync upload of ~10GB overnight back then, and my system didn't
crash, but right now my laptop freezes even if I'm just trying to upload
a picture to Twitter via Firefox.

[cdc_ncm]:
https://github.com/torvalds/linux/commits/master/drivers/net/usb/cdc_ncm.c

Bjørn Mork

Aug 31, 2018, 5:30:03 AM

Phil <deb...@anm.c0mm.it> writes:

> Hi, I hope you're all doing well.
>
> Shall we/I maybe reopen a new issue?

I believe so. I am almost sure we fixed the original memset BUG in
cdc_ncm_fill_tx_frame. Or at least one of them...

So you are probably seeing another issue if you still have problems with
that fix in place. Although the issues may or may not be related. But
still, another bug report would make it easier to track given that we
already have one fix for 893393.

> I'm still affected by this and I'd could use some advice how to debug
> the issue a little bit better, especially since the kexec kernel
> crashdumps appear not to be helpful. Can I maybe compile the module with
> special debug flags and load it via. dkms or something?

Crash reports of some sort are best. But any info is useful. Like what
device is this really and what mode is it currently in? What driver
does it use? Most Huawei firmwares will support many different modes
using different USB drivers. But laptop internal modems are most likely
not tested with anything but the Windows MBIM class driver, since that
is the certification requirement and only target platform.

You can enable the little debugging that's already in the drivers by
doing something like

echo 'module cdc_ncm +fp' >/sys/kernel/debug/dynamic_debug/control
echo 'module cdc_mbim +fp' >/sys/kernel/debug/dynamic_debug/control
echo 'module huawei_cdc_ncm +fp' >/sys/kernel/debug/dynamic_debug/control

See https://www.kernel.org/doc/html/v4.11/admin-guide/dynamic-debug-howto.html

Not sure it will be useful to debug a freeze though.
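
For anyone who prefers to script this, here is a minimal sketch that builds the same three queries and writes each one to the control file separately. The helper names are hypothetical; only the control-file path, the module names and the `module <name> +fp` syntax come from the commands above. Writing requires root and a mounted debugfs, so the example run below only prints the queries.

```python
# Path and module names are taken from the commands above.
CONTROL = "/sys/kernel/debug/dynamic_debug/control"
MODULES = ["cdc_ncm", "cdc_mbim", "huawei_cdc_ncm"]


def build_queries(modules, flags="+fp"):
    """Return one dynamic-debug query per module.

    '+p' enables the debug messages and '+f' prefixes the function name;
    passing flags='-p' would switch the messages off again.
    """
    return [f"module {name} {flags}" for name in modules]


def apply_queries(queries, control=CONTROL):
    """Write each query to the control file in its own write (needs root)."""
    for query in queries:
        with open(control, "w") as handle:
            handle.write(query + "\n")


if __name__ == "__main__":
    # Print only; call apply_queries(build_queries(MODULES)) as root to apply.
    for line in build_queries(MODULES):
        print(line)
```

The resulting messages show up in the kernel log (dmesg), which is only helpful if the log survives the crash.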

> I don't see any actual changes in [cdc_ncm.c][cdc_ncm], besides the one
> change in `cdc_ncm_unbind`.

Not sure I understood this... Are you referring to the fix for bug
893393? That's part of the v4.9.111 stable release:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/usb/cdc_ncm.c?h=linux-4.9.y&id=35fd10aeb2248cc7f8d3d48ccc2eff1cf19918f4

> Also I'm confused why this is happening now again, I managed to do an
> rsync upload with ~10GB over night back then - and my system didn't
> crash - but right now even if I'm just trying to upload a picture to
> twitter via. Firefox my laptop freezes.

Freezing without any Oops or similar?



Bjørn