
Re: BUG in rt2x00lib_txdone() with 2.6.37-rc8


Michele Ballabio

Jan 12, 2011, 5:50:02 PM
On 01.01.2011, Stephen Boyd wrote:
>On 01/01/11 02:28, Heinz Diehl wrote:
>> On 31.12.2010, Stephen Boyd wrote:
>>
>>> [ 9085.714105] BUG: unable to handle kernel NULL pointer dereference at
>>> 00000000000000a4
>>> [ 9085.714816] IP: [<ffffffffa0025458>] rt2x00lib_txdone+0x36/0x249
>>> [rt2x00lib]
>>> [ 9085.715017] PGD 215fd067 PUD 292f4067 PMD 0
>>> [ 9085.715017] Oops: 0000 [#1] SMP
>>> [ 9085.715017] last sysfs file:
>>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
>>> [ 9085.715017] CPU 1
>>> [ 9085.715017] Modules linked in: usb_storage thermal snd_seq_oss
>>> snd_seq_midi snd_seq_dummy snd_pcm_oss snd_mixer_oss snd_hrtimer
>>> snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event
>>> snd_seq_midi_emul snd_seq scsi_wait_scan powernow_k8 mperf i2c_i801 fuse
>>> fan snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_pcm
>>> snd_seq_device snd_timer snd_page_alloc snd_util_mem rt73usb crc_itu_t
>>> rt2x00usb snd_hwdep snd processor r8169 via82cxxx rt2x00lib soundcore
>>> mii button k8temp
>>> [ 9085.715017]
>>> [ 9085.715017] Pid: 11513, comm: kworker/1:0 Not tainted 2.6.37-rc7+ #27
>>> MS-7094/MS-7094
>> [....]
>>
>> I'm not quite sure about this, but it reminds me of this bug:
>> https://bugzilla.kernel.org/show_bug.cgi?id=24892
>>
>
>I believe I had the latest net tree merged into 2.6.37-rc7, in which case
>the patch mentioned in that bugzilla would already be applied.

I can confirm this bug is present in v2.6.37 but not in v2.6.36.
It seems to trigger at random, usually within 2-3 hours of boot
(sometimes within half an hour), and it leaves no trace in my log
files.

As Stephen said, most of the time the screen shows later oopses triggered
by this one, so the original oops is not easy to identify either.

Ingo Brunberg

Jan 13, 2011, 7:40:01 AM
I also hit this bug with 2.6.37. The first time it happened, the following
trace made it into my logs. Hopefully it helps.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffffa00983e4>] rt2x00lib_txdone+0x31/0x259 [rt2x00lib]
PGD a7011067 PUD ab9b2067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:13.2/usb2/2-3/2-3.4/2-3.4:1.0/firmware/2-3.4:1.0/loading
CPU 3
Modules linked in: aes_generic af_packet w83627ehf hwmon_vid ipv6 fbcon font bitblit softcursor dm_mod arc4 ecb crypto_blkcipher cryptomgr aead crypto_algapi rt73usb rt2x00usb rt2x00lib mac80211 cfg80211 usbhid hid radeon snd_hda_codec_realtek ttm r8169 drm_kms_helper sr_mod drm cdrom firewire_ohci snd_hda_intel i2c_piix4 bitrev 8250_pnp processor snd_hda_codec ohci_hcd thermal_sys ehci_hcd usbcore crc32 8250 i2c_algo_bit firewire_core i2c_core sg pata_atiixp crc_itu_t rtc button k10temp evdev hwmon snd_pcm snd_timer cfbcopyarea cfbimgblt snd floppy cfbfillrect serial_core mii nls_base soundcore snd_page_alloc

Pid: 3069, comm: kworker/3:0 Not tainted 2.6.37 #1 M3A785GXH/128M/To Be Filled By O.E.M.
RIP: 0010:[<ffffffffa00983e4>] [<ffffffffa00983e4>] rt2x00lib_txdone+0x31/0x259 [rt2x00lib]
RSP: 0018:ffff880094ad3d30 EFLAGS: 00010286
RAX: 0000000000000030 RBX: ffff88011df79980 RCX: 0000000000000014
RDX: 0000000000000101 RSI: ffff880094ad3d90 RDI: 0000000000000000
RBP: ffff88011ec37af8 R08: 0000000000000002 R09: ffffffff00000002
R10: 0000000000000286 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000028 R14: ffff880094ad3d90 R15: ffff88011df79c10
FS: 00007fc5bad23710(0000) GS:ffff8800cfd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000090 CR3: 00000000ab985000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/3:0 (pid: 3069, threadinfo ffff880094ad2000, task ffff88011ff08b20)
Stack:
ffff88011fc7e420 0000000000011000 0000000000000030 0000000000004000
ffff88011ec37af8 ffff88011dcb3af0 ffff88011df79980 ffff88011dcb3b40
ffff88011dcb3b40 0000000000000003 ffff88011df79c10 ffffffffa009862e
Call Trace:
[<ffffffffa009862e>] ? rt2x00lib_txdone_noinfo+0x22/0x27 [rt2x00lib]
[<ffffffffa0016316>] ? rt2x00usb_work_txdone+0x3e/0x6d [rt2x00usb]
[<ffffffffa0016a0d>] ? rt2x00usb_watchdog+0x69/0xe0 [rt2x00usb]
[<ffffffffa009aed9>] ? rt2x00link_watchdog+0x0/0x4a [rt2x00lib]
[<ffffffffa009af00>] ? rt2x00link_watchdog+0x27/0x4a [rt2x00lib]
[<ffffffff8104256e>] ? process_one_work+0x20e/0x34e
[<ffffffff81042a45>] ? worker_thread+0x1c9/0x340
[<ffffffff8102612e>] ? __wake_up_common+0x41/0x78
[<ffffffff8104287c>] ? worker_thread+0x0/0x340
[<ffffffff8104287c>] ? worker_thread+0x0/0x340
[<ffffffff810455a9>] ? kthread+0x7a/0x82
[<ffffffff81002cd4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8104552f>] ? kthread+0x0/0x82
[<ffffffff81002cd0>] ? kernel_thread_helper+0x0/0x10
Code: f6 41 55 41 54 55 48 89 fd 53 48 83 ec 28 4c 8b 67 10 48 8b 47 08 48 8b 18 49 8d 44 24 30 4c 89 e7 4d 8d 6c 24 28 48 89 44 24 10 <41> 8b 94 24 90 00 00 00 66 89 54 24 1e e8 1b 16 14 00 48 89 ef
RIP [<ffffffffa00983e4>] rt2x00lib_txdone+0x31/0x259 [rt2x00lib]
RSP <ffff880094ad3d30>
CR2: 0000000000000090
---[ end trace 2c6843a38ee68ff0 ]---
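
As a rough cross-check of the trace above, the bytes at the faulting RIP (the ones starting at the <41> marker in the Code: line) can be decoded by hand. The sketch below handles only that single encoding and is not a general x86 decoder; the helper name decode_disp32 is made up for illustration. It shows that the instruction reads from [r12 + 0x90], so with R12 = 0 in the register dump the access lands exactly on the CR2 value, which is consistent with a field at offset 0x90 being read through a NULL pointer.

```python
# Bytes at the faulting RIP, copied from the "Code:" line of the oops
# (starting at the <41> marker).
FAULTING_BYTES = bytes.fromhex("41 8b 94 24 90 00 00 00")


def decode_disp32(insn: bytes) -> int:
    """Return the 32-bit displacement of `mov edx, [r12 + disp32]`.

    Hypothetical helper: it only accepts the exact prefix 41 8b 94 24
    (REX.B, MOV r32 <- r/m32, ModRM with SIB and disp32, base r12).
    """
    if insn[:4] != bytes.fromhex("41 8b 94 24"):
        raise ValueError("unexpected encoding")
    return int.from_bytes(insn[4:8], "little", signed=True)


R12 = 0x0   # value of R12 in the register dump
CR2 = 0x90  # faulting address reported by the oops

disp = decode_disp32(FAULTING_BYTES)
fault_address = R12 + disp
print(hex(disp), hex(fault_address), fault_address == CR2)
# prints: 0x90 0x90 True
```

The earlier bytes 4c 8b 67 10 decode to a load of R12 from offset 0x10 of the first argument (RDI at function entry), so the NULL value appears to come from a pointer stored in the structure passed to rt2x00lib_txdone.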

Helmut Schaa

Jan 13, 2011, 8:30:01 AM
Hi,

On Thursday, 13 January 2011, Ingo Brunberg wrote:
> I also suffer from this bug with 2.6.37. The first time the following
> trace made it into my logs. Hopefully it might help.

Thanks for the trace!

Just a shot in the dark, but since the stack trace shows the newly added
watchdog, this might be the result of a race between the regular txdone work
(mac80211 workqueue) and the watchdog work (global workqueue).

I guess the following situation could happen:
A regular txdone work calls rt2x00lib_txdone, which first sets entry->skb to
NULL, then calls the driver-specific clear_entry, and only afterwards
increments Q_INDEX_DONE. If the watchdog work calls rt2x00lib_txdone on a
different CPU in between, the skb might be NULL and cause the above oops.
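
To make the suspected interleaving concrete, here is a small user-space model of it. This is only a sketch of the hypothesis, not the driver code: Entry, Queue, txdone_steps and run are hypothetical names, index_done stands in for Q_INDEX_DONE, and the generator yields mark the points where the other work item is assumed to be able to run.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one TX queue entry."""
    skb: Optional[dict]


class Queue:
    """Hypothetical stand-in for a TX queue with a 'done' index."""

    def __init__(self, size: int) -> None:
        self.entries = [Entry(skb={"len": 100 + i}) for i in range(size)]
        self.index_done = 0  # plays the role of Q_INDEX_DONE


def txdone_steps(queue: Queue, log: List[str], who: str) -> Iterator[None]:
    """Model of the txdone sequence; each yield is a preemption point."""
    entry = queue.entries[queue.index_done]
    if entry.skb is None:
        # The real code would dereference the NULL skb here and oops.
        log.append(f"{who}: skb is None")
        return
    log.append(f"{who}: completed skb len={entry.skb['len']}")
    entry.skb = None       # step 1: detach the skb
    yield
    # step 2: the driver-specific clear_entry would run here
    yield
    queue.index_done += 1  # step 3: advance the done index


def run(schedule: List[str]) -> List[str]:
    """Run both workers, resuming one step per name in `schedule`."""
    queue, log = Queue(2), []
    workers = {
        "txdone": txdone_steps(queue, log, "txdone"),
        "watchdog": txdone_steps(queue, log, "watchdog"),
    }
    for who in schedule:
        next(workers[who], None)  # resume until the next yield or the end
    return log


# Serialized: each worker finishes before the other starts.
print(run(["txdone"] * 3 + ["watchdog"] * 3))
# Interleaved: the watchdog runs after step 1 but before step 3.
print(run(["txdone", "watchdog"]))
```

In the serialized schedule each worker completes a different entry. In the interleaved schedule the watchdog picks up the same entry, because the done index has not advanced yet, and finds its skb already cleared.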

Ivo, does that sound reasonable?

Helmut

Ivo Van Doorn

Jan 15, 2011, 8:40:01 AM
Hi,

> Just a shot in the dark but since the stack trace shows the newly added
> watchdog this might be the result of a race between a regular txdone work
> (mac80211 workqueue) vs the watchdog work (global workqueue).
>
> I guess the following situation could happen:
> A regular tx done work calls rt2x00lib_txdone which first sets entry->skb to
> NULL, calls the driver specific clear_entry and afterwards increases
> Q_INDEX_DONE. If the watchdog work calls rt2x00lib_txdone on a different CPU
> in between the skb might be NULL and cause the above oops.

That could be. It would be interesting to know whether compat-wireless also
shows this problem, because the queue refactoring code that should have
solved these race conditions was added after 2.6.37.

Ivo

Helmut Schaa

Jan 15, 2011, 9:10:02 PM
On Saturday, 15 January 2011, Ivo Van Doorn wrote:
> Hi,
>
> > Just a shot in the dark but since the stack trace shows the newly added
> > watchdog this might be the result of a race between a regular txdone work
> > (mac80211 workqueue) vs the watchdog work (global workqueue).
> >
> > I guess the following situation could happen:
> > A regular tx done work calls rt2x00lib_txdone which first sets entry->skb to
> > NULL, calls the driver specific clear_entry and afterwards increases
> > Q_INDEX_DONE. If the watchdog work calls rt2x00lib_txdone on a different CPU
> > in between the skb might be NULL and cause the above oops.
>
> This could be, would be interesting to know if compat-wireless also shows
> this problem. Because the queue refactoring code which should have solved
> these race conditions was added after 2.6.37.

I would also expect this issue to be fixed in compat-wireless by the queue
refactoring, but that change is probably far too big for a stable kernel :(

Helmut

Ingo Brunberg

Jan 15, 2011, 10:00:02 PM
Ivo Van Doorn <ivd...@gmail.com> writes:

> This could be, would be interesting to know if compat-wireless also shows
> this problem. Because the queue refactoring code which should have solved
> these race conditions was added after 2.6.37.

I really would like to give it a try, but compat-wireless-2011-01-15
crashes right on module loading. Is there a version known to work?

Ivo Van Doorn

Jan 17, 2011, 3:40:01 PM
Hi,

On Sun, Jan 16, 2011 at 3:58 AM, Ingo Brunberg <ingo_b...@web.de> wrote:
> Ivo Van Doorn <ivd...@gmail.com> writes:
>
>> This could be, would be interesting to know if compat-wireless also shows
>> this problem. Because the queue refactoring code which should have solved
>> these race conditions was added after 2.6.37.
>
> I really would like to give it a try, but compat-wireless-2011-01-15
> crashes right on module loading. Is there a version known to work?

You could try the rt2x00-special package:
http://kernel.org/pub/linux/kernel/people/ivd/compat-rt2x00.tar.bz2
This is compat-wireless plus the rt2x00 patches from rt2x00.git.

It works for me without any crashes.

Ivo
