Re: BUG: scheduling while atomic

Michael Guntsche

unread,

Jun 27, 2009, 7:20:10 AM6/27/09

to

Hello list, Alan,

I got the same error with pptpgw as Sergey after switching form
2.6.30 to recent git (rc1).

[ 102.335486] BUG: scheduling while atomic: pptpgw/392/0x00000500
[ 102.341528] Modules linked in: emlog nf_nat_irc nf_conntrack_irc
nf_nat_rtsp nf_conntrack_rtsp
[ 102.350242] NIP: c00258d0 LR: c00258c8 CTR: c001cae8
[ 102.355240] REGS: c7a75d20 TRAP: 0501 Tainted: G W
(2.6.31-rc1)
[ 102.362148] MSR: 00009032 <EE,ME,IR,DR> CR: 24222482 XER: 20000000
[ 102.368609] TASK = c7966880[392] 'pptpgw' THREAD: c7a74000
[ 102.373949] GPR00: c01c101c c7a75dd0 c7966880 00000001 00190834
00000000 0014ff97 00000019
[ 102.382396] GPR08: 0834548b 00000004 00000004 c7a74000 00190834
[ 102.388671] NIP [c00258d0] __tasklet_schedule+0x80/0x90
[ 102.393935] LR [c00258c8] __tasklet_schedule+0x78/0x90
[ 102.399101] Call Trace:
[ 102.401576] [c7a75dd0] [c00258c8] __tasklet_schedule+0x78/0x90
(unreliable)
[ 102.408620] [c7a75de0] [c01c101c] ppp_asynctty_wakeup+0x90/0xbc
[ 102.414594] [c7a75e00] [c017037c] tty_wakeup+0x70/0xa0
[ 102.419773] [c7a75e10] [c0178cd4] pty_unthrottle+0x24/0x50
[ 102.425305] [c7a75e20] [c0175340] tty_unthrottle+0x5c/0x78
[ 102.430831] [c7a75e30] [c0172e7c] check_unthrottle+0x1c/0x2c
[ 102.436530] [c7a75e40] [c0174a14] n_tty_read+0x3f8/0x65c
[ 102.441881] [c7a75ec0] [c0171bf8] tty_read+0xa8/0xe4
[ 102.446891] [c7a75ef0] [c0084004] vfs_read+0xb4/0x1c0
[ 102.451990] [c7a75f10] [c008438c] sys_read+0x4c/0x90
[ 102.457008] [c7a75f40] [c00118a8] ret_from_syscall+0x0/0x38
[ 102.462620] --- Exception: c01 at 0xff2d510
[ 102.462629] LR = 0x100032e0
[ 102.470004] Instruction dump:
[ 102.473004] 60000020 90090040 4182001c 7fe00124 80010014 83e1000c
38210010 7c0803a6
[ 102.480840] 4e800020 4bfff6d5 7fe00124 80010014 <83e1000c> 38210010
7c0803a6 4e800020

this happens if I put some load on my ADSL line. I already have
ppp_async.c build with commit

commit a6540f731d506d9e82444cf0020e716613d4c46c
Author: Alan Cox <al...@linux.intel.com>
Date: Mon Jun 22 18:42:29 2009 +0100

ppp: Fix throttling bugs

So I do not understand why this is still happening here. My call trace
looks different than the one from Sergey so I do not know if I am
hitting the same problem.
2.6.30 is rock solid.

Kind regards,
Michael Guntsche
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Alan Cox

unread,

Jun 27, 2009, 9:10:11 AM6/27/09

to

On Sat, 27 Jun 2009 13:10:23 +0200
Michael Guntsche <mi...@it-loops.com> wrote:

> Hello list, Alan,
>
> I got the same error with pptpgw as Sergey after switching form
> 2.6.30 to recent git (rc1).

Yes - I've no idea what is going on with the pptpgw traces at the moment.

The path is

pptpgw reads from the tty side (it uses a pty/tty pair and runs
ppp over it)

This calls the ldisc n_tty_read
We copy some data and this makes room in the buffers
ww call check_unthrottle (with no spinlocks held)
This decides that thre is now room to ask the other end
to send more
which calls tty_unthrottle (which takes only a mutex)
which calls pty_unthrottle
which wakes the other side of the pty/tty pair for writing
this calls the ppp ldisc wakeup handler for the other side
which blows up somewhere around the tasklet code for no reason I
can make sense of yet

and all of this should be (and seems to be) getting called with sleeping
allowed and no spinlocks taken in error.

You could stuff calls to might_sleep() in pty_unthrottle and
check_unthrottle and see which ones are triggered if any.

Michael Guntsche

unread,

Jun 27, 2009, 12:10:09 PM6/27/09

to

On Jun 27, 2009, at 15:06, Alan Cox wrote:

> You could stuff calls to might_sleep() in pty_unthrottle and
> check_unthrottle and see which ones are triggered if any.

Hello Alan,

I put might_sleep in like this...

diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 94a5d50..f1882ae 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -149,6 +149,7 @@ static void put_tty_queue(unsigned char c, struct
tty_struct *tty)
static void check_unthrottle(struct tty_struct *tty)
{
if (tty->count)
+ might_sleep();
tty_unthrottle(tty);
}

diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index daebe1b..d4bc166 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -79,7 +79,8 @@ static void pty_unthrottle(struct tty_struct *tty)

if (!o_tty)
return;
-
+
+ might_sleep();
tty_wakeup(o_tty);
set_bit(TTY_THROTTLED, &tty->flags);
}

and then put some load in my ADSL line. Apparently I get the same dump

[ 87.152961] BUG: scheduling while atomic: pptpgw/392/0x00000400
[ 87.158939] Modules linked in: emlog nf_nat_irc nf_conntrack_irc
nf_nat_rtsp nf_conntrack_rtsp
[ 87.167644] NIP: c00258d0 LR: c00258c8 CTR: c001cae8
[ 87.172645] REGS: c7a61d20 TRAP: 0901 Tainted: G W
(2.6.31-rc1)
[ 87.179551] MSR: 00009032 <EE,ME,IR,DR> CR: 24222482 XER: 20000000
[ 87.185992] TASK = c794a400[392] 'pptpgw' THREAD: c7a60000
[ 87.191330] GPR00: c01c1010 c7a61dd0 c794a400 00000001 00190834
00000000 0014ff97 00000019
[ 87.199780] GPR08: 0834548b 00000004 00000004 c7a60000 00190834
[ 87.206055] NIP [c00258d0] __tasklet_schedule+0x80/0x90
[ 87.211313] LR [c00258c8] __tasklet_schedule+0x78/0x90
[ 87.216477] Call Trace:
[ 87.218949] [c7a61dd0] [c00258c8] __tasklet_schedule+0x78/0x90
(unreliable)
[ 87.225969] [c7a61de0] [c01c1010] ppp_asynctty_wakeup+0x90/0xbc
[ 87.231941] [c7a61e00] [c017037c] tty_wakeup+0x70/0xa0
[ 87.237123] [c7a61e10] [c0178cc8] pty_unthrottle+0x24/0x50
[ 87.242652] [c7a61e20] [c0175334] tty_unthrottle+0x5c/0x78
[ 87.248179] [c7a61e30] [c0172e70] check_unthrottle+0x10/0x20
[ 87.253882] [c7a61e40] [c0174a08] n_tty_read+0x3f8/0x65c
[ 87.259233] [c7a61ec0] [c0171bf8] tty_read+0xa8/0xe4
[ 87.264243] [c7a61ef0] [c0084004] vfs_read+0xb4/0x1c0
[ 87.269335] [c7a61f10] [c008438c] sys_read+0x4c/0x90
[ 87.274352] [c7a61f40] [c00118a8] ret_from_syscall+0x0/0x38
[ 87.279965] --- Exception: c01 at 0xff2d510
[ 87.279974] LR = 0x100032e0
[ 87.287316] Instruction dump:
[ 87.290309] 60000020 90090040 4182001c 7fe00124 80010014 83e1000c
38210010 7c0803a6
[ 87.298145] 4e800020 4bfff6d5 7fe00124 80010014 <83e1000c> 38210010
7c0803a6 4e800020

after that (120 seconds later) I see..

[ 360.294730] INFO: task pptpgw:392 blocked for more than 120 seconds.
[ 360.301147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 360.309018] pptpgw D 0ff2d510 0 392 390 0x00000000
[ 360.315257] Call Trace:
[ 360.317721] [c7a61870] [c794a400] 0xc794a400 (unreliable)
[ 360.323194] [c7a61930] [c0008b20] __switch_to+0x4c/0x6c
[ 360.328465] [c7a61940] [c02c6244] schedule+0x190/0x380
[ 360.333646] [c7a61980] [c02c6d7c] __mutex_lock_slowpath+0x80/0xd8
[ 360.339796] [c7a619b0] [c0175370] tty_throttle+0x20/0x78
[ 360.345151] [c7a619c0] [c01737f0] n_tty_receive_buf+0x344/0xfcc
[ 360.351115] [c7a61aa0] [c01789b8] pty_write+0x8c/0xdc
[ 360.356210] [c7a61ac0] [c01c1410] ppp_async_push+0x3d4/0x544
[ 360.361910] [c7a61b20] [c01c15d4] ppp_async_send+0x54/0x6c
[ 360.367438] [c7a61b30] [c01bd948] ppp_push+0x3f4/0x530
[ 360.372614] [c7a61b90] [c01be12c] ppp_xmit_process+0x444/0x548
[ 360.378488] [c7a61bf0] [c01bf62c] ppp_start_xmit+0x198/0x1d0
[ 360.384203] [c7a61c10] [c01f5554] dev_hard_start_xmit+0x1c0/0x350
[ 360.390341] [c7a61c40] [c0207010] __qdisc_run+0x1c8/0x250
[ 360.395784] [c7a61c70] [c01f78b0] net_tx_action+0xfc/0x11c
[ 360.401314] [c7a61c90] [c00256e0] __do_softirq+0xa8/0x120
[ 360.406767] [c7a61cd0] [c00064e4] do_softirq+0x58/0x5c
[ 360.411943] [c7a61ce0] [c0025530] irq_exit+0x94/0x98
[ 360.416952] [c7a61cf0] [c000e768] timer_interrupt+0x12c/0x188
[ 360.422750] [c7a61d10] [c0011f40] ret_from_except+0x0/0x14
[ 360.428282] --- Exception: 901 at __tasklet_schedule+0x80/0x90
[ 360.428293] LR = __tasklet_schedule+0x78/0x90
[ 360.438870] [c7a61de0] [c01c1010] ppp_asynctty_wakeup+0x90/0xbc
[ 360.444837] [c7a61e00] [c017037c] tty_wakeup+0x70/0xa0
[ 360.450015] [c7a61e10] [c0178cc8] pty_unthrottle+0x24/0x50
[ 360.455544] [c7a61e20] [c0175334] tty_unthrottle+0x5c/0x78
[ 360.461071] [c7a61e30] [c0172e70] check_unthrottle+0x10/0x20
[ 360.466783] [c7a61e40] [c0174a08] n_tty_read+0x3f8/0x65c
[ 360.472136] [c7a61ec0] [c0171bf8] tty_read+0xa8/0xe4
[ 360.477171] [c7a61ef0] [c0084004] vfs_read+0xb4/0x1c0
[ 360.482264] [c7a61f10] [c008438c] sys_read+0x4c/0x90
[ 360.487278] [c7a61f40] [c00118a8] ret_from_syscall+0x0/0x38
[ 360.492893] --- Exception: c01 at 0xff2d510
[ 360.492902] LR = 0x100032e0

I am not sure if the might_sleep entries are at the correct position
but it apparently did not make a difference.

Kind regards,
Michael

Rafael J. Wysocki

unread,

Jun 28, 2009, 8:00:13 PM6/28/09

to

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13522
Subject : BUG: scheduling while atomic
Submitter : Sergey Senozhatsky <sergey.se...@mail.by>
Date : 2009-06-12 19:29 (17 days old)

Alan Cox

unread,

Jun 29, 2009, 10:50:17 AM6/29/09

to

On Mon, 29 Jun 2009 01:51:09 +0200 (CEST)
"Rafael J. Wysocki" <r...@sisk.pl> wrote:

> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).

Yes although getting people seeing to test with might_sleep() checks in
the tty layer demonstrate the tty layer isn't calling into ppp with irqs
disabled when it shouldn't so I'm at a loss as to whether this is a
network bug, a scheduler bug or something else.

The schedule error in __tasklet_schedule() is itself rather peicular.

Sergey Senozhatsky

unread,

Jun 29, 2009, 12:10:09 PM6/29/09

to

On (06/29/09 15:41), Alan Cox wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
>
> Yes although getting people seeing to test with might_sleep() checks in
> the tty layer demonstrate the tty layer isn't calling into ppp with irqs
> disabled when it shouldn't so I'm at a loss as to whether this is a
> network bug, a scheduler bug or something else.

Looks like I can't reproduce with rc1-git5 (after several hours)...
I'll try -git3,-git4.

>
> The schedule error in __tasklet_schedule() is itself rather peicular.
>

Reported : 2009-06-28 22:07 by Rafael J. Wysocki
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13652
Subject : scheduling while atomic: pptpgw

Rafael, I'm not sure we need 'scheduling while atomic: pptpgw' bug.
Initially BUG 13522 was named "BUG: scheduling while atomic: pptpgw"
and renamed to "BUG: scheduling while atomic" since I was not sure wheter this is ppp or scheduler bug...

---
kernel: [ 1424.347514] BUG: scheduling while atomic: ksoftirqd/0/4/0x10000500
kernel: [27404.945372] BUG: scheduling while atomic: pptpgw/4637/0x00000400
kernel: [ 1715.273674] BUG: scheduling while atomic: ksoftirqd/0/4/0x10000500
kernel: [ 1631.073371] BUG: scheduling while atomic: pptpgw/3169/0x10000500
...
---

Sergey

Sergey Senozhatsky

unread,

Jun 29, 2009, 12:30:19 PM6/29/09

to

On (06/29/09 15:41), Alan Cox wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
>
> Yes although getting people seeing to test with might_sleep() checks in
> the tty layer demonstrate the tty layer isn't calling into ppp with irqs
> disabled when it shouldn't so I'm at a loss as to whether this is a
> network bug, a scheduler bug or something else.
>
> The schedule error in __tasklet_schedule() is itself rather peicular.
>

Hello Alan, Rafael,

BUG is still here...

kernel: [ 7331.719518] BUG: scheduling while atomic: pptpgw/4161/0x10000500
kernel: [ 7331.719526] Modules linked in: ppp_deflate zlib_deflate ppp_async crc_ccitt ppp_generic slhc ipv6 fuse loop snd_hda_codec_si3054 snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd pcspkr soundcore psmouse serio_raw snd_page_alloc i2c_i801 rng_core evdev
asus_laptop usbhid hid sg sd_mod sr_mod cdrom ata_generic pata_acpi uhci_hcd ata_piix sdhci_pci sdhci ricoh_mmc mmc_core led_class ide_pci_generic ehci_hcd r8169 mii usbcore
kernel: [ 7331.719627]
kernel: [ 7331.719636] Pid: 4161, comm: pptpgw Not tainted (2.6.31-rc1-dbgnv-git4 #32) F3JC
kernel: [ 7331.719644] EIP: 0060:[<c13ffcd6>] EFLAGS: 00200246 CPU: 0
kernel: [ 7331.719657] EIP is at _spin_unlock_irqrestore+0x16/0x30
kernel: [ 7331.719664] EAX: c1690300 EBX: 00200246 ECX: 00000000 EDX: 00200246
kernel: [ 7331.719671] ESI: dcf68398 EDI: 0000003a EBP: dcf61e94 ESP: dcf61e90
kernel: [ 7331.719677] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: [ 7331.719684] CR0: 8005003b CR2: b6f77000 CR3: 1b4c3000 CR4: 000006d0
kernel: [ 7331.719690] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
kernel: [ 7331.719696] DR6: ffff0ff0 DR7: 00000400
kernel: [ 7331.719701] Call Trace:
kernel: [ 7331.719714] [<c12d002b>] tty_ldisc_deref+0x5b/0x90
kernel: [ 7331.719724] [<c12d16f5>] pty_write+0x85/0x90
kernel: [ 7331.719734] [<c12cbde3>] n_tty_write+0x303/0x410
kernel: [ 7331.719745] [<c103d8f0>] ? default_wake_function+0x0/0x40
kernel: [ 7331.719754] [<c12c8fc3>] tty_write+0x173/0x250
kernel: [ 7331.719763] [<c10ebcaf>] ? rw_verify_area+0x6f/0xf0
kernel: [ 7331.719772] [<c12cbae0>] ? n_tty_write+0x0/0x410
kernel: [ 7331.719781] [<c10ec89e>] vfs_write+0xbe/0x1c0
kernel: [ 7331.719790] [<c12c8e50>] ? tty_write+0x0/0x250
kernel: [ 7331.719798] [<c10ecaa8>] sys_write+0x58/0xa0
kernel: [ 7331.719808] [<c100324b>] sysenter_do_call+0x12/0x22

Sergey

Alan Cox

unread,

Jun 29, 2009, 12:40:11 PM6/29/09

to

> BUG is still here...

> kernel: [ 7331.719657] EIP is at _spin_unlock_irqrestore+0x16/0x30

Which doesn't reschedule - sorry at this point I can't help you any
further. The traces make no sense.

Arjan van de Ven

unread,

Jun 30, 2009, 1:20:05 AM6/30/09

to

On Mon, 29 Jun 2009 19:28:53 +0300
Sergey Senozhatsky <sergey.se...@mail.by> wrote:

> On (06/29/09 15:41), Alan Cox wrote:
> > > This message has been generated automatically as a part of a
> > > report of recent regressions.
> > >
> > > The following bug entry is on the current list of known
> > > regressions from 2.6.30. Please verify if it still should be
> > > listed and let me know (either way).
> >
> > Yes although getting people seeing to test with might_sleep()
> > checks in the tty layer demonstrate the tty layer isn't calling
> > into ppp with irqs disabled when it shouldn't so I'm at a loss as
> > to whether this is a network bug, a scheduler bug or something else.
> >
> > The schedule error in __tasklet_schedule() is itself rather
> > peicular.
> >
>
> Hello Alan, Rafael,
>
> BUG is still here...

can you enable CONFIG_DEBUG_INFO, and then run this through
scripts/markup_oops.pl ? With a bit of luck it'll point to the exact
code that's going wrong...

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Sergey Senozhatsky

unread,

Jun 30, 2009, 3:20:14 AM6/30/09

to

On (06/29/09 22:15), Arjan van de Ven wrote:
> > ...

> > Hello Alan, Rafael,
> >
> > BUG is still here...
>
>
> can you enable CONFIG_DEBUG_INFO, and then run this through
> scripts/markup_oops.pl ? With a bit of luck it'll point to the exact
> code that's going wrong...
>

Hello Arjan. Sure. I'll try.

btw, I've updated bug description:

kernel: [ 984.626653] BUG: scheduling while atomic: pptpgw/3347/0x10000500
kernel: [ 984.626661] Modules linked in: ppp_deflate zlib_deflate ppp_async crc_ccitt ppp_generic slhc ipv6 fuse loop snd_hda_codec_si3054 snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore pcspkr psmouse rng_core serio_raw i2c_i801 snd_page_alloc
asus_laptop evdev usbhid hid sg sr_mod cdrom sd_mod ata_generic pata_acpi ricoh_mmc sdhci_pci uhci_hcd sdhci mmc_core led_class ehci_hcd r8169 mii ata_piix ide_pci_generic usbcore
kernel: [ 984.626763]
kernel: [ 984.626771] Pid: 3347, comm: pptpgw Not tainted (2.6.31-rc1-dbgnv-git5 #2) F3JC
kernel: [ 984.626780] EIP: 0060:[<c103a1fc>] EFLAGS: 00200246 CPU: 0
kernel: [ 984.626794] EIP is at finish_task_switch+0x4c/0xc0
kernel: [ 984.626800] EAX: 00000000 EBX: c1eaa380 ECX: 00000001 EDX: c1407ee0
kernel: [ 984.626807] ESI: f5c72200 EDI: f72fc070 EBP: e4f87f20 ESP: e4f87f0c
kernel: [ 984.626813] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: [ 984.626819] CR0: 8005003b CR2: b6f399ae CR3: 24f80000 CR4: 000006d0
kernel: [ 984.626825] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
kernel: [ 984.626831] DR6: ffff0ff0 DR7: 00000400
kernel: [ 984.626836] Call Trace:
kernel: [ 984.626848] [<c13fcfef>] schedule+0x41f/0xb20
kernel: [ 984.626859] [<c10ecdb8>] ? sys_read+0x58/0xa0
kernel: [ 984.626869] [<c10033a2>] work_resched+0x5/0x1a
kernel: [ 1051.216759] BUG: scheduling while atomic: pptpgw/3347/0x10000500
kernel: [ 1051.216768] Modules linked in: ppp_deflate zlib_deflate ppp_async crc_ccitt ppp_generic slhc ipv6 fuse loop snd_hda_codec_si3054 snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore pcspkr psmouse rng_core serio_raw i2c_i801 snd_page_alloc
asus_laptop evdev usbhid hid sg sr_mod cdrom sd_mod ata_generic pata_acpi ricoh_mmc sdhci_pci uhci_hcd sdhci mmc_core led_class ehci_hcd r8169 mii ata_piix ide_pci_generic usbcore
kernel: [ 1051.216869]
kernel: [ 1051.216878] Pid: 3347, comm: pptpgw Not tainted (2.6.31-rc1-dbgnv-git5 #2) F3JC
kernel: [ 1051.216886] EIP: 0060:[<c1049883>] EFLAGS: 00200246 CPU: 0
kernel: [ 1051.216901] EIP is at __tasklet_schedule+0x53/0x80
kernel: [ 1051.216907] EAX: 00000000 EBX: 00200246 ECX: 00000000 EDX: 00000000
kernel: [ 1051.216914] ESI: 00000000 EDI: e556dd10 EBP: e4f87e70 ESP: e4f87e68
kernel: [ 1051.216921] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: [ 1051.216927] CR0: 8005003b CR2: b7085000 CR3: 24f80000 CR4: 000006d0
kernel: [ 1051.216934] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
kernel: [ 1051.216939] DR6: ffff0ff0 DR7: 00000400
kernel: [ 1051.216945] Call Trace:
kernel: [ 1051.216960] [<f80b376c>] ppp_asynctty_receive+0x3dc/0x510 [ppp_async]
kernel: [ 1051.216975] [<c12d161d>] pty_write+0x7d/0x90
kernel: [ 1051.216985] [<c12cbd13>] n_tty_write+0x303/0x410
kernel: [ 1051.216996] [<c103d8f0>] ? default_wake_function+0x0/0x40
kernel: [ 1051.217005] [<c12c8ef3>] tty_write+0x173/0x250
kernel: [ 1051.217015] [<c10ebcaf>] ? rw_verify_area+0x6f/0xf0
kernel: [ 1051.217025] [<c12cba10>] ? n_tty_write+0x0/0x410
kernel: [ 1051.217034] [<c10ec89e>] vfs_write+0xbe/0x1c0
kernel: [ 1051.217043] [<c12c8d80>] ? tty_write+0x0/0x250
kernel: [ 1051.217052] [<c10ecaa8>] sys_write+0x58/0xa0
kernel: [ 1051.217062] [<c100324b>] sysenter_do_call+0x12/0x22

Sergey

Sergey Senozhatsky

unread,

Jun 30, 2009, 6:00:13 PM6/30/09

to

Hello,

rc1-git6. Hope this will help.

dmesg | ./scripts/markup_oops.pl

*__get_cpu_var(tasklet_vec).tail = t;
__get_cpu_var(tasklet_vec).tail = &(t->next);
raise_softirq_irqoff(TASKLET_SOFTIRQ);
local_irq_restore(flags);
c104c4fb: f7 c6 00 02 00 00 test $0x200,%esi
c104c501: 74 1d je c104c520 <__tasklet_schedule+0x80>
c104c503: e8 78 b5 02 00 call c1077a80 <trace_hardirqs_on>
return flags;
}

static inline void native_restore_fl(unsigned long flags)
{
asm volatile("push %0 ; popf"
c104c508: 56 push %esi | %esi => 200246
c104c509: 9d popf
}
*c104c50a: 8b 45 f4 mov -0xc(%ebp),%eax | %eax = 0 <--- faulting instruction
c104c50d: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
c104c514: 75 13 jne c104c529 <__tasklet_schedule+0x89>
c104c516: 83 c4 04 add $0x4,%esp
c104c519: 5b pop %ebx
c104c51a: 5e pop %esi
c104c51b: c9 leave
c104c51c: c3 ret
c104c51d: 8d 76 00 lea 0x0(%esi),%esi
c104c520: 56 push %esi
c104c521: 9d popf
local_irq_save(flags);
t->next = NULL;
*__get_cpu_var(tasklet_vec).tail = t;
__get_cpu_var(tasklet_vec).tail = &(t->next);
raise_softirq_irqoff(TASKLET_SOFTIRQ);
local_irq_restore(flags);
c104c522: e8 29 7d 02 00 call c1074250 <trace_hardirqs_off>
c104c527: eb e1 jmp c104c50a <__tasklet_schedule+0x6a>
}
c104c529: e8 c2 85 ff ff call c1044af0 <__stack_chk_fail>
c104c52e: 66 90 xchg %ax,%ax

c104c530 <raise_softirq>:
if (!in_interrupt())

syslog:

kernel: [47080.294012] BUG: scheduling while atomic: pptpgw/6053/0x10000400
kernel: [47080.294022] INFO: lockdep is turned off.
kernel: [47080.294028] Modules linked in: ppp_deflate zlib_deflate ppp_async crc_ccitt ppp_generic slhc ipv6 fuse loop snd_hda_codec_si3054 snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device pcspkr psmouse snd i2c_i801 rng_core serio_raw soundcore snd_page_alloc
asus_laptop evdev sg usbhid sr_mod hid cdrom sd_mod ata_generic pata_acpi ata_piix ricoh_mmc sdhci_pci sdhci ide_pci_generic uhci_hcd mmc_core led_class ehci_hcd usbcore r8169 mii
kernel: [47080.294165]
kernel: [47080.294175] Pid: 6053, comm: pptpgw Not tainted (2.6.31-rc1-dbgnv-git6 #1) F3JC
kernel: [47080.294184] EIP: 0060:[<c104c50a>] EFLAGS: 00200246 CPU: 0
kernel: [47080.294198] EIP is at __tasklet_schedule+0x6a/0x90
kernel: [47080.294205] EAX: 00000000 EBX: c31cf17c ECX: 00000000 EDX: 00000000
kernel: [47080.294213] ESI: 00200246 EDI: 00000fff EBP: c3271e60 ESP: c3271e54
kernel: [47080.294220] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: [47080.294228] CR0: 8005003b CR2: b7462000 CR3: 29de4000 CR4: 000006d0
kernel: [47080.294235] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
kernel: [47080.294242] DR6: ffff0ff0 DR7: 00000400
kernel: [47080.294248] Call Trace:
kernel: [47080.294263] [<f80b238b>] ppp_asynctty_wakeup+0x6b/0x80 [ppp_async]
kernel: [47080.294276] [<c12dea72>] tty_wakeup+0x72/0x90
kernel: [47080.294287] [<c12e7603>] pty_unthrottle+0x23/0x50
kernel: [47080.294296] [<c12e4072>] tty_unthrottle+0x42/0x70
kernel: [47080.294306] [<c12e254f>] n_tty_read+0x48f/0x730
kernel: [47080.294320] [<c1039f20>] ? default_wake_function+0x0/0x40
kernel: [47080.294332] [<c12dd1b0>] tty_read+0xa0/0xe0
kernel: [47080.294341] [<c12e20c0>] ? n_tty_read+0x0/0x730
kernel: [47080.294352] [<c10fb30b>] vfs_read+0xbb/0x1c0
kernel: [47080.294362] [<c12dd110>] ? tty_read+0x0/0xe0
kernel: [47080.294372] [<c10fbad8>] sys_read+0x58/0xa0
kernel: [47080.294383] [<c100339b>] sysenter_do_call+0x12/0x32
kernel: [47080.294401] BUG: scheduling while atomic: pptpgw/6053/0x00000400
kernel: [47080.294408] INFO: lockdep is turned off.
kernel: [47080.294413] Modules linked in: ppp_deflate zlib_deflate ppp_async crc_ccitt ppp_generic slhc ipv6 fuse loop snd_hda_codec_si3054 snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device pcspkr psmouse snd i2c_i801 rng_core serio_raw soundcore snd_page_alloc
asus_laptop evdev sg usbhid sr_mod hid cdrom sd_mod ata_generic pata_acpi ata_piix ricoh_mmc sdhci_pci sdhci ide_pci_generic uhci_hcd mmc_core led_class ehci_hcd usbcore r8169 mii
kernel: [47080.294548]
kernel: [47080.294555] Pid: 6053, comm: pptpgw Not tainted (2.6.31-rc1-dbgnv-git6 #1) F3JC
kernel: [47080.294564] EIP: 0060:[<c104c50a>] EFLAGS: 00200246 CPU: 0
kernel: [47080.294572] EIP is at __tasklet_schedule+0x6a/0x90
kernel: [47080.294579] EAX: 00000000 EBX: c31cf17c ECX: 00000000 EDX: 00000000
kernel: [47080.294586] ESI: 00200246 EDI: 00000fff EBP: c3271e60 ESP: c3271e54
kernel: [47080.294593] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: [47080.294600] CR0: 8005003b CR2: b7462000 CR3: 29de4000 CR4: 000006d0
kernel: [47080.294607] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
kernel: [47080.294614] DR6: ffff0ff0 DR7: 00000400
kernel: [47080.294619] Call Trace:
kernel: [47080.294631] [<f80b238b>] ppp_asynctty_wakeup+0x6b/0x80 [ppp_async]
kernel: [47080.294641] [<c12dea72>] tty_wakeup+0x72/0x90
kernel: [47080.294651] [<c12e7603>] pty_unthrottle+0x23/0x50
kernel: [47080.294661] [<c12e4072>] tty_unthrottle+0x42/0x70
kernel: [47080.294670] [<c12e254f>] n_tty_read+0x48f/0x730
kernel: [47080.294683] [<c1039f20>] ? default_wake_function+0x0/0x40
kernel: [47080.294695] [<c12dd1b0>] tty_read+0xa0/0xe0
kernel: [47080.294704] [<c12e20c0>] ? n_tty_read+0x0/0x730
kernel: [47080.294714] [<c10fb30b>] vfs_read+0xbb/0x1c0
kernel: [47080.294724] [<c12dd110>] ? tty_read+0x0/0xe0
kernel: [47080.294733] [<c10fbad8>] sys_read+0x58/0xa0
kernel: [47080.294744] [<c100339b>] sysenter_do_call+0x12/0x32
pptp[6064]: anon log[logecho:pptp_ctrl.c:677]: Echo Reply received.
kernel: [47113.649464] INFO: RCU detected CPU 1 stall (t=14044096/3000 jiffies)
kernel: [47113.649464] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47113.649464] Call Trace:
kernel: [47113.649464] [<c1417a58>] ? printk+0x23/0x36
kernel: [47113.649464] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47113.649464] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47113.649464] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47113.649464] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47113.649464] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47113.649464] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47113.649464] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47113.649464] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47113.649464] [<c1070ea6>] ? tick_program_event+0x36/0x60
kernel: [47113.649464] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47113.649464] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47113.649464] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47113.649464] [<c1257c46>] ? delay_tsc+0x26/0x80
kernel: [47113.649464] [<c1257ba8>] __delay+0x18/0x30
kernel: [47113.649464] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47113.649464] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47113.649464] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47113.649464] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47113.649464] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47113.649464] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47113.649464] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47113.649464] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47113.649464] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47113.649464] [<c100339b>] sysenter_do_call+0x12/0x32
/USR/SBIN/CRON[6450]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
kernel: [47143.649461] INFO: RCU detected CPU 1 stall (t=14053096/12000 jiffies)
kernel: [47143.649461] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47143.649461] Call Trace:
kernel: [47143.649461] [<c1417a58>] ? printk+0x23/0x36
kernel: [47143.649461] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47143.649461] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47143.649461] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47143.649461] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47143.649461] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47143.649461] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47143.649461] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47143.649461] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47143.649461] [<c1070ea6>] ? tick_program_event+0x36/0x60
kernel: [47143.649461] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47143.649461] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47143.649461] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47143.649461] [<c1257c65>] ? delay_tsc+0x45/0x80
kernel: [47143.649461] [<c1257ba8>] __delay+0x18/0x30
kernel: [47143.649461] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47143.649461] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47143.649461] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47143.649461] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47143.649461] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47143.649461] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47143.649461] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47143.649461] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47143.649461] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47143.649461] [<c100339b>] sysenter_do_call+0x12/0x32
pptp[6064]: anon log[logecho:pptp_ctrl.c:677]: Echo Reply received.
pptp[6064]: anon log[logecho:pptp_ctrl.c:679]: no more Echo Reply/Request packets will be reported.
kernel: [47173.649458] INFO: RCU detected CPU 1 stall (t=14062096/21000 jiffies)
kernel: [47173.649458] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47173.649458] Call Trace:
kernel: [47173.649458] [<c1417a58>] ? printk+0x23/0x36
kernel: [47173.649458] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47173.649458] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47173.649458] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47173.649458] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47173.649458] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47173.649458] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47173.649458] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47173.649458] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47173.649458] [<c1070ea6>] ? tick_program_event+0x36/0x60
kernel: [47173.649458] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47173.649458] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47173.649458] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47173.649458] [<c1257c65>] ? delay_tsc+0x45/0x80
kernel: [47173.649458] [<c1257ba8>] __delay+0x18/0x30
kernel: [47173.649458] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47173.649458] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47173.649458] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47173.649458] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47173.649458] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47173.649458] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47173.649458] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47173.649458] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47173.649458] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47173.649458] [<c100339b>] sysenter_do_call+0x12/0x32
kernel: [47203.649454] INFO: RCU detected CPU 1 stall (t=14071096/30000 jiffies)
kernel: [47203.649456] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47203.649456] Call Trace:
kernel: [47203.649456] [<c1417a58>] ? printk+0x23/0x36
kernel: [47203.649456] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47203.649456] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47203.649456] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47203.649456] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47203.649456] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47203.649456] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47203.649456] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47203.649456] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47203.649456] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47203.649456] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47203.649456] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47203.649456] [<c1257c65>] ? delay_tsc+0x45/0x80
kernel: [47203.649456] [<c1257ba8>] __delay+0x18/0x30
kernel: [47203.649456] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47203.649456] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47203.649456] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47203.649456] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47203.649456] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47203.649456] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47203.649456] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47203.649456] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47203.649456] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47203.649456] [<c100339b>] sysenter_do_call+0x12/0x32
kernel: [47233.649451] INFO: RCU detected CPU 1 stall (t=14080096/39000 jiffies)
kernel: [47233.649452] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47233.649452] Call Trace:
kernel: [47233.649452] [<c1417a58>] ? printk+0x23/0x36
kernel: [47233.649452] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47233.649452] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47233.649452] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47233.649452] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47233.649452] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47233.649452] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47233.649452] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47233.649452] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47233.649452] [<c1070ea6>] ? tick_program_event+0x36/0x60
kernel: [47233.649452] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47233.649452] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47233.649452] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47233.649452] [<c1257c63>] ? delay_tsc+0x43/0x80
kernel: [47233.649452] [<c1257ba8>] __delay+0x18/0x30
kernel: [47233.649452] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47233.649452] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47233.649452] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47233.649452] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47233.649452] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47233.649452] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47233.649452] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47233.649452] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47233.649452] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47233.649452] [<c100339b>] sysenter_do_call+0x12/0x32
kernel: [47263.649448] INFO: RCU detected CPU 1 stall (t=14089096/48000 jiffies)
kernel: [47263.649449] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47263.649449] Call Trace:
kernel: [47263.649449] [<c1417a58>] ? printk+0x23/0x36
kernel: [47263.649449] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47263.649449] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47263.649449] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47263.649449] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47263.649449] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47263.649449] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47263.649449] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47263.649449] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47263.649449] [<c1070ea6>] ? tick_program_event+0x36/0x60
kernel: [47263.649449] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47263.649449] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47263.649449] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47263.649449] [<c1257c65>] ? delay_tsc+0x45/0x80
kernel: [47263.649449] [<c1257ba8>] __delay+0x18/0x30
kernel: [47263.649449] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47263.649449] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47263.649449] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47263.649449] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47263.649449] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47263.649449] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47263.649449] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47263.649449] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47263.649449] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47263.649449] [<c100339b>] sysenter_do_call+0x12/0x32
kernel: [47293.649445] INFO: RCU detected CPU 1 stall (t=14098096/57000 jiffies)
kernel: [47293.649446] Pid: 6052, comm: pppd Not tainted 2.6.31-rc1-dbgnv-git6 #1
kernel: [47293.649446] Call Trace:
kernel: [47293.649446] [<c1417a58>] ? printk+0x23/0x36
kernel: [47293.649446] [<c10a8220>] __rcu_pending+0x140/0x210
kernel: [47293.649446] [<c10a831a>] rcu_pending+0x2a/0x70
kernel: [47293.649446] [<c1051bdf>] update_process_times+0x3f/0x80
kernel: [47293.649446] [<c10714df>] tick_sched_timer+0x6f/0xf0
kernel: [47293.649446] [<c10640f6>] __run_hrtimer+0x56/0xe0
kernel: [47293.649446] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47293.649446] [<c1071470>] ? tick_sched_timer+0x0/0xf0
kernel: [47293.649446] [<c1064af5>] hrtimer_interrupt+0x145/0x270
kernel: [47293.649446] [<c1070ea6>] ? tick_program_event+0x36/0x60
kernel: [47293.649446] [<c101c4bc>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [47293.649446] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [47293.649446] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [47293.649446] [<c1257c48>] ? delay_tsc+0x28/0x80
kernel: [47293.649446] [<c1257ba8>] __delay+0x18/0x30
kernel: [47293.649446] [<c125d956>] _raw_spin_lock+0xf6/0x170
kernel: [47293.649446] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47293.649446] [<c141b3d2>] _spin_lock_bh+0x62/0x90
kernel: [47293.649446] [<f8097e94>] ? ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47293.649446] [<f8097e94>] ppp_channel_push+0x24/0xd0 [ppp_generic]
kernel: [47293.649446] [<f8098057>] ppp_write+0xd7/0xe0 [ppp_generic]
kernel: [47293.649446] [<c10fafae>] vfs_write+0xbe/0x1c0
kernel: [47293.649446] [<f8097f80>] ? ppp_write+0x0/0xe0 [ppp_generic]
kernel: [47293.649446] [<c10fba38>] sys_write+0x58/0xa0
kernel: [47293.649446] [<c100339b>] sysenter_do_call+0x12/0x32

Sergey

Herbert Xu

unread,

Jul 6, 2009, 3:50:15 AM7/6/09

to

I think the first backtrace is bogus, this is where the real
problem is. So what see here is that on xmit from the network,
PPP will call pty_write from BH context which eventually leads
to tty_throttle and the sleep.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Herbert Xu

unread,

Jul 6, 2009, 4:40:12 AM7/6/09

to

On Mon, Jul 06, 2009 at 03:43:33PM +0800, Herbert Xu wrote:
>
> I think the first backtrace is bogus, this is where the real
> problem is. So what see here is that on xmit from the network,
> PPP will call pty_write from BH context which eventually leads
> to tty_throttle and the sleep.

Here's the stupidest possible fix for this, changing the mutex
to a spin lock. We should also revert the other commit that
tried to fix this, a6540f731d506d9e82444cf0020e716613d4c46c as
those unthrottles are really needed (the pty driver is always
in the throttled state, so that's why there are no explicit calls
to throttle).

tty: Use spin lock instead of mutex for throttling

Commit 38db89799bdf11625a831c5af33938dcb11908b6 (tty: throttling
race fix) added a mutex to the throttle/unthrottle paths. This
broke PPP because the PPP ldisc calls both throttle and unthrottle
from softirq contexts.

Since any tty device that's been used together with PPP has clearly
coped with throttling and unthrottling in softirq contexts, we can
turn the mutex into a spin lock.

The added protection of termios consistency does not appear to be
needed as it stands. When it is needed, we'll have to redo the
locking, or change the throttle/unthrottle implementation that
needs it.

Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index a3afa0c..3ee1968 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2727,6 +2727,7 @@ void initialize_tty_struct(struct tty_struct *tty,
mutex_init(&tty->echo_lock);
spin_lock_init(&tty->read_lock);
spin_lock_init(&tty->ctrl_lock);
+ spin_lock_init(&tty->throttle_lock);
INIT_LIST_HEAD(&tty->tty_files);
INIT_WORK(&tty->SAK_work, do_SAK_work);

diff --git a/drivers/char/tty_ioctl.c b/drivers/char/tty_ioctl.c
index b24f6c6..3b70a4e 100644
--- a/drivers/char/tty_ioctl.c
+++ b/drivers/char/tty_ioctl.c
@@ -22,6 +22,7 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/smp_lock.h>
+#include <linux/spinlock.h>

#include <asm/io.h>
#include <asm/uaccess.h>
@@ -97,19 +98,18 @@ EXPORT_SYMBOL(tty_driver_flush_buffer);
* @tty: terminal
*
* Indicate that a tty should stop transmitting data down the stack.
- * Takes the termios mutex to protect against parallel throttle/unthrottle
- * and also to ensure the driver can consistently reference its own
- * termios data at this point when implementing software flow control.
+ * Takes the throttle lock to protect against parallel throttle and
+ * unthrottle.
*/

void tty_throttle(struct tty_struct *tty)
{
- mutex_lock(&tty->termios_mutex);
+ spin_lock_bh(&tty->throttle_lock);
/* check TTY_THROTTLED first so it indicates our state */
if (!test_and_set_bit(TTY_THROTTLED, &tty->flags) &&
tty->ops->throttle)
tty->ops->throttle(tty);
- mutex_unlock(&tty->termios_mutex);
+ spin_unlock_bh(&tty->throttle_lock);
}
EXPORT_SYMBOL(tty_throttle);

@@ -118,9 +118,8 @@ EXPORT_SYMBOL(tty_throttle);
* @tty: terminal
*
* Indicate that a tty may continue transmitting data down the stack.
- * Takes the termios mutex to protect against parallel throttle/unthrottle
- * and also to ensure the driver can consistently reference its own
- * termios data at this point when implementing software flow control.
+ * Takes the throttle lock to protect against parallel throttle and
+ * unthrottle.
*
* Drivers should however remember that the stack can issue a throttle,
* then change flow control method, then unthrottle.
@@ -128,11 +127,11 @@ EXPORT_SYMBOL(tty_throttle);

void tty_unthrottle(struct tty_struct *tty)
{
- mutex_lock(&tty->termios_mutex);
+ spin_lock_bh(&tty->throttle_lock);
if (test_and_clear_bit(TTY_THROTTLED, &tty->flags) &&
tty->ops->unthrottle)
tty->ops->unthrottle(tty);
- mutex_unlock(&tty->termios_mutex);
+ spin_unlock_bh(&tty->throttle_lock);
}
EXPORT_SYMBOL(tty_unthrottle);

diff --git a/include/linux/tty.h b/include/linux/tty.h
index 1488d8c..1dc2d6b 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -233,6 +233,7 @@ struct tty_struct {

struct mutex termios_mutex;
spinlock_t ctrl_lock;
+ spinlock_t throttle_lock;
/* Termios values are protected by the termios mutex */
struct ktermios *termios, *termios_locked;
struct termiox *termiox; /* May be NULL for unsupported */
diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h
index 3566129..c068940 100644
--- a/include/linux/tty_driver.h
+++ b/include/linux/tty_driver.h
@@ -128,7 +128,7 @@
* signal that no more characters should be sent to the tty.
*
* Optional: Always invoke via tty_throttle(), called under the
- * termios lock.
+ * throttle lock.
*
* void (*unthrottle)(struct tty_struct * tty);
*
@@ -137,7 +137,7 @@
* overrunning the input buffers of the line disciplines.
*
* Optional: Always invoke via tty_unthrottle(), called under the
- * termios lock.
+ * throttle lock.
*
* void (*stop)(struct tty_struct *tty);
*

Alan Cox

unread,

Jul 6, 2009, 6:00:24 AM7/6/09

to

> So what see here is that on xmit from the network,
> PPP will call pty_write from BH context which eventually leads
> to tty_throttle and the sleep.

That would I fit the symptoms and also explain why some broken versions
of the USB code on embedded devices also exploded with PPP over USB links.

Not however at all easy to fix as it only ever worked in the past on the
"close both eyes, pray and jump" principle of locking.

Added to the todo list for later this week. The pty layer has all sorts
of magic hackery to allow it to call back and forth without going via
internal tty buffering like normal hardware. That may have to die as it
causes other horrible problems too.

Alan

Michael Guntsche

unread,

Jul 6, 2009, 11:30:16 AM7/6/09

to

On Jul 6, 2009, at 10:31, Herbert Xu wrote:

> On Mon, Jul 06, 2009 at 03:43:33PM +0800, Herbert Xu wrote:
>>
>> I think the first backtrace is bogus, this is where the real
>> problem is. So what see here is that on xmit from the network,
>> PPP will call pty_write from BH context which eventually leads
>> to tty_throttle and the sleep.
>
> Here's the stupidest possible fix for this, changing the mutex
> to a spin lock. We should also revert the other commit that
> tried to fix this, a6540f731d506d9e82444cf0020e716613d4c46c as
> those unthrottles are really needed

<snip>

Just to confirm.
I reverted the commit and applied the patch on-top of 2.6.31-rc2.
Up to now I was not able to reproduce the crashes I had before on my
setup here.

Kind regards,
Michael

Alan Cox

unread,

Jul 6, 2009, 11:40:11 AM7/6/09

to

> Here's the stupidest possible fix for this, changing the mutex
> to a spin lock. We should also revert the other commit that

Crashes my box immediately. The unthrottle methods call sleeping driver
functions with real hardware so

NAK

If it was that simple I'd have done this ages ago. It has to be fixed
properly, and that is hard.

Alan

Alan Cox

unread,

Jul 6, 2009, 1:20:08 PM7/6/09

to

This is the more drastic way of doing it. I had hoped to put this off but
in fact it cleans stuff up enormously. This has had only basic testing
but the underlying idea is to simply remove all the pty special casing
that causes messes in the first place.

pty: Rework the pty layer to use the normal buffering logic

From: Alan Cox <al...@linux.intel.com>

This fixes the ppp problems and various other issues with call locking
caused by one side of a pty called in one locking context trying to match
another with differing rules on the other side. We also get a big slack
space to work with that means we can bury the flow control deadlock case
for any conceivable real world situation.

Signed-off-by: Alan Cox <al...@linux.intel.com>
---

drivers/char/pty.c | 156 ++++++++++++++++++++--------------------------------
1 files changed, 60 insertions(+), 96 deletions(-)

diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index daebe1b..5448320 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -75,114 +75,88 @@ static void pty_close(struct tty_struct *tty, struct file *filp)
*/

static void pty_unthrottle(struct tty_struct *tty)

{
- struct tty_struct *o_tty = tty->link;
-
- if (!o_tty)
- return;
-
- tty_wakeup(o_tty);
+ tty_wakeup(tty->link);
set_bit(TTY_THROTTLED, &tty->flags);
}

-/*
- * WSH 05/24/97: modified to
- * (1) use space in tty->flip instead of a shared temp buffer
- * The flip buffers aren't being used for a pty, so there's lots
- * of space available. The buffer is protected by a per-pty
- * semaphore that should almost never come under contention.
- * (2) avoid redundant copying for cases where count >> receive_room
- * N.B. Calls from user space may now return an error code instead of
- * a count.
+/**
+ * pty_space - report space left for writing
+ * @to: tty we are writing into
*
- * FIXME: Our pty_write method is called with our ldisc lock held but
- * not our partners. We can't just wait on the other one blindly without
- * risking deadlocks. At some point when everything has settled down we need
- * to look into making pty_write at least able to sleep over an ldisc change.
+ * The tty buffers allow 64K but we sneak a peak and clip at 8K this
+ * allows a lot of overspill room for echo and other fun messes to
+ * be handled properly
+ */
+
+static int pty_space(struct tty_struct *to)
+{
+ int n = 8192 - to->buf.memory_used;
+ if (n < 0)
+ return 0;
+ return n;
+}
+
+/**
+ * pty_write - write to a pty
+ * @tty: the tty we write from
+ * @buf: kernel buffer of data
+ * @count: bytes to write
*
- * The return on no ldisc is a bit counter intuitive but the logic works
- * like this. During an ldisc change the other end will flush its buffers. We
- * thus return the full length which is identical to the case where we had
- * proper locking and happened to queue the bytes just before the flush during
- * the ldisc change.
+ * Our "hardware" write method. Data is coming from the ldisc which
+ * may be in a non sleeping state. We simply throw this at the other
+ * end of the link as if we were an IRQ handler receiving stuff for
+ * the other side of the pty/tty pair.
*/
+
static int pty_write(struct tty_struct *tty, const unsigned char *buf,
int count)
{
struct tty_struct *to = tty->link;
- struct tty_ldisc *ld;
- int c = count;
-
- if (!to || tty->stopped)
+ int c;
+
+ if (tty->stopped)
return 0;
- ld = tty_ldisc_ref(to);
-
- if (ld) {
- c = to->receive_room;
- if (c > count)
- c = count;
- ld->ops->receive_buf(to, buf, NULL, c);
- tty_ldisc_deref(ld);
+
+ /* This isn't locked but our 8K is quite sloppy so no
+ big deal */
+
+ c = pty_space(to);
+ if (c > count)
+ c = count;
+ if (c > 0) {
+ /* Stuff the data into the input queue of the other end */
+ c = tty_insert_flip_string(to, buf, c);
+ /* And shovel */
+ tty_flip_buffer_push(to);
+ tty_wakeup(tty);
}
return c;
}

+/**
+ * pty_write_room - write space
+ * @tty: tty we are writing from
+ *
+ * Report how many bytes the ldisc can send into the queue for
+ * the other device.
+ */
+
static int pty_write_room(struct tty_struct *tty)
{
- struct tty_struct *to = tty->link;
-
- if (!to || tty->stopped)
- return 0;
-
- return to->receive_room;
+ return pty_space(tty->link);
}

-/*
- * WSH 05/24/97: Modified for asymmetric MASTER/SLAVE behavior
- * The chars_in_buffer() value is used by the ldisc select() function
- * to hold off writing when chars_in_buffer > WAKEUP_CHARS (== 256).
- * The pty driver chars_in_buffer() Master/Slave must behave differently:
- *
- * The Master side needs to allow typed-ahead commands to accumulate
- * while being canonicalized, so we report "our buffer" as empty until
- * some threshold is reached, and then report the count. (Any count >
- * WAKEUP_CHARS is regarded by select() as "full".) To avoid deadlock
- * the count returned must be 0 if no canonical data is available to be
- * read. (The N_TTY ldisc.chars_in_buffer now knows this.)
+/**
+ * pty_chars_in_buffer - characters currently in our tx queue
+ * @tty: our tty
*
- * The Slave side passes all characters in raw mode to the Master side's
- * buffer where they can be read immediately, so in this case we can
- * return the true count in the buffer.
+ * Report how much we have in the transmit queue. As everything is
+ * instantly at the other end this is easy to implement.
*/
+
static int pty_chars_in_buffer(struct tty_struct *tty)
{
- struct tty_struct *to = tty->link;
- struct tty_ldisc *ld;
- int count = 0;
-
- /* We should get the line discipline lock for "tty->link" */
- if (!to)
- return 0;
- /* We cannot take a sleeping reference here without deadlocking with
- an ldisc change - but it doesn't really matter */
- ld = tty_ldisc_ref(to);
- if (ld == NULL)
- return 0;
-
- /* The ldisc must report 0 if no characters available to be read */
- if (ld->ops->chars_in_buffer)
- count = ld->ops->chars_in_buffer(to);
-
- tty_ldisc_deref(ld);
-
- if (tty->driver->subtype == PTY_TYPE_SLAVE)
- return count;
-
- /* Master side driver ... if the other side's read buffer is less than
- * half full, return 0 to allow writers to proceed; otherwise return
- * the count. This leaves a comfortable margin to avoid overflow,
- * and still allows half a buffer's worth of typed-ahead commands.
- */
- return (count < N_TTY_BUF_SIZE/2) ? 0 : count;
+ return 0;
}

/* Set the lock flag on a pty */
@@ -202,20 +176,10 @@ static void pty_flush_buffer(struct tty_struct *tty)
{
struct tty_struct *to = tty->link;
unsigned long flags;
- struct tty_ldisc *ld;

if (!to)
return;
- ld = tty_ldisc_ref(to);
-
- /* The other end is changing discipline */
- if (!ld)
- return;
-
- if (ld->ops->flush_buffer)
- to->ldisc->ops->flush_buffer(to);
- tty_ldisc_deref(ld);
-
+ /* tty_buffer_flush(to); FIXME */
if (to->packet) {
spin_lock_irqsave(&tty->ctrl_lock, flags);
tty->ctrl_status |= TIOCPKT_FLUSHWRITE;

Michael Guntsche

unread,

Jul 6, 2009, 2:10:10 PM7/6/09

to

On Jul 6, 2009, at 19:17, Alan Cox wrote:

> This is the more drastic way of doing it. I had hoped to put this
> off but
> in fact it cleans stuff up enormously. This has had only basic testing
> but the underlying idea is to simply remove all the pty special casing
> that causes messes in the first place.

<snip>

Hi Alan,

Must commit a6540f731d506d9e82444cf0020e716613d4c46c still be reverted
with your patch or can it stay in?

Kind regards,
Michael

Alan Cox

unread,

Jul 6, 2009, 2:20:09 PM7/6/09

to

On Mon, 6 Jul 2009 20:07:34 +0200
Michael Guntsche <mi...@it-loops.com> wrote:

>
> On Jul 6, 2009, at 19:17, Alan Cox wrote:
>
> > This is the more drastic way of doing it. I had hoped to put this
> > off but
> > in fact it cleans stuff up enormously. This has had only basic testing
> > but the underlying idea is to simply remove all the pty special casing
> > that causes messes in the first place.
> <snip>
>
> Hi Alan,
>
> Must commit a6540f731d506d9e82444cf0020e716613d4c46c still be reverted
> with your patch or can it stay in?

Herbert is (as usual ;)) correct that it should be reverted.

Alan

Michael Guntsche

unread,

Jul 6, 2009, 2:40:09 PM7/6/09

to

On Jul 6, 2009, at 20:18, Alan Cox wrote:

> On Mon, 6 Jul 2009 20:07:34 +0200
> Michael Guntsche <mi...@it-loops.com> wrote:
>
>>
>> On Jul 6, 2009, at 19:17, Alan Cox wrote:
>>
>>> This is the more drastic way of doing it. I had hoped to put this
>>> off but
>>> in fact it cleans stuff up enormously. This has had only basic
>>> testing
>>> but the underlying idea is to simply remove all the pty special
>>> casing
>>> that causes messes in the first place.
>> <snip>
>>
>> Hi Alan,
>>
>> Must commit a6540f731d506d9e82444cf0020e716613d4c46c still be
>> reverted
>> with your patch or can it stay in?
>
> Herbert is (as usual ;)) correct that it should be reverted.

No crashes with the reverted commit and Alan's patch on my hardware
here.
A Load Test that would trigger it after some seconds is now running
for several minutes without any problems.

Michael