
hang on 2.6.33-rc4


Norbert Preining

Jan 15, 2010, 10:30:01 AM
Dear all,

kernel 2.6.33-rc4

I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
2.6.32.3 works fine.

I believe that it is related to the network, because sometimes I can
actually log in (GNOME session), and as soon as I do something network
related the machine suddenly hangs hard; not even SysRq works anymore.

Interestingly it only happens at a specific AP where the ESSID is
hidden (at work). At home I can work without any problems (ESSID not
hidden).

Unfortunately I cannot set up a serial console or similar.

Is there anything else I can provide to help track this down?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
POGES (pl.n.)
The lumps of dry powder that remain after cooking a packet soup.
--- Douglas Adams, The Meaning of Liff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

reinette chatre

Jan 15, 2010, 12:40:01 PM
Hi Norbert,

On Fri, 2010-01-15 at 07:22 -0800, Norbert Preining wrote:
> kernel 2.6.33-rc4
>
> I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> 2.6.32.3 works fine.
>
> I believe that it is related to the network, because sometimes I can
> actually log in (gnomes session) and as soon as I do some network
> related suddenly hard hang, not even Sysrq working anymore.
>
> Interestingly it only happens at a specific AP where the ESSID is
> hidden (at work). At home I can work without any problems (ESSID not
> hidden).
>
> Unfortunately I cannot set up a serial console or similar.

Does that mean no netconsole either? Does anything show up in the logs?
Is it easy to reproduce? If so, perhaps you can have increased debug at
that time and hopefully something will be captured in the logs when the
problem occurs.

>
> Is there still anything else I can provide you for tracking that down.

Can you try to boot without X and attempt a command line association
(using iw, iwconfig or wpa_supplicant) to reproduce?

Reinette

Norbert Preining

Jan 15, 2010, 9:40:02 PM
On Fr, 15 Jan 2010, reinette chatre wrote:
> Can you try to boot without X and attempt a command line association
> (using iw, iwconfig or wpa_supplicant) to reproduce?

I will try on Monday, when I am back at work.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

PIDDLETRENTHIDE (n.)
A trouser stain caused by a wimbledon (q.v.). Not to be confused with
a botley (q.v.)


--- Douglas Adams, The Meaning of Liff

Norbert Preining

Jan 16, 2010, 1:40:03 PM
On Fr, 15 Jan 2010, reinette chatre wrote:
> > kernel 2.6.33-rc4
> >
> > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > 2.6.32.3 works fine.
> >
> > I believe that it is related to the network, because sometimes I can
> > actually log in (gnomes session) and as soon as I do some network
> > related suddenly hard hang, not even Sysrq working anymore.
> >
> > Interestingly it only happens at a specific AP where the ESSID is
> > hidden (at work). At home I can work without any problems (ESSID not
> > hidden).
> >
> > Unfortunately I cannot set up a serial console or similar.
>
> Does that mean no netconsole either? Does anything show up in the logs?
> Is it easy to reproduce? If so, perhaps you can have increased debug at
> that time and hopefully something will be captured in the logs when the
> problem occurs.

Before I can test this on Monday, something else: I just got a BUG_ON:
Jan 17 03:28:58 mithrandir kernel: [34535.207253] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:0a:79:eb:56:10 tid = 0
Jan 17 03:28:58 mithrandir kernel: [34535.331218] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=139, seq_idx=3435, seq=54960
Jan 17 03:28:58 mithrandir kernel: [34535.331275] iwlagn 0000:06:00.0: Received BA when not expected
Jan 17 03:28:58 mithrandir kernel: [34535.331816] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=146, seq_idx=3442, seq=55072
Jan 17 03:28:58 mithrandir kernel: [34535.331915] iwlagn 0000:06:00.0: Received BA when not expected
Jan 17 03:28:58 mithrandir kernel: [34535.332419] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=170, seq_idx=3466, seq=55456

Actually, there are very many of these lines.
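
The numbers in these log lines can be decoded by hand. The sketch below is only an illustration and rests on assumptions not stated in the thread: that `seq` is the raw 16-bit 802.11 sequence control field, that `seq_idx` is its 12-bit sequence number, and that the driver expects `idx` to equal the low 8 bits of that sequence number. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Standard 802.11 sequence control layout:
 * bits 0-3 hold the fragment number, bits 4-15 the sequence number.
 */
static unsigned int seq_ctrl_to_sn(uint16_t seq_ctrl)
{
	return (seq_ctrl & 0xFFF0u) >> 4;
}

/*
 * Assumption: the TX queue index is expected to match the low
 * 8 bits of the sequence number.
 */
static unsigned int expected_idx(uint16_t seq_ctrl)
{
	return seq_ctrl_to_sn(seq_ctrl) & 0xFFu;
}
```

Under these assumptions, seq=54960 decodes to sequence number 3435, which matches the logged seq_idx, while its low 8 bits are 107 rather than the logged idx=139. The other two lines show the same offset of 32 (146 vs 114, 170 vs 138).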

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

LARGOWARD (n.)
Motorists' name for the kind of pedestrian who stands beside a main
road and waves on the traffic, as if it's their right of way.


--- Douglas Adams, The Meaning of Liff

reinette chatre

Jan 18, 2010, 12:10:01 PM
On Sat, 2010-01-16 at 10:30 -0800, Norbert Preining wrote:
> On Fr, 15 Jan 2010, reinette chatre wrote:
> > > kernel 2.6.33-rc4
> > >
> > > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > > 2.6.32.3 works fine.
> > >
> > > I believe that it is related to the network, because sometimes I can
> > > actually log in (gnomes session) and as soon as I do some network
> > > related suddenly hard hang, not even Sysrq working anymore.
> > >
> > > Interestingly it only happens at a specific AP where the ESSID is
> > > hidden (at work). At home I can work without any problems (ESSID not
> > > hidden).
> > >
> > > Unfortunately I cannot set up a serial console or similar.
> >
> > Does that mean no netconsole either? Does anything show up in the logs?
> > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > that time and hopefully something will be captured in the logs when the
> > problem occurs.
>
> Before I can test this on monday, something else, I just got BUG_ON:
> Jan 17 03:28:58 mithrandir kernel: [34535.207253] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:0a:79:eb:56:10 tid = 0
> Jan 17 03:28:58 mithrandir kernel: [34535.331218] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=139, seq_idx=3435, seq=54960
> Jan 17 03:28:58 mithrandir kernel: [34535.331275] iwlagn 0000:06:00.0: Received BA when not expected
> Jan 17 03:28:58 mithrandir kernel: [34535.331816] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=146, seq_idx=3442, seq=55072
> Jan 17 03:28:58 mithrandir kernel: [34535.331915] iwlagn 0000:06:00.0: Received BA when not expected
> Jan 17 03:28:58 mithrandir kernel: [34535.332419] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=170, seq_idx=3466, seq=55456
>
> Actually many many many of these lines.
>

What you are seeing here is currently being looked into at
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2098 - could you
please add your information there?

Reinette

Norbert Preining

Jan 18, 2010, 12:20:03 PM
Hi Reinette,

On Mo, 18 Jan 2010, reinette chatre wrote:
> > > Does that mean no netconsole either? Does anything show up in the logs?
> > > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > > that time and hopefully something will be captured in the logs when the
> > > problem occurs.

I tried it today, but had "real work" (university job) to do. It worked,
and I found out that (up to now) it did *NOT* happen when I was only
pinging a server, but when I ssh-ed into my server it hung.

More testing tomorrow (here it is already 2am).

BTW, the logs were empty, unfortunately; it was a complete hard hang.

> > Jan 17 03:28:58 mithrandir kernel: [34535.332419] iwlagn 0000:06:00.0: BUG_ON idx doesn't match seq control idx=170, seq_idx=3466, seq=55456
> >
> > Actually many many many of these lines.
> >
>
> What you are seeing here is currently being looked into at
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2098 - could you
> please add your information there?

I did that, although I was not sure what information to provide.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

THURNBY (n.)
A rucked-up edge of carpet or linoleum which everyone says someone
will trip over and break a leg unless it gets fixed. After a year or
two someone trips over it and breaks a leg.


--- Douglas Adams, The Meaning of Liff

Johannes Berg

Jan 19, 2010, 3:30:02 PM
On Tue, 2010-01-19 at 09:01 -0800, reinette chatre wrote:

> > If you want me to create a bug report or you create one in bugzilla,
> > I can also upload it there, but I attach it for now.
>
> I see that it fails in skb_pull after being called from one of the RX
> handlers. Let's add Johannes.
>
> Johannes, does anything perhaps look familiar to you in this trace?

Sorry, no, seems weird. The trace is not very useful unfortunately, is
this with CONFIG_FRAME_POINTER?

johannes


Norbert Preining

Jan 19, 2010, 7:40:02 PM
On Di, 19 Jan 2010, Johannes Berg wrote:
> Sorry, no, seems weird. The trace is not very useful unfortunately, is
> this with CONFIG_FRAME_POINTER?

# CONFIG_FRAME_POINTER is not set

Do you need it?

Other things for the .config needed?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

GREAT TOSSON (n.)
A fat book containing four words and six cartoons which cost £6.95.


--- Douglas Adams, The Meaning of Liff

Zhu Yi

Jan 20, 2010, 4:40:02 AM
On Mon, 2010-01-18 at 22:47 -0700, Norbert Preining wrote:
> Hi Reinette,

>
> On Fr, 15 Jan 2010, reinette chatre wrote:
> > > I am having repeatable complete hard lockups on my laptop with 2.6.33-rc4.
> > > 2.6.32.3 works fine.
> > >
> > > I believe that it is related to the network, because sometimes I can
> > > actually log in (gnomes session) and as soon as I do some network
> > > related suddenly hard hang, not even Sysrq working anymore.
> > >
> > > Interestingly it only happens at a specific AP where the ESSID is
> > > hidden (at work). At home I can work without any problems (ESSID not
> > > hidden).
> > >
> > > Unfortunately I cannot set up a serial console or similar.
> >
> > Does that mean no netconsole either? Does anything show up in the logs?
> > Is it easy to reproduce? If so, perhaps you can have increased debug at
> > that time and hopefully something will be captured in the logs when the
> > problem occurs.
>
> Ok, I can confirm that setting up the network is not the problem, nor
> is it pinging other hosts. But ssh-ing into another server
> made it go boom. From the screenshot I attach it looks like something
> in TCP code (that explains why it does not happen in pings), below
> I see tcp_data_snd_check
>
> I managed to switch in time to a console with tail -f syslog before
> it hard locked up. The log files are empty, but I got a screenshot photo
> which has some hopefully useful information. I cannot scroll up or down
> anymore ...

Looks like this is the BUG_ON in skb_pull. Could you check whether this
patch helps? BTW, are you using swiotlb?

diff --git a/drivers/net/wireless/iwlwifi/iwl-rx.c b/drivers/net/wireless/iwlwifi/iwl-rx.c
index 6f36b6e..2f8978f 100644
--- a/drivers/net/wireless/iwlwifi/iwl-rx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-rx.c
@@ -1031,6 +1031,11 @@ void iwl_rx_reply_rx(struct iwl_priv *priv,
 		return;
 	}

+	if (len < ieee80211_hdrlen(header->frame_control)) {
+		IWL_DEBUG_RX(priv, "Packet size is too small %d\n", len);
+		return;
+	}
+
 	/* This will be used in several places later */
 	rate_n_flags = le32_to_cpu(phy_res->rate_n_flags);
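
To illustrate what the added check guards against, here is a minimal userspace sketch, not the kernel code itself. `hdrlen_from_fc` is a hypothetical, simplified stand-in for `ieee80211_hdrlen()`: it takes the frame control field in host byte order and ignores less common cases such as an HT control field. `frame_is_long_enough` mirrors the `len < hdrlen` test from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Frame control bit masks from the 802.11 standard (host byte order). */
#define FC_FTYPE_MASK 0x000Cu
#define FC_FTYPE_CTL  0x0004u
#define FC_FTYPE_DATA 0x0008u
#define FC_STYPE_MASK 0x00F0u
#define FC_STYPE_CTS  0x00C0u
#define FC_STYPE_ACK  0x00D0u
#define FC_QOS_BIT    0x0080u /* set in the subtype of QoS data frames */
#define FC_TODS       0x0100u
#define FC_FROMDS     0x0200u

/* Simplified header length computation (hypothetical helper). */
static size_t hdrlen_from_fc(uint16_t fc)
{
	size_t hdrlen = 24; /* default: three-address header */

	if ((fc & FC_FTYPE_MASK) == FC_FTYPE_DATA) {
		if ((fc & FC_TODS) && (fc & FC_FROMDS))
			hdrlen = 30; /* fourth address present */
		if (fc & FC_QOS_BIT)
			hdrlen += 2; /* QoS control field */
	} else if ((fc & FC_FTYPE_MASK) == FC_FTYPE_CTL) {
		uint16_t stype = fc & FC_STYPE_MASK;

		/* ACK and CTS are 10 bytes, other control frames 16. */
		hdrlen = (stype == FC_STYPE_CTS || stype == FC_STYPE_ACK) ? 10 : 16;
	}
	return hdrlen;
}

/* Mirrors the patch: reject frames shorter than their own header. */
static bool frame_is_long_enough(uint16_t fc, size_t len)
{
	return len >= hdrlen_from_fc(fc);
}
```

For example, a QoS data frame (frame control 0x0088) needs at least 26 bytes, so a frame reported with a length of 20 would be dropped instead of being passed on to code that strips the header. Pulling a header longer than the buffer is presumably how the BUG_ON in skb_pull was reached, though the thread does not confirm this.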

Johannes Berg

Jan 20, 2010, 5:00:02 AM
On Wed, 2010-01-20 at 01:36 +0100, Norbert Preining wrote:
> On Di, 19 Jan 2010, Johannes Berg wrote:
> > Sorry, no, seems weird. The trace is not very useful unfortunately,
> is
> > this with CONFIG_FRAME_POINTER?
>
> # CONFIG_FRAME_POINTER is not set
>
> Do you need it?

The stacktrace would be a lot more useful with it set, yes. Other than
that, I don't know. If there's a way to make your display resolution
higher that might be useful so more info fits on the screen, or maybe
trimming the stack trace depth (though I don't know if that's possible,
I do know it is on powerpc because I added it there but not sure on x86)

All assuming you can reproduce this issue, of course.

johannes


Norbert Preining

Jan 20, 2010, 7:30:01 PM
Dear all,

On Mi, 20 Jan 2010, Zhu Yi wrote:
> Looks like this this is the BUG_ON in skb_pull. Please try if this patch
> help? BTW, are you using swiotlb?

On Mi, 20 Jan 2010, Johannes Berg wrote:
> > # CONFIG_FRAME_POINTER is not set
>

> The stacktrace would be a lot more useful with it set, yes. Other than
> that, I don't know. If there's a way to make your display resolution
> higher that might be useful so more info fits on the screen, or maybe
> trimming the stack trace depth (though I don't know if that's possible,
> I do know it is on powerpc because I added it there but not sure on x86)
>
> All assuming you can reproduce this issue, of course.


@Zhu: the patch didn't help. I patched it into the kernel and activated
CONFIG_FRAME_POINTER, which led to the same hang (not surprisingly, the
patch only adds more debugging ;-)

Unfortunately, this time there was too much output to actually capture it.

@Johannes: 100% reproducible. Every time I boot into 33-rc4 and ssh into
any remote place it goes boom. 100%.

Maybe another tidbit might help: with 2.6.32.3 I sometimes have
hiccups with WLAN:
[ 996.514491] iwlagn 0000:06:00.0: iwl_tx_agg_start on ra = 00:24:c4:ab:bb:42 tid = 0
and the connection needs 10-20 secs (hard to guess) until it is
back alive.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

AITH (n.)
The single bristle that sticks out sideways on a cheap paintbrush.


--- Douglas Adams, The Meaning of Liff

John Ranson

Jan 21, 2010, 1:20:02 PM
Do you have a point and shoot camera that can shoot video? I've used
one in the past to capture debug info that scrolls by too quickly.

John


Norbert Preining

Jan 27, 2010, 10:40:02 AM
Hi everyone,

On Mi, 20 Jan 2010, Zhu Yi wrote:

> Looks like this this is the BUG_ON in skb_pull. Please try if this patch
> help? BTW, are you using swiotlb?

As said, no it does not help.

I am currently running 2.6.33-rc5, and the bug is 100% reproducible
at my workplace.

Is there anything more I can do?

Should we move that to a bugzilla entry?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

CAMER (n.)
A mis-tossed caber.


--- Douglas Adams, The Meaning of Liff

reinette chatre

Jan 27, 2010, 12:00:02 PM
On Wed, 2010-01-27 at 07:37 -0800, Norbert Preining wrote:

> Should we move that to a bugzilla entry?
>

Please do. Thank you very much

Reinette

Norbert Preining

Jan 28, 2010, 3:50:01 AM
On Mi, 27 Jan 2010, reinette chatre wrote:
> > Should we move that to a bugzilla entry?
> >
>
> Please do. Thank you very much

Done that,
http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2155

Should this also be filed in the kernel bugzilla as a regression?
I mean, 2.6.32.N does not suffer from it.

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan TU Wien, Austria Debian TeX Task Force
DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------

OSHKOSH (n., vb.)
The noise made by someone who has just been grossly flattered and is
trying to make light of it.


--- Douglas Adams, The Meaning of Liff

reinette chatre

Jan 28, 2010, 11:30:02 AM
On Thu, 2010-01-28 at 00:41 -0800, Norbert Preining wrote:

> Done that,
> http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2155

Thank you.

>
> Should that be a bug in the kernel bugzilla as regression, too?
> I mean 2.6.32.N does not suffer from that.

It is easier for our team to track bugs in our bugzilla. That is where
we will be working on resolving this issue. If you would like to submit
a kernel bugzilla for the purpose of tracking a regression, you are
welcome to. Please then use it only for that purpose and point people to
our bugzilla for the details and latest status of this issue.

Thank you

Reinette
