
Re: 3.1-rc6+ rtl8192se issue


Larry Finger

Sep 15, 2011, 11:30:02 AM
On 09/15/2011 07:44 AM, Borislav Petkov wrote:
> Hi Larry,
>
> I'm experiencing an issue with rtl8192se since 3.1-ish timeframe where
> the machine becomes completely unresponsive and only a reboot helps
> the situation. I think the issue has to do with the rtl8192se wireless
> driver because if I connect the machine through ethernet, it runs pretty
> smoothly.
>
> Today, I left the machine on the vt console in the expectation of an
> oops or something to appear in the logs and logged into it from another
> machine over ssh.
>
> After a while, the unresponsiveness happened and the box didn't
> react to keyboard input except sysrq with which I was able to do the
> show-backtrace-all-active-cpus(L) thing and attached is a partial
> screen cap of that. It looks like the stuck-up happens somewhere in
> rtl_lps_leave() along the rtl92s_phy_set_rf_power_state() path but the
> register dump is missing with the exact %rIP.
>
> Anyway, pls take a look and let me know if it rings any bells. I'll
> continue trying to debug the issue, maybe I should bisect it if nothing
> else pops up.

Borislav,

Thanks for the report. I have been running rtl8192se for the past few days and I
have also noticed two such system freezes, but have not been able to capture any
info. As I have made many changes to my system recently, I did not know
what might be the cause, but rtl8192se is certainly on the suspect list.

I have added Chaoming Li to the Cc list. I will send him the screen photo
separately.

Some questions:

I expect that you are running a mainline kernel from Linus's tree. If not,
please let me know. Mine is 3.1-rc4 from the wireless-testing tree. I don't
recall any changes in our "next-flavored" version that are not in 3.1.

Which flavor of RTL8192SE card do you have? The one I'm running shows as
"Realtek Semiconductor Co., Ltd. RTL8191SEvB Wireless LAN Controller [10ec:8172]
(rev 10)", but I have two others. The differences are in the number of TX and RX
streams. Mine is the 1x2 variety.

How frequently do your freezes occur? As I said before, I have only had two in 2
or 3 days, which would make bisection tricky.

I see from the dump that you have x86_64 architecture. How many CPUs and how
fast? There is one questionable report of problems on a box with an 8-way fast
processor. That was on initialization and is not the same, but may indicate a
problem. My system has a dual AMD CPU at 2.0 GHz.

Would you also try loading rtl8192se with the "ips=0" option? As power save is
implicated in your traceback, that may help. I will be trying "swlps=0 ips=0".
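
A minimal sketch of reloading the driver with those options. The "ips" and "swlps" parameter names are the ones mentioned above; the run() wrapper and the DRY_RUN variable are hypothetical helpers, added only so that the commands can be previewed without root access or the hardware.

```shell
#!/bin/sh
# Sketch only: reload rtl8192se with power-save options turned off.
# DRY_RUN and run() are hypothetical helpers, not part of any tool.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"          # preview the command instead of running it
    else
        "$@"               # execute it (needs root)
    fi
}

reload_rtl8192se() {
    run modprobe -rv rtl8192se          # unload the running driver
    run modprobe -v rtl8192se "$@"      # load it again with the given options
}

reload_rtl8192se ips=0 swlps=0
```

With DRY_RUN left at its default the script only prints the two modprobe commands. To make an option persist across reboots, the usual mechanism is an "options rtl8192se ips=0" line in a file under /etc/modprobe.d/.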

Thanks for the report,

Larry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Borislav Petkov

Sep 15, 2011, 2:50:02 PM
On Thu, Sep 15, 2011 at 10:23:29AM -0500, Larry Finger wrote:
> Thanks for the report. I have been running rtl8192se for the past
> few days and I have also noticed two such system freezes, but not

Ah ok, so it's not only me seeing this.

> I expect that you are running a mainline kernel from Linus's tree.

It is Linus' tree: v3.1-rc6-10-g003f6c9

> If not, please let me know. Mine is 3.1-rc4 from the
> wireless-testing tree. I don't recall any changes in our
> "next-flavored" version that are not in 3.1.
>
> Which flavor of RTL8192SE card do you have? The one I'm running
> shows as "Realtek Semiconductor Co., Ltd. RTL8191SEvB Wireless LAN
> Controller [10ec:8172] (rev 10)", but I have two others. The
> differences are in the number of TX and RX streams. Mine is the 1x2
> variety.

lspci says:

03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8191SEvB Wireless LAN Controller [10ec:8172] (rev 10)

so it is exactly the same as yours.

> How frequently do your freezes occur? As I said before, I have only
> had two in 2 or 3 days, which would make bisection tricky.

Well, I can reproduce it pretty reliably: it happens shortly after I
up the iface and establish the WIFI connection with wpa_supplicant. A
couple of minutes after that, more or less, the box grinds down to a
halt.

> I see from the dump that you have x86_64 architecture. How many CPUs
> and how fast? There is one questionable report of problems on a box
> with an 8-way fast processor. That was on initialization and is not
> the same, but may indicate a problem. My system has a dual AMD CPU
> at 2.0 GHz.

I don't think that has any effect on the wifi iface but here it is: dual
core K8 laptop:

...
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Turion(tm) Neo X2 Dual Core Processor L625
stepping : 2
cpu MHz : 800.000

this is of course the lowest P-state freq - P0 is 1.6GHz.

> Would you also try loading rtl8192se with the "ips=0" option? As
> power save is implicated in your traceback, that may help. I will be
> trying "swlps=0 ips=0".

Ok, I'll run both just in case and let you know.

Thanks for looking into this.

--
Regards/Gruss,
Boris.

Borislav Petkov

Sep 16, 2011, 2:10:01 PM
On Thu, Sep 15, 2011 at 10:23:29AM -0500, Larry Finger wrote:
> Would you also try loading rtl8192se with the "ips=0" option? As power
> save is implicated in your traceback, that may help. I will be trying
> "swlps=0 ips=0".

Ok, "ips=0" seems to fix the issue ... almost. I say "almost" because I
have had only one hang so far while running the box for a day today. Will try
it together with "swlps=0" next week.

Thanks.

--
Regards/Gruss,
Boris.

Borislav Petkov

Sep 19, 2011, 6:00:02 AM
On Mon, Sep 19, 2011 at 12:43:45PM +0800, 李朝明 wrote:
> Dear sir:
>
> Can you give me the screen picture after crash again with ips = 0;
> or swlps=0 ips=0.
> Thank you!

I just started a "swlps=0 ips=0" test and will try to capture something
if the slowdown happens again - I can't promise you a whole stack trace
because the sysrq output doesn't fit on the screen fully...

Stay tuned.

Borislav Petkov

Sep 19, 2011, 1:20:01 PM
On Mon, Sep 19, 2011 at 12:43:45PM +0800, 李朝明 wrote:
> Dear sir:
>
> Can you give me the screen picture after crash again with ips = 0;
> or swlps=0 ips=0.

FWIW,

the box has been running stable for a whole workday today with "swlps=0
ips=0". I'll run it with only "ips=0" tomorrow to try to reproduce the
sluggishness again.
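
When switching between option combinations like this, it can help to confirm which values the loaded module is actually using. A minimal sketch, assuming the driver exports its parameters under /sys/module/rtl8192se/parameters (module parameters declared with a non-zero permission mask appear there); PARAM_DIR and show_params are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only: print each module parameter as name=value.
# PARAM_DIR is an assumed path; the files exist only if the driver
# exports its parameters through sysfs.
PARAM_DIR=${PARAM_DIR:-/sys/module/rtl8192se/parameters}

show_params() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # skip if the glob matched nothing
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

show_params "$PARAM_DIR"
```

Boolean parameters are reported by sysfs as Y or N, so a module loaded with "ips=0" would be expected to show the ips entry as N. If the directory does not exist, the script prints nothing.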

Borislav Petkov

Sep 22, 2011, 4:00:02 AM
On Wed, Sep 21, 2011 at 10:29:15PM -0500, Larry Finger wrote:
> On 09/21/2011 08:24 PM, 李朝明 wrote:
> > Dear Sir:
> >
> > I can't find _rtl_pci_lps_leave_tasklet in my driver, so I want to
> > know which driver you used.
> > Would you like to try this new driver with ips =0 and lps = 0, or
> > some combination of these two functions.
>
> It is not in my driver either. Where did that driver come from?

That's actually _rtl_pci_ips_leave_tasklet with an "i"
in "_ips_" and it is a wrapper around rtl_lps_leave() in
<drivers/net/wireless/rtlwifi/pci.c>

Basically, that's the tasklet handler for ips_leave_tasklet regged in
_rtl_pci_init_struct():

	tasklet_init(&rtlpriv->works.ips_leave_tasklet,
		     (void (*)(unsigned long))_rtl_pci_ips_leave_tasklet,
		     (unsigned long)hw);


The sluggishness is consistent with the tasklet choking on something.
From looking at rtl_lps_leave(), it grabs some spinlocks and then enables
IRQs in the middle of it, with a very explanatory /* FIXME */ comment
on top of it, which looks very suspicious to me:

/*Leave the leisure power save mode.*/
void rtl_lps_leave(struct ieee80211_hw *hw)
{
	struct rtl_priv *rtlpriv = rtl_priv(hw);
	struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
	struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));

	spin_lock(&rtlpriv->locks.lps_lock);

	if (ppsc->fwctrl_lps) {
		if (ppsc->dot11_psmode != EACTIVE) {

			/*FIX ME */
			rtlpriv->cfg->ops->enable_interrupt(hw);
			...

But since I don't know anything about networking drivers, I'm actually
hoping that you guys could have an idea here.

HTH.

Borislav Petkov

Sep 23, 2011, 6:40:02 AM
On Fri, Sep 23, 2011 at 06:21:07PM +0800, 李朝明 wrote:
> Please set ips =0 and try again..

What does that mean?

I can trigger the grinding-to-a-halt reliably with "ips=0" - it only
takes a couple of hours of network traffic. Also, I don't want to try
the driver you sent me because the version in the kernel needs fixing
not some out-of-tree codebase.

So please clarify your request.

Thanks.

Larry Finger

Sep 23, 2011, 10:00:03 AM
On 09/23/2011 05:33 AM, Borislav Petkov wrote:
> On Fri, Sep 23, 2011 at 06:21:07PM +0800, 李朝明 wrote:
>> Please set ips =0 and try again..
>
> What does that mean?
>
> I can trigger the grinding-to-a-halt reliably with "ips=0" - it only
> takes a couple of hours of network traffic. Also, I don't want to try
> the driver you sent me because the version in the kernel needs fixing
> not some out-of-tree codebase.

Does the likelihood of the failure change when "ips=0" is used?

The Realtek group made several changes in the driver that Chaoming sent you that
have not yet been incorporated in the kernel version. If you test that driver,
we might learn if any of them are important to your problem. As neither of us
can duplicate your results, it is not possible for us to do those tests.

I agree that we want to fix the kernel version. It is unfortunate that Realtek
does not generate their improvements as patches to that kernel version, and
publish them that way, but that is a fact of life. When they produce a new
version, I have to look at the diff file between it and the previous version and
test those differences with my devices. Thus far, there have been no changes
that have any effect on my system, but who knows on yours. Please run the test
as Chaoming asked you to do.

Thanks,

Larry

Borislav Petkov

Sep 28, 2011, 9:20:02 AM
On Fri, Sep 23, 2011 at 11:34:12AM -0500, Larry Finger wrote:
> On 09/23/2011 05:33 AM, Borislav Petkov wrote:
> >On Fri, Sep 23, 2011 at 06:21:07PM +0800, 李朝明 wrote:
> >>Please set ips =0 and try again..
> >
> >What does that mean?
> >
> >I can trigger the grinding-to-a-halt reliably with "ips=0" - it only
> >takes a couple of hours of network traffic. Also, I don't want to try
> >the driver you sent me because the version in the kernel needs fixing
> >not some out-of-tree codebase.
>
> I got a chance to review the rtl8192se part of the changes in that
> 08/16/2011 version. Attached is a patch to update the kernel
> version.
>
> A prerequisite is:
>
> commit da3ba88a9996cd64c6768bed5727e02da81e2c8d
> Author: Larry Finger <Larry....@lwfinger.net>
> Date: Mon Sep 19 14:34:10 2011 -0500
>
> rtlwifi: Combine instances of RTL_HAL_IS_CCK_RATE macros.
>
> Three drivers, rtl8192ce, rtl8192cu and rtl8192de, use the same macro
> to check if a particular rate is in the CCK set. This common code is
> relocated to a common header file. A distinct macro used by rtl8192se
> with the same name is renamed.
>
> Signed-off-by: Larry Finger <Larry....@lwfinger.net>
> Signed-off-by: John W. Linville <linv...@tuxdriver.com>

Ok, here's what I did.

* merge 'master' branch of
git://git.infradead.org/users/linville/wireless-next.git with -rc8 in
order to get da3ba88a9996.

* apply your attached patch with the Realsil facelift:

patching file drivers/net/wireless/rtlwifi/rtl8192se/hw.c
patching file drivers/net/wireless/rtlwifi/rtl8192se/reg.h
patching file drivers/net/wireless/rtlwifi/rtl8192se/sw.c
patching file drivers/net/wireless/rtlwifi/rtl8192se/trx.c
Hunk #6 FAILED at 540.
1 out of 6 hunks FAILED -- saving rejects to file drivers/net/wireless/rtlwifi/rtl8192se/trx.c.rej
patching file drivers/net/wireless/rtlwifi/wifi.h
Hunk #1 succeeded at 1328 (offset 3 lines).

(had to apply hunk #6 by hand though)

* build

drivers/net/wireless/rtlwifi/rtl8192se/sw.c:307:8: error: `EFUSE_OOB_PROTECT_BYTES_LEN' undeclared here (not in a function)
drivers/net/wireless/rtlwifi/rtl8192se/sw.c:307:2: error: array index in initializer not of integer type
drivers/net/wireless/rtlwifi/rtl8192se/sw.c:307:2: error: (near initialization for `rtl92se_hal_cfg.maps')
make[5]: *** [drivers/net/wireless/rtlwifi/rtl8192se/sw.o] Error 1
make[4]: *** [drivers/net/wireless/rtlwifi/rtl8192se] Error 2
make[3]: *** [drivers/net/wireless/rtlwifi] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....

fix with

diff --git a/drivers/net/wireless/rtlwifi/wifi.h b/drivers/net/wireless/rtlwifi/wifi.h
index 615f6b4..55428f2 100644
--- a/drivers/net/wireless/rtlwifi/wifi.h
+++ b/drivers/net/wireless/rtlwifi/wifi.h
@@ -450,6 +450,7 @@ enum rtl_var_map {
 	EFUSE_HWSET_MAX_SIZE,
 	EFUSE_MAX_SECTION_MAP,
 	EFUSE_REAL_CONTENT_SIZE,
+	EFUSE_OOB_PROTECT_BYTES_LEN,

 	/*CAM map */
 	RWCAM,

> I have run the new version for a couple of hours without problems.
> Perhaps it will cure your difficulty, but I am not optimistic.

Yeah, I'm sorry to confirm that your pessimism turned into realism :-).
It froze in under 20 mins. Ran the driver with default module parameters
though.

Looks like I'll have to test the Realsil tarball after all. Question, do
I simply overwrite the subtree under drivers/net/wireless/rtlwifi/ with
the files from the tarball?

Also, any preferred module parameters settings you want me to test?

Larry Finger

Sep 28, 2011, 10:00:02 PM
As you probably do not want to kill your standard tree, unpack the Realtek
version normally, cd to where it unpacked, and do a make. Once it builds, do the
following:

sudo modprobe -rv rtl8192se
sudo modprobe -v mac80211
sudo insmod rtlwifi.ko
sudo insmod rtl8192se/rtl8192se.ko

>
> Also, any preferred module parameters settings you want me to test?

Try it first with the default parameters. If that fails, use "ips=0".

Larry

Borislav Petkov

Oct 5, 2011, 11:20:01 AM
On Wed, Sep 28, 2011 at 08:57:38PM -0500, Larry Finger wrote:
> As you probably do not want to kill your standard tree, unpack the
> Realtek version normally, cd to where it unpacked, and do a make.
> Once it builds, do the following:
>
> sudo modprobe -rv rtl8192se
> sudo modprobe -v mac80211
> sudo insmod rtlwifi.ko
> sudo insmod rtl8192se/rtl8192se.ko
>
> >
> >Also, any preferred module parameters settings you want me to test?
>
> Try it first with the default parameters. If that fails, use "ips=0".

Ok, I can cautiously say now that after a couple days of running the
Realtek version that the box runs just fine, no hiccups whatsoever.

Larry, you said in an earlier mail that you've gone through the
rtl8192se changes and weren't optimistic with the attached diff you sent
me. What about the rtlwifi changes, is there something in Realtek's
version which is missing upstream that would cause the sluggishness?

Thanks.

--
Regards/Gruss,
Boris.

Larry Finger

Oct 5, 2011, 10:40:01 PM
On 10/05/2011 10:15 AM, Borislav Petkov wrote:

> Ok, I can cautiously say now that after a couple days of running the
> Realtek version that the box runs just fine, no hiccups whatsoever.
>
> Larry, you said in an earlier mail that you've gone through the
> rtl8192se changes and weren't optimistic with the attached diff you sent
> me. What about the rtlwifi changes, is there something in Realtek's
> version which is missing upstream that would cause the sluggishness?

I have been going through the differences between the 06/20/2011 and 08/16/2011
drivers and making those changes to the kernel drivers. Could you please apply
the 5 attached patches to the wireless-testing tree and see if your sluggishness
is fixed?

Thanks,

Larry

0001-rtlwifi-Change-PCI-drivers-to-use-the-new-PM-framewo.patch
0002-rtlwifi-Update-to-new-Realtek-version-Part-I.patch
0003-rtlwifi-rtl8192ce-Add-new-chip-revisions.patch
0004-rtlwifi-rtl8192se-Updates-from-latest-Realtek-driver.patch
0005-rtlwifi-rtl8192de-Updates-from-latest-Reaktek-driver.patch

Borislav Petkov

Nov 15, 2011, 9:10:01 AM
On Wed, Oct 05, 2011 at 09:37:25PM -0500, Larry Finger wrote:
> I have been going through the differences between the 06/20/2011 and
> 08/16/2011 drivers and making those changes to the kernel drivers.
> Could you please apply the 5 attached patches to the
> wireless-testing tree and see if your sluggishness is fixed?

Just to document it here:

3.1 still has the problem. If you have something already ported to 3.1
from the Realsil driver version, let me know and I'll give it a run.

Thanks.

--
Regards/Gruss,
Boris