gcc 4.5.1 / as 2.20.51.0.11 miscompiling drivers/char/i8k.c ?

492 views
Skip to first unread message

Jim

unread,
Nov 6, 2010, 7:30:02 AM11/6/10
to

After upgrading my Dell laptop, both OS+kernel the i8k interface was giving
nonsensical output. As it turned out it's not the kernel but compiler
upgrade which broke this.

Guys at Archlinux have found the underlying cause (but don't seem to have
submitted a patch yet):
https://bbs.archlinux.org/viewtopic.php?pid=780692#p780692
gcc seems to optimize the assembly statements away.

And indeed, applying this patch makes the i8k interface work again,
i.e. replacing the asm(..) construct by asm volatile(..)

--- i8k.c.ORIG 2010-08-02 17:20:46.000000000 +0200
+++ i8k.c 2010-10-30 13:03:07.000000000 +0200
@@ -119,7 +119,7 @@
int eax = regs->eax;

#if defined(CONFIG_X86_64)
- asm("pushq %%rax\n\t"
+ asm volatile("pushq %%rax\n\t"
"movl 0(%%rax),%%edx\n\t"
"pushq %%rdx\n\t"
"movl 4(%%rax),%%ebx\n\t"
@@ -145,7 +145,7 @@
: "a"(regs)
: "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
#else
- asm("pushl %%eax\n\t"
+ asm volatile("pushl %%eax\n\t"
"movl 0(%%eax),%%edx\n\t"
"push %%edx\n\t"
"movl 4(%%eax),%%ebx\n\t"


gcc version:
Reading specs from /usr/lib/gcc/i486-slackware-linux/4.5.1/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i486-slackware-linux/4.5.1/lto-wrapper
Target: i486-slackware-linux
Configured with: ../gcc-4.5.1/configure --prefix=/usr --libdir=/usr/lib
--mandir=/usr/man --infodir=/usr/info --enable-shared --enable-bootstrap
--enable-languages=ada,c,c++,fortran,java,objc --enable-threads=posix
--enable-checking=release --with-system-zlib
--with-python-dir=/lib/python2.6/site-packages
--disable-libunwind-exceptions --enable-__cxa_atexit --enable-libssp
--with-gnu-ld --verbose --with-arch=i486 --target=i486-slackware-linux
--build=i486-slackware-linux --host=i486-slackware-linux
Thread model: posix
gcc version 4.5.1 (GCC)

as version:
GNU assembler (Linux/GNU Binutils) 2.20.51.0.11.20100810
Copyright 2010 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `i486-slackware-linux'.

I would be happy to submit a proper patch if I new this is actually the
correct fix, also there might be other places in the kernel source
suffering from this ??

Thanks,
Jim Bos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andi Kleen

unread,
Nov 7, 2010, 4:40:03 PM11/7/10
to
Jim <jim...@xs4all.nl> writes:

> After upgrading my Dell laptop, both OS+kernel the i8k interface was giving
> nonsensical output. As it turned out it's not the kernel but compiler
> upgrade which broke this.
>
> Guys at Archlinux have found the underlying cause (but don't seem to have
> submitted a patch yet):
> https://bbs.archlinux.org/viewtopic.php?pid=780692#p780692
> gcc seems to optimize the assembly statements away.
>
> And indeed, applying this patch makes the i8k interface work again,
> i.e. replacing the asm(..) construct by asm volatile(..)

The compiler really should not optimize the asm away, because
it has both input and output arguments which are later used.
"asm volatile" normally just means "don't move significantly"

I tested it with gcc version 4.5.0 20100604 [gcc-4_5-branch revision
160292] (SUSE Linux)
and the asm statement is there for both 32bit and 64bit
(with an allmodconfig, with both -O2 and -Os)

If gcc 4.5.1 broke that over 4.5.0 you should really file a bug report
for the compiler, it seems like a serious regression in 4.5.1

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Andreas Schwab

unread,
Nov 7, 2010, 5:50:02 PM11/7/10
to
Andi Kleen <an...@firstfloor.org> writes:

> Jim <jim...@xs4all.nl> writes:
>
>> After upgrading my Dell laptop, both OS+kernel the i8k interface was giving
>> nonsensical output. As it turned out it's not the kernel but compiler
>> upgrade which broke this.
>>
>> Guys at Archlinux have found the underlying cause (but don't seem to have
>> submitted a patch yet):
>> https://bbs.archlinux.org/viewtopic.php?pid=780692#p780692
>> gcc seems to optimize the assembly statements away.
>>
>> And indeed, applying this patch makes the i8k interface work again,
>> i.e. replacing the asm(..) construct by asm volatile(..)
>
> The compiler really should not optimize the asm away, because
> it has both input and output arguments which are later used.
> "asm volatile" normally just means "don't move significantly"

The asm fails to mention that it modifies *regs.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Andi Kleen

unread,
Nov 7, 2010, 6:10:02 PM11/7/10
to
Andreas Schwab <sch...@linux-m68k.org> writes:
>
> The asm fails to mention that it modifies *regs.

It has a memory clobber, that should be enough, no?

Besides in any case it cannot be eliminated because it has
valid non dead inputs and outputs.

-Andi
--
a...@linux.intel.com -- Speaking for myself only.

Richard Guenther

unread,
Nov 8, 2010, 5:50:02 AM11/8/10
to
On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen <an...@firstfloor.org> wrote:
> Andreas Schwab <sch...@linux-m68k.org> writes:
>>
>> The asm fails to mention that it modifies *regs.
>
> It has a memory clobber, that should be enough, no?

No. A memory clobber does not cover automatic storage.

Btw, I can't see a testcase anywhere so I just assume Andreas got
it right as usual.

Richard.

Richard Guenther

unread,
Nov 8, 2010, 6:30:02 AM11/8/10
to
On Mon, Nov 8, 2010 at 12:20 PM, Andi Kleen <an...@firstfloor.org> wrote:
> Richard Guenther <richard....@gmail.com> writes:
>
>> On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen <an...@firstfloor.org> wrote:
>>> Andreas Schwab <sch...@linux-m68k.org> writes:
>>>>
>>>> The asm fails to mention that it modifies *regs.
>>>
>>> It has a memory clobber, that should be enough, no?
>>
>> No.  A memory clobber does not cover automatic storage.
>
> That's a separate problem.

>
>> Btw, I can't see a testcase anywhere so I just assume Andreas got
>> it right as usual.
>
> An asm with live inputs and outputs should never be optimized
> way. If 4.5.1 started doing that it's seriously broken.

Please provide a testcase, such asms can be optimized if the
outputs are dead.

Richard.

Andi Kleen

unread,
Nov 8, 2010, 6:30:03 AM11/8/10
to
Richard Guenther <richard....@gmail.com> writes:

> On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen <an...@firstfloor.org> wrote:
>> Andreas Schwab <sch...@linux-m68k.org> writes:
>>>
>>> The asm fails to mention that it modifies *regs.
>>
>> It has a memory clobber, that should be enough, no?
>
> No. A memory clobber does not cover automatic storage.

That's a separate problem.

> Btw, I can't see a testcase anywhere so I just assume Andreas got
> it right as usual.

An asm with live inputs and outputs should never be optimized


way. If 4.5.1 started doing that it's seriously broken.

-Andi

Paul Koning

unread,
Nov 8, 2010, 6:50:02 AM11/8/10
to

On Nov 8, 2010, at 6:20 AM, Richard Guenther wrote:

> On Mon, Nov 8, 2010 at 12:20 PM, Andi Kleen <an...@firstfloor.org> wrote:
>> Richard Guenther <richard....@gmail.com> writes:
>>
>>> On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen <an...@firstfloor.org> wrote:
>>>> Andreas Schwab <sch...@linux-m68k.org> writes:
>>>>>
>>>>> The asm fails to mention that it modifies *regs.
>>>>
>>>> It has a memory clobber, that should be enough, no?
>>>
>>> No. A memory clobber does not cover automatic storage.
>>
>> That's a separate problem.
>>
>>> Btw, I can't see a testcase anywhere so I just assume Andreas got
>>> it right as usual.
>>
>> An asm with live inputs and outputs should never be optimized
>> way. If 4.5.1 started doing that it's seriously broken.
>
> Please provide a testcase, such asms can be optimized if the
> outputs are dead.

I don't know about 4.5, but I noticed that with 4.6 (trunk), testcasese like gcc.c-torture/compile/20000804-1.c optimize away the asm and all the operand generation except for -O0.

paul

Michael Matz

unread,
Nov 8, 2010, 7:30:01 AM11/8/10
to
Hi,

On Mon, 8 Nov 2010, Andi Kleen wrote:

> Richard Guenther <richard....@gmail.com> writes:
>
> > On Mon, Nov 8, 2010 at 12:03 AM, Andi Kleen <an...@firstfloor.org> wrote:
> >> Andreas Schwab <sch...@linux-m68k.org> writes:
> >>>
> >>> The asm fails to mention that it modifies *regs.
> >>
> >> It has a memory clobber, that should be enough, no?
> >
> > No. A memory clobber does not cover automatic storage.
>
> That's a separate problem.
>
> > Btw, I can't see a testcase anywhere so I just assume Andreas got
> > it right as usual.
>
> An asm with live inputs and outputs should never be optimized
> way. If 4.5.1 started doing that it's seriously broken.

You know the drill: testcase -> gcc.gnu.org/bugzilla/

(In particular up to now it's only speculation in some forum that the asm
really is optimized away, which I agree would be a bug, or if it isn't
merely that regs->eax isn't reloaded after the asm(), which would be
caused by the problem Andreas mentioned)


Ciao,
Michael.

Jakub Jelinek

unread,
Nov 8, 2010, 8:00:01 AM11/8/10
to
On Mon, Nov 08, 2010 at 06:47:59AM -0500, Paul Koning wrote:
> I don't know about 4.5, but I noticed that with 4.6 (trunk), testcasese
> like gcc.c-torture/compile/20000804-1.c optimize away the asm and all the
> operand generation except for -O0.

That's fine, the asm isn't volatile and the output is not used.

Jakub

Dave Korn

unread,
Nov 8, 2010, 1:20:02 PM11/8/10
to
On 08/11/2010 11:20, Andi Kleen wrote:

> An asm with live inputs and outputs should never be optimized
> way. If 4.5.1 started doing that it's seriously broken.

I don't see that. Consider:

void foo (void)
{
int x, y, z;
x = 23;
y = x + 1;
z = y + 1;
}

So far, you'd agree the compiler may optimise the entire function away? So
why not this:

void foo (void)
{
int x, y, z;
x = 23;
asm ("do something" : "=r" (y) : "r" (x) );
z = y + 1;
}

?

cheers,
DaveK

Michael Matz

unread,
Nov 9, 2010, 8:10:02 AM11/9/10
to
Hi,

On Mon, 8 Nov 2010, Dave Korn wrote:

> void foo (void)
> {
> int x, y, z;
> x = 23;
> asm ("do something" : "=r" (y) : "r" (x) );
> z = y + 1;
> }

The case in i8k.c really is different. It does use the value by
influencing the return value and the callers use the returned value in
conditionals and the like. It really, really _is_ used :-) and if GCC
removes the asm (which up to now is only speculation) then it's a GCC bug.

The code outlines like so:

int i8k_smm (regs) {
int rc;
asm (... : "=r"(rc) ...);
if (rc != 0 || ...)
return -EINVAL;
return 0;
}

...
struct regs regs = {.eax = ...}
return i8k_smm(regs) ?: regs.eax;
...

My speculation is, that the asm is not removed but rather that regs.eax
isn't reloaded after the asm because the memory clobber doesn't clobber
automatic variables.


Ciao,
Michael.

Andi Kleen

unread,
Nov 9, 2010, 8:50:01 AM11/9/10
to
> My speculation is, that the asm is not removed but rather that regs.eax
> isn't reloaded after the asm because the memory clobber doesn't clobber
> automatic variables.

Yes that makes sense. I wasn't able to verify it so far though.

Maybe the original poster could try the obvious patch
instead of the volatile change.


i8k: tell gcc that regs gets clobbered

Signed-off-by: Andi Kleen <a...@linux.intel.com>

diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
index 3bc0eef..f3bbf73 100644
--- a/drivers/char/i8k.c
+++ b/drivers/char/i8k.c
@@ -142,7 +142,7 @@ static int i8k_smm(struct smm_regs *regs)
"lahf\n\t"
"shrl $8,%%eax\n\t"
"andl $1,%%eax\n"
- :"=a"(rc)
+ :"=a"(rc), "=m" (*regs)


: "a"(regs)
: "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
#else

@@ -167,7 +167,7 @@ static int i8k_smm(struct smm_regs *regs)
"movl %%edx,0(%%eax)\n\t"
"lahf\n\t"
"shrl $8,%%eax\n\t"
- "andl $1,%%eax\n":"=a"(rc)
+ "andl $1,%%eax\n":"=a"(rc), "=m" (*regs)


: "a"(regs)
: "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");

#endif

-Andi

Andreas Schwab

unread,
Nov 9, 2010, 9:00:02 AM11/9/10
to
Andi Kleen <an...@firstfloor.org> writes:

> @@ -142,7 +142,7 @@ static int i8k_smm(struct smm_regs *regs)
> "lahf\n\t"
> "shrl $8,%%eax\n\t"
> "andl $1,%%eax\n"
> - :"=a"(rc)
> + :"=a"(rc), "=m" (*regs)

I think this should be "+m".

Andreas.

--
Andreas Schwab, sch...@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E


"And now for something completely different."

Jim

unread,
Nov 9, 2010, 11:50:04 AM11/9/10
to
On 11/09/2010 02:57 PM, Andreas Schwab wrote:
> Andi Kleen <an...@firstfloor.org> writes:
>
>> @@ -142,7 +142,7 @@ static int i8k_smm(struct smm_regs *regs)
>> "lahf\n\t"
>> "shrl $8,%%eax\n\t"
>> "andl $1,%%eax\n"
>> - :"=a"(rc)
>> + :"=a"(rc), "=m" (*regs)
>
> I think this should be "+m".
>
> Andreas.
>

Just tested Andi's patch with Andreas' suggestion to make it +m,
i.e. like attached and can confirm it solves the issue.

Thanks guys,
Jim Bos


PATCH.i8k.c

Jim Bos

unread,
Nov 13, 2010, 6:20:02 AM11/13/10
to

More recent GCC caused the i8k driver to stop working, on Slackware
compiler was upgraded from gcc-4.4.4 to gcc-4.5.1 after which it
didn't work anymore, meaning the driver didn't load or gave total
nonsensical output.
As it turned out the asm(..) statement forgot to mention it modifies
the *regs variable.
Credits to Andi Kleen <an...@firstfloor.org> and Andreas Schwab
<sch...@linux-m68k.org> for providing the fix.

Signed-off-by: Jim Bos <jim...@xs4all.nl>

PATCH.i8k.c

James Cloos

unread,
Nov 14, 2010, 8:30:01 PM11/14/10
to
Gcc 4.5.1 running on an amd64 box "cross"-compiling for a P3 i8k fails
to compile the module since commit 6b4e81db2552bad04100e7d5ddeed7e848f53b48
with:

CC drivers/char/i8k.o
drivers/char/i8k.c: In function ‘i8k_smm’:
drivers/char/i8k.c:149:2: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
drivers/char/i8k.c:149:2: error: ‘asm’ operand has impossible constraints

-JimC
--
James Cloos <cl...@jhcloos.com> OpenPGP: 1024D/ED7DAEA6

Linus Torvalds

unread,
Nov 14, 2010, 10:30:01 PM11/14/10
to
On Sun, Nov 14, 2010 at 4:52 PM, James Cloos <cl...@jhcloos.com> wrote:
> Gcc 4.5.1 running on an amd64 box "cross"-compiling for a P3 i8k fails
> to compile the module since commit 6b4e81db2552bad04100e7d5ddeed7e848f53b48
> with:
>
>  CC      drivers/char/i8k.o
> drivers/char/i8k.c: In function ‘i8k_smm’:
> drivers/char/i8k.c:149:2: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
> drivers/char/i8k.c:149:2: error: ‘asm’ operand has impossible constraints

At this point, I think this falls clearly under "unresolvable gcc bug".

Quite frankly, I think gcc was buggy to begin with: since we had a
memory clobber, the "+m" (*regs) should not have mattered. The fact
that "*regs" may be some local variable doesn't make any difference
what-so-ever, since we took the address of the variable. So the memory
clobber _clearly_ can change that variable.

So when Richard Gunther says "a memory clobber doesn't cover automatic
storage", to me that very clearly spells "gcc is buggy as hell".
Because automatic storage with its address taken _very_ much gets
clobbered by things like memset etc. If the compiler doesn't
understand that, the compiler is just broken.

And now, if even the (superfluous) "+m" isn't working, it sounds like
we have no sane options left. Except to say that gcc-4.5.1 is totally
broken wrt asms.

Can we just get gcc to realize that when you pass the address of
automatic storage to an asm, that means that "memory" really does
clobber it? Because clearly that is the case.

Linus

Jakub Jelinek

unread,
Nov 15, 2010, 4:00:02 AM11/15/10
to
On Sun, Nov 14, 2010 at 07:21:50PM -0800, Linus Torvalds wrote:
> So when Richard Gunther says "a memory clobber doesn't cover automatic
> storage", to me that very clearly spells "gcc is buggy as hell".
> Because automatic storage with its address taken _very_ much gets
> clobbered by things like memset etc. If the compiler doesn't
> understand that, the compiler is just broken.

I'll leave the discussion about meaning of "memory" clobber aside to
Richard,

> And now, if even the (superfluous) "+m" isn't working, it sounds like
> we have no sane options left. Except to say that gcc-4.5.1 is totally

just to say that of course there are sane options left.

:"=a"(rc), "+m" (*regs)


: "a"(regs)
: "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");

is simply too high register pressure for i386 if you force also
-fno-omit-frame-pointer, there is not a single register left.

Yes, reload should figure out it has address of regs already tied to %eax,
unfortunately starting with IRA it doesn't (I'll file a GCC bug about that;
so that leaves 4.4/4.5/4.6 currently not being able to compile it).

That said, changing the inline asm to just clobber one less register
would be completely sufficient to make it work well with all gccs out there,
just push/pop one of the register around the whole body. I doubt calling
out SMM BIOS is actually so performance critical that one push and one pop
would ruin it. Of course x86_64 version can stay as is, there are enough
registers left...

Jakub

Andi Kleen

unread,
Nov 15, 2010, 4:20:02 AM11/15/10
to
> That said, changing the inline asm to just clobber one less register
> would be completely sufficient to make it work well with all gccs out there,
> just push/pop one of the register around the whole body. I doubt calling
> out SMM BIOS is actually so performance critical that one push and one pop
> would ruin it. Of course x86_64 version can stay as is, there are enough
> registers left...

Yes traditionally clobbering all registers has been dangerous
and it clearly can be done inside the asm too.

Here's a untested patch to do some manual push/pops too. Could someone with
the hardware please test it? (running a 32bit kernel)

-Andi

---

i8k: Clobber less registers

gcc doesn't like inline assembler statements that clobber nearly
all registers. Save a few registers manually on i386 to avoid this
problem.

Fix suggested by Jakub Jelinek

Signed-off-by: Andi Kleen <a...@linux.intel.com>

diff --git a/drivers/char/i8k.c b/drivers/char/i8k.c
index f0863be..a2da38b 100644
--- a/drivers/char/i8k.c
+++ b/drivers/char/i8k.c
@@ -146,7 +146,10 @@ static int i8k_smm(struct smm_regs *regs)


: "a"(regs)
: "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");

#else
- asm("pushl %%eax\n\t"

+ asm("pushl %%ebx\n\t"
+ "pushl %%ecx\n\t"
+ "pushl %%edx\n\t"
+ "pushl %%eax\n\t"


"movl 0(%%eax),%%edx\n\t"
"push %%edx\n\t"
"movl 4(%%eax),%%ebx\n\t"

@@ -167,10 +170,13 @@ static int i8k_smm(struct smm_regs *regs)
"movl %%edx,0(%%eax)\n\t"


"lahf\n\t"
"shrl $8,%%eax\n\t"

- "andl $1,%%eax\n"
+ "andl $1,%%eax\n\t"
+ "popl %%edx\n\t"
+ "popl %%ecx\n\t"
+ "popl %%ebx\n"


:"=a"(rc), "+m" (*regs)
: "a"(regs)

- : "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
+ : "%esi", "%edi", "memory");
#endif
if (rc != 0 || (regs->eax & 0xffff) == 0xffff || regs->eax == eax)
return -EINVAL;


--
a...@linux.intel.com -- Speaking for myself only.

Jakub Jelinek

unread,
Nov 15, 2010, 4:30:02 AM11/15/10
to
On Mon, Nov 15, 2010 at 09:56:05AM +0100, Jakub Jelinek wrote:
> Yes, reload should figure out it has address of regs already tied to %eax,
> unfortunately starting with IRA it doesn't (I'll file a GCC bug about that;

http://gcc.gnu.org/PR46479

Jakub Jelinek

unread,
Nov 15, 2010, 5:10:02 AM11/15/10
to
On Mon, Nov 15, 2010 at 09:56:05AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 14, 2010 at 07:21:50PM -0800, Linus Torvalds wrote:
> > So when Richard Gunther says "a memory clobber doesn't cover automatic
> > storage", to me that very clearly spells "gcc is buggy as hell".
> > Because automatic storage with its address taken _very_ much gets
> > clobbered by things like memset etc. If the compiler doesn't
> > understand that, the compiler is just broken.
>
> I'll leave the discussion about meaning of "memory" clobber aside to
> Richard,

And for this the starting point should be what has been requested,
i.e. preprocessed source + gcc options + gcc version and some hints what
actually misbehaves (with the , "+m" (*regs) change reverted)
in gcc bugzilla. Only with that we can actually look at what has been
happening, see whether it is the tree optimizations or RTL and which one
makes a difference.
If I've missed a PR about this I apologize.

Richard Guenther

unread,
Nov 15, 2010, 5:30:02 AM11/15/10
to
On Mon, Nov 15, 2010 at 9:56 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Sun, Nov 14, 2010 at 07:21:50PM -0800, Linus Torvalds wrote:
>> So when Richard Gunther says "a memory clobber doesn't cover automatic
>> storage", to me that very clearly spells "gcc is buggy as hell".
>> Because automatic storage with its address taken _very_ much gets
>> clobbered by things like memset etc. If the compiler doesn't
>> understand that, the compiler is just broken.
>
> I'll leave the discussion about meaning of "memory" clobber aside to
> Richard,

Of course GCC handles memset just fine. Note that I was refering
to non-address taken automatic storage for "memory" (even though
when double-checking the current implementation GCC even thinks
that all address-taken memory is clobbered by asms as soon as
they have at least one memory operand or a "memory" clobber).

It's just that in future we might want to improve this and I think
not covering non-address taken automatic storage for "memory"
is sensible. And I see that you don't see address-taken automatic
storage as a sensible choice to exclude from "memory", and I
have noted that.

Btw, I still haven't seen an testcase for the actual problem we are
talking about.

Richard.

Andi Kleen

unread,
Nov 15, 2010, 6:00:02 AM11/15/10
to
> And for this the starting point should be what has been requested,
> i.e. preprocessed source + gcc options + gcc version and some hints what
> actually misbehaves (with the , "+m" (*regs) change reverted)
> in gcc bugzilla. Only with that we can actually look at what has been
> happening, see whether it is the tree optimizations or RTL and which one
> makes a difference.
> If I've missed a PR about this I apologize.

I tried to file one, but I can't reproduce it currently
(I don't have hardware, so have to rely on code reading and the 32bit
code looks correct to me even without the additional +m)

The preprocessed source is at
http://halobates.de/tmp/i8k.i

Options I used:

-D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -mtune=generic -maccumulate-outgoing-args -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls -pg -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack

-Andi
--
a...@linux.intel.com -- Speaking for myself only.

Jakub Jelinek

unread,
Nov 15, 2010, 6:20:01 AM11/15/10
to
On Mon, Nov 15, 2010 at 11:54:46AM +0100, Andi Kleen wrote:
> > And for this the starting point should be what has been requested,
> > i.e. preprocessed source + gcc options + gcc version and some hints what
> > actually misbehaves (with the , "+m" (*regs) change reverted)
> > in gcc bugzilla. Only with that we can actually look at what has been
> > happening, see whether it is the tree optimizations or RTL and which one
> > makes a difference.
> > If I've missed a PR about this I apologize.
>
> I tried to file one, but I can't reproduce it currently
> (I don't have hardware, so have to rely on code reading and the 32bit
> code looks correct to me even without the additional +m)
>
> The preprocessed source is at
> http://halobates.de/tmp/i8k.i
>
> Options I used:
>
> -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -mtune=generic -maccumulate-outgoing-args -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -fno-stack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls -pg -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack

Indeed, with this and 4.5.2 20101111 (prerelease) from SVN as well as
gcc-4.5.1-5.fc14:

...
movl %eax, -16(%ebp) # regs, %sfp
movl (%eax), %eax # regs_2(D)->eax,
movl %eax, -20(%ebp) #, %sfp
movl -16(%ebp), %eax # %sfp,
#APP
# 149 "/home/lsrc/git/linux-work2/drivers/char/i8k.c" 1
...
#NO_APP
testl %eax, %eax #
movl $-22, %edx #, D.18378
movl %eax, -24(%ebp) #, %sfp
je .L7 #,
.L2:
movl -12(%ebp), %ebx #,
movl %edx, %eax # D.18378,
movl -8(%ebp), %esi #,
movl -4(%ebp), %edi #,
movl %ebp, %esp #,
popl %ebp #
ret
.p2align 4,,7
.p2align 3
.L7:
movl -16(%ebp), %eax # %sfp,
movl (%eax), %ecx # regs_2(D)->eax, D.18371
cmpw $-1, %cx #, D.18371
je .L2 #,
cmpl %ecx, -20(%ebp) # D.18371, %sfp
cmovne -24(%ebp), %edx # %sfp,, D.18378
jmp .L2 #
.size i8k_smm, .-i8k_smm

I don't see any problems on the assembly level. i8k_smm is
not inlined in this case and checks all 3 conditions.

Guess we need somebody who actually reported the problem, state what
gcc was actually used and post preprocessed source, gcc options
from his case.

Jakub

Andi Kleen

unread,
Nov 15, 2010, 6:40:03 AM11/15/10
to
> Guess we need somebody who actually reported the problem, state what
> gcc was actually used and post preprocessed source, gcc options
> from his case.

Jim Bos,
Can you please supply that?

Please use

rm drivers/char/i8k.o
make V=1 drivers/char/i8k.o
make drivers/char/i8k.i

and supply the .i file and the output of the first make line

Thanks,
-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Linus Torvalds

unread,
Nov 15, 2010, 11:10:02 AM11/15/10
to
On Mon, Nov 15, 2010 at 3:16 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>
> I don't see any problems on the assembly level.  i8k_smm is
> not inlined in this case and checks all 3 conditions.

If it really is related to gcc not understanding that "*regs" has
changed due to the memory being an automatic variable, and passing in
"regs" itself as a pointer to that automatic variable together with
the "memory" clobber not being sufficient, than I think it's the lack
of inlining that will automatically hide the bug.

(Side note: and I think this does show how much of a gcc bug it is not
to consider "memory" together with passing in a pointer to an asm to
always be a clobber).

Because if it isn't inlined, then "regs" will be seen a a real pointer
to some external memory (the caller) rather than being optimized to
just be the auto structure on the stack. Because *mem is auto only
within the context of the caller.

Which actually points to a possible simpler:
- remove the "+m" since it adds too much register pressure
- mark the i8k_smm() as "noinline" instead.

Quite frankly, I'd hate to add even more crud to that inline asm (to
save/restore the registers manually). It's already not the prettiest
thing around.

So does the attached patch work for everybody?

Linus

patch.diff

Jim Bos

unread,
Nov 15, 2010, 12:50:04 PM11/15/10
to

Hmm, that doesn't work.

[ Not sure if you read to whole thread but initial workaround was to
change the asm(..) to asm volatile(..) which did work. ]

Jim.

Jakub Jelinek

unread,
Nov 15, 2010, 12:50:02 PM11/15/10
to
On Mon, Nov 15, 2010 at 06:36:06PM +0100, Jim Bos wrote:
> On 11/15/2010 12:37 PM, Andi Kleen wrote:
> See attached, note this is the vanilla 2.6.36 i8k.c (without any patch).
> And to be 100% sure, if I build this (make drivers/char/i8k.ko) it won't
> work.
>
> [ The i8k.i is rather big, even gzipped 80k, not sure if it'll bounce ]

Please also say which exact gcc you are using.

Note, I've compiled it with current 4.5 branch and made the function
always_inline and still didn't see any issues in the *.optimized dump,
regs.eax after the inline asm has always been compared to the constant
that has been stored into regs.eax before the inline asm.

Jakub

Linus Torvalds

unread,
Nov 15, 2010, 1:20:02 PM11/15/10
to
On Mon, Nov 15, 2010 at 9:40 AM, Jim Bos <jim...@xs4all.nl> wrote:
>
> Hmm, that doesn't work.
>
> [ Not sure if you read to whole thread but initial workaround was to
> change the asm(..) to asm volatile(..) which did work. ]

Since I have a different gcc than yours (and I'm not going to compile
my own), have you posted your broken .s file anywhere? In fact, with
the noinline (and the removal of the "+m" thing - iow just the patch
you tried), what does just the "i8k_smm" function assembly look like
for you after you've done a "make drivers/char/i8k.s"?

If the asm just doesn't exist AT ALL, that's just odd. Because every
single call-site of i8k_smm() clearly looks at the return value. So
the volatile really shouldn't make any difference from that
standpoint. Odd.

Linus

Jim Bos

unread,
Nov 15, 2010, 1:20:02 PM11/15/10
to

# gcc -v
Reading specs from /usr/lib/gcc/i486-slackware-linux/4.5.1/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i486-slackware-linux/4.5.1/lto-wrapper
Target: i486-slackware-linux
Configured with: ../gcc-4.5.1/configure --prefix=/usr --libdir=/usr/lib
--mandir=/usr/man --infodir=/usr/info --enable-shared --enable-bootstrap
--enable-languages=ada,c,c++,fortran,java,objc --enable-threads=posix
--enable-checking=release --with-system-zlib
--with-python-dir=/lib/python2.6/site-packages
--disable-libunwind-exceptions --enable-__cxa_atexit --enable-libssp
--with-gnu-ld --verbose --with-arch=i486 --target=i486-slackware-linux
--build=i486-slackware-linux --host=i486-slackware-linux
Thread model: posix
gcc version 4.5.1 (GCC)

I'm re-reading this thread where I found the asm-> asm volatine suggestion:
https://bbs.archlinux.org/viewtopic.php?pid=752099#p752099
but nobody there reported their gcc version (but apparently first
people started complaining May 1st).

_
Jim

Jakub Jelinek

unread,
Nov 15, 2010, 1:30:02 PM11/15/10
to
On Mon, Nov 15, 2010 at 07:17:31PM +0100, Jim Bos wrote:
> # gcc -v
> Reading specs from /usr/lib/gcc/i486-slackware-linux/4.5.1/specs
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i486-slackware-linux/4.5.1/lto-wrapper
> Target: i486-slackware-linux
> Configured with: ../gcc-4.5.1/configure --prefix=/usr --libdir=/usr/lib
> --mandir=/usr/man --infodir=/usr/info --enable-shared --enable-bootstrap
> --enable-languages=ada,c,c++,fortran,java,objc --enable-threads=posix
> --enable-checking=release --with-system-zlib
> --with-python-dir=/lib/python2.6/site-packages
> --disable-libunwind-exceptions --enable-__cxa_atexit --enable-libssp
> --with-gnu-ld --verbose --with-arch=i486 --target=i486-slackware-linux
> --build=i486-slackware-linux --host=i486-slackware-linux
> Thread model: posix
> gcc version 4.5.1 (GCC)

Does it have any patches applied? The gcc options look the same as what
I've been already trying earlier.
Thus, can you run gcc with those options on i8k.i and add -fverbose-asm
to make it easier to read and post i8k.s you get?

Jim Bos

unread,
Nov 15, 2010, 1:40:03 PM11/15/10
to
On 11/15/2010 07:30 PM, Jim Bos wrote:

> On 11/15/2010 07:08 PM, Linus Torvalds wrote:
>> On Mon, Nov 15, 2010 at 9:40 AM, Jim Bos <jim...@xs4all.nl> wrote:
>>>
>>> Hmm, that doesn't work.
>>>
>>> [ Not sure if you read to whole thread but initial workaround was to
>>> change the asm(..) to asm volatile(..) which did work. ]
>>
>> Since I have a different gcc than yours (and I'm not going to compile
>> my own), have you posted your broken .s file anywhere? In fact, with
>> the noinline (and the removal of the "+m" thing - iow just the patch
>> you tried), what does just the "i8k_smm" function assembly look like
>> for you after you've done a "make drivers/char/i8k.s"?
>>
>> If the asm just doesn't exist AT ALL, that's just odd. Because every
>> single call-site of i8k_smm() clearly looks at the return value. So
>> the volatile really shouldn't make any difference from that
>> standpoint. Odd.
>>
>> Linus
>>
>
> Attached version with plain 2.6.36 source and version with the committed
> patch, i.e with the '"+m" (*regs)'
>
>
> _
> Jim
>
>

And I just tried with your noninline patch which results in exactly the
same .s file as with plain 2.6.36 source, i.e. the noninline patch is
not doing anything here.

_
Jim

Jeff Law

unread,
Nov 15, 2010, 1:50:02 PM11/15/10