Oops in 2.6.19.1

Alistair John Strachan

Dec 20, 2006, 9:48:46 AM

to LKML

Hi,

Any ideas?

BUG: unable to handle kernel NULL pointer dereference at virtual address
00000009
printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat
xt_state iptable_filter ip_tables x_tables prism54 yenta_socket
rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd
usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211
hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU: 0
EIP: 0060:[<c0156f60>] Not tainted VLI
EFLAGS: 00010246 (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000
esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c
ds: 007b es: 007b ss: 0068
Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000)
Stack: 00000000 00000000 f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac
084c44a0 00000030 084c44d0 00000000 f70f3e94 f70f3e94 00000006 f70f3ecc
00000000 f70f3e94 c015e580 00000000 00000000 00000006 f6e111c0 00000000
Call Trace:
[<c015d7f3>] do_sys_poll+0x253/0x480
[<c015da53>] sys_poll+0x33/0x50
[<c0102c97>] syscall_call+0x7/0xb
[<b7f6b402>] 0xb7f6b402
=======================
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8
8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f
45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c
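The "EIP is at pipe_poll+0xa0/0xb0" line gives the offset into the function and the function's size, so the function's address range can be recovered from the reported eip. A minimal sketch in Python, using only numbers printed in the report above (the variable names are illustrative):

```python
# Values copied from the oops report above.
eip = 0xC0156F60
offset, size = 0xA0, 0xB0  # from "pipe_poll+0xa0/0xb0"

func_start = eip - offset     # address of the first byte of pipe_poll
func_end = func_start + size  # first byte after the function

print(f"pipe_poll spans {func_start:#010x}..{func_end:#010x}")
```

The computed start address is the one to look for in a disassembly or symbol table of the same kernel build.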

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Greg KH

Dec 20, 2006, 11:31:34 AM

to Alistair John Strachan

On Wed, Dec 20, 2006 at 02:21:03PM +0000, Alistair John Strachan wrote:
> Hi,
>
> Any ideas?

Does the problem also happen in 2.6.19?

thanks,

greg k-h

Alistair John Strachan

Dec 20, 2006, 11:45:19 AM

to Greg KH

On Wednesday 20 December 2006 16:30, Greg KH wrote:
> On Wed, Dec 20, 2006 at 02:21:03PM +0000, Alistair John Strachan wrote:
> > Hi,
> >
> > Any ideas?
>
> Does the problem also happen in 2.6.19?

No idea. I ran 2.6.19 for a couple of weeks without problems. It took 2 days
to oops 2.6.19.1, so if it happens again within that time period I guess that
might be indicative of a -stable patch.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Chuck Ebbert

Dec 20, 2006, 3:52:52 PM

to Alistair John Strachan

In-Reply-To: <200612201421....@sms.ed.ac.uk>

On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote:

> Any ideas?
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> 00000009

83 ca 10 or $0x10,%edx
3b .byte 0x3b
87 68 01 xchg %ebp,0x1(%eax) <=====
00 00 add %al,(%eax)

Somehow it is trying to execute code in the middle of an instruction.
That almost never works, even when the resulting fragment is a legal
opcode. :)

The real instruction is:

3b 87 68 01 00 00 cmp 0x168(%edi),%eax

I'd guess you have some kind of hardware problem. It could also be
a kernel problem where the saved address was corrupted during an
interrupt, but that's not likely.
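The misdecoded fragment can be cross-checked against the rest of the report. A small sketch, assuming the standard x86 page-fault error code layout and taking eax and the "Oops: 0002" code from the report (variable names are illustrative): if the CPU really executed xchg %ebp,0x1(%eax), the faulting access should be a kernel-mode write to eax + 1.

```python
# Register value and error code copied from the oops report.
eax = 0x00000008
error_code = 0x0002

# "87 68 01" decodes as xchg %ebp,0x1(%eax): the memory operand is eax + 1.
fault_addr = eax + 0x1

# x86 page-fault error code bits: bit 0 set means a protection violation
# (clear means the page was not present), bit 1 set means a write access,
# bit 2 set means the fault was raised in user mode.
not_present = not (error_code & 0x1)
is_write = bool(error_code & 0x2)
from_user = bool(error_code & 0x4)

print(f"fault address {fault_addr:#010x}, write={is_write}, "
      f"not_present={not_present}, user={from_user}")
```

The computed address equals the "virtual address 00000009" in the BUG line, which fits the reading that execution started one byte into the cmp.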
--
MBTI: IXTP

Alistair John Strachan

Dec 20, 2006, 5:39:39 PM

to Chuck Ebbert

On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote:
[snip]

> I'd guess you have some kind of hardware problem. It could also be
> a kernel problem where the saved address was corrupted during an
> interrupt, but that's not likely.

Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it
before now.

Maybe a cosmic ray event? ;-)

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Chuck Ebbert

Dec 21, 2006, 3:10:02 AM

to Alistair John Strachan

In-Reply-To: <200612202215....@sms.ed.ac.uk>

On Wed, 20 Dec 2006 22:15:50 +0000, Alistair John Strachan wrote:

> > I'd guess you have some kind of hardware problem. It could also be
> > a kernel problem where the saved address was corrupted during an
> > interrupt, but that's not likely.
>
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it
> before now.
>
> Maybe a cosmic ray event? ;-)

The low byte of eip should be 5f and it changed to 60, so that's
probably not it. And the oops report is consistent with that being
the instruction that was really executed, so it's not the kernel
misreporting the address after it happened.
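The argument against a single flipped bit can be checked by counting how many bits differ between the expected and the observed low byte of eip. A minimal sketch (variable names are illustrative):

```python
expected_low_byte = 0x5F  # address low byte where the real cmp starts
observed_low_byte = 0x60  # low byte of the reported eip

# XOR leaves a 1 in every bit position where the two bytes differ.
flipped_bits = bin(expected_low_byte ^ observed_low_byte).count("1")
increment = observed_low_byte - expected_low_byte

print(f"{flipped_bits} bits differ, arithmetic difference {increment}")
```

Several bits differ, so one corrupted bit cannot turn 5f into 60; the two values are instead exactly one apart, which looks more like an off-by-one in an address than like random corruption.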

You weren't trying kprobes or something, were you? Have you ever
had another unexplained oops with this machine?

--
MBTI: IXTP

Alistair John Strachan

Dec 21, 2006, 9:23:30 AM

to Chuck Ebbert

On Thursday 21 December 2006 08:05, Chuck Ebbert wrote:
> In-Reply-To: <200612202215....@sms.ed.ac.uk>
>
> On Wed, 20 Dec 2006 22:15:50 +0000, Alistair John Strachan wrote:
> > > I'd guess you have some kind of hardware problem. It could also be
> > > a kernel problem where the saved address was corrupted during an
> > > interrupt, but that's not likely.
> >
> > Seems pretty unlikely on a 4 year old Via Epia. Never had any problems
> > with it before now.
> >
> > Maybe a cosmic ray event? ;-)
>
> The low byte of eip should be 5f and it changed to 60, so that's
> probably not it. And the oops report is consistent with that being
> the instruction that was really executed, so it's not the kernel
> misreporting the address after it happened.
>
> You weren't trying kprobes or something, were you? Have you ever
> had another unexplained oops with this machine?

Nope, it's a stock kernel and it's running on a server, kprobes isn't in use.

And no, to my knowledge there's not been another "unexplained" oops. I've had
crashes, but they've always been known issues or BIOS trouble.

The machine was recently tampered with to install additional HDDs, but the
memory was memtest'ed when it was installed and passed several times without
issue. I'm rather puzzled.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Valdis.K...@vt.edu

Dec 21, 2006, 10:32:36 AM

to Alistair John Strachan

On Wed, 20 Dec 2006 22:15:50 GMT, Alistair John Strachan said:
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it
> before now.
>
> Maybe a cosmic ray event? ;-)

More likely a stray alpha particle from a radioactive decay in the actual chip
casing - I saw some research a while back that said that the average commodity
system should *expect* to see 1 or 2 alpha-induced single-bit errors per year,
and the chance that *you* saw the event was directly related to whether the
memory had ECC, and how much of the other circuitry had ECC on it....

Alistair John Strachan

Dec 23, 2006, 10:49:50 AM

to LKML

On Wednesday 20 December 2006 14:21, Alistair John Strachan wrote:
> Hi,
>
> Any ideas?

Pretty much like clockwork, it happened again. I think it's time to take this
seriously as a software bug, and not some hardware problem. I've run kernels
since 2.6.0 on this machine without such crashes, and now two of the same in
2.6.19.1? Pretty unlikely!
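A rough back-of-envelope version of "pretty unlikely", as a sketch only: it assumes the "1 or 2 errors per year" figure mentioned earlier in the thread can be treated as a Poisson rate for the whole machine, and a window of about two days per crash. Both are assumptions, not measurements.

```python
import math

# Assumption: the upper "2 errors per year" figure quoted earlier in the
# thread, treated as a Poisson rate for the whole machine.
errors_per_year = 2.0
window_days = 2.0  # roughly how long 2.6.19.1 ran before an oops

expected = errors_per_year * window_days / 365.0
p_at_least_one = 1.0 - math.exp(-expected)

print(f"P(at least one bit error in {window_days:g} days) = {p_at_least_one:.4f}")
```

This is only the chance of any random bit error anywhere in the window. Two crashes with an identical eip would additionally require both errors to hit the same location, which makes a random hardware event a much poorer explanation.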

BUG: unable to handle kernel NULL pointer dereference at virtual address
00000009
printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat
xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic
pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer
snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore usblp ehci_hcd
eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid
hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU: 0
EIP: 0060:[<c0156f60>] Not tainted VLI
EFLAGS: 00010246 (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000
esi: ee1b9e9c edi: f4d80a00 ebp: ee1b9c1c esp: ee1b9c0c
ds: 007b es: 007b ss: 0068
Process java (pid: 5374, ti=ee1b8000 task=f7117560 task.ti=ee1b8000)
Stack: 00000000 00000000 ee1b9e9c f6c17160 ee1b9fa4 c015d7f3 ee1b9c54 ee1b9fac
082dff90 00000010 082dffa0 00000000 ee1b9e94 ee1b9e94 00000002 ee1b9eac
00000000 ee1b9e94 c015e580 00000000 00000000 00000002 f6c17160 00000000
Call Trace:
[<c015d7f3>] do_sys_poll+0x253/0x480
[<c015da53>] sys_poll+0x33/0x50
[<c0102c97>] syscall_call+0x7/0xb
[<b7f26402>] 0xb7f26402
=======================
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8
8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f
45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c

Alistair John Strachan

Dec 24, 2006, 9:41:37 AM

to Chuck Ebbert

On Sunday 24 December 2006 04:23, Chuck Ebbert wrote:
> In-Reply-To: <200612231540....@sms.ed.ac.uk>
>
> On Sat, 23 Dec 2006 15:40:46 +0000, Alistair John Strachan wrote:
> > Pretty much like clockwork, it happened again. I think it's time to take
> > this seriously as a software bug, and not some hardware problem. I've run
> > kernels since 2.6.0 on this machine without such crashes, and now two of
> > the same in 2.6.19.1? Pretty unlikely!
>
> Stranger things have happened, e.g. your system might have started
> to overheat just recently.

True, I've considered it, I'll replace the CPU fan.

> Anyway, post your complete .config. And exactly which one of the
> many Via cpus are you using? Are you using the Padlock unit?

No, much older than that:

[alistair] 14:38 [~] cat /proc/cpuinfo
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 9
model name : VIA Nehemiah
stepping : 1
cpu MHz : 999.569
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr cx8 mtrr pge cmov mmx fxsr sse fxsr_opt
bogomips : 2000.02

> What do those java/python programs do that are running? What pipe
> are they polling?
>
> You could try going back to 2.6.18.x for a while in the meantime.

Well, I have had a thought. I recently upgraded the toolchain on the machine
from binutils 2.16.x and GCC 3.4.3 (2.6.19 was built with this) to binutils
2.17 and GCC 4.1.1. It's conceivable that this is some sort of compiler bug.

Alistair John Strachan

Dec 24, 2006, 9:51:55 AM

to Chuck Ebbert

On Sunday 24 December 2006 04:23, Chuck Ebbert wrote:

[snip]

> Anyway, post your complete .config.

Config attached.

config-2.6.19.1

Zhang, Yanmin

Dec 26, 2006, 9:07:57 PM

to Alistair John Strachan

The code above looks weird. Could you disassemble the kernel image and post
the part around address 0xc0156f60?

"87 68 01 00 00" is an xchg instruction, but if I disassemble from the
beginning, I don't see any xchg instruction.

> EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
>

Alistair John Strachan

Dec 27, 2006, 7:35:19 AM

to Zhang, Yanmin

On Wednesday 27 December 2006 02:07, Zhang, Yanmin wrote:
[snip]

> > 00000000 Call Trace:
> > [<c015d7f3>] do_sys_poll+0x253/0x480
> > [<c015da53>] sys_poll+0x33/0x50
> > [<c0102c97>] syscall_call+0x7/0xb
> > [<b7f26402>] 0xb7f26402
> > =======================
> > Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4
> > 89 c8 8b 75
> > f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45
> > ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
>
> Above codes look weird. Could you disassemble kernel image and post
> the part around address 0xc0156f60?
>
> "87 68 01 00 00" is instruction xchg, but if I disassemble from the
> begining, I couldn't see instruct xchg.
>
> > EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c

Unfortunately, after suspecting the toolchain, I did a manual rebuild of
binutils, gcc and glibc from the official sites, and then rebuilt 2.6.19.1.
This might upset the decompile below, versus the original report.

Assuming it's NOT a bug in my distro's toolchain (because I am now running the
GNU stuff), it'll crash again, so this is still useful.

Here's a current decompilation of vmlinux/pipe_poll() from the running kernel,
the addresses have changed slightly. There's no xchg there either:

c0156ec0 <pipe_poll>:
c0156ec0: 55 push %ebp
c0156ec1: 89 e5 mov %esp,%ebp
c0156ec3: 83 ec 10 sub $0x10,%esp
c0156ec6: 89 5d f4 mov %ebx,0xfffffff4(%ebp)
c0156ec9: 85 d2 test %edx,%edx
c0156ecb: 89 d3 mov %edx,%ebx
c0156ecd: 89 75 f8 mov %esi,0xfffffff8(%ebp)
c0156ed0: 89 c6 mov %eax,%esi
c0156ed2: 89 7d fc mov %edi,0xfffffffc(%ebp)
c0156ed5: 8b 40 08 mov 0x8(%eax),%eax
c0156ed8: 8b 40 08 mov 0x8(%eax),%eax
c0156edb: 8b b8 f0 00 00 00 mov 0xf0(%eax),%edi
c0156ee1: 74 0c je c0156eef <pipe_poll+0x2f>
c0156ee3: 85 ff test %edi,%edi
c0156ee5: 74 08 je c0156eef <pipe_poll+0x2f>
c0156ee7: 89 d1 mov %edx,%ecx
c0156ee9: 89 f0 mov %esi,%eax
c0156eeb: 89 fa mov %edi,%edx
c0156eed: ff 13 call *(%ebx)
c0156eef: 0f b7 5e 1c movzwl 0x1c(%esi),%ebx
c0156ef3: 31 c9 xor %ecx,%ecx
c0156ef5: 8b 47 08 mov 0x8(%edi),%eax
c0156ef8: f6 c3 01 test $0x1,%bl
c0156efb: 89 45 f0 mov %eax,0xfffffff0(%ebp)
c0156efe: 74 20 je c0156f20 <pipe_poll+0x60>
c0156f00: 85 c0 test %eax,%eax
c0156f02: b8 41 00 00 00 mov $0x41,%eax
c0156f07: 0f 4f c8 cmovg %eax,%ecx
c0156f0a: 8b 87 5c 01 00 00 mov 0x15c(%edi),%eax
c0156f10: 85 c0 test %eax,%eax
c0156f12: 74 43 je c0156f57 <pipe_poll+0x97>
c0156f14: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
c0156f1a: 8d bf 00 00 00 00 lea 0x0(%edi),%edi
c0156f20: f6 c3 02 test $0x2,%bl
c0156f23: 74 23 je c0156f48 <pipe_poll+0x88>
c0156f25: 83 7d f0 0f cmpl $0xf,0xfffffff0(%ebp)
c0156f29: b8 04 01 00 00 mov $0x104,%eax
c0156f2e: ba 00 00 00 00 mov $0x0,%edx
c0156f33: 8b 9f 58 01 00 00 mov 0x158(%edi),%ebx
c0156f39: 0f 4f c2 cmovg %edx,%eax
c0156f3c: 09 c1 or %eax,%ecx
c0156f3e: 89 c8 mov %ecx,%eax
c0156f40: 83 c8 08 or $0x8,%eax
c0156f43: 85 db test %ebx,%ebx
c0156f45: 0f 44 c8 cmove %eax,%ecx
c0156f48: 8b 5d f4 mov 0xfffffff4(%ebp),%ebx
c0156f4b: 89 c8 mov %ecx,%eax
c0156f4d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
c0156f50: 8b 7d fc mov 0xfffffffc(%ebp),%edi
c0156f53: 89 ec mov %ebp,%esp
c0156f55: 5d pop %ebp
c0156f56: c3 ret
c0156f57: 89 ca mov %ecx,%edx
c0156f59: 8b 46 6c mov 0x6c(%esi),%eax
c0156f5c: 83 ca 10 or $0x10,%edx
c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax
c0156f65: 0f 45 ca cmovne %edx,%ecx
c0156f68: eb b6 jmp c0156f20 <pipe_poll+0x60>
c0156f6a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Zhang, Yanmin

Dec 27, 2006, 9:41:59 PM

to Alistair John Strachan

Could you reproduce the bug with the new kernel, so we can get the exact address
and instruction of the bug?

Alistair John Strachan

Dec 27, 2006, 11:02:49 PM

to Zhang, Yanmin

On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
[snip]

> > Here's a current decompilation of vmlinux/pipe_poll() from the running
> > kernel, the addresses have changed slightly. There's no xchg there
> > either:
>
> Could you reproduce the bug by the new kernel, so we could get the exact
> address and instruction of the bug?

It crashed again, but this time with no output (machine locked solid). To be
honest, the disassembly looks right (it's like Chuck said, it's jumping back
half way through an instruction):

c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax

So c0156f60 is 87 68 01 00 00..
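The same check can be done mechanically against a disassembly listing: find the instruction whose byte range contains the reported eip and see whether eip sits on its first byte. A sketch, assuming listing lines in the single-space layout quoted in this thread (real objdump output is tab-separated, and the helper names here are made up for illustration):

```python
import re

# Three lines copied from the pipe_poll listing posted earlier in the thread.
LISTING = """\
c0156f5c: 83 ca 10 or $0x10,%edx
c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax
c0156f65: 0f 45 ca cmovne %edx,%ecx
"""

def parse_listing(text):
    """Map each instruction's start address to its encoded bytes."""
    insns = {}
    for line in text.splitlines():
        addr, _, rest = line.partition(":")
        raw = []
        for token in rest.split():
            # Leading two-digit hex tokens are opcode bytes; the first
            # token that is not one is taken to be the mnemonic.
            if not re.fullmatch(r"[0-9a-f]{2}", token):
                break
            raw.append(int(token, 16))
        insns[int(addr, 16)] = bytes(raw)
    return insns

def locate(insns, target):
    """Return (start, byte offset) of the instruction containing target."""
    for start, raw in insns.items():
        if start <= target < start + len(raw):
            return start, target - start
    return None

insns = parse_listing(LISTING)
start, off = locate(insns, 0xC0156F60)
print(f"{start:#x}+{off}: remaining bytes {insns[start][off:].hex(' ')}")
```

A non-zero offset means the address is not an instruction boundary, and the remaining bytes are what the CPU would try to decode from that point.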

This is with the GCC recompile, so it's not a distro problem. It could still
either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious. 2.6.19 with
GCC 3.4.3 is 100% stable.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Alistair John Strachan

Dec 27, 2006, 11:14:28 PM

to Zhang, Yanmin

On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> [snip]
>
> > > Here's a current decompilation of vmlinux/pipe_poll() from the running
> > > kernel, the addresses have changed slightly. There's no xchg there
> > > either:
> >
> > Could you reproduce the bug by the new kernel, so we could get the exact
> > address and instruction of the bug?
>
> It crashed again, but this time with no output (machine locked solid). To
> be honest, the disassembly looks right (it's like Chuck said, it's jumping
> back half way through an instruction):
>
> c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax
>
> So c0156f60 is 87 68 01 00 00..
>
> This is with the GCC recompile, so it's not a distro problem. It could
> still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
> 2.6.19 with GCC 3.4.3 is 100% stable.

Looks like a similar crash here:

http://ubuntuforums.org/showthread.php?p=1803389

Alistair John Strachan

Dec 30, 2006, 12:00:20 PM

to Zhang, Yanmin

On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > [snip]
> >
> > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > running kernel, the addresses have changed slightly. There's no xchg
> > > > there either:
> > >
> > > Could you reproduce the bug by the new kernel, so we could get the
> > > exact address and instruction of the bug?
> >
> > It crashed again, but this time with no output (machine locked solid). To
> > be honest, the disassembly looks right (it's like Chuck said, it's
> > jumping back half way through an instruction):
> >
> > c0156f5f: 3b 87 68 01 00 00 cmp 0x168(%edi),%eax
> >
> > So c0156f60 is 87 68 01 00 00..
> >
> > This is with the GCC recompile, so it's not a distro problem. It could
> > still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
> > 2.6.19 with GCC 3.4.3 is 100% stable.
>
> Looks like a similar crash here:
>
> http://ubuntuforums.org/showthread.php?p=1803389

I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for
size" and various debug options. 2.6.19 compiled with GCC 4.1.1 on a Via
Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12
hours.

The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86
passes, and there are no heat problems.

I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using
this compiler (but the same binutils), and will report back if it crashes. My
bet is that it won't, however.

Chuck Ebbert

Dec 30, 2006, 12:27:57 PM

to Alistair John Strachan

In-Reply-To: <200612301659....@sms.ed.ac.uk>

On Sat, 30 Dec 2006 16:59:35 +0000, Alistair John Strachan wrote:

> I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for
> size", various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via
> Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12
> hours.

Which CPU are you compiling for? You should try different options.

Can you post disassembly of pipe_poll() for both the one that crashes
and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
relocation info and post just the one function from each for now.
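Cutting a single function out of a long objdump dump can be scripted. A minimal sketch, assuming the usual layout in which each function starts with an "<address> <name>:" header line; the sample text and the second function name are placeholders, not real output from fs/pipe.o:

```python
import re

# Hypothetical excerpt in the layout objdump prints for an object file;
# the bytes and the second function name are placeholders.
SAMPLE = """\
00000000 <pipe_poll>:
   0:\t55                   \tpush   %ebp
   1:\t89 e5                \tmov    %esp,%ebp

000000b0 <next_function>:
  b0:\t55                   \tpush   %ebp
"""

def extract_function(disassembly, name):
    """Return the lines of one function: its header up to the next header."""
    header = re.compile(r"^[0-9a-f]+ <(?P<sym>[^>]+)>:$")
    picked, inside = [], False
    for line in disassembly.splitlines():
        match = header.match(line)
        if match:
            # A new header either starts the wanted function or ends it.
            inside = match.group("sym") == name
        if inside and line.strip():
            picked.append(line)
    return picked

for line in extract_function(SAMPLE, "pipe_poll"):
    print(line)
```

Relocation lines printed by -r do not match the header pattern, so they stay attached to the function they belong to.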

--
MBTI: IXTP

James Courtier-Dutton

Dec 30, 2006, 1:07:12 PM

to Chuck Ebbert

Chuck Ebbert wrote:
> In-Reply-To: <200612201421....@sms.ed.ac.uk>
>
> On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote:
>
>> Any ideas?
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address
>> 00000009
>
> 83 ca 10 or $0x10,%edx
> 3b .byte 0x3b
> 87 68 01 xchg %ebp,0x1(%eax) <=====
> 00 00 add %al,(%eax)
>
> Somehow it is trying to execute code in the middle of an instruction.
> That almost never works, even when the resulting fragment is a legal
> opcode. :)
>
> The real instruction is:
>
> 3b 87 68 01 00 00 cmp 0x168(%edi),%eax
>
> I'd guess you have some kind of hardware problem. It could also be
> a kernel problem where the saved address was corrupted during an
> interrupt, but that's not likely.

This looks rather strange.
The times I have seen this sort of problem were:
1) when one bit of the kernel is corrupting another part of it.
2) Kernel modules compiled with different gcc than rest of kernel.
3) kernel headers do not match the kernel being used.

One way to start tracking this down would be to run with as few kernel
modules loaded as possible while still reproducing the problem.

James

Alistair John Strachan

Dec 30, 2006, 1:29:14 PM

to Chuck Ebbert

On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> In-Reply-To: <200612301659....@sms.ed.ac.uk>
>
> On Sat, 30 Dec 2006 16:59:35 +0000, Alistair John Strachan wrote:
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > within approximately 12 hours.
>
> Which CPU are you compiling for? You should try different options.

I should; I hadn't thought of that. Currently it's compiling for
CONFIG_MVIAC3_2, but I could try i686 for example.

> Can you post disassembly of pipe_poll() for both the one that crashes
> and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> relocation info and post just the one function from each for now.

Sure, no problem:

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Both use identical configs; neither is optimised for size. The config is
available from the same location.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Alistair John Strachan

Dec 30, 2006, 1:32:33 PM

to James Courtier-Dutton

On Saturday 30 December 2006 18:06, James Courtier-Dutton wrote:
> > I'd guess you have some kind of hardware problem. It could also be
> > a kernel problem where the saved address was corrupted during an
> > interrupt, but that's not likely.
>
> This looks rather strange.

[snip]

> 2) Kernel modules compiled with different gcc than rest of kernel.

Previously there was only one GCC version (4.1.1 totally replaced 3.4.3, and
is the system wide GCC), now I have installed 3.4.6 into /opt/gcc-3.4.6 and
it is only PATH'ed explicitly by me when I wish to compile a kernel using it:

export PATH=/opt/gcc-3.4.6/bin:$PATH
cp /boot/config-2.6.19-test .config
make oldconfig
make
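With two compilers installed side by side, it helps to confirm which one actually built the running kernel. A sketch, assuming the kernel banner (as shown in /proc/version) contains a "gcc version X.Y.Z" field; the sample banner below is made up, and on a live system the string would be read from /proc/version instead:

```python
import re

def kernel_compiler(version_line):
    """Extract the GCC version from a /proc/version-style line, or None."""
    match = re.search(r"gcc version (\d+(?:\.\d+)+)", version_line)
    return match.group(1) if match else None

# Hypothetical banner; the user, host and build date are placeholders.
sample = ("Linux version 2.6.19 (user@host) (gcc version 3.4.6) "
          "#1 Sat Dec 30 12:00:00 GMT 2006")

print(kernel_compiler(sample))
```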

> 3) kernel headers do not match the kernel being used.

The tree is a pristine 2.6.19.

> One way to start tracking this down would be to run it with the fewest
> amount of kernel modules loaded as one can, but still reproduce the
> problem.

Crippling the machine, though. Impractical for something that isn't
immediately reproducible.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Alistair John Strachan

Dec 31, 2006, 8:47:06 AM

to Zhang, Yanmin

On Saturday 30 December 2006 16:59, Alistair John Strachan wrote:
> I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> using this compiler (but the same binutils), and will report back if it
> crashes. My bet is that it won't, however.

Still fine after >24 hours. Linux 2.6.19, GCC 3.4.6, Binutils 2.17.

Adrian Bunk

Dec 31, 2006, 11:27:53 AM

to Alistair John Strachan

There are occasional reports of problems with kernels compiled with
gcc 4.1 that vanish when using older versions of gcc.

AFAIK, until now no one has ever debugged whether that's a gcc bug,
gcc exposing a kernel bug or gcc exposing a hardware bug.

Comparing your report and [1], it seems that if these are the same
problem, it's not a hardware bug but a gcc or kernel bug.

> Cheers,
> Alistair.

cu
Adrian

[1] http://bugzilla.kernel.org/show_bug.cgi?id=7176

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

Adrian Bunk

Dec 31, 2006, 11:28:50 AM

to Alistair John Strachan

On Sat, Dec 30, 2006 at 06:29:15PM +0000, Alistair John Strachan wrote:
> On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > In-Reply-To: <200612301659....@sms.ed.ac.uk>
> >
> > On Sat, 30 Dec 2006 16:59:35 +0000, Alistair John Strachan wrote:
> > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > within approximately 12 hours.
> >
> > Which CPU are you compiling for? You should try different options.
>
> I should, I haven't thought of that. Currently it's compiling for
> CONFIG_MVIAC3_2, but I could try i686 for example.
>
> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
>
> Sure, no problem:
>
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
>
> Both use identical configs, neither are optimised for size. The config is
> available from the same location.

Can you try enabling as many debug options as possible?

> Cheers,
> Alistair.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed


Alistair John Strachan

Dec 31, 2006, 11:48:45 AM

to Adrian Bunk

On Sunday 31 December 2006 16:28, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 06:29:15PM +0000, Alistair John Strachan wrote:
> > On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > > In-Reply-To: <200612301659....@sms.ed.ac.uk>
> > >
> > > On Sat, 30 Dec 2006 16:59:35 +0000, Alistair John Strachan wrote:
> > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > > within approximately 12 hours.
> > >
> > > Which CPU are you compiling for? You should try different options.
> >
> > I should, I haven't thought of that. Currently it's compiling for
> > CONFIG_MVIAC3_2, but I could try i686 for example.
> >
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Can you try enabling as many debug options as possible?

Specifically what? I've already had:

CONFIG_DETECT_SOFTLOCKUP
CONFIG_FRAME_POINTER
CONFIG_UNWIND_INFO

Enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features
actually pertinent to this bug?

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Alistair John Strachan

Dec 31, 2006, 11:55:49 AM

to Adrian Bunk

This bug specifically indicates some kind of miscompilation in a driver,
causing boot time hangs. My problem is quite different, and more subtle. The
crash happens in the same place every time, which does suggest determinism
(even with various options toggled on and off, and a 300K smaller kernel
image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1.

Unless we can start narrowing this down, it would be a mammoth task to seek
out either the kernel or GCC change that first exhibited this bug, due to the
non-immediate reproducibility of the bug, the lack of clues, and this
machine's role as a stable, high-availability server.

(If I had another Epia M10000 or another computer I could reproduce the bug
on, I would be only too happy to boot as many kernels as required to fix it;
however I cannot spare this machine).

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Chuck Ebbert

Dec 31, 2006, 4:48:55 PM

to Alistair John Strachan

In-Reply-To: <200612301829....@sms.ed.ac.uk>

On Sat, 30 Dec 2006 18:29:15 +0000, Alistair John Strachan wrote:

> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
>
> Sure, no problem:
>
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
>
> Both use identical configs, neither are optimised for size. The config is
> available from the same location.

Those were compiled without frame pointers. Can you post them compiled
with frame pointers so they match your original bug report? And confirm
that pipe_poll() is still at 0xc0156ec0 in vmlinux?
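One way to confirm a symbol's address is to look it up in the System.map written by the same build. A sketch, assuming the usual "<hex address> <type> <name>" line format; in the sample text only the pipe_poll address comes from this thread, and the type letters and neighbouring symbols are placeholders:

```python
def lookup_symbol(system_map_text, name):
    """Return the address of `name` from System.map-style text, or None."""
    for line in system_map_text.splitlines():
        fields = line.split()
        # Each line is "<hex address> <type letter> <symbol name>".
        if len(fields) == 3 and fields[2] == name:
            return int(fields[0], 16)
    return None

# Placeholder excerpt: only the pipe_poll address comes from the thread;
# the type letters and the neighbouring symbols are made up.
SAMPLE_MAP = """\
c0156e00 t example_before
c0156ec0 t pipe_poll
c0156f70 t example_after
"""

addr = lookup_symbol(SAMPLE_MAP, "pipe_poll")
print(hex(addr))
```

If the address differs between two builds, offsets such as +0xa0 from an older oops no longer point at the same instruction in the new listing.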

--
MBTI: IXTP

Alistair John Strachan

Dec 31, 2006, 5:16:55 PM

to Chuck Ebbert

On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> In-Reply-To: <200612301829....@sms.ed.ac.uk>
>
> On Sat, 30 Dec 2006 18:29:15 +0000, Alistair John Strachan wrote:
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Those were compiled without frame pointers. Can you post them compiled
> with frame pointers so they match your original bug report? And confirm
> that pipe_poll() is still at 0xc0156ec0 in vmlinux?

c0156ec0 <pipe_poll>:

I used the config I originally sent you to rebuild it again. This time I've put
up the whole vmlinux for both kernels, replaced the config, and redone the
decompilation. I've confirmed the offset in the GCC 4.1.1 kernel is identical.
Sorry for the confusion.

The reason I changed the configs was to experiment with enabling and disabling
debugging (and other such) options that might have shaken out compiler bugs.

However, none of these kernels has ever crashed gracefully again; most of them
hang the machine (no nmi watchdog though), so I've not been able to look at
the oops. It's the same root cause, however, as GCC 3.4.6 kernels do not
crash.

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Happy new year, btw.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Adrian Bunk

Jan 2, 2007, 4:11:09 PM

to Alistair John Strachan

>...

Sorry if my point goes a bit away from your problem:

My point is that we have several reported problems only visible
with gcc 4.1.

Other bug reports are e.g. [2] and [3], but they are only present when
using gcc 4.1 _and_ using -Os.

There's simply a bunch of bugs only present with gcc 4.1, and what
worries me most is that the estimated number of unknown cases is most
likely very high since most people won't check different compiler
versions when running into a problem.

> Cheers,
> Alistair.

cu
Adrian

[1] http://bugzilla.kernel.org/show_bug.cgi?id=7176
[2] http://bugzilla.kernel.org/show_bug.cgi?id=7106
[3] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=186852

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

Adrian Bunk

Jan 2, 2007, 4:12:46 PM1/2/07

to Alistair John Strachan

No, that's only an "enable as much as possible and hope one helps" shot
in the dark.

> Cheers,
> Alistair.

cu
Adrian

Alistair John Strachan

Jan 2, 2007, 4:56:50 PM1/2/07

to Adrian Bunk

On Tuesday 02 January 2007 21:10, Adrian Bunk wrote:
[snip]

> > > Comparing your report and [1], it seems that if these are the same
> > > problem, it's not a hardware bug but a gcc or kernel bug.
> >
> > This bug specifically indicates some kind of miscompilation in a driver,
> > causing boot time hangs. My problem is quite different, and more subtle.
> > The crash happens in the same place every time, which does suggest
> > determinism (even with various options toggled on and off, and a 300K
> > smaller kernel image), but it takes 8-12 hours to manifest and only
> > happens with GCC 4.1.1. ...
>
> Sorry if my point goes a bit away from your problem:
>
> My point is that we have several reported problems only visible
> with gcc 4.1.
>
> Other bug reports are e.g. [2] and [3], but they are only present with
> using gcc 4.1 _and_ using -Os.

I find [2] most compelling, and I can confirm that I do have the same problem
with or without optimisation for size. I don't use selinux nor has it ever
been enabled.

At any rate, I have absolute confirmation that it is GCC 4.1.1, because with
GCC 3.4.6 the same kernel I reported booting three days ago is still
cheerfully working. I regularly get uptimes of 60+ days on that machine,
rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this
regard.

Perhaps fortunately, the configs I've tried have consistently failed to shake
the crash, so I have a semi-reproducible test case here on C3-2 hardware if
somebody wants to investigate the problem (though it still takes 6-12 hours).

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Linus Torvalds

Jan 2, 2007, 5:05:08 PM1/2/07

to Adrian Bunk

On Tue, 2 Jan 2007, Adrian Bunk wrote:
>
> My point is that we have several reported problems only visible
> with gcc 4.1.
>
> Other bug reports are e.g. [2] and [3], but they are only present with
> using gcc 4.1 _and_ using -Os.

Traditionally, afaik, -Os has tended to show compiler problems that
_could_ happen with -O2 too, but never do in practice. It may be that
gcc-4.1 without -Os miscompiles some very unusual code, and then with -Os
we just hit more cases of that.

That said, I think gcc-4.1.1 is very common - I know it's the Fedora
compiler. Also, CC_OPTIMIZE_FOR_SIZE defaults to 'y' if you have
EXPERIMENTAL on, and from all the bug-reports about other features that
are marked EXPERIMENTAL, I know that a lot of people do seem to select for
it. So I would expect that gcc-4.1.1 and -Os is actually a fairly common
combination. I just checked, and it's what I use personally, for example.

Of course, my main machine is an x86-64, and it has more registers. At
least some historical -Os bug was about bad things happening under
register pressure, iirc, and so x86-64 would show fewer problems than
regular 32-bit x86 (which has far fewer registers for the compiler to
use).

It is a bit worrisome. These things seem to be about 50:50 real kernel
bugs (just hidden by some common code generation sequence) and real
honest-to-goodness compiler bugs. But they are hard as hell to find.

Linus

D. Hazelton

Jan 2, 2007, 5:06:48 PM1/2/07

to Alistair John Strachan

The GCC code generator appears to have been rewritten between 3.4.6 and
4.1.1....

I took a look at the dump he posted and there are some minor and some massive
differences between the code. In one case some of the code is swapped, in
another there is code in the 3.4.6 version that isn't in the 4.1.1... Finally
the 4.1.1 version of the function has what appears to be function calls and
these don't appear in the code generated by 3.4.6

In other words - the code generation for 4.1.1 appears to be broken when it
comes to generating system code.

DRH

Linus Torvalds

Jan 2, 2007, 5:18:35 PM1/2/07

to Alistair John Strachan

On Tue, 2 Jan 2007, Alistair John Strachan wrote:
>
> At any rate, I have absolute confirmation that it is GCC 4.1.1, because with
> GCC 3.4.6 the same kernel I reported booting three days ago is still
> cheerfully working. I regularly get uptimes of 60+ days on that machine,
> rebooting only for kernel upgrades. 2.6.19 seems to be no worse in this
> regard.
>
> Perhaps fortunately, the configs I've tried have consistently failed to shake
> the crash, so I have a semi-reproducible test case here on C3-2 hardware if
> somebody wants to investigate the problem (though it still takes 6-12 hours).

Historically, some people have actually used horrible hacks like trying to
figure out which particular C file gets miscompiled by basically having
both compilers installed, and then trying out different subdirectories
with different compilers. And once the subdirectory has been pinpointed,
pinpointing which particular file it is.. etc.

Pretty damn horrible to do, and I'm afraid we don't have any real helpful
scripts to do any of the work for you. So it's all effectively manual
(basically boils down to: "compile everything with known-good compiler.
Then replace the good compiler with the bad one, remove the object files
from one directory, and recompile the kernel". "Rinse and repeat").

I don't think anybody has ever done that with something where triggering
the cause then also takes that long - that just ends up making the whole
thing even more painful.

What are the exact crash details? That might narrow things down enough
that maybe you could try just one or two files that are "suspect".

Linus

David Rientjes

Jan 2, 2007, 6:13:35 PM1/2/07

to Linus Torvalds

On Tue, 2 Jan 2007, Linus Torvalds wrote:

> Traditionally, afaik, -Os has tended to show compiler problems that
> _could_ happen with -O2 too, but never do in practice. It may be that
> gcc-4.1 without -Os miscompiles some very unusual code, and then with -Os
> we just hit more cases of that.
>

gcc optimizations were almost completely rewritten between 3.4.6 and 4.1,
and one of the subtle changes that may have been introduced is with regard
to the heuristics used to determine whether to inline an 'inline' function
or not when using -Os. This problem can show up in dynamic linking and
break on certain architectures but should be detectable by using -Winline.

David

Alistair John Strachan

Jan 2, 2007, 6:18:04 PM1/2/07

to Linus Torvalds

Linus,

On Tuesday 02 January 2007 22:13, Linus Torvalds wrote:
[snip]

> What are the exact crash details? That might narrow things down enough
> that maybe you could try just one or two files that are "suspect".

I'll do a digest of the problem for you and anybody else that's lost track of
the debugging story so far..

There are no hardware problems evidenced by any testing I have performed
(memtest, prime95 CPU torture tests, temp monitors). Furthermore, kernels
compiled with older GCCs have been running without problems for literally
years on this machine.

Here is an example of an oops. The kernel continued to limp along after this.

BUG: unable to handle kernel NULL pointer dereference at virtual address
00000009
printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat
xt_state iptable_filter ip_tables x_tables prism54 yenta_socket
rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd
usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211
hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU: 0
EIP: 0060:[<c0156f60>] Not tainted VLI
EFLAGS: 00010246 (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000
esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c
ds: 007b es: 007b ss: 0068
Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000)
Stack: 00000000 00000000 f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac
084c44a0 00000030 084c44d0 00000000 f70f3e94 f70f3e94 00000006 f70f3ecc
00000000 f70f3e94 c015e580 00000000 00000000 00000006 f6e111c0 00000000
Call Trace:
[<c015d7f3>] do_sys_poll+0x253/0x480
[<c015da53>] sys_poll+0x33/0x50
[<c0102c97>] syscall_call+0x7/0xb
[<b7f6b402>] 0xb7f6b402
=======================
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8
8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f
45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c

Chuck observed that the kernel tries to reenter pipe_poll half way through an
instruction (c0156f5f->c0156f60); it's not a single-bit error but an
off-by-one.

On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote:
> In-Reply-To: <200612201421....@sms.ed.ac.uk>
>
> On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote:
> > Any ideas?
> >
> > BUG: unable to handle kernel NULL pointer dereference at virtual address
> > 00000009
>
> 83 ca 10 or $0x10,%edx
> 3b .byte 0x3b
> 87 68 01 xchg %ebp,0x1(%eax) <=====
> 00 00 add %al,(%eax)
>
> Somehow it is trying to execute code in the middle of an instruction.
> That almost never works, even when the resulting fragment is a legal
> opcode. :)
>
> The real instruction is:
>

> 3b 87 68 01 00 00 cmp 0x168(%edi),%eax
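
As a rough cross-check of the decode above, the arithmetic can be sketched in C. This is only an illustration: the constants are copied from the oops and from the vmlinux symbol address quoted in this thread, and the helper names are made up for the example.

```c
#include <stdint.h>

/* Values copied from the oops report; the helper names are hypothetical. */
#define PIPE_POLL_START 0xc0156ec0u  /* c0156ec0 <pipe_poll> in vmlinux */
#define FAULT_EIP       0xc0156f60u  /* "printing eip" in the oops */
#define FAULT_EAX       0x00000008u  /* eax at the time of the oops */

/* Offset of an address from the start of pipe_poll(). */
static uint32_t pipe_poll_offset(uint32_t eip)
{
	return eip - PIPE_POLL_START;
}

/*
 * Effective address touched by the bogus "xchg %ebp,0x1(%eax)" that
 * results from starting to decode at the 0x87 byte: the base register
 * plus an 8-bit displacement of 1.
 */
static uint32_t bogus_xchg_address(uint32_t eax)
{
	return eax + 1;
}
```

With these inputs the offset comes out as 0xa0, matching pipe_poll+0xa0, and the bogus xchg touches address 0x9, matching the reported fault address 00000009. The real cmp opcode byte 0x3b sits one byte earlier, at offset 0x9f.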

I've tried a multitude of kernel configs and compiler options, but none have
made any difference. That first oops was pretty lucky; very often the machine
locks up after oopsing (panic_on_oops=1 doesn't work). I've not seen oopses
anywhere but in pipe_poll, but I've not seen many oopses.

The machine runs jabberd 2.x which uses separate python processes as
transports to different networks. The server hosts 50-100 users. One of my
oops reports had Java crashing in the same place, that's Azureus.

I've got binutils 2.17, gcc 4.1.1 hand bootstrapped from GNU sources (not
distro versions). I've got another, secondary compiler (3.4.6), also compiled
from GNU sources, installed elsewhere which I have used to build working
kernels. So the only variable, for sure, is GCC itself.

Both compilers were built with "make bootstrap" and I built binutils with the
resulting GCC, and GCC with the resulting binutils, just to be sure. The only
slightly non-standard thing I do is to compile everything (GCC, binutils, the
kernels) on a dual-opteron box, inside a 32bit chroot, which is rsync'ed over
to the Via C3-2 box with the problem. I can't see how this would cause any
problems (and indeed have done it successfully for years), but I thought I'd
point it out.

The crashes take time to appear, which is why so many people suspected
hardware initially. But the uptime of a GCC 4.1.1 kernel will always be less
than 12 hours, where a 3.4.6 kernel will survive for months. I've had no
other mysterious software crashes, ever.

On Sunday 31 December 2006 22:16, Alistair John Strachan wrote:
> On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> > Those were compiled without frame pointers. Can you post them compiled
> > with frame pointers so they match your original bug report? And confirm
> > that pipe_poll() is still at 0xc0156ec0 in vmlinux?
>
> c0156ec0 <pipe_poll>:
>
> I used the config I originally sent you to rebuild it again. This time I've
> put up the whole vmlinux for both kernels, the config is replaced, the
> decompilation is re-done, I've confirmed the offset in the GCC 4.1.1 kernel
> is identical. Sorry for the confusion.

[snip]
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

At the above URL can be found vmlinux images, the config used to build both,
and decompilations of the fs/pipe.o file (with relocation information).

The suggestions I've had so far which I have not yet tried:

- Select a different x86 CPU in the config.
    - Unfortunately the C3-2 flags seem to simply tell GCC
      to schedule for ppro (like i686) and enable MMX and SSE
    - Probably useless

- Enable as many debug options as possible ("a shot in the dark")

- Try compiling a minimal kernel config, sans modules that are not required
for booting. The problem with this one (whilst it might uncover some bizarre
memory scribbling or stack corruption) is that the machine's primary role is
that of a router, so I require most of the modules loaded for the oops to be
reproduced (chicken, egg?).

If I can provide any more information, please do let me know.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Adrian Bunk

Jan 2, 2007, 6:24:53 PM1/2/07

to D. Hazelton

Differences are expected since we disable unit-at-a-time for gcc < 4
and gcc development didn't stall between 3.4 and 4.1.

> In other words - the code generation for 4.1.1 appears to be broken when it
> comes to generating system code.

Bug number for an either already open or created by you bug in the gcc
Bugzilla for what you claim to be a bug in gcc?

> DRH

cu
Adrian

D. Hazelton

Jan 2, 2007, 6:42:09 PM1/2/07

to Adrian Bunk

Okay. Thing is that these noted differences, aside from where 4.1.1 doesn't
generate an opcode that 3.4.6 does, aren't all that fatal, IMHO. The fact that
it generates calls rather than jumps for local pointer moves
(IIRC - been a while since I looked at the dump of pipe_poll that he
provided) might be part of the problem.

> > In other words - the code generation for 4.1.1 appears to be broken when
> > it comes to generating system code.
>
> Bug number for an either already open or created by you bug in the gcc
> Bugzilla for what you claim to be a bug in gcc?

None. I didn't file a report on this because I didn't find the bug, just noted
a problem that appears to occur. In this case the calls generated seem to
wrap loops - something I've never heard of anyone doing. These *might* be
causing the off-by-one that is causing the function to re-enter in the middle
of an instruction.

Seeing this I'd guess that this follows for all system-level code generated by
4.1.1 and this is exactly what I was reporting. If you'd like I'll go dig up
the dumps he posted and post the two related segments side-by-side to give
you a better example what I'm referring to.

DRH

Linus Torvalds

Jan 2, 2007, 8:47:22 PM1/2/07

to Alistair John Strachan

On Tue, 2 Jan 2007, Alistair John Strachan wrote:
>

> eax: 00000008 ebx: 00000000 ecx: 00000008 edx: 00000000
> esi: f70f3e9c edi: f7017c00 ebp: f70f3c1c esp: f70f3c0c
>

> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8
> 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f
> 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c
>
> Chuck observed that the kernel tries to reenter pipe_poll half way through an
> instruction (c0156f5f->c0156f60); it's not a single-bit error but an
> off-by-one.

It's not an off-by-one either (eg say we're taking an exception and
screwing up %eip by one somehow).

The code sequence in question is

	mov    %ecx,%edx
	mov    0x6c(%esi),%eax
	or     $0x10,%edx
	cmp    0x168(%edi),%eax    <--
	cmovne %edx,%ecx
	jmp    ...

and it's in the second byte of the "cmp".

And yes, it definitely entered there, because trying other random
entry-points will have either invalid instructions or instructions that
would fault due to NULL pointers. HOWEVER, it's also not as simple as
"took an interrupt, and returned with %eip incremented by one", because
your %edx is zero, so it won't have done that "or $0x10,%edx" and then some
interrupt happened and screwed up just %eip.

So it's literally a random %eip, but since you say it's consistently in
that function, it's not truly "random". There's something that triggers it
just _there_.

However, that's a damn simple function. There's _nothing_ there. The
particular code that is involved right there is literally

	if (!pipe->writers && filp->f_version != pipe->w_counter)
		mask |= POLLHUP;

and that's it. There's not even anything half-way interesting around it,
except for the "poll_wait()" call, but even that is about as common as
you can humanly get..

Looking at the register set and the stack, I see:

Stack: 00000000
00000000 <- saved %ebx (dunno, seems dead in caller)
f70f3e9c <- saved %esi (== pollfd in do_pollfd)
f6e111c0 <- saved %edi (== filp)
f70f3fa4 <- outer EBP (looks reasonable)
c015d7f3 <- return address (do_sys_poll+0x253/0x480)

and the strange thing is that when the oops happens, it really looks like
%esi _still_ contains the value it had originally (and that is saved on
the stack). But afaik, from your disassembly, it should have been
overwritten by the initial %eax, which should have had the same value as
%edi on entry...

IOW, none of it really makes any sense. The stack frames look fine, so we
_did_ enter at the beginning of the function (and it wasn't the *poll fn
pointer that was corrupt).

> The suggestions I've had so far which I have not yet tried:
>
> - Select a different x86 CPU in the config.
> - Unfortunately the C3-2 flags seem to simply tell GCC
> to schedule for ppro (like i686) and enabled MMX and SSE
> - Probably useless

Actually, try this one. Try using something that doesn't like "cmov".
Maybe the C3-2 simply has some internal cmov bugginess.

Linus

Horst H. von Brand

Jan 2, 2007, 9:07:38 PM1/2/07

to D. Hazelton

D. Hazelton <dhaz...@enter.net> wrote:

[...]

> None. I didn't file a report on this because I didn't find the bug, just
> noted a problem that appears to occur. In this case the calls generated
> seem to wrap loops - something I've never heard of anyone doing.

Example code showing this weirdness?

> These
> *might* be causing the off-by-one that is causing the function to
> re-enter in the middle of an instruction.

If something like this happened, programs would be crashing left and right.

> Seeing this I'd guess that this follows for all system-level code
> generated by 4.1.1

Define "system-level code". What makes it different from, say,
bog-of-the-mill compiler code (yes, gcc compiles itself as part of its
sanity checking)?

> and this is exactly what I was reporting. If you'd
> like I'll go dig up the dumps he posted and post the two related segments
> side-by-side to give you a better example what I'm referring to.

If the related segments show code that is somehow wrong, by all means
report it /with your detailed analysis/ to the compiler people. Just a
warning, gcc is pretty smart in what it does, its code is often surprising
to the unwashed. Also, the C standard is subtle, the error might be in an
unwarranted assumption in the source code.

Alistair John Strachan

Jan 2, 2007, 9:21:22 PM1/2/07

to Mikael Pettersson

On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:

> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > >
> > > - Select a different x86 CPU in the config.
> > > - Unfortunately the C3-2 flags seem to simply tell GCC
> > > to schedule for ppro (like i686) and enabled MMX and SSE
> > > - Probably useless
> >
> > Actually, try this one. Try using something that doesn't like "cmov".
> > Maybe the C3-2 simply has some internal cmov bugginess.
>

> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

Or just C3 (not C3-2), which is what I've done.

I'll report back whether it crashes or not.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Mikael Pettersson

Jan 2, 2007, 9:22:06 PM1/2/07

to s034...@sms.ed.ac.uk, torv...@osdl.org

On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > The suggestions I've had so far which I have not yet tried:
> >
> > - Select a different x86 CPU in the config.
> > - Unfortunately the C3-2 flags seem to simply tell GCC
> > to schedule for ppro (like i686) and enabled MMX and SSE
> > - Probably useless
>
> Actually, try this one. Try using something that doesn't like "cmov".
> Maybe the C3-2 simply has some internal cmov bugginess.

That's a good suggestion. Earlier C3s didn't have cmov so it's
not entirely unlikely that cmov in C3-2 is broken in some cases.
Configuring for P5MMX or 486 should be good safe alternatives.

/Mikael

Willy Tarreau

Jan 3, 2007, 12:59:11 AM1/3/07

to Mikael Pettersson

On Wed, Jan 03, 2007 at 03:12:13AM +0100, Mikael Pettersson wrote:
> On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > The suggestions I've had so far which I have not yet tried:
> > >
> > > - Select a different x86 CPU in the config.
> > > - Unfortunately the C3-2 flags seem to simply tell GCC
> > > to schedule for ppro (like i686) and enabled MMX and SSE
> > > - Probably useless
> >
> > Actually, try this one. Try using something that doesn't like "cmov".
> > Maybe the C3-2 simply has some internal cmov bugginess.
>
> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.

Agreed! When I developed the cmov emulator, I used an early C3 for the
tests (well, a "Samuel2" to be precise), because it did not report "cmov"
in its flags. I first thought "wow, my emulator is amazingly fast!" because
it took something like 50 cycles to do cmovne %eax,%ebx.

Then I realized that this processor performed cmov itself between
registers, and only triggered the invalid opcode when one of the operands
was a memory reference. And this time, for a hard-coded instruction, it
was really slow...

For this reason, I would not be surprised at all that there would be some
buggy behaviour in the cmov right there. Maybe a bug in the decoder unit
making it skip a byte when the next instruction in the prefetch queue is
a cmov affecting same registers... When vendors can do dirty things such
as executing unsupported instructions, we can expect anything from them.
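
For readers wondering what a cmov emulator has to decide, here is a minimal sketch in C of just the condition check. It is not the emulator described above: the function name is invented for illustration, and the condition encoding and EFLAGS bit positions are the standard x86 ones rather than anything taken from this thread.

```c
#include <stdint.h>

/* Standard x86 EFLAGS bit positions. */
#define X86_CF (1u << 0)
#define X86_PF (1u << 2)
#define X86_ZF (1u << 6)
#define X86_SF (1u << 7)
#define X86_OF (1u << 11)

/*
 * cmovcc is encoded as 0f 40+cc /r. Given the low nibble "cc" of the
 * second opcode byte and the saved EFLAGS, return 1 if the move should
 * happen. Hypothetical helper, not taken from any real emulator.
 */
static int cmov_condition_holds(unsigned int cc, uint32_t eflags)
{
	int cf = !!(eflags & X86_CF);
	int pf = !!(eflags & X86_PF);
	int zf = !!(eflags & X86_ZF);
	int sf = !!(eflags & X86_SF);
	int of = !!(eflags & X86_OF);
	int result;

	/* Bits 3..1 select the test, bit 0 negates it. */
	switch ((cc >> 1) & 7) {
	case 0:  result = of; break;                 /* o  / no */
	case 1:  result = cf; break;                 /* b  / ae */
	case 2:  result = zf; break;                 /* e  / ne */
	case 3:  result = cf || zf; break;           /* be / a  */
	case 4:  result = sf; break;                 /* s  / ns */
	case 5:  result = pf; break;                 /* p  / np */
	case 6:  result = (sf != of); break;         /* l  / ge */
	default: result = zf || (sf != of); break;   /* le / g  */
	}
	return (cc & 1) ? !result : result;
}
```

As an example of the evaluation only: with an EFLAGS value of 00010246 (ZF set), cc = 0x5, which is the "0f 45" cmovne encoding, evaluates to false, while cc = 0x4 (cmove) evaluates to true.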

> Configuring for P5MMX or 486 should be good safe alternatives.

I generally use the P5MMX target for such processors.

> /Mikael

Regards,
Willy

Alan

Jan 3, 2007, 5:22:01 AM1/3/07

to Mikael Pettersson

> That's a good suggestion. Earlier C3s didn't have cmov so it's
> not entirely unlikely that cmov in C3-2 is broken in some cases.
> Configuring for P5MMX or 486 should be good safe alternatives.

The proper fix for all of this mess is to fix the gcc compiler suite to
actually generate i686 code when told to use i686. CMOV is an optional
i686 extension which gcc uses without checking. In early PIV days it made
sense but on modern processors CMOV is so pointless the bug should be
fixed. At that point an i686 kernel would contain i686 instructions and
actually run on all i686 processors ending all the i586 pain for most
users and distributions.

Unfortunately the compiler people don't appear to care about their years
old bug.

Alan

Grzegorz Kulewski

Jan 3, 2007, 5:33:06 AM1/3/07

to Alan

On Wed, 3 Jan 2007, Alan wrote:
> The proper fix for all of this mess is to fix the gcc compiler suite to
> actually generate i686 code when told to use i686. CMOV is an optional
> i686 extension which gcc uses without checking. In early PIV days it made
> sense but on modern processors CMOV is so pointless the bug should be
> fixed. At that point an i686 kernel would contain i686 instructions and
> actually run on all i686 processors ending all the i586 pain for most
> users and distributions.

Could you explain why CMOV is pointless now? Are there any benchmarks
proving that?

Thanks,

Grzegorz Kulewski

Jeff Garzik

Jan 3, 2007, 6:53:12 AM1/3/07

to Grzegorz Kulewski

Grzegorz Kulewski wrote:
> On Wed, 3 Jan 2007, Alan wrote:
>> The proper fix for all of this mess is to fix the gcc compiler suite to
>> actually generate i686 code when told to use i686. CMOV is an optional
>> i686 extension which gcc uses without checking. In early PIV days it made
>> sense but on modern processors CMOV is so pointless the bug should be
>> fixed. At that point an i686 kernel would contain i686 instructions and
>> actually run on all i686 processors ending all the i586 pain for most
>> users and distributions.
>
> Could you explain why CMOV is pointless now? Are there any benchmarks
> proving that?

In theory modern processors should have no trouble converting a
test/move sequence into the same uops generated by a cmov instruction,
for one.

Jeff

Alan

Jan 3, 2007, 7:38:10 AM1/3/07

to Grzegorz Kulewski

> > fixed. At that point an i686 kernel would contain i686 instructions and
> > actually run on all i686 processors ending all the i586 pain for most
> > users and distributions.
>
> Could you explain why CMOV is pointless now? Are there any benchmarks
> proving that?

Take a look at the recent ffmpeg bits on the mplayer list for one example
I have to hand - P4 cmov is pretty slow. The crypto folks find the same
things.

Alan

Arjan van de Ven

Jan 3, 2007, 8:37:50 AM1/3/07

to Alan

On Wed, 2007-01-03 at 12:44 +0000, Alan wrote:
> > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > actually run on all i686 processors ending all the i586 pain for most
> > > users and distributions.
> >
> > Could you explain why CMOV is pointless now? Are there any benchmarks
> > proving that?
>
> Take a look at the recent ffmpeg bits on the mplayer list for one example
> I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> things.

cmov is effectively the same cost as a compare and jump, in both cases
the cpu needs to do a prediction, and on a mispredict, restart.

the reason cmov can make sense is because it's smaller code...

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

Jakub Jelinek

Jan 3, 2007, 9:01:38 AM1/3/07

to Arjan van de Ven

On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote:
> On Wed, 2007-01-03 at 12:44 +0000, Alan wrote:
> > > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > > actually run on all i686 processors ending all the i586 pain for most
> > > > users and distributions.
> > >
> > > Could you explain why CMOV is pointless now? Are there any benchmarks
> > > proving that?
> >
> > Take a look at the recent ffmpeg bits on the mplayer list for one example
> > I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> > things.
>
> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.
>
> the reason cmov can make sense is because it's smaller code...

BTW, from GCC POV availability of CMOV is the only difference between
-march=i586 -mtune=something and -march=i686 -mtune=something. So this is
just a naming thing, it could be called -march=i686cmov to make it more
obvious but it is too late (and too unimportant) to change it now.
Perhaps adding a note to info gcc/man gcc ought to be enough?
If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic
(or whatever other tuning you pick up), with -march=i686 -mtune=generic
you tell GCC you have CMOV. Whether CMOV is actually used in generated
code is another matter, which should be decided based on the selected
-mtune. For -Os CMOV should be used whenever available, as that means
usually smaller code, otherwise if on some particular chip CMOV is actually
slower than compare, jump and assignment, then CMOV should not be selected
for that particular tuning (say if Pentium4 has slower CMOV than
compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not
often), if you have examples of that, please file a bug to
http://gcc.gnu.org/bugzilla/. -mtune=generic should emit resp. not emit
CMOV depending on whether it is a win on the currently common CPUs.

Jakub

Alan

Jan 3, 2007, 9:24:20 AM1/3/07

to Arjan van de Ven

> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.

On a P4 it appears to be slower than compare/jump in most cases

Linus Torvalds

Jan 3, 2007, 11:11:19 AM1/3/07

to Grzegorz Kulewski

On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
>
> Could you explain why CMOV is pointless now? Are there any benchmarks proving
> that?

CMOV (and, more generically, any "predicated instruction") tends to be
generally a bad idea on an aggressively out-of-order CPU. It doesn't
always have to be horrible, but in practice it is seldom very nice, and
(as usual) on the P4 it can be really quite bad.

On a P4, I think a cmov basically takes 10 cycles.

But even ignoring the usual P4 "I suck at things that aren't totally
normal", cmov is actually not a great idea. You can always replace it by

		j<negated condition> forward
		mov ..., %reg
	forward:

and assuming the branch is AT ALL predictable (and 95+% of all branches
are), the branch-over will actually be a LOT better for a CPU.

Why? Because branches can be predicted, and when they are predicted they
basically go away. They go away on many levels, too. Not just the branch
itself, but the _conditional_ for the branch goes away as far as the
critical path of code is concerned: the CPU still has to calculate it and
check it, but from a performance angle it "doesn't exist any more",
because it's not holding anything else up (well, you want to do it in
_some_ reasonable time, but the point stands..)

Similarly, whichever side of the branch wasn't taken goes away. Again, in
an out-of-order machine with register renaming, this means that even if
the branch isn't taken above, and you end up executing all the non-branch
instructions, because you now UNCONDITIONALLY over-write the register, the
old data in the register is now DEAD, so now all the OTHER writes to that
register are off the critical path too!

So the end result is that with a conditional branch, on a good CPU, the
_only_ part of the code that is actually performance-sensitive is the
actual calculation of the value that gets used!

In contrast, if you use a predicated instruction, ALL of it is on the
critical path. Calculating the conditional is on the critical path.
Calculating the value that gets used is obviously ALSO on the critical
path, but so is the calculation for the value that DOESN'T get used too.
So the cmov - rather than speeding things up - actually slows things down,
because it makes more code be dependent on each other.

So here's the basic rule:

- cmov is sometimes nice for code density. It's not a big win, but it
certainly can be a win.

- if you KNOW the branch is totally unpredictable, cmov is often good for
performance. But a compiler almost never knows that, and even if you
train it with input data and profiling, remember that not very many
branches _are_ totally unpredictable, so even if you were to know that
something is unpredictable, it's going to be very rare.

- on a P4, branch mispredictions are expensive, but so is cmov, so all
the above is to some degree exaggerated. On nicer microarchitectures
(the Intel Core 2 in particular is something I have to say is very nice
indeed), the difference will be a lot less noticeable. The loss from
cmov isn't very big (it's not as sucky as P4), but neither is the win
(branch misprediction isn't that expensive either).

Here's an example program that you can test and time yourself.

On my Core 2, I get

[torvalds@woody ~]$ gcc -DCMOV -Wall -O2 t.c
[torvalds@woody ~]$ time ./a.out
600000000

real 0m0.194s
user 0m0.192s
sys 0m0.000s

[torvalds@woody ~]$ gcc -Wall -O2 t.c
[torvalds@woody ~]$ time ./a.out
600000000

real 0m0.167s
user 0m0.168s
sys 0m0.000s

ie the cmov is quite a bit slower. Maybe I did something wrong. But note
how cmov not only is slower, it's fundamentally more limited too (ie the
branch-over can actually do a lot of things cmov simply cannot do).
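
The attached t.c is not reproduced in this archive. As a rough stand-in, here is a portable sketch of the same kind of loop: pick one of two constants based on one bit of the loop counter, with a branch by default and with a select when built with -DCMOV. Plain C cannot force the compiler to emit a cmov or a branch, so check the generated assembly (gcc -S) before reading anything into timings. All names and the choice of constants are assumptions.

```c
/* Which bit of the counter do we test? (illustrative choice) */
#define TEST_BIT 1UL

#ifdef CMOV
/* Select form: both a and b feed the result on every iteration. */
static unsigned long choose(unsigned long i, unsigned long a, unsigned long b)
{
	unsigned long mask = -(unsigned long)((i & TEST_BIT) != 0);

	return (a & mask) | (b & ~mask);
}
#else
/* Branch form: conditionally overwrite the default value. */
static unsigned long choose(unsigned long i, unsigned long a, unsigned long b)
{
	unsigned long result = b;

	if (i & TEST_BIT)
		result = a;
	return result;
}
#endif

/* Sum the chosen constant over `iterations` values of the counter. */
static unsigned long run(unsigned long iterations)
{
	unsigned long i, sum = 0;

	for (i = 0; i < iterations; i++)
		sum += choose(i, 5, 7);
	return sum;
}
```

Calling run() with a large iteration count from a main() that prints the result, and timing the binary built with and without -DCMOV, gives a comparison in the spirit of the numbers above.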

So don't use cmov. Except for non-performance-critical code, or if you
really care about code-size, and it helps (which is actually fairly rare:
quite often cmov isn't even smaller than a conditional jump and a regular
move, partly because a regular move can take arguments that a cmov cannot:
move to memory, move from an immediate etc etc, so depending on what
you're moving, cmov simply isn't good even if it's _just_ a move).

(For me, the "cmov" version of the function ends up being three bytes
shorter. So it's actually a good example of everything above)

Linus

(*) x86 only has "move to register" as a predicated instruction, but some
other architectures have lots of them, potentially all instructions. I
don't count conditional branches as "predicated", although some crazy
people do. ARM has predicated instructions (but they are gone in Thumb, I
think), and ia64 obviously has predicated instructions (but it will be
gone in a few years ;)

t.c

Linus Torvalds

Jan 3, 2007, 11:11:29 AM

to Alan

On Wed, 3 Jan 2007, Alan wrote:
>

> > cmov is effectively the same cost as a compare and jump, in both cases
> > the cpu needs to do a prediction, and on a mispredict, restart.
>
> On a P4 it appears to be slower than compare/jump in most cases

On just about EVERYTHING it's slower than compare/jump. See my other post
on why, together with a (largely untested) test app.

Linus

l.ge...@oltrelinux.com

Jan 3, 2007, 12:07:28 PM

to Linus Torvalds

Just to make it clearer why I am so curious, this is from an x86_64 X2 3800+:

DarkStar:{venom}:/tmp> gcc -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp>time ./a.out
600000000

real 0m0.151s
user 0m0.150s
sys 0m0.000s
DarkStar:{venom}:/tmp> gcc -Wall -O2 t.c
DarkStar:{venom}:/tmp> time ./a.out
600000000

real 0m0.176s
user 0m0.180s
sys 0m0.000s
DarkStar:{venom}:/tmp>gcc -m32 -DCMOV -Wall -O2 t.c
DarkStar:{venom}:/tmp>time ./a.out
600000000

real 0m0.152s
user 0m0.160s
sys 0m0.000s
DarkStar:{venom}:/tmp>gcc -m32 -Wall -O2 t.c
DarkStar:{venom}:/tmp>time ./a.out
600000000

real 0m0.200s
user 0m0.200s
sys 0m0.000s

l.ge...@oltrelinux.com

Jan 3, 2007, 12:08:58 PM

to Linus Torvalds

Just curious why on a dual-core Opteron at 2600 MHz I get:

phoenix:{root}:/tmp> gcc -DCMOV -Wall -O2 t.c
phoenix:{root}:/tmp>time ./a.out
600000000

real 0m0.117s
user 0m0.120s
sys 0m0.000s
phoenix:{root}:/tmp>gcc -Wall -O2 t.c
phoenix:{root}:/tmp> time ./a.out
600000000

real 0m0.136s
user 0m0.130s
sys 0m0.010s

Regards

(I understand it is very different from P4)

Luigi Genoni

On Wed, 3 Jan 2007, Linus Torvalds wrote:

> Date: Wed, 3 Jan 2007 08:03:37 -0800 (PST)
> From: Linus Torvalds <torv...@osdl.org>
> To: Grzegorz Kulewski <kan...@polcom.net>
> Cc: Alan <al...@lxorguk.ukuu.org.uk>, Mikael Pettersson <mi...@it.uu.se>,
> s034...@sms.ed.ac.uk, 76306...@compuserve.com, ak...@osdl.org,
> bu...@stusta.de, gr...@kroah.com, linux-...@vger.kernel.org,
> yanmin...@linux.intel.com
> Subject: Re: kernel + gcc 4.1 = several problems
> Resent-Date: Wed, 03 Jan 2007 17:16:00 +0100
> Resent-From: <l.ge...@sns.it>

Tim Schmielau

Jan 3, 2007, 12:45:52 PM

to l.ge...@oltrelinux.com

Well, on a P4 (which is supposed to be soo bad) I get:

> gcc -O2 t.c -o t
> foreach x ( 1 2 3 4 5 )
>> time ./t > /dev/null
>> end
0.196u 0.004s 0:00.19 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.004s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.160u 0.000s 0:00.15 106.6% 0+0k 0+0io 0pf+0w
0.180u 0.000s 0:00.18 100.0% 0+0k 0+0io 0pf+0w
> gcc -DCMOV -O2 t.c -o t
> foreach x ( 1 2 3 4 5 )
>> time ./t > /dev/null
>> end
0.168u 0.000s 0:00.17 94.1% 0+0k 0+0io 0pf+0w
0.152u 0.000s 0:00.15 100.0% 0+0k 0+0io 0pf+0w
0.136u 0.004s 0:00.13 100.0% 0+0k 0+0io 0pf+0w
0.168u 0.000s 0:00.16 100.0% 0+0k 0+0io 0pf+0w
0.172u 0.000s 0:00.17 100.0% 0+0k 0+0io 0pf+0w

see?

Denis Vlasenko

Jan 3, 2007, 2:49:02 PM

to Linus Torvalds

On Wednesday 03 January 2007 17:03, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Grzegorz Kulewski wrote:
> > Could you explain why CMOV is pointless now? Are there any benchmarks proving
> > that?
>
> CMOV (and, more generically, any "predicated instruction") tends to be
> generally a bad idea on an aggressively out-of-order CPU. It doesn't
> always have to be horrible, but in practice it is seldom very nice, and
> (as usual) on the P4 it can be really quite bad.
>
> On a P4, I think a cmov basically takes 10 cycles.
>
> But even ignoring the usual P4 "I suck at things that aren't totally
> normal", cmov is actually not a great idea. You can always replace it by
>
> j<negated condition> forward
> mov ..., %reg
> forward:

..
..

> In contrast, if you use a predicated instruction, ALL of it is on the
> critical path. Calculating the conditional is on the critical path.
> Calculating the value that gets used is obviously ALSO on the critical
> path, but so is the calculation for the value that DOESN'T get used too.
> So the cmov - rather than speeding things up - actually slows things down,
> because it makes more code be dependent on each other.

Why don't CPU people internally convert cmov into a jmp,mov pair?
--
vda

Linus Torvalds

Jan 3, 2007, 3:31:41 PM

to Tim Schmielau

On Wed, 3 Jan 2007, Tim Schmielau wrote:
>
> Well, on a P4 (which is supposed to be soo bad) I get:

Interesting. My P4 gets basically exactly the same timings for the cmov
and branch cases. And my Core 2 is consistently faster (something like
15%) for the branch version.

Btw, the test-case should be the best possible one for cmov, since there
are no data-dependencies except for ALU operations, and everything is
totally independent (the actual values have no data dependencies at all,
since they are constants). So the critical path issue never shows up.

Linus

Linus Torvalds

Jan 3, 2007, 3:43:34 PM

to Denis Vlasenko

On Wed, 3 Jan 2007, Denis Vlasenko wrote:
>
> Why don't CPU people internally convert cmov into a jmp,mov pair?

Probably because

- it's not worth it. cmov's certainly _can_ be faster for unpredictable
input. So especially if you teach your compiler (by using profiling) to
use cmov's mainly for unpredictable cases, turning it into a
conditional jump internally would likely be a bad idea.

- the biggest reason to do it would likely be microarchitectural: if you
have an ALU or a bypass network that just isn't suitable for bypassing
the flags that way (because you designed your pipeline for a
conditional branch), you might decide that it just simplifies things to
turn the cmov internally into a branch+mov uop pair.

- cmov's simply aren't common enough to be worth worrying about,
especially as it's not likely that the difference is all that big in
the end. The limitations on cmov's mean that the compiler can only use
them under certain fairly limited circumstances anyway, so it's not
like you'll make a huge difference by doing anything clever. So see
above - it's simply a wash, and likely ends up just depending on other
issues.

And don't get me wrong. cmov's can make a difference. You can use them to
avoid polluting your branch prediction tables, you can use them to make
code smaller, and you can use them when they simply just fit the problem
really well. It's just _not_ the case that they are "obviously better".
They simply aren't. Conditional branches aren't "evil". There are many
MUCH worse things you can do, and other things you should avoid.

It really all boils down to: there's simply no real reason to use cmov.
It's not horrible either, so go ahead and use it if you want to, but don't
expect your code to really magically run any faster.

Linus

Denis Vlasenko

Jan 3, 2007, 4:50:57 PM

to Linus Torvalds

On Wednesday 03 January 2007 21:38, Linus Torvalds wrote:
> On Wed, 3 Jan 2007, Denis Vlasenko wrote:
> >
> > Why don't CPU people internally convert cmov into a jmp,mov pair?
>

..

> It really all boils down to: there's simply no real reason to use cmov.
> It's not horrible either, so go ahead and use it if you want to, but don't
> expect your code to really magically run any faster.

IOW: yet another slot in instruction opcode matrix and thousands of
transistors in instruction decoders are wasted because of this
"clever invention", eh?
--
vda

Linus Torvalds

Jan 3, 2007, 5:12:37 PM

to Thomas Sailer

On Wed, 3 Jan 2007, Thomas Sailer wrote:
>
> IF... Counterexample: Add-Compare-Select in a Viterbi Decoder.

Yes. [De]compression stuff tends to be (a) totally unpredictable and (b) a
situation where people care about performance. It's fairly rare in many
other situations.

That said, any real performance these days is about avoiding cache misses.
There cmov really can help more, if it results in denser code (fairly big
if, though).

Linus

Linus Torvalds

Jan 3, 2007, 5:17:35 PM

to Denis Vlasenko

On Wed, 3 Jan 2007, Denis Vlasenko wrote:
>
> IOW: yet another slot in instruction opcode matrix and thousands of
> transistors in instruction decoders are wasted because of this
> "clever invention", eh?

Well, in all fairness, it can probably help more on certain
microarchitectures. Intel is fairly aggressively OoO, especially in Core
2, and predicted branches are not only free, they allow OoO to do a great
job around them. But an in-order architecture doesn't have that, and cmov
might show more of an advantage there.

Linus

Thomas Sailer

Jan 3, 2007, 5:30:06 PM

to Linus Torvalds

On Wed, 2007-01-03 at 08:03 -0800, Linus Torvalds wrote:

> and assuming the branch is AT ALL predictable (and 95+% of all branches
> are), the branch-over will actually be a LOT better for a CPU.

IF... Counterexample: Add-Compare-Select in a Viterbi Decoder. If the
compare can be predicted, you botched the compression of the data (if
you can predict the data, you could have compressed it better), or your
noise is not white, i.e. you f*** up the whitening filter. So in any
practical Viterbi decoder, the compares cannot be predicted. I remember
cmov made a big difference in Viterbi Decoder performance on a Cyrix
6x86. But granted, nowadays these things are usually done with SIMD and
masks.
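
For readers who have not met it, the add-compare-select step looks roughly like this. This is a generic sketch with made-up names, not code from any particular decoder. The outcome of the compare depends on noisy input, which is why a branch on it is hard to predict and a branch-free select is attractive.

```c
/*
 * One add-compare-select step for a single trellis state.
 * pm0/pm1: path metrics of the two predecessor states.
 * bm0/bm1: branch metrics of the two incoming transitions.
 * Returns the surviving (smaller) metric and stores which
 * predecessor won in *decision (0 or 1).
 */
static unsigned int acs(unsigned int pm0, unsigned int bm0,
			unsigned int pm1, unsigned int bm1,
			unsigned int *decision)
{
	unsigned int m0 = pm0 + bm0;		/* add */
	unsigned int m1 = pm1 + bm1;
	unsigned int take1 = m1 < m0;		/* compare */
	unsigned int mask = -take1;		/* all ones if path 1 wins */

	*decision = take1;
	return (m1 & mask) | (m0 & ~mask);	/* select, no branch */
}
```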

Tom

Zou, Nanhai

Jan 3, 2007, 10:09:31 PM

to Linus Torvalds, Grzegorz Kulewski

> -----Original Message-----
> From: linux-ker...@vger.kernel.org
> [mailto:linux-ker...@vger.kernel.org] On Behalf Of Linus Torvalds
> Sent: January 4, 2007 0:04
> To: Grzegorz Kulewski
> Cc: Alan; Mikael Pettersson; s034...@sms.ed.ac.uk;
> 76306...@compuserve.com; ak...@osdl.org; bu...@stusta.de; gr...@kroah.com;
> linux-...@vger.kernel.org; yanmin...@linux.intel.com
> Subject: Re: kernel + gcc 4.1 = several problems
>
>
>

Hi,
cmov will stall on eflags in your test program.
I think you will see the benefit of cmov if you can manage to put some instructions which do NOT modify eflags between testl and cmov.

Thanks
Zou Nan hai

Albert Cahalan

Jan 4, 2007, 2:13:03 AM

to mi...@it.uu.se, s034...@sms.ed.ac.uk, torv...@osdl.org, linux-...@vger.kernel.org, ak...@osdl.org, bu...@stusta.de

Linus Torvalds writes:
> [probably Mikael Pettersson] writes:

>> The suggestions I've had so far which I have not yet tried:
>>
>> - Select a different x86 CPU in the config.
>> - Unfortunately the C3-2 flags seem to simply tell GCC to
>> schedule for ppro (like i686) and enabled MMX and SSE
>> - Probably useless
>
> Actually, try this one. Try using something that doesn't like "cmov".
> Maybe the C3-2 simply has some internal cmov bugginess.

Of course that changes register usage, register spilling,
and thus ultimately even the stack layout. :-(

Adjusting gcc flags to eliminate optimizations is another way to go.
Adding -fwrapv would be an excellent start. Lack of this flag breaks
most code which checks for integer wrap-around. The compiler "knows"
that signed integers don't ever wrap, and thus eliminates any code
which checks for values going negative after a wrap-around. I could
imagine this affecting a switch() or other jump table.
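
The kind of check being discussed looks like the first function below. Because signed overflow is undefined in ISO C, a compiler that is not given -fwrapv may assume a + 1 is always greater than a and drop the test. The second function is one way to ask the same question without ever overflowing. Both names are illustrative.

```c
#include <limits.h>

/*
 * Fragile idiom: relies on a + 1 wrapping to a negative value.
 * Without -fwrapv the compiler is allowed to treat this as "return 0".
 */
static int next_wraps_fragile(int a)
{
	return a + 1 < a;
}

/* Robust form: compare against the limit instead of doing the addition. */
static int next_wraps(int a)
{
	return a == INT_MAX;
}
```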

Linus Torvalds

Jan 4, 2007, 10:39:37 AM

to Zou, Nanhai

On Thu, 4 Jan 2007, Zou, Nanhai wrote:
>
> cmov will stall on eflags in your test program.

And that is EXACTLY my point.

CMOV is a piece of CRAP for most things, exactly because it serializes
three streams of data: the two inputs, and the conditional.

My test-case was actually _good_ for cmov, because there was just the one
conditional (which was 100% ALU) thing that was serialized. In real life,
the two data sources also come from memory, and _any_ of them being
delayed ends up delaying the cmov, and screwing up your out-of-order
pipeline because you now introduced a serialization point that was very
possibly not necessary at all.

In contrast, a conditional branch-around serializes absolutely NOTHING,
because branches get predicted.
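
A sketch of the memory case, with hypothetical names. In the branch form only the chosen pointer is dereferenced. The select form is written the way a cmov-style sequence behaves: both loads have to complete before the result is available, so a cache miss on the unused side still delays it. As before, the compiler may pick either instruction sequence for either function.

```c
/* Branch form: the load on the side not taken never happens. */
static int pick_branch(int cond, const int *a, const int *b)
{
	if (cond)
		return *a;
	return *b;
}

/*
 * Select form: both loads are issued and the result waits for both,
 * plus the condition. Both pointers must therefore be valid.
 */
static int pick_select(int cond, const int *a, const int *b)
{
	int va = *a;
	int vb = *b;

	return cond ? va : vb;
}
```

Note that pick_branch(1, p, NULL) is fine, while pick_select() with the same arguments would fault. That is one instance of the "more limited" semantics mentioned earlier in the thread.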

> I think you will see benefit of cmov if you can manage to put some
> instructions which do NOT modify eflags between testl and cmov.

A lot of the time, the conditional _is_ the critical path.

The whole point of this discussion was that cmov isn't really all that
great. It has fundamental problems that a conditional branch that gets
predicted simply does not have.

That's quite apart from the fact that cmov has rather limited semantics,
and that in 99% of all cases you have to use a conditional branch anyway.

Linus

Segher Boessenkool

Jan 4, 2007, 11:46:21 AM

to Albert Cahalan

> Adjusting gcc flags to eliminate optimizations is another way to go.
> Adding -fwrapv would be an excellent start. Lack of this flag breaks
> most code which checks for integer wrap-around.

Lack of the flag does not break any valid C code, only code
making unwarranted assumptions (i.e., buggy code).

> The compiler "knows"
> that signed integers don't ever wrap, and thus eliminates any code
> which checks for values going negative after a wrap-around.

You cannot assume it eliminates such code; the compiler is free
to do whatever it wants in such a case.

You should typically write such a computation using unsigned
types, FWIW.
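
A sketch of the unsigned approach, with hypothetical helper names. Unsigned arithmetic is defined to wrap modulo 2^N, so a wrap can be detected after the fact, and a wrapping add of signed values can be built on top of it. The conversion back to int in the second helper is implementation-defined rather than undefined; GCC documents it as reduction modulo 2^N.

```c
/* Returns 1 if a + b wrapped; the (possibly wrapped) sum goes to *sum. */
static int uadd_wraps(unsigned int a, unsigned int b, unsigned int *sum)
{
	*sum = a + b;		/* well defined: reduced modulo 2^N */
	return *sum < a;
}

/* Two's-complement wrapping add of signed values, done in unsigned. */
static int wrapping_add(int a, int b)
{
	return (int)((unsigned int)a + (unsigned int)b);
}
```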

Anyway, with 4.1 you shouldn't see frequent problems due to
"not using -fwrapv while my code is broken WRT signed overflow"
yet; and if/when problems start to happen, the "correct" action
to take is not to add the compiler flag, but to fix the code.

Segher

Albert Cahalan

Jan 4, 2007, 12:04:42 PM

to Segher Boessenkool

On 1/4/07, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> > Adjusting gcc flags to eliminate optimizations is another way to go.
> > Adding -fwrapv would be an excellent start. Lack of this flag breaks
> > most code which checks for integer wrap-around.
>
> Lack of the flag does not break any valid C code, only code
> making unwarranted assumptions (i.e., buggy code).

Right, if "C" means "strictly conforming ISO C" to you.
(in which case, nearly all real-world code is broken)

FYI, the kernel also assumes that a "char" is 8 bits.
Maybe you should run away screaming.

> > The compiler "knows"
> > that signed integers don't ever wrap, and thus eliminates any code
> > which checks for values going negative after a wrap-around.
>
> You cannot assume it eliminates such code; the compiler is free
> to do whatever it wants in such a case.
>
> You should typically write such a computation using unsigned
> types, FWIW.
>
> Anyway, with 4.1 you shouldn't see frequent problems due to

Right, it gets much worse with the current gcc snapshots.

IMHO you should play such games with "g++ -O9", but that's
a discussion for a different mailing list.

> "not using -fwrapv while my code is broken WRT signed overflow"
> yet; and if/when problems start to happen, the "correct" action
> to take is not to add the compiler flag, but to fix the code.

Nope, unless we decide that the performance advantages of
a language change are worth the risk and pain.

Segher Boessenkool

Jan 4, 2007, 12:25:46 PM

to Albert Cahalan

>> Lack of the flag does not break any valid C code, only code
>> making unwarranted assumptions (i.e., buggy code).
>
> Right, if "C" means "strictly conforming ISO C" to you.

Without any further qualification, it of course does, yes.

> (in which case, nearly all real-world code is broken)

Not "nearly all" -- but lots of code, yes.

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

No, that's fine with me. It's fine with GCC as well
of course.

>> Anyway, with 4.1 you shouldn't see frequent problems due to
>
> Right, it gets much worse with the current gcc snapshots.

Yes. And that problem will be fixed some way pretty soon --
simply because it _has_ to be fixed.

> IMHO you should play such games with "g++ -O9", but that's
> a discussion for a different mailing list.

For a different mailing list indeed; let me just point out
that for certain important quite common cases it's an ~50%
overall speedup.

>> "not using -fwrapv while my code is broken WRT signed overflow"
>> yet; and if/when problems start to happen, the "correct" action
>> to take is not to add the compiler flag, but to fix the code.
>
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.

If the kernel breaks all over the place, of course you should add
the flag. But it won't, it would break *all* programs all over
the place then, and that wouldn't be acceptable to GCC. If instead
only a few kernel code bugs pop up, it's easy to fix.

Aaaaanyway -- my only real point was to point out that there's
no doomsday scenario here, yes current GCC TOT seems to regress
here (for some definition of that word), but GCC development
is in stage 1, that sort of thing happens. It'll stabilise
again.

In the meantime, building git HEAD kernels with GCC 4.1 and
4.2 will probably rattle out quite a few bugs still, both
in the kernel and in GCC -- neither is used all that often
it seems?

Segher

Linus Torvalds

Jan 4, 2007, 12:41:42 PM

to Albert Cahalan

On Thu, 4 Jan 2007, Albert Cahalan wrote:

> On 1/4/07, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> >
> > Lack of the flag does not break any valid C code, only code
> > making unwarranted assumptions (i.e., buggy code).
>
> Right, if "C" means "strictly conforming ISO C" to you.
> (in which case, nearly all real-world code is broken)

Indeed. The gcc people seem to often think that "language lawyering" is a
good idea, and totally overrides "real world". The whole flap about the
completely idiotic things they do (or at least did) for alias analysis on
the grounds that "they can" is an example of this.

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

Gcc people are quick to condemn others for assumptions that break
standards, but it has tons of assumptions very deeply embedded itself. I
don't think it could realistically work very well on setups where pointers
aren't the same size as long, and it has various deep assumptions itself
about what is "realistic".

The kernel does the same. Some of it intentional and by design, much of it
probably totally unintentional, but the result of "it worked, and nobody
even thought about anything else".

With 7+ million lines of C code and headers, I'm not interested in
compilers that read the letter of the law. We don't want some really
clever code generation that gets us .5% on some unrealistic load. We want
good _solid_ code generation that does the obvious thing.

Compiler writers seem to seldom even realize this. A lot of commercial
code gets shipped with basically no optimizations at all (or with specific
optimizations turned off), because people want to ship what they debug and
work with.

I'll happily turn off compiler features that are "clever optimizations
that never actually matter in practice, but are just likely to possibly
cause problems".

The sad part is that "straightforward optimizations" (as opposed to
"really clever ones") often work better in practice too. At least with
kernel code, which is not that high-level to begin with.

> > to take is not to add the compiler flag, but to fix the code.
>
> Nope, unless we decide that the performance advantages of
> a language change are worth the risk and pain.

Indeed. We'd happily fix the code if:
(a) it's reasonably easy to find places that are buggy.
(b) there are syntactically sane ways to fix it
(c) the optimization actually makes sense and is worthwhile

An example of where _none_ of these things were true was the old gcc alias
analysis. I think gcc eventually added a sane way to mark pointers as
being possible aliases (ie case (b): give a syntactically acceptable way
for code maintainability to actually fix things), but since neither (a)
nor (c) are there, the _correct_ solution was just to tell the compiler to
stop doing that.

With integer overflow optimizations, the same situation may be true. The
kernel has never been "strict ANSI C". We've always used C extensions. The
extension of "signed integer arithmetic follows 2's-complement-arithmetic"
is a perfectly sane extension to the language, and quite possibly worth
it.

And the fact that it's not "strict ANSI C" has absolutely _zero_
relevance.

Linus

Linus Torvalds

Jan 4, 2007, 12:51:00 PM

to Segher Boessenkool

On Thu, 4 Jan 2007, Segher Boessenkool wrote:
>
> > (in which case, nearly all real-world code is broken)
>
> Not "nearly all" -- but lots of code, yes.

I wouldn't say "lots of code". I would say "all real projects".

NOBODY will guarantee you that they follow all standards to the letter.
Some use compiler extensions knowingly, but pretty much _everybody_ ends
up depending on subtle issues without even realizing it. It's almost
impossible to write a real program that has no bugs, and if they don't
show up in testing (because the compiler didn't generate buggy assembly
code from source code that had the _potential_ for bugs), they often won't
get fixed.

The kernel does things like compare pointers across objects, and the
kernel EXPECTS it to work. I seriously doubt that the kernel is even
unusual in this. The common way to avoid AB-BA deadlocks in any threaded
code (whether kernel or user space) is to just take two locks in a
specific order, and the common way to do that for locks of the same type
is simply to compare the addresses.

The fact that this is "undefined" behaviour matters not a _whit_. Not for
the kernel, and I bet not for a lot of other applications either.
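
The address-ordering trick, sketched with a hypothetical lock type. Relational comparison of pointers into different objects is what the standard leaves undefined. This sketch goes through uintptr_t, which makes the comparison itself well defined but still assumes the flat address space that the kernel takes for granted.

```c
#include <stdint.h>

struct lock {			/* stand-in for a real lock type */
	int held;
};

static void lock_acquire(struct lock *l)
{
	l->held = 1;		/* a real lock would spin or sleep here */
}

/* The lock with the lower address is always taken first. */
static struct lock *lower_lock(struct lock *a, struct lock *b)
{
	return (uintptr_t)a <= (uintptr_t)b ? a : b;
}

/*
 * Take two locks of the same type in a globally consistent order, so
 * two threads passing the same pair in opposite argument order cannot
 * deadlock AB-BA.
 */
static void lock_pair(struct lock *a, struct lock *b)
{
	struct lock *first = lower_lock(a, b);
	struct lock *second = (first == a) ? b : a;

	lock_acquire(first);
	if (second != first)
		lock_acquire(second);
}
```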

So "nearly all" is probably _understating_ things rather than overstating
it as you claim. Anybody who thinks that they have proven the correctness
of their program is likely lying. It's a good thing if they have _tested_
all the code-paths, but they've invariably been tested with a compiler
that doesn't go out of its way to try to generate "legal but idiotic"
code. So the testing won't generally find cases where the compiler may
have been _allowed_ to do something else.

The end result: any nontrivial project always has dodgy code. Because
people simply don't write perfect code.

Compiler people who don't realize this aren't compiler people. They're
academics involved with mental masturbation.

Linus

Andreas Schwab

Jan 4, 2007, 1:09:19 PM

to Albert Cahalan

"Albert Cahalan" <acah...@gmail.com> writes:

> FYI, the kernel also assumes that a "char" is 8 bits.
> Maybe you should run away screaming.

You are confusing "undefined" with "implementation defined". Those are
two quite different concepts.
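
One practical consequence of the distinction: an implementation-defined property has a fixed, documented value on a given compiler and target, so code can check it at build time. Undefined behaviour offers nothing stable to check. A sketch using the negative-array-size trick (the typedef and function names are made up):

```c
#include <limits.h>

/* Fails to compile (array of size -1) unless char has exactly 8 bits. */
typedef char assert_char_is_8_bits[(CHAR_BIT == 8) ? 1 : -1];

/* Number of bits in an object of n bytes, relying on the check above. */
static unsigned long bits_in_bytes(unsigned long n)
{
	return n * CHAR_BIT;
}
```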

Andreas.

--
Andreas Schwab, SuSE Labs, sch...@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Segher Boessenkool

Jan 4, 2007, 1:36:55 PM

to Linus Torvalds

> I'll happily turn off compiler features that are "clever optimizations
> that never actually matter in practice, but are just likely to possibly
> cause problems".

The "signed wrap is undefined" thing doesn't fit in this category
though:

-- It is an important optimisation for loops with a signed
induction variable;
-- "Random code" where it causes problems is typically buggy
already (i.e., code that doesn't take overflow into account
at all won't expect wraparound either);
-- Code that explicitly depends on signed overflow two's complement
wraparound can be trivially converted to use unsigned arithmetic
(and in almost all cases it really should have used that already).
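
A sketch of the first bullet, with made-up function names. With a signed counter the compiler may assume i never wraps, so for n >= 0 it knows the loop runs exactly n + 1 times and can, for example, widen i to a pointer-sized index once. With an unsigned counter and n equal to UINT_MAX the condition i <= n is always true, so the compiler has to allow for a loop that never terminates.

```c
/* Signed induction variable: i <= n can be assumed to terminate. */
static long sum_signed(const int *v, int n)
{
	long sum = 0;
	int i;

	for (i = 0; i <= n; i++)
		sum += v[i];
	return sum;
}

/* Unsigned induction variable: i wraps to 0 if n == UINT_MAX. */
static long sum_unsigned(const int *v, unsigned int n)
{
	long sum = 0;
	unsigned int i;

	for (i = 0; i <= n; i++)
		sum += v[i];
	return sum;
}
```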

If GCC can generate warnings for things in the second bullet point
(and it probably will, but nothing is finalised yet), I don't see
a reason for the kernel to turn off the optimisation. Why not try
it out and only _if_ it causes troubles (after the compiler version
is stable) turn it off?

>>> to take is not to add the compiler flag, but to fix the code.
>>
>> Nope, unless we decide that the performance advantages of
>> a language change are worth the risk and pain.

But it's not a language change -- GCC has worked like this
for a _long_ time already, since May 2003 if I read the
ChangeLog correctly -- it's just that it starts to optimise
some things more aggressively now.

> With integer overflow optimizations, the same situation may be true. The
> kernel has never been "strict ANSI C". We've always used C extensions. The
> extension of "signed integer arithmetic follows 2's-complement-arithmetic"
> is a perfectly sane extension to the language, and quite possibly worth
> it.

Could be. Who knows, without testing. I'm just saying to
not add -fwrapv purely as a knee-jerk reaction.

> And the fact that it's not "strict ANSI C" has absolutely _zero_
> relevance.

I certainly never claimed so, that's all in Albert's mind it seems :-)

Segher

Segher Boessenkool

Jan 4, 2007, 1:56:50 PM

to Linus Torvalds

>>> (in which case, nearly all real-world code is broken)
>>
>> Not "nearly all" -- but lots of code, yes.
>
> I wouldn't say "lots of code". I would say "all real projects".

All projects that tell the compiler they're written in ISO C,
while they're not, can easily break, sure. You can't say this
is GCC's fault; sure in some cases decisions were made that
resulted in more of those programs breaking than was really
necessary, but it's obviously *impossible* to prevent all
from breaking.

And yes it's true: most people do not program in ISO C at all,
_even if they think they do_, simply because they are not aware
of all the rules. For some of the areas where most of the
mistakes are made, for example aliasing rules and signed overflow,
GCC provides helpful options to switch behaviour to something
that makes those people's programs work. You can also use those
options if you have made a conscious decision that you want to
write your code in one of the resulting dialects of C.

Segher

p.s. If it's decided to not use -fwrapv, a debug option that
sets -ftrapv can be introduced -- it will make it a BUG() if
any (accidental) signed overflow happens after all.

Al Viro

Jan 4, 2007, 2:11:32 PM

to Linus Torvalds

On Thu, Jan 04, 2007 at 09:47:01AM -0800, Linus Torvalds wrote:
> NOBODY will guarantee you that they follow all standards to the letter.
> Some use compiler extensions knowingly, but pretty much _everybody_ ends
> up depending on subtle issues without even realizing it. It's almost
> impossible to write a real program that has no bugs, and if they don't
> show up in testing (because the compiler didn't generate buggy assembly
> code from source code that had the _potential_ for bugs), they often won't
> get fixed.
>
> The kernel does things like compare pointers across objects, and the
> kernel EXPECTS it to work. I seriously doubt that the kernel is even
> unusual in this. The common way to avoid AB-BA deadlocks in any threaded
> code (whether kernel or user space) is to just take two locks in a
> specific order, and the common way to do that for locks of the same type
> is simply to compare the addresses.
>
> The fact that this is "undefined" behaviour matters not a _whit_. Not for
> the kernel, and I bet not for a lot of other applications either.

True, but we'd better understand what assumptions we are making. I have
seen patches seriously attempting to _subtract_ unrelated pointers. And
that simply doesn't work for obvious reasons...
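
One reason subtraction is worse than comparison: p - q is defined as a count of elements, so the compiler divides the byte distance by the element size, and may assume that the division is exact. For pointers into unrelated objects it usually is not. A sketch of the arithmetic using integer addresses (hypothetical names, and no actual pointer subtraction, so the sketch itself stays well defined; it assumes p >= q):

```c
#include <stdint.h>

struct item {			/* illustrative 12-byte element */
	uint32_t a, b, c;
};

/* What "p - q" would compute for struct item pointers at these addresses. */
static long element_distance(uintptr_t p, uintptr_t q)
{
	return (long)(p - q) / (long)sizeof(struct item);
}

/* Does converting back to bytes recover the original distance? */
static int distance_round_trips(uintptr_t p, uintptr_t q)
{
	return element_distance(p, q) * (long)sizeof(struct item) ==
	       (long)(p - q);
}
```

Two addresses 24 bytes apart round-trip cleanly (2 elements). Two addresses 10 bytes apart give an element distance of 0, and the 10 bytes are silently lost.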

Geert Bosch

Jan 4, 2007, 6:04:48 PM

to Segher Boessenkool

On Jan 4, 2007, at 13:34, Segher Boessenkool wrote:

> The "signed wrap is undefined" thing doesn't fit in this category
> though:
>
> -- It is an important optimisation for loops with a signed
> induction variable;

It certainly isn't that important. Even SpecINT compiled with
-O3 and top-of-tree GCC *improves* 1% by adding -fwrapv.
If the compiler itself can rely on wrap-around semantics and
doesn't have to worry about introducing overflows between
optimization passes, it can reorder simple chains of additions.
This is more important for many real-world applications than
being able to perform some complex loop-interchange.
Compiler developers always make the mistake of overrating
their optimizations.
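
The reordering point can be illustrated with unsigned arithmetic, which already has the wrap-around semantics that -fwrapv gives to signed types. Under wrapping, addition is associative, so a chain can be regrouped freely. With overflow undefined, a regrouping can create an intermediate overflow that the source never had. Function names are illustrative.

```c
/* Source order: (a + b) + c. */
static unsigned int sum_left(unsigned int a, unsigned int b, unsigned int c)
{
	return (a + b) + c;
}

/* Regrouped: a + (b + c). Same result for all inputs under wrapping. */
static unsigned int sum_right(unsigned int a, unsigned int b, unsigned int c)
{
	return a + (b + c);
}
```

For the signed analogue, take a = INT_MAX, b = 1, c = -1: a + (b + c) is fine, but (a + b) + c overflows in the intermediate step, so without wrap-around semantics the two groupings are not interchangeable.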

If GCC does really poorly on a few important loops that matter,
that issue is easily addressed. If GCC generates unreliable
code for millions of boring lines of important real-world C,
the compiler is worthless.

-Geert

Alistair John Strachan

Jan 5, 2007, 10:53:25 AM

to Mikael Pettersson

On Wednesday 03 January 2007 02:20, Alistair John Strachan wrote:
> On Wednesday 03 January 2007 02:12, Mikael Pettersson wrote:

> > On Tue, 2 Jan 2007 17:43:00 -0800 (PST), Linus Torvalds wrote:
> > > > The suggestions I've had so far which I have not yet tried:
> > > >
> > > > - Select a different x86 CPU in the config.
> > > > - Unfortunately the C3-2 flags seem to simply tell GCC
> > > > to schedule for ppro (like i686) and enabled MMX and SSE
> > > > - Probably useless
> > >
> > > Actually, try this one. Try using something that doesn't like "cmov".
> > > Maybe the C3-2 simply has some internal cmov bugginess.
> >

> > That's a good suggestion. Earlier C3s didn't have cmov so it's
> > not entirely unlikely that cmov in C3-2 is broken in some cases.
> > Configuring for P5MMX or 486 should be good safe alternatives.
>
> Or just C3 (not C3-2), which is what I've done.
>
> I'll report back whether it crashes or not.

This didn't help. After about 14 hours, the machine crashed again.

cmov is not the culprit.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Linus Torvalds

Jan 5, 2007, 11:06:45 AM

to Alistair John Strachan

On Fri, 5 Jan 2007, Alistair John Strachan wrote:
>
> This didn't help. After about 14 hours, the machine crashed again.
>
> cmov is not the culprit.

Ok. Have you ever tried to limit the drivers you have loaded? I notice you
had the prism54 wireless thing in your modules list and the vt1211 hw
monitoring thing. I'm wondering about the vt1211 thing - it probably isn't
too common. But if you can use that machine without the wireless too, it
might be good to try without either.

(The rest of your module list looked bog-standard, so if it's not
hardware-specific, I don't think it's there)

Turning off the VIA sound driver just in case would be good too.

The reason I mention vt1211 in particular is that it does things like
regulate fan activity etc. Is the problem perhaps heat-related?

Linus

Alistair John Strachan

Jan 5, 2007, 11:20:08 AM

to Linus Torvalds

On Friday 05 January 2007 16:02, Linus Torvalds wrote:
> On Fri, 5 Jan 2007, Alistair John Strachan wrote:
> > This didn't help. After about 14 hours, the machine crashed again.
> >
> > cmov is not the culprit.
>
> Ok. Have you ever tried to limit the drivers you have loaded? I notice you
> had the prism54 wireless thing in your modules list and the vt1211 hw
> monitoring thing. I'm wondering about the vt1211 thing - it probably isn't
> too common.

Sure, and it only got added to 2.6.19 anyway (however GCC 3.4.6 really does
seem to have no problem with it).

> But if you can use that machine without the wireless too, it
> might be good to try without either.

Required, plus I've been running prism54 on three different machines with a
huge number of compilers since the early 2.6 days with no problems.

> (The rest of your module list looked bog-standard, so if it's not
> hardware-specific, I don't think it's there)

Agreed, the config is already _very_ minimal for this machine.

> Turning off the VIA sound driver just in case would be good too.

I'm not even really sure why that's enabled. I can do that.

> The reason I mention vt1211 in particular is that it does things like
> regulate fan activity etc. Is the problem perhaps heat-related?

It definitely isn't heat related. This CPU puts out 7-10W, has a ridiculous
5000 RPM fan on it (that works) and the temp never exceeds 40C. If anything,
the -O2, 3.4.6 kernel with CMOV ran the chip a little hotter.

As far as I can see, all the other components are either cool to touch or have
stupidly big heatsinks on them.

(I realise with problems like these it's almost always some sort of obscure
hardware problem, but I find that very difficult to believe when I can toggle
from 3 years of stability to 6-18 hours crashing by switching compiler. I've
also run extensive stability test programs on the hardware with absolutely no
negative results.)

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Linus Torvalds

Jan 5, 2007, 11:57:46 AM

to Alistair John Strachan

On Fri, 5 Jan 2007, Alistair John Strachan wrote:
>
> (I realise with problems like these it's almost always some sort of obscure
> hardware problem, but I find that very difficult to believe when I can toggle
> from 3 years of stability to 6-18 hours crashing by switching compiler. I've
> also run extensive stability test programs on the hardware with absolutely no
> negative results.)

The thing is, I agree with you - it does seem to be compiler-related. But
at the same time, I'm almost positive that it's not in "pipe_poll()"
itself, because that function is just too simple, and looking at the
assembly code, I don't see how what you describe could happen in THAT
function.

HOWEVER.

I can easily see an NMI coming in, or another interrupt, or something, and
that one corrupting the stack under it because of a compiler bug (or a
kernel bug that just needs a specific compiler to trigger). For example,
we've had problems before with the compiler thinking it owns the stack
frame for an "asmlinkage" function, and us having no way to tell the
compiler to keep its hands off - so the compiler ended up touching
registers that were actually in the "save area" of the interrupt or system
call, and then returning with corrupted state.

Here's a stupid patch. It just adds more debugging to the oops message,
and shows all the code pointers it can find on the WHOLE stack.

It also makes the raw stack dumping print out as much of the stack
contents _under_ the stack pointer as it does above it too.

However, this patch is mostly useless if you have a separate stack for
IRQ's (since if that happens, any interrupt will be taken on a different
stack which we don't see any more), so you should NOT enable the 4KSTACKS
config option if you try this out.

I'm not sure how enlightening any of the output might be, but it is
probably worth trying.

Linus

---
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
index 0efad8a..2359eed 100644
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -243,6 +243,20 @@ void show_trace(struct task_struct *task, struct pt_regs *regs,
 	show_trace_log_lvl(task, regs, stack, "");
 }
 
+static void show_all_stack_addresses(unsigned long *esp)
+{
+	struct thread_info *tinfo = (void *) ((unsigned long)esp & (~(THREAD_SIZE - 1)));
+	unsigned long *stack = (unsigned long *)(tinfo+1);
+
+	printk("All stack code pointers:\n");
+	while (valid_stack_ptr(tinfo, stack)) {
+		unsigned long addr = *stack++;
+		if (__kernel_text_address(addr))
+			print_symbol(" %s", addr);
+	}
+	printk("\n");
+}
+
 static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			       unsigned long *esp, char *log_lvl)
 {
@@ -256,8 +270,10 @@ static void show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			esp = (unsigned long *)&esp;
 	}
 
+	show_all_stack_addresses(esp);
 	stack = esp;
-	for(i = 0; i < kstack_depth_to_print; i++) {
+	stack -= kstack_depth_to_print;
+	for(i = 0; i < 2*kstack_depth_to_print; i++) {
 		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))
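
The address arithmetic in show_all_stack_addresses() can be tried out in
userspace. What follows is only a rough sketch and not part of the patch:
it assumes 8 KiB stacks (4KSTACKS disabled), and the sketch_* names are
made up for illustration. Masking off the low bits of the stack pointer
gives the start of the stack block, which is where thread_info lives.

```c
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks, i.e. 4KSTACKS is not enabled. */
#define SKETCH_THREAD_SIZE 8192UL

/* Round a stack address down to the start of its THREAD_SIZE-aligned
 * block. In this layout thread_info sits at that address. */
static uintptr_t sketch_stack_base(uintptr_t esp)
{
	return esp & ~((uintptr_t)SKETCH_THREAD_SIZE - 1);
}

/* Bytes between the stack pointer and the top of the block, i.e. how
 * much of the (downward-growing) stack is currently in use. */
static uintptr_t sketch_bytes_in_use(uintptr_t esp)
{
	return sketch_stack_base(esp) + SKETCH_THREAD_SIZE - esp;
}
```

Fed the esp value from the oops at the top of the thread (f70f3c0c), the
mask yields f70f2000, which matches the ti= value printed in that oops.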

Pavel Machek

Jan 6, 2007, 2:31:44 AM

to Segher Boessenkool

Hi!

> >IMHO you should play such games with "g++ -O9", but that's
> >a discussion for a different mailing list.
>
> For a different mailing list indeed; let me just point out
> that for certain important quite common cases it's an ~50%
> overall speedup.

Hmm, what code was that? 'signed int does not wrap around' does not
seem to provide _that_ much info...
Pavel
--
Thanks for all the (sleeping) penguins.

Segher Boessenkool

Jan 6, 2007, 3:25:50 AM

to Pavel Machek

>> For a different mailing list indeed; let me just point out
>> that for certain important quite common cases it's an ~50%
>> overall speedup.
>
> Hmm, what code was that? 'signed int does not wrap around' does not
> seem to provide _that_ much info...

One of the recent huge threads on the GCC dev list has a
post that says *some other* compiler gets a result like
this from this optimisation (I don't have a link to the
exact post and I don't remember the details; perhaps it
was XLC?)

Sorry if I wasn't clear enough and you understood I meant
that GCC exploits this optimisation opportunity well
enough for such nice results already.

- - -

So I searched for it anyway:

<http://gcc.gnu.org/ml/gcc/2006-12/msg00768.html>

It looks like the result for *integer* code wasn't *all*
that dramatic a difference. Anyway, it's obvious that
the optimisation can certainly give nice results and it
wouldn't be a good idea for the Linux kernel to dismiss
it without really evaluating the impact first; and anyway,
this is for some future date, GCC-4.2 isn't here yet.
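
For readers unfamiliar with the optimisation under discussion, here is a
minimal illustration. It is not taken from the GCC thread linked above,
and the function names are made up. Because signed overflow is undefined
in C, a compiler is allowed to assume the signed comparison is always
true; the unsigned version has defined wraparound and must really be
evaluated.

```c
/* Signed: overflow is undefined, so a compiler may assume x + 1 > x
 * always holds and reduce the whole function to "return 1". */
int next_is_larger_signed(int x)
{
	return x + 1 > x;
}

/* Unsigned: wraparound is defined, so the comparison is false exactly
 * when x is the largest unsigned value and x + 1 wraps to 0. */
int next_is_larger_unsigned(unsigned int x)
{
	return x + 1 > x;
}
```

Loop bounds and index arithmetic are where this kind of assumption tends
to matter, since it lets the compiler reason about trip counts.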

Segher

Pavel Machek

Jan 6, 2007, 7:37:31 PM

to Linus Torvalds

Hi!

stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere,
and stack overflows?

that hw monitoring thingie... I'd turn it off. Its interactions with
acpi are non-trivial and dangerous.

Pavel
--
Thanks for all the (sleeping) penguins.

Alistair John Strachan

Jan 6, 2007, 7:58:26 PM

to Pavel Machek

On Sunday 07 January 2007 00:36, Pavel Machek wrote:
[snip]

> > However, this patch is mostly useless if you have a separate stack for
> > IRQ's (since if that happens, any interrupt will be taken on a different
> > stack which we don't see any more), so you should NOT enable the 4KSTACKS
> > config option if you try this out.
>
> stupid idea... perhaps gcc-4.1 generates bigger stackframe somewhere,
> and stack overflows?

The primary reason it's not 4KSTACKS already is that I run multiple XFS
partitions on top of an md RAID 1. LVM isn't involved, however, and I'm not
using any other filesystem overlays like dm.

I'm fairly sceptical that it's a stack overflow, but I'll be sure to enable
the debugging option on the next try.

> that hw monitoring thingie... I'd turn it off. Its interactions with
> acpi are non-trivial and dangerous.

Well, GCC 3.4 kernels seem to run fine with it, but as I said to Linus I'll be
sure to turn this and the sound drivers off in the next build.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Denis Vlasenko

Jan 6, 2007, 11:27:44 PM

to Linus Torvalds

On Thursday 04 January 2007 18:37, Linus Torvalds wrote:
> With 7+ million lines of C code and headers, I'm not interested in
> compilers that read the letter of the law. We don't want some really
> clever code generation that gets us .5% on some unrealistic load. We want
> good _solid_ code generation that does the obvious thing.
>
> Compiler writers seem to seldom even realize this. A lot of commercial
> code gets shipped with basically no optimizations at all (or with specific
> optimizations turned off), because people want to ship what they debug and
> work with.

I'd say "care about obvious, safe optimizations which we still do not do".
I want this:

char v[4];
...
memcmp(v, "abcd", 4) == 0

compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:

.LC0:
.string "abcd"
.text
...
pushl $4
pushl $.LC0
pushl $v
call memcmp
addl $12, %esp
testl %eax, %eax

There are tons of examples where you can improve code generation.
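
To make the "single cmpl" idea concrete, the four-byte compare can be
written out by hand. This is only a sketch: sketch_eq4() is a made-up
helper, and whether a given gcc turns it into one 32-bit compare depends
on the version and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare exactly four bytes as one 32-bit word.
 * Only equality is tested, so byte order does not matter. */
static int sketch_eq4(const char *a, const char *b)
{
	uint32_t wa, wb;

	/* memcpy into locals sidesteps alignment and aliasing problems;
	 * compilers commonly turn each call into a plain 32-bit load. */
	memcpy(&wa, a, sizeof(wa));
	memcpy(&wb, b, sizeof(wb));
	return wa == wb;
}
```

With v declared as above, sketch_eq4(v, "abcd") is nonzero under the same
conditions as memcmp(v, "abcd", 4) == 0.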
--
vda

Linus Torvalds

Jan 6, 2007, 11:49:51 PM

to Denis Vlasenko

On Sun, 7 Jan 2007, Denis Vlasenko wrote:
>
> I'd say "care about obvious, safe optimizations which we still do not do".
> I want this:
>
> char v[4];
> ...
> memcmp(v, "abcd", 4) == 0
>
> compile to single cmpl on i386.

Yeah. For a more relevant case, look at the hoops we used to jump through
to get "memcpy()" to generate ok code for trivial fixed-sized cases.

(That said, I think __builtin_memcpy() does a reasonable job these days
with gcc, and we might drop the crap one day when we can trust the
compiler to do ok. It didn't use to, and we continued using our
ridiculous macro/__builtin_constant_p misuses just because it works with
_all_ relevant gcc versions).
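
For anyone who has not seen the trick being referred to, the general
shape is roughly the following. This is a simplified sketch with made-up
names (sketch_copy4, sketch_memcpy), not the kernel's actual macro: the
macro picks a fixed-size path only when the length is a compile-time
constant, and falls back to the generic routine otherwise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size fast path. A real implementation would cover
 * more sizes than just 4. */
static inline void *sketch_copy4(void *dst, const void *src)
{
	return __builtin_memcpy(dst, src, 4);
}

/* Use the fast path only when n is known at compile time to be 4;
 * everything else goes to the ordinary memcpy(). Note that n is
 * evaluated more than once, so it must not have side effects. */
#define sketch_memcpy(dst, src, n)				\
	((__builtin_constant_p(n) && (n) == 4)			\
		? sketch_copy4((dst), (src))			\
		: memcpy((dst), (src), (n)))
```

__builtin_constant_p and __builtin_memcpy are GCC builtins, so this does
not build with compilers that lack them.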

Linus

Jeff Garzik

Jan 7, 2007, 12:26:33 AM

to Linus Torvalds

Linus Torvalds wrote:
> (That said, I think __builtin_memcpy() does a reasonable job these days
> with gcc, and we might drop the crap one day when we can trust the
> compiler to do ok. It didn't use to, and we continued using our
> ridiculous macro/__builtin_constant_p misuses just because it works with
> _all_ relevant gcc versions).

Yep, a ton of work by Roger Sayle, among others, really matured the gcc
str*/mem* builtins in the 4.x series. They are definitely worth another
look.

Jeff

Segher Boessenkool

Jan 7, 2007, 10:13:25 AM

to Denis Vlasenko

> I want this:
>
> char v[4];
> ...
> memcmp(v, "abcd", 4) == 0
>
> compile to single cmpl on i386. This (gcc 4.1.1) is ridiculous:
>
> call memcmp

i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)

movl $4, %ecx #, tmp65
cld
movl $v, %esi #, tmp63
movl $.LC0, %edi #, tmp64
repz
cmpsb
sete %al #, tmp68

Still not perfect, but better already. If you have any
specific examples that you'd like to have compiled to
better code, please report them in GCC bugzilla (with a
self-contained testcase, please).

Segher

Michael K. Edwards

Jan 26, 2007, 5:06:06 PM

to Segher Boessenkool

ALSA + GCC 4.1.1 + -Os is known to be a bad combination on some
arches; see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27363 . (I
tripped over it on an ARM target, but my limited understanding of GCC
internals does not allow me to conclude that it is ARM-specific.) A
patch claiming to fix the bug was integrated into the 4.1 branch, but
my tests with a recent (20070115) gcc-4.1 snapshot indicate that it
has regressed again.

You might also check /proc/cpu/alignment; we have seen the alignment
fixup code trigger for alignment errors in both kernel and userspace.
The default appears to be to IGNORE alignment traps from userspace,
which results in bogus data and potentially a wacky series of system
calls, which could conceivably trigger an oops. I am told that
"echo 2 > /proc/cpu/alignment" activates the kernel alignment fixup
code, and that 3 turns on some sort of logging in addition to the fixup
(haven't pursued this myself). No idea whether this is relevant to your CPU.
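
For context, the kind of userspace code that produces such traps is a
cast of a byte pointer to a wider type followed by a dereference. A
common way to avoid the trap entirely is sketched below; the helper name
is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address. Casting p to
 * uint32_t * and dereferencing it is what can raise an alignment trap
 * on strict-alignment CPUs; copying the bytes cannot. */
static uint32_t sketch_load32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```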

Cheers,
- Michael