weirdness with compiling a 2.6.33 kernel on arm debian

dave b

Mar 5, 2010, 11:10:02 PM

Hi, I have now successfully built a 2.6.33 kernel on a linkstation pro
v2. This is an arm device. It is currently running debian lenny
armel.

I compiled (make) zImage, then did a make modules which failed on the
first two rounds of compiling the modules -

"fs/afs/super.c: In function ‘afs_test_super’:
fs/afs/super.c:278: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.3/README.Bugs> for instructions."
This was the error encountered on the first attempt at compiling the
modules.

"crypto/gcm.c: In function ‘crypto_gcm_setauthsize’:
crypto/gcm.c:152: internal compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.3/README.Bugs> for instructions.
make[1]: *** [crypto/gcm.o] Error 1"
This was the error on the second attempt at compiling the modules.

The 3rd attempt at building the modules was successful...

The device boots and runs fine with this kernel and modules appear to work.
[root@nas ~]# uname -a
Linux nas 2.6.33 #1 Fri Mar 5 23:54:51 EST 2010 armv5tel GNU/Linux

*SO* is this a gcc bug or is it related to the changes to the build
process on arm?

gcc -v
Using built-in specs.
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Debian
4.3.2-1.1' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3
--enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
--enable-mpfr --disable-libssp --disable-sjlj-exceptions
--enable-checking=release --build=arm-linux-gnueabi
--host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.3.2 (Debian 4.3.2-1.1)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mikael Pettersson

Mar 6, 2010, 4:10:02 AM

GCC bug. Report it to Debian, just like it asked you to.

In theory it could be flaky hardware or a kernel/CPU combination
with cache coherency issues, but in those cases I'd have expected
many more failures.

The ARM kernel mailing list is linux-ar...@lists.infradead.org.

dave b

Mar 6, 2010, 5:30:02 AM

I had already reported it to debian -
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572653

I have cc'ed linux-arm-kernel into this email.

On 6 March 2010 20:03, Mikael Pettersson <mi...@it.uu.se> wrote:
> dave b writes:
> > Hi have now successfully built a 2.6.33 kernel on a linkstation pro
> > v2. This is an arm device. It is currently running debian lenny
> > armel.
> >
> >
> > I compiled (make) zImage, then did a make modules which failed on the
> > first two rounds of compiling the modules -
> >
> > "fs/afs/super.c: In function ‘afs_test_super’:
> > fs/afs/super.c:278: internal compiler error: Segmentation fault
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See <file:///usr/share/doc/gcc-4.3/README.Bugs> for instructions."
> > This was the error encountered on the attempt at compiling the
> > modules.
> >
> > "crypto/gcm.c: In function ‘crypto_gcm_setauthsize’:

Daniel Mack

Mar 6, 2010, 5:50:02 AM

On Sat, Mar 06, 2010 at 09:24:49PM +1100, dave b wrote:
> On 6 March 2010 20:03, Mikael Pettersson <mi...@it.uu.se> wrote:
> > dave b writes:
> > > Hi have now successfully built a 2.6.33 kernel on a linkstation pro
> > > v2. This is an arm device. It is currently running debian lenny
> > > armel.
> > >
> > >
> > > I compiled (make) zImage, then did a make modules which failed on the
> > > first two rounds of compiling the modules -
> > >
> > > "fs/afs/super.c: In function ‘afs_test_super’:
> > > fs/afs/super.c:278: internal compiler error: Segmentation fault
> > > Please submit a full bug report,
> > > with preprocessed source if appropriate.
> > > See <file:///usr/share/doc/gcc-4.3/README.Bugs> for instructions."
> > > This was the error encountered on the attempt at compiling the
> > > modules.
> > >
> > > "crypto/gcm.c: In function ‘crypto_gcm_setauthsize’:
> > > crypto/gcm.c:152: internal compiler error: Segmentation fault
> > > Please submit a full bug report,
> > > with preprocessed source if appropriate.
> > > See <file:///usr/share/doc/gcc-4.3/README.Bugs> for instructions.
> > > make[1]: *** [crypto/gcm.o] Error 1"
> > > This was the error the on the second attempt at compiling the modules.
> > >
> > > The 3rd attempt at building the modules was successful...

Whenever I had comparable problems, it was _always_ faulty RAM on my
local machine, and I'm very sure you're seeing something similar. _If_ gcc
crashes because of a bug, it will always do so for the same input.

Daniel

dave b

Mar 6, 2010, 8:10:02 PM

Ok... however how should one test the memory of an arm machine? ...
memtest is only for x86. *I am referring to the kernel memtest and not
memtest86.

Martin Guy

Mar 7, 2010, 6:10:03 AM

On 3/7/10, dave b <db.pu...@gmail.com> wrote:
> Ok... however how should one test the memory of an arm machine? ...
> memtest is only for x86. *I am referring to the kernel memtest and not
> memtest86.

Well, there's a userspace one: follow the link from
http://www.arm.linux.org.uk/developer/stresstests.php

M
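
The idea behind a userspace memory test can be sketched in a few lines. This is only an illustration, not the tool behind the link above: the function name `pattern_test` and the choice of patterns are assumptions, and unlike a dedicated tester it does not lock pages in RAM, so it covers whatever physical memory the kernel happens to hand out.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each pattern, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples; an empty list
    means every byte read back as written.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected                  # write pass
        if buf != expected:                # fast whole-buffer comparison
            # Slow path: locate the individual bad bytes.
            for offset, actual in enumerate(buf):
                if actual != pattern:
                    errors.append((offset, pattern, actual))
    return errors


if __name__ == "__main__":
    bad = pattern_test(16 * 1024 * 1024)   # 16 MiB; the size is arbitrary
    if bad:
        print(f"{len(bad)} bad bytes, first: {bad[0]}")
    else:
        print("no errors found")
```

A clean run of something like this proves little; a failing run is a strong hint that the hardware, not the compiler, is at fault.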

Uwe Kleine-König

Mar 8, 2010, 5:00:03 AM

On Sun, Mar 07, 2010 at 12:05:21PM +1100, dave b wrote:
> Ok... however how should one test the memory of an arm machine? ...
> memtest is only for x86. *I am referring to the kernel memtest and not
> memtest86.

The easiest is: rerun make and check if it fails at exactly the same
place.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
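
The rerun-and-compare check suggested above can be automated roughly as follows. This is a hedged sketch: `run_repeatedly` and `classify` are hypothetical helper names, and using the last line of stderr as the "failure location" is an assumption that only holds for a non-parallel build.

```python
import subprocess


def run_repeatedly(cmd, runs=3):
    """Run cmd several times; return a list of (exit_code, last_stderr_line)."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        lines = proc.stderr.strip().splitlines()
        results.append((proc.returncode, lines[-1] if lines else ""))
    return results


def classify(results):
    """Label a series of runs as 'no failures', 'deterministic' or 'intermittent'."""
    failures = [r for r in results if r[0] != 0]
    if not failures:
        return "no failures"
    if len(failures) == len(results) and len(set(failures)) == 1:
        return "deterministic"   # same failure every run: suspect compiler or source
    return "intermittent"        # failures move or vanish: suspect hardware
```

For the build in this thread the call would be something like `classify(run_repeatedly(["make", "modules"]))`. The failures reported at the start of the thread (a different file on each of the first two attempts, then success) would come out as "intermittent".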

Daniel Mack

Mar 8, 2010, 5:40:01 AM

On Mon, Mar 08, 2010 at 10:53:37AM +0100, Uwe Kleine-König wrote:
> On Sun, Mar 07, 2010 at 12:05:21PM +1100, dave b wrote:
> > Ok... however how should one test the memory of an arm machine? ...
> > memtest is only for x86. *I am referring to the kernel memtest and not
> > memtest86.
> The easiest is: rerun make and check if it fails at exactly the same
> place.

Hmm, I wonder whether this is in any way related to what Pavel and Cyril
reported in the 'bit error' thread.

Dave, does your bootloader have any memory test built-in? Do you see the
same issues with any older kernel?

FWIW, we're currently hunting a strange bug with hanging tasks, which
only seems to affect systems with Wifi enabled. That might be totally
unrelated to both of these issues though.

Daniel

dave b

Mar 11, 2010, 8:20:02 AM

2010/3/8 Daniel Mack <dan...@caiaq.de>:

> On Mon, Mar 08, 2010 at 10:53:37AM +0100, Uwe Kleine-König wrote:
>> On Sun, Mar 07, 2010 at 12:05:21PM +1100, dave b wrote:
>> > Ok... however how should one test the memory of an arm machine? ...
>> > memtest is only for x86. *I am referring to the kernel memtest and not
>> > memtest86.
>> The easiest is: rerun make and check if it fails at exactly the same
>> place.
>
> Hmm, I wonder whether this is in any way related to what Pavel and Cyril
> reported in the 'bit error' thread.
>
> Dave, does your bootloader have any memory test built-in? Do you see the
> same issues with any older kernel?
>
> FWIW, we're currently hunting a strange bug with hanging tasks, which
> only seems to affect systems with Wifi enabled. That might be totally
> unrelated to both of these issues though.
>
> Daniel
>

U-Boot apparently has a very simple memory checker, but this doesn't help
me as I don't have serial access. I have now re-compiled the 2.6.33
kernel 4 times while the device was running the 2.6.33 kernel,
*without* an *error*. I also ran memtester for a while using most of
the memory on the device, without invoking the oom killer.

I will re-run these tests on the 2.6.32.7 kernel soon.

Russell King - ARM Linux

Mar 11, 2010, 8:40:02 AM

On Sat, Mar 06, 2010 at 09:24:49PM +1100, dave b wrote:

> I had already reported it to debian -
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572653
>
> I have cc'ed linux-arm-kernel into this email.

I think most of the points have already been covered, but just for
completeness,

What is the history of the hardware you're running these builds on?
Has it proven itself on previous kernel versions running much the same
tests?

Another point to consider: how are you running the compiler - is it
over NFS from a PC?

The reason I ask is that you can suffer from very weird corruption
issues - I have a nice illustration of one which takes a copy of 1GB
worth of data each day, and every once in a while, bytes 8-15 of a
naturally aligned 16 byte block in the data become corrupted somewhere
between the network and disk. The probability of corruption happening
is around 0.0000001%, but it still happens... and that makes it
extremely difficult to track down.
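
Corruption of this shape can be located by comparing a known-good copy with the suspect one, block by block. The following is a minimal sketch under assumptions: the helper name `corrupted_blocks` is hypothetical, and both copies are assumed to fit in memory as byte strings.

```python
def corrupted_blocks(original, copy, block=16):
    """Compare two equal-length byte strings one aligned block at a time.

    Returns {block_start_offset: [offsets within the block that differ]}.
    """
    if len(original) != len(copy):
        raise ValueError("inputs must be the same length")
    report = {}
    for start in range(0, len(original), block):
        a = original[start:start + block]
        b = copy[start:start + block]
        if a != b:
            report[start] = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return report
```

A report whose inner lists are always `[8, 9, ..., 15]` would match the pattern described above and point at a specific transfer path rather than at random bit flips.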

Jamie Lokier

Mar 14, 2010, 9:10:02 PM

Russell King - ARM Linux wrote:
> On Sat, Mar 06, 2010 at 09:24:49PM +1100, dave b wrote:
> > I had already reported it to debian -
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572653
> >
> > I have cc'ed linux-arm-kernel into this email.
>
> I think most of the points have already been covered, but just for
> completeness,
>
> What is the history of the hardware you're running these builds on?
> Has it proven itself on previous kernel versions running much the same
> tests?
>
> Another point to consider: how are you running the compiler - is it
> over NFS from a PC?
>
> The reason I ask is that you can suffer from very weird corruption
> issues - I have a nice illustration of one which takes a copy of 1GB
> worth of data each day, and every once in a while, bytes 8-15 of a
> naturally aligned 16 byte block in the data become corrupted somewhere
> between the network and disk. The probability of corruption happening
> is around 0.0000001%, but it still happens... and that makes it
> extremely difficult to track down.

We had annoying corruption in some totally different hardware a few
years ago, but not quite as rare as that.

It was only on ext3 filesystems, not vfat as was supplied with the
SDK. It turned out that the chip's IDE driver started DMA like this:

1. Write to DMA address and count registers.
2. Flush D-cache.
3. Write start DMA command to DMA controller.

We found step 1 preloaded the first 128 bytes into a chip FIFO
(undocumented of course), although the DMA didn't start until step 3.
Swapping steps 1 and 2 fixed it.
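
Why the order matters can be shown with a toy model. This is a simulation under stated assumptions, not the chip's driver: `ToyDMA` and `transfer` are hypothetical names, RAM and the D-cache are modelled as plain byte buffers, and only the 128-byte FIFO preload is taken from the description above.

```python
FIFO_SIZE = 128  # bytes preloaded when the address register is written


class ToyDMA:
    """Toy controller that prefetches into a FIFO as soon as it is programmed."""

    def __init__(self, ram):
        self.ram = ram
        self.fifo = b""
        self.addr = 0
        self.count = 0

    def program(self, addr, count):
        # "Write to DMA address and count registers": eager prefetch happens here.
        self.addr, self.count = addr, count
        self.fifo = bytes(self.ram[addr:addr + min(FIFO_SIZE, count)])

    def start(self):
        # "Write start DMA command": FIFO goes out first, the rest is read now.
        rest = self.ram[self.addr + len(self.fifo):self.addr + self.count]
        return self.fifo + bytes(rest)


def transfer(data, flush_first):
    """Return what the device receives for the given ordering of steps."""
    ram = bytearray(len(data))     # RAM still holds stale zeros
    dirty_cache = bytes(data)      # the new data lives only in the D-cache
    dma = ToyDMA(ram)

    def flush():
        ram[:] = dirty_cache       # write dirty cache lines back to RAM

    if flush_first:
        flush()
        dma.program(0, len(data))
    else:
        dma.program(0, len(data))
        flush()
    return dma.start()
```

With `flush_first=False` the first 128 bytes that reach the device are the stale RAM contents; with `flush_first=True` the whole buffer arrives intact.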

The chip supplier hadn't encountered corruption because the code path
from vfat down always had the side effect of flushing those cachelines.

With the cache handling complexity that some ARMs now seem to require,
I wonder if you're seeing a similarly missed cache flush? Adding
cache flushes at strategic points throughout the kernel was very
helpful in narrowing down the one we saw.

-- Jamie

dave b

Mar 26, 2010, 6:20:01 AM

I believe that the 2.6.32.7 kernel I compiled and was using on the
device while compiling the 2.6.33 kernel had *issues* (although most
likely not kernel related). In particular, I saw various issues (such as
apt-get not working), including on one piece of hardware running a
kernel binary built by others.

Thank you.

Pavel Machek

Apr 1, 2010, 2:10:02 AM

On Fri 2010-03-26 21:12:34, dave b wrote:
> I believe that the 2.6.32.7 kernel I compiled and was using on the
> device while compiling the 2.6.33 kernel had *issues* (although most
> likely not kernel related). In particular various issues (apt-get not
> working) including on one piece of hardware using the kernel binary as
> produced by others.

Interesting. I remember apt-get failing recently; I thought my
databases went corrupt, but then it started to work magically...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html