
GCC -O0+flags != -O1


Toby Douglass

Dec 5, 2009, 4:49:06 AM
I have a test program for an algorithm.

It runs fine compiled -O0; I ran it overnight last night, no seg fault.

If I compile -O1 (or -O2) it seg faults after a few seconds.

I used -Q --help=optimizers to list the optimizations for -O0 and -O1; I
then compiled -O0 and explicitly added the list of extra optimizations
being turned on for -O1.

That build runs fine!

I got suspicious and examined the binary sizes, and it turns out they're
rather different.

-O0+switches = 41,771
-O1 = 37,443

So I observe that -O1 is doing something beyond what GCC tells me in its
list of active optimizations.

Can anyone enlighten me?

Toby Douglass

Dec 5, 2009, 6:30:02 AM
Additional information:

If I compile -O1, I crash.

If I compile -O0, I run.

If I compile -O0 + flags so I'm equal to -O1, I run.

If I compile -O1 - flags so I'm equal to -O0, I crash.

-O1 does something different to -O0 which is not represented by the
optimizer flags.

I note there are a large number of optimizer parameters (e.g. gcc -Q
--help=params). I wonder whether the values of these params vary by -O
level? I'm trying to find a way to emit the current values of the params,
but I can't find one.

Jan Seiffert

Dec 5, 2009, 8:12:19 AM
Toby Douglass wrote:

Yes, it is known that the O flags do more than what is listed.
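
One visible sign that the -O level is more than the sum of its -f flags: GCC
predefines the macro __OPTIMIZE__ for optimizing compilations, and no
individual -f switch sets it. A minimal sketch, assuming GCC or a compatible
compiler; the function name is made up for illustration:

```c
/* Hypothetical helper: reports whether this translation unit was
 * compiled with optimization enabled. GCC defines __OPTIMIZE__ when
 * an optimizing -O level is in effect; it stays undefined at -O0,
 * whatever -f flags are added on the command line. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Some system headers test this macro, so a build with -O0 plus flags and a
build with -O1 may not even see the same source after preprocessing.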

If it crashes even with -O1, then either your GCC is broken (which version?
self-compiled?) or your program is violating some basic rules in a nasty way;
fix it. (Do you know -Wall? It is often a good hint at what could be problematic.)


Greetings
Jan

--
Miksch's Law:
If a string has one end, then it has another end.

Toby Douglass

Dec 5, 2009, 8:44:46 AM
nomail@invalid wrote:
> Toby Douglass wrote:

> Yes, it is known that the O flags do more than what is listed.

Ah.



> If it crashes even with -O1, then either your GCC is broken (which version?
> self-compiled?),

Version 4.3.4; comes with Debian.

> or your program is violating some basic rules in a nasty way;
> fix it. (Do you know -Wall? It is often a good hint at what could be problematic.)

-Wall is on; there are no warnings.

I've been looking at this problem basically non-stop all day for the
last three weekends and most evenings too.

The code works fine with -O2 with GCC 4.4.1 on Ubuntu on x86 and with
MSVC (debug and release builds) on x86 and x64.

The 4.3.4 compiler I have is on an ARM platform, which obviously is not
quite in the same league as the x86 target.

I think the next step is to try 4.4.1 on ARM.

Toby Douglass

Dec 5, 2009, 8:46:53 AM
a...@b.com wrote:
> The code works fine with -O2 with GCC 4.4.1 on Ubuntu on x86 and with
> MSVC (debug and release builds) on x86 and x64.

Also works fine with GCC 4.1.0 on Fedora Core 8 on x86.

I've not tried it very recently, but it was working fine with GCC 4.1.0
on x64, too (not sure what Linux that was though offhand). I'll recheck
this later today.

Jan Seiffert

Dec 6, 2009, 9:46:53 AM
Toby Douglass wrote:

> a...@b.com wrote:
>> The code works fine with -O2 with GCC 4.4.1 on Ubuntu on x86 and with
>> MSVC (debug and release builds) on x86 and x64.
>
> Also works fine with GCC 4.1.0 on Fedora Core 8 on x86.
>

The x86 compilers you mentioned are distribution compilers, and each whole
distribution is built with them. With all those packages compiled, they can be
seen as "stable".


> I've not tried it very recently, but it was working fine with GCC 4.1.0
> on x64, too (not sure what Linux that was though offhand). I'll recheck
> this later today.
>

Hmmm, since all the working platforms are x86, any chance your problem is ARM?
Dumb example: an unaligned access through a cast pointer. It should be reported
as SIGBUS, but, for example, an unaligned access violation in x86 SSE code gets
reported as SIGSEGV.
Try -Wcast-align.
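
To make the "dumb example" concrete, here is a small sketch of the kind of
cast in question next to an alignment-safe alternative. The function names
are invented for illustration and nothing here is taken from the program
under discussion:

```c
#include <stdint.h>
#include <string.h>

/* Risky: if p is not suitably aligned for uint32_t, this dereference
 * is undefined behaviour. x86 usually tolerates it; on ARM it may
 * fault or return unexpected data. -Wcast-align is meant to flag
 * casts like this one on strict-alignment targets. */
uint32_t load_u32_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}

/* Safer: memcpy has no alignment requirement, and compilers
 * typically turn it into a plain load where that is legal. */
uint32_t load_u32_copy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling load_u32_copy(buf + 1) on a byte buffer is fine on any target; the
cast version with the same argument is the one that can blow up.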


The ARM cross compiler you mentioned, a Debian 4.3.4, I don't know. The version
sounds sane, and probably others have already tested it. 4.3.4 is also the
stable distro compiler version for Gentoo.
So 4.3.4 sounds good.

Can you at least pinpoint where it crashes?
Is this an embedded target without an OS or a "normal" environment?
How does it crash?

Greetings
Jan

--
programmer, n:
A red eyed, mumbling mammal capable of conversing with inanimate
monsters.

Toby Douglass

Dec 7, 2009, 3:10:29 AM
nomail@invalid wrote:
> Hmmm, since all the working platforms are x86, any chance your
> problem is ARM?

Turns out - surprise, surprise - the problem is me.

I incorrectly specified volatile in my pointer declarations. For example,
I wrote

volatile void *p;

rather than

void volatile *p;

On x86, where the CPU isn't very aggressive, the code runs okay; on ARM,
where there is heavy re-ordering, the CPU trips up.

This of course explains why playing with the optimizer switches made no
difference. With optimization on, volatile is looked at.

Jan Seiffert

Dec 7, 2009, 6:00:17 AM
Toby Douglass wrote:

> nomail@invalid wrote:
>> Hmmm, since all the working platforms are x86, any chance your
> problem is ARM?
>
> Turns out - surprise, surprise - the problem is me.
>
> I incorrectly specified volatile in my pointer declarations, e.g.
>
> I wrote
>
> volatile void *p;
>
> rather than
>
> void volatile *p;
>

Uhhhh, I think I have to "grep -R volatile ." my source.
Can you help my rusty brain here a little:
You wanted the pointer itself to be volatile, not the pointee?

> On x86, where the CPU isn't very aggressive, the code runs okay; on ARM,
> where there is heavy re-ordering, the CPU trips up.
>

Hmmm, are you sure it's the hardware that played tricks on you? Or was it only
the compiler reordering things a little (because it has more registers, a
different heuristic, or whatever on another arch), so that a potential problem
was uncovered?

If it really means you are fighting with memory ordering, you have a problem.
Memory ordering is not guaranteed by C and not enforced by the compiler, no, not
even with volatile. In other words: tricky lock-free stuff that relies on
volatile is ... volatile.

Do you know of the "great" Linux Kernel Mailing List "volatile is evil" discussion?
Volatile is for the compiler. It gives you some semantics (but who knows exactly
which), and it puts your compiler on a leash. That often helps on the surface,
but it does not help you at all on the really hazardous architectures.
The only reason this doesn't haunt us every day is that we mostly work with
strongly ordered architectures.
Still, to make your software really portable you have to obey such things, and
that's where hardware "slaps on the back of the head" come into play: memory
barriers in the form of some special instruction.

There are a good many rmb() and wmb() calls and the like wherever it matters in
the kernel. On architectures with strong memory ordering they mostly compile to
nothing, but on weakly ordered systems they generate the "right" instruction.

> This of course explains why playing with the optimizer switches made no
> difference. With optimization on, volatile is looked at.

Greetings
Jan

--
Shift happens.
-- Doppler

Toby Douglass

Dec 7, 2009, 8:07:45 AM
nomail@invalid wrote:
> Toby Douglass wrote:

> > I wrote
> >
> > volatile void *p;
> >
> > rather than
> >
> > void volatile *p;

> Uhhhh, I think I have to "grep -R volatile ." my source.
> Can you help my rusty brain here a little:
> You wanted the pointer itself to be volatile, not the pointee?

Correct. Lock-free data structures. The pointee is usually a structure
of some kind, with a pointer or two in it and some user data. It's not
volatile! But the pointers *are* volatile; they're getting swapped by
atomic compare-and-swap all the time.
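
A rough sketch of the pattern being described: a volatile head pointer to
non-volatile nodes, updated with GCC's __sync_bool_compare_and_swap builtin.
The struct layout and function names are assumptions made for illustration,
not the code from the program in question, and pop is deliberately left out
because a correct lock-free pop has to deal with the ABA problem.

```c
#include <stddef.h>

struct node {
    struct node *next;   /* link to the element below */
    int value;           /* user data */
};

/* The shared head pointer itself is volatile; the nodes are not. */
static struct node *volatile stack_head = NULL;

/* Push n onto the stack, retrying if another thread changed the
 * head between our read and our compare-and-swap. */
void stack_push(struct node *n)
{
    struct node *old;
    do {
        old = stack_head;   /* volatile read: reloaded on every retry */
        n->next = old;
    } while (!__sync_bool_compare_and_swap(&stack_head, old, n));
}

/* Return the current top element without removing it. */
struct node *stack_peek(void)
{
    return stack_head;
}
```

The qualifier sits after the '*', so it applies to the head pointer and
not to the struct it points at.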



> > On x86, where the CPU isn't very aggressive, the code runs okay; on ARM,
> > where there is heavy re-ordering, the CPU trips up.

> Hmmm, are you sure it's the hardware that played tricks on you? Or only the
> compiler reordering things (because it has more regs, a different heuristic, or
> whatever on another arch.) a little, so a potential problem was uncovered?

Combined, I'm certain. The compiler only does some things, the CPU only
does some things. However, when you look at the table of what types of
re-ordering are supported by different CPUs, x86/x64 is WAY behind
everything else.



> If it really means you are fighting with memory ordering, you have a problem.
> Such is not guaranteed by C and not obeyed by the compiler, no, not even with
> volatile. In other words: tricky lock-free stuff which relies on volatile is ...
> volatile.

The problem is in the compiler optimization. The CPU-side memory
ordering is properly handled by memory barriers. However, the compiler
was getting confused about what might or might not have changed
underneath it (i.e. volatile mattered), since the data observed by any
one thread was often being changed by threads on other cores.



> Do you know of the "great" Linux Kernel Mailing List "volatile is evil" discussion?

I've bumped into it :-) I kind of agree, but the fact is, volatile as
it is is necessary; you need it to tell the compiler that stuff in this
thread on this core can change without warning, so don't optimize based
on assumptions it'll be unchanged.

> There are a deal of rmb() and wmb() and such things where it matters in the
> kernel. On architectures with strong memory ordering they mostly compile to
> nothing, but on weakly ordered systems, they generate the "right" instruction.

Yes, but they're kernel constructs; not available in user-mode. You can
use __sync_synchronize(), but as I found out, it silently emits nothing
on ARM...
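
For reference, the intended user-mode use of __sync_synchronize() looks
roughly like this. It is only a sketch: the variable and function names are
hypothetical, and it relies on the builtin really emitting a hardware
barrier for the target, which is worth checking in the generated assembly.

```c
static int payload;                 /* ordinary data */
static volatile int payload_ready;  /* flag polled by the consumer */

/* Producer: write the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    __sync_synchronize();   /* full barrier: data is visible before flag */
    payload_ready = 1;
}

/* Consumer: returns 1 and stores the data in *out once the flag is
 * set, or returns 0 if nothing has been published yet. */
int try_consume(int *out)
{
    if (!payload_ready)
        return 0;
    __sync_synchronize();   /* full barrier: flag is read before data */
    *out = payload;
    return 1;
}
```

If the barrier compiles to nothing, both functions still look correct in
single-threaded testing, which is what makes the problem so easy to miss.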

Jan Seiffert

Dec 7, 2009, 10:25:30 AM
Toby Douglass wrote:

> nomail@invalid wrote:
>> Toby Douglass wrote:
>
>>> I wrote
>>>
>>> volatile void *p;
>>>
>>> rather than
>>>
>>> void volatile *p;
>
>> Uhhhh, I think I have to "grep -R volatile ." my source.
>> Can you help my rusty brain here a little:
>> You wanted the pointer itself to be volatile, not the pointee?
>
> Correct. Lock-free data structures. The pointee is usually a structure
> of some kind, with a pointer or two in and some user data. It's not
> volatile! but the pointers, they *are* volatile, they're getting
> swapped by atomic compare-and-swap all the time.
>

Have some of that myself...

>>> On x86, where the CPU isn't very aggressive, the code runs okay; on ARM,
>>> where there is heavy re-ordering, the CPU trips up.
>
>> Hmmm, are you sure it's the hardware that played tricks on you? Or only the
>> compiler reordering things (because it has more regs, a different heuristic, or
>> whatever on another arch.) a little, so a potential problem was uncovered?
>
> Combined, I'm certain. The compiler only does some things, the CPU only
> does some things. However, when you look at the table of what types of
> re-ordering are supported by different CPUs, x86/x64 is WAY behind
> everything else.
>

No, x86 is the only sane arch here, really, even if I don't like the rest of x86
that much.
(Note: SPARC, for example, defines several settings for memory ordering, but most
implementations follow the strict one. Again, in software you have to assume the
worst; luckily the membars are no-ops in such a case...)

I thought about that for some time. The promises of weak ordering sound good and
logical. Fewer dependencies, more performance, etc.

But on the other hand, it neglects a very simple fact:
Micromanaging basic things comes at a very high cost when done in software.

You may know the joke:
How many hardware engineers does it take to change a light bulb?
- None, we will fix it in software.

After the first haha, think again about how you would emulate a non-existent or
broken light bulb in software...

Software tends to build APIs to make things manageable, for small things (a copy
routine) and for large things (decode and display this media file for me).
You can see the CPU ISA as an API here.

But every API and abstraction layer is leaky.

For large things this is OK, the relation between overhead of abstraction and
work done is good. For small things it gets problematic, for single
"instructions"/bus cycles etc. it gets a PITA.

Not all x86 CPUs have a conditional move. Outsourcing this to a function to
abstract it away makes it unusable.
Early Alpha cannot do bytewise access; wrapping every bytewise access into a lib...
The SPARC popcount instruction is dog slow, so better not use it; the Fujitsu
UltraSPARC IIIifx (or whatever it will be called), soon to be produced,
promises it will finally be in hardware...

To solve this, you had better switch the implementation at a larger abstraction
point. Or use self-modifying code, but you know what pain comes with that (at
least today). Yeah, there are times when coding in Linux kernel space is more
fun than in userspace: they have already gone through this and built
abstractions for a lot of archs...

Putting this memory-ordering micromanagement into software maybe makes your chip
simpler (transistor count, embedded) and makes you look great on paper ("we are
oh-so scalable with this weak ordering"), but in practice these instructions
will be dumb (flush _all_ caches, things like that), and every time a software
programmer has to use them, he will throw out your advantage with the bath
water, esp. since he will use them "always" to be on the safe side: he does not
know what context he is in and doesn't want to manage it (overhead,
maintainability, bug count).
And that's a shame, because much of that could be done *much better* in hardware
("do I need to flush?", a question you cannot really answer from software, "No
- no overhead") with only a "few transistors"...

I could go on, but maybe you see the point:
Don't f**** up your API, esp. when it's put into silicon.

[snip]


>> Do you know of the "great" Linux Kernel Mailing List "volatile is evil" discussion?
>
> I've bumped into it :-) I kind of agree, but the fact is, volatile as
> it is is necessary; you need it to tell the compiler that stuff in this
> thread on this core can change without warning, so don't optimize based
> on assumptions it'll be unchanged.
>

Yes, sure, but an rmb() or something like that should also give the compiler the
slap on the back of the head in such a case. Look, for example, at the
barrier() thingy; mostly it is there to stop the compiler being overly clever, like:

/* some global thing, changed by another thread */
int status = 1;
...
/* spin until ready */
while (status)
        barrier();      /* or cpu_relax(): a compiler barrier plus a
                           signal to the CPU that we spin, if supported */

But the mechanisms for this, what you can do and what you can force gcc to do,
are very gcc-specific.

Volatile is simply more convenient in most situations (and the only option on
other compilers).
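
For anyone who has not seen it spelled out, the spin loop above can be
sketched in plain GCC C as follows. The barrier() definition is the usual
empty-asm idiom with a "memory" clobber; the two function names are invented
for this example, and neither variant is a hardware memory barrier.

```c
/* Compiler-only barrier, GCC syntax: an empty asm statement with a
 * "memory" clobber. It emits no instruction; it only stops the
 * compiler from keeping memory values cached in registers across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int status = 1;   /* cleared by another thread; not volatile */

/* Variant 1: plain variable plus barrier() in the loop body, so
 * status is reloaded from memory on every iteration. */
void spin_with_barrier(void)
{
    while (status)
        barrier();
}

/* Variant 2: force a fresh load on every iteration by reading
 * through a volatile-qualified lvalue instead. */
void spin_with_volatile(void)
{
    while (*(volatile int *)&status)
        ;
}
```

Without either measure the compiler is free to read status once, conclude
that it never changes inside the loop, and spin forever.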

>> There are a deal of rmb() and wmb() and such things where it matters in the
>> kernel. On architectures with strong memory ordering they mostly compile to
>> nothing, but on weakly ordered systems, they generate the "right" instruction.
>
> Yes, but they're kernel constructs; not available in user-mode. You can
> use __sync_synchronize(), but as I found out, it silently emits nothing
> on ARM...
>

And another dumb thing in the API, those instructions (dmb, etc) are privileged?
*sigh*
Like making a read of the processor status word a privileged instruction when it
is the only source to know on what CPU version you are to decide which code to
use (and no, there is no incriminating info in it if you can not write it)...
It's often the small things that are a real show stopper: building a great
palace and then they forget to install a door knob.

But if my memory serves me right with these ARM things, the compiler should make
a call to the runtime system, which calls the kernel to do this for you.
If your compiler does not do this, it is either in the wrong mode (wrong CPU given
to -mcpu?) or it dropped the support when it was built because it could not find
the support in the runtime system.

Greetings
Jan

--
Klingon function calls do not have 'parameters', they have 'arguments' -
and they always win them.
[Nele Abels in dsg]

Alan Curry

Dec 7, 2009, 5:27:50 PM
In article <MPG.25871c89...@text.giganews.com>,

Toby Douglass <a...@b.com> wrote:
>nomail@invalid wrote:
>> Toby Douglass wrote:
>
>> > I wrote
>> >
>> > volatile void *p;
>> >
>> > rather than
>> >
>> > void volatile *p;
>
>> Uhhhh, I think I have to "grep -R volatile ." my source.
>> Can you help my rusty brain here a little:
>> You wanted the pointer itself to be volatile, not the pointee?
>
>Correct.

Then you'd better make it void *volatile p; and report a bug on any compiler
that didn't treat the first 2 as identical to each other.
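
A compile-time sketch of that distinction, assuming a C11 compiler for
_Generic (newer than the GCC 4.3.4 discussed in this thread); the variable,
macro and function names are made up for the example:

```c
/* Both of these declare a pointer to volatile data. */
volatile void *p1;
void volatile *p2;

/* This declares a volatile pointer to ordinary data. */
void *volatile p3;

/* C11 _Generic selects on the type of &x, so the checks below are
 * resolved entirely at compile time. */
#define POINTEE_IS_VOLATILE(x) \
    _Generic(&(x), volatile void **: 1, default: 0)
#define POINTER_IS_VOLATILE(x) \
    _Generic(&(x), void *volatile *: 1, default: 0)

/* 1 if p1 and p2 both have the type "pointer to volatile void". */
int first_two_are_identical(void)
{
    return POINTEE_IS_VOLATILE(p1) && POINTEE_IS_VOLATILE(p2);
}

/* 1 if p3 is the only one whose pointer object is itself volatile. */
int only_third_is_volatile_pointer(void)
{
    return POINTER_IS_VOLATILE(p3)
        && !POINTER_IS_VOLATILE(p1)
        && !POINTER_IS_VOLATILE(p2);
}
```

Both functions should return 1 on a conforming compiler: the position of
volatile relative to the '*' is what matters, not its position relative to
the type name.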

--
Alan Curry

Toby Douglass

Dec 7, 2009, 5:42:46 PM
pac...@kosh.dhis.org wrote:
> Toby Douglass <a...@b.com> wrote:
> >nomail@invalid wrote:
> >> Toby Douglass wrote:
> >
> >> > I wrote
> >> >
> >> > volatile void *p;
> >> >
> >> > rather than
> >> >
> >> > void volatile *p;

> Then you better make it void *volatile p; and report a bug on any compiler
> that didn't treat the first 2 as identical to each other.

lols

Yes, typo - bad one too!
