
sleazy intel compiler trick (SOURCE ATTACHED)


iccOut

Feb 9, 2004, 5:38:39 PM
As part of my study of Operating Systems and embedded systems, one of
the things I've been looking at is compilers. I'm interested in
analyzing how different compilers optimize code for different
platforms. As part of this comparison, I was looking at the Intel
Compiler and how it optimizes code. The Intel Compilers have a free
evaluation download from here:
http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers
 

One of the things that the version 8.0 of the Intel compiler included
was an "Intel-specific" flag. According to the documentation, binaries
compiled with this flag would only run on Intel processors and would
include Intel-specific optimizations to make them run faster. The
documentation was unfortunately lacking in explaining what these
optimizations were, so I decided to do some investigating. 

First I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2 GHz Extreme Edition with 1 GB
of RAM running Windows XP Pro. First I compiled and ran SPEC with the
"generic x86" flag (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
SPEC with the "Intel-specific" flag (-QxN) to see what kind of
difference that would make. For most benchmarks, there was not very
much change, but for 181.mcf, there was a win of almost 22%!

Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 GB of RAM, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful. Was it true that only
Intel processors could take advantage of this performance boost?

I started mucking around with a disassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2, and it checks the CPUID to
ensure that it's an Intel processor. I wrote a quick utility, which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
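
For reference, here is roughly what such a check boils down to. This is
only my own sketch of the general technique, written with GCC/Clang's
<cpuid.h> helper; it is not Intel's actual proc_init_N code, and the
function name and layout are mine.

/* Sketch of a CPUID-based startup check -- NOT Intel's actual code.
 * Leaf 0 returns the vendor string in EBX:EDX:ECX ("GenuineIntel"),
 * leaf 1 returns feature bits in EDX (bit 25 = SSE, bit 26 = SSE2). */
#include <stdio.h>
#include <string.h>
#include <cpuid.h>   /* GCC/Clang helper; other compilers differ */

static int cpu_check(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    if ( !__get_cpuid(0, &eax, &ebx, &ecx, &edx) )
        return 0;
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
        return 0;

    int has_sse  = (edx >> 25) & 1;
    int has_sse2 = (edx >> 26) & 1;

    /* The feature bits are all that is technically needed; the vendor
     * string comparison is the part in dispute here. */
    return has_sse && has_sse2 && strcmp(vendor, "GenuineIntel") == 0;
}

int main(void)
{
    if ( !cpu_check() ) {
        printf("This processor is not supported.\n");
        return 1;
    }
    printf("Check passed.\n");
    return 0;
}

iccOut simply NOPs out the call to the whole routine, so neither test
runs afterwards -- which is why the CPU still has to support SSE and
SSE2 on its own.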

Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium 4 with an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable of taking advantage of these same
optimizations. If I'm missing something, I'd love for someone to point
it out for me. From the way it looks right now, it appears that Intel
is simply "cheating" to make their processors look better against
competitors' processors.

Links:
Intel Compiler: http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers
 

Here is the source:

/*
 * iccOut 1.0
 *
 * This program enables programs compiled with the Intel compiler using the
 * -xN flag to run on non-Intel processors. This can sometimes result in
 * large performance increases, depending on the application. Note that even
 * though the check will be removed, the CPU running the application *MUST*
 * support both SSE and SSE2 or the program will crash.
 *
 */

#include <stdio.h>
#include <string.h>
#include <stdbool.h> /* for bool when built as C99; harmless under C++ */


// x86 opcodes

#define X86_CALL 232 // E8 in hex
#define PUSH_EAX 80 // 50 in hex
#define X86_NOP 144 // 90 in hex

bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE* fixedBinary );

//conveniently, the check always seems to be one of the first calls in
//the file. this makes it easier to find.

void printUsage() {
    printf("Usage:\n");
    printf("iccOut filename\n\n");
    printf("Filename is the name of the file to fix.\n\n");
}


//returns whether code was replaced
bool processNextCall( FILE* inputBinary, FILE* fixedBinary ) {

    int lenRead;
    int startIndex, bytesNeeded;
    unsigned char addressBuffer[4];
    unsigned char checkBuffer[2];
    unsigned char fullBuffer[7];
    unsigned char tempChar;
    bool codeReplaced;
    bool otherReplaced;

    otherReplaced = false;

    //fixme: error checking for reads
    lenRead = fread( &addressBuffer, 1, 4, inputBinary );
    lenRead = fread( &checkBuffer, 1, 2, inputBinary );

    fullBuffer[0] = X86_CALL;
    for( int i = 1; i < 5; i++ ) {
        fullBuffer[i] = addressBuffer[i-1];
    }
    fullBuffer[5] = checkBuffer[0];
    fullBuffer[6] = checkBuffer[1];

    codeReplaced = handleCall( fullBuffer, inputBinary, fixedBinary );

    if ( ! codeReplaced ) {

        //if either of the last 2 bytes were a call, we need to keep doing
        //this until we run out of calls
        while ( ( fullBuffer[5] == X86_CALL ) || ( fullBuffer[6] == X86_CALL ) ) {

            if ( fullBuffer[5] != X86_CALL ) { //write it and ignore it
                tempChar = fullBuffer[5];
                fwrite( &tempChar, 1, 1, fixedBinary );
                fullBuffer[0] = fullBuffer[6];
                bytesNeeded = 6;
                startIndex = 1;
            } else {
                fullBuffer[0] = fullBuffer[5];
                fullBuffer[1] = fullBuffer[6];
                bytesNeeded = 5;
                startIndex = 2;
            }

            for( int i = 0; i < bytesNeeded; i++ ) {
                fread( &tempChar, 1, 1, inputBinary );
                fullBuffer[startIndex+i] = tempChar;
            }

            otherReplaced = otherReplaced || handleCall( fullBuffer, inputBinary, fixedBinary );
        }
    }
    return ( codeReplaced || otherReplaced );
}

//returns whether code was replaced
bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE* fixedBinary ) {

    bool replacedCode;
    unsigned char tempChar;

    replacedCode = false;

    //check if it's what we're looking for (one of the first calls followed by 2 push eax's)
    if ( ( theBuffer[5] == PUSH_EAX ) && ( theBuffer[6] == PUSH_EAX ) ) {
        printf("Located call to subroutine to check intel support!\n");
        printf("Substituting code ...\n");

        //replace the call with nops
        replacedCode = true;
        for ( int i = 0; i < 5; i++ ) {
            theBuffer[i] = X86_NOP;
        }
    }

    if ( replacedCode || ( ( theBuffer[5] != X86_CALL ) && ( theBuffer[6] != X86_CALL ) ) ) {
        //write out the two as they were
        for ( int j = 0; j < 7; j++ ) {
            tempChar = theBuffer[j];
            fwrite( &tempChar, 1, 1, fixedBinary );
        }
    } else {
        //don't write last 2 bytes
        for( int i = 0; i < 5; i++ ) {
            tempChar = theBuffer[i];
            fwrite( &tempChar, 1, 1, fixedBinary );
        }
    }
    return replacedCode;
}

void fixIntelBinary( char *filename ) {

    FILE *inputBinary;
    FILE *fixedBinary;
    unsigned char theChar;
    bool editedCall;
    bool skipWrite;
    int lenRead;
    char outputName[1024];

    printf("iccOut is currently fixing binary: %s\n\n", filename );

    editedCall = false;
    skipWrite = false;

    //open files for reading and writing
    //(build the output name in a separate buffer instead of strcat'ing
    // onto argv, which could overflow)
    inputBinary = fopen( filename, "rb" );
    if ( ! inputBinary ) {
        printf("Error opening input binary.\n");
        return;
    }

    snprintf( outputName, sizeof(outputName), "%s.fixed", filename );
    fixedBinary = fopen( outputName, "wb" );
    if ( ! fixedBinary ) {
        printf("Error opening output file.\n");
        fclose( inputBinary );
        return;
    }

    //start reading until we find what we want
    fread( &theChar, 1, 1, inputBinary );
    while (1) {
        if ( !skipWrite ) {
            //write last value
            fwrite( &theChar, 1, 1, fixedBinary );
        }
        skipWrite = false;

        //read next
        lenRead = fread( &theChar, 1, 1, inputBinary );
        if ( lenRead == 0 ) { //at end of file
            break;
        }

        if ( ! editedCall ) {
            //check if it's the call XXX
            if ( theChar == X86_CALL ) {
                editedCall = processNextCall( inputBinary, fixedBinary );
                skipWrite = true;
            }
        }
    }

    printf("iccOut has saved the day!\n");

    //close files when finished
    fclose( inputBinary );
    fclose( fixedBinary );
}

bool fileExists( char *filename ) {

    FILE *temp;
    bool ret = false;

    temp = fopen( filename, "r" );

    if ( temp != 0 ) {
        ret = true;
        fclose( temp );
    }
    return ret;
}

int main( int argc, char **argv ) {

    printf("\nWelcome to iccOut!\n\n");
    printf("This will enable binaries compiled with -xN to run on non-intel machines\n\n");

    //verify parameters
    if ( argc < 2 ) {
        printUsage();
        return 0;
    }

    //make sure file exists
    if ( ! fileExists( argv[1] ) ) {
        printf("File does not exist or is not accessible: %s\n", argv[1] );
        return 0;
    }

    fixIntelBinary( argv[1] );
    return 0;
}

Jeff

Feb 10, 2004, 1:21:00 AM
I will be the first person to admit that Intel is evil; I have spent a
year co-oping with them, and I know firsthand how things are done
there. While this may seem somewhat sleazy, that is only half of it.
The other side of Intel is the side that likes everything to be
perfect. Odds are, a major reason for the Intel-only part is that
Intel does not want to put their reputation on the line by promising
that code will run better on an AMD chip that has not yet been
released. Intel tests everything, over and over again, and if
something doesn't work right, they fix it before they release it.
Intel doesn't have that control over AMD processors, and one of the
optimizations might not work on an AMD, which would make Intel look
bad. Keep in mind, Intel isn't likely to pass up a chance to make
themselves look better than AMD, but Intel also likes to ensure that
their products work as well as possible, especially after some of the
times that they have been burned.

iccou...@yahoo.com (iccOut) wrote in message news:<a13e403a.04020...@posting.google.com>...

Grumble

Feb 10, 2004, 4:56:52 AM
iccOut wrote:

> #define X86_CALL 232 // E8 in hex
> #define PUSH_EAX 80 // 50 in hex
> #define X86_NOP 144 // 90 in hex

I'm just wondering: if these three values make more sense to you in
hexadecimal than in decimal, then why not use hexadecimal notation?

#define X86_CALL 0xE8
#define PUSH_EAX 0x50
#define X86_NOP 0x90

Bernd Paysan

Feb 10, 2004, 6:10:50 AM
Jeff wrote:

> I will be the first person to admit that Intel is evil, I have spent a
> year co-oping with them, and I know first hand how things are done
> there. While this may seem somewhat sleezy, that is only half of it.
> The other side of Intel is the side that likes everything to be
> perfect. Odds are, a major reason for the Intel only part is that
> Intel does not want to put their reputation on the line that code will
> run better on an AMD chip that has not yet been released. Intel tests
> everything, over and over again, and if something doesn't work right,
> they fix it before they release it. Intel doesn't have that control
> over AMD processors, and one of the optimizations might not work on an
> AMD, which would make Intel look bad. Keep in mind, Intel isn't
> likely to pass up a chance to make themselves look better than AMD,
> but Intel also likes to ensure that their products work as well as
> possible, especially after some of the times that they have been
> burned.

Last c't (3/2004) also reported that the -Qx[PBN] switches generate a check
for the precise processor, but the binaries run fine when the CPUID test is
patched out. I do agree that Intel can't control AMD's chips, but this sort
of test is dangerous. Remember Microsoft, who put a test for MS-DOS into
Windows 3.1 to make sure that it wouldn't run under DR-DOS? They had to pay
300 million to Caldera (who bought DR-DOS to litigate).

IMHO, it's ok to check for features (like SSE2), and stop if the used
features are not available, and it's perhaps ok to print a warning if the
program runs on a CPU it's not optimized for, i.e. if you say -QxP,
anything that's not a Prescott should trigger that warning. It's not ok to
check if it runs on a competing product, and refuse to work there. Not for
someone who has a "monopoly" (>70% market share).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Peter Dickerson

Feb 10, 2004, 7:30:11 AM
"Bernd Paysan" <bernd....@gmx.de> wrote in message
news:qqqmf1-...@miriam.mikron.de...

Perhaps the features that Intel are checking for are SSE2 and full Intel
compatibility. Perhaps the way to find out is to wait for the next release
to see if the test for Intelness is much harder to identify and patch out,
or removed because AMD have been validated. I know which my money is on.

> anything that's not a Prescott should trigger that warning. It's not ok to
> check if it runs on a competing product, and refuse to work there. Not for
> someone who has a "monopoly" (>70% market share).
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://www.jwdt.com/~paysan/

--
Peter
Peter.Dickerson (at) ukonline (dot) co (dot) uk


Jan de Vos

Feb 10, 2004, 8:37:48 AM
In comp.arch, Bernd Paysan wrote:
> IMHO, it's ok to check for features (like SSE2), and stop if the used
> features are not available, and it's perhaps ok to print a warning if the
> program runs on a CPU it's not optimized for, i.e. if you say -QxP,
> anything that's not a Prescott should trigger that warning. It's not ok to
> check if it runs on a competing product, and refuse to work there. Not for
> someone who has a "monopoly" (>70% market share).

Intel doesn't have a monopoly on compilers.


jdv

Igor Levicki

Feb 10, 2004, 11:28:18 AM
@iccOut:

First off, you could patch the function that does the check instead of
patching each call to it. That says a lot about your programming and
reverse engineering skills and logic.

Second, what is so sleazy about it? Why would they allow AMD to get
optimized code for their "advanced 8th generation" architecture for
free? They invested a considerable amount of time and money into
optimization research and the development of their compiler. On the
other hand, when the Pentium 4 came out, people spat on it because "it
needed optimizations to run fast" and liked the Athlon because it was
faster without optimizations.

@Jeff:

What is so evil in protecting your own investment?

@Bernd

# it's perhaps ok to print a warning if the
# program runs on a CPU it's not optimized for

If the program is compiled for Prescott and run on a Pentium 4, and it
uses PNI (or SSE3 if you like that name better), then the program would
crash as soon as it encounters a Prescott instruction.

# It's not ok to check if it runs on a competing
# product, and refuse to work there

Why not??? It is the _Intel_ compiler, for God's sake!!! Why should it
produce code for AMD or any other CPU for that matter at all? If you
buy the Intel compiler you should not expect it to work for other CPUs
unless they are 100% Intel compatible (i.e., they paid a license fee
for the instruction set).

@everyone:
If you want a compiler for AMD CPUs then go and ask AMD to make one. I
think that it is fair enough of Intel to allow generation of Pentium
3 and Pentium 4 code (SSE and SSE2) that works on Athlon XP and Athlon
64 CPUs. There is a standalone compiler that supports both Intel and AMD
-- Codeplay VectorC -- so check it out. You have a choice not to use
the Intel Compiler, and Intel has the _right_ not to support competing
products.

Bernd Paysan

Feb 10, 2004, 12:06:39 PM
Igor Levicki wrote:
> Why not??? It is _Intel_ compiler for God sake!!! Why should it
> produce code for AMD or any other CPU for that matter at all?

It does produce code for AMD or other x86-compatible CPUs. It just inserts
code that uses cpuid to check if this is actually an Intel CPU, and refuses
to run on other CPUs, *despite the fact that* it can run there without any
problems!

This is not a matter of "support". Printing out a warning "This code is
running on a CPU which it is not optimized for" is perfectly ok, and
when the application runs slowly or even produces wrong results: you have
been warned.

Why is the compiler trick a problem, then? Intel doesn't have a monopoly on
compilers. This is ok for the open source world, where you can always use
another compiler if the result of some specific compiler isn't what you
want. It is not ok for the closed source world, where you have to use the
binary compiled with the compiler of choice of the ISV. The ISV may be
ignorant in one way or another (he isn't aware of the problem/he doesn't
care about products competing with Intel).

Do we want Intel cloners to provide user-writable results for the cpuid
instruction? No. We want to use cpuid to check which CPU our program runs
on; we don't want anybody to fake it for any reason. A compiled program
that booboos at the user when it doesn't see "GenuineIntel" is such a
reason.

> If you
> buy Intel compiler you should not expect it to work for other CPUs
> unless they are 100% Intel compatible (e.g. they paid a license fee
> for instruction set).

Actually, AMD "paid" the license fee, i.e. they have a complete
cross-license agreement on x86 and extensions. AMD can use Intel's
instruction set, and Intel can (and will, according to recent news) use
AMD's instruction set. Does this make all your previous arguments moot?

And why is it so difficult to understand "fair play"? Why can't Intel just
produce better chips so that their code runs faster on their own chips, and
slower on competing chips without any dirty tricks?

BTW: AMD does support compiler development. They don't build their own
compiler, they just support compiler developers like the GCC team or the
Portland Group. The results look promising. I hope that everybody can use
those compilers on Intel processors when they finally release their CT
chips. On the other hand, I think it would be fair (in a tit-for-tat kind
of fairness) if those compilers would all emit cpuid code checking for
"AuthenticAMD", to force Intel to fake the result of cpuid in 64 bit mode,
too.

hack

Feb 10, 2004, 12:09:10 PM
In article <qqqmf1-...@miriam.mikron.de>,
Bernd Paysan <bernd....@gmx.de> wrote:

>IMHO, it's ok to check for features (like SSE2), and stop if the used
>features are not available, and it's perhaps ok to print a warning if the
>program runs on a CPU it's not optimized for, i.e. if you say -QxP,
>anything that's not a Prescott should trigger that warning. It's not ok to
>check if it runs on a competing product, and refuse to work there. Not for
>someone who has a "monopoly" (>70% market share).

[The context of the original question was slightly different: an optimisation
that appeared to give the same substantial benefit on both Intel and AMD chips
for a certain benchmark, but was controlled by an Intel-only check.]

Suppose that Intel can prove (from its detailed knowledge of the internals
of its own processors) that the optimisation is valid in all cases, but that,
based only on the public ISA specs, certain cases might arise where it would
be invalid. In that case doing the optimisation when it *might* fail would
be wrong. So it didn't fail in this case with a non-Intel processor, but
that's not evidence that it could not produce the wrong result in another
case.

Whether it is ethical or legal to take such advantage of "insider" knowledge
is a different question. But should one concede this point, would you rather
have the flag speed up some code at the risk of producing the wrong result on
a non-Intel processor? And what should be the ethical and legal position on
THAT?

Michel.

Christoph Breitkopf

Feb 10, 2004, 12:25:00 PM
Bernd Paysan <bernd....@gmx.de> writes:

> And why is it so difficult to understand "fair play"? Why can't Intel just
> produce better chips so that their code runs faster on their own chips, and
> slower on competing chips without any dirty tricks?

Even ignoring fair play, it might be good business sense to check
features instead of GenuineIntel. After all, even AMD used the Intel
compiler for their SPEC submissions, and for lots of code, it
is still the best optimizing compiler for the Athlon. Checking
for a GenuineIntel CPU devalues the compiler for people using,
or developing for, AMD systems.

OTOH, making money on compiler sales is probably not of
any importance to intel.

Regards,
Chris

CorpZ

Feb 10, 2004, 1:05:33 PM
If that was the only change to the code to make it optimized for Intel,
why wouldn't it work? All AMD 64-bit CPUs can use SSE2 (the Athlon XPs
could only use SSE).

Jason Watkins

Feb 10, 2004, 1:25:04 PM
Just how optimized is -QxW? Is it "generic x86" as in 386-compatible,
or is it "generic 686"?

While I think the cpuid check is not a good thing, or at the least
should be controlled by yet another compiler switch, these results
don't necessarily mean Intel is just purely cheating. You may not be
seeing all the optimizations the Intel-specific mode enables in action,
and your 22% may be purely 386 vs 686 code differences. I suppose
it's also possible that the Intel-specific mode does have some
optimization that causes potential problems on hardware besides the
CPUIDs they check for.

Zak

Feb 10, 2004, 1:27:20 PM
Christoph Breitkopf wrote:

> Even ignoring fair play, it might be good business sense to check
> features instead of GenuineIntel. After all, even AMD used the Intel
> compiler for their SPEC submissions, and for lots of code, it
> is still the best optimizing compiler for the Athlon. Checking
> for a GenuineIntel CPU devalues the compiler for people using,
> or developing for, AMD systems.
>
> OTOH, making money on compiler sales is probably not of
> any importance to intel.

But this check prevents AMD from using the optimization flags in SPEC
and similar benchmarks. Which may be all that matters here.

Or would it be allowed for AMD to come up with 'AMD CompilerShell' as a
compiler, which calls icc and does the required patching afterwards?


Thomas

Rupert Pigott

Feb 10, 2004, 1:31:13 PM
"hack" <ha...@watson.ibm.com> wrote in message
news:c0b37m$gr4$1...@news.btv.ibm.com...

> In article <qqqmf1-...@miriam.mikron.de>,
> Bernd Paysan <bernd....@gmx.de> wrote:
>
> >IMHO, it's ok to check for features (like SSE2), and stop if the used
> >features are not available, and it's perhaps ok to print a warning if the
> >program runs on a CPU it's not optimized for, i.e. if you say -QxP,
> >anything that's not a Prescott should trigger that warning. It's not ok to
> >check if it runs on a competing product, and refuse to work there. Not for
> >someone who has a "monopoly" (>70% market share).
>
> [The context of the original question was slightly different: an optimisation
> that appeared to give the same substantial benefit on both Intel and AMD chips
> for a certain benchmark, but was controlled by an Intel-only check.]
>
> Suppose that Intel can prove (from its detailed knowledge of the internals
> of its own processors) that the optimisation is valid in all cases, but that,
> based only on the public ISA specs, certain cases might arise where it would
> be invalid. In that case doing the optimisation when it *might* fail would

They could just say "These options may cause code to fail on
non-Intel(r)(tm) processors" in the blurb. Hell, even have the
compiler issue a warning to that effect perhaps. Silently
generating break-on-execute type code strikes me as *thoroughly*
broken regardless of the moral aspects.

Let's say that you don't test on all of the possible variations
of x86 out there (highly likely), and you get a call from a
user of your code saying "It won't run [because of the silent
code insertion]" ... I think I'd be *extremely* pissed off by
that kind of call, and it could be a bastard to fix as well, even
if you do just ditch ICC and ship a binary compiled by a compiler
that doesn't pull stunts like that. This relates to the code-path
thing Nick has with IA-64.

It's exactly this kind of market protection/ass covering that
drives people towards Open Source and in my view makes it a
*necessity* for applications you really care about.

Cheers,
Rupert


Stephen Sprunk

Feb 10, 2004, 1:32:52 PM
"hack" <ha...@watson.ibm.com> wrote in message
news:c0b37m$gr4$1...@news.btv.ibm.com...
> Suppose that Intel can prove (from its detailed knowledge of the internals
> of its own processors) that the optimisation is valid in all cases, but that,
> based only on the public ISA specs, certain cases might arise where it would
> be invalid. In that case doing the optimisation when it *might* fail would
> be wrong. So it didn't fail in this case with a non-Intel processor, but
> that's not evidence that it could not produce the wrong result in another
> case.
>
> Whether it is ethical or legal to take such advantage of "insider" knowledge
> is a different question. But should one concede this point, would you rather
> have the flag speed up some code at the risk of producing the wrong result on
> a non-Intel processor? And what should be the ethical and legal position on
> THAT?

If there are ambiguities in the SSE2 spec that disallow certain
optimizations on legal implementations (and nobody has shown or even claimed
this is the case), the ethical thing to do is revise the extension
definition and create a new CPUID flag for compliant implementations.
Simply assuming that no other vendor can implement SSE2 with the same
guarantees as Intel is downright sleazy and smacks of marketing involvement
rather than technical reasons.

At a minimum, there should be a flag to at least _allow_ -QxN code to run on
non-Intel chips so that software vendors can test other processors and make
the decision themselves. The ideal solution is for Intel to add flags to
optimize for non-Intel chips, or at least allow -QxN to work on non-Intel
chips they _have_ validated (if there's a true technical problem), but I
think it's safe to count that out in the near future.

I believe Intel's compiler folks truly want to produce the best compiler
possible for _all_ x86 chips because that's what would get their particular
division the most revenue and acclaim. If gcc's performance exceeded icc's
on non-Intel chips by using the optimizations in question, I think we'd find
icc suddenly allowing the optimization on non-Intel chips as well. However,
since the FSF places a higher priority on gcc's freedom and portability than
on raw performance, I don't know if/when that day may come.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin


Robert Klute

Feb 10, 2004, 1:44:38 PM
On 10 Feb 2004 10:25:04 -0800, jason_...@pobox.com (Jason Watkins)
wrote:

>Just how optimized is -QxW? Is it "generic x86" as in 386 compatable,
>or is "generic 686"?

One question to ask is if the compiler automatically inserts the check
when -QxW is used, or only when 'Intel'-specific optimizations are
inserted.

Hank Oredson

Feb 10, 2004, 2:01:27 PM

"Zak" <sp...@jutezak.invalid> wrote in message
news:co9Wb.3678$O41.96116@amstwist00...

> Christoph Breitkopf wrote:
>
> > Even ignoring fair play, it might be good business sense to check
> > features instead of GenuineIntel. After all, even AMD used the Intel
> > compiler for their SPEC submissions, and for lots of code, it
> > is still the best optimizing compiler for the Athlon. Checking
> > for a GenuineIntel CPU devalues the compiler for people using,
> > or developing for, AMD systems.
> >
> > OTOH, making money on compiler sales is probably not of
> > any importance to intel.
>
> But this check prevents AMD from using the optimization flags in SPEC
> and similar benchmarks. Which may be all what matters here.

Huh what?
AMD is free to use any compiler they choose.
If they choose a compiler created by a competitor, that is their lookout.

> Or would it be allowed for AMD to come up with 'AMD CompilerShell' as a
> compiler, which calls icc and does the required patching afterwards?

Perhaps this is simply too obvious?
There is nothing I can think of that stops AMD from creating
(or paying some other company to create) their own compiler.

--

... Hank

Hank: http://horedson.home.att.net
W0RLI: http://w0rli.home.att.net


Agrabob

Feb 10, 2004, 3:45:36 PM
Bernd Paysan <bernd....@gmx.de> wrote in message news:<qqqmf1-...@miriam.mikron.de>...
> Jeff wrote:
>
> > I will be the first person to admit that Intel is evil, I have spent a
> > year co-oping with them, and I know first hand how things are done
> > there. While this may seem somewhat sleezy, that is only half of it.
> > The other side of Intel is the side that likes everything to be
> > perfect. Odds are, a major reason for the Intel only part is that
> > Intel does not want to put their reputation on the line that code will
> > run better on an AMD chip that has not yet been released. Intel tests
> > everything, over and over again, and if something doesn't work right,
> > they fix it before they release it. Intel doesn't have that control
> > over AMD processors, and one of the optimizations might not work on an
> > AMD, which would make Intel look bad. Keep in mind, Intel isn't
> > likely to pass up a chance to make themselves look better than AMD,
> > but Intel also likes to ensure that their products work as well as
> > possible, especially after some of the times that they have been
> > burned.
>


Just looked over this:
http://www.intel.com/software/products/compilers/cwin/sysreq.htm

Here's a snippet:
"
Minimum Hardware Requirements to Develop IA-32 Applications
A system based on a 450 MHz Intel® Pentium® II processor or greater,
Intel Pentium 4 recommended

...

Minimum Hardware Requirements to Develop Itanium®-based Applications
on an IA-32 System
A system with a 450 MHz Intel® Pentium® II processor or greater
(Pentium 4 recommended)

...

Minimum Hardware Requirements to Develop Itanium-based Applications on
an Itanium-based System
A system with an Intel Itanium processor or greater (Itanium 2
recommended)
"
(end of quote)

Obviously I did omit parts to shorten the message (sorry about the
length), but it is also obvious that Intel makes no mention that this
product will work on an AMD proc, and it explicitly requires that you
have an Intel CPU that falls under one of these three categories.

Maybe, I am missing something as far as legality goes, but it would
seem to me that if Intel doesn't claim that an AMD cpu will work with
the compiler, they have nothing to worry about.

They are most likely doing one or both of two things here:
Making sure no "Intel Optimized" code will run on an AMD CPU for:
1) Compatibility. If a developer released an app that contained Intel-
optimized code, but it fails to run correctly on an AMD CPU and they have
sold millions of copies of it, they're screwed. And they are going to
blame Intel. If you have to "hack" the binary (with the iccOut utility)
to get the code to work, then Intel doesn't have to take any blame when
it blows up on you (if it ever will).
2) Competition. If they can get away with not compiling SIMD
instructions for competing CPUs, then all commercial apps with Intel
optimizations will make Intel CPUs more appealing to the consumer.

Someone analyze my thinking on this. On the surface it seems like I
have caught all the legal angles, but I feel I am missing something(im
not a lawyer people :P).

iccOut

Feb 10, 2004, 6:00:11 PM
@ hack:

I think that the particular optimization they are doing, at least for
this benchmark, does not involve any special trickery with the way
instructions work. It doesn't even rely on MMX/SSE/SSE2/SSE3; this
particular mcf optimization appears to be solely re-arranging fields in
a struct, which is clearly not Intel-specific, and any processor (Intel
or AMD) should be able to take advantage of it. It wouldn't surprise
me to learn of more cases like this one where it appears Intel is
trying to handicap AMD's performance on SPEC. It's possible that there
are programs that, when compiled with the -QxN flag, will generate
code that will not work on AMD processors, but I've yet to encounter
one.
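
To give an idea of the kind of struct transformation I mean, here is a
made-up example (my own illustration, not the actual 181.mcf structures):
if the hot loop only touches a few fields, grouping them next to each
other means they tend to land in the same cache line.

/* Hypothetical before/after layouts -- not the real mcf code. */
struct arc_before {
    long  cost;          /* hot */
    char  other1[40];    /* rarely-touched bookkeeping */
    void *head;          /* hot */
    char  other2[40];    /* rarely-touched bookkeeping */
    void *tail;          /* hot */
};

struct arc_after {
    long  cost;          /* hot fields grouped together ... */
    void *head;
    void *tail;
    char  other1[40];    /* ... cold fields pushed to the end */
    char  other2[40];
};

Nothing about that layout change depends on which vendor's CPU executes
the loads.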

@ igor levicki:

Patching the routine that does the check is another alternative.
However, there is only ever one call to proc_init_N and thus only one
call to patch anyway. Simply removing the call is easier than going
through the routine that does checking since x86 lets you have
instructions of crazy lengths and you have to be careful to keep all
the offsets and lengths the same.

@ Christoph Breitkopf:

You're correct, AMD does use the Intel compiler for SPEC submissions,
and while it does do a fair amount of optimization, there are cases
such as this one where completely general optimizations will only
occur with the -QxN flag even though they're clearly not
Intel-specific.

@ Robert Klute:

The compiler will not insert these checks when compiled with -QxW.
However, it will also not perform anywhere near the same level of
optimization as -QxN. Running the -QxW binaries vs the -QxN binaries
on an AMD machine shows a 22% performance difference, which is not
insignificant.

Benjamin Goldsteen

Feb 10, 2004, 8:33:38 PM
"Stephen Sprunk" <ste...@sprunk.org> wrote in message

> At a minimum, there should be a flag to at least _allow_ -QxN code to run on
> non-Intel chips so that software vendors can test other processors and make
> the decision themselves. The ideal solution is for Intel to add flags to
> optimize for non-Intel chips, or at least allow -QxN to work on non-Intel
> chips they _have_ validated (if there's a true technical problem), but I
> think it's safe to count that out in the near future.

Intel gives away their compiler for free to certain populations (e.g.
.edu). Why should Intel spend money to develop software that will be
given away to people who plan to use the software on non-Intel
processors?

Isn't this the same as the GPL license prohibiting the use of GPL'd
software components in non-GPL licensed software? GPL people are
always concerned that someone will make use of their IP for profit
without giving anything back to the GPL community. Similarly, Intel
doesn't care much for its IP being used by a competitor.

I don't think it is out-of-line for Intel to make their compiler a)
only compile on Intel processors and b) generate executables that only
run on Intel (or IA32-licensed) processors at high optimization. It
would be difficult to depend on the compiler if the generated code
didn't run on non-Intel processors at any optimization. One could
never use it to distribute binaries. Or maybe Intel could charge $500
for the Intel-only compiler and $1500 for the any-PC compiler.
Whether or not such restrictions are part of a good long-term strategy
is a different question. However, if you don't like it, you can
always use GNU C or the Portland Group compilers.

P.S.This e-mail address is not active. Do not reply directly to
sender.

Stephen Sprunk

Feb 10, 2004, 8:26:30 PM
"Agrabob" <mtl...@sbcglobal.net> wrote in message
news:d53ddc33.04021...@posting.google.com...

> Maybe, I am missing something as far as legality goes, but it would
> seem to me that if Intel doesn't claim that an AMD cpu will work with
> the compiler, they have nothing to worry about.
> ...

> Someone analyze my thinking on this. On the surface it seems like I
> have caught all the legal angles, but I feel I am missing something(im
> not a lawyer people :P).

Well, I'm not a lawyer either, but I can't see anything _illegal_ about
Intel's behavior here. Excluding unlikely (IMHO) anti-trust considerations,
Intel is free to sell whatever products they want with whatever features
they want, and it's industry practice to disclaim that the product will do
even what they claim it will do.

The main complaint, from me and others, is that it's unethical and/or sleazy
(but quite legal) to disable valid SSE2 optimizations simply because the
generated code happens to be running on a competitor's CPU. Does icc
generate MMX and SSE1 code that runs on non-Intel CPUs? If so, their
behavior is not only sleazy, it's not even self-consistent. If not, we're a
few years late in flaming them -- but it's still sleazy.

Stephen Sprunk

Feb 10, 2004, 8:58:12 PM
"Hank Oredson" <hore...@att.net> wrote in message
news:bU9Wb.3322$hR.1...@bgtnsc05-news.ops.worldnet.att.net...

> "Zak" <sp...@jutezak.invalid> wrote in message
> news:co9Wb.3678$O41.96116@amstwist00...
> > Or would it be allowed for AMD to come up with 'AMD CompilerShell'
> > as a compiler, which calls icc and does the required patching afterwards?
>
> Perhaps this is simply too obvious?

If it's not standard practice yet, it probably will be soon. If I can patch
an opcode or two in my binaries and get a 22% speed bump, what reasons do I
have _not_ to do it? In fact, it might even be worth the larger hassle of
patching icc to not emit the check in the first place. It's not like
software warranties are worth the bits they're printed on...

> There is nothing I can think of that stops AMD from creating
> (or paying some other company to create) their own compiler.

AMD funds a lot of work on gcc and the Portland Group's compiler, but
neither of those is competitive performance-wise with icc at this point or
AMD would be using one of them for SPEC. IIRC, Intel funds work on gcc
also, even though icc (almost?) always produces better code.

Stephen Sprunk

Feb 10, 2004, 8:58:13 PM
"iccOut" <iccou...@yahoo.com> wrote in message
news:a13e403a.04021...@posting.google.com...

> I think that the particular optimization they are doing, at least for
> this benchmark, does not involve any special trickery with the way
> instructions work. It doesn't even rely on MMX/SSE/SSE2/SSE3, this
> particular mcf optimization appears to be soley re-arranging fields in
> a struct, which is clearly not intel-specific and any processor (intel
> or amd) should be able to take advantage of this.

Well, aside from the obvious fact that icc -QxN is breaking the C spec by
rearranging the contents of a struct, it sounds like we should be yelling at
SPEC to improve their source code instead of berating Intel for sleazy
behavior.

> Patching the routine that does the check is another alternative.
> However, there is only ever one call to proc_init_N and thus only one
> call to patch anyway. Simply removing the call is easier than going
> through the routine that does checking since x86 lets you have
> instructions of crazy lengths and you have to be careful to keep all
> the offsets and lengths the same.

Couldn't you just alter the entry point of proc_init_N to clean up its stack
and return immediately? The remainder of the function would be dead code so
you don't have to worry about maintaining proper x86 decoding.
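
Something like that would be a one-byte patch. A minimal sketch, assuming
you have already taken the routine's file offset from a disassembly (the
offset below is a placeholder, not the real one) and that the routine uses
a caller-cleans-stack convention so a bare RET is enough:

/* Sketch only: overwrite the first byte of the check routine with
 * RET (0xC3) so it returns immediately. PROC_INIT_OFFSET is a
 * hypothetical file offset; a stdcall-style routine would need
 * RET imm16 instead of a bare RET. */
#include <stdio.h>

#define PROC_INIT_OFFSET 0x12345L   /* placeholder offset */

int patchEntryPoint( const char *path )
{
    FILE *f = fopen( path, "r+b" );
    unsigned char retOpcode = 0xC3;

    if ( !f )
        return -1;
    if ( fseek( f, PROC_INIT_OFFSET, SEEK_SET ) != 0 ||
         fwrite( &retOpcode, 1, 1, f ) != 1 ) {
        fclose( f );
        return -1;
    }
    fclose( f );
    return 0;
}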

Ivan

Feb 10, 2004, 9:09:31 PM
I have played with the Intel compiler 8.0 and AMD CPUs (just XP and MP,
not FX-51) on Linux myself. Because the AMD XP/MP does not support SSE2,
I used the -xK switch (optimize for PIII or later), and the executables
run typically 10-15% faster. On my P4 there is absolutely no difference
between -xK and -xN.

Unfortunately, one sometimes gets segmentation faults in vectorized
loops that contain calls to log => one has to prevent vectorization of
such loops if the code is to be able to run on Athlons. In addition,
Fortran I/O sometimes fails (especially when rewinding open files) =>
one has to compile such functions separately without the -xK switch.

It is interesting that, once the above problems are solved, the Intel
compiler optimizes better for the Athlons than for a P4 (well, at least
when compared to the GNU compiler). In one of my applications, the code
runs about 10% faster on an XP 1800+ compared to a 2 GHz P4 when
compiled with GCC, but 25% faster when compiled with Intel 8.0! On an
absolute scale the Intel executables are ~15% faster than GCC 3.3.1,
but only 3-4% faster when compared to GCC 3.4.

Seongbae Park

Feb 11, 2004, 3:48:05 AM
Stephen Sprunk wrote:
...

> Well, aside from the obvious fact that icc -QxN is breaking the C spec by
> rearranging the contents of a struct,

Simply rearranging struct fields doesn't violate C standard.
As long as the user code can not tell the difference,
it's standard conforming.
I don't know whether Intel's doing it correctly or not
in this particular case though.

> it sounds like we should be yelling at
> SPEC to improve their source code instead of berating Intel for sleazy
> behavior.

I wonder what your rationale for this yelling is.
How many C/C++ programmers do you know who pay
any attention to the order of struct/class fields
for the purpose of improving performance?
I don't know any, except a few compiler writers and performance analysts.

Also, rearranging struct fields often reduces the readability of the code,
and the optimal arrangement is often dependent
on particular machine features such as cache line size.
So doing it manually is not always desirable.

If the compiler can do it properly, I'd say it's a good thing.
Of course, it's not trivial to do properly.

Seongbae

Ketil Malde

Feb 11, 2004, 4:18:02 AM
b...@inka.mssm.edu (Benjamin Goldsteen) writes:

> Intel gives away their compiler for free to certain populations (e.g.
> .edu). Why should Intel spend money to develop software that will be
> given away to people who plan to use the software on non-Intel
> processors?

I've no opinion on why they give it away. But to me it looks like
Intel has the best compiler out there, but they would really like to
have the fastest processor instead. So they try to hamper the use of
their compiler on competing processors to make it harder to take
full advantage of them.

> Isn't this the same as the GPL license prohibiting the use of GPL'd
> software components in non-GPL licensed software?

I don't think so. You can run GPL software together with proprietary
software. You have something similar in the Linux kernel, where
loading a proprietary (binary only) module will "taint" the kernel.
However, while it will print a warning and possibly make it harder to
get support, it won't stop you from running it.

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants

Jan C. Vorbrüggen

Feb 11, 2004, 5:28:26 AM
> Or would it be allowed for AMD to come up with 'AMD CompilerShell' as a
> compiler, which calls icc and does the required patching afterwards?

If I remember the run rules correctly, you are allowed to cross-compile
- i.e., run the build phase on an Intel system and run the benchmark on
an AMD system.

Jan

Jan C. Vorbrüggen

Feb 11, 2004, 5:31:06 AM
> Also, rearranging struct fields often reduces the readability of the code
> and the optimal arrangement is often dependent
> on the particular machine features such as cache line size.

I can't remember a system where following the general rule "Start with
the largest (in bytes/words) items and end with the smallest ones" wouldn't
fit well. Exceptions would be compilers that do not properly align the
struct starting address - but those are broken (at least in the sense of
quality-of-implementation) anyway.
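
A quick illustration of the rule, assuming an ABI that aligns double to
8 bytes (as MSVC does on x86, and as most 64-bit ABIs do) -- the exact
numbers are mine, not from any particular compiler:

/* Smallest-first wastes space on padding; largest-first packs tightly. */
struct bad {              /* sizeof == 24 with 8-byte-aligned doubles */
    char   flag;          /* 1 byte + 7 bytes padding                 */
    double value;         /* 8 bytes                                  */
    short  count;         /* 2 bytes + 6 bytes tail padding           */
};

struct good {             /* sizeof == 16 */
    double value;         /* 8 bytes                                  */
    short  count;         /* 2 bytes                                  */
    char   flag;          /* 1 byte + 5 bytes tail padding            */
};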

Jan

Benny Amorsen

Feb 11, 2004, 6:17:58 AM
>>>>> "BG" == Benjamin Goldsteen <b...@inka.mssm.edu> writes:

BG> Isn't this the same as the GPL license prohibiting the use of
BG> GPL'd software components in non-GPL licensed software?

There is no such prohibition. You cannot distribute the combined work
of course, but it is perfectly legal to /use/ GPL'd software
components any way you want.


/Benny

Ken Hagan

Feb 11, 2004, 6:24:22 AM
Jan C. Vorbrüggen wrote:
>
> I can't remember a system where following the general rule "Start
> with the largest (in bytes/words) items and end with the smallest
> ones" wouldn't fit well.

Different systems may typedef things to different sizes.

Still, more of a problem in practice is the fact that programmers
don't normally trawl through headers mentally expanding typedefs.

'Tis a pity C has no way of saying "I don't care how this struct
is laid out. I'm not going to bang on it using memcpy().".


Ricardo Bugalho

Feb 11, 2004, 7:07:35 AM

Depends on your definition of component.
But if a given piece of software depends on GPL'd software to run (for
example, a library), then that software must also be GPL. That's why lots
of libraries have LGPL or BSD licences.

--
Ricardo

Bernd Paysan

Feb 11, 2004, 8:14:21 AM
Stephen Sprunk wrote:
> AMD funds a lot of work on gcc and the Portland Group's compiler, but
> neither of those is competitive performance-wise with icc at this point or
> AMD would be using one of them for SPEC.

The Portland Group Fortran compiler is competitive to icc. According to c't
3/2004, SPECfp2000 base under Linux is 1169 with PGI 5.1/GCC 3.3.2, and
1159 with icc V8 on an Athlon 64 3400+. GCC is faster than PGI on C, so
that's used there. Looking at the SPECint2000 base values also shows almost
competitive performance from GCC 3.3.2: 1164 vs. 1240 from icc; GCC takes
advantage of the 64 bit mode. Note that on Pentium 4 3.2GHz, icc runs
significantly faster (more than 10%): 1127 vs. 976 from GCC. c't used a
patched icc (no cpuid="GenuineIntel" check). What I don't understand is the
big difference between Windows SPECint2000 base (1406 with icc) and the
Linux result (on the Pentium 4, the difference is less).

Jan C. Vorbrüggen

Feb 11, 2004, 10:44:30 AM
> Different systems may typedef things to different sizes.

Sure - but are the _relative_ sizes likely to change? That is, do you have
a sequence that on one system yields, say, 8-4-4-2-2 bytes but results
on another system in 4-8-2-4-2 bytes?

Jan

Terje Mathisen

Feb 11, 2004, 11:32:30 AM

I've seen a lot of systems like that:

All x86 machines, all the way back to the original 8088, handle
structure offsets within +127 bytes of the base pointer better than
stuff that's further away.
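
For example (32-bit code; the byte counts are my own reading of the
opcode map, shown only to illustrate the point):

/* A field within the first 127 bytes of the base pointer can be
 * addressed with a 1-byte displacement; one beyond that needs a
 * 4-byte displacement, costing 3 extra code bytes per access.
 *
 *   mov eax, [ebx+8]     ; 8B 43 08            (3 bytes, disp8)
 *   mov eax, [ebx+200]   ; 8B 83 C8 00 00 00   (6 bytes, disp32)
 */
struct example {
    int  hot;             /* offset 0   -> disp8 encoding  */
    char filler[196];
    int  cold;            /* offset 200 -> disp32 encoding */
};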

If some parts of the structure are likely to be accessed together, then
they should also be aligned in such a way that they are likely to end up
in the same cache line, even if that could lead to an occasional
non-optimal (size-wise) packing.

However, given that the two items above don't matter, using your
'largest first' rule does work pretty well.

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Zak

Feb 11, 2004, 1:05:12 PM
Ricardo Bugalho wrote:

> Depends on your definition of component.
> But if a given software depends on a GPL'd software to run (for example,
> a library) then that software must also be GPL. That's why lots of
> libraries have LGPL or BSD licences.

Only when you want to distribute 'given software' with the GPL'd
software, and ISTR you can even distribute them together as a single
working setup if the parts are separate components that can in theory be
exchanged. Shared libraries do not count as such for GPL, but things
that run in pipes or use tmp files can.

This for example allows someone to distribute Linux with closed source
software.


Thomas

Seongbae Park

Feb 11, 2004, 12:52:52 PM
In article <402A046A...@mediasec.de>, Jan C Vorbrüggen wrote:
>> Also, rearranging struct fields often reduces the readability of the code
>> and the optimal arrangement is often dependent
>> on the particular machine features such as cache line size.

"optimal" here meant for performance, not space.

> I can't remember a system where following the general rule "Start with
> the largest (in bytes/words) items and end with the smallest ones" wouldn't
> fit well. Exceptions would be compilers that do not properly align the
> struct starting address - but those are broken (at least in the sense of
> quality-of-implementation) anyway.
>
> Jan

That rule would work fine for space, and to some degree for performance,
but it leaves too much on the table, especially in large structs with many
pointers (which are all the same size, as we all know) and even in
relatively small structs. Having the frequently accessed pointers of a
struct in the same L1 cache line can make a huge difference.

And my guess is Intel's mcf optimization is mainly to
improve the cache hit rate - we (Sun) already published a paper about this:

www.sc-conference.org/sc2003/paperpdfs/pap182.pdf

Seongbae

Sheldon Simms

Feb 11, 2004, 1:50:46 PM
On Wed, 11 Feb 2004 08:48:05 +0000, Seongbae Park wrote:

> Stephen Sprunk wrote:
> ...
>> Well, aside from the obvious fact that icc -QxN is breaking the C spec by
>> rearranging the contents of a struct,
>
> Simply rearranging struct fields doesn't violate C standard.
> As long as the user code can not tell the difference,
> it's standard conforming.
> I don't know whether Intel's doing it correctly or not
> in this particular case though.

C99 6.7.2.1:

13 Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared.


Seongbae Park

Feb 11, 2004, 1:58:27 PM

Yes. But if there's no user code that takes the address of a struct field
or relies on such, this is irrelevant.

Seongbae

Rupert Pigott

Feb 11, 2004, 2:46:39 PM
"Seongbae Park" <Seongb...@sun.com> wrote in message
news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...

However, it would "violate C standard" regardless. How
about folks who poke around in core dumps? I imagine they
might notice...


Cheers,
Rupert


David Gay

Feb 11, 2004, 3:03:55 PM

Probably not, except if there's some piece of the standard that states
"addresses must be machine-level addresses" (in other words, an address
which is not observed is effectively undefined as you haven't had to define
its mapping to real addresses).

> How about folks who poke around core dumps ? I imagine they might
> notice...

Indeed. I'd expect problems with separately-compiled programs and writing
structs to files/sockets, too (so taking the address of a struct foo, using
struct foo *, etc should disable any field-reordering optimisation for
struct foo).
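
A contrived example of the file/socket case (my own illustration):

/* If one module -- or an older build that wrote the file -- laid the
 * struct out differently than the module reading it back, this pair
 * would silently scramble the fields, since the raw layout goes
 * straight to disk. */
#include <stdio.h>

struct rec {
    int  id;
    char name[16];
};

int saveRec( const char *path, const struct rec *r )
{
    FILE *f = fopen( path, "wb" );
    if ( !f )
        return -1;
    fwrite( r, sizeof *r, 1, f );
    fclose( f );
    return 0;
}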

--
David Gay, not speaking for Intel
dg...@acm.org

Benjamin Goldsteen

Feb 11, 2004, 4:00:41 PM
Ketil Malde <ke...@ii.uib.no> wrote in message news:<egptcmt...@havengel.ii.uib.no>...

> b...@inka.mssm.edu (Benjamin Goldsteen) writes:
>
> > Intel gives away their compiler for free to certain populations (e.g.
> > .edu). Why should Intel spend money to develop software that will be
> > given away to people who plan to use the software on non-Intel
> > processors?
>
> I've no opinion on why they give it away. But to me it looks like
> Intel has the best compiler out there, but they would really like to
> have the fastest processor instead. So they try to hamper the use of
> their compiler on competing processors to make it harder to take
> full advantage of them.

In the modern world, processors and compilers are built around each
other. It is meaningless to say that I have the fastest processor but
there is no compiler to take advantage of it. If AMD doesn't have a
compiler that demonstrates that their processor is the fastest, then
they don't have the fastest processor.

I don't think Intel is under any obligation to provide their compilers
for non-Intel processors. I think it is fair to say that Intel didn't
develop their compiler because they wanted to sell compilers. They
developed their compiler because they wanted to show off their
processors.

In the HPC world, we test Sun with Sun Forte (Sun One?) compilers
against IBM with IBM XL compilers against SGI with MIPSpro compilers
against ... It's not important that Sun would do better if only they
had IBM's compiler technology. In order to compete in this world,
Intel needed a compiler that demonstrated the performance of their
processor. Saying that their processor was the fastest if only there
was a good compiler for it wasn't good enough. Similarly, AMD can't
claim their processor is fastest unless they have a compiler for it.

AMD either needs to a) license Intel's compiler, b) develop their own
compiler, c) improve GNU's optimization, or d) depend on a third party
like the Portland Group. Depending on a non-cooperative competitor
doesn't cut it.

> > Isn't this the same as the GPL license prohibiting the use of GPL'd
> > software components in non-GPL licensed software?
>
> I don't think so. You can run GPL software together with proprietary
> software. You have something similar in the Linux kernel, where
> loading a proprietary (binary only) module will "taint" the kernel.
> However, while it will print a warning and possibly make it harder to
> get support, it won't stop you from running it.


I think you missed my point. It's about IP. GPL people are sensitive
to their IP being used for non-GPL projects. Intel doesn't want their
IP being used for non-Intel projects. A BSD-type license says "we
want to get the technology out there and we don't care if someone
else makes the money on it". People who use GPL-type licenses care
about how their IP is used. Their restrictions are different but they
still care. So does Intel. I'm glad everyone has that choice.

Stephen Sprunk

Feb 11, 2004, 4:00:37 PM
"Seongbae Park" <Seongb...@sun.com> wrote in message
news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...

As the saying goes, if a tree falls in the forest and nobody is around to
hear it, does it really make a noise?

The C spec (thanks for the citation, Sheldon) says you must lay out struct
members in the order they're declared. If Intel's compiler doesn't do that,
it's not compliant, period.

Unless icc is incredibly smart about detecting you taking the address of a
struct, using it in a union, aliasing it via questionable pointer math,
passing it as a C++ reference argument, etc. there's a chance it can
generate incorrect code. The C standard has enough ambiguities that cause
portability problems; we don't need people intentionally introducing new
problems with the unambiguous parts.
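
For the record, the sort of thing a reordering compiler would have to
detect and back off from (my own examples, not a description of what icc
actually checks):

/* Each of these makes the field layout observable to the program. */
#include <stddef.h>
#include <string.h>

struct s { char a; int b; };

size_t observeLayout( struct s *p )
{
    unsigned char raw[sizeof(struct s)];

    memcpy( raw, p, sizeof *p );        /* views the raw bytes        */
    return offsetof( struct s, b )      /* depends on the layout      */
         + (size_t)( (char *)&p->b - (char *)p );  /* pointer math    */
}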

John F. Carr

Feb 11, 2004, 4:23:13 PM
In article <a13e403a.04020...@posting.google.com>,
iccOut <iccou...@yahoo.com> wrote:
>I started mucking around with a dissassembly of the Intel-specific
>binary and found one particular call (proc_init_N) that appeared to be
>performing this check. As far as I can tell, this call is supposed to
>verify that the CPU supports SSE and SSE2 and it checks the CPUID to
>ensure that its an Intel processor.

I remember the discussion when Intel announced CPUID and implicitly
announced the ability to make programs not run on clone x86 processors.
I still like the solution I posted back then:

<http://groups.google.com/groups?threadm=1pdqj0INN2h1%40senator-bedfellow.MIT.EDU>


--
John Carr (j...@mit.edu)

Peter Boyle

Feb 11, 2004, 4:47:34 PM

On Wed, 11 Feb 2004, John F. Carr wrote:

> I remember the discussion when Intel announced CPUID and implicitly
> announced the ability to make programs not run on clone x86 processors.
> I still like the solution I posted back then:
>
> <http://groups.google.com/groups?threadm=1pdqj0INN2h1%40senator-bedfellow.MIT.EDU>

Cute!
Peter

>
> --
> John Carr (j...@mit.edu)
>

Peter Boyle pbo...@physics.gla.ac.uk


Stephen Clarke

Feb 11, 2004, 5:14:56 PM
"Stephen Sprunk" <ste...@sprunk.org> wrote in message
news:6d6f33a965891137...@news.teranews.com...

> "Seongbae Park" <Seongb...@sun.com> wrote in message
> news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...
> > In article <pan.2004.02.11....@yahoo.com>, Sheldon Simms wrote:
> > > On Wed, 11 Feb 2004 08:48:05 +0000, Seongbae Park wrote:
> > >
> > >> Stephen Sprunk wrote:
> > >>> Well, aside from the obvious fact that icc -QxN is breaking the C
> > >>> spec by rearranging the contents of a struct,
> > >>
> > >> Simply rearranging struct fields doesn't violate C standard. As long
> > >> as the user code can not tell the difference, it's standard conforming.
> > >> I don't know whether Intel's doing it correctly or not in this
> > >> particular case though.
> > >
> > > C99 6.7.2.1:
> > >
> > > 13 Within a structure object, the non-bit-field members and the units in
> > > which bit-fields reside have addresses that increase in the order in
> > > which they are declared.
> >
> > Yes. But if there's no user code that takes address of struct field
> > or relies on such, this is irrelavant.
>
> As the saying goes, if a tree falls in the forest and nobody is around to
> hear it, does it really make a noise?
>
> The C spec (thanks for the citation, Sheldon) says you must lay out struct
> members in the order they're declared. If Intel's compiler doesn't do that,
> it's not compliant, period.

The standard describes the behaviour on an abstract machine (C99, 5.1.2.3).
The compiler has to map this behaviour onto the real machine in
such a way that the behaviour observed by the program matches
that abstract machine behaviour. If the program never observes
the relative offsets of two fields in a struct, then there's no need
to preserve that behaviour on the real machine.

If you require 6.7.2.1 to hold for the real machine, then
any compiler that holds structs in registers, or passes struct-valued
arguments in registers would also be non-compliant, because registers
don't have addresses at all.
That would be a significant number of compilers ...
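
To make the "observed vs. unobserved" distinction concrete, here is a minimal C sketch (hypothetical code, not from any post; struct pair and the function names are invented). In the first function the members are only ever accessed by name, so under the as-if rule the compiler may keep them in registers or lay them out in any order; in the second, comparing member addresses makes the declared order of C99 6.7.2.1p13 part of the program's visible behaviour.

    #include <stdio.h>

    struct pair { int a; int b; };   /* hypothetical example type */

    /* Layout unobservable: members accessed only by name, so the compiler
       may register-allocate or reorder them without anyone noticing. */
    static int sum_unobserved(int x, int y)
    {
        struct pair p;
        p.a = x;
        p.b = y;
        return p.a + p.b;
    }

    /* Layout observed: comparing member addresses exposes the declared
       order, so 6.7.2.1p13 really does constrain the implementation here. */
    static int observes_layout(void)
    {
        struct pair p;
        return (char *)&p.b > (char *)&p.a;   /* must be 1 */
    }

    int main(void)
    {
        printf("%d %d\n", sum_unobserved(3, 4), observes_layout());
        return 0;
    }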

Stephen.

Seongbae Park

unread,
Feb 11, 2004, 5:06:45 PM2/11/04
to
Stephen Sprunk wrote:
...

>> > C99 6.7.2.1:
>> >
>> > 13 Within a structure object, the non-bit-field members and the units in
>> > which bit-fields reside have addresses that increase in the order in
>> > which they are declared.
>>
>> Yes. But if there's no user code that takes address of struct field
>> or relies on such, this is irrelevant.
>
> As the saying goes, if a tree falls in the forest and nobody is around to
> hear it, does it really make a noise?
>
> The C spec (thanks for the citation, Sheldon) says you must lay out struct
> members in the order they're declared. If Intel's compiler doesn't do that,
> it's not compliant, period.

Let's play the standard game then.

C99 5.1.2.3:

5. The least requirements on a conforming implementation are:

At sequence points, volatile objects are stable in the sense that
previous accesses are complete and subsequent accesses have not
yet occurred.

At program termination, all data written into files shall be
identical to the result that execution of the program according to
the abstract semantics would have produced.

The input and output dynamics of interactive devices shall take
place as specified in 7.19.3. The intent of these requirements is
that unbuffered or line-buffered output appear as soon as
possible, to ensure that prompting messages actually appear prior
to a program waiting for input.

So, as long as reordering of struct fields satisfies all of the above,
it meets these "least requirements" of a conforming implementation.
And I don't see why it cannot be done.

> Unless icc is incredibly smart about detecting

A compiler doesn't have to be incredibly smart to do this.

> you taking the address of a struct,

Easy.

> using it in a union,

Easy.

> aliasing it via questionable pointer math,

If the code is standard compliant, a compiler can tell
when it cannot follow what's going on anymore and give up.
So no problem here.
Sure, you can write non-compliant code that breaks such analysis,
but the same applies to almost any other optimization.

> passing it as a C++ reference argument, etc.

No big deal. Same as address taken.

> there's a chance it can generate incorrect code.

Many compilers have been doing similar analysis already
- type-based aliasing, interprocedural alias analysis and
escape analysis all require detection of most of the above conditions.

An implementation can have a bug.
That doesn't mean it cannot be fixed. And in this case,
it is possible to do such a transformation correctly.

> The C standard has enough ambiguities that cause
> portability problems; we don't need people intentionally introducing new
> problems with the unambiguous parts.

Huh? What new problem? Reordering of struct fields, if properly done,
does not cause any portability problems or introduce any new ambiguities.

There are many optimizing compilers
that unbox a local struct/class *completely*.
Do they cause any portability problem ? No.
Are they compliant ? Yes, of course.

Seongbae

Disclaimer: All of my postings in this thread discuss
theoretical and practical issues in implementing "a" compiler,
not any specific compiler implementation.

Robert Wessel

unread,
Feb 11, 2004, 7:13:35 PM2/11/04
to
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10765288...@saucer.planet.gong>...

> "Seongbae Park" <Seongb...@sun.com> wrote in message
> news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...
> > >> Simply rearranging struct fields doesn't violate C standard.
> > >> As long as the user code can not tell the difference,
> > >> it's standard conforming.
> > >> I don't know whether Intel's doing it correctly or not
> > >> in this particular case though.
> > >
> > > C99 6.7.2.1:
> > >
> > > 13 Within a structure object, the non-bit-field members and the units in
> > > which bit-fields reside have addresses that increase in the order in
> > > which they are declared.
> >
> > Yes. But if there's no user code that takes address of struct field
> > or relies on such, this is irrelevant.
>
> However it would "violate C standard", regardless.


Incorrect. The section on program execution, 5.1.2.3 (since we're
quoting C99), pretty much defines what's known as the "as-if" rule.
So long as the application (or the defined part of the environment -
for example the contents of a file written to) can't tell the
difference, the C compiler may do anything it wants.


>How
> about folks who poke around core dumps ? I imagine they
> might notice...


We might, but nobody cares. ;-)

Rupert Pigott

unread,
Feb 11, 2004, 7:46:31 PM2/11/04
to
"Robert Wessel" <robert...@yahoo.com> wrote in message
news:bea2590e.04021...@posting.google.com...

Like a core dump for instance.


Cheers,
Rupert


Greg Lindahl

unread,
Feb 11, 2004, 9:01:33 PM2/11/04
to
In article <4db74fa6.04021...@posting.google.com>,
Benjamin Goldsteen <b...@inka.mssm.edu> wrote:

>In the modern world, processors and compilers are built around each
>other.

This is true for traditional systems companies. But AMD builds
processors that execute x86 code fast -- no compiler interaction,
because they are a minority player in the x86 space. Intel *probably*
builds future x86 cpus that execute existing apps fast, because
there's so much legacy code being executed.

Now that AMD has a CPU that's going places other than gamer desktops,
they are doing some different things than in the past -- they paid
SuSE to work on gcc, for example.

-- greg

Greg Lindahl

unread,
Feb 11, 2004, 9:04:53 PM2/11/04
to
In article <CPxWb.719$MQ1...@news-binary.blueyonder.co.uk>,
Stephen Clarke <stephen...@who.needs.spam.earthling.net> wrote:

>The standard describes the behaviour on an abstract machine (C99, 5.1.2.3).

In the HPC space, everyone uses flags that encourage the compiler to
break the rules to get better performance. You can go read up about the
SPEC flags people use; flags that break all the standards are normal.

-- greg

Ketil Malde

unread,
Feb 12, 2004, 3:18:02 AM2/12/04
to
b...@inka.mssm.edu (Benjamin Goldsteen) writes:

> In the modern world, processors and compilers are built around each
> other. It is meaningless to say that I have the fastest processor but
> there is no compiler to take advantage of it.

But of course, this isn't the case: there *is* a compiler that
(presumably) shows it is faster. Hobbled by its owner, in order to
avoid showing that.

> If AMD doesn't have a compiler that demonstrate that their processor
> is the fastest then they don't have the fastest processor.

You could take this one step further, and say that it is meaningless
to claim a fast CPU, unless there are applications making use of it.
Which eventually brings us back to the old maxim about benchmarking
the application you're interested in, instead of relying on artificial
"benchmark numbers".

> I think you missed my point. Its about IP. GPL people are sensitive
> to their IP being used for non-GPL projects. Intel doesn't want their
> IP being used for non-Intel projects.

One question is to what degree Intel should be able to decide this.
Do they get to decide on which CPUs I run executables compiled with
their compiler? What if I compile GCC with ICC, does the same apply
to the GCC executable? And can they limit my ability to publish
benchmarks I make?

(And BTW, GPL is fine in non-GPL projects, lots of people compile
non-GPL code with GCC, for instance. You just can't distribute modified
GPL code without source.)

Jan C. Vorbrüggen

unread,
Feb 12, 2004, 3:50:28 AM2/12/04
to
> That rule would work fine for space, and to some degree for performance
> but it leaves too much, especially in large structs with many pointers
> (which are all same size as we all know) and even in relatively small
> structs. Having frequently accessed pointers of a struct to be in the same
> L1 cache line can make a huge difference.

Sure, but if they're the same size, you can re-arrange them to your
heart's content - always, as the programmer, and legally as a compiler
if you can ensure the "as-if" rule is obeyed.

How large are L1 line sizes these days - 32-64 bytes, i.e., 8-16 pointers
on a 32-bit machine? You're saying people are using data structures with
fan-outs of that order? Bah.
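
As a hedged illustration of the quoted point about keeping the frequently accessed pointers in one L1 line (the 64-byte line size, the field names and the hot/cold split are assumptions for the sketch, not taken from the thread): the rearrangement a programmer - or, under the as-if rule, a sufficiently careful compiler - might apply is simply to group the hot fields at the front.

    /* Hot traversal fields grouped first so they tend to share one 64-byte
       cache line; rarely touched bookkeeping pushed onto later lines. */
    struct tree_node {
        /* hot: touched on every traversal step */
        struct tree_node *left, *right, *parent;
        long key;
        /* cold: touched only occasionally */
        char name[64];
        long stats[8];
    };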

Jan

Chris

unread,
Feb 12, 2004, 4:22:23 AM2/12/04
to
One other way to possibly bypass Intel's "protection":

Since Windows XP uses a CPU driver located at
c:\windows\system32\drivers\amdk7.sys or amdk6.sys, I think we might
be able to modify the AMD CPU's IDs by changing the "line" that
specifies the ID. Thus, people running an Athlon could see their
performance increase. So it would be rather simple... in theory.
Or why couldn't an AMD user use a P4 driver? That way people might
see benefits on their AMDs.
We could also create a program that emulates a P4 ID. I don't think
that's so complicated...

In fact, this "compiler trick" could be the reason why Intel gets
better performance in 3DSMax, media encoding, several games,
... However, I shouldn't jump to conclusions: it's only a supposition.

Robert Wessel

unread,
Feb 12, 2004, 5:02:36 AM2/12/04
to
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10765467...@saucer.planet.gong>...

> "Robert Wessel" <robert...@yahoo.com> wrote in message
> > Incorrect. The section on program execution, 5.1.2.3 (since we're
> > quoting C99), pretty much defines what's known as the "as-if" rule.
> > So long as the application (or the defined part of the environment -
> > for example the contents of a file written to) can't tell the
> > difference, the C compiler may do anything it wants.
>
> Like a core dump for instance.


Core dumps are quite outside the C standard. There's no requirement
that structures be laid out in any particular way in a core dump.

Nick Maclaren

unread,
Feb 12, 2004, 5:22:04 AM2/12/04
to
In article <c0e91l$o7q$1...@news1nwk.SFbay.Sun.COM>,

Seongbae Park <Seongb...@sun.com> wrote:
>Stephen Sprunk wrote:
>...
>>> > C99 6.7.2.1:
>>> >
>>> > 13 Within a structure object, the non-bit-field members and the units in
>>> > which bit-fields reside have addresses that increase in the order in
>>> > which they are declared.
>>>
>>> Yes. But if there's no user code that takes address of struct field
>>> or relies on such, this is irrelevant.
>>
>> As the saying goes, if a tree falls in the forest and nobody is around to
>> hear it, does it really make a noise?
>>
>> The C spec (thanks for the citation, Sheldon) says you must lay out struct
>> members in the order they're declared. If Intel's compiler doesn't do that,
>> it's not compliant, period.
>
>Let's play the standard game then.

You called?

>C99 5.1.2.3:
>
>5. The least requirements on a conforming implementation are:
>

>So, as long as reordering of struct fields satisfies all of the above,
>it meets this "least requirements" of a conforming implementation.
>And I don't see why it can not be done.

I won't give the quotes, because they are voluminous, but one of the
many aspects of the C standard that is hopelessly ambiguous, is used
in real code, and causes hell to real compilers is:

typedef struct {
    double d;
    int n;
} t;

t x;
void *p = (void *)&x;

*((int *)((char *)p + offsetof(t, n))) = 123;

Note that this doesn't deny your point, it merely means that it is
very, very hard to reorganise fields in C structures and maintain
support for that use.

I can tell you that the C standards committee have had several flame
wars about that, leading to no conclusion that I could discover. I
believe that the current position is that whether the above is
required to work by the standard or undefined behaviour is one of
the many interpretations that is deliberately left up to the reader.


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Feb 12, 2004, 5:26:27 AM2/12/04
to

Er, no. Some people do; others don't. In particular, there are a lot
of people who write cleanish Fortran and optimise up to the standard's
limits but not beyond. Our default compiler options are set up that
way, and very few users have trouble or bother to increase them.

If you were referring only to C, it is unclear what the standard allows,
and so almost ALL optimisation will 'break the rules' if you interpret
the standard one way. Alternatively, all programs do, if you interpret
it another.


Regards,
Nick Maclaren.

Sander Vesik

unread,
Feb 12, 2004, 7:27:43 AM2/12/04
to

There is no such thing as an "unobserved address". You can take the "address"
of any structure member, and then compare them.

>
> > How about folks who poke around core dumps ? I imagine they might
> > notice...
>
> Indeed. I'd expect problems with separately-compiled programs and writing
> structs to files/sockets, too (so taking the address of a struct foo, using
> struct foo *, etc should disable any field-reordering optimisation for
> struct foo).
>

--
Sander

+++ Out of cheese error +++

Nick Maclaren

unread,
Feb 12, 2004, 7:43:33 AM2/12/04
to
Sorry about following up to my own posting, but cancellation and
resubmission rarely works. I forgot to mention that the following
variant adds even more standards chaos, and is nearly as common:

typedef struct {
    double d;
    int n;
} t;

typedef union {
    double d;
    int n;
} u;

t x;
u *p = (u *)&x;

*((int *)((char *)p + offsetof(t, n))) = 123;


Regards,
Nick Maclaren.

Jan C. Vorbrüggen

unread,
Feb 12, 2004, 8:39:41 AM2/12/04
to
> >In the HPC space, everyone uses flags that encourage the compiler to
> >break the rules to get better performance. You can go read up about the
> >SPEC flags people use; flags that break all the standards are normal.
>
> Er, no. Some people do; others don't. In particular, there are a lot
> of people who write cleanish Fortran and optimise up to the standard's
> limits but not beyond.

Well, a lot of Fortran compilers have a separate switch saying, "yes,
you can really rely on the code not doing non-conformant argument aliasing"
- and for very good reason.

Jan

Sander Vesik

unread,
Feb 12, 2004, 9:15:46 AM2/12/04
to
Seongbae Park <Seongb...@sun.com> wrote:
>
> > aliasing it via questionable pointer math,
>
> If the code is standard compliant, a compiler can tell
> when it can not follow what's going on anymore and give up.
> So no problem here.
> Sure, you can write a non-compliant code that breaks such,
> but the same applies to almost any other optimization.

But how do you do it if it happens in a different .c file?

>
> Seongbae

Rupert Pigott

unread,
Feb 12, 2004, 11:42:21 AM2/12/04
to
"Robert Wessel" <robert...@yahoo.com> wrote in message
news:bea2590e.04021...@posting.google.com...

There bloody well is if I'm debugging.

These tools are there to assist me in doing a job, if they
make my job as a programmer *harder* they are no damn good.

Cheers,
Rupert


David Gay

unread,
Feb 12, 2004, 12:44:30 PM2/12/04
to

Sander Vesik <san...@haldjas.folklore.ee> writes:

> David Gay <dg...@beryl.cs.berkeley.edu> wrote:
> >
> > "Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> writes:
> > > Seongbae Park" <Seongb...@sun.com> wrote in message
> > > news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...
> > > > In article <pan.2004.02.11....@yahoo.com>, Sheldon Simms
> > > wrote:
> > > > > C99 6.7.2.1:
> > > > > 13 Within a structure object, the non-bit-field members and the units in
> > > > > which bit-fields reside have addresses that increase in the order in
> > > > > which they are declared.
> > > >
> > > > Yes. But if there's no user code that takes address of struct field
> > > > or relies on such, this is irrelevant.
> > >
> > > However it would "violate C standard", regardless.
> >
> > Probably not, except if there's some piece of the standard that states
> > "addresses must be machine-level addresses" (in other words, an address
> > which is not observed is effectively undefined as you haven't had to define
> > its mapping to real addresses).
>
> There is no such thing as an "unobserved address". You can take the "address"
> of any structure member, and then compare them.

You just observed it, then. The as-if posts gave a more complete list of
things that (conservatively) constitute "observation".

Rick Jones

unread,
Feb 12, 2004, 1:25:01 PM2/12/04
to
>> Core dumps are quite outside the C standard. There's no requirement
>> that structures be laid out in any particular way in a core dump.

> There bloody well is if I'm debugging.

> These tools are there to assist me in doing a job, if they
> make my job as a programmer *harder* they are no damn good.

At which point if we've gone beyond the scope of the language
standard, we then have to consider other standards and/or rules and
decide if something is fit for a particular purpose and whatnot. At
least if we are talking about applying such optimizations to
benchmarks and the like.

rick jones
--
denial, anger, bargaining, depression, acceptance, rebirth...
where do you want to be today?
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...

Seongbae Park

unread,
Feb 12, 2004, 1:29:19 PM2/12/04
to
In article <10765953...@haldjas.folklore.ee>, Sander Vesik wrote:
> Seongbae Park <Seongb...@sun.com> wrote:
>>
>> > aliasing it via questionable pointer math,
>>
>> If the code is standard compliant, a compiler can tell
>> when it can not follow what's going on anymore and give up.
>> So no problem here.
>> Sure, you can write a non-compliant code that breaks such,
>> but the same applies to almost any other optimization.
>
> But how do you do it if it happens in a different .c file?

You don't do it.
I believe Intel uses interprocedural optimization for SPEC like us.

Seongbae

Christopher Brian Colohan

unread,
Feb 12, 2004, 1:36:34 PM2/12/04
to
Sander Vesik <san...@haldjas.folklore.ee> writes:
> Seongbae Park <Seongb...@sun.com> wrote:
> >
> > > aliasing it via questionable pointer math,
> >
> > If the code is standard compliant, a compiler can tell
> > when it can not follow what's going on anymore and give up.
> > So no problem here.
> > Sure, you can write a non-compliant code that breaks such,
> > but the same applies to almost any other optimization.
>
> But how do you do it if it happens in a different .c file?

Many compilers do whole program analysis -- they delay many
optimizations until link time (which is really a "finish compilation
and then link" phase). Once you have the whole program (and
instrumented libraries) in hand you can figure out a lot more about
the use of symbols than if you are looking at things one file at a
time.

This is not that new -- the SGI compiler did this at least 7 years ago
(when I got my last SGI)...

Chris
--
Chris Colohan Email: ch...@colohan.ca PGP: finger col...@cs.cmu.edu
Web: www.colohan.com Phone: (412)268-4751

Seongbae Park

unread,
Feb 12, 2004, 1:57:49 PM2/12/04
to
In article <402B3E54...@mediasec.de>, Jan C Vorbrüggen wrote:
>> That rule would work fine for space, and to some degree for performance
>> but it leaves too much, especially in large structs with many pointers
>> (which are all same size as we all know) and even in relatively small
>> structs. Having frequently accessed pointers of a struct to be in the same
>> L1 cache line can make a huge difference.
>
> Sure, but if they're the same size, you can re-arrange them to your
> heart's content - as the programmer always, and legally as a compiler
> if you can assure the "as-if" rule is obeyed.

My point is that this doesn't make it much easier for the compiler, if at all.
Also, if there are nested structures or unions, it isn't possible.
And readability is often more important than 10-20% performance,
depending on the application.

> How large are L1 line sizes these days - 32-64 bytes, i.e., 8-16 pointers
> on a 32-bit machine? You're saying people are using data structures with
> fan-outs of that order? Bah.

Yes. Structs of 128 bytes or larger aren't that uncommon.
I've seen ISV code with ~1KB structures.

Seongbae

Seongbae Park

unread,
Feb 12, 2004, 2:25:19 PM2/12/04
to
In article <c0fk4c$rn4$1...@pegasus.csx.cam.ac.uk>, Nick Maclaren wrote:
..

>>Let's play the standard game then.
>
> You called?

I wouldn't bet on me winning against you :)

...


> Note that this doesn't deny your point, it merely means that it is
> very, very hard to reorganise fields in C structures and maintain
> support for that use.
>
> I can tell you that the C standards committee have had several flame
> wars about that, leading to no conclusion that I could discover. I
> believe that the current position is that whether the above is
> required to work by the standard or undefined behaviour is one of
> the many interpretations that is deliberately left up to the reader.

Thanks God that offsetof() usage isn't widespread (at least yet),
and that the above code works most of the time on our system.
We get enough bug reports about C standard compliance
that turn out to be either the customer's misunderstanding or ambiguities
in the standard - probably the most popular one is volatile usage.

Seongbae

John Dallman

unread,
Feb 12, 2004, 3:00:00 PM2/12/04
to
In article <a13e403a.04020...@posting.google.com>,
iccou...@yahoo.com (iccOut) wrote:

> I started mucking around with a dissassembly of the Intel-specific
> binary and found one particular call (proc_init_N) that appeared to be
> performing this check. As far as I can tell, this call is supposed to
> verify that the CPU supports SSE and SSE2 and it checks the CPUID to

> ensure that its an Intel processor. I wrote a quick utility which I
> call iccOut, to go through a binary that has been compiled with this
> Intel-only flag and remove that check.

I've used the Intel compiler extensively, and it doesn't have this trick
in v7.1 or earlier. I haven't switched to v8.0 yet, but downloaded it
today, and the docs suggest that /QxN only causes a CPUID check if you
compile main() with it.

What happens if you compile the file with main() in it with /QxW, and the
rest with /QxN ? If that runs on an AMD with SSE2, can you split out
main() into a file of its own, and just compile that with /QxW ? It's a
trivial workaround, but if it works, it works.

Of course, Intel could start checking the CPU in every function, but it
would be kind of futile. The overhead of doing that would rather eat into
their performance gains.

---
John Dallman j...@cix.co.uk
"Any sufficiently advanced technology is indistinguishable from a
well-rigged demo"

Stefan Monnier

unread,
Feb 12, 2004, 3:04:03 PM2/12/04
to
> There bloody well is if I'm debugging.

Which world do you live in ?

When was the last time that you could debug fully optimized C code without
getting hopelessly lost in the unrecognizable code
generated by the compiler and the weird order in which operations
are performed?


Stefan

Nick Maclaren

unread,
Feb 12, 2004, 3:52:21 PM2/12/04
to
In article <c0gjuv$590$1...@news1nwk.SFbay.Sun.COM>,

Seongbae Park <Seongb...@sun.com> wrote:
>
>Thanks God that offsetof() usage isn't widespread (at least yet),
>and that the above code works most of the time on our system.
>We get enough bugs on C standard compliance
>that turn out to be either customer's misunderstanding or ambiguities
>in the standard - probably the most popular one is volatile usage.

Unfortunately, it is :-(

If you look at the code (and even the interfaces!) of the X Toolkit
and most widget sets, you will find it. They aren't the only things
with that, either. I have rarely seen it except in disgusting code,
of the sort where the only practical solution is to compile with a
low level of optimisation and pray. This occasionally occurs in even
well-written programs, but is then well localised and commented. A
GOOD program will even contain a check that its unreliable assumptions
are correct, when that is possible ....

I agree with you about volatile. I don't know what it does on most
of the systems I use, but I am sure that they all conform in some way
or another.


Regards,
Nick Maclaren.

Rick Jones

unread,
Feb 12, 2004, 4:43:19 PM2/12/04
to

That would be following the instructions, but now we are talking about
adding where one finds the values in memory too right?

rick jones
--
Process shall set you free from the need for rational thought.

Ricardo Bugalho

unread,
Feb 12, 2004, 7:10:15 PM2/12/04
to
On Thu, 12 Feb 2004 21:43:19 +0000, Rick Jones wrote:


> That would be following the instructions, but now we are talking about
> adding where one finds the values in memory too right?

This is an academic discussion, but changing the order of the fields in a
struct is something that would be easy to describe in the binary's debug
information and for the debugger to cope with. I don't know if any
compiler/debugger supports that, though.
But, like Stefan Monnier said, optimisation and debugging don't mix well.
When you're debugging you're looking at the internals of the program. But
C compilers only have to care about the side effects of running the
program, and they'll rightfully twist every aspect of its internal
workings when optimising. Changing the order of the fields of structs is
only one of the tricks they'll do that make it hard to debug optimised code.

--
Ricardo

Robert Wessel

unread,
Feb 12, 2004, 8:47:03 PM2/12/04
to
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10766041...@saucer.planet.gong>...

> > Core dumps are quite outside the C standard. There's no requirement
> > that structures be laid out in any particular way in a core dump.
>
> There bloody well is if I'm debugging.
>
> These tools are there to assist me in doing a job, if they
> make my job as a programmer *harder* they are no damn good.


Certainly any particular compiler implementation ought to tell you how
to find data in a debugger or core dump, and one that didn't wouldn't
be a very good tool. But with optimized code, it's often quite tough
to see any one-to-one mapping from your source to the object code
(which is why debuggers tend to work so much better with unoptimized
code).

Nonetheless, all that's quite outside the scope of the standard.

Hank Oredson

unread,
Feb 13, 2004, 12:11:08 AM2/13/04
to

"Seongbae Park" <Seongb...@sun.com> wrote in message
news:c0gjuv$590$1...@news1nwk.SFbay.Sun.COM...

It is my opinion that offsetof() would only be used by
a totally incompetent programmer.

--

... Hank

Hank: http://horedson.home.att.net
W0RLI: http://w0rli.home.att.net


Terje Mathisen

unread,
Feb 13, 2004, 2:28:44 AM2/13/04
to
Robert Wessel wrote:

Actually, it is very seldom that I have any serious problems reading the
optimized output of a compiler, and if it was kind enough to show the
roughly corresponding block of source code, it's relatively easy.

I believe this is much more of a 'I have managed well without really
learning asm optimization, and I bloody well don't intend to waste any
time on that useless stuff now!' viewpoint.


>
> Nonetheless, all that's quite outside the scope of the standard.

Right. In fact, it would seem like a _very_ worthwhile task for a
compiler to use storage efficiency, structure alignment and cache reuse
as inputs to a (possibly feedback-directed) data reorganization phase.

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Terje Mathisen

unread,
Feb 13, 2004, 2:45:45 AM2/13/04
to
Hank Oredson wrote:

> "Seongbae Park" <Seongb...@sun.com> wrote in message

>>Thanks God that offsetof() usage isn't widespread (at least yet),
>>and that the above code works most of the time on our system.
>>We get enough bugs on C standard compliance
>>that turn out to be either customer's misunderstanding or ambiguities
>>in the standard - probably the most popular one is volatile usage.
>>
>>Seongbae
>
> It is my opinion that offsetof() would only be used by
> a totally incompetent programmer.

What I really don't understand is when you'd use it?

If I require a pointer to the beginning of a small table embedded in a
struct, I'd simply do so:

char *p = struct_ptr->char_table;

Why would I use

p = (char *) struct_ptr + offsetof(struct_t, char_table);

instead?

Secondly, why would the offsetof() version generate any different code?

I can see that taking the difference between various offsetof() values
could force the compiler to avoid all struct reorganization, and
possibly also force it to use something like #pragma pack(1) to avoid
holes, but otherwise, what is the real problem?

Christian Bau

unread,
Feb 13, 2004, 2:55:18 AM2/13/04
to
In article <c0hvb9$bko$1...@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.m...@hda.hydro.com> wrote:

> Hank Oredson wrote:
>
> > It is my opinion that offsetof() would only be used by
> > a totally incompetent programmer.

There's a reputation gone.

> What I really don't understand is when you'd use it?
>
> If I require a pointer to the beginning of a small table embedded in a
> struct, I'd simply do so:
>
> char *p = struct_ptr->char_table;
>
> Why would I use
>
> p = (char *) struct_ptr + offsetof(struct_t, char_table);
>
> instead?

To set everything in a struct before the char_table to zeroes:

memset (struct_ptr, 0, offsetof (struct_t, char_table));

Works in C, works in C++ if you use PODs (plain old datatypes).

Terje Mathisen

unread,
Feb 13, 2004, 3:05:19 AM2/13/04
to

OK, it is much more readable than a subtraction of two (cast'ed)
pointers, I can see that.

It also marks said structure as 'do not move this member in front of any
preceding member', which would probably be implemented as simply yet
another way to flag it as 'do not reorganize at all'.

Thanks!

Nick Maclaren

unread,
Feb 13, 2004, 4:25:03 AM2/13/04
to
In article <christian.bau-5BE...@slb-newsm1.svr.pol.co.uk>,

Christian Bau <christ...@cbau.freeserve.co.uk> wrote:
>In article <c0hvb9$bko$1...@osl016lin.hda.hydro.com>,
> Terje Mathisen <terje.m...@hda.hydro.com> wrote:
>> Hank Oredson wrote:
>>
>> > It is my opinion that offsetof() would only be used by
>> > a totally incompetent programmer.
>
>There's a reputationn gone.

Mine too :-)

>> What I really don't understand is when you'd use it?
>

>To set everything in a struct before the char_table to zeroes:
>
> memset (struct_ptr, 0, offsetof (struct_t, char_table));
>
>Works in C, works in C++ if you use PODs (plain old datatypes).

Yes. And there are half a dozen similar uses; in all cases, you can
use pointer subtraction instead, but some people prefer offsetof.
offsetof is, after all, just syntactic sugar.

One use where it is pretty well unavoidable is when writing a generic
qsort-like chain sort, when it is necessary to specify where in the
structure the chain pointer is. Yes, it would be possible to provide
a pointer-locator function as an argument, but that would add cost and
obscurity for very little gain.
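
As a sketch of that "where in the structure the chain pointer is" idea, here is a generic traversal rather than a full chain sort (all names are hypothetical, and it assumes object pointers share a common representation, which holds on mainstream platforms): the routine is handed the byte offset of the link field and works for any node type.

    #include <stddef.h>
    #include <stdio.h>

    /* Count the nodes of any singly linked list, given the byte offset of
       the next-pointer within the node type. */
    static size_t chain_length(void *head, size_t link_offset)
    {
        size_t n = 0;
        while (head != NULL) {
            n++;
            /* fetch the pointer stored link_offset bytes into the node */
            head = *(void **)((char *)head + link_offset);
        }
        return n;
    }

    /* example node type: the chain pointer is not the first member */
    struct job {
        int priority;
        struct job *next;
    };

    int main(void)
    {
        struct job c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
        printf("%lu\n",
               (unsigned long)chain_length(&a, offsetof(struct job, next)));
        return 0;
    }

A qsort-like chain sort would pass the same offset down to its comparison and relinking steps in exactly the same way.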

Where I agree with Hank is that using it gratuitously or (worse)
saving the result in a variable is almost always the sign of truly
disgusting code. I can't now remember exactly which of the X Toolkit
or widget set interfaces requires it, but there are some. And allowing
that sort of use means that the compiler has to give up as soon as it
sees any pointer arithmetic, which is one reason 'standards breaking'
compiler options are needed for C but not Fortran.


Regards,
Nick Maclaren.

Ralph Schmidt

unread,
Feb 13, 2004, 6:34:07 AM2/13/04
to
Terje Mathisen wrote:

> Hank Oredson wrote:
>
>> "Seongbae Park" <Seongb...@sun.com> wrote in message
>>>Thanks God that offsetof() usage isn't widespread (at least yet),
>>>and that the above code works most of the time on our system.
>>>We get enough bugs on C standard compliance
>>>that turn out to be either customer's misunderstanding or ambiguities
>>>in the standard - probably the most popular one is volatile usage.
>>>
>>>Seongbae
>>
>> It is my opinion that offsetof() would only be used by
>> a totally incompetent programmer.
>
> What I really don't understand is when you'd use it?
>

Imagine an entity which has embedded list nodes, where these nodes
are part of some (global) lists and the entity's pointer can't be stored
in the nodes themselves for easy back-referencing.
Now you need to get the entity pointer from the node pointer when
you walk such a list. Then you need offsetof to calculate the
entity pointer from the node's offset from the entity's start.

It's also useful when you use asm() macros in gcc, where you can
use offsetof to pass the offset of certain entries.

Jan C. Vorbrüggen

unread,
Feb 13, 2004, 7:03:14 AM2/13/04
to
> But how do you do it if it happens in a different .c file?

To switch languages, that's why Fortran since F90 allows you to put
a struct declaration into a MODULE, which all its USErs will see in
the same way, and to add SEQUENCE to the declaration, which tells
the compiler that it will see it again in another scope (and possibly
different optimization settings), so it shouldn't do certain optimizations
that otherwise would be allowed. Good compilers also give you informational
or warning messages when they are forced to select suboptimal/bad code
because they cannot realign structure members for one of these reasons.

Jan

David Gay

unread,
Feb 13, 2004, 11:25:26 AM2/13/04
to

"Hank Oredson" <hore...@att.net> writes:
> It is my opinion that offsetof() would only be used by
> a totally incompetent programmer.

One obvious one: variable-sized structs (array at the end):
struct foo {
    int len;
    double elems[1];
};
Then you allocate with
    malloc(offsetof(struct foo, len) + sizeof(double) * nelems);

These days you can declare foo with
struct foo {
    int len;
    double elems[];
};
and avoid the offsetof.
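
A short usage sketch of the flexible-array-member form (make_foo and nelems are hypothetical names, not from the post); note that sizeof *p already covers len plus any padding that precedes elems, so no offsetof is needed:

    #include <stdlib.h>
    #include <string.h>

    struct foo {
        int len;
        double elems[];            /* C99 flexible array member */
    };

    /* Allocate a foo with room for nelems doubles and zero them. */
    static struct foo *make_foo(int nelems)
    {
        struct foo *p = malloc(sizeof *p + (size_t)nelems * sizeof p->elems[0]);
        if (p != NULL) {
            p->len = nelems;
            memset(p->elems, 0, (size_t)nelems * sizeof p->elems[0]);
        }
        return p;
    }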

--
David Gay
dg...@acm.org

Peter Dickerson

unread,
Feb 13, 2004, 11:50:49 AM2/13/04
to
"David Gay" <dg...@beryl.CS.Berkeley.EDU> wrote in message
news:s71lln7...@beryl.CS.Berkeley.EDU...

>
> "Hank Oredson" <hore...@att.net> writes:
> > It is my opinion that offsetof() would only be used by
> > a totally incompetent programmer.
>
> One obvious one: variable-sized structs (array at the end):
> struct foo {
> int len;
> double elems[1];
> };
> Then you allocate with
> malloc(offsetof(struct foo, len) + sizeof(double) * nelems);

You might want to reconsider that...

> These days you can declare foo with
> struct foo {
> int len;
> double elems[];
> };
> and avoid the offsetof.
>
> --
> David Gay
> dg...@acm.org

Peter


Eric

unread,
Feb 13, 2004, 11:33:45 AM2/13/04
to
Terje Mathisen wrote:
>
> Christian Bau wrote:
> > In article <c0hvb9$bko$1...@osl016lin.hda.hydro.com>,
> > Terje Mathisen <terje.m...@hda.hydro.com> wrote:
> >>Why would I use
> >>
> >> p = (char *) struct_ptr + offsetof(struct_t, char_table);
> >>
> >>instead?
> >
> >
> > To set everything in a struct before the char_table to zeroes:
> >
> > memset (struct_ptr, 0, offsetof (struct_t, char_table));
> >
> > Works in C, works in C++ if you use PODs (plain old datatypes).
>
> OK, it is much more readable than a subtraction of two (cast'ed)
> pointers, I can see that.
>
> It also marks said structure as 'do not move this member in front of any
> preceding member', which would probably be implemented as simply yet
> another way to flag it as 'do not reorganize at all'.
>
> Thanks!
>
> Terje

A better example is to convert from the address of a field inside a struct
back to a pointer to the containing struct. In other words you are not limited
to casting pointers to just the first field of a struct. Very useful.
I use this a lot. So do WNT and VMS (though VMS is Bliss/Macro).

For example, if struct FooBarT can be in two linked lists
(or Btree indexes), then it contains two doubly linked list entry fields,
FooLink and BarLink. offsetof allows you to convert a pointer to the fields
FooLink and BarLink back into a pointer to FooBarT.

// Subtract byte offset of field to get back to container
//
#define CONTAINING_REC(parentType, fldName, fldPtr) \
    (parentType *)((char *)fldPtr - offsetof(parentType, fldName))

// Circular double linked list of link records.
struct DlinkT { DlinkT *FlinkP, *BlinkP; };
typedef DlinkT DlistT;

struct FooBarT
{
    int    Junk;
    DlinkT FooLink;        // Points to other FooLink structs
    int    MoreJunk;
    DlinkT BarLink;        // Points to other BarLink structs
    int    StillMoreJunk;
    BnodeT IdNode;         // Points to other Bnode structs
};

#define FooBarFromFooLink(linkPtr) CONTAINING_REC(FooBarT, FooLink, linkPtr)
#define FooBarFromBarLink(linkPtr) CONTAINING_REC(FooBarT, BarLink, linkPtr)
#define FooBarFromIdNode(idPtr)    CONTAINING_REC(FooBarT, IdNode, idPtr)

{
    FooBarT *fbP, *nextP;

    // Convert link pointer back to original record.
    fbP = FooBarFromBarLink(myLinkPtr);

    nextP = FooBarFromFooLink(fbP->FooLink.FlinkP);
}

You can have 'generic' code for linked list or Btree node managament using
DlinkT or BnodeT structs and their pointers, and switch back to the container.

A more realistic example would be a KThreadT struct in the kernel which is a
member of the threads-in-a-process Dlist and can also be in a ready-to-run Dlist.

// Get next thread to run.
newthreadP = KThreadFromSchedLink (DlistPopHead (schedReadyList));

Eric

Nick Maclaren

unread,
Feb 13, 2004, 12:06:14 PM2/13/04
to

In article <s71lln7...@beryl.CS.Berkeley.EDU>,

David Gay <dg...@beryl.CS.Berkeley.EDU> writes:
|>
|> These days you can declare foo with
|> struct foo {
|> int len;
|> double elems[];
|> };
|> and avoid the offsetof.

Don't bet on it. That is C99 and the vast majority of compilers
are C90, for good reasons (one of which is that customers demand
C90). While that syntax is one of the majority of C99 extensions
that is compatible with C90, a good compiler will warn about it
even if it accepts it, to promote portability.

And it doesn't help at all with the generic chain sort problem.


Regards,
Nick Maclaren.

Christopher Brian Colohan

unread,
Feb 13, 2004, 12:17:26 PM2/13/04
to
"Hank Oredson" <hore...@att.net> writes:
> It is my opinion that offsetof() would only be used by
> a totally incompetent programmer.

In C++ there is a type of pointer called "pointer-to-member". A
pointer-to-member is basically the same as offsetof(), except it is
properly typed. This is very useful for creating generic routines.

For example, what if you wanted to sort a list of structures based on
an arbitrary field in the structure? You could either create a new
comparison function for every field you wanted to use as a key, or you
could specify that field to the comparison function with a
pointer-to-member (or, in C, a structure offset).
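
For the C flavour of that, here is a hedged sketch of one comparison routine parameterised by a byte offset, plus a trivial insertion sort that uses it (all names are hypothetical; the fixed 256-byte scratch buffer is an assumption of the sketch):

    #include <stddef.h>
    #include <string.h>

    /* compare two records by the int field found key_offset bytes in */
    static int cmp_int_at(const void *a, const void *b, size_t key_offset)
    {
        int ka, kb;
        memcpy(&ka, (const char *)a + key_offset, sizeof ka);
        memcpy(&kb, (const char *)b + key_offset, sizeof kb);
        return (ka > kb) - (ka < kb);
    }

    /* insertion sort over an array of fixed-size records, keyed by whichever
       int field the caller names via offsetof() */
    static void sort_by_int_key(void *base, size_t n, size_t size,
                                size_t key_offset)
    {
        char *p = base;
        char tmp[256];                 /* assumes size <= 256 */
        for (size_t i = 1; i < n; i++) {
            size_t j = i;
            memcpy(tmp, p + i * size, size);
            while (j > 0 && cmp_int_at(p + (j - 1) * size, tmp, key_offset) > 0) {
                memcpy(p + j * size, p + (j - 1) * size, size);
                j--;
            }
            memcpy(p + j * size, tmp, size);
        }
    }

    /* usage sketch:
         struct rec { char name[16]; int age; int score; };
         sort_by_int_key(recs, nrecs, sizeof recs[0],
                         offsetof(struct rec, age));   */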

If you argue that offsetof() is useless, then you should probably also
argue that pointers-to-members are useless... :-)

Eugene Nalimov

unread,
Feb 13, 2004, 2:11:38 PM2/13/04
to

"Terje Mathisen" <terje.m...@hda.hydro.com> wrote in message
news:c0hubc$av8$1...@osl016lin.hda.hydro.com...
>
>[...]

>
> Actually, it is very seldom that I have any serious problems reading the
> optimized output of a compiler, and if it was kind enough to show the
> roughly corresponding block of source code, it's relatively easy.
>
> I believe this is much more of a 'I have managed well without really
> learning asm optimization, and I bloody well don't intend to waste any
> time on that useless stuff now!' viewpoint.
>
> [...]

>
> Terje
> --
> - <Terje.M...@hda.hydro.com>
> "almost all programming can be viewed as an exercise in caching"

I regularly cannot understand the output of Visual C for IPF. I have
years of low-level experience with both the architecture and the
compiler, and I don't think I am the dumbest programmer. And
no, that is not my first (or second, or third, or ...) compiler, and
not my first (or second, or third, or ...) architecture.

On IPF you just have more registers than you can keep track of in
your head (or on a reasonable-sized piece of paper), you don't have
addressing modes so there are even more instructions, the global
scheduler moves instructions across tens of basic blocks (and duplicates
them, and inserts compensation code), you have speculative and
advanced loads everywhere, predicated instructions (including
predicated comparisons), etc. Understanding software-pipelined
loops with rotating registers is child's play compared to that...

I remember receiving e-mail from one of the IPF Visual C early
adopters, who traced and reported tens of compiler bugs to us for
different architectures. The mail said "I spent several hours trying to
understand the code, but I still cannot understand it, and I suspect
it is incorrect". The code in question was correct; we just generated
a non-trivial sequence of instructions to implement one of the C
constructs (BTW, the sequence was suggested by Peter Montgomery,
who sometimes posts here).

Thanks,
Eugene


Stephen Sprunk

unread,
Feb 13, 2004, 2:53:38 PM2/13/04
to
"Terje Mathisen" <terje.m...@hda.hydro.com> wrote in message
news:c0hvb9$bko$1...@osl016lin.hda.hydro.com...

> Hank Oredson wrote:
> > It is my opinion that offsetof() would only be used by
> > a totally incompetent programmer.
>
> What I really don't understand is when you'd use it?

Many of the odder bits in the C language are there solely so that the C
standard library can be written 100% in C -- something not possible with
most other languages.

I do have a reference lying around on exactly when offsetof() is required,
but I'm too lazy to dig it up unless someone asks. My fuzzy memory says
that it's required to implement some compile-time constant folding in struct
definitions, which can't be implemented using -> or . notation.

> I can see that taking the difference between various offsetof() values
> could force the compiler to avoid all struct reorganization, and
> possibly also force it to use something like #pragma pack(1) to avoid
> holes, but otherwise, what is the real problem?

offsetof() is very useful when dealing with different member alignments and
possibly even compilers that rearrange struct members. In fact, its use is
a good hint to the compiler that such optimizations are _safe_, because it
shows the programmer is aware the struct's layout may vary and he's
compensated for that.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin


Seongbae Park

unread,
Feb 13, 2004, 4:09:30 PM2/13/04
to
Stephen Sprunk wrote:
...

> but I'm too lazy to dig it up unless someone asks. My fuzzy memory says
> that it's required to implement some compile-time constant folding in struct
> definitions, which can't be implemented using -> or . notation.

offsetof() is just a macro,
and the standard explicitly says it is a macro (7.17p3).
It is defined as follows on Sun systems:

#define offsetof(s, m) ((size_t)(&(((s *)0)->m)))

Seongbae

Del Cecchi

unread,
Feb 13, 2004, 4:21:25 PM2/13/04
to

"Benjamin Goldsteen" <b...@inka.mssm.edu> wrote in message
news:4db74fa6.04021...@posting.google.com...
> Ketil Malde <ke...@ii.uib.no> wrote in message
news:<egptcmt...@havengel.ii.uib.no>...
> > b...@inka.mssm.edu (Benjamin Goldsteen) writes:
> >
> > > Intel gives away their compiler for free to certain populations (e.g.
> > > .edu). Why should Intel spend money to develop software that will be
> > > given away to people who plan to use the software on non-Intel
> > > processors?
> >
> > I've no opinion on why they give it away. But to me it looks like
> > Intel has the best compiler out there, but they would really like to
> > have the fastest processor instead. So they try to hamper the use of
> > their compiler on competing processors to make it harder to take
> > full advantage of them.
>
> In the modern world, processors and compilers are built around each
> other. It is meaningless to say that I have the fastest processor but
> there is no compiler to take advantage of it. If AMD doesn't have a
> compiler that demonstrate that their processor is the fastest then
> they don't have the fastest processor.
>
snip

In the Real world, people also worry about performance running existing
binaries.

del cecchi


David Gay

unread,
Feb 13, 2004, 4:23:17 PM2/13/04
to

"Peter Dickerson" <first{dot}sur...@ukonline.co.uk> writes:

> "David Gay" <dg...@beryl.CS.Berkeley.EDU> wrote in message
> news:s71lln7...@beryl.CS.Berkeley.EDU...
> >
> > "Hank Oredson" <hore...@att.net> writes:
> > > It is my opinion that offsetof() would only be used by
> > > a totally incompetent programmer.
> >
> > One obvious one: variable-sized structs (array at the end):
> > struct foo {
> > int len;
> > double elems[1];
> > };
> > Then you allocate with
> > malloc(offsetof(struct foo, len) + sizeof(double) * nelems);
>
> You might want to reconsider that...

typo-time, indeed: offsetof(struct foo, elems)

--
David Gay
dg...@acm.org

Robert Wessel

unread,
Feb 13, 2004, 11:08:40 PM2/13/04
to
Seongbae Park <Seongb...@sun.com> wrote in message news:<c0jeea$msd$1...@news1nwk.SFbay.Sun.COM>...


It is indeed required to be a macro, except that it is not possible to
write the macro in conforming C. So each implementation has to
provide a suitable nonconforming, implementation specific, macro to
implement the function. The implementation you've shown has at least
two examples of implementation specific behavior.

Hank Oredson

unread,
Feb 14, 2004, 1:03:00 AM2/14/04
to

"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:c0i55f$3n1$1...@pegasus.csx.cam.ac.uk...

The places I've seen it used were all, without exception that I
can recall, done to avoid using typedefs to name the substructures
of interest. If one really does want to refer to (as mentioned in
another post) the two sets of pointers that doubly thread a linked
list, one can give each set a name and just refer to them. Or maybe
it IS "just a style thing".

For that case we should have the full set: sizeof(), offsetof(),
addressof(), baseof() and typeof(). Now I can obfuscate things
without hardly even trying ;-)

Nick Maclaren

unread,
Feb 14, 2004, 5:04:33 AM2/14/04
to
In article <f38a3384eb52a219...@news.teranews.com>,

Stephen Sprunk <ste...@sprunk.org> wrote:
>"Terje Mathisen" <terje.m...@hda.hydro.com> wrote in message
>news:c0hvb9$bko$1...@osl016lin.hda.hydro.com...
>> Hank Oredson wrote:
>> > It is my opinion that offsetof() would only be used by
>> > a totally incompetent programmer.
>>
>> What I really don't understand is when you'd use it?
>
>Many of the odder bits in the C language are there solely so that the C
>standard library can be written 100% in C -- something not possible with
>most other languages.

Really? As someone who has implemented run-time systems for several
languages, including C, I don't recognise what you are referring to.
If I recall, offsetof isn't needed or even useful for implementing any
other part of the standard library.

Not merely are very few of the odder aspects needed for that purpose
- as shown by the fact that the same proportion of the standard libraries
for Algol 68, BCPL and others was written in those languages - but you
can't write the whole of the standard library in C on any system
that doesn't use C as its system interface.

Or even if it does, actually. setjmp and longjmp are like offsetof
in that they are logically part of the compiler, not the run-time
system, and cannot be implemented using standard C.

>I do have a reference lying around on exactly when offsetof() is required,
>but I'm too lazy to dig it up unless someone asks. My fuzzy memory says
>that it's required to implement some compile-time constant folding in struct
>definitions, which can't be implemented using -> or . notation.

Effectively, yes. If you need it in a context where you cannot declare
a variable, you can't calculate the offset using the addresses of the
structure and its member.
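
One hedged example of such a context (struct hdr and the table name are invented for illustration): because offsetof() expands to an integer constant expression, it can appear in a static initializer, where no object of the type exists and so no member address could be taken.

    #include <stddef.h>

    struct hdr { int tag; int len; char payload[32]; };

    /* compile-time table of member offsets; no struct hdr object is needed */
    static const size_t hdr_offsets[] = {
        offsetof(struct hdr, tag),
        offsetof(struct hdr, len),
        offsetof(struct hdr, payload),
    };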

>offsetof() is very useful when dealing with different member alignments and
>possibly even compilers that rearrange struct members. In fact, its use is
>a good hint to the compiler that such optimizations are _safe_, because it
>shows the programmer is aware the struct's layout may vary and he's
>compensated for that.

The mind boggles! That is completely false. Consider:

typedef struct {
    double a;
    int b;
    double c;
    int d;
} t;

t x, y;

memcpy(&x.b, &y.b, offsetof(t, d) - offsetof(t, b));


Regards,
Nick Maclaren.

Stephen Sprunk

unread,
Feb 14, 2004, 6:00:28 AM2/14/04
to
"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:c0krrh$afd$1...@pegasus.csx.cam.ac.uk...

> Really? As someone who has implemented run-time systems for several
> languages, including C, I don't recognise what you are referring to.
> If I recall, offsetof isn't needed or even useful for implementing any
> other part of the standard library.

The reference implementation (cited below) I have uses offsetof() for many
different things; the example given is so hideous I refuse to describe it,
but I must assume the author had a reason for doing it that way, as most of
his code is extremely clean.

> Not merely are very few of the odder aspects needed for that purpose,
> as shown by the fact that the same amount of the standard libraries
> for Algol 68, BCPL and others were written in those languages, you
> can't write the whole of the standard library in C on any system
> that doesn't use C as its system interface.

I was thinking of languages like Pascal, Java, and VB -- the ones you
mention are long before my time so I can't comment.

> Or even if it does, actually. setjmp and longjmp are like offsetof
> in that they are logically part of the compiler, not the run-time
> system, and cannot be implemented using standard C.

P. J. Plauger's _The Standard C Library_ (ISBN 0-13-131509-9) contains a
pure C implementation of setjmp/longjmp, though it's clearly not something
you'd ever use in a production system for both performance and reliability
reasons.

The same reference also has a C implementation of offsetof(), to wit:
#define offsetof(T, member) ((size_t)&((T *)0)->member)

> The mind boggles! That is completely false. Consider:
>
> typedef struct {
> double a;
> int b;
> double c;
> int d;
> } t;
>
> t x, y;
>
> memcpy(&x.b,&y.b,offsetof(t,d)-offsetof(t,b));

I'd say that's a contrived counter-example, but I'm sure someone would
happily go dig up such an example from the Linux or X11 source code and a
reason why it should be considered valid code.

I stand corrected.

Philip Armstrong

unread,
Feb 14, 2004, 6:41:27 AM2/14/04
to
In article <ba89f0f2aacaf405...@news.teranews.com>,

Stephen Sprunk <ste...@sprunk.org> wrote:
>I'd say that's a contrived counter-example, but I'm sure someone would
>happily go dig up such an example from the Linux or X11 source code and a
>reason why it should be considered valid code.

A quick grep through the linux kernel sources reveals 1159 lines using
offsetof().

Phil
--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

Nick Maclaren

unread,
Feb 14, 2004, 7:51:05 AM2/14/04
to
In article <ba89f0f2aacaf405...@news.teranews.com>,

Stephen Sprunk <ste...@sprunk.org> wrote:
>
>The reference implementation (cited below) I have uses offsetof() for many
>different things; the example given is so hideous I refuse to describe it,

Given the reference, that fails to surprise me.

>but I must assume the author had a reason for doing it that way, as most of
>his code is extremely clean.

Um. Well, maybe. I haven't seen that book, but have seen others of
his and seen some of his code on the C standard reflector and
elsewhere.

>I was thinking of languages like Pascal, Java, and VB -- the ones you
>mention are long before my time so I can't comment.

The same would apply to Fortran and probably Cobol. But there is no
need for unclean or even odd features in a suitably flexible language.

>> Or even if it does, actually. setjmp and longjmp are like offsetof
>> in that they are logically part of the compiler, not the run-time
>> system, and cannot be implemented using standard C.
>
>P. J. Plauger's _The Standard C Library_ (ISBN 0-13-131509-9) contains a
>pure C implementation of setjmp/longjmp, though it's clearly not something
>you'd ever use in a production system for both performance and reliability
>reasons.

It is a long time since I saw that but, if I recall, that "pure C"
implementation does not work under the majority of C compilers,
because it makes so many assumptions that ain't so. There was a
heated debate on the C standard reflector about it. I did say that
it can't be done using "standard C"; obviously, there are systems
for which some sufficiently horrible C will work.

>The same reference also has a C implementation of offsetof(), to wit:
>#define offsetof(T, member) ((size_t)&((T *)0)->member)

That is one of the two widespread methods of doing it, but is still
not legal C. There is no way of implementing it in standard C (see
the Rationale).

>I'd say that's a contrived counter-example, but I'm sure someone would
>happily go dig up such an example from the Linux or X11 source code and a
>reason why it should be considered valid code.

Quite.


Regards,
Nick Maclaren.

Terje Mathisen

unread,
Feb 14, 2004, 10:56:44 AM2/14/04
to
Eugene Nalimov wrote:
> I regularly cannot understand output of Visual C for IPF. I have
> years of low-level experience with both architecture and
> compiler, and I don't think I am the dumbiest programmer. And
> no, that is not my first (or second, or third, or ...) compiler, and
> not my first (or second, or third, or ...) architecture.
>
> On IPF you just have more registers than you can keep track in
> your head (or on reasonable-sized piece of paper), don't have
> address modes so there are even more instructions, global
> scheduler moves instructions tens of basic blocks (and duplicates
> them, and inserts compensation code), you have speculative and
> advanced loads everywhere, predicated instructions (including
> predicated comparisons), etc. Understanding software-pipelined
> loops with rotated registers is child's play compared to that...

OK, I'll accept that Itanium might be an exception; it is like an
order of magnitude more complex (i.e., more obfuscated object code) than most
other architectures.


>
> I remember recieving e-mail from one of IPF Visual C early
> adopters, who traced and gave us tens of compiler bugs for
> different architectures. Mail said "I spent several hours trying to
> understand the code, but I still cannot understand it, and suspect
> it is incorrect". Code in question was correct, we just generated
> non-trivial sequence of instructions to implement one of the C
> constructs (BTW, sequence was suggested by Peter Montgomery
> who sometimes posts here).

In that case it was probably pretty efficient. :-)

Benjamin Goldsteen

unread,
Feb 14, 2004, 1:23:12 PM2/14/04
to
"Del Cecchi" <cecchi...@us.ibm.com> wrote in message news:<c0jf4m$17o3g1$1...@ID-129159.news.uni-berlin.de>...

> > In the modern world, processors and compilers are built around each
> > other. It is meaningless to say that I have the fastest processor but
> > there is no compiler to take advantage of it. If AMD doesn't have a
> > compiler that demonstrate that their processor is the fastest then
> > they don't have the fastest processor.
> >
> snip
>
> In the Real world, people also worry about performance running existing
> binaries.

I agree. And no one is prevented from running existing binaries on
the AMD platform and measuring them relative to the Intel chips.

However, the SPEC benchmarks are a measure of the CPU, memory,
compiler, operating system, disk drives, etc. It makes no sense to
say that the AMD chip is the fastest when combined with the Intel
compiler, Cray RAM, the IRIX OS, submersed in liquid helium, and ECC
turned off. Intel obviously didn't intend for their compiler to be
used with the AMD chip, and I don't think the AMD chip should be
measured with the Intel compiler's output (curiosity and academic
exercises aside).

Benjamin Goldsteen

unread,
Feb 14, 2004, 1:47:44 PM2/14/04
to
Ketil Malde <ke...@ii.uib.no> wrote in message news:<eg3c9gz...@sefirot.ii.uib.no>...

> b...@inka.mssm.edu (Benjamin Goldsteen) writes:
>
> > In the modern world, processors and compilers are built around each
> > other. It is meaningless to say that I have the fastest processor but
> > there is no compiler to take advantage of it.
>
> But of course, this isn't the case: there *is* a compiler that
> (presumably) shows it is faster. Hobbled by its owner, in order to
> avoid showing that.

>
> > If AMD doesn't have a compiler that demonstrate that their processor
> > is the fastest then they don't have the fastest processor.
>
> You could take this one step further, and say that it is meaningless
> to claim a fast CPU, unless there are applications making use of it.
> Which eventually brings us back to the old maxim about benchmarking
> the application you're interested in, instead of relying on artificial
> "benchmark numbers".

>
> > I think you missed my point. Its about IP. GPL people are sensitive
> > to their IP being used for non-GPL projects. Intel doesn't want their
> > IP being used for non-Intel projects.
>
> One question is to what degree should Intel be able to decide this.
> Do they get to decide on which CPUs I run executables compiled with
> their compiler? What if I compile GCC with ICC, does the same apply
> to the GCC executable?

GCC always compiles itself as the last step so the final executable
should be indistinguishable from a GCC executable created with another
compiler.

> And can they limit my ability to publish
> benchmarks I make?

No -- I don't think they can limit what you do with the output from
your program.

Here is one I am undecided about: can they restrict you from binary
patching the resulting executable to run on non-Intel platforms?

Anyway, Intel probably doesn't care about the general case. I think
all they are trying to do is stop AMD from making use of Intel IP to
generate high SPEC numbers. Let's be honest, not too many years ago
we all dismissed Intel SPEC numbers because we considered the Intel
compiler a "SPEC special". No one used it for real applications
because it was considered too buggy. I would be unhappy, too, if I
spent good R&D money to develop a compiler for running benchmarks and
then my competitors started making use of it.

> (And BTW, GPL is fine in non-GPL projects, lots of people compile
> non-GPL code with GCC, for instance. You just can't distribute modified
> GPL code without source.)

My point was not the specific restrictions associated with the
different licenses, but that restrictions on the use of IP to benefit
a competitor are common.

However, take GCC. My understanding is that I can't make a non-GPL
frontend to GCC. The authors of GCC have threatened to deliberately
obfusicate the interface if anyone tries. I would consider that
deliberate crippling a product (or the threat to do so) to prevent it
use by a competitor.

Last, I want to clarify that I am not saying that Intel's decision is
good. It's important to distinguish feasible (technical), legal,
ethical, and smart. Just because it is technically possible for AMD
to use Intel's compiler to run the benchmark doesn't mean they have a
right to do so. I won't touch the legal issue, but I don't think it
is unethical for Intel to restrict the use of the compiler in the
manner they have. Whether it is smart -- whether it will help or hurt
their long-term strategy -- is, I think, a more interesting question.
And one that will probably only be settled in retrospect.
