sleazy intel compiler trick (SOURCE ATTACHED)

iccOut

Feb 9, 2004, 5:38:39 PM
As part of my study of Operating Systems and embedded systems, one of
the things I've been looking at is compilers. I'm interested in
analyzing how different compilers optimize code for different
platforms. As part of this comparison, I was looking at the Intel
Compiler and how it optimizes code. The Intel Compilers have a free
evaluation download from here:
http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers
 

One of the things that the version 8.0 of the Intel compiler included
was an "Intel-specific" flag. According to the documentation, binaries
compiled with this flag would only run on Intel processors and would
include Intel-specific optimizations to make them run faster. The
documentation was unfortunately lacking in explaining what these
optimizations were, so I decided to do some investigating. 

First I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2 GHz Extreme Edition with 1 GB
of RAM running Windows XP Pro. First I compiled and ran SPEC with the
"generic x86 flag" (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
SPEC with the "Intel-specific flag" (-QxN) to see what kind of
difference that would make. For most benchmarks, there was not very
much change, but for 181.mcf, there was a win of almost 22%!

Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 gig of ram, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful. Was it true that only
Intel processors could take advantage of this performance boost?

I started mucking around with a disassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2, and it checks the CPUID to
ensure that it's an Intel processor. I wrote a quick utility, which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
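
[Editor's sketch] For readers unfamiliar with CPUID, the check described above can be pictured roughly as follows. This is not the disassembled proc_init_N routine; the function names are hypothetical and the logic is only a guess at the kind of test the post describes. The facts it relies on are documented: CPUID leaf 0 returns the vendor string in EBX, EDX, ECX, and leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26. Register values are passed in as plain integers so the sketch does not depend on any compiler intrinsic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rebuild the 12-character vendor string from the registers returned by
// CPUID leaf 0. The order is EBX, EDX, ECX, least significant byte first.
std::string vendorFromRegs(uint32_t ebx, uint32_t edx, uint32_t ecx) {
    const uint32_t regs[3] = { ebx, edx, ecx };
    std::string vendor;
    for (uint32_t reg : regs) {
        for (int shift = 0; shift < 32; shift += 8) {
            vendor += static_cast<char>((reg >> shift) & 0xFF);
        }
    }
    assert(vendor.size() == 12);
    return vendor;
}

// SSE is bit 25 and SSE2 is bit 26 of EDX from CPUID leaf 1.
bool hasSseAndSse2(uint32_t leaf1Edx) {
    const uint32_t mask = (1u << 25) | (1u << 26);
    return (leaf1Edx & mask) == mask;
}

// Hypothetical reconstruction of an "Intel-only" startup test: it passes
// only when the vendor is Intel AND both feature bits are set.
bool intelOnlyCheck(const std::string& vendor, uint32_t leaf1Edx) {
    return vendor == "GenuineIntel" && hasSseAndSse2(leaf1Edx);
}
```

With these definitions, a CPU that reports "AuthenticAMD" fails intelOnlyCheck even when both feature bits are set, which matches the behaviour reported on the FX51.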

Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium4 with an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable of taking advantage of these same
optimizations. If I'm missing something, I'd love for someone to point
it out for me. From the way it looks right now, it appears that Intel
is simply "cheating" to make their processors look better against
competitors' processors.

Links:
Intel Compiler:http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers
 

Here is the text:

/*
* iccOut 1.0
*
* This program enables programs compiled with the Intel compiler using
* the -xN flag to run on non-Intel processors. This can sometimes
* result in large performance increases, depending on the application.
* Note that even though the check will be removed, the CPU running the
* application *MUST* support both SSE and SSE2 or the program will
* crash.
*
*/

#include <stdio.h>
#include <string.h>


// x86 codes

#define X86_CALL 232 // E8 in hex
#define PUSH_EAX 80 // 50 in hex
#define X86_NOP 144 // 90 in hex

bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE*
fixedBinary );

//conveniently, the check always seems to be one of the first calls in
//the file. this makes it easier to find.
void printUsage() {
printf("Usage:\n");
printf("iccOut filename\n\n");
printf("Filename is the name of the file to fix.\n\n");
}


//returns whether code was replaced
bool processNextCall( FILE* inputBinary, FILE* fixedBinary ) {

int lenRead;
int startIndex, bytesNeeded;
unsigned char addressBuffer[4];
unsigned char checkBuffer[2];
unsigned char fullBuffer[7];
unsigned char tempChar;
bool codeReplaced;
bool otherReplaced;

otherReplaced = false;

//fixme: error checking for reads
lenRead = fread( &addressBuffer, 1, 4, inputBinary );
lenRead = fread( &checkBuffer, 1, 2, inputBinary );

fullBuffer[0] = X86_CALL;
for( int i=1; i<5;i++ ) {
fullBuffer[i] = addressBuffer[i-1];
}
fullBuffer[5] = checkBuffer[0];
fullBuffer[6] = checkBuffer[1];

codeReplaced = handleCall( fullBuffer, inputBinary, fixedBinary );

if ( ! codeReplaced ) {

//if either of the last 2 bytes were a call, we need to keep doing
//this until we run out of calls
while ( ( fullBuffer[5] == X86_CALL ) || ( fullBuffer[6] == X86_CALL
) ) {

if ( fullBuffer[5] != X86_CALL ) { //write it and ignore it
tempChar = fullBuffer[5];
fwrite( &tempChar, 1, 1, fixedBinary );
fullBuffer[0] = fullBuffer[6];
bytesNeeded = 6;
startIndex = 1;
} else {
fullBuffer[0] = fullBuffer[5];
fullBuffer[1] = fullBuffer[6];
bytesNeeded = 5;
startIndex = 2;
}

for( int i=0; i < bytesNeeded; i++ ) {
fread( &tempChar, 1, 1, inputBinary );
fullBuffer[startIndex+i] = tempChar;
}

otherReplaced = otherReplaced || handleCall( fullBuffer,
inputBinary, fixedBinary );
}
}
return ( codeReplaced || otherReplaced );
}

//returns whether code was replaced
bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE*
fixedBinary ) {

bool replacedCode;
unsigned char tempChar;

replacedCode = false;

//check if it's what we're looking for (one of the first calls
//followed by 2 push eax's)
if ( ( theBuffer[5] == PUSH_EAX ) && ( theBuffer[6] == PUSH_EAX ) ){
printf("Located call to subroutine to check intel support!\n");
printf("Substituting code ...\n");

//replace the call with nops
replacedCode = true;
for ( int i=0; i<5;i++ ) {
theBuffer[i] = X86_NOP;
}
}

if ( replacedCode || ( ( theBuffer[5] != X86_CALL ) && ( theBuffer[6]
!= X86_CALL ) )) {
//write out the two as they were
for ( int j=0; j<7;j++ ) {
tempChar = theBuffer[j];
fwrite( &tempChar, 1, 1, fixedBinary );
}
} else {
//don't write last 2 bytes
for( int i=0; i < 5; i++ ) {
tempChar = theBuffer[i];
fwrite( &tempChar, 1, 1, fixedBinary );
}
}
return replacedCode;
}

void fixIntelBinary( char *filename ) {

FILE *inputBinary;
FILE *fixedBinary;
unsigned char theChar;
bool editedCall;
bool skipWrite;
int lenRead;

printf("iccOut is currently fixing binary: %s\n\n", filename );

editedCall = false;
skipWrite = false;

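//build the output name in a local buffer; appending ".fixed" to
//filename in place would write past the end of the argv string
char fixedName[1024];
if ( strlen( filename ) + strlen( ".fixed" ) >= sizeof( fixedName ) ) {
printf("Filename is too long.\n");
return;
}
strcpy( fixedName, filename );
strcat( fixedName, ".fixed" );

//open files for reading and writing
inputBinary = fopen( filename, "rb" );
if ( ! inputBinary ) {
printf("Error opening input binary.\n");
return;
}

fixedBinary = fopen( fixedName, "wb" );
if ( ! fixedBinary ) {
printf("Error opening output file.\n");
fclose( inputBinary );
return;
}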

//start reading until we find what we want
fread( &theChar, 1, 1, inputBinary );
while (1) {
if ( !skipWrite ) {
//write last values
fwrite( &theChar, 1, 1, fixedBinary );
}
skipWrite = false;

//read next
lenRead = fread( &theChar, 1, 1, inputBinary );
if ( lenRead == 0) { //at end of file
break;
}

if ( ! editedCall ) {
//check if its the call XXX
if ( theChar == X86_CALL ) {
editedCall = processNextCall( inputBinary, fixedBinary );
skipWrite = true;

}
}
}

printf("iccOut has saved the day!\n");

//close files when finished
fclose( inputBinary );
fclose( fixedBinary );
}

bool fileExists( char *filename ) {

FILE *temp;
bool ret = false;

temp = fopen( filename, "r" );

if ( temp != 0 ) {
ret = true;
fclose( temp );
}
return ret;
}

int main( int argc, char **argv ) {

printf("\nWelcome to iccOut!\n\n");
printf("This will enable binaries compiled with -xN to run on "
"non-intel machines\n\n");

//verify parameters
if ( argc < 2 ) {
printUsage();
return 0;
}

//make sure file exists
if ( ! fileExists( argv[1] ) ) {
printf("File does not exist or is not accessible: %s\n", argv[1] );
return 0;
}

fixIntelBinary( argv[1] );
return 0;
}

Jeff

Feb 10, 2004, 1:21:00 AM
I will be the first person to admit that Intel is evil, I have spent a
year co-oping with them, and I know first hand how things are done
there. While this may seem somewhat sleazy, that is only half of it.
The other side of Intel is the side that likes everything to be
perfect. Odds are, a major reason for the Intel only part is that
Intel does not want to put their reputation on the line that code will
run better on an AMD chip that has not yet been released. Intel tests
everything, over and over again, and if something doesn't work right,
they fix it before they release it. Intel doesn't have that control
over AMD processors, and one of the optimizations might not work on an
AMD, which would make Intel look bad. Keep in mind, Intel isn't
likely to pass up a chance to make themselves look better than AMD,
but Intel also likes to ensure that their products work as well as
possible, especially after some of the times that they have been
burned.

iccou...@yahoo.com (iccOut) wrote in message news:<a13e403a.04020...@posting.google.com>...

Grumble

Feb 10, 2004, 4:56:52 AM
iccOut wrote:

> #define X86_CALL 232 // E8 in hex
> #define PUSH_EAX 80 // 50 in hex
> #define X86_NOP 144 // 90 in hex

I'm just wondering: if these three values make more sense to you in
hexadecimal than in decimal, then why not use hexadecimal notation?

#define X86_CALL 0xE8
#define PUSH_EAX 0x50
#define X86_NOP 0x90

Bernd Paysan

Feb 10, 2004, 6:10:50 AM
Jeff wrote:

> I will be the first person to admit that Intel is evil, I have spent a
> year co-oping with them, and I know first hand how things are done
> there. While this may seem somewhat sleazy, that is only half of it.
> The other side of Intel is the side that likes everything to be
> perfect. Odds are, a major reason for the Intel only part is that
> Intel does not want to put their reputation on the line that code will
> run better on an AMD chip that has not yet been released. Intel tests
> everything, over and over again, and if something doesn't work right,
> they fix it before they release it. Intel doesn't have that control
> over AMD processors, and one of the optimizations might not work on an
> AMD, which would make Intel look bad. Keep in mind, Intel isn't
> likely to pass up a chance to make themselves look better than AMD,
> but Intel also likes to ensure that their products work as well as
> possible, especially after some of the times that they have been
> burned.

Last c't (3/2004) also reported that the -Qx[PBN] switches generate a check
for the precise processor, but run fine when the CPUID test is patched out.
I do agree that Intel can't control AMD's chips, but this sort of test is
dangerous. Remember Microsoft, who did put a test for MS-DOS into Windows
3.1, to make sure that it won't run under DR-DOS? They had to pay 300
million to Caldera (who bought DR-DOS to litigate).

IMHO, it's ok to check for features (like SSE2), and stop if the used
features are not available, and it's perhaps ok to print a warning if the
program runs on a CPU it's not optimized for, i.e. if you say -QxP,
anything that's not a Prescott should trigger that warning. It's not ok to
check if it runs on a competing product, and refuse to work there. Not for
someone who has a "monopoly" (>70% market share).
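
[Editor's sketch] The policy proposed in the paragraph above (refuse only when a required feature is missing, warn when the CPU is not the one the code was tuned for, and never refuse on vendor alone) can be written down as a small decision function. All names here are hypothetical; this is an illustration of the proposal, not anything a shipping compiler runtime is known to do.

```cpp
#include <cassert>

enum class StartupAction { Run, RunWithWarning, Refuse };

// hasRequiredFeatures: every instruction-set extension the binary uses
//                      (for example SSE2) is reported by the CPU.
// isTunedTarget:       the CPU is the exact model the code was tuned for.
StartupAction decideStartup(bool hasRequiredFeatures, bool isTunedTarget) {
    if (!hasRequiredFeatures) {
        return StartupAction::Refuse;          // would crash on a missing instruction
    }
    return isTunedTarget ? StartupAction::Run
                         : StartupAction::RunWithWarning;
}

// Text a startup stub might print for each outcome (wording is made up).
const char* startupMessage(StartupAction action) {
    switch (action) {
        case StartupAction::Run:
            return "";
        case StartupAction::RunWithWarning:
            return "warning: this code is running on a CPU it was not optimized for";
        case StartupAction::Refuse:
            return "error: this CPU lacks an instruction set extension the program requires";
    }
    assert(false && "unhandled StartupAction");
    return "";
}
```

Note that the vendor string never enters the decision: a non-Intel CPU with the required features gets at worst a warning.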

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Peter Dickerson

Feb 10, 2004, 7:30:11 AM
"Bernd Paysan" <bernd....@gmx.de> wrote in message
news:qqqmf1-...@miriam.mikron.de...

Perhaps the features that Intel are checking for are SSE2 and full Intel
compatibility. Perhaps the way to find out is to wait for the next release
to see if the test for Intelness is much harder to identify and patch out,
or removed because AMD have been validated. I know which my money is on.

> anything that's not a Prescott should trigger that warning. It's not ok to
> check if it runs on a competing product, and refuse to work there. Not for
> someone who has a "monopoly" (>70% market share).
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself"
> http://www.jwdt.com/~paysan/

--
Peter
Peter.Dickerson (at) ukonline (dot) co (dot) uk


Jan de Vos

Feb 10, 2004, 8:37:48 AM
In comp.arch, Bernd Paysan wrote:
> IMHO, it's ok to check for features (like SSE2), and stop if the used
> features are not available, and it's perhaps ok to print a warning if the
> program runs on a CPU it's not optimized for, i.e. if you say -QxP,
> anything that's not a Prescott should trigger that warning. It's not ok to
> check if it runs on a competing product, and refuse to work there. Not for
> someone who has a "monopoly" (>70% market share).

Intel doesn't have a monopoly on compilers.


jdv

Igor Levicki

Feb 10, 2004, 11:28:18 AM
@iccOut:

First off, you could patch the function that does the check instead of
patching each call to it. That says a lot about your programming and
reverse engineering skills and logic.

Second, what is so sleazy about it? Why would they allow AMD to get
optimized code for their "advanced 8th generation" architecture for
free? They invested a considerable amount of time and money into
optimization research and the development of their compiler. On the
other side when Pentium 4 came out people spat on it because "it
needed optimizations to run fast" and liked Athlon because it was
faster without optimizations.

@Jeff:

What is so evil in protecting your own investment?

@Bernd

# it's perhaps ok to print a warning if the
# program runs on a CPU it's not optimized for

If the program is compiled for Prescott and run on Pentium 4 and it
uses PNI (or SSE3 if you like that name better) then the program would
crash as soon as it encounters Prescott instruction.

# It's not ok to check if it runs on a competing
# product, and refuse to work there

Why not??? It is _Intel_ compiler for God sake!!! Why should it
produce code for AMD or any other CPU for that matter at all? If you
buy Intel compiler you should not expect it to work for other CPUs
unless they are 100% Intel compatible (e.g. they paid a license fee
for instruction set).

@everyone:
If you want a compiler for AMD CPUs then go and ask AMD to make one. I
think that it is fair enough of Intel to allow generation of Pentium
3 and Pentium 4 code (SSE and SSE2) that works on Athlon XP and Athlon
64 CPUs. There is a standalone compiler that supports both Intel and AMD
-- Codeplay VectorC, so check it out. You have a choice not to use
the Intel compiler, and Intel has the _right_ not to support competing
products.

Bernd Paysan

Feb 10, 2004, 12:06:39 PM
Igor Levicki wrote:
> Why not??? It is _Intel_ compiler for God sake!!! Why should it
> produce code for AMD or any other CPU for that matter at all?

It does produce code for AMD or other x86-compatible CPUs. It just inserts
code that uses cpuid to check if this is actually an Intel CPU, and refuses
to run on other CPUs, *even though* it can run there without any problems!

This is not a matter of "support". Printing out a warning "This code is
running under a CPU which it is not optimized for" is perfectly ok, and
when the application runs slow or even produces wrong results: you have
been warned.

Why is the compiler trick a question? Intel doesn't have a monopoly on
compilers. This is ok for the open source world, where you can always use
another compiler if the result of some specific compiler isn't what you
want. This is not ok for the closed source world, where you have to use the
binary compiled with the compiler of choice from the ISV. The ISV may be
ignorant in one way or the other (he isn't aware of the problem/he doesn't
care about competing products to Intel).

Do we want Intel cloners to provide user-writable results to the cpuid
instruction? No. We want to use cpuid to check which CPU our program runs
on, we don't want anybody to fake it for any reason. A compiled program
that booboos at the user when it doesn't see "GenuineIntel" is such a
reason.

> If you
> buy Intel compiler you should not expect it to work for other CPUs
> unless they are 100% Intel compatible (e.g. they paid a license fee
> for instruction set).

Actually, AMD "paid" the license fee, i.e. they have a complete
cross-license agreement on x86 and extensions. AMD can use Intel's
instruction set, and Intel can (and will, according to recent news) use
AMD's instruction set. Does this make all your previous arguments moot?

And why is it so difficult to understand "fair play"? Why can't Intel just
produce better chips so that their code runs faster on their own chips, and
slower on competing chips without any dirty tricks?

BTW: AMD does support compiler development. They don't build their own
compiler, they just support compiler developers like the GCC team or the
Portland Group. The results look promising. I hope that everybody can use
those compilers on Intel processors when they finally release their CT
chips. On the other hand, I think it would be fair (in a tit-for-tat kind
of fairness) if those compilers would all emit cpuid code checking for
"AuthenticAMD", to force Intel to fake the result of cpuid in 64 bit mode,
too.

hack

Feb 10, 2004, 12:09:10 PM
In article <qqqmf1-...@miriam.mikron.de>,
Bernd Paysan <bernd....@gmx.de> wrote:

>IMHO, it's ok to check for features (like SSE2), and stop if the used
>features are not available, and it's perhaps ok to print a warning if the
>program runs on a CPU it's not optimized for, i.e. if you say -QxP,
>anything that's not a Prescott should trigger that warning. It's not ok to
>check if it runs on a competing product, and refuse to work there. Not for
>someone who has a "monopoly" (>70% market share).

[The context of the original question was slightly different: an optimisation
that appeared to give the same substantial benefit on both Intel and AMD chips
for a certain benchmark, but was controlled by an Intel-only check.]

Suppose that Intel can prove (from its detailed knowledge of the internals
of its own processors) that the optimisation is valid in all cases, but that,
based only on the public ISA specs, certain cases might arise where it would
be invalid. In that case doing the optimisation when it *might* fail would
be wrong. So it didn't fail in this case with a non-Intel processor, but
that's not evidence that it could not produce the wrong result in another
case.

Whether it is ethical or legal to take such advantage of "insider" knowledge
is a different question. But should one concede this point, would you rather
have the flag speed up some code at the risk of producing the wrong result on
a non-Intel processor? And what should be the ethical and legal position on
THAT?

Michel.

Christoph Breitkopf

Feb 10, 2004, 12:25:00 PM
Bernd Paysan <bernd....@gmx.de> writes:

> And why is it so difficult to understand "fair play"? Why can't Intel just
> produce better chips so that their code runs faster on their own chips, and
> slower on competing chips without any dirty tricks?

Even ignoring fair play, it might be good business sense to check
features instead of GenuineIntel. After all, even AMD used the Intel
compiler for their SPEC submissions, and for lots of code, it
is still the best optimizing compiler for the Athlon. Checking
for a GenuineIntel CPU devalues the compiler for people using,
or developing for, AMD systems.

OTOH, making money on compiler sales is probably not of
any importance to intel.

Regards,
Chris

CorpZ

Feb 10, 2004, 1:05:33 PM
If that was the only change to the code to make it optimized for Intel,
why wouldn't it work? All AMD 64-bit CPUs can use SSE2 (XPs could
only use SSE).

Jason Watkins

Feb 10, 2004, 1:25:04 PM
Just how optimized is -QxW? Is it "generic x86" as in 386 compatible,
or "generic 686"?

While I think the cpuid check is not a good thing, or at the least,
should be controlled by yet another compiler switch, these results
don't necessarily mean Intel is purely cheating. You may not be
seeing all the optimizations the Intel-specific mode enables in action,
and your 22% may be purely 386 vs 686 code differences. I suppose
it's also possible that the Intel-specific mode does have some
optimization that causes potential problems on hardware besides the
cpuids they check for.

Zak

Feb 10, 2004, 1:27:20 PM
Christoph Breitkopf wrote:

> Even ignoring fair play, it might be good business sense to check
> features instead of GenuineIntel. After all, even AMD used the Intel
> compiler for their SPEC submissions, and for lots of code, it
> is still the best optimizing compiler for the Athlon. Checking
> for a GenuineIntel CPU devalues the compiler for people using,
> or developing for, AMD systems.
>
> OTOH, making money on compiler sales is probably not of
> any importance to intel.

But this check prevents AMD from using the optimization flags in SPEC
and similar benchmarks, which may be all that matters here.

Or would it be allowed for AMD to come up with 'AMD CompilerShell' as a
compiler, which calls icc and does the required patching afterwards?


Thomas

Rupert Pigott

Feb 10, 2004, 1:31:13 PM
"hack" <ha...@watson.ibm.com> wrote in message
news:c0b37m$gr4$1...@news.btv.ibm.com...

> In article <qqqmf1-...@miriam.mikron.de>,
> Bernd Paysan <bernd....@gmx.de> wrote:
>
> >IMHO, it's ok to check for features (like SSE2), and stop if the used
> >features are not available, and it's perhaps ok to print a warning if the
> >program runs on a CPU it's not optimized for, i.e. if you say -QxP,
> >anything that's not a Prescott should trigger that warning. It's not ok to
> >check if it runs on a competing product, and refuse to work there. Not for
> >someone who has a "monopoly" (>70% market share).
>
> [The context of the original question was slightly different: an optimisation
> that appeared to give the same substantial benefit on both Intel and AMD chips
> for a certain benchmark, but was controlled by an Intel-only check.]
>
> Suppose that Intel can prove (from its detailed knowledge of the internals
> of its own processors) that the optimisation is valid in all cases, but that,
> based only on the public ISA specs, certain cases might arise where it would
> be invalid. In that case doing the optimisation when it *might* fail would

They could just say "These options may cause code to fail on
non Intel(r)(tm) processors" in the blurb. Hell, even have the
compiler issue a warning to that effect perhaps. Silently
generating break-on-execute code strikes me as *thoroughly*
broken regardless of the moral aspects.

Let's say that you don't test on all of the possible variations
of x86 out there (highly likely), and you get a call from a
user of your code saying "It won't run [because of the silent
code insertion]" ... I think I'd be *extremely* pissed off by
that kind of call, it could be a bastard to fix as well, even
if you do just ditch ICC and ship a binary compiled by a compiler
that doesn't pull stunts like that. This relates to the code-path
thing Nick has with IA-64.

It's exactly this kind of market protection/ass covering that
drives people towards Open Source and in my view makes it a
*necessity* for applications you really care about.

Cheers,
Rupert


Stephen Sprunk

Feb 10, 2004, 1:32:52 PM
"hack" <ha...@watson.ibm.com> wrote in message
news:c0b37m$gr4$1...@news.btv.ibm.com...
> Suppose that Intel can prove (from its detailed knowledge of the internals
> of its own processors) that the optimisation is valid in all cases, but that,
> based only on the public ISA specs, certain cases might arise where it would
> be invalid. In that case doing the optimisation when it *might* fail would
> be wrong. So it didn't fail in this case with a non-Intel processor, but
> that's not evidence that it could not produce the wrong result in another
> case.
>
> Whether it is ethical or legal to take such advantage of "insider" knowledge
> is a different question. But should one concede this point, would you rather
> have the flag speed up some code at the risk of producing the wrong result on
> a non-Intel processor? And what should be the ethical and legal position on
> THAT?

If there are ambiguities in the SSE2 spec that disallow certain
optimizations on legal implementations (and nobody has shown or even claimed
this is the case), the ethical thing to do is revise the extension
definition and create a new CPUID flag for compliant implementations.
Simply assuming that no other vendor can implement SSE2 with the same
guarantees as Intel is downright sleazy and smacks of marketing involvement
rather than technical reasons.

At a minimum, there should be a flag to at least _allow_ -QxN code to run on
non-Intel chips so that software vendors can test other processors and make
the decision themselves. The ideal solution is for Intel to add flags to
optimize for non-Intel chips, or at least allow -QxN to work on non-Intel
chips they _have_ validated (if there's a true technical problem), but I
think it's safe to count that out in the near future.

I believe Intel's compiler folks truly want to produce the best compiler
possible for _all_ x86 chips because that's what would get their particular
division the most revenue and acclaim. If gcc's performance exceeded icc's
on non-Intel chips by using the optimizations in question, I think we'd find
icc suddenly allowing the optimization on non-Intel chips as well. However,
since the FSF places a higher priority on gcc's freedom and portability than
on raw performance, I don't know if/when that day may come.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin


Robert Klute

Feb 10, 2004, 1:44:38 PM
On 10 Feb 2004 10:25:04 -0800, jason_...@pobox.com (Jason Watkins)
wrote:

>Just how optimized is -QxW? Is it "generic x86" as in 386 compatible,
>or "generic 686"?

One question to ask is if the compiler automatically inserts the check
when -QxW is used, or only when 'Intel'-specific optimizations are
inserted.

Hank Oredson

Feb 10, 2004, 2:01:27 PM

"Zak" <sp...@jutezak.invalid> wrote in message
news:co9Wb.3678$O41.96116@amstwist00...

> Christoph Breitkopf wrote:
>
> > Even ignoring fair play, it might be good business sense to check
> > features instead of GenuineIntel. After all, even AMD used the Intel
> > compiler for their SPEC submissions, and for lots of code, it
> > is still the best optimizing compiler for the Athlon. Checking
> > for a GenuineIntel CPU devalues the compiler for people using,
> > or developing for, AMD systems.
> >
> > OTOH, making money on compiler sales is probably not of
> > any importance to intel.
>
> But this check prevents AMD from using the optimization flags in SPEC
> and similar benchmarks, which may be all that matters here.

Huh what?
AMD is free to use any compiler they choose.
If they choose a compiler created by a competitor, that is their lookout.

> Or would it be allowed for AMD to come up with 'AMD CompilerShell' as a
> compiler, which calls icc and does the required patching afterwards?

Perhaps this is simply too obvious?
There is nothing I can think of that stops AMD from creating
(or paying some other company to create) their own compiler.

--

... Hank

Hank: http://horedson.home.att.net
W0RLI: http://w0rli.home.att.net


Agrabob

Feb 10, 2004, 3:45:36 PM
Bernd Paysan <bernd....@gmx.de> wrote in message news:<qqqmf1-...@miriam.mikron.de>...
> Jeff wrote:
>
> > I will be the first person to admit that Intel is evil, I have spent a
> > year co-oping with them, and I know first hand how things are done
> > there. While this may seem somewhat sleazy, that is only half of it.
> > The other side of Intel is the side that likes everything to be
> > perfect. Odds are, a major reason for the Intel only part is that
> > Intel does not want to put their reputation on the line that code will
> > run better on an AMD chip that has not yet been released. Intel tests
> > everything, over and over again, and if something doesn't work right,
> > they fix it before they release it. Intel doesn't have that control
> > over AMD processors, and one of the optimizations might not work on an
> > AMD, which would make Intel look bad. Keep in mind, Intel isn't
> > likely to pass up a chance to make themselves look better than AMD,
> > but Intel also likes to ensure that their products work as well as
> > possible, especially after some of the times that they have been
> > burned.
>


Just looked over this:
http://www.intel.com/software/products/compilers/cwin/sysreq.htm

Here's a snippet:
"
Minimum Hardware Requirements to Develop IA-32 Applications
A system based on a 450 MHz Intel® Pentium® II processor or greater,
Intel Pentium 4 recommended

...

Minimum Hardware Requirements to Develop Itanium®-based Applications
on an IA-32 System
A system with a 450 MHz Intel® Pentium® II processor or greater
(Pentium 4 recommended)

...

Minimum Hardware Requirements to Develop Itanium-based Applications on
an Itanium-based System
A system with an Intel Itanium processor or greater (Itanium 2
recommended)
"
(end of quote)

Obviously I did omit parts to shorten the message (sorry about the
length), but it is also obvious that Intel makes no mention that this
product will work on an AMD proc and explicitly requires that you have
an Intel CPU that falls under one of these three categories.

Maybe, I am missing something as far as legality goes, but it would
seem to me that if Intel doesn't claim that an AMD cpu will work with
the compiler, they have nothing to worry about.

They are most likely doing one or both of two things here,
making sure no "Intel Optimized" code will run on an AMD CPU for:
1) Compatibility. If a developer releases an app that contains Intel
optimized code, but it fails to run correctly on an AMD CPU and they have
sold millions of copies of it, they're screwed. And they are going to
blame Intel. If you have to "hack" the compiler (iccOut utility) to get
the code to work, then Intel doesn't have to take any blame when it
blows up on you (if it ever will).
2) Competition. If they can get away with not compiling SIMD
instructions for competing CPUs, then all commercial apps with Intel
optimizations will make Intel CPUs more appealing to the consumer.

Someone analyze my thinking on this. On the surface it seems like I
have caught all the legal angles, but I feel I am missing something (I'm
not a lawyer, people :P).

iccOut

Feb 10, 2004, 6:00:11 PM
@ hack:

I think that the particular optimization they are doing, at least for
this benchmark, does not involve any special trickery with the way
instructions work. It doesn't even rely on MMX/SSE/SSE2/SSE3. This
particular mcf optimization appears to be solely re-arranging fields in
a struct, which is clearly not Intel-specific, and any processor (Intel
or AMD) should be able to take advantage of it. It wouldn't surprise
me to learn of more cases like this one where it appears Intel is
trying to handicap AMD's performance on SPEC. It's possible that there
are programs that, when compiled with the -QxN flag, will generate
code that will not work on AMD processors, but I've yet to encounter
one.
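
[Editor's sketch] The post does not say what the rearrangement in mcf actually looks like, so the following is only a generic illustration of why field order in a struct can matter on any processor. The struct and its field names are hypothetical. Placing the widest field first removes alignment padding, so more elements fit in each cache line; nothing about this depends on the CPU vendor.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record, in the order a programmer might declare it.
// The compiler must pad after each char so that cost stays aligned.
struct NodeAsDeclared {
    char   visited;
    double cost;
    char   mark;
};

// Same fields with the widest one first; the two chars share the tail.
struct NodeReordered {
    double cost;
    char   visited;
    char   mark;
};

// Bytes saved per element by the reordering on the current ABI.
std::size_t bytesSavedPerNode() {
    assert(sizeof(NodeReordered) <= sizeof(NodeAsDeclared));
    return sizeof(NodeAsDeclared) - sizeof(NodeReordered);
}
```

On a typical 64-bit x86 ABI the two structs come out at 24 and 16 bytes, but the exact figures depend on the platform's alignment rules.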

@ igor levicki:

Patching the routine that does the check is another alternative.
However, there is only ever one call to proc_init_N and thus only one
call to patch anyway. Simply removing the call is easier than going
through the routine that does checking since x86 lets you have
instructions of crazy lengths and you have to be careful to keep all
the offsets and lengths the same.
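
A rough sketch of what "removing the call" could look like on a copy of
the binary, assuming the call is encoded as a 5-byte near "call rel32"
(opcode E8) and that nothing afterwards depends on what the routine
returns or sets up. The function name and the call_off parameter are
made up for illustration; the real offset has to be found with a
disassembler, and this is not the actual iccOut code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { X86_CALL_REL32 = 0xE8, X86_NOP = 0x90, CALL_REL32_LEN = 5 };

/* Overwrite the 5-byte "call rel32" at image[call_off] with NOPs so that
 * all following offsets and lengths stay the same.
 * Returns 0 on success, -1 if the offset is out of range or the byte at
 * that offset is not an E8 call opcode.
 * call_off is a hypothetical file offset, not a value from the thread. */
int nop_out_call(uint8_t *image, size_t image_len, size_t call_off)
{
    assert(image != NULL);
    if (call_off > image_len || image_len - call_off < CALL_REL32_LEN)
        return -1;
    if (image[call_off] != X86_CALL_REL32)
        return -1;
    memset(image + call_off, X86_NOP, CALL_REL32_LEN);
    return 0;
}
```

Replacing the call with the same number of single-byte NOPs is what keeps
the rest of the code where it was, which is the whole attraction over
editing the checking routine itself.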

@ Christian Brietkopf:

You're correct, AMD does use the Intel compiler for SPEC submissions
and while it does do a fair amount of optimization, there are cases
such as this one where completely general optimizations will only
occur with the -QxN flag even though they're clearly not
intel-specific.

@ Robert Klute:

The compiler will not insert these checks when compiled with -QxW.
However, it will also not perform anywhere near the same level of
optimization as -QxN. Running the -QxW binaries vs the -QxN binaries
on an AMD machine shows a 22% performance difference, which is not
insignificant.

Benjamin Goldsteen

unread,
Feb 10, 2004, 8:33:38 PM2/10/04
to
"Stephen Sprunk" <ste...@sprunk.org> wrote in message

> At a minimum, there should be a flag to at least _allow_ -QxN code to run on
> non-Intel chips so that software vendors can test other processors and make
> the decision themselves. The ideal solution is for Intel to add flags to
> optimize for non-Intel chips, or at least allow -QxN to work on non-Intel
> chips they _have_ validated (if there's a true technical problem), but I
> think it's safe to count that out in the near future.

Intel gives away their compiler for free to certain populations (e.g.
.edu). Why should Intel spend money to develop software that will be
given away to people who plan to use the software on non-Intel
processors?

Isn't this the same as the GPL license prohibiting the use of GPL'd
software components in non-GPL licensed software? GPL people are
always concerned that someone will make use of their IP for profit
without giving anything back to the GPL community. Similarly, Intel
doesn't care much for its IP being used by a competitor.

I don't think it is out-of-line for Intel to make their compiler a)
only compile on Intel processors and b) generate executables that only
run on Intel (or IA32-licensed) processors at high optimization. It
would be difficult to depend on the compiler if the generated code
didn't run on non-Intel processors at any optimization. One could
never use it to distribute binaries. Or maybe Intel could charge $500
for the Intel-only compiler and $1500 for the any-PC compiler.
Whether or not such restrictions are part of a good long-term strategy
is a different question. However, if you don't like it, you can
always use GNU C or the Portland Group compilers.

P.S. This e-mail address is not active. Do not reply directly to
sender.

Stephen Sprunk

unread,
Feb 10, 2004, 8:26:30 PM2/10/04
to
"Agrabob" <mtl...@sbcglobal.net> wrote in message
news:d53ddc33.04021...@posting.google.com...

> Maybe, I am missing something as far as legality goes, but it would
> seem to me that if Intel doesn't claim that an AMD cpu will work with
> the compiler, they have nothing to worry about.
> ...

> Someone analyze my thinking on this. On the surface it seems like I
> have caught all the legal angles, but I feel I am missing something(im
> not a lawyer people :P).

Well, I'm not a lawyer either, but I can't see anything _illegal_ about
Intel's behavior here. Excluding unlikely (IMHO) anti-trust considerations,
Intel is free to sell whatever products they want with whatever features
they want, and it's industry practice to disclaim that the product will do
even what they claim it will do.

The main complaint, from me and others, is that it's unethical and/or sleazy
(but quite legal) to disable valid SSE2 optimizations simply because the
generated code happens to be running on a competitor's CPU. Does icc
generate MMX and SSE1 code that runs on non-Intel CPUs? If so, their
behavior is not only sleazy, it's not even self-consistent. If not, we're a
few years late in flaming them -- but it's still sleazy.

Stephen Sprunk

unread,
Feb 10, 2004, 8:58:12 PM2/10/04
to
"Hank Oredson" <hore...@att.net> wrote in message
news:bU9Wb.3322$hR.1...@bgtnsc05-news.ops.worldnet.att.net...

> "Zak" <sp...@jutezak.invalid> wrote in message
> news:co9Wb.3678$O41.96116@amstwist00...
> > Or would it be allowed for AMD to come up with 'AMD CompilerShell'
> > as a compiler, which calls icc and does the required patching
> > afterwards?
>
> Perhaps this is simply too obvious?

If it's not standard practice yet, it probably will be soon. If I can patch
an opcode or two in my binaries and get a 22% speed bump, what reasons do I
have _not_ to do it? In fact, it might even be worth the larger hassle of
patching icc to not emit the check in the first place. It's not like
software warranties are worth the bits they're printed on...

> There is nothing I can think of that stops AMD from creating
> (or paying some other company to create) their own compiler.

AMD funds a lot of work on gcc and the Portland Group's compiler, but
neither of those is competitive performance-wise with icc at this point or
AMD would be using one of them for SPEC. IIRC, Intel funds work on gcc
also, even though icc (almost?) always produces better code.

Stephen Sprunk

unread,
Feb 10, 2004, 8:58:13 PM2/10/04
to
"iccOut" <iccou...@yahoo.com> wrote in message
news:a13e403a.04021...@posting.google.com...

> I think that the particular optimization they are doing, at least for
> this benchmark, does not involve any special trickery with the way
> instructions work. It doesn't even rely on MMX/SSE/SSE2/SSE3, this
> particular mcf optimization appears to be solely re-arranging fields in
> a struct, which is clearly not intel-specific and any processor (intel
> or amd) should be able to take advantage of this.

Well, aside from the obvious fact that icc -QxN is breaking the C spec by
rearranging the contents of a struct, it sounds like we should be yelling at
SPEC to improve their source code instead of berating Intel for sleazy
behavior.

> Patching the routine that does the check is another alternative.
> However, there is only ever one call to proc_init_N and thus only one
> call to patch anyway. Simply removing the call is easier than going
> through the routine that does checking since x86 lets you have
> instructions of crazy lengths and you have to be careful to keep all
> the offsets and lengths the same.

Couldn't you just alter the entry point of proc_init_N to clean up its stack
and return immediately? The remainder of the function would be dead code so
you don't have to worry about maintaining proper x86 decoding.
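
As a sketch of that idea, assuming proc_init_N takes no stack arguments
that it is expected to pop itself (otherwise a "ret imm16", opcode C2,
would be needed) and that callers ignore its result: at the entry point
no prologue has run yet, so a single near RET (opcode C3) hands control
straight back. The function name and entry_off are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { X86_RET_NEAR = 0xC3 };

/* Turn the function that starts at image[entry_off] into an immediate
 * return by writing a one-byte near RET at its entry point. The rest of
 * the function body becomes unreachable and can be left as it is.
 * Returns 0 on success, -1 if entry_off is outside the image.
 * entry_off is a hypothetical offset that must be located by hand. */
int stub_out_function(uint8_t *image, size_t image_len, size_t entry_off)
{
    assert(image != NULL);
    if (entry_off >= image_len)
        return -1;
    image[entry_off] = X86_RET_NEAR;
    return 0;
}
```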

Ivan

unread,
Feb 10, 2004, 9:09:31 PM2/10/04
to
I have played with the Intel compiler 8.0 and AMD
CPU's (just XP and MP, not FX-51) on Linux myself.
Because the AMD XP/MP does not support sse2,
I used the -xK switch (optimize for PIII or later)
and the executables run typically 10-15% faster.
On my P4 there is absolutely no difference between
-xK and -xN.
Unfortunately, one sometimes gets segmentation faults in
vectorized loops that contain calls to log => one has
to prevent vectorization of such loops if the code
is to be able to run on Athlons.
In addition, Fortran I/O sometimes fails
(especially when rewinding open files) =>
one has to compile such functions separately without
the -xK switch.

It is interesting that, once the above problems are solved,
the Intel compiler optimizes
better for the Athlons than for a P4 (well, at least
when compared to the GNU compiler). In one of my applications,
the code runs about 10% faster on an XP 1800+ compared
to a 2 GHz P4 when compiled with GCC, but 25% faster when
compiled with Intel 8.0! On an absolute scale the
Intel executables are ~15% faster than GCC 3.3.1
but only 3-4% faster when compared to GCC 3.4

Seongbae Park

unread,
Feb 11, 2004, 3:48:05 AM2/11/04
to
Stephen Sprunk wrote:
...

> Well, aside from the obvious fact that icc -QxN is breaking the C spec by
> rearranging the contents of a struct,

Simply rearranging struct fields doesn't violate the C standard.
As long as the user code cannot tell the difference,
it's standard conforming.
I don't know whether Intel's doing it correctly or not
in this particular case though.

> it sounds like we should be yelling at
> SPEC to improve their source code instead of berating Intel for sleazy
> behavior.

I wonder what your rationale for this yelling is.
How many C/C++ programmers do you know who pay
any attention to the order of struct/class fields
for the purpose of improving performance?
I don't know any, except a few compiler writers and performance analysts.

Also, rearranging struct fields often reduces the readability of the code
and the optimal arrangement is often dependent
on the particular machine features such as cache line size.
So doing it manually is not always desirable.

If the compiler can do it properly, I'd say it's a good thing.
Of course, it's not trivial to do properly.

Seongbae

Ketil Malde

unread,
Feb 11, 2004, 4:18:02 AM2/11/04
to
b...@inka.mssm.edu (Benjamin Goldsteen) writes:

> Intel gives away their compiler for free to certain populations (e.g.
> .edu). Why should Intel spend money to develop software that will be
> given away to people who plan to use the software on non-Intel
> processors?

I've no opinion on why they give it away. But to me it looks like
Intel has the best compiler out there, but they would really like to
have the fastest processor instead. So they try to hamper the use of
their compiler on competing processors to make it harder to take
full advantage of them.

> Isn't this the same as the GPL license prohibiting the use of GPL'd
> software components in non-GPL licensed software?

I don't think so. You can run GPL software together with proprietary
software. You have something similar in the Linux kernel, where
loading a proprietary (binary only) module will "taint" the kernel.
However, while it will print a warning and possibly make it harder to
get support, it won't stop you from running it.

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants

Jan C. Vorbrüggen

unread,
Feb 11, 2004, 5:28:26 AM2/11/04
to
> Or would it be allowed for AMD to come up with 'AMD CompilerShell' as a
> compiler, which calls icc and does the required patching afterwards?

If I remember the run rules correctly, you are allowed to cross-compile
- i.e., run the build phase on an Intel system and run the benchmark on
an AMD system.

Jan

Jan C. Vorbrüggen

unread,
Feb 11, 2004, 5:31:06 AM2/11/04
to
> Also, rearranging struct fields often reduces the readability of the code
> and the optimal arrangement is often dependent
> on the particular machine features such as cache line size.

I can't remember a system where following the general rule "Start with
the largest (in bytes/words) items and end with the smallest ones" wouldn't
fit well. Exceptions would be compilers that do not properly align the
struct starting address - but those are broken (at least in the sense of
quality-of-implementation) anyway.
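
To illustrate the rule, here is a small sketch with two made-up structs
holding the same members. The exact sizes depend on the ABI; on a typical
x86-64 ABI the declared-order version is 24 bytes and the largest-first
version is 16, but treat those numbers as an assumption about the target.

```c
#include <stddef.h>

/* Same four members, two orders. Struct and member names are made up. */
struct declared_order { char tag; double weight; char flag; int count; };
struct largest_first  { double weight; int count; char tag; char flag; };

/* Payload bytes common to both layouts. */
#define PAYLOAD_BYTES (2 * sizeof(char) + sizeof(double) + sizeof(int))

/* Bytes lost to padding in each layout on the current target. */
size_t padding_declared_order(void)
{
    return sizeof(struct declared_order) - PAYLOAD_BYTES;
}

size_t padding_largest_first(void)
{
    return sizeof(struct largest_first) - PAYLOAD_BYTES;
}
```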

Jan

Benny Amorsen

unread,
Feb 11, 2004, 6:17:58 AM2/11/04
to
>>>>> "BG" == Benjamin Goldsteen <b...@inka.mssm.edu> writes:

BG> Isn't this the same as the GPL license prohibiting the use of
BG> GPL'd software components in non-GPL licensed software?

There is no such prohibition. You cannot distribute the combined work
of course, but it is perfectly legal to /use/ GPL'd software
components any way you want.


/Benny

Ken Hagan

unread,
Feb 11, 2004, 6:24:22 AM2/11/04
to
Jan C. Vorbrüggen wrote:
>
> I can't remember a system where following the general rule "Start
> with the largest (in bytes/words) items and end with the smallest
> ones" wouldn't fit well.

Different systems may typedef things to different sizes.

Still, more of a problem in practice is the fact that programmers
don't normally trawl through headers mentally expanding typedefs.

'Tis a pity C has no way of saying "I don't care how this struct
is laid out. I'm not going to bang on it using memcpy().".


Ricardo Bugalho

unread,
Feb 11, 2004, 7:07:35 AM2/11/04
to

Depends on your definition of component.
But if a given piece of software depends on GPL'd software to run (for
example, a library) then that software must also be GPL. That's why lots of
libraries have LGPL or BSD licences.

--
Ricardo

Bernd Paysan

unread,
Feb 11, 2004, 8:14:21 AM2/11/04
to
Stephen Sprunk wrote:
> AMD funds a lot of work on gcc and the Portland Group's compiler, but
> neither of those is competitive performance-wise with icc at this point or
> AMD would be using one of them for SPEC.

The Portland Group Fortran compiler is competitive with icc. According to c't
3/2004, SPECfp2000 base under Linux is 1169 with PGI 5.1/GCC 3.3.2, and
1159 with icc V8 on an Athlon 64 3400+. GCC is faster than PGI on C, so
that's used there. Looking at the SPECint2000 base values also shows almost
competitive performance from GCC 3.3.2: 1164 vs. 1240 from icc; GCC takes
advantage of the 64 bit mode. Note that on Pentium 4 3.2GHz, icc runs
significantly faster (more than 10%): 1127 vs. 976 from GCC. c't used a
patched icc (no cpuid="GenuineIntel" check). What I don't understand is the
big difference between Windows SPECint2000 base (1406 with icc) and the
Linux result (on the Pentium 4, the difference is less).
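
For reference, a minimal sketch of what a cpuid="GenuineIntel" test
amounts to. CPUID leaf 0 returns the 12-character vendor string in EBX,
EDX and ECX, in that order. This is only an illustration of the
mechanism, not Intel's actual routine; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the vendor string from the registers returned by CPUID leaf 0.
 * Each register holds four ASCII characters, lowest byte first, and the
 * register order is EBX, EDX, ECX. */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    assert(out != NULL);
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Returns 1 if the registers spell "GenuineIntel", 0 otherwise. */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    vendor_from_regs(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

With GCC or Clang on x86, the register values can be obtained with
__get_cpuid(0, &eax, &ebx, &ecx, &edx) from <cpuid.h>.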

Jan C. Vorbrüggen

unread,
Feb 11, 2004, 10:44:30 AM2/11/04
to
> Different systems may typedef things to different sizes.

Sure - but are the _relative_ sizes likely to change? That is, do you have
a sequence that on one system yields, say, 8-4-4-2-2 bytes and on another
system results in 4-8-2-4-2 bytes?

Jan

Terje Mathisen

unread,
Feb 11, 2004, 11:32:30 AM2/11/04
to

I've seen a lot of systems like that:

All x86 machines, all the way back to the original 8088, handle
structure offsets within +127 bytes of the base pointer better than
stuff that's further away.

If some parts of the structure are likely to be accessed together, then
they should also be aligned in such a way that they are likely to end up
in the same cache line, even if that could lead to an occasional
non-optimal (size-wise) packing.

However, as long as the two items above don't matter, using your
'largest first' rule does work pretty well.
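
A small sketch of the first point: x86 can encode a member offset as a
one-byte signed displacement when it fits in -128..127, and needs a wider
displacement otherwise, so it pays to keep the hot members near the
start. The struct and its member names are invented for the example.

```c
#include <stddef.h>

/* Hypothetical struct: frequently used members first, bulky and rarely
 * used data at the end. */
struct conn {
    int   state;
    int   refcount;
    void *next;
    char  name[256];
    int   rarely_used;
};

/* True if a non-negative member offset fits the short (8-bit signed)
 * displacement form of an x86 memory operand. */
int fits_disp8(size_t offset)
{
    return offset <= 127;
}
```

Here offsetof(struct conn, next) fits the short form, while
offsetof(struct conn, rarely_used) lands past the 256-byte array and
does not.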

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Zak

unread,
Feb 11, 2004, 1:05:12 PM2/11/04
to
Ricardo Bugalho wrote:

> Depends on your definition of component.
> But if a given software depends on a GPL'd software to run (for example,
> a library) then that software must also be GPL. That's why lots of
> libraries have LGPL or BSD licences.

Only when you want to distribute 'given software' with the GPL'd
software, and ISTR you can even distribute them together as a single
working setup if the parts are separate components that can in theory be
exchanged. Shared libraries do not count as such for GPL, but things
that run in pipes or use tmp files can.

This for example allows someone to distribute Linux with closed source
software.


Thomas

Seongbae Park

unread,
Feb 11, 2004, 12:52:52 PM2/11/04
to
In article <402A046A...@mediasec.de>, Jan C Vorbrüggen wrote:
>> Also, rearranging struct fields often reduces the readability of the code
>> and the optimal arrangement is often dependent
>> on the particular machine features such as cache line size.

"optimal" here meant for performance, not space.

> I can't remember a system where following the general rule "Start with
> the largest (in bytes/words) items and end with the smallest ones" wouldn't
> fit well. Exceptions would be compilers that do not properly align the
> struct starting address - but those are broken (at least in the sense of
> quality-of-implementation) anyway.
>
> Jan

That rule would work fine for space, and to some degree for performance,
but it leaves too much on the table, especially in large structs with many
pointers (which are all the same size, as we all know) and even in
relatively small structs. Having the frequently accessed pointers of a
struct in the same L1 cache line can make a huge difference.

And my guess is Intel's mcf optimization is mainly to
improve the cache hit rate - we (Sun) already published a paper about this:

www.sc-conference.org/sc2003/paperpdfs/pap182.pdf
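
A toy illustration of the cache-line point, not taken from the paper or
from mcf: two hot pointers separated by a cold array end up in different
lines, while moving them next to each other puts them in the same one.
The 64-byte line size, the struct names and the assumption that the
struct starts on a line boundary are all made up for the example.

```c
#include <stddef.h>

#define LINE_SIZE 64   /* assumed L1 line size in bytes */

/* Hypothetical node: parent and child are hot, label is cold. */
struct node_before {
    struct node_before *parent;
    char                label[120];
    struct node_before *child;
};

/* Same members with the two hot pointers placed together. */
struct node_after {
    struct node_after *parent;
    struct node_after *child;
    char               label[120];
};

/* True if two member offsets fall in the same cache line, assuming the
 * struct itself starts on a line boundary. */
int same_line(size_t off_a, size_t off_b)
{
    return off_a / LINE_SIZE == off_b / LINE_SIZE;
}
```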

Seongbae

Sheldon Simms

unread,
Feb 11, 2004, 1:50:46 PM2/11/04
to
On Wed, 11 Feb 2004 08:48:05 +0000, Seongbae Park wrote:

> Stephen Sprunk wrote:
> ...
>> Well, aside from the obvious fact that icc -QxN is breaking the C spec by
>> rearranging the contents of a struct,
>
> Simply rearranging struct fields doesn't violate C standard.
> As long as the user code can not tell the difference,
> it's standard conforming.
> I don't know whether Intel's doing it correctly or not
> in this particular case though.

C99 6.7.2.1:

13 Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared.
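
A short sketch of what that paragraph guarantees to a program that
actually looks: once code compares member offsets, the declared order is
observable and a conforming compiler has to honour it. The struct is
invented for the example.

```c
#include <stddef.h>

/* Hypothetical struct used only to inspect member offsets. */
struct sample { char a; int b; short c; };

/* Returns 1 if the member offsets increase in declaration order, which
 * is what C99 6.7.2.1 requires for any program that observes them. */
int members_in_declared_order(void)
{
    return offsetof(struct sample, a) < offsetof(struct sample, b)
        && offsetof(struct sample, b) < offsetof(struct sample, c);
}
```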


Seongbae Park

unread,
Feb 11, 2004, 1:58:27 PM2/11/04
to

Yes. But if there's no user code that takes the address of a struct field
or relies on such, this is irrelevant.

Seongbae

Rupert Pigott

unread,
Feb 11, 2004, 2:46:39 PM2/11/04
to
"Seongbae Park" <Seongb...@sun.com> wrote in message
news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...

However it would "violate C standard", regardless. How
about folks who poke around core dumps ? I imagine they
might notice...


Cheers,
Rupert


David Gay

unread,
Feb 11, 2004, 3:03:55 PM2/11/04
to

Probably not, except if there's some piece of the standard that states
"addresses must be machine-level addresses" (in other words, an address
which is not observed is effectively undefined as you haven't had to define
its mapping to real addresses).

> How about folks who poke around core dumps ? I imagine they might
> notice...

Indeed. I'd expect problems with separately-compiled programs and writing
structs to files/sockets, too (so taking the address of a struct foo, using
struct foo *, etc should disable any field-reordering optimisation for
struct foo).
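
A sketch of the file/socket case, with invented names: once the raw
bytes of a struct leave the program, any reader that assumes the declared
layout depends on it, so the compiler can no longer reorder the fields
unnoticed. KIND_OFFSET stands for what a separately compiled reader
would expect from the declaration order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk / on-wire record. */
struct record { uint32_t id; uint16_t kind; uint16_t flags; };

/* Offset a separately compiled reader expects for "kind": right after
 * the 4-byte id, as the declaration order implies. */
enum { KIND_OFFSET = 4 };

/* Copy the raw bytes of a record, as fwrite() or send() would. */
void serialize(const struct record *r, unsigned char out[sizeof(struct record)])
{
    memcpy(out, r, sizeof *r);
}

/* Reader that knows only the declared layout, not the writer's compiler. */
uint16_t read_kind(const unsigned char *bytes)
{
    uint16_t kind;
    memcpy(&kind, bytes + KIND_OFFSET, sizeof kind);
    return kind;
}
```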

--
David Gay, not speaking for Intel
dg...@acm.org

Benjamin Goldsteen

unread,
Feb 11, 2004, 4:00:41 PM2/11/04
to
Ketil Malde <ke...@ii.uib.no> wrote in message news:<egptcmt...@havengel.ii.uib.no>...

> b...@inka.mssm.edu (Benjamin Goldsteen) writes:
>
> > Intel gives away their compiler for free to certain populations (e.g.
> > .edu). Why should Intel spend money to develop software that will be
> > given away to people who plan to use the software on non-Intel
> > processors?
>
> I've no opinion on why they give it away. But to me it looks like
> Intel has the best compiler out there, but they would really like to
> have the fastest processor instead. So they try to hamper the use of
> their compiler on competing processors to make it harder to take
> full advantage of them.

In the modern world, processors and compilers are built around each
other. It is meaningless to say that I have the fastest processor but
there is no compiler to take advantage of it. If AMD doesn't have a
compiler that demonstrates that their processor is the fastest then
they don't have the fastest processor.

I don't think Intel is under any obligation to provide their compilers
for non-Intel processors. I think it is fair to say that Intel didn't
develop their compiler because they wanted to sell compilers. They
developed their compiler because they wanted to show off their
processors.

In the HPC world, we test Sun with Sun Forte (Sun One?) compilers
against IBM with IBM XL compilers against SGI with MIPSpro compilers
against ... It's not important that Sun would do better if only they
had IBM's compiler technology. In order to compete in this world,
Intel needed a compiler that demonstrated the performance of their
processor. Saying that their processor was the fastest if only there
was a good compiler for it wasn't good enough. Similarly, AMD can't
claim their processor is fastest unless they have a compiler for it.

AMD either needs to a) license Intel's compiler, b) develop their own
compiler, c) improve GNU's optimization, or d) depend on a
3rd-party like the Portland Group. Depending on a non-cooperative
competitor doesn't cut it.

> > Isn't this the same as the GPL license prohibiting the use of GPL'd
> > software components in non-GPL licensed software?
>
> I don't think so. You can run GPL software together with proprietary
> software. You have something similar in the Linux kernel, where
> loading a proprietary (binary only) module will "taint" the kernel.
> However, while it will print a warning and possibly make it harder to
> get support, it won't stop you from running it.


I think you missed my point. It's about IP. GPL people are sensitive
to their IP being used for non-GPL projects. Intel doesn't want their
IP being used for non-Intel projects. A BSD-type license says "we
want to get our technology out there and we don't care if someone
else makes the money on it". People who use GPL-type licenses care
about how their IP is used. Their restrictions are different but they
still care. So does Intel. I'm glad everyone has that choice.

Stephen Sprunk

unread,
Feb 11, 2004, 4:00:37 PM2/11/04
to
"Seongbae Park" <Seongb...@sun.com> wrote in message
news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...

As the saying goes, if a tree falls in the forest and nobody is around to
hear it, does it really make a noise?

The C spec (thanks for the citation, Sheldon) says you must lay out struct
members in the order they're declared. If Intel's compiler doesn't do that,
it's not compliant, period.

Unless icc is incredibly smart about detecting you taking the address of a
struct, using it in a union, aliasing it via questionable pointer math,
passing it as a C++ reference argument, etc. there's a chance it can
generate incorrect code. The C standard has enough ambiguities that cause
portability problems; we don't need people intentionally introducing new
problems with the unambiguous parts.

John F. Carr

unread,
Feb 11, 2004, 4:23:13 PM2/11/04
to
In article <a13e403a.04020...@posting.google.com>,
iccOut <iccou...@yahoo.com> wrote:
>I started mucking around with a disassembly of the Intel-specific
>binary and found one particular call (proc_init_N) that appeared to be
>performing this check. As far as I can tell, this call is supposed to
>verify that the CPU supports SSE and SSE2 and it checks the CPUID to
>ensure that it's an Intel processor.

I remember the discussion when Intel announced CPUID and implicitly
announced the ability to make programs not run on clone x86 processors.
I still like the solution I posted back then:

<http://groups.google.com/groups?threadm=1pdqj0INN2h1%40senator-bedfellow.MIT.EDU>


--
John Carr (j...@mit.edu)

Peter Boyle

unread,
Feb 11, 2004, 4:47:34 PM2/11/04
to

On Wed, 11 Feb 2004, John F. Carr wrote:

> I remember the discussion when Intel announced CPUID and implicitly
> announced the ability to make programs not run on clone x86 processors.
> I still like the solution I posted back then:
>
> <http://groups.google.com/groups?threadm=1pdqj0INN2h1%40senator-bedfellow.MIT.EDU>

Cute!
Peter

>
> --
> John Carr (j...@mit.edu)
>

Peter Boyle pbo...@physics.gla.ac.uk


Stephen Clarke

unread,
Feb 11, 2004, 5:14:56 PM2/11/04
to
"Stephen Sprunk" <ste...@sprunk.org> wrote in message
news:6d6f33a965891137...@news.teranews.com...

> "Seongbae Park" <Seongb...@sun.com> wrote in message
> news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...
> > In article <pan.2004.02.11....@yahoo.com>, Sheldon Simms wrote:
> > > On Wed, 11 Feb 2004 08:48:05 +0000, Seongbae Park wrote:
> > >
> > >> Stephen Sprunk wrote:
> > >>> Well, aside from the obvious fact that icc -QxN is breaking the C
> > >>> spec by rearranging the contents of a struct,
> > >>
> > >> Simply rearranging struct fields doesn't violate C standard. As long
> > >> as the user code can not tell the difference, it's standard conforming.
> > >> I don't know whether Intel's doing it correctly or not in this
> > >> particular case though.
> > >
> > > C99 6.7.2.1:
> > >
> > > 13 Within a structure object, the non-bit-field members and the units in
> > > which bit-fields reside have addresses that increase in the order in
> > > which they are declared.
> >
> > Yes. But if there's no user code that takes address of struct field
> > or relies on such, this is irrelevant.
>
> As the saying goes, if a tree falls in the forest and nobody is around to
> hear it, does it really make a noise?
>
> The C spec (thanks for the citation, Sheldon) says you must lay out struct
> members in the order they're declared. If Intel's compiler doesn't do that,
> it's not compliant, period.

The standard describes the behaviour on an abstract machine (C99, 5.1.2.3).
The compiler has to map this behaviour onto the real machine in
such a way that the behaviour observed by the program matches
that abstract machine behaviour. If the program never observes
the relative offsets of two fields in a struct, then there's no need
to preserve that behaviour on the real machine.

If you require 6.7.2.1 to hold for the real machine, then
any compiler that holds structs in registers, or passes struct-valued
arguments in registers would also be non-compliant, because registers
don't have addresses at all.
That would be a significant number of compilers ...

Stephen.

Seongbae Park

unread,
Feb 11, 2004, 5:06:45 PM2/11/04
to
Stephen Sprunk wrote:
...

>> > C99 6.7.2.1:
>> >
>> > 13 Within a structure object, the non-bit-field members and the units in
>> > which bit-fields reside have addresses that increase in the order in
>> > which they are declared.
>>
>> Yes. But if there's no user code that takes address of struct field
>> or relies on such, this is irrelevant.
>
> As the saying goes, if a tree falls in the forest and nobody is around to
> hear it, does it really make a noise?
>
> The C spec (thanks for the citation, Sheldon) says you must lay out struct
> members in the order they're declared. If Intel's compiler doesn't do that,
> it's not compliant, period.

Let's play the standard game then.

C99 5.1.2.3:

5. The least requirements on a conforming implementation are:

At sequence points, volatile objects are stable in the sense that
previous accesses are complete and subsequent accesses have not
yet occurred.

At program termination, all data written into files shall be
identical to the result that execution of the program according to
the abstract semantics would have produced.

The input and output dynamics of interactive devices shall take
place as specified in 7.19.3. The intent of these requirements is
that unbuffered or line-buffered output appear as soon as
possible, to ensure that prompting messages actually appear prior
to a program waiting for input.

So, as long as reordering of struct fields satisfies all of the above,
it meets this "least requirements" of a conforming implementation.
And I don't see why it can not be done.

> Unless icc is incredibly smart about detecting

A compiler doesn't have to be incredibly smart to do this.

> you taking the address of a struct,

Easy.

> using it in a union,

Easy.

> aliasing it via questionable pointer math,

If the code is standard compliant, a compiler can tell
when it can no longer follow what's going on and give up.
So no problem here.
Sure, you can write non-compliant code that breaks this,
but the same applies to almost any other optimization.

> passing it as a C++ reference argument, etc.

No big deal. Same as address taken.

> there's a chance it can generate incorrect code.

Many compilers have been doing similar analysis already
- type based aliasing, interprocedural alias analysis and
escape analysis all require detection of most of above conditions.

An implementation can have a bug.
That doesn't mean it can not be fixed. And in this case,
it is possible to do such transformation correctly.

> The C standard has enough ambiguities that cause
> portability problems; we don't need people intentionally introducing new
> problems with the unambiguous parts.

Huh ? What new problem ? Reordering of struct fields, if properly done,
does not cause any portability problem nor any more ambiguities.

There are many optimizing compilers
that unbox a local struct/class *completely*.
Do they cause any portability problem ? No.
Are they compliant ? Yes, of course.
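
A minimal sketch of the unboxing case, with an invented struct: the
local's address is never taken and it never escapes the function, so no
conforming program can tell whether its members ever sat in memory in
declaration order, or in memory at all.

```c
/* Hypothetical two-member struct. */
struct vec2 { int x; int y; };

/* v's address is never taken and v never leaves the function, so a
 * compiler is free to keep v.x and v.y in registers instead of laying
 * the struct out in memory. The result is the same either way. */
int dist2(int x, int y)
{
    struct vec2 v = { x, y };
    return v.x * v.x + v.y * v.y;
}
```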

Seongbae

Disclaimer: All of my postings in this thread discuss
theoretical and practical issues in implementing "a" compiler,
not any specific compiler implementation.

Robert Wessel

unread,
Feb 11, 2004, 7:13:35 PM2/11/04
to
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10765288...@saucer.planet.gong>...

> "Seongbae Park" <Seongb...@sun.com> wrote in message
> news:c0du0j$hso$1...@news1nwk.SFbay.Sun.COM...
> > >> Simply rearranging struct fields doesn't violate C standard.
> > >> As long as the user code can not tell the difference,
> > >> it's standard conforming.
> > >> I don't know whether Intel's doing it correctly or not
> > >> in this particular case though.
> > >
> > > C99 6.7.2.1:
> > >
> > > 13 Within a structure object, the non-bit-field members and the units in
> > > which bit-fields reside have addresses that increase in the order in
> > > which they are declared.
> >
> > Yes. But if there's no user code that takes address of struct field
> > or relies on such, this is irrelevant.
>
> However it would "violate C standard", regardless.


Incorrect. The section on program execution, 5.1.2.3 (since we're
quoting C99), pretty much defines what's known as the "as-if" rule.
So long as the application (or the defined part of the environment -
for example the contents of a file written to) can't tell the
difference, the C compiler may do anything it wants.


> How about folks who poke around core dumps ? I imagine they
> might notice...


We might, but nobody cares. ;-)

Rupert Pigott

unread,
Feb 11, 2004, 7:46:31 PM2/11/04
to
"Robert Wessel" <robert...@yahoo.com> wrote in message
news:bea2590e.04021...@posting.google.com...

Like a core dump for instance.

GG

Cheers,
Rupert


Greg Lindahl

unread,
Feb 11, 2004, 9:01:33 PM2/11/04
to
In article <4db74fa6.04021...@posting.google.com>,
Benjamin Goldsteen <b...@inka.mssm.edu> wrote:

>In the modern world, processors and compilers are built around each
>other.

This is true for traditional systems companies. But AMD builds
processors that execute x86 code fast -- no compiler interaction,
because they are a minority player in the x86 space. Intel *probably*
builds future x86 cpus that execute existing apps fast, because
there's so much legacy code being executed.

Now that AMD has a CPU that's going places other than gamer desktops,
they are doing some different things than in the past -- they paid
SuSE to work on gcc, for example.

-- greg

Greg Lindahl

unread,
Feb 11, 2004, 9:04:53 PM2/11/04
to
In article <CPxWb.719$MQ1...@news-binary.blueyonder.co.uk>,
Stephen Clarke <stephen...@who.needs.spam.earthling.net> wrote:

>The standard describes the behaviour on an abstract machine (C99, 5.1.2.3).

In the HPC space, everyone uses flags that encourage the compiler to
break the rules to get better performance. You can go read up about the
SPEC flags people use; flags that break all the standards are normal.

-- greg

Ketil Malde

unread,
Feb 12, 2004, 3:18:02 AM2/12/04
to
b...@inka.mssm.edu (Benjamin Goldsteen) writes:

> In the modern world, processors and compilers are built around each
> other. It is meaningless to say that I have the fastest processor but
> there is no compiler to take advantage of it.

But of course, this isn't the case: there *is* a compiler that
(presumably) shows it is faster. Hobbled by its owner, in order to
avoid showing that.

> If AMD doesn't have a compiler that demonstrate that their processor
> is the fastest then they don't have the fastest processor.

You could take this one step further, and say that it is meaningless
to claim a fast CPU, unless there are applications making use of it.
Which eventually brings us back to the old maxim about benchmarking
the application you're interested in, instead of relying on artificial
"benchmark numbers".

> I think you missed my point. It's about IP. GPL people are sensitive
> to their IP being used for non-GPL projects. Intel doesn't want their
> IP being used for non-Intel projects.

One question is to what degree should Intel be able to decide this.
Do they get to decide on which CPUs I run executables compiled with
their compiler? What if I compile GCC with ICC, does the same apply
to the GCC executable? And can they limit my ability to publish
benchmarks I make?

(And BTW, GPL is fine in non-GPL projects, lots of people compile
non-GPL code with GCC, for instance. You just can't distribute modified
GPL code without source.)

Jan C. Vorbrüggen

Feb 12, 2004, 3:50:28 AM
> That rule would work fine for space, and to some degree for performance
> but it leaves too much, especially in large structs with many pointers
> (which are all same size as we all know) and even in relatively small
> structs. Having frequently accessed pointers of a struct to be in the same
> L1 cache line can make a huge difference.

Sure, but if they're the same size, you can re-arrange them to your
heart's content - as the programmer always, and legally as a compiler
if you can assure the "as-if" rule is obeyed.

How large are L1 line sizes these days - 32-64 bytes, i.e., 8-16 pointers
on a 32-bit machine? You're saying people are using data structures with
fan-outs of that order? Bah.
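
To make the arithmetic above concrete, here is a minimal C sketch. The
function name pointers_per_line is made up for illustration, and the line
and pointer sizes are plain parameters rather than values measured on any
particular CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many pointers of pointer_size bytes fit in one
   cache line of line_size bytes. Both sizes are supplied by the caller;
   nothing here queries the hardware. */
size_t pointers_per_line(size_t line_size, size_t pointer_size)
{
    assert(pointer_size > 0);
    return line_size / pointer_size;
}
```

With 4-byte pointers this gives 8 for a 32-byte line and 16 for a 64-byte
line, matching the figures quoted above; with 8-byte pointers a 64-byte
line holds 8.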

Jan

Chris

Feb 12, 2004, 4:22:23 AM
One other way to possibly bypass the Intel "protection":

Since Windows XP uses a CPU driver located at
c:\windows\system32\drivers\amdk7.sys or amdk6.sys, I think we
could modify the AMD CPU's id by changing the "line" that
specifies it. Athlon users could then see their
performance increase. So it would be rather simple... in theory.
Or why couldn't an AMD user use a P4 driver? That way people might
get the benefit on their AMDs.
We could also write a program that emulates a P4 id. I don't think
that would be so complicated...

In fact, the "compiler trick" could be the reason why Intel gets
better performance in 3DSMax, media encoding, several games,
... However, I may be mistaken: it's only a supposition.

Robert Wessel

Feb 12, 2004, 5:02:36 AM
"Rupert Pigott" <r...@dark-try-removing-this-boong.demon.co.uk> wrote in message news:<10765467...@saucer.planet.gong>...

> "Robert Wessel" <robert...@yahoo.com> wrote in message
> > Incorrect. The section on program execution, 5.1.2.3 (since we're
> > quoting C99), pretty much defines what's known as the "as-if" rule.
> > So long as the application (or the defined part of the environment -
> > for example the contents of a file written to) can't tell the
> > difference, the C compiler may do anything it wants.
>
> Like a core dump for instance.


Core dumps are quite outside the C standard. There's no requirement
that structures be laid out in any particular way in a core dump.

Nick Maclaren

Feb 12, 2004, 5:22:04 AM
In article <c0e91l$o7q$1...@news1nwk.SFbay.Sun.COM>,

Seongbae Park <Seongb...@sun.com> wrote:
>Stephen Sprunk wrote:
>...
>>> > C99 6.7.2.1:
>>> >
>>> > 13 Within a structure object, the non-bit-field members and the units in
>>> > which bit-fields reside have addresses that increase in the order in
>>> > which they are declared.
>>>
>>> Yes. But if there's no user code that takes address of struct field
>>> or relies on such, this is irrelevant.
>>
>> As the saying goes, if a tree falls in the forest and nobody is around to
>> hear it, does it really make a noise?
>>
>> The C spec (thanks for the citation, Sheldon) says you must lay out struct
>> members in the order they're declared. If Intel's compiler doesn't do that,
>> it's not compliant, period.
>
>Let's play the standard game then.

You called?

>C99 5.1.2.3:
>
>5. The least requirements on a conforming implementation are:
>

>So, as long as reordering of struct fields satisfies all of the above,
>it meets this "least requirements" of a conforming implementation.
>And I don't see why it can not be done.

I won't give the quotes, because they are voluminous, but one of the
many aspects of the C standard that is hopelessly ambiguous, is used
in real code, and causes hell to real compilers is:

#include <stddef.h>   /* for offsetof */

typedef struct {
    double d;
    int n;
} t;

t x;
void *p = (void *)&x;

*((int *)((char *)p+offsetof(t,n))) = 123;

Note that this doesn't deny your point, it merely means that it is
very, very hard to reorganise fields in C structures and maintain
support for that use.

I can tell you that the C standards committee have had several flame
wars about that, leading to no conclusion that I could discover. I
believe that the current position is that whether the above is
required to work by the standard or undefined behaviour is one of
the many interpretations that is deliberately left up to the reader.
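
For anyone who wants to try this, the fragment above can be wrapped into
something compilable. This is only a sketch: the function names
set_n_via_offset and offset_store_is_visible are invented for
illustration, and it shows what compilers that keep the declared layout do
in practice, not what the standard guarantees.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

typedef struct {
    double d;
    int    n;
} t;

/* Store value into member n, reaching it only through a void pointer to
   the whole struct plus the byte offset reported by offsetof. */
void set_n_via_offset(void *p, int value)
{
    assert(p != NULL);
    *(int *)((char *)p + offsetof(t, n)) = value;
}

/* Hypothetical check: returns 1 if the indirect store shows up through
   the ordinary member access x.n and leaves x.d untouched. */
int offset_store_is_visible(void)
{
    t x = { 1.5, 0 };
    set_n_via_offset(&x, 123);
    return x.n == 123 && x.d == 1.5;
}
```

A compiler that reordered the fields would have to keep offsetof and every
store of this kind consistent with the new layout, including stores made
in other translation units.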


Regards,
Nick Maclaren.

Nick Maclaren

Feb 12, 2004, 5:26:27 AM

Er, no. Some people do; others don't. In particular, there are a lot
of people who write cleanish Fortran and optimise up to the standard's
limits but not beyond. Our default compiler options are set up that
way, and very few users have trouble or bother to increase them.

If you were referring only to C, it is unclear what the standard allows,
and so almost ALL optimisation will 'break the rules' if you interpret
the standard one way. Alternatively, all programs do, if you interpret
it another.


Regards,
Nick Maclaren.

Sander Vesik

Feb 12, 2004, 7:27:43 AM

There is no such thing as an "unobserved address". You can take the "address"
of any structure member, and then compare them.
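
A small C sketch of that kind of comparison follows. The struct foo layout
and the function name members_in_declaration_order are made up for
illustration; the ordering it checks is the one C99 6.7.2.1 paragraph 13
describes.

```c
#include <assert.h>
#include <stddef.h>

struct foo {          /* hypothetical struct, for illustration only */
    double d;
    int    n;
    char   c;
};

/* Returns 1 if the members' addresses increase in declaration order.
   All three pointers point into the same object, so the relational
   comparisons are well defined. */
int members_in_declaration_order(const struct foo *f)
{
    const char *pd = (const char *)&f->d;
    const char *pn = (const char *)&f->n;
    const char *pc = (const char *)&f->c;
    return pd < pn && pn < pc;
}
```

Any program containing a check like this can observe the layout, so a
compiler that reorders fields would have to prove no such code exists.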

>
> > How about folks who poke around core dumps ? I imagine they might
> > notice...
>
> Indeed. I'd expect problems with separately-compiled programs and writing
> structs to files/sockets, too (so taking the address of a struct foo, using
> struct foo *, etc should disable any field-reordering optimisation for
> struct foo).
>

--
Sander

+++ Out of cheese error +++

Nick Maclaren

Feb 12, 2004, 7:43:33 AM
Sorry about following up to my own posting, but cancellation and
resubmission rarely works. I forgot to mention that the following
variant adds even more standards chaos, and is nearly as common:

#include <stddef.h>   /* for offsetof */

typedef struct {
    double d;
    int n;
} t;

typedef union {
    double d;
    int n;
} u;

t x;
u *p = (u *)&x;

*((int *)((char *)p+offsetof(t,n))) = 123;


Regards,
Nick Maclaren.

Jan C. Vorbrüggen

Feb 12, 2004, 8:39:41 AM
> >In the HPC space, everyone uses flags that encourage the compiler to
> >break the rules to get better performance. You can go read up about the
> >SPEC flags people use; flags that break all the standards are normal.
>
> Er, no. Some people do; others don't. In particular, there are a lot
> of people who write cleanish Fortran and optimise up to the standard's
> limits but not beyond.

Well, a lot of Fortran compilers have a separate switch saying, "yes,
you can really rely on the code not doing non-conformant argument aliasing"
- and for very good reason.
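
For readers more at home in C, the nearest analogue of that promise is
C99's restrict qualifier. The sketch below is an analogy only, not Fortran
semantics and not any particular compiler's switch; the function name
axpy_noalias is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes out[i] = a[i] + s * b[i] for i in [0, n).
   The restrict qualifiers are the programmer's promise that out, a and b
   do not overlap. The compiler may rely on that promise when scheduling
   loads and stores; calling this with overlapping arrays is undefined. */
void axpy_noalias(size_t n, double *restrict out,
                  const double *restrict a,
                  const double *restrict b, double s)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + s * b[i];
}
```

As with the Fortran switch, the burden is on the caller: the compiler does
not check the promise, it just optimises as if it holds.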

Jan

Sander Vesik

Feb 12, 2004, 9:15:46 AM
Seongbae Park <Seongb...@sun.com> wrote:
>
> > aliasing it via questionable pointer math,
>
> If the code is standard compliant, a compiler can tell
> when it can not follow what's going on anymore and give up.
> So no problem here.
> Sure, you can write a non-compliant code that breaks such,
> but the same applies to almost any other optimization.

But how do you do it if it happens in a different .c file?

>
> Seongbae

Rupert Pigott

Feb 12, 2004, 11:42:21 AM
"Robert Wessel" <robert...@yahoo.com> wrote in message
news:bea2590e.04021...@posting.google.com...

There bloody well is if I'm debugging.

These tools are there to assist me in doing a job, if they
make my job as a programmer *harder* they are no damn good.

Cheers,
Rupert