
sleazy intel compiler trick


iccOut

Feb 9, 2004, 5:08:45 PM
As part of my study of Operating Systems and embedded systems, one of
the things I've been looking at is compilers. I'm interested in
analyzing how different compilers optimize code for different
platforms. As part of this comparison, I was looking at the Intel
Compiler and how it optimizes code. The Intel Compilers have a free
evaluation download from here:
http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers
 

One of the things that the version 8.0 of the Intel compiler included
was an "Intel-specific" flag. According to the documentation, binaries
compiled with this flag would only run on Intel processors and would
include Intel-specific optimizations to make them run faster. The
documentation was unfortunately lacking in explaining what these
optimizations were, so I decided to do some investigating. 

First I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2G Extreme Edition with 1 gig
of ram running Windows XP Pro. First I compiled and ran SPEC with the
"generic x86 flag" (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
spec with the "Intel-specific flag" (-QxN) to see what kind of
difference that would make. For most benchmarks, there was not very
much change, but for 181.mcf, there was a win of almost 22% !
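
A note on what "22%" can mean: a SPEC ratio is the reference time divided by the measured run time, so a gain in score and a drop in run time are not the same percentage. A minimal sketch of the two readings follows. The function names and the runtimes are hypothetical, chosen only to illustrate the arithmetic, and are not measurements from this test.

```python
def runtime_reduction_pct(base_seconds, tuned_seconds):
    """Percent less wall-clock time the tuned build needs."""
    return 100.0 * (base_seconds - tuned_seconds) / base_seconds


def score_gain_pct(base_seconds, tuned_seconds):
    """Percent higher score, for a score proportional to 1 / runtime."""
    return 100.0 * (base_seconds / tuned_seconds - 1.0)


# Hypothetical runtimes in seconds (not data from the post).
base, tuned = 122.0, 100.0
print("runtime reduction: %.1f%%" % runtime_reduction_pct(base, tuned))
print("score gain:        %.1f%%" % score_gain_pct(base, tuned))
```

With these made-up numbers the score goes up by 22% while the run time drops by only about 18%, so it helps to say which one is being quoted.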

Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 gig of ram, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful. Was it true that only
Intel processors could take advantage of this performance boost?

I started mucking around with a disassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2 and it checks the CPUID to
ensure that it's an Intel processor. I wrote a quick utility which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
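
To make the described check concrete, here is a rough sketch of that kind of logic, written as pure functions over CPUID register values so it runs anywhere. It is not the actual proc_init_N code, which is not shown in this thread. The function names are hypothetical. The parts taken from the CPUID definition are the vendor string layout (leaf 0 returns 12 ASCII bytes in EBX, EDX, ECX order) and the feature bits (leaf 1, EDX bit 25 for SSE and bit 26 for SSE2).

```python
import struct

SSE_BIT = 1 << 25   # CPUID leaf 1, EDX bit 25
SSE2_BIT = 1 << 26  # CPUID leaf 1, EDX bit 26


def vendor_string(ebx, edx, ecx):
    """Decode the 12-byte vendor ID that CPUID leaf 0 returns in EBX, EDX, ECX."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def regs_for(vendor):
    """Test helper: the (EBX, EDX, ECX) values a CPU with this vendor ID reports."""
    return struct.unpack("<III", vendor.encode("ascii"))


def vendor_gated_check(leaf0_regs, leaf1_edx):
    """Hypothetical stand-in for the check: SSE + SSE2 + GenuineIntel."""
    has_simd = bool(leaf1_edx & SSE_BIT) and bool(leaf1_edx & SSE2_BIT)
    return has_simd and vendor_string(*leaf0_regs) == "GenuineIntel"


features = SSE_BIT | SSE2_BIT  # a CPU that reports both SSE and SSE2
print(vendor_gated_check(regs_for("GenuineIntel"), features))
print(vendor_gated_check(regs_for("AuthenticAMD"), features))
```

Under these assumptions the second call is rejected even though the feature bits are identical, which matches the "processor not supported" exit seen on the FX51.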

Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium4 with an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable of taking advantage of these same
optimizations. If I'm missing something, I'd love for someone to point
it out for me. From the way it looks right now, it appears that Intel
is simply "cheating" to make their processors look better against
competitors' processors.

Links:
Intel Compiler: http://www.intel.com/products/software/index.htm?iid=Corporate+Header_prod_softwr&#compilers
 

iccOut

Feb 9, 2004, 5:13:49 PM

glen herrmannsfeldt

Feb 10, 2004, 12:33:32 AM
iccOut wrote:

> As part of my study of Operating Systems and embedded systems, one of
> the things I've been looking at is compilers. I'm interested in
> analyzing how different compilers optimize code for different
> platforms. As part of this comparison, I was looking at the Intel
> Compiler and how it optimizes code.

(OK, but try not to post the same thing three times.)

> One of the things that the version 8.0 of the Intel compiler included
> was an "Intel-specific" flag. According to the documentation, binaries
> compiled with this flag would only run on Intel processors and would
> include Intel-specific optimizations to make them run faster. The
> documentation was unfortunately lacking in explaining what these
> optimizations were, so I decided to do some investigating.

(snip)

> For most benchmarks, there was not very
> much change, but for 181.mcf, there was a win of almost 22% !

There are stories about compilers having optimization features
specifically to score high on SPEC. In some cases, they might
result in otherwise correct programs not working.

What does 181.mcf do?

-- glen

Jan C. Vorbrüggen

Feb 10, 2004, 3:38:06 AM
> What does 181.mcf do?

From http://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf.txt:

Benchmark Program General Category:
----------------------------------

Combinatorial optimization / Single-depot vehicle scheduling


Benchmark Description:
---------------------

A benchmark derived from a program used for single-depot vehicle scheduling in
public mass transportation. The program is written in C, the benchmark version
uses almost exclusively integer arithmetic.

The program is designed for the solution of single-depot vehicle scheduling
(sub-)problems occurring in the planning process of public transportation
companies. It considers one single depot and a homogeneous vehicle fleet.
Based on a line plan and service frequencies, so-called timetabled trips with
fixed departure/arrival locations and times are derived. Each of these
timetabled trips has to be serviced by exactly one vehicle. The links between
these trips are so-called dead-head trips. In addition, there are pull-out and
pull-in trips for leaving and entering the depot.

Cost coefficients are given for all dead-head, pull-out, and pull-in trips. It
is the task to schedule all timetabled trips to so-called blocks such that the
number of necessary vehicles is as small as possible and, subordinate, the
operational costs among all minimal fleet solutions are minimized.

For simplification in the benchmark test, we assume that each pull-out and
pull-in trip is defined implicitly with a duration of 15 minutes and a cost
coefficient of 15.

For the considered single-depot case, the problem can be formulated as a
large-scale minimum-cost flow problem that we solve with a network simplex
algorithm accelerated with a column generation. The core of the benchmark
181.mcf is the network simplex code "MCF Version 1.2 -- A network simplex
implementation". For this benchmark, MCF is embedded in the column generation
process.

The network simplex algorithm is a specialized version of the well known
simplex algorithm for network flow problems. The linear algebra of the general
algorithm is replaced by simple network operations such as finding cycles or
modifying spanning trees that can be performed very quickly. The main work of
our network simplex implementation is pointer and integer arithmetic.
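
For readers who have not met minimum-cost flow, a tiny illustration of the problem class follows. It is not the network simplex method that MCF uses: it swaps in the simpler successive-shortest-paths approach with Bellman-Ford, and the function name, graph, and costs are made up for the example. It assumes non-negative arc costs and no self-loops.

```python
def min_cost_flow(n, edges, source, sink, demand):
    """Send `demand` units from source to sink at minimum total cost.

    edges: list of (u, v, capacity, cost). Returns the total cost,
    or None if the network cannot carry the requested demand.
    """
    # Residual graph; each arc is [to, residual_capacity, cost, reverse_index].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    inf = float("inf")
    total_cost = 0
    remaining = demand
    while remaining > 0:
        # Bellman-Ford: cheapest residual path from the source.
        dist = [inf] * n
        dist[source] = 0
        prev = [None] * n  # (node, arc index) used to reach each node
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            return None  # no augmenting path left

        # Find the bottleneck along the path, then push flow along it.
        push = remaining
        v = sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_cost += push * dist[sink]
        remaining -= push
    return total_cost


# Made-up network: node 0 is the source, node 3 the sink.
EDGES = [(0, 1, 2, 1), (0, 2, 1, 2), (1, 3, 1, 3), (1, 2, 1, 1), (2, 3, 2, 1)]
print(min_cost_flow(4, EDGES, 0, 3, 2))
```

The inner loops are all pointer chasing and integer arithmetic over the arc lists, which is the same flavor of work the description attributes to the benchmark.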

Jan

Terje Mathisen

Feb 10, 2004, 10:21:28 AM
iccOut wrote:
> One of the things that the version 8.0 of the Intel compiler included
> was an "Intel-specific" flag. According to the documentation, binaries
> compiled with this flag would only run on Intel processors and would
> include Intel-specific optimizations to make them run faster. The
> documentation was unfortunately lacking in explaining what these
> optimizations were, so I decided to do some investigating.

Intel's docs are pretty clear in that they would really like you to use
the Intel specific test for "GenuineIntel", instead of simply checking
for the required CPUID feature flags, before using some specific code
sequences.

I can see why they would like that, but it is still somewhat sneaky to
include such a check function as soon as you enable a specific compiler
flag.
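
The difference between the two approaches can be sketched roughly as follows. This is only an illustration with hypothetical function and code-path names, not what any real compiler runtime does. The one fact it relies on is that CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26.

```python
SSE_BIT = 1 << 25   # CPUID leaf 1, EDX bit 25
SSE2_BIT = 1 << 26  # CPUID leaf 1, EDX bit 26


def pick_path_by_features(leaf1_edx):
    """Dispatch on feature flags only; the vendor is never consulted."""
    if leaf1_edx & SSE_BIT and leaf1_edx & SSE2_BIT:
        return "sse2"
    if leaf1_edx & SSE_BIT:
        return "sse"
    return "generic"


def pick_path_vendor_gated(vendor, leaf1_edx):
    """Same dispatch, but anything that is not GenuineIntel gets the generic path."""
    if vendor != "GenuineIntel":
        return "generic"
    return pick_path_by_features(leaf1_edx)


edx = SSE_BIT | SSE2_BIT  # a CPU reporting both SSE and SSE2
for vendor in ("GenuineIntel", "AuthenticAMD"):
    print(vendor, pick_path_by_features(edx), pick_path_vendor_gated(vendor, edx))
```

With identical feature bits, the feature-only version picks the same path for both vendors, while the gated version sends the non-Intel part down the slow path (or, in the case reported above, refuses to run at all).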

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
