You mean the 32-page, moderately technical one dated October 11, 1999,
with the conclusion I've included below? For now it is still online
at e.g.
http://www.cs.trinity.edu/~mlewis/CSCI3294-F01/Papers/alpha_ia64.pdf
Conclusion
An Alpha processor will be able to exploit static instruction-level
parallelism (discovered by the compiler at compile-time) and dynamic
instruction-level parallelism (discovered by the processor at
run-time). An IA64 processor will only be able to exploit static
instruction-level parallelism.

An Alpha processor can take advantage of the excellent compiler
technology developed for IA64 and other VLIW processors; much of this
technology is already implemented in the Alpha compilers. However, the
Alpha compilers will be able to use these optimizations much more
judiciously, avoiding excessive code growth, because the Alpha
out-of-order processor can also discover instruction-level parallelism
at run-time.

An Alpha processor will be able to adapt to dynamic program behavior
at run-time. An IA64 processor will not. An Alpha processor can adapt
to memory references that miss in the cache, avoiding delays of 100
cycles or more. An IA64 processor will stall. An Alpha processor can
find instruction-level parallelism when the compiler does not express
it. And an Alpha processor can find instruction-level parallelism at
run-time across branches, function calls, and compilation boundaries.

An Alpha processor will be able to exploit thread-level parallelism.
An IA64 processor will not. Most server applications are divided into
multiple threads, and simultaneous multithreading permits these
applications to take full advantage of the multiple execution units on
the processor. An Alpha processor can use thread-level parallelism and
instruction-level parallelism interchangeably, adjusting to the
behavior of the application.
Amdahl’s law says that high performance requires speedups in both the
sequential and the explicitly parallel portions of an application; an
Alpha processor can deliver these speedups.

An Alpha processor will deliver the highest memory bandwidth in the
industry, and systems built out of Alpha processors will lead the
industry in high performance technical computing.

An Alpha processor will significantly outperform an IA64 processor on
commercial applications. Alpha processors have addressed the main
requirements of commercial applications: reducing the instruction
cache footprint, tolerating unpredictable cache misses, increasing the
pin bandwidth, and exploiting explicit thread-level parallelism.
IA64 processors are not well designed for commercial applications.
They require a large instruction cache footprint; they cannot
dynamically tolerate cache misses; and they cannot exploit
thread-level parallelism.

In the important server markets, Alpha will outperform IA64.

<followed by 14 references to other technical papers, patents, DTJ,
etc.>
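
A side note of mine, not from the paper: the Amdahl's law sentence in
the quoted conclusion is easy to check with a few lines of arithmetic.
The sketch below uses the usual textbook form of the law, extended so
the sequential part can also be sped up. The function name and the
0.8 / 4x / 2x figures are made up purely for illustration; they are
not measurements of any Alpha or IA64 part.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup, serial_speedup=1.0):
    """Overall speedup when the parallel and serial parts of a program
    are each accelerated by their own factor (textbook Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    # New run time, relative to an original run time of 1.0.
    new_time = (serial_fraction / serial_speedup
                + parallel_fraction / parallel_speedup)
    return 1.0 / new_time


# Hypothetical workload: 80% of the run time is explicitly parallel.
only_parallel = amdahl_speedup(0.8, 4.0)                 # parallel part 4x faster
both = amdahl_speedup(0.8, 4.0, serial_speedup=2.0)      # serial part 2x faster too

print(f"parallel part only: {only_parallel:.2f}x")
print(f"parallel and serial: {both:.2f}x")
```

With these made-up numbers, speeding up only the parallel part gives
roughly 2.5x overall, while also doubling the speed of the sequential
part gives roughly 3.3x, which is the general point the conclusion is
leaning on.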