Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Final kick at the "Merced Guessing Game"

0 views
Skip to first unread message

Paul DeMone

unread,
Sep 30, 1997, 3:00:00 AM9/30/97
to

While reading through an old (Dec 1995) issue of Proceedings of the IEEE the
other day while waiting for an HSPICE run to finish I came across a remarkable
paper entitled "Compiler Technology for Future Microprocessors". The authors
(W. Hwu, R. Hank, D. Gallagher, S. Mahlke, D. Lavery, G. Haab, J. Gyllenhaal,
and D. August) are associated with the University of Illinois but the research
seems to have strong ties with HP.

The primary thrust of the paper was to describe compiler algorithms and hardware
mechanisms that could be used to get around the problem of branches and false
dependencies in gathering enough instruction level parallelism for an eight issue
RISC-like machine. The paper showed how predication (kind of like condition codes
that can be used to permit conditional execute any instruction, a generalization of
what ARM does) can be used in conjunction with loop unrolling and basic block
oalesing to dramatically decrease the number of conditional branches that need to
be predicted/executed (in one example mispredicted branches using the 2-bit branch
prediction model were reduced by almost three orders of magnitude within a complex
inner loop).

The authors give examples of how an eight issue processor with full predication
could give a remarkable improvement (1.2x to 4.5x) over an eight issue design with a
non-predicated ISA. Their description of their architecture strikes me as eerily
similar to some predictions about the Intel IA-64 Merced design so I will repeat it:

"The processor model used for these experiments is an eight-issue extension of
the HP PA-RISC[TM] architecture. The processor may issue eight instructions
of any type each cycle except branches, which are restricted to one per cycle.
The processor has 64 integer, 64 floating-point, and 64 predicate registers."

The design would seem to effectively avoid the complexity and cycle time impact of
the paraphenalia of aggressive out-of-order designs such as register renaming and
in-order completion unit. The paper gives brief descriptions of advanced compiler
techniques to transform around loops (essentially static compile time register re-
naming) and also discusses how to incorporate memory data dependence analysis (used
by the Multiflow Trace scheduling compiler for the TRACE series of VLIW computers)
for another significant speedup.

HP and Intel will disclose the native Merced instruction set (IA-64) in a couple of
weeks but if they exploit the advances described in this paper it could really be
something with the potential for dramatically raising the bar of microprocessor
performance and not just typical Intel hype and FUD. Comments?


All opinions strictly my own.

--
Paul W. DeMone The 801 experiment SPARCed an ARMs race
Kanata, Ontario to put more PRECISION and POWER into
dem...@mosaid.com architectures with MIPSed results but
PaulD...@EasyInternet.net ALPHA's well that ends well.


Guy Harris

unread,
Sep 30, 1997, 3:00:00 AM9/30/97
to

Paul DeMone <PaulD...@EasyInternet.net> wrote:
>While reading through an old (Dec 1995) issue of Proceedings of the IEEE the
>other day while waiting for an HSPICE run to finish I came across a remarkable
>paper entitled "Compiler Technology for Future Microprocessors".

A Web search for that title found a page at

http://www.cs.colorado.edu/~klauser/ilp/ilp.html

which is a reading list of papers in instruction-level parallelism. It
has a link to a PostScript version of a paper with that title, at:

ftp://ftp.crhc.uiuc.edu/pub/IMPACT/journal/ieeepro.future.95.ps

although in a quick look I didn't see the comment you quoted aabout the
processor model they used, so maybe this is an earlier version of the
paper.

The page cited above does have a number of papers from HP Labs....

> The design would seem to effectively avoid the complexity and cycle time
> impact of the paraphenalia of aggressive out-of-order designs such as
> register renaming and in-order completion unit.

So are they going to blow off making Merced run IA32 code really really
really fast, leaving that up to future IA32 implementations (perhaps on
the theory that Merced will be used mainly in machines that don't have
to run IA32 stuff really really really fast, just "fast enough"), and
maybe require software to do binary-to-binary translation of IA32
programs if you want them to run really really really fast? (I.e., it
doesn't have to run Windows 98 or Office 97 really really really fast,
as a P6-generation or next-generation-after-P6 machine can handle that
job, it just has to run big database XXX or big server YYY or big
number-crunching program ZZZ really really really fast, and those will
be compiled into native code?)
--
Reply, or follow up, but don't do both, please.

postmaster@localhost
postmaster@[127.0.0.1]

Nick Maclaren

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

In article <3431A2...@EasyInternet.net>, Paul DeMone <PaulD...@EasyInternet.net> writes:
|>
|> HP and Intel will disclose the native Merced instruction set (IA-64) in a couple of
|> weeks but if they exploit the advances described in this paper it could really be
|> something with the potential for dramatically raising the bar of microprocessor
|> performance and not just typical Intel hype and FUD. Comments?

It will have little or no practical effect, because the dominating
languages and programming styles have one unpredictable branch or
data indirection every few instructions. Until and unless C, C++,
most windowing systems and similar tripe are scrapped, almost all
systems will run at a miniscule fraction of their potential CPU
performance. A quick look at Java implies that it probably has the
same problems, but I cannot swear to it.


Nick Maclaren,
University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

J. Corbally

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

> It will have little or no practical effect, because the dominating
> languages and programming styles have one unpredictable branch or
> data indirection every few instructions. Until and unless C, C++,
> most windowing systems and similar tripe are scrapped, almost all
> systems will run at a miniscule fraction of their potential CPU
> performance. A quick look at Java implies that it probably has the
> same problems, but I cannot swear to it.
>
> Nick Maclaren,
> University of Cambridge Computer Laboratory,
> New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
> Email: nm...@cam.ac.uk
> Tel.: +44 1223 334761 Fax: +44 1223 334679

I'm sure you have some new hyper-advanced programming language in the
waiting that will realise this "potential CPU performance", and obsolete
all other languages. Yeah, right.....

Please explain yourself....

Regards,


James.....


--
All opinions expressed herein are my own personal observations
and DO NOT represent the views of my past, current, or future
employers. All opinions included herein due to quoting are
the personal opinions of their respective posters, and no
implication of agreement or endorsement on my part shall be
inferred in them.

Remove *NO_SPAM* to enable mailing

Ketil Z Malde

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> If you take a close look at C++, the X Windowing System, the various
> IBM and Microsoft windowing systems etc., you will see that an
> 'object' is implemented as a data structure and you invoke a method
> by picking an address up from that structure and branching to it.

Right - and in Pascal, Algol, Cobol, and all those other superior
languages, function calls are made without branching? Which is why
Pascal and Cobol is commonly used when performance is important? Right?

And since programs written in C runs at a ``miniscule fraction of their
potential CPU performance'', could you please inform me which Pascal,
Algol, Cobol or Modula-[23] compiler that create such vastly improved
speed of execution? And - how much faster can I expect applications to
run?

HIBT?

~kzm
--
If I haven't seen further, it is by standing in the footprints of giants

Nick Maclaren

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

In article <60t42r$4...@scnews.sc.intel.com>, "J. Corbally" <icorb*NO_SPAM*@*NO_SPAM*indigo.ie> writes:
|> > It will have little or no practical effect, because the dominating
|> > languages and programming styles have one unpredictable branch or
|> > data indirection every few instructions. Until and unless C, C++,
|> > most windowing systems and similar tripe are scrapped, almost all
|> > systems will run at a miniscule fraction of their potential CPU
|> > performance. A quick look at Java implies that it probably has the
|> > same problems, but I cannot swear to it.
|>
|> I'm sure you have some new hyper-advanced programming language in the
|> waiting that will realise this "potential CPU performance", and obsolete
|> all other languages. Yeah, right.....
|>
|> Please explain yourself....

Whether I do or not is irrelevant, as all of the Fortrans, Cobols,
Algols, Pascals, Modulas etc. are streets better. So, interestingly
enough, are 'object oriented' systems like Smalltalk and (I believe)
Eiffel - but they have other problems.

If you take a close look at C++, the X Windowing System, the various
IBM and Microsoft windowing systems etc., you will see that an
'object' is implemented as a data structure and you invoke a method
by picking an address up from that structure and branching to it.

Furthermore, most of the important data is indirected through multiple
levels of data structure.

When I first described this to a hardware designer, back in the early
days of the latest RISC hype, he said "But nobody writes code like
that!" When I said that the C++ and X Windowing System did little
else, he said "Well, people won't use them then."

Tom Thornhill

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Paul DeMone <PaulD...@EasyInternet.net> wrote in article <3431A2...@EasyInternet.net>...

> While reading through an old (Dec 1995) issue of Proceedings of the IEEE the
> other day while waiting for an HSPICE run to finish I came across a remarkable
> paper entitled "Compiler Technology for Future Microprocessors". The authors
> (W. Hwu, R. Hank, D. Gallagher, S. Mahlke, D. Lavery, G. Haab, J. Gyllenhaal,
> and D. August) are associated with the University of Illinois but the research
> seems to have strong ties with HP.
>
> The primary thrust of the paper was to describe compiler algorithms and hardware
> mechanisms that could be used to get around the problem of branches and false
> dependencies in gathering enough instruction level parallelism for an eight issue
> RISC-like machine. The paper showed how predication (kind of like condition codes
> that can be used to permit conditional execute any instruction, a generalization of
> what ARM does) can be used in conjunction with loop unrolling and basic block
> oalesing to dramatically decrease the number of conditional branches that need to
> be predicted/executed (in one example mispredicted branches using the 2-bit branch
> prediction model were reduced by almost three orders of magnitude within a complex
> inner loop).
>
> The authors give examples of how an eight issue processor with full predication
> could give a remarkable improvement (1.2x to 4.5x) over an eight issue design with a
> non-predicated ISA. Their description of their architecture strikes me as eerily
> similar to some predictions about the Intel IA-64 Merced design so I will repeat it:
>
> "The processor model used for these experiments is an eight-issue extension of
> the HP PA-RISC[TM] architecture. The processor may issue eight instructions
> of any type each cycle except branches, which are restricted to one per cycle.
> The processor has 64 integer, 64 floating-point, and 64 predicate registers."
>
> The design would seem to effectively avoid the complexity and cycle time impact of
> the paraphenalia of aggressive out-of-order designs such as register renaming and
> in-order completion unit. The paper gives brief descriptions of advanced compiler
> techniques to transform around loops (essentially static compile time register re-
> naming) and also discusses how to incorporate memory data dependence analysis (used
> by the Multiflow Trace scheduling compiler for the TRACE series of VLIW computers)
> for another significant speedup.
>
> HP and Intel will disclose the native Merced instruction set (IA-64) in a couple of
> weeks but if they exploit the advances described in this paper it could really be

Do you have a date?

> something with the potential for dramatically raising the bar of microprocessor
> performance and not just typical Intel hype and FUD. Comments?
>
>

> All opinions strictly my own.
>
> --
> Paul W. DeMone The 801 experiment SPARCed an ARMs race
> Kanata, Ontario to put more PRECISION and POWER into
> dem...@mosaid.com architectures with MIPSed results but
> PaulD...@EasyInternet.net ALPHA's well that ends well.
>
>

I think you're onto something here - I would expect that if Intel introduce
a new architecture they'd try to ensure that relatively simple implementations
of it ( ie those without renaming and out of order execution ), outperform
agressive implementaions of Alpha AXP ( 21264 ? ). This gives them something
to scale towards.

Also since they've waited for so long before replacing IA32, you'd expect
them to come up with something other than an Alpha clone.

--
Mailto: ttho...@best.ms.philips.com-nospam ( remove the -nospam before replying )
Talkto: Tom Thornhill, Philips Medical Systems Nederland BV
Tel +31 4027 64080 (Work), +31 499 338211 (Hotel - after 5pm UK time)


Tom Thornhill

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Guy Harris <g...@netapp.com> wrote in article <60slvu$3...@tooting.netapp.com>...

>
> So are they going to blow off making Merced run IA32 code really really
> really fast, leaving that up to future IA32 implementations (perhaps on
> the theory that Merced will be used mainly in machines that don't have
> to run IA32 stuff really really really fast, just "fast enough"), and
> maybe require software to do binary-to-binary translation of IA32
> programs if you want them to run really really really fast? (I.e., it
> doesn't have to run Windows 98 or Office 97 really really really fast,
> as a P6-generation or next-generation-after-P6 machine can handle that
> job, it just has to run big database XXX or big server YYY or big
> number-crunching program ZZZ really really really fast, and those will
> be compiled into native code?)

Surely for IA32 code it would be possible to do on the fly translation
a la the PPro from instructions fetched from memory to the instructions
executed by the pipeline(s). I suppose that moving complexity out of
the pipeline just means that you have to do more work on the translation,
but that work will already have been done for Willamette (sp?), assuming
that uses PPro-style decoupled architecture.

I suppose you could have a kind of modular design with the same pipeline
design in both chips but with a highly optimised ( and big )
translation unit in Willamette feeding a few pipelines, but a less optimised
and smaller translation unit feeding more pipelines in Merced.

That way you could have a modular design that could be tuned for executing
translated IA32 or native IA64. Eventually you can lose the translation unit
completely and move to an IA64 only architecture at a higher clock speed.
You could then scale this by adding superscalar execution for the VLIW instructions
( is this possible )

Hmm, maybe I should learn VHDL and find out how practical all this stuff is.

> --
> Reply, or follow up, but don't do both, please.
>
> postmaster@localhost
> postmaster@[127.0.0.1]
>


--
***If you can understand this then your brain needs servicing URGENTLY***

Richard A. Schumacher

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

[...]

>The authors give examples of how an eight issue processor with full predication
>could give a remarkable improvement (1.2x to 4.5x) over an eight issue design with a
>non-predicated ISA. Their description of their architecture strikes me as eerily
>similar to some predictions about the Intel IA-64 Merced design so I will repeat it:

[...]

>HP and Intel will disclose the native Merced instruction set (IA-64) in a couple of
>weeks but if they exploit the advances described in this paper it could really be

>something with the potential for dramatically raising the bar of microprocessor
>performance and not just typical Intel hype and FUD. Comments?


Yes. The label reads "Intel", but it's HP inside.

--
-----------------------------------------------
I do not speak in any capacity for the
Convex Division of the Hewlett Packard Company.

Michael D. Nahas

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Ketil Z Malde wrote:

> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
>
> > If you take a close look at C++, the X Windowing System, the various
> > IBM and Microsoft windowing systems etc., you will see that an
> > 'object' is implemented as a data structure and you invoke a method
> > by picking an address up from that structure and branching to it.
>

> Right - and in Pascal, Algol, Cobol, and all those other superior
> languages, function calls are made without branching? Which is why
> Pascal and Cobol is commonly used when performance is important? Right?
>
> And since programs written in C runs at a ``miniscule fraction of their
> potential CPU performance'', could you please inform me which Pascal,
> Algol, Cobol or Modula-[23] compiler that create such vastly improved
> speed of execution? And - how much faster can I expect applications to
> run?

I'm afraid that I must admit that Maclaren, even with his outrageous
claim, is correct. Malde has missed the problem - it is not branching,
but indirection, A.K.A. pointer chasing.

For a standard function, you just branch to the address. With C++'s
virtual
functions you either follow a pointer or perform an indexed lookup to a
jump
table, which will load the address you will jump to, and then branch to
that
address.

Why is indirection bad? When the destination address of a jump is known,
the processor
can begin fetching from the new address and continue executing. When the
destination address of the jump is unknown, the processor usually stalls
until
the address comes in from memory. Some processors do have hardware to
speed up indirect jumps, but the hardware was developed using C benchmarks,

where 90% of indirect jumps are just returns from functions.

The truely evil part of indirection is not in function calls, but in the
non-array
data structures of most programs. Think of traversing a linked list. To
load the
first node in the list, it takes as much time as the memory latency. To
load the
second node, you must first load the first node, and then pay the cost of
the
memory latency time again. So to traverse a N node linked list, it takes N
times
the memory latency.

For most cases, the nodes are contiguous in memory and most programs run
inside the L1 and L2 cache, so these latencies are not too bad for most
programs.
However, programs which do not fit in large L2 caches, such as sparse
matrix
programs and the X Windows system, pay heavily. The latency time of going
to
main memory is growing exponentially in terms of CPU cycles.

So, is this a problem, and if so, how do we fix it. Yes, it is a problem,
because any
stall to the pipeline which prevents speculative execution is bad. How do
we
solve it? The first thing to do is to make programs which have this
problem part
of the SPEC Benchmarks. This will make it the focus of everyone's
attention.
The other thing that can be done is to look at Compiler optimizations -
many of
the indirect function calls can be eliminated, especially if all the source
is compiled
together. Separate compilation of object files requires general interfaces
which
prevent many optimizations.

Also, a note on Java. All Java functions are virtual and all require a
general
interface on a class level. Lastly, all objects are referenced indirectly,
so even an
array of Java objects is two levels of indirection to get the object you
want! So,
Java is even worse than C++.

Mike

As always, comments and criticism welcome.

--
/********************************************************\
* Michael David Nahas na...@virginia.edu *
* Graduate Student in CS University of Virginia *
\********************************************************/


bri...@bit3.com

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

In article <34326556...@virginia.edu>,

Michael D. Nahas <na...@virginia.edu> wrote:
>I'm afraid that I must admit that Maclaren, even with his outrageous
>claim, is correct. Malde has missed the problem - it is not branching,
>but indirection, A.K.A. pointer chasing.
>
>For a standard function, you just branch to the address. With C++'s
>virtual
>functions you either follow a pointer or perform an indexed lookup to a
>jump
>table, which will load the address you will jump to, and then branch to
>that
>address.
>
>Why is indirection bad? When the destination address of a jump is known,
>the processor
>can begin fetching from the new address and continue executing. When the
>destination address of the jump is unknown, the processor usually stalls
>until
>the address comes in from memory. Some processors do have hardware to
>speed up indirect jumps, but the hardware was developed using C benchmarks,
>
>where 90% of indirect jumps are just returns from functions.

The problem is not the double indirection- which can be handled if the CPU
prefetchs data as well as instructions- but the fact that you are doing two
memory accesses now instead of one, and usually widely seperate memory
accesses (most implementations don't AFAIK actually keep the virtual
function tables with the class data, but in a seperate structure shared by
all objects of the same type- which is usually somewhere far away from
where the data ends up). This is a symptom of the fact that memory has not
increased in speed the same way CPUs have (I put 70ns memory into my 386-
granted, no FPM or EDO, but still... And the 386 still needed cache).

The only thing more painfull on performance than leaving cache and waiting on
main memory is leaving main memory and swapping pages into disk. Compilers
(and programmers!) should worry more about locality of reference.

If you are going to blow cache, the question arises "is it worth it?". For
sparse matricies, the answer is obviously yes (the alternative is to swap
like mad)- although the matrix solving programs should be written to be
cache aware, and keep locality of reference up. For virtual functions,
I think the benefits outweigh the gains, although it depends critically on
what you're doing. I note the Fortran is still one of the two best languages
for doing numeric computation in- only recently have there been C compilers
as good as Fortran compilers (and even in these cases you often find they
share an optimizer with a Fortran compiler- examples include SGI's MipsPro
compilers, and HP's compilers).

I don't speak for Bit 3.
Brian


--
"NT and security should never be used in the same breath."
- Winn Schwartau, EETimes #972, 22 September 1997 pp.96

-Brian

Stefan Monnier

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

"Michael D. Nahas" <na...@virginia.edu> writes:
> I'm afraid that I must admit that Maclaren, even with his outrageous
> claim, is correct. Malde has missed the problem - it is not branching,
> but indirection, A.K.A. pointer chasing.

I believe everyone here is aware of the fact that indirection is *the*
problem. The question was "why should indirection be more of an issue with
C/C++ than with Pascal, Algol, ... ?"


Stefan

Nick Maclaren

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

In article <5l90wd2...@tequila.systemsz.cs.yale.edu>,

In a word - "aliasing". With multiple levels of indirection, a conforming
C compiler has to assume that almost every location can be aliased, and
therefore must make a store reference every time. Almost all fancy
optimisations that are possible in those languages will cause at least
some legal C programs to fail. "const" helps a little, and "restrict"
(in C9X) will help a bit more, but at best will bring C up to the very
low level of Fortran. And I am not optimistic about even that.

Michael Nahas also correctly points out that C++, X, Java etc. have MORE
indirections than the equivalent Fortran or 'ordinary' C. This is a matter
of interface style as much as the underlying language, but that doesn't
help at all. We are stuck with both of them.

To respond to the previous posting by bri...@bit3.com, the reason that
the multiple indirection is nasty is precisely because it causes trouble
with data prefetching! Consider A->B->C->D->E. At each stage, the CPU
has to perform the address calculation before it can start prefetching
the next address. Now, this very simple case CAN be done by using a
multi-threaded architecture, with prefetching threads running ahead to
preload the next operands.

But what about (P = A->B ? (Q = P->C ? Q->D : P->E) : A->F)? To handle
THAT, you need speculative pre-execution, which was dabbled with in the
1960s and 1970s, and quite rightly dropped as an abomination. And, if
you look at the source of the X Windowing System, you will see code that
is pretty well equivalent :-(

For efficient use of modern architectures, we need languages in which we
programmers can express what we want to say in a fashion that can be
converted to the sort of instruction stream that executes fast. And
the current trends are in exactly the opposite direction.

P Murray

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Nick Maclaren (nm...@cus.cam.ac.uk) wrote:

: For efficient use of modern architectures, we need languages in which we


: programmers can express what we want to say in a fashion that can be
: converted to the sort of instruction stream that executes fast. And
: the current trends are in exactly the opposite direction.

Any suggestions on what such a language would look like? Hopefully
it will not look like ASM. Would it be possible to modify C/C++ in
such a way as to allow this?

And finally, could anyone recommend ways to code in C/C++ that
avoid these problems and allow the optimiser to produce the
most efficient code? I assume virtual function calls would
be out, as well as all forms of aliasing.

Cheers,
- Peter.

Nick Maclaren

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

In article <60ui2q$33a$1...@cdn-news.telecom.com.au>,

P Murray <p...@vus002.telecom.com.au> wrote:
>Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
>
>: For efficient use of modern architectures, we need languages in which we
>: programmers can express what we want to say in a fashion that can be
>: converted to the sort of instruction stream that executes fast. And
>: the current trends are in exactly the opposite direction.
>
>Any suggestions on what such a language would look like? Hopefully
>it will not look like ASM. Would it be possible to modify C/C++ in
>such a way as to allow this?

Well, I would favour something using the more successful ideas of
Algol 68 :-) But it would have to include some of the concepts of the
stricter functional languages.

>And finally, could anyone recommend ways to code in C/C++ that
>avoid these problems and allow the optimiser to produce the
>most efficient code? I assume virtual function calls would
>be out, as well as all forms of aliasing.

Yes. And, if you can code C without implying aliasing, I take my hat
off to you! The biggest problem with attempting this is that you can
only 'const qualify' at the top level and, for efficient code-generation,
you need to do it at other levels. I believe that the new 'restrict' in
C9X is a sort of recursive 'const', but have not seen a specification.

John Wilson

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

In article <60uek7$fon$1...@lyra.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>For efficient use of modern architectures, we need languages in which we
>programmers can express what we want to say in a fashion that can be
>converted to the sort of instruction stream that executes fast. And
>the current trends are in exactly the opposite direction.

Luckily we still have good ol' assembly language!

Sorry, couldn't resist -- the paragraph above nicely supports my
personal prejudices though. :-)

John Wilson
D Bit

Paul DeMone

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Nick Maclaren wrote:
>
> In article <3431A2...@EasyInternet.net>, Paul DeMone <PaulD...@EasyInternet.net> writes:
> |>
> |> HP and Intel will disclose the native Merced instruction set (IA-64) in a couple of
> |> weeks but if they exploit the advances described in this paper it could really be
> |> something with the potential for dramatically raising the bar of microprocessor
> |> performance and not just typical Intel hype and FUD. Comments?
>
> It will have little or no practical effect, because the dominating
> languages and programming styles have one unpredictable branch or
> data indirection every few instructions. Until and unless C, C++,
> most windowing systems and similar tripe are scrapped, almost all
> systems will run at a miniscule fraction of their potential CPU
> performance. A quick look at Java implies that it probably has the
> same problems, but I cannot swear to it.

I thought much along the same line until I read this paper. some sample code is
the inner loop from Unix utilities. These are really twisty C programs chosen for
their branch-ugliness, i.e. the poor way conventional compilers and machines can
handle them. The job their compiler does on the inner loop of the wc (word count)
program is amazing. It eliminates most branches by replicating basic blocks and
using predication to coalesce around branches and form larger basic blocks. The
combination of loop unrolling and formation of hyperblocks allows a good deal of
instruction level parallelism to be extracted. In the case of wc the use of pred-
ication reduced dynamic branch execution from 572K to 315K while reducing branch
misprediction using the 2-bit scheme from 52K to 56. On the eight issue target CPU
wc was sped up by 5.5x versus a single issue RISC CPU while grep was sped up 4x.
The SPEC92 programs espresso, eqntott, alvinn, ear, and sc were sped up 3.3x, 5.5x,
7.7x, and 5.8x respectively versus a single issue RISC.

Considering that the three issue P6/Pentium II and quad issue RISCs like R10K and
Alpha 21164 probably average only about 2.5x (or less) speed up versus a single
issue RISC the performance of this new architecture with its advanced compiler is
damn impressive! Don't forget that this new design likely can be implemented without
all the overhead of out-of-order machines like P6 and R10K and so can be clocked
much, much faster (maybe even faster than Alpha in the same process). Granted some
of the apparent speedup of the new design will be lost from main memory latency and
bandwidth limitations or icache inefficiency from code bloat but it still could be
a significant step forward.

>
> Nick Maclaren,
> University of Cambridge Computer Laboratory,
> New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
> Email: nm...@cam.ac.uk
> Tel.: +44 1223 334761 Fax: +44 1223 334679

--

Paul DeMone

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Tom Thornhill wrote:
[snip]

> > HP and Intel will disclose the native Merced instruction set (IA-64) in a couple of
> > weeks but if they exploit the advances described in this paper it could really be
>
> Do you have a date?

Approximately 1:00 pm Tuesday Oct 14 (at the 1997 Microprocessor Forum,
Fairmont Hotel in San Jose).

[snip]


> I think you're onto something here - I would expect that if Intel introduce
> a new architecture they'd try to ensure that relatively simple implementations
> of it ( ie those without renaming and out of order execution ), outperform
> agressive implementaions of Alpha AXP ( 21264 ? ). This gives them something
> to scale towards.
>
> Also since they've waited for so long before replacing IA32, you'd expect
> them to come up with something other than an Alpha clone.

Compiler technology and architecture seems to have advanced significantly
since Alpha was designed (itself essentially a classic RISC but streamlined
for superscalar and out-of-order implementations) so Intel would have to
stumble badly not to be able to get some degree of advantage.

>
> --


> Mailto: ttho...@best.ms.philips.com-nospam ( remove the -nospam before replying )
> Talkto: Tom Thornhill, Philips Medical Systems Nederland BV
> Tel +31 4027 64080 (Work), +31 499 338211 (Hotel - after 5pm UK time)

--

Allen Kim

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

Nick Maclaren wrote:
>
> But what about (P = A->B ? (Q = P->C ? Q->D : P->E) : A->F)? To handle
> THAT, you need speculative pre-execution, which was dabbled with in the
> 1960s and 1970s, and quite rightly dropped as an abomination. And, if
> you look at the source of the X Windowing System, you will see code that
> is pretty well equivalent :-(
>
> For efficient use of modern architectures, we need languages in which we
> programmers can express what we want to say in a fashion that can be
> converted to the sort of instruction stream that executes fast. And
> the current trends are in exactly the opposite direction.

Despite the fact that you guys are coming pretty close to speaking pure
"Greek" here (or is it "geek"?), I'll try and add a little to the
discussion.

My impression with Merced and VLIW-like architectures which move branch
optimization out to the compiler is that instructions like the one above
can be solved by very advanced techniques which effectively turn pieces
of code into "trees" which each calculate a number of possible outcomes
at once. Now I know that with that statement, I probably distorted the
actual concept, but I'm hoping you'll understand. Anyway, trees like
this would involve multiple memory accesses at once, which could be the
potential bottleneck unless some very sophisticated caching and bus
transaction mechanisms are developed.

See this link for more info:

http://www.research.ibm.com/vliw/home.html

Disclaimer: I don't speak officially for Intel, and I don't work on the
Merced processor itself. I *do* work on the chipset which will support
Merced, but that doesn't mean I know the intricate details of the
processor.

--
Allen Kim
alle...@scic.intel.com

Jim Hull

unread,
Oct 1, 1997, 3:00:00 AM10/1/97
to

> is Tom Thornhill
> > is Paul DeMone

> > HP and Intel will disclose the native Merced instruction set (IA-64)

> > in a couple of weeks ...

> Do you have a date?

October 14, at Microprocessor Forum.

For details, check:

http://www.MDRonline.com/mpf

--
Jim Hull

"If so strong the force in Yoda is, construct a sentence with words in
the proper order then why can't he?" -- Teng-Kiat Lee

Vesa Karvonen

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

Nick Maclaren wrote:
> In article <60ui2q$33a$1...@cdn-news.telecom.com.au>,
> P Murray <p...@vus002.telecom.com.au> wrote:
> >Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
> >
> >: For efficient use of modern architectures, we need languages in which we

> >: programmers can express what we want to say in a fashion that can be
> >: converted to the sort of instruction stream that executes fast. And
> >: the current trends are in exactly the opposite direction.
> >
> >Any suggestions on what such a language would look like? Hopefully
> >it will not look like ASM. Would it be possible to modify C/C++ in
> >such a way as to allow this?
>
> Well, I would favour something using the more successful ideas of
> Algol 68 :-) But it would have to include some of the concepts of the
> stricter functional languages.
>
> >And finally, could anyone recommend ways to code in C/C++ that
> >avoid these problems and allow the optimiser to produce the
> >most efficient code? I assume virtual function calls would
> >be out, as well as all forms of aliasing.
>
> Yes. And, if you can code C without implying aliasing, I take my hat
> off to you! The biggest problem with attempting this is that you can
> only 'const qualify' at the top level and, for efficient code-generation,
> you need to do it at other levels. I believe that the new 'restrict' in
> C9X is a sort of recursive 'const', but have not seen a specification.

Being an assembly language programmer before learning C, I used
to think a bit like you are thinking after seeing the bad code
produced by aliasing problems and the like in C. However, after
reading "The C++ Programming Language" (second edition) by
Bjarne Stroustrup, and seeing that he actually tells you which
language constructs generate extra overhead, I have to disagree
about leaving language features out simply because they are "bad
for the optimizer/processor"; you can always tell the programmers
what is bad for the optimizer and the processor. Leaving out
useful language constructs, like virtual functions, will only
make the language hard to use, by modern standards, and restricts
the useability of the language by making certain applications
too difficult to implement in the language.

With the exception of aliasing problems and problems with
restructuring floating point expressions, I see very little
room for defining a "faster" C (for single processor systems
anyway).

The "restrict" qualifier would/will fix the aliasing problem
in C. Of course, the problem is that a lot of old code has to
be modified, which is usually rather costly.

It might not even be necessary to have "restrict", since
some of the worse aliasing problems could also be fixed by
specifying some kind of syntax for parallel execution of
statements/expressions. Such a syntax could come very handy
in vector code, for instance.

Most optimizing C compilers have optimization switches that
can be used to fix both the aliasing and the floating point
expression restructuring problems.

--
<-->
Vesa Karvonen | If I were speaking for my
Housemarque Games, Inc. | employer, you would not be
http://www.housemarque.com | reading this...

Joseph H Allen

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

In article <3432F8...@HousemarqueREM.fi>,
Vesa Karvonen <Vesa.K...@HousemarqueREM.fi> wrote:
>Nick Maclaren wrote:

>> Yes. And, if you can code C without implying aliasing, I take my hat
>> off to you! The biggest problem with attempting this is that you can
>> only 'const qualify' at the top level and, for efficient code-generation,
>> you need to do it at other levels. I believe that the new 'restrict' in
>> C9X is a sort of recursive 'const', but have not seen a specification.

More complicated key words. Yuck, that's just what C needs.

>Being an assembly language programmer before learning C, I used
>to think a bit like you are thinking after seeing the bad code
>produced by aliasing problems and the like in C.

I'm still waiting for someone to try a hardware fix for the aliasing
problem. One way is to make the internal registers more like a cache than
they are now. Then when you load a word into a register, you also load the
memory address where the word came from into some auxiliary part of the
register. The register snoops the bus at this point: if a subroutine
modifies the memory at that address, a bit in the register is set. If the
caller then accesses that register you get a trap or delay and a reload
occurs. Using this, you can have your compiler safely assume that there are
no aliases, and avoid doing tons of reloads after every subroutine call.

Of course this only solves the problem of aliases which depend on a single
memory location as their source, not complicated values which depend on
multiple memory locations. But in a language like C++, I would guess that
most of the aliases are like this anyway (like the base addresses of
objects, or whatever). I suppose you could even have more general alias
detection hardware which would allow you to detect if any of the sources of
a derived value were modified- it would just cost more.

Other hardware could conceivably solve more general alias and dependancy
problems than those caused by non-local subroutine calls. For example:

if a==z then x=5
y=x+3

causes a pipeline stall of some sort, because y is an alias for x, but
'y=x+3' can not occur until it is known whether the previous conditional is
going to modify x. The compiler can't do this because a and z could be
arbitrary non-constant values.

But with extra hardware, there are possibilities. Maybe you do y=x+3 at the
same time that you do the if(a==z), but also have a 'y=8' on standby, in
case the x=5 ended up happening. A branchless version of the above code
could be done with a few extra interesting flag bits, instructions which
occur only conditionally as on the ARM and a compiler which is aware of the
processor's pipeline structure.

--
/* jha...@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Eric Hildum

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

Michael D. Nahas wrote:

>
> Ketil Z Malde wrote:
>
> So, is this a problem, and if so, how do we fix it. Yes, it is a problem,
> because any
> stall to the pipeline which prevents speculative execution is bad. How do
> we
> solve it? The first thing to do is to make programs which have this
> problem part
> of the SPEC Benchmarks. This will make it the focus of everyone's
> attention.
> The other thing that can be done is to look at Compiler optimizations -
> many of
> the indirect function calls can be eliminated, especially if all the source
> is compiled
> together. Separate compilation of object files requires general interfaces
> which
> prevent many optimizations.
>

No, it only only requires a reasonable object code format and linker.
That is, it requires a link editor such as exists under VMS which is
capable of selecting the branch or call instruction and addressing, not
the trival symbol table lookup and drop the bytes in the slot that Unix
linkers perform.

--
"The idea that Bill Gates has appeared like a knight in shining armour
to lead all customers out of a mire of technological chaos neatly
ignores the fact that it was he who, by peddling second-rate technology,
led them into it in the first place."-Douglas Adams

Eric Hildum
Eric....@Japan.NCR.COM

Torben AEgidius Mogensen

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

p...@vus002.telecom.com.au (P Murray) writes:

>Nick Maclaren (nm...@cus.cam.ac.uk) wrote:

>: For efficient use of modern architectures, we need languages in which we
>: programmers can express what we want to say in a fashion that can be
>: converted to the sort of instruction stream that executes fast. And
>: the current trends are in exactly the opposite direction.

>Any suggestions on what such a language would look like? Hopefully
>it will not look like ASM. Would it be possible to modify C/C++ in
>such a way as to allow this?

A possibility is SML using the region-based memory model. You would
still do pointer cahsing, but the regions model keep a fairly good
locality (elements of the same list stay close, even when other data
is allocated in-between). In SML you also avoid most of the aliasing
problems of C/C++. The technology still needs a few years to be
competitive with C. One problem is that the polymorphic type system in
SML introduces extra indirections (boxing). This can be solved if you
compile the entire program in one job, as the polymorphism can be
eliminated when all uses are known. Though, theoretically, this can
give an exponetial blow-up in code size, experiments show that the
code expansion is usually small.

You can use the regions model on C-like languages, but you would have
to give up on a lot of the 'dirty' features of C, such as type
casting and varargs.

>And finally, could anyone recommend ways to code in C/C++ that
>avoid these problems and allow the optimiser to produce the
>most efficient code? I assume virtual function calls would
>be out, as well as all forms of aliasing.

It is not enough that you avoid aliasing. The compiler must be able to
recognize that there is none. And this is not often possible,
especially if you use separate compilation. If a language is designed
such that aliasing doesn't occur or only does so in easily
identifiable contexts (e.g. array lookups), the compiler can use this
assumption safely, even without complex and inaccurate analysis.

Torben Mogensen (tor...@diku.dk)

Wilco Dijkstra

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

Joseph H Allen wrote:

> I'm still waiting for someone to try a hardware fix for the aliasing
> problem. One way is to make the internal registers more like a cache than
> they are now. Then when you load a word into a register, you also load the
> memory address where the word came from into some auxiliary part of the
> register. The register snoops the bus at this point: if a subroutine
> modifies the memory at that address, a bit in the register is set. If the
> caller then accesses that register you get a trap or delay and a reload
> occurs. Using this, you can have your compiler safely assume that there are
> no aliases, and avoid doing tons of reloads after every subroutine call.

I don't think it's a good idea to try to create expensive hardware
solutions
to the aliasing problem - it's software that created it in the first
place!
You just need to find the cases where aliasing is appropriate and ban
all other
uses. This is similar to gotos.
It's my belief that almost all aliasing is avoidable at low cost during
compilation. For example most aliasing is due to function calls. With
global
aliasing analysis the complete set of aliases can be calculated. Of
course
a good language which avoids most of the aliasing problems (eg. the ADA
parameter
passing mechanism) would help!

Nick Maclaren

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

In article <6108a1$1...@news-rocq.inria.fr>, har...@pauillac.inria.fr (Robert Harley) writes:
|> Nick Maclaren (nm...@cus.cam.ac.uk) writes:
|> >[...] The biggest problem with attempting this is that you can

|> >only 'const qualify' at the top level and, for efficient code-generation,
|> >you need to do it at other levels.
|>
|> Huh? What's wrong with 'double const*const*const triMat'?

How do you get a 'double *** triMat' to that? You have to put a
cast in, which essentially says "don't do ANY checking", and would
be equally happy with a 'double ** triMat' or 'double **** triMat'
argument.

|> >I believe that the new 'restrict' in
|> >C9X is a sort of recursive 'const', but have not seen a specification.
|>

|> The object accessed via pointers based on a given restricted pointer
|> is only accessed in that way. No recursive 'const' in sight.

Oh, hell. Well, that's not going to help a lot then, except for the
simplest of cases (which, admittedly, do include the BLAS).

Patrick F. McGehearty

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

In article <60vj9p$7...@vidar.diku.dk>,

Torben AEgidius Mogensen <tor...@diku.dk> wrote:
>p...@vus002.telecom.com.au (P Murray) writes:
>
>>Nick Maclaren (nm...@cus.cam.ac.uk) wrote:
>
>>: For efficient use of modern architectures, we need languages in which we
>>: programmers can express what we want to say in a fashion that can be
>>: converted to the sort of instruction stream that executes fast. And
>>: the current trends are in exactly the opposite direction.
>
>>Any suggestions on what such a language would look like? Hopefully
>>it will not look like ASM. Would it be possible to modify C/C++ in
>>such a way as to allow this?
>
>A possibility is SML using the region-based memory model. You would
>still do pointer cahsing, but the regions model keep a fairly good
>locality (elements of the same list stay close, even when other data
>is allocated in-between). In SML you also avoid most of the aliasing
>problems of C/C++. The technology still needs a few years to be
>competitive with C. One problem is that the polymorphic type system in
>SML introduces extra indirections (boxing). This can be solved if you
>compile the entire program in one job, as the polymorphism can be
>eliminated when all uses are known. Though, theoretically, this can
>give an exponetial blow-up in code size, experiments show that the
>code expansion is usually small.
>

Code expansion is not the problem. Compile time is.
Large development projects involve 100's of modules of code
in the 100,000 to 1,000,000 lines of code range. It only takes one
unbounded non-linear optimization algorithm to cause full-program compiles
to take so long that you start worrying about the halting problem. That is,
after a couple of days of computing, you don't know if the compiler is
in an infinite loop or still computing. The "couple of days" is not an
exaggeration. I've seen a 1000 line program take over a day due to a
single unbounded non-linear optimization algorithm.

As someone working professionally in compiler performance for the last few
years, I'll take any help I can get from language definition to bound the
amount of code that must be examined to determine if an optimization is
safe. Since I've been in graduate school (over 20 years ago), I've been
hearing about how this language feature or that can be optimized, but I've
seldom seen these optimizations actually be performed. It takes a lot of
work to cover all the cases, and rarely used language features are often
ignored when it comes time to enhance optimizers.

Nick is absolutely dead-on when he identifies the aliasing problem as one of
the most troubling in C. Fortran avoids that problem by defining it away.
Users are seldom aware of the compiler switches which say "ignore aliasing".
In addition, the switches are different for different platforms, so standard
make files for multiplatform products don't use them. A plausible
compromise for future languages would be to have a default that assumes
Fortran aliasing rules and to require a declaration to allow aliasing when
it is needed. Then most of the time, the compiler can generate fast code
and only when aliasing is strictly needed will optimization be degraded.

On the C++ virtual function issue, what is needed is more training of C++
programmers to avoid inefficient constructs except when they are really
necessary, and to provide whatever language advisories possible to help the
compiler along. While processor speeds continue to increase at exponential
rates, memory latencies are improving much more slowly. Ten years ago, the
cost of a non-cache memory access was less than 10 instructions. Four years
ago, it was less than 40 instructions. Now it is well over 100 instructions
on super-scalar architectures. The exact numbers will vary depending on
which architectural family you look at, but the trend line is clear. And
everything I'm aware of indicates that it will only get worse in the future.
Virtual functions that cannot be optimized to static addresses at compile or
link time are the enemy of performance, and future processor speed
improvements will only make situation more extreme.

- Patrick McGehearty
pat...@convex.hp.com

Robert Harley

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

Nick Maclaren (nm...@cus.cam.ac.uk) writes:
>[...] The biggest problem with attempting this is that you can
>only 'const qualify' at the top level and, for efficient code-generation,
>you need to do it at other levels.

Huh? What's wrong with 'double const*const*const triMat'?

>I believe that the new 'restrict' in
>C9X is a sort of recursive 'const', but have not seen a specification.

The object accessed via pointers based on a given restricted pointer
is only accessed in that way. No recursive 'const' in sight.

-- Rob.

Robert Harley

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

Nick Maclaren (nm...@cus.cam.ac.uk) wrote:

>har...@pauillac.inria.fr (Robert Harley) writes:
>> Nick Maclaren (nm...@cus.cam.ac.uk) writes:
>> >[...] The biggest problem with attempting this is that you can
>> >only 'const qualify' at the top level and, for efficient code-generation,
>> >you need to do it at other levels.
>>
>> Huh? What's wrong with 'double const*const*const triMat'?
>
>How do you get a 'double *** triMat' to that? You have to put a cast in,

Well sure, if for some strange reason you wanted to turn a
pointer-pointer-pointer into a pointer-pointer, C allows it via a cast.
But why on earth would you want to do that? I can only assume that
you meant 'double ** triMat' in which case assign it:
double const*const*const constTriMat;
constTriMat = triMat;

or pass it as an argument:
double myFunc(double const*const*const bla);
myFunc(triMat);

or something. Type qualifiers don't affect type compatibility, so
it's fine to add a 'const' in this manner. It's also OK discard a
'const' but compilers give a warning if you do so.


>which essentially says "don't do ANY checking", [...]

Whereas your beloved Fortran checks that the user's assumptions about
lack of aliasing are correct, perhaps?


>> The object accessed via pointers based on a given restricted pointer

>> is only accessed in that way. [...]


>
>Oh, hell. Well, that's not going to help a lot then, except for the

>simplest of cases [...]

Huh?

-- Rob.

Nick Maclaren

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

In article <610rbh$5...@news-rocq.inria.fr>,
Robert Harley <har...@pauillac.inria.fr> wrote:

[ Correctly pointing out that I had miscounted the number of '*'s in
the previous posters C type definition. ]

>... Type qualifiers don't affect type compatibility, so


>it's fine to add a 'const' in this manner. It's also OK discard a
>'const' but compilers give a warning if you do so.

This is getting rather off-group, but I don't want anyone to be taken in.
Type qualifiers DO affect type compatibility, your example is incorrect
C, and many compilers will refuse to compile it at all.

I could explain why such constructions are needed, and the consequences
for both efficiency and 'software engineering', but these newgroups are
not appropriate places for that.

Andrew Glew

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to Nick Maclaren

> Well, I would favour something using the more successful ideas of
> Algol 68 :-) But it would have to include some of the concepts of the
> stricter functional languages.

Nick: you and I both love Algol68!

But when I have looked at Algol68 compiler produced assembly code, it looks
remarkably like C++.

The best thing that Algol68 had going for it is that it allowed many of the
virtual function decisions to be made lexically, at compile time. But only, I
believe,
in the absence of separate compilation (or in the presence of a smart linker).

Stanley Chow

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

In article <610dd5$ia0$1...@lyra.csx.cam.ac.uk>,
Nick Maclaren <nm...@cus.cam.ac.uk> wrote:

>In article <6108a1$1...@news-rocq.inria.fr>, har...@pauillac.inria.fr (Robert Harley) writes:
>|> Nick Maclaren (nm...@cus.cam.ac.uk) writes:
>|> >[...] The biggest problem with attempting this is that you can
>|> >only 'const qualify' at the top level and, for efficient code-generation,
>|> >you need to do it at other levels.
>|>
>|> Huh? What's wrong with 'double const*const*const triMat'?
>
>How do you get a 'double *** triMat' to that? You have to put a
>cast in, which essentially says "don't do ANY checking", and would
>be equally happy with a 'double ** triMat' or 'double **** triMat'
>argument.

Or if the "double ***" is a type macro somewhat that may or may not
be changed from compile to compile.


--
Stanley Chow; sc...@nortel.ca, (613) 763-2831
Nortel/BNR, PO Box 3511 Station C, Ottawa, Ontario
Me? Represent other people? Don't make them laugh so hard.

Jon Leech

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

In article <60uek7$fon$1...@lyra.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>For efficient use of modern architectures, we need languages in which we
>programmers can express what we want to say in a fashion that can be
>converted to the sort of instruction stream that executes fast. And
>the current trends are in exactly the opposite direction.

Perhaps language designers are more concerned with efficient use of
programmers than efficient use of hardware. What we need to do on today's
hardware may be completely inappropriate 10 years down the road, while
fundamental data structures remain pretty much the same.
Jon
__@/

bri...@bit3.com

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

>On the C++ virtual function issue, what is needed is more training of C++
>programmers to avoid inefficient constructs except when they are really
>necessary, and to provide whatever language advisories possible to help the
>compiler along. While processor speeds continue to increase at exponential
>rates, memory latencies are improving much more slowly. Ten years ago, the
>cost of a non-cache memory access was less than 10 instructions. Four years
>ago, it was less than 40 instructions. Now it is well over 100 instructions
>on super-scalar architectures. The exact numbers will vary depending on
>which architectural family you look at, but the trend line is clear. And
>everything I'm aware of indicates that it will only get worse in the future.
>Virtual functions that cannot be optimized to static addresses at compile or
>link time are the enemy of performance, and future processor speed
>improvements will only make situation more extreme.

There's a reason Java defined all methods as virtual methods- and why it was
bad that C++ didn't. One of the big advantages of OO programming is code
reuse- the problem with this is that you never know, when you write an object,
what uses you may put that object to months or years from now. In fact, it
is the mark of a good object design when it's used in ways the original
author never thought of. The cost of this is (currently) an extra memory
reference, which is almost certainly not cached, and possibly even swapped
out.

We're trying to get away from the model where "this code will only be used
in this way in this program" is the norm. Two technologies that may make
the situation better, as I see it:

1) Just in time compliation, which can say "In the general case,
this needs to be a virtual function, but in this specific case I know it
will always be this function."

2) Large, native system libraries, which should be written and
compiled by people who really know the system, and only for that system,
probably with -no-aliasing flags set (among others).

But, in the end, we still have the basic problem: CPUs are way faster than
main memory and getting faster. _This_ is the problem, not "bad programming".
There is a solution to it (faster memory), but wether or not it's worth it
will be determined by the market. And we are seeing, at least in the
workstations, more attention being paid to the memory architectures to help
allieviate this problem. You're also seeing bigger caches, the other way
to allieviate the problem.

I don't speak for Bit 3.

Brian.

>
>- Patrick McGehearty
>pat...@convex.hp.com

Stefan Monnier

unread,
Oct 2, 1997, 3:00:00 AM10/2/97
to

nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> In article <5l90wd2...@tequila.systemsz.cs.yale.edu>,
> Stefan Monnier <monnier+comp/arch/news/@tequila.cs.yale.edu> wrote:
> >I believe everyone here is aware of the fact that indirection is *the*
> >problem. The question was "why should indirection be more of an issue with
> >C/C++ than with Pascal, Algol, ... ?"
> In a word - "aliasing". With multiple levels of indirection, a conforming
> C compiler has to assume that almost every location can be aliased, and
> therefore must make a store reference every time. Almost all fancy

I'm sorry, but I can't seem to remember that Pascal disallows aliasing (I
honestly don't know about Algol). Furthermore, do you really believe that
aliasing information would help for GUI-like code ?

Do you really think the problem is due to coding style (like the Xwin code) and
to programming languages (like C) rather than to what is being coded (like a
user interface rather than a numerical simulation) ?

> For efficient use of modern architectures, we need languages in which we
> programmers can express what we want to say in a fashion that can be
> converted to the sort of instruction stream that executes fast. And
> the current trends are in exactly the opposite direction.

Because the trend is to say "CPUs are fast enough, let's care about
programmer's convenience". You seem to care about Fortran code, so probably
more numerical-oriented, in which case convenience is still a secondary concern
whereas speed is paramount. I can only agree to the fact that languages should
be developped to better take advantage of current (and future) technology, but
programmer's convenience cannot be sacrificed so easily so it's a very hard
design problem and I still haven't heard of any convincing proposition (the
best I've heard is in the NA area where Sisal sounds pretty interesting).


Stefan

Richard A. O'Keefe

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

Stefan Monnier <monnier+comp/arch/news/@tequila.cs.yale.edu> writes:

>I'm sorry, but I can't seem to remember that Pascal disallows aliasing (I
>honestly don't know about Algol). Furthermore, do you really believe that
>aliasing information would help for GUI-like code ?

Do you mean "code like that which USES a GUI" or
"code like that which IMPLEMENTS a GUI"?

I note that one of my favourite languages (Clean) has an elegant
GUI *using* interface which has complete freedom from aliasing.
Clean is a *pure* lazy functional language, there are no assignments
at all. The intention is that the compiler will _implement_
"make a record/array just like this one except for these few
differences" destructively, _but_ only when it is safe, and the
type system tells the compiler at compile time which references
are the only reference to such a datum. Let me offer an example:

:: Complex = { re :: Real, im :: Real }

conj_in_place : *Complex -> *Complex
conj_in_place c=:{im=i} = {c & im= ~i}

The * here in front of the type before the arrow means that a call to
this function is legal only if the compiler can prove that this is the
_only_ reference remaining to that record. The * in front of the type
after the arrow means that there will be no other references to the
result. Because the argument is 'dead' after the call, the compiler
is allowed to update it in place.

The uniqueness type system in Clean is surprisingly easy to use, and
it means that the compiler can be *certain* that aliasing is not taking
place.

(To be fair, since Clean is a higher order language with Haskell-ish
type classes, it has its own fair share of calling functions via pointers.)

As for code that *implements* a GUI, the bit-shuffling stuff that
gets done in the server would indeed benefit from a language that
allows more and better optimisation.

>Do you really think the problem is due to coding style (like the Xwin
>code) and to programming languages (like C) rather than to what is being
>coded (like a user interface rather than a numerical simulation) ?

Yep. I keep on pointing out to students here (with examples) that the
*really* big speedups come from high level reasoning that eliminates
operations completely, rather than from *manual* low-level fiddling.

Computer hardware has changed *immensely* in my lifetime. My intuition
about what is fast and what is slow keeps needing dramatic revision.
The one guideline that still works after 20 years is "keep your code as
clean as possible so that the compiler has a chance of figuring out
what's supposed to happen". (Well, there's another one: keep your
working set small; avoid going to slow memory if you can; if you can't
avoid that, avoid the disc; if you can't avoid disc, avoid tape; and
whatever you do, don't use paper tape.)

>> For efficient use of modern architectures, we need languages in which we
>> programmers can express what we want to say in a fashion that can be
>> converted to the sort of instruction stream that executes fast. And
>> the current trends are in exactly the opposite direction.

>Because the trend is to say "CPUs are fast enough, let's care about
>programmer's convenience".

Hmm. Current trends include C, C++, and Java, which are all of them
remarkably opposed to the programmer's convenience. (I have a hard time
believing how clumsy and verbose Java is.)

--
John Æneas Byron O'Keefe; 1921/02/04-1997/09/27; TLG,TLTA,BBTNOTL.
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.

Steven Correll

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

In article <610gbg$k...@deadzone.rsn.hp.com>,

Patrick F. McGehearty <pat...@convex.COM> wrote:
>On the C++ virtual function issue, what is needed is more training of C++
>programmers to avoid inefficient constructs except when they are really
>necessary, and to provide whatever language advisories possible to help the
>compiler along...

>Virtual functions that cannot be optimized to static addresses at compile or
>link time are the enemy of performance, and future processor speed
>improvements will only make situation more extreme.

Telling C++ programmers in 1997 to avoid virtual functions because they
make the program run slower is a lot like telling Fortran programmers
in 1970 to avoid subroutine calls for that same reason. They are all
too likely to follow your advice, and given the choice of maintaining a
ten-year-old C++ program with "virtual" missing from most but not all
of its functions, versus a twenty-year-old Fortran program which was
written as a single 100000 line subroutine, I might choose the Fortran
program--at least it's provably possible to restructure it based solely
on static information. But probably I'd just look for a new line of
work, like clearing stuck toilets.
--
Steven Correll == PO Box 66625, Scotts Valley, CA 95067 == s...@netcom.com

Richard A. O'Keefe

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

har...@pauillac.inria.fr (Robert Harley) writes:
>Well sure, if for some strange reason you wanted to turn a
>pointer-pointer-pointer into a pointer-pointer, C allows it via a cast.
>But why on earth would you want to do that? I can only assume that
>you meant 'double ** triMat' in which case assign it:
> double const*const*const constTriMat;
> constTriMat = triMat;

>or pass it as an argument:
> double myFunc(double const*const*const bla);
> myFunc(triMat);

Doesn't work.

There's an illuminating discussion in the rationale of the POSIX.1
standard about why they didn't chuck a const or two into the prototypes
for the exec() functions.

"Specifying two levels of const-qualification for the argv[] and envp[]
parameters for the exec functions may seem to be the natural choice,
given that these functions do not modify either the array of pointers
or the characters to which the function (sic) points, but this would
disallow existing correct code. Instead, only the array of pointers
is noted as constant. The table of assignment compatibility for
dst = src, derived from the C standard, summarizes the compatibility:

dst const char * const char
src char *[] char *[] const [] * const []
char *[] VALID VALID
const char *[] VALID VALID
char * const [] VALID
const char * const [] VALID

" Entries not marked VALID are not valid at all.

THIS is Nick Maclaren's point. Given
char **triMat;
the declaration
double const * const * const constTriMat = triMat;
is simply illegal.

It's illegal because there are plausible implementations where it won't
work. Anyone who thinks it does clearly hasn't bothered to read the
standard, or even a good textbook (not there are many).

>Whereas your beloved Fortran checks that the user's assumptions about
>lack of aliasing are correct, perhaps?

Yes, there are indeed static checkers available for Fortran that will
check such assumptions. I have not used any myself, but I am aware of
two.

Michael Williams

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

In article <610gbg$k...@deadzone.rsn.hp.com>,
Patrick F. McGehearty <pat...@convex.COM> wrote:
> That is,
>after a couple of days of computing, you don't know if the compiler is
>in an infinite loop or still computing. The "couple of days" is not an
>exaggeration. I've seen a 1000 line program take over a day due to a
>single unbounded non-linear optimization algorithm.

I once discovered a case where our compiler's CSE would take O(e^N)
to compile code, where N was the number of variables in the expression.

(The code was x5=x1+x2+x3+x4; x6=x2+x3+x4+x5; ... x4=xN+x1+x2+x3.)

I calculated that a simple 52 line program would take order of 250
years per line to compile. Needless to say, it got fixed.

Mike.

Robert Harley

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

Richard A. O'Keefe (o...@goanna.cs.rmit.edu.au) wrote:
>Doesn't work.
>
>There's an illuminating discussion in the rationale of the POSIX.1
>standard [...]

>
>THIS is Nick Maclaren's point. Given
> char **triMat;
>the declaration
> double const * const * const constTriMat = triMat;
>is simply illegal.

Sure, and it's "simply illegal" to assume that integer arithmetic is
done in binary.


>It's illegal because there are plausible implementations where it won't work.

>[...ad hominem deleted...]

But guess what, there are other implementations that are not only
plausible but actually exist! And they have implementation-dependent
features left open by the standard, and extensions not specified by
the standard! Gasp!

Over here in the real world, we compile our programs with them:
compilers for which integers are in binary and type qualifiers
give warnings sometimes, not errors. If you are allergic to such a
warning, stick in a cast as Nick suggested.


>>Whereas your beloved Fortran checks that the user's assumptions about
>>lack of aliasing are correct, perhaps?
>
>Yes, there are indeed static checkers available for Fortran that will
>check such assumptions. I have not used any myself, but I am aware of
>two.

Ah, so in the case of C refer to the standard only, in order to
criticize limitations in it. But for other languages, dredge up extra
features you "are aware of" in implementations you've never used.

Everybody is fooled by this devastating ploy, so we'll now delete our
programs, apps and O.S.es, the transport protocols that will send this
message and so on, and rewrite them in a superior language.

-- Rob.
.-. Robert...@inria.fr .-.
/ \ .-. .-. / \
/ \ / \ .-. _ .-. / \ / \
/ \ / \ / \ / \ / \ / \ / \
/ \ / \ / `-' `-' \ / \ / \
\ / `-' `-' \ /
`-' Linux + 500MHz Alpha + 256MB SDRAM = heaven `-'

Alex Colvin

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

>Hmmm... I think Pascal had the right approach when it came to reference
>parameters: the effects of passing the same variable into a function
>via more than one VAR parameter were "undefined", which left the
>language implementor more scope for optimization.

Last time I mentioned this, I was informed that this restrictition had
been dropped.

Support for aliasing may be a good thing. Undefining the effect of
aliasing looks likely to lead to seriously obscure bugs, unless you can
check this behavior at compile time.

--
Alex Colvin
alex....@dartmouth.edu


Mike Smith

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

Nick Maclaren <nm...@cus.cam.ac.uk> wrote in article
<60uek7$fon$1...@lyra.csx.cam.ac.uk>...

>
> But what about (P = A->B ? (Q = P->C ? Q->D : P->E) : A->F)? To handle
> THAT, you need speculative pre-execution, which was dabbled with in the
> 1960s and 1970s, and quite rightly dropped as an abomination. And, if

Eh? Somebody better tell this to the people at Intel, AMD,
and Cyrix, all of which have processors that perform specul-
ative execution, IIRC.

--Mike Smith

Andrew Glew

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

Y'all may be in a terminology battle:

Current usage:

Speculative execution = executing instructions before you know for certain that
you need to execute them, usually as a result of branch prediction (but also
assuming cache hit, no fault, etc., etc.) Usually speculative execution implies
only one predicted path.

Eager execution = executing both sides of some flakey branches.


Stefan Monnier

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

Wilco Dijkstra <Wilco.D...@NOSPAMarm.com> writes:

> With global aliasing analysis the complete set of aliases can be
> calculated.

A problem that requires global analysis to be solved is a BIG problem that
should be gotten rid of instead of "solved". Especially because global analysis
is not possible in some settings (that I consider as desirable, by the way).


Stefan

Stefan Monnier

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

jha...@world.std.com (Joseph H Allen) writes:
> I'm still waiting for someone to try a hardware fix for the aliasing
> problem.

It's not so simple: knowledge of "no-aliasing" is not used just to spare a
reload at each use of some value, it's used for dependency analysis to allow
code reordering.


Stefan

Nick Maclaren

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

In article <01bcd02b$58e67280$0200000a@kld_mcs>,

Please don't confuse preloading and branch prediction with speculative
execution. I gave a very simple example, which could be done solely by
preexecuting address calculation instructions but, in general, that is
not the case. I am pretty sure that none of the processors you mention
do any significant amount of speculative execution.

Rob Barris

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

In article <613p2b$g1r$1...@lyra.csx.cam.ac.uk>, nm...@cus.cam.ac.uk (Nick
Maclaren) wrote:

> In article <01bcd02b$58e67280$0200000a@kld_mcs>,
> Mike Smith <kld_m...@earthlink.nospam.net> wrote:
> >Nick Maclaren <nm...@cus.cam.ac.uk> wrote in article
> ><60uek7$fon$1...@lyra.csx.cam.ac.uk>...
> >>
> >> But what about (P = A->B ? (Q = P->C ? Q->D : P->E) : A->F)? To handle
> >> THAT, you need speculative pre-execution, which was dabbled with in the
> >> 1960s and 1970s, and quite rightly dropped as an abomination. And, if
> >
> >Eh? Somebody better tell this to the people at Intel, AMD,
> >and Cyrix, all of which have processors that perform specul-
> >ative execution, IIRC.
>
> Please don't confuse preloading and branch prediction with speculative
> execution. I gave a very simple example, which could be done solely by
> preexecuting address calculation instructions but, in general, that is
> not the case. I am pretty sure that none of the processors you mention
> do any significant amount of speculative execution.

I think what goes on inside processors such as PentiumPro / Pentium-II
as well as PowerPC-604 (and several others, Alpha included? dunno) very
much qualifies as "speculative execution", perhaps it would help if you
shared your definitions of "speculative" and "significant".

The fact that these processors employ register renaming and are able to
generate results speculatively and later choose whether or not to make
those results part of the visible state certainly sounds like "speculative
execution" to me.

Rob Barris Quicksilver Software Inc. rba...@quicksilver.com
* Opinions expressed not necessarily those of my employer *

Jason....@anu.edu.au

unread,
Oct 3, 1997, 3:00:00 AM10/3/97
to

In <3433AA...@NOSPAMarm.com> Wilco Dijkstra <Wilco.D...@NOSPAMarm.com> writes:

>It's my belief that almost all aliasing is avoidable at low cost during

>compilation. For example most aliasing is due to function calls. With
>global

>aliasing analysis the complete set of aliases can be calculated. Of
>course
>a good language which avoids most of the aliasing problems (eg. the ADA
>parameter
>passing mechanism) would help!

Hmmm... I think Pascal had the right approach when it came to reference


parameters: the effects of passing the same variable into a function
via more than one VAR parameter were "undefined", which left the

language implementor more scope for optimization. The mechanism
was perhaps less important than the simple fact that there was no
guarantee that reference parameters would work correctly in the presence
of aliasing, so you didn't do it.

Conclusion: C/C++ would be nicer languages if they had much more undefined
behaviour. :-) :-)

(Not having any C++ standards around, I ought to ask:
Are C++ references guaranteed to work in the presence of aliasing?)
_____________________________________________________________________________
Jason Ozolins (Jason....@anu.edu.au) | Put your hands over your eyes.
Teaching and Learning Technology Support Unit| Jump out of the plane.
Australian National University | There is no pilot.
My opinions, not ANU's. | You are not alone.
--

H. Eckert

unread,
Oct 4, 1997, 3:00:00 AM10/4/97
to

nm...@cus.cam.ac.uk (Nick Maclaren):
> I am, unfortunately, not joking when I say that there are critical paths
> in the X Windowing System etc. that need that sort of optimisation for
> efficiency.

If you define "efficiency" as getting the maximum possible data
throughput from a given machine, you're probably right since a general
purpose machine cannot suit every task equally well. But what does
"efficiency" mean to a GUI ? It is certainly nice to have a machine
that does every graphics operation faster than you can visibly follow
but that alone doesn't help a lot...

> Most foully, the paths can run ahead at different rates, and you
> STILL have to get interruption and recovery right :-(

So you need some synchronisation points where you lose some possible
CPU cycles in NOPs. Some friction is always present; the task for
optimization is to reduce them as much as possible...

> 2) EITHER Intel have reinvented the "Model 91 pipe-line drain"
> instruction under another name OR I could write a valid C program that
> would execute incorrectly.

Do you care to elaborate ? IMHO this would be a sign for a broken
compile which produces incorrect code.

> My point about languages is that it becomes possible to implement reliable
> speculative execution relatively easily, IF you can assume that certain
> constraints are met. In particular, you know that the operands may be
> invalid, but they will not change between the preloading and the final use.
> And that is PRECISELY what you do not get with C or any of its derivatives.

Ok, so you want to have a language where you can put additional
optimization hints into the code. This is fine but pretty useless
unless the programmer knows a lot more about performance implications
of certain high-level instructions. This is the school of "you need
to know how the computer works from inside to produce good code" to
which I agree to some extent. This is why assembly language is still
a part in a university computer science degree.

Greetings,
Ripley
--
Ach komm, ich denke schon, daß die Raver sehr tolerant sind. Gerade bei dieser
Love Parade. Du mußt dir nur mal ansehen, wer da sein wird: Westbam, Marusha,
Blümchen, ein Wagen vom Marienhof ... Also, das würde meine Toleranzgrenze weit
überschreiten. (M. Friese in bln.termine, <5pq8le$g1s$1...@news.cs.tu-berlin.de>)

Nick Maclaren

unread,
Oct 4, 1997, 3:00:00 AM10/4/97
to

In article <rbarris-ya0232800...@206.82.216.1>,

Rob Barris <rba...@quicksilver.com> wrote:
>
> I think what goes on inside processors such as PentiumPro / Pentium-II
>as well as PowerPC-604 (and several others, Alpha included? dunno) very
>much qualifies as "speculative execution", perhaps it would help if you
>shared your definitions of "speculative" and "significant".

Yes, I think that Andrew Glew and you are right, and there is considerable
terminological confusion. A paragraph that I left out of my previous
posting said that there was no clear boundary between simple operand
pre-loading and speculative/eager/etc/ execution in their fullest sense.
However, I was referring to actions like the following:

Preloading operands based on,
address calculations based on,
both paths of a conditional branch based on,
some tests based on,
preloaded operands based on,
address calculations based on,
both paths of a conditional branch based on,
a test that has not yet been evaluated.

I am, unfortunately, not joking when I say that there are critical paths
in the X Windowing System etc. that need that sort of optimisation for

efficiency. And remember that the address calculations will, in general,
need both addition and multiplication units. Most foully, the paths can


run ahead at different rates, and you STILL have to get interruption
and recovery right :-(

Naturally, I could be wrong, but I understood the Pentium XXX designs to
include only very limited speculative execution facilities. If I am
wrong, I am pretty certain of the following:

1) Interrupt handling has been broken again (though this is getting
be an Intel tradition, and they have got it codified in standards).

2) EITHER Intel have reinvented the "Model 91 pipe-line drain"
instruction under another name OR I could write a valid C program that
would execute incorrectly.

My point about languages is that it becomes possible to implement reliable


speculative execution relatively easily, IF you can assume that certain
constraints are met. In particular, you know that the operands may be
invalid, but they will not change between the preloading and the final use.
And that is PRECISELY what you do not get with C or any of its derivatives.

Andy "Krazy" Glew

unread,
Oct 5, 1997, 3:00:00 AM10/5/97
to Nick Maclaren

> I am, unfortunately, not joking when I say that there are critical paths
> in the X Windowing System etc. that need that sort of optimisation for
> efficiency. And remember that the address calculations will, in general,
> need both addition and multiplication units. Most foully, the paths can
> run ahead at different rates, and you STILL have to get interruption
> and recovery right :-(

Actually, precise interrupts seem to be a solved problem. There is some
residual advantage to imprecise interrupts and exceptions, as in Alpha, but it
mainly relates to buffer sizes. Further applications of the exception hierarchy
(see the earlier thread on out-of-order retirement) narrow it further, and, in
fact, seem to surpass exposed imprecise interrupts.

> Naturally, I could be wrong, but I understood the Pentium XXX designs to
> include only very limited speculative execution facilities. If I am
> wrong, I am pretty certain of the following:

Depends on what you mean by limited:

The Pentium Pro and Pentium II have a ReOrder Buffer of 40 instructions, a load
buffer of 16 (?), and a store buffer of 12. The reservation station size is 20.

Thus, 20 instructions can be waiting to execute at any time. At least an
additional 20 may have executed, and be waiting for the oldest to execute. In
addition, there are many more pipestages of instructions in the process of
being fetched.

There is no limit on how many predicted branches may be speculated past. I.e.
every entry in the reorder buffer, and all the instructions in earlier
pipestages, may be correctly predicted branches.

Cache misses can be performed speculatively: L1 I and D, L2. TLB misses can be
performed speculatively.

Stores are not performed speculatively. Nor are accesses to memory mapped I/O
and ordinary I/O spaces.

Is this "significant" speculative execution? I think that it is under any
reasonable definition of significant. It is not, of course, eager execution.
What's left is increasing the size of buffers, and tuning the algorithms to be
more efficient of resources.

The only significant restrictions on speculative execution are

(1) I/O not being done speculatively - which, by the way, I think can now be
done away with.

(2) The single Instruction Pointer (Program Counter) for instruction fetch.

and maybe

(3) The Pentium Pro family's mispredicted branch recovery is not so efficient.
If code is divided into segments between mispredicted branches, P6 can only
have one such segment under execution at any time (mo9dulo a limited amount of
overlap). This wasn't a limitation in our pipeline because the latencies
worked out right, and it should be obvious how it can be fixed.


> 1) Interrupt handling has been broken again (though this is getting
> be an Intel tradition, and they have got it codified in standards).

Depends. The APIC did break some interrupt code, but this was an APIC(Advanced
Programmable Interrupt Controller) issue, introduced in the i486 generation and
standardized in the Pentium (P54C) generation. Not an issue with the dynamic
execution model of the P6 (out-of-order, speculative).

> 2) EITHER Intel have reinvented the "Model 91 pipe-line drain"
> instruction under another name OR I could write a valid C program that
> would execute incorrectly.

Nope. "Serializing instructions" do exist. But almost nobody uses them.

I don't think that you can write a valid uniprocessor C program that will
break.

Gianni Mariani

unread,
Oct 5, 1997, 3:00:00 AM10/5/97
to

By this definition the MIPS R10K and the HP PA8XXX do speculative
execution. To some extent however, the MIPS compilers will
emit "eager" code in the sense that in some cases, both sides
of an if () statement are computed if scheduling of instructions
that way are an advantage.

--

_ ` _ ` Webtv Service Globalization
/ \ / / \ /-- /-- /
/ // / / / / / / / We mean to be understood
\_/ \ \_ \/ /_/ /_/ o
/ /
\_/ (415) 614 6043 Opinions mine etc ...

Richard A. O'Keefe

unread,
Oct 6, 1997, 3:00:00 AM10/6/97
to

har...@pauillac.inria.fr (Robert Harley) writes:

>Richard A. O'Keefe (o...@goanna.cs.rmit.edu.au) wrote:
>>Doesn't work.
>>
>>There's an illuminating discussion in the rationale of the POSIX.1
>>standard [...]
>>
>>THIS is Nick Maclaren's point. Given
>> char **triMat;
>>the declaration
>> double const * const * const constTriMat = triMat;
>>is simply illegal.

>Sure, and it's "simply illegal" to assume that integer arithmetic is
>done in binary.

This is completely wrong, and betrays a total lack of familiarity
with the history or content of the C standard.

The C standard *requires* binary integer arithmetic.
C89 was carefully worded to allow any of
sign-and-magnitude (like Unisys A series machines)
ones complement (like Unisys 2200 machines)
twos complement
But the C standard is quite clear that all integer arithmetic results
must be as if one of these systems were in use.

In contrast, the mismatched pointer example is one where the
compiler is not merely not required to make it work, the
compiler is *REQUIRED* to issue a diagnostic.

>But guess what, there are other implementations that are not only
>plausible but actually exist!

So? The C standardisers were not interested in limiting C to only
the machines where your prejudices happen to be valid. The C9X
revisers have no interest in such limitations either.

>And they have implementation-dependent
>features left open by the standard, and extensions not specified by
>the standard! Gasp!

Yeah, but mixing up pointers to differently qualified objects is NOT
only of those things left open.

>Over here in the real world, we compile our programs with them:
>compilers for which integers are in binary

which the C standard clearly and explicitly requires

>and type qualifiers
>give warnings sometimes, not errors.

Then you are not using standard C compilers. That is your privilege.

>If you are allergic to such a
>warning, stick in a cast as Nick suggested.

No. The diagnostics are required because a cast _wouldn't_ fix the
problem.

>>>Whereas your beloved Fortran checks that the user's assumptions about
>>>lack of aliasing are correct, perhaps?
>>
>>Yes, there are indeed static checkers available for Fortran that will
>>check such assumptions. I have not used any myself, but I am aware of
>>two.

>Ah, so in the case of C refer to the standard only, in order to
>criticize limitations in it.

Are you serious?

I have not criticised any limitation in the C standard at all.
I have pointed out that the kind of Stupid Pointer Tricks (comp.std.c
idiom) you want to play are explicitly forbidden by the standard.
I THINK THIS IS RIGHT. I am not criticising the standard there,
I am approving it.

>But for other languages, dredge up extra
>features you "are aware of" in implementations you've never used.

Those checkers enforce aspects of the Fortran standard.
C compilers are required to enforce aspects of the C standard.
As for me, I use the best static checkers for C that I can get
my hands on, and if I did as much Fortran coding as I do C,
I would make sure to use the best static checkers for Fortran
that I could get my hands on.


>Everybody is fooled by this devastating ploy, so we'll now delete our
>programs, apps and O.S.es, the transport protocols that will send this
>message and so on, and rewrite them in a superior language.

Oh grow up.

Where did I say C was inferior?
On *one specific point*, I provided factual support for Nick Maclaren's
claim. That has absolutely no bearing on whether one language is
better than the other *in general* or for any particular *nominated
purpose*.

C was not designed for high performance numeric computing, so it is
no disgrace that it isn't _quite_ as well suited to that task as
Fortran, which _was_ designed for high performance numeric computing.

Fortran was not designed for implementing operating systems, so it is
no disgrace that it (F95) isn't _quite_ as well suited to that task
as C, which _was_ designed for implementing operating systems.

C is good enough to stand on its own merits without people telling
lies about it.

Richard A. O'Keefe

unread,
Oct 6, 1997, 3:00:00 AM10/6/97
to

m...@coos.dartmouth.edu (Alex Colvin) writes:

>>Hmmm... I think Pascal had the right approach when it came to reference
>>parameters: the effects of passing the same variable into a function
>>via more than one VAR parameter were "undefined", which left the
>>language implementor more scope for optimization.

>Last time I mentioned this, I was informed that this restrictition had
>been dropped.

ISO 10206:1990, Extended Pascal has four kinds of parameters:

a: integer; pass by copy-in, modifiable
var b: integer; pass by reference, modifiable
protected c: integer; pass by copy-in, not modifiable
protected var d: integer; pass by reference, not modifiable

The confusing thing (Note 2, p55) is that a protected variable
parameter (protected var) may not be changed, but it may change.
This is the only mention of 'aliasing' that I can find in that
standard.

Of course this kind of thing is very upsetting to optimisers.
It means that in the sequence of statements

x := d;
<FOO!>
x := d;

where <FOO!> contains no other assignments to x, if <FOO!> contains
any assignments to non-local integer variables or procedure calls,
x will have to be reloaded, even tough _this_ procedure isn't allowed
to modify d directly. This is even more upsetting to human reasoning
than to optimisers.


>Support for aliasing may be a good thing. Undefining the effect of
>aliasing looks likely to lead to seriously obscure bugs, unless you can
>check this behavior at compile time.

Consider this:

C If this were F90, I'd have INTENT OUT stated explicitly
SUBROUTINE BREAK(X, Y)
REAL X, Y
X = 1.0
Y = 2.0
END

PROGRAM MAIN
REAL A(10)
INTEGER I, J

READ * I, J
CALL BREAK(A(I), A(J))
PRINT *, A(I), A(J)
END

What is the output of this program?
How is the compiler to check at compile time that I.NE.J?

It is important to distinguish between case where you WANT to support
aliasing (which is the expensive option) and cases where you DON'T
(which is the high performance option).

Ada 95 did the right thing here, requiring you to explicitly declare
any potentially aliased variables.

Aliasing is very confusing for people to deal with.
Of course, as soon as you have objects as in Smalltalk or Common Lisp or
Java, you have precisely the same kind of problem when the same object
reference can be passed (the _reference_ is passed by copy-in) to a
procedure.

Nick Maclaren

unread,
Oct 6, 1997, 3:00:00 AM10/6/97
to

In article <3437F16A...@cs.wisc.edu>,

"Andy \"Krazy\" Glew" <gl...@cs.wisc.edu> writes:
|>
|> The Pentium Pro and Pentium II have a ReOrder Buffer of 40 instructions, a load
|> buffer of 16 (?), and a store buffer of 12. The reservation station size is 20.
|>
|> There is no limit on how many predicted branches may be speculated past. I.e.
|> every entry in the reorder buffer, and all the instructions in earlier
|> pipestages, may be correctly predicted branches.
|>
|> Is this "significant" speculative execution? I think that it is under any
|> reasonable definition of significant. It is not, of course, eager execution.
|> What's left is increasing the size of buffers, and tuning the algorithms to be
|> more efficient of resources.

The traditional meaning of "speculative" included what people have now
separated off as "eager" (assuming that usage is now standard), and is
what I was referring to. In the cases that I was referring to, it is
essential to execute ahead of BOTH branches, and I was therefore
referring to what you call "eager execution". I made this quite clear
in the example immediately preceding the first paragraph you quoted.

The reason is that the same code is used for two objects, which are
likely to need different branch predictions. To get this "right" for
single-path speculative execution, you need to do branch prediction on
the basis of the DATA location as well as the CODE location.

To sidetrack a little, rip...@nostromo.in-berlin.de (H. Eckert) writes:

|> If you define "efficiency" as getting the maximum possible data
|> throughput from a given machine, you're probably right since a general
|> purpose machine cannot suit every task equally well. But what does
|> "efficiency" mean to a GUI ? It is certainly nice to have a machine
|> that does every graphics operation faster than you can visibly follow
|> but that alone doesn't help a lot...

Unfortunately, most GUIs (and definitely including anything based on X)
make timing assumptions about the responses. There are a very large
number of X applications where a 0.01 second delay will cause them to
behave incorrectly - and, in fact, I am using one right now (xrn)!
These may be bugs in the application, but are actually almost unfixable,
as they are really due to design errors in X.

I have had similar effects in OS/2, Windows 3.1 and MacOS. I have never
used Microsoft's latest utterings, but have reason to believe that the
same would apply.

|> Is this "significant" speculative execution? I think that it is under any
|> reasonable definition of significant. It is not, of course, eager execution.
|> What's left is increasing the size of buffers, and tuning the algorithms to be
|> more efficient of resources.

Yes, it is. But not under any reasonable definition of "speculative"!
Using your terminology, consider my posting to refer to "significant
eager execution".

Graham Hughes

unread,
Oct 6, 1997, 3:00:00 AM10/6/97
to

-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Richard" == Richard A O'Keefe <o...@goanna.cs.rmit.edu.au> writes:

Richard> I think perhaps it is time instead that _you_ took your
Richard> blinkers off. Fortran is not my favourite language by a long
Richard> way, but it has come a *long* way in the last 20 years, and
Richard> is by most criteria a more 'modern' language than C.

Well, that's partly because F90/95 really isn't the same language as
F77.

Certain language communities evolve in different ways; the Wirth crowd
tends to invent new languages with different names but similar M.O.'s
(Pascal->Modula 2->Ada|Oberon), LISP reinvents itself with different
semantics but almost exactly the same syntax, and Fortran invents new
languages and calls them the same thing. Note that both Fortran and
LISP are longer lived than anything out there, so this may not be an
entirely bad thing....

What was the epigram? ``I don't know what language we'll be using in
2030, but I know it'll be called FORTRAN?''
- --
Graham Hughes <graham...@resnet.ucsb.edu> / Never attribute to malice that
MIME OK, PGP preferred / which can adequately be
http://A31147.resnet.ucsb.edu/~graham/ / explained by stupidity.
/ -- Hanlon's Razor

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3
Charset: noconv
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQCVAwUBNDnLziqNPSINiVE5AQEp1wP+LTa1OFoS+oBQxei5lQnu7RUUDUqe2kbt
FeNcuZyri5IL8D9qvjdG9hssfly0FbAQ5Pe34Cvy2MIav4FVc3XTtdtN1UTGVsoc
pKc2nS06sjXiFXlrGmWo8XEliaZy3piIKGIOSdcjaUPFeNxzJVfq/8BHSdjQI0QB
oo2J/Ey6XDM=
=Bd4P
-----END PGP SIGNATURE-----

boo

unread,
Oct 6, 1997, 3:00:00 AM10/6/97
to

Richard A. O'Keefe wrote:
> C was not designed for high performance numeric computing, so it is
> no disgrace that it isn't _quite_ as well suited to that task as
> Fortran, which _was_ designed for high performance numeric computing.
>
> Fortran was not designed for implementing operating systems, so it is
> no disgrace that it (F95) isn't _quite_ as well suited to that task
> as C, which _was_ designed for implementing operating systems.

Oh puh-leeze.

Who cares what it was designed for? C is far more powerful than
Fortran, especially for the kind of work that some of us
do in the real world. If Rob can show me how to do something
with a particular compiler that speeds up what I need done,
I am listening to him, not to you.

Time to dismount that Ivory Tower of yours.

Nick Maclaren

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

In article <61cqkc$9...@news-rocq.inria.fr>,
har...@pauillac.inria.fr (Robert Harley) writes:
|> o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:
|> >There is precisely *one* construction present in C which is not
|> >supported as well or better by Fortran, and that is pointer arithmetic.
|>
|> That sounds suspiciously like "There is precisely *one* thing present
|> on car X which is not on car Y, and that is wheels".

Are you aware of the facts (a) that unchecked pointer arithmetic
is the source of most of the foulest bugs in existing software,
(b) that many software engineering experts believe that it is
impossible to use safely and (c) that a vast amount of experience
shows that it can usually be omitted with little hassle to the
programmer?

Your analogy is more than a little inappropriate.

|> Hey, if some people are under the delusion that C sucks but
|> Fortran/Lisp/etc are great, that's just fine. But please resist the
|> urge to barge into every thread and yammer on about it.

The original question was "what effect do you think that the amazing
new features will have", and I answered "very little, because of the
loathesome coding style (and language properties) of many standard
applications." You may regard that as irrelevant, but you would be
wrong.

Martin Rodgers

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

Serge Pachkovsky wheezed these wise words:

> Who cares what it was designed for? Fortran is far more powerful than
> C, especially for the kind of work that some of us do in the real world.

Which egg orientation do you prefer?
--
<URL:http://www.wildcard.demon.co.uk/> You can never browse enough
Please note: my email address is gubbish
assert(got(coat)).

Martin Rodgers

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

Robert Harley wheezed these wise words:

> Hey, if some people are under the delusion that C sucks but
> Fortran/Lisp/etc are great, that's just fine. But please resist the
> urge to barge into every thread and yammer on about it.

Let's ask the C/C++ if they could please follow the same advise. When
I started reading UseNet, for the first 3 years I saw massed attacks
from C/C++ programmers, massively crossposting, attacking Lisp.

If C++ is such a fine language, then do so many of its supporters
express their support by "dumping" on other languages? What are they
trying to prove? Ditto for C.

Now, can anyone tell me what was significant about 1995 that caused
those massed attacks (mentioned above) to cease? I think that the form
of those attacks may offer a wee clue: garbage collection. Is there no
stronger argument against Lisp than GC? It appears not, judging by the
post-95 nature of attacks on Lisp.

Surely there're better arguments for using C/C++ than kicking some
other language? I appreciate good intentions, but I'm puzzled by the
frequency with which I find people on UseNet (and elsewhere) trying to
convince a group of people that they shouldn't be using the tools that
they are in fact using. Even if we do all need saving from ourselves,
could there not be a better way of doing it than the "hard sell"?

Richard's comments look to me like a reasonable response to a hard
sell. It's a popular one, and it goes like this, "Thanks for your
time, but I'm perfectly happy with what I'm currently using." Some
people go as far as hanging up the phone or closing the door, rather
than continue an argument that has no end.

Of course, I could be wrong about Richard's comments. I can't
speak for him! I'll get my coat.

boo

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

Richard A. O'Keefe wrote:
> I strongly suspect that you don't know what modern Fortran is like.

> There is precisely *one* construction present in C which is not
> supported as well or better by Fortran, and that is pointer arithmetic.

And you think this is a minor difference?

>
> Modules? Fortran has them, C hasn't.
> Overloading? Fortran has it, C hasn't.
> Records? Fortran and C both have them.
> Pointers? Fortran and C both have them.
> Dynamic allocation? Fortran and C both have them.
> Recursive procedures? Fortran and C both have them.
> Nested procedures? Fortran has one level of nesting, C has none.
> Strings? Fortran has them, C hasn't.
> Character-oriented IO? Fortran and C both have it.
> Record-oriented IO? Fortran has it, C hasn't.
> User defined operators? Fortran has them, C hasn't.
> Free-format source form? Fortran and C both have it.
> Bitwise operations? Fortran and C both have them.
> Control over record layout? Neither language supports this. (Yes, I know
> about bitfields, and I also know how extremely weak are the
> guarantees you get about them.)

Ever heard of C++?

Hahaha! Now I get it! This is a troll! Nobody could possibly
think this is a serious reply.

What I do with my C++/C/assembly combination, you cannot possibly
do with Fortran.... as quickly and easily as I do. I work in the
virtual reality field, with parallel algorithms, gigabyte databases
and realtime 3D OpenGL animation, and some serious number crunching
too. The most heavily used routines I code in C with a little
inline assembly (which is quick enough to do for any CPU once you
have a little practice). What do you do? Simple number crunching
of the style done in the 60's and 70's? Sure, Fortran will suffice!

> I think perhaps it is time instead that _you_ took your blinkers off.
> Fortran is not my favourite language by a long way, but it has come
> a *long* way in the last 20 years, and is by most criteria a more
> 'modern' language than C.

I happen to have used modern variants of Fortran, I'll have you
know. Fortran has its place, certainly. It is possible to produce
very efficient code with a high level language that is easy to
master. But outside the realm of brain-dead number crunching, it
cannot compare with efficient C++/C.

Martin Rodgers

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

Greg Weeks wheezed these wise words:

> |> Which egg orientation do you prefer?
>

> Big Endian.

Anyone unsure what we're talking about should read:
<URL:http://www.jaffebros.com/lee/gulliver/bk1/chap1-4.html>

Greg Weeks

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

In article <MPG.ea44c489...@news.demon.co.uk>, mcr@this_email_address_intentionally_left_crap_wildcard.demon.co.uk (Martin Rodgers) writes:
|> Serge Pachkovsky wheezed these wise words:
|>
|> > Who cares what it was designed for? Fortran is far more powerful than
|> > C, especially for the kind of work that some of us do in the real world.
|>
|> Which egg orientation do you prefer?

Big Endian.

Greg Weeks

Carlie Coats

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

In article <34392537...@tonys.ecology.umn.edu>,

boo <b...@tonys.ecology.umn.edu> wrote:
| Richard A. O'Keefe wrote:

This from an advocate of a language that makes arrays second class
citizens? C can't even do the equivalent of

SUBROUTINE S( M, N, P, A )
INTEGER, INTENT( IN ): M, N, P
REAL, INTENT( IN ): A( M, N, P )
...

nor can C++, for that matter. Fortran can. So can lisp.

_Whose_ ivory tower?

William Clodius

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

Richard A. O'Keefe wrote:
> <snip>

> >Who cares what it was designed for? C is far more powerful than
> >Fortran, especially for the kind of work that some of us
> >do in the real world.
>
> I strongly suspect that you don't know what modern Fortran is like.
> There is precisely *one* construction present in C which is not
> supported as well or better by Fortran, and that is pointer arithmetic.
> <snip>

F90 also lacks pointers to procedures. It lacks stream based I/O. (There
are contexts where non-advancing I/O is not sufficient.)

--

William B. Clodius Phone: (505)-665-9370
Los Alamos Nat. Lab., NIS-2 FAX: (505)-667-3815
PO Box 1663, MS-C323 Group office: (505)-667-5776
Los Alamos, NM 87545 Email: wclo...@lanl.gov

Shankar Unni

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

Robert Harley wrote:

> o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:

> >There is precisely *one* construction present in C which is not
> >supported as well or better by Fortran, and that is pointer arithmetic.
>

> That sounds suspiciously like "There is precisely *one* thing present
> on car X which is not on car Y, and that is wheels".

Yo, Robert (and you, too, "b...@somethingorother.umn.edu"):

Richard and Nick have probably forgotten more C than you'll ever learn
in your lifetime, OK? Give it a rest; listen and learn. There's a few
lifetimes of actual working experience behind these opinions..

We see this sort of crazed C cheerleading about the start of every
semester. We've all programmed in C (and in C++). We are painfully aware
of almost every strength and weakness in both C and C++.

No one's slamming C as the worst of all languages. I'll still write an
OS, if I have to, in C or C++, but after breaking my ankles on all the
gopher holes, I'll be real careful where I put my foot the next time.

But if I'm doing numerical analysis or linear regressions, I'll pick
Fortran almost any day. No, I won't do it in C++, even if I can make a
"Matrix" class ("look, Mommy! I made a Matrix!"). If you don't
understand why, you need to take a *real* "Numerical Analysis with C++"
class.

--
Shankar Unni Powertel Global, Inc.
(650) 259-1700 sha...@powertelglobal.com
(650) 259-1702 (fax) sha...@webnexus.com

William Clodius

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

boo wrote:
>
> Richard A. O'Keefe wrote:
> > I strongly suspect that you don't know what modern Fortran is like.
> > There is precisely *one* construction present in C which is not
> > supported as well or better by Fortran, and that is pointer arithmetic.
>
> And you think this is a minor difference?
>

I suspect he considers it to be a major advantage, for Fortran.

> <snip>


>
> What I do with my C++/C/assembly combination, you cannot possibly
> do with Fortran.... as quickly and easily as I do. I work in the
> virtual reality field, with parallel algorithms, gigabyte databases
> and realtime 3D OpenGL animation, and some serious number crunching
> too. The most heavily used routines I code in C with a little
> inline assembly (which is quick enough to do for any CPU once you
> have a little practice). What do you do? Simple number crunching
> of the style done in the 60's and 70's? Sure, Fortran will suffice!

> <snip>

It is relatively easy to find out what Richard O'Keefe does, check his
news postings using Dejanews (http://www2.dejanews.com/), I suspect you
might find his postings to comp.std.c to be of particular interest, also
check his web page (http://www.cs.rmit.edu.au/%7Eok), note Richard
should remove the trailing semicolon from his signature. While Richard
has done some number crunching, his interests primarilly lies elsewhere:
teaching, algorithms (parallel and sequential), databases, logic
programming, functional programming, language design, and language
implementation.

If Richard comments on an aspect of a language pay attention! I have
read knowledgeable comments from him on Ada (83 & 95), Algol 68, APL, C
(K&R & ISO), C++, Clean, Fortran (77, 90, & HPF), Lisp, Mercury, Pascal
(Standard, Extended, & Borland), Perl, Prolog, S, Scheme, and Sisal.
Comments that indicate that he has both thoroughly read the
documentation and used implementations for all of the above languages.
Further, he has worked on compilers for at least two languages: Prolog
and a dialect of S. He has claimed to have experience with 100+
languages, and I believe him. If you think he has said something that is
incorrect look closely: you probably read too quickly, but it might be a
rare case of ambiguity or (obvious) opinion that is not labeled as such.

boo

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

On 7 Oct 1997 12:53:16 -0400, Carlie Coats (x...@hilbert.ncsc.org) said:
> > This from an advocate of a language that makes arrays second class
> > citizens? C can't even do the equivalent of
> >
> > SUBROUTINE S( M, N, P, A )
> > INTEGER, INTENT( IN ): M, N, P
> > REAL, INTENT( IN ): A( M, N, P )
> > ...
> >
> > nor can C++, for that matter. Fortran can. So can lisp.

Oh, so you think that high level language constructs are
something to boast about?

How about

#include <LifeTheUniverseAndEverything.h>

int main(void){
SolveMyProblem();
return 0;
}

Can yours do that? That header file I included contains the
prototypes for the functions I like the most, in my
own particular implementation of C. Is this high level
enough for you?

Paul added:
> Excuse the rest of us C/C++ biggots who don't know the F90 syntax, but
> what does the above syntax signify?

I suspect it says something about how the data ("arrays") should
be laid out in memory, and says something about how you are
going to access it, to help the compiler out. In other
words, let the compiler do the thinking, and let the
programmer concentrate on the abstract stuff. This kind
of thing is possible in a simple language like Fortran.

To the original poster: If I know something about the
architecture of the machine you are using (such as a T3E, for
example), I could write some nodal C code with inline
assembly or optimized C (with some non-standard
code, perhaps) that will thrash your high level Fortran code
any day. I may have to use my noggin a little to figure
out the best way to lay the problem out, but that is ok.
Of course, you most likely will get your code running
before I do.... and THAT precisely is where the power
of Fortran is. Of course, Fortran is also ideal for people
who don't quite "get all the pointer nonsense in C"
(probably the biggest stumbling block that sends people
scurrying back to Fortran, basic, pascal, etc.).

boo

Richard A. O'Keefe

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

boo <b...@tonys.ecology.umn.edu> writes:
>Oh puh-leeze.

>Who cares what it was designed for? C is far more powerful than
>Fortran, especially for the kind of work that some of us
>do in the real world.

I strongly suspect that you don't know what modern Fortran is like.


There is precisely *one* construction present in C which is not
supported as well or better by Fortran, and that is pointer arithmetic.

Modules? Fortran has them, C hasn't.


Overloading? Fortran has it, C hasn't.
Records? Fortran and C both have them.
Pointers? Fortran and C both have them.
Dynamic allocation? Fortran and C both have them.
Recursive procedures? Fortran and C both have them.
Nested procedures? Fortran has one level of nesting, C has none.
Strings? Fortran has them, C hasn't.
Character-oriented IO? Fortran and C both have it.
Record-oriented IO? Fortran has it, C hasn't.
User defined operators? Fortran has them, C hasn't.
Free-format source form? Fortran and C both have it.
Bitwise operations? Fortran and C both have them.
Control over record layout? Neither language supports this. (Yes, I know
about bitfields, and I also know how extremely weak are the
guarantees you get about them.)

>If Rob can show me how to do something

>with a particular compiler that speeds up what I need done,
>I am listening to him, not to you.

Why shouldn't you listen to both of us? I am, after all, telling the truth,
as you can easily verify for yourself.

>Time to dismount that Ivory Tower of yours.

I think perhaps it is time instead that _you_ took your blinkers off.


Fortran is not my favourite language by a long way, but it has come
a *long* way in the last 20 years, and is by most criteria a more
'modern' language than C.

--

Robert Harley

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

o...@goanna.cs.rmit.edu.au (Richard A. O'Keefe) writes:
>There is precisely *one* construction present in C which is not
>supported as well or better by Fortran, and that is pointer arithmetic.

That sounds suspiciously like "There is precisely *one* thing present


on car X which is not on car Y, and that is wheels".


>[... FUD deleted...]

Hey, if some people are under the delusion that C sucks but
Fortran/Lisp/etc are great, that's just fine. But please resist the
urge to barge into every thread and yammer on about it.

-- R.

Serge Pachkovsky

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

boo (b...@tonys.ecology.umn.edu) wrote:
: Oh puh-leeze.

: Who cares what it was designed for? C is far more powerful than
: Fortran, especially for the kind of work that some of us
: do in the real world.

Oh puh-leeze.

Who cares what it was designed for? Fortran is far more powerful than
C, especially for the kind of work that some of us do in the real world.

Cheese,

/Serge.P

--

Russian guy from the Zurich university...


Paul Hsieh

unread,
Oct 7, 1997, 3:00:00 AM10/7/97
to

On 7 Oct 1997 12:53:16 -0400, Carlie Coats (x...@hilbert.ncsc.org) said:
> This from an advocate of a language that makes arrays second class
> citizens? C can't even do the equivalent of
>
> SUBROUTINE S( M, N, P, A )
> INTEGER, INTENT( IN ): M, N, P
> REAL, INTENT( IN ): A( M, N, P )
> ...
>
> nor can C++, for that matter. Fortran can. So can lisp.

Excuse the rest of us C/C++ biggots who don't know the F90 syntax, but

what does the above syntax signify?

--
Paul Hsieh
http://www.geocities.com/SiliconValley/9498/mailme.html

Zalman Stern

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Richard A. O'Keefe (o...@goanna.cs.rmit.edu.au) wrote:
: Aliasing is very confusing for people to deal with.

: Of course, as soon as you have objects as in Smalltalk or Common Lisp or
: Java, you have precisely the same kind of problem when the same object
: reference can be passed (the _reference_ is passed by copy-in) to a
: procedure.

See for example item 17 in Scott Meyers' _Effective_C++_ titled "Check for
assignment to self in operator=." The item explains that the statement:
a = a;
in C++ may not be a "nop" for an improperly implemented assignment
operator. (I wonder if the C++ draft standard allows the compiler to punt
on calling the assignment operator in that case. I doubt it...)

-Z-

Richard A. O'Keefe

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

William Clodius <wclo...@lanl.gov> writes:

>F90 also lacks pointers to procedures.

True. As someone who is peeved with the oddly restricted function
pointers in Ada95, I *should* have remembered this one. Mind you,
function pointers in C are seriously restricted too.

>It lacks stream based I/O.

I am not quite certain what you mean by this.
If you mean the kind of thing that is found in C,
then I have a printed copy of the specification for stream-based IO in
Fortran somewhere in this office.

>(There
>are contexts where non-advancing I/O is not sufficient.)

Agreed.

Remember though, that the context of this thread was
- hardware and optimisation
- limited or not by common programming languages?

An important thing to understand is that if you are seriously concerned
about high performance and reliability, a language may be as useful for
what it _doesn't_ allow as for what it does.

Gregg E Economou

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

In article <61c9kl$94l$1...@goanna.cs.rmit.edu.au>,

Richard A. O'Keefe <o...@goanna.cs.rmit.edu.au> wrote:
>
>I strongly suspect that you don't know what modern Fortran is like.
>There is precisely *one* construction present in C which is not
>supported as well or better by Fortran, and that is pointer arithmetic.

I strongly suspect you dont know what C is like. Go read K&R.

>Modules? Fortran has them, C hasn't.

C is very simple, and designed such that you can implement things of
a higher nature with it if you wish. You can write code that
behaves in a particular style in C.

>Overloading? Fortran has it, C hasn't.

one legitimate arguemtn in a cast of many pointless ones. touche.
(need an accent-grave there)

>Records? Fortran and C both have them.
>Pointers? Fortran and C both have them.
>Dynamic allocation? Fortran and C both have them.

You're confusing issues of the library with the language.
That migth be fine in an HLL like Fortran but in C the library
is not part of the language, even if it is standardized and commonly
accepted.

>Recursive procedures? Fortran and C both have them.
>Nested procedures? Fortran has one level of nesting, C has none.
>Strings? Fortran has them, C hasn't.

But C is lower-level. You can *implement* those things inside a C progam
if you like, but dont have to take on the extra weight if you dont.
Much more flexible.

>Character-oriented IO? Fortran and C both have it.
>Record-oriented IO? Fortran has it, C hasn't.

C has NO I/O. All I/O in the C libraries is implemented using the
basics of the langugae itself in manipulating memory directly. There is
no I/O native to the language. There is almost nothing beyond the
basics of memory handling, logic, and conditionals in the language itself.
The fact that C is used for such a wide variety of things is testimony to
the incredible flexibility it has by NOT being bound to things liek I/O.

>User defined operators? Fortran has them, C hasn't.
>Free-format source form? Fortran and C both have it.
>Bitwise operations? Fortran and C both have them.
>Control over record layout? Neither language supports this. (Yes, I know
> about bitfields, and I also know how extremely weak are the
> guarantees you get about them.)

see above.


>
>Why shouldn't you listen to both of us? I am, after all, telling the truth,
>as you can easily verify for yourself.

only a truth that is based on misdirection. You are obviously mistaken as
to what C is and what the C libraries are, and are mixing the two fo them up.

>I think perhaps it is time instead that _you_ took your blinkers off.
>Fortran is not my favourite language by a long way, but it has come
>a *long* way in the last 20 years, and is by most criteria a more
>'modern' language than C.

a point to be thought of- Fortan is a high level language, C is not.
The utility of C is that it is very universally portable and applicable-
because the laguage itself assumes only the presence of memory, a processing
capability to perform logical and arithmetic operations, and the ability
to perform conditional executions. The manipulation of memory is the
implementation of almost all higher-level fucntionality in the C libraries.
The beauty fo this is that it ends up that C programs very closely resemble
the actual activity fo the machine and low-level routines with a high-level
structure and syntax, and lead to a much simpler and more portabel coding than
an assembly lends, but is still very close down there to the hardware.

However, it is not ncessarily the best thing for _every_ job. For example,
renaming a list of files would be much simpler to accomplish with a short shell
script, and would be much more logically done that way than with a C program
designed explicitly for such a purpose.

anwyays.. lets NOT start another language war, eh?

isildur

David Chase

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Gregg E Economou wrote:
> ... The manipulation of memory is the
> implementation of almost all higher-level functionality in the C libraries.
> The beauty of this is that it ends up that C programs very closely resemble

a rat's nest of memory and functional unit latency.

> anwyays.. lets NOT start another language war, eh?

Why should the last word be given to someone who so clearly doesn't
know what he is talking about? Or were you just trolling?

--
David Chase, ch...@world.std.com

Richard A. O'Keefe

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

gee...@pitt.edu (Gregg E Economou) writes:
>I strongly suspect you dont know what C is like. Go read K&R.

I have been using C since the days when UNIX V6+ was the latest thing.
I've fixed bugs in two C compilers.
I've bought and read three copies of the C standard (they keep disappearing
from my office; I'm getting sick of it),
and have obtained and read every publically available document about C9x.
(I was writing a proposal to add offset types to C, but didn't get it
finished in time to submit it to the committee. Sigh. It'd sure make
putting linked structures in shared memory easier.)

>>Modules? Fortran has them, C hasn't.

>C is very simple, and designed such that you can implement things of
>a higher nature with it if you wish. You can write code that
>behaves in a particular style in C.

If you think C is simple, you probably haven't seen the insides of a
C compiler. And you are going to have a nasty shock when C9X comes out.

>You're confusing issues of the library with the language.

>That migth be fine in an HLL like Fortran but in C the library
>is not part of the language, even if it is standardized and commonly
>accepted.

Clearly you have never participated in any of the threads about this
in comp.std.c. The C `library' *is* an official part of the language;
malloc() is as standard as &; sqrt() is as standard as +. (Technically,
there are two kinds of

>But C is lower-level. You can *implement* those things inside a C progam
>if you like, but dont have to take on the extra weight if you dont.
>Much more flexible.

>C has NO I/O.

Wrong. stdio is a fully standard part of C. getc() is as standard
as x = *p++.

>The fact that C is used for such a wide variety of things is testimony to
>the incredible flexibility it has by NOT being bound to things liek I/O.

If you want to talk about C being useful for 8051s you are right.
If you are talking about C systems running on Macs, PCs, UNIX boxes,
MVS, CMS, VMS, BeOS, AmigaDOS, then you are talking through your hat.

>only a truth that is based on misdirection. You are obviously mistaken as
>to what C is and what the C libraries are, and are mixing the two fo them up.

Obviously? I think perhaps you have not read the standard with due care
and attention.

>a point to be thought of- Fortan is a high level language, C is not.
>The utility of C is that it is very universally portable and applicable-
>because the laguage itself assumes only the presence of memory, a processing
>capability to perform logical and arithmetic operations, and the ability
>to perform conditional executions.

Oh dear. There are so many misconceptions here that it is hard to know
where to start.

Take the B6700 mainframes (now the A series).
*Lovely* machines. Fortran runs *beautifully* on them, as do
COBOL and Pascal. There _is_ a C compiler for them now, but
the implementors had to fight past C's utterly inappropriate
assumptions about memory. Heck, C wasn't even a good fit to the PC,
thanks to its extremely strong assumptions about memory.

C in practice does not merely assume the PRESENCE of memory,
it demands a FLAT, LINEAR, BYTE-ADDRESSED memory.

There have been plenty of machines with segmented (PC) or tree-structured
(B6700) memories. There have been plenty of machines with word-addressed
memories. The first MIPS designs didn't have byte loads and stores.
The first Alphas didn't have byte loads or stores. I don't remember
which models of the AMD29k if any have byte loads and stores, but the
oldest ones didn't.

While the C _standard_ doesn't say that NULL pointers are all bits zero,
there are enough C programs out there that rely on this that one manufacturer
(Prime) had to add special instructions just for C to their computers.
Those same computers had supported Fortran, COBOL, Pascal, and even PL/I
just fine.

Indeed, I once ran into trouble porting a C program to a machine which
was just your ordinary average M68k UNIX box _except_ that the user's
half of the address space was the _negative_ half, and large chunks of
the code relied on user addresses looking positive when you cast them
to long. "Very universally portable?" I think not.

I'm trying to bring this thread back into relevance to comp.arch, which
is where I am reading it.

The basic point I want to make is that for *most* (not all) tasks, if you
want efficiency, you want a *high* level language, not an old low level
language. Why? Because hardware has changed a lot in the last 20 years
too. On a good modern machine, load/store can be a HUNDRED TIMES SLOWER
than an add (UltraSPARC II: 4-way superscalar, it's fairly easy to fill
all the slots in vector code with a bit of unrolling, load from main
memory is 30 cycles, 30*4 = 120).

There are plenty of tasks where a low level language is just right,
and C is a welcome step up from assembler. (Programming an 8051 is an
area that forcibly suggests itself.) But writing a program that is a
good fit to _this_ year's machine is a good way to end up with a program
that is relatively slow on _next_ year's machine.

Compiler writers are doing truly _awesome_ things these days. I for one
would rather not get in their way.

--
John Æneas Byron O'Keefe; 1921/02/04-1997/09/27; TLG,TLTA,BBTNOTL.

Richard A. O'Keefe; RMIT Comp.Sci; http://www.cs.rmit.edu.au/%7Eok

Robert Harley

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Shankar Unni <sha...@powertelglobal.com> writes:
>Yo, Robert (and you, too, "b...@somethingorother.umn.edu"):
>
>Richard and Nick have probably forgotten more C than you'll ever learn
>in your lifetime, OK? Give it a rest; listen and learn. There's a few
>lifetimes of actual working experience behind these opinions..

Yo Shankar, their lifetimes of experience hacking primitive Fortrans and
Lisps in 500BC apparently don't help when talking about ANSI C and C9X.

Why exactly should we "listen and learn" to people who screw up on
'const' and admit to not knowing what 'restrict' is, when we're
shaving cycles off critical numerical code with both of them here and now?
Or who claim that:

>Nested procedures? Fortran has one level of nesting, C has none.

when GCC, for instance, has arbitrarily deep nesting with Pascal-style
lexical scoping? Oh no! It's not strictly standard-conforming!
La-di-frickin-da. It's works just fine on Alpha, ARM, Intel, Sparc
and experimental backroom chips nobody ever heard of, in my
ever-so-humble experience. (However I find it totally useless).

I believe that Richard does have a clue, however he also has a very
obvious bias. As Mark Twain said:

"First get the facts straight,
then you are free to distort them at your leisure."


>We see this sort of crazed C cheerleading about the start of every

>semester. We've all programmed in C (and in C++). [...]

Jeez, lose the patronising bullshit.

-- Rob.

Carlie Coats

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

In article <343AF933...@tonys.ecology.umn.edu>,
boo <b...@tonys.ecology.umn.edu> blathers:

> On 7 Oct 1997 12:53:16 -0400, Carlie Coats (x...@hilbert.ncsc.org) said:
> > > This from an advocate of a language that makes arrays second class
> > > citizens? C can't even do the equivalent of
> > >
> > > SUBROUTINE S( M, N, P, A )
> > > INTEGER, INTENT( IN ): M, N, P
> > > REAL, INTENT( IN ): A( M, N, P )
> > > ...
> > >
> > > nor can C++, for that matter. Fortran can. So can lisp.
>
> Oh, so you think that high level language constructs are
> something to boast about?

"High level constructs"? Since when is being able to do decent
things with (multi-dimensional) arrays high level? Maybe instead
I should have said

<RANT>
When I'm programming in C or C++, why the *_hell_* can't
I do easy things like

void foo( const int m,
const int n,
const int p,
float a[p][n][m] ) {...}

This idiocy about all subscripting except the outermost
being required to be compile-time constant is really and
truly _brain-dead_!
</RANT>

Would that satisfy you?


Robert Harley

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

x...@hilbert.ncsc.org (Carlie Coats) writes:
>When I'm programming in C or C++, why the *_hell_* can't
>I do easy things like
>
> void foo( const int m,
> const int n,
> const int p,
> float a[p][n][m] ) {...}

I think your problem is that C is too flexible for you. It doesn't
wrongly assume that there is one way of laying out and accessing
arrays that is appropriate for all purposes. For instance you can
choose a flat array, 'float *a', or a hierarchy like 'float ***a'.

Suppose you choose the flat array, Fortran style.

You can access a_{i,j,k} as a[(i*n+j)*m+k] or several other linear
combinations depending on whether you like row-wise or column-wise
access or whatever. Your hand isn't forced like in Fortran.

If you're doing simple constant-stride access, decent compilers will
strength-reduce the multiplications just like decent Fortran compilers
(or the pre-mungers that they often require to get top performance).

However a non-trivial access pattern will use a lot of slow
multiplications, just like in Fortran. In that case you have the
option of strength-reducing by hand if you know better than the compiler.

Also, this often forces you to waste a lot of space, doubling the size
of triangular matrices with useless zeroes, increasing the size of
simplicial 3D arrays sixfold etc., just like in Fortran.

Furthermore to swap rows, you have to swap all the elements, which
takes forever just like in Fortran.


Suppose that you choose a hierarchy instead.

Then non-trivial access patterns require pointer accesses, however
these are typically fast since the number of pointers is very small
compared to the size of the data. For instance on this Alpha with a
100x100x100 array the top level fits in L1 cache easily, the second
level in L2, whereas the data is twice as big as L3.

Also you can allocate exactly as much memory as you want or need.

Furthermore you can swap rows (or columns) with a single pointer swap.


In some cases the former is best, in others the latter. Of course, C
also lets you choose intermediate layouts such as an array 'float **a'
of pointers to rectangular matrices, if that is best for your
application. Perhaps you'd like to tag those sub-matrices with their
dimensions:

typedef struct { int width, height; float *data; } rectMatWithSizes;
rectMatWithSizes funky[100];

and set it up whatever way you like. In other languages you might be
forced to use training wheels, or you might be lucky enough to be
permitted to hack up some imitation of the C style. I'd just use C.

Bye,
Rob.
.-. Robert...@inria.fr .-.
/ \ .-. .-. / \
/ \ / \ .-. _ .-. / \ / \
/ \ / \ / \ / \ / \ / \ / \
/ \ / \ / `-' `-' \ / \ / \
\ / `-' `-' \ /
`-' Linux + 500MHz Alpha + 256MB SDRAM = heaven `-'

William Clodius

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Richard A. O'Keefe wrote:
>
> William Clodius <wclo...@lanl.gov> writes:
> <snip>

> >It lacks stream based I/O.
>
> I am not quite certain what you mean by this.
> If you mean the kind of thing that is found in C,
> then I have a printed copy of the specification for stream-based IO in
> Fortran somewhere in this office.
> <snip>

This is low on the list of priorities for F200x, but still on the list
of priorities. In principle Fortran I/O remains record based. For files
that do not have a fixed record size, (or for which backspace does not
make sense, i.e., standar I/O to a terminal) the execution of any given
read or write statement cannot assume that it will not be followed by a
backspace command. As a result the file must retain some construct that
indicates the size of the record. For non-record based systems, e.g.,
UNIX this is (invariably) implemented by having an integer at the start
and end of the record indicating the size of the record. For writing a
file to be read by other systems this is a minor problem, but for
reading files produced by other processors, e.g., C, the absence of the
starting integer in the file causes no ends of problems. It is also a
problem for interfacing to hardware, e.g., instumentation, although an
appropriate Fortran specified interface that does not rely on READs or
WRITEs can be an acceptable, if sometimes non-intuitive, substitute.
Most vendors I believe offer this as an extension.

William Clodius

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Jan Vorbrueggen wrote:
>
> William Clodius <wclo...@lanl.gov> writes:
>
> > F90 also lacks pointers to procedures.
>
> Is that true? Then it isn't a superset of F77 anymore. Or is the way you can
> pass the address of a function or subroutine in F77 not a pointer to a
> procedure?
> <snip>

F90 has this capability. This is typically implemented as a pointer to a
procedure and can be used to emulate some of the capabilities that you
want from pointers to procedures, but it is not an explicit pointer to a
procedure. In priciple in F90 terms you want something like

INTERFACE
FUNCTION A(x)
REAL :: A, x
END FUNCTION A
FUNCTION B(x)
REAL :: A, x
END FUNCTION B
END INTERFACE
INTERFACE(A), POINTER ::C ! define a pointer that accepts real arguments
! and returns real results

IF ... THEN
C => A
ELSE
C => B ! All subsequent invocations of C actually invoke B
ENDIF
...
y = C(z) ! Invokes A or B

This is equivalent to C pointers to procedures, and is roughly what is
planned for F200x. In the most general case, you want procedures to
return pointers to procedures as values, i.e., a form of closures. This
is not currently planned.

William Clodius

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

One other thing that F90 lacks, that C and F95 provide, is the ability
to define a NULL pointer. This, of course, was perhaps the major problem
with F90.

James C. Stutts

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

In article <34392537...@tonys.ecology.umn.edu>, boo <b...@tonys.ecology.umn.edu> writes:

<snip>

>Oh puh-leeze.
>
>Who cares what it was designed for? C is far more powerful than
>Fortran, especially for the kind of work that some of us

>do in the real world. If Rob can show me how to do something

>with a particular compiler that speeds up what I need done,
>I am listening to him, not to you.

It's the little touches in Fortran that set it apart from C for
numerical work. For instance: math.h doesn't give you
the ability to raise an int to an arbitrary power and return
an int. Pow() only returns floats. Minor things like that.
Actually having an operator (like ** in Fortran) for this
would also be nicer, IMHO, than pow().

Regards,
----------------------------------------------------------------------
James C. Stutts stu...@eng.auburn.edu

Adaptive Aerostructures Lab, Department of Aerospace Engineering
211 Aerospace Engineering W: (334)-844-6813
Auburn University, AL 36849 Lab: (334)-844-6832
All opinions expressed are my own.
----------------------------------------------------------------------

Michael Meissner

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

har...@pauillac.inria.fr (Robert Harley) writes:

> when GCC, for instance, has arbitrarily deep nesting with Pascal-style
> lexical scoping? Oh no! It's not strictly standard-conforming!
> La-di-frickin-da. It's works just fine on Alpha, ARM, Intel, Sparc
> and experimental backroom chips nobody ever heard of, in my
> ever-so-humble experience. (However I find it totally useless).

Nested functions do have problems on some chips (and under some OSes for other
chips) when you take the address of the nested function and the compiler has to
synthesize a trampoline on the stack.

Of course, building executable hunks at runtime comes in use in other
situations, such as just-in-time Java compilers, and writing optimial sort
programs that build the comparison function from a generalized input.

--
Michael Meissner, Cygnus Solutions (East Coast)
4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA
meis...@cygnus.com, 617-354-5416 (office), 617-354-7161 (fax)

William Clodius

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Mark Gibson wrote:
> <snip>
>
> Wow, I haven't used FORTRAN much in at least ten years. (F77 was new
> when I last did any real work with FORTRAN.) All this fuss over
> which is better, C or FORTRAN, has made me curious. Is there a low-priced,
> high quality implementation of the most modern version of FORTRAN?

Depends on your definition of low priced/high quality. For most people
the answer is no, for some with deep pockets or a restricted range of
emphasis for quality, the answer is yes.

A full F90 highly optimizing implementation is almost invariably
relatively high cost: $300 for a stuent Wintel, $600 professional
Wintel, about twice that for UNIX. With full toolsets, i.e., debugers,
editors, integrated environment, special libraries, another factor of
two or more in price can be introduced.

There are low optimizing compilers, I believe Salford is one, in the
$100 range. There are F90 subset compilers (i.e., they are not supersets
of F77) from Lahey (ELF) and Imagine1 (F), available for free or the
$100 range. The subset compilers are primarilly intended for the
educational market, and will often not be appropriate for professionals.

> Does
> Gnu produce such a thing?

No although G77 does implement some of F90 (e.g. SELECT CASE) it does
not implement modules, pointers, array arithmetic, etc.

>
> Also, what native support for large-integer arithmetic do the latest
> versions of FORTRAN provide? Will they allow me to crunch 128-bit
> integers efficiently without writing my own arithmetic routines?

Processor dependent, but relatively easy to use if available. I believe
some vendors for 64 bit machines offer this capability, several of them
offer 128 bit REALs.
> <snip>

David Chase

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Robert Harley wrote:

> I think your problem is that C is too flexible for you. It doesn't
> wrongly assume that there is one way of laying out and accessing
> arrays that is appropriate for all purposes. For instance you can
> choose a flat array, 'float *a', or a hierarchy like 'float ***a'.

C's flexibility has two problems:

1. it is mandatory.
2. it is confined to a set of uninteresting choices.

In C, can I choose to have the compiler autoparallelize my dusty
deck onto a 16-processors SparcServer 2000? In C, can I choose to
have my loops automatically interchanged for better cache use?
How about cache blocking? Any automatic vectorizers for C?

Oh, I guess I don't get those choices.

Given detailed knowledge of a particular chip, working in C,
my attempt at a too-big-for-the-cache double-precision matrix
multiply obtained only 30 Mflops (this was on a 50 or 55 Mhz
SuperSparc, back in 1994). The code in Fortran was about
1/10th the size, and ran at 34 Mflops. Both bits of code
contained some hand-optimization, but the C code had much
more (the Fortran was simply blocked for cache locality;
the C code was blocked for both register and cache locality,
as well as software pipelined).

Keep in mind that most C programmers don't know how to
software-pipeline a loop. Most of them are not sure how
to do register blocking. Most of them are not exactly
sure how many double-precision registers are on the chip
they are using. Most of them don't know the size of the
cache on the chip (chip? how about box, since ecache sizes
can vary). I'll bet most of them don't know what cache
blocking is. And, if they do, I'll bet they forget to check
to see if the ecache is physically-indexed direct-mapped
or not, and whether the OS is taking care to "color" virtual
pages so that non-colliding virtual addresses will also be
non-colliding in the external cache.

This stuff is awful to do by hand.

And this, of course, was just matrix multiplication, it
wasn't something tricky to optimize for locality, like
GEPP or QR (for QR, so I am told, you must use a
different algorithm to get your factorization if you
wish to block it for locality). This code is really
tedious to write.

So, this flexibility gets me what? The privilege of writing 10
times as much code, that runs 10% slower, and is harder to port,
understand and maintain?

It is mind-boggling to me how persistently ignorant some people
working in C are. Someone told you once that C is the fastest
language, and it grabbed your brain. Maybe it feels like it
just must be fast, because you work so hard to make it fast.
Maybe you think that all high-level transformations alike,
that since (e.g.) SETL generally runs pretty slowly, that
all languages which are not "close to the machine" must
therefore run slowly. It is not so.

--
David Chase, ch...@world.std.com

Robert Harley

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

ch...@world.std.com writes:
>C's flexibility has two problems: [...]

>Given detailed knowledge of a particular chip, working in C,
>my attempt at a too-big-for-the-cache double-precision matrix
>multiply obtained only 30 Mflops (this was on a 50 or 55 Mhz
>SuperSparc, back in 1994). The code in Fortran was about
>1/10th the size, and ran at 34 Mflops. [...]

So you're saying that 3 years ago, you were crap at C programming.

I can believe that.

By the way, did you have a point?


>It is mind-boggling to me how persistently ignorant some people
>working in C are. Someone told you once that C is the fastest

>language, and it grabbed your brain. [...]

Yeah that's it. Somebody told me and I'm really dumb so it "grabbed
my brain". Aren't ignorant people like me lucky to have
mind-bogglingly condescending asses like yourself to bray on about the
error of our ways?

-- Rob.
PS: Food for the flames: the only language worth using, apart from C, is ML!
PPS: Isn't Usenet fun? =:-)

Stanley Chow

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

In article <343BA4...@lanl.gov>,

William Clodius <wclo...@lanl.gov> wrote:
>
>F90 has this capability. This is typically implemented as a pointer to a
>procedure and can be used to emulate some of the capabilities that you
>want from pointers to procedures, but it is not an explicit pointer to a
>procedure. [...]

>IF ... THEN
> C => A
>ELSE
> C => B ! All subsequent invocations of C actually invoke B
>ENDIF
>...
>y = C(z) ! Invokes A or B

I would call this "Procedure Variables" as opposed to "Pointers to
Procedures". I consider procedure variables to be fairly safe (as
these things go) compared to pointers to procedures.

The usual need is for proc vars, and it is a vast overkill (with
dangerous consequences) to force users to implement proc vars via
pointers.

>This is equivalent to C pointers to procedures, and is roughly what is
>planned for F200x. In the most general case, you want procedures to
>return pointers to procedures as values, i.e., a form of closures. This
>is not currently planned.

Not all closures are good.

--
Stanley Chow; sc...@nortel.ca, (613) 763-2831
Nortel/BNR, PO Box 3511 Station C, Ottawa, Ontario
Me? Represent other people? Don't make them laugh so hard.

William Clodius

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Robert Harley wrote:
> <snip>

>
> I think your problem is that C is too flexible for you. It doesn't
> wrongly assume that there is one way of laying out and accessing
> arrays that is appropriate for all purposes.

There are two points here, how arrays are laid out, and how they are
accessed. The compiler/machine determines how they are laid out (given
additional restrictions implicit in the code or language), the
programmer determines how they are accessed. Fortran has fewer
assumptions about how the machine lays out memory than C does. It allows
users the same flexibility in access that C does, at least for teh
examples that you give.



> For instance you can
> choose a flat array, 'float *a', or a hierarchy like 'float ***a'.
>

> Suppose you choose the flat array, Fortran style.
>
> You can access a_{i,j,k} as a[(i*n+j)*m+k] or several other linear
> combinations depending on whether you like row-wise or column-wise
> access or whatever. Your hand isn't forced like in Fortran.

Fortran doesn't force this, it simply povides a less error prone, more
easilly understood and optimized construct for doing the equivalent.

>
> If you're doing simple constant-stride access, decent compilers will
> strength-reduce the multiplications just like decent Fortran compilers
> (or the pre-mungers that they often require to get top performance).

As KAI about pre-mungers for C/C++.

>
> However a non-trivial access pattern will use a lot of slow
> multiplications, just like in Fortran. In that case you have the
> option of strength-reducing by hand if you know better than the compiler.

Same in Fortran.


>
> Also, this often forces you to waste a lot of space, doubling the size
> of triangular matrices with useless zeroes, increasing the size of
> simplicial 3D arrays sixfold etc., just like in Fortran.

Fortran does not require the initialization of unused space.


>
> Furthermore to swap rows, you have to swap all the elements, which
> takes forever just like in Fortran.
>
> Suppose that you choose a hierarchy instead.
>
> Then non-trivial access patterns require pointer accesses, however
> these are typically fast since the number of pointers is very small
> compared to the size of the data. For instance on this Alpha with a
> 100x100x100 array the top level fits in L1 cache easily, the second
> level in L2, whereas the data is twice as big as L3.

Same in Fortran


>
> Also you can allocate exactly as much memory as you want or need.

Same in Fortran


>
> Furthermore you can swap rows (or columns) with a single pointer swap.

Try Fortran array operations, not as good as APL but still... or Fortran
POINTER's

>
> In some cases the former is best, in others the latter. Of course, C
> also lets you choose intermediate layouts such as an array 'float **a'
> of pointers to rectangular matrices, if that is best for your
> application. Perhaps you'd like to tag those sub-matrices with their
> dimensions:
>
> typedef struct { int width, height; float *data; } rectMatWithSizes;
> rectMatWithSizes funky[100];

Different syntax using derived types in Fortran
TYPE rectMatWithSizes
INTEGER :: width, height
REAL, POINTER :: data(:)
END TYPE

TYPE(rectMatWithSizes) :: funky
ALLOCATE(funky%data(100))

>
> and set it up whatever way you like. In other languages you might be
> forced to use training wheels, or you might be lucky enough to be
> permitted to hack up some imitation of the C style. I'd just use C.

> <snip>

I'll take a language that lets you say more things more clearly.

boo

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

William Clodius wrote:
>
> Robert Harley wrote:
> >
> > ch...@world.std.com writes:
> > <snip>

> >
> > So you're saying that 3 years ago, you were crap at C programming.
> >
> > I can believe that.
> >
> > By the way, did you have a point?
> >
> > <snip>
>
> So you are saying that David Chase, (who has spent 10+ years writing
> C/C++ professionally, principally as a writer of compilers and compiler
> tools, who has done compiler work at Rice (Fortran under Ken Kennedy),
> Sun (Fortran?), Olivetti (Modula-3), and Centerline (C++), and is, with
> Boehm, one of the primary authors of the Boehm garbage collector for
> C++) writes lousy C code because he can't get within 10% of the
> performance of Fortran. There are several possibilities here, not all
> mutually exclusive

If someone cannot get within 10% of the performance of Fortran, I
would say he is a crap programmer. If this fellow is as qualified as
you say, then I can only assume that you have your facts wrong.

> Los Alamos Nat. Lab.

Ah, it all makes sense now.

Your comment about the performance of Fortran is quite funny.
About 5 years ago, I wrote a large simulation in CM Fortran for
the CM5 (and you had one of those at lanl, so I think you follow
what I am saying...), and was quite impressed with the performance.
I then wrote the crucial routines in C with CDPEAC (macros
which expand to vector unit assembly codes), and produced something
that thrashed the THEORETICAL PEAK EFFICIENCY POSSIBLE of CM Fortran at
that time, according to their manuals.

bri...@bit3.com

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

In article <61g30t$o...@news-rocq.inria.fr>,

Robert Harley <har...@pauillac.inria.fr> wrote:
>
>I think your problem is that C is too flexible for you. It doesn't
>wrongly assume that there is one way of laying out and accessing
>arrays that is appropriate for all purposes. For instance you can

>choose a flat array, 'float *a', or a hierarchy like 'float ***a'.

Speaking as someone who is writting a math library in C (in his free time,
but I'm actively grappling with these exact issues), also as someone
who hardly knows Fortran, this _SUCKS_. You are forced to either:
1) Only accept predefined sized matricies- generally not an option
2) Every time you access the array either type in the whole address
equation or use a macro (hideous and it tends to make the code hard to
read).
3) In addition to allocating and managing the matrix, also allocate and
manage arrays of pointers into the matrix (with resulting overhead and
additional complexity). This also kills any locality of reference
you're attempting to maintain. This thread started with the hw engineers
complaining that C does too much pointer chasing already.
4) Depend upon non-portable compiler-specific constructs.

The problem I have found with C's "flexibility" is that most of the time
it amounts to "Hey, not only can you reinvent the wheel, you have too!
But if, for some braindamaged reason, you want to reinvent the wheel as
a _square_ instead of the generally accepted circle, it lets you do that
as well!"

There are situations where you need to play pointer games (my day job is
writing device drivers). But it's hard to call being forced to play
pointer games an "advantage".

I don't speak for Bit 3.
Brian

--
"NT and security should never be used in the same breath."
- Winn Schwartau, EETimes #972, 22 September 1997 pp.96

-Brian

boo

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

Shankar Unni wrote:

> But if I'm doing numerical analysis or linear regressions, I'll pick
> Fortran almost any day.

Hahahah! linear regressions!! I can do those on my watch calculator!!
Hey, that is some high level stuff you are doing there, thank God
you have Fortran to help you out! I agree, with advanced (heh) stuff
like that, you need to have a specialized numerical language.

boo

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

bri...@bit3.com wrote:
>
> In article <61g30t$o...@news-rocq.inria.fr>,
> Robert Harley <har...@pauillac.inria.fr> wrote:
> >
> >I think your problem is that C is too flexible for you. It doesn't
> >wrongly assume that there is one way of laying out and accessing
> >arrays that is appropriate for all purposes. For instance you can
> >choose a flat array, 'float *a', or a hierarchy like 'float ***a'.
>
> Speaking as someone who is writting a math library in C (in his free time,
> but I'm actively grappling with these exact issues), also as someone
> who hardly knows Fortran, this _SUCKS_.

The first time you play with multi-dimensional "arrays" in C, it is
difficult and frustrating. These are much easier in Fortran for the
uninitiated, and Fortran surely is the language of choice for many
scientists and engineers. Once you have understood exactly how to implement
multiple dimensional "arrays" in C efficiently and have gained
some experience, it doesn't "suck" anymore, and you will understand
exactly how flexible C actually is. But it isn't for amateurs,
or for the Ivory Tower Set of this thread.

boo

unread,
Oct 8, 1997, 3:00:00 AM10/8/97
to

David Chase wrote:

> boo wrote:
>
> > About 5 years ago, I wrote a large simulation in CM Fortran for
> > the CM5 (and you had one of those at lanl, so I think you follow
> > what I am saying...), and was quite impressed with the performance.
> > I then wrote the crucial routines in C with CDPEAC (macros
> > which expand to vector unit assembly codes), and produced something
> > that thrashed the THEORETICAL PEAK EFFICIENCY POSSIBLE of CM Fortran at
> > that time, according to their manuals.
>
> CDPEAC is defined in what section of the C standard?

It isn't... but the code I wrote still looked like C,
with a huge macro library that I mastered quickly
enough. The CDPEAC macro package expanded into SPARC
asm blocks with specific memory accesses that
controlled the four VUs. Yes, I know this is all
dead now, but so what... It really is not a big
deal to learn a new architecture and new optimization
techniques, and to adapt all the time. It used to be
that Fortran was the only way to achieve performance
on the high end vector and MPP supercomputers, but
thankfully that is not the case anymore. I felt I
had more flexibility with C than the Fortran, and
I know that is my personal choice. I still believe
that (with the help of expert C programmers like
Robert Harley, whose routines are being used in
the fastest lattice QCD code running on the T3E)
I can beat Fortran every time. Fortran-like code is
usually mandatory on seriously custom MPP harware,
unfortunately.

> You
> do know that they're not making those vector units anymore,
> don't you, and that they're one of the reasons (but not the
> only reason) why TMC is no longer in the hardware business?
> Custom hardware sucks.
> [more CM5 stuff deleted]

Yes, I know all about that, and agree with you. However, it is
not a big deal for me to redo the crucial parts for whatever
architecture I am working on, in fact I enjoy it. I do
appreciate that many people have better things to
do with their time. It doesn't take long for me to read
several manuals, and attack the problem using the
optimal method. As I said before, I occasionally chose
Fortran in the past.... back to the example of the extinct
CM5, I learnt C* and CMF, and mastered both. C* (which is
a parallel version of C, quite beautiful in fact)
produced terribly inefficient binaries compared with
the CMF compiler. Thank you, Sweet Jesus, for the fact that
it is possible to NOT use Fortran these days! If a situation
should arise in which Fortran really is the better choice,
I would hesitate and then go for it. I don't think it
is going to happen, though.

> (otherwise, you were not using the machine efficiently, I
> don't want to think about what weird accounting, padding,
> and coding tricks this may have induced over time.)

I did things that were simply illegal in CDPEAC, but they
worked, they screamed, and they produced identical results
to my CMFortran equivalent. Except faster, of course, and
that is the bottom line for me. If someone else was going
to pick up the code, I would say I failed dismally!

David Chase

unread,
Oct 9, 1997, 3:00:00 AM10/9/97
to

> William Clodius wrote:

> > So you are saying that David Chase, [resume omitted] writes lousy C code


> > because he can't get within 10% of the performance of Fortran. There
> > are several possibilities here, not all
> > mutually exclusive

You have to be careful there. First, I try not to argue from
authority. Second, I know people with PhDs that I would not
necessarily trust to write C; skill at writing C does not
necessarily correlate with anything. I do think it was pretty
good code that I wrote, at least in terms of performance.

Also, I am emphatically NOT one of the primary authors
of the Boehm-Weiser collector. Early user, yes, enthusiastic
fan, yes, author, no. Hans and I worked on the optimizer-
versus-your-garbage-collector problem in the distant past,
trying to come up with some way to trick a C compiler into
certainly not destroying the properties of the generated
code that were needed by the collector, while still allowing
it to generate decent code.

The point, which apparently has not yet sunk in, is that
indeed, C gets you very close to the machine, and on
modern machines, if you want to get best performance,
this means you need to know all sorts of silly stuff,
and hand-apply atrocious transformations that will not
necessarily enhance performance on the next machine. The
tricks that were taught 5 years ago no longer apply, and the
tricks that will work best 5 years from now will probably
differ from the best tricks today. The guys at Tera think
they have a solution to this problem, and I hope they are
right (and I still want to know why they decided to build
the damn thing about of GaAs, as if they weren't taking on
enough risk already.)

boo wrote:

> About 5 years ago, I wrote a large simulation in CM Fortran for
> the CM5 (and you had one of those at lanl, so I think you follow
> what I am saying...), and was quite impressed with the performance.
> I then wrote the crucial routines in C with CDPEAC (macros
> which expand to vector unit assembly codes), and produced something
> that thrashed the THEORETICAL PEAK EFFICIENCY POSSIBLE of CM Fortran at
> that time, according to their manuals.

CDPEAC is defined in what section of the C standard? You


do know that they're not making those vector units anymore,
don't you, and that they're one of the reasons (but not the
only reason) why TMC is no longer in the hardware business?
Custom hardware sucks.

(Why do you think we were trying to see how fast
stock hardware could be used to multiply matrices,
given the best possible code we knew how to
generate for a typical numerical problem?)

Note to the non-CM users: a CM-5 contained a collection
of Sparc chips speaking to memory through a pair of
memory controllers, where each memory controlled
was "enhanced" with (sort of) a pair of vector processors.
Thus, you get 1 Sparc + 4 vector units. Each VU
executes pipelined memory-to-memory operations, with
a throughput of one per tick. The clock rate of the whole
blob is typically lower than the current Sparc workstation
clock rate; I don't know if this is the limit of the
VU performance, or if it was done to make the entire machine
reliable enough to use. The Cypress chip blobs were clocked
at 25Mhz, the SuperSparc blobs were clocked at 40 Mhz.
(I think this is correct; there were different terms
for the blobs depending upon whether you were counting
Sparc chips, memory controllers, or apparent vector units).

CMF at least, you can translate pretty easily into HPF, and
get it compiled onto an SP-2 or whatever else you choose.
If I remember correctly, the current versions of the PowerPC
ought to give the about the same performance as one of those
Sparc+VU combinations. 40 Mhz * 4 multiply adds/tick gives
you 320 Mflops speed-of-light, versus one multiply-add per tick
on a 300+ Mhz PPC, assuming you can keep the damn thing fed
operands, which is the entire point of the pipelining and
blocking nonsense. The experiments we ran indicated that
we were unlikely to get more than 2/3 of the speed-of-light
on general-purpose hardware, and this was wildly optimistic.
Compilers and Fortran "optimizing preprocessors" available
at the time did a pretty poor job. On the other hand, peak
performance from the CM was also pretty rare; I think that 1/6th
peak was considered to be "good enough" at one national lab


(otherwise, you were not using the machine efficiently, I
don't want to think about what weird accounting, padding,
and coding tricks this may have induced over time.)

--
David Chase, ch...@world.std.com

David Chase

unread,
Oct 9, 1997, 3:00:00 AM10/9/97
to

Robert Harley wrote:
> ... Aren't ignorant people like me lucky to have

> mind-bogglingly condescending asses like yourself
> to bray on about the error of our ways?

Hey, someone's got to do it. How exactly do you want this
stuff explained? How polite am I supposed to be to someone
who says (incorrectly)

> I think your problem is that C is too flexible for you. It doesn't
> wrongly assume that there is one way of laying out and accessing
> arrays that is appropriate for all purposes.

This is not condescending?

C's major performance fault is that as defined, and
especially as used, a compiler does not have
a great deal of freedom to move loads and stores around.
C programmers may be very careful to avoid "unnecessary"
loads and stores, but nowadays, latency is the big problem;
performance is reduced if you do your loads "too late", or
if a conditional branch depends on a result that you computed
"too late". A "too late" load is typically a bigger performance
hit than a "not necessary" load. Fortran has many fewer
problems with this, because the aliasing rules make life
much easier for compilers. The contiguously-allocated
multi-dimensional arrays help with this very much, too,
because it is easier to infer that A(i,w,x) addresses
different storage than A(i+1,y,z) than it is to infer
that

pa+i*n*p+w*p+x != pa+(i+1)*n*p+y*p+z

Modern hardware goes to some trouble to reduce these costs
in the local sense, by predicting branches and starting
loads early, but this only gets you so much. Parallelization
can get you superlinear speedup (as in, when I worked at
Sun, a happy customer observed superlinear speedup on one
of their dusty decks on a 16-processor machine) on a very
good day. Has anyone anywhere ever taken a dusty C deck,
dropped it on a multiprocessor, asked the compiler to
parallelize it, and obtained even a linear speedup?

Sun's Fortran compiler, back in 1994, would take care of loop
interchange for better cache use. The dependence analysis
finds many "vector" operations, and these are specially
code-generated as software pipelines to get the best
possible performance. You can get this sometimes for C
(it helps if the loop contains no stores to globally-accessible
variables) but you'll get it more often for Fortran.
Fortran performance is also pulling ahead on the newer
processors -- friends (former colleagues) told me that
the then-new version of the compilers improved Fortran
performance by 160% over the older compilers, but C and
C++, not nearly so much (I do not recall the figure).
The superoptimizer (converts control flow to very tricky
data flow) is looking better and better on these new
machines, but the combinatorial costs of searching ever-longer
instruction sequences limits usefulness (and the conditional
instructions added to many new processors also expands the
search space, though they also increase the chances of
hitting with a short sequence).

A decent combination of cache blocking, register blocking,
and software pipelining can get you a 20x speedup over
unoptimized code (if I recall, the large-matrix multiply
unoptimized ran at something like 1.5 Mflops). C compilers,
lacking the restrict keyword, cannot do that (everyone who
knows what the restrict keyword is and what it does, raise
your hand.) You can do this in C -- I did this in C,
apparently ineptly, since a Fortran program beat me -- but
it is incredibly tedious work, and the peak performance
is not portable (a conservative coding will improve
performance by quite a lot on most machines).
Cache blocking parameters depend on the cache size,
associativity, and line size.
Register blocking parameters depend on how many
registers you have, and registers used for blocking
are stolen from registers used for pipelining.
Pipelining depends on the latency of the loads and CPU
operations, and on where the operands are expected to come
from (dcache? ecache? main memory? contiguous addresses that
can be prefetched or fetched in a burst?)

And, again, this is slow and tedious work; I normally write
something like 100 lines of debugged code (C, Java, x86 assembly
language) per day over long periods of time (I can count the
lines of code that I have written in the last nine months),
but writing this stuff, my rate falls down to something
under 10 lines per day. Someone writing similar code in
i860 assembly language at Dana/Ardent/Stardent quoted me
a figure of one line of code per day, for a general case
version of the code (this one someone working on pipelined
implementations of vectorized transcendental functions).

If you were designing a new programming language ( to get back
to the original thread ) it would be a good idea to figure out
either how to make it easy to do alias analysis, or what "rules"
should come wired into the language that are both easy for
programmers to understand and obey, and useful for the compiler
to exploit. Fortran 77's rules are well outside the bounds of
what I'd call elegant, but they work, and the optimization benefits
are substantial.

Another possibility is to make it as easy as possible
to discover that a program will terminate -- though this is
general-case intractable, there's a large class of interesting
programs for which this can be determined (for instance, all
loops are "for" loops, or all recursion is governed by a
strictly decreasing parameter). Or, for instance, there is
some datatype that captures the notion "directed acyclic graph",
so that you know that any path traversal will terminate.

I'd also look into some way to make it more tractable to write
programs which use dataflow instead of control flow. C* went
quite some distance in that direction, and it was a relatively
easy way to get decent performance out of a Connection Machine,
but people always seemed to want more, and I think they tended
to get it with CMF(ortran) or explicit message-passing. (And,
as "boo" writes, some people programmed the vector units in their
assembly language. Yuck.)

--
David Chase, ch...@world.std.com

Patrick Chase

unread,
Oct 9, 1997, 3:00:00 AM10/9/97
to

In article <343C0C92...@tonys.ecology.umn.edu>, boo <b...@tonys.ecology.umn.edu> writes:
|> If someone cannot get within 10% of the performance of Fortran, I
|> would say he is a crap programmer.

Yes, someone who thoroughly understands the target architecture should be
able to beat (or at least get close to) what a FORTRAN compiler's optimizer
can do, if they are programming in a lower-level language like C (or, if
you want to take this argument to its logical conclusion, assembly!).

IMO, that argument misses the real point: Programmer-hours are expensive.
IMO the real "crap programmer" is the one who chooses to spend lots of time
hand-optimizing their code, when a different design or tool choice earlier
in the process (including but not limited to language selection) would have
given adequate (if not exactly equivalent) performance, saved them the
trouble, and allowed them to work on the next project that much earlier.

I _strongly_ prefer the C programming language to Fortran, but for
programming straightforward (no fancy I/O and rudimentary UI) quick and
dirty numerical apps I often use Fortran. The reason is simple: I know
that doing so will give me the most performance for a minimum level of
of effort invested.

From your references to programming Los Alamos' CM-5, I gather that you
come from a peformance-at-any-cost background. That's fine and well, but
you need to keep in mind that not everybody shares your perspective.

Regards,

Patrick

Richard A. O'Keefe

unread,
Oct 9, 1997, 3:00:00 AM10/9/97
to

har...@pauillac.inria.fr (Robert Harley) writes:

>Shankar Unni <sha...@powertelglobal.com> writes:
>>Yo, Robert (and you, too, "b...@somethingorother.umn.edu"):
>>
>>Richard and Nick have probably forgotten more C than you'll ever learn
>>in your lifetime, OK? Give it a rest; listen and learn. There's a few
>>lifetimes of actual working experience behind these opinions..

>Yo Shankar, their lifetimes of experience hacking primitive Fortrans and
>Lisps in 500BC apparently don't help when talking about ANSI C and C9X.

>Why exactly should we "listen and learn" to people who screw up on
>'const' and admit to not knowing what 'restrict' is, when we're
>shaving cycles off critical numerical code with both of them here and now?

Get your facts straight, buster. I have _not_ screwed up on 'const',
and I've known exactly what 'restrict' is for a couiple of years. I
even use every day a C compiler (SPARCompiler C) that supports it.

>Or who claim that:

>>Nested procedures? Fortran has one level of nesting, C has none.

>when GCC, for instance, has arbitrarily deep nesting with Pascal-style
>lexical scoping?

Hang on, you just said 'talking about ANSI C and C9X'.
There are no nested procedures in ANSI C.
There will be no nested procedures in C9X.

There *are* nested procedures in GNU C, which is a fine language,
but it is not the language I was talking about, and it is not the
language *you* claim to be talking about.

What's more, precisely *because* I care about performance, I have
mostly stopped using GCC. On a Sun UltraSPARC, Sun's C compiler
can, not surprisingly, run rings around GCC. *Because* I care
about performance, I *know* I will have to be prepared to switch
compilers at a benchmark's notice, which means I care a *lot* about
what is standard and what is not. (If gcc knew how to deal with
*.il files, I would be better pleased.)

>I believe that Richard does have a clue, however he also has a very
>obvious bias. As Mark Twain said:

> "First get the facts straight,
> then you are free to distort them at your leisure."

So what exactly do you think this bias is?

Ok, let me tell you. My real bias is that languages like Sisal and Nesl
and Clean and Mercury are _wonderful_. (This list includes only languages
known to have matched or bettered C on benchmarks.) I have also been
raving about Tcl/Tk to anyone who would listen ever since I learned it
about 6 weeks ago. I am *SICK AND TIRED* of other people's programs
crashing and trashing because the authors thought pointer arithmetic was
cool. Read
ftp://grilled.cs.wisc.edu/technical_papers/fuzz-revisited.ps.Z
some time.

If you can point to any fact about standard C (C89), draft standard C
(C9X), draft standard C++ (December 96 draft), or standard Fortran
that I have distorted, point it out.

Torben AEgidius Mogensen

unread,
Oct 9, 1997, 3:00:00 AM10/9/97
to

Bernd Paysan <pay...@no.informatik.spam.tu-muenchen.thanks.de> writes:

>If you compare language evolution
>with biologic evolution, we are just in the midth of the carbonic
>explosion of forms. Most languages known today won't survive on the long
>run, but there will be one small and unregarded with an early form of a
>backbone, and this one will take over later.

And the year 2000 problem is the giant meteor that will kill off the
dinosaurs? ;-)

Actually, I think the metaphorical meteor will be something else. As
has been noted earlier in this thread, tehre is a limit to how much
high-level languages can give you: You still have to tell the computer
exactly what you want done, and there is no way this description can
be arbitrarily shortened.

However, a lot of programming time today is taken with tracking down
bugs. Languages that will make this process easier/faster is bound to
win in the long run. One way is to remove sources of common bugs by
restricting the language. Common bugs in C are caused by the use of
integers to represent booleans. This makes mistakes like "if (x=y)"
pass the compiler, when a better type system would detect that an
integer was used in place of a boolean. This is only a simple example,
which b.t.w. many C complers will detect and give a warning (not an
error) about. Other things that a good language with a good comipler
can outlaw or catch are non-exhaustive tests, undefined variables,
out-of-bounds access and many more. We are probably only in the
beginning of an evolution towards languages/compilers that catch
inconsistent behaviours.

The trend today seems to try to reduce the programming effort by
providing HUGE libraries and hope that these will cover the majority
of what programs need. This can make some programs be quite short, but
if what you want isn't in the library, then you are out of luck.
Furthermore, this trend makes it take a long time to use a language
well, as most of the effort is taken by learning the libraries. Even
experienced programmers spend a major fraction of the programming time
reading manuals or searching in libraries. I think this trend will die
under its own weight. Libraries are not bad in themselves, but they
shouldn't dominate.

Torben Mogensen (tor...@diku.dk)

It is loading more messages.
0 new messages