
Different Implementations of VLIW


Ripunjay Tripathi

unread,
Apr 8, 2008, 2:45:01 PM4/8/08
to
Different Implementations of the same VLIW architecture may not be
binary-compatible with each other.

I am looking for an explanation of the above line.

Regards,
Ripunjay Tripathi

Mr Nick Maclaren is requested NOT TO BOTHER reading this and to do
some other useful job, if in hand.

Terje Mathisen

unread,
Apr 8, 2008, 3:21:10 PM4/8/08
to

In that case we'll have to answer on his behalf:

Have you ever considered the canonical way to gain knowledge, that is to
study?

You know, where you read the available literature, then sit down and think?

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Del Cecchi

unread,
Apr 8, 2008, 7:25:44 PM4/8/08
to

"Ripunjay Tripathi" <ripunjay...@gmail.com> wrote in message
news:6d5c7197-880a-4b00...@n1g2000prb.googlegroups.com...

On behalf of Nick....
Bite me.

My answer...

Let me count the ways....

The binary encodings could be different
The handling of interrupts could be different.

Hint.... think of all the x86 implementations. Are they binary
compatible? Really? Without switches?

del


Ripunjay Tripathi

unread,
Apr 8, 2008, 10:39:31 PM4/8/08
to
On Apr 9, 12:21 am, Terje Mathisen <terje.mathi...@hda.hydro.com>
wrote:

> Ripunjay Tripathi wrote:
> > Different Implementations of the same VLIW architecture may not be
> > binary-compatible with each other.
>
> > I am looking for explaination on the above line.
>
> > Regards,
> > Ripunjay Tripathi
>
> > Mr Nick Maclaren is requested to NOT TO BOTHER reading this and do
> > some other useful job, if in hand.
>
> In that case we'll have to answer on his behalf:
>
> Have you ever considered the canonical way to gain knowledge, that is to
> study?
>
> You know, where you read the available literature, then sit down and think?
>
> Terje
> --
> - <Terje.Mathi...@hda.hydro.com>

> "almost all programming can be viewed as an exercise in caching"

Not a single line of yours helped me. Do you think I am taking an
exam and asking you guys for the answers to the question paper?
What you said, to study and think, is very correct, but you are
prejudiced.

A single line from a guru (a mix of theory and experience) can give me
a good direction; otherwise please tell me if I should leave this
group.

Regards,
Ripunjay Tripathi

Ripunjay Tripathi

unread,
Apr 8, 2008, 10:42:15 PM4/8/08
to
On Apr 9, 4:25 am, "Del Cecchi" <delcecchioftheno...@gmail.com> wrote:
> "Ripunjay Tripathi" <ripunjay.tripa...@gmail.com> wrote in message

Thanx Del

Greg Lindahl

unread,
Apr 9, 2008, 1:06:31 AM4/9/08
to
In article <810bb078-2719-48ae...@u12g2000prd.googlegroups.com>,
Ripunjay Tripathi <ripunjay...@gmail.com> wrote:

> Do you think, that I am
> giving any exams and asking you guys for answers of the question
> paper ???

It could be. But Terje screwed up, he's supposed to give you a funny
yet fake answer.

Seriously, have you looked at VLIW instruction encoding? Did you
try googling something like "vliw implementation dependent"? The
answer is on the first page...

-- greg

Quadibloc

unread,
Apr 9, 2008, 2:28:12 PM4/9/08
to
On Apr 8, 12:45 pm, Ripunjay Tripathi <ripunjay.tripa...@gmail.com>
wrote:

> Different Implementations of the same VLIW architecture may not be
> binary-compatible with each other.
>
> I am looking for explaination on the above line.

I guess they mean something different by "the same architecture" than
people usually do when talking about computers. So, to know what that
line meant, one would have to know the context - in the book where you
read that sentence, what meaning did they give to an architecture?

Was it a specific kind of machine, like "an IBM 360" or "a 386-
compatible microprocessor", or was it more general, for example, "a
parallel machine", "NUMA", "decoupled microarchitecture", and so on?
Different implementations of the Von Neumann "architecture" certainly
exist that are not binary-compatible.

So I can't give you an answer, since I'm not looking at the book you
saw that line in.

John Savard

Thomas Lindgren

unread,
Apr 9, 2008, 4:48:18 PM4/9/08
to
lin...@pbm.com (Greg Lindahl) writes:

I've always thought lecturers should ask questions that are answered
only on the second page of the google results, so that students learn
the vital skills necessary in today's fast paced work environment.

Here's something for Ripunjay to ponder for his non-exam
question. VLIW. VL ... IW. Hmmm.

Best,
Thomas
--
Thomas Lindgren

Wilco Dijkstra

unread,
Apr 9, 2008, 4:52:21 PM4/9/08
to

"Ripunjay Tripathi" <ripunjay...@gmail.com> wrote in message
news:810bb078-2719-48ae...@u12g2000prd.googlegroups.com...

On Apr 9, 12:21 am, Terje Mathisen <terje.mathi...@hda.hydro.com>
wrote:
> Ripunjay Tripathi wrote:
> > Different Implementations of the same VLIW architecture may not be
> > binary-compatible with each other.
>
> > I am looking for explaination on the above line.

Read this http://en.wikipedia.org/wiki/VLIW for a basic explanation.

Wilco


Nick Maclaren

unread,
Apr 9, 2008, 5:17:00 PM4/9/08
to

In article <90aLj.47758$kN5....@newsfe1-gui.ntli.net>,

Of which the first sentence is so misleading as to be effectively
false. I can't be bothered to read further.


Regards,
Nick Maclaren.

Joe Pfeiffer

unread,
Apr 9, 2008, 7:03:45 PM4/9/08
to
nm...@cus.cam.ac.uk (Nick Maclaren) writes:

Only when taken completely out of context.

Erik Trulsson

unread,
Apr 9, 2008, 7:15:36 PM4/9/08
to
Ripunjay Tripathi <ripunjay...@gmail.com> wrote:
> Different Implementations of the same VLIW architecture may not be
> binary-compatible with each other.
>
> I am looking for explaination on the above line.
>


There is nothing special about VLIW in that regard (depending slightly on
exactly which definition of 'architecture' and 'binary-compatible' you
are using.)

(Exercise 1: Define exactly what you mean by 'architecture'.)

(Exercise 2: Write a small program which will behave differently depending
on whether it is running on an Intel 8086 or an 8088. Do these processors
belong to the same architecture according to your definition?)


--
<Insert your favourite quote here.>
Erik Trulsson
ertr...@student.uu.se

Nick Maclaren

unread,
Apr 10, 2008, 3:39:42 AM4/10/08
to

In article <1b8wzmi...@babs.wb.pfeifferfamily.net>,

Not at all, but you probably haven't been in this game long enough to
realise why I said that. To start an article by putting the cart
before the horse and rewriting history isn't good, even in Wikipedia,
though it is regrettably common.


Regards,
Nick Maclaren.

Ripunjay Tripathi

unread,
Apr 10, 2008, 6:00:50 AM4/10/08
to
On Apr 10, 4:15 am, Erik Trulsson <ertr1...@student.uu.se> wrote:
> ertr1...@student.uu.se

- Architecture scopes all those concepts that are exposed to the
programmer.
- I have not yet written such a program, but due to differences in
hardware (in the 8086: Carry flag, Parity flag, Auxiliary carry flag, Zero
flag, Overflow flag, Trace flag, Interrupt flag, Direction flag, and
Sign flag) I think there can be certain programs that fail to run on an
8088.

Ripunjay Tripathi

unread,
Apr 10, 2008, 6:19:04 AM4/10/08
to
On Apr 10, 2:17 am, n...@cus.cam.ac.uk (Nick Maclaren) wrote:
> In article <90aLj.47758$kN5.39...@newsfe1-gui.ntli.net>, "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> writes:
>
> |> "Ripunjay Tripathi" <ripunjay.tripa...@gmail.com> wrote in message

> |>news:810bb078-2719-48ae...@u12g2000prd.googlegroups.com...
> |> On Apr 9, 12:21 am, Terje Mathisen <terje.mathi...@hda.hydro.com>
> |> wrote:
>
> |> > Ripunjay Tripathi wrote:
> |> > > Different Implementations of the same VLIW architecture may not be
> |> > > binary-compatible with each other.
> |> >
> |> > > I am looking for explaination on the above line.
> |>
> |> Read this http://en.wikipedia.org/wiki/VLIW for a basic explanation.

>
> Of which the first sentence is so misleading as to be effectively
> false.  I can't be bothered to read further.
>
> Regards,
> Nick Maclaren.

I read the first paragraph. I think the author should have said
something about VLIW's scope (i.e. compile-time and random ILP). If
you think otherwise, can you please tell me why? A constructive talk
with gurus is always helpful to us.

Regards,
Ripunjay Tripathi

Nick Maclaren

unread,
Apr 10, 2008, 6:26:40 AM4/10/08
to

In article <0e62eb13-3e48-43c3...@x19g2000prg.googlegroups.com>,

Ripunjay Tripathi <ripunjay...@gmail.com> writes:
|>
|> I read the first paragraph. I think author should have added about
|> VLIW's scope (i.e compile time and random ILP) . If you think
|> otherwise can you please tell me that ?? A constructive talk with
|> gurus is always helpful to us.

The point is that VLIW originated as a way of making direct use of
separate hardware units, and was quite commonly used for microcode
(e.g. on the IBM System/370 range). It was 'rediscovered' in the
1990s as a way of making use of the 'available' ILP in programs
written in 3GL languages (like Fortran or C).

Wikipedia has got that arsy-versy.


Regards,
Nick Maclaren.

Ripunjay Tripathi

unread,
Apr 10, 2008, 6:38:23 AM4/10/08
to
On Apr 10, 3:26 pm, n...@cus.cam.ac.uk (Nick Maclaren) wrote:

> In article <0e62eb13-3e48-43c3-acf6-f40298097...@x19g2000prg.googlegroups.com>, Ripunjay Tripathi <ripunjay.tripa...@gmail.com> writes:
>
> |>
> |> I read the first paragraph. I think author should have added about
> |> VLIW's scope (i.e compile time and random ILP) . If you think
> |> otherwise can you please tell me that ?? A constructive talk with
> |> gurus is always helpful to us.
>
> The point is that VLIW originated as a way of making direct use of
> separate hardware units, and was quite commonly used for microcode
> (e.g. on the IBM System/370 range).  It was 'rediscovered' in the
> 1990s as a way of making use of the 'available' ILP in programs
> written in 3GL languages (like Fortran or C).
>
> Wikipedia has got that arsy-versy.
>
> Regards,
> Nick Maclaren.

Thanx Nick. The first line is something I may NOT find in any of the
text books. These are the things my foolish questions are searching
for. Thanx again.

Regards,
Ripunjay Tripathi

Anne & Lynn Wheeler

unread,
Apr 10, 2008, 7:29:57 AM4/10/08
to

nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> The point is that VLIW originated as a way of making direct use of
> separate hardware units, and was quite commonly used for microcode
> (e.g. on the IBM System/370 range). It was 'rediscovered' in the
> 1990s as a way of making use of the 'available' ILP in programs
> written in 3GL languages (like Fortran or C).

among others, the 370/168 & 3033 were horizontal m'code (i.e. a kind of VLIW).

and as mentioned here
http://www.garlic.com/~lynn/2008g.html#56 performance of hardware dynamic scheduling

... one of the itanium architects had worked on 370 as well as 801/risc
http://web.archive.org/web/20000816002838/http://www.hpl.hp.com/features/bill_worley_interview.html

another old itanium article
http://web.archive.org/web/20051031092309/http://www.hpl.hp.com/news/2001/apr-jun/itanium.html

the "itanium" architect is also credited with dual-address space mode
for 3033.

the mainstream batch 360 operating system had a heavily ingrained
pointer passing paradigm. in moving it to a (370) virtual memory
environment, each application was given a 16mbyte virtual address
space. However, first half of each virtual address space (8mbytes)
contained an image of the kernel.

Also, there were many application services that had been done by
"sub-systems" outside the kernel ... which also used pointer passing
paradigm. In the morph to MVS virtual memory, each of these sub-systems
resided in their separate virtual address space. In order to maintain
the pointer-passing paradigm, a "common segment" (initially one mbyte)
was created in each virtual address space ... where applications could
squirrel away parameters, invoke a kernel call to transfer to a
specific subsystem ... passing a parameter pointer. The size of the
common segment was actually proportional to the subsystems ... which for
larger installations was hitting 4-5mbytes (and growing) ... only
leaving 3-4mbytes for actual application use.

in the mad rush to get stuff back into 370 product pipeline ...
after demise of future system project
http://www.garlic.com/~lynn/subtopic.html#futuresys

and also recently mentioned
http://www.garlic.com/~lynn/2008g.html#53 performance of hardware dynamic scheduling
http://www.garlic.com/~lynn/2008g.html#54 performance of hardware dynamic scheduling

the 370-xa effort was kicked off ... which was going to take eight
years. however, as a stop-gap there was the 303x effort. they took a
370/158 and slightly repackaged it as a 3031 and a 370/168 repackaged as
a 3032. The 3033 started out as 168 wiring diagram mapped to faster (&
denser chip technology). Initially just using the same circuits per
chip, the 3033 would have been 20 percent faster than the 168. Some last
minute design to take advantage of denser circuits/chip in critical
areas pushed the 3033 to 50 percent faster.

however, the larger machine also meant that customers would
have run a larger variety of applications and subsystems on the 3033,
resulting in the common segment increasing to 8mbytes ... and leaving
zero bytes left for applications.

370-xa architecture had a generalized mechanism (access registers) to
handle semi-privileged subsystem instruction access to multiple virtual
address spaces (eliminating the need for using common segment for
stuffing in parameters in support of pointer passing paradigm) as well
as new hardware feature that allows direct calls&returns transferring
between virtual address spaces (bypassing requirement for call into
kernel for processing virtual address space switch). This was
independent of 370-xa being 31-bit virtual addressing ... change from
24-bit virtual addressing.

However, 370-xa was still several years away ... so a dual-address space
mode Q&D subset was rolled out for 3033 (attempting to mitigate common
segment growing to 8mbytes and leaving nothing in each 16mbyte virtual
address space for applications)
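The address-space arithmetic in the post above can be checked with a quick sketch (the numbers are taken straight from the post; 24-bit addressing gives a 16-Mbyte virtual address space):

```python
# Arithmetic from the post above: 24-bit addressing gives a 16 Mbyte
# virtual address space, 8 Mbytes of which is the mapped kernel image.
ADDRESS_SPACE_MB = 16
KERNEL_IMAGE_MB = 8

# Common-segment sizes mentioned in the post: the initial 1 Mbyte,
# the 4-5 Mbytes seen at large installations, and the feared 8 Mbytes.
for common_mb in (1, 5, 8):
    app_mb = ADDRESS_SPACE_MB - KERNEL_IMAGE_MB - common_mb
    print(f"{common_mb} MB common segment -> {app_mb} MB left for the application")
```

Which reproduces the post's numbers: 5 Mbytes of common segment leaves only 3 Mbytes for the application, and 8 Mbytes leaves zero.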

HT-Lab

unread,
Apr 10, 2008, 7:46:47 AM4/10/08
to

"Erik Trulsson" <ertr...@student.uu.se> wrote in message
news:47fd...@news.midgard.homeip.net...

> Ripunjay Tripathi <ripunjay...@gmail.com> wrote:
>> Different Implementations of the same VLIW architecture may not be
>> binary-compatible with each other.
>>
>> I am looking for explaination on the above line.
>>
>
>
> There is nothing special about VLIW in that regard (depending slightly on
> exactly which definition of 'architecture' and 'binary-compatible' you
> are using.)
>
> (Exercise 1: Define exectly what you mean by 'architecture')
>
> (Exercise 2: Write a small program which will behave differently
> depending
> on if it is running on an Intel 8086 or an 8088.

can this be done, and if so how?

Thanks,
Hans
www.ht-lab.com

Terje Mathisen

unread,
Apr 10, 2008, 11:44:55 AM4/10/08
to

Now you are at least looking like you're trying, good for you!

However, guessing that the difference between 8088 and 8086 could be in
any of the documented registers/flags is totally off the mark.

You have to look for tiny implementation differences between those two
chips, which after all were built around the exact same core.

Terje

--
- <Terje.M...@hda.hydro.com>

Quadibloc

unread,
Apr 10, 2008, 12:34:58 PM4/10/08
to
On Apr 9, 3:17 pm, n...@cus.cam.ac.uk (Nick Maclaren) wrote:
> In article <90aLj.47758$kN5.39...@newsfe1-gui.ntli.net>, "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> writes:
>
> |> "Ripunjay Tripathi" <ripunjay.tripa...@gmail.com> wrote in message

> |>news:810bb078-2719-48ae...@u12g2000prd.googlegroups.com...
> |> On Apr 9, 12:21 am, Terje Mathisen <terje.mathi...@hda.hydro.com>
> |> wrote:
>
> |> > Ripunjay Tripathi wrote:
> |> > > Different Implementations of the same VLIW architecture may not be
> |> > > binary-compatible with each other.
> |> >
> |> > > I am looking for explaination on the above line.
> |>
> |> Read this http://en.wikipedia.org/wiki/VLIW for a basic explanation.

>
> Of which the first sentence is so misleading as to be effectively
> false.  I can't be bothered to read further.

VLIW does take advantage of instruction-level parallelism, just not
between different instructions the way decoupled microarchitecture
does; it forces you to code it by hand.

Or maybe you mean something else.

John Savard

Quadibloc

unread,
Apr 10, 2008, 12:39:25 PM4/10/08
to
On Apr 10, 4:26 am, n...@cus.cam.ac.uk (Nick Maclaren) wrote:

> The point is that VLIW originated as a way of making direct use of
> separate hardware units, and was quite commonly used for microcode
> (e.g. on the IBM System/370 range).  It was 'rediscovered' in the
> 1990s as a way of making use of the 'available' ILP in programs
> written in 3GL languages (like Fortran or C).
>
> Wikipedia has got that arsy-versy.

The term VLIW wasn't coined to name horizontal microcode, it was
coined to name things like the Control Data Cyberplus. The two were
still very similar in effect, though, because neither is produced on
the fly, so neither can take advantage of ILP between threads the way
decoupled microarchitecture can.

John Savard

Nick Maclaren

unread,
Apr 10, 2008, 12:56:06 PM4/10/08
to

In article <1192aadf-4742-4c5c...@k10g2000prm.googlegroups.com>,

Yes. But to say that horizontal microcode and the control processors
that used what-was-later-to-be-called-VLIW instruction sets were not
VLIW because the term hadn't been coined by then is the sort of
terminological politics with which I will have no truck. If the term
VLIW means anything useful in computer architecture, those were VLIW.

Wikipedia starts "... VLIW refers to a CPU architecture designed to
take advantage of instruction level parallelism (ILP)." As I said
earlier, that puts the cart before the horse and rewrites history.


Regards,
Nick Maclaren.

Joe Pfeiffer

unread,
Apr 10, 2008, 1:59:03 PM4/10/08
to
nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <1b8wzmi...@babs.wb.pfeifferfamily.net>,
> Joe Pfeiffer <pfei...@cs.nmsu.edu> writes:
> |> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> |> > In article <90aLj.47758$kN5....@newsfe1-gui.ntli.net>,
> |> > "Wilco Dijkstra" <Wilco_dot...@ntlworld.com> writes:
> |> >
> |> > |> Read this http://en.wikipedia.org/wiki/VLIW for a basic explanation.
> |> >
> |> > Of which the first sentence is so misleading as to be effectively
> |> > false. I can't be bothered to read further.
> |>
> |> Only when taken completely out of context.
>
> Not at all, but you probably haven't been in this game long enough to
> realise why I said that. To start an article by putting the cart
> before the horse and rewriting history isn't good, even in Wikipedia,
> though it is regrettably common.

I certainly remember horizontal microcode. You might be interested in
re-reading Fisher's VLIW paper, which the wikipedia article you find
so misleading contains a link to, and which does a very good job
explaining the similarities and differences between VLIW and
horizontal microcode, and how his goal was specifically to improve ILP.


Nick Maclaren

unread,
Apr 10, 2008, 3:12:09 PM4/10/08
to

In article <1b7if5v...@babs.wb.pfeifferfamily.net>,

I wasn't referring just to horizontal microcode, but to the control
systems that used VLIW instruction sets.

I haven't read Fisher's paper, mainly because the abstract set off
my, er, aromatic hyperbole alarms so loudly. That was because I
remembered the research into VLIW as a target in the 1960s or 1970s,
and why those people decided it was a non-starter. In particular,
his 'expectation' of a 10- to 30-fold improvement by the use of
what sounded like a variation of an old technique would imply that
you could use the same methodology to get a 3- to 10-fold improvement
on more conventional hardware.

I may be maligning him, but EPIC assuredly hasn't delivered on what
his abstract said he expected, by a factor of 10-30, so I suspect my
gut feeling was not too far off.


Regards,
Nick Maclaren.

Quadibloc

unread,
Apr 10, 2008, 5:20:29 PM4/10/08
to
On Apr 10, 10:56 am, n...@cus.cam.ac.uk (Nick Maclaren) wrote:
> But to say that horizontal microcode and the control processors
> that used what-was-later-to-be-called-VLIW  instruction sets were not
> VLIW because the term hadn't been coined by then is the the sort of
> terminological politics with which I will have no truck.  If the term
> VLIW means anything useful in computer architecture, those were VLIW.

I am sympathetic with that point of view, since obviously language is
meant to be a useful instrument of communication.

But if the things called VLIW really were architecturally different in
significant ways from their predecessors, then it would make sense to
draw the distinction. And if that which was called VLIW was introduced
for a specific reason, noting that fact is valid from a historical
perspective.

Disentangling the historical perspective from the engineering
perspective can be challenging at times, I will admit.

John Savard

Ripunjay Tripathi

unread,
Apr 11, 2008, 4:46:58 AM4/11/08
to


VLIW has fixed-length instructions, tightly coupled to hardware
resources. If the hardware resources asked for by the instruction
exceed what the implementation has, then even if the basic VLIW
architecture is the same, the program will not work.
Am I right?

I don't think this number of resources (FUs) is exposed to the
programmer. So the architecture may be the same but the program will
not work.


Regards,
Ripunjay Tripathi


Quadibloc

unread,
Apr 11, 2008, 3:40:20 PM4/11/08
to
On Apr 11, 2:46 am, Ripunjay Tripathi <ripunjay.tripa...@gmail.com>
wrote:

> VLIW has fixed length instructions- tightly coupled to hardware
> resources - In case the hardaware resources exceed in number as asked
> in the instruction, even if the basic VLIW architecture is same the
> program will not work.
> Am I right ??
>
> Even I dont think this number of resources (FUs) are open to
> programmer. So the architecture may be same but pragram will not work.

The problem comes if the instruction asks for more functional units
than the hardware has, so if the resources change from one chip or
machine to another, the instruction format has to change from one
machine to the other.

But I think that you did know that, even if you phrased it backwards.

John Savard
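The point about the instruction format being welded to the number of functional units can be sketched with a toy encoding (entirely hypothetical, not any real VLIW machine; `encode_bundle` and the one-byte slots are made up for illustration):

```python
# Hypothetical VLIW encoding (a toy, not any real machine): a "bundle" is
# one instruction word whose width is fixed by the number of functional
# units, one byte per slot here to keep the sketch small.
def encode_bundle(slot_ops, n_slots):
    """Pack one operation per functional unit; unused slots hold a NOP (0)."""
    if len(slot_ops) > n_slots:
        raise ValueError("program assumes more functional units than this chip has")
    ops = slot_ops + [0] * (n_slots - len(slot_ops))
    return bytes(ops)

# A binary built for a 3-unit implementation...
three_wide = encode_bundle([0x21, 0x42, 0x63], n_slots=3)
# ...is not even the same length as the 2-unit encoding of the same work,
# so a 2-wide decoder would mis-parse the 3-wide binary at the first bundle.
two_wide = encode_bundle([0x21, 0x42], n_slots=2)
print(len(three_wide), len(two_wide))  # 3 2
```

Under this (made-up) scheme the two implementations share "the architecture" in the loose sense, yet neither can decode the other's binaries, which is the situation the original question asks about.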

already...@yahoo.com

unread,
Apr 12, 2008, 1:23:18 PM4/12/08
to
On Apr 10, 1:46 pm, "HT-Lab" <han...@ht-lab.com> wrote:
> "Erik Trulsson" <ertr1...@student.uu.se> wrote in message

>
> news:47fd...@news.midgard.homeip.net...
>
>
>
> > Ripunjay Tripathi <ripunjay.tripa...@gmail.com> wrote:
> >> Different Implementations of the same VLIW architecture may not be
> >> binary-compatible with each other.
>
> >> I am looking for explaination on the above line.
>
> > There is nothing special about VLIW in that regard (depending slightly on
> > exactly which definition of 'architecture' and 'binary-compatible' you
> > are using.)
>
> > (Exercise 1: Define exectly what you mean by 'architecture')
>
> > (Exercise 2: Write a small program which will behave differently
> > depending
> > on if it is running on an Intel 8086 or an 8088.
>
> can this be done, and if so how?
>
> Thanks,
> Hans
> www.ht-lab.com
>

First hint: look at the size of instruction queue.

HT-Lab

unread,
Apr 12, 2008, 2:06:35 PM4/12/08
to

<already...@yahoo.com> wrote in message
news:e67ecb39-0ecd-49a8...@f63g2000hsf.googlegroups.com...

>> > (Exercise 2: Write a small program which will behave differently
>> > depending
>> > on if it is running on an Intel 8086 or an 8088.
>>
>> can this be done, and if so how?
>>
>> Thanks,
>> Hans
>> www.ht-lab.com
>>
>
> First hint: look at the size of instruction queue.

Found it thanks,

Hans
www.ht-lab.com

; Is It an 8086?
; Returns ax==0 for 8088, ax==1 for 8086
; Code takes advantage of the 8088's 4-byte prefetch queues and 8086's
; 6-byte prefetch queues. By self-modifying the code at a location exactly 5
; bytes away from IP, and determining if the modification took effect,
; you can differentiate between 8088s and 8086s.
IsItAn8086 proc
        mov     ax,cs              ; es == code segment
        mov     es,ax
        std                        ; Cause stosb to count backwards
        mov     dx,1               ; dx is flag and we'll start at 1
        mov     di,OFFSET EndLabel ; di==offset of code tail to modify
        mov     al,090h            ; al==nop instruction opcode
        mov     cx,3               ; Set for 3 repetitions
        rep     stosb              ; Store the bytes
        cld                        ; Clear the direction flag
        nop                        ; Three nops in a row
        nop                        ; provide dummy instructions
        nop
        dec     dx                 ; Decrement flag (only with 8088)
        nop                        ; dummy instruction--6 bytes
EndLabel:
        nop
        mov     ax,dx              ; Store the flag in ax
        ret                        ; Back to caller
IsItAn8086 endp
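The trick the routine above relies on can be modelled with a toy sketch (a deliberate simplification, not an 8086 emulator; `modification_takes_effect` is a made-up name). The two chips disagree about whether a byte patched a few bytes ahead of the instruction pointer executes in its modified form, and that disagreement is what the routine turns into a return value:

```python
# Toy model of the prefetch-queue trick (a simplification, not an 8086
# emulator). The CPU fetches bytes into a queue ahead of execution, and
# a memory write does NOT update a byte already sitting in the queue.
def modification_takes_effect(queue_len, distance):
    """True if a byte patched `distance` bytes ahead of the current
    instruction executes in its modified form."""
    # A byte closer than the queue length has already been prefetched,
    # so the stale (unmodified) copy runs instead.
    return distance >= queue_len

# The routine patches code a handful of bytes ahead of the instruction
# pointer, in between the two queue lengths (4 bytes on the 8088,
# 6 bytes on the 8086), so the two chips take different paths:
print(modification_takes_effect(queue_len=4, distance=5))  # True
print(modification_takes_effect(queue_len=6, distance=5))  # False
```

In the model, the chip with the shorter queue sees the patched byte and the chip with the longer queue executes the stale one; which return value each case maps to is a detail of the assembly above.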


girish.gulawani

unread,
Apr 30, 2008, 11:44:43 AM4/30/08
to
On Apr 11, 4:12 am, n...@cus.cam.ac.uk (Nick Maclaren) wrote:

> In article <1b7if5vbfc....@babs.wb.pfeifferfamily.net>, Joe Pfeiffer <pfeif...@cs.nmsu.edu> writes:
>
> |>
> |> > Not at all, but you probably haven't been in this game long enough to
> |> > realise why I said that.  To start an article by putting the cart
> |> > before the horse and rewriting history isn't good, even in Wikipedia,
> |> > though it is regrettably common.
> |>
> |> I certainly remember horizontal microcode.  You might be interested in
> |> re-reading Fisher's VLIW paper, which the wikipedia article you find

> |> so misleading contains a link to, and which does a very good job
> |> explaining the similarities and differences between VLIW and
> |> horizontal microcode, and how his goal was specifically to improve ILP.
>
> I wasn't referring just to horizontal microcode, but to the control
> systems that used VLIW instruction sets.

>
> I haven't read Fisher's paper, mainly because the abstract set off
> my, er, aromatic hyperbole alarms so loudly.  That was because I
> remembered the research into VLIW as a target in the 1960s or 1970s,

> and why those people decided it was a non-starter.  In particular,
> his 'expectation' of a 10- to 30-fold improvement by the use of
> what sounded like a variation of an old technique would imply that
> you could use the same methodology to get a 3- to 10-fold improvement
> on more conventional hardware.
>
> I may be maligning him, but EPIC assuredly hasn't delivered on what
> his abstract said he expected, by a factor of 10-30, so I suspect my
> gut feeling was not too far off.
>
> Regards,
> Nick Maclaren.

a newbie question -

Fisher, mentioned in this discussion (i guess) is an author of the
book - Embedded Computing: A VLIW Approach To Architecture, Compilers
And Tools Joseph A. Fisher, Paolo Faraboschi, Clifford Young. any
comments and/or other recommendations? planning to buy one.

thanks.
girish.

Eugene Miya

unread,
Apr 30, 2008, 2:00:04 PM4/30/08
to
In article <7303c2ed-1ff4-489e...@x19g2000prg.googlegroups.com>,
girish.gulawani <giri...@gmail.com> wrote:
>On Apr 11, 4:12 am, n...@cus.cam.ac.uk (Nick Maclaren) wrote:
>> In article <1b7if5vbfc....@babs.wb.pfeifferfamily.net>, Joe Pfeiffer <pfei...@cs.nmsu.edu> writes:
>> |> > Not at all, but you probably haven't been in this game long enough to
>> |> > realise why I said that. To start an article by putting the cart
>> |> > before the horse and rewriting history isn't good, even in Wikipedia,
>> |> > though it is regrettably common.

The wikipedia page on scientific computing is at best marginal. There's
a guy sitting on it who sort of half with in.

>> |> horizontal microcode.
>> |> re-reading Fisher's VLIW paper, ...


>> |> horizontal microcode, and how his goal was specifically to improve ILP.
>
>>
>> I wasn't referring just to horizontal microcode, but to the control

>> systems that used VLIW instruction sets.


>>
>> I haven't read Fisher's paper, mainly because the abstract set off
>> my, er, aromatic hyperbole alarms so loudly. That was because I

>> remembered the research into VLIW as a target in the 1960s or 1970s,


>> and why those people decided it was a non-starter. In particular,
>> his 'expectation' of a 10- to 30-fold improvement by the use of
>> what sounded like a variation of an old technique would imply that
>> you could use the same methodology to get a 3- to 10-fold improvement
>> on more conventional hardware.
>>
>> I may be maligning him, but EPIC assuredly hasn't delivered on what
>> his abstract said he expected, by a factor of 10-30, so I suspect my
>> gut feeling was not too far off.

You are maligning Josh. Slightly.

Whereas I'm not necessarily a fan of VLIW, a factor of say 10 is
conceivable. Had this conversation with Burton Smith some years back.
And we agreed with others present that factors of 8-10 were possible
(distinct from things which might appear vector in nature), but that
architecturally, it was hard to sustain much beyond that (8-10x).

ELI came before VLIW and that was not in the 60s or 70s which were likely
to be more horizontal in nature. It's mostly terminology, but what
Multiflow (Josh's firm) did had greater capability than what was done
earlier.

>a newbie question -
>
>Fisher, mentioned in this discussion (i guess) is an author of the
>book - Embedded Computing: A VLIW Approach To Architecture, Compilers
>And Tools Joseph A. Fisher, Paolo Faraboschi, Clifford Young. any
>comments and/or other recommendations? planning to buy one.

Sure buy a copy.
Well you should also get Ellis' old PhD thesis on the Bulldog compiler,
not that that was easy to read.

I saved a Multiflow Trace at the CHM, but I think it went into deep
storage (Milpitas no longer even in Mtn. View) because to the curators
it's just another rectangular box which doesn't have a sexy shape.
Whereas I was not a fan of the machine in its existence, it deserves to
have more respect than what "historians" give it. Maybe some day some
one will get it running again.

Far more important to have running examples of architectures.

--

Nick Maclaren

unread,
Apr 30, 2008, 1:39:15 PM4/30/08
to

In article <4818a594$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|> In article <7303c2ed-1ff4-489e...@x19g2000prg.googlegroups.com>,
|> girish.gulawani <giri...@gmail.com> wrote:
|> >On Apr 11, 4:12 am, n...@cus.cam.ac.uk (Nick Maclaren) wrote:
|> >> In article <1b7if5vbfc....@babs.wb.pfeifferfamily.net>, Joe Pfeiffer <pfei...@cs.nmsu.edu> writes:
|> >> |> > Not at all, but you probably haven't been in this game long enough to
|> >> |> > realise why I said that. To start an article by putting the cart
|> >> |> > before the horse and rewriting history isn't good, even in Wikipedia,
|> >> |> > though it is regrettably common.
|>
|> The wikipedia page on scientific computing is at best marginal. There's
|> a guy sitting on it who sort of half with in.

Wikipedia is an excellent source of unreliable information :-)
That isn't snide, because information of known unreliability is a
damn sight better than no information - for example, it often gives
pointers to things to look up.

|> >> I may be maligning him, but EPIC assuredly hasn't delivered on what
|> >> his abstract said he expected, by a factor of 10-30, so I suspect my
|> >> gut feeling was not too far off.
|>
|> You are maligning Josh. Slightly.
|>
|> Whereas I'm not necessarily a fan of VLIW, a factor of say 10 is
|> conceivable. Had this conversation with Burton Smith some years back.
|> And we agreed with others present that factors of 8-10 were possible
|> (distinct from things which might appear vector in nature), but that
|> architecturally, it was hard to sustain much beyond that (8-10x).
|>
|> ELI came before VLIW and that was not in the 60s or 70s which were likely
|> to be more horizontal in nature. It's mostly terminology, but what
|> Multiflow (Josh's firm) did had greater capability than what was done
|> earlier.

A factor of 8-10 is certainly conceivable on some codes, but it isn't
on the sort of things that cause trouble on most desktops and many
commercial servers. In particular, C++-style O-O (including almost
all current GUIs, Java etc.) are death on wheels to such technologies.
Even when there is some ILP there, it's not extractable. HPC is a
more civilised environment.

That was the conclusion that the 1960s and 1970s research came to, and
it is why it was claimed that people needed to convert to 'higher-level'
and more 'structured' languages, where the ILP might be extractable.
Instead, most programmers have gone the other way ....


Regards,
Nick Maclaren.

Alex Colvin

Apr 30, 2008, 3:52:55 PM
>Far more important to have running examples of architectures.

Tilera: embedded multicore 3-way VLIW.
<http://arstechnica.com/articles/paedia/cpu/MIT-startup-raises-multicore-bar-with-new-64-core-CPU.ars>

VLIW seems to be more common in embedded systems, in preference to OOO.
Possibly because the workloads are better behaved or better understood.
Possibly because they favor deterministic worst-case performance over
better average-case performance.

I remember the 56K DSPs could do two moves and an ALU op per cycle. Is
that a VLIW? Maybe just a LIW?

--
mac the naïf

Eugene Miya

May 1, 2008, 6:07:08 PM
In article <fvaimn$9g8$1...@pcls6.std.com>,

Alex Colvin <al...@TheWorld.com> wrote:
>I remember the 56K DSPs could do two moves and an ALU op per cycle. Is
>that a VLIW? maybe just a LIW?

I'm not big on the terminology used by ILP/VLIW proponents.
My impression is that stuff as simple as a move or op isn't enough to be
VLIW. Forking and branching might be more challenging. X-MPs in 1982
could do simultaneous 2-loads, a store, and arithmetic with mere pipelining.
No long instructions needed.

Note: I did type on a Trace in a cooling tank on a visit to Meade.
That's "cooling" in their sense of the word. It was a surprise to be given
the opportunity. I wish I could have played more with it, but Multiflow
folded before the team I was with could do a site visit.
--

Eugene Miya

May 1, 2008, 8:05:22 PM
In article <fvaas3$7pj$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>|> >> |> > rewriting history isn't good, even in Wikipedia,
>|> >> |> > though it is regrettably common.
>
>Wikipedia is an excellent source of unreliable information :-)

The phrase I use is "popular cultural information."

The first wikipedia page I fixed was the supercomputer page when it
attributed Batcher's quote to Cray (since integrated into the body).
The most recent wikipedia page I modified was the Enigma page where I
placed an English translation of the German cover: it got moved to the
detailed blow up photo as an addition to the caption (acceptable in my
book).

>That isn't snide, because information of known unreliability is a
>damn sight better than no information - for example, it often gives
>pointers to things to look up.

No, sometimes half-truths are more damaging than untruths. That's how
electronic countermeasures like chaff work. It depends somewhat on
references and pointers, but sometimes (the Drake plate fiasco is a
good example) those aren't enough.

>|> >> EPIC
>|> >> a factor of 10-30,

>|> >> gut feeling was not too far off.

>|> VLIW,


>|> factors of 8-10 were possible
>|>

>|> ELI came before VLIW and that was not in the 60s or 70s which were likely
>|> to be more horizontal in nature. It's mostly terminology, but what
>|> Multiflow (Josh's firm) did had greater capability than what was done
>|> earlier.
>
>A factor of 8-10 is certainly conceivable on some codes, but it isn't
>on the sort of things that cause trouble on most desktops and many
>commercial servers. In particular, C++-style O-O (including almost
>all current GUIs, Java etc.) are death on wheels to such technologies.
>Even when there is some ILP there, it's not extractable. HPC is a
>more civilised environment.

Knuth would like a TeX demo (8-10x speed up). I doubt that even Josh
would have promised that for TeX.

I'm not certain that I would go with HPC being more civilized. Parts
are regular so vectors benefit, but that only attracts irregular
applications in the next stage. Some of the HPC guys are Neanderthals.

>That was the conclusion that the 1960s and 1970s research came to, and
>it is why it was claimed that people needed to convert to 'higher-level'
>and more 'structured' languages, where the ILP might be extractable.
>Instead, most programmers have gone the other way ....

It may be wording, but I didn't see ILP driving CISC or structured
languages for instance. I don't recall most programmers even thinking
about parallelism. The number of parallel architectures in the 60s
could be counted on one hand. Barely in the 70s. Distributed
architectures were expensive enough that few existed, and certainly few
of what we now call distributed languages. Programming language people had
a hard enough time trying to get away from assembly language coders that
parallelism wasn't much of a thought. IBM didn't make parallel machines.
They did attempt fault-tolerant dual processors, some with multiple
functional units. But not parallel machines.

--

Greg Lindahl

May 1, 2008, 7:22:08 PM
In article <481a30fc$1@darkstar>, Eugene Miya <eug...@cse.ucsc.edu> wrote:

>My impression is that stuff as simple as a move or op isn't enough to be
>VLIW. Forking and branching might be more challenging. X-MPs in 1982
>could do simultaneous 2-loads, a store, and arithmetic with mere pipelining.
>No long instructions needed.

... I thought VLIW implied something other than just a lot of
instructions in flight. The X-MP vector architecture doesn't look like
any of the classic VLIW architectures.

-- greg


John Levine

May 1, 2008, 8:27:58 PM
>... I thought VLIW implied something other than just a lot of
>instructions in flight.

I always understood it to be wide enough that you have to do trace
scheduling or something similarly aggressive to find enough work
to keep all the units busy.

R's,
John

Greg Lindahl

May 1, 2008, 8:47:00 PM
In article <fvdn6e$h7c$1...@gal.iecc.com>, John Levine <jo...@iecc.com> wrote:

>I always understood it to be wide enough that you have to do trace
>scheduling or something similarly aggressive to find enough work
>to keep all the units busy.

That's what I thought, too. Which is nothing like how the X-MP works.
Ordinary vectorization and strip-mining is nothing like trace
scheduling.

-- greg

Andreas Ehliar

May 1, 2008, 11:49:41 PM
On 2008-04-30, Alex Colvin <al...@TheWorld.com> wrote:
>>Far more important to have running examples of architectures.
>
> Tilera: embedded multicore 3-way VLIW.
><http://arstechnica.com/articles/paedia/cpu/MIT-startup-raises-multicore-bar-with-new-64-core-CPU.ars>
>
> VLIW seems to be more common in embedded systems, in preference to OOO.
> Possibly because the workloads are better behaved or better understood.
> Possibly because they offer more deterministic worst-case performance over
> better average-case performance.

I would guess that another reason is that it is much easier to design a
VLIW processor compared to a superscalar processor. Especially when compared
to an out-of-order superscalar processor.

/Andreas

Nick Maclaren

May 2, 2008, 3:55:34 AM

In article <481a4cb2$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|>
|> >That isn't snide, because information of known unreliability is a
|> >damn sight better than no information - for example, it often gives
|> >pointers to things to look up.
|>
|> No, sometimes half-truths are more damaging than untruths. That's how
|> electronic countermeasures like chaff work. It depends somewhat on
|> references and pointers, but sometimes (the Drake plate fiasco is a
|> good example) those aren't enough.

That's why I said "KNOWN unreliability". The real danger comes from
sources that are believed (by at least some important people) to be
reliable but, in fact, aren't.



|> Knuth would like a TeX demo (8-10x speed up). I doubt that even Josh
|> would have promised that for TeX.

It would be a brave statement, to be sure :-)

|> I'm not certain that I would go with HPC being more civilized. Parts
|> are regular so vectors benefit, but that only attraches irregular
|> applications in the next stage. Some of the HPC guys are Neanderthals.

True, but have you looked at the code of the X Window System Toolkit
and widget sets? And C++ programs that use similar paradigms?

|> >That was the conclusion that the 1960s and 1970s research came to, and
|> >it is why it was claimed that people needed to convert to 'higher-level'
|> >and more 'structured' languages, where the ILP might be extractable.
|> >Instead, most programmers have gone the other way ....
|>
|> It may be wording, but I didn't see ILP driving CISC or structured
|> languages for instance. I don't recall most programmers even thinking

|> about parallelism. ...

The foresighted people who were saying it failed to influence the
unthinking majority[*]. If you look up some of the papers coming out
of the Algol 68 camp, you will find such references. In the event,
Algol 68 eventually sank (though a few of its features have returned
in Fortran 90), and the "higher level language" people faded away.


[*] So what else is new?


Regards,
Nick Maclaren.

Stephen Fuld

May 2, 2008, 12:25:51 PM

Add to that another reason. A VLIW processor typically uses less power
than an OOO superscalar processor, as there is less logic to power.
This is important in some embedded applications.

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Nick Maclaren

May 2, 2008, 12:37:10 PM

In article <jgHSj.151047$D_3....@bgtnsc05-news.ops.worldnet.att.net>,

Well, they are really two sides of the same coin. With a reasonably
well-behaved program, you can get comparable parallelism on both, so
you go for the less power-hungry one. But, with GUI spaghetti etc.,
out-of-order superscalar is merely inefficient, compared with dire on
VLIW. So the cost/benefit ratio flips the other way.


Regards,
Nick Maclaren.

Eric P.

May 2, 2008, 2:26:22 PM

Add that embedded systems don't have to worry about binary
compatibility across processor generations so exposing the
hardware parallelism doesn't cause worries for subsequent cpus.

Eric

Peter Grandi

May 3, 2008, 6:07:15 AM
>>> On 30 Apr 2008 17:39:15 GMT, nm...@cus.cam.ac.uk (Nick
>>> Maclaren) said:

[ ... ]

nmm1> A factor of 8-10 is certainly conceivable on some codes,
nmm1> but it isn't on the sort of things that cause trouble on
nmm1> most desktops and many commercial servers. In particular,
nmm1> C++-style O-O (including almost all current GUIs, Java
nmm1> etc.) are death on wheels to such technologies. Even when
nmm1> there is some ILP there, it's not extractable.

For the benefit of the less experienced readers, here Nick is
making the usual case about the 3 most common categories of
code:

* "General purpose" code where most data accesses are LIFO and
there is some balance between forward and backwards jumps;
These tend to have either very fine grained or very coarse
grained parallelism.

* "Array" code where most data accesses are FIFO and almost all
jumps are backwards; these tend to have very regular, mass
scale operand parallelism ("embarrassingly parallel").

* "Transactional" code where data accesses are RANDOM and almost
all jumps are forward. These tend to have enormous coarse
grained parallelism (the existence of this category is often
forgotten because it has a very narrow if very important user
base).

and arguing very plausibly that architectures where parallelism
is not inter-instruction but intra-instruction cannot do that
well on "general purpose" code, because its parallelism is too
irregular to be captured in neat 4-8 wide instruction packets.

nmm1> HPC is a more civilised environment.

You would say that :-).

nmm1> That was the conclusion that the 1960s and 1970s research
nmm1> came to, and it is why it was claimed that people needed
nmm1> to convert to 'higher-level' and more 'structured'
nmm1> languages, where the ILP might be extractable.

Uhm, I am not sure I agree that "ILP might be more extractable"
was the primary motivator. In very large part it was driven by
concerns about correctness of programs and compilers, that is to
make reasoning about programs easier.

Identifying data dependencies and flow for compiling code is of
course a form of reasoning about programs, so it would have
benefited too, but the focus was I think about removing needless
complexity from reasoning about programs (e.g. FORTRAN/C/C++
with some of their surprising rules).

nmm1> Instead, most programmers have gone the other way ....

The comprehensive defeat of that movement is I think due to some
large scale factors:

* Very buggy programs sell very well if their most popular code
paths are useful and less buggy than the rest. Users are
attracted by the large number of theoretical features, and
then learn to avoid the very buggy bits. The definition of
"works" is social rather than technical.

* Functional programming (and OO programming too) were rather
oversold for productivity enhancement, while most employers of
programmers care almost only about lines of code per day per
dollar of pay, as quality (previous point) is irrelevant (to
the bonuses of managers) beyond a minimum level.

However there are examples of languages, mostly fairly topic
specific ones, the main but not sole example is SQL, that have
succeeded in providing high level descriptions that are amenable
to easier reasoning, extraction of parallelism, and massive
optimization. Many popular DBMSes however don't do it that well
either.

Nick Maclaren

May 3, 2008, 6:46:15 AM

In article <yf3ej8j...@tree.gp.example.com>,

pg...@0710.exp.sabi.co.UK (Peter Grandi) writes:
|>
|> For the benefit of the less experienced readers, here Nick is
|> making the usual case about the 3 most common categories of
|> code:

Yes. There are other categories and sub-categories, but they don't
change the basic argument.

|> nmm1> That was the conclusion that the 1960s and 1970s research
|> nmm1> came to, and it is why it was claimed that people needed
|> nmm1> to convert to 'higher-level' and more 'structured'
|> nmm1> languages, where the ILP might be extractable.
|>
|> Uhm, I am not sure I agree that "ILP might be more extractable"
|> was the primary motivator. In very large part it was driven by
|> concerns about correctness of programs and compilers, that is to
|> make reasoning about programs easier.

I didn't mean that it was - as you say, "correctness" was the primary
motivation.

But the Algol 68 people use to tell the Fortran 66 (and assembler)
people that performance in the future would depend more on compiler-
based optimisability and less on getting close to the hardware and
manual optimisation. They lost out, though Fortran 90 has several
of the Algol 68 concepts in it :-)

C is, of course, the classic "3 GL" language that was designed to
be able to get close to the hardware, and for manual optimisation.

|> However there are examples of languages, mostly fairly topic
|> specific ones, the main but not sole example is SQL, that have
|> succeeded in providing high level descriptions that are amenable
|> to easier reasoning, extraction of parallelism, and massive
|> optimization. Many popular DBMSes however don't do it that well
|> either.

And have been since the 1960s, but the early ones tended to be based
primarily on matrix primitives (e.g. the engineering and statistical
languages). Unfortunately, your remark about "topic based" seems to
have become more true rather than less.

I still think that one could design a GUI language based on such
principles that would be orders of magnitude more reliable, secure
and efficient. The trouble is that it would be completely alien to
current GUIs even at the conceptual level :-(

But imagine a reliable, secure GUI that used a Sun Niagara or
IBM/Sony Cell to its full potential - and NOT just for rendering ....


Regards,
Nick Maclaren.

Peter Grandi

May 3, 2008, 6:53:36 AM
[ ... ]

>> That was the conclusion that the 1960s and 1970s research

>> came to, and it is why it was claimed that people needed to
>> convert to 'higher-level' and more 'structured' languages,


>> where the ILP might be extractable.

> Uhm, I am not sure I agree that "ILP might be more
> extractable" was the primary motivator. In very large part it
> was driven by concerns about correctness of programs and
> compilers, that is to make reasoning about programs easier.

[ ... ]

> * Functional programming (and OO programming too) were rather
> oversold for productivity enhancement, while most employers of
> programmers care almost only about lines of code per day per
> dollar of pay, as quality (previous point) is irrelevant (to
> the bonuses of managers) beyond a minimum level.

As to this I want here to recount one of two great contributions
to computer science that I have made (CPUs with multiple stacks
is not as great :->), and very highly relevant to ILP.

I have studied parallel assignment, and discovered that it is
fairly easy to compile it into quite optimal code.

Examples of parallel assignment in order of increasing
"difficulty":

a, b := 1, 2; # easy
a, b := 1, a; # less easy
a, b := b, a; # tricky
i, a[i] := a[i], i; # uhhh tricky

The key is the number of temporaries and the ordering of the
assignments. One needs to take into account the dependencies among
left-hand sides, and which right-hand sides depend on which left-hand
sides.

The algorithm is tunable in the sense that even when there are
dependencies it is possible to parallelize more by adding more
temporaries than strictly needed.
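A rough Python sketch of what such an expansion produces (this is not the published algorithm, just hand-expanded output for the tricky cases above; true parallel semantics evaluate all right-hand sides and left-hand-side addresses against the old state before any store happens):

```python
def swap(a, b):
    # a, b := b, a  -- one temporary suffices
    t = a          # capture the old a before it is overwritten
    a = b
    b = t
    return a, b

def index_swap(i, a):
    # i, a[i] := a[i], i  -- the "uhhh tricky" case: the index on the
    # left-hand side must be the OLD i, so the old index and both
    # right-hand sides are captured before either store happens.
    old_i = i
    new_i = a[i]     # right-hand side destined for i
    new_elem = i     # right-hand side destined for a[old_i]
    i = new_i
    a[old_i] = new_elem
    return i, a

print(swap(1, 2))                    # (2, 1)
print(index_swap(1, [10, 20, 30]))   # (20, [10, 1, 30])
```

A naive left-to-right sequential assignment would get the indexed case wrong, which is exactly why the expansion needs the dependency analysis and temporaries described above.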

So far so good, and the algorithm is published in IPL.

What I did not publish is a paper on the implications:

* Very many bits of code (whole basic blocks usually) in which
there is ILP are in effect the output of the parallel
assignment expansion algorithm.

* These bits of code are very difficult to analyze because the
simple but far from obvious expansion algorithm is implicit in
them, more than because of the (partial) ILP.

* Parallel assignments are usually very much easier to
understand and to optimize than one of their expansions.

* The implicit parallel assignment expresses the ILP in the code
very explicitly, and in many codes the ILP is almost entirely
captured by parallel assignments.

* A lot of the difficulty in extracting ILP from code amounts to
figuring out from scattered code the implicit parallel assignment
that gave rise to it, and then re-expanding it in another form.

This (and an important but also forgotten paper on sliding and
rotation assignments in CACM 1982 also inspired me) has led me
to speculate that the main benefit of "functional" languages was
not their being functional, in the sense of replacing assignment
with argument-to-parameter binding, but that such bindings are a
particularly clumsy and usually inefficient form of parallel assignment.

In other words, my contention is that a language with properly
implemented parallel assignment has (if the parallel assignment
are used) the advantages of functional languages without the
sometimes inconvenient restrictions.

Naturally there is the downside: even if code turns out to be
composed mostly of parallel assignments with some bit of
conditional logic glue, a lot of those are indeed rife with
dependencies in "general purpose" programs.

But at least the analysis is easier and it becomes obvious that
adding temporaries might add exploitable parallelism.

To the point that sometimes I muse that trace scheduling can be
reinterpreted as seeing programs as being composed of parallel
assignments and then adding "too many" temporaries to increase
the potential for parallelism in some traces.

John Dallman

May 3, 2008, 7:01:00 AM
In article <yf3ej8j...@tree.gp.example.com>,
pg...@0710.exp.sabi.co.UK (Peter Grandi) wrote:

> * Functional programming (and OO programming too) were rather
> oversold for productivity enhancement,

Absolutely.

> while most employers of programmers care almost only about lines
> of code per day per dollar of pay, as quality (previous point)
> is irrelevant (to the bonuses of managers) beyond a minimum level.

IME, they don't give a damn about lines of code per dollar of programmer
pay, and they are aware of the fact that different programmers will code
the same things in very different numbers of lines.

What they care about is the number of man-days it takes to produce "new
stuff" - new features, or important bug-fixes.

--
John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.

Nick Maclaren

May 3, 2008, 7:57:41 AM

In article <yf3abj7...@tree.gp.example.com>,
pg...@0710.exp.sabi.co.UK (Peter Grandi) writes:

[ A most interesting post, clarifying a lot of muddy gut feelings. ]

|> In other words, my contention is that a language with properly
|> implemented parallel assignment has (if the parallel assignment
|> are used) the advantages of functional languages without the
|> sometimes inconvenient restrictions.

MOST interesting. A long time ago, I thought of doing that, but
it was for other reasons, and hadn't realise that it might have
major efficiency advantages. Incidentally, Fortran 90, Python
etc. include quite a lot of such features, which is clearly good;
the first language I saw that used the concept was Genstat, but
it may have had predecessors.

My idea was (and is) to make ALL function calls into monadic
operators, acting on and returning displays - displays would
be first-class concepts, and could be named, merged, split etc.
I got the idea from languages that had such facilities as extras,
and was thinking of what could be gained by making them the ONLY
mechanism. There were two reasons for that:

Quite a lot of LAPACK-like code would be clarified, and could
be optimised without trace analysis. Yes, one function could
return the argument list to pass to another, while maintaining
conventional procedural programming style.

It would expose the parallelism in some graph-theoretic codes,
which can be exposed today only by encapsulating the actions
in "O-O" functions (which then loses most of the efficiency
gained).

Now, obviously, such code would compile a lot better to EPIC and
VLIW than most existing languages, but how easy would it be to use
and how much gain would there be? You may be able to estimate
that better than I can.

|> To the point that sometimes I muse that trace scheduling can be
|> reinterpreted as seeing programs as being composed of parallel
|> assignments and then adding "too many" temporaries to increase
|> the potential for parallelism in some traces.

Indeed. Mathematics is full of dual concepts :-) Stacks versus
recursion, anyone?


Regards,
Nick Maclaren.

Peter Grandi

May 3, 2008, 8:27:02 AM
>> * Functional programming (and OO programming too) were rather
>> oversold for productivity enhancement,

> Absolutely.

>> while most employers of programmers care almost only about
>> lines of code per day per dollar of pay, as quality (previous
>> point) is irrelevant (to the bonuses of managers) beyond a
>> minimum level.

> IME, they don't give a damn about lines of code per dollar of
> programmer pay, and they are aware of the fact that different
> programmers will code the same things in very different
> numbers of lines.

Yes, yes, but...

> What they care about is the number of man-days it takes to
> produce "new stuff" - new features, or important bug-fixes.

Believe me when I say that I considered saying "feature points"
instead of "lines of code".

But I settled for "lines of code" because in my experience most
managers of programming shops have no idea what "feature points"
are or if they do how to identify and measure them.

If one wanted to be more precise, usually bonuses are earned on
"deliverables" by "date", and "deliverables" are not defined in
terms of "these many things work" but in bulk terms, which in
effect translate to quantity of code written, things like "the
application is complete", or "new stuff" as you say.

But more new stuff means more lines of code written even if the
relationship between "lines of code" and "new stuff" is far from
linear or stable as you say...

We are not really disagreeing, but I prefer to emphasize the gross
quantitative aspect of the "new stuff".

John Dallman

May 3, 2008, 10:42:00 AM
In article <yf363tv...@tree.gp.example.com>,
pg...@0710.exp.sabi.co.UK (Peter Grandi) wrote:

> We are not really disagreeing, but I prefer to emphasize the gross
> quantitative aspect of the "new stuff".

I see your point, but using lines of code as a measure without being
really, blatantly, rhetorical about it risks fatally under-estimating
the skill with which people deceive themselves.

Nick Maclaren

May 3, 2008, 12:39:02 PM

In article <memo.2008050...@jgd.compulink.co.uk>,

j...@cix.co.uk (John Dallman) writes:
|> In article <yf363tv...@tree.gp.example.com>,
|> pg...@0710.exp.sabi.co.UK (Peter Grandi) wrote:
|>
|> > We are not really disagreeing, but I prefer to emphasize the gross
|> > quantitative aspect of the "new stuff".
|>
|> I see your point, but using lines of code as a measure without being
|> really, blatantly, rhetorical about it risks fatally under-estimating
|> the skill with which people deceive themselves.

True, but I no longer think that is an important delusion, at least
compared with many of the other near-universal ones!


Regards,
Nick Maclaren.

Pertti Kellomäki

May 5, 2008, 3:03:55 AM
Peter Grandi wrote:
> In other words, my contention is that a language with properly
> implemented parallel assignment has (if the parallel assignment
> are used) the advantages of functional languages without the
> sometimes inconvenient restrictions.

A related point that I have become more and more convinced about
is that tail recursion would be quite a useful structuring mechanism
even outside of functional programming. A tail call combines a
parallel assignment with a jump, which makes any ILP pretty
obvious, and it also allows reasoning about programs in terms
of largish atomic state changes.

Two languages that I know where tail calls are explicitly used
for this are Scheme and Erlang. In Scheme, iteration constructs
are often expressed using functions as follows:

(define (iterate a b c)
  ... some code here...
  (iterate (compute-new-a) (compute-new-b) (compute-new-c)))

The semantics of the language dictate that the tail call to
iterate must not consume any extra space, which in practice
means that it is turned into a jump. The expressions (compute-new-X)
clearly stand out as candidates for ILP. For the reader, the
implicit parallel assignment is an atomic change to the state
variables a, b, and c.
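A rough Python analogue of that Scheme pattern (the compute-new-X helpers and the termination test are hypothetical stand-ins; Python lacks tail-call optimization, so the "turned into a jump" part is written out explicitly as a loop):

```python
# Each loop iteration corresponds to one tail call: a simultaneous
# (parallel) rebinding of the state variables plus a jump back to the top.
def iterate(a, b, c):
    while a < 10:  # stand-in termination test
        # The three right-hand sides are independent of each other --
        # the ILP described above -- and the rebinding is atomic.
        a, b, c = compute_new_a(a), compute_new_b(b), compute_new_c(c)
    return a, b, c

# Hypothetical helpers, only to make the sketch runnable:
def compute_new_a(a): return a + 1
def compute_new_b(b): return b * 2
def compute_new_c(c): return c - 1

print(iterate(0, 1, 0))  # (10, 1024, -10)
```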

In Erlang, a functional language for distributed programming (think
phone exchanges and the like), a similar technique is used for
expressing processes. A process keeps itself alive by making
tail recursive calls, and the state of the process is kept in
the parameters passed in the calls. This also provides a convenient
mechanism for hot-swapping in new code if the tail calls are
made indirectly via a module name space, but that's another
story.
--
Pertti

Nick Maclaren

May 5, 2008, 5:33:37 AM

In article <fvmbgr$v47$1...@news.cc.tut.fi>,

=?ISO-8859-1?Q?Pertti_Kellom=E4ki?= <pertti.k...@tut.fi> writes:
|> Peter Grandi wrote:
|> > In other words, my contention is that a language with properly
|> > implemented parallel assignment has (if the parallel assignment
|> > are used) the advantages of functional languages without the
|> > sometimes inconvenient restrictions.
|>
|> A related point that I have become more and more convinced about
|> is that tail recursion would be quite a useful structuring mechanism
|> even outside of functional programming. A tail call combines a
|> parallel assignment with a jump, which makes any ILP pretty
|> obvious, and it also allows reasoning about programs in terms
|> of largish atomic state changes.

Grrk. My experience is that tail recursion is useful, as you say,
but also a pain in the neck. When it is optimised out, it makes it
vastly harder for debuggers and tuning tools to display any useful
information. Even if it is not optimised out, tracebacks are much
less useful.

So you win some and lose some :-(

|> In Erlang, a functional language for distributed programming (think
|> phone exchanges and the like), a similar technique is used for
|> expressing processes. A process keeps itself alive by making
|> tail recursive calls, and the state of the process is kept in
|> the parameters passed in the calls. This also provides a convenient
|> mechanism for hot-swapping in new code if the tail calls are
|> made indirectly via a module name space, but that's another
|> story.

A really ancient technique (the 360/370 operating system used it) that
was brought into high-level languages some 20+ years ago. See BCPL
coroutines, for example. Yes, I agree that it is probably the best
way to support multiple non-parallel processes. MUCH cleaner than
POSIX-style threading.


Regards,
Nick Maclaren.

Pertti Kellomäki

May 5, 2008, 5:48:35 AM
Nick Maclaren wrote:
> Grrk. My experience is that tail recursion is useful, as you say,
> but also a pain in the neck. When it is optimised out, it makes it
> vastly harder for debuggers and tuning tools to display any useful
> information. Even if it is not optimised out, tracebacks are much
> less useful.
>
> So you win some and lose some :-(

I took a bit of a beating over this in comp.lang.lisp a while
ago, so I know about the drawbacks ;-)

However, when tail recursion is used explicitly for creating
a computation that runs in constant space, then the alternative
is to use some kind of loop or goto, which also do not leave
any backtrace.

The best of both worlds would probably be special syntax for the
tail calls that should be optimized away.
--
Pertti

Nick Maclaren

May 5, 2008, 6:25:27 AM

In article <fvml5k$4fl$1...@news.cc.tut.fi>,

=?ISO-8859-1?Q?Pertti_Kellom=E4ki?= <pertti.k...@tut.fi> writes:
|>
|> > Grrk. My experience is that tail recursion is useful, as you say,
|> > but also a pain in the neck. When it is optimised out, it makes it
|> > vastly harder for debuggers and tuning tools to display any useful
|> > information. Even if it is not optimised out, tracebacks are much
|> > less useful.
|> >
|> > So you win some and lose some :-(
|>
|> I took a bit of a beating over this in comp.lang.lisp a while
|> ago, so I know about the drawbacks ;-)

That's the place that has the experience :-)

|> However, when tail recursion is used explicitly for creating
|> a computation that runs in constant space, then the alternative
|> is to use some kind of loop or goto, which also do not leave
|> any backtrace.

Agreed.

|> The best of both worlds would probably be special syntax for the
|> tail calls that should be optimized away.

It's a reasonable approach, certainly.


Regards,
Nick Maclaren.

Peter Grandi

May 5, 2008, 12:43:09 PM
[ ... ]

>> In other words, my contention is that a language with properly
>> implemented parallel assignment has (if the parallel assignment
>> are used) the advantages of functional languages without the
>> sometimes inconvenient restrictions.

> A related point that I have become more and more convinced about
> is that tail recursion would be quite a useful structuring
> mechanism even outside of functional programming.

GCC optimizes tail recursion... At least simple cases. I think it
has become more extensive with time.

Anyhow tail recursion is a very useful descriptive device even when
it is not optimized into stack reuse, it just makes clearer program
flow and programmer intent.

> A tail call combines a parallel assignment with a jump, which
> makes any ILP pretty obvious, and it also allows reasoning about
> programs in terms of largish atomic state changes.

That's one way of putting it. But it still uses something that
looks more like binding than proper parallel assignment, which
has weaker and thus more optimizable semantics than binding as
defined in most languages (aliasing...).
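The "parallel assignment plus a jump" reading of a tail call can be
written out in C. This is only a sketch, not from the thread, and gcd
is a hypothetical example; GCC turns the first form into the second
when it eliminates the tail call:

```c
#include <assert.h>

/* Tail-recursive form: the self-call rebinds (a, b) and transfers
   control, nothing more. */
static unsigned gcd_rec(unsigned a, unsigned b)
{
    return b == 0 ? a : gcd_rec(b, a % b);
}

/* The same function with the tail call spelled out as a parallel
   assignment (both right-hand sides evaluated before either variable
   changes) followed by an explicit jump. */
static unsigned gcd_loop(unsigned a, unsigned b)
{
top:
    if (b == 0)
        return a;
    {
        unsigned new_a = b, new_b = a % b;  /* evaluate both first */
        a = new_a;
        b = new_b;
    }
    goto top;
}
```

The temporaries are what make the assignment "parallel": a % b is
computed before a is overwritten, so the two forms are step-for-step
equivalent.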

> Two languages that I know where tail calls are explicitly used
> for this are Scheme and Erlang. In Scheme, iteration constructs
> are often expressed using functions as follows:

> (define (iterate a b c)
> ... some code here...
> (iterate (compute-new-a) (compute-new-b) (compute-new-c)))

That kind of pattern more often looks slightly different, with
a main function calling a tail-recursive function with some extra
parameters. A bit of Emacs Lisp for an excessively simple example:

(defun factorial (n)
  (factorial-step 1 n))

(defun factorial-step (i j)
  (if (< j 2) i
    (factorial-step (* i j) (- j 1))))

IIRC this is mentioned in "Anatomy of Lisp" by Allen (one of those
extraordinary classics).

Then there was Greussay's splendid article and thesis showing how,
by rearranging things slightly, tail recursive implementations of
*mutually recursive* procedures come for free.

Nick Maclaren

May 5, 2008, 1:42:46 PM

In article <yf31w4g...@tree.gp.example.com>,

pg...@0710.exp.sabi.co.UK (Peter Grandi) writes:
|>
|> Anyhow tail recursion is a very useful descriptive device even when
|> it is not optimized into stack reuse, it just makes clearer program
|> flow and programmer intent.

I strongly disagree.

In a functional language, or even a functional programming paradigm,
that is true. But, in a language and programming paradigm where
much or most of the 'importance' is in the side-effects, I find that
tail recursion obfuscates both the program flow and the programmer's
intent.

That isn't just my personal dislike, either, but is based on helping
people who have had trouble in a good many imperative languages:
Algol 68, BCPL, C, Fortran and others.


There is also a secondary menace of the technique, caused by code
like:

Function fred
    If ...
        ...
        Return joe(...)
    Else
        ... lots of code ...
        Return <data value>
    Endif
Endfunction

I and others have wasted HOURS due to that "gotcha" - when the tail
call to joe is optimised away, fred disappears from any traceback -
again, it is the combination of that with non-functional features
that causes the problem.


Regards,
Nick Maclaren.

Peter Grandi

May 5, 2008, 4:03:40 PM
[ ... ]

>> Anyhow tail recursion is a very useful descriptive device
>> even when it is not optimized into stack reuse, it just makes
>> clearer program flow and programmer intent.

> I strongly disagree. In a functional language, or even a
> functional programming paradigm, that is true. But, in a
> language and programming paradigm where much or most of the
> 'importance' is in the side-effects, I find that tail recursion
> obfuscates both the program flow and the programmer's intent.

Uhm, there is a point about side effects, but that affects all
forms of recursion.

However I think that as to *descriptive* I have a stronger point,
which is part of a general argument.

The general argument is that there is a natural correspondence
between data structures and control structures, which goes like
this, in a very simplified form:

data structure        control structure

constant              expression
variable              assignment
record                block ('with'), parallel assignment
array                 repetition (bounded, 'for')
streams               repetition (logical, 'while')
frame, hash           repetition (keyed, 'foreach')
lists                 recursion (tail)
binary trees          mixed recursion
trees                 recursion
acyclic graphs        iterators
general graphs        generators

NOTES:
* some of the distinctions above are a bit fine.
* 'with' means a Pascal-like 'with' block that 'opens'
a record's scope.
* a frame is a record with a potentially changing number and
names of fields.

The correspondence means that to explore/enumerate a certain class
of data structure it is most proper/descriptive to use the
corresponding control structure.

My point is not just that using the proper-level control
structure for each data structure category is quite important
(and from your argument above you seem to agree), but that there
are data structures that, like lists, are defined recursively
yet in essence require no stack, and for these tail recursion
is most appropriate.

Consider the three middle structures above:

array: defined as a bounded repetition of N "contiguous"
elements, control structure is 'for'.

stream: defined as an unbounded repetition of "contiguous"
elements (\0 terminated string, files), control
structure is 'while'.

list: defined as a non-branching tree of independent,
non-contiguous nodes.

For the latter tail recursion seems to me ideal: a linear tree
of nodes is not the same as a stream of values, and I'd rather
not use the same control structure to enumerate both. Consider
the length of a string or a singly linked list in GNU C:

struct list { struct list *next; };  /* minimal node for the examples */

unsigned strlen1(const char *const s)
{
    register unsigned n = 0;
    register const char *p = s;
    while (*p != 0)
    { p++; n++; }
    return n;
}

unsigned strlen2(const char *const s)
{
    inline unsigned strlen2step(const unsigned n,const char *const s)
    { return (s[0] == '\0') ? n : strlen2step(n+1,&s[1]); }
    return strlen2step(0,s);
}

unsigned listlen1(const struct list *const l)
{
    register unsigned n = 0;
    register const struct list *p = l;
    while (p != 0)
    { p = p->next; n++; }
    return n;
}

unsigned listlen2(const struct list *const l)
{
    inline unsigned listlen2step(const unsigned n,const struct list *const l)
    { return (l == 0) ? n : listlen2step(n+1,l->next); }
    return listlen2step(0,l);
}

Of these I'd prefer 'strlen1' and 'listlen2', as they match the
underlying structure better than 'strlen2' and 'listlen1'.

Let me put it another way: a sequence (bounded as an array,
unbounded as a stream/string) is fundamentally different from a
list because, while both are unidimensional, the sequence cannot
but be so, and it has more of a cardinal nature (most sequences
can easily be accessed positionally), while lists are more
ambiguous, being degenerate trees, and have more of an ordinal
nature, like trees and graphs.

Stylistically I'd rather distinguish exploring/enumerating a
sequence from exploring/enumerating a list.

Still another way: Dijkstra powerfully argued that using
recursion when looping is sufficient is a bad idea, basing the
argument on the observation that looping creates a relationship
between two predicates (like a block or an assignment or an
expression), while recursion creates one between predicate
transformers, and there is a significant semantic jump between
the two.

Well, in my view in that significant semantic jump there is an
intermediate step for lists and tail recursion, because of that
cardinal to ordinal jump.

Finally, I will explain a special case that the observant reader
will have spotted above, which is binary trees.

These are a special case because they can be interpreted as a
linear list with branches; visualize them not as having left and
right branches, but next and down ones.

This means that a very common way of exploring/enumerating them
is by linear progression in the next direction, and recursion
in the down direction.

In that case it is fairly natural to 'while' nextwards and
recurse downwards. But it is, I think, more appropriate to
tail-recurse rightward and recurse downward, because after all it
is a tree. Consider the count of elements in a binary tree in
GNU C:

struct bt { struct bt *next, *down; };  /* binary tree as next/down */

unsigned btlen1(const struct bt *const t)
{
    register unsigned n = 0;
    register const struct bt *p = t;
    while (p != 0)
    { if (p->down != 0) n += btlen1(p->down); n++; p = p->next; }
    return n;
}

unsigned btlen2(const struct bt *const t)
{
    inline unsigned btlen2step(const unsigned n,const struct bt *const t)
    { return (t == 0) ? n
        : btlen2step(n+((t->down == 0) ? 0 : btlen2(t->down))+1,t->next); }
    return btlen2step(0,t);
}

where I prefer 'btlen2' to 'btlen1'.

NOTES:
* I would not write these functions quite like that, this
particular shape is to illustrate my point, but still good.
* GCC 4.1 does compile 'btlen2step' as a loop, and even does
loop unrolling. This did amuse me.

Some general points:

* Yes, my preferred C programming style is quite expression
oriented, Algol 68-ish, functional. Also because with GCC it
optimizes particularly well, especially with asm inlines and
builtins, e.g. SSE.

* Yes, I try to be quite tidy and precise in the descriptive
style of my coding. It costs no more than messy, hard-to-follow
code does. Also because I prefer programs that require very
little debugging and have very few lifetime bugs.

Peter Grandi

May 5, 2008, 4:46:33 PM
[ ... ]

> The general argument is that there is a natural correspondence
> between data structures and control structures, which goes like
> this, in a very simplified form:

> data structure        control structure

> constant              expression
> variable              assignment
> record                block ('with'), parallel assignment
> array                 repetition (bounded, 'for')
> streams               repetition (logical, 'while')
> frame, hash           repetition (keyed, 'foreach')
> lists                 recursion (tail)
> binary trees          mixed recursion
> trees                 recursion
> acyclic graphs        iterators
> general graphs        generators

The architectural implications, especially as to optimization of
bulk operation, should be fairly obvious.

Also, my classification above is very much indebted to a few
great bits of literature and research, regrettably like so many
others largely lost in obscurity:

* Chapter 2 "On data structures" of "Structured programming".
* Chapter 3 "Hierarchical program structures" of "Structured
programming".
* POP2 and MDL, and to some extent EL1.

[ ... ]

As to "Structured programming", whose chapters 2 and 3 are so
uniquely important yet obscure even when it was a hot book, some
years ago I listened to D. Knuth at a lecture saying that he had
asked AP if it still was selling, and he was told that indeed it
was still selling, *three* copies per year. O tempora, o mores :-).

But then I think this classic cartoon is tragically realistic:

http://www.iBiblio.org/Dave/Dr-Fun/df200002/df20000210.jpg

Greg Lindahl

May 5, 2008, 5:00:27 PM
In article <yf31w4g...@tree.gp.example.com>,
Peter Grandi <pg...@0710.exp.sabi.co.UK> wrote:

>GCC optimizes tail recursion... At least simple cases. I think it
>has become more extensive with time.

I think one of the SPECcpu benchmarks has a significant opportunity
with tail recursion; certainly it's something that most
performance-oriented compilers do.

-- greg


Nick Maclaren

May 5, 2008, 5:33:56 PM

In article <yf3od7k...@tree.gp.example.com>,

pg...@0710.exp.sabi.co.UK (Peter Grandi) writes:
|>
|> > The general argument is that there is a natural correspondence
|> > between data structures and control structures, which goes like
|> > this, in a very simplified form:
|>
|> The architectural implications, especially as to optimization of
|> bulk operation, should be fairly obvious.

Agreed. I am not going to respond to your posting, as I agree with
almost all of it. I was only objecting to the implication that
tail recursion in itself clarifies code - i.e. that code with it
is likely to be clearer than code without it, in the absence of
other conditions.

|> As to "Structured programming", whose chapters 2 and 3 are so
|> uniquely important yet obscure even when it was a hot book, some
|> years ago I listened to D. Knuth at a lecture saying that he had
|> asked AP if it still was selling, and he was told that indeed it
|> was still selling, *three* copies per year. O tempora, o mores :-).

What is so catastrophic is the number of (now influential) professors
who believe the myths and dogmas of yesterday - such as that software
engineering is primarily about development rituals and toolkits, or
that dynamic (run-time) error handling is intrinsically undesirable
(programs should be validated statically).

|> But then I think this classic cartoon tragically realistic:
|>
|> http://www.iBiblio.org/Dave/Dr-Fun/df200002/df20000210.jpg

Unfortunately :-(


Regards,
Nick Maclaren.

Eugene Miya

May 5, 2008, 8:16:15 PM
In article <481a50a0$1...@news.meer.net>, Greg Lindahl <lin...@pbm.com> wrote:
>In article <481a30fc$1@darkstar>, Eugene Miya <eug...@cse.ucsc.edu> wrote:
>>My impression is that stuff as simple as a move or op isn't enough to be
>>VLIW. Forking and branching might be more challenging. X-MPs in 1982
>>could do simultaneous 2-loads, a store, and arithmetic with mere pipelining.
>>No long instructions needed.

>
>... I thought VLIW implied something other than just a lot of
>instructions in flight.

Yeah, I think that, too. So we have dealt with the quantitative.
That brings up the qualitative. Might we, in some cases, be running
an ILP architecture in a lot of vector-like patterns? Maybe. Pause.
I don't know; aside from armchair quarterbacking, I think we
have to get the Trace up and running again. I would want to see
empirical evidence, not conjecture.

>The X-MP vector architecture doesn't look like
>any of the classic VLIW architectures.

Agreed.

But the comparisons were for an architecture I had no familiarity with
and the 360/370. And I'm not so certain that IBM's horizontally coded
machines would qualify as VLIW/ILP.

--

Eugene Miya

May 5, 2008, 8:20:16 PM

Agreed.
But the end timed results won't have qualitative differences if you
don't see how things work. You want to run a variety of codes to get
differences.


--

Eugene Miya

May 5, 2008, 8:27:24 PM
In article <fvehdm$t88$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>|> >That isn't snide, because information of known unreliability is a
>|> >damn sight better than no information - for example, it often gives
>|> >pointers to things to look up.

In article <481a4cb2$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
>|> No, sometimes half truths are more damaging than untruths. That's how
>|> electronic countermeasures like chaff work. It depends somewhat on
>|> references and pointers but sometimes, the Drake plate fiasco is a good
>|> one, those aren't enough.

>That's why I said "KNOWN unreliability". The real danger comes from
>sources that are believed (by at least some important people) to be
>reliable but, in fact, aren't.
>
>|> Knuth would like a TeX demo (8-10x speed up). I doubt that even Josh
>|> would have promised that for TeX.
>
>It would be a brave statement, to be sure :-)

Yep.

>|> I'm not certain that I would go with HPC being more civilized. Parts
>|> are regular so vectors benefit, but that only attracts irregular
>|> applications in the next stage. Some of the HPC guys are Neanderthals.
>
>True, but have you looked at the code of the X Windowing System Toolkit
>and widget sets? And C++ programs that use similar paradigms?

I've insufficient knowledge of X internals. C++ OOP has its own set of
problems. I'd put X in the TeX class of sequential algorithms for the
time being.

>|> >That was the conclusion that the 1960s and 1970s research came to, and
>|> >it is why it was claimed that people needed to convert to 'higher-level'
>|> >and more 'structured' languages, where the ILP might be extractable.
>|> >Instead, most programmers have gone the other way ....
>|>
>|> It may be wording, but I didn't see ILP driving CISC or structured
>|> languages for instance. I don't recall most programmers even thinking
>|> about parallelism. ...
>
>The foresighted people who were saying it failed to influence the
>unthinking majority[*]. If you look up some of the papers coming out
>of the Algol 68 camp, you will find such references. In the event,
>Algol 68 eventually sank (though a few of its features have returned
>in Fortran 90), and the "higher level language" people faded away.
>
>[*] So what else is new?

New? Nathan is taking delivery on Doran's 2nd Babbage difference engine.

I know few who worked on 68. Buzz U. maybe. Dennis maybe to a small
degree. Wirth, too maybe. It's not something anyone really chews the
fat about. Most people look forward.

You likely have to be more specific about references. Given their own
devices most people would look up the wrong references.

--

Rick Jones

May 5, 2008, 7:58:16 PM
Greg Lindahl <lin...@pbm.com> wrote:
> I think one of the SPECcpu benchmarks has a significant opportunity

Nit picking, but _which_ SPECcpu? There are at least SPECcpu2006,
SPECcpu2000, SPECcpu95 and SPECcpu92, and I seem to recall an '89
version but that one isn't mentioned on SPEC's website:

http://www.spec.org/benchmarks.html

rick jones
--
Process shall set you free from the need for rational thought.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Greg Lindahl

May 5, 2008, 8:36:08 PM
In article <fvo6uo$eb8$3...@usenet01.boi.hp.com>,

Rick Jones <rick....@hp.com> wrote:
>Greg Lindahl <lin...@pbm.com> wrote:
>> I think one of the SPECcpu benchmarks has a significant opportunity
>
>Nit picking, but _which_ SPECcpu?

I don't recall. It was something discussed while I was at PathScale,
so it must be one or both of SPECcpu2000 and SPECcpu2006.

-- greg

Nick Maclaren

May 6, 2008, 4:19:23 AM

In article <481f97dc$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|>
|> >True, but have you looked at the code of the X Windowing System Toolkit
|> >and widget sets? And C++ programs that use similar paradigms?
|>
|> I've insufficent knowledge of X internals. C++ OOP has its on set of
|> problems. I'd put X in the TeX class of sequential algorithms for the
|> time being.

I spent some months on it, and have looked at TeX. I wouldn't.
I would put it in the garbage.

|> >The foresighted people who were saying it failed to influence the
|> >unthinking majority[*]. If you look up some of the papers coming out
|> >of the Algol 68 camp, you will find such references. In the event,
|> >Algol 68 eventually sank (though a few of its features have returned
|> >in Fortran 90), and the "higher level language" people faded away.
|>

|> I know few who worked on 68. Buzz U. maybe. Dennis maybe to a small
|> degree. Wirth, too maybe. It's not something anyone really chews the
|> fat about. Most people look forward.

Er, no. In all respects. The sad thing about the attitude of the
'winners' in IT and politics is that they try to deny the positive
achievements of the 'losers'. Algol 68 showed that many of the
current problems with programming languages are simply unnecessary.
Those who will not study history are condemned to repeat it.

And I am not referring to Dennis Ritchie in that - he did not belong
to an opposing camp, but to an orthogonal camp.

|> You likely have to be more specific about references. Given their own
|> devices most people would look up the wrong references.

Doubtless. However, they are all on paper, some are a couple of miles
from where I work, and most are not available in Cambridge. You can
do your own leg work :-)


Regards,
Nick Maclaren.

Bernd Paysan

May 6, 2008, 7:21:26 AM
Nick Maclaren wrote:
> I spend some months on it, and have looked at TeX. I wouldn't.
> I would put it in the garbage.

Actually, I think there's a lot of inherent parallelism in the task TeX
tries to achieve (typesetting). The way it's implemented, however,
ensures that everything is serialized. The same is true for GCC (with
less impact for the user, because you can always use make -j for more
parallelism).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Nick Maclaren

May 6, 2008, 7:38:50 AM

In article <mir5f5-...@annette.mikron.de>,

Bernd Paysan <bernd....@gmx.de> writes:
|> Nick Maclaren wrote:
|> > I spend some months on it, and have looked at TeX. I wouldn't.
|> > I would put it in the garbage.
|>
|> Actually, I think there's a lot of inherent parallelism in the task TeX
|> tries to achieve (typesetting). The way it's implemented however assures
|> that everything is serialized. The same is true for GCC (with less impact
|> for the user, because you can always make -j for more parallelism).

Yes. However, that paragraph on its own is misleading - I wasn't
referring to TeX, but to the X Windowing System Toolkit and widget
sets - the context made that clear.

TeX is merely grim, as one would expect from a Pascal-derived source.


Regards,
Nick Maclaren.

Eric P.

May 6, 2008, 1:48:12 PM

This is actually a difficult distinction to draw but
I'll take a crack at it.

Horizontal microcode (of that era) and VLIW both have instructions
that use "a large number of bits" and use a fixed layout to dedicate
instruction fields for each function. Both expect the programmer
(or compiler) to track function unit latency and manually schedule
results. However that is where the similarity ends.

While horizontal microcode typically does allow concurrent actions,
in that there are multiple fields that can be turned on at the same
time, these are not the kind of actions that would be of interest
to a programmer. Horizontal microcode is dedicated to sequencing
the internal processor control lines. It wouldn't have the ability
to control multiple identical function units at the same time using
separate instruction fields. For example, they would not have two
ALU's and two control fields. Concurrency, if it is done at all,
is achieved by pipelining multiple microinstructions at the same time.
The microword would typically not contain fields to specify registers,
as these are given by the CISC instruction, and certainly not contain
the multiple register specifier fields required for multiple
concurrent operations.

A horizontal microword might have separate fields to control, say,
an ALU, a flash multiplier, and a shifter. However you would most
likely find that, because the machine was never designed to operate
these concurrently, there was a resource like a common bus that all
function units attach to.

VLIW does have the ability to perform program operations concurrently
because it has multiple function units, of each type, along with the
supporting infrastructure to move multiple operands and results.
So you might find 3 ALUs and 2 multipliers with the 15 associated
busses to move values from the register file to each unit and back.
The VLIW instruction then makes the function units available through
5 separate fields, each with its own control and source and
destination registers.
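That layout can be sketched in C. Field widths, opcodes, and the 3+2
slot mix are invented for illustration, not any real machine; the point
is that each slot carries a full opcode-plus-registers group, which a
horizontal microword has neither room nor busses for:

```c
#include <assert.h>
#include <stdint.h>

enum { OP_NOP = 0, OP_ADD = 1, OP_MUL = 2 };  /* hypothetical opcodes */

/* One issue slot: its own opcode plus three register specifiers. */
struct slot {
    uint32_t op   : 8;
    uint32_t dst  : 8;
    uint32_t src1 : 8;
    uint32_t src2 : 8;
};

/* The whole "very long" instruction word: 3 ALU slots and 2
   multiplier slots, all issued in the same cycle. */
struct vliw_word {
    struct slot alu[3];
    struct slot mul[2];
};

/* Count the slots that do real work this cycle (the rest are NOPs,
   which is where VLIW code density suffers). */
static int active_slots(const struct vliw_word *w)
{
    int n = 0;
    for (int i = 0; i < 3; i++) n += (w->alu[i].op != OP_NOP);
    for (int i = 0; i < 2; i++) n += (w->mul[i].op != OP_NOP);
    return n;
}
```

A compiler that cannot fill a slot emits a NOP in it, which is why
the per-slot register fields matter: they are what a microword for a
single-bus machine has no use for.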

Eric

Peter Grandi

May 6, 2008, 4:49:31 PM
[ ... ]

> Actually, I think there's a lot of inherent parallelism in the
> task TeX tries to achieve (typesetting).

> The way it's implemented however assures that everything is
> serialized. The same is true for GCC (with less impact for the
> user, because you can always make -j for more parallelism).

And to merge Nick's (and mine) liking of Algol 68 and the topic
of microparallelism, I will mention here a rather amazing paper
by Banatre on compiling Algol 68 by starting a new thread for
every syntactic construct. This was done more to avoid explicit
multiple passes in a language with perverse forward references:

http://DOI.ACM.org/10.1145/357121.357123
http://DOI.ACM.org/10.1145/359046.359055

On a similar, but not parallel-programming note, there was the
diabolical left-to-right and right-to-left parsing technique of
Bohm from the MC, which could parse Algol 68 in two passes.

Eugene Miya

May 6, 2008, 6:36:33 PM
In article <fvpg0a$nap$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>In article <mir5f5-...@annette.mikron.de>,
>Bernd Paysan <bernd....@gmx.de> writes:
>|> Nick Maclaren wrote:
>|> > I spend some months on it, and have looked at TeX. I wouldn't.
>|> > I would put it in the garbage.
>|>
>|> Actually, I think there's a lot of inherent parallelism in the task TeX
>|> tries to achieve (typesetting). The way it's implemented however assures
>|> that everything is serialized. The same is true for GCC (with less impact
>|> for the user, because you can always make -j for more parallelism).
>
>Yes. However, that paragraph on its own is misleading - I wasn't
>referring to TeX, but to the X Windowing System Toolkit and widget
>sets - the context made that clear.

It's Knuth's challenge.
If either of you even thinks you can get significant parallelism from TeX,
show me, and I will forward it to DEK. I can assure you that he would be
amused. I watched about a 100 people at Stanford back away from that
challenge.

Remember we are looking for a factor of at least 8.


>TeX is merely grim, as one would expect from a Pascal-derived source.

Don't use. I don't. Usually. Jill, his wife, doesn't use it.

--

Eugene Miya

May 6, 2008, 6:49:04 PM

The TeX issue is a separate thread with Bernd.

In article <fvp4ab$isu$1...@gemini.csx.cam.ac.uk>,


Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>|> >The foresighted people who were saying it failed to influence the
>|> >unthinking majority[*]. If you look up some of the papers coming out
>|> >of the Algol 68 camp, you will find such references. In the event,
>|> >Algol 68 eventually sank
>

>Er, no. In all respects. The sad thing about the attitude of the
>'winners' in IT and politics is that they try to deny the positive
>achievements of the 'losers'. Algol 68 showed that many of the
>current problems with programming languages are simply unnecessary.
>Those who will not study history are condemned to repeat it.

Yeah yeah, we've heard that all before with the Multics guys.
Meanwhile others write Java, Perl, Python.... And even BASIC.

Winners get to write history. For better or worse.

>And I am not referring to Dennis Ritchie in that - he did not belong
>to an opposing camp, but to an orthogonal camp.
>
>|> You likely have to be more specific about references. Given their own
>|> devices most people would look up the wrong references.
>
>Doubtless. However, they are all on paper, some are a couple of miles
>from where I work, and most are not available in Cambridge. You can
>do your own leg work :-)

Your problem, not mine.

Like I noted, the Kevin Bacon character in Animal House jumping up and
down trying to keep "order."

--

Eugene Miya

May 6, 2008, 7:04:54 PM
>> Actually, I think there's a lot of inherent parallelism in the
>> task TeX tries to achieve (typesetting).

In article <yf3ve1r...@tree.gp.example.com>,


Peter Grandi <pg...@0710.exp.sabi.co.UK> wrote:
>And to merge Nick's (and mine) liking of Algol 68 and the topic
>of microparallelism, I will mention here a rather amazing paper
>by Banatre on compiling Algol 68 by starting a new thread for
>every syntactic construct. This was done more to avoid explicit

I thought A68 had fork-join syntax. You don't have to get into
microparallelism, because you have MIMD parallelism in the language to
begin with, right? But I'm also not aware of any of the European vector
machines, Cray, or CDC/ETA (ha) taking the time to move the a68 compiler
over, so no one ever explored vectorizing a68 (which merely would have
ended up looking like CFT or cft77 Fortran), much less the ICL/AMT DAP,
the SUPRENUM, the LAU, the ...fill in your favorite research
machine/Manchester.... Much less APL or that ASCII APL (what was it? J?).

>multiple passes in a language with perverse forward references:

...


>On a similar, but not parallel-programming note, there was the
>diabolical left-to-right and right-to-left parsing technique of
>Bohm from the MC, which could parse Algol 68 in two passes.

Ah, the things done when memory was more expensive.

--

Nick Maclaren

May 6, 2008, 6:08:18 PM

In article <4820cf61$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|>
|> >Yes. However, that paragraph on its own is misleading - I wasn't
|> >referring to TeX, but to the X Windowing System Toolkit and widget
|> >sets - the context made that clear.
|>
|> It's Knuth's challenge.
|> If either of you even thinks you can get significant parallelism from TeX,
|> show me, and I will forward to DEK. I can assure you that he would be
|> amused. I watched as about a 100 people at Stanford back from that
|> challenge.
|>
|> Remember we are looking for a factor of at least 8.

I suggest that you try to get a factor of 2 from the X Windowing
System Toolkit - the exercise would stretch your mind.


Regards,
Nick Maclaren.

Bernd Paysan

May 7, 2008, 7:16:15 AM
Eugene Miya wrote:
> It's Knuth's challenge.
> If either of you even thinks you can get significant parallelism from TeX,
> show me, and I will forward to DEK. I can assure you that he would be
> amused. I watched as about a 100 people at Stanford back from that
> challenge.

I would certainly have to completely rewrite TeX (rebuild it from
scratch). I'm talking about the task TeX has to do, not the
implementation. The implementation is serialized beyond repair.
However, there are different ways to achieve parallelism:

For a mere factor of 8, a simple filter design would do (using multiple
cores): First filter takes TeX source and expands macros. Next filter takes
expanded TeX and builds paragraphs out of it. Paragraphs then can be
wrapped in parallel (TeX gives each paragraph several tries and chooses the
best one). In effect, you can wrap all paragraphs of a single document at
once. Next is to combine paragraphs to pages, this needs to serialize again
(but for a factor 8, one paragraph at a time is completely sufficient).

Ok, then let's assume we want to use VLIW or other vector techniques to get
more fine-grain parallelism out of TeX. This requires different data
structures. For example kerning: Optimally, we build a matrix of kerning
for character A followed by character B (wastes space, because most of the
entries are 0). If our text also is placed in an array (instead of a linked
list), we can see that adding kerning is a 100% parallel task - there's no
interdependency, the limiting factor is just how many accesses we can make
to that kerning matrix (random access, but the kerning matrix fits easily
into L1 cache).
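That loop can be sketched in C. Types, sizes, and kern values are
invented for illustration (a real TeX font stores sparse kern pairs,
not a dense matrix); the point is that with the text in a flat array,
each output element depends only on one adjacent character pair, so
there is no loop-carried dependency for a vectorizer to trip over:

```c
#include <assert.h>
#include <stddef.h>

/* Toy dense kerning matrix: kern[a][b] is the adjustment for
   character a followed by character b (mostly zero). */
static int kern[256][256];

/* Compute the per-position kerning adjustments for a text held in a
   flat array.  Every iteration reads only text[i] and text[i+1], so
   the loop is 100% parallel apart from the matrix lookups. */
static void add_kerning(const unsigned char *text, size_t n, int *adjust)
{
    for (size_t i = 0; i + 1 < n; i++)
        adjust[i] = kern[text[i]][text[i + 1]];
    if (n > 0)
        adjust[n - 1] = 0;    /* last character has no successor */
}
```

Compare the linked-list version, where finding text[i+1] is itself a
pointer chase and the loop serializes.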

For breaking paragraphs at high speed (and thus high parallelism), we
first fold words (or syllables, after hyphenating) into single objects
(a data reduction) - high parallelism, O(log(N)) sequential limit.
Then we compute fixed
and glue width for the sequence of all words in the paragraph, this is a
vector operation with a sequential sum part, i.e. we have a O(log(N))
sequential limit in this operation. Then we bisect the paragraphs to find
possible line breaks - this is a sequential algorithm, but since we have an
array, we don't need to walk through the list while we do the bisection
steps. Finally, calculate "badness", and select the best.
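The "vector operation with a sequential sum part" can be sketched as an
inclusive prefix sum over the widths, in the Hillis-Steele style (a toy
version; the bound and types are invented): it takes about log2(N)
sequential passes, and each pass's inner loop has no loop-carried
dependency, so a vector unit can do it in bulk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inclusive prefix sum of word widths, Hillis-Steele style:
   ceil(log2(n)) sequential passes, each pass a dependency-free
   sweep over the array.  Sketch only: assumes n <= 64. */
static void prefix_widths(int *w, size_t n)
{
    int tmp[64];
    for (size_t d = 1; d < n; d *= 2) {
        memcpy(tmp, w, n * sizeof *w);     /* snapshot of last pass */
        for (size_t i = d; i < n; i++)     /* no loop-carried deps  */
            w[i] = tmp[i] + tmp[i - d];
    }
}
```

After this, w[i] is the cumulative width up to word i, which is what
the bisection over possible break points needs to consult in O(1).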

Robert Myers

May 7, 2008, 2:33:51 PM
On May 6, 6:36 pm, eug...@cse.ucsc.edu (Eugene Miya) wrote:

> It's Knuth's challenge.
> If either of you even thinks you can get significant parallelism from TeX,
> show me, and I will forward to DEK. I can assure you that he would be
> amused. I watched as about a 100 people at Stanford back from that
> challenge.
>

Elephant graveyard of naysayers. The beginnings of "computer science"
were fun and productive. Since then?

Because the first machines weren't parallel, we don't know anything
about parallel programming, and as long as we're citing the experts on
those first machines, we never will.

Time for some productive amnesia. Blank pieces of paper. No name-dropping.

Robert.

Eugene Miya

May 7, 2008, 3:44:00 PM
In article <fvqksi$6pd$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>I suggest that you try to get a factor of 2 from the X Windowing
>System Toolkit - the exercise would stretch your mind.

Right.
You handle the Maclaren challenge.
Might be a good idea to factor out semiconductor device improvement effects.
But it's your challenge.

--

Eugene Miya
May 7, 2008, 3:59:22 PM
>> Knuth's challenge.
>> If either of you even thinks you can get significant parallelism from TeX,
>> show me, and I will forward to DEK. I can assure you that he would be
>> amused. I watched about 100 people at Stanford back away from that
>> challenge.

In article <vkf8f5-...@annette.mikron.de>,


Bernd Paysan <bernd....@gmx.de> wrote:
>I would certainly have to completely rewrite TeX (rebuild from scratch).

>I'm talking about the task TeX has to do, not the implementation.

Well email me when you are done. I only pass thru the ng occasionally.
I have a real job, I don't have time to read or post with any frequency.


> The implementation is serialized beyond repair.

Are you saying the TeX is broken?

I wonder how big a 2^n pennies check he would write you?

The exception handling might be tough.

>However, there are different ways to achieve parallelism:
>
>For a mere factor of 8, a simple filter design would do (using multiple
>cores): First filter takes TeX source and expands macros. Next filter takes
>expanded TeX and builds paragraphs out of it. Paragraphs then can be
>wrapped in parallel (TeX gives each paragraph several tries and chooses the
>best one). In effect, you can wrap all paragraphs of a single document at
>once. Next is to combine paragraphs to pages, this needs to serialize again
>(but for a factor 8, one paragraph at a time is completely sufficient).

You'd need 8 balanced stages of pipelining, or possibly more stages of
unbalanced pipelining, to get a virtual factor of 8, and no weird cases like
the sample spiraling sentences.

>Ok, then let's assume we want to use VLIW or other vector techniques to get
>more fine-grain parallelism out of TeX. This requires different data
>structures. For example kerning: Optimally, we build a matrix of kerning
>for character A followed by character B (wastes space, because most of the
>entries are 0). If our text also is placed in an array (instead of a linked
>list), we can see that adding kerning is a 100% parallel task - there's no
>interdependency, the limiting factor is just how many accesses we can make
>to that kerning matrix (random access, but the kerning matrix fits easily
>into L1 cache).

So let's say we get the Multiflow running again. You can run both cases
(list control and rewritten array experiment).

>For breaking paragraphs at high speed (and thus high parallelism), we first
>fold words (or syllables after hyphenating) into single object (reduce
>data) - high parallelism, O(log(N)) sequential limit. Then we compute fixed
>and glue width for the sequence of all words in the paragraph, this is a
>vector operation with a sequential sum part, i.e. we have a O(log(N))
>sequential limit in this operation. Then we bisect the paragraphs to find
>possible line breaks - this is a sequential algorithm, but since we have an
>array, we don't need to walk through the list while we do the bisection
>steps. Finally, calculate "badness", and select the best.

Daze us.

--

Nick Maclaren
May 7, 2008, 4:30:05 PM

In article <4821fc0a$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|> In article <vkf8f5-...@annette.mikron.de>,
|> Bernd Paysan <bernd....@gmx.de> wrote:
|>
|> > The implementation is serialized beyond repair.
|>
|> Are you saying the TeX is broken?
|>
|> I wonder how big a 2^n pennies check he would write you?

Sigh. It was no part of the specification of the code that it
was parallelisable.


Regards,
Nick Maclaren.

Bernd Paysan
May 7, 2008, 5:02:37 PM
Eugene Miya wrote:

>> The implementation is serialized beyond repair.
>
> Are you saying the TeX is broken?

From the last interview with Don Knuth, I don't think he would agree. He
said he wouldn't write something about parallel algorithms in TAOCP,
because he thinks there is little value in this topic. I'm with Robert
Myers here, the first machines weren't parallel, the experts of those first
machines had their fun with them, and for parallel programming, we need to
move on. The good thing is that Knuth can actually finish TAOCP, because he
doesn't need to go into these new fields.

> I wonder how big a 2^n pennies check he would write you?

TeX works to the specification, speed or desired ILP is not specified. Some
other things are not specified, as well, and I'm in fact brainstorming
about a typesetting program that "fixes TeX", at least for some more
complex tasks. I don't think speed will be specified for this typesetting
program, either, but there will be hooks for the actual line breaking
algorithm, so this program will make it possible to try fast approaches. Fast, simple
line breaking as one option *is* part of the spec, because people who write
large books want instant results for proofreading, too.
Fortunately, "large" in data for books is not really large anymore, the
text data of a large book fits into the cache of a current PC processor.

Look again in 10 years, then the project is probably mature enough to be a
real replacement for TeX ;-).

> The exception handling might be tough.

Not really. You find a problem somewhere, you want to know where in the
f*cking source is this problem - as long as you keep that information
around (somewhat inaccurate, e.g. line number for the start of the
offending paragraph is enough for all problems past the macro expansion),
your users will be happy. TeX is reconstructing a lot of its error message
information in this stage from the output, too, and it's often awful
(because TeX needs to print it on the terminal, and it's very difficult to
identify a complicated box from this output). If you keep track of the
source location of every box, you probably can produce a better error
message.

Jan Vorbrüggen
May 8, 2008, 2:39:51 AM
> You'd need 8 balanced stages of pipelining or possibly more stages of
> unbalanced pipelining to be a virtual factor of 8, and no weird cases like the
> sample spirialing sentences.

No, you need about three or four roughly balanced stages, with at least
one of those stages - the paragraph building and line breaking - being
parallelized at the loop level, which balances that stage.

Weird cases are certainly possible because TeX is in fact a programming
language. However, people care about the average case much more than
about the worst case. IMNSHO, the typical "computer science" worst case
considerations are largely a waste of time and effort, anyway. It's good
to know about them, sure, but that's about it.

Jan

Jan Vorbrüggen
May 8, 2008, 2:43:29 AM
> I would certainly have to completely rewrite TeX (rebuild from scratch).

I would have thought one could use the lower-level routines doing the
groundwork, putting them into a new harness using data structures
that allow them to work in parallel instantiations.

> Ok, then let's assume we want to use VLIW or other vector techniques to get
> more fine-grain parallelism out of TeX. This requires different data
> structures.

In my mind's eye, TeX is really building and then modifying a large tree
structure of boxes and glue, first a top-down build step and then a
bottom-up assembling step, setting glue in the process. That would be an
intermediate level of parallelism between the paragraphs and the
kerning, and up to the page breaking and float processing stage (which
is the one that clearly needs improvement, I would say).

Jan

Bernd Paysan
May 8, 2008, 7:54:53 AM
Jan Vorbrüggen wrote:

>> I would certainly have to completely rewrite TeX (rebuild from scratch).
>
> I would have thought one could use the lower-level routines doing the
> groundwork, and putting them into a new harness using data structures
> that allow them to work in parallel instantiations.

This would imply that TeX is correctly factored, i.e. that none of the
higher-level parts make use of carnal knowledge of how the lower-level code
and data structures are built. Unfortunately, that's not the case.

> In my mind's eye, TeX is really building and then modifying a large tree
> structure of boxes and glue, first a top-down build step and then a
> bottom-up assembling step, setting glue in the process. That would be an
> intermediate level of parallelism between the paragraphs and the
> kerning, and up to the page breaking and float processing stage (which
> is the one that clearly needs improvement, I would say).

For most of the text, the tree is just two-level: a vertical list of lines,
and a horizontal list of words. Sometimes, there are boxes instead of words
(like equations), sometimes there are boxes instead of lines (like tables).
TeX writes out a complete page and reuses the space (instead of e.g. having
a third level, the list of pages).

The fact that TeX doesn't try to "globally" optimize page layout manifests
itself in problems with floats, and also makes some sort of footnote
typesetting quite non-trivial, e.g. two-column footnotes in a one-column
text. Floating images in multicolumn text is also quite difficult.
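Bernd's two-level picture (a vertical list of lines, each a horizontal list of words, with boxes substituting at either level) can be sketched as a toy structure; the names are hypothetical, and real TeX nodes carry far more state:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Glue:                      # stretchable space between items
    natural: float
    stretch: float = 0.0
    shrink: float = 0.0

@dataclass
class HBox:                      # a line: horizontal list of words/boxes/glue
    items: List[Union[str, "HBox", Glue]] = field(default_factory=list)

@dataclass
class VBox:                      # a page: vertical list of lines (or boxes)
    lines: List[HBox] = field(default_factory=list)

# "Sometimes there are boxes instead of words": an equation sits where
# a word would, as a nested HBox inside a line.
page = VBox([HBox(["Let", Glue(3, 1, 1), HBox(["x", "=", "y"])])])
```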

Eugene Miya
May 8, 2008, 1:36:48 PM
In article <fdd9df9f-45a6-4042...@56g2000hsm.googlegroups.com>,

Robert Myers <rbmye...@gmail.com> wrote:
>On May 6, 6:36 pm, eug...@cse.ucsc.edu (Eugene Miya) wrote:
>> It's Knuth's challenge.
>> If either of you even thinks you can get significant parallelism from TeX,
>> show me, and I will forward to DEK. I can assure you that he would be
>> amused. I watched about 100 people at Stanford back away from that
>> challenge.
>>
>
>Elephant graveyard of naysayers. The beginnings of "computer science"
>were fun and productive. Since then?

Well, I try to have fun. And most of the people around me appear to be
having fun or trying to find positions where they can have fun.

>Because the first machines weren't parallel, we don't know anything
>about parallel programming, and as long as we're citing the experts on
>those first machines, we never will.

There's semantics in that. The first machines weren't bit serial either.
Nick, for instance, wants to cite history. And he also wants due credit.

>Time for some productive amnesia. Blank pieces of paper. No name-
>dropping.

OK: get to it. Show us some code. As this is a thread about VLIW/ILP,
I'll see what I can do to get the Multiflow running. Unless you can
find other hardware.

--

Eugene Miya
May 8, 2008, 1:43:31 PM
In article <fvt3gd$5nr$1...@gemini.csx.cam.ac.uk>,

Specification?
I think Bernd also noted this. I think that Bernd just needed a
different choice of words than "repair."

I think you had better stick with your 2x X challenge and award some
power of pence.

--

Nick Maclaren
May 8, 2008, 12:47:08 PM

In article <48232c20$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|> In article <fdd9df9f-45a6-4042...@56g2000hsm.googlegroups.com>,
|> Robert Myers <rbmye...@gmail.com> wrote:
|> >On May 6, 6:36 pm, eug...@cse.ucsc.edu (Eugene Miya) wrote:
|>
|> >Because the first machines weren't parallel, we don't know anything
|> >about parallel programming, and as long as we're citing the experts on
|> >those first machines, we never will.
|>
|> There's semantics in that. The first machines weren't bit serial either.
|> Nick, for instance, wants to cite history. And he also wants due credit.

Please do not claim that you are speaking for me; I doubt that you
have a clue what I want, or even have much understanding of what I
say, judging from your postings.

In case anyone is misunderstands your posting, I had no part in any
of the hardware or software design of either first- or second-
generation systems. At 60, I am too young for that. Most of the
references I cite are ones I have read, and respect, and I should
be given no credit for more than reading and understanding them.


Regards,
Nick Maclaren.

Eugene Miya
May 8, 2008, 1:52:59 PM
>> You'd need 8 balanced stages of pipelining or possibly more stages of
>> unbalanced pipelining to be a virtual factor of 8, and no weird cases like
>> the sample spiraling sentences.

In article <fvueeh$hhd$1...@s1.news.oleane.net>,
Jan Vorbrüggen <Jan.Vor...@not-thomson.net> wrote:
>No, you need about three or four roughly balanced stages, with at least
>one of those stages - the paragraph building and line breaking - being
>parallelized at the loop level, which balances that stage.

Don't forget that this is a VLIW thread. See Subject line. Even Nick
and I agree that a performance factor of 8-10 needs to be seen. Unless
your 3-4 stages have superlinear performance in and of themselves. ...

>Weird cases are certainly possible because TeX is in fact a programming
>language. However, people care about the average case much more than
>about the worst case. IMNSHO, the typical "computer science" worst case
>considerations are largely a waste of time and effort, anyway. It's good
>to know about them, sure, but that's about it.

People care about the average case for consistency reasons. They are
similarly skeptical of best-case performance claims.

Remember VLIW. If it's not going to show you 8x, don't use it.

--

Nick Maclaren
May 8, 2008, 12:50:42 PM

In article <48232db3$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|> >|> In article <vkf8f5-...@annette.mikron.de>,
|> >|> Bernd Paysan <bernd....@gmx.de> wrote:
|> >|> > The implementation is serialized beyond repair.
|> >|>
|> >|> Are you saying the TeX is broken?
|> >|>
|> >|> I wonder how big a 2^n pennies check he would write you?
|> >
|> >Sigh. It was no part of the specification of the code that it
|> >was parallelisable.
|>
|> Specification?

Sigh. The specification of Knuth's challenge.

|> I think Bernd also noted this. I think that Bernd just needed a
|> different choice of words than "repair."

No, he didn't. A design can be broken by changing circumstances.
The word "repair" is entirely suitable.


Regards,
Nick Maclaren.

Eugene Miya
May 8, 2008, 2:06:30 PM
In article <e0i9f5-...@vimes.paysan.nom>,

Bernd Paysan <bernd....@gmx.de> wrote:
>>> The implementation is serialized beyond repair.

Eugene Miya wrote:
>> Are you saying the TeX is broken?
>
>From the last interview with Don Knuth, I don't think he would agree. He
>said he wouldn't write something about parallel algorithms in TAOCP,
>because he thinks there is little value in this topic. I'm with Robert
>Myers here, the first machines weren't parallel, the experts of those first
>machines had their fun with them, and for parallel programming, we need to
>move on. The good thing is that Knuth can actually finish TAOCP, because he
>doesn't need to go into these new fields.

Finishing/halting TAOCP is one matter for a long bet.
Parallel algorithms aren't a planned TAOCP topic, directly, but not
because "there is little value".

>> I wonder how big a 2^n pennies check he would write you?
>
>TeX works to the specification, speed or desired ILP is not specified. Some
>other things are not specified, as well, and I'm in fact brainstorming
>about a typesetting program that "fixes TeX", at least for some more
>complex tasks. I don't think speed will be specified for this typesetting
>program, either, but there will be hooks for the actual line breaking
>algorithm; so this program will allow to try fast approaches. Fast simple
>line breaking as one option *is* part of the spec, because people who write
>large books want instant results for proofreading, too.
>Fortunately, "large" in data for books is not really large anymore, the
>text data of a large book fits into the cache of a current PC processor.
>
>Look again in 10 years, then the project is probably mature enough to be a
>real replacement for TeX ;-).

Let's see: 3 years gets you 2 Moore's generations and a 4x perceived
feature improvement, so 10 years (round to 9) gets you a 12x Moore's-law
improvement, which is in the ballpark. 2018: after I retire.


>> The exception handling might be tough.
>
>Not really. You find a problem somewhere, you want to know where in the
>f*cking source is this problem - as long as you keep that information
>around (somewhat inaccurate, e.g. line number for the start of the
>offending paragraph is enough for all problems past the macro expansion),
>your users will be happy. TeX is reconstructing a lot of its error message
>information in this stage from the output, too, and it's often awful
>(because TeX needs to print it on the terminal, and it's very difficult to
>identify a complicated box from this output). If you keep track of the
>source location of every box, you probably can produce a better error
>message.

I'm not talking solely about error messages. I include the dangling
bits, the notes and footnotes. But I'd also expect you to do the
circular sentences. As this is an ILP thread, and I am a bit skeptical
of the gains of ILP (call them limitations), if you are going to bow
out above, there's no point.

But I can wait 10 years. At least I would like to hope I'm around 10
years from now. I'm not a fan of TeX either, but it is a common
program, and I have access to a real VLIW machine. We just have to get
it running again.

--

Eugene Miya
May 8, 2008, 2:11:32 PM
In article <fvvaqc$ga8$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>In article <48232c20$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
>|> There's semantics in that. The first machines weren't bit serial either.
>|> Nick, for instance, wants to cite history. And he also wants due credit.
>
>Please do not claim that you are speaking for me; I doubt that you
>have a clue what I want, or even have much understanding of what I
>say, judging from your postings.
>
>In case anyone is misunderstands your posting, I had no part in any
I'll leave your own English problems aside.

>of the hardware or software design of either first- or second-
>generation systems. At 60, I am too young for that. Most of the
>references I cite are ones I have read, and respect, and I should
>be given no credit for more than reading and understanding them.

I'm not saying that you personally want due credit. You ARE too young.

Anyone can find what you say, right or wrong, just by going back in the
news group archives.

Semantics are another topic.

--

Robert Myers
May 8, 2008, 4:57:35 PM
On May 8, 1:36 pm, eug...@cse.ucsc.edu (Eugene Miya) wrote:
> In article <fdd9df9f-45a6-4042-abcb-a3d4400de...@56g2000hsm.googlegroups.com>,

> Robert Myers <rbmyers...@gmail.com> wrote:
>
>
> >Because the first machines weren't parallel, we don't know anything
> >about parallel programming, and as long as we're citing the experts on
> >those first machines, we never will.
>
> There's semantics in that. The first machines weren't bit serial either.

Even coding at the assembly language level, anyone can exploit almost
unimaginable levels of parallelism because it's already in the
hardware, and thus naturally exposed in the specification of the
program. If the exploitable level of parallelism is naturally
included in the way that programmers think about and specify programs,
our agenda is complete, and almost anyone can write what Knuth has
dodged.

> Nick, for instance, wants to cite history. And he also wants due credit.

Nick has a tendency to want to turn his extensive experience into
mathematical truths. For the most part, we don't have the
mathematical truths. All we have is experience, heavily colored by
market forces and investment decisions.

>
> >Time for some productive amnesia. Blank pieces of paper. No name-
> >dropping.
>
> OK: get to it. Show us some code. As this is a thread about VLIW/ILP,
> I'll see what I can do to get the Multiflow running. Unless you can
> find other hardware.
>

I'd much rather hear you talk about Multiflow than Donald Knuth in the
current context.

As to what I'd show you if I could, it would be a way to specify
programs so that the exploitable parallelism isn't obscured by the
programming language.

Robert.

Nick Maclaren
May 8, 2008, 5:54:50 PM

In article <5ed52f05-d6c5-48d0...@k13g2000hse.googlegroups.com>,

Robert Myers <rbmye...@gmail.com> writes:
|>
|> For the most part, we don't have the
|> mathematical truths. All we have is experience, heavily colored by
|> market forces and investment decisions.

We have rather more than most people realise. Unfortunately, most
of them are of the form "Approach XXX to objective YYY under
constraints ZZZ can't be made to work - it is imperative to change
the approach, change the objective or remove some of the constraints."

However, that is regarded as a negative attitude, even by people who
ought to know better.


Regards,
Nick Maclaren.

Robert Myers
May 8, 2008, 7:42:06 PM
On May 8, 5:54 pm, n...@cus.cam.ac.uk (Nick Maclaren) wrote:

> In article <5ed52f05-d6c5-48d0-9ab6-926998371...@k13g2000hse.googlegroups.com>,Robert Myers <rbmyers...@gmail.com> writes:
>
> |>
> |> For the most part, we don't have the
> |> mathematical truths. All we have is experience, heavily colored by
> |> market forces and investment decisions.
>
> We have rather more than most people realise. Unfortunately, most
> of them are of the form "Approach XXX to objective YYY under
> constraints ZZZ can't be made to work - it is imperative to change
> the approach, change the objective or remove some of the constraints."
>
> However, that is regarded as a negative attitude, even by people who
> ought to know better.
>
I'm looking for the parallel programming equivalent of the Shannon-
Hartley theorem, and you speak as if with the certainty that would
come from possessing such insight. When a credible equivalent stating
what is theoretically possible is presented, I'll shut up about
unfounded negativism because the negativism, whatever it is, will be
well-founded.

Amdahl's law would be a prototype. Maybe someone can produce a
refinement. In any case, that one can do no better than Amdahl's law
is not a matter of speculation. Beyond that, what we have now is a
collection of hopelessly specific instances without even a taxonomy,
never mind some kind of predictive insight.
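For reference, the usual statement of Amdahl's law: if a fraction p of the work parallelizes perfectly over n processors, speedup is bounded by 1 / ((1 - p) + p/n), which tends to 1/(1 - p) as n grows. A trivial sketch:

```python
def amdahl_speedup(p, n):
    """Amdahl's bound: best-case speedup when a fraction p of the
    runtime parallelizes perfectly across n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallel, no processor count beats 20x,
# since the limit as n grows is 1 / (1 - 0.95) = 20.
```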

Robert.

Nick Maclaren
May 8, 2008, 8:56:31 PM

In article <fccdce8a-e4b6-42f6...@f63g2000hsf.googlegroups.com>,

Robert Myers <rbmye...@gmail.com> writes:
|>
|> > |> For the most part, we don't have the
|> > |> mathematical truths. All we have is experience, heavily colored by
|> > |> market forces and investment decisions.
|> >
|> > We have rather more than most people realise. Unfortunately, most
|> > of them are of the form "Approach XXX to objective YYY under
|> > constraints ZZZ can't be made to work - it is imperative to change
|> > the approach, change the objective or remove some of the constraints."
|> >
|> > However, that is regarded as a negative attitude, even by people who
|> > ought to know better.
|> >
|> I'm looking for the parallel programming equivalent of the Shannon-
|> Hartley theorem, and you speak as if with the certainty that would
|> come from possessing such insight.

Eh? I hope that I have misunderstood you, or else I am afraid that
you don't understand complex mathematical systems at all.

Even in very simple cases, it is rare for there to be a theorem that
gives an explicit formula for the limits of a particular class of
technique. Indeed, Goedel's theorem states clearly that there can
be no such formula, in general. I have no idea whether that is the
case for parallelism in IT.

What I said in the above, and have repeatedly said before, is that
there are known limits to specific techniques, as measured with
specific objectives, and with specific constraints. I have no idea
where you got the idea from that I was using some universal formula
to make such statements; I most certainly was not.

|> When a credible equivalent stating
|> what is theoretically possible is presented, I'll shut up about
|> unfounded negativism because the negativism, whatever it is, will be
|> well-founded.

Then you will continually be believing that the impossible is
possible. You are welcome to do that, but I suggest not referring
to other opinions as "unfounded negativism", because it is your
optimism that is unfounded.

|> Amdahl's law would be a prototype. Maybe someone can produce a
|> refinement. In any case, that one can do no better than Amdahl's law
|> is not a matter of speculation. Beyond that, what we have now is a
|> collection of hopelessly specific instances without even a taxonomy,
|> never mind some kind of predictive insight.

Amdahl's law is an empirical rule, based on a simplistic model of
how programs are constructed. It is in no way a prototype of anything
comparable to the Shannon-Hartley theorem.

You are, however, correct in your last statement. That does not
mean that the kind of systematisation you are demanding exists, or is
findable if it does (and those are different questions). I know
enough to know that I don't know, and why.


Regards,
Nick Maclaren.

Robert Myers
May 8, 2008, 10:23:31 PM
On May 8, 8:56 pm, n...@cus.cam.ac.uk (Nick Maclaren) wrote:

> In article <fccdce8a-e4b6-42f6-86ad-aeb70ba5b...@f63g2000hsf.googlegroups.com>,Robert Myers <rbmyers...@gmail.com> writes:
>
> |>
> |> >
> |> I'm looking for the parallel programming equivalent of the Shannon-
> |> Hartley theorem, and you speak as if with the certainty that would
> |> come from possessing such insight.
>
> Eh? I hope that I have misunderstood you, or else I am afraid that
> you don't understand complex mathematical systems at all.
>
> Even in very simple cases, it is rare for there to be a theorem that
> gives an explicit formula for the limits of a particular class of
> technique. Indeed, Goedel's theorem states clearly that there can
> be no such formula, in general. I have no idea whether that is the
> case for parallelism in IT.

I'm probably losing millions of gray cells every day, but I think I
can cope with this conversation. The Shannon-Hartley theorem seemed
like an appropriate prototype because it characterizes throughput in
terms of bandwidth, which is a property of the channel, and the signal-
to-noise ratio, a property of what you're trying to put through the
channel. A rule for estimating maximum throughput for hardware would
similarly have to involve properties of the hardware (analogous to
bandwidth) and what you're trying to do with the hardware (analogous
to the signal-to-noise ratio). I'm not sure I can imagine what Gödel
has to do with it, but perhaps you can explain to me briefly, keeping
in mind that I know very well what Gödel's theorem says and how a
rudimentary proof works.
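The theorem in question, for concreteness: C = B log2(1 + S/N), capacity computed from one channel property (bandwidth B) and one signal property (the signal-to-noise ratio). A one-line sketch:

```python
from math import log2

def shannon_capacity(bandwidth_hz, snr):
    """Shannon-Hartley: maximum error-free bit rate (bits/s) of a
    bandwidth-limited channel with linear signal-to-noise ratio snr."""
    return bandwidth_hz * log2(1.0 + snr)

# e.g. a 3 kHz telephone channel at 30 dB (snr = 1000) caps near 30 kbit/s
```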

>
> What I said in the above, and have repeatedly said before, is that
> there are known limits to specific techniques, as measured with
> specific objectives, and with specific constraints. I have no idea
> where you got the idea from that I was using some universal formula
> to make such statements; I most certainly was not.
>

You do tend to state things with a great deal of certainty. I try not
to say things that way unless I have a theorem in my back pocket.

> |> When a credible equivalent stating
> |> what is theoretically possible is presented, I'll shut up about
> |> unfounded negativism because the negativism, whatever it is, will be
> |> well-founded.
>
> Then you will continually be believing that the impossible is
> possible. You are welcome to do that, but I suggest not referring
> to other opinions as "unfounded negativism", because it is your
> optimism that is unfounded.
>

I have no desire to promote unfounded optimism. I desire only clarity
and the elimination of overstated claims.

> |> Amdahl's law would be a prototype. Maybe someone can produce a
> |> refinement. In any case, that one can do no better than Amdahl's law
> |> is not a matter of speculation. Beyond that, what we have now is a
> |> collection of hopelessly specific instances without even a taxonomy,
> |> never mind some kind of predictive insight.
>
> Amdahl's law is an empirical rule, based on a simplistic model of
> how programs are constructed. It is in no way a prototype of anything
> comparable to the Shannon-Hartley theorem.
>

Huh? Properly stated, Amdahl's law is a theorem. There are
generalizations in the literature. Whether they are useful
generalizations or not is another question entirely.

> You are, however, correct in your last statement. That does not
> mean that the kind of systematisation you are demanding exists, or is
> findable if it does (and those are different questions). I know
> enough to know that I don't know, and why.
>

There is a formula that circuit designers use called "Rent's Rule."
So far as I know, it has no theoretical foundations of any kind, and
the estimates for the single parameter involved are all over the
place. Merely the idea that you might try to organize information
that way provides a useful framework, no matter how weak (or non-
existent) the foundation. It seems to me that, if one can cook up a
Rent's rule for circuits, one should be able to do it for software.
The discussion isn't even at *that* level.
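For reference, Rent's rule as circuit designers state it: T = t * g^p, where T is the number of external terminals of a block of g gates, t is the average terminals per gate, and p is the fitted Rent exponent (commonly quoted anywhere from about 0.5 to 0.8, hence "all over the place"). A sketch, with default values chosen purely for illustration:

```python
def rent_terminals(gates, t=4.0, p=0.6):
    """Empirical Rent's rule: expected external terminal count for a
    block of `gates` gates. Both t and p are fitted, not derived."""
    return t * gates ** p
```

A software analogue would presumably relate a module's interface width to its size by the same kind of power law, which is roughly the sort of organizing rule the discussion above is asking for.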

Robert.

Nick Maclaren
May 9, 2008, 4:12:42 AM

In article <f44f735c-e1ae-4b03...@i76g2000hsf.googlegroups.com>,

Robert Myers <rbmye...@gmail.com> writes:
|>
|> I'm probably losing millions of gray cells every day, but I think I
|> can cope with this conversation. The Shannon-Hartley theorem seemed
|> like an appropriate prototype because it characterizes throughput in
|> terms of bandwidth, which is a property of the channel, and the signal-
|> to-noise ratio, a property of what you're trying to put through the
|> channel. A rule for estimating maximum throughput for hardware would
|> similarly have to involve properties of the hardware (analogous to
|> bandwidth) and what you're trying to do with the hardware (analogous
|> to the signal-to-noise ratio). I'm not sure I can imagine what Goedel
|> has to do with it, but perhaps you can explain to me briefly, keeping
|> in mind that I know very well what Goedel's theorem says and how a
|> rudimentary proof works.

The point that you are missing is that the Shannon-Hartley theorem
deals with a mathematically very simple model, which happens to map
fairly well to real problems - and IT parallelism is a LOT more
complicated, both mathematically and in practice. Indeed, I don't
even know of a clean mathematical model that is adequate for all
extant forms of parallelism, and you can't start looking for a
theorem until you have a model.

Goedel is relevant because you cannot provide any such limiting
formula if the technique you are modelling includes basic arithmetic
as a sub-model. Parallelism probably doesn't (though I am not quite
sure about that), but things like static error detection of arbitrary
programs do.

|> > What I said in the above, and have repeatedly said before, is that
|> > there are known limits to specific techniques, as measured with
|> > specific objectives, and with specific constraints. I have no idea
|> > where you got the idea from that I was using some universal formula
|> > to make such statements; I most certainly was not.
|> >
|> You do tend to state things with a great deal of certainty. I try not
|> to say things that way unless I have a theorem in my back pocket.

I do. Lots of them. All of the theorems of basic mathematics that I
can remember. Those are all that I am using.

|> Huh? Properly stated, Amdahl's law is a theorem. There are
|> generalizations in the literature. Whether they are useful
|> generalizations or not is another question entirely.

Not to a mathematician, it isn't. It doesn't describe the precise
model for which it states its conclusions, and there are plenty of
types of program for which it makes no sense (note that is different
from being wrong). Extreme dataflow ones, such as some Prolog codes,
for example.

I am not denigrating it - see below - merely correcting your
classification of it.

|> There is a formula that circuit designers use called "Rent's Rule."
|> So far as I know, it has no theoretical foundations of any kind, and
|> the estimates for the single parameter involved are all over the
|> place. Merely the idea that you might try to organize information
|> that way provides a useful framework, no matter how weak (or non-
|> existent) the foundation. It seems to me that, if one can cook up a
|> Rent's rule for circuits, one should be able to do it for software.
|> The discussion isn't even at *that* level.

Now, THERE I agree with you!

I have no objections to using empirical rules to organise information;
after all, as you may have observed, I flip between a mathematician's
viewpoint and an engineer's without notice. But I do dislike the
mistakes of thinking that engineers' empirical rules have the same
rigour as mathematical theorems, and that mathematical theorems have
the same flexibility as engineers' empirical rules.


Regards,
Nick Maclaren.
