Why so little parallelism?

BDH

unread,

Oct 25, 2006, 7:36:37 AM10/25/06

to

Arbitrary vector permutation by a permutation vector, vector reversal,
insertion of a vector into a vector, and so forth in log(n) time and
O(n) switches if you allow pipelining is pretty obvious. Building on
that, log(n) time sorting with O(n) elements with pipelining is pretty
obvious. I just spent hours waiting on a suffix array because my
computer is badly designed. Why?

John Dallman

unread,

Oct 25, 2006, 2:11:00 PM10/25/06

to

In article <1161776197.0...@f16g2000cwb.googlegroups.com>,
bha...@gmail.com (BDH) wrote:

Err, are you suggesting that processors should have instructions for
doing these operations? Because I'm afraid that viewpoint is now sadly
out of date. That kind of approach was tried, but it was found that the
instructions were often not usable, for a wide assortment of reasons.
One always feels that one's own problem is the simple and obvious case,
but this is not normally true. And trying to design processors with
sufficiently varied instruction sets to handle everyone's needs with
purpose-designed instructions doesn't seem to be very practical.

Looking up the history of "RISC" should explain it tolerably well. The
Wikipedia article at http://en.wikipedia.org/wiki/RISC seems to cover
the basics.

As of current technology, such algorithms as you describe will fit
easily into the on-chip instruction caches of processors. The time taken
for them to run is entirely dominated by the time taken to get stuff out
of memory and put it back there. Caches can help with that, but are not
perfect.

If you have a very large amount of processing to do, it is worth getting
a chip programmed at the hardware level to do it faster than software.
That's an FPGA: http://en.wikipedia.org/wiki/FPGA However, "very large"
usually translates to "will take months to run on conventional hardware".

---
John Dallman j...@cix.co.uk
"Any sufficiently advanced technology is indistinguishable from a
well-rigged demo"

Thomas Womack

unread,

Oct 25, 2006, 2:20:46 PM10/25/06

to

In article <1161776197.0...@f16g2000cwb.googlegroups.com>,

a) What do you mean by 'element' -- sorting usually requires
comparison of things more complicated than integers, and if you're
having to indirect to do your comparison you're limited by memory
speeds.

b) O(n) time to get the data into the elements plus O(n) time to sort
them plus O(n) time to get the data out again; very difficult to
get buses wide enough to get the data in in time O(1), as you'd need
if you want to pipeline the sorting-array.

c) O(n log n) elements only practical for either uninterestingly small
elements or uninterestingly small n.

On the other hand, there may be useful factor-several speedups to be
had with instructions like

PORDERUSW xmm0, xmm1

The 16-bit packets of xmm1 contain the numbers 0 through 7, in such a
way that xmm0[xmm1[i]], i=0 .. 7, are in non-decreasing order, and if
xmm0[j] = xmm0[k] with j<k, the packet of xmm0 with value j will
precede the packet with value k.

PSHUFB, PMAX and PMIN let you implement a sorting network reasonably
efficiently ... I'm sure I've read a surprisingly non-obvious article
about fast median filters on SIMD architectures, but the obvious
Google terms tend to pick up a patent on a way to do it with
bit-slices.

Tom

BDH

unread,

Oct 25, 2006, 5:01:59 PM10/25/06

to

> As of current technology, such algorithms as you describe will fit
> easily into the on-chip instruction caches of processors. The time taken
> for them to run is entirely dominated by the time taken to get stuff out
> of memory and put it back there. Caches can help with that, but are not
> perfect.

That's most of the problem. Moving data around is so slow - you need to
be able to shuffle all of a vector at the same time. In fact, when you
can do that, why bother with a standard bus system for most things?

BDH

unread,

Oct 25, 2006, 5:07:39 PM10/25/06

to

> a) What do you mean by 'element'

Logic gate or single pole double throw switch, of course.

> b) O(n) time to get the data into the elements plus O(n) time to sort
> them plus O(n) time to get the data out again; very difficult to
> get buses wide enough to get the data in in time O(1), as you'd need
> if you want to pipeline the sorting-array.

Like I said, element count includes the "bus" architecture, and no it's
not constant time but logarithmic time...

BDH

unread,

Oct 25, 2006, 5:10:25 PM10/25/06

to

> And trying to design processors with
> sufficiently varied instruction sets to handle everyone's needs with
> purpose-designed instructions doesn't seem to be very practical.

There are only three important expensive operations: sorting, matrix
multiplication, and fourier transforms. Current architectures fail
miserably at all three.

BDH

unread,

Oct 25, 2006, 5:13:08 PM10/25/06

to

> Current architectures fail
> miserably at all three.

Well, I suppose GPUs do a half decent job with matrices.

Greg Lindahl

unread,

Oct 25, 2006, 9:00:17 PM10/25/06

to

In article <1161810625.5...@f16g2000cwb.googlegroups.com>,
BDH <bha...@gmail.com> wrote:

>There are only three important expensive operations: sorting, matrix
>multiplication, and fourier transforms. Current architectures fail
>miserably at all three.

Wow, now that's a sweeping over-generalization.

Would you care to give some examples? For example, floating point
matrix-matrix multiply gets close to peak for most array sizes on most
current architectures. Perhaps you're talking about bit-matrix
multiply? In which case saying so would generate much better
discussion.

-- greg

BDH

unread,

Oct 26, 2006, 3:49:50 AM10/26/06

to

> >There are only three important expensive operations: sorting, matrix
> >multiplication, and fourier transforms. Current architectures fail
> >miserably at all three.

> Wow, now that's a sweeping over-generalization.

You're right. Let me correct it. Prefix array calculation, vector
permutation, matrix arithmetic, and fourier and number theoretic
transforms are the only important expensive computations, and current
architectures generally do very badly with all of these.

> Would you care to give some examples? For example, floating point
> matrix-matrix multiply gets close to peak for most array sizes on most
> current architectures.

Leaving aside algorithms with order less than n^3 (matrix
multiplication is probably n^2 or n^2 log n) - does matrix
multiplication of arrays of size n take more than O(n) time with O(n^2)
computational elements on your computer?

Nick Maclaren

unread,

Oct 26, 2006, 4:53:03 AM10/26/06

to

In article <memo.2006102...@jgd.compulink.co.uk>,

j...@cix.co.uk (John Dallman) writes:
|> In article <1161776197.0...@f16g2000cwb.googlegroups.com>,
|> bha...@gmail.com (BDH) wrote:
|>
|> > Arbitrary vector permutation by a permutation vector, vector reversal,
|> > insertion of a vector into a vector, and so forth in log(n) time and
|> > O(n) switches if you allow pipelining is pretty obvious. Building on
|> > that, log(n) time sorting with O(n) elements with pipelining is pretty
|> > obvious. I just spent hours waiting on a suffix array because my
|> > computer is badly designed. Why?
|>
|> Err, are you suggesting that processors should have instructions for
|> doing these operations? Because I'm afraid that viewpoint is now sadly
|> out of date. That kind of approach was tried, but it was found that the
|> instructions were often not usable, for a wide assortment of reasons.
|> One always feels that one's own problem is the simple and obvious case,
|> but this is not normally true. And trying to design processors with
|> sufficiently varied instruction sets to handle everyone's needs with
|> purpose-designed instructions doesn't seem to be very practical.

That is, indeed, true, and the reason the misapprehension is widespread
is due to the sorry state of "computer science" "complexity theory".
Real mathematical complexity theory is fine, but does not pretend to
match computer hardware; it is the part that claims to be applied that
I am damning.

Back in the late 1960s and all the 1970s, a few of us were saying that
we should NOT be measuring KFlops but KMaccs (memory accesses); even
now, that mindset is not universal. And 90% of the double-quoted
nonsense uses completely unrealistic and often inconsistent models.
If you do an analysis using a simple, but more realistic model[*], you
get VERY different complexities for most such operations. But such
approaches are VERY hard to get published, as the establishment is not
keen to have people point out that it has been bullshitting for many
decades ....

So most of the books on algorithms are misleading to the point of being
actually incorrect, which is almost certainly where the original poster
got his misapprehensions from.

|> Looking up the history of "RISC" should explain it tolerably well. The
|> Wikipedia article at http://en.wikipedia.org/wiki/RISC seems to cover
|> the basics.

Er, no, but I agree that it is entirely relevant. However, if you don't
understand the issues, it will merely get you more confused, because you
may get taken in by the RISC dogmas rather than the RISC engineering
philosophy.

[*] E.g. by counting the number of non-sequential, non-repeated, scalar
memory accesses - i.e. those that aren't amenable to optimisation by
caching or prefetching. That gives very good results for operations
like sorting.

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Oct 26, 2006, 5:02:14 AM10/26/06

to

In article <1161848990....@m73g2000cwd.googlegroups.com>,

"BDH" <bha...@gmail.com> writes:
|> > >There are only three important expensive operations: sorting, matrix
|> > >multiplication, and fourier transforms. Current architectures fail
|> > >miserably at all three.
|>
|> > Wow, now that's a sweeping over-generalization.
|>
|> You're right. Let me correct it. Prefix array calculation, vector
|> permutation, matrix arithmetic, and fourier and number theoretic
|> transforms are the only important expensive computations, and current
|> architectures generally do very badly with all of these.

That over-generalisation should be swept into the bin, and the lid
closed on it.

Other important expensive computations include keyed data access
(including sorting) and many graph-theoretic analyses (including linear
and quadratic programming).

|> > Would you care to give some examples? For example, floating point
|> > matrix-matrix multiply gets close to peak for most array sizes on most
|> > current architectures.
|>
|> Leaving aside algorithms with order less than n^3 (matrix
|> multiplication is probably n^2 or n^2 log n) - does matrix
|> multiplication of arrays of size n take more than O(n) time with O(n^2)
|> computational elements on your computer?

1) I await reading your paper showing how matrix multiplication can be
done in O(n^2 log n) with great interest, but expect that it will use
a complete unrealistic model.

2) Multiplication of arrays of size n with O(n^2) computational elements
is generally useful only for n <= 4, and many architectures have done it
very efficiently. In fact, MOST modern CPUs will spend MUCH less time
doing the computation than in loading the data from memory.

The issue is data access - see my other posting today.

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Oct 26, 2006, 5:08:22 AM10/26/06

to

In article <ehpt1f$ask$1...@gemini.csx.cam.ac.uk>,

nm...@cus.cam.ac.uk (Nick Maclaren) writes:
|>
|> That is, indeed, true, and the reason the misapprehension is widespread
|> is due to the sorry state of "computer science" "complexity theory".
|> Real mathematical complexity theory is fine, but does not pretend to
|> match computer hardware; it is the part that claims to be applied that
|> I am damning.

And, since someone will flame me for this, it is possible to damn an
area without damning everyone in it. There are, naturally, worthy
exceptions, but not that many and they don't dominate the area.

|> So most of the books on algorithms are misleading to the point of being
|> actually incorrect, which is almost certainly where the original poster
|> got his misapprehensions from.

Note that I said MOST.

Regards,
Nick Maclaren.

BDH

unread,

Oct 26, 2006, 5:22:56 AM10/26/06

to

> Other important expensive computations include keyed data access

...a sorting problem...

> and many graph-theoretic analyses (including linear
> and quadratic programming).

...usually a combination of matrix arithmetic and sorting...

> 1) I await reading your paper showing how matrix multiplication can be
> done in O(n^2 log n) with great interest, but expect that it will use
> a complete unrealistic model.

I doubt I'll be the one to show this, but I believe I'll live to see
it. There is progress to be made in group theory.

BDH

unread,

Oct 26, 2006, 5:29:20 AM10/26/06

to

> Back in the late 1960s and all the 1970s, a few of us were saying that
> we should NOT be measuring KFlops but KMaccs (memory accesses); even
> now, that mindset is not universal. And 90% of the double-quoted
> nonsense uses completely unrealistic and often inconsistent models.
> If you do an analysis using a simple, but more realistic model[*], you
> get VERY different complexities for most such operations. But such
> approaches are VERY hard to get published, as the establishment is not
> keen to have people point out that it has been bullshitting for many
> decades ....

You need some memory while gates wait for their input or a clock cycle,
but why would you send your data to some sort of separate RAM and back?

ranjit_...@yahoo.com

unread,

Oct 26, 2006, 5:54:50 AM10/26/06

to

For n<=4, a matrix will typically fit into a cache line. Often, the
code can be rejiggered to cache align the matrix and prefetch cache
lines, although how far in advance of using a matrix its prefetch can
be done varies by the problem.

Nick Maclaren

unread,

Oct 26, 2006, 6:39:12 AM10/26/06

to

In article <1161856490.4...@b28g2000cwb.googlegroups.com>,

"ranjit_...@yahoo.com" <ranjit_...@yahoo.com> writes:
|>
|> For n<=4, a matrix will typically fit into a cache line. Often, the
|> code can be rejiggered to cache align the matrix and prefetch cache
|> lines, although how far in advance of using a matrix its prefetch can
|> be done varies by the problem.

Not quite. A 4x4 matrix of 64-bit reals is 128 bytes. Aligning the
matrix gives at most a factor of two gain. Prefetching is independent
of whether the data are scalars or small matrices.

Regards,
Nick Maclaren.

ranjit_...@yahoo.com

unread,

Oct 26, 2006, 10:48:32 AM10/26/06

to

Nick Maclaren wrote:
> In article <1161856490.4...@b28g2000cwb.googlegroups.com>,
> "ranjit_...@yahoo.com" <ranjit_...@yahoo.com> writes:
> |>
> |> For n<=4, a matrix will typically fit into a cache line. Often, the
> |> code can be rejiggered to cache align the matrix and prefetch cache
> |> lines, although how far in advance of using a matrix its prefetch can
> |> be done varies by the problem.
>
> Not quite. A 4x4 matrix of 64-bit reals is 128 bytes.

IA64, MIPS 4x00, Power4, and the SUN Fireplane Interconnect have 128
byte cache lines.

> Aligning the
> matrix gives at most a factor of two gain. Prefetching is independent
> of whether the data are scalars or small matrices.

An operation on a pair of matrices takes longer than an operation on
scalars. So, prefetching before the former operation is prefetching
well in advance of use whereas prefetching before the latter operation
is prefetching little in advance of use.

Nick Maclaren

unread,

Oct 26, 2006, 1:03:41 PM10/26/06

to

In article <1161874112.4...@m7g2000cwm.googlegroups.com>,

"ranjit_...@yahoo.com" <ranjit_...@yahoo.com> writes:
|>
|> An operation on a pair of matrices takes longer than an operation on
|> scalars. So, prefetching before the former operation is prefetching
|> well in advance of use whereas prefetching before the latter operation
|> is prefetching little in advance of use.

Er, no.

On a modern system, the time taken to perform even a 4x4 matrix
multiply is much less than the time spent fetching the data from
memory, so that the time is dominated by the delay in fetching the
first values. Whether the matrix is in one cache line or four, the
other values will be there when needed.

The prefetch has to be the same amount ahead in both cases to
eliminate the delay due to the fetch.

Regards,
Nick Maclaren.

ranjit_...@yahoo.com

unread,

Oct 26, 2006, 4:12:19 PM10/26/06

to

Nick Maclaren wrote:
> In article <1161874112.4...@m7g2000cwm.googlegroups.com>,
> "ranjit_...@yahoo.com" <ranjit_...@yahoo.com> writes:
> |>
> |> An operation on a pair of matrices takes longer than an operation on
> |> scalars. So, prefetching before the former operation is prefetching
> |> well in advance of use whereas prefetching before the latter operation
> |> is prefetching little in advance of use.
>
> Er, no.
>
> On a modern system, the time taken to perform even a 4x4 matrix
> multiply is much less than the time spent fetching the data from
> memory, so that the time is dominated by the delay in fetching the
> first values. Whether the matrix is in one cache line or four, the
> other values will be there when needed.

Measure the timing of this C++ code on x86-64 with
(1) a prefetch member routine that prefetches, and
(2) a prefetch member routine that does nothing.

class matrix {
// fill in the details
...
};

typedef matrix(4,4) mat4;
#define N (1024*1024);

....
mat4 a[N], b, c[N];
// fill in matrixes with random 64 bit floats
....
for (i=0; i<N; i++) {
if (i<N-1) a[i+1].prefetch();
c[i] = a[i]*b;
}

> The prefetch has to be the same amount ahead in both cases to
> eliminate the delay due to the fetch.

The latency might not be eliminated but would be reduced to a greater
extent in the case of matrix multiplication than in the case of scalar
multiplication. Try the above example with scalars. There would be
little difference with and without prefetch. In the case of the
matrixes, prefetching would make some difference.

> Regards,
> Nick Maclaren.

ranjit_...@yahoo.com

unread,

Oct 26, 2006, 4:24:53 PM10/26/06

to

ranjit_...@yahoo.com wrote:
> Nick Maclaren wrote:
> > In article <1161874112.4...@m7g2000cwm.googlegroups.com>,
> > "ranjit_...@yahoo.com" <ranjit_...@yahoo.com> writes:
> > |>
> > |> An operation on a pair of matrices takes longer than an operation on
> > |> scalars. So, prefetching before the former operation is prefetching
> > |> well in advance of use whereas prefetching before the latter operation
> > |> is prefetching little in advance of use.
> >
> > Er, no.
> >
> > On a modern system, the time taken to perform even a 4x4 matrix
> > multiply is much less than the time spent fetching the data from
> > memory, so that the time is dominated by the delay in fetching the
> > first values. Whether the matrix is in one cache line or four, the
> > other values will be there when needed.
>
> Measure the timing of this C++ code on x86-64 with
> (1) a prefetch member routine that prefetches, and
> (2) a prefetch member routine that does nothing.

On x86-64, if you do an unaligned implementation, your prefetch routine
would have to prefetch the first, last and middle 32 bit word of the
storage used for the matrix.

Thomas Womack

unread,

Oct 26, 2006, 5:37:21 PM10/26/06

to

In article <1161854959.9...@k70g2000cwa.googlegroups.com>,

Because all interesting problems are bigger than all practical arrays
of combinatorial logic -- if your operation is at all interesting, you
probably can't fit 1000^2 of them onto a chip even with fancy
lithography.

So you have to get the data onto and off the chip, and chips don't
seem to find it easy to have more than about 2^7 I/O pins clocked at
more than about 1GBps.

Easy to build a chip with the pipelined FADD and FMUL units explicitly
laid out so that it can do an FFT on a new set of 16 complex*16
variables every nanosecond, or the pipelined compare-and-swap units
laid out in a sorting network to sorting 32 int*8s each nanosecond;
probably not impossible to build a chip with generalish-purpose units
and wide routers so that it could quickly be configured to do either;
absolutely bloody impossible to get the data to it at the 2Tbit/s
needed to keep up with the pipeline.

If your problem's small enough that the whole thing fits into memories
small enough that you can fit one beside each logic element, you can
avoid that issue, but who save those whose governments oblige them to
solve medium-sized arrays of Boolean equations by successive
enumeration of possibilities has that class of problem?

Tom

BDH

unread,

Oct 26, 2006, 9:59:24 PM10/26/06

to

> >You need some memory while gates wait for their input or a clock cycle,
> >but why would you send your data to some sort of separate RAM and back?
>
> Because all interesting problems are bigger than all practical arrays
> of combinatorial logic -- if your operation is at all interesting, you
> probably can't fit 1000^2 of them onto a chip even with fancy
> lithography.

Very good! A logarithmic time sort for a million ints would fill up a
modern CPU, and would only run a million times faster. Thus, it's only
worthwhile for specialized applications now, but will eventually become
common.

> Easy to build a chip with the pipelined FADD and FMUL units explicitly
> laid out so that it can do an FFT on a new set of 16 complex*16
> variables every nanosecond, or the pipelined compare-and-swap units
> laid out in a sorting network to sorting 32 int*8s each nanosecond;
> probably not impossible to build a chip with generalish-purpose units
> and wide routers so that it could quickly be configured to do either;
> absolutely bloody impossible to get the data to it at the 2Tbit/s
> needed to keep up with the pipeline.

Fourier transforms? Probably end up being done optically.

Thomas Womack

unread,

Oct 27, 2006, 12:18:17 PM10/27/06

to

In article <1161914364.0...@i42g2000cwa.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>> >You need some memory while gates wait for their input or a clock cycle,
>> >but why would you send your data to some sort of separate RAM and back?
>>
>> Because all interesting problems are bigger than all practical arrays
>> of combinatorial logic -- if your operation is at all interesting, you
>> probably can't fit 1000^2 of them onto a chip even with fancy
>> lithography.
>
>Very good! A logarithmic time sort for a million ints would fill up a
>modern CPU, and would only run a million times faster. Thus, it's only
>worthwhile for specialized applications now, but will eventually become
>common.

The whole point is that it wouldn't come close to running a million
times faster, because you can't possibly get the ints into the chip
quickly enough, and almost no application is specialised enough that
you can generate the ints on chip at the rate that the hardware
sorter-network can sort them; I have trouble thinking of even
ridiculous applications where you want the whole sorted list rather
than a few order-statistics.

Suppose you have a cycle time of 0.1ns, and feed the chip with a
thousand 10Gbps input lines. It'll still take 3200 nanoseconds to get
the ints into the chip and 3200 nanoseconds to get the sorted outputs
out again, dwarfing the 2ns to do the sort.

It would be stunningly quick for, given N, computing the sum of the
(1024r+512)-th order statistics of the AES encryption of the numbers N
.. N+1048575; that I admit. If you find a use for this task, I will
be surprised.

FFTs done optically: don't make me laugh. Dynamic range is dreadful,
and you have to get the data into the light modulator and read it out
of the CCD at the end, which gets you straight into the bus bottleneck
again. It might make sense if you're happy for your input to be of
constant amplitude and with a couple of bits of phase information, and
if you've got enough logic integrated on the detector to do (say) a
thresholded centroid per 16x16 block of pixels. But if you're in a
context where you're talking about number-theoretic transforms, the
sort of bignum maths where 23 bits of mantissa is nothing like enough,
the idea of inserting a physics stage is crazy.

Tom

Eugene Miya

unread,

Oct 27, 2006, 12:37:50 PM10/27/06

to

In article <1161776197.0...@f16g2000cwb.googlegroups.com>,
BDH <bha...@gmail.com> wrote:

>Arbitrary vector permutation by a permutation vector, vector reversal,
>insertion of a vector into a vector, and so forth in log(n) time and
>O(n) switches if you allow pipelining is pretty obvious.

Pipelining is your bottleneck. Don't do it.

"The only reason time exists is so that everything doesn't happen at once."

>Building on
>that, log(n) time sorting with O(n) elements with pipelining is pretty
>obvious.

Use better data structures. Don't sort.

>I just spent hours waiting on a suffix array because my
>computer is badly designed. Why?

The instructure it takes to make computers. You are working with the
wrong computers.

--

BDH

unread,

Oct 27, 2006, 8:16:33 PM10/27/06

to

> >Very good! A logarithmic time sort for a million ints would fill up a
> >modern CPU, and would only run a million times faster.

On further thought, flash memory would be more representative of the
density a device of this form could achieve. Make that 4 million
floats.

> Suppose you have a cycle time of 0.1ns, and feed the chip with a
> thousand 10Gbps input lines. It'll still take 3200 nanoseconds to get
> the ints into the chip and 3200 nanoseconds to get the sorted outputs
> out again, dwarfing the 2ns to do the sort.

2ns per sort? You overestimate what I'm pushing. More like 1
microsecond per sort. with a few pipelined at a time. Second, making
bigger chips is going to be easier than smaller elements.

> FFTs done optically: don't make me laugh. Dynamic range is dreadful,

8 bits is enough. Turning an FFT into more FFT of smaller numbers is
straightforward.

> and you have to get the data into the light modulator and read it out
> of the CCD at the end, which gets you straight into the bus bottleneck
> again.

Still much faster. Want to argue busses are badly designed too? I'll
agree.

> But if you're in a
> context where you're talking about number-theoretic transforms, the
> sort of bignum maths where 23 bits of mantissa is nothing like enough,
> the idea of inserting a physics stage is crazy.

Not really.

BDH

unread,

Oct 27, 2006, 8:20:26 PM10/27/06

to

> Use better data structures. Don't sort.

Inconsistent instructions.

> >I just spent hours waiting on a suffix array because my
> >computer is badly designed. Why?
>
> The instructure it takes to make computers. You are working with the
> wrong computers.

You use what kind?

Eugene Miya

unread,

Oct 30, 2006, 11:15:34 AM10/30/06

to

>> Use better data structures. Don't sort.

In article <1161994826....@e3g2000cwe.googlegroups.com>,
BDH <bha...@gmail.com> wrote:
>Inconsistent instructions.

You can tell a programmer's perspective when their view of
data structures are arrays. And that's not just my observation.

>> >I just spent hours waiting on a suffix array because my
>> >computer is badly designed. Why?
>>
>> The instructure it takes to make computers. You are working with the
>> wrong computers.
>
>You use what kind?

The kind I use is irrelevant, PCs, Macs, SUNs, SGIs, Crays, etc.
You want to design your own.

--

BDH

unread,

Oct 31, 2006, 2:47:06 AM10/31/06

to

> >> Use better data structures. Don't sort.
>

> >Inconsistent instructions.
>
> You can tell a programmer's perspective when their view of
> data structures are arrays. And that's not just my observation.

Your data structures are just arrays your computer makes from other
arrays. The best ones are based on sorting.

> >> You are working with the
> >> wrong computers.
> >
> >You use what kind?
>
> The kind I use is irrelevant, PCs, Macs, SUNs, SGIs, Crays, etc.
> You want to design your own.

If I had a couple billion I would have better uses than building a fab.

Eugene Miya

unread,

Oct 31, 2006, 12:15:54 PM10/31/06

to

>> >> Use better data structures. Don't sort.

>> You can tell a programmer's perspective when their view of
>> data structures are arrays. And that's not just my observation.

In article <1162280826....@m73g2000cwd.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>Your data structures are just arrays your computer makes from other
>arrays. The best ones are based on sorting.

Computer memory can be seen that way; but the software for indexing can
be used in non-numeric ways (semi-numeric).

>> >> wrong computers.

>> You want to design your own.
>
>If I had a couple billion I would have better uses than building a fab.

Then you have to be satisified with Moose turd pie.
Or find someone elses facilities.

--

Del Cecchi

unread,

Oct 31, 2006, 1:46:03 PM10/31/06

to

Now Eugene, he doesn't need a few billion. He can design his own and
get help with the gory details, as well as access to a world class fab
for much less than a few billion.

I would be happy to facilitate the transaction. A few million as
starters would be sufficient.

--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

Eugene Miya

unread,

Oct 31, 2006, 2:42:58 PM10/31/06

to

>>>If I had a couple billion I would have better uses than building a fab.
>>
>> Then you have to be satisified with Moose turd pie.
>> Or find someone elses facilities.

In article <4qpjvfF...@individual.net>,

Del Cecchi <cecchi...@us.ibm.com> wrote:
>Now Eugene, he doesn't need a few billion. He can design his own and
>get help with the gory details, as well as access to a world class fab
>for much less than a few billion.

Actually I know this so long as you don't need performance.

Besides we need to keep Applied Materials and Lam Research and all those
other firms employed and advancing semiconductors.

One computer designer I know is completely self taught, didn't finish
high school and sold 256K units. Taught herself how to program FPGAs.

>I would be happy to facilitate the transaction. A few million as
>starters would be sufficient.

Cheap!

--

Del Cecchi

unread,

Oct 31, 2006, 6:44:31 PM10/31/06

to

"Eugene Miya" <eug...@cse.ucsc.edu> wrote in message
news:4547b552$1@darkstar...

>>>>If I had a couple billion I would have better uses than building a
>>>>fab.
>>>
>>> Then you have to be satisified with Moose turd pie.
>>> Or find someone elses facilities.
>
> In article <4qpjvfF...@individual.net>,
> Del Cecchi <cecchi...@us.ibm.com> wrote:
>>Now Eugene, he doesn't need a few billion. He can design his own and
>>get help with the gory details, as well as access to a world class fab
>>for much less than a few billion.
>
> Actually I know this so long as you don't need performance.

Performance can be provided. As the hot rodders say, performance costs
money. How fast do you want to go?

>
> Besides we need to keep Applied Materials and Lam Research and all
> those
> other firms employed and advancing semiconductors.

Don't worry, they get theirs.

>
> One computer designer I know is completely self taught, didn't finish
> high school and sold 256K units. Taught herself how to program FPGAs.
>
>
>>I would be happy to facilitate the transaction. A few million as
>>starters would be sufficient.
>
> Cheap!

That was for starters.
>
> --

Nick Maclaren

unread,

Nov 1, 2006, 3:57:02 AM11/1/06

to

In article <4qq5elF...@individual.net>,

Yup. Fast, cheap, reliable. Pick any two.

Regards,
Nick Maclaren.

Eugene Miya

unread,

Nov 1, 2006, 10:28:27 AM11/1/06

to

In article <4qq5elF...@individual.net>,

Del Cecchi <delcecchi...@gmail.com> wrote:
>
>"Eugene Miya" <eug...@cse.ucsc.edu> wrote in message
>news:4547b552$1@darkstar...

Del, you have to edit a tad better. Move the attribution to my text.

>>>>>If I had a couple billion I would have better uses than building a
>>>>>fab.
>>>>
>>>> Then you have to be satisified with Moose turd pie.
>>>> Or find someone elses facilities.
>>
>> In article <4qpjvfF...@individual.net>,
>> Del Cecchi <cecchi...@us.ibm.com> wrote:
>>>Now Eugene, he doesn't need a few billion. He can design his own and
>>>get help with the gory details, as well as access to a world class fab
>>>for much less than a few billion.
>>
>> Actually I know this so long as you don't need performance.
>
>Performance can be provided. As the hot rodders say, performance costs
>money. How fast do you want to go?

Well, you guys continue those peta-scale things with the DOE over at LLNL.

Let them pay for it. And if something architecturally interesting comes
of it, we can sign NDAs. But our budgets are tight due to the
perceptions of computing and flight realities. No bucks, no Buck Rogers.

>> Besides we need to keep Applied Materials and Lam Research and all
>> those other firms employed and advancing semiconductors.
>
>Don't worry, they get theirs.

That's the job.
Policy makers are clueless to infrastructure needs.
If the world had only seen the attempt by the former Gov. of Penn./Sec.
DHS attempt to move businesses from here in a pitiful attempt to help
his economy.

>>>I would be happy to facilitate the transaction. A few million as
>>>starters would be sufficient.
>>
>> Cheap!
>That was for starters.

Go for him.

Too bad that Amdahl doesn't bother with news groups.

--

jacko

unread,

Nov 1, 2006, 1:10:06 PM11/1/06

to

hi

back to the OPs question. why so little parallel?

it's to do with the language used to program gets turned into an RTL
(register transfer language) which specifies all variables as
registers, and then this has to be mapped onto a machine which only has
so many regs.

this is why we use paper for calculations, so that the variable space
can fit on it while the short term memory (register space) can only
hold so much.

making better RTL -> machine code translators which efficiently
schedule transfers of variables between cores, to provide modular
computation units.

it is not a hardware problem per se, but a communications problem.
maybe the RTL is not expressive enough. What would you have in an
OpenRTL? OpenAPL??

cheers.

Eugene Miya

unread,

Nov 1, 2006, 4:24:20 PM11/1/06

to

In article <1162404604....@e3g2000cwe.googlegroups.com>,

jacko <jacko...@gmail.com> wrote:
>back to the OPs question. why so little parallel?

Look not everything is parallel.

As a start, you should read Amdahl's (law) paper. It's barely 3 pages
long. Amdahl himself has given permission and it's the comp.parallel
FAQ panel on the 20th at the very end. It goes out monthly.

I think for theoretic purposes, Backus' Turing lecture on getting out of
the von Neumann paradigm has potential (you have to read these words
carefully, FP didn't go far, and FL also stagnated).

>it's to do with the language used to program gets turned into an RTL
>(register transfer language) which specifies all variables as
>registers, and then this has to be mapped onto a machine which only has
>so many regs.

Mapping is not so easy. You get into the 80s data flow (static and
dynamic) which didn't go far (a number of machines were built, a number
of languages and compilers were tried)....

>this is why we use paper for calculations, so that the variable space
>can fit on it while the short term memory (register space) can only
>hold so much.

Paper is somewhat irrelevant. Thinking is what's needed.

>making better RTL -> machine code translators which efficiently
>schedule transfers of variables between cores, to provide modular
>computation units.
>
>it is not a hardware problem per se, but a communications problem.

This is true, you are on the right track, but you are where the field
stood oh maybe 20 years ago.

>maybe the RTL is not expressive enough. What would you have in an
>OpenRTL? OpenAPL??

Have you used APL?

That's where part of the field was 30 years ago.

--

Richard

unread,

Nov 1, 2006, 5:36:29 PM11/1/06

to

[Please do not mail me a copy of your followup]

eug...@cse.ucsc.edu (Eugene Miya) spake the secret code
<45491e94$1@darkstar> thusly:

>As a start, you should read Amdahl's (law) paper. It's barely 3 pages
>long. Amdahl himself has given permission and it's the comp.parallel
>FAQ panel on the 20th at the very end. It goes out monthly.

Also: <http://en.wikipedia.org/wiki/Amdahl's_law>
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>

BDH

unread,

Nov 1, 2006, 5:47:28 PM11/1/06

to

> Look not everything is parallel.
>
> As a start, you should read Amdahl's (law) paper.

The core of that is pretty much obvious. But the slow things can be
made more parallel.

> I think for theoretic purposes, Backus' Turing lecture on getting out of
> the von Neumann paradigm has potential (you have to read these words
> carefully, FP didn't go far, and FL also stagnated).

I don't know what FP and OO and half a dozen other acronyms are
supposed to be. They both succeed when they do by making certain kinds
of abstraction easier in a half-assed way, and then say they're really
about something totally ridiculous. Let's get rid of state on a machine
that's mostly state! Let's put our code in objects because...I don't
know, some guy thinks without justification that it will make X easier!

> >it's to do with the language used to program gets turned into an RTL
> >(register transfer language) which specifies all variables as
> >registers, and then this has to be mapped onto a machine which only has
> >so many regs.

Thanks a lot, Mr Von Neumann. That is not a very good system.

> >this is why we use paper for calculations, so that the variable space
> >can fit on it while the short term memory (register space) can only
> >hold so much.

You're going to argue that we should base computer I/O on human I/O?

> Have you used APL?
>
> That's where part of the field was 30 years ago.

What happened to APL? There's K and J but it just hasn't had the
influence on modern popular languages of Lisp or Smalltalk or C. Or
Pascal or Fortran or Algol probably.

I've been half-assedly working on sort of an APL/Lisp hybrid.

Richard

unread,

Nov 1, 2006, 5:57:18 PM11/1/06

to

[Please do not mail me a copy of your followup]

"BDH" <bha...@gmail.com> spake the secret code
<1162421248.3...@m7g2000cwm.googlegroups.com> thusly:

>[...] Let's put our code in objects because...I don't

>know, some guy thinks without justification that it will make X easier!

Shirley, you're not serious.

John Dallman

unread,

Nov 1, 2006, 6:44:00 PM11/1/06

to

In article <1162421248.3...@m7g2000cwm.googlegroups.com>,
bha...@gmail.com (BDH) wrote:

> Thanks a lot, Mr Von Neumann. That is not a very good system.

No. But it is both easy to design hardware for - which was the initial
reason for it - and easy to teach to people. It has proved to be pretty
versatile, and there doesn't seem to be a better-for-all-purposes
replacement around.

Because of the sunk investment in Von Neumann, any radically different
paradigm will have to be a lot better, for almost every purpose. The way
that Arabic numerals and the system for using them, as compared with
Latin numerals, provide an example of the kind of difference you want.

---
John Dallman j...@cix.co.uk
"Any sufficiently advanced technology is indistinguishable from a
well-rigged demo"

BDH

unread,

Nov 1, 2006, 10:37:17 PM11/1/06

to

I am enthusiastic over humanity's extraordinary and sometimes very
timely ingenuities. If you are in a shipwreck and all the boats are
gone, a piano top buoyant enough to keep you afloat may come along and
make a fortuitous life preserver. This is not to say, though, that the
best way to design a life preserver is in the form of a piano top. I
think we are clinging to a great many piano tops in accepting
yesterday's fortuitous contrivings as constituting the only means for
solving a given problem. - R. Buckminster Fuller

BDH

unread,

Nov 1, 2006, 10:38:35 PM11/1/06

to

> >[...] Let's put our code in objects because...I don't
> >know, some guy thinks without justification that it will make X easier!
>
> Shirley, you're not serious.

Maybe I'm biased - I hate Java.

When people see things as beautiful,  ugliness is created.  When
people see things as good,  evil is created.
-Tao Te Ching

BDH

unread,

Nov 2, 2006, 12:27:06 AM11/2/06

to

So how do you build and move to a boat while sitting on a piano top?
Welllll, first we assume a sufficiently smart corporation, then we
assume a sufficiently smart compiler, then we assume sufficiently smart
developers...

Andrew Reilly

unread,

Nov 2, 2006, 12:50:57 AM11/2/06

to

How are the developers, however smart, going to express their algorithms
without introducing sequential dependencies, however inadvertently?
(What is an algorithm, without sequential dependencies?) Don't you need an
appropriate language, and perhaps a plausible parallel machine
abstraction, before you start on the compiler? How would your language be
different from Verilog or VHDL or Occam? What would be different, this
time?

Or are you, perhaps, hinting at the "High Productivity" DARPA project, or
one of Sun, IBM or Cray's sub-projects, each of which, I assume, has
working answers to my previous questions?

Cheers,

--
Andrew

Richard

unread,

Nov 2, 2006, 1:33:09 AM11/2/06

to

[Please do not mail me a copy of your followup]

"BDH" <bha...@gmail.com> spake the secret code

<1162438715.3...@h48g2000cwc.googlegroups.com> thusly:

>> >[...] Let's put our code in objects because...I don't
>> >know, some guy thinks without justification that it will make X easier!
>>
>> Shirley, you're not serious.
>
>Maybe I'm biased - I hate Java.

That's like saying you think procedural programming sucks because you
hate Pascal.

BDH

unread,

Nov 2, 2006, 4:31:34 AM11/2/06

to

> Don't you need an
> appropriate language, and perhaps a plausible parallel machine
> abstraction, before you start on the compiler?

Done and done. People like me love that stuff! But do you have somebody
that's looking for it?

> How would your language be
> different from Verilog or VHDL or Occam?

Me, I'd go for a language with attributes of lisp and APL, with layered
optimized compilation (using graph-rewriting) making it both high and
low level, with code-data mixing with some explicit type data. I also
have a couple new basic abstractions that I reckon an efficient
architecture can be made to parallel, but they're a little bit alien.

> What would be different, this
> time?

Hum. I'm more arrogant?

I know! I'm not in a position to do any harm!

> Or are you, perhaps, hinting at the "High Productivity" DARPA project, or
> one of Sun, IBM or Cray's sub-projects, each of which, I assume, has
> working answers to my previous questions?

Working, for certain values of working. But not, I don't think, the
right answers.

BDH

unread,

Nov 2, 2006, 4:32:47 AM11/2/06

to

> >Maybe I'm biased - I hate Java.
>
> That's like saying you think procedural programming sucks because you
> hate Pascal.

No, it's like saying you think procedural programming sucks, but
concede maybe you really just hate Pascal.

BDH

unread,

Nov 2, 2006, 4:40:31 AM11/2/06

to

> with code-data mixing with some explicit type data.

That is, code produces a set of mixes of data and future code. This can
then be transformed. When wanted, the code is then evaluated on the
data it is mixed with.

BDH

unread,

Nov 2, 2006, 4:42:26 AM11/2/06

to

> How would your language be
> different from Verilog or VHDL or Occam? What would be different, this
> time?

Or are you asking what I think of locks and message passing and so
forth? Well, I'm more concerned with VLIW type techniques.

Nick Maclaren

unread,

Nov 2, 2006, 5:16:56 AM11/2/06

to

In article <1162421248.3...@m7g2000cwm.googlegroups.com>,

Sigh. There are at least the following classes of problem:

Programs where it is known how to do that, and the task is merely
redesigning
Programs where it is NOT known how to do that, but IS believed to be
feasible in theory
Programs where believed to be infeasible to do that, but it is not
known for sure
Programs where it is KNOWN to be infeasible

There are also many programs where the design makes it infeasible, even
when it is feasible in theory, and ones where it can be made to be
feasible by changing the objective of the program slightly.

|> > I think for theoretic purposes, Backus' Turing lecture on getting out of
|> > the von Neumann paradigm has potential (you have to read these words
|> > carefully, FP didn't go far, and FL also stagnated).
|>
|> I don't know what FP and OO and half a dozen other acronyms are

|> supposed to be. ...

That is very clear. FP, to me, is usually floating-point, but it
clearly means functional programming here. By all means post dubious
and controversial assertions, as it is a good way to learn, but do not
tell people who know vastly more than you do that they are talking
nonsense.

|> > >it's to do with the language used to program gets turned into an RTL
|> > >(register transfer language) which specifies all variables as
|> > >registers, and then this has to be mapped onto a machine which only has
|> > >so many regs.
|>
|> Thanks a lot, Mr Von Neumann. That is not a very good system.

That is true, but the parallelism issue has little to do with registers,
and Backus was NOT talking primarily about parallelism as it is now
understood in that remark, but about the 'memory wall'.

The parallelism issue is about the language, yes, but it is about how
to express algorithms without introducing non-fundamental serialisation.
They are conceptually slightly different issues.

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Nov 2, 2006, 5:22:19 AM11/2/06

to

In article <4qtfa0F...@individual.net>,

"Andrew Reilly" <andrew-...@areilly.bpc-users.org> writes:
|> On Wed, 01 Nov 2006 21:27:06 -0800, BDH wrote:
|>
|> How are the developers, however smart, going to express their algorithms
|> without introducing sequential dependencies, however inadvertently?
|> (What is an algorithm, without sequential dependencies?) Don't you need an
|> appropriate language, and perhaps a plausible parallel machine
|> abstraction, before you start on the compiler? How would your language be
|> different from Verilog or VHDL or Occam? What would be different, this
|> time?

Oh, THAT'S easy. The standard definition of an algorithm is too
restrictive, and it is perfectly possible to have ones based around
mathematical invariants with no ordering dependency at all. The only
way to do it is to get the teaching of mathematics turned on its head
so that it does not introduce serialisation until at least university.

Now, how to get there from here, and how to get that accepted by any
of the communities involved, are entirely different questions. To which
I have no answer :-)

|> Or are you, perhaps, hinting at the "High Productivity" DARPA project, or
|> one of Sun, IBM or Cray's sub-projects, each of which, I assume, has
|> working answers to my previous questions?

You are being ironic, of course?

Regards,
Nick Maclaren.

BDH

unread,

Nov 2, 2006, 5:28:34 AM11/2/06

to

> Oh, THAT'S easy. The standard definition of an algorithm is too
> restrictive, and it is perfectly possible to have ones based around
> mathematical invariants with no ordering dependency at all. The only
> way to do it is to get the teaching of mathematics turned on its head
> so that it does not introduce serialisation until at least university.

Mozart/Oz eh?

BDH

unread,

Nov 2, 2006, 5:31:14 AM11/2/06

to

> > >Very good! A logarithmic time sort for a million ints would fill up a
> > >modern CPU, and would only run a million times faster.
>
> On further thought, flash memory would be more representative of the
> density a device of this form could achieve. Make that 4 million
> floats.

But here is the thing - you CAN use a smaller sort core than what
you're sorting, using (sort size)*log(sort size / core size) memory
transfers. And you can do multiple simultaneous smaller sorts instead
of one larger sort on the same hardware.

BDH

unread,

Nov 2, 2006, 5:46:23 AM11/2/06

to

> |> The core of that is pretty much obvious. But the slow things can be
> |> made more parallel.
>
> Sigh. There are at least the following classes of problem:

...

> Programs where it is KNOWN to be infeasible

What examples of these do you like?

> |> I don't know what FP and OO and half a dozen other acronyms are
> |> supposed to be. ...
>
> That is very clear.

Get correcting! You clearly have lots of work!

> That is true, but the parallelism issue has little to do with registers,
> and Backus was NOT talking primarily about parallelism as it is now
> understood in that remark, but about the 'memory wall'.

More of a data transfer wall. Alright, I have my plan and all, but how
would you deal with this?

> The parallelism issue is about the language, yes, but it is about how
> to express algorithms without introducing non-fundamental serialisation.
> They are conceptually slightly different issues.

OK, what is graph theory missing that's hard to fix?

BDH

unread,

Nov 2, 2006, 5:58:53 AM11/2/06

to

> By all means post dubious
> and controversial assertions, as it is a good way to learn, but do not
> tell people who know vastly more than you do that they are talking
> nonsense.

What, you want more self-deprecation?

(Jumps in puddle of Prejudice-B-Gone.)

http://www.gonemovies.com/WWW/MyWebFilms/Drama/Wizardmelting.mp3

Richard

unread,

Nov 2, 2006, 11:22:10 AM11/2/06

to

[Please do not mail me a copy of your followup]

"BDH" <bha...@gmail.com> spake the secret code

<1162459967....@b28g2000cwb.googlegroups.com> thusly:

Well other than an opinion about one language, you haven't refuted
anything about object oriented programming. No matter which way you
try to state it, its a stupid and baseless argument.

jacko

unread,

Nov 2, 2006, 1:41:14 PM11/2/06

to

BDH wrote:
> > |> The core of that is pretty much obvious. But the slow things can be
> > |> made more parallel.
> >
> > Sigh. There are at least the following classes of problem:
>
> ...
>
> > Programs where it is KNOWN to be infeasible
>
> What examples of these do you like?

10 print "hello" : goto 10
?

> > That is true, but the parallelism issue has little to do with registers,
> > and Backus was NOT talking primarily about parallelism as it is now
> > understood in that remark, but about the 'memory wall'.
>
> More of a data transfer wall. Alright, I have my plan and all, but how
> would you deal with this?

yes how does implicitly serial code get mapped to distributed
registers?

> > The parallelism issue is about the language, yes, but it is about how
> > to express algorithms without introducing non-fundamental serialisation.
> > They are conceptually slightly different issues.

obviously i think some kind of parrallel language would have fully
parallel execution, and serial dependancy would have to be explicitly
indicated.

on (var) {
// only to be done on var write.
...
}

> OK, what is graph theory missing that's hard to fix?

don't know.

wouldn't the ideal language look a bit like a spreadsheet, with each
cell having formula, whose evaluation is triggered by any value cell it
depends on being written?

cheers.

p.s. i have not used much apl, but am a registered J user, but to be
honest, forth is my current usage.

Del Cecchi

unread,

Nov 2, 2006, 2:28:19 PM11/2/06

to

> solving a given problem.â€¨- R. Buckminster Fuller
>
Unfortunately neither you nor mr fuller have come up with superior
solutions for many of these problems. Domes haven't replaced
rectilinear frame construction, and alternatives haven't replaced von
neumann.

--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

Eugene Miya

unread,

Nov 2, 2006, 1:33:30 PM11/2/06

to

Boy, I'm going to have to trace this thread back.

In article <4quv6lF...@individual.net>,

Del Cecchi <cecchi...@us.ibm.com> wrote:
>BDH wrote:
>> I am enthusiastic over humanity's extraordinary and sometimes very
>> timely ingenuities. If you are in a shipwreck and all the boats are
>> gone, a piano top buoyant enough to keep you afloat may come along and
>> make a fortuitous life preserver. This is not to say, though, that the
>> best way to design a life preserver is in the form of a piano top. I
>> think we are clinging to a great many piano tops in accepting
>> yesterday's fortuitous contrivings as constituting the only means for

>> solving a given problem. - R. Buckminster Fuller

>>
>Unfortunately neither you nor mr fuller have come up with superior
>solutions for many of these problems. Domes haven't replaced
>rectilinear frame construction, and alternatives haven't replaced von
>neumann.

Not only that. Stewart Brand recanted his support for domes in his smart
buildings book. This is not to say that they don't use uses like
radomes. And it was once useful "largest dome in Livermore" to find a
future officemate's house. Boy what a time to leave my RSA data frobb at
home.... No access to my quote database for what Brand said.

--

Eugene Miya

unread,

Nov 2, 2006, 1:42:58 PM11/2/06

to

eug...@cse.ucsc.edu (Eugene Miya) spake the secret code

no secrets

<45491e94$1@darkstar> thusly:
>>As a start, you should read Amdahl's (law) paper. It's barely 3 pages
>>long. Amdahl himself has given permission and it's the comp.parallel
>>FAQ panel on the 20th at the very end. It goes out monthly.

In article <eib7hd$hhk$1...@news.xmission.com>, Richard <> wrote:
>Also: <http://en.wikipedia.org/wiki/Amdahl's_law>

Whereas this is not a bad description (I appreciate what Jimmy has done
with wikipedia to its limits), it's not the original work. Amdahl
has practically no math in his paper (I left out the one graph).
The math was done later back Jack Worlton of LASL (now LANL).

Amusing. A Brit (wonder if he's one of Nick's friends) mostly wrote
this Wikipedia page. This page requires a little work. It's got minor
details that can stand polishing. No time. Well, I did one thing.

--

Eugene Miya

unread,

Nov 2, 2006, 2:02:27 PM11/2/06

to

>> Look not everything is parallel.
>> As a start, you should read Amdahl's (law) paper.

In article <1162421248.3...@m7g2000cwm.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>The core of that is pretty much obvious. But the slow things can be
>made more parallel.

Hmmm, no.
I've worked on several benchmark teams and have to sweat various issues.
If you want brutal, non-trivial slow problems outside cryptanalysis,
I can think of many. A favorite might be Golomb rulers.
No floating point. Can fit in memory.

>> I think for theoretic purposes, Backus' Turing lecture on getting out of
>> the von Neumann paradigm has potential (you have to read these words
>> carefully, FP didn't go far, and FL also stagnated).
>
>I don't know what FP and OO and half a dozen other acronyms are
>supposed to be. They both succeed when they do by making certain kinds
>of abstraction easier in a half-assed way, and then say they're really
>about something totally ridiculous. Let's get rid of state on a machine
>that's mostly state! Let's put our code in objects because...I don't
>know, some guy thinks without justification that it will make X easier!

FP was merely Backus' own short hand for Functional Programming
(which got suceeded by Functional Language or FL). OO Object-oriented:
a fad in a way since the Simula 67 days which has uses but is still no
silver bullet to the software problem.

>> >it's to do with the language used to program gets turned into an RTL
>> >(register transfer language) which specifies all variables as
>> >registers, and then this has to be mapped onto a machine which only has
>> >so many regs.
>
>Thanks a lot, Mr Von Neumann. That is not a very good system.

On the contrary what von Neumann wrot was a very good simple system for
his time. He was an awesome intellect. I wish I had the intellect he
had in his little pinky and I would bet the most knowledge left here
would, too. **

>> >this is why we use paper for calculations, so that the variable space
>> >can fit on it while the short term memory (register space) can only
>> >hold so much.
>
>You're going to argue that we should base computer I/O on human I/O?

That's the person I was responsing to, they have to answer to you.

>> Have you used APL?
>> That's where part of the field was 30 years ago.
>
>What happened to APL? There's K and J but it just hasn't had the
>influence on modern popular languages of Lisp or Smalltalk or C. Or
>Pascal or Fortran or Algol probably.
>
>I've been half-assedly working on sort of an APL/Lisp hybrid.

A number of things killed APL. I recall the special APL character set
(the Culler-Fried keyboard was little better and its was QWERTY).
I extracted from an officemate involved with the CDC Star-100 that the
direction of evaluation was a poor choice. But at the memorial for Ken
Iverson it's hard to say what came out. I've also had some nice
conversations with Bob Bernecky.

Part of the problem which kills simple languages like these are the
kinds of numeric applications which have volumetric and border (edges,
faces, vertices, and hyper-structures (4D and higher Ds) which are
exception cases which then go to irregular geometries, etc.

** Merge ideas:
Yesterday I attended a seminar given by Bill Dally at Stanford.
The series is an amazing collection of people (Bill, whom I had dinner
with afterward, is now the CS chair, and we synced on friends in common).
I'm going to let the cat out of the bag for people outside the Santa
Clara valley. The lecture is "taped" and put on the web for the
graduate seminar series:
ee380.stanford.edu
He'll tell you about parallelism, and he was worked on respectable
architectures. I also worked with one of the benchmarks he talked about
in the talk.

The amazing thing of the series run by the D of Dr. Dobb's and others
(I have provided a number of speakers), when you go back in time is the
preponderance of Santa Clara Valley talent both giving talks and sitting
in the audience. It's possible to go back and see many archived talks.
Amazing talks on problems for EE etc. (including climatology
[application areas]). Fortunately my 380 talk was before they saved the
tapes. ;^)

--

Eugene Miya

unread,

Nov 2, 2006, 2:05:31 PM11/2/06

to

In article <1162438637....@m73g2000cwd.googlegroups.com>,

>solving a given problem.=E2=80=A8- R. Buckminster Fuller

That's it?

One of my favorite smart guy quotes (caught on tape, not mine):
You see what I do is imagination with a straight jacket....
--R. P. Feynman.

Oh to have considering going out with his goddauther.

--

Eugene Miya

unread,

Nov 2, 2006, 2:07:15 PM11/2/06

to

>> Shirley, you're not serious.

In article <1162438715.3...@h48g2000cwc.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>Maybe I'm biased - I hate Java.

So?
The tools we have are far from perfect.
Make better ones.

--

Eugene Miya

unread,

Nov 2, 2006, 2:11:15 PM11/2/06

to

On Wed, 01 Nov 2006 21:27:06 -0800, BDH wrote:
>> So how do you build and move to a boat while sitting on a piano top?
>> Welllll, first we assume a sufficiently smart corporation, then we
>> assume a sufficiently smart compiler, then we assume sufficiently smart
>> developers...

That's the punchline to the joke what's the difference between a
computer scientist and an EE? One of each is stranded on a "desert"
island (this being before Survivor). Both come to the conclusion that a
computer will help them. The CS says like above for Turing machines.
The EE says first we take some sand.....

In article <4qtfa0F...@individual.net>,

Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>How are the developers, however smart, going to express their algorithms
>without introducing sequential dependencies, however inadvertently?
>(What is an algorithm, without sequential dependencies?) Don't you need an
>appropriate language, and perhaps a plausible parallel machine
>abstraction, before you start on the compiler? How would your language be
>different from Verilog or VHDL or Occam? What would be different, this
>time?

Take a look and try what friends did on SISAL.

>Or are you, perhaps, hinting at the "High Productivity" DARPA project, or
>one of Sun, IBM or Cray's sub-projects, each of which, I assume, has
>working answers to my previous questions?

They mostly work on hardware.

--

Terje Mathisen

unread,

Nov 2, 2006, 3:54:35 PM11/2/06

to

Eugene Miya wrote:
> One of my favorite smart guy quotes (caught on tape, not mine):
> You see what I do is imagination with a straight jacket....
> --R. P. Feynman.

Feynman was a "wizard of the highest order". Even after he had figured
something out and explained it, there was no way for mere mortals to see
how they could ever have done the same.

>
> Oh to have considering going out with his goddauther.

Please tell!

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

BDH

unread,

Nov 2, 2006, 6:33:44 PM11/2/06

to

> > > Programs where it is KNOWN to be infeasible
> >
> > What examples of these do you like?
>
> 10 print "hello" : goto 10

How about something useful and terminating?

> wouldn't the ideal language look a bit like a spreadsheet, with each
> cell having formula, whose evaluation is triggered by any value cell it
> depends on being written?

No need with a decent graph theoretic flow analysis to suss out time
dependencies.

> p.s. i have not used much apl, but am a registered J user, but to be
> honest, forth is my current usage.

K is ok but J...well I would say J tends to introduce complicating
abstractions with low returns on effort.

Andrew Reilly

unread,

Nov 2, 2006, 6:40:33 PM11/2/06

to

Do VLIW type techniques (plausibly) scale to anything like the degree of
parallelism available at the moment with locks and messages (tens of
thousands of FUs)? Isn't communication and memory infrastructure still the
main issue/concern?

Cheers,

--
Andrew

BDH

unread,

Nov 2, 2006, 6:48:57 PM11/2/06

to

> >Unfortunately neither you nor mr fuller have come up with superior
> >solutions for many of these problems. Domes haven't replaced
> >rectilinear frame construction, and alternatives haven't replaced von
> >neumann.
>
> Not only that. Stewart Brand recanted his support for domes in his smart
> buildings book. This is not to say that they don't use uses like
> radomes. And it was once useful "largest dome in Livermore" to find a
> future officemate's house. Boy what a time to leave my RSA data frobb at
> home.... No access to my quote database for what Brand said.

Eh, I'm not trying to sell domes here.

Richard

unread,

Nov 2, 2006, 6:49:40 PM11/2/06

to

[Please do not mail me a copy of your followup]

eug...@cse.ucsc.edu (Eugene Miya) spake the secret code
<454a4a42$1@darkstar> thusly:

>eug...@cse.ucsc.edu (Eugene Miya) spake the secret code
> no secrets
><45491e94$1@darkstar> thusly:
>>>As a start, you should read Amdahl's (law) paper. It's barely 3 pages
>>>long. Amdahl himself has given permission and it's the comp.parallel
>>>FAQ panel on the 20th at the very end. It goes out monthly.
>
>In article <eib7hd$hhk$1...@news.xmission.com>, Richard <> wrote:
>>Also: <http://en.wikipedia.org/wiki/Amdahl's_law>
>
>Whereas this is not a bad description (I appreciate what Jimmy has done

>with wikipedia to its limits), it's not the original work. [...]

I tossed that link out there because I tried to find it in the
comp.parallel FAQ and while the copy I was viewing was formatted oddly
(looks like a bad brute force ASCII -> HTML conversion), I couldn't
find Amdahl's paper in there. Got a link to the paper proper?

BDH

unread,

Nov 2, 2006, 6:58:08 PM11/2/06

to

> Well other than an opinion about one language, you haven't refuted
> anything about object oriented programming. No matter which way you
> try to state it, its a stupid and baseless argument.

There is no commonly accepted definition of OO programming, it's more
like an a la carte menu of options, pick at least X. Pick a description
and set of assertions about why it's good, and I'd be happy to beat up
that.

Pro OO/Anti OO has been played out many times, and I was saying I
basically agree with the arguments I've seen on the Anti OO side. Do
you just want to see me repeat the arguments for why OO doesn't help
with reuse or teaching programming and so forth?

Andrew Reilly

unread,

Nov 2, 2006, 7:01:22 PM11/2/06

to

On Thu, 02 Nov 2006 13:11:15 -0700, Eugene Miya wrote:
> In article <4qtfa0F...@individual.net>,
> Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>>How are the developers, however smart, going to express their algorithms
>>without introducing sequential dependencies, however inadvertently?
>>(What is an algorithm, without sequential dependencies?) Don't you need an
>>appropriate language, and perhaps a plausible parallel machine
>>abstraction, before you start on the compiler? How would your language be
>>different from Verilog or VHDL or Occam? What would be different, this
>>time?
>
> Take a look and try what friends did on SISAL.

That's nice and all, but what's stopping it from being used instead of
MPI or OpenMP on the big, parallel iron? Just inertia? SISAL
is twenty years old, now. [There does appear to be an active SISAL
implementation on sourceforge, though. Reckon it's worth having a look at.]

>>Or are you, perhaps, hinting at the "High Productivity" DARPA project, or
>>one of Sun, IBM or Cray's sub-projects, each of which, I assume, has
>>working answers to my previous questions?
>
> They mostly work on hardware.

Cray Chapel; IBM X10; Sun Fortress:
http://crd.lbl.gov/~parry/hpcs_resources.html

Opinions?

Cheers,

--
Andrew

BDH

unread,

Nov 2, 2006, 7:25:38 PM11/2/06

to

> Feynman was a "wizard of the highest order".

Yes.

> > Oh to have considering going out with his goddauther.
>
> Please tell!

Yes.

You know, there are more people nowadays, and by definition more people
with whatever IQ. But the amount of genius around seems constant at
best.

BDH

unread,

Nov 2, 2006, 7:42:06 PM11/2/06

to

> Hmmm, no.
> I've worked on several benchmark teams and have to sweat various issues.
> If you want brutal, non-trivial slow problems outside cryptanalysis,
> I can think of many. A favorite might be Golomb rulers.
> No floating point. Can fit in memory.

Golomb rulers are already found with massively parallel computation. I
guess I'm not sure what you mean.

> On the contrary what von Neumann wrot was a very good simple system for
> his time.

Fine, it's unfair to blame him like that.

Some people definitely have a higher opinion of him than I do though.

> I recall the special APL character set
> (the Culler-Fried keyboard was little better and its was QWERTY).
> I extracted from an officemate involved with the CDC Star-100 that the
> direction of evaluation was a poor choice.

But its influence died off too. Why?

> Part of the problem which kills simple languages like these are the
> kinds of numeric applications which have volumetric and border (edges,
> faces, vertices, and hyper-structures (4D and higher Ds) which are
> exception cases which then go to irregular geometries, etc.

That can be turned into arrays too.

Eugene Miya

unread,

Nov 2, 2006, 7:00:14 PM11/2/06

to

>> Feynman was a "wizard of the highest order".

In article <1162513538.8...@h54g2000cwb.googlegroups.com>,
BDH <bha...@gmail.com> wrote:
>Yes.

I have to agree a bit with what Gell-Mann wrote about him creating an
air/eire about him. Not that I ever met him more than a couple of times.
I spent enough time in Laurtisen to pass by his secretary's office
occasionally and also see one of his series of photos lecturing.

>> > Oh to have considering going out with his goddauther.
>> Please tell!
>Yes.

I just thought this one woman I went out with was one of the finest
women I have ever had the chance to know. Turned out her dad was a
Caltech physics prof and colleague of Feynman's who I first met
about 4 years or so earlier at the time of his Los Alamos lecture
at UCSB. Who would have thought that I'd take a job at JPL, then live
next door to Tech, then work for Tech for the Lab (well one is part of
the other). Anyways this CD is available in the revised book Classic
Feynman (I can hear Rik Smoody ask the question about safes at the end
(Rik went to Tech for a year or 2 before burn out)).

I chanced to meet her in a PCC intro geology class before grad school.
My friends at the Lab remarked "Where did you find her?" at the Section
hot tub party we used to hold (one of my then superviser is now JPL's
Director, go figure [an amazing Lebanese guy]).

So the lesson is do things outside computing.

>You know, there are more people nowadays, and by definition more people
>with whatever IQ. But the amount of genius around seems constant at
>best.

Life is far more than intelligence. It doesn't tell you about the
physical and emotional problems (complexities) people have especially women.
I'm not going to get into her problems and her life.

The US is very fortune to have Caltech. It's intensity is like no other
campus in the US that I have visited (it's the basis for the film Real
Genius by Martha Cooliege (sp)). I include MIT and Stanford (differ
qualities: I'm amazed that I know Hennessy and go that one page for his
book: go buy 4th ed.). But Caltech has never had a great CS Dept. in
part because it's an institution (It's The Institute) of physicists
(I mean Carver well, and it's where the precursor to Mathematica got its
start, and the home of hypercubes, and I'd get asked questions about CS
by my friends who run the place (one friend was #2 man there)).
Oh and Connection Machines had their start where when Danny move from
there to MIT (I am pretty sure that Danny was in a von Neumann
bottleneck lecture given by Ivan Sutherland).

--

Eugene Miya

unread,

Nov 2, 2006, 7:11:50 PM11/2/06

to

In article <ch2q14-...@osl016lin.hda.hydro.com>,

Terje Mathisen <terje.m...@hda.hydro.com> wrote:
>Feynman was a "wizard of the highest order". Even after he had figured
>something out and explained it, there was no way for mere mortals to see
>how they could ever have done the same.

Oh that was you who said that.

Hmmmm what can I say?
Caltech collects geniuses like Feynman. He was clearly unique, but so
are and were a fair number of other people who went to, worked at, or
were part of Tech. I could never have gotten in. I am not smart enough.

They may have also passed their their prime (friends hope not).
I took a class from a JPL co-worker (hard to use that word as he was so
much more advanced that I was: Blinn). Most of you will have never
heard Jim because this isn't comp.graphics. He had a hobby of
collecting Pythagorean theorem proofs (he's got over 100).
He's also a master of 4x4 matrix multiplication.

That says nothing of non-computational fields.
You should visit Tech for half a day the next time you come to
California Terje.

--

Eugene Miya

unread,

Nov 2, 2006, 7:15:11 PM11/2/06

to

> link to the paper?

Just easier for me to read it in and cut.

Newsgroups: comp.parallel,comp.sys.super
Subject: [l/m 9/20/2005] IBM and Amdahl -- comp.parallel (20/28) FAQ
Summary:
References:
Sender:
Reply-To: eug...@nasS.nasaN.govIP (Eugene N. Miya) SNIP
Followup-To: poster
Distribution: world
Organization: Usenet Self-Moderation Project
Keywords:
Approved: eugene

Archive-Name: superpar-faq
Last-modified: 20 Oct 2005

20 IBM and Amdahl (the man, the article)
22 Grand challenges and HPCC
24 Suggested (required) readings
26 Dead computer architecture society
28 Dedications
2 Introduction and Table of Contents and justification
4 Comp.parallel news group history
6 parlib
8 comp.parallel group dynamics
10 Related news groups, archives and references
12
14 References
16
18 Supercomputing and Crayisms

Keywords and phrases:
Watson memo, marketing, COBOL, mythology, aspiring blue boxes, DEC,
laws: Amdahl, other,
challenge, debates, prizes: Karp, Bell, Hillis

# Is a supercomputer, a mainframe?
....
846 line clipped

What's Amdahl's Law?
====================

What's this I hear about "Amdahl's other law/rule?"
===================================================

The following is probably one of the single most important papers
(several people have recommended this article be read weekly) challenging
parallelism. Well, we'll make it monthly here. It's kinda like UFOs.

The following document is Copyright AFIPS 1967.
Copyright pending (checked with the Copyright Clearance Center).
Nod also given by the author.
This document's use shall be for non-profit educational use only.

Citation:

%A Dr. Gene M. Amdahl
%Z International Business Machines Corporation, Sunnyvale, California
%T Validity of the single processor approach to achieving
large scale computing capabilities
%J AFIPS Proc. of the SJCC
%V 30
%D 1967
%P 483-485

Validity of the single processor
approach to achieving large scale
computing capabilities

DR. GENE M. AMDAHL
International Business Machines Corporation
Sunnyvale, California

The indented table structure is editorial to improve readability.
The original paper did not have it.

INTRODUCTION

For a decade prophets have voiced the contention that the organization
of a single computer has reached its limits and that truly significant
advances can be made only by interconnection of a multiplicity of computers
in such a manner as to permit cooperative solution. Variously the
proper direction has been pointed out as general purpose computers
with a generalized interconnection of memories, or as specialized computers
with geometrically related memory interconnections and controlled by one or
more instruction streams.

Demonstration is made of the continued validity of the single processor
approach and of the weaknesses of the multiple processor approach in
terms of application to real problems and their attendant irregularities.

The arguments presented are based on statistical characteristics of
computation on computers over the last decade and upon the operational
requirements within problems of physical interest. An additional reference
will be one of the most thorough analyses of relative computer capabilities
currently published --
"Changes in Computer Performance," *Datamation*, September 1966,
Professor Kenneth E. Knight, Stanford School of Business Administration.

The first characteristic of interest is the fraction of the computational
load which is associted with data management housekeeping. This fraction
is very nearly constant for about ten years, and accounts for 40% of the
executed instructions in production runs. In an entirely dedicated
special purpose environment this might be reduced by a factor of two,
but it is highly improbably [sic] that it could be reduced by a factor of
three. The nature overhead appears to be sequential so that it is likely
to be amenable to parallel processing techniques. Overhead alone would
then place an upper limit on throughput of five to seven times the sequential
processing rate, even if housekeeping were done in a separate processor.
The non-housekeeping part of the problem could exploit at most a processor
of performance three to four time the performance of the
housekeeping processor. A fairly obvious conclusion which can be drawn at
this point is that the effort expended on achieving high parallel processing
rates is wasted unless it is accompanied by achievement in sequential
processing rates of very nearly the same magnitude.

Data management housekeeping is not the only problem to plague oversimplified
approaches to high speed computation. The physical problems which are of
practical interest tend to have rather significant complications.
Examples of these complications are as follows:
boundaries are likely to be irregular;
interiors are likely to be inhomogeneous;
computations required may be dependent on the states of variables
at each point;
propagation rates of different physical effects may be quite different; the rate of convergence, or convergence at all, may be strongly
dependent on sweeping through the array along different
axes on succeeding passes;
etc.
The effect of each of these complications is very severe on any
computer organization based on geometrically related processors in
a paralleled processing system. Even the existence of rectangular boundaries
has the interesting property that for spatial dimension N there are $3 sup N$
different point geometries to be dealt with in a nearest neighbor computation.
If the second nearest neighbor were also involved, there would be $5 sup N$
different point geometries to contend with. An irregular boundary
compounds this problem as does an inhomogeneous interior. Computations
which are dependent on the states of variables would require the processing
at each point to consume approximately the same computational time as the
some of computations of all physical effects within a large region.
Differences or changes in propagation rate may affect the mesh point
relationships.

Ideally the computation of the action of the neighboring points upon the
point under consideration involves their values at a previous time
proportional to the mesh spacing and inversely proportional to the
propagation rate. Since the time step is normally kept constant,
a faster propagation rate for some effects would imply interactions
with more distant points. Finally the fairly common practice of sweeping
through the mesh along different axes on succeeding passes poses problems of
data management which affects all processors, however it affects
geometrically related processors more severely by requiring transposing
all points in storage in addition to the revised input-output scheduling.
A realistic assessment of the effect of these irregularities on the actual
performance of a parallel processing device, compared to its performance on
a simplified and regularized abstraction of the problem yields a
degradation in the vicinity of one-half to one order of magnitude.

To sum up the effects of data management housekeeping and of problem
irregularities, the author has compared three different machine organizations
involving equal amounts of hardware.
Machine A has thirty two arithmetic execution units controlled by a single
instruction stream [SIMD].
Machine B has pipelined arithmetic execution units with up to three overlapped
operations on vectors of eight elements.
Machine C has the same pipelined execution units, but initiation of individual
operations at the same rate as Machine B permitted vector operations.
The performance of these three machines are plotted in Figure 1 as a function
of the fraction of the number of instructions which permit parallelism.
The probable region of operation is centered around a point corresponding to
25% data management overhead and 10% of the problem operations forced to be
sequential.

The historic performance versus cost of computers has been explored
very thoroughly by Professor Knight. The carefully analyzed data he
presents reflects not just the execution times for arithmetic operations
and cost of minimum of recommended configurations. He includes memory capacity
effects, input-output overlap experienced, and special functional capabilities.
The best statistical fit obtained corresponds to a performance proportional
to the square of the cost at any technological level. This result very
effectively supports the often invoked "Grosch's law." Utilizing this
analysis, one can argue that if twice the amount of hardware were exploited
in a single systems, one could expect to obtain four times the performance
The only difficulty is involved in knowing how to exploit this additional
hardware. At any point in time it is difficult to foresee how the previous
bottlenecks in a sequential computer will be effectively overcome.
If it were easy they would not have been left as bottlenecks.
It is true by historical example that the successive obstacles have been
hurdled, so it is appropriate to quote the Rev. Adam Clayton Powell --
"Keep the faith, baby!" If alternative one decided to improve the
performance by putting two processors side by side with shared memory,
one would find approximately 2.2 times as much hardware. The additional
two tenth in hardware accomplishes the crossbar switching for the sharing.
The resulting performance achieved would be about 1.8.
The latter figure is derived from the assumption that each processor
utilized half of the memories about half of the time.
The resulting memory conflicts in the shared system would extend the
execution of one to two operations by one quarter of the time.
The net result is a price performance degradation to 0.8 rather than an
improvement to 2.0 for the single larger processor.

Comparative analysis with associative processors is far less easy and obvious.
Under certain conditions of regular formats there is a fairly direct approach.
Consider an associative processor designed for pattern recognition, in which
decisions within individual elements are forwarded to some set of other
elements. In the associative processor design the receiving elements
would have a set of source addresses which recognize by associative techniques
whether or not it is to receive the decision of the currently declaring
declaring element. To make a corresponding special purpose non-associative
processor one would consider a receiving element and its source addresses as
an instruction, with binary decisions maintained in registers. Considering
the use of thin film memory, an associative cycle would be longer than a
non-destructive read cycle. In such a technology the special purpose
non-associative processor can be expected to take about one-fourth as
many memory cycles as the associative version and only about one-sixth
the time. These figures were computed on the full recognition task,
with somewhat differing ratios in each phase. No blanket claim is
intended here, but rather that each requirement should be investigated
from both approaches.

END of paper.

See also
CRI UNICOS has a useful command amlaw(1) which does simple
number crunching on Amdahl's Law.
------------
On a CRI system type: `man amlaw`.

1 1
S = lim ------------ = ---
P->oo 1-s s
s + ---
P

S = speedup which can be achieved with P processors
s (small sigma) = proportion of a calculation which is serial
1-s = parallelizable portion

Speedup_overall
= 1 / ((1 - Fraction_enhanced) + (Fraction_enhanced/Speedup_enhanced))

Articles: comp.parallel
Administrative: eug...@cse.ucsc.edu.SNIP
Archive: http://groups.google.com/groups?hl=en&group=comp.parallel

--

Eugene Miya

unread,

Nov 2, 2006, 7:21:13 PM11/2/06

to

In article <1162511337.1...@h48g2000cwc.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>Eh, I'm not trying to sell domes here.

You are attempting an intellectual technique to justify "free" thinking.
You aren't the first, and nor will you be the last.
Fuller was also popular for this when I was in high school.
He had a few good ideas. But he also had good techicians.

Del is pretty sharp. He and I may not always disagree, but
we do agree here. von Neumann was not always right: he was somewhat
wrong about Fortran and things like word processing (another waste),
but he is not shrugged lightly. He was one of the smartest
mathematicians who ever lived. He would never have bothered with
something lik Usenet.

And yes I considered the use of the words agree or disagree, and I
definitely chose the latter. ;^)

--

Eugene Miya

unread,

Nov 2, 2006, 7:29:36 PM11/2/06

to

>> Hmmm, no.
>> I've worked on several benchmark teams and have to sweat various issues.
>> If you want brutal, non-trivial slow problems outside cryptanalysis,
>> I can think of many. A favorite might be Golomb rulers.
>> No floating point. Can fit in memory.

In article <1162514526....@h54g2000cwb.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>Golomb rulers are already found with massively parallel computation. I
>guess I'm not sure what you mean.

Not very far. Not many. The parallelism is quite bounded.

>> On the contrary what von Neumann wrot was a very good simple system for
>> his time.
>
>Fine, it's unfair to blame him like that.
>
>Some people definitely have a higher opinion of him than I do though.

Well few of us will ever a book like his game theory book, some of his
papers, his work at the Manhattan Project and on the H-bomb, etc.

Much less influence on computers. Others attempted too complex designs.
You don't hear much about them.

>> I recall the special APL character set
>> (the Culler-Fried keyboard was little better and its was QWERTY).
>> I extracted from an officemate involved with the CDC Star-100 that the
>> direction of evaluation was a poor choice.
>
>But its influence died off too. Why?

APL or the Star?

The Star probably died most because it was a system which lacked balance.
Its vector pipeline could not compensate for the poor scalar performance.
It took too long to learn how to use at the compiler level, and
its software came too late that it destroyed the careers and psyches of
people assigned to work with it during the Cold War. And that's an over
simple way of saying that. I never had to use the machine. I know and
have interviewed people who did.

>> Part of the problem which kills simple languages like these are the
>> kinds of numeric applications which have volumetric and border (edges,
>> faces, vertices, and hyper-structures (4D and higher Ds) which are
>> exception cases which then go to irregular geometries, etc.
>
>That can be turned into arrays too.

Oh, they are arrays, no question, but few outside certain communities
can appreciate good gather-scatter hardware and operations.

Anyways. I'm outta here.

--

BDH

unread,

Nov 2, 2006, 8:38:26 PM11/2/06

to

> Do VLIW type techniques (plausibly) scale

I say sure. SIMD can help.

> Isn't communication and memory infrastructure still the
> main issue/concern?

Yes. For everything. But it's not so hard to devise different
addressing schemes that can fix that.

Richard

unread,

Nov 2, 2006, 10:38:14 PM11/2/06

to

[Please do not mail me a copy of your followup]

eug...@cse.ucsc.edu (Eugene Miya) spake the secret code

<454a981f$1@darkstar> thusly:

>> link to the paper?
>
>Just easier for me to read it in and cut.

Thanks! It was a nice read.

BDH

unread,

Nov 3, 2006, 4:31:13 AM11/3/06

to

> Del is pretty sharp. He and I may not always disagree, but
> we do agree here. von Neumann was not always right: he was somewhat
> wrong about Fortran and things like word processing (another waste),
> but he is not shrugged lightly. He was one of the smartest
> mathematicians who ever lived. He would never have bothered with
> something lik Usenet.

So if I was smart enough to handle the position I want, I wouldn't be
on Usenet? Ouch!

Terje Mathisen

unread,

Nov 3, 2006, 4:28:05 AM11/3/06

to

Eugene Miya wrote:
> In article <ch2q14-...@osl016lin.hda.hydro.com>,
> Terje Mathisen <terje.m...@hda.hydro.com> wrote:
>> Feynman was a "wizard of the highest order". Even after he had figured
>> something out and explained it, there was no way for mere mortals to see
>> how they could ever have done the same.
>
> Oh that was you who said that.
>
> Hmmmm what can I say?
> Caltech collects geniuses like Feynman. He was clearly unique, but so
> are and were a fair number of other people who went to, worked at, or
> were part of Tech. I could never have gotten in. I am not smart enough.

Right. Pull another one.

There's a Churchill quote:
“Mr. Attlee is a very modest man. Indeed he has a lot to be modest about.”

Don't try to impersonate Mr Attlee!

BTW, my younger brother Knut got his PhD (in industrial chemistry) at
Caltech, at the same time I was working for Novell in Utah. According to
him it was a nice place.

> They may have also passed their their prime (friends hope not).
> I took a class from a JPL co-worker (hard to use that word as he was so
> much more advanced that I was: Blinn). Most of you will have never
> heard Jim because this isn't comp.graphics. He had a hobby of
> collecting Pythagorean theorem proofs (he's got over 100).
> He's also a master of 4x4 matrix multiplication.
>
> That says nothing of non-computational fields.
> You should visit Tech for half a day the next time you come to
> California Terje.

OK, I'll try to if I can figure out an excuse to travel so far south.

My last US trip was a 3 day in&out to Seattle a couple of weeks ago,
with no time for sightseeing. :-(

BDH

unread,

Nov 3, 2006, 4:42:48 AM11/3/06

to

> >Golomb rulers are already found with massively parallel computation. I
> >guess I'm not sure what you mean.
>
> Not very far. Not many. The parallelism is quite bounded.

They're found by exhaustive searching of a reduced search space. It's
easy to partition that for massive parallelism.

> >Some people definitely have a higher opinion of him than I do though.
>
> Well few of us will ever a book like his game theory book, some of his
> papers, his work at the Manhattan Project and on the H-bomb, etc.

Hey, I have time...

...or maybe I don't. Probably don't. =(

> >But its influence died off too. Why?
>
> APL or the Star?
>
> The Star probably died most because it was a system which lacked balance.
> Its vector pipeline could not compensate for the poor scalar performance.
> It took too long to learn how to use at the compiler level, and
> its software came too late that it destroyed the careers and psyches of
> people assigned to work with it during the Cold War. And that's an over
> simple way of saying that. I never had to use the machine. I know and
> have interviewed people who did.

I was thinking of APL, but thanks for that.

Thinking Machines went under too, but we still have Microsoft.

I suspect that in another 10 years people will still be assuming
incorrect things about compilers.

> >> Part of the problem which kills simple languages like these are the
> >> kinds of numeric applications which have volumetric and border (edges,
> >> faces, vertices, and hyper-structures (4D and higher Ds) which are
> >> exception cases which then go to irregular geometries, etc.
> >
> >That can be turned into arrays too.
>
> Oh, they are arrays, no question, but few outside certain communities
> can appreciate good gather-scatter hardware and operations.

I guess my question was, is there something about APL that is
incompatible with either someone educated a certain way or inherently
incompatible with the majority of programmers, or was it just
happenstance of history?

> Anyways. I'm outta here.

Bye.

BDH

unread,

Nov 3, 2006, 4:50:56 AM11/3/06

to

> The Star probably died most because it was a system which lacked balance.
> Its vector pipeline could not compensate for the poor scalar performance.
> It took too long to learn how to use at the compiler level, and
> its software came too late that it destroyed the careers and psyches of
> people assigned to work with it during the Cold War. And that's an over
> simple way of saying that. I never had to use the machine. I know and
> have interviewed people who did.

CDC, I guess it was run in a smart way. But some kinds of useful smart
need tempering to keep them away from insanity. I guess maybe they were
missing that.

Of course, usually companies get much less intelligent insanity.

Eugene Miya

unread,

Nov 3, 2006, 10:31:47 AM11/3/06

to

eug...@cse.ucsc.edu (Eugene Miya) spake the secret code
><454a981f$1@darkstar> thusly:

>>Just easier for me to read it in and cut.

In article <eiedj6$ld0$1...@news.xmission.com>, Richard <> wrote:
>Thanks! It was a nice read.

So what has happened in the mean time is that hypercubes came and went,
"MPP" architectures came and went, "scaled speed up" ... etc.

What people miss from not living in the Santa Clara Valley, I should say
computer people is that he lives here. It's possible to run into him in
many different ways. One thing which was here and amazing (before
Fry's) was Computer Literacy Bookshop, and he gave a nice opening talk
to his San Jose store (not taped). And the Law discussion came up.
And he was asked a question how he knew this: simulation? etc.
And he said, "No. Just thinking."

--

Eugene Miya

unread,

Nov 3, 2006, 10:38:18 AM11/3/06

to

In article <amer14-...@osl016lin.hda.hydro.com>,

Terje Mathisen <terje.m...@hda.hydro.com> wrote:
>>> Feynman was a "wizard of the highest order".

>> Caltech collects geniuses like Feynman. He was clearly unique, but so
>

>Right. Pull another one.

Well my friend who married Pauling daughter (as well as Pauling) are 2.
Many Divisions have world famous people. I have no idea what some of my
friends who teach and do research there are noted for. We just recreate
together, have common friends.

>There's a Churchill quote:...

>Don't try to impersonate Mr Attlee!
>
>BTW, my younger brother Knut got his PhD (in industrial chemistry) at
>Caltech, at the same time I was working for Novell in Utah. According to
>him it was a nice place.

It's an intense place. Depends on the person and the field.
If you have been there, no need to visit. It's where the Einstein photo
of him on the bike is taken.

>> You should visit Tech for half a day the next time you come to
>> California Terje.
>
>OK, I'll try to if I can figure out an excuse to travel so far south.

Naw if you visited your bro, you have seen enough.

>My last US trip was a 3 day in&out to Seattle a couple of weeks ago,
>with no time for sightseeing. :-(

Yeah I hate those in and out trips. Travel is a thread over in
alt.folklore.computers right now.

--

Eugene Miya

unread,

Nov 3, 2006, 10:48:01 AM11/3/06

to

>> we do agree here. von Neumann

In article <1162546273.1...@b28g2000cwb.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>So if I was smart enough to handle the position I want, I wouldn't be
>on Usenet? Ouch!

So do something. Accomplish something.

Well, in the early days of Usenet, when far fewer people were here, in
fact guys like Alan Smith (Mr. cache memory) and John Hennessy (now
Stanford's President) were here (posted) or lurked. Others had
assistants lurk, post and troll. They looked for talented people to hire.
Discussion quality was high (it's typically higher in the comp.* groups).

Then after a while as the net grew (I'll use a graphics example) some of
these people had to tell their people NOT to reveal corporate info
on the net. But its a two way street. Now many people can make a
living off the net.

I am lucky in this regard to a point. My job was to watch how some of
the money was spent by the guys whose mail and news server you are using.
So I have a pretty idea what they doing scanning your email and your
posts (and mine, but I won't get a gmail acct.). That's fine with me.

If you are young and sharp, build a machine, write some code, write a
paper, etc. Do something to benefit the world or the net or something.

--

Eugene Miya

unread,

Nov 3, 2006, 10:55:14 AM11/3/06

to

In article <1162546968.2...@i42g2000cwa.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>> >Golomb rulers are already found with massively parallel computation. I
>

>They're found by exhaustive searching of a reduced search space. It's
>easy to partition that for massive parallelism.

Then I suggest go find the next set and do it.

>> >Some people definitely have a higher opinion of him than I do though.

>> von Neumann

>
>Hey, I have time...
>...or maybe I don't. Probably don't. =(

Write the next definitive book on game theory or some new topic.

>I was thinking of APL, but thanks for that.

I am not the APL expert you need. Maybe track down Bernecky.
Or maybe not, he talks to me because I attempt to maintain my
parallelism biblio.

>Thinking Machines went under too, but we still have Microsoft.

Sure, and Tom Blank, Marpar's architect works for them.

>I suspect that in another 10 years people will still be assuming
>incorrect things about compilers.

Oh I suspect if you think that, then, they probably already do now.
If you have a better idea for a parallelizing compiler: do it.
Do better than Kuck or Kennedy. I know they would like to be bettered.
But keep Amdahl in the brain.

>I guess my question was, is there something about APL that is
>incompatible with either someone educated a certain way or inherently
>incompatible with the majority of programmers, or was it just
>happenstance of history?

My guess is all of the above.
Compatibility works against you. You would do better not asking here as
much as comp.lang.apl. The document/typesetting guys have similar
problems and they are only beginning to learn that. The history of
computing is largely about neglect and ignorance.

>> Anyways. I'm outta here.
>Bye.

Briefly back.
--

Eugene Miya

unread,

Nov 3, 2006, 11:02:48 AM11/3/06

to

>> The Star -100

In article <1162547455....@k70g2000cwa.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>CDC, I guess it was run in a smart way. But some kinds of useful smart
>need tempering to keep them away from insanity. I guess maybe they were
>missing that.
>Of course, usually companies get much less intelligent insanity.

Well CDC began as a very focused company for a number of public and a
number of secret reasons (as in: we may never know, because we have
no need to know). The problem began when as it was growing it
hired a bunch of IBM business guys who apparently decided to expand
their market and they also took their money to attempt to diversity into
other more commodity areas, to change the general world, etc.
There were business decisions involved.
They lost a number of their best people.
They were doing hard technical things.

There was a CDC lecture back in June, but I skipped off on an annual trip to
Alaska to help some friends. What I heard was that it was a marketing
love fest w/o any views of technical history. I have not viewed the tape.
In fact when there was an Alto lecture also a few years back but in June
I have not gone to view that Alto lecture (I just go eat salad with some
of my PARC buddies [I'll see some of them next weekend anyways]).

--

Del Cecchi

unread,

Nov 3, 2006, 7:52:02 PM11/3/06

to

"Eugene Miya" <eug...@cse.ucsc.edu> wrote in message
news:454b7638$1@darkstar...

They lost seymour cray. They weren't content in their niche. The market
changed and they didn't adapt. Bill Norris got the big head disease.
Pick your reason.

del
> --

BDH

unread,

Nov 4, 2006, 2:06:30 AM11/4/06

to

> If you are young and sharp, build a machine, write some code, write a
> paper, etc. Do something to benefit the world or the net or something.

That's a good position, and I guess it's sad to say I don't care half
as much today as I did.

What, you have something in mind? Video compression?

BDH

unread,

Nov 4, 2006, 2:07:44 AM11/4/06

to

> ** Merge ideas:
> Yesterday I attended a seminar given by Bill Dally at Stanford.
> The series is an amazing collection of people (Bill, whom I had dinner
> with afterward, is now the CS chair, and we synced on friends in common).
> I'm going to let the cat out of the bag for people outside the Santa
> Clara valley. The lecture is "taped" and put on the web for the
> graduate seminar series:
> ee380.stanford.edu
> He'll tell you about parallelism, and he was worked on respectable
> architectures. I also worked with one of the benchmarks he talked about
> in the talk.

Good find.

Eugene Miya

unread,

Nov 6, 2006, 11:27:09 AM11/6/06

to

In article <1162623990....@k70g2000cwa.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>That's a good position, and I guess it's sad to say I don't care half
>as much today as I did.
>
>What, you have something in mind? Video compression?

What? For you? That's your mind. You have to decide.

I have my own responsbilities.
--

Eugene Miya

unread,

Nov 6, 2006, 11:34:48 AM11/6/06

to

>>>> The Star -100

In article <4r26heF...@individual.net>,

Del Cecchi <delcecchi...@gmail.com> wrote:
>They lost seymour cray. They weren't content in their niche. The market
>changed and they didn't adapt. Bill Norris got the big head disease.
>Pick your reason.

Basically.

>They lost seymour cray.

This seems to be the leading theory of technical types and it has
credence when the world lost him.

>They weren't content in their niche.

This seems to be a leading theory of one set of business types.
See 4 below. But this reason is a bit abstract.

>The market changed and they didn't adapt.

This seems to be a leading theory of another set of business types.
The evidence here comes from ETA Systems, NSC, and a number of other
other spinoffs. "Carried forward by momentum."

>Bill Norris got the big head disease.

This seems to be a leading theory of a 3rd set of business types.
Favorable reason for some in the PLATO crowd.

--

Nick Maclaren

unread,

Nov 6, 2006, 12:47:54 PM11/6/06

to

|> >The market changed and they didn't adapt.
|>
|> This seems to be a leading theory of another set of business types.
|> The evidence here comes from ETA Systems, NSC, and a number of other
|> other spinoffs. "Carried forward by momentum."

It's also a theory among many of the technical types who have experience
of more than just the sort of HPC that traditionally ran on CDCs.
Many of us feel that Seymour Cray succeeded because his undoubted
design ability compensated for his understanding of the market and
its direction. We could see the market for 'his' kind of system
shrinking steadily, even in absolute terms. And that was true even
in 1970.

Regards,
Nick Maclaren.

Eugene Miya

unread,

Nov 6, 2006, 4:04:02 PM11/6/06

to

CDC

Cecchi <delcecchi...@gmail.com> wrote:
>|> >They lost seymour cray.

In article <454f7238$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
article <4r26heF...@individual.net>,

>|> This seems to be the leading theory of technical types and it has
>|> credence when the world lost him.
>|>

>|> This seems to be a leading theory of another set of business types.
>|> The evidence here comes from ETA Systems, NSC, and a number of other
>|> other spinoffs. "Carried forward by momentum."

In article <einsga$84l$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>It's also a theory among many of the technical types who have experience
>of more than just the sort of HPC that traditionally ran on CDCs.
>Many of us feel that Seymour Cray succeeded because his undoubted
>design ability compensated for his understanding of the market and
>its direction. We could see the market for 'his' kind of system
>shrinking steadily, even in absolute terms. And that was true even
>in 1970.

Shrinking? Market? ....Right. As Bill Cosby would say.

He was not a designer of microprocessors.
Ability: No number of six foot high jumpers equals a seven jumper.

Gombosi is gone and David is gone. As well as quite a few other Crayons.
I suspect that you and the vast majority of lurkers/posters don't really
know what his market was. Oh, we can use words like cryptanalysis or
nuclear bomb codes and the like which were public and others would argue
for the civilian applications in oil and structural analysis were a
factor. They didn't cover development costs. I suspect that many in
those serious markets won't ever really reveal just what he did for them.
Nor ERA nor CDC.

--

BDH

unread,

Nov 6, 2006, 8:28:21 PM11/6/06

to

> What? For you? That's your mind. You have to decide.
>
> I have my own responsbilities.

Heh, no, I meant that more generally. More of "what do you think of as
needing improvement and having an actual chance of changing."

Jan Vorbrüggen

unread,

Nov 7, 2006, 3:01:47 AM11/7/06

to

> He was not a designer of microprocessors.

Has anybody claimed that? Even today, there is a place for more than The One
Architecture, and a place for lots of different systems. But for quite some
time there hasn't been a place for the original Cray approach.

> Ability: No number of six foot high jumpers equals a seven jumper.

For sure. But the goal function has changed: Everybody can jump seven feet
high nowadays, but doing it carrying 50 kg of baggage is the new game in town.

> I suspect that you and the vast majority of lurkers/posters don't really
> know what his market was. Oh, we can use words like cryptanalysis or
> nuclear bomb codes and the like which were public and others would argue
> for the civilian applications in oil and structural analysis were a
> factor. They didn't cover development costs. I suspect that many in
> those serious markets won't ever really reveal just what he did for them.
> Nor ERA nor CDC.

Twaddle. Too much of a conspiracy theory. And do I read "serious" = "national
security"!?

Jan

Nick Maclaren

unread,

Nov 7, 2006, 4:57:26 AM11/7/06

to

In article <4raspqF...@mid.individual.net>,

=?ISO-8859-1?Q?Jan_Vorbr=FCggen?= <jvorbr...@not-mediasec.de> writes:
|> > He was not a designer of microprocessors.
|>
|> Has anybody claimed that? Even today, there is a place for more than The One
|> Architecture, and a place for lots of different systems. But for quite some
|> time there hasn't been a place for the original Cray approach.

Or rather, the place has shrunk so much that it is no longer large enough
to contain a viable original Cray approach. The place still exists, even
if it is solely in the minds of Organisations That We Know Nothing About.

|> Twaddle. Too much of a conspiracy theory. And do I read "serious" = "national
|> security"!?

With the emphasis on the quotes in "national security".

Regards,
Nick Maclaren.

Eugene Miya

unread,

Nov 7, 2006, 12:58:21 PM11/7/06

to

In article <1162862901....@m73g2000cwd.googlegroups.com>,

BDH <bha...@gmail.com> wrote:
>Heh, no, I meant that more generally. More of "what do you think of as
>needing improvement and having an actual chance of changing."

Well, there are some very difficult challenges on both the architectural
and device level. It's if one wants to do more of the same as most IBM
customers want or something really radical. Bold might not work and is
a considerable investment. I am not the person to do the work, but like
Google, I observe for funders. But I don't observe for hardware.

For me the next 2 hurtles I have to watch are this weekend.
One friend is leading a session on advances in ideas he has in optics.
That friend is assigned to give a public talk in Feb. (hurtle #2).
Another set of people are looking at rethinking architecture and
languages this weekend. I am not certain who's leading that session,
but I know Knuth is being tapped. Now I know that MIX and MMIX aren't popular.
And he's not an architect either. I think this session might be regard
as some what amateurish, but then Wozniak was the original local at this
meeting (others know him far better than me, and he no longer attends,
but many of his friends do).

On the wider architectural scale, I'm more curious about how dynamic
data flow might have worked. Dally's 380 talk had a tiny bit of that.

--