Al Grant, ag...@cam.ac.uk, wrote on Mon, 22 May 2000:
> While reading some old files I found a description (c.1988)
> of the TF-1, a 32768-processor MIMD machine with teraflop
> performance that IBM had designed. Did it ever get built?
We need to ask the living legends in news:comp.sys.super
and news:alt.folklore.computers about such a dinosaur.
TF-1 must have been the TeraFlop One. Mebbe they were made
to order for a three-letter agency like CIA, NSA or NRO.
Whoever has a TF-1 (perhaps intercepting and reading this post),
please run http://www.geocities.com/mentifex/mind4th.html on it.
> There are few details of the CPU type but it does say
> it has 64 64-bit registers and a 20ns cycle time
> (and it sounds like a Harvard architecture).
It sounds too big for a crazed Harvard man to send through the mail.
No, and before you ask, I'm not related to Hugh Grant either.
> > While reading some old files I found a description (c.1988)
> > of the TF-1, a 32768-processor MIMD machine with teraflop
> > performance that IBM had designed. Did it ever get built?
>
> We need to ask the living legends in news:comp.sys.super
> and news:alt.folklore.computers about such a dinosaur.
Fair enough. I was particularly interested in the CPU design
though. It predates the release of POWER (1990 IIRC).
Were IBM using early POWERs in custom computers or
was this something quite different?
> Whoever has a TF-1 (perhaps intercepting and reading this post),
> please run http://www.geocities.com/mentifex/mind4th.html on it.
I can't understand this at all.
> > Whoever has a TF-1 (perhaps intercepting and reading this post),
> > please run http://www.geocities.com/mentifex/mind4th.html on it.
>
> I can't understand this at all.
Mentifex is a lunatic. Actually, he's crazy in a charming, British
sort of fashion.
-- g
> Fair enough. I was particularly interested in the CPU design
> though. It predates the release of POWER (1990 IIRC).
> Were IBM using early POWERs in custom computers or
> was this something quite different?
RIOS predated POWERPC. RIOS was used in a number of things. ROMP
predated RIOS (and was targeted for a number of things). I have a little
plastic case on my desk with six chips embedded in it with the title IBM
AWD Austin & GTD Burlington, "POWER (aka RIOS) Architecture", 150 million
ops, 60 million flops, 7 million transistors.
One thing I consider a big difference between ROMP/RIOS & POWERPC ... was
POWERPC allowed for cache consistency (aka SMP). all the ROMP/RIOS
designs (that I know of) had no provisions for cache consistency
operation (ROMP/RIOS would be involved in various parallel machine
designs ... in part because they couldn't be used in cache consistency
SMP machines ... aka if all you have is a hammer, then everything is a
nail?).
random references:
http://www.garlic.com/~lynn/2000.html#49
http://www.garlic.com/~lynn/99.html#129
--
Anne & Lynn Wheeler | ly...@adcomsys.net, ly...@garlic.com
http://www.garlic.com/~lynn/ http://www.adcomsys.net/lynn/
FWIU, both machines were meant for research experimentation only,
and never meant to be sold. Perhaps it was the GF11 that was supposed
to compute the mass of the proton from first principles, or maybe it
was the TF1.
As to the technology, the GF11 was a massively parallel bunch of
ROMP processors. I believe the POWER was out by the time the TF1
was being conceived, though I'm not sure. Clearly it wasn't PowerPC,
though.
I seem to remember hearing that the GF11 was indeed built, and did
predict proton mass within reasonable experimental limits, though
this is indeed fuzzy memory.
If there's any value to this reply, it's tying the TF1 to the GF11 -
that at least gives you another search point. It's also worth knowing
that to the best of my knowledge, both were one-of-a-kind research
machines, and probably spent their entire service lifetimes in
Yorktown Heights, New York.
Dale Pontius
NOT speaking for IBM
I can't help but think he must have some good ideas --- if only I could
understand them!
After all, old Bucky Fuller sounded twice as crazy as Mentifex . . .
--
<kra...@pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either. :)
> If there's any value to this reply, it's tying the TF1 to the GF11 -
> that at least gives you another search point. It's also worth knowing
> that to the best of my knowledge, both were one-of-a-kind research
> machines, and probably spend their entire service lifetimes in
> Yorktown Heights, New York.
I think that some of this was sponsored out of a lab in Kingston that
started out with a whole load of FPS boxes (attached to a couple IBM
mainframes) ... working on various chemical & atomic calculations, I
vaguely remember them announcing along the way various calculation
thresholds in the gflops range in the mid to late '80s. Then there
were upgrades with FPS boxes in combination with IBM 3090s with vector
facility.
http://www.research.ibm.com/compsci/arch_os/abs.html
with respect to RP3 comment:
http://www.garlic.com/~lynn/99.html#136a
my wife had been assigned the task to review RP3 to decide whether it
should continue to receive funding.
And, of course, the 801 Project predated (and begat) them all.
- Ken Seefried
CTO, DigitalMoJo
Information Security Consulting, Training & Management
@article
{
RP3-experience,
author = "Ray Brant and Hung-Yang Chang and Bryan Rosenburg",
title = "Experience Developing the RP3 Operating System",
jornal = "Computing Systems",
volume = 4, number = 3, pages = "183--216",
year = 1991, month = "Summer",
snote = "++good discussion of mistakes made + lessons learned",
}
--
-- Jonathan Thornburg <jth...@thp.univie.ac.at>
http://www.thp.univie.ac.at/~jthorn/home.html
Universitaet Wien (Vienna, Austria) / Institut fuer Theoretische Physik
"There isn't a security vulnerability in Outlook involved in this at all."
-- Scott Culp (Microsoft) providing "spin" on the "I Love You" mail virus
>In article <8gbin...@news2.newsguy.com>,
>Greg Lindahl <lin...@pbm.com> wrote:
>>"Al Grant" <ag...@cam.ac.uk> writes:
>>> > Whoever has a TF-1 (perhaps intercepting and reading this post),
>>> > please run http://www.geocities.com/mentifex/mind4th.html on it.
>>>
>>> I can't understand this at all.
>>
>>Mentifex is a lunatic. Actually, he's crazy in a charming, British
>>sort of fashion.
>
>I can't help but think he must have some good ideas --- if only I could
>understand them!
"I don't understand him at all! He must be brilliant!" ;)
Not quite. There were quite a few SMP designs that did not rely
on cache coherence. It is pretty foul to sort out a typical
unstructured kernel, but it has been done - the work is almost
identical to enabling them for distributed memory systems. Most
of them used a simpler, special-purpose kernel. And ordinary
processes and inter-process communication are no big deal.
The thing that killed them was automatic parallelisation. You
can't do a great deal with languages like Fortran (though HPF
was a valiant attempt), and C is simply unspeakable. So the
compilers had to treat them like distributed memory machines,
with all the consequential inefficiencies.
My view is that this problem turned out to be insuperable only
because it was tackled by the 'cheap and cheerful brigade',
who failed to realise that most customers want a complete,
ready-to-use system and don't want to hack their own and
imported programs to hell and back again. I am pretty certain
that the approach COULD have been made to work.
It is interesting that there have been claims on this group
that very similar technologies HAVE been made to work (e.g.
efficient, general-purpose HPF and OpenMP for distributed
memory systems.) However, all of the GENERAL-PURPOSE systems
that I have heard about are not yet announced ....
I would say it is more like "If you insist on holding a
screwdriver by the blade, then even screws will look like
nails ...."
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679
"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:8ggd3s$8o8$1...@pegasus.csx.cam.ac.uk...
>
> In article <uaehi3...@mail.adcomsys.net>, Anne & Lynn Wheeler
<ly...@adcomsys.net> writes:
> |>
> |> One thing I consider big difference between ROMP/RIOS & POWERPC ... was
> |> POWERPC allowed for cache consistency (aka SMP). all the ROMP/RIOS
> |> designs (that I know of) had no provisions for cache consistency
> |> operation (ROMP/RIOS would be involved in various parallel machine
> |> designs ... in part because they couldn't be used in cache consistency
> |> SMP machines ... aka if all you have is a hammer, then everything is a
> |> nail?).
>
> Not quite. There were quite a few SMP designs that did not rely
> on cache coherence. It is pretty foul to sort out a typical
> unstructured kernel, but has been done - the work is almost
> identical to enabling them for distributed memory systems. Most
> of them used a simpler, special-purpose kernel. And ordinary
> processes and inter-process communication are no big deal.
>
> The thing that killed them was automatic parallelisation. You
> can't do a great deal with languages like Fortran (though HPF
> was a valiant attempt), and C is simply unspeakable. So the
> compilers had to treat them like distributed memory machines,
> with all the consequential inefficiencies.
Isn't the Tera machine another approach to this? By eliminating the data
cache, they eliminate any cache coherency issues and can look like an SMP to
the software. Of course, they need the multithreading to handle the latency
issues that come with no data cache, but you can't have everything :-(.
--
- Stephen Fuld
It was the GF11. That was built, and did compute that mass from
first principles, for the first time. The result would have been
a *lot* more interesting if the computation had been correct, but
the answer had not agreed with experiment. But it did.
> As to the technology, the GF11 was a massively parallel bunch of
> ROMP processors. I believe the POWER was out by the time the TF1
> was being conceived, though I'm not sure. Clearly it wasn't PowerPC,
> though.
Nope, that was RP3, done at about the same time. And indeed, it
was clobbered by POWER, which was so much faster than the ROMPs
that it blew RP3 away.
GF-11 was SIMD: A whale of a lot of ALUs all pounding at the same
time, no conventional processors at the nodes.
> I seem to remember hearing that the GF11 was indeed built, and did
> predict proton mass within reasonable experimental limits, though
> this is indeed fuzzy memory.
Yup, see above.
> If there's any value to this reply, it's tying the TF1 to the GF11 -
> that at least gives you another search point. It's also worth knowing
> that to the best of my knowledge, both were one-of-a-kind research
> machines, and probably spend their entire service lifetimes in
> Yorktown Heights, New York.
All correct.
If my own memory serves, the TF1 was intended to be a MIMD
follow-on to the GF-11, but never got off the drawing board (so
to speak).
Now, of course, we have Blue Gene in the GF11 tradition of "one
problem, one computer architecture."
Greg Pfister
<not my employer's opinion>
Steve...
In a way, but it takes the same approach of providing data coherence.
The systems that I am talking about did NOT provide a coherent view
of data to different threads, and it was the software's business to
stay out of trouble.
The point is that non-coherent SMP hardware is simple to design,
cheap to make, and very scalable - in fact, it is nothing more than
a distributed data system with global addressing. The (actual but
inefficient) implementations of HPF and the (efficient but purported)
implementations of OpenMP for distributed memory systems are exactly
the same technology, but with the addressing itself done in software.
Data coherent SMP has none of those advantages, but is much easier
to program for.
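
[Editorial sketch, not from the thread.] The phrase "a distributed data
system with global addressing", with the addressing done in software, can
be pictured roughly as below. This is only a toy model under stated
assumptions: the class name GlobalMemory, the word-granular layout
(global index = node * words_per_node + offset), and the use of in-process
vectors to stand in for per-node memory are all invented here and describe
no real machine. It assumes words_per_node is greater than zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: each "node" owns a private block of words, and a global word
// index is translated in software to (owning node, local offset).
// Hypothetical layout: global = node * words_per_node + offset.
class GlobalMemory {
public:
    // Precondition (assumed): words_per_node > 0.
    GlobalMemory(std::size_t nodes, std::size_t words_per_node)
        : words_per_node_(words_per_node),
          local_(nodes, std::vector<std::int64_t>(words_per_node, 0)) {}

    // Which node's local memory holds this global word.
    std::size_t owner(std::size_t global) const { return global / words_per_node_; }

    // Position of the word inside the owner's local memory.
    std::size_t offset(std::size_t global) const { return global % words_per_node_; }

    // at() throws std::out_of_range for an address beyond the last node.
    void store(std::size_t global, std::int64_t value) {
        local_.at(owner(global)).at(offset(global)) = value;
    }

    std::int64_t load(std::size_t global) const {
        return local_.at(owner(global)).at(offset(global));
    }

private:
    std::size_t words_per_node_;
    std::vector<std::vector<std::int64_t>> local_;
};
```

With 4 nodes of 100 words each, global word 250 lands on node 2 at offset
50. Nothing here keeps copies consistent: as the post says, on such
hardware it is the software's business to stay out of trouble.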
funny you should mention that ... the 16-way (in the following
reference on 801, circa '76)
http://www.garlic.com/~lynn/95.html#11
http://www.garlic.com/~lynn/98.html#40
was big mainframes crammed into a box w/o cross-cache signaling for
consistency. we did some sleight of hand with virtual memory and
simulated messaging to support a broad range of applications. the
business problem was getting people that were accustomed to very strong
memory consistency to put it out as a standard product.
when my wife & I were doing cluster scaleup for RIOS ... we didn't
make that mistake ... but created a couple of other problems. the
interconnect fabric we were using ... not only supported
processor-to-processor messaging ... but also processor-to-device, so
instead of confining ourselves to only doing processor to processor
message solutions ... we also looked at using the same fabric in other
ways for device interconnect also, misc. ref:
http://www.garlic.com/~lynn/96.html#15
--
Anne & Lynn Wheeler | ly...@garlic.com, finger for pgp key
http://www.garlic.com/~lynn/
Ah, so you started the late 1970s boom in that technology! I still
think that it is a pity that it never took off - there were quite a
lot of boxes that hit the marketplace, mostly produced by small
companies that later went belly-up. But the technology worked, and
was cheap and simple.
I shall award a small prize to the first techno-peasant who I hear
of as being faced with a core dump of an 8-processor IA-64 system
with each processor running in a different mode or interrupt state
at the point of failure and suspicions that the cause was a cache
coherency failure. I.e. I will buy the poor sucker a couple of
drinks to drown sorrows in :-)
IBM built such a machine in the early 1990's. It was a short-lived project
that I think went under the name "PowerParallel", and was available (IIRC)
by special order only. The two boxes that I knew of were at Florida State
and at Dupont.
The system had 4 POWER CPUs, each with their own "local" memory. A portion
of the address space accessed a physically shared memory array in the top
of the box. The POWER architecture has a segmented memory addressing scheme
that makes this approach very easy to do.
Of course there was no cache coherency -- that was up to the user.
The project was killed because IBM was losing a lot of money at the time.
IBM also claimed that customers were unhappy with the difficulty of
debugging a non-cache-coherent system.
Of course, all of this is from recollection, and I may not be remembering
all the details correctly.....
--
John McCalpin, "Dr. Bandwidth" -- a.k.a. "The Bandwidth Bigot"
jo...@mccalpin.com http://www.cs.virginia.edu/stream/
it was oak ... it had a feature where segments could be flagged as
cached or not-cached. storage that needed to be consistent was
allocated in a segment flagged non/never-cached. an issue was
developing programming paradigms that could use non-standard
hardware. work was done on using simulated message passing. they
needed a tweak to the processor (custom chips) even to do that bit
(chips were referred to as rios 0.9).
part of the issue was that we could demonstrate 16-way and quick path
to 128-way (and above) using standard existing chips & motherboards
with interconnect fabric (even demonstrate interconnect fabric on
existing workstations in racks ... but some additional manufacturing
cost savings by packaging standard mother boards for high-density rack
mounting ... racks had to have some cooling characteristics to achieve
the high-density packaging of the mother boards).
even tho oak had a lot of similarities with the mid-70s effort with
ibm mainframe ... points on the trade-off curves regarding commodity
standard parts were different between the mid-70s and late 80s/early
90s (in part, oak was stuck with non-standard chip part and memory
bus)
--
Actually, that project went by many names over time. It kept
getting proposed, killed, re-proposed with a new name, re-killed,
etc. The name it was RPQ'd under for a while was POWER4.
Other than POWER4, I forget the prior names, but the last name
was Live Oak (never was called just "oak"), named after a local
Texas tree that somebody thought was called "Live Oak" because it
was so tough that it was really, really hard to kill.
Nice thought, nice intention, but --
(a) Live Oak trees actually have that name because they keep
their leaves almost all year, shedding them only for a week or so
in winter. They aren't particularly hard to kill. The wood is
very tough, though. Make that *very* tough.
(b) That was the *last* name. It finally was wiped out by the SP.
> part of the issue was that we could demonstrate 16-way and quick path
> to 128-way (and above) using standard existing chips & motherboards
I never heard anybody say 128-way, and I was somewhat in the
middle of this. What I saw was that it was always intended as an
SMP replacement (yes, SMP, don't ask, some people's heads were in
a strange place then). The software bill, not the hardware, was
always the sticking point. Anyway, the SP became by definition
the "right" solution, and Live Oak was not resurrected again.
That wasn't any relation to Rio Bravo was it? All that would have taken
was a little unobtainium.....
del cecchi
> I never heard anybody say 128-way, and I was somewhat in the
> middle of this. What I saw was that it was always intended as an
> SMP replacement (yes, SMP, don't ask, some people's heads were in
> a strange place then). The software bill, not the hardware, was
> always the sticking point. Anyway, the SP became by definition
> the "right" solution, and Live Oak was not resurrected again.
you weren't at the following meeting:
http://www.garlic.com/~lynn/95.html#13
yes it was live oak ... just finger abbreviation.
--
a couple months earlier (than above mentioned meeting) ... there were
demos/presentations of the <4" (high) rack-mount motherboard housing
with a pair of fiber-channel connectors and special cooling channels
built into the rack that provided cooling to the motherboards (cool
air from the rack cooling channels in one side of the motherboard
housing, across the motherboard and out the other side to return path
cooling channel built into the other side of the rack).
Fabric interconnect was based on cascaded Ancor 64way non-blocking
switch (replicated fabric infrastructure to the pair of fiber-channel
connectors on each motherboard). Relatively few racks would house 128
processors with associated interconnect switches and associated disk
drives.
In article <3929...@news.victoria.tc.ca>
uj...@victoria.tc.ca (Arthur T. Murray) writes:
>> Organization: University of Cambridge, England
>[Cambridge, eh? Did you know Ludwig Wittgenstein?
>Oh, never mind. Eugene Miya most likely has personal
>recollections of old boy Ludwig, or else a direct
>channel to Wittgenstein through Shirley Maclaine.]
Ludwig Wittgenstein?... who was that other poster who was really into him?
I'm not much into philosophy, and don't know much about him, other than
I have these people who tell me that I should be reading him....
>Al Grant, ag...@cam.ac.uk, wrote on Mon, 22 May 2000:
>> While reading some old files I found a description (c.1988)
>> of the TF-1, a 32768-processor MIMD machine with teraflop
>> performance that IBM had designed. Did it ever get built?
>
>We need to ask the living legends in news:comp.sys.super
>and news:alt.folklore.computers about such a dinosaur.
The TF-1 was Monty Denneau's machine. I think I signed non-disclosure
about it. Never built. Not enough potential customers.
Bit serial SIMD type of thing maybe. I didn't have enough close contact
and I started moving away from that kind of thing. I do recall one meeting
where I signed an NDA.
>TF-1 must have been the TeraFlop One. Mebbe they were made
>to order for a three-letter agency like CIA, NSA or NRO.
Doubtful. Doubtful. No idea.
I do know that the FBI has not been able to program
their TMC CM. I hope to grab their CM.
>Whoever has a TF-1 (perhaps intercepting and reading this post),
>please run http://www.geocities.com/mentifex/mind4th.html on it.
>
>> There are few details of the CPU type but it does say
>> it has 64 64-bit registers and a 20ns cycle time
>> (and it sounds like a Harvard architecture).
>
>It sounds too big for a crazed Harvard man to send through the mail.
Cornell? Can't remember. I might be able to check for a location in
my biblio (I'm working at fixing mass storage damage on it at the moment).
I remember that Cornell had the IBM Mainframes to FPS boxes
(Ken Wilson's thing). That's probably where you get 64-bit registers.
The way to trip up IBM people is ask them questions like:
sizeof(long) = ?
sizeof(int) = ?
sizeof(int *) = ?
sizeof(double) = ?
sizeof(float) = ? # if 32 (then view with suspicion)
# They will argue: "compatibility"
# just smile and walk away
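
[Editorial sketch, not from the thread.] The questions above amount to
asking which data model a compiler uses. A minimal way to answer them on a
given machine might look like the following. The function names
data_model and print_sizes are invented for illustration; ILP32, LP64 and
LLP64 are the usual labels for the common models. Note that sizeof reports
bytes, so the post's "32" for float corresponds to a printed 4.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Classify the compiler's data model from the sizes (in bytes) of
// int, long and a pointer. Anything unusual is reported as "other".
std::string data_model() {
    const std::size_t i = sizeof(int);
    const std::size_t l = sizeof(long);
    const std::size_t p = sizeof(int*);
    if (i == 4 && l == 4 && p == 4) return "ILP32";
    if (i == 4 && l == 8 && p == 8) return "LP64";
    if (i == 4 && l == 4 && p == 8) return "LLP64";
    return "other";
}

// Print the answers to the five questions, plus the model name.
void print_sizes() {
    std::printf("sizeof(long)   = %zu\n", sizeof(long));
    std::printf("sizeof(int)    = %zu\n", sizeof(int));
    std::printf("sizeof(int *)  = %zu\n", sizeof(int*));
    std::printf("sizeof(double) = %zu\n", sizeof(double));
    std::printf("sizeof(float)  = %zu\n", sizeof(float));
    std::printf("data model     = %s\n", data_model().c_str());
}
```

Calling print_sizes() from main shows what the local compiler chose; the
results differ between platforms, which is the point of the questions.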
There's more than enough knowledgeable (as opposed to unknowledgeable)
IBM folk in c.s.s. but you want to be careful in a.f.c. and c.a.
Followup-To: comp.sys.super,alt.folklore.computers
I think we might have the first RIOS in Building 45 in the Museum.
I brought Peter C. by and he sort of recognized it. IBM put a plaque on it.
Too many research machines to keep straight: I think that some of this
discussion is confusing the RP3 (ask Greg to set us straight) with the
TF-1 with the lCAP (that's ell-CAP) and a bunch of other machines.
I thought that the GF-11 was still running. Otherwise I need to change
the FAQ and attempt to hunt for pieces for the Museum.
Eccentric?
He's mostly harmless.
>I can't help but think he must have some good ideas --- if only I could
>understand them!
I think that a copy of The Lighthill Report needs to be put up as a Web page.
I have two copies that netters gave me buried in my paper, but I need to
recover these. The AI people don't have the balls to do this.
Progress rides a very fine line. --One of my old bosses
>After all, old Bucky Fuller sounded twice as crazy as Mentifex . . .
Fuller had useful ideas. You can apply his ideas.
Pseudosciences never get out of the proof of existence stage.
Stewart Brand was the big pusher of geodesic domes in the 1960s who kept
Fuller's ideas alive. Stewart has since recanted on domes
(How Buildings Learn). But Fuller had other very useful ideas on
avoiding waste, conservation, etc.
You decide the followups. This ain't arch and it ain't super.
I believe that this was technically called the lCAP.
It's somewhere in my biblio.
Ken Wilson was the primary driver for this system which was one of
the original 10 NSF supercomputer centers.
I think there were 10 or so FPS-164 and later 264 boxes and the like.
Right now I have to track down an IMP for the NASM. No time to look for
FPS boxes.
FUs to c.s.s. only.
Good header editing. Keep it up. Get others to do it.
>A very nice RP3 reference:
I recommend that if any reader gets a chance to hear our group's Greg P.
give a retrospective talk on RP3: go hear it. In some ways, I wish we
had acquired one (NDA may still apply).
>@article
> {
> RP3-experience,
> author = "Ray Brant and Hung-Yang Chang and Bryan Rosenburg",
> title = "Experience Developing the RP3 Operating System",
> jornal = "Computing Systems",
bibTeX typo
> volume = 4, number = 3, pages = "183--216",
> year = 1991, month = "Summer",
> snote = "++good discussion of mistakes made + lessons learned",
> }
Ah, you have not run this through TeX yet have you!
F-Us to c.s.s.
Well, functional programming was supposed to save us all....
>The thing that killed them was automatic parallelisation.
What automatic parallelization?
Oh, you mean lack of.
>My view is that this problem turned out to be insuperable only
>because it was tackled by the 'cheap and cheerful brigade',
micro-optimists (marketeers)
>who failed to realise that most customers want a complete,
>ready-to-use system and don't want to hack their own and
>imported programs to hell and back again. I am pretty certain
>that the approach COULD have been made to work.
I think it's more than that. You can quantitatively evaluate syntax.
But you have no measure of side effects and other semantics.
The devil is in the detail. The problem is that we make tricky work
arounds in programs.
Caches "worked" because their addition was "transparent" to normal
program function, but your programming could influence them.
The question then was to ignore them and remain portable and get some
benefit or get obscure.
>It is interesting that there have been claims on this group
>that very similar technologies HAVE been made to work (e.g.
>efficient, general-purpose HPF and OpenMP for distributed
>memory systems.) However, all of the GENERAL-PURPOSE systems
>that I have heard about are not yet announced ....
>
>I would say it is more like "If you insist on holding a
>screwdriver by the blade, then even screws will look like nails ...."
If you compose a "package" for David, I will deliver it to him. 8^)
You mean
Fitzpatrick?
like:
%A J. Michael Fitzpatrick
%A John J. Grefenstette
%T Genetic Algorithms in Noisy Environments
%J Machine Learning
%V 3
%N 2/3
%P 101-120
%D 1988
%K superlinear speedup, simulated annealing,
F-Us reduced.
I had our library do a literature search and only found two papers on SA
by him.
There's another 27 but I'd have to search and examine them
for SA relevance on them.
FYI.
....but is of little use since it assumes that a program can have no effects.
In reality, the features of functional programming (e.g. transparent
parallelism) only require that a program have no undeclared effects.
[Actually functional programming is slowly realizing this,
but currently their declaration of effects is very limited and clumsy.
e.g. monads.]
: I think its more than that. You can quantitatively evaluate syntax.
: But you have no measure of side effects and other semantics.
: The devil is in the detail. The problem is that we make tricky work
: arounds in programs.
Correct, but it is sufficient to declare such effects.
For my work in this direction see 'Task Frames' and other papers
at http://www-zeus.desy.de/~funnel/TSIA/index.htm
The work extends previous work on transparent parallelism
(e.g. dataflow and graph reduction). Don't worry, familiarity
with the previous work is not required to enjoy my papers.
Question: Who else presently is pursuing transparent parallelism?
I find very few people in industry, gov't or academia.
e.g. Work on dataflow and graph reduction largely halted
about 10 years ago.
e.g. Few recent functional programming conferences
have papers mentioning transparent parallelism.
e.g. People programming parallelism using threads (witness Java!)
and thinking that explicit parallelism is perfectly fine.
i.e. Never mind the feasibility,
most people have forgotten (if they ever knew)
the motivation for transparent parallelism.
i.e. A tremendous productivity increase resulting
from separating the application from the system.
i.e. The usual benefits of the division of labour.
Aside:
: Caches "worked" because their addition was "transparent" to normal
: program function, but your programming could influence them.
: The question then was to ignore them and remain portable and get some
: benefit or get obscure.
Not necessarily.
Many of the algorithms suitable for transparent parallelism
are cache oblivious. i.e. Even though they know no cache parameters,
they perform as well as other algorithms that explicitly use cache parameters.
See http://supertech.lcs.mit.edu/cilk/papers
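
[Editorial sketch, not from the thread.] For readers unfamiliar with the
term, a common textbook instance of a cache-oblivious algorithm is the
recursive matrix transpose: keep halving the larger dimension until the
block is small, so that at some depth the blocks fit whatever cache exists
without the code ever naming its size. The sketch below is written for this
note, not taken from the Cilk papers. The names transpose_block, transpose
and kLeaf are invented, and the leaf cutoff is only there to limit
recursion overhead, not tuned to any cache. It assumes src holds exactly
rows * cols elements in row-major order.

```cpp
#include <cstddef>
#include <vector>

// Leaf size below which we stop recursing (an arbitrary small constant).
const std::size_t kLeaf = 8;

// Transpose the sub-block rows [r0, r1) x cols [c0, c1) of the row-major
// rows x cols matrix src into the row-major cols x rows matrix dst.
void transpose_block(const std::vector<double>& src, std::vector<double>& dst,
                     std::size_t rows, std::size_t cols,
                     std::size_t r0, std::size_t r1,
                     std::size_t c0, std::size_t c1) {
    const std::size_t dr = r1 - r0;
    const std::size_t dc = c1 - c0;
    if (dr <= kLeaf && dc <= kLeaf) {
        // Small block: copy element by element.
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                dst[c * rows + r] = src[r * cols + c];
    } else if (dr >= dc) {
        // Split the taller dimension in half.
        const std::size_t mid = r0 + dr / 2;
        transpose_block(src, dst, rows, cols, r0, mid, c0, c1);
        transpose_block(src, dst, rows, cols, mid, r1, c0, c1);
    } else {
        // Split the wider dimension in half.
        const std::size_t mid = c0 + dc / 2;
        transpose_block(src, dst, rows, cols, r0, r1, c0, mid);
        transpose_block(src, dst, rows, cols, r0, r1, mid, c1);
    }
}

// Convenience wrapper: returns the cols x rows transpose of src.
std::vector<double> transpose(const std::vector<double>& src,
                              std::size_t rows, std::size_t cols) {
    std::vector<double> dst(src.size());
    transpose_block(src, dst, rows, cols, 0, rows, 0, cols);
    return dst;
}
```

The two recursive calls in each branch touch disjoint parts of dst, which
is also why this shape of algorithm suits the kind of parallelism discussed
above.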
Burkhard
> e.g. Work on dataflow and graph reduction largely halted
> about 10 years ago.
The Legion folks put it into their OS, and have a C++ extension that
makes it fairly easy to stick dataflow into a program. I don't view
"transparent" as being that different from "add a couple of keywords",
as long as ease of use isn't that different from serial programming.
Data parallel programming models are still being looked at. For
example, the SMS system has some HPF-like syntax but gets good speedup
on O(100) nodes. See: http://www-ad.fsl.noaa.gov/ac/sms.html
-- greg
Some of my papers reference Legion's Mentat whose return-to-future
mechanism is in the right direction for general and efficient dataflow.
Also Legion's other dataflow work seems in the right direction
(regardless of my question below about its ease.)
I'm not completely up-to-date on Legion/Mentat,
since this direction for dataflow has been taken further
by the early versions of Cilk (http://supertech.lcs.mit.edu/cilk).
Question: Is the C++ code of p.103 of the Legion 1.6.4 Developer Manual
(http://www.cs.virginia.edu/~legion/documentation/developer_1.6.4.pdf)
an example of what you call 'fairly easy'?
If so, I violently disagree.
Fairly easy would be the original 'user' code of the example:
main () {
int a = 10, b = 15, x, y, z;
MyObject A, B;
x = A.op1(a);
y = B.op1(b);
z = A.op2(x, y);
printf ("%d\n", z);
}
Cilk and my work have shown that similar code is directly suitable
for dataflow. This demonstrates what I mean by transparent.
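
[Editorial sketch, not from the thread.] The dependency structure of the
example (x and y are independent, z needs both) can be made visible in
standard C++ with futures. This is not Legion, Mentat or Cilk code, and it
is explicit rather than transparent parallelism; it only illustrates the
dataflow graph that a transparent system would infer on its own. The
bodies of op1 and op2 are invented stand-ins, since the manual's MyObject
is not shown in the thread. On some platforms std::async needs the
threading library linked in (for example -pthread).

```cpp
#include <future>

// Hypothetical stand-in for the manual's MyObject; the method bodies are
// made up purely so the example has something to compute.
struct MyObject {
    int op1(int v) const { return v + 1; }
    int op2(int p, int q) const { return p * q; }
};

// Runs the three operations respecting only their data dependencies.
int run_dataflow() {
    int a = 10, b = 15;
    MyObject A, B;

    // x and y do not depend on each other, so they may run concurrently.
    std::future<int> x =
        std::async(std::launch::async, [&A, a] { return A.op1(a); });
    std::future<int> y =
        std::async(std::launch::async, [&B, b] { return B.op1(b); });

    // z depends on both inputs; get() blocks until each one is ready.
    return A.op2(x.get(), y.get());
}
```

With these stand-in bodies run_dataflow() returns 176 (11 * 16), the same
value the serial version would print.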
Aside: In any case, the C++ code of p.103 is a relatively inefficient
implementation of dataflow. See early Cilk or my 'Task Frames' paper
for more efficient implementations.
>I don't view "transparent"
>as being that different from "add a couple of keywords",
>as long as ease of use isn't that different from serial programming.
Agreed, depending on what the keywords mean.
For example, the early versions of Cilk provided transparent
parallelism. e.g. An application could not have race conditions.
Despite having similar keywords, later versions of Cilk allowed
applications to have race conditions and other problems
of non-transparent parallelism.
>Data parallel programming models are still being looked at.
>For example, the SMS system has some HPF-like syntax but gets good speedup
>on O(100) nodes. See: http://www-ad.fsl.noaa.gov/ac/sms.html
A strange way of declaring the parts of an application
that are independent (e.g. CSMS$PARALLEL) or dependent
(e.g. CSMS$EXCHANGE), but one that can be made to work for array
applications (others?).
Cilk and my work go a long way to showing that these array applications
(along with many (most?) other applications)
can enjoy a more transparent parallelism
(e.g. the dependencies are easily inferred from the code)
with the same or better performance.
Burkhard
:?! <-- nose out of joint?
oh: 'follow-ups' [never mind...]
-eric
--
Eric C. Fromm efr...@sgi.com
Principal Engineer Scalable Systems Division
SGI - Silicon Graphics, Inc. Chippewa Falls, Wi.
> Question: Is the C++ code of p.103 of the Legion 1.6.4 Developer Manual
> (http://www.cs.virginia.edu/~legion/documentation/developer_1.6.4.pdf)
> an example of what you call 'fairly easy'?
> If so, I violently disagree.
The code you refer to is the output of the Mentat translator, not the
input to the Mentat translator. I would suggest that you read much
more carefully next time, since Mentat is covered on pages 18-22 of
that manual (and has its own, separate manual), while page 103 is in
the "Legion Runtime Library" section, which is not where I would look
for transparent user-level interfaces.
-- greg
In article <393D56CA...@sgi.com> "Eric C. Fromm" <efr...@sgi.com> writes:
>:?! <-- nose out of joint?
>
>oh: 'follow-ups' [never mind...]
Hooked you Eric!
;^)
Well, I guess that not every acronym is destined to be......
Thanks. The accompanying "user" code doesn't seem to be Mentat code
and that's part of what led me astray.
The area of p.103 is one of the few mentioning dataflow
according to the Legion site's search utility.
I'd much appreciate pointers to areas better describing
Legion's use of dataflow.
: I would suggest that you read much more carefully next time,
My apologies. Life is finite and as I mentioned in my previous posting
I haven't been following Legion/Mentat closely since Cilk and other projects
take dataflow much further.
I would like to know if recent Legion/Mentat does more with dataflow.
Burkhard
> I'd much appreciate pointers to areas better describing
> Legion's use of dataflow.
Well, if you're interested in Legion's use of dataflow, you were
looking at the guts of it in the section you quoted. That's the
underlying implementation. For a higher-level overview of the entire
system, see "Legion from 50,000 feet" at
http://legion.virginia.edu/papers.html
There are lots of little ways that Legion uses dataflow. For example,
the implementation of stateless object method invocation works by
manipulating dataflow graphs. Ditto for exceptions. Dataflow is just
the generalized mechanism for talking to someone; it's often trivial
graphs, but on occasion you can do interesting things.
Another user-friendly interface is the Fortran dataflow, BFS, which is
described in the manual. Again, since we didn't do a Fortran parser,
it's not as easy to use as it could be.
-- greg
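
To make the dataflow idea above concrete, here is a minimal sketch in Python. It is not Legion or Mentat code and does not use their APIs; `run_dataflow`, the node names and the `deps` layout are all hypothetical, chosen only to illustrate that an invocation fires once the values it depends on are available.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_dataflow(nodes, deps, inputs):
    """Evaluate a small dataflow graph (illustrative only, not Legion's API).

    nodes:  name -> function to invoke
    deps:   name -> list of upstream names (other nodes or external inputs)
    inputs: name -> externally supplied value
    """
    values = dict(inputs)
    # Only node-to-node edges matter for ordering; external inputs are ready.
    graph = {name: [d for d in deps[name] if d in nodes] for name in nodes}
    for name in TopologicalSorter(graph).static_order():
        args = [values[d] for d in deps[name]]
        values[name] = nodes[name](*args)
    return values

# A small non-trivial graph: one input feeds two invocations, whose
# results feed a third.
nodes = {
    "scale": lambda x: x * 2,
    "shift": lambda x: x + 1,
    "combine": lambda a, b: a + b,
}
deps = {"scale": ["x"], "shift": ["x"], "combine": ["scale", "shift"]}

result = run_dataflow(nodes, deps, {"x": 10})
print(result["combine"])  # 20 + 11 = 31
```

A "trivial graph" in this sketch would be a single node with only external inputs; the interesting cases are those where one result fans out to several consumers, as `x` does here.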
I did in the first half of the nineties. See for example:
Svend Erik Knudsen, "Statement-Sets", Lecture Notes in Computer Science 1127,
Springer 1996, pp. 160-173.
As I still think that what I did in those days was conceptually right
(or, better, not wrong), I currently plan to redo the work for a modern
environment.
Svend Erik Knudsen
--
Svend Erik Knudsen, ETH Zentrum, CH-8092 Zürich
I can hardly wait to see The Wrong Reverend coming in and telling us to
accept whatever he's programming in this week as our personal saviour.
-s
--
Copyright 2000, All rights reserved. Peter Seebach / se...@plethora.net
C/Unix wizard, Pro-commerce radical, Spam fighter. Boycott Spamazon!
Consulting & Computers: http://www.plethora.net/
Get paid to surf! No spam. http://www.alladvantage.com/go.asp?refid=GZX636
> I think that some of this was sponsored out of a lab in Kingston that
> started out with a whole load of FPS boxes (attached to a couple IBM
> mainframes) ... working on various chemical & atomic calculations, I
> vaguely remember them announcing along the way various calculation
> thresholds in the gflops range in the mid to late '80s. Then there
> were upgrades with FPS boxes in combination with IBM 3090s with vector
> facility.
checking some archives.
In June of '86, the Kingston Engineering & Science Center had 20 x64
FPS "attached processors" configured with a range of memories between
32mbyte and 512mbyte that had a peak of 1.5gflop.
... also from apr '87 CNSF (cornell national supercomputer facility)
announcement (note the FPS disk subsystem was RAID with 40mbyte/sec
transfer):
The CNSF provides a configuration consisting of an IBM 3090-400 with
four vector facilities and five attached scientific computers from
Floating Point Systems, giving a peak throughput of over 600 megaflops.
The IBM 3090-400 VF has a peak performance of 432 megaflops, with 128
megabytes of memory, 512 megabytes of expanded storage, and 105 gigabytes
of disk storage. Each application may use up to 1 gigabyte of memory.
Software support exists for vectorization, including a vectorizing
compiler and vector libraries, and for parallelization. VM/XA SF (CMS)
is the operating system; both interactive and batch modes are provided.
The five FPS 264 scientific computers each have 650 megabytes of disk
storage and 38 megaflops peak speed. Four of the FPS processors have
36 megabytes of memory each, and one has 16 megabytes of memory. These
processors are connected by a high-speed bus for parallel processing.
An IBM 4381 and two additional FPS 164 processors provide a development
environment. All the IBM and FPS systems fully support ANSI-standard
FORTRAN-77.
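
As a quick sanity check on the announcement's "over 600 megaflops" figure, the quoted numbers can be added up. This is only a sketch: it assumes the aggregate peak is a plain sum of the 3090's peak and the five FPS 264 peaks, which the announcement does not state explicitly. The constant and function names are made up for illustration.

```python
# Figures quoted in the April 1987 CNSF announcement.
IBM_3090_400_VF_PEAK_MFLOPS = 432  # 3090-400 with four vector facilities
FPS_264_PEAK_MFLOPS = 38           # per FPS 264
FPS_264_COUNT = 5

def cnsf_peak_mflops():
    """Naive aggregate peak: the 3090 plus all five FPS 264s (assumed additive)."""
    return IBM_3090_400_VF_PEAK_MFLOPS + FPS_264_COUNT * FPS_264_PEAK_MFLOPS

def fps_264_total_memory_mbytes():
    """Four FPS 264s with 36 megabytes each, plus one with 16 megabytes."""
    return 4 * 36 + 16

print(cnsf_peak_mflops())             # 622
print(fps_264_total_memory_mbytes())  # 160
```

Under that assumption the sum comes to 622 megaflops, which is consistent with "over 600".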
... & from s-comput sep. 86
List of Supercomputers on Bitnet/Netnorth/Earn
==============================================
    Bitnet     Center                 1985-1986         1987
    Nodename   name                                     (tentative)
==  =========  =====================  ================  ================
 1             JVNC - Princeton       Cyber 205         ETA-10
 2  ASUACAD    Arizona State          IBM 3090-200/VF
 3  BOSTONU    Boston University      IBM 3090-200/VF   Same
 4  CORNELLD/  Theory - Cornell       IBM 3084/QX128,   IBM 3090/400,
    CORNELLF                          FPS 264's         FPS 264's
 5  CPWPSCA/   Pittsburgh             Cray X-MP/??      Same
    CPWPSCB
 6  CSU205     Colorado State         Cyber 205         Same
 7  DB0ZIB21   Berlin - Germany       Cray 1M           Same
 8  DFVLROP1   German Aerospace       Cray 1S           Same
 9  DGAIPP1S   Max Planck - Germany   Cray X-MP/14      Cray X-MP/24
10  DJUKFA11   Juelich - Germany      Cray X-MP/22      Same
11  DKAUNI46   Karlsruhe - Germany    Cyber 205         Same
12  DS0RUS1I   Stuttgart - Germany    Cray 1M           Cray 2
13  FSUSUP     Florida State          Cyber 205         ETA-10
14  HASARA5    Amsterdam U - Neth.    Cyber 205         Same
15  ISUMVS     Iowa State             NAS/AS 9160VPF    Same
16  NCSAVMSA/  NCSA - Illinois        Cray X-MP/24      Cray X-MP/48
    NCSAVMSB
17  SDSC       San Diego              Cray X-MP/48      Same
18  UCBLYNX    U C Berkeley           Cray X-MP/12      Cray X-MP/14
19  UCBCMSA    U C Berkeley           IBM 3090-200/VF   Same
20  UCLAMVS    UCLA                   IBM 3090-200/VF   Same
21  UGA205     Univ of Georgia        Cyber 205         Same
22  UNCACDC    Univ of Calgary, CAN   Cyber 205         Same
23  UTORONTO   U of Toronto           Cray X-MP/22      Same
24  VSP1       Boeing Data Services   Cray X-MP/24      Same
25  VTVM1      Virginia Polytech      IBM 3090-200/VF   Same
--