In article <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov writes: >m...@mips.com pointed out some important considerations in the issue >of whether supercomputers as we know them will survive. I thought >that I would attempt to get a discussion started. Here is a simple >fact for the mill, related to the question of whether or not machines >delivering the fastest performance at any price have room in the >market. >Fact number 1: >The best of the microprocessors now EXCEED supercomputers for scalar >performance and the performance of microprocessors is not yet stagnant. >On scalar codes, commodity microprocessors ARE the fastest machines at >any price and custom cpu architectures are doomed in this market. >bro...@maddog.llnl.gov, bro...@maddog.uucp
This much has been fairly obvious for a few years now, and was made especially clear by the introduction of the MIPS R-3000 based machines at about the beginning of 1989. I think that this point is irrelevant to the more appropriate purpose of supercomputers, which is to run long (or large), compute-intensive problems that happen to map well onto available architectures.
Both factors (memory/time and efficiency) are important here. It is generally not necessary to run short jobs on supercomputers, and it is not cost-effective to run scalar jobs on vector machines. On the other hand, I have several codes that run >100 times faster on the ETA-10G relative to a 25 MHz MIPS R-3000. Since I need to run these codes for hundreds of ETA-10G hours, the equivalent time on the workstation is over one year.
The introduction of vector workstations (Ardent & Stellar) changes these ratios substantially. The ETA-10G runs my codes only 20 times faster than the new Ardent Titan. In this environment, the important question is, "Can I get an average of more than 1.2 hours of supercomputer time per day?" If not, then the Ardent provides better average wall-clock turnaround.
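As a rough check on the 1.2-hour figure: a dedicated workstation does 24 workstation-hours of work per day, and a machine 20 times faster does the same work in 24/20 hours. The Python sketch below only restates that arithmetic; the function names are illustrative and not from the post, and the 100-hour input stands in for "hundreds of ETA-10G hours" as a lower bound.

```python
def breakeven_hours_per_day(speedup: float) -> float:
    """Supercomputer hours per day needed to match a dedicated workstation.

    A workstation running around the clock delivers 24 workstation-hours of
    work per day; a machine `speedup` times faster needs 24 / speedup hours.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 24.0 / speedup


def workstation_years(super_hours: float, speedup: float) -> float:
    """Wall-clock years a dedicated workstation needs for the same work."""
    if super_hours < 0 or speedup <= 0:
        raise ValueError("super_hours must be >= 0 and speedup > 0")
    return super_hours * speedup / (24.0 * 365.0)


if __name__ == "__main__":
    print(breakeven_hours_per_day(20))   # Ardent Titan case: 20x ratio
    print(breakeven_hours_per_day(100))  # 25 MHz R-3000 case: 100x ratio
    print(workstation_years(100, 100))   # 100 ETA-10G hours at a 100x ratio
```

With the 100x ratio the break-even drops to under 15 minutes of supercomputer time per day, which is why the scalar workstation is a much weaker alternative for these codes than the Titan.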
It seems to me that the introduction of fast scalar and vector workstations can greatly enhance the _important_ function of supercomputers --- which is to allow the calculation of problems that are otherwise too big to handle. By removing scalar jobs and vector jobs of short duration from the machine, more resources can be allocated to the large calculations that cannot proceed elsewhere.
Enough mumbling.... -- John D. McCalpin - mccal...@masig1.ocean.fsu.edu mccal...@scri1.scri.fsu.edu mccal...@delocn.udel.edu
m...@mips.com pointed out some important considerations in the issue of whether supercomputers as we know them will survive. I thought that I would attempt to get a discussion started. Here is a simple fact for the mill, related to the question of whether or not machines delivering the fastest performance at any price have room in the market.
Fact number 1: The best of the microprocessors now EXCEED supercomputers for scalar performance and the performance of microprocessors is not yet stagnant. On scalar codes, commodity microprocessors ARE the fastest machines at any price and custom cpu architectures are doomed in this market.
In article <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov () writes: >Fact number 1: >The best of the microprocessors now EXCEED supercomputers for scalar >performance and the performance of microprocessors is not yet stagnant. >On scalar codes, commodity microprocessors ARE the fastest machines at >any price and custom cpu architectures are doomed in this market.
I take my hat off to them, too, because that's no mean feat. But don't forget that the supercomputers didn't set out to be the fastest machines on scalar code. If they had, they'd all have data caches, non-interleaved main memory, and no vector facilities. What the supercomputer designers are trying to do is balance their machines to optimally execute a certain set of programs, not the least of which are the LLL loops. In practice this means that said machines have to do very well on vectorizable code, while not falling down badly on the scalar stuff (lest Amdahl's law come to call.)
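For readers who want the Amdahl's law point in numbers, a minimal Python sketch follows. The fractions and the 10x vector speedup are hypothetical values chosen for illustration, not measurements from any machine discussed here.

```python
def amdahl_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speedup when only `vector_fraction` of the run time is sped up.

    Amdahl's law: 1 / ((1 - f) + f / s), where f is the fraction of the
    original run time that vectorizes and s is the speedup of that part.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if vector_speedup <= 0:
        raise ValueError("vector_speedup must be positive")
    scalar_fraction = 1.0 - vector_fraction
    return 1.0 / (scalar_fraction + vector_fraction / vector_speedup)


if __name__ == "__main__":
    # Hypothetical workloads: how much a 10x vector unit helps overall.
    for f in (0.5, 0.9, 0.99):
        print(f, round(amdahl_speedup(f, 10.0), 2))
```

Even at 90% vectorizable, the scalar remainder holds the overall gain to roughly half of the vector unit's speedup, which is the reason a vector machine cannot afford to fall down badly on scalar code.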
So while it's ok to chortle at how the micros have caught up on the scalar stuff, I think it would be an unwarranted extrapolation to imply that the supers have been superseded unless you also specify the workload. And by the way, it's the design constraints at the heavy-duty, high-parallelism, all-functional-units-going-full-tilt-using-the-entire-memory-bandwidth end that make the price of the supercomputers so high, not the constraints that predominate at the scalar end. That's why I conclude that when the micro/workstation guys want to play in the supercomputer sandbox they'll either have to bring their piggy banks to buy the appropriate I/O and memory, or convince the users that they can live without all that performance.
Bob Colwell ..!uunet!mfci!colwell Multiflow Computer or colw...@multiflow.com 31 Business Park Dr. Branford, CT 06405 203-488-6090
In article <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov () writes: >The best of the microprocessors now EXCEED supercomputers for scalar >performance and the performance of microprocessors is not yet stagnant.
Is this a fair statement? I've played some with the i860 and I can write (by hand so far) code that is pretty fast. However, the programs where it really zooms are vectorizable. That is, I can make this micro solve certain problems well; but these are the same problems that vector machines handle well.
Getting good FP performance from a micro seems to require pipelining. Keeping the pipe(s) full seems to require a certain amount of parallelism and regularity. Vectorizable loops work wonderfully well.
Perhaps I've misunderstood your intent, though. Perhaps you meant that an i860 (or Mips or whatever) can outrun a Cray (or Nec or whatever) on some programs. I guess I'm still doubtful. Do you have examples you can tell us about?
Gordon Bell, in the September CACM (p.1095) says, "By the end of 1989, the performance of the RISC, one-chip microprocessor should surpass and remain ahead of any available minicomputer or mainframe for nearly every significant benchmark and computational workload. By using ECL gate arrays, it is relatively easy to build processors that operate at 200 MHz (5 ns. clock) by 1990." (For those who don't know, Mr. Bell has his name on the PDP-11, the VAX, and the Ardent workstation.)
The big iron is fighting back, and that involves reducing their chip count. Once, a big cpu took ~10^4 chips: now it's more like 10^2. I expect it will shortly be ~10 chips. Shorter paths, you know.
I see the hot micros and the big iron meeting in the middle. What will distinguish their processors? Mainly, there will be cheap systems. And then, there will be expensive ones, with liquid cooling, superdense packaging, mongo buses, bad yield, all that stuff. Even when no multichip processors remain, there will still be $1K systems and $10M systems. Of course, there is no chance that the $10M system will be uniprocessor. -- Don D.C.Lindsay Carnegie Mellon Computer Science
In article <1...@m3.mfci.UUCP> colw...@mfci.UUCP (Robert Colwell) writes: >So while it's ok to chortle at how the micros have caught up on the scalar >stuff, I think it would be an unwarranted extrapolation to imply that the >supers have been superseded unless you also specify the workload.
Microprocessor development is not ignoring vectorizable workloads. The latest have fully pipelined floating point and are capable of pipelining several memory accesses. As I noted, interleaving directly on the memory chip is trivial and memory chip makers will do it soon. Micros now dominate the performance game for scalar code and are moving on to vectorizable code. After all, these little critters mutate and become more voracious every 6 months and vectorizable code is the only thing left for them to conquer. No NEW technology needs to be developed, all the micro-chip and memory-chip makers need to do is to decide to take over the supercomputer market.
They will do this with their commodity parts.
Supercomputers of the future will be scalable multiprocessors made of many hundreds to thousands of commodity microprocessors. They will be commodity parts because these parts will be the fastest around and they will be cheap. These scalable machines will have hundreds of commodity disk drives ganged up for parallel access. Commodity parts will again be used because of the cost advantage leveraged into a scalable system using commodity parts. The only custom logic will be the interconnect which glues the system together, and error correcting logic which glues many disk drives together into a reliable high performance system. The CM data vault is a very good model here.
NOTHING WILL WITHSTAND THE ATTACK OF THE KILLER MICROS!
In article <2...@brazos.Rice.edu> pres...@titan.rice.edu (Preston Briggs) writes: >In article <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov () writes: >>The best of the microprocessors now EXCEED supercomputers for scalar >>performance and the performance of microprocessors is not yet stagnant.
>Is this a fair statement? I've played some with the i860 and
Yes, in the sense that a scalar dominated program has been compiled for the i860 with a "green" compiler, no pun intended, and the same program was compiled with a mature optimizing compiler on the XMP, and the 40 MHz i860 is faster for this code. Better compilers for the i860 will open up the speed gap relative to the supercomputers.
>I can write (by hand so far) code that is pretty fast. >However, the programs where it really zooms are vectorizable.
Yes, this micro beats the super on scalar code, and is not too sloppy for hand written code which exploits its cache and pipes well. The compilers are not there yet for the vectorizable stuff on the i860. Even if there were good compilers, the scalar-vector speed differential is not as great on the i860 as it is on a supercomputer. Of course, interleaved memory chips will arrive and microprocessors will use them. Eventually the high performance micros will take the speed prize for vectorizable code as well, but this will require another few years of development.
In article <6...@pt.cs.cmu.edu> lind...@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes: >Gordon Bell, in the September CACM (p.1095) says, "By the end of >1989, the performance of the RISC, one-chip microprocessor should >surpass and remain ahead of any available minicomputer or mainframe >for nearly every significant benchmark and computational workload.
It has already happened for SOME workloads, those which hit cache well and are scalar dominated. This was done without ECL parts. The ECL parts will only make matters worse for custom processors, as Bell indicates, dominating performance for all workloads.
>I see the hot micros and the big iron meeting in the middle. What >will distinguish their processors?
Nothing.
>Mainly, there will be cheap >systems. And then, there will be expensive ones, with liquid cooling, >superdense packaging, mongo buses, bad yield, all that stuff. Even >when no multichip processors remain, there will still be $1K systems >and $10M systems. Of course, there is no chance that the $10M system >will be uniprocessor.
The $10M systems will be scalable systems built out of the same microprocessor. These systems will probably be based on coherent caches, the micros having respectable on chip caches which stay in sync with very large off chip caches. The off chip caches are kept coherent through scalable networks. The "custom" value added part of the machine for the supercomputer vendor to design is the interconnect and the I-O system. The supercomputer vendor will still have a cooling problem on his hands because of the density of heat sources in such a machine.
In article <1...@m3.mfci.UUCP> colw...@mfci.UUCP (Robert Colwell) writes: >I take my hat off to them, too, because that's no mean feat. But don't >forget that the supercomputers didn't set out to be the fastest machines >on scalar code. If they had, they'd all have data caches, non-interleaved >main memory, and no vector facilities. What the supercomputer designers
Excuse me, non-interleaved main memory? I've always assumed that interleaved memory could help scalar code too. After all, instruction fetch tends to take place from successive addresses. Of course if main memory is very fast there is no point to interleaving it, but if all you've got is drams with slow cycle times, I would expect that interleaving them would benefit even straight scalar code. -- Mike Haertel <m...@stolaf.edu> ``There's nothing remarkable about it. All one has to do is hit the right keys at the right time and the instrument plays itself.'' -- J. S. Bach
In <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov wrote: > The best of the microprocessors now EXCEED supercomputers for scalar > performance and the performance of microprocessors is not yet stagnant. > On scalar codes, commodity microprocessors ARE the fastest machines at > any price and custom cpu architectures are doomed in this market.
Yes. And though this is a recent development, an unprejudiced observer could have seen it coming for several years. I did, and had the temerity to say so in print way back in 1986. My reasoning then is still relevant; *speed goes where the volume market is*, because that's where the incentive and development money to get the last mw-sec out of available fabrication technology is concentrated.
Notice that nobody talks about GaAs technology for general-purpose processors any more? Or dedicated Lisp machines? Both of these got overhauled by silicon microprocessors because commodity chipmakers could amortize their development costs over such a huge base that it became economical to push silicon to densities nobody thought it could attain.
You heard it here first:
The supercomputer crowd is going to get its lunch eaten the same way. They're going to keep sinking R&D funds into architectural fads, exotic materials, and the quest for ever more ethereal heights of floating point performance. They'll have a lot of fun and generate a bunch of sexy research papers.
Then one morning they're going to wake up and discover that the commodity silicon guys, creeping in their petty pace from day to day, have somehow managed to get better real-world performance out of their little boxes. And supercomputers won't have a separate niche market anymore. And the supercomputer companies will go the way of LMI, taking a bunch of unhappy investors with them. La di da.
Trust me. I've seen it happen before... -- Eric S. Raymond = e...@snark.uu.net (mad mastermind of TMN-Netnews)
In <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov wrote: > The best of the microprocessors now EXCEED supercomputers for scalar > performance and the performance of microprocessors is not yet stagnant. > On scalar codes, commodity microprocessors ARE the fastest machines at > any price and custom cpu architectures are doomed in this market.
Speaking of "commodities", I think a lot of people have lost sight of, or perhaps never recognized something about the vast majority of supercomputers. They are shared. How often do you get a Cray processor all to yourself? Not very often, unless you have lots of money, or Uncle Sam is picking up the tab so you can design atomic bombs faster. As soon as you have more than one job per processor, you're talking about *commodity Mflops*. The issue is no longer performance at any cost, because if it was you would order another machine at that point. The important thing is Mflops/dollar for most people, and that's where the micros are going to win in a lot of cases.
In article <35...@lll-winken.LLNL.GOV>, bro...@maddog.llnl.gov writes: > m...@mips.com pointed out some important considerations in the issue of whether > supercomputers as we know them will survive. I thought that I would attempt > to get a discussion started. Here is a simple fact for the mill, related to > the question of whether or not machines delivering the fastest performance > at any price have room in the market.
> Fact number 1: > The best of the microprocessors now EXCEED supercomputers for scalar > performance and the performance of microprocessors is not yet stagnant. > On scalar codes, commodity microprocessors ARE the fastest machines at > any price and custom cpu architectures are doomed in this market.
Brooks is making a good point here. By "this market", I assume he means the one defined above, (as well as by mash) - to paraphrase, "the fastest box at any price". I'll let go what "fastest" and "box" mean for sake of easy discussion :-) Most of us, I hope, can fathom what price is.
Anyway, I agree with mash that there is - albeit small - a market for the machine with the highest peak absolute performance (pick your number, the most popular one recently seems to be Linpack 100x100 all Fortran, Dongarra's Table One). The national labs have proven that point for almost a generation. I believe that it will take at least one more generation - those who weaned on machines from CDC, then CRI - before a more reasonable approach to machine procurement comes to pass. Thus, I disagree that there will *always* be a market for this sort of thing. Status symbols may be OK in cars, but for machines purchased with taxpayer dollars, the end is near. Hence, Brooks' "attack of the killer micros".
However, I do believe that there will always be a market for various types of processors and processor architectures. Killer scalar micros are finding wide favor as above. Vector supers and their offspring, e.g. the i860 and other 64-bit things, will always dominate codes which can be easily vectorized and do not lend themselves well to parallel computation. Medium-scale OTS-technology machines like Sequent will start (are starting) to dominate OLTP and RDBMS work, perfect tasks for symmetric MP machines. (Pyramid, too; hi Chris). Massive parallel machines will eventually settle into production shops, perhaps running one and only one application, but running it at speeds that boggle the mind.
It's up to the manufacturers to decide 1) which game they want to play 2) for what stakes 3) with what competition 4) for how long 5) etc. etc. etc. That's what makes working for a manufacturer such fun and terror at once.
In article <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov (Eugene Brooks) writes: >Microprocessor development is not ignoring vectorizable workloads. The >latest have fully pipelined floating point and are capable of pipelining >several memory accesses. As I noted, interleaving directly on the memory >chip is trivial and memory chip makers will do it soon. [ ... more > stuff deleted ... ] > They will do this with their commodity parts.
It is not at all clear to me that the memory bandwidth required for running vector codes is going to be developed in commodity parts. To be specific, a single 64-bit vector pipe requires a sustained bandwidth of 24 bytes per clock cycle. Is an ordinary, garden-variety commodity microprocessor going to be able to use 6 32-bit words-per-cycle of memory bandwidth on non-vectorized code? If not, then there is a strong financial incentive not to include that excess bandwidth in commodity products....
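The 24-bytes-per-cycle figure can be reproduced with a small Python sketch. It assumes, as the post does not spell out, that the pipe streams two 64-bit operand loads and one 64-bit store every clock (as in a vector add); that breakdown, the function names, and the 40 MHz clock used for illustration are assumptions.

```python
def vector_pipe_bytes_per_cycle(loads: int = 2, stores: int = 1,
                                word_bits: int = 64) -> int:
    """Bytes moved per clock by one pipe streaming `loads` + `stores` words.

    Assumption: two operand loads and one result store per cycle.
    """
    return (loads + stores) * word_bits // 8


def as_32bit_words(n_bytes: int) -> int:
    """Express a byte count as 32-bit words."""
    return n_bytes // 4


def sustained_mbytes_per_sec(bytes_per_cycle: int, clock_mhz: float) -> float:
    """Sustained bandwidth in 10^6 bytes/s: bytes per cycle times MHz."""
    return bytes_per_cycle * clock_mhz


if __name__ == "__main__":
    per_cycle = vector_pipe_bytes_per_cycle()
    print(per_cycle)                                  # bytes per clock
    print(as_32bit_words(per_cycle))                  # 32-bit words per clock
    print(sustained_mbytes_per_sec(per_cycle, 40.0))  # at an assumed 40 MHz
```

At the assumed 40 MHz this comes to 960 million bytes per second sustained, which gives a sense of the memory system a single vector pipe implies.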
In addition, the engineering/cost trade-off between memory bandwidth and memory latency will continue to exist for the "KILLER MICROS" as it does for the current generation of supercomputers. Some users will be willing to sacrifice latency for bandwidth, and others will be willing to do the opposite. Economies of scale will not eliminate this trade-off, except perhaps by eliminating the companies that take the less profitable position (e.g. ETA).
>Supercomputers of the future will be scalable multiprocessors made of >many hundreds to thousands of commodity microprocessors. They will >be commodity parts because these parts will be the fastest around and >they will be cheap.
It seems to me that the experience in the industry is that general-purpose processors are not usually very effective in parallel-processing applications. There is certainly no guarantee that the uniprocessors which are successful in the market will be well-suited to the parallel supercomputer market -- which is not likely to be a big enough market segment to have any control over what processors are built....
The larger chip vendors are paying more attention to parallelism now, but it appears to be in the context of 2-4 processor parallelism. It is not likely to be possible to make these chips work together in configurations of 1000's with the application of "glue" chips....
This is not to mention the fact that software technology for these parallel supercomputers is depressingly immature. I think traditional moderately parallel machines (e.g. Cray Y/MP-8) will be able to handle existing scientific workloads better than 1000-processor parallel machines for quite some time.... -- John D. McCalpin - mccal...@masig1.ocean.fsu.edu mccal...@scri1.scri.fsu.edu mccal...@delocn.udel.edu
There's more to supercomputing than scalar speed. One of the primary things you can do on a supercomputer is run large programs quickly. Virtual memory is nice, but some programs cause it to thrash. That's when it's nice to have a real 4GB machine. The same thing can be said about vector processing, some programs can be done using vector processors (or lots of parallel processors) faster than scalar.
I don't see the death of the supercomputer, but a redefinition of problems needing one. I have more memory on my home computer than all the computers at this site when I started working here (hell the total was <2MB). Likewise CPU and even disk. The number of problems which I can't solve on my home system is a lot smaller than it was back then.
However, that's the kicker, that real problems are limited in size. Someone said that the reason for micros catching up is that the development cost could be spread over the users. For just that reason the vector processors will stay expensive, because fewer users will need (ie. buy) them. There will always be a level of hardware needed to solve problems which are not shared by many users. While every problem has a scalar portion, many don't need vectors, or even floating point.
I think this goes for word size, too. When I see that the Intel 586 will have a 64 bit word I fail to generate any excitement. The main effect will be to break all the programs which assume that short==16 bits (I've ported to the Cray, this *is* a problem). If you tell me I can have 64 bit ints, excuse me if I don't feel the need to run right out and place an order. Even as memory gets cheaper I frequently need 1-2 million ints, and having them double in size is not going to help keep cost down.
I think that the scalar market will continue to be micros, but I don't agree with Eric that the demand for supercomputers will vanish, or that micros will catch them for the class of problems which are currently being run on supercomputers. The improving scalar performance will reduce the need for vector processing, and keep them from getting economies of scale. He may well be right that some of the companies will fall, since the micros will be able to solve a lot of the problems which are not massively vectorable or inherently require huge addressing space.
-- bill davidsen (david...@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "The world is filled with fools. They blindly follow their so-called 'reason' in the face of the church and common sense. Any fool can see that the world is flat!" - anon
This article certainly generated some responses. Unfortunately, some responders seemed to miss (or chose to ignore :-) the tongue-in-cheek nature of the title.
I used to argue, only a couple of years ago, that supercomputers produced cheaper scalar computing cycles than "smaller" systems. That isn't true today. However, supercomputers still produce cheaper floating point results on vectorizable jobs. And, they produce memory bandwidth cheaper than other systems. That may change, too.
Q: What will it take to replace a Cray with a bunch of micros?
A: (IMHO) : A "cheap" Multiport Interleaved Memory subsystem. In order to do that, you need to provide a way to build such subsystems out of a maximum of 3 different chips, and be able to scale the number of processors and interleaving up and down. A nice goal might be a 4-port/32-way-interleaved 64-bit-wide subsystem cheap enough for a $100 K system. (That is only enough memory bandwidth for a 1 CPU Cray-like system, or 4 micro based CPUs with only 1 word/cycle required, but it would sure be a big step forward.) The subsystem needs to provide single level local-like memory, like a Cray.
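To make the interleaving argument concrete, here is a minimal single-port sketch in Python of why bank count matters. It is an illustration under stated assumptions, not a model of any Cray or of the subsystem proposed above: the function name `cycles_for_stream`, the 8-clock bank busy time, and the rule of at most one request issued per clock are all hypothetical.

```python
def cycles_for_stream(n_accesses: int, stride: int,
                      n_banks: int, bank_busy: int) -> int:
    """Clocks needed to issue `n_accesses` word accesses at a fixed stride.

    Word address a lives in bank a % n_banks. A bank that accepts a request
    is busy for `bank_busy` clocks; the port issues at most one request per
    clock and stalls while the target bank is busy.
    """
    if n_banks <= 0 or bank_busy <= 0:
        raise ValueError("n_banks and bank_busy must be positive")
    free_at = [0] * n_banks  # clock at which each bank can accept a request
    clock = 0
    for i in range(n_accesses):
        bank = (i * stride) % n_banks
        clock = max(clock, free_at[bank])  # stall until the bank is free
        free_at[bank] = clock + bank_busy  # bank is tied up after accepting
        clock += 1                         # next request issues a clock later
    return clock


if __name__ == "__main__":
    # 64 accesses, hypothetical 8-clock bank busy time.
    print(cycles_for_stream(64, 1, 32, 8))   # unit stride, 32-way interleave
    print(cycles_for_stream(64, 32, 32, 8))  # stride 32: every access, bank 0
    print(cycles_for_stream(64, 1, 1, 8))    # no interleaving at all
```

In this toy model a unit-stride stream over 32 banks issues one word per clock, while a stride equal to the bank count behaves exactly like a non-interleaved memory. Multiple ports contending for the same banks are not modeled.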
[Or, show a way to make, in software, a truly distributed system as efficient as a local memory system (PhD thesis material...- I am betting on hardware solutions in the short run...)].
You also need to provide a reasonably reliable way for the memory to subsystem connections to be made. This is sort of hard hardware level engineering. For example, you probably can't afford the space for 32 VME buses...
Does anyone have any suggestions on how the connections into and out of such memory subsystems could be made without a Cray-sized bundle of connectors?
On the topic of the original posting, what I have seen is that micro based workstations are eating away fast at the minicomputer market, just on the basis of price performance, leaving only workstation clusters, vector machines (Convex-sized to Cray-sized), and other big iron, such as very large central storage servers. So, I wouldn't write off big iron just yet, but obviously some companies will be selling a lot more workstations and a lot fewer minicomputers than they were planning.
Quiz: Why does Cray use *8* way interleaving per memory *port* on the Cray Y-MP?
Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamas...@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
In article <7...@thor.acc.stolaf.edu> m...@thor.stolaf.edu () writes: >In article <1...@m3.mfci.UUCP> colw...@mfci.UUCP (Robert Colwell) writes: >>I take my hat off to them, too, because that's no mean feat. But don't >>forget that the supercomputers didn't set out to be the fastest machines >>on scalar code. If they had, they'd all have data caches, non-interleaved >>main memory, and no vector facilities. What the supercomputer designers
>Excuse me, non-interleaved main memory? I've always assumed that >interleaved memory could help scalar code too. After all, instruction >fetch tends to take place from successive addresses. Of course if >main memory is very fast there is no point to interleaving it, but >if all you've got is drams with slow cycle times, I would expect >that interleaving them would benefit even straight scalar code.
I meant that as a shorthand way of putting across the idea that the usual compromise is one of memory size, memory bandwidth, and memory latency. For the canonical scalar code you don't need a very large memory, and the bandwidth may not be as important to you as the latency (pointer chasing is an example).
The point I was making was that the supercomputers have incorporated design decisions, such as very large physical memory, and very high bandwidth to and from that memory, so that their multiple functional units can be kept usefully busy while executing 'parallel' code. Were you to set out to design a machine which didn't (or couldn't) use those multiple buses (pin limits on a single-chip micro for instance) then that bandwidth isn't worth as much to you and you might be better off with a flat, fast memory, which is what most workstations do (or used to do, anyway).
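The latency-versus-bandwidth trade-off described above can be sketched with a toy timing model in Python. The two memory configurations and all of their numbers are hypothetical, picked only to show the shape of the trade-off; they do not describe any real workstation or supercomputer.

```python
def pointer_chase_ns(n_loads: int, latency_ns: float) -> float:
    """Dependent loads: each must finish before the next address is known."""
    return n_loads * latency_ns


def streaming_ns(n_loads: int, latency_ns: float, words_per_ns: float) -> float:
    """Independent loads: pay latency once, then run at full bandwidth."""
    if words_per_ns <= 0:
        raise ValueError("words_per_ns must be positive")
    return latency_ns + n_loads / words_per_ns


if __name__ == "__main__":
    n = 1000
    # Hypothetical "flat, fast" memory: low latency, one word per access time.
    flat = {"latency_ns": 100.0, "words_per_ns": 0.01}
    # Hypothetical interleaved memory: twice the latency, 8x the bandwidth.
    wide = {"latency_ns": 200.0, "words_per_ns": 0.08}

    print(pointer_chase_ns(n, flat["latency_ns"]),
          pointer_chase_ns(n, wide["latency_ns"]))
    print(streaming_ns(n, **flat), streaming_ns(n, **wide))
```

Under these made-up numbers the low-latency memory wins the pointer-chasing case by a factor of two, and the high-bandwidth memory wins the streaming case by a much larger margin, which is the compromise each class of machine is built around.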
Bob Colwell ..!uunet!mfci!colwell Multiflow Computer or colw...@multiflow.com 31 Business Park Dr. Branford, CT 06405 203-488-6090
In article <35...@lll-winken.LLNL.GOV>, bro...@maddog.llnl.gov writes: > Fact number 1: > The best of the microprocessors now EXCEED supercomputers for scalar > performance and the performance of microprocessors is not yet stagnant. > On scalar codes, commodity microprocessors ARE the fastest machines at > any price and custom cpu architectures are doomed in this market.
Alas, I believe you have been sucked into the MIPS=Performance fallacy. There is *not* a simple relationship between something as basic as scalar performance and something as complex as overall application (or even routine) performance.
Case in point: The R2000 chipset implemented on the R/120 (mentioned by others in this conversation) has, by all measures *excellent* scalar performance. One would benchmark it at about 12-14 times a microVAX. However, in real-world, doing-useful-work, not-just-simply-benchmarking situations, one finds that actual performance (i.e., performance in very simple routines with very simple algorithms doing simple floating point operations) is about 1/2 that expected.
Why? Because memory bandwidth is *not* as good on an R2000 as it is on other machines, even machines with considerably "slower" processors. There are several components to this, the most important being the cache implementation on an R/120. Other implementations using the R2000/R3000/Rx000 chipsets might well do much better, but only with considerable effort and cost, both of which mean that those "better" implementations will begin to approach the price/performance of the "big" machines that you argue will be killed by the price/performance of commodity microprocessors.
I think you are to a degree correct, but one must always tailor such generalities with a dose of real-world applications. I didn't, and I got bit to the tune of a fine bottle of wine. :-(
Phil
Philip A. Naecker Consulting Software Engineer Internet: p...@propress.com Suite 101 uunet!prowest!pan 1010 East Union Street Voice: +1 818 577 4820 Pasadena, CA 91106-1756 FAX: +1 818 577 0073 Also: Technology Editor DEC Professional Magazine
In article <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov (Eugene Brooks) writes:
(Another amusing challenge:)
>After all, these little critters mutate and become more voracious every >6 months and vectorizable code is the only thing left for them to conquer.
(I like the picture of fat computer vendors, or at least fat marketing depts, hunched together in bunkers hiding from the killer micros. I have no doubt that they are planning a software counterattack. Watch out for a giant MVS robot built to save the day! :-)
>No NEW technology needs to be developed, all the micro-chip and memory-chip >makers need to do is to decide to take over the supercomputer market.
> They will do this with their commodity parts.
The only problem I see with this is the interconnection technology. The *rest* of it is, or will soon be, commodity market stuff.
>Supercomputers of the future will be scalable multiprocessors made of many >hundreds to thousands of commodity microprocessors.
The appropriate interconnection technology for this has not, to my knowledge, been determined. Perhaps you might explain how it will be done? The rest, I agree, is doable at this point, though some of it is not trivial.
Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamas...@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
In article <1...@csinc.UUCP> rpeg...@csinc.UUCP (Rob Peglar x615) writes:
>In article <35...@lll-winken.LLNL.GOV>, bro...@maddog.llnl.gov writes: >that point for almost a generation. I believe that it will take at least >one more generation - those who weaned on machines from CDC, then CRI - >before a more reasonable approach to machine procurement comes to pass.
In my experience, gov't labs are very cost conscious. I could tell a lot of stories on this. Suffice it to say that many people who have come to gov't labs from private industry get frustrated with just how cost conscious the gov't can be (almost an exact quote: "In my last company, if we needed another 10GBytes, all we had to do was ask, and they bought it for us." That was when 10 GBytes cost $300 K.) The reason supercomputers are used so much is that they get the job done more cheaply. You may question whether or not new nuclear weapons need to be designed, but I doubt if the labs doing it would use Crays if that were not the cheapest way to get the job done. Private industry concerns with the same kinds of jobs also use supercomputers the same way. Oil companies, for example. At various times, oil companies have owned more supercomputers than govt labs.
>Thus, I disagree that there will *always* be a market for this sort of >thing. Status symbols may be OK in cars, but for machines purchased with >taxpayer dollars, the end is near. Hence, Brooks' "attack of the killer >micros".
I will make a reverse claim: People who want status symbols buy PC's for their office. These PC's, the last time I checked, were only 1/1000th as cost effective at doing scientific computations as supercomputers. Talk about *waste*... :-)
Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamas...@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
In article <33...@ames.arc.nasa.gov> lamas...@ames.arc.nasa.gov (Hugh LaMaster) writes: >>Supercomputers of the future will be scalable multiprocessors made of many >>hundreds to thousands of commodity microprocessors.
>The appropriate interconnection technology for this has not, to my knowledge, >been determined. Perhaps you might explain how it will be done? The rest, >I agree, is doable at this point, though some of it is not trivial.
This is the stuff of research papers right now, and rapid progress is being made in this area. The key issue is keeping the components which establish the interconnect from costing much more than the microprocessors, their off chip caches, and their main memory. We have been through message passing hypercubes and the like, which minimize hardware cost while maximizing programmer effort. I currently lean to scalable coherent cache systems which minimize programmer effort. The exact protocols and hardware implementations which work best for real applications are a current research topic. The complexity of the situation is much too high for a vendor to just pick a protocol and build without first running very detailed simulations of the system on real programs.
In article <MCCALPIN.89Oct16141...@masig3.ocean.fsu.edu> mccal...@masig3.ocean.fsu.edu (John D. McCalpin) writes:
>The larger chip vendors are paying more attention to parallelism now, >but it appears to be in the context of 2-4 processor parallelism. It >is not likely to be possible to make these chips work together in >configurations of 1000's with the application of "glue" chips....
These microprocessors, for the most part, are being designed to work in a small processor count, coherent cache, shared memory environment. This is the reason why examining scalable coherent cache systems is so important. The same micros, with their capability to lock a cache line for a while to do an indivisible op, will work fine in the scalable systems. I agree that they won't be optimal, but they will be within 90% of optimal and that is all that is required. The MAJOR problem with current micros in a scalable shared memory environment is their 32 bit addressing. Unfortunately, no 4 processor system will ever need more than 32 bit addresses, so we will have to BEG the micro vendors to put in bigger pointer support.
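A rough sketch of why 32 bit addressing bites only at scale. It assumes byte addressing and a single flat shared address space divided evenly among the processors; the processor counts and the function name are hypothetical, chosen only for illustration.

```python
# Illustrative arithmetic only: the processor counts below are assumptions,
# not configurations mentioned in the post.
ADDRESS_BITS = 32


def bytes_per_processor(num_processors: int, address_bits: int = ADDRESS_BITS) -> int:
    """Even share of a flat, byte-addressed shared address space."""
    return (2 ** address_bits) // num_processors


for n in (4, 64, 1024):
    share_mib = bytes_per_processor(n) / 2 ** 20
    print(f"{n:5d} processors: {share_mib:8.1f} MiB of address space each")
```

Under these assumptions a 4 processor system has 1 GiB of address space per processor, while a 1024 processor system is left with 4 MiB each, which is one way to see why the small systems never feel the limit.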
>This is not to mention the fact that software technology for these >parallel supercomputers is depressingly immature. I think traditional >moderately parallel machines (e.g. Cray Y/MP-8) will be able to handle >existing scientific workloads better than 1000-processor parallel >machines for quite some time....
The software question is the really hairy one; that is why LLNL is sponsoring the Massively Parallel Computing Initiative. We see scalable machines being very cost effective and are making a substantial effort in the application software area.
In article <33...@ames.arc.nasa.gov> lamas...@ames.arc.nasa.gov (Hugh LaMaster) writes: >I will make a reverse claim: People who want status symbols buy PC's for their >office. These PC's, the last time I checked, were only 1/1000th as cost >effective at doing scientific computations as supercomputers. Talk about >*waste*... :-)
A "PC" with a MIPS R3000 or an Intel i860 in it is about 70 times more cost effective than a supercomputer for scalar codes, and we run a lot of those on our supercomputers at LLNL. It is about 3 to 7 times more cost effective for highly vectorized codes. In fact, much to our computer center's dismay, research staff are voting with their wallets and buying these "PC"s in droves. Our computer center is responding by buying microprocessor powered machines, currently in bus based shared memory multiprocessor form, but eventually in scalable shared memory multiprocessor form.
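A minimal sketch of how the two ratios above combine for a mixed workload. The 70x scalar figure and the 3x to 7x vector range come from the post; the 50% scalar share and the function name are assumptions made only for the example.

```python
def blended_advantage(scalar_share: float, scalar_ratio: float, vector_ratio: float) -> float:
    """Overall cost-effectiveness advantage of the micro for a mixed workload.

    scalar_share is the fraction of supercomputer spending that goes to
    scalar codes (hypothetical input). Costs add, so the ratios combine
    harmonically rather than as a simple average.
    """
    relative_cost = scalar_share / scalar_ratio + (1.0 - scalar_share) / vector_ratio
    return 1.0 / relative_cost


# Assumed 50/50 split between scalar and highly vectorized work.
for vector_ratio in (3.0, 7.0):
    advantage = blended_advantage(0.5, 70.0, vector_ratio)
    print(f"vector ratio {vector_ratio:.0f}x -> blended advantage {advantage:.1f}x")
```

With that assumed split the blended advantage comes out near 5.8x to 12.7x: the vectorized half dominates the cost, so the overall figure sits much closer to the vector ratio than to 70.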
This came from various people - the references are so confusing I removed them so as not to put the wrong words in someone's mouth:
>>>Supercomputers of the future will be scalable multiprocessors made of many >>>hundreds to thousands of commodity microprocessors.
>This is the stuff of research papers right now, and rapid progress is being >made in this area. The key issue is not having the components which establish >the interconnect cost much more than the micros, their off chip caches, >I currently lean to scalable coherent cache systems which minimize programmer >effort. The exact protocols and hardware implementation which work best >for real applications is a current research topic.
Last year, I took a graduate level course in parallel computing here at Princeton. I would like to make the following comments, which are my *own*:
1) There is currently no parallel machine that works faster than non-parallel machines for the same price. The "fastest" machines are also non-parallel - these are vector processors.
2) A lot of research is going on - and has been going on for over 10 years now. As far as I know, no *really* scalable parallel architecture with shared memory exists that will scale far above 10 processors (e.g. to 100). And it does not seem to me this will be possible in the near future. "A lot of research" does not imply any effective results - especially in CS - just take a look at how many people write articles improving time from O(N log log N) to O(N log log log N), which will never be practical for N<10^20 or so (the log log is just an example; you know what I mean).
3) Personally I feel parallel computing has no real future, as the single cpu gets a 2-4 fold performance boost every few years, and parallel machine construction just can't keep up with that. It seems to me that for at least the next 10 years, non-parallel machines will still give the best performance and the best performance/cost.
4) I think Cray-like machines will be here for a long long time. People talk about Cray-sharing. This is true, but when an engineer needs a simulation to run and it takes 1 day each time, if you run it on a 2 or 3 day machine, he sits doing nothing for that time, which costs you a lot, i.e. it is turn-around time that really matters. And while computers get faster, it seems software complexity and the need for faster and faster machines are growing even more rapidly.
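The growth argument in point 3 can be put into numbers with a short sketch. The 2-4 fold range is from the post; the 3 year period standing in for "every few years", the 100x target, and the function name are all assumptions for illustration, and the target presumes an ideal linear speedup on 100 processors.

```python
import math


def years_to_match(target_speedup: float, fold: float, period_years: float) -> float:
    """Years of compounding single-cpu growth needed to reach target_speedup.

    fold is the performance multiple gained every period_years
    (both are hypothetical inputs here).
    """
    return period_years * math.log(target_speedup) / math.log(fold)


# Assumed: one 2x or 4x boost every 3 years, target of 100x.
for fold in (2.0, 4.0):
    years = years_to_match(100.0, fold, 3.0)
    print(f"{fold:.0f}x every 3 years -> about {years:.1f} years to reach 100x")
```

Under these assumptions a single cpu needs roughly 10 to 20 years to match an ideal 100x parallel speedup, so the conclusion is quite sensitive to which end of the 2-4 fold range actually holds.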
In article <12...@cgl.ucsf.EDU> sei...@cgl.ucsf.edu (George Seibel) writes:
* In <35...@lll-winken.LLNL.GOV> bro...@maddog.llnl.gov wrote:
* > On scalar codes, commodity microprocessors ARE the fastest machines at
* > any price and custom cpu architectures are doomed in this market.
*
* Speaking of "commodities",...
* ...
* *commodity Mflops*. The issue is no longer performance at any cost, because
* if it was you would order another machine at that point. The important
* thing is Mflops/dollar for most people, and that's where the micros are
* going to win in a lot of cases.

---- well, first...
Maybe, even, the _commodity_ is _not_ _M_flops per dollar, but just _flop_flops per dollar? That is, if the cycle time to "set up the problem", "crunch the numbers", "get the plot/list/display" is under _whatever_ upper limit fits with _my_ mode of "useful work", then I very likely _do_not_care_ if it gets any shorter (i.e. if the _flop_flops per second per dollar goes higher). This becomes, IMHO, even more significant if my "useful" cycle time is available to me _truly_ whenever _I_ darn well feel the need. All of which works, again, to the advantage of microcrunchers.

---- and, second...
A non-trivial part of the demand for megacrunchers, IMHO, stems from solution methods which have evolved from the days when _only_ "big" machines were available for "big" jobs (any jobs?) and _just_had_to_be_ shared. For what _I_ do, anyhow, (and probably a _lot_ of other folk somewhere out there), the "all-in-one-swell-foop" analyses/programs/techniques are not the _only_ way to get to the _results_ needed to do the job--and they may well _not_ be the "best" way. I often find that somewhat more of somewhat smaller steps get me to my target faster than otherwise. That is, if I can only get 1 or 2 or 10 passes per day through the megacruncher, the job takes more work from me and more time on the calendar and more bucks from whoever is paying the tab, than if I can get just as many as I need of smaller passes.

---- also third...
And those smaller passes may well be easier (and thus faster) to program, and more amenable to validation/assurance/etc. And they may admit algorithms which work plenty fast on a dedicated machine even if it is pretty small but would not work very fast at all on a shared machine even if it is quite big (maybe especially because it is "big architecture").

---- so, finally...
I believe in micros.

-------------
regardz, Ken Leonard
In article <33...@ames.arc.nasa.gov> lamas...@ames.arc.nasa.gov (Hugh LaMaster) writes:
[...]
>Does anyone have any suggestions on how the connections into and out of such >memory subsystems could be made without a Cray-sized bundle of connectors?
[...] Multiplexed optical busses driven by integrated receivers with the optics, decoders, and logic-level drivers on the same substrate. It's the obvious solution (one I think many companies are working on).
DISCLAIMER: This opinion is in no way related to my employment with Convex Computer Corporation. (As far as I know we aren't working on optical busses, but then I'm not in New Products).