> In the last couple of decades the exponential increase in computer
> performance was because of the advancements in both computer
> architecture and fabrication technology.
Call me cynical, but I can't think of any advances in computer
architecture that have entered the mainstream in the last twenty years.
Surely it's *all* been in fabrication.
> What will be the case in the future? Can I say that the next major
> leap in computer performance will come not from breakthroughs in
> computer architecture but rather from new underlying technology?
I see no reason for the fabrication not to improve, but whether those
improvements will boost single-core speed seems, umm, less certain. Right
now, everyone is going parallel, despite the fact that no-one has much
idea of how to use that parallel power. I'd say the need for research has
never been greater.
Agreed. Most of the complications of the past 20 years have either
been old techniques reintroduced, or of dubious advantage. That
isn't to denigrate the engineers, but the objectives they were given.
>I see no reason for the fabrication not to improve, but whether those
>improvements will boost single-core speed seems, umm, less certain. Right
>now, everyone is going parallel, despite the fact that no-one has much
>idea of how to use that parallel power. I'd say the need for research has
>never been greater.
I can. Until now, the problems were (hard) engineering ones; they
are now (hard) physics ones. The killer is to obtain the advantages
of smaller processes without the major downsides: power consumption,
unreliability and fab cost. It's got a LOT tougher.
I agree about parallelism, of course.
Regards,
Nick Maclaren.
Instruction set architecture: multi-media extensions
micro-architecture: 2-bit branch prediction
Yes, but ultimately boring and really just rearranging the deck chairs.
Compared to the 70's and 80's the pace of development is essentially
static. Then there were dozens (?) of architectures. Now we have what
Intel and AMD deem good for us, i.e. what makes them the most money for
as long as possible. In this respect, there's a basic conflict between
the aims of science / pursuit of excellence and the aims of business.
Don't expect any advance from the established order. They know far too
much already :-)...
Regards,
Chris
After having designed an instruction set I think there are some
architectural improvements to be made, but they are few. Most of my
research is now in mathematics.
A few of the architecture and fab improvements I see happening are:
1. DISCO-FETs for faster lower power switching.
2. A simpler instruction set.
3. Multiple cores set up as an execution ring, with memory splitting,
cache splitting.
4. New number systems.
cheers jacko
http:/sites.google.com/site/jackokring
> Yes, but ultimately boring and really just rearranging the deck chairs.
> Compared to the 70's and 80's the pace of development is essentially
> static.
Name one feature introduced in the 80s which you consider not
"rearranging the deck chairs". For extra credit identify the
architecture in the 50s and 60s which first used this feature (or a
variant).
>
> Name one feature introduced in the 80s which you consider not
> "rearranging the deck chairs". For extra credit identify the
> architecture in the 50s and 60s which first used this feature (or a
> variant).
No credits to gain, sorry. Not a computer architect, but do have a
general interest and design hardware and write software around
architecture. My comment about progress has more to do with performance
gains, apparent new directions and willingness to take risks in
computing, which seemed very significant in the 70's and 80's. The 3
year old Xeon machine on the desktop here seems not much faster than the
10+ year old 1997 Alpha box recently retired, so I naturally wonder,
what's happened in the meantime? All the hardware parts look familiar,
even down to the DOS-ish BIOS and PCI slots, when one would have
expected to see something very different after 10 to 15 years of
'progress'. OK, we have PCI Express, more and faster memory, dual CPUs
that have heatsinks filling half the box, but what else has changed?
Of course, this is not particularly scientific, but seems to be a valid
point of view as a user of casual interest and suggests that there are
other forces at work. I'm sure that there is, as usual, no shortage of
good ideas.
Seems to me that barriers to progress are as much cultural as
commercial. In the 60's the US put men on the moon and the attitudes
that allowed that to happen are being lost by an aging western
civilisation that has become far too complacent, safe and risk averse.
All this trickles down and becomes pervasive. Add to that the
monopolisation of the architectural gene pool and I'm not expecting much
to happen any time soon...
Regards,
Chris
Sorry, can't let that one go. There have been tremendous improvements in
branch prediction accuracy from the late eighties to today. Without
highly accurate branch prediction, the pipeline is filled with too many
wrong path instructions so it's not worth going to deeper pipelines.
Without deeper pipelines we don't get higher clock rates. So without
highly accurate branch predictors, clock rates and performance would be
much worse than they are today. If we hadn't hit the power wall in the
early 2000s we would still be improving performance through better branch
prediction and deeper pipelines.
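
The argument above can be put into a toy Python model. This is only a sketch, not a measurement: it assumes the misprediction penalty in cycles equals the pipeline depth, that one instruction in five is a branch, and that the cycle time is a fixed logic delay divided by the depth plus a small latch overhead. The function names and every parameter value are illustrative assumptions.

```python
def time_per_instruction(depth, mispredict_rate, branch_fraction=0.2,
                         logic_delay_ns=10.0, latch_overhead_ns=0.1):
    """Average time per instruction (ns) in a toy pipeline model."""
    # Splitting the same logic over more stages shortens the clock period.
    cycle_ns = logic_delay_ns / depth + latch_overhead_ns
    # Assumption: every mispredicted branch costs a full pipeline refill.
    cpi = 1.0 + branch_fraction * mispredict_rate * depth
    return cycle_ns * cpi


def speedup_from_deepening(shallow, deep, mispredict_rate):
    """How much faster the deeper pipeline is at a given misprediction rate."""
    return (time_per_instruction(shallow, mispredict_rate)
            / time_per_instruction(deep, mispredict_rate))


if __name__ == "__main__":
    for rate in (0.10, 0.02):
        gain = speedup_from_deepening(10, 30, rate)
        print(f"mispredict rate {rate:.0%}: 10 -> 30 stages gives {gain:.2f}x")
```

In this model going from 10 to 30 stages pays off more when the predictor is more accurate, which is the direction of the argument; the absolute numbers mean nothing.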
Trace cache is another more-or-less recent microarchitectural innovation
that allowed Pentium 4 to get away with decoding one x86 instruction
per cycle and still have peak IPC greater than 1.
Cracking instructions into micro-ops, scheduling the micro-ops, then fusing
the micro-ops back together in a different way later in the pipeline allows
an effectively larger instruction window and more efficient pipeline.
That's a relatively recent innovation, too.
History-based memory schedulers are another recent innovation that
promises to improve performance significantly.
MIT built RAW and UT Austin built TRIPS. These are really weird
architectures and microarchitectures that could be very influential
for future processors.
Not to mention network processors and GPUs. See Hot Chips proceedings
for more examples of microarchitectural innovation in real chips, and
ISCA/MICRO/HPCA for more speculative stuff.
--
Daniel Jimenez djim...@cs.utexas.edu
"I've so much music in my head" -- Maurice Ravel, shortly before his death.
" " -- John Cage
Actually trace cache goes back to the VAX HPS, circa 1985.
They called the decoded instruction cache a "node cache".
As far as I know, VAX HPS was never built though.
> Cracking instructions into micro-ops, scheduling the micro-ops, then fusing
> the micro-ops back together in a different way later in the pipeline allows
> an effectively larger instruction window and more efficient pipeline.
> That's a relatively recent innovation, too.
Except for the fused micro-ops, this was also VAX HPS.
See
Critical Issues Regarding HPS, A High Performance Architecture
Pratt, Melvin, Hwu, Shebanow
ACM 1985
Eric
Was it a trace cache, i.e., were decoded instructions stored in order of
execution rather than the order of the program text?
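
For readers unfamiliar with the distinction being asked about, here is a small Python sketch. The six-instruction program, the mnemonics and the dictionary layout are all made up for illustration; real trace caches differ in many details (trace length limits, multiple branches per trace, partial matches).

```python
# Hypothetical mini-program: address -> (mnemonic, branch target or None)
PROGRAM = {
    0: ("cmp", None),
    1: ("beq", 4),      # conditional branch to address 4
    2: ("add", None),
    3: ("jmp", 5),      # unconditional jump to address 5
    4: ("sub", None),
    5: ("store", None),
}


def execute_path(start, branch_taken, length):
    """Return the addresses visited, in execution order."""
    path, pc = [], start
    while len(path) < length and pc in PROGRAM:
        path.append(pc)
        op, target = PROGRAM[pc]
        if op == "jmp" or (op == "beq" and branch_taken):
            pc = target
        else:
            pc += 1
    return path


# Decoded instruction cache: one entry per address, in program-text order.
decoded_cache = {addr: op for addr, (op, _) in PROGRAM.items()}

# Trace cache: one entry per (entry address, branch outcome), holding the
# decoded ops in the order they were executed.
trace_cache = {}
for taken in (False, True):
    path = execute_path(0, taken, length=4)
    trace_cache[(0, taken)] = [PROGRAM[addr][0] for addr in path]
```

The decoded cache has to be probed once per address and follows the program layout; each trace cache entry has a single entry point and delivers a whole executed sequence, including ops from both sides of a taken branch.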
>> Cracking instructions into micro-ops, scheduling the micro-ops, then fusing
>> the micro-ops back together in a different way later in the pipeline allows
>> an effectively larger instruction window and more efficient pipeline.
>> That's a relatively recent innovation, too.
>
>Except for the fused micro-ops, this was also VAX HPS.
The fused micro-ops is the innovation. It allows an effectively larger
instruction window, so more ILP, more performance. One can argue that
micro-ops are equivalent to microcode, which predates minis.
They were discussing a number of design issues and
evaluating different possible ways of implementing it,
but yes I think it is the same idea.
They state the nodes are entered into the node cache
in decode order, and must be merged with certain non-microcode
values like immediate literal instruction constants.
Also it must handle 'vax-isms' such as the procedure call
instructions CALLG/CALLS picking up their register save mask
from the routine entry point, and save that into the cache.
>>> Cracking instructions into micro-ops, scheduling the micro-ops, then fusing
>>> the micro-ops back together in a different way later in the pipeline allows
>>> an effectively larger instruction window and more efficient pipeline.
>>> That's a relatively recent innovation, too.
>> Except for the fused micro-ops, this was also VAX HPS.
>
> The fused micro-ops is the innovation. It allows an effectively larger
> instruction window, so more ILP, more performance. One can argue that
> micro-ops are equivalent to microcode, which predates minis.
But for the VAX, the instr. decode was very much a bottleneck
because it was originally designed as an LL sequential parse.
The node cache bypasses that bottleneck and, in theory,
allows parallel scheduling of micro-ops to their
OoO function units.
Eric
I am not so sure.
There are significant improvements to be had in single thread
performance by going to really large instruction windows. Multilevel
instruction windows. The key is how to do this in a smart and power
efficient manner. Such as I have patent pending, on inventions I made
outside Intel. (I owe y'all an article on this.)
I doubt that this will deliver performance improvements linear in the
number of transistors. However, all of the evidence that I have seen
indicates that it will deliver performance proportional to the square
root of the number of transistors.
By the way, some people call this - performance proportional to square
root of the number of transistors - Pollack's Law. Fred Pollack, my old
boss, presented it at some big conferences. Myself, I told Fred about
this "law", which I first encountered in Tjaden and Flynn's paper that
said performance is proportional to the square root of the number of
branches looked past. After encountering such square root laws in
several places, I conjectured the generalization, which seems to be
confirmed by many metrics. To differentiate myself, let me conjecture
further that in a space of dimension d, performance is proportional to
the number of devices raised to the power (d-1)/d. E.g. in 3D, I conjecture
that performance is proportional to the 2/3 power of the number of
devices.
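
The scaling rule conjectured above is easy to tabulate. A minimal Python sketch, assuming the exponent is (d-1)/d as stated (1/2 in 2D, which is the square-root law); the function name is made up, and the rule itself is a conjecture, not an established result.

```python
def relative_performance(device_ratio, dimensions=2):
    """Conjectured performance gain for a given growth in device count."""
    exponent = (dimensions - 1) / dimensions   # 1/2 in 2D, 2/3 in 3D
    return device_ratio ** exponent


if __name__ == "__main__":
    for ratio in (2, 4, 8, 64):
        print(ratio,
              round(relative_performance(ratio, 2), 2),
              round(relative_performance(ratio, 3), 2))
```

So under the 2D rule four times the transistors buys roughly twice the performance, while the conjectured 3D rule would give about four times the performance for eight times the devices.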
I suspect that there are significant improvements in parallel processing
to be made, most likely in the "how to make it easier" vein. I'm in the
many, many, processors camp.
I believe that there is significant potential to apply parallelism to
improve single thread performance. Speculative multithreading, SpMT. I
wish Haitham Akkary luck as he carries the torch for this research. DMT.
I wish I could do the same.
Along the lines of technology, as indicated above I suspect that 3D
integration could bring many benefits. But heat dissipation is such a
big problem that I doubt that it is reasonable to hope for this in the
next 10 years. I.e. I doubt that we will have cubes of logic intermixed
with memory 1 cm on a side. However, incremental progress will be made.
2-4 layers of transistors within 10 years.
Although smaller, faster, more power efficient devices are always a
possibility, I think that the human brain points out the capabilities of
relatively slow computation, albeit with complex elements and high
connectivity. I suppose this counts as technology, although not
necessarily on the traditional axis of evolution.
Sorry, no.
The HPS (and HPSm) node cache was not a trace cache. It did not have a
single entry point for a trace of instructions.
I invented the trace cache, or at least the term "trace cache", while
taking the first class Wen-mei Hwu (the H in HPS) taught after receiving his
Ph.D. and coming to UIUC in 1986 or 1987. I invented it to solve the
problems that a decoded instruction cache had with variable length
instructions (and also to support forms of guarded execution, what would
now be called control independence or hammocks or hardware if
conversion). Wen-mei was my MS advisor. I am sure that he would have
informed me if the trace cache was just the node cache rehashed (and
given me a bash, and thrown it in the trash, and not given me any cash).
Alex Peleg and Uri Weiser may have preceded me in inventing the trace
cache, and certainly patented it first. (I never patented anything at
UIUC, or prior to Intel.) But so far as I know, I invented the term
"trace cache", and popularized it at Intel in 1991, before Peleg and Weiser.
Oh, really? I don't see it. The big difference is that the modern
processes remove the size limits that made earlier branch predictors
relatively ineffective. And that's not architecture.
>History-based memory schedulers are another recent innovation that
>promises to improve performance significantly.
Don't bet on it. A hell of a lot of the papers on the underlying
requirements are based on a gross abuse of statistics. They make
the cardinal (and capital) error of confusing decisions based on
perfect knowledge (i.e. foresight) with admissible decision rules
(which can use only history).
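
The distinction can be shown with a deliberately trivial Python sketch. It is a generic illustration, not a model of any memory scheduler: the outcome stream is a seeded sequence of coin flips, and the policy names are made up. The "oracle" is allowed to see the outcome it is predicting; the admissible rule sees only the past.

```python
import random


def accuracy(policy, outcomes):
    """Fraction of outcomes a policy guesses correctly."""
    hits = 0
    for i, actual in enumerate(outcomes):
        # Each policy is handed the past and the future; an admissible
        # rule must ignore the second argument.
        if policy(outcomes[:i], outcomes[i:]) == actual:
            hits += 1
    return hits / len(outcomes)


def oracle(past, future):
    """Perfect knowledge: peeks at the outcome it is asked to predict."""
    return future[0]


def last_value(past, future):
    """Admissible rule: repeats the most recent outcome, ignores the future."""
    return past[-1] if past else True


def coin_flips(n, seed=0):
    """Reproducible sequence of n fair coin flips as booleans."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n)]


if __name__ == "__main__":
    flips = coin_flips(2000)
    print("oracle:    ", accuracy(oracle, flips))
    print("last value:", accuracy(last_value, flips))
```

On a stream with no exploitable history the oracle is always right and the admissible rule is near chance, so a bound derived from the former says nothing about what the latter can achieve.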
The point here is that, as with branch prediction, a hell of a lot
of the important problem codes are precisely those that are least
amenable to such optimisations. The ONLY solution is to get them
rewritten in a more civilised paradigm.
Regards,
Nick Maclaren.
Isn't that all y'all?
>I doubt that this will deliver performance improvements linear in the
>number of transistors. However, all of the evidence that I have seen
>indicates that it will deliver performance proportional to the square
>root of the number of transistors.
Not what I have seen. The point here is that what you say IS true
for the better HPC and many benchmarketing codes, but isn't for
many of the problem codes. That's an old 1970s problem, when the
usual distinction was between user and supervisor mode code - the
former gained a lot of benefit from cache, but the latter didn't.
Again, that brings back my well-worn hobby-horse: NO such advance
can be of near-universal benefit unless we change to using coding
paradigms and programming languages that are better suited to such
optimisations.
Regards,
Nick Maclaren.
> In this respect, there's a basic conflict between the aims
> of science / pursuit of excellence and the aims of business.
Once upon a time there was an attempt to satisfy "the aims of science /
pursuit of excellence".
It was called Itanium. Turned out to be a multi-billion dollar flop. No more
projects like that please!
Best regards
Piotr Wyderski
All of computing has been deck chair Olympics since the Babbage age.
RISC being smaller chairs for easier type decoding. I think the multi
core approach is the way forward. The almost linear potential
improvements which may become possible are the ideal. The idea of
passing register sets between cores and splitting the cache and memory
into sections related to each core does free up some transistors, but
does mean sometimes a core can not execute anything ... but the cores
can be small...
The basic flexibility of instruction software sequencing to reduce the
size of hardware, is what micros were about. Speed innovations are
needed for some tasks, but often these are better done by custom
functional units. For your average divx video user 800MHz is ok.
I suggest DISCO-FETs for efficiency of switching, not necessarily for
extra speed. With simple instruction sets, cache line unit of
execution can be done by configuration of a 32 ALU etc. soft
configuration, and the cache line can be executed in one go. The
pipeline concept does inhibit some of the speed in this design, due to
the differing time for each execution step. Time = Max. Steps instead
of Time = sum of Steps.
cheers jacko
p.s. I think this was called async design, but pipelines were easier
to organize.
Yes, really. Recent branch predictors have been made far more accurate by
techniques that are orthogonal to the size of the predictor. Two-level
branch prediction was introduced in 1991 and was the first big improvement
over a PC-indexed table of counters. However, even with a ridiculously huge
two-level predictor you can only go so far in terms of accuracy, and you lose
with latency. EV6 featured a tournament-style hybrid predictor choosing
between local and global history predictors; this is more accurate than a
larger two-level predictor. Intel's recent offerings have featured a loop
predictor capable of perfectly predicting loops with trip counts up to 64
(IIRC); that's impossible with two-level predictors. Intel has also recently
started putting indirect branch predictors in their products giving an
additional performance boost for e.g. dense switch statements and virtual
dispatch; until indirect branch predictors came along, the best you could
do was hope the target is cached in the BTB, which it often isn't for
complex control flow.
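
A small Python sketch of the first step described above: a PC-indexed table of 2-bit counters against a history-based predictor, run on a single loop branch with a trip count of 4. The class names, table sizes and branch address are assumptions for illustration, and the history-based predictor uses gshare-style indexing (PC XOR global history) as a simple stand-in for the two-level family, not the original 1991 design.

```python
class Bimodal:
    """PC-indexed table of 2-bit saturating counters."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)      # start at "weakly taken"

    def index(self, pc):
        return pc & self.mask

    def predict(self, pc):
        return self.table[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


class GlobalHistory(Bimodal):
    """Same counters, but indexed by PC XOR recent branch outcomes."""

    def __init__(self, bits=10):
        super().__init__(bits)
        self.history = 0

    def index(self, pc):
        return (pc ^ self.history) & self.mask

    def update(self, pc, taken):
        super().update(pc, taken)           # train the counter just used
        self.history = ((self.history << 1) | int(taken)) & self.mask


def mispredictions(predictor, trace):
    """Count wrong guesses over a list of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


def loop_trace(iterations, trip_count=4, pc=0x40):
    """One loop branch: taken trip_count-1 times, then not taken."""
    return [(pc, i % trip_count != trip_count - 1) for i in range(iterations)]


if __name__ == "__main__":
    trace = loop_trace(4000)
    print("bimodal:       ", mispredictions(Bimodal(), trace))
    print("global history:", mispredictions(GlobalHistory(), trace))
```

The plain counter mispredicts every loop exit, while the history-based predictor learns the exit after a short warm-up. The same idea stops working once the trip count exceeds the history length, which is where a dedicated loop predictor earns its keep.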
Exploiting larger tables enabled by improved process technology is not
trivial, either. The latency of a larger table means you have to play some
tricks to get a prediction in a single cycle, e.g. ahead-pipelining (not
pipelining, *ahead*-pipelining). Being able to do that without losing
accuracy is another recent innovation (i.e. how do you predict a branch
if you don't know which branch you're predicting? It's not easy.)
In terms of research, branch prediction stagnated in the late 1990s but
in the early to mid 2000s there was a resurgence of activity exploring
all kinds of weird ideas for improving accuracy, including some very
practical ones. Seznec's skewed predictor would have been in EV8 had it
not been cancelled; the clever organization and update policy of this
predictor make it more accurate than a two-level predictor with more than
twice the area.
My apologies - I did not mean to imply these were equivalent,
and the paper doesn't use the term 'trace cache', as I noted.
HPS wasn't a trace cache machine, but it is so similar that it
probably planted the seeds that became trace caching.
The functional description in the paper I cited and others
in the HPS series have many similarities to a trace cache idea:
caching the risc-y micro-ops of decoded instructions,
merging of instruction literal values into the cache,
repairs (aka replay traps), register renaming,
Tomasulo OoO execution, etc.
The cache structure itself was a question because of the
one-to-many mapping of instructions to nodes (micro-ops).
They discuss the cache space allocation and fragmentation
issues but do not propose a specific solution
(two level index? LRU? garbage collect?).
The micro ops are not physically stored together as a
basic block, so that leap has not been made yet.
However all the pieces are in play.
This all left me with the impression that HPS was an
early step along the design path that led to Pentium 4.
PDFs of some of these HPS papers are available online at
http://www.zytek.com/~melvin/pubs.html
An easy read of historical interest is:
"An HPS Implementation of VAX: Initial Design and Analysis", 1986
Eric
Well, the flop is still a multi-billion dollar business I think, and
we support some thousand(s) of them that are actively used by some of
the world's largest companies for a substantial portion of their
critical computing needs.
There were perhaps more new and interesting ISA features in Itanium
than in any other I can recall. You may not like them, or may not
consider them a success, but it's definitely in the realm of "new
and interesting within the last 20 years" in at least a few respects.
G.
Of course everyone ignored my suggestion that old architects might
want to try reading some new cellular biology texts, but I think
there are many interesting sources for inspiration there. Like the
fact that you can have interconnect that requires zero energy, etc.
G.
Here's the real deal.
As my friend, whom I otherwise respect, says, the discovery of
computers was one of the most signal events in human history, perhaps
the most important since the discovery of writing. Ask anyone in the
business.
Everyone recognized that reality, and the smartest intellects ever,
including the dead and the as yet to be born, were conjured, and
everything that could be done was done in the first six days. Ask
anyone in the business. They were there.
Robert.
Why do you think you were ignored?
Some of us even went and took classes on the topic.
Well, I am, and I strongly disagree. The steam engine was FAR more
important, as the concept it introduced led to all forms of powered
machinery. I could mention other things, including chemistry.
Computers are vastly overrated.
>Everyone recognized that reality, and the smartest intellects ever,
>including the dead and the as yet to be born, were conjured, and
>everything that could be done was done in the first six days. Ask
>anyone in the business. They were there.
Well, I wasn't there at the beginning of the modern era, but knew
a fair number of people who were. There is some truth in that, but
it is more accurate to say that everything that was done later had been
invented only in principle, as its delivery needed advances in
process technology.
People knew how to build a functional steam locomotive in Hero's
day - they didn't have the technological base to do it.
Regards,
Nick Maclaren.
> so I naturally wonder,
> what's happened in the meantime ?.
Rant mode on
Software bloat. The programs I use except for games have not visibly
increased in speed since my first PC. They have developed a lot more
bells and whistles but not got faster. DOS could run programs and a GUI
(GEM) in 512 KB of memory. Windows 3.1 would run with 1 MB, though it
needed 4 MB to get maximum performance. I understand that Windows 7 has a
minimum requirement of 2 GB. Just about all the increases in hardware
speed have been used to run more elaborate software at the same speed.
Rant mode off.
Ken Young
Most of the ancient technology that was built was more a sort of "toy",
maybe to impress people, but not intended to be useful. You had slaves
for that, for useful work. Machines were "magic"¹. Hero of Alexandria
obviously didn't really know how to build a functional steam locomotive,
but he knew how to build nice toys. My father assembled an Aeolipile
out of a used ewer and an old bicycle wheel a few years ago - it's fun
to watch it slowly spinning around (the kindergarten aged children in
the neighborhood like it), but it's so grossly inefficient that you
can't use the power for anything useful.
The breakthrough for the steam engine was when it could provide more
power per cost than humans or animals. This is why many people still
measure engines in HP. The crude ancient steam engines in China had
even been used to power a vehicle, but they weren't a success there
either, so I guess they had not been cost-efficient. Water-mills have
been cost efficient and they were successes both in ancient China and in
Europe.
¹) It's like opening an ancient Chinese "paddle steamer", and
discovering that inside they had just people with pedals turning the
paddle wheels, and no steam engine ;-).
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
I tried googling "MIT RAW" and "UT Austin TRIPS" and got no hits, could
you find some links, there are a bunch of comp.arch readers that would
love to learn more.
>
> > MIT built RAW and UT Austin built TRIPS. These are really weird
> > architectures and microarchitectures that could be very influential
> > for future processors.
>
> I tried googling "MIT RAW" and "UT Austin TRIPS" and got no hits, could
> you find some links, there are a bunch of comp.arch readers that would
> love to learn more.
stream processor mit raw
http://groups.csail.mit.edu/cag/raw/documents/
I know less about TRIPS, but the google
stream processor trips darpa
yields tons of stuff
Robert.
Yeah: and how much more powerful is my phone, than my first graphical
Unix workstation, that I used to do real work? (lots)
I know that's a popular rant: I've even given it myself, from time to
time. Isn't it the case, though, that for most of that "popular
software" speed is a non-issue? Either a given operation is "fast
enough" (and that's not hard to achieve, when the limitations are those
of hands, eyes and brain of the user), or "not fixable by software":
network and server latencies, disk access latencies, and so on.
So competition and Moore's law have given us faster and faster computers,
with higher and higher resolution screens, with graphics systems with
hardware assist for various basic operations (including 3D texture
mapping). Is it any surprise that the fixed required response time gets
burned by rendering *beautifully* kerned arbitrarily scalable text,
rather than blatting out the single fixed-width system font? Or that
various program and desktop icons are lovingly alpha-blended from
scalable vector graphic representations, rather than composed from a
fixed-resolution, four-bit-colour palette?
There are certainly aspects of this whimsical algorithmic flexibility
that jar: how can it possibly take a dual core computer with billions of
instructions per second up its sleeve *many seconds* to pull up the
"recently used documents" menu, every time? (Unless, of course, the
whimsical algorithm that performs that function operates by starting up
all of the applications so that it can ask them, or similar dumbness.)
Cheers,
--
Andrew
>
> There are certainly aspects of this whimsical algorithmic flexibility
> that jar: how can it possibly take a dual core computer with billions of
> instructions per second up its sleeve *many seconds* to pull up the
> "recently used documents" menu, every time?
It's because the people who write software are the smartest on the
planet. Just ask anyone in the business.
There's a story kicking around about Ballmer booting Vista to demo it
for someone and losing it Ballmer-style as the machine sat there
forever without giving a clue as to what it was doing.
To be fair, those gorgeously-expensive graphics are about the only
thing computers have left going for them as far as the average
customer is concerned. That's what the customer sees in the store,
and that's what the customer buys. Nothing else works, really: not
security, not reliability, not response time, not usability.
The only thing that would really make any difference is if computers
really could act intelligent--for example, it knows that it might get
away with the "Recent Documents" foulup once or twice, but not time
after time and could invent a shortcut around the most general case
that works well enough most of the time. Getting a factor of two
through hardware is hard. Gobbling it up several times over in
spaghetti code is easy.
Robert.
This does imply that languages which forbode the software bad days are
the best place to put research. I think method local write variables,
with only one write point, and many read points. Yes, spaghetti
languages are long in the tooth.
cheers jacko
I think your Google is broken.
The first hit for "MIT RAW" on Google for me is this:
http://groups.csail.mit.edu/cag/raw/
The first hit for "UT Austin TRIPS" on Google for me is this:
http://www.cs.utexas.edu/~trips/
These are exactly the right pages to begin learning about these
projects.
>
> This does imply that languages which forbode the software bad days are
> the best place to put research. I think method local write variables,
> with only one write point, and many read points. Yes, spaghetti
> languages are long in the tooth.
Much as I loathe C, I don't think it's the problem.
I *think* the problem is that modern computers and OS's have to cope
with so many different things happening asynchronously that writing
good code is next to impossible, certainly using any of the methods
that anyone learned in school.
It's been presented here as a new problem with widely-available SMP,
but I don't think that's correct. Computers have always been hooked
to nightmares of concurrency, if only in the person of the operator.
As we've come to expect more and more from that tight but impossible
relationship, things have become ever more challenging and clumsy.
All that work that was done in the first six days of the history of
computing was aimed at doing the same thing that human "computers"
were doing: calculating the trajectories of artillery shells. Leave
the computer alone, and it can still manage that sort of very
predictable calculation tolerably well.
Even though IBM and its camp-followers had to learn early how to cope
with asynchronous events ("transactions"), they generally did so by
putting much of the burden on the user: if you didn't talk to the
computer in just exactly the right way at just exactly the right time,
you were ignored.
Even the humble X-windowing system contemplates an interaction that
would at one time have been unimaginable in the degree of expected
flexibility and tolerance for unpredictability, and the way the X-
windowing system often works in practice shows it, to pick some
example other than Windows.
In summary:
1. The problem is built in to what we expect from computers. It is
not a result of multi-processing.
2. No computer language that I am aware of would make noticeable
difference.
3. Nothing will get better until people start operating on the
principle that the old ideas never were good enough and never will be.
Eugene will tell me that it's easy to take pot shots. Well, maybe it
is. It's also easy to keep repeating the same smug but inadequate
answers over and over again.
Robert.
> In article <ggtgp-1A606D....@netnews.asp.att.net>,
> Brett Davis <gg...@yahoo.com> wrote:
> >In article <hb86g3$fo6$1...@apu.cs.utexas.edu>,
> > djim...@cs.utexas.edu (Daniel A. Jimenez) wrote:
> >> MIT built RAW and UT Austin built TRIPS. These are really weird
> >> architectures and microarchitectures that could be very influential
> >> for future processors.
> >
> >I tried googling "MIT RAW" and "UT Austin TRIPS" and got no hits, could
> >you find some links, there are a bunch of comp.arch readers that would
> >love to learn more.
>
> I think your Google is broken.
I obviously did a news search instead of a web search. My bad.
> The first hit for "MIT RAW" on Google for me is this:
> http://groups.csail.mit.edu/cag/raw/
Lots of broken links, and TRIPS looks better anyway.
> The first hit for "UT Austin TRIPS" on Google for me is this:
> http://www.cs.utexas.edu/~trips/
Some nice info on the architecture, and comparison chart with MIT RAW
and others:
http://www.cs.utexas.edu/users/cart/trips/publications/computer04.pdf
Results with a real uber chip:
http://www.cs.utexas.edu/users/cart/trips/publications/micro06_trips.pdf
No papers since 2006, no real benefit on most (serial) code, nice
speedups on code that has since been rewritten to use x86 vector
instructions that give much the same speedups.
Does not appear to have a die size advantage, could in fact be at a
large disadvantage, and could also have a heat disadvantage.
Adding vector units to the 16 ALUs to make it competitive again is a
non-starter due to space and heat.
Worthy of more research, but still roadkill under the wheels of the
killer micros, much less something an order of magnitude faster, like an
ATI chip.
I would go with an ATI chip and its 1600 vector pipes instead as the
roadmap to the future for mass computation.
Cool info though, TRIPS is the first modern data flow architecture I
have looked at. Probably the last as well. ;(
Brett
It's part of the problem - even worse, the programming paradigms it
implies (and sometimes requires) are not the right ones.
>I *think* the problem is that modern computers and OS's have to cope
>with so many different things happening asynchronously that writing
>good code is next to impossible, certainly using any of the methods
>that anyone learned in school.
The former is not true, and I can't speak for the latter. What WERE
you taught in school? Even calculators were not part of my formal
schooling :-)
There are plenty of good programming paradigms that can handle such
requirements. Look at Smalltalk for an example of a language that
does it better. Note that I said just "better", not "well".
>It's been presented here as a new problem with widely-available SMP,
>but I don't think that's correct. Computers have always been hooked
>to nightmares of concurrency, if only in the person of the operator.
>As we've come to expect more and more from that tight but impossible
>relationship, things have become ever more challenging and clumsy.
There are differences. 40 years ago, few applications programmers
needed to know about it - virtually the only exceptions were the
vector computer (HPC) people and the database people. That is
about to change.
>All that work that was done in the first six days of the history of
>computing was aimed at doing the same thing that human "computers"
>were doing calculating the trajectories of artillery shells. Leave
>the computer alone, and it can still manage that sort of very
>predictable calculation tolerably well.
Sorry, but that is total nonsense. I was there, from the late 1960s
onwards.
>Even though IBM and its camp-followers had to learn early how to cope
>with asynchronous events ("transactions"), they generally did so by
>putting much of the burden on the user: if you didn't talk to the
>computer in just exactly the right way at just exactly the right time,
>you were ignored.
Ditto.
>Even the humble X-windowing system contemplates an interaction that
>would at one time have been unimaginable in the degree of expected
>flexibility and tolerance for unpredictability, and the way the X-
>windowing system often works in practice shows it, to pick some
>example other than Windows.
Ditto. A more misdesigned and unreliable heap of junk, it is hard
to imagine - as soon as the user starts to stress it, some events get
lost or directed to the wrong window. It is common for most of the
forced restarts (and not just session, often system) to be due to a
GUI mishandling error recovery, and the design of the X Windowing
System is integral to that. Microsoft just copied that, by legally
ripping off IBM Presentation Manager.
>In summary:
>
>1. The problem is built in to what we expect from computers. It is
>not a result of multi-processing.
It may be what you expect from them. It is not what either the first
or second generation computer users did (and I count as one of the
latter). You would count as the third, and there is now a fourth.
>2. No computer language that I am aware of would make noticeable
>difference.
Perhaps I am aware of a wider range, because I do.
>3. Nothing will get better until people start operating on the
>principle that the old ideas never were good enough and never will be.
That is PRECISELY the principle on which modern general-purpose
operating systems, interfaces and programming languages WERE designed.
The old mainframe people despair at them reinventing the wheel, and
not even getting the number of sides right :-(
>Eugene will tell me that it's easy to take pot shots. Well, maybe it
>is. It's also easy to keep repeating the same smug but inadequate
>answers over and over again.
Very, very true.
Regards,
Nick Maclaren.
> People knew how to build a functional steam locomotive in Hero's
> day - they didn't have the technological base to do it.
The Disc of Phaistos is a typed document, demonstrating both know-how and
technical base, but printing didn't take off until 1500. And maybe ancient
Iraqis were electroplating, but we had to wait a while for any other
applications of that idea, too.
Maybe we already know how to build scalable parallel machines, and there's
an example collecting dust in a museum somewhere, but the time isn't
right. :) Wouldn't *that* be annoying?
Er, if you think that being able to hand-stamp a clay tablet with
hand-carved seals is a suitable technological base for building a
printer, I suggest that you try it.
>Maybe we already know how to build scalable parallel machines, and there's
>an example collecting dust in a museum somewhere, but the time isn't
>right. :) Wouldn't *that* be annoying?
We do, and have done for a long time. What we don't know how to do,
is to map the majority of the requirements we have into programs that
will run efficiently on the machines we know how to build.
Regards,
Nick Maclaren.
Printing in China had been in use for about 1000 years when movable types
were invented there - and that invention didn't catch on. It was easy
enough to carve the writing into stone or wood and print that way, movable
types are only cost effective if you print in low volume. The main
invention that made printing feasible was the invention of (cheap) paper -
both in Europe and in China.
Coming back to topic: massive parallel machines have been developed in the
past, as well, and are already rotting in museums. Remember Thinking
Machines? MasPar? They all did architectures somewhat similar to current
GPUs more than 20 years ago. Why didn't they catch on? Because it's more
expensive to write massive parallel programs.
That's why we are getting massive parallel CPUs through the wheel of
reinvention, through the GPUs. It's cost efficient to write some small
massive parallel programs for rendering graphics, the main algorithms are
still serial. It's even cost efficient to put a physics engine into the
massive parallel domain. Step by step, the GPGPU takes tasks over from the
CPU.
Maybe we now have reached the sophistication of the tools so that writing
parallel programs is cheap enough. This is my point when looking at these
historical comparisons: It usually wasn't the invention of some principle
(steam engine, movable types) that made the break-through, it usually was
something else that made the resulting machine cost effective. When we
think of the steam engine, we think of James Watt, but in reality, during
the time when James Watt had his patent on a particular improvement of steam
engines, not so many were deployed. The revolution came afterwards, when an
"open source"-like community of steam engine developers and operators
rapidly improved the steam engine, so that it became cost effective in many
more use cases than before.
That's not true. The converse is. Carving a whole block in reverse
and printing from that is only cost-effective if the number of pages
printed is small. You need extremely skilled people to carve them,
in order to keep the error rate down. There are also problems with
movable type and Chinese characters (i.e. the number of them!)
Before moveable type, almost all printing was things like pictures,
prayers, fabric patterns and other uses where the number of different
pages is small.
>The
>invention that made printing feasible was the invention of (cheap) paper -
>both in Europe and in China.
That's true.
Regards,
Nick Maclaren.
You need to get that equation right: The initial setup cost of carving a
whole block is high, so it is cost-effective if the number of pages printed
with one carved block is also high. You have to print the same page quite
often to amortize the carving, but it doesn't matter if you do so with just
a few pages, or with many pages - as long as you are able to sell the whole
book in 10k units, carving is worth the effort. Movable types don't last as
long as carved blocks. Movable types finally were accepted in China and
surrounding nations when metallurgy had advanced sufficiently to make the
movable types last long enough to print even high-volume books.
> You need extremely skilled people to carve them,
> in order to keep the error rate down. There are also problems with
> movable type and Chinese characters (i.e. the number of them!)
This is actually not a big problem, since when you are seriously printing,
you need a lot of characters anyways (not just so many different ones), and
the statistics of Chinese characters means that you have at least an 80/20
rule (>80% of the text uses less than 20% of the available characters, and
there's a significant number of high-runners).
And the error rate is not so much a problem: Chinese characters are
sufficiently redundant.
> Before moveable type, almost all printing was things like pictures,
> prayers, fabric patterns and other uses where the number of different
> pages is small.
You are obviously not aware of what kind of books were printed in classical
China. Granted, many books had about half of the space devoted to pictures
(it's a "because we can" attitude). But a lot of books had many pages,
despite the fact that Chinese texts are much denser than texts in western
languages. Having a much larger market to sell the books to (more
population, more literacy, same written language everywhere) made it easier
to amortize printing with carved blocks.
Actually that wasn't the aim of Itanium, at least if you are a cynic
like me. Some would say that it was an attempt to create a new
proprietary architecture that was outside the web of cross licensing
agreements that Intel had.
The notion that it would be better merely had to be plausible in order
to achieve that goal.
>
> Actually that wasn't the aim of Itanium, at least if you are a cynic
> like me. Some would say that it was an attempt to create a new
> proprietary architecture that was outside the web of cross licensing
> agreements that Intel had.
>
> The notion that it would be better merely had to be plausible in order
> to achieve that goal.
>
That was my take on it as well. Design an architecture so complex that
no one could copy or simulate its functionality. Intel never could
design a simple coherent processor anyway. Even stuff as far back as
8086 was a pain in the neck to design hardware for and the architecture
was far from orthogonal or compiler friendly. I know, having designed
memory boards and drivers for early 8086 machines back in the mid 80's.
Even the hardware reference manual appeared to contradict itself from
one page to the next. Compare the 8086 with the 68k or any number of
other compiler friendly architectures and one wonders how it ever got as
far as it did. Now, that's all that's left in the mainstream and the
whole fabric of computing suffers because of its history and inertia...
Regards,
Chris
Heh, well ok then :)
G.
I don't see how the number of original pages is relevant. What matters
is the number of copies of each page, and the more you print, the more
copies you can amortize the typesetting cost across.
Extremely long documents would require a lot of type if you laid
everything out ahead of time, but presumably you'd lay out as many pages
as you have type for, then wait until the first page was copied before
laying out the next, etc. Or you could just buy/make more pieces of type.
It'd be another situation entirely if you didn't have movable type.
> You need extremely skilled people to carve them, in order to keep the
> error rate down.
You'd need skilled artisans for the original molds, just like with Latin
type, but pouring and using each individual piece of type is a
relatively mechanical process that doesn't require nearly as much skill.
> There are also problems with movable type and Chinese characters (i.e.
> the number of them!)
You wouldn't need as many different pieces of type as you'd think; most
of the more complicated characters are composed of various combinations
of simpler sub-characters.
> Before moveable type, almost all printing was things like pictures,
> prayers, fabric patterns and other uses where the number of different
> pages is small.
ITYM where the number of original pages was small but the number of
copies was huge.
>> The invention that made printing feasible was the invention of
>> (cheap) paper - both in Europe and in China.
>
> That's true.
... because the price went down and the volume of copies went way, way
up as a result. It's classic economy of scale.
S
--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking
> There are plenty of good programming paradigms that can handle such
> requirements. Look at Smalltalk for an example of a language that
> does it better. Note that I said just "better", not "well".
>
> >It's been presented here as a new problem with widely-available SMP,
> >but I don't think that's correct. Computers have always been hooked
> >to nightmares of concurrency, if only in the person of the operator.
> >As we've come to expect more and more from that tight but impossible
> >relationship, things have become ever more challenging and clumsy.
>
> There are differences. 40 years ago, few applications programmers
> needed to know about it - virtually the only exceptions were the
> vector computer (HPC) people and the database people. That is
> about to change.
>
Vector computing was trivial, relatively speaking. I learned Cray
assembly language before I learned any microprocessor assembly
language. None of us really knew much of anything about concurrency
except how to turn loops inside out, to insert $IVDEP directives, to
use the vector mask register, and to avoid indirect addressing.
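For anyone who never met a vector machine, the vector mask trick amounts to
roughly the following. This is a plain Python emulation of the pattern, not
Cray code; the function names and sample data are made up for illustration.

```python
# Emulates, in plain Python, how a data-dependent branch inside a loop
# can be replaced by a mask plus a merge, which is the pattern a vector
# mask register supports. Illustrative only; all names are hypothetical.

def scalar_version(a, b):
    # One branch per element: awkward to vectorise directly.
    out = []
    for x, y in zip(a, b):
        if x > 0.0:
            out.append(x * y)
        else:
            out.append(y)
    return out

def masked_version(a, b):
    # Step 1: one pass computes the mask (one flag per element).
    mask = [x > 0.0 for x in a]
    # Step 2: one pass computes the product for every element, no branch.
    prod = [x * y for x, y in zip(a, b)]
    # Step 3: merge under the mask, selecting per element.
    return [p if m else y for m, p, y in zip(mask, prod, b)]
```

Each of the three passes in masked_version is a straight-line loop with no
branch in its body, which is the shape a vector unit wants. The price is
that the multiply is done even for elements whose result is thrown away.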
> >All that work that was done in the first six days of the history of
> >computing was aimed at doing the same thing that human "computers"
> >were doing calculating the trajectories of artillery shells. Leave
> >the computer alone, and it can still manage that sort of very
> >predictable calculation tolerably well.
>
> Sorry, but that is total nonsense. I was there, from the late 1960s
> onwards.
>
Where do you think I was, Nick? Do you know? I don't want to make
this personal, but when you use comp.arch as a forum for your personal
mythology, I don't know how to avoid it. I stand by my
characterization.
For one thing, the computers that were available, even as late as the
late sixties, were pathetic in terms of what they could actually do.
I've had lengthy public discussions about some of the modeling that
went on with one of the actual players and presented citations to
published documents that talk about the failures of that modeling. In
designing the atomic bomb, scientists at Los Alamos used human
pipelining to do calculations (that bit *was* before my time), and the
first few decades of computers were not a huge improvement on that in
terms of capability.
> >Even though IBM and its camp-followers had to learn early how to cope
> >with asynchronous events ("transactions"), they generally did so by
> >putting much of the burden on the user: if you didn't talk to the
> >computer in just exactly the right way at just exactly the right time,
> >you were ignored.
>
> Ditto.
>
Nick. I *know* when time-sharing systems were developed. I wasn't
involved, but I was *there*, and I know plenty of people who were. I
*know* what using IBM batch processes was like. I *know* what it was
like to make another trip to the computer lab because some cryptic bit
of JCL was suddenly required. I *know* what using a line-oriented
editor from a 300 baud tty was like. I also know how to backspace a
punch card machine.
> >Even the humble X-windowing system contemplates an interaction that
> >would at one time have been unimaginable in the degree of expected
> >flexibility and tolerance for unpredictability, and the way the X-
> >windowing system often works in practice shows it, to pick some
> >example other than Windows.
>
> Ditto. A more misdesigned and unreliable heap of junk, it is hard
> to imagine - as soon as the user starts to stress it, some events get
> lost or directed to the wrong window. It is common for most of the
> forced restarts (and not just session, often system) to be due to a
> GUI mishandling error recovery, and the design of the X Windowing
> System is integral to that. Microsoft just copied that, by legally
> ripping off IBM Presentation Manager.
>
Ok, Great Wizard of Oz. Write a better display manager.
> >In summary:
>
> >1. The problem is built in to what we expect from computers. It is
> >not a result of multi-processing.
>
> It may be what you expect from them. It is not what either the first
> or second generation computer users did (and I count as one of the
> latter). You would count as the third, and there is now a fourth.
>
You remind me of a plasma physicist that I knew at a national lab.
For someone of his self-proclaimed stature, he didn't know something
very basic about Green's functions for the wave equation in even-
numbered dimensions. He, too, was keen to establish his creds by
counting generations.
> >2. No computer language that I am aware of would make noticeable
> >difference.
>
> Perhaps I am aware of a wider range, because I do.
>
I stand by my assertion, and I doubt very seriously that you know
better. Yes, there are languages, like Ada, that should be used in
place of C, but they aren't. The issue has been discussed at length here
and Ada was the only language identified that had sufficient
generality to hope to replace C as a systems programming language. If
you had a big secret to reveal, I don't know why you were holding
back.
> >3. Nothing will get better until people start operating on the
> >principle that the old ideas never were good enough and never will be.
>
> That is PRECISELY the principle on which modern general-purpose
> operating systems, interfaces and programming languages WERE designed.
> The old mainframe people despair at them reinventing the wheel, and
> not even getting the number of sides right :-(
>
Unix was a hail Mary pass, so far as I understand it, because what was
being developed when I was around was simply too ambitious. I stand
by my comment.
Robert.
> Actually that wasn't the aim of Itanium, at least if you are a cynic
> like me. Some would say that it was an attempt to create a new
> proprietary architecture that was outside the web of cross licensing
> agreements that Intel had.
>
> The notion that it would be better merely had to be plausible in order
> to achieve that goal.
Agreed.
However suspect Intel's motivations may have been, I thought, and I
still think, for reasons only marginally related to hardware
architecture, that it was one of the more interesting undertakings in
the history of computing. The triumph of x86 may have been a triumph
of common sense, but common sense has never seemed very interesting to
me.
Robert.
I've been manipulating large Excel spreadsheets.
Minutes-long recalcs.
Sometimes as long as 15 minutes to open up such a spreadsheet embedded
as an OLE object in a Word document.
I'm reasonably sure it's computation, and not disk.
However, I am also fairly certain, because I have done a few
experiments, that there are a few quadratic algorithms that could be linear.
Algorithms trump hardware, nearly every time.
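As a generic illustration of the quadratic-versus-linear point (this is not
a claim about what Excel actually does internally; the task and function
names are hypothetical), here is the same question answered both ways:

```python
# Same answer, different scaling. Hypothetical task: does a list of
# cell values contain a duplicate?

def has_duplicate_quadratic(items):
    # Compares every pair: about n*(n-1)/2 comparisons in the worst case.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items):
    # One pass with a hash set: expected O(n), assuming hashable items.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

At 100,000 items the first version does on the order of 5 billion
comparisons in the worst case and the second about 100,000 set lookups. No
plausible hardware upgrade closes a gap of that size.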
(Hmm.... in the past I have devised "instruction rewriting" hardware
that converted linear code such as i++;i++;...;i++ into O(lg N) latency,
O(N lg N) size parallel prefix code. I wonder if the problems I am
seeing in Excel could be optimized away by a subsystem.)
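The latency and size figures quoted above match a textbook parallel prefix
(scan) network. The sketch below simulates a Hillis-Steele style scan
sequentially in Python and counts steps and additions. It says nothing about
how the rewriting hardware itself worked; the function name and the counters
are assumptions made for illustration.

```python
# Sequential simulation of a Hillis-Steele inclusive prefix sum.
# Each pass of the while loop corresponds to one parallel step: every
# addition inside a step is independent of the others in that step.

def parallel_prefix_sum(values):
    current = list(values)
    n = len(current)
    steps = 0      # parallel steps, i.e. latency
    adds = 0       # total additions, i.e. circuit size
    distance = 1
    while distance < n:
        nxt = list(current)
        for i in range(distance, n):
            nxt[i] = current[i - distance] + current[i]
            adds += 1
        current = nxt
        steps += 1
        distance *= 2
    return current, steps, adds
```

Feeding it eight increments, parallel_prefix_sum([1] * 8), yields the
running values 1 through 8 in 3 steps and 17 additions, against 7 dependent
additions for the serial chain. In general that is ceil(lg N) steps and
fewer than N lg N additions.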
> Printing in China had been in use for about 1000 years when movable
> types were invented there -
Movable type really requires an alphabetic script. You are better off
with wood block printing for a written language that has several
thousand characters. The Chinese attempt at movable type was associated
with the Yuan dynasty and a script that Kublai introduced, it failed as
much for political reasons as technical ones.
> movable
> types are only cost effective if you print in low volume.
Even given that the major constraint on printing in the West was the
cost of type I think you are wrong. Cheap books required the
introduction of the rotary press and means to convert type to plates
that could be used in them. Still type when it was introduced only had
to compete with the Scriptorium. There were other constraints than type
on early printing as well. Using a screw press was slow and paper was
expensive.
Ken Young
I'm not quite as cynical as you Del, but I still agree:
The main target of Itanium was to have a "closed-source" cpu that simply
couldn't be cloned at all:
A new separate (joint venture) company to develop it, making existing
deals with both Intel and HP moot, lots of funky little patented details
that were intentionally exposed to the programmer, making it very much
harder to invent around.
OTOH, I really do believe Intel intended to start delivering in 1997, in
which case it _would_ have been, by far, the fastest cpu on the planet.
When they finally did deliver, years later, it was still the fastest cpu
for dense fp kernels like SpecFP.
They delivered too little, too late, but still managed to terminate
several competing architecture development tracks at other vendors.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
> Andrew Reilly wrote:
>> Isn't it the case, though, that for most of that "popular software"
>> speed is a non-issue?
>
> I've been manipulating large Excel spreadsheets.
well, there's your problem ;-)
I've never got the hang of spreadsheets, and never found a problem that
didn't look like more of a job for awk or matlab, or even a real program
of some sort. I guess that there must be some (or at least users who
think differently than I): it's certainly popular.
> Minutes-long recalcs.
>
> Sometimes as long as 15 minutes to open up such a spreadsheet embedded
> as an OLE object in a Word document.
>
> I'm reasonably sure it's computation, and not disk.
For small values of "computation", probably: i.e., it's probably wading
through vast and convoluted layers of "abstraction", choking on memory
latency, rather than running your real "computation". Have you made any
estimates of how long your computation job would take when expressed in a
more conventional programming language?
On the other hand, there clearly are some things that PCs still take a
long time to do. (My colleagues recently turned on link-time-
optimization on a project that I sometimes compile, and now I have time
to make and drink a cup of coffee...)
> However, I am also fairly certain, because I have done a few
> experiments, that there are a few quadratic algorithms that could be
> linear.
>
> Algorithms trump hardware, nearly every time.
Yes, that's true. And one can probably guess that quite a lot of the
programs that had their geneses when PCs had floppy disks and no caches
also have some corners with hand tuned assembly language kernels to make
a simple (poorly scaling) algorithm "fast". How much of that there is in
Excel I have no idea, of course.
> (Hmm.... in the past I have devised "instruction rewriting" hardware
> that converted linear code such as i++;i++;...;i++ into O(lg N) latency,
> O(N lg N) size parallel prefix code. I wonder if the problems I am
> seeing in Excel could be optimized away by a subsystem.)
I'm afraid that I don't know enough of Excel to comment. Maybe you'd be
able to get a reasonable speed-up on a single core by running the
"computation" part of the recalculation process through a compiler: strip
away a few layers of abstraction with some suitable constant
propagation.
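A toy version of that idea, assuming formulas are held as small expression
trees (the tuple encoding and the fold name are invented here, and real
spreadsheet engines are certainly more involved). Strictly this is constant
folding, the simplest relative of constant propagation:

```python
# Constant folding over a tiny formula tree.
# Node shapes (hypothetical encoding):
#   ("const", number)       a literal value
#   ("cell", "A1")          a reference only known at recalculation time
#   ("add" | "mul", l, r)   a binary operation on two sub-trees

def fold(expr):
    kind = expr[0]
    if kind in ("const", "cell"):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    # If both operands are now literals, evaluate once, ahead of time.
    if left[0] == "const" and right[0] == "const":
        if op == "add":
            return ("const", left[1] + right[1])
        if op == "mul":
            return ("const", left[1] * right[1])
    # Otherwise keep the node, with whatever was simplified below it.
    return (op, left, right)
```

With this, the tree for A1 + 3 * 4 folds to A1 + 12, so every later recalc
does one addition instead of a multiply and an add. Whether that kind of
saving matters in Excel depends on where the time really goes.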
Cheers,
--
Andrew
Yes. I corrected Bernd, and he corrected me back again :-) That
is the right calculation.
>> You need extremely skilled people to carve them, in order to keep the
>> error rate down.
>
>You'd need skilled artisans for the original molds, just like with Latin
>type, but pouring and using each individual piece of type is a
>relatively mechanical process that doesn't require nearly as much skill.
Precisely.
>> There are also problems with movable type and Chinese characters (i.e.
>> the number of them!)
>
>You wouldn't need as many different pieces of type as you'd think; most
>of the more complicated characters are composed of various combinations
>of simpler sub-characters.
Yes, but few of them are simply separable, and making blocks for the
sub-characters that fitted together in all the combinations needed
would be hard.
>> Before moveable type, almost all printing was things like pictures,
>> prayers, fabric patterns and other uses where the number of different
>> pages is small.
>
>ITYM where the number of original pages was small but the number of
>copies was huge.
Usually. That's not always been true for etchings and woodblock
'paintings'.
Regards,
Nick Maclaren.
> In article <1947351.S...@elfi.zetex.de>, bernd....@gmx.de
> (Bernd Paysan) wrote:
>
>> Printing in China had been in use for about 1000 years when movable
>> types were invented there -
>
> Movable type really requires an alphabetic script. You are better off
> with wood block printing for a written language that has several
> thousand characters. The Chinese attempt at movable type was associated
> with the Yuan dynasty and a script that Kublai introduced, it failed as
> much for political reasons as technical ones.
What you describe has nothing to do with history. Look up "Bi Sheng", e.g.
on Wikipedia
http://en.wikipedia.org/wiki/Bi_Sheng
who was the inventor of movable types. His types were made of baked clay,
so more fragile than wood. He lived in the Song dynasty, and used the ~3000
characters that were in use back then (today, about 5000-7000 characters are
in use). It took several centuries of progress to make movable types
competitive with the established woodblock printing, and most of that
progress was made in Korea.
Note that the woodblock printing principle again has won over movable types
in the 20th century, and it has expanded to fields no one could have
imagined (from circuit boards to chips, all that is basically derived from
woodblock printing technology).
> Even given that the major constraint on printing in the West was the
> cost of type I think you are wrong. Cheap books required the
> introduction of the rotary press and means to convert type to plates
> that could be used in them.
The rotary press wasn't invented until 1843. Yes, it was again lowering the
price of books by another order of magnitude, but affordable books came out
of normal printing presses before.
Now, printing books is going to end soon, through e-book readers, lowering
the price of a book by another order of magnitude.
> The main target of Itanium was to have a "closed-source" cpu that simply
> couldn't be cloned at all:
>
> A new separate (joint venture) company to develop it, making existing
> deals with both Intel and HP moot, lots of funky little patented details
> that were intentionally exposed to the programmer, making it very much
> harder to invent around.
>
> OTOH, I really do believe Intel intended to start delivering in 1997, in
> which case it _would_ have been, by far, the fastest cpu on the planet.
> When they finally did deliver, years later, it was still the fastest cpu
> for dense fp kernels like SpecFP.
>
> They delivered too little, too late, but still managed to terminate
> several competing architecture development tracks at other vendors.
In other words, Itanium is the epitome of FUD.
Hail FUDzilla! :-)
(RIP Alpha and PA-RISC)
The truth is half-way in between. You need a lot MORE progress to
make moveable type viable for ideograms than for alphabets. That
is one of the reasons that moveable type spread so fast in the West,
but not in the East.
>Note that the woodblock printing principle again has won over movable types
>in the 20th century, and it has expanded to fields no one could have
>imagined (from circuit boards to chips, all that is basically derived from
>woodblock printing technology).
It never went away. It always was a better principle for pages that
aren't feasible to build up out of smaller, replicatable ones, such
as images.
>Now, printing books is going to end soon, through e-book readers, lowering
>the price of a book by another order of magnitude.
That's two extreme speculations in one sentence :-)
Regards,
Nick Maclaren.
So do we. But the companies do not perform their calculations on Itanics
just because they are Itanics, but because Intel and HP hire really good
marketing guys who were able to sell that hardware. The very first thing
we do is to persuade our clients not to use that platform. A large enough
x64 or Sparc-based system is all they should need.
I agree that supporting Itanium-based systems is a business, but it is
something like making money on cripples -- not a very ethical thing to
do, since there exists a cure.
> There were perhaps more new and interesting ISA features in Itanium
> than in any other I can recall. You may not like them, or may not
> consider them a success, but it's definitely in the realm of "new
> and interesting within the last 20 years" in at least a few respects.
Surely. But a flop is a flop...
Best regards
Piotr Wyderski
...
> OTOH, I really do believe Intel intended to start delivering in 1997, in
> which case it _would_ have been, by far, the fastest cpu on the planet.
I don't care enough at this late date to do more than question the above
assertion from what I can remember off the top of my head without
attempting to quantify my reservations, but remember that the product
that they allegedly expected to deliver in 1997 (or perhaps 1998,
depending on how you read their early claims) was Merced, not McKinley.
Do you really think that Merced in a then-current process "_would_
have been, by far, the fastest cpu on the planet" - especially for
general (rather than heavily FP-specific) code? Considering the
competition at about that time (e.g., the Alpha 21264 and the Pentium
generation that was giving it a good run at times - and PA-RISC was no
slouch either, all of them successful OoO implementations) that seems at
least debatable.
> When they finally did deliver, years later, it was still the fastest cpu
> for dense fp kernels like SpecFP.
Fastest by a small margin, once Compaq grudgingly tuned the then-current
Alpha products for SpecFP as much as the McKinleys were tuned (had they
done this earlier, Merced would not have been at the top of the heap at
all).
>
> They delivered too little, too late, but still managed to terminate
> several competing architecture development tracks at other vendors.
Based almost completely on ridiculously overblown hype and perceived
market domination rather than on technical merit.
- bill
I'm not talking about the official sales price of a book ;-). A usable
e-book reader is to books what an MP3 player is to music. Did the official
sales price of music drop due to the existence of MP3 players and downloaded
music? No. It doesn't matter, the practical price for music dropped by an
order of magnitude or more.
The book publishers are going to repeat all the mistakes the music industry
did, plus some new ones (like abusing DRM right from the start, e.g. Amazon
erasing "1984").
Oh, yes, indeed, it would have been - if they had delivered in 1997
what they were promising in 1995-6. Inter alia, it would have had
lazy execution as the norm, compilers that would have optimised
arbitrary C spaghetti into excellent IA64 machine code, it would have
been cheaper than the competition, and no doubt it would have been
announced by a sounder of pigs singing hosannas overhead.
If they had delivered just the hardware, it would have been by far
the fastest cpu on the planet for suitable codes (not necessarily
floating-point, but definitely only a small subset). The process
wouldn't have been a catastrophic problem, though the yield of its
pancakes wouldn't have been wonderful. And they might have pulled
that off, without needing to break any of the laws of mathematics
or physics.
Regards,
Nick Maclaren.
Yes, it is. However, that's not what I was talking about. There
was significant work done with asynchronicity, parallel databases,
parallel I/O, parallel communications and so on. I can't say when
it started, because it was before my time.
>> >All that work that was done in the first six days of the history of
>> >computing was aimed at doing the same thing that human "computers"
>> >were doing calculating the trajectories of artillery shells. Leave
>> >the computer alone, and it can still manage that sort of very
>> >predictable calculation tolerably well.
>>
>> Sorry, but that is total nonsense. I was there, from the late 1960s
>> onwards.
>>
>Where do you think I was, Nick? Do you know? I don't want to make
>this personal, but when you use comp.arch as a forum for your personal
>mythology, I don't know how to avoid it. I stand by my
>characterization.
If you check up on it, the first interactive 'real-time' games date
from the 1950s, and they were widespread by the mid-1960s. Again,
before I started. If you will stop deprecating the work of the first
generation, I will stop correcting you.
>For one thing, the computers that were available, even as late as the
>late sixties, were pathetic in terms of what they could actually do.
As the Wheelers frequently point out, many people did then with those
'pathetic' computers what people are still struggling to do. At the
start of this thread, I pointed out that the initial inventions were
often no more than proof of concept, but their existence is enough
to show that things have NOT changed out of all recognition.
>> >Even though IBM and its camp-followers had to learn early how to cope
>> >with asynchronous events ("transactions"), they generally did so by
>> >putting much of the burden on the user: if you didn't talk to the
>> >computer in just exactly the right way at just exactly the right time,
>> >you were ignored.
>>
>> Ditto.
>>
>Nick. I *know* when time-sharing systems were developed. I wasn't
>involved, but I was *there*, and I know plenty of people who were.
What on earth are you on about? Let's ignore the detail that Cambridge
was one of the leading sites in the world in that respect. You claim
that the 'IBM' designs involved transactions being ignored - nothing
could be further from the truth. That is a design feature of the
X Windowing System (and perhaps Xerox PARC before it), and was NOT
a feature of the mainframe designs.
Regards,
Nick Maclaren.
> >> >All that work that was done in the first six days of the history of
> >> >computing was aimed at doing the same thing that human "computers"
> >> >were doing calculating the trajectories of artillery shells. Leave
> >> >the computer alone, and it can still manage that sort of very
> >> >predictable calculation tolerably well.
>
> >> Sorry, but that is total nonsense. I was there, from the late 1960s
> >> onwards.
>
> >Where do you think I was, Nick? Do you know? I don't want to make
> >this personal, but when you use comp.arch as a forum for your personal
> >mythology, I don't know how to avoid it. I stand by my
> >characterization.
>
> If you check up on it, the first interactive 'real-time' games date
> from the 1950s, and they were widespread by the mid-1960s. Again,
> before I started. If you will stop deprecating the work of the first
> generation, I will stop correcting you.
>
I don't think I've ever deprecated the work of any generation. I've
responded in every way I can to correct the false and dangerous idea
that everything that can be thought of has been thought of and over
half a century ago, at that. What's on display is not information,
but something bizarre about your psychology that needs to put you
and what you happen to know at the center of the cosmos. If I wanted
to discourage progress in a field, I couldn't find a better way to do
it.
I pondered whether I should respond to your post at all. These
exchanges make both of us look like fools. I've always been kind of
an outsider, but you can be certain that I'm no fool, and I'm not
intimidated by your bluster any more than I am impressed by the name-
dropping of another poster.
As to your encyclopedic knowledge of everything, you have your facts
about interactive computer games wrong--at least as they are commonly
understood. I was ushered into a darkened room and played Space War
implemented on a PDP-1 with a round CRT in Cambridge, just not
Cambridge, England, and not all that many years after it was first
developed. The discrepancy in our "knowledge" may be that Space War
could be played by multiple players, even if others may have and I'm
sure did fiddle with man against machine before 1961. As far as I
know, there was no off-the-shelf hardware that would have supported a
game like Space War before the PDP-1.
> >For one thing, the computers that were available, even as late as the
> >late sixties, were pathetic in terms of what they could actually do.
>
> As the Wheelers frequently point out, many people did then with those
> 'pathetic' computers what people are still struggling to do. At the
> start of this thread, I pointed out that the initial inventions were
> often no more than proof of concept, but their existence is enough
> to show that things have NOT changed out of all recognition.
>
People in the sixties had many wild and dangerous ideas that have been
proven to be dead ends, at best. Many of the ideas about computers
are in a category with Wernher von Braun's rotating donuts, which would
have been unstable. The self-confidence of the period corresponded to
very little in the way of reality, and space and computers are still
the showcase examples of delusional thinking.
In contrast to your posts, the Wheelers' posts are packed with facts
that jibe pretty well with what I personally remember.
> >> >Even though IBM and its camp-followers had to learn early how to cope
> >> >with asynchronous events ("transactions"), they generally did so by
> >> >putting much of the burden on the user: if you didn't talk to the
> >> >computer in just exactly the right way at just exactly the right time,
> >> >you were ignored.
>
> >> Ditto.
>
> >Nick. I *know* when time-sharing systems were developed. I wasn't
> >involved, but I was *there*, and I know plenty of people who were.
>
> What on earth are you on about? Let's ignore the detail that Cambridge
> was one of the leading sites in the world in that respect. You claim
> that the 'IBM' designs involved transactions being ignored - nothing
> could be further from the truth. That is a design feature of the
> X Windowing System (and perhaps Xerox PARC before it), and was NOT
> a feature of the mainframe designs.
>
Yes, Nick. If bombs weren't being dropped on England, the building I
worked in might never have had the history it did in radar. Bombs
were being dropped all over Europe, and the center of gravity of
technology moved away from Europe, possibly forever. Get over it.
As to transactions being ignored, I don't know where that phrasing
comes from, because it didn't come from me. To initiate *anything* on
any computer at any time, you have to "get in." It was true then, and
it's true now. If you managed to get your transaction into an IBM
system, I'm sure it was no more likely to drop it than any other
system.
If you're close to a deadline, and everyone is trying to "get in,"
then a mistyped character or a syntax error could cost you your place
in line. It still can, except that computers are faster and more
tolerant of human error. Trying to deal with the problem of "getting
in" without making it seem as if the computer were at times out to get
you is still, so far as I know, an unworked problem, and probably
unworkable with any approach now in existence.
Robert.
That certainly isn't the direction we've been heading. As process
tech has gone through 90->65->45->32nm nodes it's happily provided us
with a lot of additional logic to play with but it hasn't provided the
same massive increase in circuit speed that everyone had grown
accustomed to. (Sure, it's slightly faster in most cases and there
are other constraints such as power keeping the speed down but no
matter how you slice it you're getting a smaller proportional speed
bonus from simply shrinking a design).
IMO (and it's just my opinion, there are some people in this thread
who are in a better position to speak about this than I am and they're
welcome to set me straight) a lot of the recent gains have been from
finding smarter ways to use the logic we already have and intelligent
ways to use the ridiculous amount of additional logic we're given at
each node. Moving an ever-increasing portion of the system onto the
same die. Moving away from the ancient FSB. Improving the memory
hierarchy (there's a long way to go here). Each of these falls within
the realm of comp arch and has provided a fairly significant
performance boost.
You need people that spend their time trying to find new uses for the
additional logic at our fingertips. Without that all you gain with
new process nodes is the ability to pack more of the same chip on a
wafer and/or stuff more cache on each chip. What if your competitor
(there are still at least a few in the CPU world) has architects
plugging away at the endless list of remaining problems and they
happen to find a very good use for all the additional logic? Well, if
that happens, you're screwed.
>>
> Yes, Nick. If bombs weren't being dropped on England, the building I
> worked in might never have had the history it did in radar. Bombs
> were being dropped all over Europe, and the center of gravity of
> technology moved away from Europe, possibly forever. Get over it.
>
'Scuse me butting in here, but I think I know where that is and have a
large number of the 30 vol set that was published shortly after WWII.
For anyone interested in the history of electronics, it's worth looking
up. The scale of the work is quite amazing and still used for reference
even now...
Seriously good stuff...
Regards,
Chris
The rotating donut space stations are unstable?
Why?
Eric
some from the CTSS group went to 5th flr and multics ... and some went
to the 4th flr and the science center. in 1965, science center did
(virtual machine) cp40 on 360/40 that had hardware modifications to
support virtual memory. cp40 morphed into cp67, when the science center
got 360/67 that came standard with hardware virtual memory support.
last week of jan68, three people from science center came out and
installed cp67 at univ. where i was undergraduate. over the next several
months i rewrote significant portions of the kernel to radically speed
things up. part of presentation that i made at aug68 SHARE user group
meeting ... about both speedups done for os/360 (regardless of
whether or not running in virtual machine or on real hardware) as well
as rewrites of major section of cp67.
http://www.garlic.com/~lynn/94.html#18 CP/67 & OS MFT14
cp67 came standard with 2741 and 1052 terminal support (and did
automatic terminal type recognition). univ. also had ascii/tty machines
(33s & 35s) and i got to add tty support. I tried to do this in
consistent way that did automatic terminal type (including being able to
have single rotary dial-in number for all terminals). Turns out that
standard ibm terminal controller had a short-cut and couldn't quite do
everything that i wanted to do. Somewhat as a result, univ was motivated
to do a clone controller project ... where channel interface was reverse
engineered and a channel interface board was built of an interdata/3
minicomputer ... and the interdata/3 was programmed to emulate the
mainframe terminal controller (along with being able to do automatic
baud rate detection)
my automatic terminal recognition would work with standard controller
for leased lines ... the standard controller could switch the type of
line scanner under program control ... but had hardwired the baud rate
oscillator to each port interface. This wouldn't work if i wanted to
have a common pool of ports (next available selected from common dialin
number) for terminals that operated at different baud rates.
os/360 tended to have a operating system centric view of the world
... with initiation of things at the operating system ... and people
responding at the terminal. cp67 was just the opposite ... it had
end-user centric view ... with the user at the terminal initiating
things and the operating system reacting. one of the things i really
worked on was being able to do pre-emptive dispatching and page fault
handling in a couple hundred instructions (i.e. take page fault, select
replacement page, initiate page read, switch to different process,
handle interrupt, and switch back to previous process all in a couple
hundred instructions aggregate).
the univ. library had also gotten a ONR grant to do computerized
catalogue and then was also selected to be betatest for the original
CICS product release (cics still one of the major transaction processing
systems). i got tasked to support (and debug) this betatest.
a little later one of the things that came out of cp67 was charlie
invented compare&swap instruction when he was doing work on fine-grain
locking in cp67 (compare&swap was selected because CAS is charlie's
initials). initial foray into POK trying to get it included in 370
architecture was rebuffed ... favorite son operating system claiming
that test&set from 360 SMP was more than sufficient. challenge to
science center was to come up with use of compare&swap that wasn't smp
specific ... thus was born all the stuff for multithreaded
implementation (independent of operation on single processor or multiple
processor machine) ... which started to see big uptake in transaction
processing and DBMS applications ... even starting to appear on other
hardware platforms.
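
Editorial sketch of the compare&swap retry pattern described above, which behaves the same whether the threads share one processor or run on several. This is only an illustration: Python has no hardware compare-and-swap, so the hypothetical `AtomicCell` class below uses a private lock to stand in for the atomicity of the single instruction. All names are assumptions and are not taken from cp67 or the 370 architecture.

```python
import threading


class AtomicCell:
    """Models one memory word with an atomic compare-and-swap.

    Assumption of this sketch: a private lock stands in for the
    atomicity that real hardware provides in a single instruction.
    """

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_add(cell, delta):
    """Read, compute, attempt the swap, and retry if another thread interfered."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old + delta


def run_workers(n_threads=4, n_increments=1000):
    """Have several threads bump one shared counter using only atomic_add."""
    cell = AtomicCell(0)

    def work():
        for _ in range(n_increments):
            atomic_add(cell, 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

The caller never holds a lock across the read-modify-write; an update that loses a race simply recomputes from the fresh value and tries again.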
minor digression about get-together last year celebrating jim gray
... also references he tried to palm off some amount of his stuff on me
when he departed for tandem
http://www.garlic.com/~lynn/2008p.html#27 Father of Financial Dataprocessing
some old email from that period
http://www.garlic.com/~lynn/2007.html#email801006
http://www.garlic.com/~lynn/2007.html#email801016
--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
I seem to recall that those developments were terminated well before
Itanium hardware was actually available, ie MIPS at SGI and Alpha at
Compaq/HP. But memory is the second thing to go and I forget what is
first.
A torus rotating about its principal axis would be stable were it not
for tidal forces. A rotating donut circling the earth will be
subjected to torques as one part of the torus is subjected to a
stronger pull than the opposite part of the torus. The result, if not
counteracted actively, will be tumbling.
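
Editorial note: the torque described above is usually called the gravity-gradient torque. As a sketch, assuming a rigid body much smaller than its orbital radius, with \mu the gravitational parameter of the Earth, r the orbital radius, \hat{r} the unit vector along the local vertical and I the inertia tensor (these symbols are assumptions of this note, not the poster's):

```latex
\boldsymbol{\tau} \;=\; \frac{3\mu}{r^{3}}\,\hat{\mathbf{r}} \times \left(\mathbf{I}\,\hat{\mathbf{r}}\right)
% Axisymmetric body: C about the symmetry axis, A about a transverse axis,
% \theta the angle between the symmetry axis and the local vertical:
\lvert\boldsymbol{\tau}\rvert \;=\; \frac{3\mu}{2r^{3}}\,\lvert C - A\rvert\,\sin 2\theta
% Thin ring of mass M and radius R: C = M R^{2},\; A = \tfrac{1}{2} M R^{2},
% so C - A = \tfrac{1}{2} M R^{2}.
```

The torque vanishes when the symmetry axis is along or perpendicular to the local vertical and is largest at 45 degrees; whether it leads to tumbling depends on the spin rate and on damping, which this formula does not address.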
Robert.
That's exactly the point: First, Intel promised the moon. Second, SGI
(and maybe Compaq/HP as well) said, "Oh, then there's no point going on
with MIPS (or Alpha) -- we'll just use Merced". THEN... Intel was very,
very late making Itanium hardware actually available. But by that time,
the damage was done. SGI [at least] had frozen their MIPS development
for too long to ever catch up with the curve again. (*sigh*)
-Rob
-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
At least if they get big enough, see the various articles about Niven's
Ringworld, a donut space station big enough to surround a star at
Earthlike (150e9 m) distance.
I don't believe a regular-size one without an untethered mass in the
center would suffer the same problem.
Your memory hasn't gone yet: If those competitors had actually waited
until Itanium didn't get delivered on schedule, they might have kept
their cpu architects.
While I agree that asynchrony is hard, well, I *wish* that was where the
problems begin.
E.g. the majority of security flaws are not due to asynchrony. They are
simple buffer overflows, that would occur in a single threaded program.
It is only recently that race conditions have started creeping up the
charts as causes of security flaws.
No, no!
All of the modern OOO machines are dynamic dataflow machines in their
hearts. Albeit micro-dataflow: they take a sequential stream of
instructions, convert it into dataflow by register renaming and what
amounts to memory dependency prediction and verification (even if, in
the oldest machine, the prediction was "always depends on earlier stores
whose address is unknown"; now, of course, better predictors are available).
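
Editorial sketch of the register-renaming step described above: a sequential instruction stream is turned into producer-to-consumer dataflow edges by giving every result a fresh tag. The function name `rename_to_dataflow` and the tuple encoding of instructions are assumptions made for illustration; memory dependency prediction is not modelled here.

```python
def rename_to_dataflow(program):
    """Convert a sequential register program into dataflow form.

    `program` is a list of tuples (dest, src1, src2, ...) naming
    architectural registers.  Each instruction's result gets a fresh
    tag "p<index>", so only true (read-after-write) dependences survive.

    Returns (renamed, edges): `renamed` lists (dest_tag, [source tags])
    per instruction, and `edges` lists (producer_index, consumer_index).
    """
    latest = {}    # architectural register -> index of its newest producer
    renamed = []
    edges = []
    for idx, (dest, *srcs) in enumerate(program):
        src_tags = []
        for reg in srcs:
            if reg in latest:
                producer = latest[reg]
                edges.append((producer, idx))
                src_tags.append(f"p{producer}")
            else:
                # Value produced before this window: keep the original name.
                src_tags.append(reg)
        # Sources are looked up before the destination is remapped, so an
        # instruction such as r1 = r1 + r2 reads the older r1.
        latest[dest] = idx
        renamed.append((f"p{idx}", src_tags))
    return renamed, edges


example = [
    ("r1", "r2", "r3"),   # 0: r1 = f(r2, r3)
    ("r4", "r1", "r5"),   # 1: r4 = f(r1, r5)   depends on 0
    ("r1", "r6", "r7"),   # 2: r1 = f(r6, r7)   reuses r1, independent of 0 and 1
    ("r8", "r1", "r4"),   # 3: r8 = f(r1, r4)   depends on 2 and 1
]
renamed, edges = rename_to_dataflow(example)
```

In the example, instruction 2 reuses the name r1 but gets its own tag, so it has no edge from instructions 0 or 1 and could issue ahead of them.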
I look forward to slowly, incrementally, increasing the scope of the
dataflow in OOO machines.
* Probably the next step is to make the window bigger, by
multilevel techniques.
* After that, get multiple sequencers from the same single threaded
program feeding in.
* After that, or at the same time, reduce the stupid recomputation
of the dataflow graph that we are constantly redoing.
My vision is of static dataflow nodes being instantiated several times
as dynamic dataflow.
I suppose that you could call trips static dataflow, compiler managed.
But why?
I think we are just blind men feeling different parts of the
elephant.
I was thinking about the problem: "Could you, given all the time,
peace and quiet, and support you wanted, write code to give a
reasonable user experience using modern hardware?" and the answer was,
"No, I couldn't."
By "reasonable user experience," I meant an experience that would
*appear* to be reasonable to the user; i.e., the computer would not
suddenly stop responding for unknowably long periods of time or
simply give me an error message in the middle of some important task. I
don't think the requirement is simply hard. I think it's impossible,
given current approaches to user interfaces and managing resources.
You were asking the question, "What single thing could you do to
eliminate the largest number of security vulnerabilities in a single
stroke?"
We don't necessarily disagree. We were asking different questions.
Robert.
There's a solution to that which I thought of when I heard about
the debate - quite trivial - just tether the ring to a Klemperer
rosette. But that's a different problem.
>I don't believe a regular-size one without an untethered mass in the
>center would suffer the same problem.
No, Robert is right. The reason is that the moment of inertia is
largest in the symmetric plane, and disturbed rotating objects
tend to change to rotating about the axis with the smallest one.
Regards,
Nick Maclaren.
>
> No, Robert is right. The reason is that the moment of inertia is
> largest in the symmetric plane, and disturbed rotating objects
> tend to change to rotating about the axis with the smallest one.
>
I think you'll find that googling "gyroscopic precession" describes the effect...
But anyway, getting back on topic, how about von neumann is dead ?, and
what could replace it ?.
Regards,
Chris
> Isn't it the case, though, that for most of that "popular
> software" speed is a non-issue? Either a given operation is "fast
> enough" (and that's not hard to achieve, when the limitations are those
> of hands, eyes and brain of the user), or "not fixable by software":
> network and server latencies, disk access latencies, and so on.
NB: Replacing hard-disk drives with solid-state drives improves
random read latency by two orders of magnitude, which might push
this particular bottle-neck to a different sub-system.
Regards.
That's irrelevant: the question was how an actual (shipped-in-mid-2001)
Merced would have performed if delivered in 1997 (possibly 1998) using a
then-current process, not how some vaporware being talked about earlier
might have performed.
One might get a clue from how Merced actually did perform in 1999 when
the first samples went out: not nearly well enough to satisfy anyone,
which was a large part of why it didn't ship for another two years.
...
> If they had delivered just the hardware, it would have been by far
> the fastest cpu on the planet for suitable codes (not necessarily
> floating-point, but definitely only a small subset).
Once again that's irrelevant to the question under discussion here:
whether Terje's statement that Merced "_would_ have been, by far, the
fastest cpu on the planet" (i.e., in some general sense rather than for
a small cherry-picked volume of manually-optimized code) stands up under
any real scrutiny.
- bill
You may have gathered that I agree with you from the rest of the
paragraph :-)
But the insoluble problems were software and not hardware - they
could, at least if they had handled the project excellently, have
delivered what they claimed in terms of hardware. As it was, they
didn't handle it well, and wasted a lot of effort trying to solve
the software problems by adding massive caches and other hardware
tweaks.
Regards,
Nick Maclaren.
Sometimes it's difficult to be sure. Ironically, both you and Robert
have somewhat similarly subtle ways of conveying sarcasm.
I'm actually kind of interested in whether I might be wrong here, since
I don't often find myself on the opposite side of a technical issue from
Terje (not that I viewed his assertion as necessarily any more than an
off-the-cuff remark that he may not have thought much about).
>
> But the insoluble problems were software and not hardware - they
> could, at least if they had handled the project excellently, have
> delivered what they claimed in terms of hardware.
Even that observation is not really relevant to the question here -
because Terje's assertion (at least as I read it) refers to what they
actually delivered (had it been delivered earlier), not to what they
might have hoped to deliver.
- bill
"Andy "Krazy" Glew" <ag-...@patten-glew.net> wrote in message
news:4ADEA866...@patten-glew.net...
> I look forward to slowly, incrementally, increasing the scope of the
> dataflow in OOO machines.
> * Probably the next step is to make the window bigger, by multilevel
> techniques.
What is your favorite multilevel technique? I don't think I ever heard your
opinion on HSW (Hierarchical Scheduling Windows
http://portal.acm.org/citation.cfm?id=774861.774865)...
Thanks,
Ned
>> so I naturally wonder,
>> what's happened in the meantime ?.
>
> Rant mode on
>
> Software bloat. The programs I use except for games have not visibly
> increased in speed since my first PC. They have developed a lot more
> bells and whistles but not got faster. DOS could run programs and a GUI
> (GEM) in 512kb of memory. Windows 3.1 would run with a mb though it
> needed 4mb to get maximum performance, I understand that Windows 7 has a
> minimum requirement of 2gb. Just about all the increases in hardware
> speed have been used to run more elaborate software at the same speed.
>
> Rant mode off.
>
> Ken Young
Agreed, but the graphics is orders of magnitude better and that's partly
where all the spare cpu power has gone. But little has changed in
fundamental architectural terms for a very long time. When I take the
lid off any new machine and see a device whose function I can't estimate
within a minute or two, then I shall become curious about the progress
of computing again.
I caught the first 5 or 10 minutes of some physics character on tv a few
days ago. The sort of pop science that the beeb do so well, with the
dramatic music, graphics and odd angle shots of the presenter trying to
look wise and intelligent. His thesis was that we would soon have
intelligence in everything, with billions of microprocessors. My first
reaction was who will write all the code for this ?, as there's already
a shortage of embedded programmers who really know what they are doing.
Probably India and China, but I digress. In a roundabout sort of way,
he was right and it's already happening as more and more applications
for embedded devices appear. If you track the development of computing,
the story is one of an initial ivory towers world inhabited primarily by
mathematicians, to a commodity used for amusement by everyone, but
commodity means standardisation and dumbing down far enough to make it
cheap. Now we are seeing the next stage, with some of the
functionality of the desktop becoming obsolete in favour of distributed
computing. Separate appliances with lower throughput and less power hungry.
In the old days, there were a wide variety of system designs, whereas
now there are few. Why ? - well partly because the old machines were
built from small scale devices. Bit slice, ttl, etc, with the wiring
between them and perhaps microcode defining the overall architecture.
The state of technology produced a level playing field that anyone with
a little capital and a good idea could exploit. Now, the state of
technology is such that only the largest corporations can afford to
develop new devices. It's completely frozen out all the small vendors in
terms of visibility. It's well known that the mainstream never really
develops anything really new. It exists merely to maintain the status
quo, perhaps with a few crumbs of improvement from time to time. It's
the bits at the edges where all the interesting stuff happens, but there
are effectively no edges left now. Mainstream computing has dug itself
into a very deep hole, which is why i'm so pessimistic about future
developments...
Regards,
Chris
Sorry, that's not true. Especially in the "small embedded intelligent
device" area we are talking about. The scale of integration changed: You
will produce a SoC for these applications. I.e. the parts are built from
incredibly small devices: GDS polygons, to be precise (people use more
convenient building blocks, though). Most people who make SoCs embed some
standard core like an ARM (e.g. Cortex M0) or an 8051 (shudder - takes the
same area as a Cortex M0, but is horrible!), but that's because they choose
to, not because it's not feasible to develop your own architecture.
Architectures that usually don't surface to the user - e.g. when I embed a
b16 in a device of ours, it's not user-programmable. It's not visible what
kind of microprocessor there is or if there is any at all.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
While Von Neumann is dead, Von Neumann computing isn't, and current
designs are almost all multiple Von Neumann threads with coherent
shared memory. Except for GPUs and some specialist systems, where
the coherence is patchy.
Note that I am not saying that is a good approach, merely that it is
what we have today.
Regards,
Nick Maclaren.
This one is just as obvious today as it was (at least to SF writers and
readers) 30 years ago:
Computers will write the code of future computers, we just need to pass
the singularity where AI actually starts to match human intelligence.
From that point on things will get very interesting, possibly in the
Chinese sense. :-)
I still believe it is possible we will reach that singularity within my
lifetime, we're not missing that many orders of magnitude now.
Possibly interesting proto-thought:
I'm using Excel because the team I'm working with uses Excel. And
because there are some useful features, usually user-interfacey, that
are hard to get access to via other means. But mainly because of
putative "ease of use".
Observation: there are situations where the "Excel way" leads to
sub-optimal algorithms. O(N^2) instead of O(N), in two examples I have
found.
To avoid such problems one must leave Excel and resort to VBA or some
other programming language. I'm willing to do that, but others may not be.
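
Editorial sketch of the kind of O(N^2) versus O(N) gap meant here. The poster does not say which two cases he found, so this running-total example is hypothetical: the first function mirrors a column in which every row holds its own SUM over the whole prefix above it, the second keeps one accumulator as a short program would.

```python
def running_totals_quadratic(values):
    """Each output re-adds the whole prefix, like a SUM(A$1:A<i>) in every row.

    Total work grows as N*(N+1)/2 element additions.
    """
    return [sum(values[: i + 1]) for i in range(len(values))]


def running_totals_linear(values):
    """One pass with a single accumulator: N element additions."""
    totals = []
    acc = 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals


sample = [3, 1, 4, 1, 5]
assert running_totals_quadratic(sample) == running_totals_linear(sample)
```

Both give the same answers; only the amount of work differs, which is invisible at a few hundred rows and painful at a few hundred thousand.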
I wonder if other "ease of use" facilities similarly lead to situations
of suboptimal algorithms.
Because of the sub-optimal algorithms that Excel encourages, people are
encouraged NOT to increase problem size, since O(N^2) bites them more
quickly.
So we get a subculture of Excel users, forced to deal with smaller
problem sizes because of scale-out issues. Versus a subculture of
programmers, that can deal with larger problems, but which doesn't have
the "ease of use" of Excel for smaller problems.
Who has the competitive advantage?
Probably the hybrid subculture that has a small number of hackers
provide outside-of-Excel scalability to the Excel users.
My vision requires more transistors. I.e. my vision can take advantage
of more transistors. But this means more power (a) only if they are
switching, and (b) only if they are leaking.
We know how to make non-leaky devices. So long as they are the same
size, just slower, and so long as we are in the exponential part of the
leakage curve - 10% slower => 10X less leakage - we can take advantage
of them for big machines.
At some point we run into the problem that less leaky devices are
bigger. But, again, it's the shape of the curve that matters.
Since I postulate the square law tradeoff - performance varies as the
square root of the number of devices - so long as the leakage is below
the square root line, we can add extra transistors to increase
performance without running into the leakage based power wall.
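
Editorial sketch of the arithmetic behind the square law. It takes one reading of "below the square root line", namely that total leakage should grow no faster than performance does; that reading, and the function names, are assumptions of this note rather than the poster's statement.

```python
import math


def relative_performance(device_ratio):
    """Square-law assumption: performance scales as sqrt(number of devices)."""
    return math.sqrt(device_ratio)


def devices_needed(speedup):
    """Device-count ratio required for a given speedup under the square law."""
    return speedup ** 2


def max_leakage_per_device(device_ratio):
    """Largest relative per-device leakage that keeps total leakage
    (devices * per-device leakage) from growing faster than performance."""
    return relative_performance(device_ratio) / device_ratio


# Four times the devices buys twice the performance, and each device may
# leak at most half as much as before to stay on the square-root line.
print(relative_performance(4), max_leakage_per_device(4))
```

Under these assumptions a 3x speedup needs 9x the devices, each leaking no more than a third as much as the original ones.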
Plus, I must admit that I have some hope that there are "miraculous
technologies" that have much better leakage. I have encountered some in
the literature, but I must admit that I do not know how practical they
are. Better folks than me say they are.
As for the dynamic power, similar considerations: the total amount of
charge switched per cycle must stay below the square root curve. More
precisely, the total amount of charge*frequency*voltage must stay below
the square root curve.
This leads us into a place where we want to take advantage of a large
number of slow devices. There are two ways to do this: (a)
everything slow - lots of simple, slow, MIMD cores, and (b) a very small
amount of fast logic in a sea of slow logic. Possibly centralized; more
likely distributed like raisins in rice pudding. In the latter case we
are at MIMD again: the only difference is whether the core is all slow,
or has some fast transistors.
Oh, Terje, you are slipping!
Firstly, that's not a singularity, as there is no reason to believe
in infinite expansion in a strictly finite time. And Vinge should
know better, too :-(
Secondly, the problem isn't capacity, but capability. We have had
both programs that write programs and self-adapting programs for
several decades. But we don't know how to design them so that they
can write programs more capable (roughly, 'more complex') than
themselves, or even AS capable but different. All we know is how
to design them to write simpler, but larger, programs.
Regards,
Nick Maclaren.
I'll reread the HSW papers and get back to comp.arch.
In the meantime, you can get an idea of my favorite, dating back to
2004, when I was briefly "free", between leaving AMD and rejoining
Intel. Look for patents by me, that are not assigned to AMD or Intel.
Here, from my CV:
---
June 2004-August 30, 2004: Inventor, Oceanside, Oregon
The first time in years that my inventions have been owned by me, not my
employers, Intel or AMD
Formulated MultiStar, a multicluster/multithreaded microarchitecture
with multilevel everything: multilevel branch predictor, scheduler,
instruction window, store buffer, register file
Worked around and invented alternatives to proprietary (patented or
trade secret) technology
I.e. I violated no Intel or AMD I.P. This was all new stuff.
Invented solutions to several problems I had been trying to solve for
years.
Even now (in 2009) it is still leading edge.
Patents applied for, with Centaurus Data LLC, include:
20080133893 HIERARCHICAL REGISTER FILE
20080133889 HIERARCHICAL INSTRUCTION SCHEDULER
20080133885 HIERARCHICAL MULTI-THREADING PROCESSOR
20080133883 HIERARCHICAL STORE BUFFER
20080133868 METHOD AND APPARATUS FOR SEGMENTED SEQUENTIAL STORAGE
Some of these patents (in application) have already been licensed by
Fortune 500 companies.
Did not make much progress on my book, apart from 70 pages of draft
MultiStar description.
---
I called this Multi-Star - a processor with Multi-level everything.
Of course I had to avoid Intel or AMD I.P. in multi-star, so I am not
always using the best known way to do things.
There are probably a few more patent applications and continuations. I
was obviously not able to work on this outside-Intel stuff when I was an
Intel employee. And Intel forbade me from working on OOO CPUs inside
Intel when I reminded them I had patents pending (which I had disclosed
to them when I was hired). I left Intel in part because I want to
continue work in these directions.
However, not so much free time at new job. At least now I am able to
talk about them on comp.arch. And I hope to summarize the key points
for comp.arch. Here's a hint: at least one of the ideas/inventions
pertaining to hierarchical instruction scheduling is, IMHO, one of the
best I have ever had.
Note: I'm not just an OOO bigot. I also have neat ideas about parallel
systems, MP, MIMD, Coherent Threading. But I am probably the most
aggressive OOO computer architect on Earth.
This is why I get frustrated when people say "OOO CPU design has run
into a wall, and can't improve: look at Intel and AMD's latest designs".
I know how to improve things - but I was not allowed to work on it at
Intel for the last 5 years. Now I can - in my copious free time.
Leakage is exponential in voltage around threshold, but the exponent
is highly temperature dependent. Operating at cryogenic temperatures
lowers the leakage almost to vanishing with current voltages.
Remaining problems center on threshold control, which then becomes
dominant, but we can again be sloppy and work with full swing CMOS
technology at very low Vdd. Superconducting interconnect raises the
velocity of chip crossings from 1-3% of the speed of light to 60% of
the speed of light, and makes good inductors possible. Resonant power
recovery (at least of the clocks, and probably of much of the logic
with reversible techniques) becomes easy. We will do this, but
probably not in your phone.
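
To put rough numbers on the temperature dependence, here is a minimal sketch using the textbook subthreshold model, in which the current falls by one decade for every S = n*(kT/q)*ln(10) of gate voltage below threshold. The ideality factor `n` and the temperature-independent threshold are simplifying assumptions of mine, not figures from the post, and real devices at cryogenic temperatures may not reach the ideal swing.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def subthreshold_swing_mv(temp_k, n=1.0):
    """Gate voltage (mV) needed to change subthreshold current by 10x.

    n is the ideality factor (assumed 1.0 here, the ideal limit).
    """
    return n * (K_B * temp_k / Q_E) * math.log(10) * 1000.0

def off_current_ratio(vth_v, temp_k, n=1.0):
    """Ratio I(Vgs = 0) / I(Vgs = Vth) under the simple exponential model.

    Assumes Vth itself does not move with temperature, which is exactly
    the threshold-control problem the post says becomes dominant.
    """
    decades = (vth_v * 1000.0) / subthreshold_swing_mv(temp_k, n)
    return 10.0 ** (-decades)

if __name__ == "__main__":
    for t in (300.0, 77.0):
        print(t, subthreshold_swing_mv(t), off_current_ratio(0.3, t))
```

Under these assumptions the swing is a bit under 60 mV/decade at 300 K and a bit over 15 mV/decade at 77 K, so the same threshold voltage buys roughly four times as many decades of leakage suppression when cold.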
This is where the disco-fet will help a little, as the gate electrons
are stored within a parallel channel and do not have to be pulled in or
out. The lower Miller capacitance reduces the dynamic CMOS currents,
and increases the switching speed, reducing the voltage crowbar effect
in the switchover. The RC stripline charging is still an issue. Maybe
the routing should be active inverter chains?
cheers jacko
Does the way your spreadsheet works force serial calculations? I.e. are
almost all the cells that are to be recalculated dependent upon the
previous one, thus forcing a serial chain of calculations. Or are there
"multiple chains of dependent cells" that are only serial due to the
way Excel itself is programmed? If the latter, one could enhance Open
Office to use multiple threads for the recalcs which would take
advantage of multiple cores for something useful.
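
A minimal sketch of what "multiple chains of dependent cells" would mean for a recalc engine: group the cells into levels so that every cell in a level depends only on earlier levels, then evaluate each level concurrently. The cell names, the `deps` and `formulas` dictionaries and both function names are hypothetical; this is not how Excel or Open Office actually does it.

```python
from concurrent.futures import ThreadPoolExecutor

def recalc_levels(deps):
    """Group formula cells into levels that can each be evaluated in parallel.

    deps maps a formula cell to the cells it reads. Cells that are read
    but have no entry of their own are treated as plain inputs.
    """
    remaining = {cell: set(d) for cell, d in deps.items()}
    done = {d for ds in deps.values() for d in ds} - set(deps)
    levels = []
    while remaining:
        ready = sorted(c for c, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("circular reference among: %s" % sorted(remaining))
        levels.append(ready)
        done.update(ready)
        for c in ready:
            del remaining[c]
    return levels

def recalc(formulas, deps, inputs):
    """Evaluate all formula cells, one level at a time, each level in a pool."""
    values = dict(inputs)
    with ThreadPoolExecutor() as pool:
        for level in recalc_levels(deps):
            # Cells in one level only read values from earlier levels.
            results = list(pool.map(lambda c: formulas[c](values), level))
            values.update(zip(level, results))
    return values

if __name__ == "__main__":
    deps = {"B1": ["A1"], "B2": ["A2"], "C1": ["B1", "B2"]}
    formulas = {
        "B1": lambda v: v["A1"] * 10,
        "B2": lambda v: v["A2"] * 10,
        "C1": lambda v: v["B1"] + v["B2"],
    }
    print(recalc_levels(deps))
    print(recalc(formulas, deps, {"A1": 1, "A2": 2}))
```

If every level comes out one cell wide, the sheet really is a serial chain and extra cores cannot help. Wide levels are where threads could pay off. In CPython the threads here mostly illustrate the structure, since pure-Python formulas would still be serialized by the interpreter lock.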
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
>Who has the competitive advantage?
>
>Probably the hybrid subculture that has a small number of hackers
>provide outside-of-Excel scalability to the Excel users.
An exceptionally smart Haskellite of my acquaintance is extremely
well-paid by Credit Suisse to write clever VBA (or, indeed, clever
Haskell that writes VBA) that talks both to large compute farms and to
Excel, for precisely this reason.
Obviously if I'm taking an average of a billion simulation runs, I
won't do it in Excel; but if I'm recording how the averages behave
when I change between six conditions, Excel feels like the thing to
use.
(actually it's gnumeric most of the time, but Excel has a much better
'attempt to minimise cell X by changing cells Y, Z and T' interface;
gnumeric's is a linear-programming tool and Excel's seems to be
basically simplex method)
gnuplot is ludicrously bad at fitting non-linear models (even
something as simple as a*x**c) to data, to the point that I wonder
whether it's actually buggy - taking enough logs to make the fitting
linear sometimes helps.
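
For what it's worth, the "take logs" trick for a*x**c can be sketched in a few lines: fit log(y) = log(a) + c*log(x) by ordinary least squares. The function name is mine. The log transform weights the errors differently from a direct nonlinear fit, so I would treat the result as a starting guess rather than the final answer.

```python
import math

def fit_power_law(xs, ys):
    """Estimate (a, c) in y = a * x**c via a straight-line fit in log-log space."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    if any(v <= 0 for v in xs + ys):
        raise ValueError("log transform needs strictly positive x and y")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    c = sxy / sxx                  # slope in log-log space is the exponent
    a = math.exp(my - c * mx)      # intercept is log(a)
    return a, c

if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 8.0]
    ys = [3.0 * x ** 1.5 for x in xs]
    print(fit_power_law(xs, ys))
```

The estimates can then be fed to the nonlinear fitter as initial values, which is often where such fits go wrong in the first place.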
Tom
I would bet that a huge chunk of the time isn't in doing the actual
calculations but in verifying that the calculations can be done.
Spreadsheets are pretty much the ultimate in mutably-typed interactive
code, and there's very little to prevent a recalculation from requiring
a near-universal reparse.
in the mid-80s ... I noticed the code in unix and commented that I had
replaced almost the identical code that was in cp67 in 1968 (some
conjecture that cp67 might possibly be traced back to ctss ... and unix
might also trace its design back to ctss ... potentially via multics).
i've periodically mentioned that this was significant contribution to
being able to leave the system up 7x24 ... allowing things like offshift
access, access from home, etc.
the issue was that the mainframes were "rented" and had usage meters ...
and customers paid monthly based on the number of hours run on the usage
meters. in the early days ... simple sporadic offshift usage wasn't
enough to justify the additional rental logged by the usage meters.
the usage meters ran when cpu &/or i/o was active and tended to
log/increment a couple hundred milliseconds ... even if there were only a
few hundred instructions "tic'ing" a few times per second (effectively
resulting in the meter running all the time). moving to event based
operation and eliminating the "tic'ing" helped enable the usage
meter to actually stop during idle periods.
the other factors (helping enable transition to leaving systems up 7x24)
were
1) "prepare" command for terminal i/o ... allowed the (terminal) channel
i/o program to appear idle (otherwise it would also have resulted in the
usage meter running) but still be able to immediately do something when
there were incoming characters
and
2) automatic reboot/restart after failure (contributed to lights out
operation, leaving the system up 2nd & 3rd shift w/o human operator
... eliminating those costs also).
on 370s, the usage meter would take 400 milliseconds of idle before
coasting to a stop. we had some snide remarks about the favorite son
operating system that had a "tic" process that was exactly 400
milliseconds (if the system was active at all, even otherwise completely
idle, it was guaranteed that the usage meter would never stop).
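
a toy model of the meter arithmetic, as a sketch only: it assumes each burst of activity is instantaneous and keeps the meter running for a fixed coast time afterwards (400 ms, per the 370 figure above). the function name and the event lists are made up for illustration, not taken from any real system.

```python
def metered_ms(event_times_ms, horizon_ms, coast_ms=400):
    """Total time the meter runs over [0, horizon_ms).

    Each event at time t keeps the meter running until t + coast_ms;
    overlapping run intervals are counted only once.
    """
    total = 0
    run_end = 0  # meter time has already been counted up to here
    for t in sorted(event_times_ms):
        start = max(t, run_end)
        end = min(t + coast_ms, horizon_ms)
        if end > start:
            total += end - start
            run_end = end
    return total

if __name__ == "__main__":
    horizon = 10_000  # ten seconds of an otherwise idle system
    print(metered_ms(range(0, horizon, 400), horizon))   # a tic every 400 ms
    print(metered_ms(range(0, horizon, 1000), horizon))  # a tic every second
    print(metered_ms([0, 5000], horizon))                # purely event driven
```

under these assumptions a tic every 400 ms bills the full ten seconds, a tic every second bills four of them, and two real events bill 0.8 seconds, which is the whole argument for event based operation.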
--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
>
> Sorry, that's not true. Especially in the "small embedded intelligent
> device" area we are talking about. The scale of integration changed: You
> will produce a SoC for these applications. I.e. the parts are built from
> incredibly small devices: GDS polygons, to be precise (people use more
> convenient building blocks, though). Most people who make SoCs embed some
> standard core like an ARM (e.g. Cortex M0) or an 8051 (shudder - takes the
> same area as a Cortex M0, but is horrible!), but that's because they chose
> so, not because it's not feasible to develop your own architecture.
I think you must work in a slightly more rarified atmosphere :-). The
only place I've seen custom silicon with an embedded core is in mobile
telephony, though it's probably far more prevalent now, as it moves more
into the mainstream and where the high volume can justify it. Most embedded
devices still use off the shelf micros afaics, including a lot of
consumer electronics. I still use later Silicon Labs 8051 variants (shock
horror) for simple tasks and logic replacement. The compilers produce
reliable, if tortuous code. Historically, used a lot of 68k, but am
moving over to ARM for the more complex stuff because everyone makes it.
It is a bit idiosyncratic and stuff like the interrupt handling takes
some getting used to. Portable libraries are important here, so limiting
architectures saves time and money. Most of the client work is low to
medium volume where there is a bit more scope for creativity and novel
solutions. Current project is based around Renesas 80C87 series, which
is a fairly neat 16 bit machine. I didn't choose it, but it's a very
conventional architecture and easy to integrate and program.
>
> Architectures that usually don't surface to the user - e.g. when I embed a
> b16 in a device of ours, it's not user-programmable. It's not visible what
> kind of microprocessor there is or if there is any at all.
>
Just as it should be, but simple user interfaces often represent a large
proportion of the overall software effort, just to make them that way...
Regards,
Chris
>
> Once again that's irrelevant to the question under discussion here:
> whether Terje's statement that Merced "_would_ have been, by far, the
> fastest cpu on the planet" (i.e., in some general sense rather than for
> a small cherry-picked volume of manually-optimized code) stands up under
> any real scrutiny.
I think that Intel seriously expected that the entire universe of
software would be rewritten to suit its ISA.
As crazy as that sounds, it's the only way I can make sense of Intel's
idea that Itanium would replace x86 as a desktop chip.
To add spice to the mix of speculation, I suspect that Microsoft would
have been salivating at the prospect, as it would have been a one-time
opportunity for Microsoft, albeit with a huge expenditure of
resources, to seal the doom of open source.
None of it happened, of course, but I think your objection about hand-
picked subsets of software would not have impressed Intel management.
Robert.
I am sure you are familiar with Monsoon & Id, and all the work that went
into serializing the dataflow graph :).
As for recomputing the dataflow graph; several papers/theses called for
explicitly annotating instructions with the dependence distance(s). I
always wondered:
- do you annotate for flow (register-write-read-dependences) only?
- or do you annotate for any (write-write and read-write as well)?
- how do you deal with multi-path (particularly dependence joins)?
- how do you deal with memory (must vs. may)?
- and how do you indicate this information so that the extra bits in the
I$ don't overwhelm any savings in the dynamic logic?
The first two points are important, because the information you need
differs between implementations with and without renaming.
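
To make the first two questions concrete, here is a minimal sketch that computes dependence distances for straight-line register code: flow (read after write), anti (write after read) and output (write after write). The instruction encoding and function name are hypothetical, and the sketch deliberately sidesteps the hard parts above: there is only one path, and memory is not modeled at all.

```python
def dependence_distances(program):
    """Annotate each instruction with distances back to what it depends on.

    program is a list of (dest, [srcs]) register names; dest may be None.
    A distance of k means "the instruction k slots earlier". None means
    there is no such dependence inside this block (e.g. a live-in value).
    """
    last_write = {}  # register -> index of its most recent writer
    last_read = {}   # register -> index of its most recent reader since that write
    annotated = []
    for i, (dest, srcs) in enumerate(program):
        ann = {"flow": [], "anti": None, "output": None}
        for s in srcs:
            ann["flow"].append(i - last_write[s] if s in last_write else None)
        if dest is not None:
            if dest in last_read:
                ann["anti"] = i - last_read[dest]
            if dest in last_write:
                ann["output"] = i - last_write[dest]
        for s in srcs:
            last_read[s] = i
        if dest is not None:
            last_write[dest] = i
            # Older reads of dest are now ordered through this write.
            last_read.pop(dest, None)
        annotated.append(ann)
    return annotated

if __name__ == "__main__":
    prog = [
        ("r1", []),            # 0: r1 = load ...
        ("r2", ["r1"]),        # 1: r2 = f(r1)
        ("r1", ["r2", "r3"]),  # 2: r1 = r2 * r3   (r3 is live-in)
        ("r4", ["r1", "r2"]),  # 3: r4 = r1 + r2
    ]
    for i, ann in enumerate(dependence_distances(prog)):
        print(i, ann)
```

A renaming machine would only need the `flow` entries. An implementation without renaming would also have to carry `anti` and `output`, which is where the extra I$ bits start to add up.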
About making windows bigger: my last work on this subject is a bit
dated, but, at that time, for most workloads, you pretty soon hit a
point of exponentially smaller returns. Path mispredicts & cache misses
were a couple of the gating factors, but so were niggling little details
such as store-queue sizes, retire resources & rename buffer sizes. There
is also the nasty issue of cache pollution on mispredicted paths.
> As crazy as that sounds, it's the only way I can make sense of Intel's
> idea that Itanium would replace x86 as a desktop chip.
I don't think that it's as crazy as it sounds (today). At the time
Microsoft had Windows NT running on MIPS and Alpha as well as x86: how
much effort would it be to run all of the other stuff through the
compiler too?
We're a little further along, now, and I think that the MS/NT experience
and Apple's processor-shifting [six different instruction sets in recent
history: 68k, PPC, PPC64, ia32, x86_64 and at least one from ARM, perhaps
three] (and a side order of Linux/BSD/Unix cross-platform support) shows
that the biggest headaches for portability (across processors but within a
single OS) are word-size and endianness, rather than specific processor
instruction sets. We're still having issues with pointer size changes as
we move from ia32 to x86_64, but at least the latter has *good* support
for running the former at the same time (as long as the OS designers
provide some support and ship the necessary 32-bit libraries).
Of course, most of the pointer-size issues stem directly from the use of
C et al, where they show up all sorts of bad assumptions about integer
equivalence and structure size and packing. Most other languages don't
even have ways to express those sorts of concerns, and so aren't as
affected.
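
As a rough illustration of the structure-size point, here is a sketch that lays out the same C-style struct under two common data models. It assumes natural alignment (each field aligned to its own size), which is typical but not guaranteed by C. The field names, tables and function are illustrative only.

```python
# Sizes in bytes for two common data models (typical values).
ILP32 = {"char": 1, "short": 2, "int": 4, "long": 4, "pointer": 4}
LP64 = {"char": 1, "short": 2, "int": 4, "long": 8, "pointer": 8}

def struct_layout(fields, model):
    """Return (field offsets, total size) for a struct with the given field kinds.

    Assumes every field is aligned to its own size and the struct is
    padded to a multiple of its largest field alignment.
    """
    offset = 0
    max_align = 1
    offsets = []
    for kind in fields:
        size = model[kind]
        align = size
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total

if __name__ == "__main__":
    # struct { int tag; void *next; int count; }
    fields = ["int", "pointer", "int"]
    print(struct_layout(fields, ILP32))
    print(struct_layout(fields, LP64))
```

Under these assumptions the struct doubles from 12 to 24 bytes when pointers go from 4 to 8 bytes, and the second field moves. Code that hard-wired either number, or stored the pointer in an int, breaks on the wider machine.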
> To add spice to the mix of speculation, I suspect that Microsoft would
> have been salivating at the prospect, as it would have been a one-time
> opportunity for Microsoft, albeit with a huge expenditure of
> resources, to seal the doom of open source.
How so? Open source runs fine on the Itanium, in general. (I think that
most of the large SGI Itanium boxes only run Linux, right?)
Cheers,
--
Andrew