
!! Intel goes APE-SHIT after hearing about the X86 Consortium !!


Isaac Jing Herng Leung

Jan 21, 1997

Jim Brooks (jbr...@swbell.net) wrote:
: If you think about it, it is pretty dumb to put a 300Mhz CPU
: in a system whose memory runs at 70ns (that speed was originally
: intended for a 8Mhz IBM AT). I guess neither Intel nor Cyrix
: have figure that one out yet.

: But the X86 Consortium has. Guess what they plan to do?
: Heck, they are going to do what is obvious, they are going
: to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
: on one single die. Fuck the L2 cache, the memory now runs as
: fast as the CPU. Yep, when you want to upgrade your system,
: you upgrade to both a faster processor and more memory.

Guess what? Any 2nd-year engineering or CompSci student knows this basic
principle; hence the existence of cache.
This isn't some overnight revelation or great leap forward in technology.
(Sorry to burst your bubble there.)

But anybody with a clue about semiconductor manufacturing knows that the
problem is COST.
32MB worth of CACHE (SRAM) is faster than 32MB of DRAM.
The same amount on board the CPU itself is even better.
But the yields for such a huge die would be low, low, low. This
means the cost would be astronomically high. (The Pentium is already a
huge die itself.) If you want to pay for this, it is an easy change. I'm
guessing they could tape out within 3 months and have parts back within 6
months if there were a market for this.


Isaac

Timothy Edwards

Jan 21, 1997

Isaac Jing Herng Leung wrote:

> Jim Brooks (jbr...@swbell.net) wrote:
> : But the X86 Consortium has. Guess what they plan to do?
> : Heck, they are going to do what is obvious, they are going
> : to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
> : on one single die. Fuck the L2 cache, the memory now runs as
> : fast as the CPU. Yep, when you want to upgrade your system,
> : you upgrade to both a faster processor and more memory.
>
> Guess what? Any 2nd year engineer or CompSci student knows this basic
> principle. hence the existence of cache.

When were your Computer Science textbooks printed? 1985? 1989?
Go re-educate yourself. Those same textbooks asserted that it was
impossible for CMOS processors to go beyond 100 MHz, and that it could
only be done with gallium arsenide. Hogwash. I am sure that you believed
it back then. People must learn that those academic textbooks are not prophecy
and commandments that can never be questioned.

> This isn't some overnight revelation or great leap forward in technology.
> (Sorry to burst your bubble there).
> But anybody with a clue about semi-conductor manufacture knows that the
> problem is COST.

But it is a technology leap. The X86 Consortium evidently has an
economically viable high-density CMOS technology; otherwise they would
not have announced their product.



> If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
> The same on board the CPU itself is even better.
> But the yields for sucha huge die would be low, low low. This
> means the cost would be astronomically high. (The Pentium is already a

> huge die itself). If you want to pay for this, this is an easy change. I'm
> guessing they could tape out within 3 months and have parts back within 6
> months if there was a market for this.

Ah, but the definition of a "huge die" is always changing.
People once said the 6502 was cheaper than the 68K because the
68K had a "huge die". Today, the dies of both are considered minuscule.

I for one am glad that Wintel is now possibly facing serious competition.

--
Timothy

Jim Brooks

Jan 21, 1997

If you think about it, it is pretty dumb to put a 300MHz CPU
in a system whose memory runs at 70ns (a speed that was originally
intended for an 8MHz IBM AT). I guess neither Intel nor Cyrix
has figured that one out yet.

But the X86 Consortium has. Guess what they plan to do?


Heck, they are going to do what is obvious, they are going
to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
on one single die. Fuck the L2 cache, the memory now runs as
fast as the CPU. Yep, when you want to upgrade your system,
you upgrade to both a faster processor and more memory.

Go read the news at:
http://www.news.com/News/Item/0,4,7211,00.html

Yet another reason for poor Andy Grovy to go paranoid ;-)

--

| | Jim Brooks
| _ | jbr...@swbell.net
______________|_(_)_|______________ PGP public key available
+|+ [ ( o ) ] +|+
* O[_]---[_]O *

Erick Smith

Jan 22, 1997


> : Heck, they are going to do what is obvious, they are going
> : to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
> : on one single die. Fuck the L2 cache, the memory now runs as
> : fast as the CPU. Yep, when you want to upgrade your system,
> : you upgrade to both a faster processor and more memory.

The problem is that other devices also have to access the memory. For
example, the DMA on your HD controller is going to need access to the
memory.

Additionally, putting all the memory on the CPU wouldn't improve
performance that much anyway. Even with a 256K L2 cache you get a
ninety-something percent hit rate.

Just my 2 cents worth...

Erick Smith
Software Design Engineer
Microsoft


Tony Skinner

Jan 22, 1997

On Tue, 21 Jan 1997 19:42:48 -0500, Timothy Edwards
<ti...@utdallas.edu> wrote:

>Isaac Jing Herng Leung wrote:
>> Jim Brooks (jbr...@swbell.net) wrote:

>> : But the X86 Consortium has. Guess what they plan to do?


>> : Heck, they are going to do what is obvious, they are going
>> : to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
>> : on one single die. Fuck the L2 cache, the memory now runs as
>> : fast as the CPU. Yep, when you want to upgrade your system,
>> : you upgrade to both a faster processor and more memory.
>>

>> Guess what? Any 2nd year engineer or CompSci student knows this basic
>> principle. hence the existence of cache.
>
>When were your Computer Science textbooks printed? 1985? 1989?
>Go re-educate yourself. Those same textbooks asserted that it was
>impossible for CMOS processors to go beyone 100Mhz,

We are talking about manufacturing difficulties here, not speed
limitations. Look at the Pentium Pro, and ever wonder why they are so
expensive? Because a defect in any of the components of the chip (cache, or any
other part) results in junking the whole chip. If Intel is using
.35-micron technology and is having manufacturing problems with only
512K of cache, I would love to see the yield of a chip with
32 megabytes of cache!
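(For reference, the usual first-order way to see why yield collapses with die
area is a Poisson defect model, yield = exp(-D*A). A minimal sketch; the
defect density and die areas below are made-up illustrative numbers, not any
real process data:)

    /* Back-of-envelope Poisson yield model: yield = exp(-D * A).
     * D = defect density (defects per cm^2), A = die area (cm^2).
     * The numbers below are illustrative guesses, not real fab data. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double defect_density = 0.8;              /* defects per cm^2 (assumed) */
        double areas[] = { 2.0, 4.0, 8.0, 16.0 }; /* die areas in cm^2 (assumed) */
        int i;

        for (i = 0; i < 4; i++) {
            double yield = exp(-defect_density * areas[i]);
            printf("die area %5.1f cm^2 -> yield %6.2f%%\n",
                   areas[i], 100.0 * yield);
        }
        return 0;
    }

Every extra square centimeter multiplies the fraction of junked dies, which is
exactly the manufacturing problem being described.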

>But it is a technology leap. That X86 consortium evidently has an
>economically viable high density CMOS technology, otherwise they would
>not announce their product.

I seriously doubt they will be successful in implementing 32MB of SRAM
onto the processor. Announcing their product could be an
attempt to entice investors into dumping more cash into their venture.
It could very well end up being a flop.


>
>> If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
>> The same on board the CPU itself is even better.
>> But the yields for sucha huge die would be low, low low. This
>> means the cost would be astronomically high. (The Pentium is already a
>> huge die itself). If you want to pay for this, this is an easy change. I'm
>> guessing they could tape out within 3 months and have parts back within 6
>> months if there was a market for this.
>
>Ah, but the definition of a "huge die" is always changing.
>People once said the 6502 was cheaper than the 68K because the
>68K had a "huge die". Today, the dies of both are considered miniscule.
>

What does it matter? The fact is, the larger the die, the more
difficult it becomes to manufacture. Yes, die sizes have
grown compared to the previous chips they replaced, but the transistor
counts have grown far more than in proportion to the increase in
die size!
If they have some radical new .05-micron BiCMOS technology,
it's possible they could make a reasonably small die with maybe up to
16MB of cache.

Christopher D Dukes

Jan 22, 1997

In article <5c3imq$qo$2...@nntp.ucs.ubc.ca>,

Isaac Jing Herng Leung <isa...@canucks.ee.ubc.ca> wrote:
>
>Guess what? Any 2nd year engineer or CompSci student knows this basic
>principle. hence the existence of cache.
Apparently your University is better than mine. Here the Senior
CSC students still don't understand what a segmentation fault is.
That aside.

>This isn't some overnight revelation or great leap forward in technology.
>(Sorry to burst your bubble there).
>
>But anybody with a clue about semi-conductor manufacture knows that the
>problem is COST.

>If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
>The same on board the CPU itself is even better.
> But the yields for sucha huge die would be low, low low. This
>means the cost would be astronomically high. (The Pentium is already a
>huge die itself). If you want to pay for this, this is an easy change. I'm

>guessing they could tape out within 3 months and have parts back within 6
>months if there was a market for this.

Who says it has to be a large die? Intel has already shown (again)
that the package need not contain just one silicon chip. I need to
reread the summary of Paul Franzon's work on "flip-chip" technology,
but as memory serves, a CPU that is traditionally a single-large-die
design (like, say, the MIPS) can be implemented by interconnecting
the chips from several small dies, all in one package.
Here is where I get fuzzy on the summary, but if you etch the wafers with
several different dies (the components of the flip chip) and have
a process to automatically test the "sub-chips", you can increase
yields at the same time you lower costs.
An added bonus is that it becomes much easier to mix
heterogeneous semiconductor technologies.

However, this does not address another issue. It costs a lot more
to make a meg of SRAM than a meg of DRAM, I don't care how you package it.

Oh well, just raising a little purgatory.

I'll see if I can scrounge up
>
>
>Isaac


--
The following must be destroyed. Microsoft, Lyons Partnership, SoftCell.
Balkanize USENET!! Vote from the rooftops!!
Sending unsolicited commercial massmail to this account may result
in a network outage for your site. Have a nice day.

Jonas Palm

Jan 22, 1997

In article <01bc0812$69ddfa40$ec5b379d@ericks_dev>, "Erick Smith"
<eri...@microsoft.com> wrote:

> > : Heck, they are going to do what is obvious, they are going
> > : to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
> > : on one single die. Fuck the L2 cache, the memory now runs as
> > : fast as the CPU. Yep, when you want to upgrade your system,
> > : you upgrade to both a faster processor and more memory.
>

> The problem is that other devices also have to access the memory. For
> example, the DMA on your HD controller is going to need access to the
> memory.
>
> Additionally, putting all the memory on the CPU wouldn't improve
> performance that much anyway. Even with a 256K L2 cache you get a ninty
> something percent hit rate.
>
> Just my 2 cents worth...
>
> Erick Smith
> Software Design Engineer
> Microsoft

The question is not only what hit rate you get on your 256K L2 cache,
but also how much it costs you to go to the L2 cache instead of
staying on-chip, and how much it costs you to go to main memory in those
instances where you _do_ get a cache miss.

Back in the days when the discussion was about scalar vs. vector,
the saying went that even if you managed to vectorize 90% of
your code, the remaining 10% scalar code would still be rate-
limiting if your vector performance was more than ten times
higher than the scalar.

The same was applied to parallelizing code.

And the same can be applied to executing data and instructions
residing in cache vs. in main memory in today's personal computers.
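(That rule of thumb is just Amdahl's law. A minimal sketch of the arithmetic;
the 90% vectorized / 10x faster split is the example from the paragraph above,
not a measurement:)

    /* Amdahl's law: overall speedup when a fraction f of the work is sped up
     * by a factor s.  Using the 90% vectorized / 10x faster example above. */
    #include <stdio.h>

    static double amdahl(double f, double s)
    {
        return 1.0 / ((1.0 - f) + f / s);
    }

    int main(void)
    {
        printf("90%% vectorized, 10x vector speed : %.2fx overall\n",
               amdahl(0.90, 10.0));
        printf("90%% vectorized, 100x vector speed: %.2fx overall\n",
               amdahl(0.90, 100.0));   /* barely better: the scalar 10% dominates */
        return 0;
    }

The remaining serial (or uncached) fraction caps the speedup, no matter how
fast the fast path gets.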


Check out the HINT benchmark results to get a good feeling for the
effects of memory hierarchy size/speed vs problem size vs performance.
Wonderfully descriptive graphs at that site.

HINT mainly addresses bandwidth though, and latencies are just as
important.

Reducing latencies is a hot area now, and has been for some time.
Digital has some good docs on their web-site, relating to Alphas
of course, but all manufacturers have to address the issue as
the CPU cores get ever faster.

Memory architecture issues are what determine much of what the
user perceives as CPU performance.

The memory architectures of personal computers lag enormously
behind the development of the CPU cores.

Programmers should be aware of these issues, and, if possible,
code accordingly. Sadly, many programmers aren't aware, and
many problems can't be productively reprogrammed anyway.

I don't for a second believe that this initiative will have a
major impact on day-to-day computing in the foreseeable future,
but they _do_ target the correct performance bottleneck.

Jonas Palm

Jim Brooks

Jan 22, 1997

Erick Smith wrote:

> The problem is that other devices also have to access the memory. For
> example, the DMA on your HD controller is going to need access to the
> memory.

Yes, but it is not a problem, really. On an ordinary PC,
when the DMA requests an address from memory, the memory
controller responds. But on an X86 Consortium system,
the BIU of the CPU would respond to the memory request,
if the CPU's on-die SRAM held that address.



> Additionally, putting all the memory on the CPU wouldn't improve
> performance that much anyway. Even with a 256K L2 cache you get a ninty
> something percent hit rate.

Wrong. There would be a significant performance boost,
which would increase with clock speed, when compared
to today's CPU/external L2 cache system.

I agree, the average hit rate of an L2 cache is 90%.
However, the CPU reads from an L2 cache at the speed
of the CPU's bus, which for a 150MHz Pentium happens
to be 75MHz! So think of it like this: when there is
an L1 miss and an L2 hit, the 150MHz Pentium drops down
to half its speed. This half-speed reduction happens
frequently because there are lots of misses for an
8K L1 cache. Performance goes to hell when there are
misses in both L1 & L2, in which case the CPU effectively
drops down to something like 15..20 MHz!
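(For what it's worth, here is a minimal sketch of how hit rates and penalties
like those combine into an average. The 90% L1 hit rate is my assumption; the
half-speed L2 penalty and the roughly one-tenth-speed memory penalty are the
claims above taken at face value:)

    /* Rough "effective speed" estimate for a CPU with L1, L2 and DRAM,
     * using illustrative hit rates and the relative penalties claimed above. */
    #include <stdio.h>

    int main(void)
    {
        double l1_hit = 0.90;   /* fraction of accesses that hit L1 (assumed)   */
        double l2_hit = 0.90;   /* fraction of L1 misses that hit L2 (assumed)  */

        double l1_cost  = 1.0;  /* relative cost of an L1 hit                   */
        double l2_cost  = 2.0;  /* ~half speed: bus at half the core clock      */
        double mem_cost = 10.0; /* ~150MHz core behaving like 15MHz on a miss   */

        double avg = l1_hit * l1_cost
                   + (1.0 - l1_hit) * l2_hit * l2_cost
                   + (1.0 - l1_hit) * (1.0 - l2_hit) * mem_cost;

        printf("average cost per access  : %.2f\n", avg);
        printf("effective speed          : %.0f%% of the core clock\n", 100.0 / avg);
        return 0;
    }

How bad the result looks depends almost entirely on what you assume for the
L1 miss rate and the memory penalty, which is exactly what this thread is
arguing about.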

So, tomorrow's CPU wars won't be decided by who has the
most efficient microarchitecture, clock rate, best VLIW scheme, etc.
The deciding factor will be who is first to get the
technology to integrate processor and memory on a single IC.

> Erick Smith
> Software Design Engineer
> Microsoft

--

Jim Brooks

Jan 22, 1997

Erick Smith wrote:

> The problem is that other devices also have to access the memory. For
> example, the DMA on your HD controller is going to need access to the
> memory.

Yes, but it is not a problem, really. On an ordinary PC,
when the DMA requests an address from memory, the memory
controller responds. But on an X86 Consortium system,
the BIU of the CPU would respond to the memory request,
if the CPU's on-die SRAM held that address.

> Additionally, putting all the memory on the CPU wouldn't improve
> performance that much anyway. Even with a 256K L2 cache you get a ninty
> something percent hit rate.

Wrong. There would be a significant performance boost,
which would increase with clock speed, when compared
to today's CPU/external L2 cache system.

I agree, the average hit rate of an L2 cache is 90%.
However, the CPU reads from an L2 cache at the speed
of the CPU's bus, which for a 180MHz Pentium happens
to be 60MHz! So think of it like this: when there is
an L1 miss and an L2 hit, the 180MHz Pentium drops down
to a third of its speed! This 1/3 speed reduction happens
frequently because there are lots of misses for an
8K L1 cache. Performance goes to hell when there are
misses in both L1 & L2, in which case the CPU effectively
drops down to something like 15..20 MHz!

So, tomorrow's CPU clone wars won't be decided by who has the
most efficient microarchitecture, clock rate, best VLIW scheme, etc.
The deciding factor will be who is first to get the
technology to integrate the processor and memory on a single IC.

Mark Willoughby

Jan 22, 1997

Ahh, the voice of reason.

Using several small dies in a single package is a known technique. You run into some
problems with the package size, as you still need to get all of your wires out for
the address and data busses. Let's face it, even if they put 32MB of fast memory
on chip, and make it dual-ported, it's still not going to be enough. You'd also
want to optimize your operating system to take advantage of the fast local memory:
an extra level in the memory model rather than a big local cache.

One other technique is bump bonding, where connections between the two dies have
small blobs like solder dots on them, and are pushed together. What you lose in
area for this array, you gain in yield (of the package), because the array can be
much more dense than the normal I/O which is bonding wire limited. Plus these
'local' signals don't need too much in the way of drive, or ESD protection, compared
to those going off package.

> However, this does not address another issue. It costs a lot more
> to make a meg of sram than a meg of dram, I don't care how you package it.
>

True enough, but how much can you shave off the access time by having DRAM
(or SDRAM for that matter) local, as opposed to sitting on a SIMM miles away?

> Oh well, just raising a little purgatory.
>
> I'll see if I can scrounge up
> >
> >
> >Isaac
>

Given that the price of all silicon is artificially high to accommodate R&D, fab
building, etc., it would be interesting to see what price point they'd set for this.

Cheers

Mark

Tim Behrendsen

Jan 22, 1997

Jim Brooks <jbr...@swbell.net> wrote in article <32E52D...@swbell.net>...

> If you think about it, it is pretty dumb to put a 300Mhz CPU
> in a system whose memory runs at 70ns (that speed was originally
> intended for a 8Mhz IBM AT). I guess neither Intel nor Cyrix
> have figure that one out yet.

My God! The memory is a bottleneck??? STOP THE PRESSES!

The answer is so simple! Why didn't anyone ever think of it?

> But the X86 Consortium has. Guess what they plan to do?

> Heck, they are going to do what is obvious, they are going
> to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
> on one single die. Fuck the L2 cache, the memory now runs as
> fast as the CPU. Yep, when you want to upgrade your system,
> you upgrade to both a faster processor and more memory.

Sheer brilliance! Only -- uh, one small question. How much
difference is it going to make when most code executes out of
the L2 cache? How much of a *throughput* difference are we
really going to see?

And then there's that small question of, uh, cost. SRAM is a
little pricy to put 32M on a chip with the CPU.

Then there's that darn yield problem of putting all that kaka
on a chip.

And still remains the question of having to trash the whole
chip to put in some more memory. Have you noticed that 16meg
of memory is forty bucks nowadays?

> Go read the news at:
> http://www.news.com/News/Item/0,4,7211,00.html
>
> Yet another reason for poor Andy Grovy to go paranoid ;-)

Andy "Grovy" Grove is sleeping very well, I would think.

-- YAWN -- yet another consortium to laugh at. Remember the
"RAM consortium" of a few years ago? Or shall we hold up
the "HDTV consortium". Or how about <giggle> the "EISA
consortium". Or CDE, or ...

When will they learn that technology by consortium usually ends
up being a complete joke?

--
==========================================================================
| Tim Behrendsen (t...@a-sis.com) | http://www.cerfnet.com/~timb |
| "Judge all, and be prepared to be judged by all." |
==========================================================================

Maynard Handley

Jan 22, 1997

In article <32E562...@utdallas.edu>, Timothy Edwards
<ti...@utdallas.edu> wrote:

> > This isn't some overnight revelation or great leap forward in technology.
> > (Sorry to burst your bubble there).
> > But anybody with a clue about semi-conductor manufacture knows that the
> > problem is COST.
>

> But it is a technology leap. That X86 consortium evidently has an
> economically viable high density CMOS technology, otherwise they would
> not announce their product.

Obviously they must. FUD is something unheard of in the computer industry.
How old are you? 12?

Maynard

--
My opinion only

Frank Wilkerson

Jan 22, 1997

Timothy Edwards wrote:
" < --
" <> My opinion only
" > < Maynard
" <
" : Subject:
" : Re: !! Intel goes APE-SHIT after hearing about the X86 Consortium !!
" : Date:
" : Wed, 22 Jan 1997 13:23:21 -0800
" : From:
" : hand...@apple.com (Maynard Handley)
" : Organization:
" : Apple Computer
"
" Hey, Maynard, what's your problem? Did get panties get in a wad
" after Apple posted a lost of $120M? Or maybe after you heard
" that Steve Jobs is back in charge now, and rumor has it that
" he is going to clean house and get rid of the dead weight at Apple
" (oops, guess you don't have much job security there in Apple),
" in a vain attempt to save the rotten Apple from Wintel.
"
" Concerning my age, I am old enough to know that:
"
" 1. It is dumb to buy a PowerMac.
" 2. It is dumb to buy stock in Apple.
" 3. It is dumb to send my resume to Apple when I get my MSCS.
"
" --
" Timothy

Who is the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?
Let's have a vote. I vote for Timbo.

Sincerely,
Frank A. Wilkerson
OptTek EDA Tools, Inc.
Disclaimer: My opinions are my own, not those of my employer.

Doug Zabel

Jan 22, 1997

Timothy Edwards wrote:
>
> Isaac Jing Herng Leung wrote:
> > Jim Brooks (jbr...@swbell.net) wrote:
> > : But the X86 Consortium has. Guess what they plan to do?

> > : Heck, they are going to do what is obvious, they are going
> > : to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
> > : on one single die. Fuck the L2 cache, the memory now runs as
> > : fast as the CPU. Yep, when you want to upgrade your system,
> > : you upgrade to both a faster processor and more memory.
> >
> > Guess what? Any 2nd year engineer or CompSci student knows this basic
> > principle. hence the existence of cache.
>
> When were your Computer Science textbooks printed? 1985? 1989?
> Go re-educate yourself. Those same textbooks asserted that it was
> impossible for CMOS processors to go beyone 100Mhz, it could
> only be done with gallium-arsenide. Hogwash. I am sure that you believed
> it back then. People must learn that those academic textbooks are not prophecy
> and commandments that can never be questioned.
>
> > This isn't some overnight revelation or great leap forward in technology.
> > (Sorry to burst your bubble there).
> > But anybody with a clue about semi-conductor manufacture knows that the
> > problem is COST.
>
> But it is a technology leap. That X86 consortium evidently has an
> economically viable high density CMOS technology, otherwise they would
> not announce their product.
>
> > If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
> > The same on board the CPU itself is even better.
> > But the yields for sucha huge die would be low, low low. This
> > means the cost would be astronomically high. (The Pentium is already a
> > huge die itself). If you want to pay for this, this is an easy change. I'm
> > guessing they could tape out within 3 months and have parts back within 6
> > months if there was a market for this.
>
> Ah, but the definition of a "huge die" is always changing.
> People once said the 6502 was cheaper than the 68K because the
> 68K had a "huge die". Today, the dies of both are considered miniscule.
>
> I for one am glad that Wintel is now possibly facing serious competition.
>
> --
> Timothy

You're dumb. Any way you slice it, making the die larger makes the
product MUCH more expensive. 2x die size probably equals 4x cost.
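(That rule of thumb falls out of area times yield: you pay for the whole
wafer, and a bigger die both gives you fewer candidates per wafer and a worse
yield on each. A sketch; all numbers below are illustrative assumptions:)

    /* Why cost grows faster than die area: cost per good die is roughly
     * (wafer cost) / (dies per wafer * yield), with yield = exp(-D * A).
     * All numbers below are illustrative assumptions. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double wafer_cost = 3000.0;   /* dollars per processed wafer (assumed) */
        double wafer_area = 300.0;    /* usable wafer area in cm^2 (assumed)   */
        double defects    = 0.8;      /* defects per cm^2 (assumed)            */
        double area;

        for (area = 1.0; area <= 4.0; area *= 2.0) {
            double dies  = wafer_area / area;          /* ignores edge losses  */
            double yield = exp(-defects * area);
            double cost  = wafer_cost / (dies * yield);
            printf("die area %4.1f cm^2: yield %5.1f%%, cost per good die $%8.2f\n",
                   area, 100.0 * yield, cost);
        }
        return 0;
    }

With these made-up numbers, doubling the die area roughly quadruples the cost
per good die, which is the relationship being claimed.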

Timothy Edwards

Jan 22, 1997

Maynard Handley wrote:
> In article <32E562...@utdallas.edu>, Timothy Edwards
> <ti...@utdallas.edu> wrote:
>
> > > This isn't some overnight revelation or great leap forward in technology.
> > > (Sorry to burst your bubble there).
> > > But anybody with a clue about semi-conductor manufacture knows that the
> > > problem is COST.
> >
> > But it is a technology leap. That X86 consortium evidently has an
> > economically viable high density CMOS technology, otherwise they would
> > not announce their product.
>

Elmer Phud

Jan 23, 1997

In article <32E642...@swbell.net>, jbr...@swbell.net says...

>
>Erick Smith wrote:
>
>> The problem is that other devices also have to access the memory. For
>> example, the DMA on your HD controller is going to need access to the
>> memory.
>
>Yes, but it is not a problem, really. On an ordinary PC,
>when the DMA requests an address from memory, the memory
>controller responds. But on an X86 Consortium system,
>the BIU of the CPU would respond to the memory request,
>if the SRAM of the CPU would have it.
>

Jim, the PPro runs its cache at full CPU frequency. The hit rate is 90%.
Adding 32 meg of SRAM is only going after the remaining 10%. Assuming
32 meg of SRAM as you stated earlier (which I seriously doubt, for reasons
I will state later), that's increasing the cache size by 128x for at best
a 10% boost. Does that make sense? I don't think so...

For anyone who has seen a P6 package with the cap off, you will have seen
that the cache is roughly the same size as the P6 itself!!! If you are
going to add 32 meg of cache to the processor chip (or package), that means
at least 128x the die area!!! Seeing as Intel has far and away the lion's
share of the world's .35u capacity, and they are capacity constrained, just
where is the fab capacity going to come from to supply this silicon?

Intel has 3 or 4 $1.5 billion fabs cranking out Pentiums & Pentium Pros;
that means we would need about $6 billion x 128 = $768 billion worth of
new fabs to do the trick.

Elmer Phud

unread,
Jan 23, 1997, 3:00:00 AM1/23/97
to

In article <32E642...@swbell.net>, jbr...@swbell.net says...
>
>Erick Smith wrote:
>
>> The problem is that other devices also have to access the memory. For
>> example, the DMA on your HD controller is going to need access to the
>> memory.
>
>Yes, but it is not a problem, really. On an ordinary PC,
>when the DMA requests an address from memory, the memory
>controller responds. But on an X86 Consortium system,
>the BIU of the CPU would respond to the memory request,
>if the SRAM of the CPU would have it.
>

Seeing as this will be a consortium, the pricing will be very competitive,
which means profit margins will be low. Who in the world is going to be
stupid enough to get involved in this?

Lee

David T. Wang

unread,
Jan 23, 1997, 3:00:00 AM1/23/97
to

Frank Wilkerson (fra...@teknet.com) wrote:

: Timothy Edwards wrote:
: "
: " Maynard Handley wrote:
: " <
: " < Obviously they must. FUD is something unheard of in the computer industry.

: " < How old are you? 12?
: " < --
: " <> My opinion only
: " > < Maynard
: "
: " Hey, Maynard, what's your problem? Did get panties get in a wad

: " after Apple posted a lost of $120M? Or maybe after you heard
: " that Steve Jobs is back in charge now, and rumor has it that
: " he is going to clean house and get rid of the dead weight at Apple
: " (oops, guess you don't have much job security there in Apple),
: " in a vain attempt to save the rotten Apple from Wintel.
: "
: " Concerning my age, I am old enough to know that:
: "
: " 1. It is dumb to buy a PowerMac.
: " 2. It is dumb to buy stock in Apple.
: " 3. It is dumb to send my resume to Apple when I get my MSCS.
: "
: " --
: " Timothy

: Who is the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?


: Let's have a vote. I vote for Timbo.

: Sincerely,
: Frank A. Wilkerson
: OptTek EDA Tools, Inc.
: Disclaimer: My opinions are my own, not those of my employer.

Hello "Frank A. Wilkerson".

Please say hello to "Timbo" and/or "Jimbo" for me. I find it quite odd that
the three of you share the same NNTP server, use the same newsreader,
and "Jimbo" and "Timbo" even share the same "Generation X" Organization
name. Isn't this a strange world?

David. (Who really is from Maryland)

John Christie

Jan 23, 1997

Frank Wilkerson (fra...@teknet.com) wrote:

: Who is the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?

Let's not start a competition!
--

------------------------------------------------------------------

John Christie

"You aren't free because you CAN choose - only if you DO choose."

"All you are is the decisions you make. If you let circumstances make
them for you then what you are becomes very easy to estimate."

Robert Krawitz

Jan 23, 1997

In article <5c6svs$av4$2...@falcon.ns.net> #ep...@ns.net (Elmer Phud) writes:

Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
Adding 32 meg of Sram is only going after the remaining 10%. Assuming
32meg of sram as you stated earlier (which I seriuosly doubt for reasons
I will state later) that's increasing the cache size by 128x for at best
a 10% boost. Does that make sense? I don't think so...

Assuming that the 90% hit rate is true (which I think is dubious, at
least in a heavily multitasked system running memory intensive stuff),
your arithmetic is still wrong.

Random memory access to 70 ns DRAM typically takes more like 110 ns;
access to the PPro cache takes 5 ns. Thus, you can get over 20
accesses to the PPro cache in the same time you can get one access to
main memory. Even with a 90% hit rate, 2/3 of the memory access time
is spent hitting main memory.
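(The arithmetic behind that, as a sketch; the 5 ns, 110 ns, and 90% figures
are the ones quoted in this post:)

    /* Average memory access time with the figures quoted above:
     * 90% of accesses hit the 5 ns cache, 10% go to ~110 ns DRAM. */
    #include <stdio.h>

    int main(void)
    {
        double hit_rate = 0.90;
        double cache_ns = 5.0;
        double dram_ns  = 110.0;

        double avg   = hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns;
        double share = (1.0 - hit_rate) * dram_ns / avg;

        printf("average access time: %.1f ns\n", avg);         /* 15.5 ns */
        printf("time spent in main memory: %.0f%%\n",          /* ~71%    */
               100.0 * share);
        return 0;
    }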

This analysis is simplistic, of course; it doesn't take into account
the fact that both fast page and EDO can do somewhat better on
consecutive memory accesses (such as bursts to fill a cache line).
--
Robert Krawitz <r...@tiac.net> http://www.tiac.net/users/rlk/

Member of the League for Programming Freedom -- mail l...@uunet.uu.net
Tall Clubs International -- http://www.tall.org/ or 1-800-521-2512

John Briggs, VAX system manager, x4411

Jan 23, 1997

In article <5c6svs$av4$2...@falcon.ns.net>, #ep...@ns.net (Elmer Phud) writes:
...

> Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
> Adding 32 meg of Sram is only going after the remaining 10%. Assuming
> 32meg of sram as you stated earlier (which I seriuosly doubt for reasons
> I will state later) that's increasing the cache size by 128x for at best
> a 10% boost. Does that make sense? I don't think so...

Careful with those apples and oranges comparisons.

Improving cache hit rate from 90% to 100% is a 10% increase in cache hit
rate. This may or may not translate to a 10% increase in performance.

In this case, the important metric is not cache miss percentage. It is
cpu-idle-waiting-for-memory-reference percentage. The two numbers are
not identical.

Socratic hints:

o How many CPU cycles does a cache hit take?
o How many CPU cycles does a cache miss take?
o How often does the CPU cycle without making (or waiting on) a memory
reference?
o How do the answers to these questions alter the relationship between
cache miss rate and performance?
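(To make the hints concrete, here is a small sketch showing that a 10-point
change in hit rate is not the same thing as a 10% change in performance; the
cycle counts are illustrative assumptions, not measurements of any real CPU:)

    /* Cache hit rate vs. performance: with a large miss penalty, going from
     * a 90% to a 100% hit rate buys far more than "10%". */
    #include <stdio.h>

    static double cycles_per_access(double hit_rate, double hit_cost, double miss_cost)
    {
        return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost;
    }

    int main(void)
    {
        double hit_cost  = 1.0;    /* cycles for a cache hit (assumed)  */
        double miss_cost = 20.0;   /* cycles for a cache miss (assumed) */

        double at90  = cycles_per_access(0.90, hit_cost, miss_cost);
        double at100 = cycles_per_access(1.00, hit_cost, miss_cost);

        printf("cycles/access at 90%% hits : %.2f\n", at90);   /* 2.90 */
        printf("cycles/access at 100%% hits: %.2f\n", at100);  /* 1.00 */
        printf("memory-side speedup        : %.2fx\n", at90 / at100);
        return 0;
    }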

John Briggs vax...@alpha.vitro.com

Elmer Phud

Jan 23, 1997

In article <1997Jan2...@alpha.vitro.com>, vax...@alpha.vitro.com says...

This may be the all-time "you missed the point".
Who cares about a couple of %? The point is 128x the silicon for very
little performance increase!!!

El


Jeff Krob

Jan 23, 1997

On 23 Jan 1997 14:48:50 GMT, r...@sunspot.tiac.net (Robert Krawitz)
wrote:

>In article <5c6svs$av4$2...@falcon.ns.net> #ep...@ns.net (Elmer Phud) writes:
>

> Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
> Adding 32 meg of Sram is only going after the remaining 10%. Assuming
> 32meg of sram as you stated earlier (which I seriuosly doubt for reasons
> I will state later) that's increasing the cache size by 128x for at best
> a 10% boost. Does that make sense? I don't think so...
>

>Assuming that the 90% hit rate is true (which I think is dubious, at
>least in a heavily multitasked system running memory intensive stuff),
>your arithmetic is still wrong.
>
>Random memory access to 70 ns DRAM typically takes more like 110 ns;

Where does this extra 40ns come from, pray tell?

John Briggs, VAX system manager, x4411

Jan 23, 1997

In article <5c81ri$ht8$1...@falcon.ns.net>, #ep...@ns.net (Elmer Phud) writes:
> In article <1997Jan2...@alpha.vitro.com>, vax...@alpha.vitro.com says...
...

>>Improving cache hit rate from 90% to 100% is a 10% increase in cache hit
>>rate. This may or may not translate to a 10% increase in performance.
>>
>>In this case, the important metric is not cache miss percentage. It is
>>cpu-idle-waiting-for-memory-reference percentage. The two numbers are
>>not identical.
...

> This may be the all time "you missed the point".
> Who cares about a couple of %? The point is 128x the silicon for very
> little performance increase!!!.

Hey, dufus with the invalid e-mail address:

I didn't disagree with your claim that 128x the silicon isn't worth it
for a minimal performance increase.

I disagreed with your laughable assumption that a 10% reduction in
cache miss rate is equivalent to a 10% improvement in performance.

*plonk*

John Briggs vax...@alpha.vitro.com

Christopher D Dukes

Jan 23, 1997

In article <32e795b4...@news.intercom.net>,

Jeff Krob <jk...@shore.intercom.net> wrote:
>On 23 Jan 1997 14:48:50 GMT, r...@sunspot.tiac.net (Robert Krawitz)
>wrote:
>
>>In article <5c6svs$av4$2...@falcon.ns.net> #ep...@ns.net (Elmer Phud) writes:
>>
>> Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
>> Adding 32 meg of Sram is only going after the remaining 10%. Assuming
>> 32meg of sram as you stated earlier (which I seriuosly doubt for reasons
>> I will state later) that's increasing the cache size by 128x for at best
>> a 10% boost. Does that make sense? I don't think so...
>>
>>Assuming that the 90% hit rate is true (which I think is dubious, at
>>least in a heavily multitasked system running memory intensive stuff),
>>your arithmetic is still wrong.
>>
>>Random memory access to 70 ns DRAM typically takes more like 110 ns;
>
>Where does this extra 40ns come from, pray tell?

Overhead of detecting a cache miss, overhead of picking what
to dump from cache, overhead of pumping the data into the cache.
Plus, in a multiprocessor environment, there is the added overhead
of keeping the other CPUs informed for cache coherency.

And right now the words "page tables" are sticking in my mind.

Klaus Mueller

Jan 23, 1997
to Jeff Krob

In article <32e795b4...@news.intercom.net>

jk...@shore.intercom.net (Jeff Krob) writes:
>On 23 Jan 1997 14:48:50 GMT, r...@sunspot.tiac.net (Robert Krawitz)
>wrote:
>
>>In article <5c6svs$av4$2...@falcon.ns.net> #ep...@ns.net (Elmer Phud) writes:
>>Assuming that the 90% hit rate is true (which I think is dubious, at
>>least in a heavily multitasked system running memory intensive stuff),
>>your arithmetic is still wrong.
>>
>>Random memory access to 70 ns DRAM typically takes more like 110 ns;
>
>Where does this extra 40ns come from, pray tell?

The access time gives the time from requesting data from the RAM up
to the return of the data by the RAM. This is typically 70 or 60
ns. But after the read or write, the RAM needs a little time to
refresh. The vendors do not publicize this time.

A full cycle of a RAM read is the following:

t0:   ~0 ns - request of data
t1:  ~60 ns - return of the data       (minimum access time)
t2: ~100 ns - refresh of the RAM cell  (cycle time)
------------------------------------------------------------
complete cycle: ~100 ns

The refresh time can be hidden from the processor using some type of
RAM interleaving:

Interleaving using 2 RAM cells:


             R1                    R2
t0:   0 ns - request of data       nothing
t1:  60 ns - response of data      request of data
t2: 100 ns - refresh               response of data
----------------------------------------------------
t3:          request of data       refresh
t4:          response of data      request of data
t5:          refresh               response of data
----------------------------------------------------

The processor gets a continuous stream of data.

I know this is a very simple demo, but I hope it shows how access
time and complete cycle time work.
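(A tiny sketch of the same idea in code, stepping two banks through
request/response/recovery; it follows the 60 ns / 100 ns timing used in the
demo above:)

    /* Two-bank interleaving demo: while one bank is in its refresh/recovery
     * phase, the other is already answering the next request, so the
     * processor sees a continuous stream of data.  Timing as above:
     * data returned 60 ns after a request, bank busy for 100 ns total. */
    #include <stdio.h>

    int main(void)
    {
        const int access_ns = 60;     /* request -> data                         */
        const int cycle_ns  = 100;    /* request -> bank ready again             */
        int bank_ready[2] = { 0, 0 }; /* time each bank can accept a new request */
        int bank = 0, t = 0, i;

        for (i = 0; i < 6; i++) {
            if (t < bank_ready[bank])         /* wait if the bank is still busy */
                t = bank_ready[bank];
            printf("t=%3d ns: bank %d request, data at t=%3d ns\n",
                   t, bank, t + access_ns);
            bank_ready[bank] = t + cycle_ns;  /* bank busy for the full cycle   */
            t += access_ns;                   /* next request when data arrives */
            bank = 1 - bank;                  /* alternate banks                */
        }
        return 0;
    }

With two banks alternating, data arrives every 60 ns even though each bank
needs 100 ns per cycle.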

My English is a bit rough, I know. Please do not flame.

Klaus

--

Chris Quirke

Jan 23, 1997

isa...@canucks.ee.ubc.ca (Isaac Jing Herng Leung) wrote:
>Jim Brooks (jbr...@swbell.net) wrote:

>Guess what? Any 2nd year engineer or CompSci student knows this basic
>principle. hence the existence of cache.

>This isn't some overnight revelation or great leap forward in technology.
>(Sorry to burst your bubble there).

>But anybody with a clue about semi-conductor manufacture knows that the
>problem is COST.

>If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
>The same on board the CPU itself is even better.

Consider this: EDO may run 15% faster, but if only 5% of RAM accesses
go direct rather than via cache, that's 15% of 5% = not a lot.

A lot of folks balked at paying the 10% premium for EDO because the
advantage was "not worth it", so how are you going to get them to pay
10 x the price for on-chip RAM, or 4 x the price for L2-cache-speed
RAM? And many would say "well, I'd rather have 64M of slower RAM than
16M of fast RAM, thanks" and all those silly RAM compression
technologies would pop out of the woodwork again.

The *last* thing we need is expensive, proprietary RAM, IMO.


Arnold Lee

Jan 23, 1997

All of your arguments are nice and all, but let's face it: iNTEL is
unbeatable in the IBM PC market. If your company has to catch UP to Intel,
where you have to conform your CPUs to match the Pentiums or else
problems pop up (6x86 vs P5...), then you are too far behind.

It's nice to think that eventually Cyrix and AMD will be able to compete,
but I don't see it anytime soon. Socket 9 will be the wave of the future,
and no one has the production plants or the money that Intel has stored up in
its war chest for the past decade.

Too bad.

Paul Hsieh

Jan 23, 1997

Jim Brooks:
> >Erick Smith:

> > The problem is that other devices also have to access the memory.
> > For example, the DMA on your HD controller is going to need access
> > to the memory.

This clearly shows a lack of understanding of hardware. As Jim points out,
software is not the only avenue for flexibility. DMA may work differently,
but there's no reason it would be impossible:

> Yes, but it is not a problem, really. On an ordinary PC,
> when the DMA requests an address from memory, the memory
> controller responds. But on an X86 Consortium system,
> the BIU of the CPU would respond to the memory request,
> if the SRAM of the CPU would have it.
>
> > Additionally, putting all the memory on the CPU wouldn't improve
> > performance that much anyway. Even with a 256K L2 cache you get a
> > ninty something percent hit rate.

A 640x480x256-color graphics screen is 300K in size. If Intel wants
to fantasize about their piddley processor doing multimedia applications,
they need more than a 256K L2 cache to even *cover* a back buffer.
After CPU clock rate, memory performance is the modern PC's major factor
in performance. While Intel has been able to crank their processor
clock rate, they have done very little to improve memory
bandwidth. Each Triton gets better and better, but they are stuck
behind that 33 MHz PCI brick wall.
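(The arithmetic behind the back-buffer figure, as a sketch; the screen
dimensions are the ones stated above, the rest is just multiplication:)

    /* Back-of-envelope: a 640x480, 256-color back buffer vs. a 256K L2 cache. */
    #include <stdio.h>

    int main(void)
    {
        long width = 640, height = 480, bytes_pp = 1;   /* 256 colors = 1 byte */
        long frame_bytes = width * height * bytes_pp;
        long l2_bytes    = 256L * 1024;

        printf("back buffer: %ld bytes (%.0f KB)\n", frame_bytes, frame_bytes / 1024.0);
        printf("256K L2    : %ld bytes (%.0f KB)\n", l2_bytes, l2_bytes / 1024.0);
        printf("the buffer alone is %.2fx the entire L2 cache\n",
               (double)frame_bytes / l2_bytes);
        return 0;
    }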

Modern applications quite simply won't stay inside that 256K L2 cache
(thanks in large part to Win 95, and multitasking in general).

> Wrong. There would be a significant performance boost,
> which would increase with clock speed, when compared
> to today's CPU/external L2 cache system.
>

> I agree, the average hit rate of a L2 chace is 90%.
> However, the CPU reads from a L2 cache at the speed
> of the CPU's bus, which for a 180Mhz Pentium, happens
> to be 60Mhz! So think of it like this: When there is
> a L1 miss and a L2 hit, the 180Mhz Pentium drops down
> to a third of its speed! This 1/3 speed reduction happens
> frequently because there are lots of misses for a
> 8K L1 cache. Performance goes to hell when there are
> misses in both L1 & L2, in which case the CPU effectively
> drops down to something like 15..20 Mhz!

But this is only the tip of the iceberg. The cache is only useful
for sustaining reads. Today's multimedia-based applications are
*write*-oriented, and while a cache can buy you a lot in deferred flushing
in parallel, that really only works if your output is not
sustained (so that the cache has time to drain enough
to receive the next bunch of packets). But that is contrary to the
very nature of the way 3D games and MPEG work.

It surprises me not in the least that employees at Microsoft do not
understand how important it is going to become to solve the basic
system-architecture bottleneck of the slow interface to memory. Intel
realizes it and is trying to weasel people into believing AGP is the
ultimate solution. There's no question AGP is a good idea, but I am
not convinced that it is the *best* idea.

> So, tommorrow's CPU clone wars won't be decided by who has the
> most efficient microarchitecture, clock rate, best VLIW sheme, etc.
> The deciding factor will be who will be first to get the
> technology to integrate the processor and memory on a single IC.

I have to admit this is a very interesting architecture. But in the
final analysis, it is not convincing as the best final solution.
Typically, in any application, blocks of memory are dedicated to
data structures that have a fixed number of functions applied
to them. So if a processor is living nearby, it will tend to
sacrifice its flexibility, in that it would be doing the same
thing over and over again to its memory. If its memory were partitioned
to require numerous algorithms, the associated processor would become
over-burdened by context/task switching.

My take on this is that the way to go is simple, clean architectures that
circumvent or work around inherent memory bottlenecks, like having
powerful processors associated with your memory-mapped devices, such
as graphics accelerators. In this way, compressed data and high-level
commands originate from the host CPU and get translated into multiple
complex low-level instructions once they are received by the device
processor.

Where you get the biggest benefit of a device processor over a
memory-banked processor is that it can be more flexible: it
talks to other devices rather than just memory, it's not bound by the
trappings of "backward compatibility" (and hence can use faster bus
technology such as RAMBUS or FireWire), and most importantly the
processor can be completely programmable.

Sound a little far-fetched? Too good to be true? Well, it already
exists. Check it out:

http://www.mpact.com

--
Paul Hsieh
qed "at" chromatic "dot" com
http://www.geocities.com/SiliconValley/9498
Graphics Programmer
Chromatic Research

What I say and what my company says are not always the same thing

Paul Hsieh

Jan 23, 1997

Tim Behrendsen wrote:
> Jim Brooks <jbr...@swbell.net> wrote in article
> > If you think about it, it is pretty dumb to put a 300Mhz CPU
> > in a system whose memory runs at 70ns (that speed was originally
> > intended for a 8Mhz IBM AT). I guess neither Intel nor Cyrix
> > have figure that one out yet.
>
> My God! The memory is a bottleneck??? STOP THE PRESSES!
> The answer is so simple! Why didn't anyone ever think of it?
> [...]

> Sheer brilliance! Only -- uh, one small question. How much
> difference is it going to make when most code executes out of
> the L2 cache? How much of a *throughput* difference are we
> really going to see?

Your problem is that you fundamentally don't understand what is wrong
with relying on the "cache" to give you fast memory. The L2 cache is
256K, but the average system these days has 16 or 32 MB of memory.
That is to say, the L2 cache covers only a 64th of your memory. And if
you believe old Win 3.x/DOS-based statistics that say your code hangs
around the L2 cache for 90% of its data accesses, then you are wearing
blinders!

Win 95, Win NT, OS/2 and Linux are all pre-emptively multitasking (to
different degrees). That means that the typical desktop situation
has numerous windows open, so committing most of your memory is a
trivial matter. Then there's the applications themselves. Very few
big applications that people actually use have a code and data
footprint that fits in a fixed set of only 256K. If 90% of
*MY* memory usage is covered by a tiny 256K cache, then I am *really*
under-utilizing my PC.

Caching solves a lot of performance problems, but it is hardly the
final solution for the ever changing PC. A different solution must
be found.

> And then there's that small question of, uh, cost. SRAM is a
> little pricy to put 32M on a chip with the CPU. Then there's that darn
> yield problem of putting all that kaka on a chip.

Yeah, SRAM is expensive. As far as putting crap on a chip, I've got
a URL at the end of this post I think you should check out.

> And still remains the question of having to trash the whole
> chip to put in some more memory. Have you noticed that 16meg
> of memory is forty bucks nowadays?

Hard to argue with this ...

> > Yet another reason for poor Andy Grovy to go paranoid ;-)
>
> Andy "Grovy" Grove is sleeping very well, I would think.

Actually, I think not, though I doubt he's lying awake thinking about
the X86 Consortium. Andy Grove knows very well what a "Multimedia
Processor" is, and I'm sure it's giving him insomnia. If you don't know
what a "Multimedia Processor" is, check out:

http://www.mpact.com/

> -- YAWN -- yet another consortium to laugh at. Remember the
> "RAM consortium" of a few years ago? Or shall we hold up
> the "HDTV consortium". Or how about <giggle> the "EISA
> consortium". Or CDE, or ...

Agreed. You just have to figure out who the aggressive companies are that
take the lead and cause a paradigm shift in the industry.

Elmer Phud

Jan 24, 1997

In article <5c7toi$g...@news-central.tiac.net>, r...@sunspot.tiac.net says...

>
>In article <5c6svs$av4$2...@falcon.ns.net> #ep...@ns.net (Elmer Phud) writes:
>
> Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
> Adding 32 meg of Sram is only going after the remaining 10%. Assuming
> 32meg of sram as you stated earlier (which I seriuosly doubt for reasons
> I will state later) that's increasing the cache size by 128x for at best
> a 10% boost. Does that make sense? I don't think so...
>
>Assuming that the 90% hit rate is true (which I think is dubious, at
>least in a heavily multitasked system running memory intensive stuff),
>your arithmetic is still wrong.
>
>Random memory access to 70 ns DRAM typically takes more like 110 ns;
>access to the PPro cache takes 5 ns. Thus, you can get over 20
>accesses to the PPro cache in the same time you can get one access to
>main memory. Even with a 90% hit rate, 2/3 of the memory access time
>is spent hitting main memory.
>
>This analysis is simplistic, of course; it doesn't take into account
>the fact that both fast page and EDO can do somewhat better on
>consecutive memory accesses (such as bursts to fill a cache line).
>--

I could mention that the P6 doesn't have to wait for a memory request to
complete before issuing another transaction, but that would just
complicate matters. The point is (obviously) that there isn't enough
fab capacity in the world to produce the type of highly integrated
CPU/memory hybrid that started this thread. Go ahead and argue how
many angels can dance on the head of a pin. That wasn't a point I care
to defend.

El


Elmer Phud

Jan 24, 1997

In article <1997Jan2...@alpha.vitro.com>, vax...@alpha.vitro.com says...
>
>In article <5c81ri$ht8$1...@falcon.ns.net>, #ep...@ns.net (Elmer Phud) writes:
>> In article <1997Jan2...@alpha.vitro.com>, vax...@alpha.vitro.com
says...
>...
>>>Improving cache hit rate from 90% to 100% is a 10% increase in cache hit
>>>rate. This may or may not translate to a 10% increase in performance.
>>>
>>>In this case, the important metric is not cache miss percentage. It is
>>>cpu-idle-waiting-for-memory-reference percentage. The two numbers are
>>>not identical.
>...
>> This may be the all time "you missed the point".
>> Who cares about a couple of %? The point is 128x the silicon for very
>> little performance increase!!!.
>
>Hey, dufus with the invalid e-mail address:
>
>I didn't disagree with your claim that 128x the silicon isn't worth it
>for a minimal performance increase.
>
>I disagreed with your laughable assumption that a 10% reduction in
>cache miss rate is equivalent to a 10% improvement in performance.
>

I see no need to defend the cache issue; it's not the point. I accept
your analysis. This issue is economics, not cache hits.

Dufus? I assumed that anyone with half a brain could see the # was there
to stop junk email. I guess I should have made allowances for you too.

El


Robert Collins

Jan 24, 1997

In article <32E816...@chromatic.com>, Paul Hsieh <qed> wrote:
>Jim Brooks:

>
>A 640x480x256 color graphics screen is 302K in size. If Intel wants
>to fantasize about their piddley processor doing Multimedia applications
>they need more than a 256K L2 cache to even *cover* a back buffer.
>After CPU clock rate, memory performance is the modern PCs major factor
>towards performance. While Intel has been able to crank their processor
>clock rate, they have done very little to improve memory
>bandwidth. Each Triton gets better and better, but they are stuck
>behind that 33 MHz PCI brick wall.

I don't believe this is true. Intel recently announced that they have
begun sleeping with Rambus. I'm sure the Rambus deal was many years in
the making. This would seem to imply that Intel's been thinking about
this problem for a long time.

>But this is only the tip of the Iceberg. The cache is only useful
>for sustaining reads. Today's multimedia based applications are
>*write* oriented, and while cache can buy you a lot in deferred flushing
>in parallel, that really only works if the size of your
>output is not sustained (so that the cache has time to drain enough
>to receive the next bunch of packets.) But that is contrary to the
>very nature of the way 3D games and MPEG work.
>
>It surprises me not in the least that employees at Microsoft do not
>understand how important it is going to become to solve the basic
>system architecture bottleneck of the slow interface to memory. Intel
>realizes it and are trying to weasel people into believe AGP is the
>ultimate solution. There's no question AGP is a good idea, but I am
>not convinced that it is the *best* idea.

I think you're only looking at this problem through Chromatic's
eyes. I say this because I know what you're doing over there.
I don't think Intel's trying to weasel people into anything. Intel
constantly pursues many different parallel approaches. The Rambus
deal is just another example. We've already heard rumors that Intel's
working on a 100 MHz system bus. I heard some decent info recently that
they're also working on a 200 MHz system bus (or something like it).
Intel is by no means sitting on their hands while Chromatic develops
their media processor.

Speaking of Chromatic... there have been rumors that you guys are working
on x86 too. Any comment?


--
"Intel Secrets -- What Intel doesn't want you to know"( ) Robert Collins
HomePage: http://www.x86.org Anonymous FTP: ftp://ftp.x86.org
-----------------------------------------------------------------------------
Robert Collins mailto:rcol...@x86.org 408.832.6270

pitest1

Jan 24, 1997

Uh, some would say Cyrix and AMD HAVE caught up to Intel in the
Pentium-class CPU market. And, BTW, Intel's products would be far slower
and costlier today if these companies did not provide a real alternative.
The FPU is the only area in which these companies' CPUs lag behind the
Pentium. At a given clock speed, they are measurably faster than a Pentium
in all other areas.


Arnold Lee <meat...@wam.umd.edu> wrote in article
<01bc097c$14a23260$af170880@obelisk>...

Laggard

Jan 24, 1997

In article <32E642...@swbell.net>, Jim Brooks <jbr...@swbell.net> says...

">
">> Additionally, putting all the memory on the CPU wouldn't improve
">> performance that much anyway. Even with a 256K L2 cache you get a ninty
">> something percent hit rate.
">
">Wrong. There would be a significant performance boost,
">which would increase with clock speed, when compared
">to today's CPU/external L2 cache system.
">
">I agree, the average hit rate of a L2 chace is 90%.
">However, the CPU reads from a L2 cache at the speed
">of the CPU's bus, which for a 180Mhz Pentium, happens
">to be 60Mhz! So think of it like this: When there is
":>a L1 miss and a L2 hit, the 180Mhz Pentium drops down
">to a third of its speed! This 1/3 speed reduction happens
">frequently because there are lots of misses for a
">8K L1 cache. Performance goes to hell when there are
">misses in both L1 & L2, in which case the CPU effectively
">drops down to something like 15..20 Mhz!
">
">So, tommorrow's CPU clone wars won't be decided by who has the
">most efficient microarchitecture, clock rate, best VLIW sheme, etc.
">The deciding factor will be who will be first to get the
">technology to integrate the processor and memory on a single IC.

Damn. I knew Intel's rising stock trend was headed for disaster.
Now's the time to cash out on my Intel stock.

Tim Behrendsen

Jan 24, 1997

Paul Hsieh <q...@chromatic.com> wrote in article <32E826...@chromatic.com>...

> Tim Behrendsen wrote:
> > Jim Brooks <jbr...@swbell.net> wrote in article
> > > If you think about it, it is pretty dumb to put a 300Mhz CPU
> > > in a system whose memory runs at 70ns (that speed was originally
> > > intended for a 8Mhz IBM AT). I guess neither Intel nor Cyrix
> > > have figure that one out yet.
> >
> > My God! The memory is a bottleneck??? STOP THE PRESSES!
> > The answer is so simple! Why didn't anyone ever think of it?
> > [...]
> > Sheer brilliance! Only -- uh, one small question. How much
> > difference is it going to make when most code executes out of
> > the L2 cache? How much of a *throughput* difference are we
> > really going to see?
>
> Your problem is that you fundamentally dont understand what is wrong
> with relying on the "cache" to give you fast memory. The L2 cache is
> 256K, but the average system these days has 16 or 32 MB of memory.
> That is to say, the L2 cache only covers a 64th of your memory. And if
> you believe old Win 3.x/DOS based statistics that say your code hangs
> around the L2 cache for 90% of its data acesses, then you are wearing
> blinders!

So what *is* the number then, for a typical Win/95 machine? I really
doubt that it's below 80%.



> Win 95, Win NT, OS/2 and Linux are all pre-emptive multitasking (to
> different degrees.) That means that the typical desktop situtation
> has numerous windows open, so committing most of your memory is a
> trivial matter. Then there's the applications themselves. Very few
> big applications, that people actually use, have a code and data
> footprint that is accumulated in a fixed set of only 256K. If 90% of
> *MY* memory is cover by a tiny 256K cache then I am *really* under
> utilizing my PC.

That's missing the point. The point of a cache is not to hold the
entire program within it; it's to keep the major loops cached. Most
of a program is necessary in the global scheme, but only a small
fraction is executed at any one time.

In any case, 512K caches are not uncommon, but as near as I can
recall, you start getting very diminishing returns beyond this.

> Caching solves a lot of performance problems, but it is hardly the
> final solution for the ever changing PC. A different solution must
> be found.

Indeed. Caching can only solve CPU bottlenecks. Depending on
the CPU for things better done in custom hardware is the general
problem.

> > > Yet another reason for poor Andy Grovy to go paranoid ;-)
> >
> > Andy "Grovy" Grove is sleeping very well, I would think.
>
> Actually I think not, though I doubt he's lying awake thinking about
> the X86 consortium. Andy Grove knows very well what a "Multimedia
> Processor" is and I'm sure it's giving him insomnia. If you don't know
> what a "Multimedia Processor" is check out:
>
> http://www.mpact.com/

I think this is *exactly* the solution. The real solution is to
have custom chips to perform low level grunt work based on high
level protocols. The new generation of 3D graphics cards are a
perfect example of this.

It will be interesting to see what Intel's role will be in all this.
I see MMX as an evolutionary step, and I think Intel does too. It
would not surprise me to see Intel start producing multimedia chips
that work in concert with their CPU chips.

Of course, I think we're all agreed that trying to put everything
on a *single* chip is a laughable concept.

Robert Krawitz

unread,
Jan 24, 1997, 3:00:00 AM1/24/97
to

In article <5caobr$n...@slip.net> rcol...@slip.net (Robert Collins) writes:

I think you're only looking at this problem through Chromatic's
eyes. I say this, because I know what you're doing over there.
I don't think Intel's trying to weasel people into anything. Intel
constantly pursues many different parallel approaches. The Rambus
deal is just another example. We've already heard rumors that Intel's
working on a 100 MHz system bus. I heard some decent info recently that
they're also working on a 200 MHz system bus (or something like it).
Intel is by no means sitting on their hands, while Chromatic develops
their media processor.

That would certainly explain the move away from the onboard L2 cache,
if a faster system bus means they won't need it to get decent performance.

Alicia Carla Longstreet

unread,
Jan 24, 1997, 3:00:00 AM1/24/97
to

David T. Wang wrote:

> : Who is the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?

> : Let's have a vote. I vote for Timbo.

I vote for the disgusting low life who came up with the subject for this
thread.

Alicia Carla Longstreet
There was a little girl,
Who had a little curl,
Right in the middle of her forehead.
When she was good...
she was very, very good,
but when she was bad...
she was Better!!!!

Jim Brooks

unread,
Jan 24, 1997, 3:00:00 AM1/24/97
to

Elmer Phud wrote:

> [..] The point is (obviously) that there isn't enough
> fab capacity in the world to produce the type of highly integrated
> CPU/memory hybrid that started this thread. Go ahead and argue how
> many angels can dance on the head of a pin. That wasn't a point I care
> to defend. [..]

Really? Want some proof? Well, here it is, an excerpt from
http://www.news.com/News/Item/0,4,7324,00.html

"Looking further into the future of TI's strategy, the
company expects to begin production in 1997 of a
single chip that has 125 million transistors on it,
which is approximately the number of transistors
found in 30 Pentium Pro processors. "

With 125M transistors, an X86 core and 16MB SRAM can be put on a single die.
Finally, Moore's law has been broken. But, hell, laws are made to be broken,
right?

Oh, another thing. TI-Japan is part of the X86 Consortium.
Pleasant dreams, Andy Grovy of Wintel, or should I say, R.I.P.

Paul Mesken

unread,
Jan 24, 1997, 3:00:00 AM1/24/97
to

>> Yes, but it is not a problem, really. On an ordinary PC,
>> when the DMA requests an address from memory, the memory
>> controller responds. But on an X86 Consortium system,
>> the BIU of the CPU would respond to the memory request,
>> if the SRAM of the CPU would have it.
>>
>> > Additionally, putting all the memory on the CPU wouldn't improve
>> > performance that much anyway. Even with a 256K L2 cache you get a
>> > ninty something percent hit rate.
>
>A 640x480x256 color graphics screen is 302K in size. If Intel wants
>to fantasize about their piddley processor doing Multimedia applications
>they need more than a 256K L2 cache to even *cover* a back buffer.
>After CPU clock rate, memory performance is the modern PCs major factor
>towards performance. While Intel has been able to crank their processor
>clock rate, they have done very little to improve memory
>bandwidth. Each Triton gets better and better, but they are stuck
>behind that 33 MHz PCI brick wall.
>

Hrmph, guess 266 MB/s is too slow for your applications (yeah, okay
only at 64 bits and burst mode).

Besides: graphics should be done with the dual ported memory chips
(real vram, wram). Typically the crtc (or the fancy names they use for
it nowadays) usurps most of the bandwidth, you won't get around
this using just SRam (perhaps in combination with some latch register
it could prove somewhat faster).

>Modern application quite simply wont stay inside that 256K L2 cache
>(no thanks in large part due to Win 95, and multitasking in general.)
>

[looking if this is NOT crossposted to c.l.c/c++]
You're quite right, although the nice concept of dll's should make
programs smaller (if several are running at the same time with the
same dll's), we see modern applications taking enormous amounts of
memory (mostly to surprise us with exception #13).

Sigh... If we would still program in Assembly this wouldn't be a
problem; it's all because of those so-called wizards supplied with
c/c++ allowing the "programmer" to make programs by mere clicking
(YES! I'm exaggerating).

Perhaps Microsoft should start programming for the 486, Pentium or Pro
(and what comes next!). Especially the Pentium and Pro need a totally
different approach compared to the 386, but the OS-es Microsoft
produces are still targeting the 386 (perhaps with some extensions for
particular CPUs; MS OS-es already hang together on dll's and such).

The hardware nowadays is sufficient (well okay, we still have chips
running at 4.77 Mhz even in a full blown Compaq Deskpro XL 6200).
Solutions should be sought in the software instead of boosting the
hardware even more. Of course this is economically uninteresting (just
as selling a bug-free OS :-) )

Special consideration should be given to the graphics hardware since
this is taking up most of the system. HAL's are some improvement but
the best results still come from directly accessing the hardware.
Of course this approach is totally unportable (wouldn't it be nice to
have the good old home computer back? A hardware standard).

>My take on this is that simple clean architectures that work to
>circumvent or work around inherent memory bottlenecks like, having
>powerful processors associated with your memory mapped devices, such
>as graphics accelerators. In this way compressed data and high level
>commands originate from the host CPU, and get translated to multiple
>complex low level instructions once it is received by the device
>processor.
>

Yep, and these so-called accelerators are already here but the OS-es
still don't utilize them. Instead OS-es focused on hardware
independence (YUK!) with lots of overhead. It really is the software's
fault you know :->


In search of Californian Girls Generating B Splines (CGGBS)

usu...@euronet.nl Paul Mesken aka Technocrate

Bill Hill

unread,
Jan 24, 1997, 3:00:00 AM1/24/97
to jbr...@swbell.net

Jim Brooks wrote:
>
> If you think about it, it is pretty dumb to put a 300Mhz CPU
> in a system whose memory runs at 70ns (that speed was originally
> intended for a 8Mhz IBM AT). I guess neither Intel nor Cyrix
> have figure that one out yet.
>
> But the X86 Consortium has. Guess what they plan to do?
> Heck, they are going to do what is obvious, they are going
> to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
> on one single die. Fuck the L2 cache, the memory now runs as
> fast as the CPU. Yep, when you want to upgrade your system,
> you upgrade to both a faster processor and more memory.
>
> Go read the news at:
> http://www.news.com/News/Item/0,4,7211,00.html

>
> Yet another reason for poor Andy Grovy to go paranoid ;-)
>
> --
>
> | | Jim Brooks
> | _ | jbr...@swbell.net
> ______________|_(_)_|______________ PGP public key available
> +|+ [ ( o ) ] +|+
> * O[_]---[_]O *

I forgot... Both the s--- and the f--- words (and all of the other low,
vulgar vocabulary) are out of place, unnecessary and offensive in a
public forum. Didn't your mama teach you any manners???
.B.

Arnold Lee

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

> Uh, some would say Cyrix and AMD HAVE caught up to Intel in the
> Pentium-class CPU market. And, BTW, Intel's products would be far slower
> and costlier today if these companies did not provide a real alternative.

> The FPU is the only area in which these companies' CPUs lag behind the
> Pentium. At a given clock speed, they are measurably faster than a
Pentium
> in all other areas.

I don't disagree about the fact that without competition, the CPU market
today would be quite boring, but to say that Cyrix and AMD have caught up
is laughable. Listen, I am not partial to iNTEL. I personally hate
monopolies. However, in MY benchmark tests, an intel P150 outperforms the
P166+ (however, the P150 is on the 75 mhz bus speed and the 166+ is on the
66 mhz bus.). Intel also has the P166 and P200, both of which can be
overclocked to as far as 225 mhz, and both of which provide solid 16 and 32
bit power far above even the 6x86 200+.
Forget that though, and look at how AMD and Cyrix have to follow where Intel
leads. MMX, although it is in my opinion a waste of silicon, was first
brought out by Intel. AMD and Cyrix have to play the catch-up game and
hope that by the time they bring out their CPUs, the MMX market isn't finished
upgrading its CPUs to Intel.
Again, it's real nice to believe that they will catch up rapidly and maybe
begin to hurt intel in time, but since intel produces the motherboard
chipsets, and comes out with breakthroughs like AGP, even if Cyrix or AMD
begins to annoy them, intel can focus more on the chipsets. Intel was the
first to release the high end pentium class chips, and for some time to
come, will always be the first to release the fastest chips.
sigh. iNTEL is really too much of a juggernaut. it's really just too big
to stop.


David T. Wang

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Alicia Carla Longstreet (ca...@ici.net) wrote:
: David T. Wang wrote:

: > : Who is the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?
: > : Let's have a vote. I vote for Timbo.

: I vote for the disgusting low life who came up with the subject for this
: thread.

Pardon me, Madam, but I didn't write that above which you have attributed
to me. Furthermore, I posted to point out that "several posters" seem
to be posting from the same account, and may, in fact, be the same person.
Please keep your attributions straight in the future. Thank you.


: Alicia Carla Longstreet

Alicia Carla Longstreet

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Alicia Carla Longstreet wrote:

David T. Wang didn't write anything offensive: I goofed.

Timothy Edwards

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Alicia Carla Longstreet wrote:
>
>
> > : Who has the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?

> > : Let's have a vote. I vote for Timbo.
>
> I vote for the disgusting low life who came up with the subject for this
> thread.
>

That surprises me. I always believed that female programmers liked all
sorts of dickheads, regardless of size, color, race, HIV polarity, and hygiene.

Sorry, I just could not resist such a cheap shot ;-)

--
Timothy

Matthew Cameron Peckham

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

>With 125M transistors, an X86 core and 16MB SRAM can be put on a single die.
>Finally, Moore's law has been broken. But, hell, laws are made to be broken,
>right?

And they said the PowerPC chip would come out of the stocks and chomp
Intel into little bitty bits, too. Hell, I'm sure Intel could easily
create something just like the above mentioned chip specs--they're
just smarter because they realize a chip of this magnitude would be
absolutely _impossible_ to market to the mass public. IMPOSSIBLE. If
a Pentium Pro with 512K cache costs over $700 more than one with 256k,
even conservative estimates would put such a chip in the tens of
thousands of dollars range.
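
For what it's worth, the arithmetic behind that estimate looks something
like the sketch below. It assumes the quoted $700 premium for an extra
256K of SRAM scales linearly, which is generous; yield actually falls off
much faster as the die grows, so the real number would be worse:

#include <stdio.h>

/* Crude linear extrapolation of the quoted $700-per-256K premium to
 * the 8/16/32MB on-die SRAM sizes proposed at the top of the thread.
 * Linear scaling is an optimistic assumption. */
int main(void)
{
    double premium = 700.0;                 /* $ per extra 256K of SRAM */
    double sizes_kb[3];
    int i;

    sizes_kb[0] = 8.0 * 1024;
    sizes_kb[1] = 16.0 * 1024;
    sizes_kb[2] = 32.0 * 1024;

    for (i = 0; i < 3; i++)
        printf("%6.0fK of SRAM: roughly $%.0f\n",
               sizes_kb[i], sizes_kb[i] / 256.0 * premium);
    return 0;
}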

It's a neat idea, but there's a thing called market timing, and this
chip doesn't have it, period. It's the kind of thing you'll see a
company like the one I work for (Union Pacific) pick up when it comes
out to play around with: (a) because it might be of value to the
highest-end businesses out there and (b) they could afford it.

In the meantime, it makes for some interesting discussion as to what
the mass-market future will be like much further down the road...

-Matt


David T. Wang

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Timothy Edwards (ti...@utdallas.edu) wrote:

C'mon Jimbo. haven't had enough?

: --
: Timothy

David T. Wang

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Jim Brooks (jbr...@swbell.net) wrote:
: Elmer Phud wrote:

: > [..] The point is (obviously) that there isn't enough
: > fab capacity in the world to produce the type of highly integrated
: > CPU/memory hybrid that started this thread. Go ahead and argue how
: > many angels can dance on the head of a pin. That wasn't a point I care
: > to defend. [..]

: Really? Want some proof? Well, here it is, an excerpt from
: http://www.news.com/News/Item/0,4,7324,00.html

: "Looking further into the future of TI's strategy, the
: company expects to begin production in 1997 of a
: single chip that has 125 million transistors on it,
: which is approximately the number of transistors
: found in 30 Pentium Pro processors. "

: With 125M transistors, an X86 core and 16MB SRAM can be put on a single die.


: Finally, Moore's law has been broken. But, hell, laws are made to be broken,
: right?

PPro's L2 used ~31Million transistors for the 512K version. 128M transistors
would get you a PPro with 2 Megs of L2.
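
A quick check of that arithmetic, assuming a standard 6-transistor SRAM
cell (the ~7.4 transistors per bit implied by the 31M-transistor, 512K
PPro L2 also absorbs tag and decode overhead):

#include <stdio.h>

/* Transistor budget check for the figures above.  The 6T cell is the
 * usual SRAM assumption; the per-bit figure derived from the PPro L2
 * includes tag and decode overhead. */
int main(void)
{
    double ppro_bits = 512.0 * 1024 * 8;          /* 512K L2, in bits   */
    double per_bit   = 31e6 / ppro_bits;          /* ~7.4 transistors   */
    double mb16_bits = 16.0 * 1024 * 1024 * 8;    /* proposed 16MB SRAM */

    printf("transistors per bit in the PPro L2: %.1f\n", per_bit);
    printf("16MB at 6 transistors/bit: %.0f million\n", mb16_bits * 6.0 / 1e6);
    printf("16MB at the PPro density:  %.0f million\n", mb16_bits * per_bit / 1e6);
    return 0;
}

Either way the SRAM alone lands far beyond the 125 million transistors
TI is talking about.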

: Oh, another thing. TI-Japan is part of the X86 Consortium.

Mr J R Hall

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

In article <5calil$7vo$2...@falcon.ns.net>,

#ep...@ns.net (Elmer Phud) writes:
>
>Dufus? I assumed that anyone with half a brain could see the # was there
>to stop junk email. I guess I should have made allowances for you too.
>

If I want to reply to a news posting to the original sender only, I
have to use my newsreader to do this. I press 'R', and it copies the
article into an editor, and then when I've finished it sends it to the
original poster.

At no time during this process do I get a chance to alter the email
address that I am sending it to.

I can see your reasons for using the '#', but it can be a little annoying
for the rest of us.

--
Julian R Hall preferred --->> ju...@dcs.warwick.ac.uk
http://www.dcs.warwick.ac.uk/~jules/ cs...@csv.warwick.ac.uk
THE NETWIDE ASSEMBLER PROJECT: http://www.dcs.warwick.ac.uk/~jules/nasm1.html
-- You may say I'm a dreamer, but I'm not the only one. --

Elmer Phud

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

In article <32E95A...@swbell.net>, jbr...@swbell.net says...

>
>Elmer Phud wrote:
>
>> [..] The point is (obviously) that there isn't enough
>> fab capacity in the world to produce the type of highly integrated
>> CPU/memory hybrid that started this thread. Go ahead and argue how
>> many angels can dance on the head of a pin. That wasn't a point I care
>> to defend. [..]
>
>Really? Want some proof? Well, here it is, an excerpt from
>http://www.news.com/News/Item/0,4,7324,00.html
>
>"Looking further into the future of TI's strategy, the
> company expects to begin production in 1997 of a
> single chip that has 125 million transistors on it,
> which is approximately the number of transistors
> found in 30 Pentium Pro processors. "
>
>With 125M transistors, an X86 core and 16MB SRAM can be put on a single die.
>Finally, Moore's law has been broken. But, hell, laws are made to be broken,
>right?
>
>Oh, another thing. TI-Japan is part of the X86 Consortium.
>Pleasant dreams, Andy Grovy of Wintel, or should I say, R.I.P.

I'm getting tired of saying that people have missed the point.
I stated that there isn't enough fab capacity in the world to produce
the number of processors Intel is currently shipping if you are
replacing them with silicon that uses upwards of 100 million transistors.
You think you have disproven that statement because TI claims to be
able to produce devices that use 125 million transistors. That's fine
but you missed the point. Can they produce 75 million die, each with
125 million transistors? And can they do it at a price people are
willing to pay??? That capacity doesn't exist.

El


Tim Behrendsen

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Jim Brooks <jbr...@swbell.net> wrote in article
<32E95A...@swbell.net>...

> Elmer Phud wrote:
>
> > [..] The point is (obviously) that there isn't enough
> > fab capacity in the world to produce the type of highly integrated
> > CPU/memory hybrid that started this thread. Go ahead and argue how
> > many angels can dance on the head of a pin. That wasn't a point I care
> > to defend. [..]
>
> Really? Want some proof? Well, here it is, an excerpt from
> http://www.news.com/News/Item/0,4,7324,00.html
>
> "Looking further into the future of TI's strategy, the
> company expects to begin production in 1997 of a
> single chip that has 125 million transistors on it,
> which is approximately the number of transistors
> found in 30 Pentium Pro processors. "
>
> With 125M transistors, an X86 core and 16MB SRAM can be put on a single
die.
> Finally, Moore's law has been broken. But, hell, laws are made to be
broken,
> right?

Uh, except 16 meg is PUNY!! A standard PC is sold with at least
32 meg nowadays. 16 meg of memory is FORTY DOLLARS!

> Oh, another thing. TI-Japan is part of the X86 Consortium.
> Pleasant dreams, Andy Grovy of Wintel, or should I say, R.I.P.

Oh, another thing. How much is this 125 million transistor thing
gonna COST? A helluva lot more than a PentiumPro with L2 cache and
forty dollar memory.

This is the biggest laugh I've had in some time.

--
=========================================================================|

Paul Hsieh

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Paul Mesken wrote:
> >A 640x480x256 color graphics screen is 302K in size. If Intel wants
> >to fantasize about their piddley processor doing Multimedia
> >applications they need more than a 256K L2 cache to even *cover* a
> >back buffer. After CPU clock rate, memory performance is the modern
> >PCs major factor towards performance. [...]

> >
> Hrmph, guess 266 MB/s is too slow for your applications (yeah, okay
> only at 64 bits and burst mode).

If all I was doing with the back buffer was copying it, then I would
agree with you. But guess what, serious applications actually try to
*put things* in their back buffers, like triangles or sprites, or other
such things. 266MB/s, first of all, does not exist in any x86 I've
seen to date. The best I've seen so far is a P5-200 MMX, which was able to
get 80MB/s using an FPU-based 64-bit copy routine (I suppose that figure
goes up slightly if the MMX instructions are used.)
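
For reference, the peak numbers being thrown around work out roughly as
below; the gap between the burst-mode ceiling and the ~80MB/s measured
copy is the point of contention (these are simple clock-times-width
figures, not measurements):

#include <stdio.h>

/* Peak = clock * bus width.  These are theoretical ceilings; the
 * ~80MB/s copy quoted above is a measured, sustained figure. */
int main(void)
{
    double pci_mhz = 33.3;

    printf("32-bit PCI peak:        %.0f MB/s\n", pci_mhz * 4);
    printf("64-bit burst-mode peak: %.0f MB/s\n", pci_mhz * 8);
    printf("measured FPU copy quoted above: 80 MB/s (%.0f%% of the 64-bit peak)\n",
           80.0 / (pci_mhz * 8) * 100.0);
    return 0;
}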

> Besides: graphics should be done with the dual ported memory chips
> (real vram, wram). Typically the crtc (or the fancy names they use for
> it nowadays) usurps the most of the bandwidth, you won't get around
> this using just SRam (perhaps in combination with some latch register
> it could prove somewhat faster).

You may understand interesting graphics technologies, but you do not
understand the true performance considerations. You need some sort of
an accelerator that can take buffered commands, and can interface its
local RAM at an extremely high bandwidth. Considerations of RAM type
are secondary to the BUS interface used.

> >Modern application quite simply wont stay inside that 256K L2 cache
> >(no thanks in large part due to Win 95, and multitasking in general.)
>
> [looking if this is NOT crossposted to c.l.c/c++]
> You're quite right, although the nice concept of dll's should make
> programs smaller (if several are running at the same time with the
> same dll's) we see modern apllications taking enormous amounts of
> memory (mostly to suprise us with exception #13).

Uh ... DLL's only solve the problem *slightly*. Most of the time you
only need to use part of a commonly shared DLL anyways, so you lose
by virtue of not being able to page in an optimal subset of the DLL
that your applications want to use. But this is entirely besides the
point. The direction of modern OS's is towards pre-emptive multitasking
and larger, more complicated algorithms (like Netscape/IE having plug-
ins out the wazzoo.)

> Sigh... If we would still program in Assembly this wouldn't be a
> problem,

This is not true. If we were able to halve or even quarter the size
of all of our programs, we might be happier for a time, but then
programmers would just throw more complicated and exciting new
algorithms at the OS (since they would have so much more space to
maneuver in.) The brick wall is still there, even if you are only
running at half the speed towards it.

> [...] it's all because of that so called wizards supplied with


> c/c++ allowing the "programmer" to make programs by mere clicking
> (YES! I'm overexagerating).
>

> [...]


>
> Perhaps Microsoft should start programming for the 486, Pentium or Pro
> (and what comes next!). Especially the pentium and pro need a total
> different approach regarding the 386 but still the OS-es MicroSoft
> produces are targetting the 386 (perhaps some extension for particular
> cpu's, MS OS-es already hang together on dll's and such).

Well, MFC, browser plug-ins and the current tendency towards programmer
gluttony are truly a disgusting sight to behold (thanks Microsoft!) but
that is not the real problem.

> The hardware nowadays is sufficient (well okay, we still have chips
> running at 4.77 Mhz even in a full blown Compaq Deskpro XL 6200).
> Solutions should be sought in the software instead of boosting the
> hardware even more. Ofcourse this is economically uninteresting (just
> as selling a bug free OS :-) )

This is what I dispute. While software improvements are obviously a
good idea, hardware improvements would be far more useful. With
hardware solutions solving the fundamental problem, even bad software
would run well, and good software would run amazingly!

> Special consideration should be given to the graphics hardware since
> this is taking the most of the system. HAL's are some improvement but
> the best result is still got when directly accesing the hardware.

> Of course this approach is totally unportable (wouldn't it be nice to


> have the good old homecomputer back? A hardware standard).

This is a software issue. Here something in the OpenGL or Direct X
line ought to deal with "portability".

> >My take on this is that simple clean architectures that work to
> >circumvent or work around inherent memory bottlenecks like, having
> >powerful processors associated with your memory mapped devices, such
> >as graphics accelerators. In this way compressed data and high level
> >commands originate from the host CPU, and get translated to multiple
> >complex low level instructions once it is received by the device
> >processor.
>
> Yep, and these so called accelerators are already here but the OS-es
> still don't utilize them. Instead OS-es focussed on hardware
> independancy (YUK!) with lots of overhead. It really is the software's
> fault you know :->

:o) Well, stability and artificial bottlenecks are due to poorly
designed software. I will agree with you there. But maxed out
performance of properly designed software (something my company is a
little more familiar with) still hits the hardware brick wall.

> In search of Californian Girls Generating B Splines (CGGBS)

Hee Hee Hee! :o) Maybe you should look into Irrational-B-Splines.

[... ducking feminazi bazookas, and running like the wind ...]

Paul Hsieh

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

First let me state that my views do *NOT* represent that of my company.
If you wish to see my company's statements, please go to:

http://www.chromatic.com/

(My company is not nearly so anti-Intel as I am personally, :o) )

Robert Collins wrote:
> Paul Hsieh wrote:
> >Jim Brooks:


> >A 640x480x256 color graphics screen is 302K in size. If Intel wants
> >to fantasize about their piddley processor doing Multimedia
> >applications they need more than a 256K L2 cache to even *cover* a
> >back buffer. After CPU clock rate, memory performance is the modern
> >PCs major factor towards performance. While Intel has been able to
> >crank their processor clock rate, they have done very little to
> >improve memory bandwidth. Each Triton gets better and better, but
> >they are stuck behind that 33 MHz PCI brick wall.
>

> I don't believe this is true.

Can you be more precise as to what exactly you think I said that is
actually untrue?

> [...] Intel recently announced that they have begun sleeping with

> Rambus. I'm sure the Rambus deal was many years in the making.

Yeah, well how does that explain that they are several years behind
"Chromatic Research" (my employer) who has been behind Rambus since
day one? How exactly can Intel have a deal with Rambus for several
years, if Rambus's technology keeps changing and Intel doesn't make
any parts with it? It's sort of like walking onto a moving freight-
train from a standstill position, isn't it?

Of course we all know about Intel's Rambus announcement. And we all
know about AGP and Klamath and all that nonsense. But its not all
going to be here until some time in 1998. Has Intel even announced
an actual design using Rambus? I thought not.

The fact is, anyone writing any serious performance benchmarks will
quickly come to the conclusion that the biggest bottleneck in the
entire PC is memory bandwidth. Intel's very own, idiotic "Indeo"
video standard will run the same on my P5-166 as it will on 16
Klamaths running at 500 Mhz each. Why? Because the whole thing is PCI
memory bandwidth limited. All multimedia, and modern day graphical
applications are.

(On the other hand, a more sound standard like MPEG can be accelerated
to HDTV quality and above with a simple technology like MPACT, which
is designed to minimize the painfully slow PCI bandwidth while
leveraging RAMBUS's significantly higher bandwidths.)

> This would seem to imply that Intel's been thinking about this problem
> for a long time.

If they think about it any longer, Rambus will lose patience with
them and leave them behind. Rambus is *not* new technology. Nintendo
has been using it for years. (Rambus is about 4 times faster than PCI
at their peak transfer rates, and has fewer restrictions on byte
select access, as well as other architectural advantages.)

> >But this is only the tip of the Iceberg. The cache is only useful
> >for sustaining reads. Today's multimedia based applications are
> >*write* oriented, and while cache can buy you a lot in deferred
> >flushing in parallel, that really only works if the size of your
> >output is not sustained (so that the cache has time to drain enough
> >to receive the next bunch of packets.) But that is contrary to the
> >very nature of the way 3D games and MPEG work.
> >
> >It surprises me not in the least that employees at Microsoft do not
> >understand how important it is going to become to solve the basic
> >system architecture bottleneck of the slow interface to memory.
> >Intel realizes it and are trying to weasel people into believe AGP is
> >the ultimate solution. There's no question AGP is a good idea, but I
> >am not convinced that it is the *best* idea.
>

> I think you're only loooking at this problem through Chromatic's
> eyes. I say this, because I know what you're doing over there.

:o) I'd be interested in your estimation of what we are doing over
here. You are right in that we are focused primarily in Multimedia,
but that is because that is precisely where the Intel CPU is weakest.
We are going to hit Intel where it hurts the worst and I have a high
confidence that our company will be successful doing it.

The focus on high memory bandwidth requirements is not a narrow one.
The evolution of software in general tends towards programs that simply
get bigger and bigger, handling larger and more complicated amounts of
data. The most obvious and concrete case of this is Windows itself.
It preemptively multitasks a potentially (and typically!) large number
of your favorite applications, switching back and forth between them
at a very high rate. And you think you can hold it all (even just the
inner loops) in a 256K L2 cache? No matter what, you have to use real
memory bandwidth for modern applications; the cache won't save you, and
so long as Intel keeps improvements in memory bandwidth technology
crawling along at a snail's pace, applications will keep seeing the
memory bandwidth limitations as a brick wall.

If Intel had gone with higher performance memory bandwidth BUS designs
much earlier, our company's MPACT product would be far less viable
(since the difference between software and hardware implementations would be
far less dramatic.) Then you could actually use the Pentium's cycles to
do useful, valuable work. But MPACT will be a viable product
for a *very* long time precisely because of Intel's sloth-like pace in
addressing the problems with their memory architecture.

> I don't think Intel's trying to weasel people into anything.

Then you have not been paying too close attention. Intel has been
quietly improving Triton, which has had as much to do with their ever
increasing performance levels as their CPU clock rates, but they have
not been making announcements about that. Yet they shell out $25 to
try to convince you that MMX is actually good for something (its
primarily good for its expanded cache, better pairability of
instructions and incorporation of a better interface to their
Triton chipset.)

> [...] Intel constantly pursues many different parallel approaches.

> The Rambus deal is just another example. We've already heard rumors
> that Intel's working on a 100 MHz system bus.

That would make it only 33% slower than MPACT's current implementation
of RAMBUS (we have a complete product, BTW, our design was done two
years ago.) Hmm ... and by the time they ship their supposed 100 Mhz
BUS we will have moved onto much higher bandwidths. Their rumors and
hype may be able to compete with us, but their technology certainly
cannot.

> I heard some decent info recently that they're also working on a 200
> MHz system bus (or something like it).

Well here's some decent info for you, we are working on dual spine
RAMBUS (essentially doubling our throughput) for a future product (so
if my math is right, Intel's 200 Mhz bus will still be 33% slower, and
likely ship much later ...) Our decent info was publicly disseminated
at the last Microprocessor forum (I can't remember the precise name of
the conference, but it's the *big* one.)

> Intel is by no means sitting on their hands, while Chromatic develops
> their media processor.

No, but I doubt that they can run as fast as we can. Our first
generation will blow them away, they wont believe their eyes when they
see our second generation stuff.

> Speaking of Chromatic...there's been rumors that you guy's are working
> on x86 too.

You mean to ask: are we psychos setting ourselves up for failure like AMD,
Cyrix and Motorola? Now I see what the quality of your 'rumors' is.
No, we intend to be a successful company that sells into the lucrative
business of "technology that other companies don't have the balls or
brawn to solve properly." Multimedia is simply the most obvious hole
in the entire x86-PCI bus technology; if the x86 itself were a problem,
I would think that Cyrix and AMD would be able to be a little more
successful than they are.

Paul Hsieh

unread,
Jan 25, 1997, 3:00:00 AM1/25/97
to

Again, I must make it very clear that my views do not necessarily
coincide with those of my company. See:

http://www.chromatic.com

for my employer's opinion.

t...@a-sis.com says...


> Paul Hsieh wrote:
> > Your problem is that you fundamentally dont understand what is wrong

> > with relying on the "cache" to give you fast memory. [...] if


> > you believe old Win 3.x/DOS based statistics that say your code hangs
> > around the L2 cache for 90% of its data acesses, then you are wearing
> > blinders!
>
> So what *is* the number then, for a typical Win/95 machine? I really
> doubt that it's below 80%.

80%?? You're out of your nut. I have a 32MB system, I have 5
applications loaded (thats pretty low for me) and Windows is using 50.9MB
out of 70.1MB of virtual memory (I don't have a lot of free disk space.)
Two of the Windows are Netscape (a bizarrely popular application), one is
an empty 50 line DOS Box, one is my newsreader and one is the reply
window I am writing this message in. (There are obviously other tasks
running in the system including a memory monitor program which does not
have a task bar entry.)

Now, if even I *were* hitting a rate of 80% hits in the L2 cache that
would imply that 0.4% of all the code currently resident in memory is
actually active at once. And if the machine is idle (that is if I am
not doing anything and hence dont care how fast my machine is going)
then I could believe such a statistic. But I am obviously more
interested in cases where my PC is very active. This would mean more
Windows, more dynamic applications (such as Active X, a video game, a
compile, or some other such nonsense) using even more memory and more
percentage of it for inner loops.

In real world situations, over the course of a single second of
relevant operation, I would be surprised if the L2 cache retains 20%
of your inner loop code.
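
The size ratio behind that 0.4% figure, using the memory numbers quoted
above (whether 80% of accesses really land in that sliver is exactly what
is in dispute; this only shows how small the sliver is):

#include <stdio.h>

/* How much of the quoted committed/virtual memory a 256K L2 covers. */
int main(void)
{
    double l2_kb        = 256.0;
    double committed_kb = 50.9 * 1024;   /* "Windows is using 50.9MB"         */
    double virtual_kb   = 70.1 * 1024;   /* "out of 70.1MB of virtual memory" */

    printf("L2 as a fraction of committed memory: %.2f%%\n",
           l2_kb / committed_kb * 100.0);
    printf("L2 as a fraction of virtual memory:   %.2f%%\n",
           l2_kb / virtual_kb * 100.0);
    return 0;
}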

> > Win 95, Win NT, OS/2 and Linux are all pre-emptive multitasking (to
> > different degrees.) That means that the typical desktop situtation
> > has numerous windows open, so committing most of your memory is a
> > trivial matter. Then there's the applications themselves. Very few
> > big applications, that people actually use, have a code and data
> > footprint that is accumulated in a fixed set of only 256K. If 90% of
> > *MY* memory is cover by a tiny 256K cache then I am *really* under
> > utilizing my PC.
>
> That's missing the point. The point of a cache is not to hold the
> entire program within it, it's to keep cached the major loops. Most
> of a program is necessary in the global scheme, but only a small
> fraction is executed at any one time.

Like what kind of a small fraction? Like less than 0.4%? I am not
missing the point. The cache is only good for small jobs and for
sustained simple inner loops of singular tasks (like stupid benchmarks.)
In the real world of complicated and multiple concurrent tasks, the
memory utilization is far more intensive. Remember that before disk
swapping, comes L2 cache swapping. If you disk swap on a regular basis,
then you know that you are L2 cache swapping (thrashing) on an even more
regular basis (this is not as obvious as disk swapping, except as a
total system performance drain.)

> In any case, 512K caches are not uncommon, but as near as I can
> recall, you start getting very diminishing returns beyond this.

Yes, the diminishing returns come from the fact that it's impossible
to predict where the next memory packets are coming from, as
opposed to the cache being wildly over-successful at what it's trying to do.

> > Caching solves a lot of performance problems, but it is hardly the
> > final solution for the ever changing PC. A different solution must
> > be found.
>
> Indeed. Caching can only solve CPU bottlenecks. Depending on
> the CPU for things better done in custom hardware is the general
> problem.

Well, not all data moving is appropriate for custom hardware. But
certainly a lot of it is.

> > > Andy "Grovy" Grove is sleeping very well, I would think.
> >
> > Actually I think not, thought I doubt he's lying awake thinking about
> > the X86 consortium. Andy Grove knows very well what a "Multimedia
> > Processor" is and I'm sure its giving him insomnia. If you don't know
> > what a "Multimedia Processor" is check out:
> >
> > http://www.mpact.com/
>
> I think this is *exactly* the solution.

Glad you think so.

> [...] The real solution is to


> have custom chips to perform low level grunt work based on high
> level protocols. The new generation of 3D graphics cards are a
> perfect example of this.

Uh ... well I don't exactly support your reasoning, but your
conclusion is correct. The x86/PCI architecture just isn't well
suited for many tasks, such as multimedia, and it doesn't look like
Intel will be dealing with their shortcomings any time soon. So
that leaves open the fruitful ASIC/DSP/Coprocessor market for
assisting the poor overburdened x86.

> It will be interesting to see what Intel's role will be in all this.

They make the main CPU, and they make a pretty good one. I think
their role is clear. They are trying to take back the multimedia
market, but its way too late, and they dont have any satisfactory
answers. They are the host CPU, and the Memory Chipset (they also
make most of the PC mother boards, but that's a separate issue.)

> I see MMX as an evolutionary step, and I think Intel does too.

Then their marketing is working all too well. MMX is just a
marketing label so that they can avoid calling their new
processors "P7" or the "Heptium". Think about it, the new MMX
processors are basically just a reworked instruction set, better
pipelining, better interface to the Triton chipset and larger
cache. And the results are that the total system performance
goes up (between 10 and 40 percent.) That's exactly what the
Pentium gave you over the 486. The MMX processors are just
another stepping of the x86 line, with fewer architectural
changes.

MMX is just the next in the line of host processors that Intel
has always been delivering. It doesn't specifically solve any
particular problem (certainly not multimedia) besides being
faster than their previous generation as has always been the
case (though P6 performance improvements are debatable.)

> It would not surprise me to see Intel start producing multimedia
> chips that work in concert with their CPU chips.

Well, it would surprise me. They are working very closely with
Lockheed-Martin on their Real 3D chip. So it seems to me they
will be endorsing Real 3D to work with their AGP solution, but
I don't see how they are going to deal with the whole multimedia
problem. It would seem to be that they would be working a lot
harder on other fundamental performance problems, namely the
PCI memory bottleneck.

> Of course, I think we're all agreed that trying to put everything
> on a *single* chip is a laughable concept.

Only in that it makes little realistic sense. Intel doesn't have
a handle on multimedia (despite their hype), and multimedia
companies can't just go spin an x86 core overnight and expect good
technological results or market acceptance. Setting the dividing
line between multimedia and host processor makes perfect sense,
because in each case the problems are being solved in efficient ways by
bodies who are expert at doing precisely that.

Roogly-Boogly

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

On Wed, 22 Jan 1997 16:05:51 -0500, Timothy Edwards
<ti...@utdallas.edu> wrote:

>Maynard Handley wrote:
>< In article <32E562...@utdallas.edu>, Timothy Edwards
>< <ti...@utdallas.edu> wrote:
><
>< > > This isn't some overnight revelation or great leap forward in technology.
>< > > (Sorry to burst your bubble there).
>< > > But anybody with a clue about semi-conductor manufacture knows that the
>< > > problem is COST.
>< >
>< > But it is a technology leap. That X86 consortium evidently has an
>< > economically viable high density CMOS technology, otherwise they would
>< > not announce their product.
><
>< Obviously they must. FUD is something unheard of in the computer industry.
>< How old are you? 12?
>< --
>< My opinion only
>< Maynard
><
>: Subject:
>: Re: !! Intel goes APE-SHIT after hearing about the X86 Consortium !!
>: Date:
>: Wed, 22 Jan 1997 13:23:21 -0800
>: From:
>: hand...@apple.com (Maynard Handley)
>: Organization:
>: Apple Computer
>
>
>Hey, Maynard, what's your problem? Did your panties get in a wad
>after Apple posted a loss of $120M? Or maybe after you heard
>that Steve Jobs is back in charge now, and rumor has it that
>he is going to clean house and get rid of the dead weight at Apple
>(oops, guess you don't have much job security there in Apple),
>in a vain attempt to save the rotten Apple from Wintel.
>
>Concerning my age, I am old enough to know that:
>
>1. It is dumb to buy a PowerMac.
>2. It is dumb to buy stock in Apple.
>3. It is dumb to send my resume to Apple when I get my MSCS.
>

AHHhhh . . . Therein lies the rub. MS_CS_, not an MS_EE_. No wonder
you're so gullible.

j.r. pierce [see .sig]

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

rcol...@slip.net (Robert Collins) wrote:
>I don't believe this is true. Intel recently announced that they have

>begun sleeping with Rambus. I'm sure the Rambus deal was many years in
>the making. This would seem to imply that Intel's been thinking about

>this problem for a long time.

All indications are that Intel's interest in Rambus is for the Merced
P7 generation and beyond. Short term (1997, into 98) will be
migrating to SDRAM

Certainly, a rambus based x86 architecture system would require
extensive L2/L1 caches to achieve decent 'real world' performance as
current Rambus's random access speeds are quite poor compared with its
very high performance burst characteristics. However a VLIW 64bit P7
native mode may be more amenable to Rambus.

Also, the articles I've seen imply a new faster 800Mhz RDRAM with a
16bit bus to achieve a 1.6GB/sec burst rate (current rambus peaks at
633MHz or so, while 500MHz is 'standard').
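
The peak-rate arithmetic behind those figures, assuming a byte-wide
channel for the 500/633MHz parts and the 16-bit, 800MHz channel the
articles describe (peak rates only; the poor random-access latency noted
above is a separate issue):

#include <stdio.h>

/* Peak transfer rate = effective clock * channel width. */
int main(void)
{
    printf("500MHz RDRAM, 1-byte channel:  %.0f MB/s peak\n", 500.0 * 1);
    printf("633MHz RDRAM, 1-byte channel:  %.0f MB/s peak\n", 633.0 * 1);
    printf("800MHz RDRAM, 16-bit channel:  %.1f GB/s peak\n", 800.0 * 2 / 1000.0);
    printf("(33MHz, 32-bit PCI peak for comparison: %.0f MB/s)\n", 33.3 * 4);
    return 0;
}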

-jrp
--------------------------------------------------------------------
This posting has a invalid email address to discourage bulk emailers
if you need to mail me try "pierce at hogranch dot com"
--------------------------------------------------------------------

Paul Mesken

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

>Paul Mesken wrote:
>> >A 640x480x256 color graphics screen is 302K in size. If Intel wants
>> >to fantasize about their piddley processor doing Multimedia
>> >applications they need more than a 256K L2 cache to even *cover* a
>> Besides: graphics should be done with the dual ported memory chips
>> (real vram, wram). Typically the crtc (or the fancy names they use for
>> it nowadays) usurps the most of the bandwidth, you won't get around
>> this using just SRam (perhaps in combination with some latch register
>> it could prove somewhat faster).
>
>You may understand interesting graphics technologies, but you do not
>understand the true performance considerations. You need some sort of
>an accelerator that can take buffered commands, and can interface its
>local RAM at an extremely high bandwidth. Considerations of RAM type
>are secondary to the BUS interface used.
>
IDONTUNDERSTANDWHAT?!?!?!?

Perhaps you only program for 320x200x256 (this is known as
"practically begging to get flamed" :-) ) but if I wouldn't have wram
then I would hardly get inbetween the 220Mhz DAC and the videoram in a
full blown hi res screen (1280x1024x16M at 70 Hz or something).

The type of memory used as the video ram is extremely important.

The "accelerators" (dumb name for it) already exist, they've fifo
buffers for their registers, interface superbly fast with their video
ram via a fency blitter doing Gourad shading, opaqueness and whatever
on the fly AND they would be terribly slowed down if they had to wait
to get acces to the video ram (crtc always has highest priority, some
weird banking system could be used to get around this since operations
are normally carried out off-screen but I've never came across these
kind of solutions).

>> Sigh... If we would still program in Assembly this wouldn't be a
>> problem,
>
>This is not true. If we were able to halve or even quarter the size
>of all of our programs, we might be happier for a time, but then
>programmers would just throw more complicated and exciting new
>algorithms at the OS (since they would have so much more space to
>maneuver in.) The brick wall is still there, even if you are only
>running at half the speed towards it.
>

Darn! Where's your loyalty towards Assembly? :-) With all that darned
honesty we never win at c.l.c.! (now repeat after me: "Assembly is the
answer to every problem" :-> )

>> The hardware nowadays is sufficient (well okay, we still have chips
>> running at 4.77 Mhz even in a full blown Compaq Deskpro XL 6200).
>> Solutions should be sought in the software instead of boosting the
>> hardware even more. Ofcourse this is economically uninteresting (just
>> as selling a bug free OS :-) )
>
>This is what I dispute. While software improvements are obviously a
>good idea, hardware improvements would be far more useful. With
>hardware solutions solving the fundamental problem, even bad software
>would run well, and good software would run amazingly!
>

Yes, we also should look at the hardware (automatic hidden surface
stuff for example :-> ) but my point is that the software nowadays is
years behind current hardware. Perhaps we should tune the software
first (it's cheaper :-) ) before making the gap between hardware and
software even broader.

>> Special consideration should be given to the graphics hardware since
>> this is taking the most of the system. HAL's are some improvement but
>> the best result is still got when directly accesing the hardware.
>> Of course this approach is totally unportable (wouldn't it be nice to
>> have the good old homecomputer back? A hardware standard).
>
>This is a software issue. Here something in the OpenGL or Direct X
>line ought to deal with "portability".
>

Yeah, the thin HAL's. It's a sound concept and the speed penalty isn't
so great.

>> In search of Californian Girls Generating B Splines (CGGBS)
>
>Hee Hee Hee! :o) Maybe you should look into Irrational-B-Splines.
>

Nah, I'm having a long talk with Euler, I don't wanna wait till
kingdom come before I can look at my Californian Girl (some new
algorithm with "incremental" and "biased dynamic" has a nice ring to
it).

>[... ducking feminazi bazookas, and running like the wind ...]

^^^^^^
_uniform_ b splines? :->

Jim Brooks

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

Tenie Remmel wrote:
>
> Besides, if you write something in *well written* assembly it should
> be between 10 and 100 times (not percent) smaller than C, and maybe
> 200 times smaller than Basic.

Those figures are ridiculous, at the least, contrived. If you want the
smallest possible code size, then FORTH is the optimal language of choice.
In 16-bit X86 assembly, to call a procedure costs 3 bytes (1 byte for CALL,
2 bytes for the address), while FORTH can call a procedure using a
2-byte token.

> If assembly were used for everything we could all do just fine with
> a 386/16, 1MB RAM, 80MB HD...

The limited scope of the programming you do evidently has manipulated your
mind towards narrowness and opinionatedness. I used to be that way.

Paul Hsieh

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

tj...@mail.idt.net said:
> Besides, if you write something in *well written* assembly it should
> be between 10 and 100 times (not percent) smaller than C, and maybe
> 200 times smaller than Basic.

Uh ... this is a severe exaggeration. I write pretty tight assembly
code, and I have a hard time getting much better than 2:1 versus my
C compiler. I would be *MOST IMPRESSED* by anyone that could beat my
code for size at between 5:1 and 50:1. You must be talking about 16
bit assembly code ...

> If assembly were used for everything we could all do just fine with
> a 386/16, 1MB RAM, 80MB HD...

There aint no way you could convince me to touch such a piece of
garbage hardware! How am I supposed to play Quake on a 386/16??
1MB of RAM and 80MB of disk space?? I use more than that on most
of my thought experiments!

If anything, the PC hardware of today is somewhat less than satisfying
(for the memory bandwidth reasons I cite elsewhere.)

Paul Hsieh

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

usu...@euronet.nl said:
> >Paul Mesken wrote:
> >> [...] graphics should be done with the dual ported memory chips

> >> (real vram, wram). Typically the crtc (or the fancy names they use for
> >> it nowadays) usurps the most of the bandwidth, you won't get around
> >> this using just SRam (perhaps in combination with some latch register
> >> it could prove somewhat faster).
> >
> >You may understand interesting graphics technologies, but you do not
> >understand the true performance considerations. You need some sort of
> >an accelerator that can take buffered commands, and can interface its
> >local RAM at an extremely high bandwidth. Considerations of RAM type
> >are secondary to the BUS interface used.
> >
> IDONTUNDERSTANDWHAT?!?!?!?
>
> Perhaps you only program for 320x200x256 (this is known as
> "practically begging to get flamed" :-) )

Ah ... perhaps I should introduce myself a little more properly. I am
a low-level Windows graphics display driver programmer. In this vein, I
don't commonly deal with 320x200 or other standard VGA modes. I've been
writing graphics drivers for several years now, and have a very good
feel for what exactly the performance issues are. My previous credits
include being on the team that wrote the original ATI Mach 64 drivers
(while ATI was still on the top of the Benchmark race.)

> [...] but if I wouldn't have wram


> then I would hardly get inbetween the 220Mhz DAC and the videoram in a
> full blown hi res screen (1280x1024x16M at 70 Hz or something).

These RAMs can help with display refresh, but have nothing to do with system
graphics performance.

> The type of memory used as the video ram is extremely important.

:o) You've been reading a little too much into Matrox' hype. The
TSENG part has even faster RAM (I know, I've tested it) but Matrox
still wins the (current) Benchmark wars. Why is that? Could it be
that there are *other* factors contributing to performance *other*
than RAM speed?

> The "accelerators" (dumb name for it) already exist, they've fifo
> buffers for their registers, interface superbly fast with their video
> ram via a fency blitter doing Gourad shading, opaqueness and whatever

So I've heard ... (intentionally facetious.) Yes, I am among those
who are intimately aware of precisely this.

> on the fly AND they would be terribly slowed down if they had to wait
> to get acces to the video ram

I agree. So it is important that you not be waiting for video RAM
when trying to write to it. Again, this has nothing to do with the
speed of the RAM itself. The Triton Chipset and PCI interface are
far more important factors. I know. I've measured it.

> (crtc always has highest priority, some weird banking system could be

> used to get around this [...]

Yes, it's called "buffering" or FIFOing. It's similar to "caching", a
technique you may have a little better grasp of. Whether or not the
RAM is ready to receive the write is of minor importance in comparison
to whether or not the PCI interface is stalled because there is no
FIFO that can receive and temporarily hold onto the write until the
RAM is ready to receive it.

> since operations are normally carried out off-screen but I've never
> came across these kind of solutions).

I've never come across any other kind of solution. If I did, I would be
sure to make a point of laughing at it. (On the other hand I am only
used to dealing with fast architectures ... who knows, maybe the Trident
chipsets work without any sort of buffering ...)

> Yes, we also should look at the hardware (automatical hidden surface
> stuff for example :-> ) but my point is that the software nowadays is
> years behind current hardware.

When I look at the PCI bottleneck day in, day out, and no amount of
rearranging or playing with my ASM code will make it go faster? ...
This may be true of the crap vomited by Microsoft, but it's just so
easy for (good) software to smash up against the fundamental hardware
bottlenecks.

> [...] Perhaps we should tune the software first (it's cheaper :-) )

> before making the gap between hard and software even broader.

The whole exercise of tuning and improving software is what makes the
software industry. At least that's what drives game companies like id.
The difference between bad software and good software is why the market
exists, in its competitive frenzied state. What is sad is that the
winners are not necessarily those that solve the technical problems
the best (eg: Microsoft.)

By comparison "better" software is easy to deal with. Companies just
keep tuning/revving or even rewriting code until it gets better and
better. Hardware is different. Once a big company like Intel sets a
standard, they can't go tweaking and changing it. We've been with the
PCI bus for 3 years now; it wasn't much of a BUS back then, and it's
dreadfully painful now. With AGP Intel promises to obsolete us yet
again in a couple years. We need to go *WAY FASTER* than that.

What do you suppose would happen if Intel (or Cyrix, or AMD) got together
with some RAM manufacturers and built an interface to really high speed
RAMs that were even faster than the system clock? How can this be you
ask? The Intel CPU has string move commands (rep movsd)! These are
perfect for initiating a blitter running asynchronously at a much higher
rate, because its not tied into other non-RAM related issues like x86
instruction issue, as the other algorithmic approaches are. The cost of
the ideal solution may be too high in the beginning, but by architecting
something early, we would have a much better handle on the memory
bottleneck which is causing us poor programmers so much grief.
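
For anyone who hasn't stared at it lately, the construct in question is
just the block copy sketched below. It uses GCC-style inline assembly,
which is one compiler's syntax and only an assumption here; the point is
that a chipset could recognize this pattern and hand the whole transfer
off to a blitter:

#include <stddef.h>

/* Minimal sketch of a rep movsd block copy, the construct suggested
 * above as a natural trigger for an asynchronous memory blitter.
 * GCC-style inline assembly for x86; other compilers differ. */
static void copy_dwords(void *dst, const void *src, size_t ndwords)
{
    __asm__ __volatile__(
        "cld\n\t"
        "rep movsl"            /* AT&T spelling of REP MOVSD */
        : "+D" (dst), "+S" (src), "+c" (ndwords)
        :
        : "memory", "cc");
}

int main(void)
{
    static unsigned int src[64], dst[64];   /* zero-initialized buffers  */
    copy_dwords(dst, src, 64);              /* move 64 dwords = 256 bytes */
    return 0;
}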

H.W. Stockman

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

Paul Hsieh wrote:

> feel for what exactly the performance issues are. My previous credits
> include being on the team that wrote the original ATI Mach 64 drivers
> (while ATI was still on the top of the Benchmark race.)

Impressive... and I'm glad you didn't claim credit for
the Mach 32 drivers!
[...]


> ask? The Intel CPU has string move commands (rep movsd)! These are
> perfect for initiating a blitter running asynchronously at a much higher
> rate, because its not tied into other non-RAM related issues like x86
> instruction issue, as the other algorithmic approaches are. The cost of
> the ideal solution may be too high in the beginning, but by architecting

Would the rep movsd really be that helpful? It seems that in the
current Pentium CPUs, rep movs often ends up slower than the atomic
operations (not so sure about the PPro, though).

Chris

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

In article <32EBA2...@mail.idt.net>, Tenie Remmel <tj...@mail.idt.net>
wrote:

> Besides, if you write something in *well written* assembly it should
> be between 10 and 100 times (not percent) smaller than C, and maybe
> 200 times smaller than Basic.
> If assembly were used for everything we could all do just fine with
> a 386/16, 1MB RAM, 80MB HD...

with the obvious drawback that software would take a longer time to
develop, and would still have the shoddy quality from being rushed to
market. at least this way we get shoddy software in a reasonable time
frame. :-)

and I don't know about you, but I wouldn't be interested in writing either
a GUI or a GUI application in assembly. call me a coward, but...

chris

--
"The price of freedom--of individuality--is attention to politics, careful
planning, careful organization; philosophy is no more a barrier against
political disaster than it is against plague."
--Greg Bear on p67 of /Heads/

Paul Hsieh

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

hws...@swcp.com said:

> Paul Hsieh wrote:
> > feel for what exactly the performance issues are. My previous credits
> > include being on the team that wrote the original ATI Mach 64 drivers
> > (while ATI was still on the top of the Benchmark race.)
>
> Impressive... and I'm glad you didn't claim credit for the Mach 32
> drivers!

:o) I was not aboard ATI when the Mach 32 drivers were developed. The
Mach 64 drivers were rewritten almost entirely from scratch. After the
Mach 64 effort was finished, it was back ported to the Mach 32,
dramatically improving performance and stability. I don't know if ATI
ever released those improved Mach 32 drivers though.

> > ask? The Intel CPU has string move commands (rep movsd)! These are
> > perfect for initiating a blitter running asynchronously at a much higher
> > rate, because its not tied into other non-RAM related issues like x86
> > instruction issue, as the other algorithmic approaches are. The cost of
> > the ideal solution may be too high in the beginning, but by architecting
>

> Would the rep movsd really be that helpful? It seems that in the
> current Pentium CPUs, rep movs often ends up slower than the atomic
> operations (not so sure about the PPro, though).

Yes, REP MOVSD is not all that fast right now, because Intel has been
optimizing the instruction prefetch buffer, MMX instructions, and other
pipelining considerations, while not doing much for the more CISC-like
string move instructions.

But what I would propose is that the RAM interfaces have optional blitter
capabilities and the Pentium/Triton chip set would use them instead of
relying on the bus when the appropriate conditions occur. The only
problem is triggering the blitter. rep movsd is the ideal candidate
instruction construct to trigger a memory blit operation.
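
For reference, the "atomic operations" that rep movs gets compared against
are explicit copy loops along the lines of the illustrative sketch below
(the names are made up, and whether it actually wins depends on block size,
alignment and the chipset). The point is that a plain loop gets the full
benefit of all that pipelining work, while the string instructions don't,
which is exactly the imbalance I'm complaining about.

    /* Illustrative only: a plain, unrolled dword copy of the sort that gets
       measured against "rep movsd".  Nothing clever, just ordinary moves
       that the pairing rules and prefetcher can get their hands on. */
    #include <stddef.h>

    void copy_dwords(unsigned long *dst, const unsigned long *src, size_t n)
    {
        size_t i;

        for (i = 0; i + 4 <= n; i += 4) {   /* unrolled by four */
            dst[i]     = src[i];
            dst[i + 1] = src[i + 1];
            dst[i + 2] = src[i + 2];
            dst[i + 3] = src[i + 3];
        }
        for (; i < n; i++)                  /* leftover dwords */
            dst[i] = src[i];
    }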

dang...@sound.net

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

q...@chromatic.com (Paul Hsieh) wrote:

I've visited the mpact website. Sounds very impressive. How is your
performance compared to dedicated chips such as the 3dfx voodoo?

Also when is the expected release date for add on boards and price?

If I may be so bold, may I recommend at least the option for a faster
RAMDAC. The resolution & refresh rates are a little lower than many
of us are using now.

Thanks for the info, I have been curious when a board that would
accelerate 2d, 3d, and DVD video would be announced (sound and modem
included, hmm, I didn't expect that <g>)


Tim Behrendsen

unread,
Jan 26, 1997, 3:00:00 AM1/26/97
to

Paul Hsieh <q...@chromatic.com> wrote in article
> t...@a-sis.com says...
> > Paul Hsieh wrote:
> > > Your problem is that you fundamentally dont understand what is wrong
> > > with relying on the "cache" to give you fast memory. [...] if
> > > you believe old Win 3.x/DOS based statistics that say your code hangs
> > > around the L2 cache for 90% of its data acesses, then you are wearing
> > > blinders!
> >
> > So what *is* the number then, for a typical Win/95 machine? I really
> > doubt that it's below 80%.
>
> 80%?? You're out of your nut. I have a 32MB system, I have 5
> applications loaded (thats pretty low for me) and Windows is using 50.9MB
> out of 70.1MB of virtual memory [...]

So what? I could have 1000 applications loaded, but I only use one
at a time. (Unless the others are doing background crap, of course).

> Two of the Windows are Netscape (a bizarrely popular application), one is
> an empty 50 line DOS Box, one is my newsreader and one is the reply
> window I am writing this message in. (There are obviously other tasks
> running in the system including a memory monitor program which does not
> have a task bar entry.)

Yes, and as you type in your reply window, I would say it's highly
likely that the mechanisms involved are sitting in your L2 cache.

> Now, if even I *were* hitting a rate of 80% hits in the L2 cache that
> would imply that 0.4% of all the code currently resident in memory is
> actually active at once.

This is not that hard to believe. Bring it down to real terms; what
is the code *doing* that would require all these giant loops?
Programs are big nowadays because of features, not because of giant
loops. If I don't use a feature, then it's not required to be in
the cache.

> And if the machine is idle (that is if I am
> not doing anything and hence dont care how fast my machine is going)
> then I could believe such a statistic. But I am obviously more
> interested in cases where my PC is very active. This would mean more
> Windows,

No, more *active* windows.

> more dynamic applications (such as Active X, a video game, a
> compile, or some other such nonsense) using even more memory and more
> percentage of it for inner loops.

Again, the key is locality. I agree that there are probably
times that the cache doesn't help a lot, but I think you
drastically overestimate the amount of uncached code that is
executed on a statistical basis.

> In real world situtations, over the course of a single second of
> relevant operation, I would be surprised if the L2 cache retains 20%
> of your inner loop code.

Well, it's all guesswork on both our parts unless we can find some
actual measurements to back up these claims.

> > In any case, 512K caches are not uncommon, but as near as I can
> > recall, you start getting very diminishing returns beyond this.
>
> Yes, the diminishing returns comes from the fact that its impossible
> to predict where the next memory packets are coming from next as
> opposed to being wildly over successful at what its trying doing.

No, the diminishing returns come from the fact that more cache
does not significantly increase the hit rate. The two extremes
are no cache, where I get zero hits, and a 32 meg L2 cache for 32 meg
of RAM, where I get 100% hits. In between is a 90%, 80%, etc point.
The question is, where is the 80% point? I submit that it's much
lower than you think.
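
Just to put numbers on why the exact hit rate matters so much, here is a
toy calculation (the 1-cycle hit cost and 30-cycle miss cost are round
numbers assumed for illustration, not measurements from any machine):

    /* Average cost per access = hit * hit_cost + (1 - hit) * miss_cost.
       With these assumed costs, dropping from 90% to 80% hits nearly
       doubles the average cost of a memory access. */
    #include <stdio.h>

    int main(void)
    {
        const double hit_cost = 1.0, miss_cost = 30.0;
        double hit;

        for (hit = 0.70; hit < 0.96; hit += 0.05)
            printf("hit rate %2.0f%% -> %.1f cycles per access\n",
                   hit * 100.0, hit * hit_cost + (1.0 - hit) * miss_cost);
        return 0;
    }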

> > > Actually I think not, thought I doubt he's lying awake thinking about
> > > the X86 consortium. Andy Grove knows very well what a "Multimedia
> > > Processor" is and I'm sure its giving him insomnia. If you don't
know
> > > what a "Multimedia Processor" is check out:
> > >
> > > http://www.mpact.com/
> >
> > I think this is *exactly* the solution.
>
> Glad you think so.

Just make sure that it's 50 bucks, and you will be successful. :-)

> > [...] The real solution is to
> > have custom chips to perform low level grunt work based on high
> > level protocols. The new generation of 3D graphics cards are a
> > perfect example of this.
>
> Uh ... well I don't exactly support your reasoning, but your
> conclusion is correct. The x86/PCI architecture just isn't well
> suited for many tasks, such as multimedia, and it doesnt look like
> Intel will be dealing with their short comings any time soon. So
> that leaves open the fruitful ASIC/DSP/Coprocessor market for
> assisting the poor over burdoned x86.

Well, we can argue about cache hit rates all day long, but even
a 100% hit rate is not going to get around the fact that graphics
are a helluva lot of data passing through the CPU.

Paul Mesken

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

>> Perhaps you only program for 320x200x256 (this is known as
>> "practically begging to get flamed" :-) )
>
>Ah ... perhaps I should introduce myself a little more properly. I am
>low level Windows graphics display driver programmer.

[don't read this if you don't like getting insulted]
Ah, so you're the guy that makes me make my own engines without
using the (your) drivers :->

> [..] In this vein, I

>dont commonly deal with 320x200 or other standard VGA modes. I've been
>writing graphics drivers for several years now, and have a very good
>feel for what exactly the performance issues are. My previous credits
>include being on the team that wrote the original ATI Mach 64 drivers
>(while ATI was still on the top of the Benchmark race.)
>
>> [...] but if I wouldn't have wram
>> then I would hardly get inbetween the 220Mhz DAC and the videoram in a
>> full blown hi res screen (1280x1024x16M at 70 Hz or something).
>
>These rams can help memory refresh, but have nothing to do with system
>graphics performance.
>

WHY am I under the impression I'm talking to a TREE (NO, not a voxel
octree, a fractal one :-> ).

Look here, it's really simple. Let's say my videoram is contained in
simple SRAM chips (some SIMM with 621100's on it). I have one data bus
and one address bus. I (the Cpu) want to write to it but the crtc
(shifter ,WHATEVER, the thing that makes up the screen on your
monitor) is reading it. Now, I have to wait until the crtc is done
(yes, you could work with a cache, that is true).

You don't have this problem with wram which utilizes a serial port (so
there's no need for constantly giving addresses, works a little bit
like PCI's burst stuff) to satisfy the crtc while the Cpu or the so
called accelerator, graphics engine or whatever can access the memory
as well at the same time.

>> The type of memory used as the video ram is extremely important.
>
>:o) You've been reading a little too much into Matrox' hype.

Yeah, that big book with MGI confidential written all over it. It's a
hell of a lot more impressive than that TSENG crap (I think I'm
successful in raging another war :-> Apeshit: Matrox vs Tseng Labs)

>[...] The


>TSENG part has even faster RAM (I know, I've tested it) but Matrox
>still wins the (current) Benchmark wars. Why is that? Could it be
>that there are *other* factors contributing to performance *other*
>than RAM speed?
>

I never stated the access time, transition time or whatever (can
somebody PLEASE give me a pointer to a dictionary of standardized
compulingo??!??!) is THE performance factor. But I pointed out
there're at least two parties involved accessing the videoram, the one
that makes up the screen at your monitor from the contents of the
videoram and the one that changes the contents of the videoram.
(Actually I'm clueless, I only went till ET 4000 but if you tell me
about ET 6000 then I tell you why the MGA 2064 W is better :-> )

>> The "accelerators" (dumb name for it) already exist, they've fifo
>> buffers for their registers, interface superbly fast with their video
>> ram via a fency blitter doing Gourad shading, opaqueness and whatever
>
>So I've heard ... (intentionally facetious.) Yes, I am among those
>who are intimately aware of precisely this.
>

>> on the fly AND they would be terribly slowed down if they had to wait
>> to get acces to the video ram
>
>I agree. So it is important that you not be waiting for video RAM
>when trying to write to it. Again, this has nothing to do with the
>speed of the RAM itself. The Triton Chipset and PCI interface are
>far more important factors. I know. I've measured it.
>

Again, I was not talking about the SPEED of the ram, I merely stated
wram is quite handy since there're at least two accessing it. Besides,
who needs Triton, buy a decent Triflex chipset (hey! another war:
Compaq vs the rest :-> ).

If the PCI interface is too slow for ya then perhaps you should get
rid of all those bitmaps and make more use of the scan conversion
hardware (it's fun watching all those trapezoids :-> ).

>> (crtc always has highest priority, some weird banking system could be
>> used to get around this [...]
>
>Yes, its called "buffering" or FIFOing. Its similar to "caching", a
>technique you may have a little better grasp of. Whether or not the
>RAM is ready to recieve the write is of minor importance in comparison
>to whether or not the PCI interface is stalled because there is no
>FIFO that can receive and temporarily hold onto the write until the
>RAM is ready to receive it.

MY GOD! You must think I'm clueless. By "weird banking system" I meant
an array of simple dram chips with some more intelligent bus arbiter.
Instead of disabling all chips except for the one(s) being accessed,
the accessed one should be inaccessible to other peripherals (by
dumping some wait states, not acknowledging, etc.) while the others
are still accessible (works nicely when there're more cpu's). This is
not uncommon in systems (actually, I'm doing this as a little project
with a bunch of 9938's (YEAH, long time no see :->) , Z80's and an all
empowering 68000).

ALSO, I'm beginning to suspect you're unaware of the PCI bridge's primary
and secondary prefetch/post buffers (nothing to be ashamed of, Paul
(Brr, my own name in third person) since it's transparent (and
probably left out of your machine hearing you nag about its
performance)).

>> Yes, we also should look at the hardware (automatical hidden surface
>> stuff for example :-> ) but my point is that the software nowadays is
>> years behind current hardware.
>
>When, I look at the PCI bottleneck day in, day out, when no amount of
>rearranging or playing with my ASM code will make it go faster? ...
>This may be true of the crap vomitted by Microsoft, but its just so
>easy for (good) software to smash up against the fundamental hardware
>bottlenecks.
>

[sigh] Here's the remedy, make your own graphics card and stuff it in
one of your memory slots or just hardwire it to the cpu.

>> [...] Perhaps we should tune the software first (it's cheaper :-) )
>> before making the gap between hard and software even broader.
>
>The whole exercise of tuning and improving software is what makes the
>software industry. At least that's what drives game companies like id.
>The difference between bad software and good software is why the market
>exists, in its competitive frenzied state. What is sad is that the
>winners are not necessarily those that solve the technical problems
>the best (eg: Microsoft.)
>

Yep, you're right about Microsoft. Best program they've ever made was
edit.com :->

>By comparison "better" software is easy to deal with. Companies just
>keep tuning/revving or even rewriting code until it gets better and
>better. Hardware is different. Once a big company like Intel sets a
>standard, they cant go tweaking, and changing it. We've been with the
>PCI bus for 3 years now, it wasn't much of a BUS back then, and its
>dreadfully painful now. With AGP Intel promises to obsolete us yet
>again in a couple years. We need to go *WAY FASTER* than that.
>

Okay, PCI isn't perfect. But at least it isn't some ISA XT slot or
Vesa LocalBus. The trouble is of course Intel has to deal with the
legacy of their former stuff and other manufacturers. They can't just
make a 180 degree turn.

>What do you suppose would happen if Intel (or Cyrix, or AMD) got together
>with some RAM manufacturers and built an interface to really high speed
>RAMs that were even faster than the system clock? How can this be you
>ask? The Intel CPU has string move commands (rep movsd)! These are
>perfect for initiating a blitter running asynchronously at a much higher
>rate, because its not tied into other non-RAM related issues like x86
>instruction issue, as the other algorithmic approaches are. The cost of
>the ideal solution may be too high in the beginning, but by architecting
>something early, we would have a much better handle on the memory
>bottleneck which is causing us poor programmers so much grief.
>

Grief? Hey, it's a challenge! :-)

>--
>Paul Hsieh
>qed "at" chromatic "dot" com
>http://www.geocities.com/SiliconValley/9498
>Graphics Programmer
>Chromatic Research
>
>What I say and what my company says are not always the same thing

In search of Californian Girls Generating B Splines (CGGBS)

usu...@euronet.nl Paul Mesken aka Technocrate

Jason Burgon

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

In article <32e795b4...@news.intercom.net>, Jeff Krob
<jk...@shore.intercom.net> writes
>On 23 Jan 1997 14:48:50 GMT, r...@sunspot.tiac.net (Robert Krawitz)
>wrote:
>
>>In article <5c6svs$av4$2...@falcon.ns.net> #ep...@ns.net (Elmer Phud) writes:
>>
>> Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
>> Adding 32 meg of Sram is only going after the remaining 10%. Assuming
>> 32meg of sram as you stated earlier (which I seriuosly doubt for reasons
>> I will state later) that's increasing the cache size by 128x for at best
>> a 10% boost. Does that make sense? I don't think so...
>>
>>Assuming that the 90% hit rate is true (which I think is dubious, at
>>least in a heavily multitasked system running memory intensive stuff),
>>your arithmetic is still wrong.
>>
>>Random memory access to 70 ns DRAM typically takes more like 110 ns;
>
>Where does this extra 40ns come from, pray tell?

The access time of a DRAM is counted as the start of the RAS (Row
Address Strobe) signal going low to when the data at the I/O pin(s) can
be read/written. The extra time comes because (amongst other things) RAS
needs a "pre-charge" time before it can go low again. Also the address
pins need to be stable before RAS can be asserted.
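
As a rough worked example (the precharge figure here is assumed for
illustration, not taken from any particular datasheet): if the advertised
70ns is the access time from RAS, and precharge plus address setup eat
another 40ns or so, you land right at the ~110ns random-access figure
quoted above.

    /* Illustrative numbers only: access-from-RAS plus RAS precharge is what
       a truly random access (i.e. a cache miss) actually pays. */
    #include <stdio.h>

    int main(void)
    {
        int t_rac = 70;   /* access time from RAS, ns (the advertised figure) */
        int t_rp  = 40;   /* assumed RAS precharge + setup overhead, ns       */

        printf("random-access cycle time ~ %d ns\n", t_rac + t_rp);
        return 0;
    }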

The above may not be entirely correct today, as technology (such as EDO)
changes, but the quoted access time of a DRAM chip is one of those "paint the
best picture possible" figures that manufacturers use. Another one is
the power rating of a car hi-fi. :-)

Jay
--
--------------------------------------------------------------------------
| Jason Burgon - author of Graphic Vision, TV-Like GUI for 256 Colour SVGA |
| g...@jayman.demon.co.uk |
| ***VERSION 1.40 OF GV AVAILABLE FROM*** -> http://www.jayman.demon.co.uk |
--------------------------------------------------------------------------

Turnpike evaluation. For Turnpike information, mailto:in...@turnpike.com

Videoman

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

Chris wrote:
>
> In article <32EBA2...@mail.idt.net>, Tenie Remmel <tj...@mail.idt.net>
> wrote:
> > Besides, if you write something in *well written* assembly it should
> > be between 10 and 100 times (not percent) smaller than C, and maybe
> > 200 times smaller than Basic.
> > If assembly were used for everything we could all do just fine with
> > a 386/16, 1MB RAM, 80MB HD...
>
> with the obvious drawback that software would take a longer time to
> develop, and would still have the shoddy quality from being rushed to
> market. at least this way we get shoddy software in a reasonable time
> frame. :-)
>
> and I don't know about you, but I wouldn't be interested in writing either
> a GUI or a GUI application in assembly. call me a coward, but...

The Win16 GDI is written in 16-bit ASM... one of the reasons that Win95 was forced
to use it also... too much to re-write and stay compatible.

(.mac and .powerpc groups trimmed)

Grant Schoep

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

> If assembly were used for everything we could all do just fine with
> a 386/16, 1MB RAM, 80MB HD...
>

AMEN!! But um, one problem, I don't think I could ever sit down and
spend the time writing assembly for a RISC system. Just too much extra
time. Though, gosh, that would be clean code.
-grant
--
**********************************************************************
* Grant Schoep Computer Science Major *
* bi...@vax1.bemidji.msus.edu Bemidji State University *
* -Education is the key to progression of life and advancement of *
* humankind. Communication is the the key to education. *

Robert Collins

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

In article <01bc0adf$e57276c0$32ee6fcf@timhome2>,

Tim Behrendsen <t...@a-sis.com> wrote:
>Jim Brooks <jbr...@swbell.net> wrote in article
>>
>> "Looking further into the future of TI's strategy, the
>> company expects to begin production in 1997 of a
>> single chip that has 125 million transistors on it,
>> which is approximately the number of transistors
>> found in 30 Pentium Pro processors. "
>>
>> With 125M transistors, an X86 core and 16MB SRAM can be put on a single
>die.
>> Finally, Moore's law has been broken. But, hell, laws are made to be
>broken,
>> right?
>
>Uh, except 16 meg is PUNY!! A standard PC is sold with at least
>32 meg nowadays. 16 meg of memory is FORTY DOLLARS!
>
>> Oh, another thing. TI-Japan is part of the X86 Consortium.
>> Pleasant dreams, Andy Grovy of Wintel, or should I say, R.I.P.
>
>Oh, another thing. How much is this 125 million transister thing
>gonna COST? A helluva lot more than a PentiumPro with L2 cache and
>forty dollar memory.
>
>This is the biggest laugh I've had in some time.

No, the biggest laugh is reading everybody pontificate about how they
can predict the future, and inherently have knowledge about the
manufacturing costs and die sizes of every chip on the planet, being
produced by every manufacturer.

Please educate me regarding TI's 125-million transistor chip. Let's
see, if a 16 M-bit DRAM has 5 transistors per bit of memory storage,
that's equal to (a minimum of) 80-million transistors. Using the same
scale, a 64 M DRAM has at least 320-million transistors. Now, considering
there's 8 of those chips on a single SIMM socket, and that SIMM socket
costs $40.00 -- *RETAIL* -- would you kindly educate me about the
impossibility of manufacturing such a chip, and the projected cost of
such a manufacturing process? Which chip costs less -- Pentium Pro,
or the memory? I'd certainly like to see Pentium Pro's going for $10
each.

I'd certainly love to hear your knowledge of TI's manufacturing strategy
and costs. I'm sure a Wall Street investment firm would pay you big
bucks to relay your intimate (insider) knowledge regarding these
subjects.


--
"Intel Secrets -- What Intel doesn't want you to know"( ) Robert Collins
HomePage: http://www.x86.org Anonymous FTP: ftp://ftp.x86.org
-----------------------------------------------------------------------------
Robert Collins mailto:rcol...@x86.org 408.832.6270

JeffMo

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

cs...@csv.warwick.ac.uk (Mr J R Hall) wrote:

>In article <5calil$7vo$2...@falcon.ns.net>,
> #ep...@ns.net (Elmer Phud) writes:
>>
>>Dufus? I assumed that anyone with half a brain could see the # was there
>>to stop junk email. I guess I should have made allowences for you too.
>>

>If I want to reply to a news posting to the original sender only, I
>have to use my newsreader to do this. I press 'R', and it copies the
>article into an editor, and then when I've finished it sends it to the
>original poster.

>At no time during this process do I get a chance to alter the email
>address that I am sending it to.

Get a better newsreader.

JeffMo


"A valid argument is not formed solely by ignorance." -JeffMo
"A valid argument is not formed solely by assertion." -JeffMo

Religion : Science :: Methamphetamine : Exercise

For email replies, remove the "dipstick." from my eddress.
It should be self-evident that I am not a dipstick. ;-)


Robert Collins

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

In article <32EA93...@chromatic.com>, Paul Hsieh <qed> wrote:
>First let me state that my views do *NOT* represent that of my company.
>If you wish to see my company's statements, please go to:
>
> http://www.chromatic.com/
>
>(My company is not nearly so anti-Intel as I am personally, :o) )
>
>Robert Collins wrote:
>> Paul Hsieh wrote:
>> >Jim Brooks:
>> >A 640x480x256 color graphics screen is 302K in size. If Intel wants
>> >to fantasize about their piddley processor doing Multimedia
>> >applications they need more than a 256K L2 cache to even *cover* a
>> >back buffer. After CPU clock rate, memory performance is the modern
>> >PCs major factor towards performance. While Intel has been able to
>> >crank their processor clock rate, they have done very little to
>> >improve memory bandwidth. Each Triton gets better and better, but
>> >they are stuck behind that 33 MHz PCI brick wall.
>>
>> I don't believe this is true.
>
>Can you be more precise as to what exactly you think I said that is
>actually untrue?

I already did. All I needed to do was give an example contrary to
your statement "Intel has done very little to improve memory bandwidth."
But here are a few very specific examples:
* The Pentium Pro with onboard cache.
* Intel is developing a 100 MHz system bus.
* EDO and SDRAM support in their chipsets.
* Improved RAM timing in each successive generation of chipset.
* The deal with RAMBUS.

These are all examples of how Intel has tried to improve memory bandwidth.


>If they think about it any longer and Rambus will lose patience with
>them and leave them behind. Rambus is *not* new technology. Nintendo
>has been using it for years. (Rambus is about 4 times faster than PCI
>at their peak transfer rates, and has fewer restrictions on byte
>select access, as well as other architectural advantages.)

The Nintendo-64 is the first Nintendo machine to use RAMBUS technology.
The Nintendo-64 has only been shipping about 3 months...not "years."

>> Speaking of Chromatic...there's been rumors that you guy's are working
>> on x86 too.
>
>You mean to ask: are we psychos setting us up for failure like AMD,
>Cyrix and Motorola? Now, I see what the quality of your 'rumors' are.

Ditto for the quality of knowledge of what your company is doing.
So what exactly is your company doing with Cyrix and SGS-Thomson?

>No, we intend to be a successful company that sells into the lucrative
>business of "technology that other companies dont have to balls or
>brawn to solve properly."

I don't like such overly broad statements. If the statement were true,
you *would* be going into the x86 market.

Robert Collins

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

In article <5cii76$r...@slip.net>, Robert Collins <rcol...@x86.org> wrote:
>
>Please educate me regarding TI's 125-million transistor chip. Let's
>see, if a 16 M-bit DRAM has 5 transistors per bit of memory storage,
>that's equal to (a minimum of) 80-million transistors. Using the same
>scale, a 64 M DRAM has at least 320-million transistors. Now, considering
>there's 8 of those chips on a single SIMM socket, and that SIMM socket
>costs $40.00 -- *RETAIL* -- would you kindly educate me about the
>impossibility of manufacturing such a chip, and the projected cost of
>such a manufacturing process? Which chip costs less -- Pentium Pro,
>or the memory? I'd certainly like to see Pentium Pro's going for $10
>each.

Consider me re-educated on the number of transistors per bit of
storage in DRAM. Apparently it's 1+ for DRAM and 4 or 6 transistors
per bit of SRAM. Sorry for the confusion. Doohhh.

Tim Behrendsen

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

On 27 Jan 1997, Robert Collins wrote:

: In article <01bc0adf$e57276c0$32ee6fcf@timhome2>,


: Tim Behrendsen <t...@a-sis.com> wrote:
: >Jim Brooks <jbr...@swbell.net> wrote in article
: >>
: >> "Looking further into the future of TI's strategy, the
: >> company expects to begin production in 1997 of a
: >> single chip that has 125 million transistors on it,
: >> which is approximately the number of transistors
: >> found in 30 Pentium Pro processors. "
: >>
: >> With 125M transistors, an X86 core and 16MB SRAM can be put on a single
: >die.
: >> Finally, Moore's law has been broken. But, hell, laws are made to be
: >broken,
: >> right?
: >
: >Uh, except 16 meg is PUNY!! A standard PC is sold with at least
: >32 meg nowadays. 16 meg of memory is FORTY DOLLARS!
: >
: >> Oh, another thing. TI-Japan is part of the X86 Consortium.
: >> Pleasant dreams, Andy Grovy of Wintel, or should I say, R.I.P.
: >
: >Oh, another thing. How much is this 125 million transister thing
: >gonna COST? A helluva lot more than a PentiumPro with L2 cache and
: >forty dollar memory.
: >
: >This is the biggest laugh I've had in some time.
:
: No, the biggest laugh is reading everybody pontificate about how they
: can predict the future, and inherently have knowledge about the
: manufacturing costs and die sizes of every chip on the planet, being
: produced by every manufacturer.

Who's predicting the future? We are talking about economics
and technology. When superior technology can be created for an
economically feasible price, you have a new product. Simple as
that.

: Please educate me regarding TI's 125-million transistor chip. Let's

OK.

: see, if a 16 M-bit DRAM has 5 transistors per bit of memory storage,


: that's equal to (a minimum of) 80-million transistors. Using the same
: scale, a 64 M DRAM has at least 320-million transistors. Now, considering
: there's 8 of those chips on a single SIMM socket, and that SIMM socket
: costs $40.00 -- *RETAIL* -- would you kindly educate me about the
: impossibility of manufacturing such a chip, and the projected cost of
: such a manufacturing process? Which chip costs less -- Pentium Pro,
: or the memory? I'd certainly like to see Pentium Pro's going for $10
: each.

Uh, you are talking about SRAM. SRAM uses transistors for each bit,
which is why it's so expensive. DRAM uses capacitors for each bit,
which is why it's so cheap, but requires refresh cycles to keep the
charge, and is very [relatively speaking] slow to read the state of
the bit.

Notice how expensive that PPRO chip is, then look at how expensive
SRAM chips are. It is not coincidence that they are both expensive.

: I'd certainly love to hear your knowledge of TI's manufacturing strategy


: and costs. I'm sure a Wall Street investment firm would pay you big
: bucks to relay your intimate (insider) knowledge regarding these
: subjects.

Wall Street won't pay me bupkis, because it doesn't take a genius
to know this stuff. All you have to know is that 125-million transistor
technology is NEW -- and therefore expensive.

: "Intel Secrets -- What Intel doesn't want you to know"( ) Robert Collins


: HomePage: http://www.x86.org Anonymous FTP: ftp://ftp.x86.org
: -----------------------------------------------------------------------------
: Robert Collins mailto:rcol...@x86.org 408.832.6270

Let me get this straight. You run an anti-Intel propaganda site,
and you don't even know the difference between SRAM and DRAM? You
don't understand the economics of chip technology?

Perhaps you should stay away from these matters, since you have
thoroughly embarrassed yourself with your ignorance.

==========================================================================


| Tim Behrendsen (t...@a-sis.com) | http://www.cerfnet.com/~timb |
| "Judge all, and be prepared to be judged by all." |

==========================================================================

Tim Behrendsen

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

>
> : "Intel Secrets -- What Intel doesn't want you to know"( ) Robert Collins
> : HomePage: http://www.x86.org Anonymous FTP: ftp://ftp.x86.org
> : -----------------------------------------------------------------------------
> : Robert Collins mailto:rcol...@x86.org 408.832.6270
>
> Let me get this straight. You run an anti-Intel propaganda site,

I want to apologize to all, and particularly Robert Collins for this
"anti-Intel propaganda site" comment. While there is some anti-Intel
stuff, there is also quite a bit of pro-Intel stuff, and there is
definitely quite a bit of good hard technical information.

Disagreeing with the content of his post did not require this comment,
and I was wrong for taking only a cursory look before commenting.

Chuck Gray

unread,
Jan 27, 1997, 3:00:00 AM1/27/97
to

In article <32EBA2...@mail.idt.net>, Tenie Remmel <tj...@mail.idt.net>
wrote:

>Paul Mesken wrote:
>>
>> >> Sigh... If we would still program in Assembly this wouldn't be a
>> >> problem,
>> >
>> >This is not true. If we were able to halve or even quarter the size
>> >of all of our programs, we might be happier for a time, but then
>> >programmers would just throw more complicated and exciting new
>> >algorithms at the OS (since they would have so much more space to
>> >maneuver in.) The brick wall is still there, even if you are only
>> >running at half the speed towards it.
>> >
>>

>> Darn! Where's your loyalty towards Assembly? :-) With all that darned
>> honesty we never win at c.l.c.! (now repeat after me: "Assembly is the
>> answer to every problem" :-> )
>

>Besides, if you write something in *well written* assembly it should
>be between 10 and 100 times (not percent) smaller than C, and maybe
>200 times smaller than Basic.
>

>If assembly were used for everything we could all do just fine with
>a 386/16, 1MB RAM, 80MB HD...
>

Yeah, sure, but you wouldn't be able to write code as quickly and packed
with as many features or as portable across platforms. Just identify the
performance-critical or most often executed areas in your high-level code
and optimize that portion (in assembly, if you like). And of course,
assembly will not make up for algorithms that were inefficient in the
high-level code.

--
Chuck Gray
cg...@metronet.com

Elmer Phud

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

In article <5cii76$r...@slip.net>, rcol...@slip.net says...
>Please educate me regarding TI's 125-million transistor chip. Let's
>see, if a 16 M-bit DRAM has 5 transistors per bit of memory storage,
>that's equal to (a minimum of) 80-million transistors. Using the same
>scale, a 64 M DRAM has at least 320-million transistors. Now, considering
>there's 8 of those chips on a single SIMM socket, and that SIMM socket
>costs $40.00 -- *RETAIL* -- would you kindly educate me about the
>impossibility of manufacturing such a chip, and the projected cost of
>such a manufacturing process? Which chip costs less -- Pentium Pro,
>or the memory? I'd certainly like to see Pentium Pro's going for $10
>each.
>

You have it wrong, Robert. A DRAM uses 1 transistor, not 5. It is SRAM
that uses 5 transistors. The previous poster was correct.
El

Mark S. Bilk

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

In article <MPG.d547caa9...@news.zocalo.com>, Paul Hsieh <q...@chromatic.com> wrote:
>t...@a-sis.com says...
>> Paul Hsieh wrote:

>> > Your problem is that you fundamentally dont understand what is wrong
>> > with relying on the "cache" to give you fast memory. [...] if
>> > you believe old Win 3.x/DOS based statistics that say your code hangs
>> > around the L2 cache for 90% of its data acesses, then you are wearing
>> > blinders!
>>
>> So what *is* the number then, for a typical Win/95 machine? I really
>> doubt that it's below 80%.
>
>80%?? You're out of your nut. I have a 32MB system, I have 5
>applications loaded (thats pretty low for me) and Windows is using 50.9MB
>out of 70.1MB of virtual memory (I don't have a lot of free disk space.)

>...


>Now, if even I *were* hitting a rate of 80% hits in the L2 cache that
>would imply that 0.4% of all the code currently resident in memory is
>actually active at once. And if the machine is idle (that is if I am
>not doing anything and hence dont care how fast my machine is going)
>then I could believe such a statistic. But I am obviously more
>interested in cases where my PC is very active. This would mean more
>Windows, more dynamic applications (such as Active X, a video game, a
>compile, or some other such nonsense) using even more memory and more
>percentage of it for inner loops.
>
>In real world situtations, over the course of a single second of
>relevant operation, I would be surprised if the L2 cache retains 20%
>of your inner loop code.

But why does it matter if the inner loops are *retained* in
the cache over a single second?

Given:

o A time slice is 1/18 sec. (one PC timer tick).

o The main memory bandwidth is 80 MB/sec. (64 bits every
100 ns., which is pessimistic because it doesn't take
into account rapid page-mode access).

Then in 1/8 of a time slice (1/144 sec.), over 500 KB of
inner loop(s) can be loaded into the L2 cache, and can
run there at full speed (at least if each individual loop
sequentially fits into the L1 cache) for the remaining 7/8
of the time slice.
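
Spelling that arithmetic out, using only the assumptions given above:

    /* Quick check of the figure above: 64 bits per 100 ns is 80 MB/sec,
       and 1/8 of a 1/18 sec time slice is 1/144 sec. */
    #include <stdio.h>

    int main(void)
    {
        double bandwidth = 8.0 / 100e-9;            /* bytes per second      */
        double bytes     = bandwidth * (1.0 / 144); /* loadable in 1/144 sec */

        printf("%.0f bytes (about %.0f KB)\n", bytes, bytes / 1024.0);
        return 0;
    }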

>> > Win 95, Win NT, OS/2 and Linux are all pre-emptive multitasking (to
>> > different degrees.) That means that the typical desktop situtation
>> > has numerous windows open, so committing most of your memory is a
>> > trivial matter. Then there's the applications themselves. Very few
>> > big applications, that people actually use, have a code and data
>> > footprint that is accumulated in a fixed set of only 256K. If 90% of
>> > *MY* memory is cover by a tiny 256K cache then I am *really* under
>> > utilizing my PC.
>>
>> That's missing the point. The point of a cache is not to hold the
>> entire program within it, it's to keep cached the major loops. Most
>> of a program is necessary in the global scheme, but only a small
>> fraction is executed at any one time.
>
>Like what kind of a small fraction? Like less than 0.4%? I am not
>missing the point. The cache is only good for small jobs and for
>sustained simple inner loops of singular tasks (like stupid benchmarks.)

Since the caches can reload in a tiny fraction of a time
slice, the entire 256 or 512 KB L2 cache is available to
efficiently hold the working set for each task. This allows
for a huge working set, which can also change totally 20
times a second without degrading performance significantly.

The exceptions to this calculation occur when two or more
inner loops are simultaneously executing (during the same
time slice) and are mapping to the same sections of cache.
An example might be a graphics application and the graphics
driver that it is calling.

A larger cache will minimize the probability of this happening.

>> In any case, 512K caches are not uncommon, but as near as I can
>> recall, you start getting very diminishing returns beyond this.
>
>Yes, the diminishing returns comes from the fact that its impossible
>to predict where the next memory packets are coming from next as
>opposed to being wildly over successful at what its trying doing.

I don't grasp your meaning here, but perhaps the probability
of two inner loops competing for the same cache area in the
same time slice is so small with 256KB of L2 cache that an
increase to 512KB doesn't yield much of an improvement--the
CPU is already running at full speed most of the time because
it rarely has to wait for main memory.

--
Disclaimer: This isn't my goddamn planet, understand, monkeyboy?

JC

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

Tim Behrendsen wrote:
>
> Jim Brooks <jbr...@swbell.net> wrote in article
> <32E95A...@swbell.net>...
> > Elmer Phud wrote:
> >
> > > [..] The point is (obviously) that there isn't enough
> > > fab capacity in the world to product the type of highly integerated
> > > CPU/memory highbryd that started this thread. Go ahead and argue how
> > > many angles can dance on the head of I pin. That wasn't a point I care
> > > to defend. [..]
> >
> > Really? Want some proof? Well, here it is, an excerpt from
> > http://www.news.com/News/Item/0,4,7324,00.html

> >
> > "Looking further into the future of TI's strategy, the
> > company expects to begin production in 1997 of a
> > single chip that has 125 million transistors on it,
> > which is approximately the number of transistors
> > found in 30 Pentium Pro processors. "
> >
> > With 125M transistors, an X86 core and 16MB SRAM can be put on a single
> die.
> > Finally, Moore's law has been broken. But, hell, laws are made to be
> broken,
> > right?
>
> Uh, except 16 meg is PUNY!! A standard PC is sold with at least
> 32 meg nowadays. 16 meg of memory is FORTY DOLLARS!
>
He said on a single die. That means 16MB of cache can be put on the
chip, not main memory. Right now the Pentium Pro has 256K on it (besides
the expensive 512K version).

Rob Nicholson

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

> 66 mhz bus.). Intel also has the P166 and P200, both of which can be
> overclocked to as far as 225 mhz, and both of which provide solid 16 and 32

And Digital has the 500MHz Alpha....

Regards, Rob.


Rob Nicholson

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

> all of your arguments are nice and all, but let's face it. iNTEL is
> unbeatable in the IBM PC market. If your company has to catch UP to intel,

Right now, yes you're right. But then again, what percentage of the code
running on an NT machine is Intel specific? Maybe 5%? Therefore, with a
simple (ha!) re-compile on your Alpha or MIPS box, and voila, no reliance
on Intel.

Okay, it's not there yet but in two years time, the choice of Intel CPU or bust
won't be as easy.

Regards, Rob.


Steve Hix

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

In article 79...@utdallas.edu, Timothy Edwards <ti...@utdallas.edu> writes:
:Alicia Carla Longstreet wrote:
:>
:>
:> > : Who has the biggest dickhead on Usenet: Scott Nudds or Timothy Edwards?
:> > : Let's have a vote. I vote for Timbo.
:>
:> I vote for the disgusting low life who came up with the subject for this
:> thread.
:>
:
:That suprises me. I always believed that female programmers liked all
:sorts of dickheads, regardless of size, color, race, HIV polarity, and hygiene.
:
:Sorry, I just could not resist such a cheap shot ;-)
:
:--
:Timothy


Putting out extra effort to win the title, right, Tim?


Christian Bau

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

In article <5c6sq8$av4$1...@falcon.ns.net>, #ep...@ns.net (Elmer Phud) wrote:

> Jim the PPro runs its cache at full CPU frequency. The hit rate is 90%.
> Adding 32 meg of Sram is only going after the remaining 10%. Assuming
> 32meg of sram as you stated earlier (which I seriuosly doubt for reasons
> I will state later) that's increasing the cache size by 128x for at best
> a 10% boost. Does that make sense? I don't think so...

It is true that when you add more cache, you only go after the 10% miss
rate. However, these ten percent take a lot of cycles. For example, if
your processor runs at 200 MHz, your memory is 70nS, and the processor
does two instructions per cycle, then every cache miss takes as long as
executing 28 instructions. If one in four instructions accesses memory,
and one in ten memory accesses is a cache miss, the relation is 40
instructions to 28 instructions waiting for memory.
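
The same figures, spelled out (these are the assumptions stated above,
not measurements):

    /* 200 MHz CPU, 70 ns miss penalty, 2 instructions per cycle, 1 memory
       access per 4 instructions, 1 miss per 10 accesses. */
    #include <stdio.h>

    int main(void)
    {
        double miss_cycles = 70e-9 * 200e6;       /* 14 CPU cycles per miss     */
        double miss_instr  = miss_cycles * 2.0;   /* 28 instruction slots lost  */
        double work_instr  = 4.0 * 10.0;          /* 40 instructions per miss   */

        printf("%.0f instructions of work for every %.0f instructions' worth "
               "of waiting\n", work_instr, miss_instr);
        return 0;
    }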

That said, you are definitely right that 32 MB of cache would not give the
best value for money.

Steve Hix

unread,
Jan 28, 1997, 3:00:00 AM1/28/97
to

:Timothy Edwards wrote:
:"
:" Hey, Maynard, what's your problem? Did get panties get in a wad
:" after Apple posted a lost of $120M?

Not *gewat* news, but for a multi-billion company, not lethal.

:"Or maybe after you heard that Steve Jobs is back in charge now,

In charge of what, Apple? Not in this universe.

:"and rumor has it that
:" he is going to clean house and get rid of the dead weight at Apple
:" (oops, guess you don't have much job security there in Apple),

Better get your news.feed fixed, you're getting crosstalk from one or
more parallel universes, Tim.


Daniel Brooks

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

Isaac Jing Herng Leung (isa...@canucks.ee.ubc.ca) wrote:
: Jim Brooks (jbr...@swbell.net) wrote:
: : If you think about it, it is pretty dumb to put a 300Mhz CPU
: : in a system whose memory runs at 70ns (that speed was originally
: : intended for a 8Mhz IBM AT). I guess neither Intel nor Cyrix
: : have figure that one out yet.

: : But the X86 Consortium has. Guess what they plan to do?
: : Heck, they are going to do what is obvious, they are going
: : to put the CPU core and 8MB, 16MB, or 32MB SRAM all together
: : on one single die. Fuck the L2 cache, the memory now runs as
: : fast as the CPU. Yep, when you want to upgrade your system,
: : you upgrade to both a faster processor and more memory.

: Guess what? Any 2nd year engineer or CompSci student knows this basic
: principle. hence the existence of cache.
: This isn't some overnight revelation or great leap forward in technology.
: (Sorry to burst your bubble there).

: But anybody with a clue about semi-conductor manufacture knows that the
: problem is COST.
: If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
: The same on board the CPU itself is even better.
: But the yields for sucha huge die would be low, low low. This
: means the cost would be astronomically high. (The Pentium is already a
: huge die itself). If you want to pay for this, this is an easy change. I'm
: guessing they could tape out within 3 months and have parts back within 6
: months if there was a market for this.

You could always build a 3D "chip" so that you stack multiple layers on
top of each other. this not only can reduce the distance to parts of the
ram even further, but allows you to build a bigger chip because you have
more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
3D chip, giving you an area of roughly 6250cm^2! That would be like having
a chip that was 50cm by 125cm!


--
E-mail: sfkr...@scfn.thpl.lib.fl.us

David T. Wang

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

Daniel Brooks (sfkr...@scfn.thpl.lib.fl.us) wrote:
: Isaac Jing Herng Leung (isa...@canucks.ee.ubc.ca) wrote:

: : Guess what? Any 2nd year engineer or CompSci student knows this basic


: : principle. hence the existence of cache.
: : This isn't some overnight revelation or great leap forward in technology.
: : (Sorry to burst your bubble there).

: : But anybody with a clue about semi-conductor manufacture knows that the
: : problem is COST.
: : If I had 32MB worth of CACHE (SRAM), that's faster than 32MB of DRAM.
: : The same on board the CPU itself is even better.
: : But the yields for sucha huge die would be low, low low. This
: : means the cost would be astronomically high. (The Pentium is already a
: : huge die itself). If you want to pay for this, this is an easy change. I'm
: : guessing they could tape out within 3 months and have parts back within 6
: : months if there was a market for this.

: You could always build a 3D "chip" so that you stack multiple layers on
: top of each other. this not only can reduce the distance to parts of the
: ram even further, but allows you to build a bigger chip because you have
: more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
: cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
: 3D chip, giving you an area of roughly 6250cm^2! That would be like having
: a chip that was 50cm by 125cm!

1. Metal layer interconnections have large (relative) capacitances.
more layers, larger capacitance, slower.

2. Currently, an MMX pentium at 140 mm^2 @200 MHz, dissipates ~20W
on a single layer of transistors. with current technology, layering
makes almost no sense, as 250 layers X 20 W = 5000 W.

3. SOI may be a promising technology, but I haven't seen many things
mass produced on SOI.

: --
: E-mail: sfkr...@scfn.thpl.lib.fl.us

j.r. pierce [see .sig]

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

sfkr...@scfn.thpl.lib.fl.us (Daniel Brooks) wrote:

>You could always build a 3D "chip" so that you stack multiple layers on
>top of each other. this not only can reduce the distance to parts of the
>ram even further, but allows you to build a bigger chip because you have
>more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
>cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
>3D chip, giving you an area of roughly 6250cm^2! That would be like having
>a chip that was 50cm by 125cm!

And just what would you do with the heat generated by this silicon
sandwich? I caught an abstract on the ISSCC conference website of a
new DEC 600MHz alpha, 16 million transistors, that dissipates 72 WATTS
at 2V... That's, um, 36 AMPS. This estimated 40SpecFP95 chip will take
a MAJOR amount of cooling... if you were to sandwich several times
this into the same surface area, you'd have enough heat to melt the
silicon into a little puddle the first time you switched it on.

Btw, a 3D chip has been a holy grail in semiconductor manufacturing
since about the beginning of time (or at least the 60's)... If you
knew anything about how chips were manufactured currently, and the
material physics behind that, you'd realize that we are a LONG way
from this.

-jrp
--------------------------------------------------------------------
This posting has a invalid email address to discourage bulk emailers
if you need to mail me try "pierce at hogranch dot com"
--------------------------------------------------------------------

j.r. pierce [see .sig]

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

"Tim Behrendsen" <t...@a-sis.com> wrote:
>This is not that hard to believe. Bring to down to real terms; what
>is the code *doing* that would require all these giant loops?
>Programs are big nowadays because of features, not because of giant
>loops. If I don't use a feature, then it's not required to be in
>the cache.

Programs today ARE giant loops. Message Loops. For example,
virtually every MS-windows application looks something like...

 while (GetMessage(...))
 {
     TranslateMessage(...);
     DispatchMessage(...);
 }

and in turn, a HUGE

 WinProc(msg...)
 {
     switch (msg)
     {
     case WM_dsrreq:
         dsrreqfunc(...);
         break;
     case WM_fdasfsaaw:
         ....
     }
 }

I'm sure macintosh applications or x-windows applications aren't
significantly different than this

Typically these case statements handle a few hundred messages; every
menu, dialog option, etc. of a win app is a case. In most apps, the
case statements in turn are calling subroutines to process each
message. This is not the tight, locality-of-reference style of
coding that makes caches work efficiently.


...


>> In real world situtations, over the course of a single second of
>> relevant operation, I would be surprised if the L2 cache retains 20%
>> of your inner loop code.
>
>Well, it's all guesswork on both our parts unless we can find some
>actual measurements to back up these claims.

There's a paper (that I haven't seen) by some guys at DEC who actually
measured cache hit rates on various Alpha server and workstation
configurations. The abstract I've seen implied that they were quite
surprised at just how poor the hit ratio was in this day and age of
bloated C++ object oriented, message passing, structured programming,
applications.

>No, the diminishing returns come from the fact that more cache
>does not significantly increase the hit rate. The two extremes
>are no cache, where I get zero hits, and a 32 meg L2 cache for 32 meg
>of RAM, where I get 100% hits. In between is a 90%, 80%, etc point.
>The question is, where is the 80% point? I submit that it's much
>lower than you think.

Remember, 10% cache misses on a 200MHz processor is a LOT more
expensive than you might think. Again, using the intel example I'm
most familiar with... the internal L1 cache on an intel part can
deliver a 32bit word in one internal 200Mhz clock (5nS). A cache miss
will require (in the case of an EDO DRAM memory system) something like
19 clocks at 66MHz or 288nS to fetch a 64 byte cache line (assuming
the typical 5-2-2-2-2-2-2-2 burst timing of a pentium EDO memory
subsystem). That's about 57 cpu clocks! So if in fact 1 in 10 memory
accesses (that's your 10% number) were missing that L1 cache and going
straight to main memory, that 1 in 10 clocks would be taking 57
clocks! This is an extreme example, I left out the L2 cache since
NO ONE has been able to tell me just how fast the P6's L2 cache really
is... On a p54 or p55 system, the pipeline burst external L2 cache
has 3-1-1-1-1-1-1-1 timings or 10 clocks at 66MHz, 150nS or about 30
internal CPU clocks at 200MHz to fill a 64byte l1 cache line.
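
Tabulating those burst timings with the same assumed figures (66MHz bus,
roughly 15ns per bus clock, 5ns CPU clock at 200MHz):

    /* The burst patterns above, converted to nanoseconds and CPU clocks. */
    #include <stdio.h>

    int main(void)
    {
        int    edo_clocks = 5 + 2 * 7;   /* 5-2-2-2-2-2-2-2 EDO burst          */
        int    l2_clocks  = 3 + 1 * 7;   /* 3-1-1-1-1-1-1-1 pipeline burst L2  */
        double bus_ns     = 15.0;        /* one 66 MHz bus clock, roughly      */

        printf("EDO line fill: %.0f ns (~%.0f CPU clocks)\n",
               edo_clocks * bus_ns, edo_clocks * bus_ns / 5.0);
        printf("L2  line fill: %.0f ns (~%.0f CPU clocks)\n",
               l2_clocks * bus_ns, l2_clocks * bus_ns / 5.0);
        return 0;
    }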

Grant Schoep

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

> You could always build a 3D "chip" so that you stack multiple layers on
> top of each other. this not only can reduce the distance to parts of the
> ram even further, but allows you to build a bigger chip because you have
> more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
> cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
> 3D chip, giving you an area of roughly 6250cm^2! That would be like having
> a chip that was 50cm by 125cm!
Great idea, but (don't you hate these) heat dissipation would be
terrible, that's why chips are so thin now. But I got to thinking about a
3D chip. Someone correct me if I am wrong, but aren't the dies for chips
and processors all layers, like one layer stacked on another? I wonder
if there would be a way to design it so it wasn't just layers but more
3D, like pathways could go up, down, left, right, whatever. Though maybe
they do this, probably. I'm no engineer.

Kristian Thommesen

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

Grant Schoep wrote:
>
> > You could always build a 3D "chip" so that you stack multiple layers on
> > top of each other. this not only can reduce the distance to parts of the
> > ram even further, but allows you to build a bigger chip because you have
> > more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
> > cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
> > 3D chip, giving you an area of roughly 6250cm^2! That would be like having
> > a chip that was 50cm by 125cm!

Except that the _die_ is not nearly as large as 5cm - more like 1cm on
each side.

> Great idea, but(don't you hate these), heat dissapation would be
> terrible, thats why chips are so thin now. But I got to thinking, a 3d
> chip, someone correct me if I am wrong, but aren't the die's for chips
> and processors all layers, like one layer stacked on another. I wonder
> if there would be a away to design it so it wasn't just layers but more
> 3d, like pathways could go, up down, left right, whatever. THough maybe
> they do this, probably. I'm no engineer.

Just designing such a thing would probably be almost impossible - and
_manufacturing_ such a babe would not be possible before we have
nanotech to do it. 3D memory on the other hand might not be so long down
the road. I've heard of experiments with some light-sensitive organic
cells fixed in some gel, that could be used to store information -
using intersecting lasers to select the location in the gel. Really
cool concept - now if we could also use organic material for processors
too.

--
Kristian Thommesen, science student at the University of Aarhus.
www: http://www.daimi.aau.dk/~u952534/
email: mailto:u95...@daimi.aau.dk

Tim Behrendsen

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

j.r. pierce [see .sig] <anti...@here.not> wrote in article
<32f0fe16...@news.scruznet.com>...

> "Tim Behrendsen" <t...@a-sis.com> wrote:
> >This is not that hard to believe. Bring to down to real terms; what
> >is the code *doing* that would require all these giant loops?
> >Programs are big nowadays because of features, not because of giant
> >loops. If I don't use a feature, then it's not required to be in
> >the cache.
>
> Programs today ARE giant loops. Message Loops. For example,
> virtually every MS-windows application looks something like...
>[snip]
> I'm sure macintosh applications or x-windows applications aren't
> significantly different than this
>
> Typically these case statements handle a few hundred messages, every
> menu, dialog option, etc of a win app is a case. In most apps, the
> case statements in turn are calling subroutines to process each
> message. This is not the tight, localality of reference style of
> coding that make caches work efficiently.

I agree that loops are bigger than they used to be, but 256K or
512K big? Those numbers are getting smaller all the time, but
they're still pretty good sized. :-)

> >Well, it's all guesswork on both our parts unless we can find some
> >actual measurements to back up these claims.
>
> There's a paper (that I haven't seen) by some guys at Dec who actually
> measured cache hit rates on various Alpha server and workstation
> configurations.. The abstract I've seen implied that they were quite
> suprised at just how poor the hit ratio was in this day and age of
> bloated C++ object oriented, message passing, structured programming,
> applications.

I'm not sure if the Alpha is directly applicable to the Intel case,
since a 64 bit architecture may naturally have larger memory
requirements.

> >No, the diminishing returns come from the fact that more cache
> >does not significantly increase the hit rate. The two extremes
> >are no cache, where I get zero hits, and a 32 meg L2 cache for 32 meg
> >of RAM, where I get 100% hits. In between is a 90%, 80%, etc point.
> >The question is, where is the 80% point? I submit that it's much
> >lower than you think.
>
> Remember, 10% cache misses on a 200MHz processor is a LOT more
> expensive than you might think. Again, using the intel example I'm
> most familiar with... the internal L1 cache on a intel part can
> deliver a 32bit word in one internal 200Mhz clock (5nS). A cache miss
> will require (in the case of a EDO Dram memory system) something like
> 19 clocks at 66MHz or 288nS to fetch a 64 byte cache line (assuming
> the typical 5-2-2-2-2-2-2-2 burst timing of a pentium EDO memory
> subsystem). Thats about 57 cpu clocks! So if in fact 1 in 10 memory
> accesses (thats your 10% number) were missing that L1 cache and going
> straight to main memory, that 1 in 10 clocks would be taking 57
> clocks! This is a extreme example, I left out the L2 cache since
> NOONE has been able to tell me just how fast the P6's L2 cache really
> is... On a p54 or p55 system, the pipeline burst external L2 cache
> has 3-1-1-1-1-1-1-1 timings or 10 clocks at 66MHz, 150nS or about 30
> internal CPU clocks at 200MHz to fill a 64byte l1 cache line.

It would be interesting to see a study on all this.

--

Scott Nudds

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

(Daniel Brooks) wrote:
: You could always build a 3D "chip" so that you stack multiple layers on
: top of each other. this not only can reduce the distance to parts of the
: ram even further, but allows you to build a bigger chip because you have
: more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
: cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
: 3D chip, giving you an area of roughly 6250cm^2! That would be like having
: a chip that was 50cm by 125cm!

True, but you are going to have a big problem with heating. With
single layer chips consuming say 10 watts, a 250 layer chip of the same
type is going to consume 2500 watts!

   If concentrated in a cube 1 cm on a side, the temperature will rise to
2,655°C.

Point being that cooling requirements are such that only a small
number of layers can be stacked without resorting to extreme designs
that provide forced - non-radiative - cooling.
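
Assuming that figure comes from a simple black-body balance for the 1 cm
cube (my assumption; the post doesn't show its working), it can be
reproduced from the Stefan-Boltzmann law, P = sigma * A * T^4:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double sigma = 5.67e-8;  /* Stefan-Boltzmann constant, W/(m^2 K^4) */
        const double power = 2500.0;   /* 250 layers x 10 W per layer            */
        const double side  = 0.01;     /* cube edge: 1 cm, in metres             */
        const double area  = 6.0 * side * side;   /* radiating surface, m^2      */

        /* P = sigma * A * T^4  =>  T = (P / (sigma * A))^(1/4) */
        double t_k = pow(power / (sigma * area), 0.25);
        printf("Equilibrium temperature: %.0f K (about %.0f C)\n",
               t_k, t_k - 273.15);
        return 0;
    }

This prints roughly 2930 K, i.e. about 2,655 degrees C, matching the figure
above and underlining why purely radiative cooling is hopeless here.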

--
<---->


Scott Nudds

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

(Christian Bau) wrote:
: It is true that when you add more cache, you only go after the 10% miss
: rate. However, these ten percent take a lot of cycles.

Divide the cache in two and distribute it to two CPU's operating in
parallel.


--
<---->


Brian Case

unread,
Jan 29, 1997, 3:00:00 AM1/29/97
to

Robert Collins wrote:

> Please educate me regarding TI's 125-million transistor chip. Let's
> see, if a 16 M-bit DRAM has 5 transistors per bit of memory storage,
> that's equal to (a minimum of) 80-million transistors. Using the same
> scale, a 64 M DRAM has at least 320-million transistors.

Um, a 16-M DRAM, like all DRAM, has one transistor per bit of memory
(and a big capacitor too). Actually, the average is slightly over
one transistor per bit since there are a few thousand transistors
for things like sense amps and I/O receivers and drivers (and decoders,
etc.).

Where did you get five transistors per bit?
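
A guess (mine, not from the thread): the "five transistors per bit" figure
may be a garbled memory of the six-transistor SRAM cell. The distinction
matters for the SRAM-on-the-CPU-die idea discussed earlier in this thread,
since it is exactly why SRAM costs so much more silicon per bit than DRAM:

    #include <stdio.h>

    int main(void)
    {
        const long long bits = 16LL * 1024 * 1024;   /* a 16 Mbit part */

        /* Conventional cell sizes: DRAM = 1 transistor + 1 capacitor per bit,
           SRAM = 6 transistors per bit.  Peripheral logic (sense amps,
           decoders, drivers) adds only a few thousand more.                 */
        printf("16 Mbit as DRAM: ~%lld transistors\n", bits * 1);
        printf("16 Mbit as SRAM: ~%lld transistors\n", bits * 6);
        return 0;
    }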

Chris

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

Rob Nicholson <rob.ni...@zetnet.co.uk> wrote in article
<VA.000003c...@zetnet.co.uk>...

MIPS dropped NT support, PPC dropped NT support, Alpha is the last one.

Mark S. Bilk

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

In article <32f0fe16...@news.scruznet.com>, j.r. pierce [see .sig] <anti...@here.not> wrote:
>"Tim Behrendsen" <t...@a-sis.com> wrote:
>>This is not that hard to believe. Bring it down to real terms; what
>>is the code *doing* that would require all these giant loops?
>>Programs are big nowadays because of features, not because of giant
>>loops. If I don't use a feature, then it's not required to be in
>>the cache.

>Programs today ARE giant loops. Message Loops. For example,
>virtually every MS-windows application looks something like...
>

> while (GetMessage(...))
> {
>     TranslateMessage(...);
>     DispatchMessage(...);
> }
>
>and in turn, a HUGE
>
> WinProc(msg...)
> {
>     switch(msg)
>     {
>     case WM_dsrreq:
>         dsrreqfunc(...);
>         break;
>     case WM_fdasfsaaw:
>         ....
>     }
> }
>

>I'm sure macintosh applications or x-windows applications aren't
>significantly different than this
>
>Typically these case statements handle a few hundred messages, every
>menu, dialog option, etc of a win app is a case. In most apps, the
>case statements in turn are calling subroutines to process each
>message. This is not the tight, locality-of-reference style of
>coding that makes caches work efficiently.

This is a (big) loop, but it is not a high-speed inner loop.
It goes around once every time a message comes in, and these
come from key press/release, mouse movement, etc. The only
frequent messages are from the mouse, and they occur no more
often than (estimating from the fleeting mouse images on
my screen) every 5 msec., or once every 500,000 cycles for
a 100 MHz. processor.

As a sometimes electronics engineer who used to program
minicomputers that required about 16 microseconds to execute
an instruction, I find it helpful to remember that today's
processors have instruction execution rates in the FM (VHF)
band! Two thousand times faster!

Anyhow, my tired old 20 MHz. '386 has no trouble keeping up
with mouse movement messages, and the current pentiums and
hexiums are at least 20 times as fast.

Thus, the message dispatch loop does not have to reside in
the cache. It absorbs a negligible fraction of the processor
power.

The actual *inner* loops are found inside the subroutines
that are called in the message loop's switch cases, e.g.,
"dsrreqfunc(...)" above.

I think there is a simple criterion that tells if a loop
will execute at nearly full speed (i.e., will be adequately
cached). Define a "burst" as a contiguous (in time) series
of loop executions. Then if:

SPI = (CPU) seconds per instruction (about 1.0E-08)
IPL = instructions per loop
LPB = loops (repetitions) per burst
CRP = cache reload penalty = cache speed / RAM speed

We have:

1. If the loop remains in the cache between bursts, it
will execute efficiently.

2. If the loop does not remain in the cache between bursts,
and so has to be reloaded for each burst, then:

execution efficiency averaged over each burst =

= fully cached execution time / actual execution time

= (SPI * IPL * LPB) / ( (SPI * IPL * CRP) + (SPI * IPL * (LPB-1)) )
[1st loop cycle] [remaining cycles]

= LPB / ( CRP + (LPB-1) )

= 1 / ( (CRP / LPB) + (1 - 1 / LPB) )

> 1 / ( (CRP / LPB) + 1 )

In the second case, if L1 cache runs at 4 bytes/10 nsec.
and RAM runs at 8 bytes/100 nsec., then CRP = 5.
So if the loop executes 10 times in each burst, the
efficiency is at least 66%. 30 times per burst, 86%.

In other words, as intuition tells us: if the number of loop
executions is at least several times greater than the ratio
of cache speed to RAM speed, then the initial long duration
of the first loop execution (which runs at RAM speed) is
swamped out by the duration of all the rest of the executions
(which run at full cache speed).

With an average cache to RAM speed ratio of about 5, most
loop execution bursts, which traverse the loop at least
20 times, will run at nearly full speed.
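
The formula is easy to sanity-check numerically. A minimal sketch (the
CRP = 5 figure is the one from the post; note that the post's "at least 66%"
uses the looser bound 1 / ((CRP / LPB) + 1), while the exact expression
gives about 71% for LPB = 10):

    #include <stdio.h>

    /* Efficiency of a burst of LPB loop iterations when the first pass
       through the loop runs at RAM speed, CRP times slower than cache. */
    static double burst_efficiency(double crp, double lpb)
    {
        return lpb / (crp + (lpb - 1.0));
    }

    int main(void)
    {
        const double crp = 5.0;     /* cache-to-RAM speed ratio */
        int lpb;

        for (lpb = 1; lpb <= 1000; lpb *= 10)
            printf("LPB = %4d   efficiency = %.0f%%\n",
                   lpb, 100.0 * burst_efficiency(crp, (double)lpb));
        return 0;
    }

For bursts of 1, 10, 100 and 1000 iterations this prints 20%, 71%, 96% and
100%, which is the same intuition expressed above: a handful of repetitions
is enough to swamp the cost of the first, uncached pass.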

Kristian Thommesen

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

I think a high number of channels with liquid nitrogen circulating
through the core would be sufficient to cool that son of a bitch ;->

Paul Hsieh

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

m...@netcom.com said:
> But why does it matter if the inner loops are *retained* in
> the cache over a single second?
>
> Given:
>
> o A time slice is 1/18 sec. (one PC timer tick).

Windows reprograms the timer so that ticks come much more often than once
every 1/18th of a second. Furthermore, the code and data Windows uses to switch the
tasks themselves eat into the precious 256K linearly, since each
VM clearly requires a different context.

> o The main memory bandwidth is 80 MB/sec. (64 bits every
> 100 ns., which is pessimistic because it doesn't take
> into account rapid page-mode access).

80 MB/sec? That's ridiculously slow; SDRAMs have a much higher
throughput than that.

> Then in 1/8 of a time slice (1/144 sec.), over 500 KB of
> inner loop(s) can be loaded into the L2 cache, and can
> run there at full speed (at least if each individual loop
> sequentially fits into the L1 cache) for the remaining 7/8
> of the time slice.

I don't get your reasoning. Yes, memory can be moved around in well
under a typical time slice, but that doesn't make it fast. Today's
world of multimedia is centered around dealing with huge amounts of
data in ways that closely approximate real time (MPEG playback is
the most typical example of this.)
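
For reference, the arithmetic behind the quoted "over 500 KB" claim does
check out if you take the 80 MB/sec figure at face value (the numbers here
are simply the ones in the quote, not measurements):

    #include <stdio.h>

    int main(void)
    {
        const double bandwidth = 80e6;        /* bytes/sec, from the quote     */
        const double tick      = 1.0 / 18.0;  /* one PC timer tick, in seconds */
        const double window    = tick / 8.0;  /* 1/8 of a time slice = 1/144 s */

        printf("Loadable in 1/8 of a tick: %.0f KB\n",
               bandwidth * window / 1024.0);
        return 0;
    }

This prints roughly 540 KB; the disagreement is over whether that bandwidth,
and that much uncontested bus time, is actually available to one task.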

> >Like what kind of a small fraction? Like less than 0.4%? I am not
> >missing the point. The cache is only good for small jobs and for
> >sustained simple inner loops of singular tasks (like stupid benchmarks.)
>
> Since the caches can reload in a tiny fraction of a time
> slice, the entire 256 or 512 KB L2 cache is available to
> efficiently hold the working set for each task. This allows
> for a huge working set, which can also change totally 20
> times a second without degrading performance significantly.

The task switcher eats into the "working set", and a single task is
not necessarily going to be satisfied with a "single process". Think
about it by taking a step back. If you limit all your algorithms
to code that can stay in the L2 cache nearly all the time, you will
find that you run out of space for flexibility real fast.

Software that deals with multimedia, or with very complex algorithms that
manipulate complicated/large data (like just about any object oriented
program that is flexing its muscles), breaks these barriers easily.
No C compiler anywhere restricts a program's "working set".

> The exceptions to this calculation occur when two or more
> inner loops are simultaneously executing (during the same
> time slice) and are mapping to the same sections of cache.
> An example might be a graphics application and the graphics
> driver that it is calling.

Well, at least you demonstrate *some* imagination. Graphics are but
*one* task for which a typical application would want high transfer
rates of large amounts of data.

> A larger cache will minimize the probability of this happen-
> ing.

This has not been demonstrated in tests I have seen. A larger cache
yields only asymptotically small performance improvements beyond 256K.
Size is not the issue (unless you could make it 32MB or something, but
that would be silly.) Hint: graphics memory that is being accessed by an
external device (such as a graphics accelerator) cannot be directly cached.

> >> In any case, 512K caches are not uncommon, but as near as I can
> >> recall, you start getting very diminishing returns beyond this.
> >
> >Yes, the diminishing returns come from the fact that it's impossible
> >to predict where the next memory packets are coming from, as
> >opposed to being wildly over-successful at what it's trying to do.
>
> I don't grasp your meaning here, but perhaps the probability
> of two inner loops competing for the same cache area in the
> same time slice is so small with 256KB of L2 cache that an
> increase to 512KB doesn't yield much of an improvement--the
> CPU is already running at full speed most of the time because
> it rarely has to wait for main memory.

Right.

--
Paul Hsieh
qed "at" chromatic "dot" com
http://www.geocities.com/SiliconValley/9498
Graphics Programmer
Chromatic Research

What I say and what my company says are not always the same thing

Paul Hsieh

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

t...@a-sis.com said:
> Paul Hsieh <q...@chromatic.com> wrote in article
> > t...@a-sis.com says...
> > > Paul Hsieh wrote:
> > > > Your problem is that you fundamentally don't understand what is wrong
> > > > with relying on the "cache" to give you fast memory. [...] if
> > > > you believe old Win 3.x/DOS based statistics that say your code hangs
> > > > around the L2 cache for 90% of its data accesses, then you are wearing
> > > > blinders!
> > >
> > > So what *is* the number then, for a typical Win/95 machine? I really
> > > doubt that it's below 80%.
> >
> > 80%?? You're out of your nut. I have a 32MB system, I have 5
> > applications loaded (thats pretty low for me) and Windows is using 50.9MB
> > out of 70.1MB of virtual memory [...]
>
> So what? I could have 1000 applications loaded, but I only use one
> at a time. (Unless the others are doing background crap, of course).

Well, to further your argument, you are mostly idle even in the one
application you are using, aren't you? And what is the reason you're
sitting idle so much of the time? It is simple. You are under-
utilizing your PC. Crays and Apple ]['s have remarkably similar
performance characteristics when the machine is idle.

The only time the average person cares about the performance of their
machine is when it is going slow. Something happening in 1/100th of a
second versus something happening in 1/10000th of a second goes completely
unnoticed by the user. The question is really what happens when things
are going slowly. Measuring the performance of your machine when you are
under-utilizing it is obviously pointless.

The modern trend for applications that want to give the illusion that *they*
run fast is to appeal to obvious trade-offs of memory for performance (hash
tables, etc.) and to look-ahead/shadowed operations (spell checking in the
background, disk compressors, read-ahead web page browsers, modem
compression/decompression as well as error-correction protocols, etc.)

This means that your machine will be standing idle less and less, and the
reason is that "out of context" applications will want to steal cycles
and inherently cover a wider range of memory. Intel, Cyrix, AMD, etc. can
deal with (and have dealt pretty well with) the CPU itself, with ever
increasing clock rates and general CPU architecture, but the memory
architecture has been the black sheep of the PC internals. The PCI bus has
stood still at 33MHz for the past three years, the internal memory bus is
not much better at 66MHz, and the Triton chipset crawls along, barely
improving from generation to generation at all.

> > Two of the Windows are Netscape (a bizarrely popular application), one is
> > an empty 50 line DOS Box, one is my newsreader and one is the reply
> > window I am writing this message in. (There are obviously other tasks
> > running in the system including a memory monitor program which does not
> > have a task bar entry.)
>
> Yes, and as you type in your reply window, I would say it's highly
> likely that the mechanisms involved are sitting in your L2 cache.

Well, yes, but you don't see me downgrading my PC to a 386 just because I
can type in my reply windows at exactly the same speed, do you? I don't
consider how quickly my computer performs when I don't ask it to do
anything! I'd say if you really, truly believe what you are saying, why
don't you trade in your Pentium for a 386?!?!!

(Now, how did I know you had a Pentium?)

> > Now, if even I *were* hitting a rate of 80% hits in the L2 cache that
> > would imply that 0.4% of all the code currently resident in memory is
> > actually active at once.
>

> This is not that hard to believe.

It is for me. Anyone who has done any serious debugging of system-level
applications (such as drivers) with a debugger like SOFT-ICE for Windows
knows that it's real easy to chew through 256K of L2 cache.
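
(The 0.4% figure quoted above is just the ratio of the L2 cache to the
memory in use; a trivial check, using a round 64MB rather than the exact
70.1MB mentioned earlier:

    #include <stdio.h>

    int main(void)
    {
        const double l2_bytes  = 256.0 * 1024.0;           /* 256 KB L2      */
        const double mem_bytes = 64.0 * 1024.0 * 1024.0;   /* ~64 MB in use  */

        printf("L2 as a fraction of memory in use: %.2f%%\n",
               100.0 * l2_bytes / mem_bytes);
        return 0;
    }

which prints 0.39%, i.e. an 80% L2 hit rate would require that well under
half a percent of what is resident be hot at any given moment.)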

> [...] Bring it down to real terms; what is the code *doing* that would
> require all these giant loops?

It's not always giant loops that are your enemy, but rather giant data
types. Think Object Oriented, think 32-bit. Why do you think programmers
have embraced these things on the PC? Because it allows them to be memory-
chewing pigs!!

> Programs are big nowadays because of features, not because of giant
> loops.

So tell me, when do you expect the end of the software industry's obsession
with "features"? If these features are sitting inside the loops, then
there's not much we're going to do about that loop's performance, now is
there?

> If I don't use a feature, then it's not required to be in the cache.

Similarly, if you aren't using a feature, you are under-utilizing your PC.
Remember that the entire raison d'être of the software industry is to
grab your attention while you are using their product, and that means
making you use their ever more complicated and growing feature list, and
that means eating more resources to activate them ... and sooner or later
that means eating more CPU horsepower, then more MEMORY!!!

> > And if the machine is idle (that is, if I am
> > not doing anything and hence don't care how fast my machine is going)
> > then I could believe such a statistic. But I am obviously more
> > interested in cases where my PC is very active. This would mean more
> > active Windows, more dynamic applications (such as ActiveX, a video
> > game, a compile, or some other such nonsense) using even more memory
> > and more percentage of it for inner loops.
>

> Again, the key is locality. I agree that there are probably
> times that the cache doesn't help a lot, but I think you
> drastically overestimate the amount of uncached code that is
> executed on a statistical basis.

On a statistical basis, the machine is idle, you silly!

Improving systemic performance means improving performance only when
your machine is *NOT* idle.

> > In real world situations, over the course of a single second of
> > relevant operation, I would be surprised if the L2 cache retains 20%
> > of your inner loop code.
>

> Well, it's all guesswork on both our parts unless we can find some
> actual measurements to back up these claims.

I don't have precise measurements, but my work tracing through Windows
(oh boy what a joy that is) backs up my conclusion.

> > > In any case, 512K caches are not uncommon, but as near as I can
> > > recall, you start getting very diminishing returns beyond this.
> >
> > Yes, the diminishing returns come from the fact that it's impossible
> > to predict where the next memory packets are coming from, as
> > opposed to being wildly over-successful at what it's trying to do.
>

> No, the diminishing returns come from the fact that more cache
> does not significantly increase the hit rate. The two extremes
> are no cache, where I get zero hits, and a 32 meg L2 cache for 32 meg
> of RAM, where I get 100% hits. In between is a 90%, 80%, etc point.
> The question is, where is the 80% point? I submit that it's much
> lower than you think.

But as you admit above, you have no data. I do. 80% memory hits would
be *very* difficult to achieve. I'd guess you would need 8MB of cache or
more for a typical 64MB of virtual memory usage on a 32MB system. That is
to say, it is not practically feasible.

> > > > http://www.mpact.com/
> > >
> > > I think this is *exactly* the solution.
> >
> > Glad you think so.
>
> Just make sure that it's 50 bucks, and you will be successful. :-)

We'll worry about making it successful. We'll let you mull over how
you are going to take advantage of the fact that the MPACT may well be a
very prevalent piece of hardware in the very near future.

> > > [...] The real solution is to
> > > have custom chips to perform low level grunt work based on high
> > > level protocols. The new generation of 3D graphics cards are a
> > > perfect example of this.
> >
> > Uh ... well I don't exactly support your reasoning, but your
> > conclusion is correct. The x86/PCI architecture just isn't well
> > suited for many tasks, such as multimedia, and it doesn't look like
> > Intel will be dealing with their shortcomings any time soon. So
> > that leaves open the fruitful ASIC/DSP/coprocessor market for
> > assisting the poor overburdened x86.
>
Well, we can argue about cache hit rates all day long, but even
a 100% hit rate is not going to get around the fact that graphics
mean a helluva lot of data passing through the CPU.

Uh .. yes. But graphics are not the end of the story. (Got a couple
of 5-channel SRS-encoded audio streams you need to deal with? Video
teleconferencing? ... it's coming to a PC near you, and the poor x86
ain't going to help you nearly as much as Intel is trying to make you
think it is ...)

JeffMo

unread,
Jan 30, 1997, 3:00:00 AM1/30/97
to

Kristian Thommesen <u95...@daimi.aau.dk> wrote:

>Scott Nudds wrote:
>>
>> (Daniel Brooks) wrote:
>> : You could always build a 3D "chip" so that you stack multiple layers on
>> : top of each other. this not only can reduce the distance to parts of the
>> : ram even further, but allows you to build a bigger chip because you have
>> : more area. a 3D "chip" with sides of 5 cm by 5cm by 5cm gives you 125
>> : cm^3. if a normal 2D chip was .1cm thick, you could fit 250 layers in the
>> : 3D chip, giving you an area of roughly 6250cm^2! That would be like having
>> : a chip that was 50cm by 125cm!
>>
>> True, but you are going to have a big problem with heating. With
>> single layer chips consuming say 10 watts, a 250 layer chip of the same
>> type is going to consume 2500 watts!
>>
>> If concentrated in a cube 1 cm on a side, the temperature will rise to
>> 2,655°C.
>>
>> Point being that cooling requirements are such that only a small
>> number of layers can be stacked without resorting to extreme designs
>> that provide forced - non-radiative - cooling.

>I think a high number of channels with liquid nitrogen circulating
>through the core would be sufficient to cool that son of a bitch ;->

Wonder what happens to the chip when you lose power to the nitrogen
system? 8^)

JeffMo


"A valid argument is not formed solely by ignorance." -JeffMo
"A valid argument is not formed solely by assertion." -JeffMo

Religion : Science :: Methamphetamine : Exercise

For email replies, remove the "dipstick." from my eddress.
It should be self-evident that I am not a dipstick. ;-)

