
DEC/Intel suit


Al Germaine

May 13, 1997, 3:00:00 AM

n...@wam.umd.edu (Hydrochloric Asad) wrote:


>Have the patent numbers been given out yet on the DEC suit? I'm trying to
>look them up at the IBM Patent server site...

>BTW I'll be surprised if Intel purposely violated the patent. I used
>to work at Intel and the first thing done on our project was to get the
>patent issues cleared up...

It was probably a matter of interpretation. I don't think they
would knowingly violate a defendable patent owned by a party capable
of defending its interests. It's likely they felt that existing
patents did not cover what they wanted to do. It's certainly a matter
of interpretation now that it's headed for court.

ATG

Hydrochloric Asad

May 13, 1997, 3:00:00 AM

Have the patent numbers been given out yet on the DEC suit? I'm trying to
look them up at the IBM Patent server site...

BTW I'll be surprised if Intel purposely violated the patent. I used
to work at Intel and the first thing done on our project was to get the
patent issues cleared up...

-Bobby

Michael Pettengill

May 13, 1997, 3:00:00 AM

Al Germaine wrote:
>
> It was probably a matter of interpretation. I don't think they
> would knowingly violate a defendable patent owned by a party capable
> of defending its interests. It's likely they felt that existing
> patents did not cover what they wanted to do. It's certainly a matter
> of interpretation now that it's headed for court.
>
> ATG

See:

http://www.techweb.com/investor/newsroom/tinews/may/0513dec.html
http://www.techweb.com/investor/newsroom/tinews/may/0513palmer.html

In particular, [from the Wall Street Journal]
the August 26, 1996 Corporate Focus. The title is
"Intel Shifts Its Focus To Long-Term Original
Research." Under that it says: "Microprocessor-maker
forms special team as 'there's nothing left to copy.'"


--
Michael Pettengill NIO/B2 Salem, NH
Digital Cluster Buster - Devos make 'em; we break 'em.
[DEC:.nio.]cvg::pettengill pette...@cvg.enet.dec.com

David T. Wang

May 14, 1997, 3:00:00 AM

Michael Pettengill (PETTE...@cvg.enet.dec.com) wrote:

: Al Germaine wrote:
: >
: > It was probably a matter of interpretation. I don't think they
: > would knowingly violate a defendable patent owned by a party capable
: > of defending its interests. It's likely they felt that existing
: > patents did not cover what they wanted to do. It's certainly a matter
: > of interpretation now that it's headed for court.
: >
: > ATG

: See:

: http://www.techweb.com/investor/newsroom/tinews/may/0513dec.html
: http://www.techweb.com/investor/newsroom/tinews/may/0513palmer.html

64 byte microprocessor?

: In particular, [from the Wall Street Journal]


: the August 26, 1996 Corporate Focus. The title is
: "Intel Shifts Its Focus To Long-Term Original
: Research." Under that it says: "Microprocessor-maker
: forms special team as 'there's nothing left to copy.'"

The Wall Street Journal is useless. What would be useful, really useful,
are two things.

1. Pointers to the DEC Patents.

2. Pointers to the specific architectural features that Intel supposedly
copied in the Pentium/Pentium Pro/Pentium II.

Is there anyone else who is "violating" DEC patents? I.e., are these
architectural features, covered by DEC patents, found in Intel's
processors but not in AMD's K5 and K6, Cyrix's M1 and M2, Exponential's
X704, or chips from Motorola, IBM, HP, MIPS, et al.?

I think that given these facts, people can make up their own minds.
Without them, hardly anyone will buy the "Pentium Pro is similar to
Alpha" argument; architecturally they are quite different. But I think
DEC is arguing about specific architectural implementations that it
holds patents on. Let's see the patents and the architectural features
that violate them.


Frankie Teo

May 14, 1997, 3:00:00 AM

True, more details would make it easier for people to decide what the
merits are according to DEC's claims. I do think DEC is not stupid in
filing this suit. They MUST have strong grounds on their patents to be able
to legally prove their case. If you see the number of patents DEC has filed
in terms of MPU technologies, you will probably know why.
Who has more to lose in this case? Will DEC suffer if they lose? Will
they really abandon Intel CPUs if they lose?

As we get more info, some of these questions will become clearer. But in
Round One, the loser is Intel, as below.
**From PC Week Online**
"Intel's stock fell about $7 toward the end of the day, knocking its market
capitalization down by $6.3 billion. "

Robert Rodgers

May 14, 1997, 3:00:00 AM

"Frankie Teo" <fran...@netvigator.com> wrote:
>True, more details would make it easier for people to decide what the
>merits are according to DEC's claims. I do think DEC is not stupid in
>filing this suit.

Personally, I think Digital's gone around the bend. But since people
want reasons, a friend suggested that DEC wants to cajole Intel into
letting them use the Pentium trademark to market FX!32 and 21164PC
machines.

Anyone have any speculation as to the motives? "Money" isn't enough,
unless there's been an IQ drain at Digital HQ, because there won't be a
payoff this decade.

Del Cecchi

May 14, 1997, 3:00:00 AM

In article <5lbfil$8o4$1...@hecate.umd.edu>, no_...@Glue.umd.edu (David T. Wang) writes:

[previous quotes snipped]


|>
|> The Wall Street Journal is useless. What would be useful, really useful,
|> are two things.
|>
|> 1. Pointers to the DEC Patents.
|>
|> 2. Pointers to the specific architectural features that Intel supposedly
|> copied in the Pentium/Pentium Pro/Pentium II.
|>
|> Is there anyone else who is "violating" DEC Patents? i.e. these
|> architectural features, covered by DEC patents are found in Intel's
|> processors, but not in AMD's K5, K6, Cyrix's M1, M2, Exponential's X704,
|> Motorola, IBM, HP, and MIPS et al?
|>
|> I think that given these facts, then people can make up their own minds
|> for themselves. Without this, hardly anyone will buy the "Pentium Pro
|> is similar to Alpha" argument, architecturally they are quite different,
|> but I think that DEC is arguing about specific architectural
|> implementations that they hold a patent over, lets see the patents and
|> the architectural features that violate these patents.

Actually, many of the companies you list are probably cross licensed with DEC. For
example, I am pretty sure IBM is. After all, IBM has some patents of its own that DEC is
likely to want to use. :-) Likewise, probably IBM has cross licensing agreements with Intel.

Apparently, for whatever reason, Intel doesn't have a license to use the subject patents.
And isn't this the same Intel that claimed ownership of the number 86? :-)

Of course, we don't know what the patents are, whether they are valid, or whether Intel
infringed on them. It would be interesting to get the patent numbers. I think it is
unlikely that people will be able to correctly "make up their minds", unless these people
are patent lawyers. It can be pretty tricky to read a patent and determine infringement, to
say nothing of validity.

My guess is that the patents, in addition to the cache structure, are related to the out of
order execution which is used in alpha and is new to Intel processors with P6.

--

Del Cecchi
Personal Opinions Only
asdfghjkl;'


Robert Harley

May 14, 1997, 3:00:00 AM

kn...@acm.org writes:
>Anyone have any speculation as to the motives? "Money" isn't enough,
>unless there's been an IQ drain at Digital HQ, because there won't be a
>payoff this decade.

My take is that they intend to throw a spanner in the works for the
HP-Intel development of the next generation x86 chip. I got the
impression recently that DEC is very concerned about it competing
head-on with the 21264, which presumably would mean the end of DEC's
huge margins on its high-end machines.

I just saw on comp.sys.dec that Terry Shannon said "Instead, the
long-term objective is to send the ia64 design team back to the
drawing board!" If he says it, it must be true!

--Rob.
.-. Robert...@inria.fr .-.
/ \ .-. .-. / \
/ \ / \ .-. _ .-. / \ / \
/ \ / \ / \ / \ / \ / \ / \
/ \ / \ / `-' `-' \ / \ / \
\ / `-' `-' \ /
`-' Linux - 500MHz Alpha - 256MB SDRAM `-'

Jim Hull

May 14, 1997, 3:00:00 AM

Hydrochloric Asad (n...@wam.umd.edu) wrote:

> Have the patent numbers been given out yet on the DEC suit? I'm trying to
> look them up at the IBM Patent server site...

Check out:

http://www.mercurycenter.com/business/dec/suit051397.htm

This contains the full text of DEC's suit, including (of course) the
relevant patent numbers.

--
Jim Hull

"If so strong the force in Yoda is, construct a sentence with words in
the proper order then why can't he?" -- Teng-Kiat Lee

Patrick Chase

May 14, 1997, 3:00:00 AM

In article <5lcdqv$19mk$1...@news.rchland.ibm.com>, cec...@signa.rchland.ibm.com (Del Cecchi) writes:
|> My guess is that the patents, in addition to the cache structure, are
|> related to the out of order execution which is used in alpha and is new
|> to Intel processors with P6.

All current Alphas (21064 and 21164 generations) are in-order machines. The
upcoming 21264 does out-of-order execution, but the P6 predates it by a couple
years.

----------------------------------------------------------------------------
Patrick Chase Not speaking for Hewlett-Packard...
H-P San Diego

Paul Ayers

May 14, 1997, 3:00:00 AM
^^^^^^^^^^^^

Rob,

I've been thinking of running Linux with a new Alpha chip. Would you
recommend this? How's it working for you? Thanks in advance!

Wolfe

May 14, 1997, 3:00:00 AM

Robert Rodgers wrote:
>
> "Frankie Teo" <fran...@netvigator.com> wrote:
> >True, more details would make it easier for people to decide what the
> >merits are according to DEC's claims. I do think DEC is not stupid in
> >filing this suit.
>
> Personally, I think Digital's gone around the bend. But since people
> want reasons, a friend suggested that DEC wants to cajole Intel into
> letting them use the Pentium trademark to market FX!32 and 21164PC
> machines.
>
> Anyone have any speculation as to the motives? "Money" isn't enough,
> unless there's been an IQ drain at Digital HQ, because there won't be a
> payoff this decade.

Actually, it might be the other way around. See this news.com article:

http://www.news.com/News/Item/0,4,10668,00.html

If Intel has to license the technology from Digital the way AMD and
Cyrix have to license MMX, DEC could stand to gain a ton of cash. But of
course, Intel will fight this tooth and nail. They're not in the habit
of just giving money away.

- Chris

--
To reply by email, replace 'nospam' with 'erols'.

Paul A. Jacobi

May 14, 1997, 3:00:00 AM

In article <5lbfil$8o4$1...@hecate.umd.edu>, no_...@Glue.umd.edu says...

>1. Pointers to the DEC Patents.
>
>2. Pointers to the specific architectural features that Intel supposedly
>copied in the Pentium/Pentium Pro/Pentium II.
>

Full text of lawsuit is available at the following URL, including specific
patent numbers in question.

http://www.boston.com/globe/eco/14dectext.htm

+---------------------------------------------------------------------------+
| Paul A. Jacobi Phone: (603) 881-1948 |
| Digital Equipment Corporation FAX : (603) 881-0189 |
| OpenVMS Systems Group, ZKO3-4/W23 Email: jac...@star.enet.dec.com |
| 110 Spitbrook Road |
| Nashua, NH 03062-2698 |
| |
| Anti-spam enabled! To reply, remove "nospam-" from the return address. |
+---------------------------------------------------------------------------+

Tilman Bohn

May 14, 1997, 3:00:00 AM

Frankie Teo wrote:
>
> True, more details would make it easier for people to decide what the
> merits are according to DEC's claims.

Here you go, this is the complete text:
http://www.sjmercury.com/business/dec/suit051397.htm

For the impatient who want to look up the relevant patents,
here's an extract:

[...]
IV. THE PERTINENT FACTS

A. The Digital Patents in Suit

6. On July 5, 1988, United States Patent No. 4,755,936 titled ``Apparatus And
Method For Providing A Cache Memory Unit With A Write Operation Utilizing
Two System Clock Cycles'' was duly and legally issued to Digital on an application
filed by Robert E. Stewart, Barry J. Flahive and James B. Keller. Since that date,
Digital has been and still is the owner of all right and title to this patent, including the
right to recover for infringement.

7. On July 11, 1989, United States Patent No. 4,847,804 titled ``Apparatus And
Method For Data Copy Consistency In A Multi-Cache Data Processing Unit'' was
duly and legally issued to Digital on an application filed by Stephen J. Shaffer and
Richard A. Warren. Since that date, Digital has been and still is the owner of all right
and title to this patent, including the right to recover for infringement.

8. On February 25, 1992, United States Patent No. 5,091,845 titled ``System For
Controlling The Storage Of Information In A Cache Memory,'' was duly and legally
issued to Digital on an application filed by Paul I. Rubinfeld. Since that date, Digital
has been and still is the owner of all right and title to this patent, including the right to
recover for infringement.

9. On June 23, 1992, United States Patent No. 5,125,083 titled ``Method And
Apparatus For Resolving A Variable Number Of Potential Memory Access
Conflicts In A Pipelined Computer System'' was duly and legally issued to Digital
on an application filed by David B. Fite, Tryggve Fossum, Ricky C. Hetherington,
John E. Murray and David A. Webb. Since that date, Digital has been and still is the
owner of all right and title to this patent, including the right to recover for
infringement.

10. On September 15, 1992, United States Patent No. 5,148,536 titled ``Pipeline
Having An Integral Cache Which Processes Cache Misses And Loads Data In
Parallel'' was duly and legally issued to Digital on an application filed by Richard T.
Witek, Douglas D. Williams, Timothy J. Stanley, David M. Fenwick, Douglas J.
Burns, Rebecca L. Stamm and Richard Heye. Since that date, Digital has been and
still is the owner of all right and title to this patent, including the right to recover for
infringement.

11. On January 12, 1993, United States Patent No. 5,179,673 titled ``Subroutine
Return Prediction Mechanism Using Ring Buffer And Comparing Predicated
Address With Actual Address To Validate Or Flush The Pipeline'' was duly and
legally issued to Digital on an application filed by Simon C. Steely, Jr. and David J.
Sager. Since that date, Digital has been and still is the owner of all right and title to
this patent, including the right to recover for infringement.

12. On March 23, 1993, United States Patent No. 5,197,132 titled ``Register
Mapping System Having A Log Containing Sequential Listing of Registers That
Were Changed In Preceding Cycles For Precise Post-Branch Recovery'' was duly
and legally issued to Digital on an application filed by Patent, including the right to
recover for infringement.

13. On February 28, 1995, United States Patent No. 5,394,529 titled ``Branch
Prediction Unit For High-Performance Processors'' was duly and legally issued to
Digital on an application filed by John F. Brown, III, Shawn Persels and Jeanne
Meyer. Since that date, Digital has been and still is the owner of all right and title to
this patent, including the right to recover for infringement.

14. On July 4, 1995, United States Patent No. 5,430,888 titled ``Pipeline Utilizing An Integral
Cache For Transferring Data To And From A Register'' was duly and legally
issued to Digital on an application filed by Richard T. Witek, Douglas D. Williams,
Timothy J. Stanley, David M. Fenwick, Douglas J. Burns, Rebecca L. Stamm and
Richard Heye. Since that date, Digital has been and still is the owner of all right and
title to this patent, including the right to recover for infringement.

15. On October 22, 1996, United States Patent No. 5,568,624 titled ``Byte-Compare
Operation For High-Performance Processor'' was duly and legally issued to Digital
on an application filed by Richard L. Sites and Richard T. Witek. Since that date,
Digital has been and still is the owner of all right and title to this patent, including the
right to recover for infringement.
[...]

Hope this helps.
(If I may: I think at least the titles of these patents seem to
indicate they're significantly more specific than was recently
presumed in several threads on this issue. But that's just my
opinion of course.)

Please, someone else look these up on the ibm server.
--
Cheers,
Tilman Bohn

To reply via e-mail, remove the strings "NOSPAM." in the From: field.
------------------------------------------------------------------------
Max-Planck-Institut fuer Kernphysik 'God is dead.' -- Nietzsche
Abteilung fuer Kosmophysik 'Nietzsche is dead.' -- God
Heidelberg, Germany 'Nietzsche is God.' -- The dead
<bo...@boris.mpi-hd.mpg.de>
------------------------------------------------------------------------

Jeff Kenton

May 14, 1997, 3:00:00 AM

"Frankie Teo" <fran...@netvigator.com> writes:

>True, more details would make it easier for people to decide what the
>merits are according to DEC's claims. I do think DEC is not stupid in
>filing this suit. They MUST have strong grounds on their patents to be able

Having been a "technical expert" in several patent infringement suits, I
can say that a major incentive is always $money$. If you are suing for
infringement against a product with $10,000,000,000 in market share, even
a 1% win is huge. It's especially huge in the case where you patent some
technology that you never use in your own product (I have no reason to know
if this is the case in the DEC/Intel suit), and just want to make something
from your R&D effort.


--
-------------------------------------------------------------------------
= Jeff Kenton (617) 894-4508 =
= jke...@world.std.com =
-------------------------------------------------------------------------

Doug Siebert

May 14, 1997, 3:00:00 AM

pat...@sdd.hp.com (Patrick Chase) writes:

>In article <5lcdqv$19mk$1...@news.rchland.ibm.com>, cec...@signa.rchland.ibm.com (Del Cecchi) writes:
>|> My guess is that the patents, in addition to the cache structure, are
>|> related to the out of order execution which is used in alpha and is new
>|> to Intel processors with P6.

>All current Alphas (21064 and 21164 generations) are in-order machines. The
>upcoming 21264 does out-of-order execution, but the P6 predates it by a couple
>years.


Just because it hadn't been implemented on shipping versions of the Alpha
doesn't mean DEC could not have patented it. Not arguing for or against the
suit itself, just pointing this out...

I would be interested in hearing any comments from anyone with enough
knowledge of architecture and law to do better than just guess whether there
is a chance in hell of DEC winning this. Of course, all we'll probably hear
is guesses and theories about why DEC really is suing if not just because they
really think they can win. So far, the candidates are:

Wanting permission to use the Pentium trademark to market their 21164PC+FX32
machines.

Wanting to hurt progress of the IA-64 Intel/HP effort.

Worried about the long-term viability of NT on the Alpha so they want to make
MS have enough worry about possible problems with x86 sales to want to have a
second vendor on board.

Trying to get investors to keep from jumping ship due to possible gravy train
if the suit wins.

Any others?

--
Douglas Siebert Director of Computing Facilities
douglas...@uiowa.edu Division of Mathematical Sciences, U of Iowa

He who lives in a glass house should not invite in he who is without sin.

Rich Adams

May 14, 1997, 3:00:00 AM

Robert Harley (har...@pauillac.inria.fr) wrote:
: kn...@acm.org writes:
: >Anyone have any speculation as to the motives? "Money" isn't enough,
: >unless there's been an IQ drain at Digital HQ, because there won't be a
: >payoff this decade.
:
: My take is that they intend to throw a spanner in the works for the
: HP-Intel development of the next generation x86 chip. I got the
: impression recently that DEC is very concerned about it competing
: head-on with the 21264, which presumably would mean the end of DEC's
: huge margins on its high-end machines.
:
: I just saw on comp.sys.dec that Terry Shannon said "Instead, the
: long-term objective is to send the ia64 design team back to the
: drawing board!" If he says it, it must be true!

Some of my speculations..

- With the advent of P2 and Intel's long range plan, which looks
very much like one to seize and make proprietary x86 architecture,
could Digital be attempting to prevent them from holding the
entire market, from the home user to super computing?
(IMHO Intel making the x86 architecture proprietary is like saying,
"We want what happened to Apple, CBM, and Atari to happen to us.")

- DEC put 1 year into research and preparation for this litigation
and has every intention of pushing Intel back into a technological
stone age. Thus opening a very large market for Alpha.

- Were the quotes from Intel executives (WSJ; August 26, 1996) in
reference to adaptation of DEC intellectual property (protected by
patents) and, therefore, an enormous blunder for which they spent many
sleepless nights wondering why they had been so foolishly candid?
Could an out-of-court settlement be in the future?

- Or could it be an effort to stall x86 and ia64 research long enough
for DEC to gain desktop/workstation/infoserver marketshare? An
intriguing prospect, but one which could backfire in ugly ways. But
I'm disinclined to think this could be the entire case.

Palmer's language is certainly take-no-prisoners, I'm sure it prompted a
few meetings in Santa Clara. Oh, to be a fly on Andy Grove's wall...

--
|Rich Adams [DNRC] | "The time has come", the Walrus said, "To talk of |
|ri...@alpha.delta.edu | many things: Of shoes--and ships--and sealing wax-- |
| | Of cabbages--and kings--And why the sea is boiling |
| | hot--And whether pigs have wings." - Lewis Carroll |

Wolfe

May 14, 1997, 3:00:00 AM

Wolfe wrote:
>
> Actually, it might be the other way around. See this news.com article:
>
> http://www.news.com/News/Item/0,4,10668,00.html
>
> If Intel has to license the technology from Digital the way AMD and
> Cyrix have to license MMX, DEC could stand to gain a ton of cash. But of
> course, Intel will fight this tooth and nail. They're not in the habit
> of just giving money away.
>
> - Chris
>
> --
> To reply by email, replace 'nospam' with 'erols'.

Oops. I just re-read the NEWS.COM article on the Intel/AMD agreement,
and it didn't mention any money changing hands, just that Intel now
allows AMD and Cyrix to use the MMX trademark on their chips and in
their ads. My bad.

Joseph H Allen

May 14, 1997, 3:00:00 AM

In article <01bc603c$bc9de560$b514...@Fteo.netvigator.com>,
Frankie Teo <fran...@netvigator.com> wrote:

>Will they really abandon Intel CPUs if they lose?

Who cares? They can still buy AMD K6s. How's AMD stock doing?

They really have nothing to lose and billions to win.


Tony Tribelli

May 14, 1997, 3:00:00 AM

ri...@alpha.delta.edu (Rich Adams) wrote:

> - DEC put 1 year into research and preparation for this litigation
> and has every intention of pushing Intel back into a technological
> stone age. Thus opening a very large market for Alpha.

A theoretically large market that will probably go to x86 clones and
PowerPC, not to Alpha despite its technical superiority. The typical PC
buyer follows the applications, not the CPU with the best specs. I doubt
FX!32 will prove to be very convincing for these users.

> Palmer's language is certainly take-no-prisoners, I'm sure it prompted a
> few meetings in Santa Clara. Oh, to be a fly on Andy Grove's wall...

Probably just huffing and puffing in hopes of negotiating a better
licensing deal, assuming he has a case.

Tony

--
Tony Tribelli
adtri...@acm.org

Douglas Borsom

May 15, 1997, 3:00:00 AM

dsie...@icaen.uiowa.edu (Doug Siebert) writes:

>[...]


>Of course, all we'll probably hear
>is guesses and theories about why DEC really is suing if not just because they
>really think they can win. So far, the candidates are:

>Wanting permission to use the Pentium trademark to market their 21164PC+FX32
>machines.

>Wanting to hurt progress of the IA-64 Intel/HP effort.

>Worried about the long-term viability of NT on the Alpha so they want to make
>MS have enough worry about possible problems with x86 sales to want to have a
>second vendor on board.

>Trying to get investors to keep from jumping ship due to possible gravy train
>if the suit wins.

>Any others?

Here's a +really+ wild idea.

How about Digital honestly believes Intel violated their
patents, that they can establish that in court, and that they
should be reimbursed for that violation.

No. Too crazy. Must be time for my medication.

-doug

Andy Glew

May 15, 1997, 3:00:00 AM

4755936 : Apparatus and method for providing a cache memory unit with a
write operation utilizing two system clock cycles

INVENTORS:
    Stewart; Robert E., Stow, MA
    Flahive; Barry J., Westford, MA
    Keller; James B., Arlington, MA
ASSIGNEES:
    Digital Equipment Corporation, Maynard, MA
ISSUED: July 5, 1988
FILED: Jan. 29, 1986
SERIAL NUMBER: 823805
FEE STATUS:
INTL. CLASS (Ed. 4): G06F 13/00;
U.S. CLASS: 364-200
FIELD OF SEARCH: 364-200 MS File;
AGENTS: Holloway; William W.; Moran; Maura K.;


ABSTRACT: A cache memory unit is disclosed in which, in response to the
application of a write command, the write operation is performed in two
system clock cycles. During the first clock cycle, the data signal group
is stored in a temporary storage unit while a determination is made if
the address signal group associated with the data signal group is present
in the cache memory unit. When the address signal group is present, the
data signal group is stored in the cache memory unit during the next
application of a write command to the cache memory unit. If a read
command is applied to the cache memory unit involving the data signal
group stored in the temporary storage unit, then this data signal group
is transferred to the central processing unit in response to the read
command. Instead of performing the storage into the cache memory unit as
a result of the next write command, the storage of the data signal in the
cache memory unit can occur during any free cycle.
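
To make the abstract a bit more concrete, here is a rough behavioural sketch in Python of the scheme it describes: a write is parked in a temporary buffer while the tag check happens, it is committed to the data array on the next write or on a free cycle, and a read that matches the parked write is served from the buffer. This is only one reading of the abstract, not the patented design. The direct-mapped organisation, the class and method names, the fill() helper and the handling of a write miss (simply reported, nothing buffered) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PendingWrite:
    """Temporary storage unit: one write that hit but is not yet committed."""
    index: int
    data: int


class TwoCycleWriteCache:
    """Toy direct-mapped cache with a one-entry pending-write buffer.

    Hypothetical model for illustration; names and structure are assumed.
    """

    def __init__(self, num_lines: int = 8) -> None:
        self.num_lines = num_lines
        self.tags: List[Optional[int]] = [None] * num_lines
        self.data: List[Optional[int]] = [None] * num_lines
        self.pending: Optional[PendingWrite] = None

    def _split(self, address: int) -> Tuple[int, int]:
        # Return (tag, index) for a direct-mapped lookup.
        return address // self.num_lines, address % self.num_lines

    def _commit(self) -> None:
        # Second "cycle" of a write: move the parked data into the array.
        if self.pending is not None:
            self.data[self.pending.index] = self.pending.data
            self.pending = None

    def fill(self, address: int, value: int) -> None:
        # Assumed helper: install a line (e.g. after a miss).
        self._commit()  # never leave a parked write aimed at a replaced line
        tag, index = self._split(address)
        self.tags[index] = tag
        self.data[index] = value

    def write(self, address: int, value: int) -> bool:
        # The previous parked write is committed during this write command.
        self._commit()
        tag, index = self._split(address)
        hit = self.tags[index] == tag
        if hit:
            # First "cycle": park the data while the tag check completes.
            self.pending = PendingWrite(index, value)
        return hit  # on a miss nothing is buffered (assumption)

    def idle_cycle(self) -> None:
        # A free cycle may also be used to commit the parked write.
        self._commit()

    def read(self, address: int) -> Optional[int]:
        tag, index = self._split(address)
        if self.tags[index] != tag:
            return None  # miss
        if self.pending is not None and self.pending.index == index:
            return self.pending.data  # forward from the temporary storage
        return self.data[index]
```

In this model, a write() followed immediately by a read() of the same address returns the new value even though the data array still holds the old one; the array is only updated on the next write() or idle_cycle().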


4755936 : Apparatus and method for providing a cache memory unit with a
write operation utilizing two system clock cycles


8 CLAIMS:

What is claimed is:

1. A cache memory unit associated with a central processing unit of a
data processing unit comprising:
    first storage means for temporarily storing a preselected data signal
      group in response to a WRITE command;
    address storage means for temporarily storing an index of a first
      address signal group associated with said preselected data signal
      group in response to said WRITE command;
    second address storage means for storing a plurality of second
      address signal groups, each of said second address signal groups
      identified by a second address storage means location and a
      comparison signal group stored in said second address signal group
      location;
    comparison means for comparing said first address signal group and
      said second address signal groups, said comparison means providing
      a first signal when said first address signal group and a second
      address signal group are equivalent, said first signal being stored
      in said address storage means;
    second storage means for storing a plurality of second data signal
      groups, each of said second data signal groups associated with a
      one of said second address signal groups, said preselected data
      signal group replacing a second data signal group associated with a
      second address signal showing a positive comparison by said
      comparison means during a next WRITE command when said first signal
      has been stored; and
    retrieval means for retrieving a data signal group identified by a
      second address signal group associated with a READ command, showing
      a positive comparison with said first address signal group.
2. The cache memory unit associated with a central processing unit of a
data processing unit of claim 1 further comprising:
    second comparison means for comparing an index portion of said
      address signal group portion associated with a READ command with an
      index portion of said first address signal group stored in said
      address storage means, wherein a positive comparison by said second
      comparison means causes said preselected data signal group to be
      retrieved by said retrieval means when said comparison means
      provides a positive signal.
3. The cache memory unit of claim 2 further including means for
responding to a READ command occurring between said WRITE command and
said next WRITE command.
4. The cache memory unit of claim 3 further including means for storage
of said preselected data signal group can occur during a command free
clock cycle when said command free cycle occurs prior to said next WRITE
command.
5. The method of operation of a cache memory unit associated with a
central processing unit comprising the steps of:
    identifying when a first address associated with a first data signal
      group of a WRITE command is present in said cache memory unit;
    temporarily storing said first data signal group during said
      identifying step;
    storing said data signal group in said cache memory at a location
      determined by said first address when said identifying step is
      repeated for a next WRITE command and said first address is present
      in said cache memory unit;
    retrieving said first data signal group in response to a READ command
      from temporary storage when said first address is identical to a
      second address associated with said READ command; and
    retrieving a data signal group from said cache memory unit when said
      second address is present in said cache memory unit and said second
      address is not identical with said first address.
6. The method of operation of a cache memory unit of claim 5 wherein said
retrieving steps can take place between said identifying step and said
storing step.
7. The method of operation of a cache memory unit of claim 5 wherein said
storing step takes place in an absence of said retrieving step and said
identifying step.
8. A cache memory unit of a central processing subsystem for storing a
data signal group associated with an address signal group, said
associated address signal group also being stored in said cache memory
unit, comprising:
a first storage unit storing a comparison address signal portion at a
location determined by an index address signal portion of said address
signal group;
a second storage unit storing a data signal group associated with said
stored address signal group at a location determined by said index
address signal portion;
comparison apparatus providing a first signal when a comparison signal
portion of an input address signal group associated with a READ and a
WRITE command is identical with said stored comparison address portion
at a first location in said first storage unit determined by said index
address signal portion;
retrieval apparatus applying said data signal group associated with said
stored address signal group to output terminals of said cache memory
unit when said comparison apparatus provides said first signal in
response to an input address signal group associated with said READ
command;
auxiliary storage apparatus temporarily storing a data signal group
identified by said WRITE command address signal group and temporarily
storing an index portion of said input address signal group associated
with a WRITE command and temporarily storing said first signal
associated with said WRITE command, said auxiliary storage apparatus
storing said WRITE command data signal group in said second storage unit
at a location determined by said index portion of said address signal
group stored in said auxiliary storage apparatus upon application of a
next WRITE command when said first signal is stored in said auxiliary
storage apparatus; and
second comparison apparatus for comparing an index input address signal
portion associated with a READ command and an index address signal
portion stored in said auxiliary storage apparatus in response to said
READ command, said second comparison apparatus providing a second signal
when said stored index address portion and said READ command address
signal portion are equal, wherein said READ command causes said data
signal group in said auxiliary storage apparatus to be applied to said
cache memory unit output terminals when said first and said second
signals and a first signal from a previous WRITE command stored in said
auxiliary apparatus are provided.
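
For readers who find the claim language hard to follow, here is a rough Python sketch of the behaviour claims 5 through 8 seem to describe: a direct-mapped cache that parks each WRITE in a one-entry buffer, commits it to the data array when the next WRITE arrives, and answers a READ from the buffer when the indexes match. This is only one reading of the claims, not the patented design. The class name, the `fill` helper, the modulo/divide address split, and the choice to simply drop a buffered write that missed are all assumptions made for illustration.

```python
class DelayedWriteCache:
    """Direct-mapped cache whose writes wait in a one-entry buffer."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # "first storage unit": tag per index
        self.data = [None] * num_lines  # "second storage unit": data per index
        self.pending = None             # auxiliary storage: (index, data, hit)

    def _split(self, address):
        # Assumed address layout: low part is the index, high part is the tag.
        return address % self.num_lines, address // self.num_lines

    def fill(self, address, value):
        """Install a line directly (hypothetical stand-in for a miss refill)."""
        index, tag = self._split(address)
        self.tags[index] = tag
        self.data[index] = value

    def write(self, address, value):
        """Probe the tags now; the data array is updated on the next write."""
        index, tag = self._split(address)
        if self.pending is not None:
            p_index, p_value, p_hit = self.pending
            if p_hit:
                # The previous write is committed when the next WRITE arrives.
                self.data[p_index] = p_value
        hit = self.tags[index] == tag   # "first signal"
        self.pending = (index, value, hit)
        return hit

    def read(self, address):
        """Return buffered data if it is newer, else the cached data."""
        index, tag = self._split(address)
        if self.tags[index] != tag:
            return None                 # miss; refill is not modelled here
        if self.pending is not None:
            p_index, p_value, p_hit = self.pending
            if p_hit and p_index == index:  # "second signal"
                return p_value
        return self.data[index]
```

The point of the arrangement, as far as the claims go, is that the tag check and the data write for a store no longer have to happen in the same step, while a load that follows the store still sees the new value.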

Andy Glew

May 15, 1997, 3:00:00 AM

5125083 : Method and apparatus for resolving a variable number of
potential memory access conflicts in a pipelined computer system

INVENTORS:
Fite; David B., Northboro, MA
Fossum; Tryggve, Northboro, MA
Hetherington; Ricky C., Northboro, MA
Murray; John E., Acton, MA
Webb; Jr. David A., Berlin, MA


ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:

June 23, 1992

FILED:
Feb. 3, 1989
SERIAL NUMBER:
306767

FEE STATUS:

INTL. CLASS (Ed. 5):
G06F 9/30; G06F 9/312;
U.S. CLASS:
395-375; 364-946.2; 364-231.8; 364-259.9;
FIELD OF SEARCH:
364-200 MS File, 900 MS File ;
AGENTS:
Arnold, White & Durkee;


ABSTRACT: An operand processing unit delivers a specified address and at
least one read/write signal in response to an instruction being a source
or destination operand, and delivers the source operand to an execution
unit in response to completion of the preprocessing. The execution unit
receives the source operand, executes it and delivers the resultant data
to memory. A "write queue" receives the write addresses of the
destination operands from the operand processing unit, stores the write
addresses, and delivers the stored preselected addresses to memory in
response to receiving the resultant data corresponding to the
preselected address. The address of the source operand is compared to
the write addresses stored in the write queue, and the operand
processing unit is stalled whenever at least one of the write addresses
in the write queue is equivalent to the read address. Therefore,
fetching of the operand is delayed until the corresponding resultant
data has been delivered by the execution unit.
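
To make the abstract concrete, the following is a minimal Python sketch of the stall-on-conflict idea: write addresses wait in a queue until their result data arrives, and an operand fetch is held off while its address matches any queued write. It is a simplified model under my own assumptions, not the hardware described in the patent. `WriteQueue`, `fetch_source_operand`, the `STALL` marker, and the use of a dict as memory are all hypothetical names and choices.

```python
from collections import deque

STALL = object()  # hypothetical marker meaning "operand fetch must wait"


class WriteQueue:
    """Holds write addresses until the execution unit delivers the data."""

    def __init__(self):
        self._addresses = deque()

    def push(self, write_address):
        # The operand processing unit has decoded a destination operand.
        self._addresses.append(write_address)

    def conflicts_with(self, read_address):
        # Compare the read address against every queued write address.
        return read_address in self._addresses

    def retire(self, memory, result):
        # Result data arrived: the oldest queued address goes to memory
        # together with that data.
        address = self._addresses.popleft()
        memory[address] = result
        return address


def fetch_source_operand(queue, memory, read_address):
    """Return the operand, or STALL while a matching write is still queued."""
    if queue.conflicts_with(read_address):
        return STALL
    return memory[read_address]
```

A caller would keep retrying `fetch_source_operand` until it stops returning `STALL`, which happens once `retire` has written the pending result to memory.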



16 CLAIMS:

We claim:

1. Apparatus for controlling memory access during execution of
memory access instructions in a program for a pipelined processor, said
apparatus
comprising:
a memory having a plurality of addressable storage locations
for storing data at a specified one of said addressable storage
locations in response
to receiving said data and the corresponding address;
means for preprocessing said instructions including means for
fetching an operation code and at least one operand specifier for each
instruction
and delivering a specified address and at least one read/write
signal in response to decoding, said read/write signal indicating
whether said address
is a read address for a specified source operand or a write
address for a specified destination operand, said read addresses being
delivered for
fetching the specified source operands from the memory;
executing means responsive to said operation code for
performing an operation corresponding to said operation code upon the
specified operand
when said operand is a source operand, and delivering
resultant data to said memory when said operand is a destination
operand;
write buffer means for receiving the specified addresses and
the read/write signals from the preprocessing means, storing the write
addresses, and
delivering the stored addresses to said memory in response to
receiving the resultant data corresponding to the write address;
means for comparing the read addresses with each of the
addresses stored in the write buffer means, and delivering a stall
signal in response to at
least one of the addresses stored in the write buffer being
equivalent to one of the read addresses; and
means responsive to said stall signal for stalling said means
for preprocessing, whereby the fetching of source operands from said
memory is
delayed until the corresponding write buffer addresses have
been delivered to said memory and the result data corresponding to said
write buffer
addresses are available from said executing means.
2. Apparatus, as set forth in claim 1, wherein the write buffer
means includes a first-in first-out buffer having a plurality of
registers, an insert pointer
indicating the register where the next write address will be
stored, and a remove pointer indicating which of the plurality of
registers contains the address
corresponding to the received resultant data.
3. Apparatus, as set forth in claim 1, wherein the preprocessor
means includes means for predicting which of alternative instruction
paths execution of the
program will take in response to a branching instruction, placing
the predicted instructions in the pipeline, and delivering a flush
signal in response to
detecting that execution of the program will not take the predicted
path; and said apparatus further comprises means for invalidating the
write addresses
stored in the write buffer means in response to receiving the flush
signal.
4. Apparatus, as set forth in claim 3, wherein:
said executing means includes means for delivering multiple
words of resultant data resulting from a single operation to be stored
in said memory at
a plurality of consecutive write addresses, and means for
delivering a signal indicating the delivery of said multiple words of
resultant data; and
said apparatus further comprises means for preventing delivery
of the flush signal in response to receiving the signal indicating the
delivery of said
multiple words of resultant data.
5. Apparatus, as set forth in claim 1, further including a main
memory having a plurality of subdivisions of a preselected length;
a cache memory having a portion of the main memory
subdivisions stored therein;
means for setting a page change flag in response to detecting
that data stored beginning at an address stored in the write buffer
means will be
stored in separate cache memory subdivisions;
means for comparing the cache memory subdivisions to the
subdivisions needed to store the data at the address stored in the write
buffer means in
response to the page change flag being set, and delivering a
first signal in response to the needed memory subdivision being
unavailable in the
cache; and means for preventing storing of the data in
response to receiving the first signal.
6. Apparatus, as set forth in claim 5, wherein:
said preprocessing means includes means for delivering a
plural word indicating signal indicating when one of said write operands
specify a
plurality of words of resultant data to be stored at
consecutive write addresses;
said write buffer means further includes means for storing,
responsive to said plural word signal, the multiple consecutive write
addresses
associated with said plural word signal, and an indication
flagging the last consecutive write address associated with said plural
word signal being
stored; and
said comparing means for comparing the cache memory
subdivisions includes means for determining the needed memory
subdivision of the next
write address having being stored and flagged by said
indication as the last consecutive write address associated with said
plural word signal.
7. Apparatus, as set forth in claim 6, wherein said write buffer
means includes storage registers for storing said write addresses, said
storage registers
have respective sequential addresses, and wherein said means for
comparing includes:
shifting means for obtaining indications of ones of said
registers storing last consecutive write addresses associated with said
plural word signal and
shifting said indications in response to the address of the
storage register storing the write address of the next resultant data to
be delivered by said
means for executing to obtain a series of indications
beginning with an indication of whether the write address of the next
resultant data to be
delivered by said means for executing is the last consecutive
write address associated with said plural word signal, and
priority encoder means responsive to said series of
indications to obtain a relative address of the register storing the
next last consecutive write
address associated with said plural word signal, said relative
address being relative to the address of the storage location storing
the write address
of the next resultant to be delivered by said means for
executing, and
means for translating said relative address by the address of
the storage location storing the write address of the next resultant to
be delivered by
said means for executing, to thereby obtain the address of the
register storing the last consecutive write address associated with said
plural word
signal.
8. Apparatus for controlling memory access during execution of
memory access instructions in a program for a pipelined processor, said
apparatus
comprising:
a memory having a plurality of addressable storage locations
for storing data at a specified one of said addressable storage
locations in response
to receiving said data and the corresponding address;
means for preprocessing said instructions including means for
fetching an operation code and at least one operand specifier for each
instruction
and delivering a specified address and at least one read/write
signal in response to decoding, said read/write signal indicating
whether said address
is a read address for a specified source operand or a write
address for a specified destination operand, said read addresses being
delivered for
fetching the specified source operands from the memory;
executing means responsive to said operation code for
performing an operation corresponding to said operation code upon the
specified operand
when said operand is a source operand, and delivering
resultant data to said memory when said operand is a destination
operand;
write buffer means for receiving the specified addresses and
the read/write signals from the preprocessing means, storing the write
addresses, and
delivering the stored addresses to said memory in response to
receiving the resultant data corresponding to the write address;
wherein the preprocessor means includes means for predicting
which of alternative instruction paths execution of the program will
take in response
to a branching instruction, placing the predicted instructions
in the pipeline, and delivering a flush signal in response to detecting
that execution of
the program will not take the predicted path; and said
apparatus further comprises means for invalidating the write addresses
stored in the write
buffer means in response to receiving the flush signal.
9. Apparatus, as set forth in claim 8, wherein:
said executing means includes means for delivering multiple
words of resultant data resulting from a single operation to be stored
in said memory at
a plurality of consecutive write addresses, and means for
delivering a signal indicating the delivery of said multiple words of
resultant data; and
said apparatus further comprises means for preventing delivery
of the flush signal in response to receiving the signal indicating the
delivery of said
multiple words of resultant data.
10. Apparatus, as set forth in claim 8, further including a main
memory having a plurality of subdivisions of a preselected length;
a cache memory having a portion of the main memory
subdivisions stored therein;
means for setting a page change flag in response to detecting
that data stored beginning at an address stored in the write buffer
means will be
stored in separate cache memory subdivisions;
means for comparing the cache memory subdivisions to the
subdivisions needed to store the data at the address stored in the write
buffer means in
response to the page change flag being set, and delivering a
first signal in response to the needed memory subdivision being
unavailable in the
cache; and means for preventing storing of the data in
response to receiving the first signal.
11. Apparatus, as set forth in claim 10, wherein:
said preprocessing means includes means for delivering a
plural word indicating signal indicating when one of said write operands
specify a
plurality of words of resultant data to be stored at
consecutive write addresses;
said write queue means further includes means for storing,
responsive to said plural word signal, the multiple consecutive write
addresses
associated with said plural word signal, and an indication
flagging the last consecutive write address associated with said plural
word signal being
stored; and
said means for comparing the cache memory subdivisions
includes means for determining the needed memory subdivision of the next
write
address having being stored and flagged by said indication as
the last consecutive write address associated with said plural word
signal.
12. Apparatus, as set forth in claim 11, wherein said write buffer
includes storage registers for storing said write addresses, said
storage registers have
respective sequential addresses, and wherein said means for
comparing includes:
shifting means for obtaining indications of ones of said
registers storing last consecutive write addresses associated with said
plural word signal and
shifting said indications in response to the address of the
storage register storing the write address of the next resultant data to
be delivered by said
means for executing to obtain a series of indications
beginning with an indication of whether the write address of the next
resultant data to be
delivered by said means for executing is the last consecutive
write address associated with said plural word signal, and
priority encoder means responsive to said series of
indications to obtain a relative address of the register storing the
next last consecutive write
address associated with said plural word signal, said relative
address being relative to the address of the storage location storing
the write address
of the next resultant to be delivered by said means for
executing, and
means for translating said relative address by the address of
the storage location storing the write address of the next resultant to
be delivered by
said means for executing, to thereby obtain the address of the
register storing the last consecutive write address associated with said
plural word
signal.
13. Apparatus for controlling memory access during execution of
memory access instructions in a program for a pipelined processor, said
apparatus
comprising:
a main memory having a plurality of subdivisions of a
preselected length;
cache memory having a plurality of storage locations each
being identified by a specified address, the cache memory being adapted
for delivering
data stored at a preselected storage location in response to
receiving a signal indicating a read operation and a specified address,
and for storing
data at a preselected storage location in response to
receiving a signal indicating a write operation and a specified address,
the cache memory
having a portion of the main memory subdivisions stored
therein;
means for preprocessing said instructions including means for
fetching an operation code and at least one operand specifier for each
instruction
and delivering a specified address and at least one read/write
signal in response to decoding, said read/write signal indicating
whether said address
is a read address for a specified source operand or a write
address for a specified destination operand, said read addresses being
delivered for
fetching the specified source operands from the memory;
executing means responsive to said operation code for
performing an operation corresponding to said operation code upon the
specified operand
when said operand is a source operand, and delivering
resultant data to said memory when said operand is a destination
operand;
write buffer means for receiving the specified addresses and
the read/write signals from the preprocessing means, storing the write
addresses, and
delivering the stored addresses to said memory in response to
receiving the resultant data corresponding to the write address;
means for setting a page change flag in response to detecting
that data stored beginning at an address stored in the write buffer
means will be
stored in separate cache memory subdivisions;
means for comparing the cache memory subdivisions to the
subdivisions needed to store the data at the address stored in the write
buffer means in
response to the page change flag being set, and delivering a
first signal in response to the needed memory subdivision being
unavailable in the
cache; and means for preventing storing of the data in
response to receiving the first signal.
14. Apparatus, as set forth in claim 13, wherein:
said preprocessing means includes means for delivering a
plural word indicating signal indicating when one of said write operands
specify a
plurality of words of resultant data to be stored at
consecutive write addresses;
said write buffer means further includes means for storing,
responsive to said plural word signal, the multiple consecutive write
addresses
associated with said plural word signal, and an indication
flagging the last consecutive write address associated with said plural
word signal being
stored; and
said comparing means for comparing the cache memory
subdivisions includes means for determining the needed memory
subdivision of the next
write address having being stored and flagged by said
indication as the last consecutive write address associated with said
plural word signal.
15. Apparatus, as set forth in claim 14, wherein said write buffer
means includes storage registers for storing said write addresses, said
storage registers
have respective sequential addresses, and wherein said means for
comparing includes:
shifting means for obtaining indications of ones of said
registers storing last consecutive write addresses associated with said
plural word signal and
shifting said indications in response to the address of the
storage register storing the write address of the next resultant data to
be delivered by said
means for executing to obtain a series of indications
beginning with an indication of whether the write address of the next
resultant data to be
delivered by said means for executing is the last consecutive
write address associated with said plural word signal, and
priority encoder means responsive to said series of
indications to obtain a relative address of the register storing the
next last consecutive write
address associated with said plural word signal, said relative
address being relative to the address of the storage location storing
the write address
of the next resultant to be delivered by said means for
executing, and
means for translating said relative address by the address of
the storage location storing the write address of the next resultant to
be delivered by
said means for executing, to thereby obtain the address of the
register storing the last consecutive write address associated with said
plural word
signal.
16. A method of controlling memory access during execution of
memory access instructions in a program for a pipelined processor, said
pipeline
processor including: a memory having a plurality of addressable
storage locations for storing data at a specified one of said
addressable storage locations
in response to receiving said data and the corresponding address:
an operand processing unit for preprocessing said instructions including
means for
fetching an operation code and at least one operand specifier for
each instruction and delivering a specified address and at least one
read/write signal in
response to decoding, said read/write signal indicating whether
said address is a read address for a specified source operand or a write
address for a
specified destination operand, and an execution unit responsive to
said operation code for performing an operation corresponding to said
operation code
upon the specified operand when said specified operand is a source
operand, and delivering resultant data to said memory when said operand
is a
destination operand; said method including the steps of:
(a) inserting said write addresses delivered by said means for
preprocessing in a first-in first-out queue until said execution unit
delivers the
corresponding resultant data to said memory, and thereupon
removing said write addresses from said queue and storing the
corresponding
resultant data in said memory at the write addresses removed
from said queue; and
(b) comparing said read addresses delivered by said operand
processing unit with each of the addresses stored in the queue, and
stalling said
operand processing unit when at least one of the write
addresses in the queue is equivalent to one of the read addresses,
whereby the fetching of
source operands from said memory is delayed until the
corresponding write buffer addresses have been delivered to said memory
and the
resultant data corresponding to said write buffer addresses
are available from said executing means.
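
Claims 2 and 3 describe the write buffer as a FIFO of registers with an insert pointer and a remove pointer, plus invalidation of the stored addresses on a flush signal after a mispredicted branch. A hedged Python sketch of that structure might look like the following. The class and method names are hypothetical, the buffer size is arbitrary, and flushing every entry at once is a simplification of whatever the real hardware does.

```python
class WriteAddressFifo:
    """Circular buffer of write addresses with insert and remove pointers."""

    def __init__(self, size=8):
        self.registers = [None] * size
        self.insert_ptr = 0  # register where the next write address goes
        self.remove_ptr = 0  # register holding the oldest write address
        self.count = 0       # number of valid entries

    def insert(self, write_address):
        if self.count == len(self.registers):
            raise OverflowError("write buffer full")
        self.registers[self.insert_ptr] = write_address
        self.insert_ptr = (self.insert_ptr + 1) % len(self.registers)
        self.count += 1

    def remove(self):
        """Pop the address that matches the result data just delivered."""
        if self.count == 0:
            raise IndexError("write buffer empty")
        address = self.registers[self.remove_ptr]
        self.registers[self.remove_ptr] = None
        self.remove_ptr = (self.remove_ptr + 1) % len(self.registers)
        self.count -= 1
        return address

    def conflicts(self, read_address):
        # Walk the valid entries from oldest to newest and compare each one.
        size = len(self.registers)
        return any(
            self.registers[(self.remove_ptr + i) % size] == read_address
            for i in range(self.count)
        )

    def flush(self):
        # Mispredicted branch: invalidate every stored write address
        # (simplification: all entries are dropped at once).
        self.registers = [None] * len(self.registers)
        self.insert_ptr = 0
        self.remove_ptr = 0
        self.count = 0
```

In hardware the comparison in `conflicts` would be done against all registers in parallel; the loop here only models the result of that comparison.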

Andy Glew

May 15, 1997, 3:00:00 AM

INVENTORS:
Rubinfeld; Paul I., Wayland, MA

ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:
Feb. 25, 1992

FILED:
Aug. 11, 1989
SERIAL NUMBER:
392783

FEE STATUS:

INTL. CLASS (Ed. 5):
G06F 13/00;
U.S. CLASS:
395-425; 364-243.41; 364-259.4; 364-264.5; 364-DIG.1;

FIELD OF SEARCH:
364-200 MS File,900 MS File ;
AGENTS:

Fish & Richardson;


ABSTRACT: The invention provides a system for controlling the storage of
information in a cache memory and features a processor to be connected
to a bus, the bus including information signal transfer lines for
transferring information signals and a cache control signal transfer
line for transferring a cache control signal having a plurality of
conditions, the processor including a cache memory and a bus interface
circuit connected to the cache memory and for connection to the bus, the
bus interface circuit including: i. an information signal transfer
circuit for performing a read operation in which it receives information
signals from the information signal transfer lines, the information
signal transfer circuit transferring the received information signals to
the cache memory; and ii. a cache control circuit connected to the cache
memory and the information signal transfer circuit and for connection to
the cache control signal transfer line for controlling whether the
received information is to be stored in the cache memory in response to
the condition of the cache control signal.
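
Stripped of the claim language, the abstract describes a read in which the responding unit also drives a cache control line, and the processor's bus interface stores the returned data in its cache only when that line says it may. A small Python sketch of the idea follows. Everything named here (`BusRead`, `SourceUnit`, `Processor`, the `uncacheable_addresses` set) is a hypothetical illustration, and the two-valued `cacheable` flag stands in for a signal the patent says has "a plurality of conditions".

```python
from dataclasses import dataclass


@dataclass
class BusRead:
    address: int
    data: int
    cacheable: bool  # condition of the cache control signal for this transfer


class SourceUnit:
    """Memory or I/O unit that answers reads and drives the cache control line."""

    def __init__(self, contents, uncacheable_addresses=()):
        self.contents = dict(contents)
        self.uncacheable = set(uncacheable_addresses)

    def read(self, address):
        return BusRead(address, self.contents[address],
                       cacheable=address not in self.uncacheable)


class Processor:
    def __init__(self, source):
        self.source = source
        self.cache = {}

    def load(self, address):
        if address in self.cache:
            return self.cache[address]
        transfer = self.source.read(address)  # information signal transfer
        if transfer.cacheable:                # cache control circuit decides
            self.cache[address] = transfer.data
        return transfer.data
```

The design choice worth noticing is that the decision lives with the unit supplying the data, so something like a device register can mark its own reads as not to be cached without the processor keeping a table of such addresses.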

5091845 : System for controlling the storage of information in a cache
memory


25 CLAIMS:

What is claimed as new and desired to be secured by Letters Patent of
the United States is:

1. A processor for connection to a bus, said bus including
information signal transfer lines for transferring data and address
information signals from a
source unit connected to said bus and a cache control signal
transfer line for transferring a cache control signal having a plurality
of conditions from said
source unit, said processor including:
A. a cache memory and
B. a bus interface circuit connected to said cache memory and
for connection to said bus, said bus interface circuit including:
i. an information signal transfer circuit for performing
a read operation in which it receives said data and address information
signals from
said information signal transfer lines, said information
signal transfer circuit transferring said received information signals
to said cache
memory; and
ii. a cache control circuit connected to said cache
memory and said information signal transfer circuit and for connection
to said cache
control signal transfer line for controlling whether said
received information signals are to be stored in said cache memory in
response to the
condition of said cache control signal,
whereby said source unit issues said cache control signal for
controlling encacheability of said data information signals that said
source unit
transfers over said bus.
2. The processor of claim 1 wherein said source unit issues said
cache control signal contemporaneously with said data and address
information signals.
3. A digital data processing system comprising a processor and at
least one other unit interconnected by a bus, said bus including
information signal
transfer lines and a cache control signal transfer line,
A. said other unit including:
i. an information transfer circuit connected to said
information signal transfer lines for transmitting information signals
thereover to said
processor in a read operation; and
ii. a cache control signal transmitting circuit for
transmitting a cache control signal on said cache control line
contemporaneous with the
transmission of said information signals by said
information transfer circuit;
B. said processor including:
i. a cache memory and
ii. a bus interface circuit connected to said cache
memory and for connection to said bus, said bus interface circuit
including:
a. an information signal transfer circuit for performing a
read operation in which it receives said information signals from said
information signal
transfer lines, said information signal transfer circuit
transferring said received information signals to said cache memory; and
b. a cache control circuit connected to said cache memory and
said information signal transfer circuit and for connection to said
cache control
signal transfer line for controlling whether said received
information signals are to be stored in said cache memory in response to
the condition of
said cache control signal.
4. A digital data processing system as recited in claim 3, wherein
said bus further includes arbitration signal transfer lines, and
said other unit further includes an arbitration circuit for
performing an arbitration operation in connection with said arbitration
lines and generating
in response thereto an arbitration determination, and
said information transfer circuit performs an information
transfer in response to the arbitration determination.
5. A digital data processing system as recited in claim 4, wherein
the arbitration signal transfer lines comprise a request signal transfer
line and a grant
signal transfer line, and
said arbitration circuit includes a transfer request circuit
for asserting a transfer request signal over the request signal transfer
line, and a grant
receiving circuit for receiving a transfer grant signal over
the grant signal transfer line and for generating in response thereto
the arbitration
determination, and
said bus interface circuit further includes a grant circuit
for receiving the transfer request signal over the request signal
transfer line, and for
asserting the transfer grant signal over the grant signal
transfer line in response to the receipt of the request signal.
6. A digital data processing system as recited in claim 3, wherein
said bus further includes a transfer type signal transfer line and
control lines, and
said bus interface circuit further includes a transfer type
signal transfer circuit for transmitting a transfer type signal over the
transfer type signal
transfer line to indicate the direction of a data transfer,
and a control signal circuit for transmitting a control signal over the
control signal transfer
lines, and
said other unit further includes a control circuit for
receiving the transfer type signal and the control signal, and
controlling the transfer of
information by said information transfer circuit in response
thereto.
7. A digital data processing system as recited in claim 6, wherein
said information signals comprise both data and address signals and said
control signal
lines comprise an address strobe signal transfer line and a data
strobe signal transfer line, and
said control signal circuit on said bus interface circuit
includes an address strobe signal transfer circuit for transmitting an
address strobe signal over
the address strobe signal transfer line contemporaneous with
the transfer of address signals on said information transfer lines, and
a data strobe
signal transfer circuit for transmitting a data strobe signal
over the data strobe signal transfer line, and
said control circuit on said other unit receives the address
strobe signal to enable the information transfer circuit to receive
address signals which
identify the location at which a transfer is to take place,
and a data strobe signal to enable the information transfer circuit to
perform a transfer with
respect to the location identified by the address signals.
8. A digital data processing system as recited in claim 7, wherein
the transfer type signal indicates a write operation, and said bus
interface circuit
transmits the data strobe signal contemporaneous with the transfer
of said signals over said information transfer line.
9. A digital data processing system as recited in claim 7, wherein
said bus further includes a ready signal transfer line, and
said control circuit on said other unit further includes a
ready signal transfer circuit for transmitting a ready signal over said
ready signal transfer line
to indicate that the information transfer circuit has received
a data transfer from the processor successfully, and
said bus interface circuit further includes a ready signal
transfer circuit for receiving the ready signal and negating the data
strobe signal in response
thereto.
10. A digital data processing system, as recited in claim 7,
wherein the transfer type signal indicates a read operation, and said
information transfer circuit
transmits data in response to receipt of the transfer type and
address strobe signals.
11. A digital data processing system as recited in claim 7, wherein
said bus further includes a ready signal transfer line, and
said control circuit on said other unit further includes a
ready signal transfer circuit for transmitting a ready signal over the
ready signal transfer line
to indicate that said information transfer circuit has
transferred data, and
said bus interface circuit further includes a ready signal
transfer circuit for receiving the ready signal and asserting the data
strobe signal in response
thereto.
12. A digital data processing system as recited in claim 3, wherein
said bus further includes an error signal transfer line, and
said other unit further includes an error signal transfer
circuit for transmitting an error signal over the error signal transfer
line to indicate that the
information transfer circuit has not received a data transfer
from the processor successfully, and
said bus interface circuit further includes an error signal
receiving circuit for receiving the error signal, said processor
performing an error recovery
operation in response thereto.
13. A digital data processing system as recited in claim 3, wherein
said bus further includes an error signal transfer line, and
said other unit further includes an error signal transfer
circuit for transmitting an error signal over the error signal transfer
line to indicate that the
information transfer circuit has not transferred data
successfully, and
said bus interface circuit further includes an error signal
receiving circuit for receiving the error signal, said processor
performing an error recovery
operation in response to the error signal.
14. A processor as recited in claim 3, wherein said bus further
includes arbitration signal transfer lines, and
said information signal transfer circuit is inhibited from
performing an information
transfer in response to signals received over the arbitration
signal transfer lines.
15. A digital data processing system as recited in claim 14,
wherein the arbitration signal transfer lines comprise a request signal
transfer line and a grant
signal transfer line, and
said bus interface circuit further includes a grant circuit
for receiving a transfer request signal over said request signal
transfer line, and for asserting
a transfer grant signal over said grant signal transfer line
in response to receipt of the request signal.
16. A processor as recited in claim 3, wherein said bus further
includes a transfer type signal transfer line and control signal lines,
and
said bus interface circuit further includes a transfer type
signal transfer circuit for transmitting a transfer type signal over the
transfer type signal
transfer line to indicate the direction of a data transfer,
and a control signal circuit for transmitting a control signal over the
control signal transfer
lines.
17. A digital data processing system as recited in claim 16,
wherein said information signals comprise both data and address signals
and said control
signal lines comprise an address strobe signal transfer line and a
data strobe signal transfer line, and
said control signal circuit on said bus interface circuit
includes an address strobe signal transfer circuit for transmitting an
address strobe signal over
the address strobe signal transfer line contemporaneous with
the transfer of address signals on the information transfer lines, and a
data strobe
signal transfer circuit for transmitting a data strobe signal
over the data strobe signal transfer line.
18. A processor as recited in claim 17, wherein said transfer type
signal indicates a write operation, and said bus interface circuit
transmits the data
strobe signal contemporaneous with the transfer of data signals on
the information transfer line.
19. A digital data processing system as recited in claim 17,
wherein said bus further includes a ready signal transfer line, and
said bus interface circuit further includes a ready signal
transfer circuit for receiving a ready signal and negating the data
strobe signal in response
thereto.
20. A digital data processing system as recited in claim 17,
wherein said bus further includes an error signal transfer line, and
said bus interface circuit further includes an error signal
transfer circuit for receiving an error signal, said processor
performing an error recovery
operation in response thereto.
21. A processor as recited in claim 17, wherein said transfer type
signal indicates a read operation, and the information signal transfer
circuit receives
data in response to transmission of the transfer type and address
strobe signals.
22. A processor as recited in claim 17, wherein said bus further
includes a ready signal transfer line, and
said bus interface circuit further includes a ready signal
transfer circuit for receiving a ready signal and asserting the data
strobe signal in response
thereto.
23. A processor as recited in claim 17, wherein said bus further
includes an error signal transfer line, and
said bus interface circuit further includes an error signal
transfer circuit for receiving an error signal, said processor
performing an error recovery
operation in response to said error signal.
24. A processor for connection to a bus, said bus including
information transfer lines for transferring data and address information
signals from a source
unit connected to said bus, a cache control signal transfer line
for transferring a cache control signal having a plurality of conditions
from said source unit,
a transfer type signal transfer line, an address strobe signal
transfer line, a data strobe signal transfer line, a ready signal
transfer line, an error signal
transfer line, and arbitration signal transfer lines, said
processor including:
A. a cache memory and
B. a bus interface circuit connected to said cache memory and
for connection to said bus, said bus interface circuit including:
i. an information signal transfer circuit for receiving
said data and address information signals from said information transfer
lines and
transferring the received information signals to said
cache memory,
ii. a cache control circuit connected to said cache
memory and said information signal transfer circuit and for connection
to said cache
control signal transfer line for controlling whether said
information signals received by said information signal transfer circuit
are to be stored
in said cache memory in response to the condition of said
cache control signal, whereby said source unit issues said cache control
signal for
controlling encacheability of said data information
signals that said source unit transfers over said bus,
iii. a transfer type signal transfer circuit for
transmitting a transfer type signal to indicate the direction of data
transfer,
iv. an address strobe signal transfer circuit for
transmitting an address strobe signal over the address strobe signal
transfer line
contemporaneous with the transfer of address signals on
said information transfer lines, and a data strobe signal transfer
circuit for
transmitting a data strobe signal over the data strobe
signal transfer line,
v. a ready signal transfer circuit for receiving a ready
signal to indicate that a data transfer was successful, and negating the
data strobe
signal in response thereto,
vi. an error signal transfer circuit for receiving an
error signal and performing an error recovery operation in response
thereto, and
vii. an arbitration circuit for performing an arbitration
operation for controlling transfer over the bus, said arbitration
circuit controlling said
information transfer circuit in response to said
arbitration operation.
25. The processor of claim 24 wherein said source unit issues said
cache control signal contemporaneously with said data and address
information
signals.
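
For anyone who finds the claim language hard to parse, here is a rough Python sketch of the one idea in claim 24 that stands out: the source unit, not the processor, decides whether transferred data may be cached. This is an informal illustration, not a legal reading of the claim and not DEC's implementation. The names BusTransfer, Processor, and cacheable are made up for this example, and the "plurality of conditions" of the cache control signal is reduced to a single boolean.

```python
from dataclasses import dataclass, field


@dataclass
class BusTransfer:
    """One transfer from a source unit (all field names are hypothetical)."""
    address: int
    data: int
    cacheable: bool  # stand-in for the condition of the cache control signal


@dataclass
class Processor:
    """Processor side: a cache plus a bus interface that honors the signal."""
    cache: dict = field(default_factory=dict)

    def receive(self, xfer: BusTransfer) -> int:
        # The cache control circuit stores the data only when the source
        # unit marked the transfer as cacheable.
        if xfer.cacheable:
            self.cache[xfer.address] = xfer.data
        return xfer.data


cpu = Processor()
cpu.receive(BusTransfer(address=0x100, data=7, cacheable=True))
cpu.receive(BusTransfer(address=0x200, data=9, cacheable=False))  # e.g. an I/O register
```

After these two transfers only address 0x100 is in the cache. Data that the source unit knows may change behind the processor's back never gets cached.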

Andy Glew

May 15, 1997, 3:00:00 AM

5148536 : Pipeline having an integral cache which processes cache misses
and loads data in
parallel

INVENTORS:
Witek; Richard T., Littleton, MA
Williams; Douglas D., Pepperell, MA
Stanley; Timothy J., Leominster, MA
Fenwick; David M., Chelmsford, MA
Burns; Douglas J., Billerica, MA
Stamm; Rebecca L., Newton, MA
Heye; Richard, Somerville, MA


ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:

Sep. 15, 1992

FILED:
July 25, 1988
SERIAL NUMBER:
224483

FEE STATUS:

INTL. CLASS (Ed. 5):
G06F 9/38;
U.S. CLASS:
395-425; 364-243.41; 364-247; 364-248.6; 364-231.8;
364-DIG.1;
FIELD OF SEARCH:
364-300,200 MS File,900 MS File ; 395-425 ;
AGENTS:
Kenyon & Kenyon;


ABSTRACT: A load/store pipeline in a computer processor for loading
data to registers and storing data from the registers has a cache memory
within the
pipeline for storing data. The pipeline includes buffers which support
multiple outstanding read request misses. Data from out of the pipeline
is obtained
independently of the operation of the pipeline, this data corresponding
to the request misses. The cache memory can then be filled with the data
that has been
requested. The provision of a cache memory within the pipeline, and the
buffers for supporting the cache memory, speed up loading operations for
the
computer processor.
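
The abstract above can be illustrated with a small Python sketch of a load pipeline that keeps issuing loads while earlier misses are still outstanding. It is a toy model under stated assumptions, not the patented design: LoadPipeline, load, and fill_one are hypothetical names, memory is a plain dict, and each read miss buffer holds just an address and a target register.

```python
class LoadPipeline:
    """Toy model of a load pipeline with read miss buffers (hypothetical)."""

    def __init__(self, memory, num_miss_buffers=2):
        self.memory = memory          # backing store outside the pipeline
        self.cache = {}               # cache memory inside the pipeline
        self.registers = {}
        self.miss_buffers = []        # outstanding misses: (address, register)
        self.capacity = num_miss_buffers

    def load(self, address, register):
        """Issue one load; returns 'hit', 'miss', or 'stall'."""
        if address in self.cache:
            self.registers[register] = self.cache[address]
            return "hit"
        if len(self.miss_buffers) >= self.capacity:
            return "stall"            # every miss buffer is already in use
        # Record the miss and let the pipeline carry on with later loads.
        self.miss_buffers.append((address, register))
        return "miss"

    def fill_one(self):
        """Complete the oldest outstanding miss: fill cache and register."""
        if not self.miss_buffers:
            return False
        address, register = self.miss_buffers.pop(0)
        value = self.memory[address]
        self.cache[address] = value
        self.registers[register] = value
        return True


pipe = LoadPipeline(memory={0: 10, 1: 11, 2: 12})
results = [pipe.load(0, "r0"), pipe.load(1, "r1"), pipe.load(2, "r2")]
```

With two miss buffers, the first two loads are recorded as misses and the third has to wait. Stalling only when the buffers are full is an assumption of this sketch, loosely modeled on step e) of claim 21.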


27 CLAIMS:

What is claimed is:

1. A load/store pipeline in a vector processor for loading data
entries to registers and storing data entries from said registers, the
pipeline comprising:
a cache memory for storing data entries;
a data loading device which loads data entries from said cache
memory to said registers and which stores data from said registers into
said cache
memory with loading and storing data entries from and to said
cache memory being pipelined;
a determining device which determines when one of said data
entries is a miss;
a plurality of read miss buffers coupled to said miss
determining device which store command addresses corresponding to said
missed data entries
and store data entries from out of said pipeline independently
of the pipelining of data entries to and from the registers of the
vector processor,
said stored data entries corresponding to said command
addresses;
a controller utilizing said stored command addresses to obtain
and store said corresponding missed data entries in said read miss
buffers
independently of the pipelining of data entries to and from
the registers; and
a filling device, in parallel with said data loading device,
which fills the cache memory with said stored data entries and writes
said stored data
entries to said registers,
said data loading device continuing to operate when said
determining device determines one of said data entries is a miss.
2. The load/store pipeline of claim 1, wherein said read miss
buffers include buffers which store said obtained data entries, receive
said command
address data entries in arbitrary order from out of said pipeline
and fill the cache memory with the data entries in the order the command
addresses were
stored in said read miss buffers.
3. The load/store pipeline of claim 1, wherein said determining
device includes: a cache tag look up stage for parsing a physical
address into a cache
index and tag compare data; a tag store coupled to the cache tag
look up stage for storing tags corresponding to addresses in said cache,
said tag store
receiving said cache index and outputting corresponding tag data;
and a comparator coupled to said tag store and said cache tag look up
stage for
receiving said tag data and said tag compare data and generating a
hit signal when said tag data and said tag compare data match, and a
miss signal
when said tag data and said tag compare data do not match.
4. The load/store pipeline of claim 3, further comprising a delay
coupled between said cache tag look up stage and said cache memory which
delays
receipt of said cache tag address by said cache memory.
5. The load/store pipeline of claim 4, further comprising an
address generator coupled to said cache tag look up stage for receiving
load and store
commands and generating pipe entries which include virtual
addresses.
6. The load/store pipeline of claim 5, further comprising a
translation buffer coupled between said cache tag lookup stage and said
address generator for
translating said virtual addresses into physical addresses.
7. The load/store pipeline of claim 1, wherein said read miss
buffers include buffers that store said command addresses, receive said
command address
data in arbitrary order from out of said pipeline and fill the
cache memory with the data in the order the command addresses were
stored in said read
miss buffers.
8. The load/store pipeline of claim 7, wherein said cache memory is
coupled to said cache look up stage so as to receive a cache index.
9. The load/store pipeline of claim 1, further comprising a
load/store controller for controlling the pipeline into either a load
mode, a store mode or a
cache fill mode.
10. The load/store pipeline of claim 1, wherein said loading device
includes a cache data bus coupled to said cache memory and to said
registers which
transfers data entries between said cache memory and said
registers; and said filling device includes an internal data bus
coupling said read miss buffers
to said cache data bus and carrying said obtained data entries.
11. The load/store pipeline of claim 1, wherein said read miss
buffers include obtaining devices which obtain additional data entries
from out of said
pipeline when said data entry corresponding to said command address
is obtained, and wherein said filling device fills the cache memory with
said
obtained data entry and said additional data entries.
12. The load/store pipeline of claim 11, wherein said data entries
are longwords, and said additional data entries are longwords within an
aligned
hexaword which contains a longword corresponding to said obtained
data.
13. The load/store pipeline of claim 1, wherein said data is a
quadword having two longwords.
14. The load/store pipeline of claim 13, wherein said entries
contain information regarding alignment of said longwords of a quadword
such that
nonaligned longwords are storable in said registers.
15. The load/store pipeline of claim 14, wherein said information
is cycle type information and said register is presented by said
pipeline with an address
and an address plus 1 for each said quadword.
16. The load/store pipeline of claim 1, further comprising a write
buffer coupled to said cache memory which buffers data entries
corresponding to a
storing operation out of said pipeline, said write buffer having a
capacity to store contents of an entire register, such that an operation
subsequent to said
storing operation can be performed in said pipeline while said data
entries corresponding to said storing operation are buffered out of said
pipeline.
17. The load/store pipeline of claim 16, wherein said write buffer
is a dynamically configurable write buffer which is configurable during
storing
operations to accommodate different data formats.
18. A method of loading data from a cache to registers in a vector
processor pipeline, the method comprising the steps of:
a) requesting a data block from said cache;
b) checking whether there is a hit in said cache for said
block of data to be loaded, said hit indicating that said data block is
in said cache, and a
miss indicating said data block is not in said cache;
c) driving said data block from said cache to said registers
when there is a hit; and
d) obtaining said data block from a memory when there is a
miss and writing said obtained data block into said cache and said
registers, wherein
steps a, b, and c are repeated for subsequent data blocks
after the step of checking for said data block indicates a miss; and
wherein the step of
the obtaining of said data block is performed during the
repeating of steps a, b, and c for said subsequent data blocks.
19. The method of claim 18, wherein step d) includes the steps of
sending a command address to at least one read miss buffer, receiving in
said read
miss buffer said data block from the memory over an external bus,
and sending said data block to said cache and said registers via an
internal data bus.
20. The method of claim 19, wherein said data block includes a
plurality of individual data elements, and said read miss buffer sends
said data block only
when all said data elements of said data block are received from
the memory.
21. The method of claim 20, wherein a plurality of read miss
buffers are provided, and further comprising the steps of: e) halting
the repeating of steps a,
b and c when all of said read miss buffers contain a command
address; f) writing an obtained data block in one of said read miss
buffers into said cache
and said registers; and g) repeating steps a, b, c and d until all
said read miss buffers contain command addresses and then repeating
steps e, f and g,
until all of said data blocks are loaded into said registers.
22. The method of claim 19, wherein step d includes obtaining said
data block and further data blocks when there is a miss, and said read
miss buffer
sends said data block and said further data blocks to said cache
and said registers, with said data block and said further data blocks
being filled in said
cache and only said data block being written into said registers.
23. The method of claim 18, wherein said loading of data occurs
simultaneously with a portion of a previous storing operation in which
data blocks are
sent to said memory.
24. The method of claim 23, further comprising the step of halting
said storing operation when there is a miss during the loading
operation.
25. A pipeline in a computer processor for loading data entries in
registers and storing data entries from said registers, the pipeline
comprising:
a cache memory for storing data;
a loading device which loads data entries from said cache
memory to said registers and which stores data entries from said
registers in said cache
memory with loading and storing data entries from and to said
cache memory being pipelined;
a determining device which determines when one of said data
entries is a miss;
a controller operating to obtain a return data entry
corresponding to the one of the data entries determined to be a miss,
the controller operating
independently of the pipelining of data entries to and from
the registers;
a storage device coupled to the controller for storage of the
return data entry;
a filling device coupled to the storage device which fills the
cache memory with said return data entry, said pipeline continuing to
operate without
stalling in the presence of missed data entries.
26. The pipeline of claim 25, further including a controller which
aborts instructions within said pipeline when one of said missed data
entries corresponds
to a load instruction and one of said data entries within said
pipeline corresponds to a store instruction subsequent to said load
instruction.
27. The pipeline of claim 26, further including a content
addressable memory which compares information in said missed data
entries which correspond
to said load instructions with information in said data entries
which correspond to said store instruction and causes said controller to
abort said
instructions within said pipeline based on said comparison.
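
Claim 3 above describes the hit/miss check in hardware terms: split the physical address into a cache index and tag compare data, read the tag store at that index, and compare. A minimal Python sketch of that lookup follows. The bit widths INDEX_BITS and OFFSET_BITS are assumptions chosen for the example (the claim gives no sizes), and TagStore and split_address are hypothetical names.

```python
INDEX_BITS = 4    # assumed: 16 cache lines
OFFSET_BITS = 3   # assumed: 8 bytes per line


def split_address(physical_address):
    """Parse a physical address into (cache index, tag compare data)."""
    index = (physical_address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = physical_address >> (OFFSET_BITS + INDEX_BITS)
    return index, tag


class TagStore:
    """Direct-mapped tag store with a comparator (hypothetical model)."""

    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)

    def lookup(self, physical_address):
        """Return True for a hit signal, False for a miss signal."""
        index, tag = split_address(physical_address)
        return self.tags[index] == tag

    def fill(self, physical_address):
        """Record the tag once the line has been brought into the cache."""
        index, tag = split_address(physical_address)
        self.tags[index] = tag


store = TagStore()
first = store.lookup(0x1234)    # nothing cached yet, so this misses
store.fill(0x1234)
second = store.lookup(0x1234)   # same index, same tag: hit
third = store.lookup(0x12B4)    # same index, different tag: miss
```

A direct-mapped organization is the simplest choice that matches the wording of the claim. The patent text quoted here does not say whether the real cache was direct-mapped.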

Andy Glew

May 15, 1997, 3:00:00 AM

5179673 : Subroutine return prediction mechanism using ring buffer and
comparing predicated
address with actual address to validate or flush the pipeline

INVENTORS:
Steely, Jr.; Simon C., Hudson, NH
Sager; David J., Acton, MA


ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:

Jan. 12, 1993

FILED:
Dec. 18, 1989
SERIAL NUMBER:
451943

FEE STATUS:

INTL. CLASS (Ed. 5):
G06F 9/40; G06F 9/42;
U.S. CLASS:
395-375; 364-238.8; 364-DIG.1; 364-231.8; 364-261.3;
FIELD OF SEARCH:
395-375,800 ;
AGENTS:
Kenyon & Kenyon;


ABSTRACT: A method and arrangement for producing a predicted
subroutine return address in response to entry of a subroutine return
instruction in a
computer pipeline that has a ring pointer counter and a ring buffer
coupled to the ring pointer counter. The ring pointer counter contains a
ring pointer that is
changed when either a subroutine call instruction or return instruction
enters the computer pipeline. The ring buffer has buffer locations which
store a value
present at its input into the buffer location pointed to by the ring
pointer when a subroutine call instruction enters the pipeline. The ring
buffer provides a value
from the buffer location pointed to by the ring pointer when a
subroutine return instruction enters the computer pipeline, this
provided value being the predicted
subroutine return address.
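
The mechanism in this abstract is essentially a small circular stack of return addresses. Here is a hedged Python sketch of it. ReturnPredictor and needs_flush are hypothetical names, addresses are plain integers, and the stored value is "call address plus one" as in claim 5(a). Incrementing the pointer before storing is an assumption; the abstract only says the pointer changes on calls and returns.

```python
class ReturnPredictor:
    """Ring buffer of predicted return addresses (illustrative only)."""

    def __init__(self, size=8):
        self.buffer = [0] * size
        self.pointer = 0              # the ring pointer

    def on_call(self, call_address):
        # Advance the ring pointer, then store the return address there.
        self.pointer = (self.pointer + 1) % len(self.buffer)
        self.buffer[self.pointer] = call_address + 1

    def on_return(self):
        # Provide the most recently stored value, then step back by one.
        predicted = self.buffer[self.pointer]
        self.pointer = (self.pointer - 1) % len(self.buffer)
        return predicted


def needs_flush(predicted, actual):
    """Mis-comparison signal: the pipeline must be flushed if they differ."""
    return predicted != actual


# Nested calls that fit in the ring are predicted correctly.
rp = ReturnPredictor(size=4)
rp.on_call(100)
rp.on_call(200)
inner, outer = rp.on_return(), rp.on_return()

# Three nested calls in a two-entry ring overwrite the oldest entry.
small = ReturnPredictor(size=2)
for call_address in (10, 20, 30):
    small.on_call(call_address)
predictions = [small.on_return() for _ in range(3)]
```

The second case shows why the comparison step matters: once the ring wraps, the oldest prediction is stale, and only the check against the actual return address catches it.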


5 CLAIMS:

What is claimed:

1. An arrangement for producing a predicted subroutine return
address in response to entry of a subroutine return instruction in a
computer pipeline,
comprising:
(a) means for detecting the entry into the computer pipeline
of a subroutine call instruction or the subroutine return instruction;
(b) a ring pointer counter that contains a ring pointer that
is incremented when the subroutine call instruction enters the computer
pipeline, and is
decremented when the subroutine return instruction enters the
computer pipeline;
(c) a ring buffer coupled to said ring pointer counter and
having buffer locations, an input and an output, said ring buffer
storing a value present at
said input into the buffer location pointed to by said ring
pointer when the subroutine call instruction enters the computer
pipeline and providing a
value at said output from the buffer location pointed to by
the ring pointer when the subroutine return instruction enters the
computer pipeline, said
value at said output being the predicted subroutine return
address; and
(d) a comparison unit coupled to said ring buffer and to the
computer pipeline, the comparison unit comparing an actual return
address produced
by the computer pipeline in response to the processing of the
subroutine return instruction with the predicted return address for that
return
instruction, and having an output at which is provided a
mis-comparison signal when the actual return address is not the same as
the predicted
return address, the mis-comparison signal being coupled to the
computer pipeline to cause the computer pipeline to flush the computer
pipeline
when the actual return address is not the same as the
predicted return address.
2. A computer pipeline comprising:
a) an instruction cache which stores coded instructions and
has an input that receives a program count which indexes the coded
instructions, and
an output at which the indexed coded instructions are
provided;
b) an instruction fetching decoder having an input coupled to
the instruction cache output and which decodes the coded instructions,
and having as
outputs:
i) a subroutine call address when the coded instruction
is a subroutine call instruction,
ii) a multiplexer control signal which indicates whether the
coded instruction is a return instruction, the call instruction or
neither,
iii) a ring pointer counter control signal which is a
first value when the coded instruction is a return instruction and a
second value when the
coded instruction is the call instruction, and
iv) a decoded instruction;
b) an execution stage coupled to the instruction fetching
decoder which executes the decoded instruction;
c) a program counter coupled to the input of the instruction
cache and having an output at which is provided the program count to the
instruction
cache input;
d) a multiplexer having a plurality of inputs, a control input
coupled to the multiplexer control signal output of the instruction
fetching decoder, and
an output coupled to the program counter input;
e) an adder having an input coupled to the output of the
program counter and an output coupled to one of said multiplexer inputs
at which is
provided a value equal to the program count plus one;
f) a ring pointer counter having an input coupled to the
instruction fetching decoder to receive the ring pointer counter control
signal, and
containing a ring pointer which points to buffer locations in
response to the ring pointer counter control signal, said ring pointer
being incremented
when the instruction fetching decoder decodes a subroutine
call instruction and being decremented when the instruction fetching
decoder decodes
a subroutine return instruction;
g) a ring buffer having an input coupled to the adder output,
a plurality of buffer locations, and an output coupled to one of said
multiplexer inputs,
said ring buffer storing said value received from the adder
output as a return value in the buffer location pointed to by said ring
pointer when a
subroutine call instruction is decoded and providing said
return value from the buffer location pointed to by the ring pointer at
the ring buffer output
when a subroutine return instruction is decoded, said return
value at said ring buffer output being the predicted subroutine return
address;
h) the multiplexer operating to output:
i) the subroutine call address when the multiplexer
control signal is indicative of the subroutine call instruction,
ii) the predicted return address when the multiplexer control signal is
indicative of the subroutine return instruction, and
iii) the output of the adder when the multiplexer control
signal is indicative of neither the subroutine call instruction nor the
subroutine return
instruction; and
i) a comparison unit coupled to said ring buffer and to said
execution stage, the comparison unit comparing an actual return address
produced by
the execution stage in response to the processing of the
return instruction with the predicted return address for that return
instruction, and having
an output at which is provided a mis-comparison signal when
the actual return address is not the same as the predicted return
address, the
mis-comparison signal being coupled to the computer pipeline
to cause the computer pipeline to flush the computer pipeline when the
actual return
address is not the same as the predicted return address.
3. The pipeline of claim 2, further comprising means for processing
a correct sequence of instructions beginning with the instruction
indicated by the
actual return address, the means coupled to the instruction cache
and the ring pointer counter and, when the mis-comparison signal
indicates that the
predicted return address and the actual return address do not
match, operating to input the actual return address into the instruction
cache and to return
the ring pointer counter to its pre-trap state.
4. The pipeline of claim 2, further comprising a confirmed ring
pointer counter coupled to the execution stage and the ring pointer
counter, and containing
a confirmed ring pointer that is incremented when the execution
stage receives a subroutine call instruction and is decremented when the
execution stage
receives a subroutine return instruction, and which provides the
confirmed ring pointer to the ring pointer counter to replace the ring
pointer when a trap
occurs.
5. A method of predicting subroutine return addresses in a
pipelined computer comprising:
(a) storing in one of a plurality of buffer locations in a
ring buffer a value equal to one plus an address of a call instruction
in response to that call
instruction;
(b) pointing to the buffer location containing the most
recently stored value;
(c) providing the pointed to most recently stored value as an
output in response to a return instruction, said output being the
predicted subroutine
return address;
(d) pointing to the buffer location containing the next most
recently stored value;
(e) comparing an actual return address produced by the
computer pipeline in response to the processing of the return
instruction with the
predicted return address for that return instruction to
determine whether the predicted return address is valid; and
(f) when the determination indicates that the predicted return
address is not valid, generating a mis-comparison signal to cause a
flush of the
computer pipeline.
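
Claim 4 adds a second, "confirmed" ring pointer that only moves when calls and returns reach the execution stage, so the speculative pointer can be repaired after a trap. A short Python sketch of that bookkeeping follows. RingPointers and its method names are hypothetical, and the pipeline is reduced to explicit decode and execute events.

```python
class RingPointers:
    """Speculative and confirmed ring pointers (illustrative only)."""

    def __init__(self, size=8):
        self.size = size
        self.speculative = 0   # updated when instructions are decoded
        self.confirmed = 0     # updated when instructions are executed

    def _step(self, pointer, kind):
        if kind == "call":
            return (pointer + 1) % self.size
        if kind == "return":
            return (pointer - 1) % self.size
        return pointer         # any other instruction leaves it alone

    def decode(self, kind):
        self.speculative = self._step(self.speculative, kind)

    def execute(self, kind):
        self.confirmed = self._step(self.confirmed, kind)

    def trap(self):
        # Discard speculative progress: the confirmed pointer replaces it.
        self.speculative = self.confirmed


ptrs = RingPointers()
ptrs.decode("call")      # two calls fetched and decoded...
ptrs.decode("call")
ptrs.execute("call")     # ...but only the first one has executed
before = (ptrs.speculative, ptrs.confirmed)
ptrs.trap()              # the second call is squashed
after = (ptrs.speculative, ptrs.confirmed)
```

Only the pointers are modeled here. The ring buffer contents are left untouched on a trap, which is one reading of the claim rather than something the quoted text spells out.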

Andy Glew

May 15, 1997, 3:00:00 AM

5394529 : Branch prediction unit for high-performance processor

INVENTORS:
Brown, III; John F., Northboro, MA
Persels; Shawn, Northboro, MA
Meyer; Jeanne, Watertown, MA


ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:

Feb. 28, 1995

FILED:
July 1, 1993
SERIAL NUMBER:
086355

FEE STATUS:

INTL. CLASS (Ed. 6):
G06F 9/26;
U.S. CLASS:
395-375; 364-261.3; 364-261.5; 364-261.7; 364-263.1;
364-DIG.1;

FIELD OF SEARCH:
395-375,800 ;
AGENTS:

Arnold, White & Durkee;


ABSTRACT: A pipelined CPU executes instructions of variable length,
and references memory using various data widths. Macroinstruction
pipelining is
employed (instead of microinstruction pipelining), with queueing between
units of the CPU to allow flexibility in instruction execution times. A
branch prediction
method employs a branch history table which records the taken vs.
not-taken history of branch opcodes recently used, and uses an empirical
algorithm to
predict which way the next occurrence of this branch will go, based upon
the history table. The branch history table stores in each entry a
number of bits for
each branch address, each bit indicating "taken" or "not-taken" for one
occurrence of the branch. The table is indexed by branch address. A
register stores the
empirical algorithm, and upon occurrence of a branch its history is
fetched from the table and used to select a location in the register
containing a prediction for
this particular pattern of branch history.
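
The scheme in this abstract can be sketched in a few lines of Python: a table of per-branch history bits, and a writable pattern register that the history selects a bit from. The sizes (512 entries, four history bits) come from claims 10 and 11 of this patent. Everything else is an assumption for illustration: the names BranchPredictor and majority_pattern, the modulo indexing, the particular pattern, and updating with the actual outcome.

```python
TABLE_ENTRIES = 512   # per claim 10
HISTORY_BITS = 4      # per claim 11


def majority_pattern():
    """Example pattern: predict taken if 2+ of the last 4 outcomes were taken."""
    return sum(1 << h for h in range(1 << HISTORY_BITS)
               if bin(h).count("1") >= 2)


class BranchPredictor:
    """Branch history table plus a programmable pattern register (sketch)."""

    def __init__(self, pattern):
        self.pattern = pattern & 0xFFFF      # one prediction bit per history
        self.table = [0] * TABLE_ENTRIES     # 4-bit history per entry

    def _index(self, branch_address):
        return branch_address % TABLE_ENTRIES

    def predict(self, branch_address):
        # The branch's history selects one bit of the pattern register.
        history = self.table[self._index(branch_address)]
        return bool((self.pattern >> history) & 1)

    def update(self, branch_address, taken):
        # Shift the newest outcome into the history, dropping the oldest.
        i = self._index(branch_address)
        mask = (1 << HISTORY_BITS) - 1
        self.table[i] = ((self.table[i] << 1) | int(taken)) & mask


bp = BranchPredictor(majority_pattern())
cold = bp.predict(0x40)          # no history yet
bp.update(0x40, True)
bp.update(0x40, True)
warm = bp.predict(0x40)          # history is now 0b0011
bp.pattern = 0                   # software rewrites the pattern register
rewritten = bp.predict(0x40)
```

Because the pattern lives in a register rather than in fixed logic, the prediction policy can be changed at run time without touching the history table, which is the part the independent claims emphasize.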


24 CLAIMS:

What is claimed is:

1. A method of branch prediction, comprising the steps of:
a) storing a record of whether a conditional branch executed
by a processor is taken or not taken for each conditional branch
instruction executed
by a processor, said storing being by memory address for each
conditional branch instruction, said record including a plurality of
occurrences of
either a taken indicator or a not-taken indicator or both for
each conditional branch instruction at each said memory address;
b) when a conditional branch instruction is executed using one
said memory address, predicting whether or not the branch will be taken
in
response to said record accessed using said one memory address
and based upon a multi-bit prediction pattern, using said taken and
not-taken
indicators in said record; said prediction pattern being a
multi-bit value in a register, said register being writable by said
processor for dynamically
changing the prediction pattern during operation of said
processor, said record for said one memory address being used for
selection within said
multi-bit value;
c) fetching instructions in an instruction stream in response
to said prediction.
2. A method according to claim 1 wherein said record includes at
least four said occurrences.
3. A method according to claim 1 wherein said record is a table
indexed by instruction address in a range of addresses.
4. A method according to claim 1 wherein said record is a plurality
of bits, each bit providing one of said taken or not taken indicators
for a branch
instruction at an instruction address.
5. A method according to claim 1 including the step of entering
into said record the correct one of said taken or not taken indicators
for each branch
instruction after each branch instruction is executed.
6. A method of branch prediction, comprising the steps of:
a) storing a record of whether a conditional branch is taken
or not taken for each conditional branch instruction executed by a
processor, said
record including a plurality of occurrences of either a taken
indicator or a not-taken indicator or both for each conditional branch
instruction at
each memory address; wherein said record is a plurality of
bits, each bit providing one of said taken or not taken indicators for a
branch
instruction at an instruction address;
b) when a conditional branch instruction is executed,
predicting whether or not the branch will be taken based upon said
record and upon a
prediction pattern using said taken and not-taken indicators;
wherein said record is used to address a register containing said
prediction pattern;
c) fetching instructions in an instruction stream in response
to said prediction.
7. A method according to claim 6 wherein said branch instruction is
executed by a processor, and said processor has a plurality of internal
registers, and
has a data path for accessing said internal registers under control
of instructions, and wherein said register is accessible by said data
path of said
processor.
8. A method of operating a computer, comprising the steps of:
a) fetching a sequence of instructions using a sequence of
addresses, wherein said instructions may include conditional branch
instructions;
b) detecting a conditional branch instruction in said
sequence, and, when a conditional branch instruction is detected,
selecting an entry in a branch
history table in response to the address of said branch
instruction in said sequence, the branch history table containing a
plurality of the entries,
each entry containing a plurality of indicators of taken or
not taken, each indicator being the result of one of a plurality of
occurrences of each
branch instruction at a given memory address;
c) in response to said plurality of indicators in said branch
history table, predicting whether or not a branch will be taken when
said conditional
branch instruction is executed, and, if it is predicted that
the branch will be taken, fetching instructions of a sequence different
from said sequence,
said predicting being by selecting from a multi-bit
programmable prediction value using said plurality of indicators as an
index for selecting within
said multi-bit prediction value; said prediction value being
in an internal location in said computer writable by said computer for
dynamically
changing said multi-bit prediction;
d) executing said conditional branch instruction in a
processing unit and updating said branch history table to indicate
whether or not said branch
was actually taken.
9. A method according to claim 8 wherein said branch history table
includes an entry for each address in a range of addresses which is less
than the
addressing range for said instructions, each entry containing said
plurality of indicators.
10. A method according to claim 9 wherein said range is 512.
11. A method according to claim 10 wherein each said entry contains
four branch-history bits, each bit being one of said indicators of
whether a branch
was taken or not in a previous execution of a conditional branch at
an address in said range.
12. A method according to claim 9 wherein said step of updating
includes shifting the contents of a location in said branch history
table and adding a
value according to the predicted value of branch taken or not
taken.
13. A method of operating a computer, comprising the steps of:
a) fetching a sequence of instructions using a sequence of
addresses, wherein said instructions may include conditional branch
instructions;
b) detecting a conditional branch instruction in said
sequence, and, when a conditional branch instruction is detected,
selecting an entry in a branch
history table in response to the address of said branch
instruction in said sequence, the branch history table containing a
plurality of indicators of
taken or not taken, each indicator being the result of one of
a plurality of occurrences of each branch instruction; wherein said
branch history table
includes an entry for each address in a range of addresses
which is less than the addressing range for said instructions, each
entry containing said
plurality of indicators; wherein each said entry contains a
plurality of branch-history bits, each bit being one of said indicators
of whether a branch
was taken or not in a previous execution of a conditional
branch at an address in said range;
c) in response to said plurality of indicators in said branch
history table, predicting whether or not a branch will be taken when
said conditional
branch instruction is executed, and, if it is predicted that
the branch will be taken, fetching instructions of a sequence different
from said sequence,
said predicting being based upon a programmable prediction
algorithm using said plurality of indicators; wherein said step of
predicting includes
selecting a value from a register using bits from said history
table as an index to said register;
d) executing said conditional branch instruction in a
processing unit and updating said branch history table to indicate
whether or not said branch
was actually taken.
14. A method according to claim 13 wherein said computer includes a
plurality of internal registers, said internal registers being
accessible via a datapath
upon execution of instructions of said sequence of instructions,
and wherein said register is accessible by said datapath of said
computer.
15. A processor, said processor fetching and executing a sequence
of instructions, wherein said instructions may include conditional
branch instructions,
said processor having a branch predictor comprising:
a) a branch history table storing a plurality of entries, each
entry containing a plurality of indicators of whether a branch was taken
or not taken for
each of said conditional branch instructions in a range of
addresses for said instructions;
b) a detector detecting a conditional branch instruction in
said sequence, and, when a conditional branch instruction is
detected, selecting one of
said entries in said branch history table in response to the
address of said branch instruction in said sequence;
c) a predictor generating an output indicating a programmable
empirically-based prediction of whether or not a branch will be taken
when said
conditional branch instruction is executed, said prediction
being responsive to said plurality of indicators of an entry, and, if it
is predicted that the
branch will be taken, generating a new address sequence for
fetching instructions differing from said sequence; wherein said
predictor includes a
register containing a plurality of bits, said register being
addressed by said indicators for said selected one of said entries, each
bit of said register
being said prediction for a pattern of said indicators as
recorded in one of said entries in said history table;
d) a processing unit executing said conditional branch
instruction and updating said branch history table to indicate whether
or not said branch was
actually taken.
16. A processor according to claim 15 wherein said plurality of
occurrences is at least four.
17. A processor according to claim 15 wherein said processor
contains a plurality of internal registers accessible via a datapath,
and wherein said
register is accessible for writing by said datapath of said
processor.
18. A processor for executing instructions including conditional
branch instructions, having a branch prediction comprising:
a) means for storing a record of entries containing indicators
of whether a conditional branch is taken or not taken for each
conditional branch
instruction executed by said processor, said record including
a plurality of entries, each said entry containing a plurality of said
indicators for a
plurality of occurrences of each conditional branch
instruction at each memory address;
b) means responsive to a conditional branch instruction being
executed, for predicting whether or not the branch will be taken
responsive to one
of said entries of said record and based upon a stored
programmable prediction pattern; said prediction pattern being a
multi-bit value addressed
by said one of said entries and selection from said multi-bit
value being in response to the value of said one of said entries, said
multi-bit value
being in a location in said processor writable by an
instruction executed by said processor;
c) and means for fetching instructions in an instruction
stream in response to said prediction.
19. A processor according to claim 18 wherein each said entry of
said record includes at least four of said indicators.
20. A processor according to claim 18 wherein said record is a
table indexed by instruction address in a range of addresses.
21. A processor according to claim 18 wherein each said entry of
said record is a plurality of bits, each bit being one of said
indicators indicating taken
or not taken for a branch instruction at an instruction address.
22. A processor according to claim 18 including means for entering
into said record the correct indication of taken or not taken for each
branch
instruction after each branch instruction is executed.
23. A processor for executing instructions including conditional
branch instructions, having a branch prediction comprising:
a) means for storing a record of entries containing indicators
of whether a conditional branch is taken or not taken for each
conditional branch
instruction executed by said processor, said record including
an entry containing a plurality of said indicators for a plurality of
occurrences of each
conditional branch instruction at each memory address; wherein
each said entry of said record is a plurality of bits, each bit being
one of said
indicators indicating taken or not taken for a branch
instruction at an instruction address;
b) means responsive to a conditional branch instruction being
executed, for predicting whether or not the branch will be taken
responsive to one
of said entries of said record and based upon a stored
programmable prediction pattern; wherein a selected one of said entries
of said record is
used to address a register containing said prediction pattern;
c) and means for fetching instructions in an instruction
stream in response to said prediction.
24. A processor according to claim 23 wherein said processor
includes a plurality of internal registers, said internal registers
being accessible via a data
path upon execution of said instructions, and wherein said register
is accessible by said data paths of said processor.
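
The scheme in these claims (a branch history table indexed by branch address, whose history bits in turn select one bit of a writable prediction register) can be sketched in a few lines. This is only an illustration under assumptions: the 512-entry table and four history bits come from claims 10 and 11, while the class name, the modulo indexing, the choice to shift in the actual outcome, and the "majority" policy are all made up for the example and are not taken from the patent.

```python
class HistoryPredictor:
    """Branch history table plus a programmable prediction register (sketch)."""

    def __init__(self, entries=512, history_bits=4, pattern=0):
        self.entries = entries
        self.history_bits = history_bits
        self.mask = (1 << history_bits) - 1
        self.table = [0] * entries   # one taken/not-taken history per slot
        self.pattern = pattern       # 2**history_bits prediction bits

    def _slot(self, address):
        # Assumed indexing: low-order address bits, so distinct branches may alias.
        return address % self.entries

    def predict(self, address):
        history = self.table[self._slot(address)]
        # The history value addresses a single bit of the prediction register.
        return bool((self.pattern >> history) & 1)

    def update(self, address, taken):
        slot = self._slot(address)
        # Shift the newest outcome into the low bit and drop the oldest one.
        self.table[slot] = ((self.table[slot] << 1) | int(taken)) & self.mask

    def write_pattern(self, pattern):
        # Models the register being writable while the program runs.
        self.pattern = pattern & ((1 << (1 << self.history_bits)) - 1)


def majority_pattern(history_bits=4):
    """Hypothetical policy: predict taken when at least half of the
    recorded outcomes were taken."""
    pattern = 0
    for history in range(1 << history_bits):
        if bin(history).count("1") * 2 >= history_bits:
            pattern |= 1 << history
    return pattern
```

A caller would build it with HistoryPredictor(pattern=majority_pattern()), ask predict(address) before each conditional branch and call update(address, taken) afterwards. Changing the policy is just a write_pattern() call, which is the "programmable" part the claims keep returning to.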

Andy Glew

May 15, 1997
5197132 : Register mapping system having a log containing sequential
listing of registers that were
changed in preceding cycles for precise post-branch recovery

INVENTORS:
Steely, Jr.; Simon C., Hudson, NH
Sager; David J., Acton, MA
ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:
Mar. 23, 1993
FILED:
June 29, 1990
SERIAL NUMBER:
546411
FEE STATUS:

INTL. CLASS (Ed. 5):
G06F 9/00; G06F 11/30;
U.S. CLASS:
395-375; 364-231.8; 364-261.3; 364-261.5; 364-DIG.1; 395-425;
FIELD OF SEARCH:
395-800,425,200,375;
AGENTS:
Kenyon & Kenyon;


ABSTRACT: A register map having a free list of available physical
locations in a register file, a log containing a sequential listing of
logical registers changed
during a predetermined number of cycles, a back-up map associating the
logical registers with corresponding physical homes at a back-up point
in a computer
pipeline operation and a predicted map associating the logical registers
with corresponding physical homes at a current point in the computer
pipeline operation.
A set of valid bits is associated with the maps to indicate whether a
particular logical register is to be taken from the back-up map or the
predicted map
indication of a corresponding physical home. The valid bits can be
"flash cleared" in a single cycle to back-up the computer pipeline to
the back-up point during
a trap event.
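
The valid-bit trick in this abstract is easier to see in a toy model. The sketch below assumes that a set bit means "use the predicted map" and a cleared bit means "use the back-up map", which is how the claims read; the class and method names are invented for illustration, and the machinery that advances the back-up map is left out on purpose.

```python
class TwoMapLookup:
    """Select a physical home from the back-up or predicted map (sketch)."""

    def __init__(self, num_logical):
        self.backup = list(range(num_logical))     # logical -> home at back-up point
        self.predicted = list(range(num_logical))  # logical -> home right now
        self.valid = 0                             # bit i set: predicted[i] is current

    def rename_dest(self, logical, new_physical):
        # A register being changed gets a new home and its valid bit set.
        self.predicted[logical] = new_physical
        self.valid |= 1 << logical

    def lookup(self, logical):
        if (self.valid >> logical) & 1:
            return self.predicted[logical]
        return self.backup[logical]

    def flash_clear(self):
        # One assignment stands in for clearing every valid bit in one cycle.
        self.valid = 0
```

After flash_clear() every lookup falls through to the back-up map, so the pipeline state is rolled back without copying a single map entry.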



10 CLAIMS:

What is claimed is:

1. An arrangement for mapping m logical registers used in the
execution of instructions processed through a computer pipeline,
comprising:
a register file having n locations for storing values, said
locations being physical homes of the m logical registers, where m < n;
a register map coupled to the register file and receiving
instructions as input and generating mapped instructions as output to
the register file, the
register map comprising:
a free list that contains a number of locations p, each said
free list location containing a register file location, the free list
indicating which of said
register file locations are free for use in a current cycle;
a log that contains a sequential listing of which of the m
logical registers were changed in each of t cycles preceding a current
cycle;
a backup map that contains a map associating m of the n
physical homes to the m logical registers at a backup point, wherein the
backup point is
a preselected number of cycles preceding the current cycle and
wherein the preselected number of cycles is equal to, or less than, t;
a predicted map that contains a map associating m of the n
physical homes to the m logical registers during the current cycle; and
a register map control device coupled to each of the free
list, log, backup map, and predicted map, the register map control
device receiving the
instructions input to the register map, maintaining the free
list, log, backup map and predicted map, the backup map being maintained
using the
sequential listing of the log, the register map control device
generating the mapped instructions output by the register map.
2. The arrangement of claim 1, wherein the register map further
comprises a valid bits register coupled to the register map control
device, the valid bits
register including a set of valid bits, each one of the valid bits
corresponding to one of the m logical registers for indicating whether
the physical home,
associated with the corresponding logical register, is to be taken
from the backup map or the predicted map when mapping registers.
3. A method of maintaining a mapping of m logical registers to n
physical homes contained in a register file, wherein m < n,
receiving an instruction, during a current cycle, the
instruction specifying at least one address of one of the m logical
registers to be mapped in a
register map;
maintaining a free list that indicates which of the n physical
homes in the register file are free for use in the current cycle;
mapping the at least one address of the received instruction
into at least one of the free physical homes indicated in the free list
and associating the
one of the free physical homes with the corresponding one of
the m logical registers;
maintaining in a log a sequential listing of which of the m
logical registers were changed in each of t cycles preceding the current
cycle;
utilizing the log to maintain in a backup map the physical
home associated with each one of the m logical registers at a backup
point, wherein the
backup point is a preselected number of cycles preceding the
current cycle and wherein the preselected number of cycles is equal to,
or less than,
t;
maintaining in a predicted map a map of the physical home
associated with each one of the m logical registers during the current
cycle;
maintaining a set of valid bits that indicate whether the
physical home associated with a specific logical register is to be taken
from the backup map
or the predicted map when mapping registers.
4. The method of claim 3, further comprising the step of
flash-clearing the set of valid bits in response to a control signal, to
thereby indicate that a
correct physical home for a logical register is to be taken from
the backup map when backing up an instruction stream.
5. The method of claim 3, wherein the step of maintaining a
freelist includes storing a present physical home for a logical register
being changed, in a first
location in the freelist, and assigning a new physical home to the
logical register being changed from a second location in the freelist.
6. The method of claim 5, wherein the step of maintaining a
freelist further includes aging of the present physical home that is
stored by storing the
present physical home for a period of time at least equal to a
backup time before assigning the aged physical home as a new physical
home to a logical
register being changed, wherein the backup time is equal to the
number of cycles occurring from the backup point to the current cycle.
7. The method of claim 6, wherein the step of maintaining in a
predicted map includes storing the new physical home for the logical
register being
changed, into the predicted map at a location indexed by the
logical register being changed.
8. The method of claim 7, wherein the step of maintaining a set of
valid bits includes setting a valid bit that corresponds to the logical
register being
changed.
9. The method of claim 8, wherein the step of maintaining in a
backup map includes addressing the log to identify a logical register
that was changed a
backup time ago, identifying from the freelist a physical home that
was assigned to the logical register that was changed a backup time ago,
and storing
this physical home into the backup map at a location indexed by the
logical register that was changed a backup time ago.
10. A method of assigning physical homes to logical registers used
in executing instructions processed through a computer pipeline,
comprising the steps
of:
receiving in a register mapping station a new instruction to
be executed during a current cycle;
identifying the logical register that is being written to in
the current cycle;
identifying the logical registers that are being read in the
current cycle;
maintaining a log that contains a sequential listing of which
of the m logical registers were changed in each of t cycles preceding
the current cycle,
the step of maintaining the log
including the step of updating the log to include the register
written to during the current cycle;
maintaining a backup map by using the sequential listing of
the log, the backup map comprising a map of physical homes associated
with the
logical registers at a backup point, wherein the backup point
is a preselected number of cycles preceding the current cycle and
wherein the
preselected number of cycles is equal to, or less than, t;
maintaining in a predicted map, a map of the physical home
associated with each one of the m logical registers during the current
cycle;
maintaining a set of valid bits, one valid bit being
associated with each of the logical registers;
determining the state of valid bits associated with each of
the logical registers that are being read in the current cycle;
providing a physical home for each logical register that is
being read in the current cycle from the backup map if the state of the
valid bit
associated with that logical register is in a first state, and
from the predicted map if the state of the valid bit associated with
that logical register is in
a second state; and
assigning from a freelist a new physical home to the logical
register that is being written to in the current cycle;
wherein the number of physical homes is greater than the
number of logical registers.
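
How the free list, the log and the back-up map work together in these claims can be sketched as follows. The model is deliberately simplified and rests on assumptions: one register is renamed per cycle, so the log length doubles as the cycle count, and a displaced home only returns to the free list once the back-up point has moved past it (a stand-in for the "aging" of claim 6). All names are hypothetical.

```python
from collections import deque


class RenameMap:
    """Free list, log, back-up map and predicted map (sketch)."""

    def __init__(self, num_logical, num_physical, backup_time):
        if num_physical - num_logical <= backup_time:
            raise ValueError("need more spare physical homes than backup_time")
        self.backup_time = backup_time
        self.backup = list(range(num_logical))
        self.predicted = list(range(num_logical))
        self.free = deque(range(num_logical, num_physical))
        self.log = deque()  # (logical, new_home) pairs, oldest first

    def rename(self, logical):
        # Take a new home from the free list and record the change in the log.
        new_home = self.free.popleft()
        self.predicted[logical] = new_home
        self.log.append((logical, new_home))
        if len(self.log) > self.backup_time:
            self._advance_backup_point()
        return new_home

    def _advance_backup_point(self):
        # The oldest logged change is now older than the back-up window:
        # commit it to the back-up map and release the home it displaced.
        logical, home = self.log.popleft()
        displaced = self.backup[logical]
        self.backup[logical] = home
        self.free.append(displaced)

    def recover(self):
        # Trap: discard every change newer than the back-up point.
        while self.log:
            _, home = self.log.pop()
            self.free.appendleft(home)
        self.predicted = list(self.backup)
```

With RenameMap(4, 8, 2), three renames are enough to push the first one past the back-up point; a recover() at that moment undoes only the last two.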

Andy Glew

May 15, 1997
5430888 : Pipeline utilizing an integral cache for transferring data to
and from a register

INVENTORS:
Witek; Richard T., Littleton, MA
Williams; Douglas D., Pepperell, MA
Stanley; Timothy J., Leominster, MA
Fenwick; David M., Chelmsford, MA
Burns; Douglas J., Billerica, MA
Stamm; Rebecca L., Newton, MA
Heye; Richard, Somerville, MA
ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:
July 4, 1995
FILED:
Oct. 26, 1993
SERIAL NUMBER:
143509
FEE STATUS:

INTL. CLASS (Ed. 6):
G06F 12/08;
U.S. CLASS:
395-800; 364-243.4; 364-260; 364-256.3; 364-271.5; 364-DIG.1; 395-425;
FIELD OF SEARCH:
395-800,425;
AGENTS:
Maloney; Denis G.; Fisher; Arthur W.;


ABSTRACT: A load/store pipeline in a computer processor for loading
data to registers and storing data from the registers has a cache memory
within the
pipeline for storing data. The pipeline includes buffers which support
multiple outstanding read request misses. Data from out of the pipeline
is obtained
independently of the operation of the pipeline, this data corresponding
to the request misses. The cache memory can then be filled with the
requested data. The provision of a cache memory within the pipeline,
and the buffers for supporting the cache memory, speed up loading
operations for the computer
processor.
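
What "multiple outstanding read request misses" buys can be shown with a small software model. It is only a sketch: the buffer depth, the dictionary-based cache, the string return codes and the merging of a second miss to an address already in flight are all assumptions made for the example, not details from the abstract.

```python
class NonBlockingCache:
    """Cache that keeps accepting loads while misses are outstanding (sketch)."""

    def __init__(self, max_outstanding=4):
        self.lines = {}     # address -> data currently in the cache
        self.pending = {}   # address -> destination registers awaiting the fill
        self.max_outstanding = max_outstanding

    def load(self, address, dest_reg, regs):
        if address in self.lines:
            regs[dest_reg] = self.lines[address]
            return "hit"
        if address in self.pending:
            # Assumed behaviour: attach to the request already in flight.
            self.pending[address].append(dest_reg)
            return "miss-merged"
        if len(self.pending) >= self.max_outstanding:
            return "stall"  # every miss buffer is busy
        self.pending[address] = [dest_reg]
        return "miss"

    def fill(self, address, data, regs):
        # Data arrives from outside the pipeline, possibly out of order.
        self.lines[address] = data
        for dest_reg in self.pending.pop(address, []):
            regs[dest_reg] = data
```

The pipeline only has to stop when the buffers run out; until then later loads, including hits, keep flowing past the misses.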


2 CLAIMS:

What is claimed is:

1. A load/store pipeline in a scalar processor for loading data to
registers and storing data from said registers, the pipeline comprising:
a) a register file which holds the results of an arithmetic
logic operation;
b) a translation buffer and a cache tag look up which are both
coupled in parallel to the register file and receive a virtual address
from the register
file, said translation buffer performing a translation of the
virtual address into a physical address, the cache tag look up
performing a look up on
untranslated bits of said virtual address;
c) a comparator for comparing an output of said cache tag look
up and an output of said translation buffer and producing a hit or miss
signal
based on said comparison;
d) a data cache coupled to said register file which stores or
retrieves data when there is a hit signal generated by said comparator;
e) an output fifo coupled to said translation buffer which
sends information out of said pipeline when there is a miss signal
generated by said
comparator to request data which is to be filled into said
data cache;
f) an input buffer coupled to said data cache which receives
data from out of said pipeline which is to be filled into said data
cache; and
g) a memory reference tag which sends the tag corresponding to
the data received by the input buffer to be stored in said cache tag
look up.
2. A pipeline in a computer processor for loading data from a cache
memory to registers and storing data from the registers to the cache
memory, the
pipeline comprising:
a) an address generator operating to generate address
information entries serially, each one of the address information
entries relating to a
corresponding one of a series of data entries;
b) a cache tag look-up and comparator device coupled to the
address generator to receive the address information entries serially
from the
address generator and operating to perform a cache tag look-up
and tag compare operation, serially, for each one of the address
information
entries, the cache tag look-up and comparator device
outputting one of a hit or miss signal as a result of the cache tag
look-up and compare
operation;
c) a data cache including an input coupled to each of the
address generator to receive the address information entries serially
from the address
generator and the cache tag look-up and comparator device to
receive the one of a hit or miss signal, and an output coupled to the
registers;
d) the data cache operating to selectively perform in respect
of each of the data entries corresponding to the address information
entries received
from the address generator, one of a load of the each of the
data entries from the cache memory to the registers and a store of each
one of the
data entries from the registers to the cache memory only if
the cache tag look-up and comparator device output generates a hit for
the address
information entry received from the address generator
corresponding to the data entry; and
e) the coupling between the data cache and the address
generator including a delay device operating to delay the receiving of
each one of the
address information entries by the data cache relative to the
receiving of the each of the address information entries by the cache
tag look-up and
comparator device by at least one cycle time, said at least
one cycle time equal to at least an access time of the cache tag look-up
and
comparator device.
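
Claim 1's parallel translation and tag look-up works because the tag store is indexed only by address bits that translation never changes. A rough sketch of that idea follows; the page size, line size, the dictionary-based TLB and tag store, and the treatment of a translation miss as a cache miss are all assumptions for the example.

```python
PAGE_BITS = 13  # assumed 8 KB pages
LINE_BITS = 5   # assumed 32-byte cache lines


def split(vaddr):
    """Split a virtual address into virtual page number and cache index."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # untranslated bits
    vpn = vaddr >> PAGE_BITS
    index = offset >> LINE_BITS
    return vpn, index


def lookup(vaddr, tlb, tags):
    """Return (hit, physical_page, index) for one access."""
    vpn, index = split(vaddr)
    # In hardware these two look-ups proceed side by side; neither needs
    # the other's result until the final compare.
    ppn = tlb.get(vpn)            # translation buffer
    stored_tag = tags.get(index)  # tag store, indexed by untranslated bits
    hit = ppn is not None and stored_tag == ppn
    return hit, ppn, index
```

On a miss, a fill would write the physical page number into tags[index], after which the same access hits.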

Andy Glew

May 15, 1997
5568624 : Byte-compare operation for high-performance processor

INVENTORS:
Sites; Richard L., Boylston, MA
Witek; Richard T., Littleton, MA
ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:
Oct. 22, 1996
FILED:
Aug. 13, 1993
SERIAL NUMBER:
106316
FEE STATUS:

INTL. CLASS (Ed. 6):
G06F 7/02; G06F 9/00;
U.S. CLASS:
395-375; 364-247; 364-254.9; 364-259.2; 364-259.8; 364-262.4; 364-262.81; 364-DIG.1;
FIELD OF SEARCH:
395-375,800;
AGENTS:
Maloney; Denis G.; Fisher; Arthur W.;


ABSTRACT: A high-performance CPU of the RISC (reduced instruction set)
type employs a standardized, fixed instruction size, and permits only
simplified
memory access data width and addressing modes. The instruction set is
limited to register-to-register operations and register load/store
operations. Byte
manipulation instructions, included to permit use of
previously-established data structures, include the facility for doing
in-register byte extract, insert and
masking, along with non-aligned load and store instructions. The
provision of load/locked and store/conditional instructions permits the
implementation of
atomic byte writes. By providing a conditional move instruction, many
short branches can be eliminated altogether. A conditional move
instruction tests a
register and moves a second register to a third if the condition is met;
this function can be substituted for short branches and thus maintain
the sequentiality of
the instruction stream.
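
The semantics of these operations are simple enough to model directly. The sketch below covers the byte compare named in the title (using the greater-than-or-equal sense that one of the claims spells out), the zeroing of selected bytes, and the conditional move from the abstract. Function names are invented, registers are modelled as plain 64-bit integers, and Python's conditional expression only imitates the result of a conditional move, not its branch-free execution.

```python
MASK64 = (1 << 64) - 1


def byte_compare_ge(a, b):
    """Return an 8-bit value; bit i is 1 when byte i of a >= byte i of b (unsigned)."""
    result = 0
    for i in range(8):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a >= byte_b:
            result |= 1 << i
    return result


def zero_selected_bytes(value, byte_mask):
    """Return value with byte i cleared wherever bit i of byte_mask is 1."""
    for i in range(8):
        if (byte_mask >> i) & 1:
            value &= ~(0xFF << (8 * i)) & MASK64
    return value


def cmov_eq_zero(test, source, dest):
    """Conditional move: yield source when test is zero, otherwise keep dest."""
    return source if test == 0 else dest


# Locate the terminating NUL of a C string one eight-byte word at a time:
# comparing zero against the word flags exactly the bytes that are zero.
word = int.from_bytes(b"abc\x00defg", "little")
nul_positions = byte_compare_ge(0, word)
```

The string-scan idiom at the end is the usual motivation for a byte compare on a machine without byte loads: eight characters are tested by one instruction instead of eight.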

19 CLAIMS:

What is claimed is:

1. A method of operating a single-chip processor of the type having
an on-chip register set having a plurality of registers, said method
comprising the
steps of:
by executing a first instruction, loading to a first register
of said register set a first eight-byte value; said first register being
identified in said register
set by a field of said first instruction;
by executing a second instruction, loading to a second
register of said register set a second eight-byte value; said second
register being identified
in said register set by a field of said second instruction;
by executing a third instruction, comparing the contents of
said first and second registers to produce a one-byte value in a third
register of said
register set; said first, second, and third registers being
identified in said register set by first, second and third fields,
respectively, of said third
instruction; said one-byte value consisting of a result of a
byte-by-byte comparison of each of the bytes in said first eight-byte
value and in said
second eight-byte value, said one-byte value containing eight
bits with each one of said eight bits representing a result of comparing
one byte of
said first eight-byte value to one byte of said second
eight-byte value;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor.
2. A method according to claim 1 wherein said one-byte value in
said third register is loaded in the low-order byte of said third
register and
zero-extended.
3. A method of operating a single-chip processor of the type having
an on-chip register set having a plurality of registers, said method
comprising the
steps of:
by executing a first instruction, loading to a first register
of said register set a first eight-byte value; said first register being
identified in said register
set by a field of said first instruction;
by executing a second instruction, loading to a second
register of said register set a second eight-byte value; said second
register being identified
in said register set by a field of said second instruction;
by executing a third instruction, loading to a third register
of said register set a third eight-byte value; said third register being
identified in said
register set by a field of said third instruction;
by executing a fourth instruction, loading to a fourth
register of said register set a fourth eight-byte value; said fourth
register being identified in said
register set by a field of said fourth instruction;
by executing a fifth instruction, comparing the contents of
said first and second registers to produce a first one-byte value in a
fifth register of said
register set; said first, second, and fifth registers being
identified in said register set by first, second and third fields,
respectively, of said fifth
instruction; said first one-byte value consisting of a result
of a byte-by-byte comparison of each of the bytes in said first
eight-byte value and in
said second eight-byte value, said first one-byte value
containing eight bits with each one of said eight bits of said first
one-byte value representing
a result of comparing one byte of said first eight-byte value
to one byte of said second eight-byte value; and
by executing a sixth instruction, comparing the contents of
said third and fourth registers to produce a second one-byte value in a
sixth register,
said second one-byte value consisting of a result of a
byte-by-byte comparison of each of the bytes in said third eight-byte
value in said third
register and in said fourth eight-byte value in said fourth
register, said second one-byte value containing eight bits with each one
of said eight bits
of said second one-byte value representing a result of
comparing one byte of said third eight-byte value from said third
register and one byte of
said fourth eight-byte value from said fourth register;
wherein said first, second, third, fourth, fifth and sixth
registers are interchangeable registers of said register set, said
interchangeable registers
being general purpose registers accessible by instructions
executed by said processor.
4. A method of operating a single-chip processor of the type having
an on-chip register set having a plurality of registers, said method
comprising the
steps of:
by executing a first instruction, loading to a first register
of said register set a first eight-byte value; said first register being
identified in said register
set by a field of said first instruction;
by executing a second instruction, loading to a second
register of said register set a second eight-byte value; said second
register being identified
in said register set by a field of said second instruction;
by executing a third instruction, comparing the contents of
said first and second registers to produce a one-byte value in a third
register of said
register set; said first, second, and third registers being
identified in said register set by first, second and third fields,
respectively, of said third
instruction; said one-byte value consisting of a result of a
byte-by-byte comparison of each of the bytes in said first eight-byte
value and in said
second eight-byte value, said one-byte value containing eight
bits with each one of said eight bits representing a result of comparing
one byte of
said first eight-byte value and one byte of said second
eight-byte value;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor; and
wherein said one-byte value in said third register is loaded
in a low-order byte of said third register and zero-extended; and
wherein each bit of said low-order byte is set to 1 if the
corresponding byte of the value in said first register is greater than
or equal to the
corresponding byte of the value in said second register.
5. A method of operating a single-chip processor of the type having
an on-chip register set having a plurality of registers, said method
comprising the
steps of:
by executing a first instruction, loading to a first register
of said register set a first eight-byte value; said first register being
identified in said register
set by a field of said first instruction;
by executing a second instruction, loading to a second
register of said register set a second eight-byte value; said second
register being identified
in said register set by a field of said second instruction;
by executing a third instruction, comparing the contents of
said first and second registers to produce a one-byte value in a third
register of said
register set; said first, second, and third registers being
identified in said register set by first, second and third fields,
respectively, of said third
instruction; said one-byte value consisting of a result of a
byte-by-byte comparison of each of the bytes in said first eight-byte
value and in said
second eight-byte value, said one-byte value containing eight
bits with each one of said eight bits representing a result of comparing
one byte of
said first eight-byte value to one byte of said second eight-byte
value;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor; and
wherein the method further includes the steps of:
in one instruction, setting selected bytes from a fourth
register to zero in accordance with the contents of said one-byte value
in said third
register, and copying into a fifth register the content
of said fourth register modified by said setting of the selected bytes
from the fourth
register to zero.
6. A method of operating a single-chip processor of the type having
an on-chip register set having a plurality of registers, said method
comprising the
steps of:
by executing a first instruction, loading to a first register
of said register set a first eight-byte value; said first register being
identified in said register
set by a field of said first instruction;
by executing a second instruction, loading to a second
register of said register set a second eight-byte value; said second
register being identified
in said register set by a field of said second instruction;
by executing a third instruction, comparing the contents of
said first and second registers to produce a one-byte value in a third
register of said
register set; said first, second, and third registers being
identified in said register set by first, second and third fields,
respectively, of said third
instruction; said one-byte value consisting of a result of a
byte-by-byte comparison of each of the bytes in said first eight-byte
value and in said
second eight-byte value, said one-byte value containing eight
bits with each one of said eight bits representing a result of comparing
one byte of
said first eight-byte value to one byte of said second
eight-byte value;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor; and
wherein said first, second and third registers are each
separate eight-byte registers, and said steps of loading and comparing
are performed by
instructions of fixed, four-byte length.
7. A method of operating a processor, said processor having a
register set including a plurality of registers, said method comprising
the steps of:
by a first instruction executed by said processor, loading to
a first register of said register set a first N-byte value having N
bytes, where N is an
integer;
by a second instruction executed by said processor, loading to
a second register a second N-byte value having N bytes; and
by a third instruction executed by said processor, loading to
a third register of said register set a third N-byte value having N
bytes;
by a fourth instruction executed by said processor, loading to
a fourth register a fourth N-byte value having N bytes;
by a fifth instruction executed by said processor, in response
to the contents of said first and second registers, loading an N-bit
value to a fifth
register, said N-bit value consisting of a result of a
byte-by-byte comparison of said first N-byte value to said second N-byte
value, each one of
the bits of said N-bit value representing a result of
comparing one byte of said first N-byte value and one byte of said
second N-byte value, said
third instruction identifying said first, second and fifth
registers;
by a sixth instruction executed by said processor, in response
to the contents of said third and fourth registers, loading an N-bit
value to a sixth
register, said N-bit value consisting of a result of a
byte-by-byte comparison of said third N-byte value in said third
register to said fourth N-byte
value in said fourth register, said sixth instruction
identifying said third, fourth, and sixth registers by fields of said
sixth instruction;
wherein said first, second, third, fourth, fifth, and sixth
registers are interchangeable registers of said register set, said
interchangeable registers
being general purpose registers accessible by instructions
executed by said processor.
8. A method of operating a processor, said processor having a
register set including a plurality of registers, said method comprising
the steps of:
by a first instruction executed by said processor, loading to
a first register of said register set a first N-byte value having N
bytes, where N is an
integer;
by a second instruction executed by said processor, loading to
a second register a second N-byte value having N bytes; and
by a third instruction executed by said processor, in response
to the contents of said first and second registers, loading an N-bit
value to a third
register, said N-bit value consisting of a result of a
byte-by-byte comparison of said first N-byte value to said second N-byte
value, each one of
the bits of said N-bit value representing a result of
comparing one byte of said first N-byte value and one byte of said
second N-byte value, said
third instruction identifying said first, second and third
registers;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor;
wherein said N-bit value is loaded into a low-order portion of
said third register and said N-bit value is zero-extended in said third
register; and
wherein each bit of said low-order portion of said third
register is set to 1 if the corresponding byte of the value in said
first register is greater than
or equal to the corresponding byte of the value in said second
register.
9. A method of operating a processor, said processor having a
register set including a plurality of registers, said method comprising
the steps of:
by a first instruction executed by said processor, loading to
a first register of said register set a first N-byte value having N
bytes, where N is an
integer;
by a second instruction executed by said processor, loading to
a second register a second N-byte value having N bytes; and
by a third instruction executed by said processor, in response
to the contents of said first and second registers, loading an N-bit
value to a third
register, said N-bit value consisting of a result of a
byte-by-byte comparison of said first N-byte value to said second N-byte
value, each one of
the bits of said N-bit value representing a result of
comparing one byte of said first N-byte value and one byte of said
second N-byte value, said
third instruction identifying said first, second and third
registers;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor; and
wherein the method further includes the steps of:
by a fourth instruction executed by said processor,
setting selected bytes from a fourth register to zero in accordance with
the contents of
said N-bit value in said third register, and copying into
a fifth register the content of said fourth register modified by said
setting of the
selected bytes from the fourth register to zero.
10. A single-chip processor having a register set, said single-chip
processor comprising:
means for loading to a first register of said register set a
first N-byte value having N bytes, where N is an integer, in response to
execution of a
first instruction identifying said first register;
means for loading to a second register of said register set a
second N-byte value having N bytes, in response to execution of a second
instruction
identifying said second register; and
compare means responsive to execution of a third instruction
and coupled to receive the contents of said first and second registers
and loading a
first N-bit value to a third register, said first N-bit value
consisting of a result of a byte-by-byte comparison of bytes of said
first N-byte value to
bytes of said second N-byte value, each one of the bits of
said first N-bit value representing a result of comparing one byte of
said first N-byte
value and one byte of said second N-byte value, said third
instruction identifying said first, second, and third registers;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor; and
further including:
means for loading to a fourth register a third N-byte
value having N bytes; and
means for loading to a fifth register a fourth N-byte
value having N bytes;
said compare means including means, responsive to execution of
a fourth instruction and responsive to the contents of said fourth and
fifth
registers, for loading a second N-bit value to a sixth
register, said second N-bit value consisting of a result of a
byte-by-byte comparison of bytes
in said fourth register to bytes in said fifth register, each
one of the bits of said second N-bit value representing a result of
comparing one byte of
said third N-byte value and one byte of said fourth N-byte
value, said fourth instruction identifying said fourth, fifth, and sixth
registers.
11. A single-chip processor having a register set, said single-chip
processor comprising:
means for loading to a first register of said register set a
first N-byte value having N bytes, where N is an integer, in response to
execution of a
first instruction identifying said first register;
means for loading to a second register of said register set a
second N-byte value having N bytes, in response to execution of a second
instruction
identifying said second register; and
compare means responsive to execution of a third instruction
and coupled to receive the contents of said first and second registers
for loading an
N-bit value to a third register, said N-bit value consisting
of a result of a byte-by-byte comparison of bytes of said first N-byte
value to bytes of
said second N-byte value, each one of the bits of said N-bit
value representing a result of comparing one byte of said first value
and one byte of
said second value, said third instruction identifying said
first, second, and third registers;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor;
wherein said one-byte value is loaded into the low-order byte
of said third register and is zero-extended in said third register; and
wherein said compare means includes means for setting each bit
of said low-order byte to 1 if the corresponding byte of the value in
said first
register is greater than or equal to the corresponding byte of
the value in said second register.
12. A single-chip processor having a register set, said single-chip
processor comprising:
means for loading to a first register of said register set a
first N-byte value having N bytes, where N is an integer, in response to
execution of a
first instruction identifying said first register;
means for loading to a second register of said register set a
second N-byte value having N bytes, in response to execution of a second
instruction
identifying said second register; and
compare means responsive to execution of a third instruction
and coupled to receive the contents of said first and second registers
for loading an
N-bit value to a third register, said N-bit value consisting
of a result of a byte-by-byte comparison of bytes of said first N-byte
value and said
second N-byte value, each one of the bits of said N-bit value
representing a result of comparing one byte of said first N-byte value
and one byte
of said second N-byte value, said third instruction
identifying said first, second, and third registers;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor; and
further including:
means responsive to execution of a fourth instruction for
setting selected bytes from a fourth register to zero in accordance with
the contents
of said N-bit value in said third register and for
copying into a fifth register the content of said fourth register
modified by said setting of the
selected bytes from the fourth register to zero.
13. A method of operating a single-chip processor of the type
having an on-chip register set having a plurality of registers, said
method comprising:
by executing a first instruction, loading to a first register
of said register set a first eight-byte value; said first register being
identified in said register
set by a field of said first instruction;
by executing a second instruction, loading to a second
register of said register set a second eight-byte value; said second
register being identified
in said register set by a field of said second instruction;
by executing a third instruction, comparing the contents of
said first and second registers to produce a one-byte value in a third
register of said
register set; said first, second, and third registers being
identified in said register set by first, second and third fields,
respectively, of said third
instruction; said one-byte value consisting of a result of a
byte-by-byte comparison of each of the bytes in said first eight-byte
value and in said
second eight-byte value, said one-byte value containing eight
bits with each one of said eight bits representing a result of comparing
one byte of
said first eight-byte value to one byte of said second
eight-byte value; and
wherein said eight-bit value is loaded into a low-order
portion of said third register and is zero-extended in said third
register; and
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor.
14. The method of claim 13, wherein said one-byte value in said
third register is loaded in the low-order byte of said third register
and zero-extended.
15. A method of operating a processor, said processor having a
register set including a plurality of registers, said method comprising:
by a first instruction executed by said processor, loading to
a first register of said register set a first eight-byte value;
by a second instruction executed by said processor, loading to
a second register a second eight-byte value; and
by a third instruction executed by said processor, in response
to the contents of said first and second registers, loading an eight-bit
value to a third
register, said eight-bit value consisting of a result of a
byte-by-byte comparison of said first eight-byte value to said second
eight-byte value, each
one of the bits of said eight-bit value representing a result
of comparing one byte of said first eight-byte value and one byte of
said second
eight-byte value, said third instruction identifying said
first, second and third registers;
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor.
16. The method of claim 15, wherein each of said first, second and
third instructions is of the same fixed length, and said third
instruction contains the
addresses of said first, second and third registers.
17. The method of claim 15, wherein said eight-bit value in said
third register is loaded into a low-order portion of said third register
and zero-extended.
18. A single-chip processor having a register set, said single-chip
processor comprising:
means for loading to a first register of said register set a
first N-byte value having N bytes, where N is an integer, in response to
execution of a
first instruction identifying said first register;
means for loading to a second register of said register set a
second N-byte value having N bytes, in response to execution of a second
instruction
identifying said second register; and
compare means responsive to execution of a third instruction
and coupled to receive the contents of said first and second registers
for loading an
N-bit value to a third register, said N-bit value consisting
of a result of a byte-by-byte comparison of bytes of said first N-byte
value and said
second N-byte value, each one of the bits of said N-bit value
representing a result of comparing one byte of said first N-byte value
and one byte
of said second N-byte value, said third instruction
identifying said first, second, and third registers;
wherein said N-bit value is loaded into a low-order portion of
said third register and is zero-extended in said third register; and
wherein said first, second and third registers are
interchangeable registers of said register set, said interchangeable
registers being general purpose
registers accessible by instructions executed by said
processor.
19. The processor of claim 18, wherein said functions of loading
are each done by a separate single instruction executed by said
processor.
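
[Reading aid, not part of the patent text.] The byte-compare step of claim 8 and the byte-zeroing step of claim 9 can be sketched in a few lines of Python. This is only an illustration of the claim language under stated assumptions: the names byte_compare_ge and zero_bytes are made up here, N is fixed at 8, bytes are treated as unsigned, and since the claims do not say which bit value selects a byte for zeroing, the sketch assumes a 1 bit clears the byte.

```python
MASK64 = (1 << 64) - 1  # assumption: 8-byte (64-bit) registers, as in claims 5, 6 and 13


def byte_compare_ge(a: int, b: int) -> int:
    """Return an 8-bit value; bit i is 1 if byte i of a >= byte i of b (unsigned)."""
    result = 0
    for i in range(8):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a >= byte_b:
            result |= 1 << i
    # Only the low-order 8 bits can be set, so the result is zero-extended.
    return result


def zero_bytes(value: int, mask: int) -> int:
    """Return a copy of value with byte i cleared wherever bit i of mask is 1.

    The polarity (1 means clear) is an assumption of this sketch.
    """
    out = value & MASK64
    for i in range(8):
        if (mask >> i) & 1:
            out &= ~(0xFF << (8 * i)) & MASK64
    return out
```

One plausible use, which is an assumption rather than anything stated in the claims: comparing a register of zeros against eight bytes of a string marks the position of a zero byte, because 0 >= b holds only when b is 0. For example, byte_compare_ge(0, int.from_bytes(b"abc\x00defg", "little")) sets only bit 3.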

Andy Glew

May 15, 1997, 3:00:00 AM

Digital claims to have invented multiprocessor cache consistency,
in particular cache protocols using altered (M) states.

Andy Glew

May 15, 1997, 3:00:00 AM

This post is personal. I am presently a graduate student
at the University of Wisconsin, but I remain affiliated with Intel,
work there in the summers, and intend to return to work at Intel
when I finish my Ph.D.

Digital is accusing me of stealing their inventions. Righteous anger
compels me to make some minimal counter-statements. Because I am not
presently at Intel, the lawyers have not yet told me to shut up,
although I am sure that they soon will.

This post is personal. It is not a statement of Intel or the
University of Wisconsin.

What I have posted from the IBM patent server is straight fact, raw
material. If I yield to the temptation to summarize or comment upon
the patents, that is my own opinion, not Intel's.

Jonathan Kirwan

May 15, 1997, 3:00:00 AM

Here are the patent abstracts for those patents named in the
recent DEC/Intel suit.

4,755,936: Apparatus and method for providing a cache memory unit
with a write operation utilizing two system clock cycles
Filed: Jan. 29, 1986
Issued: Jul. 5, 1988

8 Claims, 6 Drawing Figures

A cache memory unit is disclosed in which, in response to the
application of a write command, the write operation is performed
in two system clock cycles. During the first clock cycle, the data
signal group is stored in a temporary storage unit while a
determination is made if the address signal group associated with
the data signal group is present in the cache memory unit. When
the address signal group is present, the data signal group is
stored in the cache memory unit during the next application of a
write command to the cache memory unit. If a read command is
applied to the cache memory unit involving the data signal group
stored in the temporary storage unit, then this data signal group
is transferred to the central processing unit in response to the
read command. Instead of performing the storage into the cache
memory unit as a result of the next write command, the storage of
the data signal in the cache memory unit can occur during any free
cycle.
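
[Reading aid, not part of the abstract.] The deferred write described above can be modelled roughly as follows. This is a sketch under assumptions, not the patented circuit: the class name TwoCycleWriteCache and its methods are hypothetical, a dictionary stands in for the tag and data arrays, and a write that misses is simply reported to the caller because the abstract does not say how misses are handled.

```python
class TwoCycleWriteCache:
    """Toy model: a write is staged in temporary storage, then committed later."""

    def __init__(self):
        self.lines = {}      # address -> data for lines present in the cache
        self.pending = None  # (address, data) held in the temporary storage unit

    def fill(self, address, data):
        """Place a line in the cache (stands in for a fill from memory)."""
        self.lines[address] = data

    def _commit_pending(self):
        if self.pending is not None:
            address, data = self.pending
            self.lines[address] = data
            self.pending = None

    def write(self, address, data):
        """Stage a write; the previously staged write lands in the cache now."""
        self._commit_pending()
        if address in self.lines:           # address check during the first cycle
            self.pending = (address, data)  # data waits in temporary storage
            return True
        return False                        # miss: left to the caller (assumption)

    def read(self, address):
        """Reads see staged data first, so a pending write is never missed."""
        if self.pending is not None and self.pending[0] == address:
            return self.pending[1]
        return self.lines.get(address)

    def idle_cycle(self):
        """Commit the staged write during any free cycle."""
        self._commit_pending()
```

After fill(0x10, "old") and write(0x10, "new"), the data array still holds "old" but read(0x10) returns "new"; the array is updated on the next write or on idle_cycle().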

4,847,804: Apparatus and method for data copy consistency in a
multi-cache data processing unit
Filed: Mar. 1, 1988
Issued: Jul. 11, 1989
Continuation of (including streamline cont.) Ser. No. 698,364, Feb.
5, 1985, abandoned.

6 Claims, 7 Drawing Figures

In a multi-processor unit data processing system, apparatus and
method are described for providing that only the most recent
version of any data signal group will be available for
manipulation by a requesting data processing unit. A "multiple"
state for a data signal group is defined by the presence of a
particular data signal group stored in the cache memory units of a
plurality of data processing units. The "multiple" state is
associated with each copy of a data signal group by control
signals. When a data signal group is changed by the local data
processing unit, an "altered" state is associated with the new
data signal group. The simultaneous presence of an "altered" state
and "multiple" state is forbidden and requires immediate response
by the data processing system to insure consistency among the data
signal groups. In addition to apparatus for identifying and
storing the state of the data signal groups, apparatus must be
provided for communication of the selected states to the data
processing units.
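
[Reading aid, not part of the abstract.] The "multiple" and "altered" states can be illustrated with a toy simulation. The abstract only says that the two states must never coexist and that the system must respond immediately; the particular responses used below (invalidating other copies on a write, writing an altered copy back to memory when another processor reads it) are assumptions of this sketch, as are all class and method names.

```python
class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> {"data", "multiple", "altered"}


class System:
    """Toy multi-cache system keeping 'multiple' and 'altered' mutually exclusive."""

    def __init__(self, num_cpus):
        self.memory = {}
        self.caches = [Cache(f"cpu{i}") for i in range(num_cpus)]

    def read(self, cpu, address):
        cache = self.caches[cpu]
        if address not in cache.lines:
            holders = [c for c in self.caches if address in c.lines]
            for holder in holders:
                line = holder.lines[address]
                if line["altered"]:
                    # Assumed response: write the newest data back first.
                    self.memory[address] = line["data"]
                    line["altered"] = False
                line["multiple"] = True
            cache.lines[address] = {
                "data": self.memory.get(address, 0),
                "multiple": bool(holders),
                "altered": False,
            }
        return cache.lines[address]["data"]

    def write(self, cpu, address, data):
        cache = self.caches[cpu]
        self.read(cpu, address)  # make sure the line is present locally
        line = cache.lines[address]
        if line["multiple"]:
            # Assumed response: drop every other copy before altering ours.
            for other in self.caches:
                if other is not cache:
                    other.lines.pop(address, None)
            line["multiple"] = False
        line["data"] = data
        line["altered"] = True

    def invariant_holds(self):
        return all(
            not (line["multiple"] and line["altered"])
            for c in self.caches
            for line in c.lines.values()
        )
```

With two processors sharing address 0x40, a write by cpu0 removes cpu1's copy and marks cpu0's line altered; a later read by cpu1 returns the new value and both copies become "multiple" again.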

5,091,845: System for controlling the storage of information in a
cache memory
Filed: Aug. 11, 1989
Issued: Feb. 25, 1992
Division of Ser. No. 300,755, Jan. 23, 1989, abandoned, which is a
continuation of Ser. No. 17,517, Feb. 24, 1987, abandoned.

25 Claims, 15 Drawing Figures

The invention provides a system for controlling the storage of
information in a cache memory and features a processor to be
connected to a bus, the bus including information signal transfer
lines for transferring information signals and a cache control
signal transfer line for transferring a cache control signal
having a plurality of conditions, the processor including a cache
memory and a bus interface circuit connected to the cache memory
and for connection to the bus, the bus interface circuit
including: i. an information signal transfer circuit for
performing a read operation in which it receives information
signals from the information signal transfer lines, the
information signal transfer circuit transferring the received
information signals to the cache memory; and ii. a cache control
circuit connected to the cache memory and the information signal
transfer circuit and for connection to the cache control signal
transfer line for controlling whether the received information is
to be stored in the cache memory in response to the condition of
the cache control signal.

5,125,083: Method and apparatus for resolving a variable number of
potential memory access conflicts in a pipelined computer system
Filed: Feb. 3, 1989
Issued: Jun. 23, 1992

16 Claims, 9 Drawing Figures

An operand processing unit delivers a specified address and at
least one read/write signal in response to an instruction being a
source or destination operand, and delivers the source operand to
an execution unit in response to completion of the preprocessing.
The execution unit receives the source operand, executes it and
delivers the resultant data to memory. A "write queue" receives
the write addresses of the destination operands from the operand
processing unit, stores the write addresses, and delivers the
stored preselected addresses to memory in response to receiving
the resultant data corresponding to the preselected address. The
address of the source operand is compared to the write addresses
stored in the write queue, and the operand processing unit is
stalled whenever at least one of the write addresses in the write
queue is equivalent to the read address. Therefore, fetching of
the operand is delayed until the corresponding resultant data has
been delivered by the execution unit.
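
[Reading aid, not part of the abstract.] The write-queue check amounts to comparing each read address against the addresses of writes that have not yet reached memory. A minimal sketch, assuming exact-match addresses and in-order retirement; WriteQueue and try_fetch_operand are hypothetical names, and a dictionary stands in for memory.

```python
from collections import deque


class WriteQueue:
    """Holds destination addresses of results that are not yet in memory."""

    def __init__(self):
        self.addresses = deque()

    def push(self, address):
        """Record a destination address when the operand unit decodes it."""
        self.addresses.append(address)

    def conflicts_with(self, read_address):
        """True if any queued write targets the address about to be read."""
        return read_address in self.addresses

    def retire(self, memory, data):
        """The execution unit delivered a result: write it to the oldest address."""
        address = self.addresses.popleft()
        memory[address] = data
        return address


def try_fetch_operand(queue, memory, read_address):
    """Return (stalled, value); stall while a pending write could change the value."""
    if queue.conflicts_with(read_address):
        return True, None
    return False, memory.get(read_address)
```

A read of address 0x100 stalls while 0x100 sits in the queue, and succeeds with the new value once retire() has delivered the result.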

5,148,536: Pipeline having an integral cache which processes cache
misses and loads data in parallel
Filed: Jul. 25, 1988
Issued: Sept. 15, 1992

27 Claims, 25 Drawing Figures

A load/store pipeline in a computer processor for loading data to
registers and storing data from the registers has a cache memory
within the pipeline for storing data. The pipeline includes
buffers which support multiple outstanding read request misses.
Data from out of the pipeline is obtained independently of the
operation of the pipeline, this data corresponding to the request
misses. The cache memory can then be filled with the data that has
been requested. The provision of a cache memory within the
pipeline, and the buffers for supporting the cache memory, speed
up loading operations for the computer processor.

5,179,673: Subroutine return prediction mechanism using ring
buffer and comparing predicated address with actual address to
validate or flush the pipeline
Filed: Dec. 18, 1989
Issued: Jan. 12, 1993

5 Claims, 3 Drawing Figures

A method and arrangement for producing a predicted subroutine
return address in response to entry of a subroutine return
instruction in a computer pipeline that has a ring pointer counter
and a ring buffer coupled to the ring pointer counter. The ring
pointer counter contains a ring pointer that is changed when
either a subroutine call instruction or return instruction enters
the computer pipeline. The ring buffer has buffer locations which
store a value present at its input into the buffer location
pointed to by the ring pointer when a subroutine call instruction
enters the pipeline. The ring buffer provides a value from the
buffer location pointed to by the ring pointer when a subroutine
return instruction enters the computer pipeline, this provided
value being the predicted subroutine return address.
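
[Reading aid, not part of the abstract.] A ring buffer with a moving pointer behaves like a small stack that silently overwrites its oldest entries. The sketch below assumes the pointer advances before a call stores its return address and steps back after a return reads its prediction; the abstract only says the pointer "is changed", so that ordering, the buffer size, and the names ReturnAddressRing and validate are assumptions.

```python
class ReturnAddressRing:
    """Toy return-address predictor built on a fixed-size ring buffer."""

    def __init__(self, size=8):
        self.buffer = [0] * size
        self.pointer = 0

    def on_call(self, return_address):
        """A call enters the pipeline: advance the pointer, store the address."""
        self.pointer = (self.pointer + 1) % len(self.buffer)
        self.buffer[self.pointer] = return_address

    def on_return(self):
        """A return enters the pipeline: read the prediction, step the pointer back."""
        predicted = self.buffer[self.pointer]
        self.pointer = (self.pointer - 1) % len(self.buffer)
        return predicted


def validate(predicted, actual):
    """False means the prediction was wrong and the pipeline must be flushed."""
    return predicted == actual
```

Nested calls within the buffer depth are predicted correctly. Deeper nesting wraps around and overwrites older entries, which is where the compare-and-flush step in the patent title comes in.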

5,197,132: Register mapping system having a log containing
sequential listing of registers that were changed in preceding
cycles for precise post-branch recovery
Filed: Jun. 29, 1990
Issued: Mar. 23, 1993

10 Claims, 2 Drawing Figures

A register map having a free list of available physical locations
in a register file, a log containing a sequential listing of
logical registers changed during a predetermined number of cycles,
a back-up map associating the logical registers with corresponding
physical homes at a back-up point in a computer pipeline operation
and a predicted map associating the logical registers with
corresponding physical homes at a current point in the computer
pipeline operation. A set of valid bits is associated with the
maps to indicate whether a particular logical register is to be
taken from the back-up map or the predicted map indication of a
corresponding physical home. The valid bits can be "flash cleared"
in a single cycle to back-up the computer pipeline to the back-up
point during a trap event.
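
[Reading aid, not part of the abstract.] The interplay of the two maps and the valid bits can be sketched as below. This is a loose model, not the patented structure: the class and method names are hypothetical, the extra list of speculatively allocated physical registers is an addition of this sketch so they can be returned to the free list, and advancing the back-up point is left out entirely.

```python
class RegisterMap:
    """Toy register map with a back-up map, a predicted map and valid bits."""

    def __init__(self, num_logical, num_physical):
        self.backup = list(range(num_logical))   # mapping at the back-up point
        self.predicted = list(self.backup)       # mapping at the current point
        self.valid = [False] * num_logical       # True: use the predicted map
        self.free_list = list(range(num_logical, num_physical))
        self.log = []          # logical registers changed since the back-up point
        self.speculative = []  # physical registers handed out since then (assumption)

    def lookup(self, logical):
        """Pick the physical home from whichever map the valid bit selects."""
        if self.valid[logical]:
            return self.predicted[logical]
        return self.backup[logical]

    def rename(self, logical):
        """Give a logical register a fresh physical home past the back-up point."""
        physical = self.free_list.pop(0)
        self.predicted[logical] = physical
        self.valid[logical] = True
        self.log.append(logical)
        self.speculative.append(physical)
        return physical

    def recover(self):
        """Trap or mispredicted branch: 'flash clear' every valid bit at once."""
        self.free_list.extend(self.speculative)
        self.speculative = []
        self.log = []
        self.valid = [False] * len(self.valid)
```

After recover(), every lookup falls back to the back-up map, which is the effect the abstract attributes to clearing the valid bits in a single cycle.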

5,394,529: Branch prediction unit for high-performance processor
Filed: Jul. 1, 1993
Issued: Feb. 28, 1995
Continuation of (including streamline cont.) Ser. No. 547,804, Jun.
29, 1990, abandoned.

24 Claims, 27 Drawing Figures

A pipelined CPU executes instructions of variable length, and
references memory using various data widths. Macroinstruction
pipelining is employed (instead of microinstruction pipelining),
with queueing between units of the CPU to allow flexibility in
instruction execution times. A branch prediction method employs a
branch history table which records the taken vs. not-taken history
of branch opcodes recently used, and uses an empirical algorithm
to predict which way the next occurrence of this branch will go,
based upon the history table. The branch history table stores in
each entry a number of bits for each branch address, each bit
indicating "taken" or "not-taken" for one occurrence of the
branch. The table is indexed by branch address. A register stores
the empirical algorithm, and upon occurrence of a branch its
history is fetched from the table and used to select a location in
the register containing a prediction for this particular pattern
of branch history.
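
[Reading aid, not part of the abstract.] The two-level lookup described above (branch address selects a history, history selects one bit of a register) can be sketched as follows. The table size, the four history bits, and the contents of the "algorithm" register are all assumptions made for illustration; here the register predicts taken whenever at least two of the last four occurrences were taken.

```python
HISTORY_BITS = 4  # assumption: number of past occurrences remembered per branch


def make_algorithm_register():
    """Build a 16-bit register holding one prediction bit per history pattern.

    Assumed rule for this sketch: predict taken if at least two of the
    remembered occurrences were taken.
    """
    register = 0
    for pattern in range(1 << HISTORY_BITS):
        if bin(pattern).count("1") >= 2:
            register |= 1 << pattern
    return register


class BranchHistoryTable:
    def __init__(self, entries=512, algorithm=None):
        self.entries = entries
        self.history = [0] * entries  # one history pattern per table entry
        self.algorithm = make_algorithm_register() if algorithm is None else algorithm

    def _index(self, branch_address):
        return branch_address % self.entries

    def predict(self, branch_address):
        """Fetch the history, then use it to select a bit of the register."""
        pattern = self.history[self._index(branch_address)]
        return bool((self.algorithm >> pattern) & 1)

    def update(self, branch_address, taken):
        """Shift the newest outcome into the history for this branch."""
        i = self._index(branch_address)
        mask = (1 << HISTORY_BITS) - 1
        self.history[i] = ((self.history[i] << 1) | int(taken)) & mask
```

Because the prediction rule lives in a register rather than in fixed logic, a different rule can be tried by passing another 16-bit value as algorithm.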

5,430,888: Pipeline utilizing an integral cache for transferring
data to and from a register
Filed: Oct. 26, 1993
Issued: Jul. 4, 1995
Continuation of (including streamline cont.) Ser. No. 599,405, Oct.
17, 1990, abandoned, which is a division of Ser. No. 224,483, Jul.
25, 1988, Pat. No. 5,148,536.

2 Claims, 25 Drawing Figures

A load/store pipeline in a computer processor for loading data to
registers and storing data from the registers has a cache memory
within the pipeline for storing data. The pipeline includes
buffers which support multiple outstanding read request misses.
Data from out of the pipeline is obtained independently of the
operation of the pipeline, this data corresponding to the request
misses. The cache memory can then be filled with the requested for
data. The provision of a cache memory within the pipeline, and the
buffers for supporting the cache memory, speed up loading
operations for the computer processor.

5,568,624: Byte-compare operation for high-performance processor
Filed: Aug. 13, 1993
Issued: Oct. 22, 1996
Continuation of (including streamline cont.) Ser. No. 547,992, Jun.
29, 1990, abandoned.

19 Claims, 11 Drawing Figures

Al Germaine

May 15, 1997, 3:00:00 AM

bor...@netcom.com (Douglas Borsom) wrote:

>>[Andy Glew's generously provided patent details deleted]
>
>Great Wog!!!
>
>Digital expects to defend these in a jury trial?
>
> -doug

I think a jury will be hard pressed to see the difference (whatever
there is) in the way Intel accomplishes the same thing. They are
likely to give DEC something. This threat will keep a cloud over Intel
for a long time.
Intel could countersue, even if the grounds are not strong. They
would be putting the same sort of threat to DEC. Then they might go
for a settlement.

ATG


John Hascall

May 15, 1997, 3:00:00 AM

Tony Tribelli <a...@netcom.com> wrote:
}ri...@alpha.delta.edu (Rich Adams) wrote:
}> - DEC put 1 year into research and preparation for this litigation
}> and has every intention of pushing Intel back into a technological
}> stone age. Thus opening a very large market for Alpha.
}
}A theoretically large market that will probably go to x86 clones and
}PowerPC, not to Alpha despite its technical superiority.

What do you bet DEC just entered a very cozy agreement with AMD...

}> Palmer's language is certainly take-no-prisoners, I'm sure it prompted a
}> few meetings in Santa Clara. Oh, to be a fly on Andy Grove's wall...
}
}Probably just huffing and puffing in hopes of negotiating a better
}licensing deal, assuming he has a case.

Assuming he has a case (a big assuming at this point),
he has Intel by the short ones. What do you suppose
triple damages times XX million Pentium/PPros already
sold is? Then Intel has to try to license the patents
to continue selling them -- what if DEC says "$900 per"
is our price? What if they have to go back to the
drawing board for a couple years? What will happen to
their stock -- how will they capitalize the next gen fab?

All the above is, of course, very speculative, but I
know this -- when my employer, ISU, found out that
just about every fax machine on the planet was using
a technology we happened to have a patent on, big, big
wads of cash arrived here in a hurry to settle up
(if we had actually been competing in the fax business,
I suspect we'd have been much more hardball too).

John
--
John Hascall, Software Engr. Shut up, be happy. The conveniences you
ISU Computation Center demanded are now mandatory. -Jello Biafra
mailto:jo...@iastate.edu
http://www.cc.iastate.edu/staff/systems/john/welcome.html <-- the usual crud

Robert Harley

May 15, 1997, 3:00:00 AM

Paul Ayers <pa...@netcom.com> writes:
>Robert Harley wrote:
>>[...]

>> `-' Linux - 500MHz Alpha - 256MB SDRAM `-'
> ^^^^^^^^^^^^
>Rob,
>I've been thinking of running Linux with a new Alpha chip.
>Would you recommend this?
Yes, definitely.

>How's it working for you?

Absolutely brilliant.

>Thanks in advance!
You're welcome!

-- Rob.
PS: It's fast, reliable, powerful, fast, stable and fast.
If you want Unix and speed for a reasonable number of pesos,
it's a combination that can't be beaten.

Clark L. Coleman

May 15, 1997, 3:00:00 AM

In article <337B17B6...@cs.wisc.edu>,

Andy Glew <gl...@cs.wisc.edu> wrote:
>This post is personal. I am presently a graduate student
>at the University of Wisconsin, but I remain affiliated with Intel,
>work there in the summers, and intend to return to work at Intel
>when I finish my Ph.D.
>
>Digital is accusing me of stealing their inventions.

No, they aren't. They are saying that their patents were infringed
upon.

Patent infringement need not be intentional in order for back
royalties plus interest to be awarded. The suit does not have to be
taken as saying anything about knowledge or premeditation on the part
of Intel. Perhaps Intel engineers were not aware that DEC is the
inventor of branch prediction, cache memory management, etc. :-)

Try to take things less personally until DEC publicly alleges illegal
intent on Intel's part.


--
--------------------------------------------------------------------------
"I have prevented my kids from watching MTV at home. It's not safe for kids."
---- Tom Freston, MTV president, 4/14/95 Buffalo News.
||| cl...@virginia.edu (Clark L. Coleman) http://www.cs.virginia.edu/~clc5q

Andy Glew

unread,
May 15, 1997, 3:00:00 AM5/15/97
to

I am starting this new stream to hold actual, real, information
about the patents DEC has listed in its action against Intel.

I am obtaining the abstracts and the claims from IBM's patent server,
http://patent.womplex.ibm.com/

[I see no warning that I should not post information from this server
here, although I would not be at all surprised to find myself accused
of wilfully crossposting...]

Those familiar with patents will be aware that, while the abstract
and description help explain what the patent is, the claims are
the actual meat of the matter.

I am sure that it is unwise for me to comment on this case.
But I cannot see that publicizing the basis is bad.

Andy Glew

unread,
May 15, 1997, 3:00:00 AM5/15/97
to

4847804 : Apparatus and method for data copy consistency in a
multi-cache data processing unit

INVENTORS:
Shaffer; Stephen J., Harvard, MA
Warren; Richard A., Austin, TX


ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:

July 11, 1989

FILED:
Mar. 1 , 1988
SERIAL NUMBER:
166814

FEE STATUS:

INTL. CLASS (Ed. 4):
G06F 9/00;
U.S. CLASS:
364-900
FIELD OF SEARCH:
364-200,900 ;
AGENTS:
Holloway; William W.; Paciulan; Richard J.;


ABSTRACT: In a multi-processor unit data processing system, apparatus
and method are described for providing that only the most recent version
of any data signal group will be available for manipulation by a
requesting data processing unit. A "multiple" state for a data signal
group is defined by the presence of a particular data signal group
stored in the cache memory units of a plurality of data processing
units. The "multiple" state is associated with each copy of a data
signal group by control signals. When a data signal group is changed by
the local data processing unit, an "altered" state is associated with
the new data signal group. The simultaneous presence of an "altered"
state and "multiple" state is forbidden and requires immediate response
by the data processing system to insure consistency among the data
signal groups. In addition to apparatus for identifying and storing the
state of the data signal groups, apparatus must be provided for
communication of the selected states to the data processing units.


4847804 : Apparatus and method for data copy consistency in a
multi-cache data processing unit


6 CLAIMS:

What is claimed is:

1. A cache memory unit for use in a data processing system, said data
processing system including a plurality of data processing units coupled
to a system bus, each data processing unit having a cache memory unit
with a write back or write thru mode of operation whereby the result of
every associated data processing unit operation is stored into the cache
memory unit associated therewith, comprising:

   a plurality of addressable storage locations for storing signal
   groups identified by and associated with an address signal group;

   a status register means associated with each of said locations for
   storing status signals identifying parameters of a signal group
   stored in said associated locations;

   activity means for applying an associated address signal group to
   said system bus for each signal group in said main memory unit
   retrieved by said associated data processing unit, said activity
   means for applying a first first control signal and an associated
   address signal group to said system bus for each signal group altered
   by said associated data processing unit, said activity means for
   applying a second control signal to said system bus when data
   processing unit not associated with said cache memory unit retrieves
   a signal group from said main memory unit also stored in said cache
   memory unit; and

   update means coupled to said system bus and responsive to address
   signal groups to said control signals applied to said system bus for
   changing status signal in said status register associated for signal
   groups stored in said cache memory unit.

2. The cache memory unit of claim 1 wherein said update means changes
said status signals associated with a signal group when said signal
group is retrieved by said associated data processing unit.

3. The cache memory unit of claim 2 wherein said status signals indicate
when a signal group associated therewith includes a valid signal group,
when a signal group associated therewith has been altered and when a
signal group associated therewith is stored in a plurality of cache
memory units.

4. The cache memory unit of claim 3 wherein status signals associated
with a signal group can designate altered data is stored in only one
cache memory unit.

5. The cache memory unit of claim 4 wherein said status signals can not
designate that said associated signal group is altered and stored in a
plurality of locations simultaneously.

6. The cache memory unit of claim 1 wherein said activity means applies
a first control signal to said system bus when a non-associated data
processing unit attempts to retrieve a signal group from said main
memory unit when said cache memory unit has an altered instance of said
signal group.
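
[Reading aid] The status-bit rule in the abstract and in claims 3-5 can be
sketched in a few lines of Python. This is only an illustration, not the
patented apparatus: the names LineStatus and Cache are made up, the "bus"
is just a Python list, and the choice to respond to a write on a shared
line by invalidating the other copies is an assumption (the patent text
only says that an immediate response is required).

```python
from dataclasses import dataclass


@dataclass
class LineStatus:
    valid: bool = False
    altered: bool = False   # local copy changed since it was fetched
    multiple: bool = False  # copies exist in more than one cache

    def check(self):
        # Claim 5: "altered" and "multiple" may never be set together.
        assert not (self.altered and self.multiple)


class Cache:
    """Hypothetical snooping cache; tracks status bits only, no data."""

    def __init__(self, name, bus):
        self.name = name
        self.lines = {}   # address -> LineStatus
        self.bus = bus    # list of all caches sharing the bus
        bus.append(self)

    def read(self, addr):
        line = self.lines.get(addr)
        if line is None or not line.valid:
            line = LineStatus(valid=True)
            # Other caches see the fetch and mark their copies "multiple".
            for other in self.bus:
                if other is not self and other.snoop_read(addr):
                    line.multiple = True
            self.lines[addr] = line
        line.check()
        return line

    def write(self, addr):
        line = self.read(addr)
        if line.multiple:
            # Assumed response: invalidate every other copy first.
            for other in self.bus:
                if other is not self:
                    other.snoop_invalidate(addr)
            line.multiple = False
        line.altered = True
        line.check()
        return line

    def snoop_read(self, addr):
        line = self.lines.get(addr)
        if line is None or not line.valid:
            return False
        # A real cache would write an altered copy back here; the sketch
        # just clears the flag so the invariant holds once shared.
        line.altered = False
        line.multiple = True
        return True

    def snoop_invalidate(self, addr):
        line = self.lines.get(addr)
        if line is not None:
            line.valid = line.altered = line.multiple = False
```

With two caches A and B on one bus, reading the same address from both
leaves both copies "multiple"; a later write from B invalidates A's copy
and leaves B's copy "altered" but no longer "multiple".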

Rich Adams

unread,
May 15, 1997, 3:00:00 AM5/15/97
to

Tony Tribelli (a...@netcom.com) wrote:
: ri...@alpha.delta.edu (Rich Adams) wrote:
:
: > - DEC put 1 year into research and preparation for this litigation
: > and has every intention of pushing Intel back into a technological
: > stone age. Thus opening a very large market for Alpha.
:
: A theoretically large market that will probably go to x86 clones and
: PowerPC, not to Alpha despite its technical superiority. The typical PC

: buyer follows the applications, not the CPU with the best specs. I doubt
: FX32! will prove to be very convincing for these users.

You're looking at the wrong market, there's such a thing as Merced.
If Intel had to not just take a boot to the head on P5, P6 and PII, but
also had to run back to the drawing board on cache management, predictive
branching, etc... well... That's more time for DEC to establish the Alpha
as a standard. (and then roll out some 128 bit processor as Intel
launches Merced...)

All speculation, of course...

Douglas Borsom

unread,
May 15, 1997, 3:00:00 AM5/15/97
to

Andy Glew

unread,
May 15, 1997, 3:00:00 AM5/15/97
to

4755936 : Apparatus and method for providing a cache memory unit with a
write operation utilizing two system clock cycles

INVENTORS:


Stewart; Robert E., Stow, MA
Flahive; Barry J., Westford, MA
Keller; James B., Arlington, MA

ASSIGNEES:
Digital Equipment Corporation, Maynard, MA
ISSUED:

July 5 , 1988

FILED:
Jan. 29, 1986
SERIAL NUMBER:
823805

FEE STATUS:

INTL. CLASS (Ed. 4):
G06F 13/00;
U.S. CLASS:
364-200
FIELD OF SEARCH:


364-200 MS File ;
AGENTS:
Holloway; William W.; Moran; Maura K.;


ABSTRACT: A cache memory unit is disclosed in which, in response to
the application of a write command, the write operation is performed in
two system clock cycles. During the first clock cycle, the data signal
group is stored in a temporary storage unit while a determination is
made if the address signal group associated with the data signal group
is present in the cache memory unit. When the address signal group is
present, the data signal group is stored in the cache memory unit during
the next application of a write command to the cache memory unit. If a
read command is applied to the cache memory unit involving the data
signal group stored in the temporary storage unit, then this data signal
group is transferred to the central processing unit in response to the
read command.

Instead of performing the storage into the cache memory unit as a result
of the next write command, the storage of the data signal in the cache
memory unit can occur during any free cycle.
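
[Reading aid] The two-step write described in the abstract can be modelled
roughly as below. This is a sketch under assumptions, not the patented
circuit: the class name DelayedWriteCache and its methods are invented for
illustration, a Python dict stands in for the cache array, and what happens
to a buffered write that misses the cache is simply left out.

```python
class DelayedWriteCache:
    """Toy model of a cache whose writes complete in two steps."""

    def __init__(self):
        self.store = {}      # address -> data for lines present in the cache
        self.pending = None  # (address, data, hit) held in temporary storage

    def fill(self, addr, data):
        # Bring a line into the cache (stands in for servicing a miss).
        self.store[addr] = data

    def _drain(self):
        # Second step: move the buffered data into the cache array, but
        # only if the tag check made during the first step was a hit.
        if self.pending is not None:
            addr, data, hit = self.pending
            if hit:
                self.store[addr] = data
            self.pending = None

    def write(self, addr, data):
        # The previous buffered write completes when the next write arrives.
        self._drain()
        # First step: buffer the data while checking whether addr is cached.
        self.pending = (addr, data, addr in self.store)

    def idle_cycle(self):
        # Alternative from the abstract: complete the write in a free cycle.
        self._drain()

    def read(self, addr):
        # A read that matches the buffered address is served from the buffer.
        if self.pending is not None and self.pending[0] == addr:
            return self.pending[1]
        return self.store.get(addr)
```

After fill(0x10, "old") and write(0x10, "new"), the cache array still holds
"old" but read(0x10) returns "new" from the temporary storage; the array is
only updated when the next write or an idle cycle drains the buffer.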


4755936 : Apparatus and method for providing a cache memory unit with a
write operation utilizing two system clock cycles


8 CLAIMS:

What is claimed is:

1. A cache memory unit associated with a central processing unit of

WRITE command and said first address is present in said cache
memory unit;

Joe

unread,
May 15, 1997, 3:00:00 AM5/15/97
to

> You're looking at the wrong market, there's such a thing as Merced.
> If Intel had to not just take a boot to the head on P5, P6 and PII, but
> also had to run back to the drawing board on cache management, predictive
> branching, etc... well... That's more time for DEC to establish the Alpha
> as a standard. (and then roll out some 128 bit processor as Intel
> launches Merced...)
>
> All speculation, of course...

Well, one can hope! ;-)

Joe.

Stefan Monnier

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

ri...@alpha.delta.edu (Rich Adams) writes:
> - Or could it be an effort to stall x86 and ia64 research long enough

I fail to see why this research should be stalled in any way by such an
attack. They know the problem will get solved somehow, so why stop or slow
down the current research?


Stefan

dew

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

In article <5ld5a6$l...@flood.weeg.uiowa.edu>,
Doug Siebert <douglas...@uiowa.edu> wrote:
}
}Any others?

My guess:

They will settle for a patent cross licensing agreement with Intel such
as AMD and TI have. With this they will design processors that
implement IA32 and IA64 instruction sets. DEC sees that the Intel
juggernaut cannot be stopped and they will use their considerable
processor design and fabrication technology to design a better "Intel"
processor than Intel.

Jonathan Kirwan

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

On Fri, 16 May 1997 00:03:05 -0500, Andy Glew <gl...@cs.wisc.edu>
wrote:

>I have, in a related thread, posted both the abstracts and the
>claims of Digital's patents.
>
>The claims are that which is actually protected.

I noticed, after I posted. Thanks for your effort.

Jon

Robert Harley

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

H.W. Stockman <hws...@swcp.com> writes:
>I previously found less than stellar performance
>for gcc under Linux, [...]
>
>Could you compile and run your hand-optimized
>lattice Boltzmann example under Linux, to compare
>with the impressive results you got under DEC Unix?

Sure.
The Linux PWS, Corton, has 2MB of L3 cache.
The Digital Unix AlphaStation, Mayday, has 8MB of L3 cache.
Both have 500MHz 21164a chips.

I compiled and ran with "gcc -O4 -funroll-loops" on both machines.
I also compiled with "cc -O5 -tune ev5 -migrate -std1 -non_shared" on
Mayday and ran the result on both machines. Here are the times for
the test case (in seconds) with the % change:

             Corton              Mayday
             2MB cache           8MB cache

gcc          1556       -16%     1312

             -17%       -33%     -20%

cc           1295       -19%     1048

This shows that the run-time decreases by ~18% by going from gcc to cc
(on this test) and that it decreases by ~17% by going from 2MB cache
to 8MB (on this test). Doing both gives 33%.
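
[Reading aid] The percentages in the table can be rechecked from the four
run times. The snippet below is just that arithmetic; the dictionary layout
and the helper name pct_change are made up for the illustration.

```python
# Run times in seconds, copied from the table above.
times = {
    ("gcc", "Corton"): 1556,
    ("gcc", "Mayday"): 1312,
    ("cc", "Corton"): 1295,
    ("cc", "Mayday"): 1048,
}


def pct_change(old, new):
    """Relative change from old to new, in percent (negative = faster)."""
    return 100.0 * (new - old) / old


# Compiler effect on each machine (gcc -> cc).
compiler_corton = pct_change(times[("gcc", "Corton")], times[("cc", "Corton")])
compiler_mayday = pct_change(times[("gcc", "Mayday")], times[("cc", "Mayday")])

# Cache effect with each compiler (2MB -> 8MB).
cache_gcc = pct_change(times[("gcc", "Corton")], times[("gcc", "Mayday")])
cache_cc = pct_change(times[("cc", "Corton")], times[("cc", "Mayday")])

# Both changes at once (gcc on Corton -> cc on Mayday).
both = pct_change(times[("gcc", "Corton")], times[("cc", "Mayday")])

for label, value in [
    ("gcc->cc on Corton", compiler_corton),
    ("gcc->cc on Mayday", compiler_mayday),
    ("2MB->8MB with gcc", cache_gcc),
    ("2MB->8MB with cc", cache_cc),
    ("both", both),
]:
    print(f"{label:20s} {value:6.1f}%")
```

Note that the combined figure is not the sum of the two single effects: the
ratios multiply (1295/1556 times 1048/1295 equals 1048/1556), which is why
roughly -17% followed by -19% gives about -33% rather than -36%.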

By the way, this is somewhat biased in that when I (re)wrote the
critical section of the code, I tweaked it for cc on Mayday. If I
were to go back and tune it for gcc on Corton, the differences would
be reduced by a few percent (but I'm too busy to do so right now).

Nevertheless the change in gcc->cc is fairly typical for
"Fortran-like" numerical code like that involved here. For other
types of code, munging pointers, integers and so on there is little
difference between the two compilers. If you make use of the gcc
extensions to C then it can win hands down (like on the Caml Light
runtime written by guys in the next office).

The change due to increasing the cache size is not so representative,
though: the working-set is 7.5 MB which makes the 8MB cache perfect!
The two machines are equal on small problems, Mayday is faster for
medium problems like yours and Corton is faster on big problems due to
much improved main-memory.

The prices of AlphaStations have been falling recently but I believe
they are still a good bit more expensive than the PWSs. Clones with
1MB cache from people like Aspen Systems are much cheaper (but a bit slower).
Digital Unix is very expensive whereas Linux is free of course...

-- Rob.

David Chase

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

Andy Glew wrote:
>
> Digital claims to have invented multiprocessor cache consistency,
> in particular cache protocols using altered (M) states.

Remember (those of you who may think that these patents are
obviously obvious) that what matters is the filing date of
the patent. Was the patent obvious THEN? Is there any work
which predates it? In this case, the filing date was March
1988.

David Chase

David Chase

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

David T. Wang wrote:

> -------------------------------------------------------------------
> http://www.sjmercury.com/business/dec/suit051397.htm
>
> This is an action for damages and injunctive relief to remedy the willful
> infringement by defendant Intel Corporation (``Intel'') of patents issued to
> plaintiff Digital Equipment Corporation (``Digital'') by the United States
> Patent and Trademark Office
> -------------------------------------------------------------------
>
> DEC alleges that this is a "willful" infringement, that Intel architects
> knowingly stole DEC Patents.

That's not what that language says. My reading (and I am not a lawyer)
is that DEC alleges that Intel, the corporation, sold devices which
made use of DEC's patents, and that Intel, the corporation, did so
knowing that DEC had these patents. It does not say that Intel copied
what it read in the patents, merely that it knew that these patents
existed, and nonetheless sold a device which infringed the patents.

The actual people involved in knowing about the patents, designing
the devices, and choosing to sell the devices, need not be the
same people, though they would all need to work for Intel. It also
says nothing in detail about the timing; Intel could have spent
years hard at work without knowing about the patents, and only
"learned" of their existence after a few months of Pentium sales.

David Chase

David T. Wang

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

David Chase ("mylastname"@worlddotstd.com) wrote:
: David T. Wang wrote:


Therein lies the paradox. If it is a case of "parallel development",
but Digital simply holds
the license, Digital would perhaps send the obligatory "cease and desist"
letter, or ask for licensing fees. The case that Palmer has laid out is
clearly more sinister, that Intel Engineers, back in 1990-1991, during
the time of the Alpha negotiations, learned of "trade secrets" under NDA,
these engineers then went and spread those ideas to the Pentium and
Pentium Pro design teams, thus directly leading to the quantum leap in
performance over the 486. It is for this reason that Digital didn't go
through the process of negotiations, but instead filed suit directly, and
is asking for Triple damages.

: David Chase

Joe Buck

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

dew <d...@nym.alias.net> writes:
>My guess:
>
>They will settle for a patent cross licensing agreement with Intel such
>as AMD and TI have. With this they will design processors that
>implement IA32 and IA64 instruction sets.

Makes the most sense of any I've seen yet. I don't think that Intel
will easily concede such a thing, though.
--
-- Joe Buck http://www.synopsys.com/pubs/research/people/jbuck.html

Help stamp out Internet spam: see http://spam.abuse.net/spam/

Przemek Klosowski

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

dew <d...@nym.alias.net> writes:

> Doug Siebert <douglas...@uiowa.edu> wrote:
> }
> }Any others?
>

> My guess:
>
> They will settle for a patent cross licensing agreement with Intel such
> as AMD and TI have. With this they will design processors that

> implement IA32 and IA64 instruction sets. DEC sees that the Intel
> juggernaut cannot be stopped and they will use their considerable
> processor design and fabrication technology to design a better "Intel"
> processor than Intel.

How about this scenario: DEC extracts a crosslicencing agreement from
Intel and adds X86 instruction decoder to Alpha. Seeing that they
worked hard on FX!32 (for WinNT) and em86 (for Linux), they might be
able to implement such dynamic translators in hardware... That would
be an interesting competition to Intel's plans for IA64, which
apparently includes an X86 box.

Strategically, DEC seems to have given up on convincing ISVs en masse
to port to Alpha (forget DEC Unix and Linux, they can't even seem to
get many NT ports); they pin their hopes on leveraging Intel application market.

Hey, maybe I should go and patent this idea real quick... Oh well, too late,
I just posted it. Shucks.

--
przemek klosowski <prz...@nist.gov> (301) 975-6249
NIST Center for Neutron Research (bldg. 235), E111
National Institute of Standards and Technology
Gaithersburg, MD 20899, USA

Andy Glew

unread,
May 16, 1997, 3:00:00 AM5/16/97
to jki...@ix.netcom.com

Douglas Borsom

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

Andy Glew <gl...@cs.wisc.edu> writes:

>This post is personal. I am presently a graduate student
>at the University of Wisconsin, but I remain affiliated with Intel,
>work there in the summers, and intend to return to work at Intel
>when I finish my Ph.D.

>Digital is accusing me of stealing their inventions. Righteous anger
>compels me to make some minimal counter-statements....

Don't take it too hard, Andy. This isn't nearly as bad as
what Minnesotans are saying about you for dwelling in the
land of Cheeseheads.

Go down to the Terrace, get a beer, watch the wind play with
the blue waters of Lake Mendota, and blow it off. :-)

-doug

Terry C. Shannon

unread,
May 16, 1997, 3:00:00 AM5/16/97
to

Rich Adams wrote:
>
> Tony Tribelli (a...@netcom.com) wrote:
> : ri...@alpha.delta.edu (Rich Adams) wrote:
> :
> : > - DEC put 1 year into research and preparation for this litigation
> : > and has every intention of pushing Intel back into a technological
> : > stone age. Thus opening a very large market for Alpha.
> :
> : A theoretically large market that will probably go to x86 clones and
> : PowerPC, not to Alpha despite its technical superiority. The typical PC
> : buyer follows the applications, not the CPU with the best specs. I doubt
> : FX32! will prove to be very convincing for these users.
>

Ssssshhhhh... nobody's supposed to know about MRISC. Eat this post after
reading it!

Terry "Chas. Matco" Shannon
Publisher, Shannon Knows DEC
sha...@world.std.com


Jonathan Kirwan

unread,
May 17, 1997, 3:00:00 AM5/17/97
to

On 16 May 1997 14:41:44 -0400, Przemek Klosowski
<prz...@rrdjazz.nist.gov> wrote:

>How about this scenario: DEC extracts a crosslicencing agreement from
>Intel and adds X86 instruction decoder to Alpha. Seeing that they
>worked hard on FX!32 (for WinNT) and em86 (for Linux), they might be
>able to implement such dynamic translators in hardware... That would
>be an interesting competition to Intel's plans for IA64, which
>apparently includes an X86 box.

I like it! Sounds very reasonable, given my own experience with
such discussions by top management.

Jon

Anton Ertl

unread,
May 17, 1997, 3:00:00 AM5/17/97
to gl...@cs.wisc.edu

In article <337B17B6...@cs.wisc.edu>, Andy Glew <gl...@cs.wisc.edu> writes:
> Digital is accusing me of stealing their inventions.

Cool down. They accuse Intel of patent infringement, which is not the
same thing as stealing.

AFAIK, even if you invent something completely and provably by
yourself, you can still infringe on a patent that covers it (of course,
the patent application has to have been filed before your invention).

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

H.W. Stockman

unread,
May 17, 1997, 3:00:00 AM5/17/97
to

Henry Baker wrote:
>
> In article <337B6D...@swcp.com>, hws...@swcp.com wrote:
>
> > This reminds me; what ever happened
> > with the infamous "XOR" patent suit?
>
> I suppose that it grinds inexorably along...

I thought so; more lawyers charging exorbitant fees. We must
exorcise this litigious demon from science and technology,
this constant exornation on the plain, basic exordial facts of the
patent law. These pettifoggers should focus on seX OR other
understandable human frailties.

Don North

unread,
May 17, 1997, 3:00:00 AM5/17/97
to

Or, more likely Intel digs thru their portfolio of patents, comes up with
ten or so that Digital has infringed upon. They litigate for a while, and
the net result is Intel 0, Digital 0, lawyers $$$; a draw, except that the
lawyers always seem to come out ahead.

---------------------------------------------------------------------------
Donald N. North KD6JTT don....@technologist.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
{{{{{{{{ Facts are facts, but any opinions expressed are my own }}}}}}}}}
---------------------------------------------------------------------------

stan lass

unread,
May 18, 1997, 3:00:00 AM5/18/97
to

In article <1997051606583...@nym.alias.net>, d...@nym.alias.net
says...

>My guess:
>
>They will settle for a patent cross licensing agreement with Intel such
>as AMD and TI have. With this they will design processors that
>implement IA32 and IA64 instruction sets. DEC sees that the Intel
>juggernaut cannot be stopped and they will use their considerable
>processor design and fabrication technology to design a better "Intel"
>processor than Intel.

Perhaps DEC cannot stop the Intel juggernaut's current romp in the
market, but IMO, coming changes will greatly redefine the computer
market.

Until now and probably for a few years, each doubling in processor
speed provides a useful increment in performance. However, when
the coming set top boxes are able to provide "Toy Story" quality
animation in real time, then further doublings in processor speed
will not be appreciated by most users. About this time, the
competition becomes more to reduce the price, less so to increase
performance.

Also, by adding a few peripherals, a set top box can serve as a thin
client computer on most desktops, perhaps 60% of the desktops.
The set top box, expanded with peripherals, would likely take away
many home PC sales as well. (Assuming that the user already has
a set top box, the incremental cost of the peripherals would be much
less than the cost of a Wintel PC.)

The market for the set top boxes includes one in most every household
and thin client desktops, a market larger than the PC market. The size
of this market will exert a strong pull on third party software developers.
(If they develop in Java, they can market to both the set top box
market and the Wintel market.)

So, my prediction is that the set top box will take over the low cost
end of the computer market, and that is where much of the action will
be over the next several years.

Stan Lass http://www.netins.net/showcase/stanlass

Stefan Monnier

unread,
May 18, 1997, 3:00:00 AM5/18/97
to

y...@somehost.somedomain (stan lass) writes:
> However, when the coming set top boxes are able to provide "Toy Story"
> quality animation in real time, then further doublings in processor speed
> will not be appreciated by most users.

Hahaha !
The ever-lasting predictions of "sufficient power" !
Sure enough 640K are sufficient for everyone, right ?

"Toy Story" in real-time is not here yet and when it will be here, we'll be
looking forward to the democratization of something else.


Stefan

Donald Robinson

unread,
May 18, 1997, 3:00:00 AM5/18/97
to

On 18 May 1997 01:53:23 -0400, Stefan Monnier
<monnier+/news/comp/ar...@tequila.cs.yale.edu> wrote:

I think the question here is not "how much power", but "how will that
power be delivered". If a sufficiently powered mediaprocessor can
outperform an IA-64 (or Alpha or whatever), _only_ in rendering,
sound, text-voice recognition, and other stuff made "easily"
parallel, it will be perceived as faster by consumers.

Just look at why anyone would buy a faster computer. MS bloat
mainly requires more memory; processing power doesn't seem
to be nearly as affected. Intel basically pointed this out with the
MMX project (which is probably aimed at pushing back the
feasibility of mediaprocessors for a year).

Face it, consumer tasks don't consist of SPECfp; it's more MS
Office (which just eats memory) and games (which have huge
graphics requirements).

Scott Robinson


David Chase

unread,
May 19, 1997, 3:00:00 AM5/19/97
to

David T. Wang wrote:

> ... The case that Palmer has laid out is


> clearly more sinister, that Intel Engineers, back in 1990-1991, during
> the time of the Alpha negotiations, learned of "trade secrets" under NDA,
> these engineers then went and spread those ideas to the Pentium and
> Pentium Pro design teams, thus directly leading to the quantum leap in
> performance over the 486. It is for this reason that Digital didn't go
> through the process of negotiations, but instead filed suit directly, and
> is asking for Triple damages.

Filing suit could just be an initial negotiating position. I'm surprised
to see that nobody else has commented on the claims in the patents
themselves; I've looked at some of them over the weekend, and (assuming
validity of the patents) it looks like DEC has quite a chance. I'm
a little surprised at the two-bit cache patent, but *superficially*,
reading Intel's own documentation, that one looks like a match.

The byte-comparison patent, as written, looks like a miss to me (the
Intel MMX instructions produce 8 byte-wide logical results, the patent
discusses 8 (or N) bit-wide logical results crammed into a single byte).
However, who knows what continuations are or might be filed on that
patent (as I understand, it is still possible to continue an "old"
patent, and that can be a powerful, if perhaps unfair, weapon).
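
[Reading aid] The difference between the two result formats described in
the paragraph above can be shown with two small functions. This is a sketch
of the distinction as stated in the post, not of either the patent's or
Intel's circuitry; the function names are made up, and the "mask" form
follows the usual behaviour of a packed byte compare (each matching byte
becomes 0xFF, each non-matching byte 0x00).

```python
def compare_bytes_mask(a, b):
    """Compare two 64-bit values byte by byte.

    Returns a 64-bit value with 0xFF in every byte position where the
    inputs match and 0x00 elsewhere (eight byte-wide results).
    """
    result = 0
    for i in range(8):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a == byte_b:
            result |= 0xFF << (8 * i)
    return result


def compare_bytes_packed(a, b):
    """Same comparison, but with one result bit per byte.

    Returns a single byte whose bit i is set when byte i of the inputs
    matches (eight bit-wide results crammed into one byte).
    """
    result = 0
    for i in range(8):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a == byte_b:
            result |= 1 << i
    return result
```

For a = 0x1122334455667788 and b = 0x1100330055007700 the inputs agree in
byte positions 1, 3, 5 and 7, so the first form gives 0xFF00FF00FF00FF00
and the second gives 0xAA.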

The branch-prediction patent is clever, and clearly deserved, though
if there is a hit, it probably has to be via claim 6, which is much
broader than the rest. What is described by claim 6 is an implementation
of a branch predictor, in which the per-branch history (some small number
of bits) is used to form an index into a boolean array, which gives you
the prediction. For example, if Taken = 1 and not Taken = 0:

0 0 0 => 0 (3 not takens implies not taken)
0 0 1 => 0 (usually not taken implies not taken)
0 1 0 => 0
0 1 1 => 1
1 0 0 => 0
1 0 1 => 1
1 1 0 => 0
1 1 1 => 1

The other claims concern the number of bits in the history, and the
ability to modify the contents of the array, apparently either via
feedback on the success of previous uses of the array, or under
program control.
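
[Reading aid] The table-lookup scheme described above is short enough to
write out. What follows is a minimal sketch, not the patent's or the
Pentium's actual design: the table contents are copied from the example,
while the names and the bit order (newest outcome shifted in on the right)
are assumptions made for the illustration.

```python
HISTORY_BITS = 3

# Index = 3-bit history, value = prediction (1 = taken), copied from
# the eight rows of the example table above.
PATTERN_TABLE = [0, 0, 0, 1, 0, 1, 0, 1]


def predict(history):
    """Look up the prediction for a given per-branch history."""
    return PATTERN_TABLE[history]


def update_history(history, taken):
    """Shift the newest outcome in on the right, dropping the oldest bit."""
    mask = (1 << HISTORY_BITS) - 1
    return ((history << 1) | int(taken)) & mask


def run(outcomes, history=0):
    """Replay a sequence of branch outcomes for one branch.

    Returns (number of correct predictions, final history).
    """
    correct = 0
    for taken in outcomes:
        if predict(history) == int(taken):
            correct += 1
        history = update_history(history, taken)
    return correct, history
```

Making PATTERN_TABLE writable, whether by feedback from mispredictions or
by the program itself, would correspond to the other claims mentioned
above; the sketch keeps it fixed.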

Of course, I've got no way to open up the Pentium to see if it works
this way, though I did scan some of the on-line documentation.

There are more claims, but this is not my job, and I don't have spare
money to invest in either company's stock. The companies I'd like to
buy stock in are the legal firms involved in this little altercation.

As for Palmer's "sinister case", I don't buy it. Intel would have
to be foolish beyond my imagination to intentionally rip off Digital.
Failure for person A to connect with person B, yes; simultaneous
development, yes; intentional rip-off, no way on earth. They've
got too much to lose, and the risk is too high.

David Chase

Jeffrey L. Bell

unread,
May 19, 1997, 3:00:00 AM5/19/97
to

In article <EAEtI...@world.std.com>,

David Chase <mylastname, @, world, dot, std, ., com> wrote:
>
>As for Palmer's "sinister case", I don't buy it. Intel would have
>to be foolish beyond my imagination to intentionally rip off Digital.
>Failure for person A to connect with person B, yes; simultaneous
>development, yes; intentional rip-off, no way on earth. They've
>got too much to lose, and the risk is too high.
>

It depends what you mean by rip-off.

If you were to design a processor today, you would almost certainly include
some form of multi-issue capability with some form of branch
prediction, write-back caches and so on. None of these are things
that you or I invented, but we heard about them at conferences, or in
papers.

No doubt all of them have patents on certain implementations.

If you can't remember where you heard of something, it doesn't make it
obvious. It's only obvious after you heard about it.

-Jeff Bell

John R. Campbell

unread,
May 19, 1997, 3:00:00 AM5/19/97
to

Well, this is the reason that many organizations have a "captive
lawyer" division where prior art (and patents) are searched for
conflicts.

If you're gonna release a product w/ plenty of innovations,
you'd want to make sure to perform a patent search as part of
your own patenting effort (i.e., when you think you've invented
something new).

Of course, if you're sure it's been done before, but you don't
research for prior patents, well, then, ignorance is no protection
from the wrath of a patent infringement suit.

Odd observation: has DEC covered their own butt? I'm sure they
must be using technology patented by others, so, do they have
all of the necessary paperwork in-place?

Mind you, my choice of platform is Alpha based. That don't mean
I can afford it, y'know.

--
John R. Campbell Speaker to Machines so...@jtan.com
"As a SysAdmin, yes, I CAN read your e-mail, but I DON'T get that bored!"-me
Disclaimer: I'm just a consultant at the bottom of the food chain, so,
if you're thinking I speak for anyone but myself, you must
have more lawyers than sense.


Joe Buck

unread,
May 19, 1997, 3:00:00 AM5/19/97
to

y...@somehost.somedomain (stan lass) writes:
>> However, when the coming set top boxes are able to provide "Toy Story"
>> quality animation in real time, then further doublings in processor speed
>> will not be appreciated by most users.

Stefan Monnier <monnier+/news/comp/ar...@tequila.cs.yale.edu> writes:
>"Toy Story" in real-time is not here yet and when it will be here, we'll be
>looking forward to the democratization of something else.

"Toy Story" is only a 2-D image, and while well-done, is still clearly
an artificial world. Rendering in three dimensions is vastly more
expensive (I'm talking about having a "trixel" at each point in a cube,
not just building two 2-D images for use by 3-D glasses). Then you can
make that system evolve in time. I suppose you can then figure out how
to generate electromagnetic fields to make the images seem solid.

To quote Scott Adams (from memory):

All change is driven by technological progress and male hormones. When
realistic virtual reality becomes cheaper than dating, civilization
will collapse.

"Where's Dilbert?"

"He's been in the holodeck since March."

Chris...@arm.remove_this_part_when_replying.com

unread,
May 19, 1997, 3:00:00 AM5/19/97
to

In article <5lpnr0$q...@nntpb.cb.lucent.com>,
Neil Kirby <n...@atlas.cb.lucent.com> wrote:
>
>Let's take a quick side track to color palette depth. Do you need more
>than 64K colors? Always? Usually? Sometimes? Never? In the case of
>DOOM, the game is in 320x200. And since it has 64K *PIXELS*, it hardly
>needs more than 64K colors.

This is a little bogus. If you're doing a colour gradient, 16 bit
colour leaves a 'stepping' effect, unless you error-correct, which may
not be practical for fast moving games. 24 bit colour modes do not
have this problem, and neither do paletted modes, providing you don't
want to use more than 256 colours.
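
[Reading aid] The stepping effect can be put in numbers. The sketch below
assumes a 16-bit mode that gives 5 or 6 bits to each colour channel (the
common 5-6-5 layout, which the post does not actually specify) and a smooth
ramp drawn across a 320-pixel-wide screen; the function names are made up.

```python
def quantize(value, bits):
    """Keep only the top `bits` bits of an 8-bit channel value."""
    return value >> (8 - bits)


def distinct_levels(bits):
    """How many different values a full 0..255 ramp collapses to."""
    return len({quantize(v, bits) for v in range(256)})


def band_width(screen_pixels, bits):
    """Width in pixels of each visible band for a full-width ramp."""
    return screen_pixels / distinct_levels(bits)


for bits in (5, 6, 8):
    print(bits, "bits:", distinct_levels(bits), "levels,",
          band_width(320, bits), "pixels per band")
```

With 5 bits per channel a full ramp has only 32 levels, so on a 320-pixel
line each band is 10 pixels wide, which is easy to see; with 8 bits per
channel there are 256 levels and each band is barely over a pixel.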
--
/* _ */main(int k,char**n){char*i=k&1?"+L*;99,RU[,RUo+BeKAA+BECACJ+CAACA"
/* / ` */"CD+LBCACJ*":1[n],j,l=!k,m;do for(m=*i-48,j=l?m/k:m%k;m>>7?k=1<<m+
/* | */8,!l&&puts(&l)**&l:j--;printf(" \0_/"+l));while((l^=3)||l[++i]);}
/* \_,hris Brown -- All opinions expressed are probably wrong. */

stan lass

unread,
May 19, 1997, 3:00:00 AM5/19/97
to

In article <5lafltl...@tequila.systemsz.cs.yale.edu>,
monnier+/news/comp/ar...@tequila.cs.yale.edu says...

>
>y...@somehost.somedomain (stan lass) writes:
>> However, when the coming set top boxes are able to provide "Toy Story"
>> quality animation in real time, then further doublings in processor speed
>> will not be appreciated by most users.
>
>Hahaha !
>The ever-lasting predictions of "sufficient power" !
>Sure enough 640K are sufficient for everyone, right ?
>
>"Toy Story" in real-time is not here yet and when it will be here, we'll be
>looking forward to the democratization of something else.

I actually have a serious point. Of the five senses that humans
have, computers relate mainly to the visual and auditory senses.
Of the two, the visual sense is by far the most computationally
demanding. When computer generated animation is nearly
indistinguishable from real life movies, e.g. current movies
without computer generated special effects, then further
improvement won't be much appreciated by most users. (I'm
assuming that the images would be displayed on an HDTV monitor.)

My comments relate to the home market and perhaps 60% of the
desktops. I'll amend my prediction some. Beginning when the coming
set top boxes are able to provide "Toy Story" quality animation in
real time, set top boxes will begin to take over much of the home
market and ~60% of the desktop market. Set top boxes will improve
until computer generated animation is nearly indistinguishable from
real life movies, then level off.

Using higher resolution would increase the computational
requirement. However, HDTV will likely be the highest resolution
display in widespread use for many years in the home market.

Computers able to solve the grand challenges of computing are in
a different market.

Stan Lass http://www.netins.net/showcase/stanlass

Robert Rodgers

May 19, 1997, 3:00:00 AM

n...@atlas.cb.lucent.com (Neil Kirby) wrote:
>Let's take a quick side track to color palette depth. Do you need more
>than 64K colors? Always? Usually? Sometimes? Never? In the case of
>DOOM, the game is in 320x200. And since it has 64K *PIXELS*, it hardly
>needs more than 64K colors.

Unless you're generating and changing the entire palette on a frame by
frame basis, this is ridiculous. You might as well go with 64k
colors.

As to whether 64k colors is enough, well, actually, no. (Uh oh,
someone will call me radical for this, but) 64k colors really *isn't*
enough for nice lighting effects at higher resolutions. 64k shows
obvious banding even with dithering and only gives you 32 shades of
grey. For games with dark, dank atmospheres with fog and lighting
effects, 24 bit would be a real improvement, assuming it worked and
wasn't a performance killer. Hey, if the world was all sun and
fruitloops, we wouldn't need no stinkin' fog & haze.

We also need a higher fill rate (it's not hard to swamp 3dfx, and 3dfx
is probably going to represent the performance lead until sometime
early in 1998), faster geometry, and higher resolutions (which in turn
means higher fill rate, faster geometry.. It goes on and on.)

>What's that got to do with graphics speed? When the graphics pipeline
>(CPU+ 3D hardware) hits about 60M polys/sec, that's enough. Why is that
>enough?

60M polys/sec doesn't mean anything, and isn't coming any time soon,
anyway, to PCs near you.

>Most big monitors display about a million pixels. 0.768M and 1.280M pixels
>are the common numbers. And they display those pixels at least 60 times a
>second. So 60M polys a second is around the limiting case; you can't show
>polygons any faster than you can show pixels.

That assumes zero overdraw. Not bloody likely -- forget blending &
transparency, think about objects with windows. And if the zbuffer is
your solution, I'd like to see the <$600 card that will do 1024x768
double buffered with 32bit Z at even 1/3rd the kind of performance
you're describing.
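
For scale, the memory such a card needs is easy to estimate. This is a
back-of-the-envelope sketch in Python; the colour depths tried are
assumptions, since only the Z depth is given above, and
framebuffer_bytes is a name made up for the illustration.

```python
def framebuffer_bytes(width, height, color_bits, z_bits, color_buffers=2):
    """Bytes for the colour buffers plus a single Z buffer."""
    pixels = width * height
    color = pixels * color_bits // 8 * color_buffers
    z = pixels * z_bits // 8
    return color + z

MIB = 1024 * 1024
# 1024x768, double buffered, 32-bit Z; colour depth is the assumed part.
for color_bits in (16, 32):
    total = framebuffer_bytes(1024, 768, color_bits, 32)
    print(f"{color_bits}-bit colour: {total / MIB:.1f} MiB")
```

Under those assumptions the total comes to 6 MiB with 16-bit colour and
9 MiB with 32-bit colour, before any texture memory is counted.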

Graphics demands for 3d games are just beginning -- they haven't even
*begun* to ramp up yet.

Neil Kirby

May 19, 1997, 3:00:00 AM

In article <5lafltl...@tequila.systemsz.cs.yale.edu>,

Stefan Monnier <monnier+/news/comp/ar...@tequila.cs.yale.edu> wrote:
>y...@somehost.somedomain (stan lass) writes:
>> However, when the coming set top boxes are able to provide "Toy Story"
>> quality animation in real time, then further doublings in processor speed
>> will not be appreciated by most users.
>
>Hahaha !
>The ever-lasting predictions of "sufficient power" !
>Sure enough 640K are sufficient for everyone, right ?
>
>"Toy Story" in real-time is not here yet and when it will be here, we'll be
>looking forward to the democratization of something else.
>
> Stefan

Let me put out a bit of analysis that came to me while I was at the 1997
Computer Game Developers Conference. The subject was graphics speed. With
the oncoming onslaught of cheap and fast 3D, how will we know when we have
ENOUGH speed?

Let's take a quick side track to color palette depth. Do you need more
than 64K colors? Always? Usually? Sometimes? Never? In the case of
DOOM, the game is in 320x200. And since it has 64K *PIXELS*, it hardly
needs more than 64K colors.

What's that got to do with graphics speed? When the graphics pipeline
(CPU+ 3D hardware) hits about 60M polys/sec, that's enough. Why is that
enough?

Most big monitors display about a million pixels. 0.768M and 1.280M pixels
are the common numbers. And they display those pixels at least 60 times a
second. So 60M polys a second is around the limiting case; you can't show
polygons any faster than you can show pixels.
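
As a rough check on the "about 60M" figure, the pixel rate is just
width times height times refresh rate. A small sketch in Python,
assuming the two monitor sizes meant are 1024x768 and 1280x1024 (only
the pixel counts are given above) and a 60 Hz refresh; pixel_rate is a
name invented for the example.

```python
def pixel_rate(width, height, refresh_hz=60):
    """Pixels scanned out per second at the given refresh rate."""
    return width * height * refresh_hz

# Assumed resolutions behind the two "common numbers".
for w, h in ((1024, 768), (1280, 1024)):
    print(f"{w}x{h}: {pixel_rate(w, h) / 1e6:.1f}M pixels/sec")
```

That gives roughly 47M and 79M pixels a second, which brackets the 60M
figure.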

When we get "enough" performance in one category, something else always
pops up to use what's available.

(The game AI crowd is waiting for this day so that they can get a bigger
chunk of CPU away from the graphics engines.)

---
Neil Kirby DoD #0783 n...@lucent.com
Lucent Technologies - Home of Bell Labs Innovations
Bell Labs Columbus OH USA +1 (614) 860-5304

Zalman Stern

May 19, 1997, 3:00:00 AM

Chris...@arm.remove_this_part_when_replying.com wrote:
: Neil Kirby <n...@atlas.cb.lucent.com> wrote:
: >Let's take a quick side track to color palette depth. Do you need more
: >than 64K colors? Always? Usually? Sometimes? Never? In the case of
: >DOOM, the game is in 320x200. And since it has 64K *PIXELS*, it hardly
: >needs more than 64K colors.

Provided you get to choose each of the 64k colors.

: This is a little bogus. If you're doing a colour gradient, 16 bit
: colour leaves a 'stepping' effect, unless you error-correct, which may
: not be practical for fast moving games. 24 bit colour modes do not
: have this problem, and neither do paletted modes, providing you don't
: want to use more than 256 colours.

16-bit color typically does not go through a color palette. Rather it
allocates a fixed number of bits to each of red, green, and blue. Hence the
banding in gradients. This is why the visual quality of 24-bit color is
noticeably better than 16-bit. For general purpose graphics, arranging a
64k color palette is considered quite a hassle, but I expect that could be
dealt with. Whether there is enough performance gain and cost savings over
just using 24-bit is another matter.
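
The banding can be seen with a few lines of Python. This is only a
sketch: it assumes the common 5-6-5 split of the 16 bits (5-5-5 is also
used), and quantize_565 is a made-up helper, not any particular card's
behaviour.

```python
def quantize_565(r, g, b):
    """Truncate 8-bit RGB to 5-6-5 bits, then expand back to 8 bits."""
    r5, g6, b5 = r >> 3, g >> 2, b >> 3
    # Replicate the high bits into the low bits so full white stays 255.
    return (r5 << 3 | r5 >> 2, g6 << 2 | g6 >> 4, b5 << 3 | b5 >> 2)

# A 256-step grey ramp, as a 24-bit display could show it.
ramp = [(v, v, v) for v in range(256)]
banded = [quantize_565(*px) for px in ramp]

red_levels = len({px[0] for px in banded})
green_levels = len({px[1] for px in banded})
print(red_levels, green_levels)
```

The 256-step ramp collapses to 32 red and blue levels and 64 green
levels, so a smooth gradient shows visible steps unless it is dithered.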

-Z-


Zalman Stern

May 19, 1997, 3:00:00 AM

Przemek Klosowski (prz...@rrdjazz.nist.gov) wrote:
: Seeing that they
: worked hard on FX!32 (for WinNT) and em86 (for Linux), they might be
: able to implement such dynamic translators in hardware...

I seriously question whether experience doing static x86->Alpha translation
in software (FX!32 is a static translator coupled with an x86 ISA
interpreter) has much relevance to doing dynamic hardware "translation" of
x86 instructions. Especially since one has much more "flexibility" in how
to do things if the CPU microarchitecture can be modified to improve x86
execution. (And you can bet that Intel will be designing with good x86
execution in mind. So to be competitive with a purely hardware solution,
one would have to add at least some x86 specific stuff to the core CPU.)

On the other hand, I expect there are some folks at DEC who have experience
designing hardware that runs ugly legacy CISC ISAs fast :-)

Yes, this might be a way of forcing Intel to hand over dollars and a cross
licensing agreement at the same time. But it might be a lot of things. The
simplest answer is that DEC management thinks they have a case and are
going for it. (As they must, being liable to shareholders and whatnot.)

-Z-

Tyson Richard DOWD

May 20, 1997, 3:00:00 AM

y...@somehost.somedomain (stan lass) writes:

>I actually have a serious point. Of the five senses that humans
>have, computers relate mainly to the visual and auditory senses.
>Of the two, the visual sense is by far the most computationally
>demanding. When computer generated animation is nearly
>indistinguishable from real life movies, e.g. current movies
>without computer generated special effects, then further
>improvement won't be much appreciated by most users. (I'm
>assuming that the images would be displayed on an HDTV monitor.)

You're only worried about image display. For a long time, people have
been impressed with a computer imitating reality - it didn't really
matter that it was just a ball bouncing or a teapot dancing. When you
can't tell the difference anymore - when the images and sound are of
sufficient quality to fool the senses, content will start to matter a lot
more. If all you want to do is watch digital movies, you might not
appreciate any further improvement. But I suspect people will want to
play basketball with that ball, take personal dancing lessons from the
teapots, and change the plot of Toy Story - yet still get a satisfying
ending. There's still going to be a market for more powerful machines -
even if they are only going to be used for games. It's just that less
and less power will need to be used creating a realistic display - more
and more on creating a realistic experience.

(At least, I hope so. I couldn't think of anything more useless than
just watching digital movies with all that CPU power).


Chris...@arm.remove_this_part_when_replying.com

May 20, 1997, 3:00:00 AM

In article <zalmanEA...@netcom.com>,

Zalman Stern <zal...@netcom.com> wrote:
>
>16-bit color typically does not go through a color palette. Rather it
>allocates a fixed number of bits to each of red, green, and blue. Hence the
>banding in gradients. This is why the visual quality of 24-bit color is
>noticeably better than 16-bit. For general purpose graphics, arranging a
>64k color palette is considered quite a hassle,

Not to mention the number of colour registers you'd need!

>but I expect that could be dealt with. Whether there is enough
>performance gain and cost savings over just using 24-bit is another
>matter.

That's not the only consideration. Obviously 24 bit gives you the
best of all worlds when it comes to colour, although it may be
marginally slower. Another consideration is resolution though. Memory
constraints could well mean that you get more pixels if you elect to
use 16 bit colour.

Neil Kirby

May 20, 1997, 3:00:00 AM

In article <5lps3m$h...@sis.armltd.co.uk>,
<Chris...@arm.remove_this_part_when_replying.com> wrote:
>In article <5lpnr0$q...@nntpb.cb.lucent.com>,

>Neil Kirby <n...@atlas.cb.lucent.com> wrote:
>>
>>Let's take a quick side track to color palette depth. Do you need more
>>than 64K colors? Always? Usually? Sometimes? Never? In the case of
>>DOOM, the game is in 320x200. And since it has 64K *PIXELS*, it hardly
>>needs more than 64K colors.
>
>This is a little bogus. If you're doing a colour gradient, 16 bit
>colour leaves a 'stepping' effect, unless you error-correct, which may
>not be practical for fast moving games. 24 bit colour modes do not
>have this problem, and neither do paletted modes, providing you don't
>want to use more than 256 colours.

Damn, I did not say it well enough to get the point across.

There is never a need to have more colors selected into the palette than
the display has pixels. More colors don't help, because you can not show
more colors than you have pixels. DOOM limits at 64K colors. My windows
desktop never needs more than 1.28M colors.

So a limiting case for color depth is pixel count.
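
The bound is easy to check: a frame of N pixels can hold at most N
distinct colours, whatever the bit depth. A minimal Python sketch, using
a randomly filled 320x200 frame of 24-bit values as stand-in data (the
frame contents and the distinct_colors helper are invented for the
illustration).

```python
import random

def distinct_colors(frame):
    """Number of different colour values actually present in a frame."""
    return len(set(frame))

random.seed(0)
pixels = 320 * 200
# Stand-in frame: one random 24-bit colour value per pixel.
frame = [random.getrandbits(24) for _ in range(pixels)]
print(pixels, distinct_colors(frame))
```

Note that this bounds only how many colours appear at once; it says
nothing about which ones a given frame will need.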

(And likewise, the limiting case for polygons per second is the pixel per
second bandwidth. Can not show more polys than you have pixels)

Tom Womack

May 20, 1997, 3:00:00 AM

Neil Kirby (n...@atlas.cb.lucent.com) wrote:

: What's that got to do with graphics speed? When the graphics pipeline
: (CPU+ 3D hardware) hits about 60M polys/sec, that's enough. Why is that
: enough?

: Most big monitors display about a million pixels. 0.768M and 1.280M pixels
: are the common numbers. And they display those pixels at least 60 times a
: second. So 60M polys a second is around the limiting case; you can't show
: polygons any faster than you can show pixels.

I've just visited the Silicon Graphics RealityCentre. They've got a 4 megapixel
display (3840 x 1024), and a technology which requires you to render the scene
several times (4 or 5) to get the reflection effects correct. They've also
got very impressive filtering (bicubic interpolation?), which I doubt is cheap
in processor speeds.

So make that 1G polys a second.
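
Spelling out the arithmetic in Python, as a sketch only: it assumes the
same 60 Hz refresh used in the quoted post and counts one unit of work
per pixel per rendering pass; samples_per_second is a hypothetical
helper name.

```python
def samples_per_second(width, height, refresh_hz, passes):
    """Pixels touched per second when the scene is rendered several times."""
    return width * height * refresh_hz * passes

# 3840 x 1024 display, 4 or 5 passes per frame, 60 Hz assumed.
for passes in (4, 5):
    rate = samples_per_second(3840, 1024, 60, passes)
    print(f"{passes} passes: {rate / 1e9:.2f} G/sec")
```

That works out to about 0.94G and 1.18G a second, before counting the
filtering.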

Tom

Neil Kirby

May 20, 1997, 3:00:00 AM

In article <3380c3bd...@news.wam.umd.edu>,
Robert Rodgers <kn...@acm.org> wrote:

>n...@atlas.cb.lucent.com (Neil Kirby) wrote:
>>Let's take a quick side track to color palette depth. Do you need more
>>than 64K colors? Always? Usually? Sometimes? Never? In the case of
>>DOOM, the game is in 320x200. And since it has 64K *PIXELS*, it hardly
>>needs more than 64K colors.
>
>Unless you're generating and changing the entire palette on a frame by
>frame basis, this is ridiculous. You might as well go with 64k
>colors.
>
>As to whether 64k colors is enough, well, actually, no. (Uh oh,
>someone will call me radical for this, but) 64k colors really *isn't*
>enough for nice lighting effects at higher resolutions. 64k shows
                                  ^^^^^^^^^^^^^^^^^^^^^
>obvious banding even with dithering and only gives you 32 shades of
>grey. For games with dark, dank atmospheres with fog and lighting
>effects, 24 bit would be a real improvement, assuming it worked and
>wasn't a performance killer. Hey, if the world was all sun and
>fruitloops, we wouldn't need no stinkin' fog & haze.

The start of this discussion was something along the line of, "when is it
enough?" The case of the 640K RAM limit was cited as an example of how
easy it is to set a limit that seems larger than any foreseeable need.
And we'll only need a dozen computers in America :-)

My point is that on some issues, we can see an end case now, even if we
are not there yet.

In general, we don't need more colors than we have pixels.

In this case, technology has caught up with the current end case. 24
bit color depth exceeds the pixel count of the monitor. The color
depth is ready for the next few waves of display resolution growth.

In general, we don't need to display polygons faster than we can display
pixels.

As shown below, we aren't there yet. But that is not to say that we
can not characterize the end case. As long as the graphics engines get
faster at a rate higher than monitors grow resolution, we can posit
that the limiting case will be achieved.

>We also need a higher fill rate (it's not hard to swamp 3dfx, and 3dfx
>is probably going to represent the performance lead until sometime
>early in 1998), faster geometry, and higher resolutions (which in turn
>means higher fill rate, faster geometry.. It goes on and on.)

I concur. I know that the Real3D people would like to de-throne 3dfx, and
that many others are taking their best shot too.

>>What's that got to do with graphics speed? When the graphics pipeline
>>(CPU+ 3D hardware) hits about 60M polys/sec, that's enough. Why is that
>>enough?
>

>60M polys/sec doesn't mean anything, and isn't coming any time soon,
>anyway, to PCs near you.

I humbly submit that 60M polys/sec is a "line in the sand." It may be
drawn on a beach far, far from where the crowd is. But that does not say
that it does not exist. Or that no one is interested in crossing that
line. At first it will cost an arm and a leg. Later it will get cheap.

GLQuake on SGI runs 1600x1280 at 60 frames a second. That's about 120M
pixels a second; I don't know what the poly count is. I have faith that
the 3D accelerator people and Intel will soon get that onto the PC.

[snip]


>Graphics demands for 3d games are just beginning -- they haven't even
>*begun* to ramp up yet.

Agreed. But this time we have a clue or two about when we have enough.

Back of the envelope musings (on 60M pixels/sec) show that..

10 pixels/poly need 6M polys/sec.
100 pixels/poly need 600K polys/sec.
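
The same musings as a Python sketch. The overdraw parameter is an
addition for illustration (average number of times each displayed pixel
gets drawn); with the default of 1 it reproduces the two figures above.
polys_needed is a made-up name.

```python
def polys_needed(pixel_rate, pixels_per_poly, overdraw=1.0):
    """Polygons per second needed to cover a given displayed pixel rate."""
    return pixel_rate * overdraw / pixels_per_poly

PIXEL_RATE = 60e6  # the 60M pixels/sec figure used above

for area in (10, 100):
    print(f"{area} pixels/poly: {polys_needed(PIXEL_RATE, area):,.0f} polys/sec")
```

With a hypothetical overdraw of 3, the 100 pixels/poly case rises from
600K to 1.8M polys/sec.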

I saw 750K polys/sec on a viewgraph at the Computer Game Developers
Conference as the target for a couple years out.

The game is afoot.

Mike Froggatt

May 20, 1997, 3:00:00 AM

In article <5lsa9o$3...@nntpb.cb.lucent.com>,

Neil Kirby <n...@atlas.cb.lucent.com> wrote:
>So a limiting case for color depth is pixel count.

This is true for cases where colour palettes are used, but it is rare,
certainly in the PC sphere, to use palettes when pixel depth goes
above 8 bits. As others have remarked, when some form of direct colour
is used then large pixel depths may be required (though changing the
colour encoding to something other than RGB may give acceptable
results for smaller pixel depths).

> (And likewise, the limiting case for polygons per second is the
> pixel per second bandwidth. Can not show more polys than you have
> pixels)

Again, as has been said, this ignores overdraw and transparency
issues. Scan line algorithms and their ilk can help with overdraw, but
for such systems you still have to process every poly to see whether
it needs to be displayed - this may require substantial bandwidth,
particularly for degenerate cases with many small polys.

Mike Froggatt
Thin Film Microelectronics Group
Cambridge University Engineering Department

sys...@niuhep.physics.niu.edu

May 20, 1997, 3:00:00 AM

a...@netcom.com (Tony Tribelli) writes:
>ri...@alpha.delta.edu (Rich Adams) wrote:

>> - DEC put 1 year into research and preparation for this litigation
>> and has every intention of pushing Intel back into a technological
>> stone age. Thus opening a very large market for Alpha.

>A theoretically large market that will probably go to x86 clones and
>PowerPC, not to Alpha despite its technical superiority.

In the meantime Digital can move with one less competitor in the high
end market.

And since a big chunk of their business is also in x86s, kicking
Intel where it hurts will give Intel's competitors a leg up, increasing
competition, not only for CPUs but also for motherboard chips.

(e.g. if AMD has a bit more maneuvering room in the CPU arena,
their current try at producing motherboard chip sets is more likely
to succeed)

More competition means cheaper chips for Digital to put in its
computers. It also means Intel has slightly less to spend on
R&D to compete. (maybe)

specify the e-mail address below, my reply-to: has anti-spam added to it
Mor...@physics.niu.edu
Real Men change diapers

sys...@niuhep.physics.niu.edu

May 20, 1997, 3:00:00 AM

Andy Glew <gl...@cs.wisc.edu> writes:

>This post is personal. I am presently a graduate student
>at the University of Wisconsin, but I remain affiliated with Intel,
>work there in the summers, and intend to return to work at Intel
>when I finish my Ph.D.

>Digital is accusing me of stealing their inventions. Righteous anger
>compels me to make some minimal counter-statements. Because I am not
>presently at Intel, the lawyers have not yet told me to shut up,
>although I am sure that they soon will.

>This post is personal.

Andy,

1) Did you get mail from Digital?
2a) Were you present at the non-disclosure meetings with Digital?
 b) If so, did you talk to anybody about the ideas presented
    at those meetings?

3a) Did you work on the design of the Pentium and PPro?
 b) If so, were you aware of the patents that Digital held?

If 1) occurred as a result of your posts... you are certainly
justified.

If not, and if 2b and 3b are not true/do not apply, then cool down
and realize the difference between corporate responsibility
and individual actions.

Robert

sys...@niuhep.physics.niu.edu

May 20, 1997, 3:00:00 AM

David Chase <"mylastname "@ world dot std . com> writes:

>As for Palmer's "sinister case", I don't buy it. Intel would have
>to be foolish beyond my imagination to intentionally rip off Digital.
>Failure for person A to connect with person B, yes; simultaneous
>development, yes; intentional rip-off, no way on earth. They've
>got too much to lose, and the risk is too high.

Never underestimate the stupidity that can be caused by arrogance.

(e.g. their public comment that they had nothing left to steal)

>David Chase

Jon Leech

May 20, 1997, 3:00:00 AM

In article <5lsc3e$3...@nntpb.cb.lucent.com>,

Neil Kirby <n...@atlas.cb.lucent.com> wrote:
>In article <3380c3bd...@news.wam.umd.edu>,
>Robert Rodgers <kn...@acm.org> wrote:
>>60M polys/sec doesn't mean anything, and isn't coming any time soon,
>>anyway, to PCs near you.
>
>I humbly submit that 60M polys/sec is a "line in the sand." It may be
>drawn on a beach far, far from where the crowd is. But that does not say
>that it does not exist. Or that no one is interested in crossing that
>line. At first it will cost an arm and a leg. Later it will get cheap.

And no longer be interesting or particularly relevant. For example, once
you start doing complicated per-sample effects, with multiple texture
lookups and procedural shading computations at each sample, it can easily
completely dominate traditional geometry processing and rasterization costs.
Likewise global illumination.

15 years ago, the hot performance figure was vectors/second. But few
people care about line drawing performance today, and few people will care
about Gouraud shaded polygons in another 15 years.

A better analogy is a bar that keeps being raised, not a line in the
sand.

Jon
__@/

Jeroen T. Vermeulen

May 20, 1997, 3:00:00 AM

In article <5lgarp$8qa$2...@hecate.umd.edu> no_...@Glue.umd.edu (David T. Wang) writes:

> DEC alleges that this is a "willful" infringement, that Intel architects
> knowingly stole DEC Patents.

David,

I'm not saying that you stepped on my foot on purpose, but by now you ought to
have noticed what you're standing on. If you don't get off now, you're
obviously standing on my foot on purpose!

And so it is with patents.


Jeroen


Look, I'm not after sex or anything but... PLEASE go to bed with me!

Zalman Stern

May 20, 1997, 3:00:00 AM

Neil Kirby (n...@atlas.cb.lucent.com) wrote:
[...]
: My point is that on some issues, we can see an end case now, even if we
: are not there yet.

Given certain assumptions, your reasoning is valid; however, the specific
computer graphics arguments you posted are somewhat invalid for actual
graphics practice.

: In general, we don't need more colors than we have pixels.
[...]

Just as we often "over allocate" the number of bits processors support in
logical addresses to support more flexible partitioning of the address
space and less frequent remapping, we must also over allocate the number of
bits used for color values in many situations. The particular case of "how
many color values do we need?" varies depending on the output device and
desired quality. 16-bit per component color (48 or 64-bits per pixel) is
commonly used in film industry applications and high-end prepress work.

-Z-

Greg Limes

May 20, 1997, 3:00:00 AM

(Subject changed to reflect changed subject :)

n...@atlas.cb.lucent.com (Neil Kirby) writes:
>
> In general, we don't need more colors than we have pixels.
>

> In this case, technology has caught up with the current end case. 24
> bit color depth exceeds the pixel count of the monitor. The color
> depth is ready for the next few waves of display resolution growth.

You are confusing "how many different colors can you
display at the same time" with "how many different
colors could this pixel be".

> In general, we don't need to display polygons faster than we can display
> pixels.

No, but you might need to toss out a whole lot of hidden
polygons, which is going to take some CPU time
*somewhere* in the process.

And you might need to paint some pixels more than once,
depending on your drawing algorithms. For instance,
floodfill the background then paint visible polygons in
depth order ... but I'm not really a graphics expert, I
just run into them in the hallways, and sometimes they
mumble stuff like this.

> >60M polys/sec doesn't mean anything, and isn't coming any time soon,
> >anyway, to PCs near you.
>
> I humbly submit that 60M polys/sec is a "line in the sand."

Hmmm. I wonder what rate our current high end is
doing. Technology does seem to trickle down at a fairly
predictable rate ...

> GLQuake on SGI runs 1600x1280 at 60 frames a second. That's 120M pixels
> a second, I don't know what the poly count is. I have faith that the 3D
> accelerator people and intel will soon get that onto PC.

That's probably a good stake in the ground; but when PCs
can match SGI's current generation, I'm sure game
developers will find even more fancy things to want to
do with graphics.

And, what about the demands of volume rendering? One of
those future tech toy shows (Beyond 2000, I think) did a
thing on volume rendering, and to my untrained eye, it
looks like if things go that direction, it is going to
take a big chunk of bandwidth. Even a 100x100x100
display, which would be rather a toy, is already 1M
voxels; bring it up to something with some precision --
say, 1024x1024x1024 -- and you start getting into some
serious crunch. 1G voxels by 60hz, the ante goes up by a
factor of a thousand. That should keep things going for
at least a few more years, before we hit the plateau.
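
The voxel counts are quick to check in Python. A sketch, assuming one
update per voxel per frame at 60 Hz; voxels_per_second is a name
invented for the example.

```python
def voxels_per_second(side, refresh_hz=60):
    """Voxel updates per second for a cubic display refreshed every frame."""
    return side ** 3 * refresh_hz

small = 100 ** 3    # the "toy" display
large = 1024 ** 3   # the higher precision display
print(small, large, large // small)
print(voxels_per_second(1024))
```

So the larger cube holds a little over a thousand times as many voxels
as the toy one, and at 60 Hz it needs about 64G voxel updates a second.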

And I have no idea what the computational load would be
if you managed to build a holographic display (no, not
the Star Trek kind, just somehow being able to generate
a real hologram, in real time), but I know enough math
to stick my neck out and predict that we're looking at a
big load of floating point per frame. The letters F-F-T
drift across my vision ...

> Neil Kirby DoD #0783 n...@lucent.com
> Lucent Technologies - Home of Bell Labs Innovations
> Bell Labs Columbus OH USA +1 (614) 860-5304

When there is nothing left to invent, we will know it,
then we will find out we were wrong when someone comes
along and invents some more stuff.

--
Greg Limes using alternate account to deflect SPAM
delivery of email to this account may be delayed.

H.W. Stockman

May 20, 1997, 3:00:00 AM

I recently heard that Digital had rights to the
song, "He's So Fine", and that Intel had purchased
the former Beatle's song "My Sweet Lord" (I was
quite surprised at the latter, thinking Apple had
purchased Apple). Anyone who sings these two songs
can see the common tune. Is this similarity the
basis of the current Digital suit?

sys...@niuhep.physics.niu.edu

May 20, 1997, 3:00:00 AM

In article <5lfj4e$3k8$1...@news.iastate.edu>, jo...@iastate.edu (John Hascall) writes:

>Tony Tribelli <a...@netcom.com> wrote:
>}ri...@alpha.delta.edu (Rich Adams) wrote:
>}> - DEC put 1 year into research and preparation for this litigation
>}> and has every intention of pushing Intel back into a technological
>}> stone age. Thus opening a very large market for Alpha.

>}A theoretically large market that will probably go to x86 clones and
>}PowerPC, not to Alpha despite its technical superiority.

> What do you bet DEC just entered a very cozy agreement with AMD...

Well, they did just contract to buy lots of K6s for their PCs.

Robert Rodgers

May 21, 1997, 3:00:00 AM

n...@atlas.cb.lucent.com (Neil Kirby) wrote:
>In general, we don't need more colors than we have pixels.

Do you want to show me how you're going to *pre-compute* a 64k+ entry
palette for every frame of a real-time 3d scene?

> In this case, technology has caught up with the current end case. 24
> bit color depth exceeds the pixel count of the monitor. The color
> depth is ready for the next few waves of display resolution growth.
>

>In general, we don't need to display polygons faster than we can display
>pixels.

What about overdraw?

[..]

>GLQuake on SGI runs 1600x1280 at 60 frames a second. That's 120M pixels
>a second, I don't know what the poly count is.

Very, very low. Quake is a very "light" 3d app.

>Back of the envelope musings (on 60M pixels/sec) show that..
>
>10 pixels/poly need 6M polys/sec.
>100 pixels/poly need 600K polys/sec.

What about overdraw? Zbuffer checks aren't free (I don't think they
even short-circuit on 3dfx, at least there is no apparent performance
advantage for front-to-back with Z).


Andrew Reilly

May 21, 1997, 3:00:00 AM

In article <5lrpkc$b...@sis.armltd.co.uk>,

Chris...@arm.remove_this_part_when_replying.com writes:
> In article <zalmanEA...@netcom.com>,
> Zalman Stern <zal...@netcom.com> wrote:
>>
>>but I expect that could be dealt with. Whether there is enough
>>performance gain and cost savings over just using 24-bit is another
>>matter.
>
> That's not the only consideration. Obviously 24 bit gives you the
> best of all worlds when it comes to colour, although it may be
> marginally slower. Another consideration is resolution though. Memory
> constraints could well mean that you get more pixels if you elect to
> use 16 bit colour.

I read some time ago that it was the intensity levels that
the eye was most sensitive to, and that one could get all
the colour accuracy of 24 bits with a 16 bit hue/
saturation/intensity scheme that used seven or eight bits
for intensity at full resolution but shared the hue and
saturation bits over adjacent pixels.

Is anyone doing this?

Presumably it makes the frame-buffer filling logic more
complicated, because I guess most software thinks in RGB,
but it could allow useful 16-bit pixels rather than packed
24 or padded 32-bit pixels.
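
A rough Python sketch of the shared-chroma idea. It swaps in a
luma/colour-difference encoding (the usual 0.299/0.587/0.114 luma
weights) for the hue/saturation/intensity scheme described above, and
the layout of two pixels in four bytes is an assumption made for the
illustration; to_ycbcr, clamp8 and pack_pair are made-up names.

```python
def to_ycbcr(r, g, b):
    """Convert 8-bit RGB to luma plus two colour-difference values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr

def clamp8(x):
    """Round to the nearest integer and clamp to 0..255."""
    return max(0, min(255, round(x)))

def pack_pair(p0, p1):
    """Pack two RGB pixels into 4 bytes: own luma each, shared chroma."""
    y0, cb0, cr0 = to_ycbcr(*p0)
    y1, cb1, cr1 = to_ycbcr(*p1)
    return bytes([clamp8(y0), clamp8(y1),
                  clamp8((cb0 + cb1) / 2), clamp8((cr0 + cr1) / 2)])

packed = pack_pair((200, 30, 30), (180, 40, 40))
print(len(packed) * 8 // 2, "bits per pixel")
```

Each pixel keeps 8 bits of intensity, while the pair shares 16 bits of
colour, which averages out to 16 bits per pixel.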

--
Andrew Reilly: rei...@zeta.org.au +61 2 9559 5029 (h)
36 Acton St. Senior DSP Design Engineer +61 2 9233 8655 (w)
Hurlstone Park Lake DSP Pty Ltd Fax: +61 2 9233 8656 (w)
NSW 2193 Australia GPO Box 4067, Sydney 2001 and...@lake.com.au

Tom Womack

May 21, 1997, 3:00:00 AM

Greg Limes (net...@straylight.engr.sgi.com) wrote:


: And, what about the demands of volume rendering? One of
: those future tech toy shows (Beyond 2000, I think) did a
: thing on volume rendering, and to my untrained eye, it
: looks like if things go that direction, it is going to
: take a big chunk of bandwidth. Even a 100x100x100
: display, which would be rather a toy, is already 1M
: voxels; bring it up to something with some precision --
: say, 1024x1024x1024 -- and you start getting into some
: serious crunch. 1G voxels by 60hz, the ante goes up by a
: factor of a thousand.

That's not much of a problem. I've seen a RealityMonster rendering
the Visible Human on-screen at 30Hz, 1280x1024. The Visible Human
is something like 2000x500x300 voxels.

That has quite a lot of hardware assistance (understatement of the
%long_length_of_time), but seemed to let you do clever things with
transparency and clip-planes, as well as rendering the outside of
the model.

Tom

Chris...@arm.remove_this_part_when_replying.com

May 21, 1997, 3:00:00 AM

In article <5lub19$6...@gurney.zeta.org.au>,

Andrew Reilly <rei...@zeta.org.au> wrote:
>
>I read some time ago that it was the intensity levels that
>the eye was most sensitive to, and that one could get all
>the colour accuracy of 24 bits with a 16 bit hue/
>saturation/intensity scheme that used seven or eight bits
>for intensity at full resolution but shared the hue and
>saturation bits over adjacent pixels.
>
>Is anyone doing this?

I don't know if anyone's doing this, but the good old Amiga had/has a
graphics mode called Hold And Modify (HAM), which allows you to get
an 18 bit colour range from 8 bits per pixel, using a clever system
which sacrifices horizontal chrominance resolution but tends to keep
the intensity resolution. This type of display takes more computing
power to plot, but the results can look better than 16 bit in only
half the memory, and vastly better than a traditional paletted 8 bit
display. If anyone is interested I can post the details of how it works.
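
As a rough illustration of the hold-and-modify idea, here is a simplified sketch. The byte layout is an assumption made for readability (control code in the top two bits, data in the low six) and is not the Amiga's actual bitplane arrangement; the palette and function name are hypothetical.

```python
# Simplified hold-and-modify decoder for one scanline.
# Assumptions (illustrative, not the exact Amiga bitplane layout):
#   - each pixel is one byte: top 2 bits = control code, low 6 bits = data
#   - control 0 loads a whole colour from a 64-entry palette
#   - controls 1, 2, 3 hold the previous colour and replace blue, red, green
#   - components are 6 bits each (0..63), giving an 18-bit colour range

def decode_ham_line(pixels, palette, start=(0, 0, 0)):
    """Return a list of (r, g, b) tuples, 6 bits per component."""
    r, g, b = start
    out = []
    for p in pixels:
        control = (p >> 6) & 0x3
        data = p & 0x3F
        if control == 0:
            r, g, b = palette[data]   # set: take the whole colour from the palette
        elif control == 1:
            b = data                  # modify blue, hold red and green
        elif control == 2:
            r = data                  # modify red, hold green and blue
        else:
            g = data                  # modify green, hold red and blue
        out.append((r, g, b))
    return out

palette = [(i, i, i) for i in range(64)]            # hypothetical grey ramp
line = [0x00 | 10, 0x80 | 63, 0xC0 | 5, 0x40 | 20]  # set, mod red, mod green, mod blue
print(decode_ham_line(line, palette))
```

The sketch shows where the chrominance resolution goes: an arbitrary colour change that is not in the palette can take up to three pixels to complete, one component at a time.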

Ian Kemmish

unread,
May 21, 1997, 3:00:00 AM5/21/97
to

In article <5lub19$6...@gurney.zeta.org.au>, and...@gurney.zeta.org.au says...

>
>In article <5lrpkc$b...@sis.armltd.co.uk>,
> Chris...@arm.remove_this_part_when_replying.com writes:
>> In article <zalmanEA...@netcom.com>,
>> Zalman Stern <zal...@netcom.com> wrote:
>>>
>>>but I expect that could be dealt with. Whether there is enough
>>>performance gain and cost savings over just using 24-bit is another
>>>matter.
>>
>> That's not the only consideration. Obviously 24 bit gives you the
>> best of all worlds when it comes to colour, although it may be
>> marginally slower. Another consideration is resolution though. Memory
>> constraints could well mean that you get more pixels if you elect to
>> use 16 bit colour.
>
>I read some time ago that it was the intensity levels that
>the eye was most sensitive to, and that one could get all
>the colour accuracy of 24 bits with a 16 bit hue/
>saturation/intensity scheme that used seven or eight bits
>for intensity at full resolution but shared the hue and
>saturation bits over adjacent pixels.

Actually, Hollywood will generally use 36 bits for RGB scanning and printing
(although you can get down to just 24 bits for internal processing by being
sneaky with your colour correction curves).

Printers, of course, will use at least 32 bits for colour information, more if
they use more than four inks (six is becoming the new standard for higher
quality colour work) or more than 8 bits per colour....

Both of these industries are cost-conscious, and wouldn't use the extra bits if
they weren't needed a measurable amount of the time:-)

============================================================================
Ian Kemmish 18 Durham Close, Biggleswade, Beds SG18 8HZ
i...@five-d.com Tel: +44 1767 601 361 Fax: +44 1767 312 006
Info on Jaws and 5D's other products on http://www.five-d.com/5d
============================================================================
`Save string while you're young. Then when you're older, you'll have a ball.'


Bernd Paysan

unread,
May 21, 1997, 3:00:00 AM5/21/97
to

Zalman Stern wrote:
> : In general, we don't need more colors than we have pixels.
> [...]
>
> Just as we often "over allocate" the number of bits processors support in
> logical addresses to support more flexible partitioning of the address
> space and less frequent remapping, we must also over allocate the number of
> bits used for color values in many situations. The particular case of "how
> many color values do we need?" varies depending on the output device and
> desired quality. 16-bit per component color (48 or 64-bits per pixel) is
> commonly used in film industry applications and high-end prepress work.

Hm, I think it's easier to convince him. If you have as many colors as
pixels, you just need a colormap (to get all the color values you need
to satisfy human demands) and no screen memory. Why no screen memory?
Since each element on the screen has a color of its own, it points to a
fixed location in the colormap. You have the colormap in RAM (24 bits
per color or whatever you feel necessary), and the screen memory is
obsolete.
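
A tiny sketch of that argument, on a hypothetical 4x3 screen with made-up colour values: when pixel i always points at colormap entry i, the index buffer adds nothing.

```python
# Sketch of the argument: with one colormap entry per pixel, the index
# stored for pixel i is always i, so the index buffer carries no information.
WIDTH, HEIGHT = 4, 3   # tiny hypothetical screen

# One (r, g, b) entry per pixel; the values are arbitrary illustration data.
colormap = [(x * 60, y * 80, 0) for y in range(HEIGHT) for x in range(WIDTH)]
screen_memory = list(range(WIDTH * HEIGHT))   # pixel i points at entry i

def displayed_via_indices():
    """What the display shows when it looks each pixel up through screen memory."""
    return [colormap[i] for i in screen_memory]

def displayed_direct():
    """What the display shows when it reads the colormap in pixel order."""
    return list(colormap)

print(displayed_via_indices() == displayed_direct())
```

In other words, the colormap has become the frame buffer.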

--
Bernd Paysan
"Late answers are wrong answers!"
http://www.informatik.tu-muenchen.de/~paysan/

Greg Limes

unread,
May 21, 1997, 3:00:00 AM5/21/97
to


wom...@ox.compsoc.org.uk (Tom Womack) writes:

Sounds like "volume rendering" has more meanings than I thought.

I'm considering the gizmo that sweeps a screen through a volume, and as
it does, portions of the screen are illuminated (or not) by a
laser. True 3D, from any angle, since you really *are* placing a bit of
light at a particular location in 3D space. Walk around the display
volume to see things from other angles.

The only downside I can see is, you can't do hidden *anything* because
you can only add light to the tank, but not opaque things to block a
given voxel from being seen from certain angles; the images shown were
some wireframe, and an Air Traffic Control display application.

Aaron Spink

unread,
May 22, 1997, 3:00:00 AM5/22/97
to

n...@atlas.cb.lucent.com (Neil Kirby) writes:


> [snip]
> >Graphics demands for 3d games are just beginning -- they haven't even
> >*begun* to ramp up yet.
>
> Agreed. But this time we have a clue or two about when we have enough.
>
> Back of the envelope musings (on 60M pixels/sec) show that..
>
> 10 pixels/poly need 6M polys/sec.
> 100 pixels/poly need 600K polys/sec.
>
> I saw 750K polys/sec on a viewgraph at the Computer Game Developers
> Conference as the target for a couple years out.
>
> The game is afoot.

The problem with this is that in almost all cases you will have more
triangles generated than the number that are actually displayed. If
you want to take this to the extreme, we'll just take a look at what
you would need to render my office. I have two monitors that both have
a black background and are slightly reflective, therefore we have to
render everything that they could see. I have a window which will
also require environment mapping.

It doesn't matter if we are just rendering what I myself can see, we
still have to render it all because of the environment mapping. Add in
transparent polys and other such things and it gets even higher. I
don't think we will have enough power until we can create in real time
scenes that are more realistic than those generated by both RenderMan
and the Radiance ray tracer. We have a long way to go, and we aren't
getting there any sooner. As far as pixels per second go, things will
reach a plateau, but I don't want to think about the resolution of my
35" flat monitor in, oh say, 2010. I'd guess around 10,000 x 10,000,
but that is after all just a guess.

Oh yeah, for the 60M pixels/sec you forgot to add in the effects of
anti-aliasing. I think that for a 1280x1024 screen, I would want
around 160-200 MP/S at least.
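
For anyone who wants to play with these figures, a rough sketch follows. The 60 Hz refresh and the single "depth" multiplier standing in for anti-aliasing and overdraw are my assumptions for illustration, not measured numbers.

```python
# Rough fill-rate arithmetic for the figures quoted in this post.
# Assumptions: 60 Hz refresh, and a "depth" factor standing in for
# anti-aliasing samples and overdraw (both are illustrative knobs).

def polys_per_second(pixels_per_second, pixels_per_poly):
    """Polygon rate needed to cover a given pixel rate at a given polygon size."""
    return pixels_per_second / pixels_per_poly

def fill_rate(width, height, hz=60, depth=1.0):
    """Pixels written per second for a width-by-height screen."""
    return width * height * hz * depth

print(polys_per_second(60e6, 10))    # 6,000,000 polys/sec
print(polys_per_second(60e6, 100))   # 600,000 polys/sec

base = fill_rate(1280, 1024)         # 78,643,200 pixels/sec with no overdraw
print(base)
print(160e6 / base, 200e6 / base)    # roughly 2.0x to 2.5x the base rate
```

So 160-200 MP/s at 1280x1024 amounts to budgeting a little over two writes per displayed pixel per frame under these assumptions.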

Andy Newman

unread,
May 22, 1997, 3:00:00 AM5/22/97
to

rei...@zeta.org.au writes:
>I read some time ago that it was the intensity levels that
>the eye was most sensitive to, and that one could get all
>the colour accuracy of 24 bits with a 16 bit hue/
>saturation/intensity scheme that used seven or eight bits
>for intensity at full resolution but shared the hue and
>saturation bits over adjacent pixels.

Just like TV. YUV with sub-sampled chroma.

>Is anyone doing this?

Yep (see above).
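
A minimal sketch of the idea, assuming JFIF-style full-range YCbCr coefficients, chroma averaged over each pair of pixels, and a Y0, Cb, Y1, Cr byte order. The function names are illustrative, and broadcast TV uses its own scaling.

```python
# Sketch: two RGB pixels share one chroma pair, 32 bits for the pair (16 bits/pixel).
# Assumptions: JFIF-style full-range YCbCr coefficients, chroma averaged over
# the two pixels, bytes ordered Y0, Cb, Y1, Cr. Names are illustrative.

def clamp8(v):
    """Round to the nearest integer and clamp into 0..255."""
    return max(0, min(255, int(round(v))))

def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to unrounded luma and offset chroma values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def pack_pair(rgb0, rgb1):
    """Pack two RGB pixels into four bytes: full-resolution luma, shared chroma."""
    y0, cb0, cr0 = rgb_to_ycbcr(*rgb0)
    y1, cb1, cr1 = rgb_to_ycbcr(*rgb1)
    return bytes([clamp8(y0), clamp8((cb0 + cb1) / 2),
                  clamp8(y1), clamp8((cr0 + cr1) / 2)])

packed = pack_pair((200, 200, 200), (50, 50, 50))
print(len(packed) * 8 // 2)   # average bits per pixel
print(list(packed))
```

Intensity keeps eight bits at full resolution; the colour information is what gets shared between neighbours.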

--
Andy Newman <an...@research.canon.com.au>

Alberto C Moreira

unread,
May 22, 1997, 3:00:00 AM5/22/97
to

In article <Pine.PMDF.3.91.970522034442.633379443A-
100...@alpha2.curtin.edu.au>, zrep...@alpha2.curtin.edu.au says...

> On 20 May 1997, Neil Kirby wrote:
>
> > >As to whether 64k colors is enough, well, actually, no. (Uh oh,
> > >someone will call me radical for this, but) 64k colors really *isn't*
> > >enough for nice lighting effects at higher resolutions. 64k shows
>
> > In general, we don't need more colors than we have pixels.
>
> This is pretty safe most of the time I'd think.

The problem is the gradients. At 565 64k colors, you have at most six bits
for green, and five bits for red and blue. Therefore you can only paint up
to 64 shades of green on a screen before you run out of colors. On a 24- or
32-bit/pixel system, you have up to 256 shades of red, green or blue, which
is significantly better. You need as many shades of each primary color as
your eye can distinguish, or else there will be cases where banding will be
apparent.
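
A small sketch of the shade counts, assuming the 5-6-5 word is formed by simply dropping the low bits of each 8-bit channel (other quantisers are possible):

```python
# Sketch: quantise 8-bit-per-channel colour to 5-6-5 and count surviving shades.
# Assumption: low bits are simply truncated; dithering would change the picture.

def pack565(r, g, b):
    """Pack 8-bit r, g, b into a 16-bit 5-6-5 word by dropping low bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def shades_in_ramp(channel):
    """Distinct 16-bit values in a 256-step ramp of one primary colour."""
    ramp = []
    for v in range(256):
        rgb = [0, 0, 0]
        rgb[channel] = v
        ramp.append(pack565(*rgb))
    return len(set(ramp))

print(shades_in_ramp(0), shades_in_ramp(1), shades_in_ramp(2))   # 32 64 32
```

A smooth red or blue ramp across the screen therefore lands in only 32 bands.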


Alberto.


J. D. McDonald

unread,
May 22, 1997, 3:00:00 AM5/22/97
to


The problem with any of these attempts at information-reduction is
that sometimes they fail. They are OK for pictorial images that
you just look at, but will always fail for some case or other
where you examine the fine details.

It's better just to say "do it right, don't lose any information".

In audio the case is far worse: no one has been able to come
up with **any** information reduction scheme that is inaudible.
Oh, sure, they can come up with lots of schemes that lots of
people in lots of tests can't tell from un-information-reduced
signals, but they can't fool golden ears who are allowed to
choose the test signals and ancillary equipment. There are
lots of "golden ears" out there who can "hear" differences that don't
exist, but few that will miss those that do!

Doug McDonald

Brian Drummond

unread,
May 22, 1997, 3:00:00 AM5/22/97
to

David Chase <"mylastname "@ world dot std . com> wrote:

>Andy Glew wrote:
>>
>> Digital claims to have invented multiprocessor cache consistency,
>> in particular cache protocols using altered (M) states.
>
>Remember (those of you who may think that these patents are
>obviously obvious) that what matters is the filing date of
>the patent. Was the patent obvious THEN? Is there any work
>which predates it? In this case, the filing date was March
>1988.
>
About the same time as the Motorola 88100/88200 chipset appeared,
with its multiprocessor cache consistency support?

- Brian


Paul Repacholi (prep)

unread,
May 22, 1997, 3:00:00 AM5/22/97
to

On 20 May 1997, Neil Kirby wrote:

> >As to whether 64k colors is enough, well, actually, no. (Uh oh,
> >someone will call me radical for this, but) 64k colors really *isn't*
> >enough for nice lighting effects at higher resolutions. 64k shows

> In general, we don't need more colors than we have pixels.

This is pretty safe most of the time I'd think.

> In this case, technology has caught up with the current end case. 24
> bit color depth exceeds the pixel count of the monitor. The color
> depth is ready for the next few waves of display resolution growth.

The problem here is that the _effective_ depth depends on the colour...
Show a near pure primary colour, and you have an 8 bit display, a near
pure secondary colour, and it is 16 bits. Also, the depth changes rapidly
and non-linearly around the primary and, to a lesser degree, secondary
colours.
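
One simplified reading of this, as a sketch: near a pure primary only one channel is free to vary, near a secondary two are. It treats channels as either fully free or fixed, which ignores the non-linear transition mentioned above, and the function name is only illustrative.

```python
import math

# Sketch: effective colour depth when only some channels are free to vary.
# Assumption: a channel is either fully free (all levels usable) or fixed.

def effective_bits(free_channels, bits_per_channel=8):
    """Bits needed to number every colour reachable with the free channels."""
    levels = (2 ** bits_per_channel) ** free_channels
    return math.log2(levels)

print(effective_bits(1))   # near a pure primary
print(effective_bits(2))   # near a pure secondary
print(effective_bits(3))   # all three channels in play
```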

~Paul

