Cell Documents
Newsgroups: comp.arch
From: Del Cecchi <cecchinos...@us.ibm.com>
Date: Thu, 25 Aug 2005 14:00:50 -0500
Local: Thurs, Aug 25 2005 3:00 pm
Subject: Cell Documents
For those interested in Cell, there is a document dump at

http://www-128.ibm.com/developerworks/power/cell/

for a little bedtime reading.  I think you have to register, but it is
free.

--------------------------------------------------------------------
Download: Cell Broadband Engine documentation

  The following papers define the Cell specification and will be posted
to the IBM Semiconductor Solutions Technical Library in September.
Readers with a current IBM ID are invited to see them early and gain
access to participate in the Power Architecture™ zone's Cell discussion
forum. If you are not already a registered user, you can register now
(Note: Registered users will need to sign in to download).

Cell architecture from 20,000 feet
A high-level description of the Cell Broadband Engine (CBE), the
Synergistic Processing Elements (SPEs), and how they work together.
2 pages, 27KB | HTML (no registration required)

Cell Broadband Engine Architecture V1.0
Like the Power Architecture, but different -- the CBE Architecture
builds upon knowledge contained in the Power Architecture "books" and
describes the app-level User Mode Environment (UME) and the OS-level
Privileged Mode Environment (PME) in astonishingly rich detail.
327 pages, 4.51MB | PDF (registration required)

Synergistic Processor Unit (SPU) Instruction Set Architecture V1.0
Somewhere between a general-purpose processor and special-purpose
hardware lies the Cell SPU: designed to provide leadership performance
in game, media, and broadband applications, this document describes the
Application Binary Interface (ABI) of the Synergistic Processor Unit
(SPU). Get to know all of its instructions.
30 pages, 1.89MB | PDF (registration required)

SPU Application Binary Interface Specification V1.3
Including low-level system and language binding information, information
on loading and linking, and coding examples, this specification defines
the system interface for SPU-targeted object files to help ensure
maximum binary portability across implementations.
37 pages, 357KB | PDF (registration required)

SPU Assembly Language Specification V1.2
Unleash the full processing power of the SPUs -- you know you want to!
This specification will prove an indispensable aid in your efforts as it
takes you on a carefully-worded journey describing SPU assembly-level
syntax and machine-dependent features for the GNU assembler (but serves
as an example specification for other SPU assemblers as well).
30 pages, 122KB | PDF (registration required)

SPU C/C++ Language Extensions V2.0
Describes the basic data types, operations on these data types, and
directives and program controls required by the CBE specification;
includes sample code.
98 pages, 462KB | PDF (registration required)

Cell forum
Technical discussion of these documents is going on now at the Cell
Architecture forum; comments and errata are welcome there also, or by
e-mail to CBE_Documentat...@us.ibm.com.
1 forum, HTML format (registration required)

--
Del Cecchi
"This post is my own and doesn't necessarily represent IBM's positions,
strategies or opinions."


 
Newsgroups: comp.arch
From: "Paul" <paulnospamletterjaynospambax...@hotmail.com>
Date: Thu, 25 Aug 2005 20:24:55 +0100
Local: Thurs, Aug 25 2005 3:24 pm
Subject: Re: Cell Documents
"Del Cecchi" <cecchinos...@us.ibm.com> wrote in message

news:3n6ir2F4bpfU1@individual.net...

> For those interested in Cell, there is a document dump at

> http://www-128.ibm.com/developerworks/power/cell/

> for a little bedtime reading.  I think you have to register, but it is
> free.

If you don't fancy registering, it's also available here:
http://cell.scei.co.jp/index_e.html

 
Newsgroups: comp.arch
From: Alexander Terekhov <terek...@web.de>
Date: Thu, 25 Aug 2005 22:14:04 +0200
Local: Thurs, Aug 25 2005 4:14 pm
Subject: Re: Cell Documents

Paul wrote:

[...]

> If you don't fancy registering, it's also available here:
> http://cell.scei.co.jp/index_e.html

Do you know the laws of Japan? ;-)

regards,
alexander.


 
Newsgroups: comp.arch
From: Terje Mathisen <terje.mathi...@hda.hydro.com>
Date: Fri, 26 Aug 2005 08:52:58 +0200
Local: Fri, Aug 26 2005 2:52 am
Subject: Re: Cell Documents

Paul wrote:
> "Del Cecchi" <cecchinos...@us.ibm.com> wrote in message
> news:3n6ir2F4bpfU1@individual.net...

>>For those interested in Cell, there is a document dump at

>>http://www-128.ibm.com/developerworks/power/cell/

>>for a little bedtime reading.  I think you have to register, but it is
>>free.

> If you don't fancy registering, it's also available here:
> http://cell.scei.co.jp/index_e.html

Thanks, I just got them all.

Naturally, I started reading the SPU asm manual, and that makes it
immediately obvious that this is a cpu directly targeted at MPEG style
video processing:

  absdb         Absolute difference of bytes
  avgb          Average bytes: dest = (a+b+1) >> 1 (MPEG interpolation)

  cg            Carry Generate: Target = carry out of (A+B)
  addx          Add word extended: Target = A+B+(Target & 1)

Notice the last one! It uses the least significant bit of each part of
the target register as input to an AddWithCarry operation, which means
that you need three read ports.

This pair of opcodes seems to me to be meant as building blocks for
extended/arbitrary precision calculations.
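
As a rough illustration of how a carry-generate and an add-with-carry-in could chain into a multi-word add, here is a minimal scalar C model. It is a sketch under assumptions: the helper names (carry_generate, add_extended, mp_add) are invented here, the limbs are plain 32-bit array elements, and the model ignores the fact that the real unit works on several words per register, so any movement of carries between word slots is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "Carry Generate": 1 if a + b overflows 32 bits, else 0. */
static uint32_t carry_generate(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 32);
}

/* Model of "Add Extended": a + b + (t & 1), where t is the old target. */
static uint32_t add_extended(uint32_t a, uint32_t b, uint32_t t)
{
    return a + b + (t & 1u);
}

/* Add two n-limb numbers (least significant limb first).
 * Returns the carry out of the most significant limb. */
uint32_t mp_add(uint32_t *sum, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Carry out of a + b, and carry out of adding the carry-in on top. */
        uint32_t c1 = carry_generate(a[i], b[i]);
        uint32_t c2 = carry_generate(a[i] + b[i], carry);
        /* The carry-in travels in the low bit of the "target" operand. */
        sum[i] = add_extended(a[i], b[i], carry);
        carry = c1 | c2;
    }
    return carry;
}
```

Calling mp_add on two 2-limb values treats them as 64-bit numbers; longer arrays extend the same loop to arbitrary precision.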

It has a full set of branch instructions that as a side-effect either
enable or disable interrupts, i.e. critical sections are supposed to be
handled this way.

It seems to handle sub-register size operations with a set of opcodes,
where one of a group of GenerateMask operations is used to generate an
input mask for a general shuffle operation.
...
There's a bunch of generalized three-input FMAC opcodes, all working on
SIMD data, like fnms (T = Acc - (a * b)).

It has frsqest and frest to generate approximate reciprocal square root
and reciprocal lookup values. However, these operations do not seem to
deliver results in a standard format, instead each resulting element
consists of two parts, a base and a step, so that a following fi
(Floating Interpolate) can improve upon the table lookup results.

I'm guessing you'd then want one NR iteration to get somewhere close to
IEEE single precision.
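
The Newton-Raphson step guessed at above can be sketched in scalar C as follows. This is only an illustration: the function names are made up, the starting estimate is passed in by the caller rather than coming from any table lookup, and the claim that one step is enough rests on the assumption (not stated in the manuals quoted here) that the interpolated estimate is already good to roughly half of single precision. Each step roughly doubles the number of correct bits.

```c
/* fnms-shaped helper: acc - a * b, matching "T = Acc - (a * b)". */
static float fnms(float a, float b, float acc)
{
    return acc - a * b;
}

/* One Newton-Raphson step for 1/a, starting from estimate x0:
 *   x1 = x0 + x0 * (1 - a * x0) */
float refine_recip(float a, float x0)
{
    float err = fnms(a, x0, 1.0f);      /* 1 - a*x0 */
    return x0 + x0 * err;
}

/* One Newton-Raphson step for 1/sqrt(a), starting from estimate y0:
 *   y1 = y0 + 0.5 * y0 * (1 - a * y0 * y0) */
float refine_rsqrt(float a, float y0)
{
    float err = fnms(a * y0, y0, 1.0f); /* 1 - a*y0*y0 */
    return y0 + 0.5f * y0 * err;
}
```

Note that both refinements are built from multiply-add shaped operations only, which is presumably why the three-input forms are handy here.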

The shufb (Shuffle bytes) opcode seems like a small extension to the
Altivec Permute: in addition to using 5 bits to select one of 32
possible input bytes, it can also specify three different immediate
values (0, 0x80 and 0xFF), which would be needed to make it work with
the GenerateMask operations mentioned above.
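
A byte-wise C model of such a shuffle might look like the sketch below. The control-byte encoding used here (which bit patterns yield 0x00, 0xFF and 0x80) is written from memory and should be treated as an assumption to check against the ISA document; the function name shuffle_bytes is invented, and the output buffer must not overlap the inputs.

```c
#include <stdint.h>

/* Model of a byte shuffle over two 16-byte registers a and b.
 * Assumed control encoding (verify against the ISA manual):
 *   10xxxxxx -> constant 0x00
 *   110xxxxx -> constant 0xFF
 *   111xxxxx -> constant 0x80
 *   otherwise the low 5 bits index the 32 bytes of a followed by b. */
void shuffle_bytes(uint8_t out[16], const uint8_t a[16],
                   const uint8_t b[16], const uint8_t ctl[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t c = ctl[i];
        if ((c & 0xC0) == 0x80) {
            out[i] = 0x00;
        } else if ((c & 0xE0) == 0xC0) {
            out[i] = 0xFF;
        } else if ((c & 0xE0) == 0xE0) {
            out[i] = 0x80;
        } else {
            uint8_t sel = c & 0x1F;           /* 0..31 */
            out[i] = (sel < 16) ? a[sel] : b[sel - 16];
        }
    }
}
```

With a control vector that is the identity except for a few positions pointing into b, the same routine inserts a byte, halfword or word into an existing register image, which is the sub-register use described above.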

All in all, a pretty general set of opcodes for SIMD data processing.
This is particularly obvious in the way each of the possible operations
has forms to work on either a set of input data (reg or immediate) or on
its complement. This saves a lot of bubble-introducing mask setup
operations, but is normally not considered to be required on a regular cpu.
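
To make the point about complement forms concrete, here is a tiny C sketch of a bitwise select built from an "AND with complement" step. The function names are hypothetical; the only claim is that having a & ~b as one operation removes the separate NOT that would otherwise be needed to prepare the inverted mask.

```c
#include <stdint.h>

/* "AND with complement": a & ~b in one step instead of NOT followed by AND. */
static uint32_t and_complement(uint32_t a, uint32_t b)
{
    return a & ~b;
}

/* Per-bit select: bits of x where mask is 1, bits of y where mask is 0. */
uint32_t select_bits(uint32_t x, uint32_t y, uint32_t mask)
{
    return (x & mask) | and_complement(y, mask);
}
```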

Terje
--
- <Terje.Mathi...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"


 
Newsgroups: comp.arch
From: Zeb <zeb_...@yahoo.com>
Date: Fri, 26 Aug 2005 07:51:26 GMT
Local: Fri, Aug 26 2005 3:51 am
Subject: Re: Cell Documents
One significant point is that single precision (32-bit) floating point
arithmetic is only available as round-to-zero mode. This may be fine for
some graphics algorithms, but for large scale computing it just won't
do. If you want round-to-nearest mode on CELL you'll have to go to
double precision, at half the throughput. At that point you are up
against commoner quad processors like POWER, etc.
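
To see why round-to-zero matters for long computations, the following C sketch compares a running sum with ordinary float adds against the same sum with every add truncated toward zero. The truncation is emulated in software on the host, not taken from any Cell toolchain, and the helper names are invented. The emulation assumes finite inputs whose exact sum fits in a double, which holds for the small demo values suggested below but not in general.

```c
#include <stdint.h>
#include <string.h>

/* Add two floats and round the result toward zero (software emulation). */
static float add_round_to_zero(float a, float b)
{
    double exact = (double)a + (double)b;   /* assumed exact, see lead-in */
    float r = (float)exact;                 /* host rounds to nearest here */
    int overshoot = (exact >= 0.0) ? ((double)r > exact)
                                   : ((double)r < exact);
    if (overshoot) {
        /* |r| > |exact|: step one ulp toward zero by decrementing the
         * bit pattern, which shrinks the magnitude for either sign. */
        uint32_t bits;
        memcpy(&bits, &r, sizeof bits);
        bits -= 1;
        memcpy(&r, &bits, sizeof r);
    }
    return r;
}

/* Sum n copies of x with ordinary float adds (host default rounding). */
float sum_nearest(float x, int n)
{
    volatile float s = 0.0f;  /* volatile: keep each partial sum a float */
    for (int i = 0; i < n; i++)
        s = s + x;
    return s;
}

/* Same sum, but every add truncates toward zero. */
float sum_truncated(float x, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s = add_round_to_zero(s, x);
    return s;
}
```

Summing 0.1f a million times both ways shows the effect: neither result is exact, but the truncated sum can only err in one direction, so its error accumulates instead of partly cancelling.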

 
Newsgroups: comp.arch
From: "Eric P." <eric_patti...@sympaticoREMOVE.ca>
Date: Fri, 26 Aug 2005 08:30:55 -0400
Local: Fri, Aug 26 2005 8:30 am
Subject: Re: Cell Documents

Terje Mathisen wrote:

> ...
> It has a full set of branch instructions that as a side-effect either
> enable or disable interrupts, i.e. critical sections are supposed to be
> handled this way.

Why would diddling interrupt enable as a branch side effect be a
benefit, as compared to the normal explicit disable & enable
instructions?

Eric


 
Newsgroups: comp.arch
From: Maynard Handley <nam...@name99.org>
Date: Fri, 26 Aug 2005 13:56:45 GMT
Local: Fri, Aug 26 2005 9:56 am
Subject: Re: Cell Documents
What I find absolutely bizarre (and not at all encouraging for the
future of Cell as a general purpose processor as IBM and Sony people
have occasionally claimed) is that there is STILL no document in this
lot that describes how to handle the very real issues of models for
handling the minuscule memory space available to each SPU.
I've said it before and will say it again; this is the Achilles' heel of
these beasts. In this day and age, people are simply not interested in
dicking around with segments, overlays and all that weird crap from the
80's. Sure, they will do it for games; I don't deny the value of this
part in games and game-like boxes (PVRs, DTVs, audio-mixing consoles and
so on), but a general purpose box (running, presumably, Linux), where I
care about all round performance --- I want Apache and MySQL and gcc and
perl and php to run fast --- I just don't see it.

My guess is that some of these "workstations" will ship with some very
specialized code on them that does one thing and one thing only well,
maybe H264 encode, maybe some bio algorithm, and that'll be the
face-saving exit strategy from this bizarre claim that never made sense
in the first place.


 
Newsgroups: comp.arch
From: Ricardo Bugalho <rbuga...@ibili.uc.pt>
Date: Fri, 26 Aug 2005 15:09:40 +0100
Local: Fri, Aug 26 2005 10:09 am
Subject: Re: Cell Documents
Hi,
peak double precision throughput in CELL is expected to be one tenth of
peak single precision throughput.


 
Newsgroups: comp.arch
From: Terje Mathisen <terje.mathi...@hda.hydro.com>
Date: Fri, 26 Aug 2005 20:54:36 +0200
Local: Fri, Aug 26 2005 2:54 pm
Subject: Re: Cell Documents

Eric P. wrote:
> Terje Mathisen wrote:

>>...
>>It has a full set of branch instructions that as a side-effect either
>>enable or disable interrupts, i.e. critical sections are supposed to be
>>handled this way.

> Why would diddling interrupt enable as a branch side effect be a
> benefit, as compared to the normal explicit disable & enable
> instructions?

It is only a benefit if this pair of instructions must be done a _lot_,
which is why I assume it is the intended way of operation.

Terje

--
- <Terje.Mathi...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"


 
Newsgroups: comp.arch
From: Christian Bau <christian....@cbau.freeserve.co.uk>
Date: Fri, 26 Aug 2005 21:30:28 +0100
Local: Fri, Aug 26 2005 4:30 pm
Subject: Re: Cell Documents
In article <pan.2005.08.26.14.09.35.720...@ibili.uc.pt>,
 Ricardo Bugalho <rbuga...@ibili.uc.pt> wrote:

> Hi,
> peak double precision throughput in CELL is expected to be one tenth of
> peak single precision throughput.

However, peak double precision throughput will be very easy to achieve,
while getting peak single precision throughput is really really hard.

 
Newsgroups: comp.lang.java.machine, comp.arch
From: "anonymous" <cpu16x1...@wmconnect.com>
Date: 26 Aug 2005 15:03:58 -0700
Local: Fri, Aug 26 2005 6:03 pm
Subject: Re: Cell Documents

I noticed while reading your response, about CELL's application
specific instruction set ( language extensions ) included with using
IBM's (Toshiba/IBM)  CELL processor, for MPEG and bit slice operations,
 in contrast,

VLIW SMP MPP FORTH is a hypothetical solution for the MIMD shared
memory problem, synchronizing data access AND maintaining memory cache
consistency.

These problems were both SIMPLY solved by using a ( SMP MPP )
matrix microchip ( VLIW FORTH ) microcode engine architecture,
similarly, as in the following references from Mr. Moore and myself,(
URLs, *SMP MPP VLIW for machine code Java, Forth, C, Scheme, etc.,
http://groups.google.com/group/comp.lang.java.machine/msg/b400d03ddc0...
  , *Java decode alternative,
http://groups.google.com/group/comp.lang.java.machine/msg/38236e7c426...
 ) ( Or, google usenet, ask The Senate, write IBM/Defense, request
copies of my notes, etc.,  for VLIW SMP MPP and FORTH information. )

The essence of VLIW SMP MPP FORTH is an efficient scalable parallel
microprocessor architecture, and, importantly, is for manufacturer
customized instruction, such like that of the new IBM/Toshiba CELL
processor.  ( only a few hundred of four thousand instruction openings
are defined, anyway, by me.)  Those open, undefined instructions,
provide an unlimited sub-expression reduction possibility.  ( If a
vertical market application needs a certain instruction feature, maybe,
for IBM or Intel, (
http://groups.google.com/group/comp.sys.ibm.pc.hardware.chips/msg/34d...
), FPU16, VID16, NET16, ..., CPU16s function as the traffic light
network for maybe a wide variety of vehicles, ( super scalable
architecture )),.

Simply, IBM/Toshiba CELL is easily outperformed, surpassed thru an
ultra high fabrication efficiency. ( Theory of co-divisional of
electronics and math expression limit(s),
http://groups.google.com/group/sci.math/msg/b5d2f119b8eeee56?dmode=so...
)

---

President Clinton is a jerk.


 
Newsgroups: comp.arch
From: Andrew Reilly <andrew-newsp...@areilly.bpc-users.org>
Date: Sat, 27 Aug 2005 09:42:38 +1000
Local: Fri, Aug 26 2005 7:42 pm
Subject: Re: Cell Documents

It would also make atomic OS-trap style things pretty easy to do.
When masking off interrupts is separate from branching, you often
(well, I've encountered it, anyway) have to muck about to check pipeline
latencies to make sure that the interrupts are truly off at the time you
branch.

Since the SPEs have unshared memory, interrupt masking is as much as you
should need for atomic operations.

Mind you: I didn't think that the SPEs were intended to be doing much
multiprogramming themselves: they've got quite heavyweight state (BIG
register file) and no memory protection.  They're clearly intended for
running single tasks to completion.

And then on the other hand, the obvious programming model is to operate
entirely within completion call-backs from the DMA engine that's running
the off-chip memory access program.  Maybe it's reasonable to code small
queue manipulation routines without doing a full state swap.
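
One common shape for that kind of streaming model is double buffering: fetch the next chunk while computing on the current one. The C sketch below only shows the control flow. Everything in it is hypothetical (the CHUNK size, the dma_get stand-in, the function names), and because memcpy completes immediately there is no real overlap of transfer and compute here.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 256  /* hypothetical local buffer size, in floats */

/* Stand-in for starting a transfer into local memory. A real DMA engine
 * would run asynchronously and signal completion later. */
static void dma_get(float *local, const float *remote, size_t n)
{
    memcpy(local, remote, n * sizeof *local);
}

/* Sum a large remote array while holding only two small local buffers. */
float streamed_sum(const float *remote, size_t total)
{
    static float buf[2][CHUNK];
    float sum = 0.0f;
    size_t done = 0;
    size_t n = (total < CHUNK) ? total : CHUNK;
    int cur = 0;

    dma_get(buf[cur], remote, n);               /* prefetch first chunk */
    while (done < total) {
        size_t next_off = done + n;
        size_t next_n = 0;
        if (next_off < total) {
            next_n = (total - next_off < CHUNK) ? total - next_off : CHUNK;
            dma_get(buf[cur ^ 1], remote + next_off, next_n);
        }
        for (size_t i = 0; i < n; i++)          /* work on current chunk */
            sum += buf[cur][i];
        done = next_off;
        n = next_n;
        cur ^= 1;                               /* swap buffers */
    }
    return sum;
}
```

In a callback-driven version, the body of the loop would become the completion handler and the "fetch next" call would be the last thing it does before returning.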

Interesting beastie, anyway.  I'd love to see some information about the
software architecture that they've obviously got in mind to hold
everything together.

--
Andrew


 
Newsgroups: comp.arch
From: Andrew Reilly <andrew-newsp...@areilly.bpc-users.org>
Date: Sat, 27 Aug 2005 09:50:07 +1000
Local: Fri, Aug 26 2005 7:50 pm
Subject: Re: Cell Documents

They could reasonably ship with an optimised BLAS/LAPACK/FFTPACK library
that made good use of them.  Then you'd just use it like a vector
supercomputer of some sort.  Probably get quite good performance on that
sort of program.

I saw an article recently where ClearSpeed had done something like this
with their "50GFlop" co-processor card: they wrote a library that was
compatible with the Intel Integrated Performance Primitives (IPP) numeric library.
Code written against that would "just work (faster)".

Years and years ago, that was the model for interaction with the various
AT&T DSP32C and i860 floating point accelerator cards that were available.

Not as flexible as getting your own inner loops sped-up, but lots of real
work can be done anyway.

--
Andrew


 
Newsgroups: comp.arch
From: Wes Felter <wes...@felter.org>
Date: Sat, 27 Aug 2005 02:10:00 GMT
Subject: Re: Cell Documents
On 2005-08-26 08:56:45 -0500, Maynard Handley <nam...@name99.org> said:

> What I find absolutely bizarre (and not at all encouraging for the
> future of Cell as a general purpose processor as IBM and Sony people
> have occasionally claimed) is that there is STILL no document in this
> lot that describes how to handle the very real issues of models for
> handling the minuscule memory space available to each SPU.

Apparently this is still a research topic:

http://www.research.ibm.com/cellcompiler/compiler-mem-abstract.htm

--
Wes Felter - wes...@felter.org - http://felter.org/wesley/


 
Newsgroups: comp.arch
From: "Eric P." <eric_patti...@sympaticoREMOVE.ca>
Date: Sat, 27 Aug 2005 09:28:27 -0400
Local: Sat, Aug 27 2005 9:28 am
Subject: Re: Cell Documents

The docs don't say anything about it helping with pipelines.
Typically if pipelines need to be drained that is done by the
hardware because otherwise it makes the software realllly
dependent on a specific hardware implementation. Not good.

It is also very unusual to manage interrupts this way.
Usually you want to save/push the current interrupt enable state
and disable, then restore the prior state. Yet I see no
ability to do this in the SPE. The only time interrupts
are ever explicitly enabled is during the boot sequence.
Doing explicit enables at any other time makes interrupt subroutines
impossible because the lower level routines reenable interrupts
when they should not. That is an interrupt management 101 mistake.
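
The nesting problem can be shown with a small simulation in C. Nothing here touches real interrupt hardware: irq_enabled is an ordinary variable standing in for the enable bit, and all function names are invented for the sketch.

```c
#include <stdbool.h>

static bool irq_enabled = true;   /* simulated interrupt-enable flag */

/* Explicit-enable style: leave the critical section by enabling always. */
static void inner_naive(void)
{
    irq_enabled = false;
    /* ... short critical section ... */
    irq_enabled = true;           /* wrong if the caller had them disabled */
}

/* Save/restore style: put back whatever state was found on entry. */
static void inner_saved(void)
{
    bool saved = irq_enabled;
    irq_enabled = false;
    /* ... short critical section ... */
    irq_enabled = saved;
}

/* A caller that is already inside its own critical section. Returns the
 * flag as seen right after the nested call, when the caller still
 * expects interrupts to be disabled. */
bool nested_state(bool use_saved)
{
    bool outer = irq_enabled;
    irq_enabled = false;
    if (use_saved)
        inner_saved();
    else
        inner_naive();
    bool seen = irq_enabled;
    irq_enabled = outer;
    return seen;
}
```

With the naive inner routine the caller finds interrupts enabled in the middle of its own critical section; with save/restore it does not.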

So I still don't see why it is designed this way.

> Since the SPEs have unshared memory, interrupt masking is as much as you
> should need for atomic operations.

> Mind you: I didn't think that the SPEs were intended to be doing much
> multiprogramming themselves: they've got quite heavyweight state (BIG
> register file) and no memory protection.  They're clearly intended for
> running single tasks to completion.

At a minimum a slave needs at least 1 Master triggered interrupt
so it can interrupt a running task to terminate it and cause the
slave to move to the next item in its work queue. This would happen
if a parent thread died in the master cpu or during code development
to abort an errant infinite loop. In practice this would be a
general Command Message From Master interrupt, and then the
message would say what to do.

You would also want debugging and single stepping capabilities
and need to be able to force a register bank dump and/or load.

There are also exceptions: floating point, maybe invalid address,
maybe integer overflow, stack overflow, etc.

So there are a variety of interrupts and traps even the simplest
slave processor needs.

> And then on the other hand, the obvious programming model is to operate
> entirely within completion call-backs from the DMA engine that's running
> the off-chip memory access program.  Maybe it's reasonable to code small
> queue manipulation routines without doing a full state swap.

It still needs an absolutely minimal monitor for control
and communications.

> Interesting beastie, anyway.  I'd love to see some information about the
> software architecture that they've obviously got in mind to hold
> everything together.

> --
> Andrew

Eric

 
Newsgroups: comp.arch
From: "Alex Gibson" <n...@alxx.net>
Date: Sun, 28 Aug 2005 14:59:49 +1000
Local: Sun, Aug 28 2005 12:59 am
Subject: Re: Cell Documents

"Maynard Handley" <nam...@name99.org> wrote in message

news:name99-0F2E35.06564226082005@localhost...

Sounds pretty similar to the existing Cradle MDSP chips.

www.cradle.com
http://www.cradle.com/documentation/datasheets.shtml
http://www.cradle.com/documentation/application_notes.shtml

http://www.cradle.com/downloads/CT3400_Datasheet_DS0209.pdf
http://www.cradle.com/downloads/3056_CT3600_pbV3.pdf

The CT3600 supposedly can handle 16 streams of MP4 SP L3  at once.

They give you semaphore registers in the PE (processing elements)
which are risc cores  which you can use to manage the DSE's ("dsp cores")

In the CT3400 chips you get one processing quad - 4 PE's and 8 DSE's
and one IO quad (2 PE's and 2 MTE's (memory transfer engines).

PEs are called GPPs in CT36xx chips.

Can only program the PEs in C; the rest is done using CLA, a C-like
assembly language.

Claim 29.5 billion MACs per second on 8-bit data for the CT3400.
Claim up to 96 giga MACs for the CT3616.

Alex


 
Newsgroups: comp.arch
From: Alexander Kjeldaas <astor-n...@fast.no>
Date: Mon, 29 Aug 2005 11:11:45 +0200
Local: Mon, Aug 29 2005 5:11 am
Subject: Re: Cell Documents

For one thing, the "Branch Indirect and Set Link if External Data" can
be very useful to reduce the cost of having "safe points" for
synchronization with a garbage collector.  If you know you will only be
interrupted by your GC at specific points, you can do all sorts of
"illegal" stuff with your registers between these points.

You get a safe point with one instruction in this case, compared to
load, compare, branch on "normal" architectures.  The cost is having to
reserve one register for the indirect branch address.
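
For comparison, the "normal" polled safe point (load, compare, branch) looks roughly like this in C. The flag, the counter and the function names are all hypothetical, and the collector is simulated by setting the flag from inside the loop.

```c
#include <stdbool.h>

static volatile bool gc_requested = false;  /* set by the collector */
static int safepoints_taken = 0;

/* Slow path: a real runtime would spill registers and publish roots here. */
static void gc_safepoint(void)
{
    safepoints_taken++;
    gc_requested = false;
}

/* Polled safe point: load the flag, compare, branch to the slow path. */
static inline void poll_safepoint(void)
{
    if (gc_requested)
        gc_safepoint();
}

/* A mutator loop with one safe point per iteration. */
int run_mutator(int iterations, int request_at)
{
    for (int i = 0; i < iterations; i++) {
        if (i == request_at)
            gc_requested = true;    /* simulate the collector's request */
        poll_safepoint();
    }
    return safepoints_taken;
}
```

The poll costs three instructions on every pass even when no collection is pending, which is the overhead a single conditional branch-on-external-event would fold into one.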

astor


 
Newsgroups: comp.arch
From: Joe Seigh <jseigh...@xemaps.com>
Date: Mon, 29 Aug 2005 08:46:50 -0400
Local: Mon, Aug 29 2005 8:46 am
Subject: Re: Cell Documents

RCU, which is a form of GC, has the concept of safe points.  They're called
quiescent states.  And the overhead is even lower, usually just a store
into local storage.  No call to another function for "safe point" processing.
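
A stripped-down picture of that quiescent-state bookkeeping, in C, is sketched below. It is single-threaded and deliberately naive: the names and NTHREADS are invented, and a real implementation needs per-CPU data, memory ordering and a proper grace-period state machine, none of which is shown.

```c
#include <stdbool.h>

#define NTHREADS 4   /* hypothetical number of readers */

static unsigned long qs_count[NTHREADS];   /* one counter per reader */

/* Reader side: announcing a quiescent state is one update of local data. */
void quiescent_state(int tid)
{
    qs_count[tid]++;
}

/* Updater side: record every reader's counter... */
void snapshot(unsigned long snap[NTHREADS])
{
    for (int i = 0; i < NTHREADS; i++)
        snap[i] = qs_count[i];
}

/* ...and later check that each reader has passed a quiescent state since. */
bool grace_period_elapsed(const unsigned long snap[NTHREADS])
{
    for (int i = 0; i < NTHREADS; i++)
        if (qs_count[i] == snap[i])
            return false;
    return true;
}
```

All the waiting and comparing lands on the updater; the reader never branches to a handler.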

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


 
Newsgroups: comp.arch
From: "Eric P." <eric_patti...@sympaticoREMOVE.ca>
Date: Mon, 29 Aug 2005 10:30:39 -0400
Local: Mon, Aug 29 2005 10:30 am
Subject: Re: Cell Documents

On closer inspection, the SPUs are simpler than a slave processor.
This is not a classic master slave asymmetric multiprocessor.
SPUs do not have exceptions nor does it appear they require their
own control program.

The PPE (master) has direct control over each SPU through 3
control/status registers. These allow the PPE to load/read an SPU
program counter, run/stop the SPU, and a status register shows
the reason why the SPU stopped (e.g. illegal instruction).
There is also a way for the PPE to single step an SPU.
SPUs run until there is a problem or the job is complete and
just stop. If anything goes wrong, a status bit indicates so and
the PPE must diagnose the problem.
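
The run-until-stopped model can be mimicked with a toy simulation in C. The register block, the status codes and the one-byte "instruction set" below are all invented for illustration and do not reflect the actual register layout or encodings in the architecture documents.

```c
#include <stdint.h>

/* Hypothetical status codes and register block, invented for this sketch. */
enum spu_status { SPU_RUNNING, SPU_STOPPED_DONE, SPU_STOPPED_ILLEGAL };

struct spu_ctl {
    uint32_t pc;              /* program counter, set and read by the PPE */
    int run;                  /* run/stop control */
    enum spu_status status;   /* reason the SPU stopped */
};

/* Toy SPU: each program byte is an opcode. 0 = stop, 1 = nop,
 * anything else is treated as an illegal instruction. */
static void spu_step(struct spu_ctl *c, const uint8_t *prog)
{
    uint8_t op = prog[c->pc];
    if (op == 1) {
        c->pc++;
        return;
    }
    c->run = 0;
    c->status = (op == 0) ? SPU_STOPPED_DONE : SPU_STOPPED_ILLEGAL;
}

/* PPE side: load the PC, set run, wait for the stop, read the status. */
enum spu_status ppe_run_job(struct spu_ctl *c, const uint8_t *prog,
                            uint32_t entry)
{
    c->pc = entry;
    c->status = SPU_RUNNING;
    c->run = 1;
    while (c->run)
        spu_step(c, prog);    /* real hardware runs by itself; we step it */
    return c->status;
}
```

After ppe_run_job returns, the caller inspects the status and the final pc to work out what happened, which mirrors the "status bit, then PPE diagnoses" flow described above.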

If an SPU job wants to take action based on any condition,
e.g. floating point underflow, then it must manually test
for the condition and branch to handler code.
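
A manual test of that kind might look like the following C sketch, which checks a product for underflow by hand and branches to "handler" code. The function names are hypothetical, and the handler here simply flushes the result to zero.

```c
#include <float.h>
#include <stdbool.h>

/* Multiply, then test by hand whether a product of nonzero operands fell
 * below the smallest normal float (underflow to a subnormal or to zero). */
bool mul_check_underflow(float a, float b, float *out)
{
    volatile float r = a * b;   /* volatile: force the result to float */
    float v = r;
    *out = v;
    bool tiny = (v > -FLT_MIN && v < FLT_MIN);
    return tiny && a != 0.0f && b != 0.0f;
}

/* The caller does the branching itself; no trap is taken. */
float mul_flush(float a, float b)
{
    float r;
    if (mul_check_underflow(a, b, &r))
        return 0.0f;            /* "handler code": flush to zero */
    return r;
}
```

Every such check costs instructions in the inner loop, which is the price of having no exceptions.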

The docs indicate there is some method for PPE to load and
unload a whole register context but don't say what it is.
This would only be needed for debugging as SPUs are not
intended to context switch, just run a single job to completion.

In such an architecture, that the SPU even supports interrupts seems
somewhat anomalous because the PPE is responsible for its control.
The architects may be thinking that an SPU can also act as a real
time controller and/or IO coprocessor. Needs further investigation.

Eric


 