How Many Processor Cores Are Enough?


Jon Forrest

Sep 27, 2006, 5:10:03 PM
Today I read that we're going to get quad-core processors
in 2007, and 80-core processors in 5 years. This has
got me to wondering where the point of diminishing returns
is for processor cores.

We've all seen those benchmarks in which the same test is
run on a system with 128MB of RAM, then 256MB, then 512MB, ...
What usually happens is that the first increase results in
a big improvement, the next increase in a smaller improvement,
and so on. At some point, the size of the improvement doesn't
justify its cost.

I suspect we'd see the same kind of results if we could
increase the number of processor cores. The big difference
here is that additional processor cores only help if there's
work for them to do. Modern operating systems use multiple
processes and threads but I wonder how many processes/threads
are runnable at any one time in a general purpose computer
running general purpose jobs.

Where do you think the point of diminishing returns might
be?

Jon Forrest

Casper H.S. Dik

Sep 27, 2006, 6:04:17 PM
Jon Forrest <for...@ce.berkeley.edu> writes:

>Today I read that we're going to get quad-core processors
>in 2007, and 80-core processors in 5 years. This has
>got me to wondering where the point of diminishing returns
>is for processor cores.

Sun has been shipping 8 core CPUs since, I think, late last year.

It all depends on the bandwidth. (Which means it ain't a pretty
picture for Intel as long as they keep the FSB)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Joe Seigh

Sep 27, 2006, 6:23:16 PM
Jon Forrest wrote:
> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.
>
> We've all seen those benchmarks in which the same test is
> run on a system with 128MB of RAM, then 256MB, then 512MB, ...
> What usually happens is that the first increase results in
> a big improvement, the next increase in a smaller improvement,
> and so on. At some point, the size of the improvement doesn't
> justify its cost.

That's called scalability.

>
> I suspect we'd see the same kind of results if we could
> increase the number of processor cores. The big difference
> here is that additional processor cores only help if there's
> work for them to do. Modern operating systems use multiple
> processes and threads but I wonder how many processes/threads
> are runnable at any one time in a general purpose computer
> running general purpose jobs.
>
> Where do you think the point of diminishing returns might
> be?
>

It all depends. The industry has a real chicken and egg problem.
Applications won't start adapting to the new architecture for a
while yet. And what the new architecture will actually end up
being isn't real clear. It could be shared memory, NUMA or otherwise.
It could be based on message passing. Or something else entirely.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


Dennis M. O'Connor

Sep 27, 2006, 7:01:50 PM
"Jon Forrest" <for...@ce.berkeley.edu> wrote ...

> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.

It depends on your application. Examples: A web server
like Apache can effectively use a lot more cores than
an old DOS game. But future games on the PC may be
able to effectively exploit every core you can provide them.
--
Dennis M. O'Connor dm...@primenet.com


Jon Forrest

Sep 27, 2006, 7:48:51 PM
to Dennis M. O'Connor

No doubt, but I'm talking about general purpose computing.

I bet there is a diminishing returns curve for web servers
too. The curve would look different at each site, but
it would exist.

Jon

Rick Jones

Sep 27, 2006, 8:24:27 PM
Casper H.S. Dik <Caspe...@sun.com> wrote:
> It all depends on the bandwidth. (Which means it ain't a pretty
> picture for Intel as long as they keep the FSB)

Is it really just a question of bandwidth? I would have thought that
application (I'm assuming the system vendors deal with the OSes)
behaviour would be equally important.

How different is having an FSB for a single socket with N cores on the
chip from having a "link" for a single-socket with N cores on the
chip?

I would think that as the cores per chip increase, the issues that the
folks selling large SMP's deal with will become known to the
single-socket crowd.

rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Chris Thomasson

Sep 27, 2006, 8:37:11 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:tuCdnVxRCuGpZIfY...@comcast.com...

> Jon Forrest wrote:
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
>>
>> We've all seen those benchmarks in which the same test is
>> run on a system with 128MB of RAM, then 256MB, then 512MB, ...
>> What usually happens is that the first increase results in
>> a big improvement, the next increase in a smaller improvement,
>> and so on. At some point, the size of the improvement doesn't
>> justify its cost.
>
> That's called scalability.

Indeed.

Applications should be designed to "automagically" adapt to a situation that
could involve more and more processors; we already know how to do it. I
would assume that a lot of people that frequent this group know how to do it
too.

Does the "average" programmer know how to do it? Damn!

I have a solution with a simple and fairly powerful API that controls my
virtually zero-overhead memory allocator and PDR synchronization logic; it's
as easy to use as POSIX Threads... It can possibly help so-called "normal"
programmers address this "so-called scalability and throughput" crisis the
major chip vendors are currently experiencing... Oh well...

Humm...

>> I suspect we'd see the same kind of results if we could
>> increase the number of processor cores. The big difference
>> here is that additional processor cores only help if there's
>> work for them to do. Modern operating systems use multiple
>> processes and threads but I wonder how many processes/threads
>> are runnable at any one time in a general purpose computer
>> running general purpose jobs.
>>
>> Where do you think the point of diminishing returns might
>> be?
>>
>
> It all depends. The industry has a real chicken and egg problem.
> Applications won't start adapting to the new architecture for a
> while yet.

You think you will live to see the day when major database vendors use PDR
and lock-free reader patterns for parallel queries; screw all of that
per-table, per-row, whatever locking crap for readers... Heck, writers can
even be lock-free, hurray for the lock-free offset trick!

?


> And what the new architecture will actually end up
> being isn't real clear. It could be shared memory, NUMA or otherwise.
> It could be based on message passing. Or something else entirely.

I would be happy with NUMA. As you know, the PDR solution is highly
adaptable. It basically can give you performance across different types of
cache coherency protocols. Therefore, I hope the model of the future has
many cores, very-deep pipelines, and fairly weak cache coherency
system(s)...


Rick Jones

Sep 27, 2006, 8:26:18 PM
Jon Forrest <for...@ce.berkeley.edu> wrote:
> I bet there is a diminishing returns curve for web servers too. The
> curve would look different at each site, but it would exist.

Yes - particularly if the web server was constrained to a single NIC -
although some NICs out there now can spread their interrupt load across
cores.

Increasing core counts won't do all that much for individual TCP
connections - getting very much parallelism in a single connection
isn't really possible.

rick jones
--
a wide gulf separates "what if" from "if only"

Terje Mathisen

Sep 28, 2006, 12:34:51 AM
Casper H.S. Dik wrote:
> Jon Forrest <for...@ce.berkeley.edu> writes:
>
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
>
> Sun has been shipping 8 core CPUs since, I think, late last year.
>
> It all depends on the bandwidth. (Which means it ain't a pretty
> picture for Intel as long as they keep the FSB)

That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
providing 20 MB (afair) directly to each core.

For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
everything in a nicely distributed manner, you're going to see _very_
impressive performance indeed, particularly since they also have a
(presumably very fast) mesh network connecting the individual cores.

Intel's press releases talk about aggregate bandwidth in the TB/s range
for this 80-core chip, from which we can calculate that each core must
have at least 12.5 GB/s.

Since the SRAM is directly attached, a 256-bit interface seems very
reasonable, in which case the SRAM can idle along at about 400 MHz.

Alternatively, a somewhat narrower interface running at higher frequency
would give the same result.

Seems reasonable to me!

And yes, I'd like to have one and see what I could do with it. :-)

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Dennis M. O'Connor

Sep 28, 2006, 1:50:12 AM
"Jon Forrest" <for...@ce.berkeley.edu> wrote i...

> Dennis M. O'Connor wrote:
>> "Jon Forrest" <for...@ce.berkeley.edu> wrote ...
>>> Today I read that we're going to get quad-core processors
>>> in 2007, and 80-core processors in 5 years. This has
>>> got me to wondering where the point of diminishing returns
>>> is for processor cores.
>>
>> It depends on your application. Examples: A web server
>> like Apache can effectively use a lot more cores than
>> an old DOS game. But future games on the PC may be
>> able to effectively exploit every core you can provide them.
>
> No doubt, but I'm talking about general purpose computing.

What's this "general purpose computing" you speak of ?
Is it Spider Solitaire ? MS Office ? Gnu CC ? Web browsing ?
Or is it protein folding ?

Nick Maclaren

Sep 28, 2006, 5:40:48 AM

In article <%CESg.422$fP5...@news.cpqcorp.net>,

Rick Jones <rick....@hp.com> writes:
|> Casper H.S. Dik <Caspe...@sun.com> wrote:
|> > It all depends on the bandwidth. (Which means it ain't a pretty
|> > picture for Intel as long as they keep the FSB)
|>
|> Is it really just a question of bandwidth? I would have thought that
|> application (I'm assuming the system vendors deal with the OSes)
|> behaviour would be equally important.

Yes and no. The bandwidth is definitely the leading bottleneck.

|> How different is having an FSB for a single socket with N cores on the
|> chip than having a "link" for a single-socket with N cores on the
|> chip?

Not at all.

|> I would think that as the cores per chip increase, the issues that the
|> folks selling large SMP's deal with will become known to the
|> single-socket crowd.

Yup. They are already hitting them.


But, to answer the question:

For most workstations, the answer is probably 4 (at least in the near
future - see later), because few workloads have more than a few genuinely
active threads. Very fancy graphics is another matter.

For servers and embarrassingly parallel HPC, the answer is until you have
saturated the bandwidth (modified by the demands of the applications).
Multiple cores is merely a cheaper form of multiple sockets.

For genuinely parallel, high communication, applications, the answer is
how parallelisable is your application? And the answer to THAT (outside
HPC) is generally "2 way, when I am lucky".

The last is not a law of nature, but isn't going to change any time soon,
as it is caused by the programming paradigms that people use.


Regards,
Nick Maclaren.
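
Nick's per-workload numbers are essentially Amdahl's law in disguise. A
minimal C sketch of the diminishing-returns curve, assuming a purely
illustrative 10% serial fraction (the figure is made up, not a measurement):

#include <stdio.h>

/* Amdahl's law: speedup(n) = 1 / (s + (1 - s)/n), where s is the
 * serial fraction of the workload and n is the core count. */
static double amdahl(double s, int n)
{
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void)
{
    const double s = 0.10;   /* assumed serial fraction */
    int n;

    /* With s = 0.10 the curve flattens fast: 4 cores give 3.1x,
     * 128 cores only 9.3x, and the asymptote is 1/s = 10x. */
    for (n = 1; n <= 128; n *= 2)
        printf("%3d cores: %.2fx\n", n, amdahl(s, n));
    return 0;
}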

Joe Seigh

Sep 28, 2006, 7:47:55 AM
Jon Forrest wrote:
> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.
...

> Where do you think the point of diminishing returns might
> be?
>

Rethinking this, the question should be what would you do
with an unlimited number of processors?

For one thing, the operating system would change. Interrupt
handlers for asynchronous interrupts would go away. You'd have
dedicated, possibly special purpose, processors to handle devices.
They're already talking about this with "coprocessors".

The scheduler would go away. No need for it when every thread has
its own dedicated hardware thread. This would affect realtime
programming. No need to play games with thread priorities and
any of the timeouts that could be caused by not being scheduled
quickly enough, i.e. no dispatch latency.

Polling and IPC mechanisms would have to be worked on a bit. E.g.
make things like MONITOR/MWAIT efficient. Possibly some new
instructions. The hw architects would have to be a little more
proactive here. The latest proposals from Intel seem to be a little
lacking here. What's with architectual extensions? It seems to be
a "ready to fight the last war" kind of thing. Who cares if you can
run a 20 year old application real fast.

Distributed algorithms would become more important. How do you
coordinate threads, and how do you do it efficiently from a hw
point of view?

Etc... (more stuff when I think of it)
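
As a rough illustration of the MONITOR/MWAIT point, a user-space C sketch
assuming x86 and an invented "doorbell" flag. MONITOR/MWAIT itself is
privileged on current hardware, so the portable user-level idiom is a
PAUSE-based spin; the intrinsics named in the comment are the ring-0
alternative:

#include <stdint.h>
#include <xmmintrin.h>   /* _mm_pause */

/* Hypothetical doorbell that a dedicated device-handling core watches.
 * volatile only stops the compiler from hoisting the load; a real
 * implementation would use proper atomics and fences. */
volatile uint32_t doorbell;

void wait_for_doorbell(void)
{
    while (doorbell == 0) {
        /* PAUSE tells the pipeline this is a spin-wait loop, cutting
         * power and yielding resources to a sibling hardware thread.
         * With ring-0 access, _mm_monitor(&doorbell, 0, 0) followed by
         * _mm_mwait(0, 0) would let the core sleep until the watched
         * cache line is written, instead of spinning at all. */
        _mm_pause();
    }
}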

Bill Todd

Sep 28, 2006, 8:58:28 AM
Terje Mathisen wrote:
> Casper H.S. Dik wrote:
>> Jon Forrest <for...@ce.berkeley.edu> writes:
>>
>>> Today I read that we're going to get quad-core processors
>>> in 2007, and 80-core processors in 5 years. This has
>>> got me to wondering where the point of diminishing returns
>>> is for processor cores.
>>
>> Sun has been shipping 8 core CPUs since, I think, late last year.
>>
>> It all depends on the bandwidth. (Which means it ain't a pretty
>> picture for Intel as long as they keep the FSB)
>
> That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
> providing 20 MB (afair) directly to each core.
>
> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
> everything in a nicely distributed manner, you're going to see _very_
> impressive performance indeed, particularly since they also have a
> (presumably very fast) mesh network connecting the individual cores.

Well, since IIRC the processing cores are running at a princely 1.91 MHz
(allegedly not a typo) I'm not sure how truly impressive that demo's
performance would be: perhaps better to wait for the real thing in
around 5 years' time.

As for the SRAM, I rather suspect that the 20 MB is the *total* figure
shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
anything like a single chip, we'd be seeing a lot more cache in Itanics.

- bill

Terje Mathisen

Sep 28, 2006, 11:25:32 AM
Joe Seigh wrote:
> Jon Forrest wrote:
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
> ...
>> Where do you think the point of diminishing returns might
>> be?
>>
>
> Rethinking this, the question should be what would you do
> with an unlimited number of processors?
>
> For one thing, the operating system would change. Interrupt
> handlers for asynchronous interrupts would go away. You'd have
> dedicated, possibly special purpose, processors to handle devices.
> They're already talking about this with "coprocessors".

You still need some way to handle async inter-core communication! I.e.
I believe that you really don't have any choice here, except to make
most of your cores interruptible.

This leads back to the old thread about having multiple cores which are
compatible but not symmetrical: I.e. some of them are optimized for long
timeslots doing stream/HPC/serious number crunching, using a
microarchitecture like the P4 which really doesn't like to be interrupted.

Other cores could be much more Pentium-like: Possibly superscalar, but
in-order, with very low branch miss penalty, and optimized for
twisty/branchy/hard to predict code.

As long as these cpus are compatible, an OS which knows that some
processes prefer to run on a given kind of cpu could do quite well, and
the programming task becomes _much_ easier than for a disjoint set as
used in the PPC/Cell combination.

> The scheduler would go away. No need for it when every thread has
> its own dedicated hardware thread. This would affect realtime
> programming. No need to play games with thread priorities and
> any of the timeouts that could be caused by not being scheduled
> quickly enough, i.e. no dispatch latency.

I believe you'd still need it, but not for anything that's time-critical.
I.e. after sufficient time with tens of cores/hundreds of threads
available, programming patterns to use/abuse them all will turn up, and
you'll run out of resources anyway. :-(

Terje Mathisen

Sep 28, 2006, 11:29:34 AM
Bill Todd wrote:

> Terje Mathisen wrote:
>> That 80-core Intel demo chip has a vertically mounted SRAM chip as
>> well, providing 20 MB (afair) directly to each core.
>>
>> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
>> everything in a nicely distributed manner, you're going to see _very_
>> impressive performance indeed, particularly since they also have a
>> (presumably very fast) mesh network connecting the individual cores.
>
> Well, since IIRC the processing cores are running at a princely 1.91 MHz
> (allegedly not a typo) I'm not sure how truly impressive that demo's
> performance would be: perhaps better to wait for the real thing in
> around 5 years' time.

I don't think we have any other option than to wait, no matter what the
current speed is.

However, if they really run at 2 MHz, then the claimed TB/s total
bandwidth seems totally bogus, even if they also include 4 sets of
cpu-cpu mesh links.


>
> As for the SRAM, I rather suspect that the 20 MB is the *total* figure
> shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
> anything like a single chip, we'd be seeing a lot more cache in Itanics.

Oops, you're almost certainly right. :-(

Oh, well. It was fun as long as the fantasy lasted. :-)

Eugene Miya

Sep 28, 2006, 11:56:51 AM
In article <efep6h$vbl$1...@agate.berkeley.edu>,
Jon Forrest <for...@ce.berkeley.edu> wrote:
>quad-core
>80-core
...

>Where do you think the point of diminishing returns might be?

Ask Dave Patterson on the 5th floor of Soda about PIMs.

--

Nick Maclaren

Sep 28, 2006, 11:52:36 AM

In article <e46tu3-...@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.m...@hda.hydro.com> writes:

|> Joe Seigh wrote:
|> >
|> > For one thing, the operating system would change. Interrupt
|> > handlers for asynchronous interrupts would go away. You'd have
|> > dedicated, possibly special purpose, processors to handle devices.
|> > They're already talking about this with "coprocessors".
|>
|> You still need some way to handle async inter-core communication! I.e.
|> I believe that you really don't have any choice here, except to make
|> most of your cores interruptible.

Nope. I posted a design that did away with that, at the hardware level,
and there have been lots of systems that proved you could do without
them at the software level. You wouldn't start from existing software
designs and interfaces, of course :-)


Regards,
Nick Maclaren.

Nick Maclaren

Sep 28, 2006, 11:54:22 AM

When are we going to see them, then?

Seriously, they have been talked about as imminent for 20 years, so
either there is a major problem or the IT industry is suffering a
collective failure of nerve. Or both.


Regards,
Nick Maclaren.

Joe Seigh

Sep 28, 2006, 12:19:16 PM
Terje Mathisen wrote:

> Joe Seigh wrote:
>> Rethinking this, the question should be what would you do
>> with an unlimited number of processors?
>>
>> For one thing, the operating system would change. Interrupt
>> handlers for asynchronous interrupts would go away. You'd have
>> dedicated, possibly special purpose, processors to handle devices.
>> They're already talking about this with "coprocessors".
>
>
> You still need some way to handle async inter-core communication! I.e.
> I believe that you really don't have any choice here, except to make
> most of your cores interruptible.

Async presumes processors are a scarce commodity and you want to have it
do other work while it's waiting for something to be done. That goes
away if you have unlimited numbers of processors.

>
>> The scheduler would go away. No need for it when every thread has
>> its own dedicated hardware thread. This would affect realtime
>> programming. No need to play games with thread priorities and
>> any of the timeouts that could be caused by not being scheduled
>> quickly enough, i.e. no dispatch latency.
>
>
> I believe you'd still need it, but not for anything that's timecritical.
> I.e. after sufficient time with tens of cores/hundreds of threads
> available, programming patterns to use/abuse them all will turn up, and
> you'll run out of resources anyway. :-(


The OP posed the question of whether you can have too many cores. You're
saying there will never be enough? :)

Nick Maclaren

Sep 28, 2006, 12:25:32 PM

In article <srKdnUtcqoTDaIbY...@comcast.com>,
Joe Seigh <jsei...@xemaps.com> writes:

|> Terje Mathisen wrote:
|> >
|> > You still need some way to handle async inter-core communication! I.e.
|> > I believe that you really don't have any choice here, except to make
|> > most of your cores interruptible.
|>
|> Async presumes processors are a scarce commodity and you want to have it
|> do other work while it's waiting for something to be done. That goes
|> away if you have unlimited numbers of processors.

Not really. Let's assume that you do have such an infinite number of
threads, and thread A wants to prod thread B at a time it is doing something
else. That can't be done without some form of asynchronicity.

As you may remember, my design used FIFOs for inter-thread communication.
That avoids the problem of interrupts, but is no less asynchronous.

TANSTAAFL.


Regards,
Nick Maclaren.

Mitch...@aol.com

Sep 28, 2006, 12:31:01 PM

Nick Maclaren wrote:
> In article <%CESg.422$fP5...@news.cpqcorp.net>,
> Rick Jones <rick....@hp.com> writes:
> |> Casper H.S. Dik <Caspe...@sun.com> wrote:
> |> > It all depends on the bandwidth. (Which means it ain't a pretty
> |> > picture for Intel as long as they keep the FSB)
> |>
> |> Is it really just a question of bandwidth? I would have thought that
> |> application (I'm assuming the system vendors deal with the OSes)
> |> behaviour would be equally important.
>
> Yes and no. The bandwidth is definitely the leading bottleneck.

Er, no: there are more problems that are latency limited than are
bandwidth limited.

However, the FSB is a similar impediment in both cases.

The whole reason that threading is taking off is the inability of
(implementable) computer (micro)architectures to reduce memory latency.
Threading is a means to tolerate that latency.

<snip>
> Regards,
> Nick Maclaren.

Regards,
Mitch Alsup

Gabriel Loh

Sep 28, 2006, 12:38:32 PM

> Where do you think the point of diminishing returns might
> be?

I haven't seen the topic come up in the parallel threads, but one
question that's really interesting to think about (at least to me) is
how in the world are you going to deliver power to all of these cores?
BTW, did anyone see if Intel mentioned the total power consumption for
their 80-core demo system? The wattage for the 80-core processor has to
be astronomical, or the power budget per core has to be anemic, or only
3 out of 80 cores can be turned on at any time.

just my two cents...

-Gabe

mike

Sep 28, 2006, 12:49:56 PM

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:_8mdnSlgO-FYKIbY...@comcast.com...

Do not stop with the OS kernel. Imagine an implementation of SQL
where a thread is spawned for every record in a table!

Mike Sicilian


Tommy Thorn

Sep 28, 2006, 1:26:15 PM
Bill Todd wrote:
> Well, since IIRC the processing cores are running at a princely 1.91 MHz

Well, you recall incorrectly. It's 3 GHz, cf.
http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html

Tommy

Joe Seigh

Sep 28, 2006, 1:31:52 PM
Nick Maclaren wrote:
> In article <srKdnUtcqoTDaIbY...@comcast.com>,
> Joe Seigh <jsei...@xemaps.com> writes:
> |> Terje Mathisen wrote:
> |> >
> |> > You still need some what to handle async inter-core communication! I.e.
> |> > I believe that you really don't have any choice here, except to make
> |> > most of your cores interruptible.
> |>
> |> Async presumes processors are a scarce commodity and you want to have it
> |> do other work while it's waiting for something to be done. That goes
> |> away if you have unlimited numbers of processors.
>
> Not really. Let's assume that you do have such an infinite number of
> threads, and thread A wants to prod thread B at a time it is doing something
> else. That can't be done without some form of asynchronicity.

Why would B be doing something else if you have an infinite number of threads
on hand to do that something else? This is just a "what if" exercise.

Ok, let me pose this question then. What are the limits to how many cores
one can imagine using? Are there fundamental laws of logic here, or are we the
next generation of punch card mentality?

Felger Carbon

Sep 28, 2006, 1:59:49 PM
"Jon Forrest" <for...@ce.berkeley.edu> wrote in message
news:efep6h$vbl$1...@agate.berkeley.edu...

> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.

A very few power users - say, 3 to 5 in the world - will be able to use lots
and lots of cores. The vast majority of the public will not run more than
one task at a time, which at this time means only one core. In the distant
future, applications that are in common use long enough to be called "legacy
programs" may have been written to use several cores, so at that time even
the general public will need several cores. But this is a long, long way
off.

In between, some workstations and their users can probably use a few cores,
as soon as the software is rewritten for multicores.


Rick Jones

Sep 28, 2006, 2:18:27 PM
Joe Seigh <jsei...@xemaps.com> wrote:
> Why would B be doing something else if you have an infinite number
> of threads on hand to do that something else? This is just a "what
> if" exercise.

> Ok, let me pose this question then. What are the limits to how many
> cores one can imagine using? Are there fundamental laws of logic
> here, or are we the next generation of punch card mentality?

Would the existing Internet be an approximation? It has what amounts
to a virtually unlimited number of "cores" (systems).

rick jones
--
web2.0 n, the dot.com reunion tour...

Nick Maclaren

Sep 28, 2006, 2:23:22 PM

In article <2KGdnWmuYvz-m4HY...@comcast.com>,

Joe Seigh <jsei...@xemaps.com> writes:
|> >
|> > Not really. Let's assume that you do have such an infinite number of
|> > threads, and thread A wants to prod thread B at a time it is doing something
|> > else. That can't be done without some form of asynchronicity.
|>
|> Why would B be doing something else if you have an infinite number of threads
|> on hand to do that something else? This is just a "what if" exercise.

Because, if you split the assignments to processors right down to
atomic (communication-free) units, you end up with a huge amount of
communication between processors. TANSTAAFL again.

|> Ok, let me pose this question then. What are the limits to how many cores
|> one can imagine using? Are there fundamental laws of logic here, or are we the
|> next generation of punch card mentality?

There are fundamental laws, but no particular limits. I can imagine using
millions of cores, possibly even thousands of millions for big tasks, but
the problem is communicating between them.


Regards,
Nick Maclaren.

Eugene Miya

Sep 28, 2006, 2:52:16 PM
In article <efgr7e$6oa$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>|> PIMs.
>
>When are we going to see them, then?

We? "What do you mean 'we?' white man?" --Tonto
I've seen them. I'm under an NDA.

>Seriously, they have been talked about as imminent for 20 years, so
>either there is a major problem or the IT industry is suffering a
>collective failure of nerve. Or both.

You have to locate the knowledgeable in your country.

--

Joe Seigh

Sep 28, 2006, 2:55:59 PM
Rick Jones wrote:

> Joe Seigh <jsei...@xemaps.com> wrote:
>
>>Ok, let me pose this question then. What are the limits to how many
>>cores one can imagine using? Are there fundamental laws of logic
>>here, or are we the next generation of punch card mentality?
>
>
> Would the existing Internet be an approximation? It has what amounts
> to a virtually unlimited number of "cores" (systems).
>

Google probably. Except they and the internet don't have anything
like shared memory for communication.

Thomas Womack

Sep 28, 2006, 2:50:24 PM
In article <efgu43$dv3$1...@news-int.gatech.edu>,

Gabriel Loh <my-las...@cc.gatech.edu> wrote:
>
>> Where do you think the point of diminishing returns might
>> be?
>
>I haven't seen the topic come up in the parallel threads, but one
>question that's really interesting to think about (at least to me) is
>how in the world are you going to deliver power to all of these cores?

>BTW, did anyone see if Intel mentioned the total power consumption for
>their 80-core demo system? The wattage for the 80-core processor has to
>be astronomical, or the power budget per core has to be anemic

1.5 watts is an extravagant power budget for something like an ARM
core, and those cores were less than four square millimetres, with a
quarter taken up by the router.

An ARM Cortex-A8 is 750MHz, three square millimetres in 65nm and
375mW; the ARM11 MPCore is 620MHz, 2.54 square millimetres in 90nm
(with 32 KB of cache) and 300mW. I'm slightly surprised that nobody's
made a load-of-ARMs chip even as a proof of concept.

Tom

Thomas Womack

Sep 28, 2006, 2:46:21 PM
In article <UJCdnbZAYsZoW4bY...@metrocastcablevision.com>,
Bill Todd <bill...@metrocast.net> wrote:
>Terje Mathisen wrote:

>> That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
>> providing 20 MB (afair) directly to each core.

I suspect it has 20MB in total, 256 KB per core; 1.6GB of SRAM on a
chip is not remotely feasible with current fabrication processes,
whilst 20MB on 300mm^2 is twice the density of Montecito, so just
about right for 65nm.

>Well, since IIRC the processing cores are running at a princely 1.91 MHz
>(allegedly not a typo) I'm not sure how truly impressive that demo's
>performance would be: perhaps better to wait for the real thing in
>around 5 years' time.

That was a different demo, with a stack of boards plugging into a
Socket 7 (Pentium) motherboard, running something capable of enough
x86 to run Windows XP on an FPGA; I believe it was the HDL code for
what Intel intend to use as the 'mini x86_64 core'.

Tom

Terje Mathisen

Sep 28, 2006, 3:36:38 PM
Joe Seigh wrote:

> Terje Mathisen wrote:
>> You still need some way to handle async inter-core communication!
>> I.e. I believe that you really don't have any choice here, except to
>> make most of your cores interruptible.
>
> Async presumes processors are a scarce commodity and you want to have it
> do other work while it's waiting for something to be done. That goes
> away if you have unlimited numbers of processors.

Unlimited? Yeah, in that case you can do a lot of stuff in new ways.

The problem is that I can't see any easy way around this: when you want
to do A, which depends on input from both B and C, the inputs might
arrive in any order.

It seems like the absolute minimum is to have a wait_for_any(...). The
alternative is to run around in a tight loop polling both B and C to
check if they have any data available, something which could cost you a
lot of memory bandwidth.
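
A minimal pthreads sketch of that wait_for_any() idea, with invented names
and just two producers; the consumer blocks on a single condition variable
instead of spin-polling B and C:

#include <pthread.h>

/* One mutex/condvar pair guards both input flags, so consumer A can
 * sleep until *either* producer delivers. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  any  = PTHREAD_COND_INITIALIZER;
static int ready_b, ready_c;

void deliver(int *flag)              /* called by producer B or C */
{
    pthread_mutex_lock(&lock);
    *flag = 1;
    pthread_cond_signal(&any);
    pthread_mutex_unlock(&lock);
}

int wait_for_any(void)               /* called by consumer A */
{
    int which;

    pthread_mutex_lock(&lock);
    while (!ready_b && !ready_c)
        pthread_cond_wait(&any, &lock);   /* no busy polling */
    which = ready_b ? 'B' : 'C';
    if (ready_b) ready_b = 0; else ready_c = 0;
    pthread_mutex_unlock(&lock);
    return which;                    /* 'B' or 'C', whichever came first */
}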

Jon Forrest

Sep 28, 2006, 5:09:13 PM
to Felger Carbon
Felger Carbon wrote:

> A very few power users - say, 3 to 5 in the world - will be able to use lots
> and lots of cores. The vast majority of the public will not run more than
> one task at a time, which at this time means only one core.

I don't think this is true, at least not for *nix systems running
X-Windows. Think about what happens when you type a character in
an X-term window, or in any other window that echoes keystrokes.
Both the X-term and the X-server have to run at the same time.
Of course, they don't have to run very long at the same time,
but it's a good example of how dual cores help.

If I remember correctly, in the early days of X, slow single
processor systems with limited memory resulted in noticeable
latencies for this reason.

Jon Forrest

Joe Seigh

Sep 28, 2006, 6:38:22 PM
Terje Mathisen wrote:
> Joe Seigh wrote:
>
>> Terje Mathisen wrote:
>>
>>> You still need some way to handle async inter-core communication!
>>> I.e. I believe that you really don't have any choice here, except to
>>> make most of your cores interruptible.
>>
>>
>> Async presumes processors are a scarce commodity and you want to have it
>> do other work while it's waiting for something to be done. That goes
>> away if you have unlimited numbers of processors.
>
>
> Unlimited? Yeah, in that case you can do a lot of stuff in new ways.
>
> The problem is that I can't see any easy way around this: when you want
> to do A, which depends on input from both B and C, the inputs might
> arrive in any order.
>
> It seems like the absolute minimum is to have a wait_for_any(...). The
> alternative is to run around in a tight loop polling both B and C to
> check if they have any data available, something which could cost you a
> lot of memory bandwidth.
>
Maybe. There are things like bus snooping which don't contribute to
memory bandwidth usage. It doesn't matter. The IPC mechanism might end up
looking totally different from anything we know today. The processor
manufacturers will have to solve it by the time they get up to hundreds
of cores.

Bill Todd

Sep 28, 2006, 8:03:43 PM

You really need to work on your reading comprehension: there is nothing
on the page that you cite above that remotely suggests that the
prototype runs at 3 GHz (the only mention of that clock rate is in
reference to a current P4's FP performance).

The reference that gives the 1.91 MHz figure is
http://www.theinquirer.net/default.aspx?article=34623

- bill

Bill Todd

Sep 28, 2006, 8:11:24 PM
Joe Seigh wrote:
> Nick Maclaren wrote:
>> In article <srKdnUtcqoTDaIbY...@comcast.com>,
>> Joe Seigh <jsei...@xemaps.com> writes:
>> |> Terje Mathisen wrote:
>> |> > You still need some way to handle async inter-core communication!
>> |> > I.e. I believe that you really don't have any choice here, except
>> |> > to make most of your cores interruptible.
>> |>
>> |> Async presumes processors are a scarce commodity and you want to
>> |> have it do other work while it's waiting for something to be done.
>> |> That goes away if you have unlimited numbers of processors.
>>
>> Not really. Let's assume that you do have such an infinite number of
>> threads, and thread A wants to prod thread B at a time it is doing
>> something
>> else. That can't be done without some form of asynchronicity.
>
> Why would B be doing something else if you have an infinite number of
> threads on hand to do that something else?

Because the 'something else' that Thread B is doing is the reason that
Thread A needs to communicate with it.

Duh.

It is of course possible to program all threads such that they
frequently poll some shared (and appropriately interlocked) portion of
RAM to see whether anyone has left a message there for them, but that's
equally possible with today's SMPs and doesn't seem to be the mechanism
of choice a lot of the time.

- bill
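
For concreteness, a sketch of such a polled, interlocked mailbox using C11
atomics; the layout and names are invented, and a real design would pad each
mailbox to a cache line to avoid false sharing:

#include <stdatomic.h>
#include <stdint.h>

/* One single-writer mailbox per thread, polled by its owner. The
 * release/acquire pairing publishes the payload before the flag, so
 * no further interlocking is needed. */
struct mailbox {
    _Atomic uint32_t full;    /* 0 = empty, 1 = message waiting */
    uint64_t payload;
};

int try_send(struct mailbox *m, uint64_t msg)    /* sender side */
{
    if (atomic_load_explicit(&m->full, memory_order_acquire))
        return 0;                 /* previous message not yet consumed */
    m->payload = msg;
    atomic_store_explicit(&m->full, 1, memory_order_release);
    return 1;
}

int try_recv(struct mailbox *m, uint64_t *msg)   /* owner polls this */
{
    if (!atomic_load_explicit(&m->full, memory_order_acquire))
        return 0;                 /* nothing there; poll again later */
    *msg = m->payload;
    atomic_store_explicit(&m->full, 0, memory_order_release);
    return 1;
}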

Chris Thomasson

Sep 28, 2006, 9:49:25 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:e7KdnXEkRJml04HY...@comcast.com...

Why not PDR on the hardware:

http://groups.google.com/group/comp.programming.threads/msg/6236a9029d80527a

I mentioned this idea of mine to Andy Glew:

http://groups.google.com/group/comp.arch/msg/2a0f4163f8e13f1e

I have not received any sort of response... Andy, are you there?

;)


Any thoughts on my PDR w/ hardware assist design? My idea can scale to any
number of processors. Lock-free reader patterns can scale. Period.


Joe Seigh

Sep 28, 2006, 10:30:54 PM

I don't know. I haven't even seen McKenney file any hardware patents in
that area, and he would have been the likely one to do that kind of stuff.

The IPC would be more than just PDR. The whole memory model could change,
and they could go to something like Occam-style message passing, because
I don't think the current strongly coherent cache scheme will scale up.
Of course, the fact that PDR supports a more relaxed cache/memory model
doesn't hurt things.

Dennis M. O'Connor

Sep 29, 2006, 2:06:09 AM

"Felger Carbon" <fns...@jps.net> wrote ..

> The vast majority of the public will not run more than
> one task at a time, which at this time means only one core.

What nonsense. Most people using modern OS's
are running multiple tasks from at least the
moment the GUI pops up. And most people are
also running multiple apps simultaneously: solitaire
and a porn, er I mean web browser at a minimum. ;-)
--
Dennis M. O'Connor dm...@primenet.com


Nick Maclaren

Sep 29, 2006, 4:23:18 AM

In article <451c19e0$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|> In article <efgr7e$6oa$1...@gemini.csx.cam.ac.uk>,
|> Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
|> >|> PIMs.
|> >
|> >When are we going to see them, then?
|>
|> We? "What do you mean 'we?' white man?" --Tonto
|> I've seen them. I'm under an NDA.

Well, actually, many people have. The ICL DAP (and, I believe, the BBN
Butterfly) could well be classified as prototypes. The issue is when
(and if!) they will be available openly enough and cheaply enough for
a wide range of people to experiment with. And 20+ years from being
the next great thing to mere NDA isn't exactly rapid progress ....

|> >Seriously, they have been talked about as imminent for 20 years, so
|> >either there is a major problem or the IT industry is suffering a
|> >collective failure of nerve. Or both.
|>
|> You have to locate the knowledgeable in your country.

Eh? Delivery is as delivery does. Damn the claims - let's see the
products.


Regards,
Nick Maclaren.

Nick Maclaren

Sep 29, 2006, 4:25:48 AM

In article <451C39F9...@ce.berkeley.edu>,

Jon Forrest <for...@ce.berkeley.edu> writes:
|>
|> If I remember correctly, in the early days of X, slow single
|> processor systems with limited memory resulted in noticeable
|> latencies for this reason.

And, God help us, modern multi-core, multi-GB systems STILL show such
delays even when there is almost no background activity :-(

No matter how much power a desktop has, the bloatware specialists
are fully capable of running out of it.


Regards,
Nick Maclaren.

ken...@cix.compulink.co.uk

Sep 29, 2006, 7:15:08 AM
In article <2KGdnWmuYvz-m4HY...@comcast.com>,
jsei...@xemaps.com (Joe Seigh) wrote:

> Ok, let me pose this question then. What are the limits to
> how many cores one can imagine using? Are there fundimental
> laws of logic here, or are we the next generation of punch card
> mentality?

Apart from any logical limit there are physical limits. First,
chip fabs are limited in the size of wafer they can handle.
Second, there are packaging limits. All the cores are generating
heat you have to get rid of. Also interconnection becomes a
problem, not just inside the package but from the package to its
socket. It took the abandonment of the dual in-line package to
give enough pins for modern processors. There are also limits on
how far you can shrink the fab process without needing a quantum
physicist to design chips. Get things small enough and quantum
tunnelling will become a major factor.

I do not have enough information to make any predictions on what
effect those limits will have on design. However I can see that a
change in architecture might be appropriate. Possibly something
on the lines of the old vector processors, where most of the cores
are just ALUs, with memory access being handled by a dedicated
core. Or, come to that, revert to the transputer design and put
multiple ones on the chip with a memory control unit to give
parallel access to main memory.

Ken Young

Joe Seigh

Sep 29, 2006, 8:27:52 AM

I don't disagree with you. Those are all valid points. But you
are arguing the premise.

So maybe I'll withdraw the question at this point. From a software
perspective, there isn't much you can do to influence what hardware
vendors will do. There's no such thing as too many cores or not
enough cores. There is just whatever there is and you can either use
it or not.

I find the proclamations from Intel and such that software programmers,
as a group, need to learn how to parallelize programs better rather
amusing. We, as a group, don't need to do anything. It's Intel's problem.


Jon Forrest

Sep 29, 2006, 2:07:05 PM
Andi Kleen wrote:

> Just because today's desktop software is mostly single threaded this doesn't mean
> that future software has to be.

Being multi-threaded is a start, but how many threads will be in
runnable state simultaneously is what I'm wondering about
(for general purpose computing).

Jon
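
For what it's worth, Linux exposes exactly this number: the fourth field of
/proc/loadavg is "currently runnable entities / total entities". A minimal,
Linux-specific C sketch:

#include <stdio.h>

int main(void)
{
    /* /proc/loadavg looks like "0.42 0.35 0.30 2/713 12345";
     * the "2/713" field is runnable threads / total threads. */
    double l1, l5, l15;
    int running, total;
    FILE *f = fopen("/proc/loadavg", "r");

    if (!f || fscanf(f, "%lf %lf %lf %d/%d",
                     &l1, &l5, &l15, &running, &total) != 5) {
        fprintf(stderr, "could not parse /proc/loadavg\n");
        return 1;
    }
    fclose(f);
    printf("%d of %d threads runnable right now\n", running, total);
    return 0;
}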

Thomas Womack

Sep 29, 2006, 2:11:55 PM
In article <7IKdnUrZObRC_4HY...@metrocastcablevision.com>,

Bill Todd <bill...@metrocast.net> wrote:
>Tommy Thorn wrote:
>> Bill Todd wrote:
>>> Well, since IIRC the processing cores are running at a princely 1.91 MHz
>>
>> Well, you recall incorrectly. It's 3 GHz, cf.
>> http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html
>
>You really need to work on your reading comprehension: there is nothing
>on the page that you cite above that remotely suggests that the
>prototype runs at 3 GHz (the only mention of that clock rate is in
>reference to a current P4's FP performance).

Go to the source

http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf

includes the paragraph

'We just got the silicon back earlier this week of our Terascale
Research prototype. And as you can see in the accompanying diagram,
each one of the 80 cores on this die consists of a simple
processor. It has a simple instruction set -- not IA compatible; it's
just has a simple instruction set that lets us do simple computations
in floating point and basically push data across the on-die fabric
that connects the 80-cores on this die together. Now, in aggregate,
those 80 cores produce one teraflop of computing performance, so a
single one of these Terascale Research prototypes is a one
teraflop-class processor. It delivers an energy efficiency of 10
gigaflops per watt at its nominal operating frequency of 3.1
gigahertz. That's an order of magnitude better than anything available
today.'

and

'For each core on that die, there's a 256-kilobyte static RAM, and the
aggregate bandwidth between all of the cores and all of those SRAM
arrays is one trillion bytes per second, truly an astonishing amount
of memory bandwidth.'

>The reference that gives the 1.91 MHz figure is
>http://www.theinquirer.net/default.aspx?article=34623

That's talking about a completely different project; if nothing else,
it's explicitly described as IA-compatible (and is running WinXP),
whilst the terascale chip is explicitly described as not.

Tom

Bill Todd

Sep 29, 2006, 3:19:36 PM
Thomas Womack wrote:
> In article <7IKdnUrZObRC_4HY...@metrocastcablevision.com>,
> Bill Todd <bill...@metrocast.net> wrote:
>> Tommy Thorn wrote:
>>> Bill Todd wrote:
>>>> Well, since IIRC the processing cores are running at a princely 1.91 MHz
>>> Well, you recall incorrectly. It's 3 GHz, cf.
>>> http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html
>> You really need to work on your reading comprehension: there is nothing
>> on the page that you cite above that remotely suggests that the
>> prototype runs at 3 GHz (the only mention of that clock rate is in
>> reference to a current P4's FP performance).
>
> Go to the source

I don't have time to scrounge around looking for 'the source' for every
random comment I happen to encounter on the Internet: what I did was
look at the *reference* that was cited, and its content was exactly what
I reported it to be.

>
> http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf
>
> includes the paragraph
>
> 'We just got the silicon back earlier this week of our Terascale
> Research prototype. And as you can see in the accompanying diagram,
> each one of the 80 cores on this die consists of a simple
> processor. It has a simple instruction set -- not IA compatible; it's
> just has a simple instruction set that lets us do simple computations
> in floating point and basically push data across the on-die fabric
> that connects the 80-cores on this die together. Now, in aggregate,
> those 80 cores produce one teraflop of computing performance, so a
> single one of these Terascale Research prototypes is a one
> teraflop-class processor. It delivers an energy efficiency of 10
> gigaflops per watt at its nominal operating frequency of 3.1
> gigahertz. That's an order of magnitude better than anything available
> today.'

My limited experience with 'first silicon' of a product suggests that
the actual device is then likely running (if indeed it runs yet at all)
at a rather small fraction of 3.1 GHz (3.1 GHz being its 'nominal'
target, which anyone who remembers Itanic's early clock-rate targets
will understand sometimes bears rather little resemblance to reality).

>
> and
>
> 'For each core on that die, there's a 256-kilobyte static RAM, and the
> aggregate bandwidth between all of the cores and all of those SRAM
> arrays is one trillion bytes per second, truly an astonishing amount
> of memory bandwidth.'

An interesting comment, since the 256 KB of SRAM per core is quite
comparable to today's amount of per-core L2 cache, which has per-core
bandwidth comparable to that of each of those mini-cores.

I.e., sounds kind of uninspiring. One relevant observation is that,
while stacking the SRAM chip on top of the processor chip is
interesting, it only effectively doubles the total chip area (i.e., is
equivalent to about one process shrink). If one could stack multiple
layers of SRAM this might start to become more interesting (assuming
that the stack remained coolable and the additional layers didn't
increase access latency over-much).

>
>> The reference that gives the 1.91 MHz figure is
>> http://www.theinquirer.net/default.aspx?article=34623
>
> That's talking about a completely different project; if nothing else,
> it's explicitly described as IA-compatible (and is running WinXP),
> whilst the terascale chip is explicitly described as not.

And now that you've actually provided a source with such information in
it, that is clear - but it certainly wasn't earlier. In particular, the
Inq article describing the IA-compatible 'mini-core' effort does so
explicitly in the context of Intel's 'Tera-Scale' effort.


- bill

Del Cecchi

Sep 29, 2006, 4:28:55 PM
and a music player, and a torrent, and.....

--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

Eugene Miya

Sep 29, 2006, 5:07:13 PM
>|> >|> PIMs.
>|> >When are we going to see them, then?
>|> We? "What do you mean 'we?' white man?" --Tonto

In article <efil5m$r5j$1...@gemini.csx.cam.ac.uk>,


Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>Well, actually, many people have. The ICL DAP (and, I believe, the BBN
>Butterfly) could well be classified as prototypes. The issue is when
>(and if!) they will be available openly enough and cheaply enough for
>a wide range of people to experiment with. And 20+ years from being
>the next great thing to mere NDA isn't exactly rapid progress ....

I saved a DAP for the CHM, and I used the BBN, and I think I have
succeeded in locating one surviving representative sitting out in a
field near Denver. No, the Butterfly, Monarch, and TC2000 could not be
classified as PIMs. Their 88Ks, etc. were much more heavyweight
processors. Similarly, while the DAPs were bit-serial, their numbers
were/are comparatively small, and with a wider address space.

Jon sits in a unique position in that he can go and talk to Dave, who is
on a different floor. I have to ensure that, despite whatever happens to
one of the real PIMs, a representative machine gets preserved even if
20 years or more after the fact. They aren't general purpose (yet, if
ever), and they aren't going to run Fortran or other conventional
languages just yet, but they are popular where they sit.


>|> >Seriously, they have been talked about as imminent for 20 years, so
>|> >either there is a major problem or the IT industry is suffering a
>|> >collective failure of nerve. Or both.
>|>
>|> You have to locate the knowledgeable in your country.
>
>Eh? Delivery is as delivery does. Damn the claims - let's see the
>products.

Go talk to your friends in that big, circular, round building NW of London
near that town of Ch.*m..... They surely must have asked for some.

--

Chris Thomasson

Sep 29, 2006, 5:58:52 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:us2dnfMMPJUnGYHY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:e7KdnXEkRJml04HY...@comcast.com...

[...]


>> Any thoughts' on my PDR w/ hardware assist design? My idea can scale to
>> any number of processors. Lock-free reader patterns can scale. Period.
>
> I don't know. I haven't event seen McKenney file any hardware patents in
> that area and he would have been the likely one to do that kind of stuff.

No kidding; he already has tons of RCU patents... In one of his bibliography
pages he even has links to your initial RCU+SMR hybrid idea. I wonder when
we are going to see patents for it...


> The IPC would be more than just PDR. The whole memory model could change
> and they go to something like Occam style message passing.

Not good. I don't want to be forced to use message passing. Especially when
we can create highly efficient virtually zero-overhead message passing
paradigms already:

http://groups.google.com/group/comp.programming.threads/msg/6c24995ab986d410

http://groups.google.com/group/comp.programming.threads/msg/301e9153bcecf97c
(this is a good one*)

http://appcore.home.comcast.net/


;)


I hope they don't implement a model that gets away from shared memory! The
current thinking seems to be that threading and shared memory in general is
just way too complicated for any programmer to even begin to grasp:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926

http://groups.google.com/group/comp.programming.threads/msg/c3a0416b829b5dc4


What a shame! If people actually start to listen to nonsense like this,
they will always be selling themselves short... The argument that threading
and shared memory is too complex/fragile is utterly false!


http://groups.google.com/group/comp.lang.c++.moderated/msg/d07b79e9633f3e52

:)


> Because I don't
> think the current strongly coherent cache scheme will scale up.

I agree. Why do you think the current trend seems to involve strong cache? I
could just imagine what the cache coherence protocol will look like for HTM!
It will probably have to be a bit stronger than what they have now:


http://groups.google.com/group/comp.programming.threads/msg/bacc295093eeb1fd

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f6399b3b837b0a40

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/9c572b709248ae64


> Of course that PDR supports a more relaxed cache/memory model doesn't hurt
> things.

Yup. It is a major plus, IMHO... I would support partially hardware assisted
PDR over some hardware based message passing. Like I said before, we can
create our own forms of scalable IPC right now; we don't need hardware for
that... Do we? Na...

:)


Chris Thomasson

Sep 29, 2006, 6:02:43 PM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:lsqdnXA_RfTjCYDY...@comcast.com...

> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:us2dnfMMPJUnGYHY...@comcast.com...
>> Chris Thomasson wrote:
>>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>>> news:e7KdnXEkRJml04HY...@comcast.com...

[...]


This particular message passing scheme works very well. It outperforms many
of the existing message passing designs, by wide margins... The simple trick
is to augment unbounded virtually zero-overhead single-producer/consumer
queuing with an implementation of Peterson's algorithm...

You can't really beat this setup...

Any thoughts?
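
For reference, the single-producer/single-consumer half of such a scheme can
be as small as the sketch below. This is a bounded ring in C11 atomics, not
Chris's actual unbounded AppCore queue, and it leaves out the
Peterson's-algorithm multiplexing he mentions:

#include <stdatomic.h>
#include <stddef.h>

#define QSIZE 1024                 /* must be a power of two */

/* Classic lock-free SPSC ring: the producer owns 'head', the consumer
 * owns 'tail'; each thread only ever stores to its own index. */
struct spsc {
    void *slot[QSIZE];
    _Atomic size_t head, tail;     /* free-running counters */
};

int spsc_push(struct spsc *q, void *p)       /* producer thread only */
{
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h - t == QSIZE)
        return 0;                            /* full */
    q->slot[h & (QSIZE - 1)] = p;
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return 1;
}

void *spsc_pop(struct spsc *q)               /* consumer thread only */
{
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t == h)
        return NULL;                         /* empty */
    void *p = q->slot[t & (QSIZE - 1)];
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return p;
}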


rohit...@gmail.com

Sep 29, 2006, 11:10:52 PM
I want to ask you guys if it makes sense to take a slightly holistic
perspective on the question.

Will software vendors ALWAYS build applications that harness the
horsepower of a CPU?

Microsoft's latest OS (Vista) is quite bulky, and I asked myself if I
needed the extra bells and whistles in the OS. The answer is a
resounding yes.

The number of people that spend 8 or more hours on a computer has gone
up. Workers in banks and businesses use a computer as the "tool of
their trade".

My own personal example is a testimony to the CPU. I use google search
for EVERYTHING. Information on Rashes or allergies, quotations,
technical articles, news...

On a typical weeknight, I am using my 1.8GHz laptop to do all of the
following simultaneously:

- Reading news articles
- Working on a remote unix host using VNC
- Playing a video on youtube in a minimized window (I listen to the
video if it's a talk show)
- logged on google talk and yahoo messenger
- Responding to work email on Microsoft Outlook

I probably have 500-600 threads on my laptop. I think we are
still far away from the utopia of computing. There's plenty of CPU
hungry applications that haven't been born yet.

-Rohit

Nick Maclaren

Sep 30, 2006, 5:24:13 AM

In article <tFD*6H...@news.chiark.greenend.org.uk>,
Thomas Womack <two...@chiark.greenend.org.uk> writes:
|>
|> http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf

Thanks. Interesting. I can't say that I am convinced, because the
success will depend on whether people can make use of that power,
and Intel didn't mention the communication bandwidth.

However, I noted one amusing comment:

And to provide the adequate memory bandwidth for a teraflop of
computing power, we developed what we think is a really novel
    solution. The novel solution involves stacking a memory chip
directly under the processor chip.

A really novel solution? Aw, gee.


Regards,
Nick Maclaren.

Nick Maclaren

Sep 30, 2006, 5:30:11 AM

In article <1159585852....@c28g2000cwb.googlegroups.com>,

"rohit...@gmail.com" <rohit...@gmail.com> writes:
|>
|> On a typical weeknight, I am using my 1.8GHz laptop to do all of the
|> following simultaneously:
|>
|> - Reading news articles
|> - Working on a remote unix host using VNC
|> - Playing a video on youtube in a minimized window (I listen to the
|> video if its a talk show)
|> - logged on google talk and yahoo messenger
|> - Responding to work email on Microsoft Outlook
|>
|> I probably have 500-600 threads on my laptop. I think we are
|> still far away from the utopia of computing. There's plenty of CPU
|> hungry applications that haven't been born yet.

You are missing the point. Of those threads, perhaps all but 2-5 are
waiting on an event (usually a response from some other thread). As
several of us have posted before, there are very good arguments for
a fairly large cache of contexts, so that context switching would be
very fast, but very little for more than about 4 cores that would
execute threads in parallel.

The current exceptions are (a) servers and (b) HPC. It is POSSIBLE
that desktop applications will be parallelised in the near future,
but that has been said for 20 years.


Regards,
Nick Maclaren.

Nick Maclaren

Sep 30, 2006, 5:44:19 AM

In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|>
|> Go talk to your friends in that big, circular, round building NW of London
|> near that town of Ch.*m..... They surely must have asked for some.

Actually, when we want to find out what they are up to, we have to
wait for leaks from Congress! They aren't our friends - they are
your agents, under contract from 'our' government.

Democracy? That is the political system currently operational in
Afghanistan and Iraq, isn't it?


Regards,
Nick Maclaren.

Niels Jørgen Kruse

Sep 30, 2006, 12:31:26 PM
Felger Carbon <fns...@jps.net> wrote:

> A very few power users - say, 3 to 5 in the world - will be able to use
> lots and lots of cores. The vast majority of the public will not run more
> than one task at a time, which at this time means only one core.

Rip a CD to AAC or MP3 in iTunes and you use more than one core.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

rohit...@gmail.com

Sep 30, 2006, 4:07:27 PM
> You are missing the point. Of those threads, perhaps all but 2-5 are
> waiting on an event (usually a response from some other thread). As
> several of us have posted before, there are very good arguments for
> a fairly large cache of contexts, so that context switching would be
> very fast, but very little for more than about 4 cores that would
> execute threads in parallel.

I see your point, but I think you missed the fact that we are far
away from a utopian computing environment.

Here's what I'd like:

- A secure website that tells me the status of every electrical
gadget in my home (a webserver that is polling the status of each of
these devices)
- An alarm system that would give me a live video feed when there is
an intrusion into my house (either on my cell phone or on a work
computer)
- A global calendar that keeps track of my kids' school year (I don't
have kids yet, but you get the idea). I'd like to see details on every
homework (submission date,
- High end video environment (DVR + DVD + On-demand + Regular TV)
all in HD
- Ability to serve videos to any display in the home

- Communication with my car, and automatically managing my calendar
for when it needs to be serviced
- Updated GPS maps
- Sync all my media to my cars computer automatically
==========================================

The list is endless. I can see how every one of these applications is
AT LEAST 1 thread, if not more. These "daemons" can run on one computer
or many. But they will all have to communicate, manage data, and be
secure (think encrypted communication).

My point is, as human beings in today's society, we are forced to do
many mundane chores which can be offloaded to reasonably smart software
programs. These applications haven't been built yet. They will be.

Who would've envisioned a day when you can sit at your home computer
and look at all your bank transactions for the last 5 years?

I am all ears on what you have to say, Nick :-)

-Rohit

already...@yahoo.com

Sep 30, 2006, 5:17:38 PM

And how big a percentage of a single 2-4GHz core do solitaire, a web
browser, a music player, and a torrent occupy altogether?

Niels Jørgen Kruse

Sep 30, 2006, 5:44:01 PM
rohit...@gmail.com <rohit...@gmail.com> wrote:

> - An alarm system that would give me a live video feed when there is
> an intrusion into my house (either on my cell phone or on a work
> computer)

You forgot the Tazer coaxial with the snoop cam. :-)

Tarjei T. Jensen

Sep 30, 2006, 6:57:16 PM
Niels Jørgen Kruse wrote:
> You forgot the Tazer coaxial with the snoop cam. :-)


You would want a real gun. Otherwise you may be arrested for cruelty to
animals :-)


greetings,

Chris Thomasson

Sep 30, 2006, 7:55:32 PM