How Many Processor Cores Are Enough?

Jon Forrest

unread,

Sep 27, 2006, 5:10:03 PM9/27/06

to

Today I read that we're going to get quad-core processors
in 2007, and 80-core processors in 5 years. This has
got me to wondering where the point of diminishing returns
is for processor cores.

We've all seen those bench marks in which the same test is
run on a system with 128MB of RAM, then 256MB, then 512MB, ...
What usually happens is the the first increase results in
a big improvement, the next increase in a smaller improvement,
and so on. At some point, the size of the improvement doesn't
justify its cost.

I suspect we'd see the same kind of results if we could
increase the number of processor cores. The big difference
here is that additional processor cores only help if there's
work for them to do. Modern operating systems use multiple
processes and threads but I wonder how many processes/threads
are runnable at any one time in a general purpose computer
running general purpose jobs.

Where do you think the point of diminishing returns might
be?

Jon Forrest

Casper H.S. Dik

unread,

Sep 27, 2006, 6:04:17 PM9/27/06

to

Jon Forrest <for...@ce.berkeley.edu> writes:

>Today I read that we're going to get quad-core processors
>in 2007, and 80-core processors in 5 years. This has
>got me to wondering where the point of diminishing returns
>is for processor cores.

Sun has been shipping 8 core CPUs since, I think, late last year.

It all depends on the bandwidth. (Which means it ain't a pretty
picture for Intel as long as they keep the FSB)

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Joe Seigh

unread,

Sep 27, 2006, 6:23:16 PM9/27/06

to

Jon Forrest wrote:
> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.
>
> We've all seen those bench marks in which the same test is
> run on a system with 128MB of RAM, then 256MB, then 512MB, ...
> What usually happens is the the first increase results in
> a big improvement, the next increase in a smaller improvement,
> and so on. At some point, the size of the improvement doesn't
> justify its cost.

That's called scalability.

>
> I suspect we'd see the same kind of results if we could
> increase the number of processor cores. The big difference
> here is that additional processor cores only help if there's
> work for them to do. Modern operating systems use multiple
> processes and threads but I wonder how many processes/threads
> are runnable at any one time in a general purpose computer
> running general purpose jobs.
>
> Where do you think the point of diminishing returns might
> be?
>

It all depends. The industry has a real chicken and egg problem.
Applications won't start adapting to the new architecture for a
while yet. And what the new architecture will actually end up
being isn't real clear. It could be shared memory, NUMA or otherwise.
It could be based on message passing. Or something else entirely.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Message has been deleted

Dennis M. O'Connor

unread,

Sep 27, 2006, 7:01:50 PM9/27/06

to

"Jon Forrest" <for...@ce.berkeley.edu> wrote ...

> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.

It depends on your application. Examples: A web server
like Apache can effectively use a lot more cores than
an old DOS game. But future games on the PC may be
able to effectively exploit every core you can provide them.
--
Dennis M. O'Connor dm...@primenet.com

Jon Forrest

unread,

Sep 27, 2006, 7:48:51 PM9/27/06

to Dennis M. O'Connor

No doubt, but I'm talking about general purpose computing.

I bet there is diminishing return curve for web servers
too. The curve would look different at each site, but
it would exist.

Jon

Rick Jones

unread,

Sep 27, 2006, 8:24:27 PM9/27/06

to

Casper H.S. Dik <Caspe...@sun.com> wrote:
> It all depends on the bandwidth. (Which means it ain't a pretty
> picture for Intel as long as they keep the FSB)

Is it really just a question of bandwidth? I would have thought that
application (I'm assuming the system vendors deal with the OSes)
behaviour would be equally important.

How different is having an FSB for a single socket with N cores on the
chip than having a "link" for a single-socket with N cores on the
chip?

I would think that as the cores per chip increase, the issues that the
folks selling large SMP's deal with will become known to the
single-socket crowd.

rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

Chris Thomasson

unread,

Sep 27, 2006, 8:37:11 PM9/27/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:tuCdnVxRCuGpZIfY...@comcast.com...

> Jon Forrest wrote:
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
>>
>> We've all seen those bench marks in which the same test is
>> run on a system with 128MB of RAM, then 256MB, then 512MB, ...
>> What usually happens is the the first increase results in
>> a big improvement, the next increase in a smaller improvement,
>> and so on. At some point, the size of the improvement doesn't
>> justify its cost.
>
> That's called scalability.

Indeed.

Applications should be designed to "automagically" adapt to a situation that
could involve more and more processors; we already know how to do it. I
would assume that a lot of people that frequent this group know how to do it
too.

Does the "average" programmer know how to do it? Damn!

I have a solution with a simple and fairly powerful API that controls my
virtually zero-overhead memory allocator and PDR synchronization logic; its
as easy to use as POSIX Threads... It can possibly help so-called "normal"
programmers address this "so-called scalability and throughput" crises the
major chip vendors are currently experiencing... Oh well...

Humm...

>> I suspect we'd see the same kind of results if we could
>> increase the number of processor cores. The big difference
>> here is that additional processor cores only help if there's
>> work for them to do. Modern operating systems use multiple
>> processes and threads but I wonder how many processes/threads
>> are runnable at any one time in a general purpose computer
>> running general purpose jobs.
>>
>> Where do you think the point of diminishing returns might
>> be?
>>
>
> It all depends. The industry has a real chicken and egg problem.
> Applications won't start adapting to the new architecture for a
> while yet.

You think you will live to see the day when major database vendors use PDR
and lock-free reader patterns for parallel queries; screw all of that
per-table, per-row, whatever locking crap for readers... Heck, writers can
even be lock-free, hurray for the lock-free offset trick!

?

> And what the new architecture will actually end up
> being isn't real clear. It could be shared memory, NUMA or otherwise.
> It could be based on message passing. Or something else entirely.

I would be happy with NUMA. As you know, the PDR solution is highly
adaptable. It basically can give you performance across different types of
cache coherency protocols. Therefore, I hope the model of the future has
many cores, very-deep pipelines, and fairly weak cache coherency
system(s)...

Rick Jones

unread,

Sep 27, 2006, 8:26:18 PM9/27/06

to

Jon Forrest <for...@ce.berkeley.edu> wrote:
> I bet there is diminishing return curve for web servers too. The
> curve would look different at each site, but it would exist.

Yes - particularly if the web server was constrained to a single NIC -
albeit some NICs out there now can spread their interrupt load across
cores.

Increasing core counts won't do all that much for individual TCP
connections - getting very much parallelism in a single connection
isn't really possible.

rick jones
--
a wide gulf separates "what if" from "if only"

Terje Mathisen

unread,

Sep 28, 2006, 12:34:51 AM9/28/06

to

Casper H.S. Dik wrote:
> Jon Forrest <for...@ce.berkeley.edu> writes:
>
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
>
> Sun has been shipping 8 core CPUs since, I think, late last year.
>
> It all depends on the bandwidth. (Which means it ain't a pretty
> picture for Intel as long as they keep the FSB)

That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
providing 20 MB (afair) directly to each code.

For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
everything in a nicely distributed manner, you're going to see _very_
impressive performance indeed, particularly since they also have a
(presumably very fast) mesh network connecting the individual cores.

Intel's press releases talks about aggregate bandwidth in the TB/s range
for this 80-core chip, from which we can calculate that each core must
have at least 12.5 GB/s.

Since the SRAM is directly attached, a 256-bit interface seems very
reasonable, in which case the SRAM can idle along at about 400 Mhz.

Alternatively, a somewhat narrower interface running at higher frequency
would give the same result.

Seems reasonable to me!

And yes, I'd like to have one and see what I could do with it. :-)

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Dennis M. O'Connor

unread,

Sep 28, 2006, 1:50:12 AM9/28/06

to

"Jon Forrest" <for...@ce.berkeley.edu> wrote i...

> Dennis M. O'Connor wrote:
>> "Jon Forrest" <for...@ce.berkeley.edu> wrote ...
>>> Today I read that we're going to get quad-core processors
>>> in 2007, and 80-core processors in 5 years. This has
>>> got me to wondering where the point of diminishing returns
>>> is for processor cores.
>>
>> It depends on your application. Examples: A web server
>> like Apache can effectively use a lot more cores than
>> an old DOS game. But future games on the PC may be
>> able to effectively exploit every core you can provide them.
>
> No doubt, but I'm talking about general purpose computing.

What's this "general purpose computing" you speak of ?
Is it Spider Solitaire ? MS Office ? Gnu CC ? Web browsing ?
Or is it protein folding ?

Nick Maclaren

unread,

Sep 28, 2006, 5:40:48 AM9/28/06

to

In article <%CESg.422$fP5...@news.cpqcorp.net>,

Rick Jones <rick....@hp.com> writes:
|> Casper H.S. Dik <Caspe...@sun.com> wrote:
|> > It all depends on the bandwidth. (Which means it ain't a pretty
|> > picture for Intel as long as they keep the FSB)
|>
|> Is it really just a question of bandwidth? I would have thought that
|> application (I'm assuming the system vendors deal with the OSes)
|> behaviour would be equally important.

Yes and no. The bandwidth is definitely the leading bottleneck.

|> How different is having an FSB for a single socket with N cores on the
|> chip than having a "link" for a single-socket with N cores on the
|> chip?

Not at all.

|> I would think that as the cores per chip increase, the issues that the
|> folks selling large SMP's deal with will become known to the
|> single-socket crowd.

Yup. They are already hitting them.

But, to answer the question:

For most workstations, the answer is probably 4 (at least in the near
future - see later), because few workloads have more than a few genuinely
active threads. Very fancy graphics is another matter.

For servers and embarassingly parallel HPC, the answer is until you have
saturated the bandwidth (modified by the demands of the applications).
Multiple cores is merely a cheaper form of multiple sockets.

For genuinely parallel, high communication, applications, the answer is
how parallelisable is your application? And the answer to THAT (outside
HPC) is generally "2 way, when I am lucky".

The last is not a law of nature, but isn't going to change any time soon,
as it is caused by the programming paradigms that people use.

Regards,
Nick Maclaren.

Joe Seigh

unread,

Sep 28, 2006, 7:47:55 AM9/28/06

to

Jon Forrest wrote:
> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.

...

> Where do you think the point of diminishing returns might
> be?
>

Rethinking this, the question should be what would you do
with an unlimited number of processors?

For one thing, the operating system would change. Interrupt
handlers for asynchronous interrupts would go away. You'd have
dedicated, possibly special purpose, processors to handle devices.
They're already talking about this with "coprocessors".

The scheduler would go away. No need for it when every thread has
it's own dedicated hardware thread. This would affect realtime
programming. No need to play games with thread priorities and
any of the timeouts that could be caused by not being scheduled
quickly enough, i.e. no dispatch latency.

Polling and IPC mechanisms would have to be worked on a bit. E.g.
make things like MONITOR/MWAIT efficient. Possibly some new
instructions. The hw architects would have to be a little more
proactive here. The latest proposals from Intel seem to be a little
lacking here. What's with architectual extensions? It seems to be
a "ready to fight the last war" kind of thing. Who cares if you can
run a 20 year old application real fast.

Distributed algorithms would become more important. How do you
coordinate threads and how do you do it efficiently from a hw
point of view.

Etc... (more stuff when I think of it)

Bill Todd

unread,

Sep 28, 2006, 8:58:28 AM9/28/06

to

Terje Mathisen wrote:
> Casper H.S. Dik wrote:
>> Jon Forrest <for...@ce.berkeley.edu> writes:
>>
>>> Today I read that we're going to get quad-core processors
>>> in 2007, and 80-core processors in 5 years. This has
>>> got me to wondering where the point of diminishing returns
>>> is for processor cores.
>>
>> Sun has been shipping 8 core CPUs since, I think, late last year.
>>
>> It all depends on the bandwidth. (Which means it ain't a pretty
>> picture for Intel as long as they keep the FSB)
>
> That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
> providing 20 MB (afair) directly to each code.
>
> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
> everything in a nicely distributed manner, you're going to see _very_
> impressive performance indeed, particularly since they also have a
> (presumably very fast) mesh network connecting the individual cores.

Well, since IIRC the processing cores are running at a princely 1.91 MHz
(allegedly not a typo) I'm not sure how truly impressive that demo's
performance would be: perhaps better to wait for the real thing in
around 5 years' time.

As for the SRAM, I rather suspect that the 20 MB is the *total* figure
shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
anything like a single chip, we'd be seeing a lot more cache in Itanics.

- bill

Terje Mathisen

unread,

Sep 28, 2006, 11:25:32 AM9/28/06

to

Joe Seigh wrote:
> Jon Forrest wrote:
>> Today I read that we're going to get quad-core processors
>> in 2007, and 80-core processors in 5 years. This has
>> got me to wondering where the point of diminishing returns
>> is for processor cores.
> ...
>> Where do you think the point of diminishing returns might
>> be?
>>
>
> Rethinking this, the question should be what would you do
> with an unlimited number of processors?
>
> For one thing, the operating system would change. Interrupt
> handlers for asynchronous interrupts would go away. You'd have
> dedicated, possibly special purpose, processors to handle devices.
> They're already talking about this with "coprocessors".

You still need some what to handle async inter-core communication! I.e.
I believe that you really don't have any choice here, except to make
most of your cores interruptible.

This leads back to the old thread about having multiple cores which are
compatible but not symmetrical: I.e. some of them are optimized for long
timeslots doing stream/HPC/serious number crunching, using a
microachitecture like the P4 which really doesn't like to be interrupted.

Other cores could be much more Pentium-like: Possibly superscalar, but
in-order, with very low branch miss penalty, and optimized for
twisty/branchy/hard to predict code.

As long as these cpus are compatible, an OS which knows that some
processes prefer to run on a given kind of cpu could do quite well, and
the programming task becomes _much_ easier than for a disjoint set as
used in the PPC/Cell combination.

> The scheduler would go away. No need for it when every thread has
> it's own dedicated hardware thread. This would affect realtime
> programming. No need to play games with thread priorities and
> any of the timeouts that could be caused by not being scheduled
> quickly enough, i.e. no dispatch latency.

I believe you'd still need it, but not for anything that's timecritical.
I.e. after sufficient time with tens of cores/hundreds of threads
available, programming patterns to use/abuse them all will turn up, and
you'll run out of resources anyway. :-(

Terje Mathisen

unread,

Sep 28, 2006, 11:29:34 AM9/28/06

to

Bill Todd wrote:

> Terje Mathisen wrote:
>> That 80-core Intel demo chip has a vertically mounted SRAM chip as
>> well, providing 20 MB (afair) directly to each code.
>>
>> For any problem where those 20 MB * 80 = 1.6 GB of SRAM can hold
>> everything in a nicely distributed manner, you're going to see _very_
>> impressive performance indeed, particularly since they also have a
>> (presumably very fast) mesh network connecting the individual cores.
>
> Well, since IIRC the processing cores are running at a princely 1.91 MHz
> (allegedly not a typo) I'm not sure how truly impressive that demo's
> performance would be: perhaps better to wait for the real thing in
> around 5 years' time.

I don't think we have other option than to wait, no matter what the
current speed is.

However, if they really run at 2 MHz, then the claimed TB/s total
bandwidth seems totally bogus, even if they also include 4 sets of
cpu-cpu mesh links.

>
> As for the SRAM, I rather suspect that the 20 MB is the *total* figure
> shared among the 80 cores: if Intel could really get 1.6 GB of SRAM on
> anything like a single chip, we'd be seeing a lot more cache in Itanics.

Oops, you're almost certainly right. :-(

Oh, well. It was fun as long as the fantasy lasted. :-)

Eugene Miya

unread,

Sep 28, 2006, 11:56:51 AM9/28/06

to

In article <efep6h$vbl$1...@agate.berkeley.edu>,
Jon Forrest <for...@ce.berkeley.edu> wrote:
>quad-core
>80-core
...

>Where do you think the point of diminishing returns might be?

Ask Dave Patterson on the 5th floor of Soda about PIMs.

--

Nick Maclaren

unread,

Sep 28, 2006, 11:52:36 AM9/28/06

to

In article <e46tu3-...@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.m...@hda.hydro.com> writes:

|> Joe Seigh wrote:
|> >
|> > For one thing, the operating system would change. Interrupt
|> > handlers for asynchronous interrupts would go away. You'd have
|> > dedicated, possibly special purpose, processors to handle devices.
|> > They're already talking about this with "coprocessors".
|>
|> You still need some what to handle async inter-core communication! I.e.
|> I believe that you really don't have any choice here, except to make
|> most of your cores interruptible.

Nope. I posted a design that did away with that, at the hardware level,
and there have been lots of systems that proved you could do without
them at the software level. You wouldn't start from existing software
designs and interfaces, of course :-)

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Sep 28, 2006, 11:54:22 AM9/28/06

to

When are we going to see them, then?

Seriously, they have been talked about as imminent for 20 years, so
either there is a major problem or the IT industry is suffering a
collective failure of nerve. Or both.

Regards,
Nick Maclaren.

Joe Seigh

unread,

Sep 28, 2006, 12:19:16 PM9/28/06

to

Terje Mathisen wrote:

> Joe Seigh wrote:
>> Rethinking this, the question should be what would you do
>> with an unlimited number of processors?
>>
>> For one thing, the operating system would change. Interrupt
>> handlers for asynchronous interrupts would go away. You'd have
>> dedicated, possibly special purpose, processors to handle devices.
>> They're already talking about this with "coprocessors".
>
>
> You still need some what to handle async inter-core communication! I.e.
> I believe that you really don't have any choice here, except to make
> most of your cores interruptible.

Async presumes processors are a scarce commodity and you want to have it
do other work while it's waiting for something to be done. That goes
away if you have unlimited numbers of processors.

>
>> The scheduler would go away. No need for it when every thread has
>> it's own dedicated hardware thread. This would affect realtime
>> programming. No need to play games with thread priorities and
>> any of the timeouts that could be caused by not being scheduled
>> quickly enough, i.e. no dispatch latency.
>
>
> I believe you'd still need it, but not for anything that's timecritical.
> I.e. after sufficient time with tens of cores/hundreds of threads
> available, programming patterns to use/abuse them all will turn up, and
> you'll run out of resources anyway. :-(

> \

The OP posed the question of whether you can have too many cores. You're
saying there will never be enough? :)

Nick Maclaren

unread,

Sep 28, 2006, 12:25:32 PM9/28/06

to

In article <srKdnUtcqoTDaIbY...@comcast.com>,
Joe Seigh <jsei...@xemaps.com> writes:

|> Terje Mathisen wrote:
|> >
|> > You still need some what to handle async inter-core communication! I.e.
|> > I believe that you really don't have any choice here, except to make
|> > most of your cores interruptible.
|>
|> Async presumes processors are a scarce commodity and you want to have it
|> do other work while it's waiting for something to be done. That goes
|> away if you have unlimited numbers of processors.

Not really. Let's assume that you do have such an infinite number of
threads, and thread A wants to prod thread B at a time it is doing something
else. That can't be done without some form of asynchronicity.

As you may remember, my design used FIFOs for inter-thread communication.
That avoids the problem of interrupts, but is no less asynchronous.

TANSTAAFL.

Regards,
Nick Maclaren.

Mitch...@aol.com

unread,

Sep 28, 2006, 12:31:01 PM9/28/06

to

Nick Maclaren wrote:
> In article <%CESg.422$fP5...@news.cpqcorp.net>,
> Rick Jones <rick....@hp.com> writes:
> |> Casper H.S. Dik <Caspe...@sun.com> wrote:
> |> > It all depends on the bandwidth. (Which means it ain't a pretty
> |> > picture for Intel as long as they keep the FSB)
> |>
> |> Is it really just a question of bandwidth? I would have thought that
> |> application (I'm assuming the system vendors deal with the OSes)
> |> behaviour would be equally important.
>
> Yes and no. The bandwidth is definitely the leading bottleneck.

Er, no:: there are more problems that are latency limited than are
bandwidth limited.

However, the FSB is a similar impediment in both cases.

The whole reason that threading is taking off is the inability of
(implementable) computer (micro)architectures to reduce memory latency.
Threading is a means to tollerate that latency.

<snip>
> Regards,
> Nick Maclaren.

Regards,
Mitch Alsup

Gabriel Loh

unread,

Sep 28, 2006, 12:38:32 PM9/28/06

to

> Where do you think the point of diminishing returns might
> be?

I haven't seen the topic come up in the parallel threads, but one
question that's really interesting to think about (at least to me) is
how in the world are you going to deliver power to all of these cores?
BTW, did anyone see if Intel mentioned the total power consumption for
their 80-core demo system? The wattage for the 80-core processor has to
be astronomical, or the power budget per core has to be anemic, or only
3 out of 80 cores can be turned on at anytime.

just my two cents...

-Gabe

mike

unread,

Sep 28, 2006, 12:49:56 PM9/28/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:_8mdnSlgO-FYKIbY...@comcast.com...

Do not stop with the OS kernal. Imagine an implementation of SQL
where a thread is spawned for every record in a table!

Mike Sicilian

Tommy Thorn

unread,

Sep 28, 2006, 1:26:15 PM9/28/06

to

Bill Todd wrote:
> Well, since IIRC the processing cores are running at a princely 1.91 MHz

Well, you recall incorrectly. It's 3 GHz, cf.
http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html

Tommy

Joe Seigh

unread,

Sep 28, 2006, 1:31:52 PM9/28/06

to

Nick Maclaren wrote:
> In article <srKdnUtcqoTDaIbY...@comcast.com>,
> Joe Seigh <jsei...@xemaps.com> writes:
> |> Terje Mathisen wrote:
> |> >
> |> > You still need some what to handle async inter-core communication! I.e.
> |> > I believe that you really don't have any choice here, except to make
> |> > most of your cores interruptible.
> |>
> |> Async presumes processors are a scarce commodity and you want to have it
> |> do other work while it's waiting for something to be done. That goes
> |> away if you have unlimited numbers of processors.
>
> Not really. Let's assume that you do have such an infinite number of
> threads, and thread A wants to prod thread B at a time it is doing something
> else. That can't be done without some form of asynchronicity.

Why would B be doing something else if you have an infinite number of threads
on hand to do that something else? This is just a "what if" exercise.

Ok, let me pose this question then. What are the limits to how many cores
one can imagine using? Are there fundimental laws of logic here, or are we the
next generation of punch card mentality?

Felger Carbon

unread,

Sep 28, 2006, 1:59:49 PM9/28/06

to

"Jon Forrest" <for...@ce.berkeley.edu> wrote in message
news:efep6h$vbl$1...@agate.berkeley.edu...

> Today I read that we're going to get quad-core processors
> in 2007, and 80-core processors in 5 years. This has
> got me to wondering where the point of diminishing returns
> is for processor cores.

A very few power users - say, 3 to 5 in the world - will be able to use lots
and lots of cores. The vast majority of the public will not run more than
one task at a time, which at this time means only one core. In the distant
future, applications that are in common use long enough to be called "legacy
programs" may have been written to use several cores, so at that time even
the general public will need several cores. But this is a long, long way
off.

In between, some workstations and their users can probably use a few cores,
as soon as the software is rewritten for multicores.

Rick Jones

unread,

Sep 28, 2006, 2:18:27 PM9/28/06

to

Joe Seigh <jsei...@xemaps.com> wrote:
> Why would B be doing something else if you have an infinite number
> of threads on hand to do that something else? This is just a "what
> if" exercise.

> Ok, let me pose this question then. What are the limits to how many
> cores one can imagine using? Are there fundimental laws of logic
> here, or are we the next generation of punch card mentality?

Would the existing Internet be an approximation? It has what ammounts
to a virtually unlimited number of "cores" (systems).

rick jones
--
web2.0 n, the dot.com reunion tour...

Nick Maclaren

unread,

Sep 28, 2006, 2:23:22 PM9/28/06

to

In article <2KGdnWmuYvz-m4HY...@comcast.com>,

Joe Seigh <jsei...@xemaps.com> writes:
|> >
|> > Not really. Let's assume that you do have such an infinite number of
|> > threads, and thread A wants to prod thread B at a time it is doing something
|> > else. That can't be done without some form of asynchronicity.
|>
|> Why would B be doing something else if you have an infinite number of threads
|> on hand to do that something else? This is just a "what if" exercise.

Because, if you split the assignments to processors right down to
atomic (communication-free) units, you end up with a huge amount of
communication between processors. TANSTAAFL again.

|> Ok, let me pose this question then. What are the limits to how many cores
|> one can imagine using? Are there fundimental laws of logic here, or are we the
|> next generation of punch card mentality?

There are fundamental laws, but no particular limits. I can imagine using
millions of cores, possibly even thousands of millions for big tasks, but
the problem is communicating between them.

Regards,
Nick Maclaren.

Eugene Miya

unread,

Sep 28, 2006, 2:52:16 PM9/28/06

to

In article <efgr7e$6oa$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>|> PIMs.
>
>When are we going to see them, then?

We? "What do you mean 'we?' white man?" --Tonto
I've seen them. I'm under an NDA.

>Seriously, they have been talked about as imminent for 20 years, so
>either there is a major problem or the IT industry is suffering a
>collective failure of nerve. Or both.

You have to locate the knowledgeable in your country.

--

Joe Seigh

unread,

Sep 28, 2006, 2:55:59 PM9/28/06

to

Rick Jones wrote:

> Joe Seigh <jsei...@xemaps.com> wrote:
>
>>Ok, let me pose this question then. What are the limits to how many
>>cores one can imagine using? Are there fundimental laws of logic
>>here, or are we the next generation of punch card mentality?
>
>
> Would the existing Internet be an approximation? It has what ammounts
> to a virtually unlimited number of "cores" (systems).
>

Google probably. Except they and the internet don't have anything
like shared memory for communication.

Thomas Womack

unread,

Sep 28, 2006, 2:50:24 PM9/28/06

to

In article <efgu43$dv3$1...@news-int.gatech.edu>,

Gabriel Loh <my-las...@cc.gatech.edu> wrote:
>
>> Where do you think the point of diminishing returns might
>> be?
>
>I haven't seen the topic come up in the parallel threads, but one
>question that's really interesting to think about (at least to me) is
>how in the world are you going to deliver power to all of these cores?

>BTW, did anyone see if Intel mentioned the total power consumption for
>their 80-core demo system? The wattage for the 80-core processor has to
>be astronomical, or the power budget per core has to be anemic

1.5 watts is an extravagent power budget for something like an ARM
core, and those cores were less than four square millimetres and a
quarter taken up by router.

An ARM Cortex-A8 is 750MHz, three square millimetres in 65nm and
375mW; the ARM11 MPCore is 620MHz, 2.54 square millimetres in 90nm
(with 32kb of cache) and 300mW. I'm slightly surprised that nobody's
made a load-of-ARMs chip even as a proof of concept.

Tom

Thomas Womack

unread,

Sep 28, 2006, 2:46:21 PM9/28/06

to

In article <UJCdnbZAYsZoW4bY...@metrocastcablevision.com>,
Bill Todd <bill...@metrocast.net> wrote:
>Terje Mathisen wrote:

>> That 80-core Intel demo chip has a vertically mounted SRAM chip as well,
>> providing 20 MB (afair) directly to each code.

I suspect it has 20MB in total, 256kb per core; 1.6GB of SRAM on a
chip is not remotely feasible with current fabrication processes,
whilst 20MB on 300mm^2 is twice the density of Montecito, so just
about right for 65nm.

>Well, since IIRC the processing cores are running at a princely 1.91 MHz
>(allegedly not a typo) I'm not sure how truly impressive that demo's
>performance would be: perhaps better to wait for the real thing in
>around 5 years' time.

That was a different demo, with a stack of boards plugging into a
Socket 7 (Pentium) motherboard, running something capable of enough
x86 to run Windows XP on an FPGA; I believe it was the HDL code for
what Intel intend to use as the 'mini x86_64 core'.

Tom

Terje Mathisen

unread,

Sep 28, 2006, 3:36:38 PM9/28/06

to

Joe Seigh wrote:

> Terje Mathisen wrote:
>> You still need some what to handle async inter-core communication!
>> I.e. I believe that you really don't have any choice here, except to
>> make most of your cores interruptible.
>
> Async presumes processors are a scarce commodity and you want to have it
> do other work while it's waiting for something to be done. That goes
> away if you have unlimited numbers of processors.

Unlimited? Yeah, in that case you can do a lot of stuff in new ways.

The problem is that I can't see any easy way around is when you want to
do A which depends on input from both B and C, which might occur in any
order.

It seems like the absolute minimum is to have a wait_for_any(...). The
alternative is to run around in a tight loop polling both B and C to
check if they have any data available, something which could cost you a
lot of memory bandwidth.

Jon Forrest

unread,

Sep 28, 2006, 5:09:13 PM9/28/06

to Felger Carbon

Felger Carbon wrote:

> A very few power users - say, 3 to 5 in the world - will be able to use lots
> and lots of cores. The vast majority of the public will not run more than
> one task at a time, which at this time means only one core.

I don't think this is true, at least not for *nix systems running
X-Windows. Think about what happens when you type a character in
an X-term window, or in any other window that echoes keystrokes.
Both the X-term and the X-server have to run at the same time.
Of course, they don't have to run very long at the same time,
but it's a good example of how dual cores help.

If I remember correctly, in the early days of X, slow single
processor systems with limited memory resulted in noticeable
latencies for this reason.

Jon Forrest

Joe Seigh

unread,

Sep 28, 2006, 6:38:22 PM9/28/06

to

Terje Mathisen wrote:
> Joe Seigh wrote:
>
>> Terje Mathisen wrote:
>>
>>> You still need some what to handle async inter-core communication!
>>> I.e. I believe that you really don't have any choice here, except to
>>> make most of your cores interruptible.
>>
>>
>> Async presumes processors are a scarce commodity and you want to have it
>> do other work while it's waiting for something to be done. That goes
>> away if you have unlimited numbers of processors.
>
>
> Unlimited? Yeah, in that case you can do a lot of stuff in new ways.
>
> The problem is that I can't see any easy way around is when you want to
> do A which depends on input from both B and C, which might occur in any
> order.
>
> It seems like the absolute minimum is to have a wait_for_any(...). The
> alternative is to run around in a tight loop polling both B and C to
> check if they have any data available, something which could cost you a
> lot of memory bandwidth.
>

Maybe. There's things like bus snooping which doesn't contribute to
memory usage. It doesn't matter. The IPC mechanism might end up looking
totally different than anything we know today. The processors manufacturers
will have to solve it by the time they get up 100's of cores.

Bill Todd

unread,

Sep 28, 2006, 8:03:43 PM9/28/06

to

You really need to work on your reading comprehension: there is nothing
on the page that you cite above that remotely suggests that the
prototype runs at 3 GHz (the only mention of that clock rate is in
reference to a current P4's FP performance).

The reference that gives the 1.91 MHz figure is
http://www.theinquirer.net/default.aspx?article=34623

- bill

Bill Todd

unread,

Sep 28, 2006, 8:11:24 PM9/28/06

to

Joe Seigh wrote:
> Nick Maclaren wrote:
>> In article <srKdnUtcqoTDaIbY...@comcast.com>,
>> Joe Seigh <jsei...@xemaps.com> writes:
>> |> Terje Mathisen wrote:
>> |> > |> > You still need some what to handle async inter-core
>> communication! I.e. |> > I believe that you really don't have any
>> choice here, except to make |> > most of your cores interruptible.
>> |> |> Async presumes processors are a scarce commodity and you want to
>> have it
>> |> do other work while it's waiting for something to be done. That goes
>> |> away if you have unlimited numbers of processors.
>>
>> Not really. Let's assume that you do have such an infinite number of
>> threads, and thread A wants to prod thread B at a time it is doing
>> something
>> else. That can't be done without some form of asynchronicity.
>
> Why would B be doing something else if you have an infinite number of
> threads
> on hand to do that something else?

Because the 'something else' that Thread B is doing is the reason that
Thread A needs to communicate with it.

Duh.

It is of course possible to program all threads such that they
frequently poll some shared (and appropriately interlocked) portion of
RAM to see whether anyone has left a message there for them, but that's
equally possible with today's SMPs and doesn't seem to be the mechanism
of choice a lot of the time.

- bill

Chris Thomasson

unread,

Sep 28, 2006, 9:49:25 PM9/28/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:e7KdnXEkRJml04HY...@comcast.com...

Why not PDR on the hardware:

http://groups.google.com/group/comp.programming.threads/msg/6236a9029d80527a

I mentioned this idea of mine to Andy Glew:

http://groups.google.com/group/comp.arch/msg/2a0f4163f8e13f1e

I have not received any sort of response... Andy, are you there?

;)

Any thoughts' on my PDR w/ hardware assist design? My idea can scale to any
number of processors. Lock-free reader patterns can scale. Period.

Joe Seigh

unread,

Sep 28, 2006, 10:30:54 PM9/28/06

to

I don't know. I haven't event seen McKenney file any hardware patents in
that area and he would have been the likely one to do that kind of stuff.

The IPC would be more than just PDR. The whole memory model could change
and they go to something like Occam style message passing. Because I don't
think the current strongly coherent cache scheme will scale up.
Of course that PDR supports a more relaxed cache/memory model doesn't hurt
things.

Dennis M. O'Connor

unread,

Sep 29, 2006, 2:06:09 AM9/29/06

to

"Felger Carbon" <fns...@jps.net> wrote ..

> The vast majority of the public will not run more than
> one task at a time, which at this time means only one core.

What nonsense. Most people using modern OS's
are running multiple tasks from at the least the
moment the GUI pops up. And most people are
also running multiple apps simultaneously: solitaire
and a porn, er I mean web browser at a minimum. ;-)
--
Dennis M. O'Connor dm...@primenet.com

Nick Maclaren

unread,

Sep 29, 2006, 4:23:18 AM9/29/06

to

Well, actually, many people have. The ICL DAP (and, I believe, the BBN
Butterfly) could well be classified as prototypes. The issue is when
(and if!) they will be available openly enough and cheaply enough for
a wide range of people to experiment with. And 20+ years from being
the next great thing to mere NDA isn't exactly rapid progress ....

|> >Seriously, they have been talked about as imminent for 20 years, so
|> >either there is a major problem or the IT industry is suffering a
|> >collective failure of nerve. Or both.
|>
|> You have to locate the knowledgeable in your country.

Eh? Delivery is as delivery does. Damn the claims - let's see the
products.

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Sep 29, 2006, 4:25:48 AM9/29/06

to

In article <451C39F9...@ce.berkeley.edu>,

Jon Forrest <for...@ce.berkeley.edu> writes:
|>
|> If I remember correctly, in the early days of X, slow single
|> processor systems with limited memory resulted in noticeable
|> latencies for this reason.

And, God help us, modern multi-core, multi-GB systems STILL show such
delays even when there is almost no background activity :-(

No matter how much power a desktop has, the bloatware specialists
are fully capable of running out of it.

Regards,
Nick Maclaren.

ken...@cix.compulink.co.uk

unread,

Sep 29, 2006, 7:15:08 AM9/29/06

to

In article <2KGdnWmuYvz-m4HY...@comcast.com>,
jsei...@xemaps.com (Joe Seigh) wrote:

> Ok, let me pose this question then. What are the limits to
> how many cores one can imagine using? Are there fundimental
> laws of logic here, or are we the next generation of punch card
> mentality?

Apart from any logical limit there are physical limits. First
Chip fabs are limited in the size of wafer they can handle,
Second there are packaging limits. All the cores are generating
heat you have to get rid off. Also interconnection becomes a
problem not just inside the package but from the package to it's
socket. It took the abandonment of the dual in line package to
give enough pins for modern processors. There are also limits on
how much you can shrink the fab limits without needing a quantum
physicist to design chips. Get things small enough and quantum
tunnelling will become a major factor.

I do not have enough information to make any predictions on what
effect those limits will have on design. However I can see that a
change in architecture might be appropriate. Possibly something
on the lines of the old vector processors where most of the cores
are just ALU with memory access being handled by a dedicated
core. Or come to that revert to the transputer design an put
multiple ones on the chip with a memory control unit to give
parallel access to main memory.

Ken Young

Joe Seigh

unread,

Sep 29, 2006, 8:27:52 AM9/29/06

to

I don't disagree with you. Those are all valid points. But you
are arguing the premise.

So maybe I'll withdraw the question at this point. From a software
perspective, there isn't much you can do to influence what hardware
vendors will do. There's no such thing as too many cores or not
enough cores. There is just whatever there is and you can either use
it or not.

I find the proclamations from Intel and such that software programmers,
as a group, need to learn how to parallelize programs better rather
amusing. We, as a group, don't need to do anything. It's Intel's problem.

Message has been deleted

Jon Forrest

unread,

Sep 29, 2006, 2:07:05 PM9/29/06

to

Andi Kleen wrote:

> Just because today's desktop software is mostly single threaded this doesn't mean
> that future software has to be.

Being multi-threaded is a start, but how many threads will be in
runnable state simultaneously is what I'm wondering about
(for general purpose computing).

Jon

Thomas Womack

unread,

Sep 29, 2006, 2:11:55 PM9/29/06

to

In article <7IKdnUrZObRC_4HY...@metrocastcablevision.com>,

Bill Todd <bill...@metrocast.net> wrote:
>Tommy Thorn wrote:
>> Bill Todd wrote:
>>> Well, since IIRC the processing cores are running at a princely 1.91 MHz
>>
>> Well, you recall incorrectly. It's 3 GHz, cf.
>> http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html
>
>You really need to work on your reading comprehension: there is nothing
>on the page that you cite above that remotely suggests that the
>prototype runs at 3 GHz (the only mention of that clock rate is in
>reference to a current P4's FP performance).

Go to the source

http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf

includes the paragraph

'We just got the silicon back earlier this week of our Terascale
Research prototype. And as you can see in the accompanying diagram,
each one of the 80 cores on this die consists of a simple
processor. It has a simple instruction set -- not IA compatible; it's
just has a simple instruction set that lets us do simple computations
in floating point and basically push data across the on-die fabric
that connects the 80-cores on this die together. Now, in aggregate,
those 80 cores produce one teraflop of computing performance, so a
single one of these Terascale Research prototypes is a one
teraflop-class processor. It delivers an energy efficiency of 10
gigaflops per watt at its nominal operating frequency of 3.1
gigahertz. That's an order of magnitude better than anything available
today.'

and

'For each core on that die, there's a 256-kilobyte static RAM, and the
aggregate bandwidth between all of the cores and all of those SRAM
arrays is one trillion bytes per second, truly an astonishing amount
of memory bandwidth.'

>The reference that gives the 1.91 MHz figure is
>http://www.theinquirer.net/default.aspx?article=34623

That's talking about a completely different project; if nothing else,
it's explicitly described as IA-compatible (and is running WinXP),
whilst the terascale chip is explicitly described as not.

Tom

Bill Todd

unread,

Sep 29, 2006, 3:19:36 PM9/29/06

to

Thomas Womack wrote:
> In article <7IKdnUrZObRC_4HY...@metrocastcablevision.com>,
> Bill Todd <bill...@metrocast.net> wrote:
>> Tommy Thorn wrote:
>>> Bill Todd wrote:
>>>> Well, since IIRC the processing cores are running at a princely 1.91 MHz
>>> Well, you recall incorrectly. It's 3 GHz, cf.
>>> http://www.tomshardware.com/2006/09/27/idf_fall_2006/page2.html
>> You really need to work on your reading comprehension: there is nothing
>> on the page that you cite above that remotely suggests that the
>> prototype runs at 3 GHz (the only mention of that clock rate is in
>> reference to a current P4's FP performance).
>
> Go to the source

I don't have time to scrounge around looking for 'the source' for every
random comment I happen to encounter on the Internet: what I did was
look at the *reference* that was cited, and its content was exactly what
I reported it to be.

>
> http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf
>
> includes the paragraph
>
> 'We just got the silicon back earlier this week of our Terascale
> Research prototype. And as you can see in the accompanying diagram,
> each one of the 80 cores on this die consists of a simple
> processor. It has a simple instruction set -- not IA compatible; it's
> just has a simple instruction set that lets us do simple computations
> in floating point and basically push data across the on-die fabric
> that connects the 80-cores on this die together. Now, in aggregate,
> those 80 cores produce one teraflop of computing performance, so a
> single one of these Terascale Research prototypes is a one
> teraflop-class processor. It delivers an energy efficiency of 10
> gigaflops per watt at its nominal operating frequency of 3.1
> gigahertz. That's an order of magnitude better than anything available
> today.'

My limited experience with 'first silicon' of a product suggests that
the actual device is then likely running (if indeed it runs yet at all)
at a rather small fraction of 3.1 GHz (3.1 GHz being its 'nominal'
target, which anyone who remembers Itanic's early clock-rate targets
will understand sometimes bears rather little resemblance to reality).

>
> and
>
> 'For each core on that die, there's a 256-kilobyte static RAM, and the
> aggregate bandwidth between all of the cores and all of those SRAM
> arrays is one trillion bytes per second, truly an astonishing amount
> of memory bandwidth.'

An interesting comment, since the 256 KB of SRAM per core is quite
comparable to today's amount of per-core L2 cache, which has per-core
bandwidth comparable to that of each of those mini-cores.

I.e., sounds kind of uninspiring. One relevant observation is that,
while stacking the SRAM chip on top of the processor chip is
interesting, it only effectively doubles the total chip area (i.e., is
equivalent to about one process shrink). If one could stack multiple
layers of SRAM this might start to become more interesting (assuming
that the stack remained coolable and the additional layers didn't
increase access latency over-much).

>
>> The reference that gives the 1.91 MHz figure is
>> http://www.theinquirer.net/default.aspx?article=34623
>
> That's talking about a completely different project; if nothing else,
> it's explicitly described as IA-compatible (and is running WinXP),
> whilst the terascale chip is explicitly described as not.

And now that you've actually provided a source with such information in
it, that is clear - but it certainly wasn't earlier. In particular, the
Inq article describing the IA-compatible 'mini-core' effort does so
explicitly in the context of Intel's 'Tera-Scale' effort.

- bill

Del Cecchi

unread,

Sep 29, 2006, 4:28:55 PM9/29/06

to

and a music player, and a torrent, and.....

--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

Eugene Miya

unread,

Sep 29, 2006, 5:07:13 PM9/29/06

to

>|> >|> PIMs.
>|> >When are we going to see them, then?
>|> We? "What do you mean 'we?' white man?" --Tonto

In article <efil5m$r5j$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>Well, actually, many people have. The ICL DAP (and, I believe, the BBN
>Butterfly) could well be classified as prototypes. The issue is when
>(and if!) they will be available openly enough and cheaply enough for
>a wide range of people to experiment with. And 20+ years from being
>the next great thing to mere NDA isn't exactly rapid progress ....

I saved a DAP for the CHM, and I used the BBN and I think I have
succeeded to locating one surviving representative sitting out in a
field near Denver. No, the Butterfly, Monarch, and TC2000 could not be
classified as PIMs. Their 88Ks, etc. were much more heavy heavy weight
processors. Similarly, while the DAP's where bit serial, their number
were/are comparatively small and with a wider address space.

Jon sits in a unique position that he can go an talk to Dave who is on a
different floor. I have to ensure that despite whatever happens to one
of the real PIMs, that a representative machines gets preserved even if
20 years or more after the fact. They aren't general purpose (yet, if
ever), and they aren't going to run Fortran or other conventional
language just yet, but they are popular where they sit.

>|> >Seriously, they have been talked about as imminent for 20 years, so
>|> >either there is a major problem or the IT industry is suffering a
>|> >collective failure of nerve. Or both.
>|>
>|> You have to locate the knowledgeable in your country.
>
>Eh? Delivery is as delivery does. Damn the claims - let's see the
>products.

Go talk to your friends in that big, circular, round building NW of London
near that town of Ch.*m..... They surely must have asked for some.

--

Chris Thomasson

unread,

Sep 29, 2006, 5:58:52 PM9/29/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:us2dnfMMPJUnGYHY...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:e7KdnXEkRJml04HY...@comcast.com...

[...]

>> Any thoughts' on my PDR w/ hardware assist design? My idea can scale to
>> any number of processors. Lock-free reader patterns can scale. Period.
>
> I don't know. I haven't event seen McKenney file any hardware patents in
> that area and he would have been the likely one to do that kind of stuff.

No kidding; he already has tons of RCU patents... In one of his bibliography
pages he even has links to your initial RCU+SMR hybrid idea. I wonder when
we are going to see patents for it...

> The IPC would be more than just PDR. The whole memory model could change
> and they go to something like Occam style message passing.

Not good. I don't want to be forced to use message passing. Especially when
we can create highly efficient virtually zero-overhead message passing
paradigms' already:

http://groups.google.com/group/comp.programming.threads/msg/6c24995ab986d410

http://groups.google.com/group/comp.programming.threads/msg/301e9153bcecf97c
(this is a good one*)

http://appcore.home.comcast.net/

;)

I hope they don't implement a model that gets away from shared memory! The
current thinking seems to be that threading and shared memory in general is
just way to complicated for any programmer to even begin to grasp:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b192c5ffe9b47926

http://groups.google.com/group/comp.programming.threads/msg/c3a0416b829b5dc4

What a shame! If people actually start to listen to non-sense like this,
they will always be cutting themselves short... The argument that threading
and shared memory is too complex/fragile is utterly false!

http://groups.google.com/group/comp.lang.c++.moderated/msg/d07b79e9633f3e52

:)

> Because I don't
> think the current strongly coherent cache scheme will scale up.

I agree. Why do you think the current trend seems to involve strong cache? I
could just imagine what the cache coherence protocol will look like for HTM!
It will probably have to be a bit stronger that what they have now:

http://groups.google.com/group/comp.programming.threads/msg/bacc295093eeb1fd

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/f6399b3b837b0a40

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/9c572b709248ae64

> Of course that PDR supports a more relaxed cache/memory model doesn't hurt
> things.

Yup. It is a major plus, IMHO... I would support partially hardware assisted
PDR over some hardware based message passing. Like I said before, we can
create our own forms of scaleable IPC right now, we don't need hardware for
that... Do we? Na...

:)

Chris Thomasson

unread,

Sep 29, 2006, 6:02:43 PM9/29/06

to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:lsqdnXA_RfTjCYDY...@comcast.com...

> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:us2dnfMMPJUnGYHY...@comcast.com...
>> Chris Thomasson wrote:
>>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>>> news:e7KdnXEkRJml04HY...@comcast.com...

[...]

> http://groups.google.com/group/comp.programming.threads/msg/301e9153bcecf97c
> (this is a good one*)

This particular message passing scheme works very well. It out performs many
of the existing message passing designs; by wide margins... The simple trick
is to augment unbounded virtually zero-overhead single-produce/consumer
queuing with an implementation of Petersons Algorithm...

You can't really beat this setup...

Any thoughts?

rohit...@gmail.com

unread,

Sep 29, 2006, 11:10:52 PM9/29/06

to

I want to ask you guys, if it makes sense to take a slightly holistic
perspective to the question.

Will software vendors ALWAYS build applications that harness the
horse-power of a CPU?

Microsoft's latest OS (Vista) is quite bulky, and I asked myself if I
needed the extra bells-and-whistles in the OS. The answer is an
astounding yes.

The number of people that spend 8 or more hours on a computer has gone
up. Workers in banks and businesses use a computer as the "tool of
their trade".

My own personal example is a testimony to the CPU. I use google search
for EVERYTHING. Information on Rashes or allergies, quotations,
technical articles, news...

On the typical week-night, I am using my 1.8ghz laptop to do all of the
following simultaneously:

- Reading news articles
- Working on a remote unix host using VNC
- Playing a video on youtube in a minimized window (I listen to the
video if its a talk show)
- logged on google talk and yahoo messenger
- Responding to work email on Microsoft Outlook

I probably have 500 threads- 600 threads on my laptop. I think we are
still far away from the utopia of computing. There's plenty of CPU
hungry applications that havent been born yet.

-Rohit

Nick Maclaren

unread,

Sep 30, 2006, 5:24:13 AM9/30/06

to

In article <tFD*6H...@news.chiark.greenend.org.uk>,
Thomas Womack <two...@chiark.greenend.org.uk> writes:
|>
|> http://www.intel.com/pressroom/kits/events/idffall_2006/pdf/IDF%2009-26-06%20Justin%20Rattner%20Keynote%20Transcript.pdf

Thanks. Interesting. I can't say that I am convinced, because the
success will depend on whether people can make use of that power,
and Intel didn't mention the communication bandwidth.

However, I noted one amusing comment:

And to provide the adequate memory bandwidth for a teraflop of
computing power, we developed what we think is a really novel
solution. The nove solution involves stacking a memory chip
directly under the processor chip.

A really novel solution? Aw, gee.

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Sep 30, 2006, 5:30:11 AM9/30/06

to

In article <1159585852....@c28g2000cwb.googlegroups.com>,

You are missing the point. Of those threads, perhaps all but 2-5 are
waiting on an event (usually a response from some other thread). As
several of us have posted before, there are very good arguments for
a fairly large cache of contexts, so that context switching would be
very fast, but very little for more than about 4 cores that would
execute threads in parallel.

The current exceptions are (a) servers and (b) HPC. It is POSSIBLE
that desktop applications with be parallelised in the near future,
but that has been said for 20 years.

Regards,
Nick Maclaren.

Nick Maclaren

unread,

Sep 30, 2006, 5:44:19 AM9/30/06

to

In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|>
|> Go talk to your friends in that big, circular, round building NW of London
|> near that town of Ch.*m..... They surely must have asked for some.

Actually, when we want to find out what they are up to, we have to
wait for leaks from Congress! They aren't our friends - they are
your agents, under contract from 'our' government.

Democracy? That is the political system currently operational in
Afghanistan and Iraq, isn't it?

Regards,
Nick Maclaren.

Niels Jørgen Kruse

unread,

Sep 30, 2006, 12:31:26 PM9/30/06

to

Felger Carbon <fns...@jps.net> wrote:

> A very few power users - say, 3 to 5 in the world - will be able to use lots

> and lots of cores. The vast majority of the public will not run more than

> one task at a time, which at this time means only one core.

Rip a CD to AAC or MP3 in iTunes and you use more than one core.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

rohit...@gmail.com

unread,

Sep 30, 2006, 4:07:27 PM9/30/06

to

> You are missing the point. Of those threads, perhaps all but 2-5 are
> waiting on an event (usually a response from some other thread). As
> several of us have posted before, there are very good arguments for
> a fairly large cache of contexts, so that context switching would be
> very fast, but very little for more than about 4 cores that would
> execute threads in parallel.

I see your point, but I see that you missed the fact that we are far
away from a utopian computing environment.

Here's what I'd like:

- A secure website that tells me the status of every electrical
gadget in my home (a webserver that is polling the status of each of
these devices)
- An alarm system that would give me a live video feed when there is
an intrusion into my house (either on my cell phone or on a work
computer)
- A global calender that keeps track of my kids school year (I dont
have kids yet, but you get the idea). I'd like to see details on every
homework (submission date,
- High end video environment (DVR + DVD + On-demand + Regular TV)
all in HD
- Ability to serve videos to any display in the home

- Communication with my car, and automatically managing my calender
for when it needs to be serviced
- Updated GPS maps
- Sync all my media to my cars computer automatically
==========================================

The list is endless. I can see how every one of these applications is
ATLEAST 1 thread, if not more. These "daemons" can run on one computer
or many. But they will all have to communicate, manage data, and be
secure (think encrypted communication).

My point is, as human beings in today's society, we are forced to do
many mundane chores which can be offloaded to reasonable smart software
programs. These applications have been built, yet. They will be.

Who would've envisioned a day when you can sit on your home computer
and look at all your bank transactions for the last 5-years.

I am all ears on what you have to say, Nick :-)

-Rohit

already...@yahoo.com

unread,

Sep 30, 2006, 5:17:38 PM9/30/06

to

And how big percentage of a single 2-4GHz core solitair, web browser,
music player and a torrent occupy altogether?

Niels Jørgen Kruse

unread,

Sep 30, 2006, 5:44:01 PM9/30/06

to

rohit...@gmail.com <rohit...@gmail.com> wrote:

> - An alarm system that would give me a live video feed when there is
> an intrusion into my house (either on my cell phone or on a work
> computer)

You forgot the Tazer coaxial with the snoop cam. :-)

Tarjei T. Jensen

unread,

Sep 30, 2006, 6:57:16 PM9/30/06

to

Niels Jørgen Kruse wrote:
> You forgot the Tazer coaxial with the snoop cam. :-)

You would want a real gun. Otherwise you may be arrested for cruelty to
animals :-)

greetings,

Chris Thomasson

unread,

Sep 30, 2006, 7:55:32 PM9/30/06

to

"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:efle9j$kke$1...@gemini.csx.cam.ac.uk...

>
> In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:

[...]

> Democracy? That is the political system currently operational in
> Afghanistan and Iraq, isn't it?

Humm... If somebody could tie the hands of Iran, and the hands of their
proxy agents (aka, Hezbollah) together... We could kick them off the cliff,
into he abyss as one... After that, yes... I would say that Iraq and
Afghanistan are legitimate Democracies...

Bill Todd

unread,

Sep 30, 2006, 10:08:53 PM9/30/06

to

Hmmm - you appear to be about as delusional in that area as you are
about the relative importance of lock-free algorithms.

- bill

Del Cecchi

unread,

Sep 30, 2006, 10:40:27 PM9/30/06

to

<already...@yahoo.com> wrote in message
news:1159651057.9...@k70g2000cwa.googlegroups.com...

I was merely replying to the absurd notion that the average person was
only doing one task at a time.

How much cpu does it take to view a movie? How much to encode video from
a camera to some other format after the editing is done?

del

Chris Thomasson

unread,

Oct 1, 2006, 12:12:31 AM10/1/06

to

"Bill Todd" <bill...@metrocast.net> wrote in message
news:VPWdnbJFjNGrvoLY...@metrocastcablevision.com...

> Chris Thomasson wrote:
>> "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
>> news:efle9j$kke$1...@gemini.csx.cam.ac.uk...
>>> In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya)
>>> writes:
>> [...]

> Hmmm - you appear to be about as delusional in that area

IMHO, I do believe that Iran is causing a lot of trouble for the
Collation... I think they are ordering Hezbollah to smuggle weapons and
other assorted items from Syria, into Iraq. It is also my opinion that Iran
is probably using its western border to smuggle contraband into Iraq. Its
too bad that Hezbollah is skilled in the art of creating those nasty IED
murder weapons... Damn...

Therefore, I do believe that things will start to get somewhat better for
the Collation if they can virtually eliminate the influence that Iran casts
over the internals of Iraq.

> as you are about the relative importance of lock-free algorithms.

What is your take on the issue? Perhaps you have misunderstood me; a
majority of my postings are directed to comp.programming.threads). FWIW, I
do believe that a clever mixture of lock-free, and lock-based algorithms can
produce results that can scale extremely well. For instance, here is an
older library of mine:

http://appcore.home.comcast.net/

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/205dcaed77941352

My AppCore library is wrapped around the following basic theme:

"An Effective Marriage between Lock-Free and Lock-Based Algorithms"

Here is another example of blending lock-free with lock-based:

http://groups.google.de/group/comp.programming.threads/msg/632b6bdc2a137459

http://groups.google.de/group/comp.programming.threads/browse_frm/thread/d062e1bfa460a375

Do you see anything wrong with my approach? If so, do you have any
suggestions' that could possibly help me make some adjustments to my current
take on lock-free algorithms in general?

Thank You...

Dennis M. O'Connor

unread,

Oct 1, 2006, 12:24:59 AM10/1/06

to

<already...@yahoo.com> wrote in message

> Del Cecchi wrote:
>> Dennis M. O'Connor wrote:
>> > "Felger Carbon" <fns...@jps.net> wrote ..
>> >
>> >> The vast majority of the public will not run more than
>> >>one task at a time, which at this time means only one core.
>> >
>> >
>> > What nonsense. Most people using modern OS's
>> > are running multiple tasks from at the least the
>> > moment the GUI pops up. And most people are
>> > also running multiple apps simultaneously: solitaire
>> > and a porn, er I mean web browser at a minimum. ;-)
>> > --
>> >

>> and a music player, and a torrent, and.....
>

> And how big percentage of a single 2-4GHz core solitair, web browser,
> music player and a torrent occupy altogether?

If I get to write any of the code for any of them,
I can arrange to use all of any processor. :-)

Anyway, your question isn't relevant to the original
statement that was refuted. But to answer it anyway,
there are games being demo'd for release RSN that
will us as much of any processor you currently can give
them. One even uses an entire processor (in a multi-
processor or multi-core system) for game physics.
And even with relatively old, common, free software
(EAC and LAME) I can keep a dual 3GHz core system
totally busy ripping and compressing a single CD for
quite a few minutes.

Bill Todd

unread,

Oct 1, 2006, 3:33:40 AM10/1/06

to

Chris Thomasson wrote:
> "Bill Todd" <bill...@metrocast.net> wrote in message
> news:VPWdnbJFjNGrvoLY...@metrocastcablevision.com...
>> Chris Thomasson wrote:
>>> "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
>>> news:efle9j$kke$1...@gemini.csx.cam.ac.uk...
>>>> In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya)
>>>> writes:
>>> [...]
>
>> Hmmm - you appear to be about as delusional in that area
>
> IMHO, I do believe that Iran is causing a lot of trouble for the
> Collation...

I certainly hope so: I suspect that there are considerably more people
in the world (myself included) who think that the world would be better
off if the U.S. and Israeli governments were bound together and kicked
off the cliff into the abyss than relatively minor irritants like Iran
and Hezbollah.

...

>> as you are about the relative importance of lock-free algorithms.
>
> What is your take on the issue?

Given your frequent attempts to gain attention for such approaches plus
some of the silly areas you have suggested that they would be useful in,
I suspect that you are a young and likely well-meaning enthusiast who is
as naive politically as he is technically. My comment was kind of a
cheap shot in the current context, but well-meaning individuals whose
ignorant and unquestioning support enables their leaders to dish out as
much suffering and death as people like you have lately get pretty short
shrift in my book, I'm afraid.

Perhaps you have misunderstood me; a
> majority of my postings are directed to comp.programming.threads).

Since I don't visit that newsgroup, my impression is based on what I
have seen from you here.

- bill

David Hopwood

unread,

Oct 1, 2006, 12:53:17 PM10/1/06

to

Chris Thomasson wrote:
> IMHO, I do believe that Iran is causing a lot of trouble for the
> Collation...

Yep -- all those Arabic-script languages with slightly different sort orders...

(Sorry, couldn't resist.)

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Chris Thomasson

unread,

Oct 1, 2006, 5:11:26 PM10/1/06

to

"David Hopwood" <david.nosp...@blueyonder.co.uk> wrote in message
news:1oSTg.67408$wg.4...@fe1.news.blueyonder.co.uk...

> Chris Thomasson wrote:
>> IMHO, I do believe that Iran is causing a lot of trouble for the
>> Collation...
>
> Yep -- all those Arabic-script languages with slightly different sort
> orders...
>

lol

I meant "Coalition" of course; damn typos!

:)

> (Sorry, couldn't resist.)

FWIW, I sincerely apologize if any of my humble political opinions' gravely
offended any readers of this group (e.g., I think I may have pissed Mr. Bill
Todd off...).

;^(...

Thank you all for your patience, and for not banishing me to your killfile!

;)

Chris Thomasson

unread,

Oct 1, 2006, 5:27:26 PM10/1/06

to

"Bill Todd" <bill...@metrocast.net> wrote in message

news:W6udnR5TsYzI8oLY...@metrocastcablevision.com...

> Chris Thomasson wrote:
>> "Bill Todd" <bill...@metrocast.net> wrote in message
>> news:VPWdnbJFjNGrvoLY...@metrocastcablevision.com...
>>> Chris Thomasson wrote:
>>>> "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
>>>> news:efle9j$kke$1...@gemini.csx.cam.ac.uk...
>>>>> In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya)
>>>>> writes:
>>>> [...]
>>
>>> Hmmm - you appear to be about as delusional in that area
>>
>> IMHO, I do believe that Iran is causing a lot of trouble for the
>> Collation...
>
> I certainly hope so: I suspect that there are considerably more people in
> the world (myself included) who think that the world would be better off
> if the U.S. and Israeli governments were bound together and kicked off the
> cliff into the abyss than relatively minor irritants like Iran and
> Hezbollah.

IMHO, Hezbollah is a ruthless organization that is made up of many highly
trained and dedicated "individuals"... For instance, they murdered a
boatload of U.S. Marines in cold blood... Yikes! I am not too sure I agree
with your assertion that Hezbollah is a relatively "minor irritant"... Oh
well. I think we are going to have to simply agree, to disagree...

Anyway, I promise to not post anything more on this subject!

:O

>>> as you are about the relative importance of lock-free algorithms.
>>
>> What is your take on the issue?
>
> Given your frequent attempts to gain attention for such approaches plus
> some of the silly areas you have suggested that they would be useful in,

I am interested to hear some of the "silly areas" that I have advocated
using lock-free techniques for; any quick links?

I will defiantly consider any advice/suggestions you, or others, may have...

Thank You.

> I suspect that you are a young and likely well-meaning enthusiast who is
> as naive politically as he is technically. My comment was kind of a cheap
> shot in the current context, but well-meaning individuals whose ignorant
> and unquestioning support enables their leaders to dish out as much
> suffering and death as people like you have lately get pretty short shrift
> in my book, I'm afraid.

Sorry.

> Perhaps you have misunderstood me; a
>> majority of my postings are directed to comp.programming.threads).
>
> Since I don't visit that newsgroup, my impression is based on what I have
> seen from you here.

IMO, c.p.t is a great group for researching lock-free and lock-based
algorithms and techniques. It actually might be worth your while to drop by
once in a while. I am looking forward to hearing from you over there...

:)

Benny Amorsen

unread,

Oct 1, 2006, 5:31:13 PM10/1/06

to

>>>>> "CT" == Chris Thomasson <cri...@comcast.net> writes:

CT> For instance, they murdered a boatload of U.S. Marines in cold
CT> blood...

Isn't that what war is about?

/Benny

jsa...@ecn.ab.ca

unread,

Oct 2, 2006, 12:16:15 AM10/2/06

to

Jon Forrest wrote:
> The big difference
> here is that additional processor cores only help if there's
> work for them to do. Modern operating systems use multiple
> processes and threads but I wonder how many processes/threads
> are runnable at any one time in a general purpose computer
> running general purpose jobs.

I think you're asking the wrong question.

For some purposes, the point of diminishing returns is at "two".

So does that automatically mean that multi-core designs are a bad idea?

No, because there's the other half of the equation to consider.

For one thing, it is true that there is no point of diminishing returns
in making the same processor design run faster. But increasing clock
speed through making transistors faster is *hard*, and bumps into a
*lot* of things that do have diminishing returns.

For another, where is the point of diminishing returns in adding more
transistors to a CPU design, to make it faster that way? Designs with
cache, aggressive pipelining, and superscalar arithmetic units...
pretty well are *at* that point.

Once you've done just about as well as you can in these areas, about
the only reasonable-cost avenue you have *left* for improving
performance is to include more processor cores in your design.

This is where the microcomputer revolution has ended up at present -
where it's almost impossible to design a "supercomputer" that has
better single-thread performance than your desktop PC. Not totally
impossible, though; there is room to get ahead of that curve a little
bit - the Itanium is proof of that, for example.

John Savard

jsa...@ecn.ab.ca

unread,

Oct 2, 2006, 12:32:24 AM10/2/06

to

So?

War is the application of force on a larger scale than that of, say,
crime and law-enforcement.

However, despite the accident of the language in providing only one
word for war, in a war, there is still often one side that is engaged
in aggression, and one side that is seeking to live in peace.

Israel, like Denmark and the United States, is a democratic country of
happy, peaceful people. But it has neighbors that have non-democratic
governments, and poorly-educated populations that tend to support a
social order in which non-Muslims have inferior status: they need
permissions Muslims don't require to repair their places of worship,
and are not allowed to build new ones, for example.

These neighbors, when the United States showed them that it did not
approve of them getting into a fight to recapture land from Israel -
land they lost due to their attempts to drive the Israelis into the sea
- turned to the *Union of Soviet Socialist Republics* to obtain
armaments.

The USSR should need no introduction. After one totalitarian system,
Nazism, was defeated in World War II, it became the pre-eminent threat
to human freedom for several decades. It subverted democracy in Eastern
Europe, so that the peoples of these countries, after suffering like
the rest of Europe under Nazism, emerged not to a new dawn of freedom,
but to continued oppression. It sat astride the world like a terrible
shadow, thanks to the atomic weapons it obtained through espionage
operations against the United States of America. And, of course, the
horrors of its slave labor camps are recounted in the classic "Gulag
Archipelago" of Aleksandr Isaevitch Solzenitsyn.

As we also remember, in October 1973, in the course of one of their
attempts to annihilate Israel, the countries of the Arab world imposed
an oil embargo on the Western world.

THIS WAS A TIME WHEN THE USSR WAS STILL IN EXISTENCE.

Yesterday, those who desired a world where Muslims ruled everyone else,
and Jews and Christians know their place, while Hindus (not being
people of the book) would convert or die risked plunging the whole
world into Communist slavery. Today, they fly airplanes into office
buildings.

Only a few months ago, some of your countrymen were beaten by a mob in
Sa'udi Arabia for the crime of being Danish. Today, a Frenchman is in
hiding for having publicly stated the historical fact that Mohammed
*was* a merciless war leader.

Who the aggressor is in this war is clear, and the need for victory is
clear as well. It is hoped that the fanatics within Islam can be
crushed and destroyed without also harming the ordinary people of the
Muslim world, who, doubtless, like people everywhere, wish to live
peaceful lives, not being bothered by either militaristic tyrants or by
extreme religious fanatics to be more observant of their religion than
they choose to be.

John Savard

jsa...@ecn.ab.ca

unread,

Oct 2, 2006, 12:33:34 AM10/2/06

to

David Hopwood wrote:
> Yep -- all those Arabic-script languages with slightly different sort orders...

Well, the Arabs have clearly conquered the world.

The 8086 architecture, after all, is little-endian.

John Savard

Bill Todd

unread,

Oct 2, 2006, 4:11:05 AM10/2/06

to

jsa...@ecn.ab.ca wrote:

...

> As we also remember, in October 1973, in the course of one of their
> attempts to annihilate Israel, the countries of the Arab world imposed
> an oil embargo on the Western world.

Why, of all the nerve! To presume to limit access to *our* oil just
because it happened to be situated under *their* sand!

>
> THIS WAS A TIME WHEN THE USSR WAS STILL IN EXISTENCE.

And, unfortunately, we're now in a situation where that time doesn't
look that bad after all, since they served as an effective curb on U.S.
aggression.

Idiot.

- bill

Terje Mathisen

unread,

Oct 2, 2006, 4:21:50 AM10/2/06

to

jsa...@ecn.ab.ca wrote:
> Once you've done just about as well as you can in these areas, about
> the only reasonable-cost avenue you have *left* for improving
> performance is to include more processor cores in your design.

I agree.

>
> This is where the microcomputer revolution has ended up at present -
> where it's almost impossible to design a "supercomputer" that has
> better single-thread performance than your desktop PC. Not totally
> impossible, though; there is room to get ahead of that curve a little
> bit - the Itanium is proof of that, for example.

Except that the Itanium doesn't really outperform 'standard' PC cpus on
SpecInt, only on FP stuff, and barely there as well.

Price/performance is not good at all, unfortunately.

Terje

--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Bernd Paysan

unread,

Oct 2, 2006, 8:19:37 AM10/2/06

to

jsa...@ecn.ab.ca wrote:
> Israel, like Denmark and the United States, is a democratic country of
> happy, peaceful people.

Oh, man. I'm all for Denmark in this statement, since they are really quite
democratic and peaceful, despite of the Mohammed cartoons. But the USA as a
state are not so peaceful. During the cold war, the USA developed a habit
of attacking other contries without being substantial threatened by them
(e.g. the Vietnam war started, because two US soldiers were killed). A few
years ago, they illegally attacked Irak, and brought a lot more terror to
the world in the attempt to fight fire with oil.

Israel is not so peaceful, either. The founder of that country, Ben Gurion,
was a terrorist before. It is not so democratic, as well. Many presidents
were generals in the military before (apparently, the current leadership is
a rare exception); a democratically legitimized military government,
mostly. A few month ago, Israel illegally invaded Lebanon, to avenge the
kidnapping of two soldiers by Hesbollah (which by itself was to avenge the
kidnapping of half the palestinean government by Israel, an avenge to
kidnapping of one soldier by Hamas... sounds like a typical vendetta where
someone soon will bring up the still pending issue of David vs. Goliath, as
well ;-).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Dennis M. O'Connor

unread,

Oct 2, 2006, 9:55:23 AM10/2/06

to

"Bernd Paysan" <bernd....@gmx.de> wrote ...

> (e.g. the Vietnam war started, because two US soldiers were killed)

Not hardly. Where did you learn something so ridiculous ?

The most probable cause of the Vietnam War was that
US, during the Eisenhower and Kennedy administrations,
refused to allow Vietnam, divided in 1954 by the treaty with
France on a supposedly temporary basis, to re-unite peacefully.

Now, please continue all these discussions in a more
appropriate newsgroup. Thank you.

Jon Forrest

unread,

Oct 2, 2006, 11:47:41 AM10/2/06

to jsa...@ecn.ab.ca

jsa...@ecn.ab.ca wrote:

>
> I think you're asking the wrong question.
>
> For some purposes, the point of diminishing returns is at "two".
>
> So does that automatically mean that multi-core designs are a bad idea?

Nobody said anything about multi-core being a bad idea.

> Once you've done just about as well as you can in these areas, about
> the only reasonable-cost avenue you have *left* for improving
> performance is to include more processor cores in your design.

But, if this too doesn't do much to improve performance, what's
the point? Remember the Myth of Sisyphus?

If increased clock speed, multi-cores, and all the other tricks
don't result in better performance and/or lower cost then maybe
it's time to look elsewhere, or maybe we've hit a wall imposed
by human limitations, not technological.

I worked in the CS Dept. at UC Berkeley during the middle and
end of the Great Systems Era. Back then there was a huge need
for increased hardware power. Now the emphasis seems to be on
small systems, like sensors and nano-tech, where single system
compute power isn't all that important. Of course, there
are exceptions, but the trend is clear.

Jon Forrest

unread,

Oct 2, 2006, 11:54:27 AM10/2/06

to rohit...@gmail.com

rohit...@gmail.com wrote:

> The list is endless. I can see how every one of these applications is
> ATLEAST 1 thread, if not more. These "daemons" can run on one computer
> or many. But they will all have to communicate, manage data, and be
> secure (think encrypted communication).

The thing to keep in mind when thinking about future applications
is how much faster they would run if you had a computer that has an
infinite amount of computing power. In your example, and most of the
others I've seen, this would make just a slight improvement.

In your examples, most of the time you're network bound. Some of
your video needs might be a little taxing but many of them can
be done just fine today with a modern video card.

Maybe another way to describe your problem is that you're software bound,
not because the software runs too slowly but because it hasn't
been written yet in a way that lets you comfortably and reliably
do what you want.

Jon Forrest

Eugene Miya

unread,

Oct 2, 2006, 2:23:51 PM10/2/06

to

>|>>> PIM

In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
>|> Go talk to your friends in that big, circular, round building NW of London
>|> near that town of Ch.*m..... They surely must have asked for some.

In article <efle9j$kke$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>Actually, when we want to find out what they are up to, we have to
>wait for leaks from Congress! They aren't our friends - they are
>your agents, under contract from 'our' government.

Well then I guess you have to develop influence in the US Conrgess
to talk to your people about seeing US technology in the UK. The part of
the UK I deal with doesn't have any influence (I did recommend a book
to one of them).

>|>>> not PIM

>Democracy? That is the political system currently operational in
>Afghanistan and Iraq, isn't it?

Do they have an Electoral College?

--

Eugene Miya

unread,

Oct 2, 2006, 2:29:37 PM10/2/06

to

>>> "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message

>>>> In article <451d8b01@darkstar>, eug...@cse.ucsc.edu (Eugene Miya)
>>>> writes:

Stuff about PIMs.

Stuff not about PIMs.

Please edit better guys.
In article <Up-dnUuwdY8SoILY...@comcast.com>,

Chris Thomasson <cri...@comcast.net> wrote:
>"Bill Todd" <bill...@metrocast.net> wrote in message
>news:VPWdnbJFjNGrvoLY...@metrocastcablevision.com...
>> Chris Thomasson wrote:
>> Hmmm - you appear to be about as delusional in that area
>
>IMHO, I do believe that Iran is causing a lot of trouble for the
>Collation... I think they are ordering Hezbollah to smuggle weapons and

While I am not a fan of the current Crusade:
Several fine alt.* and soc.* and talk.* groups exist on these topics.

>> as you are about the relative importance of lock-free algorithms.

Is this about Henry M.'s lock free OS of old?

>What is your take on the issue? Perhaps you have misunderstood me; a
>majority of my postings are directed to comp.programming.threads). FWIW, I
>do believe that a clever mixture of lock-free, and lock-based algorithms can
>produce results that can scale extremely well. For instance, here is an
>older library of mine:

...

>"An Effective Marriage between Lock-Free and Lock-Based Algorithms"

Is this about Henry M.'s lock free OS of old?

--

rohit...@gmail.com

unread,

Oct 2, 2006, 2:44:50 PM10/2/06

to

Am I the only one that feels that this thread is in the fringe of what
most of us understand as computer architecture?

I plead the individuals of concern here, to not be offended. I see that
you are having an interesting conversation. But, can you spare us geeks
the burden of the information, and perhaps engage in this conversation
in a more suitable news group (alt.*).

Thanks for understanding guys.

rohit

Nick Maclaren

unread,

Oct 2, 2006, 2:54:07 PM10/2/06

to

In article <1159814689.8...@h48g2000cwc.googlegroups.com>,

"rohit...@gmail.com" <rohit...@gmail.com> writes:
|>
|> Am I the only one that feels that this thread is in the fringe of what
|> most of us understand as computer architecture?

No, and I apologise for having inadvertently started it :-(

Regards,
Nick Maclaren.

Eugene Miya

unread,

Oct 2, 2006, 4:29:13 PM10/2/06

to

In article <1159814689.8...@h48g2000cwc.googlegroups.com>,
"rohit...@gmail.com" <rohit...@gmail.com> writes:
>|> Am I the only one that feels that this thread is in the fringe of what
>|> most of us understand as computer architecture?

In article <efrn8f$i4q$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>No, and I apologise for having inadvertently started it :-(

This is why the Cancel command was created.
Apology is irrelevant here.
These are the kinds of reasons why you have chased away knowledgeable people.

--

Chris Thomasson

unread,

Oct 2, 2006, 6:58:05 PM10/2/06

to

Damn. Another typo!

> I am interested to hear some of the "silly areas" that I have advocated
> using lock-free techniques for; any quick links?
>
> I will defiantly consider any advice/suggestions you, or others, may
> have...

^^^^^^^

That was suppose to be "definitely"

already...@yahoo.com

unread,

Oct 2, 2006, 7:17:23 PM10/2/06

to

I wholeheartedly agree that CPU-intensive desktop tasks do exist. Just
wanted to point out that your original list didn't contain even one of
those.

BTW, the most compute-intensive tasks that I do at work at my
workstation (which physically happens to be a cheap desktop PC) are
related to FPGA development - synthesis, place and route, timing
analysis. All these tasks are 100% CPU-bound and single-threaded. So in
order to maximize productivity I would happily trade second CPU for
just 10-15 percents of improvement in single-thread performance. Giving
up 20% of single-threaded performance for two, four or 1000 cores is a
non-starter.
So if Intel could give me a single-core Conroe running at 3.5 GHz I
would grab with both hands.

Andrew Reilly

unread,

Oct 2, 2006, 7:58:49 PM10/2/06

to

On Mon, 02 Oct 2006 17:17:23 -0700, already5chosen wrote:

> FPGA development - synthesis, place and route, timing
> analysis. All these tasks are 100% CPU-bound and single-threaded.

Are those tasks inherently single-threaded, or is that just the way your
tools vendor coded them? I would have expected synthesis to have about
the same opportunities for parallelism as other compilers: essentially
what parallel make can give you. Place and route might be parallelisable,
if they operate in an iterative try-multiple options minimisation style.
Don't know about timing analysis. There's lots of independent stuff going
on in most FPGAs, though, so I'd think that there's ample opportunity to
do that in parallel too.

Multi-processors have been available in the higher-end CAD workstation
arena for a long time. I would have thought that the code would be using
them, by now.

Cheers,

--
Andrew

Del Cecchi

unread,

Oct 2, 2006, 8:04:23 PM10/2/06

to

<already...@yahoo.com> wrote in message
news:1159831043.6...@i3g2000cwc.googlegroups.com...

Well, if you count work applications there are many. SPICE. DRC/LVS,
extraction, simulation.......

That's why server farms were invented.

del
>

already...@yahoo.com

unread,

Oct 2, 2006, 8:36:22 PM10/2/06

to

Andrew Reilly wrote:
> On Mon, 02 Oct 2006 17:17:23 -0700, already5chosen wrote:
>
> > FPGA development - synthesis, place and route, timing
> > analysis. All these tasks are 100% CPU-bound and single-threaded.
>
> Are those tasks inherently single-threaded, or is that just the way your
> tools vendor coded them? I would have expected synthesis to have about
> the same opportunities for parallelism as other compilers: essentially
> what parallel make can give you. Place and route might be parallelisable,
> if they operate in an iterative try-multiple options minimisation style.
> Don't know about timing analysis. There's lots of independent stuff going
> on in most FPGAs, though, so I'd think that there's ample opportunity to
> do that in parallel too.
>

I am stating the facts. Don't pretend to know the reasons. You can post
your questions on comp.arch.fpga.
BTW, HDL synthesis is not similar to normal software build. It is
similar to software build with link-time code generation +
inter-procedure optimization.

> Multi-processors have been available in the higher-end CAD workstation
> arena for a long time. I would have thought that the code would be using
> them, by now.
>
> Cheers,
>
> --
> Andrew

May be, in other CAD/CAE areas. Not for FPGA development.

already...@yahoo.com

unread,

Oct 2, 2006, 8:45:50 PM10/2/06

to

Del Cecchi wrote:
>
> Well, if you count work applications there are many. SPICE. DRC/LVS,
> extraction, simulation.......
>
> That's why server farms were invented.
>
> del
>

How many of those are both
1. Efficiently parallelizable
2. But not embarrassingly parallel
Because for embarrassingly-parallel case multicore is no better than
SMP, except for the price and both multicore and SMP often no better
than distributed computation (clusters, MPP).

jsa...@ecn.ab.ca

unread,

Oct 2, 2006, 9:56:46 PM10/2/06

to

Jon Forrest wrote:
> But, if this too doesn't do much to improve performance, what's
> the point? Remember the Myth of Sisyphus?

It depends on the application. The benefits of throwing more processors
at an application can range from none whatever to an increase in
performance linear with the number of processors used.

Multi-core *does* have a ceiling too, determined by chip yield. After
that, though, you can put more processor chips on the board. (Microsoft
licensing policies happen to be warping chip design at the moment, it
might be noted.)

The idea is to do as much as is reasonable in each direction - and
going multicore is one direction to go in. Putting memory on the chip
to ease the bandwidth bottleneck is an alternative possibility.

John Savard

jsa...@ecn.ab.ca

unread,

Oct 2, 2006, 11:25:55 PM10/2/06

to

Eugene Miya wrote:
> In article <efgr7e$6oa$1...@gemini.csx.cam.ac.uk>,
> Nick Maclaren <nm...@cus.cam.ac.uk> wrote:

> >When are we going to see them, then?

> We? "What do you mean 'we?' white man?" --Tonto
> I've seen them. I'm under an NDA.

That doesn't count.

Of course, it still means they *exist*. But when is the world going to
see them is a legitimate question.

John Savard

Nick Maclaren

unread,

Oct 3, 2006, 5:32:58 AM10/3/06

to

In article <pan.2006.10.02....@areilly.bpc-users.org>,

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
|> On Mon, 02 Oct 2006 17:17:23 -0700, already5chosen wrote:
|>
|> > FPGA development - synthesis, place and route, timing
|> > analysis. All these tasks are 100% CPU-bound and single-threaded.
|>
|> Are those tasks inherently single-threaded, or is that just the way your
|> tools vendor coded them? I would have expected synthesis to have about
|> the same opportunities for parallelism as other compilers: essentially
|> what parallel make can give you. Place and route might be parallelisable,
|> if they operate in an iterative try-multiple options minimisation style.

The early 1970s experience has never been contradicted - for many or
most applications, just doing that gives a small and very inefficient
use of parallelism (e.g. the efficiency is often only log(N)/N where
N is the number of CPUs). That experience debunked the claims of the
functional programming brigade that such methodology gave automatic
parallelisation.

I should be interested to hear of any real experiments where parallel
make gives a better result, and to look at the make structure. My
experience and that of many other people is that it fits the above
model to a T.

40 years experience in this area can be summed up as TANSTAAFL.

|> Multi-processors have been available in the higher-end CAD workstation
|> arena for a long time. I would have thought that the code would be using
|> them, by now.

It's actually quite hard to add to programs that weren't designed for
it and aren't naturally embarrassingly parallel.

Regards,
Nick Maclaren.

Eugene Miya

unread,

Oct 3, 2006, 11:45:29 AM10/3/06

to

In article <efgr7e$6oa$1...@gemini.csx.cam.ac.uk>,
Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>> >When are we going to see them, then?

>> We? "What do you mean 'we?' white man?" --Tonto

In article <1159845955.2...@e3g2000cwe.googlegroups.com>,

<jsa...@ecn.ab.ca> wrote:
>That doesn't count.
>Of course, it still means they *exist*.

Bring up questions of trees falling in forests?
People could only guess what the first computers were like.

>But when is the world going to
>see them is a legitimate question.

You have to remember to keep the important keyword, PIM, in case
threading gets cut.

When? Well. That's hard to say. It may be never the rate the Crusade
is going. Certainly people in the field have seen them. One of the
former posters most pissed at Nick likely saw them. The first were
integrated into preexisting boxes of manufacturers you have already
heard of as well as lesser known more expensive packages.
But the scale and degree and current numbers I have no idea now.
Museums only get access to things in some cases ages, decades after
they are done. They supposedily just look any other rectangular box.

--

Nick Maclaren

unread,

Oct 3, 2006, 11:51:43 AM10/3/06

to

In article <45228599$1@darkstar>, eug...@cse.ucsc.edu (Eugene Miya) writes:
|>
|> You have to remember to keep the important keyword, PIM, in case
|> threading gets cut.

It will be amusing to see what meaning that term attaches itself to,
assuming that some such technologies arrive in the general market.

|> When? Well. That's hard to say. It may be never the rate the Crusade
|> is going.

What Crusade? I know of only one current one and it is OT. Let's
not bring it up again.

|> Certainly people in the field have seen them. One of the
|> former posters most pissed at Nick likely saw them. The first were
|> integrated into preexisting boxes of manufacturers you have already
|> heard of as well as lesser known more expensive packages.
|> But the scale and degree and current numbers I have no idea now.
|> Museums only get access to things in some cases ages, decades after
|> they are done. They supposedily just look any other rectangular box.

Until and unless someone is prepared to describe what he saw, it is
hard to tell whether what they saw WAS just a random, off-the-shelf
computer box containing a lot of hot air. After all, could the
person who told THEM "this box is full of PIMs" be telling porkies?
Nah. Nobody EVER does that.

More seriously, those of us with Half a Clue (as well as the people who
are inside the business) have been saying for two decades that there has
never been any difficulty for the major companies to make such things;
the problem is and always has been to turn them into something that
can be used effectively by mere mortals for typical tasks.

And it is the latter aspect that is interesting, not whether there are
a few specialised systems built to amuse the spooks.

Regards,
Nick Maclaren.

Del Cecchi

unread,

Oct 3, 2006, 1:24:31 PM10/3/06

to

Some are and some aren't. But they can pretty much all use multiple
processors or multiple threads one way or another. And as far as I am
concerned, multicore isn't that much different than smp.

--
Del Cecchi
"This post is my own and doesn’t necessarily represent IBM’s positions,
strategies or opinions.”

Del Cecchi

unread,

Oct 3, 2006, 1:28:37 PM10/3/06

to

And remember that if someone could make a convincing case that they
would sell enough that there are folks who would bank roll them, and
others (Like IBM E&TS :-) ) who would help them produce such a thing.

Nick Maclaren

unread,

Oct 3, 2006, 2:14:34 PM10/3/06

to

In article <4ofku7F...@individual.net>,

Del Cecchi <cecchi...@us.ibm.com> writes:
|> >
|> > More seriously, those of us with Half a Clue (as well as the people who
|> > are inside the business) have been saying for two decades that there has
|> > never been any difficulty for the major companies to make such things;
|> > the problem is and always has been to turn them into something that
|> > can be used effectively by mere mortals for typical tasks.
|>

|> And remember that if someone could make a convincing case that they
|> would sell enough that there are folks who would bank roll them, and
|> others (Like IBM E&TS :-) ) who would help them produce such a thing.

Precisely. I can provide a convincing case that they would sell, but
not enough - indeed, I was having an almost identical conversation with
someone from IBM 20 years ago :-(

Regards,
Nick Maclaren.