
Teaching old threaded programmers new parallel programming tricks


Joe Seigh
Jul 3, 2007, 5:54:56 AM
Seems to be no end of articles and blogs claiming we programmers aren't
learning parallel programming fast enough.

http://www.regdeveloper.co.uk/2007/07/02/smith_parallel_coming/
http://blogs.intel.com/research/2007/06/parallel_computing_making_sequ.html

Someone should tell Intel it takes time to come up with new
lemonade recipes. :) Intel doesn't seem to understand software
or the software business. Period. Plus there seems to be no
shortage of parallel programming types who believe that since
they could easily program the "embarrassingly parallel" problems,
parallel solutions are easy to come up with for all problems. Just us
stupid programmers seem to be the problem, I guess.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Dmitriy Vyukov
Jul 3, 2007, 6:24:41 AM
On Jul 3, 1:54 pm, Joe Seigh <jseigh...@xemaps.com> wrote:
> Seems to be no end of articles and blogs claiming we programmers aren't
> learning parallel programming fast enough.
>
> http://www.regdeveloper.co.uk/2007/07/02/smith_parallel_coming/
> http://blogs.intel.com/research/2007/06/parallel_computing_making_sequ.html
>
> Someone should tell Intel it takes time to come up with new
> lemonade recipes. :) Intel doesn't seem to understand software
> or the software business.

Yeah. And the Intel guys could at least deign to answer posts on their
f@#$^&* threading forum!
You and Chris Thomasson know what I am talking about :)

Dmitriy V'jukov

David Schwartz
Jul 3, 2007, 7:50:51 AM
On Jul 3, 2:54 am, Joe Seigh <jseigh...@xemaps.com> wrote:

> Someone should tell Intel it takes time to come up with new
> lemonade recipes. :)

It's been a decade or so. C'mon, technology is rapidly changing. A
decade is way too long.

> Intel doesn't seem to understand software
> or the software business. Period.

Do tell.

> Plus there seems to be no
> shortage of parallel programming types who believe that since
> they could easily program the "embarrassingly parallel" problems,
> parallel solutions are easy to come up with for all problems.

If it was practical to make CPUs faster without multiple cores, don't
you think Intel would do that? That would make their CPUs immediately
faster across the board, rather than ultimately faster as technology
catches up.

Frankly, Intel has done an unbelievable job of allowing the same
design principles and code to run faster as technology improved. There is
no way I would have ever conceived we could have the performance we
have today and still stick with an x86 architecture. It's frankly
totally unbelievable.

We all knew it couldn't last forever.

> Just us
> stupid programmers seem to be the problem I guess.

How long has SMP been around? This is the same thing, it's not
something new. It's just becoming more widespread, but it was *always*
around on servers. So desktop programmers will have to learn to
program like server programmers. There once was a time only server
programmers had to write 32-bit code, so this is nothing new either.

Quit yer bitchin'.

DS

Message has been deleted

Eric Sosman
Jul 3, 2007, 5:37:09 PM
Elcaro Nosille wrote on 07/03/07 17:06:
> Joe Seigh wrote:
>
>> Seems to be no end of articles and blogs claiming we programmers
>> aren't learning parallel programming fast enough.
>
> I don't believe that. Multithreaded programming is quite easy.

Multithreaded debugging, on the other hand, ...

--
Eric....@sun.com

Chris Thomasson
Jul 3, 2007, 6:43:36 PM

"Eric Sosman" <Eric....@sun.com> wrote in message
news:1183498629.528603@news1nwk...

:^0

David Schwartz
Jul 3, 2007, 6:45:42 PM
On Jul 3, 2:37 pm, Eric Sosman <Eric.Sos...@sun.com> wrote:

> Elcaro Nosille wrote on 07/03/07 17:06:

> > I don't believe that. Multithreaded programming is quite easy.

It's easy to do such that there are very few bugs and it's easy to do
such that it performs well. It is not so easy to do both at once.

> Multithreaded debugging, on the other hand, ...

Well, you can write code that is very easy to debug and very unlikely
to have bugs. But the net result is that a lot of concurrency is
sacrificed. To get more concurrency, you have to use more complex
locking and synchronization. And then debugging can be very, *very*
hard.
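
To make the trade-off concrete, here is a minimal sketch (POSIX threads
assumed; the structures are made up for illustration):

    #include <pthread.h>

    /* Coarse-grained: one lock guards the whole table. Simple to
       reason about and hard to get wrong, but every access is
       serialized. */
    struct table_coarse {
        pthread_mutex_t lock;
        int bucket[256];
    };

    /* Fine-grained: one lock per bucket. Much more concurrency,
       but any operation touching two buckets must now take two
       locks in a consistent order, or it can deadlock. */
    struct table_fine {
        pthread_mutex_t lock[256];
        int bucket[256];
    };

The first version is the "few bugs" end of the spectrum; the second is
the "performs well" end.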

One can even argue that just the fact that multi-threaded programs are
less deterministic makes them significantly harder to debug in many
real world circumstances.

DS

llothar
Jul 4, 2007, 12:10:04 AM
> Well, you can write code that is very easy to debug and very unlikely
> to have bugs. But the net result is that a lot of concurrency is
> sacrificed. To get more concurrency, you have to use more complex
> locking and synchronization. And then debugging can be very, *very*
> hard.

And multithreading can end up slower than single-threaded apps,
especially in the desktop and GUI world.

llothar
Jul 4, 2007, 12:12:21 AM
> Someone should tell Intel it takes time to come up with new
> lemonade recipes. :) Intel doesn't seem to understand software
> or the software business. Period. Plus there seems to be no
> shortage of parallel programming types who believe that since
> they could easily program the "embarrassingly parallel" problems,
> parallel solutions are easy to come up with for all problems. Just us
> stupid programmers seem to be the problem I guess.

Well then they should help people to get around most of the patent
problems that seem to surround a lot of modern lock-free algorithms.

And they could optimize their software and hardware for this.
At the moment I can't say that Intel is taking multithreading seriously
(they are only throwing out dual/quad-core CPUs).

David Schwartz
Jul 4, 2007, 12:36:33 AM
On Jul 3, 9:10 pm, llothar <llot...@web.de> wrote:

> And multithreading can end up slower than single-threaded apps,
> especially in the desktop and GUI world.

You can't wish technology into existence. What we have has the
disadvantages it has. As I said:

"Frankly, Intel has done an unbelievable job of allowing the same
design principles and code to run faster as technology improved. There is
no way I would have ever conceived we could have the performance we
have today and still stick with an x86 architecture. It's frankly
totally unbelievable."

DS

Chris Thomasson
Jul 4, 2007, 3:23:13 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:1183458281....@e16g2000pri.googlegroups.com...

Indeed:

http://groups.google.com/group/comp.programming.threads/msg/ca8534ee727de773

:^0

Chris Thomasson
Jul 4, 2007, 3:28:55 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1183463451.4...@a26g2000pre.googlegroups.com...

> On Jul 3, 2:54 am, Joe Seigh <jseigh...@xemaps.com> wrote:
[...]

> Quit yer bitchin'.

You do realize that the following holds for you as well:

http://groups.google.com/group/comp.programming.threads/msg/ca8534ee727de773

"Anyone who thinks they can create a correct program using raw primitive
construction such as semaphores, is most likely crazy, and is indeed in need
of professional mental help."

They claim it will "hurt" your brain if you try to use locks. Here is some
more:

http://groups.google.com/group/comp.arch/msg/d3f98781730ab6ab

The STM video in question actually compares locks to building a bridge out
of bananas?

Are you agreeing with that hyped bulls#%t?

John Dallman
Jul 4, 2007, 6:50:00 PM
In article <1183502742.9...@w5g2000hsg.googlegroups.com>,
dav...@webmaster.com (David Schwartz) wrote:

> It's easy to do such that there are very few bugs and it's easy to do
> such that it performs well. It is not so easy to do both at once.

And it is hard to retrofit threading into existing complex code.

> One can even argue that just the fact that multi-threaded programs are
> less deterministic makes them significantly harder to debug in many
> real world circumstances.

Much harder, simply because one wastes so much time on runs that work.

One always needs to get code working reliably on a single thread first,
which means - surprise! - one can't write code that /only/ works with
multiple processors. Using multiple threads on a single processor, for
code that is CPU-limited, loses much too much performance, IME.

--
John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.

John Dallman
Jul 4, 2007, 6:50:00 PM
In article <1183522341.8...@z28g2000prd.googlegroups.com>,
llo...@web.de (llothar) wrote:

> Well then they should help people to get around most of the patent
> problems that seem to surround a lot of modern lock-free algorithms.

How might they do this, other than (a) getting the US patent system
changed so that you can't patent software, or (b) buying up a load of
patents and making them free for general use? The latter has the problem
that they'll instantly be besieged by "inventors" claiming Intel needs
to buy them out and make them rich. Proving that those "inventors" don't
have anything original will require lots of time and effort, and maybe
litigation.

Frankly, while lock-free programming is encumbered with patents, I'm
avoiding it in favour of conventional threading.

> At the moment I can't say that Intel is taking multithreading seriously
> (they are only throwing out dual/quad-core CPUs).

They have to make processors that will run existing single-thread code
faster than the previous generation. Something with a lot of simpler
cores that will run existing code single-thread much slower won't sell,
because the apps that would exploit it don't exist. It's a chicken-and-
egg problem.

If you have a way to build, with the transistor count of a Core 2 Duo, a
16-core processor that will run existing single-thread code at the same
speed, I suggest you approach Intel or AMD and prepare to become very
rich indeed.

Chris Thomasson
Jul 4, 2007, 8:57:18 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:nOydnZKjBvU0gRfb...@comcast.com...

> Seems to be no end of articles and blogs claiming we programmers aren't
> learning parallel programming fast enough.
>
> http://www.regdeveloper.co.uk/2007/07/02/smith_parallel_coming/
> http://blogs.intel.com/research/2007/06/parallel_computing_making_sequ.html
>
> Someone should tell Intel it takes time to come up with new
> lemonade recipes. :)

Indeed! lol.


> Intel doesn't seem to understand software
> or the software business. Period.

They have a bleak outlook wrt how the programming community can possibly
address parallel programming in general. Not good imho.

> Plus there seems to be no
> shortage of parallel programming types who believe that since
> they could easily program the "embarrassingly parallel" problems,
> parallel solutions are easy to come up with for all problems. Just us
> stupid programmers seem to be the problem I guess.

That's the tone I hear when I read those types of articles. The STM
hype bullshi% makes me cringe...

:^0

David Schwartz
Jul 5, 2007, 4:27:21 AM
On Jul 4, 3:50 pm, j...@cix.co.uk (John Dallman) wrote:
> In article <1183502742.904601.289...@w5g2000hsg.googlegroups.com>,

> dav...@webmaster.com (David Schwartz) wrote:

> > It's easy to do such that there are very few bugs and it's easy to do
> > such that it performs well. It is not so easy to do both at once.

> And it is hard to retrofit threading into existing complex code.

Well, it's easy to do it if you don't need to do it well. You can just
put a huge lock around the existing complex code. You may also need to
create a service thread that proxies all data going into and out of
the existing code. This is usually quite easy to do.
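
A minimal sketch of the big-lock approach (POSIX threads assumed;
legacy_compute is a hypothetical thread-unsafe entry point):

    #include <pthread.h>

    /* Hypothetical legacy, thread-unsafe function. */
    extern int legacy_compute(const void *in, void *out);

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Thread-safe wrapper: at most one thread is ever inside the
       legacy module, so its internal state is never raced on. */
    int legacy_compute_mt(const void *in, void *out)
    {
        int rc;
        pthread_mutex_lock(&big_lock);
        rc = legacy_compute(in, out);
        pthread_mutex_unlock(&big_lock);
        return rc;
    }

Correct and safe, but note that no concurrency is gained inside the
legacy code itself; that's the "not doing it well" part.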

On the other hand, doing it well typically requires a nearly complete
rewrite.

> > One can even argue that just the fact that multi-threaded programs are
> > less deterministic makes them significantly harder to debug in many
> > real world circumstances.

> Much harder, simply because one wastes so much time on runs that work.

You know, now that I think about it, this problem may be at least a
little bit overblown. Things like lock order reversals and data access
outside of a lock, or with the wrong lock, can now be detected in a run
even if there are no consequences in that run. If this continues, it
may soon not be that much harder to debug multi-threaded code just
because of its nondeterminism.

> One always needs to get code working reliably on a single thread first,
> which means - surprise! - one can't write code that /only/ works with
> multiple processors.

Huh?

> Using multiple threads on a single processor, for
> code that is CPU-limited, loses much too much performance, IME.

This is a common myth. If the code is CPU limited, then each thread
will use up its entire timeslice. The number of context switches will
be limited to the number of full scheduler timeslices. If one context
switch at the end of each full scheduler timeslice is a significant
performance loss, the OS is broken or mistuned.

A CPU-limited task will not experience any pre-emption. Context switches
can only be expensive (for CPU-limited tasks) if threads block or
yield while there is still work to be done. It makes no difference how
many threads are running provided blocks or yields while work remains
are minimized.
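
Some rough numbers (both figures assumed for illustration): with a
100 ms timeslice and a context switch costing about 10 µs, the switch
overhead is 0.01 ms / 100 ms = 0.01% per full timeslice, which is
noise. The loss only becomes visible when threads block or yield long
before the timeslice expires.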

This is not a difficult thing to do, and in fact you have to work
pretty hard to make it a problem.

DS

Chris Thomasson
Jul 5, 2007, 10:19:35 AM

John Dallman
Jul 5, 2007, 6:28:00 PM
In article <1183624041.8...@g37g2000prf.googlegroups.com>,
dav...@webmaster.com (David Schwartz) wrote:

> j...@cix.co.uk (John Dallman) wrote:
> Well, it's easy to do it if you don't need to do it well. You can just
> put a huge lock around the existing complex code.

That seems not to allow improved performance for a single task with
multiple processors. And my customers want ever more speed.

> On the other hand, doing it well typically requires a nearly complete
> rewrite.

We're doing better than that, but it is very hard work.

> You know, now that I think about it, this problem may be at least a
> little bit overblown. Things like lock order reversals and data access
> outside of a lock, or with the wrong lock, can now be detected in a run
> even if there are no consequences in that run. If this continues, it
> may soon not be that much harder to debug multi-threaded code just
> because of its nondeterminism.

These things help, but with current technology (Intel ThreadChecker)
debugging is still very hard work. It's now readily distinguishable from
futile, but still much harder than single-thread debugging.

> > One always needs to get code working reliably on a single thread
> > first, which means - surprise! - one can't write code that /only/
> > works with multiple processors.
>
> Huh?

The product is a mathematical modelling library, which is portable
across a large number of platforms. Differences in locking are handled
with a small amount of platform-specific code in the internal locking
functions.

Some platforms out there still charge a large premium for multiple
processors, and there's also a lot of older hardware out there that
doesn't have multiple processors.

So you can't write code that absolutely relies on multiple processors
because on some platforms multiple processors are rare, and on all of
them, there are existing customers who don't have multi-processor
hardware, and if forced to buy it, may consider it cost-effective to
switch to a competitor's non-threaded product.

> > Using multiple threads on a single processor, for
> code that is CPU-limited, loses much too much performance, IME.
>
> This is a common myth.

Last time we benchmarked, the "myth" seemed real. I'll try it again
next week, just for laughs. But please note that if code knows it is not
being multi-threaded, it doesn't have to acquire and free locks, which
takes time.

David Schwartz
Jul 5, 2007, 6:56:16 PM
On Jul 5, 3:28 pm, j...@cix.co.uk (John Dallman) wrote:
> In article <1183624041.845596.323...@g37g2000prf.googlegroups.com>,

> dav...@webmaster.com (David Schwartz) wrote:
> > j...@cix.co.uk (John Dallman) wrote:
> > Well, it's easy to do it if you don't need to do it well. You can just
> > put a huge lock around the existing complex code.

> That seems not to allow improved performance for a single task with
> multiple processors. And my customers want ever more speed.

You always have to re-optimize for new processors if you want to get
the most speed possible.

> > On the other hand, doing it well typically requires a nearly complete
> > rewrite.

> We're doing better than that, but it is very hard work.

That is true.

> > You know, now that I think about it, this problem may be at least a
> > little bit overblown. Things like lock order reversals and data access
> > outside of a lock, or with the wrong lock, can now be detected in a run
> > even if there are no consequences in that run. If this continues, it
> > may soon not be that much harder to debug multi-threaded code just
> > because of its nondeterminism.

> These things help, but with current technology (Intel ThreadChecker)
> debugging is still very hard work. It's now readily distinguishable from
> futile, but still much harder than single-thread debugging.

Fair enough.

> > > One always needs to get code working reliably on a single thread
> > > first, which means - surprise! - one can't write code that /only/
> > > works with multiple processors.
>
> > Huh?

> The product is a mathematical modelling library, which is portable
> across a large number of platforms. Differences in locking are handled
> with a small amount of platform-specific code in the internal locking
> functions.
>
> Some platforms out there still charge a large premium for multiple
> processors, and there's also a lot of older hardware out there that
> doesn't have multiple processors.
>
> So you can't write code that absolutely relies on multiple processors
> because on some platforms multiple processors are rare, and on all of
> them, there are existing customers who don't have multi-processor
> hardware, and if forced to buy it, may consider it cost-effective to
> switch to a competitor's non-threaded product.

How do you write code that absolutely relies on multiple processors?
I'm not even sure how you could do that if we're talking about normal
user-space code. No matter how many CPUs are presented physically, you
are never assured you'll get more than one of them. Relying on
multiple processors is something you'd have to go out of your way to
do and would break even on machines that had them.

> > > Using multiple threads on a single processor, for
> > > code that is CPU-limited, loses much too much performance, IME.
>
> > This is a common myth.

> Last time we benchmarked, the "myth" seemed real. I'll try it again
> next week, just for laughs.

This is not an easy thing to benchmark. It's easy to get it wrong. If
you do, please post the results of the benchmark as well as as much
detail on how you did it as possible.

> But please note that if code knows it is not
> being multi-threaded, it doesn't have to acquire and free locks, which
> takes time.

You do have a trade-off. You can use very coarse locking and minimize
this cost or you can use very fine locking and maximize concurrency.
Theoretically, you could even make this dynamically tunable at run
time, but that's a lot of work. That's like everything else, if you
want to do it really well, it's tricky.

Hopefully your threading library has locks that are heavily optimized
for the uncontended case. In that case, the cost of acquiring the lock
can be very low.

Here's where hardware has been really improving though. The cost of
acquiring an uncontended read lock on Linux for a Core 2 Duo processor
is much less than doing so on a Pentium 4. (This is largely due to the
pipelines not being as deep rather than any specific improvement in
operations like 'LOCK XADD' or 'LOCK INC'.)
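
One way to check the uncontended cost on a given box (a sketch
assuming POSIX threads; compile with -pthread, plus -lrt on older
Linux systems):

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    /* Times uncontended lock/unlock pairs in a single thread.
       Results vary with CPU, libc and kernel; treat as a rough
       probe, not a rigorous benchmark. */
    int main(void)
    {
        enum { N = 10000000 };
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        struct timespec t0, t1;
        int i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < N; i++) {
            pthread_mutex_lock(&m);
            pthread_mutex_unlock(&m);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%.1f ns per lock/unlock pair\n",
               ((t1.tv_sec - t0.tv_sec) * 1e9
                + (t1.tv_nsec - t0.tv_nsec)) / N);
        return 0;
    }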

DS

John Dallman
Jul 6, 2007, 4:34:00 AM
In article <1183676176.2...@z28g2000prd.googlegroups.com>,
dav...@webmaster.com (David Schwartz) wrote:

> > [BigLock] seems not to allow improved performance for a single
> > task with multiple processors. And my customers want ever more speed.
> You always have to re-optimize for new processors if you want to get
> the most speed possible.

There's a difference between "re-optimise" and "re-write". When you have
hundreds of thousands of lines of performance-sensitive code and the
customers require a /very/ high level of compatibility with the results
of previous versions, much care is needed.

> How do you write code that absolutely relies on multiple processors?
> I'm not even sure how you could do that if we're talking about normal
> user-space code.

True, make that "requires multiple processors to run at a decent speed".
A notable issue with this is lazy evaluation: not working things out
unless they're definitely required. If you have lots of processors and
threads, then lazy evaluation is less worthwhile, and introduces extra
flags that have to be checked to see if factor N has been evaluated,
which require locking to keep two threads from occasionally trying to do
it at the same time. But with a single processor, lazy evaluation is
very worthwhile. And with hundreds of sites for lazy evaluation, and
dependencies between them that aren't always obvious, it gets hard.
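
The standard shape for making one such site thread-safe is a
once-control (a minimal sketch with POSIX threads; get_factor and the
computation are hypothetical):

    #include <pthread.h>

    static pthread_once_t factor_once = PTHREAD_ONCE_INIT;
    static double factor;        /* written once, read-only after */

    static void compute_factor(void)
    {
        factor = 42.0;           /* stand-in for the real work */
    }

    double get_factor(void)
    {
        /* First caller computes; everyone else waits, then reads. */
        pthread_once(&factor_once, compute_factor);
        return factor;
    }

With hundreds of such sites, the extra checks and the lost laziness
add up, which is the cost being described above.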

> > Last time we benchmarked, the "myth" seemed real. I'll try it again
> > next week, just for laughs.
>
> This is not an easy thing to benchmark. It's easy to get it wrong. If
> you do, please post the results of the benchmark as well as as much
> detail on how you did it as possible.

Actually, it's quite easy. Take a single-processor machine. Run the
benchmark without SMP, and with SMP turned on, with the flag that says
"yes I want two threads even though there's only one processor".

> > But please note that if code knows it is not being multi-threaded,
> > it doesn't have to acquire and free locks, which takes time.
>
> You do have a trade-off. You can use very coarse locking and minimize
> this cost or you can use very fine locking and maximize concurrency.

Actually, we usually find that there's a level at which introducing
locking into an algorithm is easiest in terms of the design. We discount
levels where the work to be done between locks is trivial; we try to
maximise concurrency because that allows more of the work to be done
with multiple threads.

There is a global flag for "threads are in use", and the macros that
wrap the calls to the thread interface check it and skip the locking
when no threads are in use. The sarcasm that would be administered to
the first developer to decide "I'll just turn this off for a sec" would
probably have them looking for a nice high window.
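
The shape of such a wrapper, roughly (a sketch only; the macro and
flag names are invented, not the product's own):

    #include <pthread.h>

    /* Set once at startup, before any worker threads exist, and
       never changed afterwards; otherwise the check itself races. */
    extern int g_threads_in_use;

    #define LIB_LOCK(m) \
        do { if (g_threads_in_use) pthread_mutex_lock(m); } while (0)
    #define LIB_UNLOCK(m) \
        do { if (g_threads_in_use) pthread_mutex_unlock(m); } while (0)

Single-threaded runs then pay only a predictable branch per would-be
lock, instead of an atomic operation.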

> Theoretically, you could even make this dynamically tunable at run
> time, but that's a lot of work.

And it doubles - or worse - the testing workload, which is best measured
in units of GHz-hours. The customers need consistency with previous
versions, and this is achieved by automated testing. Before we can
justify doing that, our SMP has to be more reliable.

> Hopefully your threading library has locks that are heavily optimized
> for the uncontended case.

Hopefully. We use the OS locks - with spinlocks where possible - having
realised that roll-your-own is a mug's game with lots of platforms to
support, unless you want to employ someone as a full-time locksmith and
get him examples of /every/ kind of hardware that the target OSes run on.

Eric Sosman
Jul 6, 2007, 9:18:36 AM
John Dallman wrote:
> [... lots on another topic ...]

>
> And it doubles - or worse - the testing workload, which is best measured
> in units of GHz-hours. [...]

"GHz-hours" is dimensionless (1e9/s * 3600s = 3.6e12).
That makes it equivalent to other dimensionless units like
"radians," and I confess I'm unable to imagine how to measure
"testing workload" as an angle. Would you mind explaining
how you apply this peculiar-seeming unit?

--
Eric Sosman
eso...@ieee-dot-org.invalid

Chris Friesen
Jul 6, 2007, 11:50:42 AM
Eric Sosman wrote:

> "GHz-hours" is dimensionless (1e9/s * 3600s = 3.6e12).

Not really. Because in the context of CPUs it really means
"cycles/sec", not just "1/sec". So a GHz-hour is really an absolute
number of clock cycles.

Chris

rfis...@gmail.com
Jul 7, 2007, 12:29:43 AM

Joe Seigh wrote:

> Seems to be no end of articles and blogs claiming we programmers aren't
> learning parallel programming fast enough.
>
> http://www.regdeveloper.co.uk/2007/07/02/smith_parallel_coming/
> http://blogs.intel.com/research/2007/06/parallel_computing_making_sequ.html
>
> Someone should tell Intel it takes time to come up with new
> lemonade recipes. :) Intel doesn't seem to understand software
> or the software business. Period. Plus there seems to be no
> shortage of parallel programming types who believe that since
> they could easily program the "embarrassingly parallel" problems,
> parallel solutions are easy to come up with for all problems. Just us
> stupid programmers seem to be the problem I guess.

Joe: why not write the "no silver bullet" paper of parallel
programming?
It's worth at least 20 years of job security.

RF

Joe Seigh
Jul 7, 2007, 7:20:53 AM

The original should be made into a movie (à la Groundhog Day). Every
programmer that saw it would say "Hey, that's my job!".


I'd say fork/join is the new hammer.
http://www.infoq.com/news/2007/07/concurrency-java-se-7
Get those nail redesigns ready. Learn to like data
structures that support random access efficiently.
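
The shape of the hammer, in miniature (a toy sketch with POSIX
threads; real fork/join frameworks, like the one linked above, recurse
and typically steal work, while this shows only a single split):

    #include <pthread.h>

    struct job { const int *a; int n; long sum; };

    static void *sum_range(void *p)
    {
        struct job *j = p;
        int i;
        j->sum = 0;
        for (i = 0; i < j->n; i++)
            j->sum += j->a[i];
        return NULL;
    }

    /* Fork: split the array in half. Join: wait and combine.
       Arrays split at any index in O(1), which is why random
       access matters; a linked list would serialize the split. */
    long fork_join_sum(const int *a, int n)
    {
        struct job left  = { a,         n / 2,     0 };
        struct job right = { a + n / 2, n - n / 2, 0 };
        pthread_t t;

        pthread_create(&t, NULL, sum_range, &left);
        sum_range(&right);
        pthread_join(t, NULL);
        return left.sum + right.sum;
    }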

There will be some interesting problems, of course. The
self-inflicted ones are always the most interesting.

John Dallman
Jul 9, 2007, 3:04:00 PM
In article <CsydnVWB7qTU3BPb...@comcast.com>,
eso...@ieee-dot-org.invalid (Eric Sosman) wrote:

> John Dallman wrote:
> > And it doubles - or worse - the testing workload, which is best
> > measured in units of GHz-hours. [...]
> "GHz-hours" is dimensionless (1e9/s * 3600s = 3.6e12).
> That makes it equivalent to other dimensionless units like
> "radians," and I confess I'm unable to imagine how to measure
> "testing workload" as an angle. Would you mind explaining
> how you apply this peculiar-seeming unit?

Multiply, don't divide. A GHz-hour is, loosely, the work done in an hour
on a 1GHz processor. The size of the GHz-hour varies according to the
specific processor architecture and implementation in use, of course.

Our overnight testing workload seems to be of the order of 500
GHz-hours, spread over about 50 machines. We don't use the unit every
day, because it's easier to think of "this test set is about 1.5 times
the size of that test set", but it's a useful unit for giving an
approximation to the overall workload. Doubling the SMP part of the
testing means you need to buy a bunch of new machines, and provide
space, power, cooling and network for them to run.
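
For scale: 500 GHz-hours over 50 machines is 10 GHz-hours per machine
per night, so a 2 GHz box (a figure assumed here for illustration)
needs 10 / 2 = 5 hours of wall-clock time. Double the SMP portion and
the night stops being long enough without more hardware.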

Eric Sosman
Jul 9, 2007, 3:40:59 PM
John Dallman wrote on 07/09/07 15:04:

> In article <CsydnVWB7qTU3BPb...@comcast.com>,
> eso...@ieee-dot-org.invalid (Eric Sosman) wrote:
>
>
>>John Dallman wrote:
>>
>>>And it doubles - or worse - the testing workload, which is best
>>>measured in units of GHz-hours. [...]
>>
>>"GHz-hours" is dimensionless (1e9/s * 3600s = 3.6e12).
>>That makes it equivalent to other dimensionless units like
>>"radians," and I confess I'm unable to imagine how to measure
>>"testing workload" as an angle. Would you mind explaining
>>how you apply this peculiar-seeming unit?
>
>
> Multiply, don't divide.

I *did* multiply, and did not divide. Where do you
see a division? (Note that the "/s" is part of the
definition of Hz, not an arithmetic operator.)

> A GHz-hour is, loosely, the work done in an hour
> on a 1GHz processor.

Sounds a lot like a MIPS-second times a scaling
factor, i.e., N instructions.

> The size of the GHz-hour varies according to the
> specific processor architecture and implementation in use, of course.
>
> Our overnight testing workload seems to be of the order of 500
> GHz-hours, spread over about 50 machines. We don't use the unit every
> day, because it's easier to think of "this test set is about 1.5 times
> the size of that test set", but it's a useful unit for giving an
> approximation to the overall workload. Doubling the SMP part of the
> testing means you need to buy a bunch of new machines, and provide
> space, power, cooling and network for them to run.

Chris Friesen's response (essentially, "GHz is a unit
misused by computer nerds") seems more comprehensible.
But thanks for the explanation anyhow.

--
Eric....@sun.com

Chris Vine
Jul 14, 2007, 7:50:43 PM
On Mon, 2007-07-09 at 15:40 -0400, Eric Sosman wrote:
> John Dallman wrote on 07/09/07 15:04:
> > In article <CsydnVWB7qTU3BPb...@comcast.com>,
> > eso...@ieee-dot-org.invalid (Eric Sosman) wrote:
> >
> >
> >>John Dallman wrote:
> >>
> >>>And it doubles - or worse - the testing workload, which is best
> >>>measured in units of GHz-hours. [...]
> >>
> >>"GHz-hours" is dimensionless (1e9/s * 3600s = 3.6e12).
> >>That makes it equivalent to other dimensionless units like
> >>"radians," and I confess I'm unable to imagine how to measure
> >>"testing workload" as an angle. Would you mind explaining
> >>how you apply this peculiar-seeming unit?
> >
> >
> > Multiply, don't divide.
>
> I *did* multiply, and did not divide. Where do you
> see a division? (Note that the "/s" is part of the
> definition of Hz, not an arithmetic operator.)

Yes, but the result is not dimensionless, because the definition of
1 GHz is not 1e9/s but 1e9 cycles/s. So the result has the dimension of
cycles (which, as someone else has noted, is loosely related, by an
architecture-dependent factor, to the number of instructions executed).

Chris

--
To reply by e-mail, delete the --nospam--


John Dallman
Jul 23, 2007, 3:10:00 PM
In article <1183676176.2...@z28g2000prd.googlegroups.com>,
dav...@webmaster.com (David Schwartz) wrote, at the end of an exchange
with me:

> > > > Using multiple threads on a single processor, for
> > > > code that is CPU-limited, loses much too much performance, IME.
> > > This is a common myth.
> > Last time we benchmarked, the "myth" seemed real. I'll try it again
> > next week, just for laughs.
> This is not an easy thing to benchmark. It's easy to get it wrong. If
> you do, please post the results of the benchmark as well as as much
> detail on how you did it as possible.

I got a chance to do that today. Core 2 Duo 2.133GHz, Windows XP Service
Pack 2, up-to-date with this month's patches. 2GB RAM, which is plenty
for the job with no swapping. The test program runs from the command line
with no GUI; the script that runs the tests ran each one three times in
succession, to get all data files into cache. Time measurement was
elapsed time for the third run of the whole process from startup to
shutdown, using the timethis.exe tool from the Windows Resource Kit.

For single-thread, I simply didn't turn on threading in the code: when
you don't turn on threading, it doesn't do locking. The flow of control
still somewhat resembles threading, in that instead of spawning a bunch
of threads to work their way through data, you just call the function
that those threads run as their top-level code. With one heavyweight
thread running on a dual-core processor under Windows, you get a
bounce-back-and-forth effect, but the performance loss is small - under
1%.

For two threads, I got slightly devious, but I think my results work.
The normal threading mode is "count processors at initialisation, use
that many threads in multi-threaded code". So that was two threads. I
forced them to run on the same processor core by making the other core
busy with a cpu-bound task, with its affinity set to lock it to that
core. It was running at higher priority than my test app, which seems to
prevent any lower-priority thread being scheduled, given that there's
another core that's accepting lower-priority tasks.
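
The busy-core trick looks roughly like this (a Win32 sketch; error
handling omitted, and the details are assumed rather than taken from
the actual test script):

    #include <windows.h>

    /* CPU-bound spinner to keep one core permanently busy. */
    static DWORD WINAPI spinner(LPVOID arg)
    {
        (void)arg;
        for (;;)
            ;                    /* burn CPU forever */
        return 0;                /* not reached */
    }

    void occupy_core_zero(void)
    {
        HANDLE t = CreateThread(NULL, 0, spinner, NULL, 0, NULL);
        SetThreadAffinityMask(t, 1);   /* bind to core 0 only */
        SetThreadPriority(t, THREAD_PRIORITY_ABOVE_NORMAL);
    }

The normal-priority test threads then get scheduled only on the
remaining core.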

With all that, I got between 16% and 3% slowdowns for my various SMP
testcases running them on one CPU with two threads, as compared to
running them single-threaded. The smallest slowdowns are, in general, on
the tests that scale best to a larger number of threads, which gives me
some confidence that they're correct.

The penalty is not huge, but it is not negligible either. It isn't one
we're keen to pay.
