# Are we reaching the maximum of CPU performance ?


### Nicolaas Vroom

Nov 7, 2010, 12:18:58 PM

I could also have raised the following question:
Which is currently the best performing CPU ?
This question is important for all people who try to simulate
physical systems.
In my case this is the movement of 7 planets around the Sun
using Newton's Law
I have tested 4 different types of CPUs.
In order to compare, I use the same VB program.
To test performance, the program
calculates the number of simulated years it can cover in 1 minute.
a) Intel Pentium II processor: I get 0.825 years
b) GenuineIntel x86 Family 6 Model 8: I get 1.542 years
c) Intel Pentium 4 CPU 2.8 GHz: I get 6 years
This one is roughly 8 years old. Load 100%
d) Intel Quad Core i5 M460: I get 3.9 years. Load 27%

In the last case I can also run the program 4 times simultaneously.
On average I get 2.6 years per copy. Load 100%
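The figures above can be compared directly. This is a back-of-the-envelope reading, assuming the four copies are independent runs whose simulated years simply add up:

$$
\frac{3.9}{6} \approx 0.65, \qquad 4 \times 2.6 = 10.4\ \text{years/min}, \qquad \frac{10.4}{6} \approx 1.73, \qquad \frac{10.4}{3.9} \approx 2.67
$$

So one copy on the i5 runs at about 65% of the Pentium 4's speed, while the four copies together deliver roughly 1.7 times the Pentium 4's throughput. A single simulation only benefits from that extra capacity if it is split across threads.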

Does this mean that we have reached the top of CPU
performance ?
Of course I can rewrite my program in a different language.
Maybe then my program runs faster, but that does not
solve the issue.
I did not test a dual core; maybe those are faster.
Of my Pentium 4 CPU there apparently exists
a 3.3 GHz version. Is that the solution?

Any suggestion what to do ?

Nicolaas Vroom
http://users.telenet.be/nicvroom/galaxy-mercury.htm

### Dirk Bruere at NeoPax

Nov 7, 2010, 10:09:09 PM

Use a top-end graphics card and something like CUDA by NVIDIA.
That will give you up to a 100x increase in computing power, to around
1 TFLOPS.

http://en.wikipedia.org/wiki/CUDA

Apart from that, processing power is still increasing by around 2 orders
of magnitude per decade. If that's not enough, the DARPA exascale
computer is due around 2018 (10^18 FLOPS).

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.blogtalkradio.com/onetribe - Occult Talk Show
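Dirk's growth figure of 2 orders of magnitude per decade, converted into an annual factor and a doubling time (simple arithmetic on the rate quoted, not a measurement):

$$
100^{1/10} = 10^{0.2} \approx 1.58\ \text{per year}, \qquad t_2 = \frac{10\ \text{yr}}{\log_2 100} \approx \frac{10}{6.64}\ \text{yr} \approx 1.5\ \text{yr}
$$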

[[Mod. note -- Sequential (single-thread) CPU speed has indeed roughly
plateaued since 2005 or so, at roughly 3 GHz and 4 instructions-per-cycle
out-of-order. More recent progress has been
* bigger and bigger caches [transparent to the programmer]
* some small increases in memory bandwidth [transparent to the programmer]
* lots of parallelism [NOT transparent to the programmer]

Parallelism today comes in many flavors [almost all of which must be
explicitly managed by the programmer]:
* lots of CPU chips are "multicore" and/or "multithreaded"
* lots of computer systems incorporate multiple CPUs
[this includes the graphics-card processors mentioned in this posting]

*If* your application is such that it can be (re)programmed to use
many individual processors working in parallel, then parallelism can
be very useful. Alas, small-N N-body simulations of the type
discussed by the original poster are notoriously hard to parallelize. :(

I'd also like to point out the existence of the newsgroup comp.arch
(unmoderated), devoted to discussions of computer architecture.
-- jt]]

### Giorgio Pastore

Nov 7, 2010, 10:17:53 PM

On 11/7/10 6:18 PM, Nicolaas Vroom wrote:
...

> In my case this is the movement of 7 planets around the Sun
> using Newton's Law
...

> In order to compare I use the same VB program.
...

> Any suggestion what to do ?

You said nothing about the algorithm you implemented in VB. If you chose
the wrong algorithm, you may waste a lot of CPU time.

Giorgio

[[Mod. note -- If your goal is actually to study physics via N-body
simulations of this sort, there's a considerable literature on clever
numerical methods/codes which are very accurate and efficient. A good
starting point to learn more might be
http://www.amara.com/papers/nbody.html
which has lots of references. It discusses both the small-N and
large-N cases (which require vastly different sorts of numerical
methods, and have very different accuracy/cost/fidelity tradeoffs).

A recent paper of interest (describing a set of simulations of Sun +
8 planets + Pluto + time-averaged Moon + approximate general relativistic
effects) is
J. Laskar & M. Gastineau
"Existence of collisional trajectories of Mercury, Mars, and Venus
with the Earth"
Nature volume 459, 11 June 2009, pages 817--819
http://dx.doi.org/10.1038/nature08096
-- jt]]
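To make Giorgio's point concrete, here is a minimal sketch comparing two integrators for the same Sun-plus-planet problem: explicit Euler and velocity Verlet (a leapfrog scheme, one of the standard choices in the N-body literature). It assumes units of AU, years and solar masses (so G·M_sun = 4π²), a 2-D orbit and a 0.001-year step; none of these choices come from the OP's program, and the function names are illustrative. With the same step size, Verlet keeps the relative energy error tiny while Euler's error grows steadily, so for this kind of problem the choice of algorithm can matter more than the choice of CPU.

```python
import math

# Units: AU, years, solar masses, so G * M_sun = 4 * pi^2 (an assumption of this sketch).
G = 4.0 * math.pi ** 2


def accelerations(pos, mass):
    """Direct-summation Newtonian accelerations for all bodies in 2-D."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            inv_r3 = 1.0 / (dx * dx + dy * dy) ** 1.5
            # Equal and opposite pulls (Newton's third law).
            acc[i][0] += G * mass[j] * dx * inv_r3
            acc[i][1] += G * mass[j] * dy * inv_r3
            acc[j][0] -= G * mass[i] * dx * inv_r3
            acc[j][1] -= G * mass[i] * dy * inv_r3
    return acc


def energy(pos, vel, mass):
    """Total kinetic plus gravitational potential energy."""
    e = sum(0.5 * m * (v[0] ** 2 + v[1] ** 2) for m, v in zip(mass, vel))
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.hypot(pos[j][0] - pos[i][0], pos[j][1] - pos[i][1])
            e -= G * mass[i] * mass[j] / r
    return e


def euler_step(pos, vel, mass, dt):
    """Explicit Euler: simple, but the energy drifts steadily."""
    acc = accelerations(pos, mass)
    new_pos = [[p[0] + v[0] * dt, p[1] + v[1] * dt] for p, v in zip(pos, vel)]
    new_vel = [[v[0] + a[0] * dt, v[1] + a[1] * dt] for v, a in zip(vel, acc)]
    return new_pos, new_vel


def verlet_step(pos, vel, mass, dt):
    """Velocity Verlet (kick-drift-kick leapfrog): bounded energy error."""
    acc = accelerations(pos, mass)
    half = [[v[0] + 0.5 * a[0] * dt, v[1] + 0.5 * a[1] * dt] for v, a in zip(vel, acc)]
    new_pos = [[p[0] + h[0] * dt, p[1] + h[1] * dt] for p, h in zip(pos, half)]
    new_acc = accelerations(new_pos, mass)
    new_vel = [[h[0] + 0.5 * a[0] * dt, h[1] + 0.5 * a[1] * dt] for h, a in zip(half, new_acc)]
    return new_pos, new_vel


def relative_energy_error(step, years=10.0, dt=0.001):
    """Integrate a Sun + Earth-like planet system and report |dE / E0|."""
    mass = [1.0, 3.0e-6]                      # Sun, planet (solar masses)
    pos = [[0.0, 0.0], [1.0, 0.0]]            # planet at 1 AU
    vel = [[0.0, 0.0], [0.0, 2.0 * math.pi]]  # roughly circular orbit speed
    e0 = energy(pos, vel, mass)
    for _ in range(int(round(years / dt))):
        pos, vel = step(pos, vel, mass, dt)
    return abs((energy(pos, vel, mass) - e0) / e0)


if __name__ == "__main__":
    print("Euler  relative energy error:", relative_energy_error(euler_step))
    print("Verlet relative energy error:", relative_energy_error(verlet_step))
```

For clarity this Verlet step evaluates the forces twice; a real code would reuse `new_acc` in the next step, so both methods cost one force evaluation per step.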

### Lester Welch

Nov 8, 2010, 5:52:03 AM

Isn't the OP comparing CPUs, not algorithms? An inefficient
algorithm run on a multitude of CPUs is a fair test of the CPUs, is it
not?

### Nicolaas Vroom

Nov 8, 2010, 4:01:16 PM

"Dirk Bruere at NeoPax" <dirk....@gmail.com> wrote in message
news:8jo9ku...@mid.individual.net...

> On 07/11/2010 17:18, Nicolaas Vroom wrote:
>> a) Intel Pentium II processor: I get 0.825 years
>> b) GenuineIntel x86 Family 6 Model 8: I get 1.542 years
>> c) Intel Pentium 4 CPU 2.8 GHz: I get 6 years
>> This one is roughly 8 years old. Load 100%
>> d) Intel Quad Core i5 M460: I get 3.9 years. Load 27%
>>
>>
>> Does this mean that we have reached the top of CPU
>> performance ?
>> Of course I can rewrite my program in a different language.
>> Maybe then my program runs faster, but that does not
>> solve the issue.
>> I did not test a dual core; maybe those are faster.
>> Of my Pentium 4 CPU there apparently exists
>> a 3.3 GHz version. Is that the solution?
>>
>> Any suggestion what to do ?
>
> Use a top-end graphics card and something like CUDA by NVIDIA.
> That will give you up to a 100x increase in computing power, to around
> 1 TFLOPS.

That is the same suggestion my CPU vendor gave me.
But this solution requires reprogramming, and that is not what I want.
I also do not want to reprogram in C++.

> http://en.wikipedia.org/wiki/CUDA
>
> --
> Dirk

>
> [[Mod. note -- Sequential (single-thread) CPU speed has indeed roughly
> plateaued since 2005 or so, at roughly 3 GHz and 4 instructions-per-cycle
> out-of-order. More recent progress has been

I did not know this. I was truly amazed by the results of my tests.

> * bigger and bigger caches [transparent to the programmer]
> * some small increases in memory bandwidth [transparent to the
> programmer]
> * lots of parallelism [NOT transparent to the programmer]
>
> Parallelism today comes in many flavors [almost all of which must be
> explicitly managed by the programmer]:
> * lots of CPU chips are "multicore" and/or "multithreaded"
> * lots of computer systems incorporate multiple CPUs
> [this includes the graphics-card processors mentioned in this posting]
>
> *If* your application is such that it can be (re)programmed to use
> many individual processors working in parallel, then parallelism can
> be very useful.

I agree.
This is true, for example, for a game like chess.
SETI also falls into this category.

> Alas, small-N N-body simulations of the type
> discussed by the original poster are notoriously hard to parallelize. :(

That is correct.
But this has an impact on the simulation of almost all physical systems
where dependency is an issue.

> I'd also like to point out the existence of the newsgroup comp.arch
> (unmoderated), devoted to discussions of computer architecture.
> -- jt]]

Thanks for the comments.

Nicolaas Vroom

### Dirk Bruere at NeoPax

Nov 8, 2010, 4:28:06 PM

On 08/11/2010 10:52, Lester Welch wrote:

> Isn't the OP comparing CPUs, not algorithms? An inefficient
> algorithm run on a multitude of CPUs is a fair test of the CPUs, is it
> not?

Not necessarily, if the algorithm is playing to a common weakness rather
than to the increasing strengths of modern CPUs. For example, if the
algorithm requires significant HDD access, overflows the on-chip
cache, or simply requires more memory than is installed on the motherboard
and has to use a pagefile.

### Nicolaas Vroom

Nov 8, 2010, 4:28:07 PM

"Dirk Bruere at NeoPax" <dirk....@gmail.com> wrote in message
news:8jo9ku...@mid.individual.net...

### Arnold Neumaier

Nov 8, 2010, 4:28:07 PM

Nicolaas Vroom wrote:
>
> Does this mean that we have reached the top of CPU
> performance ?

We reached the top of CPU performance/core.

Performance is still increasing but by using slower
multiple core architectures and balancing the load.
This is the current trend, and is likely to be so in the future.

> Of course I can rewrite my program in a different language.
> Maybe then my program runs faster, but that does not
> solve the issue.
> I did not test a dual core; maybe those are faster.
> Of my Pentium 4 CPU there apparently exists
> a 3.3 GHz version. Is that the solution?
>
> Any suggestion what to do ?

You need to use multiple cores.
If your code is purely sequential, you are unlucky.
Computers will become _slower_ for such code.
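Arnold's point that purely sequential code gains nothing from extra cores, and even loses if those cores are slower, can be quantified with Amdahl's law, a standard model that is not mentioned in the thread. This is a minimal sketch, assuming the run time splits cleanly into a serial part and a perfectly divisible parallel part; the `core_speed` parameter is a hypothetical way of expressing "slower" cores.

```python
def amdahl_speedup(parallel_fraction, cores, core_speed=1.0):
    """Speedup over one reference core, using Amdahl's law.

    parallel_fraction: share (0..1) of the single-core run time that can be
        split evenly across cores; the rest must run sequentially.
    cores: number of cores used.
    core_speed: speed of each core relative to the reference core, e.g. 0.8
        for a multicore chip whose cores are 20% slower (a hypothetical figure).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return core_speed / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Purely sequential code on slower cores: the extra cores do not help at all.
    print(amdahl_speedup(0.0, 8, core_speed=0.8))  # 0.8
    # Mostly parallel code on the same slower cores.
    print(amdahl_speedup(0.95, 8, core_speed=0.8))
```

Even with unlimited cores the speedup can never exceed `core_speed / (1 - parallel_fraction)`, which is why the serial part dominates for tightly coupled simulations.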

### Nicolaas Vroom

Nov 8, 2010, 10:05:13 PM

"Giorgio Pastore" <pas...@units.it> wrote in message
news:4cd6fcf1$0$40011$4faf...@reader3.news.tin.it...

> On 11/7/10 6:18 PM, Nicolaas Vroom wrote:
> ...
>> In my case this is the movement of 7 planets around the Sun
>> using Newton's Law
> ...
>> In order to compare I use the same VB program.
> ...
>> Any suggestion what to do ?
>
>
> You said nothing about the algorithm you implemented in VB. If you chose
> the wrong algorithm, you may waste a lot of CPU time.
>
> Giorgio

My goal is not to find the cleverest algorithm.
My goal is to find the best CPU using the same program.
What my tests show is that single cores are the fastest.
However, there is a small chance that there are dual cores
which are faster, but I do not know if that is actually true.

Nicolaas Vroom

### Hans Aberg

Nov 8, 2010, 10:13:14 PM

On 2010/11/08 22:28, Arnold Neumaier wrote:
> Nicolaas Vroom wrote:
>>
>> Does this mean that we have reached the top of CPU
>> performance ?
>
> We reached the top of CPU performance/core.
>
> Performance is still increasing but by using slower
> multiple core architectures and balancing the load.
> This is the current trend, and is likely to be so in the future.

There is no technological limitation going into higher frequencies,
but energy consumption is higher than linear. So using parallelism at
lower frequencies requires less energy, and is easier to cool.

[[Mod. note -- Actually there are very difficult technological obstacles
to increasing CPU clock rates. Historically, surmounting these obstacles
has required lots of very clever electrical engineering and applied
physics (and huge amounts of money). The result of this effort is that
historically each successive generation of semiconductor-fabrication
process has been very roughly ~1/3 faster than the previous generation
(i.e., all other things being equal, each successive generation allows
roughly a 50% increase in clock frequency), and uses roughly 1/2 to 2/3
the power at a given clock frequency.
Alas, power generally increases at least quadratically with clock
frequency.
-- jt]]

### Juan R. González-Álvarez

Nov 9, 2010, 11:02:48 PM

Hans Aberg wrote on Mon, 08 Nov 2010 22:13:14 -0500:

> On 2010/11/08 22:28, Arnold Neumaier wrote:
>> Nicolaas Vroom wrote:
>>>
>>> Does this mean that we have reached the top of CPU performance ?
>>
>> We reached the top of CPU performance/core.
>>
>> Performance is still increasing but by using slower multiple core
>> architectures and balancing the load. This is the current trend, and
>> is likely to be so in the future.
>
> There is no technological limitation going into higher frequencies,
> but energy consumption is higher than linear. So using parallelism at
> lower frequencies requires less energy, and is easier to cool.

Currently there are stronger technological limitations (reflected in the
current frequency limits of the CPUs that you can buy). And we are
close to the physical limits of the current technology (that is why
optical and quantum computers are an active research topic).

You are right that energy consumption is not linear in increasing
frequency, but neither is it for *algorithmic* parallelism. Doubling
the number of cores roughly doubles energy consumption, but not the
algorithmic power. There are many variables to consider here, but you
would get about a 10-30% gain over a single Intel core.

For obtaining double algorithmic power you would maybe need 8x cores,
making the real power consumption non-linear again. And I doubt that
the VB software of the original poster can use all the cores in any
decent way.

### Hans Aberg

Nov 10, 2010, 12:45:50 PM

On 2010/11/10 05:02, Juan R. González-Álvarez wrote:
>>>> Does this mean that we have reached the top of CPU performance ?
>>>
>>> We reached the top of CPU performance/core.
>>>
>>> Performance is still increasing but by using slower multiple core
>>> architectures and balancing the load.
>>> This is the current trend, and
>>> is likely to be so in the future.
>>
>> There is no technological limitation going into higher frequencies,
>> but energy consumption is higher than linear. So using parallelism at
>> lower frequencies requires less energy, and is easier to cool.
>
> Currently there are stronger technological limitations (reflected in the
> current frequency limits of the CPUs that you can buy).

The fastest you can buy is over 5 GHz, which sits in a mainframe (see the
Wikipedia "Clock rate" article). So if energy consumption and heat
dissipation weren't a problem, you could have it in your laptop.

> And we are
> close to the physical limits of the current technology (that is why
> optical and quantum computers are an active research topic).

Don't hold your breath for the latter.

> You are right that energy consumption is not linear in increasing
> frequency, but neither is it for *algorithmic* parallelism. Doubling
> the number of cores roughly doubles energy consumption, but not the
> algorithmic power. There are many variables to consider here, but you
> would get about a 10-30% gain over a single Intel core.

That's why much parallel capacity is currently moving into the GPU, where
it is needed and where algorithms are fairly easy to parallelize.

> For obtaining double algorithmic power you would maybe need 8x cores,
> making the real power consumption non-linear again. And I doubt that
> the VB software of the original poster can use all the cores in any
> decent way.

Classical computer languages are made for single threading, so full
parallelization isn't easy.

### Hans Aberg

Nov 11, 2010, 3:11:33 AM

On 2010/11/08 22:01, Nicolaas Vroom wrote:
>> Use a top-end graphics card and something like CUDA by NVIDIA.
>> That will give you up to a 100x increase in computing power, to around
>> 1 TFLOPS.
>
> That is the same suggestion my CPU vendor gave me.
> But this solution requires reprogramming, and that is not what I want.
> I also do not want to reprogram in C++.

I pointed out OpenCL in a post that has not yet appeared. See the
Wikipedia article, which has an example.

### Hans Aberg

Nov 11, 2010, 3:11:32 AM

On 2010/11/07 18:18, Nicolaas Vroom wrote:
> d) Intel Quad Core i5 M460: I get 3.9 years. Load 27%
>
> In the last case I can also run the program 4 times simultaneously.
> On average I get 2.6 years per copy. Load 100%

If you only get about a quarter of full load on a quad core, perhaps you
haven't threaded it. Here is an OpenCL example implementing an FFT, which
gets 144 GFLOPS using a rather old GPU (see reference 27):
http://en.wikipedia.org/wiki/OpenCL

### Arnold Neumaier

Nov 11, 2010, 11:13:49 AM

Juan R. González-Álvarez wrote:
> Hans Aberg wrote on Mon, 08 Nov 2010 22:13:14 -0500:
>
>> On 2010/11/08 22:28, Arnold Neumaier wrote:
>>> Nicolaas Vroom wrote:
>>>> Does this mean that we have reached the top of CPU performance ?
>>> We reached the top of CPU performance/core.
>>>
>>> Performance is still increasing but by using slower multiple core
>>> architectures and balancing the load. This is the current trend, and
>>> is likely to be so in the future.
>> There is no technological limitation going into higher frequencies,
>> but energy consumption is higher than linear. So using parallelism at
>> lower frequencies requires less energy, and is easier to cool.
>
> Currently there are stronger technological limitations (reflected in the
> current frequency limits of the CPUs that you can buy). And we are
> close to the physical limits of the current technology (that is why
> optical and quantum computers are an active research topic).
>
> You are right that energy consumption is not linear in increasing
> frequency, but neither is it for *algorithmic* parallelism. Doubling
> the number of cores roughly doubles energy consumption, but not the
> algorithmic power.
> There are many variables to consider
> here, but you would get about a 10-30% gain over a single Intel core.
>
> For obtaining double algorithmic power you would maybe need 8x cores,

This depends very much on the algorithm.

Some algorithms (e.g., Monte Carlo simulation) can be trivially
parallelized, so the factor is only 2x. For many other cases, n times
the algorithmic speed needs a number of cores that grows slowly with n.
And there are algorithms that one can hardly speed up no matter how
many cores one uses.

### Juan R.

Nov 12, 2010, 4:14:32 AM

Hans Aberg wrote on Wed, 10 Nov 2010 18:45:50 +0100:
> On 2010/11/10 05:02, Juan R. González-Álvarez wrote:
>>>>> Does this mean that we have reached the top of CPU performance ?
>>>>
>>>> We reached the top of CPU performance/core.
>>>>
>>>> Performance is still increasing but by using slower multiple core
>>>> architectures and balancing the load. This is the current trend, and
>>>> is likely to be so in the future.
>>>
>>> There is no technological limitation going into higher frequencies,
>>> but energy consumption is higher than linear. So using parallelism
>>> at lower frequencies requires less energy, and is easier to cool.
>>
>> Currently there are stronger technological limitations (reflected in
>> the current frequency limits of the CPUs that you can buy).
>
> The fastest you can buy is over 5 GHz, which sits in a mainframe (see
> the Wikipedia "Clock rate" article). So if energy consumption and heat
> dissipation weren't a problem, you could have it in your laptop.

The last world record that I know of was 8.20 GHz, overclocked.

>> And we are
>> close to the physical limits of the current technology (that is why
>> optical and quantum computers are an active research topic).
>
> Don't hold your breath for the latter.

Why?

### JohnF

Nov 12, 2010, 4:14:31 AM

Arnold Neumaier <Arnold....@univie.ac.at> wrote:
> Juan R. Gonzalez-Alvarez wrote:
>>> <snip>
>>
>> For obtaining double algorithmic power you would maybe
>> need 8x cores,
>
> This depends very much on the algorithm.
> Some algorithms (e.g., Monte Carlo simulation)
> can be trivially parallelized, so the factor is only 2x.
> For many other cases, n times the algorithmic speed needs a number
> of cores that grows slowly with n. And there are algorithms
> that one can hardly speed up no matter how many cores one uses.

That's typically true, but just for completeness, you can google
"super-linear speedup" for examples of parallelized calculations
that speed up faster than linearly with the number of cores.
There are several short instructional examples that make it
intuitively clear why this should be possible, but, sorry, I can't
find links offhand.
--
John Forkosh ( mailto: j...@f.com where j=john and f=forkosh )

### Arnold Neumaier

Nov 12, 2010, 10:50:28 AM

Yes, it means that one has an inefficient serial algorithm.
Any serial algorithm whose parallel version shows superlinear speedup
can be rewritten into a faster serial algorithm where the speedup is
only linear.

### Surfer

Nov 12, 2010, 4:32:08 PM

On Wed, 10 Nov 2010 18:45:50 +0100 (CET), Hans Aberg
<haber...@telia.com> wrote:
>
>The fastest you can buy is over 5 GHz, which sits in a mainframe (see the
>Wikipedia "Clock rate" article). So if energy consumption and heat
>dissipation weren't a problem, you could have it in your laptop.
>

By coincidence, I just saw a TV program about uses of diamond, including
for semiconductors. That prompted me to search for articles. I found the
following.

This paper dated June 2010 (in Japanese) discusses diamond field effect
transistors:
http://www.ntt.co.jp/journal/1006/files/jn201006014.pdf

Fig 1 suggests that gallium arsenide and diamond FETs could be made to
operate up to 100s of GHz.
This article describes the successful manufacture of inch-size diamond
wafers:
http://www.nanowerk.com/news/newsid=15514.php

The above, together with the high thermal conductivity of diamond,
implies that it could be used to build much faster CPUs.

### Dirk Bruere at NeoPax

Nov 13, 2010, 3:04:36 AM

I think almost all of the progress over the next 10 years is going to be
Systems on Chip, multiple cores, power efficiency and parallel
programming (esp. at the OS level). After that, fairly radical new tech
will start to appear (memristors, graphene etc.) and clock rates will
climb again.

One interesting question is how much of a need is there for high clock
speed serial processing? Most real-world apps can benefit from the
parallel approach and do not really need a serial speedup, except to
make the programming easier.

Can anyone name any major serial-only applications?

### Juan R. González-Álvarez

Nov 15, 2010, 12:39:45 PM

Arnold Neumaier wrote on Thu, 11 Nov 2010 11:13:49 -0500:

> Juan R. González-Álvarez wrote:
>> Hans Aberg wrote on Mon, 08 Nov 2010 22:13:14 -0500:
>>
>>> On 2010/11/08 22:28, Arnold Neumaier wrote:
>>>> Nicolaas Vroom wrote:
(...)
>> You are right that energy consumption is not linear in increasing
>> frequency, but neither is it for *algorithmic* parallelism.
>> Doubling the number of cores roughly doubles energy
>> consumption, but not the algorithmic power. There
>> are many variables to consider here, but you would get about a 10-30%
>> gain over a single Intel core.
>>
>> For obtaining double algorithmic power you would maybe need 8x cores,
>
> This depends very much on the algorithm.
>
> Some algorithms (e.g., Monte Carlo simulation) can be trivially
> parallelized, so the factor is only 2x. For many other cases, n times
> the algorithmic speed needs a number of cores that grows slowly with n.
> And there are algorithms that one can hardly speed up no matter how
> many cores one uses.
As I said, "there are many variables to consider": algorithms, operating
system, CPU design (including node topology, size and speed of caches),
motherboard... Correct me if I am wrong, but the OP is not looking for a
Monte Carlo code in C++ or Fortran but for celestial dynamics using VB.
I maintain my estimate that he would need many cores to get a 2x speedup.

### Hans Aberg

Nov 15, 2010, 12:44:53 PM

On 2010/11/12 10:14, Juan R. González-Álvarez wrote:
>>>> There is no technological limitation going into higher frequencies,
>>>> but energy consumption is higher than linear. So using parallelism
>>>> at lower frequencies requires less energy, and is easier to cool.
>>>
>>> Currently there are stronger technological limitations (reflected in
>>> the current frequency limits of the CPUs that you can buy).
>>
>> The fastest you can buy is over 5 GHz, which sits in a mainframe (see
>> the Wikipedia "Clock rate" article). So if energy consumption and heat
>> dissipation weren't a problem, you could have it in your laptop.
>
> The last world record that I know of was 8.20 GHz, overclocked.

At the end of the manufacturing process, one puts the CPU through tests,
and it gets marked at a frequency at which it passes them. When
overclocking, one expects certain misses, so the result may not be
reliable.

[[Mod. note -- Testing often covers a 2-D matrix of frequency and power
supply voltage. A plot of test results (often color-coded for ok/bad)
against these variables is often called a "Shmoo plot". See
http://www.computer.org/portal/web/csdl/doi/10.1109/TEST.1996.557162
for a nice discussion.
-- jt]]

I computed Moore's law for frequency data from a personal computer
manufacturer (Apple), and it was a doubling every three years since the
first Mac, whereas for RAM it is every second year (for hard disks every
year). So if one can combine those, it gives five doublings every 6 years
in overall capacity, or nearly one every year.
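Hans's combination of the two doubling rates can be written out as follows. This simply multiplies clock-rate growth by RAM growth into a rough "overall capacity", which is his own approximation rather than a standard metric:

$$
\frac{6\ \text{yr}}{3\ \text{yr}} + \frac{6\ \text{yr}}{2\ \text{yr}} = 2 + 3 = 5\ \text{doublings}, \qquad 2^{5} = 32, \qquad \frac{5}{6\ \text{yr}} \approx 0.83\ \text{doublings per year}
$$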
But anyway, there was an Apple Developer video a few years ago describing
the energy gains from clocking down and adding cores, which is important
in these consumer devices.

>>> And we are
>>> close to the physical limits of the current technology (that is why
>>> optical and quantum computers are an active research topic).
>>
>> Don't hold your breath for the latter.
>
> Why?

Because they haven't been made yet, except possibly in rudimentary form.
Here is a video:
http://vimeo.com/14711919

### Nicolaas Vroom

Nov 15, 2010, 12:46:00 PM

"Hans Aberg" <haber...@telia.com> wrote in message
news:ib6r86$6m8$1...@news.eternal-september.org...
> On 2010/11/07 18:18, Nicolaas Vroom wrote:
>> d) Intel Quad Core i5 M460: I get 3.9 years. Load 27%
>>
>> In the last case I can also run the program 4 times simultaneously.
>> On average I get 2.6 years per copy. Load 100%
>
> If you only get about a quarter of full load on a quad core, perhaps you
> haven't threaded it.

That is correct. What I need are threads which run on different
processors. The question is whether all the work (assuming the program
runs correctly) is worth the effort, i.e. will the final program run
faster?
The biggest problem is the communication between the threads. This
should be some common type of memory which all the processors are
able to address. I do not know if that is possible using Visual Basic.

> Here is an OpenCL example implementing an FFT, which
> gets 144 GFLOPS using a rather old GPU (see reference 27):
> http://en.wikipedia.org/wiki/OpenCL

When you look at that example, it is terribly complex.
You get the same impression if you study this:
http://www.openmp.org/presentations/miguel/F95_OpenMPv1_v2.pdf

In my case the program looks like this:

```vb
Do
  Synchronise 0                       ' Communication
  For i = 1 To 5                      ' Activate task 1
*   a = 0
*   For j = 1 To 5
*     If j <> i Then                  ' Skip the body itself (d would be 0)
*       d = x(i) - x(j)               ' Calculate distance
*       a = a - Sgn(d) / (d * d)      ' Calculate acceleration (attraction toward j)
*     End If
*   Next j
*   v(i) = v(i) + a * dt              ' Calculate speed
  Next i
  Synchronise 1                       ' Wait until all threads are finished
  For i = 1 To 5                      ' Activate task 2
$   x(i) = x(i) + v(i) * dt           ' Calculate positions
  Next i
  Synchronise 2                       ' Wait until all threads are finished
Loop
```

There are two parts you can do in parallel, i.e. in threads:
the lines starting with * and the line starting with $.
Each thread requires three parameters: i, the task number (1 or 2)
and a communication parameter (start, active, finished).
To make communication easier you should dedicate each thread
to specific i values: thread 1 to i = 1 and 5, etc.
Again, the biggest problem is the shared data: the variables x, v and dt.
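The two-phase loop above can be sketched with threads and a barrier. This is a minimal Python sketch, not Nicolaas's VB code: the function names, the 1-D toy force (unit masses, G = 1) and the round-robin assignment of bodies to threads are assumptions for illustration. `threading.Barrier` stands in for `Synchronise 1` and `Synchronise 2`; because each thread writes only its own `v(i)` and `x(i)`, the shared lists need no locks, only the two barrier waits per step.

```python
import threading


def task1_velocities(x, v, dt, indices):
    """Task 1 (the '*' lines): update v(i) for the given bodies; reads every x(j)."""
    n = len(x)
    for i in indices:
        a = 0.0
        for j in range(n):
            if j != i:
                d = x[i] - x[j]
                # 1-D attraction toward body j; unit masses and G = 1 (toy assumption)
                a -= (1.0 if d > 0 else -1.0) / (d * d)
        v[i] += a * dt


def task2_positions(x, v, dt, indices):
    """Task 2 (the '$' line): update x(i) for the given bodies."""
    for i in indices:
        x[i] += v[i] * dt


def simulate_serial(x, v, dt, steps):
    """Reference version: both tasks for all bodies, one after the other."""
    everyone = range(len(x))
    for _ in range(steps):
        task1_velocities(x, v, dt, everyone)
        task2_positions(x, v, dt, everyone)


def simulate_threaded(x, v, dt, steps, n_threads=2):
    """Same loop, with each thread owning a fixed set of bodies."""
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        mine = range(tid, len(x), n_threads)  # static assignment of bodies to this thread
        for _ in range(steps):
            task1_velocities(x, v, dt, mine)
            barrier.wait()  # 'Synchronise 1': all velocities done before positions change
            task2_positions(x, v, dt, mine)
            barrier.wait()  # 'Synchronise 2': all positions done before the next step reads them

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each body's arithmetic is done in the same order in both versions, so they produce identical numbers. In standard CPython only one thread executes Python bytecode at a time, so this shows the synchronisation pattern rather than a speedup; and with only a handful of bodies the work between barriers is tiny, so the synchronisation cost can easily outweigh any gain.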

http://users.pandora.be/nicvroom/

Nicolaas Vroom

[[Mod. note -- The newsgroup comp.parallel might also be of interest
here. -- jt]]

### Arnold Neumaier

Nov 15, 2010, 12:46:10 PM

Dirk Bruere at NeoPax wrote:

> One interesting question is how much of a need is there for high clock
> speed serial processing? Most real-world apps can benefit from the
> parallel approach and do not really need a serial speedup, except to
> make the programming easier.
>
> Can anyone name any major serial-only applications?

Large-scale eigenvalue computations for multiparticle Hamiltonians
are difficult to parallelize.

### Dirk Bruere at NeoPax

Nov 16, 2010, 4:15:20 PM

Well, I don't think Wintel is going to lose much sleep over not being
able to grab that market niche :-)

I was thinking more about possible mass market (future?) apps that could
force up clock speeds again. Graphics and gaming physics engines are
clearly *very* capable of being parallelized.

[Moderator's note: Note that some high-performance scientific computing
is done with graphics cards, since that is where the most
number-crunching ability is these days. Further posts should make sure
there is enough physics content as this thread is rapidly becoming off-
topic for the group. -P.H.]

### BW

Nov 18, 2010, 1:55:49 AM

On Nov 16, 10:15 pm, Dirk Bruere at NeoPax <dirk.bru...@gmail.com>
wrote:

> I was thinking more about possible mass market (future?) apps that could
> force up clock speeds again. Graphics and gaming physics engines are
> clearly *very* capable of being parallelized.
>
> [Moderator's note:  Note that some high-performance scientific computing
> is done with graphics cards, since that is where the most
> number-crunching ability is these days.  Further posts should make sure
> there is enough physics content as this thread is rapidly becoming off-
> topic for the group.  -P.H.]

Actually, there is a very interesting parallel (haha..) between the
ability to parallelize physics simulation code and the locality of
real-world physics.

Parallelizing computer algorithms, either on GPUs or especially when
using supercomputer clusters, *very heavily* relies on either very
fast inter-core communications or algorithms where the individual
threads don't have to communicate very much. Making chips and boards
that compute in teraflops is no problem - communicating the data is.

This more or less implies that, to run quickly, an algorithm that
simulates the real world should keep the data it uses as local as
possible to the interactions or transactions that it encodes at the
lowest level.

The fun thing is, this is what the last 100 years of physics
exploration seem to have discovered too - that when interactions are
executed they depend only on the absolute nearest region (in a pure
QFT without external classical fields contaminating it).

You can speculate whether nature did this to simplify simulation of itself,
but it sure is a nice fact because you can perfectly well conceive of
any number of working realities where interactions are not local - or
in computer code, where calculations would require direct inputs from
any number of non-neighboring compute nodes.

Note that I'm not talking about solving differential equations or
finding eigenvalues in huge matrices here, but of how the world
computes at the most basic level. I do understand that in practice
this is not how you usually simulate physical systems. Also there is
the little complication of the multiple-paths or worlds... but other
than that, local interactions should be good in the long run for
computational physics :)

Best regards,
/Bjorn
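BW's locality point can be illustrated with a stencil code, a swapped-in example that does not come from the thread. In an explicit step of the 1-D heat equation each cell's update needs only its two neighbours, so a domain split into chunks needs just one "halo" value from each neighbouring chunk per step, whereas in an all-pairs N-body update every body needs every other body's position. This is a minimal sketch, assuming fixed end values and a diffusion number `alpha` small enough for stability; the function names are illustrative.

```python
def diffuse_step(u, alpha):
    """One explicit step of the 1-D heat equation; the two end values stay fixed."""
    inner = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def diffuse_step_split(u, alpha, parts):
    """Same step, computed chunk by chunk as separate processors would.

    Each chunk works on a local copy holding its own cells plus at most one
    neighbouring ("halo") cell on each side; nothing else is communicated.
    """
    n = len(u)
    bounds = [round(k * n / parts) for k in range(parts + 1)]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        halo_lo, halo_hi = max(lo - 1, 0), min(hi + 1, n)
        local = u[halo_lo:halo_hi]  # the only data this chunk receives
        for g in range(lo, hi):
            k = g - halo_lo
            if g == 0 or g == n - 1:
                out.append(u[g])  # fixed boundary value
            else:
                out.append(local[k] + alpha * (local[k - 1] - 2 * local[k] + local[k + 1]))
    return out


if __name__ == "__main__":
    u = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
    print(diffuse_step_split(u, 0.25, 3) == diffuse_step(u, 0.25))
```

The chunked version gives exactly the same numbers as the whole-array version, while each chunk exchanges only two values per step with its neighbours.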

### Dirk Bruere at NeoPax

Nov 18, 2010, 3:47:04 AM

To return it to being on-topic...
If physics is regarded as a computation, how is the parallelism
synchronized?

### Nicolaas Vroom

Nov 18, 2010, 3:47:34 AM

"Arnold Neumaier" <Arnold....@univie.ac.at> wrote in message
news:4CDEE982...@univie.ac.at...

Whether a system can be simulated by a parallel approach
depends on whether it can be divided into subsystems which are independent.
Independent subsystems can be simulated (calculated) in parallel.
The second parameter is time.

The systems that are impossible to parallelize are systems
where quantum mechanics is involved, i.e. superposition
and entanglement, because (I hope I describe this correctly)
the whole system is in all its states (depending on the number
of qubits involved) at once.

The second most difficult systems are analog computers,
depending on the time involved.
The simplest is an oscillator (wave), which requires
two integrators. The problem is that the two are connected
in a loop: the output of the first is the input to the second
and vice versa.
You can implement (simulate) each integrator on a separate
processor, but the communication between the two is such
that a solution using only one processor is much
better. Better in the sense of the calculation time needed to find
the solution for a given amount of system time.
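Nicolaas's two-integrator loop can be sketched in a few lines. This is a minimal sketch, assuming the oscillator x'' = -x and a semi-implicit (symplectic) Euler update, neither of which is specified in the post; the function name is illustrative. Every step, each integrator needs the other's output from the same step, so putting the two integrators on different processors would require exchanging values twice per step for very little arithmetic in between.

```python
def two_integrator_loop(dt, steps, x=1.0, v=0.0):
    """Simulate an oscillator built from two integrators wired in a loop.

    Integrator 1 turns -x into v (v' = -x); integrator 2 turns v into x (x' = v).
    Uses the semi-implicit (symplectic) Euler update, an assumption of this sketch.
    """
    for _ in range(steps):
        v = v - x * dt  # integrator 1 needs integrator 2's current output x
        x = x + v * dt  # integrator 2 needs integrator 1's just-updated output v
    return x, v


if __name__ == "__main__":
    # About one full period (2*pi time units): x should be back near 1, v near 0.
    print(two_integrator_loop(0.001, 6283))
```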

This still leaves me with the question: are there dual
processors on the market which outperform single
processor systems, assuming that in the dual processor
system my simulation runs solely on one processor and all
the rest on the other one? (Outperform in the sense
of the number of revolutions calculated per minute.)

Nicolaas Vroom

### Joseph Warner

Nov 18, 2010, 3:48:35 AM
"Dirk Bruere at NeoPax" <dirk....@gmail.com> wrote in message
news:8kf6je...@mid.individual.net...

> On 15/11/2010 17:46, Arnold Neumaier wrote:
>> Dirk Bruere at NeoPax wrote:

>>> One interesting question is how much of a need is there for high
>>> clock speed serial processing? Most real-world apps can benefit
>>> from the parallel approach and do not really need a serial
>>> speedup, except to make the programming easier.

To get on to the topic again.

Is there a prospect of increasing the maximum CPU performance?
Yes, there is. Intel had processors with speeds above 4 GHz
before they moved to the current circuit architecture, and that
was using silicon. The line width of the day was not small enough
to take advantage of ballistic electrons crossing the gate; smaller
line widths should be able to do that. One limiting factor in
processing speed is the speed of the electrons going through the
gate. In standard terms, the higher the mobility of the electron,
the faster the transistor can respond for a given gate length.

One way to increase the mobility of the electron, or to lower the
transit time an electron spends in the gate area, is to change
material. That is easier said than done, but if GaAs crystals had
the same density of imperfections as Si, faster chips could be
made. The mobility in GaAs is ~3 to 4 times that of Si. Even
better is to use material structures such as GaAs/AlGaAs to form
quantum wells underneath the gate, in which the electrons do not
scatter appreciably. Then mobilities between 10,000 and 100,000
cm^2/(V s) are achievable. For the material with the upper
mobility, though, the current density is low and the mobility
falls off rapidly with temperature; therefore keeping the
transistors cooled will help. But alas, this material's defect
density doesn't allow one to have the scale of integration that
Si presently has.

Another material system that may be more practical is Si/SiGe. It
forms only shallow quantum wells but can still increase the mobility
of the electrons. In addition, devices made from it are more
radiation-hard than those made from pure Si. Presently, I think
Intel and AMD do use Si/SiGe in their present-day chips. To take
better advantage of this material, one needs to change the
transistor from a FET to a heterojunction bipolar transistor.

On the horizon there is the graphene processor.
People are also still looking into computers using Josephson devices.
With these it may be possible to reach speeds approaching 100 GHz,
but the complexity of the system is large and the density of the
junctions is still small.

### JohnF

Nov 19, 2010, 4:36:31 AM
BW <bjorn...@gmail.com> wrote:
> You can speculate if nature did this to simplify simulation of itself,
> [...]
> Note that I'm not talking about solving differential equations,
> but of how the world computes at the most basic level.
> /Bjorn

The idea that "the world computes" (at any level) seems widespread
and wrong to me. Math is just our way of organizing and describing
observable behavior (which we call physics). But the world knows
nothing of math, or of our organization of its behavior.

For the simplest example, suppose you have a compass and straight
edge. And suppose you draw a circle of radius R with the compass,
using a pencil that draws lines of width w and thickness t.
Does the world "compute" the volume of graphite laid down as 2piRwt?
No. In the world, that's just what "happens". We're the only ones
doing any computing here.
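As an aside on the formula itself: 2piRwt is exact, not approximate, if R is the radius of the centre of the pencil line, because the graphite ring is an annulus with radii R + w/2 and R - w/2, and pi*((R + w/2)**2 - (R - w/2)**2) = 2*pi*R*w. A short check with hypothetical pencil dimensions:

```python
import math

def graphite_volume_annulus(R, w, t):
    """Ring of graphite as an annulus (radii R - w/2 and R + w/2) times thickness t."""
    outer, inner = R + w / 2, R - w / 2
    return math.pi * (outer ** 2 - inner ** 2) * t

def graphite_volume_formula(R, w, t):
    """The post's formula 2*pi*R*w*t, with R taken as the centre-line radius."""
    return 2 * math.pi * R * w * t

# Hypothetical line: R = 5 cm, w = 0.5 mm, t = 1 micrometre (SI units).
R, w, t = 0.05, 5e-4, 1e-6
v_ring = graphite_volume_annulus(R, w, t)
v_formula = graphite_volume_formula(R, w, t)
print(v_ring, v_formula)  # equal up to floating-point rounding
```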

Next consider, say, a vibrating string. Does the world "compute"
solutions to the one-dimensional wave equation, or even instantaneous
forces and motions? Or does it just "happen" in a more complicated
but analogous way to the drawn circle?

Likewise with, say, similarly simple and easily derived differential
equations for radiative transfer, hydrostatic equilibrium, etc, etc.
The world doesn't need to compute anything to produce these behaviors.
It's we who need to compute, just to answer our own questions about
why we see the world behave the way it does.

Is there some level below which the world's behavior becomes
fundamentally computational (in other words, does math "exist")?
As above, I'd guess not, and that math is just our way of grouping
together ("organizing and describing" in the above words) phenomena
that happen to have common explanatory models.

That is, the world produces a menagerie of behaviors, and then,
from among them all, we pick and choose and group together those
describable by one mathematical model. Then we group together others
describable by some other model, etc. But that "partition of phenomena"
is just our construction, built by mathematical methods that happen to
make sense to us. The world knows nothing of it.

The one loophole seems to me to be reproducibility. Beyond just
a "menagerie of behaviors", we can experimentally instigate, at will,
behaviors describable by any of our mathematical models we like.
So that must mean something. But jumping to the conclusion that
it means "the world computes" (or that "math exists") seems way

### Dirk Bruere at NeoPax

Nov 19, 2010, 10:10:29 PM
On 19/11/2010 09:36, JohnF wrote:
> BW<bjorn...@gmail.com> wrote:
>> You can speculate if nature did this to simplify simulation of itself,
>> [...]
>> Note that I'm not talking about solving differential equations,
>> but of how the world computes at the most basic level.
>> /Bjorn
>
> The idea that "the world computes" (at any level) seems widespread
> and wrong to me. Math is just our way of organizing and describing
> observable behavior (which we call physics). But the world knows
> nothing of math, or of our organization of its behavior.
>
> For the simplest example, suppose you have a compass and straight
> edge. And suppose you draw a circle of radius R with the compass,
> using a pencil that draws lines of width w and thickness t.
> Does the world "compute" the volume of graphite laid down as 2piRwt?
> No. In the world, that's just what "happens". We're the only ones
> doing any computing here.

Yet there is a change in information as the circle is drawn.
One can quite easily, and with good reason, claim that anything that
changes information content is a computation.

### Juan R. Gonzàlez-Àlvarez

Nov 20, 2010, 1:56:26 PM
BW wrote on Thu, 18 Nov 2010 07:55:49 +0100:

(...)

> The fun thing is, this is what the last 100 years of physics exploration
> seem to have discovered too - that when interactions are executed they
> depend only on the absolute nearest region (in a pure QFT without
> external classic fields contaminating it).
>
> You can speculate if nature did this to simplify simulation of itself,
> but it sure is a nice fact because you can perfectly well conceive of
> any number of working realities where interactions are not local - or in
> computer code, where calculations would require direct inputs from any
> number of non-neighboring compute nodes.
>
> Note that I'm not talking about solving differential equations or
> finding eigenvalues in huge matrices here, but of how the world computes
> at the most basic level. I do understand that in practice this is not
> how you usually simulate physical systems. Also there is the little
> complication of the multiple-paths or worlds... but other than that,
> local interactions should be good in the long run for computational
> physics :)

i)
Precisely what science has shown in the last 50 years or so is that
locality is only an approximation to how nature works. For instance, in
the derivation of the hydrodynamic equations, authors invoke the so-called
"local approximation". In regimes very far from equilibrium this
approximation is very bad and, as a consequence, the hydrodynamic
equations do not describe what one observes in the lab (more general
nonlocal equations are then used).

ii)
Even if you consider only local potentials and near-equilibrium regimes,
this does not rule out the existence of nonlocal interactions. Nonlocal
interactions can still appear due to a "curtain effect" [1,2].

iii)
It is ironic that you appeal to QFT to discuss locality of interactions,
when this theory is precisely known for its shortcomings on this topic.
For instance, the five authors of [2] write:

"This demonstrates that localization in relativistic quantum field
theory cannot be complete."

H. Bacry put it more forcefully:

"Every physicist would easily convince himself that all quantum
calculations are made in the energy-momentum space and that the
Minkowski x^\mu are just dummy variables without physical meaning
(although almost all textbooks insist on the fact that these variables
are not related with position, they use them to express locality of
interactions!)"

Yes indeed!

A more general and rigorous study of localization in both QFT and
relativistic quantum mechanics and an elegant solution is given in the
section "10 Solving the problems of relativistic localization" of [3].

References