Recently I was looking through the many parallel computation examples in
the documentation of Mathematica 7.0.1. Even if not always very clear,
the documentation looks pretty impressive at first glance. So I decided
to attempt a Mathematica implementation of the small piece of software
named Super Pi, which is very famous among overclockers: it computes Pi
up to a user-defined number of decimal digits. My idea was to do the
computation in parallel, using all the cores of the processor. Have a look:
http://files.extremeoverclocking.com/file.php?f=36
My goal was to write pure Mathematica code that computes Pi to three
million decimal digits eight times in parallel, using the eight kernels
available on my PC. Computing the task once on this PC takes about
3.885 seconds (with an Intel Core i7 975 Extreme processor).
fun[n_]:=Module[{a,tic,toc},
tic=TimeUsed[];
a=N[Pi,n*10^6];
toc=TimeUsed[];
toc-tic
];
(*For 3 million decimal digits*)
In[24]:= fun[3]
Out[24]= 3.885
Now let's look at the parallel configuration of the PC. One can see
that eight parallel kernels are indeed running on the system.
In[16]:= ParallelEvaluate[$ProcessID]
Out[16]= {6712,6636,7928,4112,7196,5832,3992,7484}
In[17]:= ParallelEvaluate[$MachineName]
Out[17]= {flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc}
Now, to compute the same thing eight times in parallel, I tried the
following combinations with no success at all. See for yourself the
disappointing timing results.
First:
In[2]:= b=Table[3,{i,1,8}];tic=TimeUsed[];re=Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"];
toc=TimeUsed[];
toc-tic
Out[4]= 30.935
Second:
In[11]:= b=Table[3,{i,1,8}];tic=TimeUsed[];re=Parallelize[Map[fun[#]&,b],Method->"FinestGrained"];
toc=TimeUsed[];
toc-tic
Out[13]= 30.872
Third:
In[18]:= ParallelMap[fun[#] &, b] // Timing
Out[18]= {30.81, {3.884, 3.822, 3.854, 3.853, 3.837, 3.869, 3.822, 3.869}}
Fourth:
In[21]:= ParallelTable[fun[3],{i,1,8}]//Timing
Out[21]= {30.747,{3.868,3.807,3.837,3.838,3.806,3.854,3.884,3.853}}
Finally, to support the claim that in spite of all these parallel
commands only one single kernel is being used by Mathematica, we map
our function sequentially over a list of eight threes,
b={3,3,3,3,3,3,3,3}, and take the total time of the repeated computation.
Validation of the claim:
In[16]:= Map[fun[#]&,b]//Timing
Out[16]= {30.748,{3.854,3.822,3.853,3.838,3.837,3.822,3.869,3.853}}
This shows that the parallel commands used in the code above were
simply useless: the plain sequential Map takes essentially the same time.
I would highly appreciate it if any of you could shed some light on
this problem. It is basic in nature, but the idea involved is quite
central to parallel computing. What I expect is that neat and clean
Mathematica code can be written for this problem that brings the
computation time down to around 6-8 seconds instead of the 30-31
seconds we have seen above. I will keep trying on the problem, but in
the meantime feel free to give it a try.
With best regards to all.
Pratip Chakraborty
LaunchKernels[]
David Bailey
http://www.dbaileyconsultancy.co.uk
Your code is completely useless, since I don't see why one should
compute the same result eight times. But here is what you missed:
fun[n_] := First@AbsoluteTiming@N[Pi, n*10^6]
ParallelTable[fun[3], {i, 1, 4}] // AbsoluteTiming
{29.608226, {6.893410, 6.849890, 6.845198, 6.848202}}
DistributeDefinitions[fun];
ParallelTable[fun[3], {i, 1, 4}] // AbsoluteTiming
{10.246625, {9.339382, 10.221913, 9.790986, 9.587946}}
You should read ParallelTools/tutorial/Overview first!
Cheers
Patrick
Before running any parallel computation you have to make the
definition of the function fun available on the compute kernels by
entering:
DistributeDefinitions[fun]
Symbols defined in the controller kernel do not automatically become
available on the compute kernels. Unless the definition of the
function fun is available, a compute kernel cannot evaluate an
expression involving the symbol fun; the expression will thus be
evaluated on the controller kernel instead. This explains why only a
single kernel (the controller kernel) is actually used in your tests.
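This fallback can be made visible directly. In the following sketch
(the function name g is purely illustrative), each result carries the
$ProcessID of the kernel that actually evaluated it:

g[x_] := $ProcessID;       (* defined only on the controller kernel *)
ParallelMap[g, Range[4]]   (* without DistributeDefinitions: every result is the controller's process ID *)
DistributeDefinitions[g];
ParallelMap[g, Range[4]]   (* now the results show the compute kernels' process IDs *)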
Sascha
Pratip,
You should see a linear speedup if you precede ParallelMap with
DistributeDefinitions[fun]. Worked for me, with your code. Without it,
I saw the same sequential behavior as you (no time to drill into that
now).
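In code, the suggestion amounts to this minimal sketch, reusing fun
and b as defined in the original post:

DistributeDefinitions[fun];        (* push fun to the compute kernels *)
ParallelMap[fun[#] &, b] // AbsoluteTiming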
Vince Virgilio
If a subkernel doesn't know a definition, it just passes the computation back to the main kernel.
For example,
In[51]:= fun[n_]:=Module[{a,tic,toc},tic=TimeUsed[];
a=N[Pi,n*10^6];
toc=TimeUsed[];
toc-tic];
In[52]:= fun[3]
Out[52]= 4.66056
In[55]:= b=Table[3,{i,1,8}];
In[72]:= LaunchKernels[]
Out[72]= {KernelObject[3,local],KernelObject[4,local]}
In[73]:= ParallelEvaluate[$ProcessID]
Out[73]= {21462,21463}
In[56]:= DistributeDefinitions[fun,b]
In[66]:= Map[fun[#]&,b]//AbsoluteTiming
Out[66]=
{37.759508,{4.67046,4.67199,4.6748,4.66635,4.66351,4.65885,4.69103,4.67446}}
In[67]:=
Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"]//AbsoluteTiming
Out[67]=
{20.064523,{4.72813,4.73638,4.72329,4.70464,4.7191,4.73912,4.71899,4.69738}}
In[68]:= Parallelize[Map[fun[#]&,b],Method->"FinestGrained"]//AbsoluteTiming
Out[68]=
{20.319690,{4.73018,4.72562,4.7092,4.70702,4.74691,4.73888,4.72667,4.72734}}
In[63]:= ParallelMap[fun[#]&,b]//AbsoluteTiming
Out[63]=
{20.238199,{4.71829,4.71306,4.72525,4.71633,4.72851,4.76228,4.73172,4.76264}}
In[69]:= ParallelTable[fun[3],{i,1,8}]//AbsoluteTiming
Out[69]=
{20.372381,{4.72952,4.73892,4.72444,4.74621,4.71921,4.7143,4.7259,4.71445}}
- Jaebum