Recently I was looking through the many parallel computation examples in
the documentation of Mathematica 7.0.1. Even if not always very clear,
the documentation looks pretty impressive at first glance. So I decided
to attempt a Mathematica implementation of the small piece of software
named Super Pi, which is very famous among overclockers: it computes Pi
up to a user-defined number of decimal digits. My idea was to do the
computation in parallel, using all the cores of the processor. Have a look:
http://files.extremeoverclocking.com/file.php?f=36
My goal was to write pure Mathematica code that computes Pi to three
million decimal digits eight times in parallel, using the eight kernels
available on my PC. Computing the task once on this PC takes about
3.885 seconds (with an Intel Core i7 975 Extreme processor).
fun[n_]:=Module[{a,tic,toc},
tic=TimeUsed[];
a=N[Pi,n*10^6];
toc=TimeUsed[];
toc-tic
];
(*For 3 million decimal digits*)
In[24]:= fun[3]
Out[24]= 3.885
Now let's look at the parallel configuration of the PC. One can see
that eight parallel kernels are indeed running on the system.
In[16]:= ParallelEvaluate[$ProcessID]
Out[16]= {6712,6636,7928,4112,7196,5832,3992,7484}
In[17]:= ParallelEvaluate[$MachineName]
Out[17]= {flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc, flowcrusher-pc}
Now, to compute the same thing eight times in parallel, I tried the
following combinations with no success at all. See for yourself the
disappointing timing results.
First:
In[2]:= b=Table[3,{i,1,8}];tic=TimeUsed[];re=Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"];
toc=TimeUsed[];
toc-tic
Out[4]= 30.935
Second:
In[11]:= b=Table[3,{i,1,8}];tic=TimeUsed[];re=Parallelize[Map[fun[#]&,b],Method->"FinestGrained"];
toc=TimeUsed[];
toc-tic
Out[13]= 30.872
Third:
In[18]:= ParallelMap[fun[#] &, b] // Timing
Out[18]= {30.81, {3.884, 3.822, 3.854, 3.853, 3.837, 3.869, 3.822, 3.869}}
Fourth:
In[21]:= ParallelTable[fun[3],{i,1,8}]//Timing
Out[21]= {30.747,{3.868,3.807,3.837,3.838,3.806,3.854,3.884,3.853}}
Finally, to support the claim that in spite of all these parallel
commands only one single kernel is being used by Mathematica, we map
our function sequentially over a list of eight threes,
b={3,3,3,3,3,3,3,3}, and take the total time of the repeated computation.
Validation of the claim:
In[16]:= Map[fun[#]&,b]//Timing
Out[16]= {30.748,{3.854,3.822,3.853,3.838,3.837,3.822,3.869,3.853}}
This shows that the parallel commands used in the code above were
simply useless: the plain sequential Map takes essentially the same time.
I would highly appreciate it if any of you could shed some light on
this problem. It is basic in nature, but the idea involved is quite
central to parallel computing. What I expect is that neat and clean
Mathematica code can be written for this problem that brings the
computation time down to around 6-8 seconds instead of the 30-31
seconds we have seen above. I will keep trying on the problem, but in
the meantime feel free to give it a try.
With best regards to all.
Pratip Chakraborty
LaunchKernels[]
David Bailey
http://www.dbaileyconsultancy.co.uk
Your code is completely useless, since I don't see why one should
compute the same result eight times. But here is what you missed:
fun[n_] := First@AbsoluteTiming@N[Pi, n*10^6]
ParallelTable[fun[3], {i, 1, 4}] // AbsoluteTiming
{29.608226, {6.893410, 6.849890, 6.845198, 6.848202}}
DistributeDefinitions[fun];
ParallelTable[fun[3], {i, 1, 4}] // AbsoluteTiming
{10.246625, {9.339382, 10.221913, 9.790986, 9.587946}}
You should read ParallelTools/tutorial/Overview first!
Cheers
Patrick
Before running any parallel computation you have to make the
definition of the function fun available on the compute kernels by
entering:
DistributeDefinitions[fun]
Symbols defined in the controller kernel do not automatically become
available on the compute kernels. Unless the definition of the
function fun is available, a compute kernel cannot evaluate an
expression involving the symbol fun; the expression will thus be
evaluated on the controller kernel instead. This explains why only a
single kernel (the controller kernel) is actually used in your tests.
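This fallback can be made visible directly. In the following sketch
(the function name g is purely illustrative), each result carries the
$ProcessID of the kernel that actually evaluated it:

g[x_] := $ProcessID;       (* defined only on the controller kernel *)
ParallelMap[g, Range[4]]   (* without DistributeDefinitions: every result is the controller's process ID *)
DistributeDefinitions[g];
ParallelMap[g, Range[4]]   (* now the results show the compute kernels' process IDs *)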
Sascha
Pratip,
You should see a linear speedup if you precede ParallelMap with
DistributeDefinitions[fun]. Worked for me, with your code. Without it,
I saw the same sequential behavior as you (no time to drill into that
now).
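In code, the suggestion amounts to this minimal sketch, reusing fun
and b as defined in the original post:

DistributeDefinitions[fun];        (* push fun to the compute kernels *)
ParallelMap[fun[#] &, b] // AbsoluteTiming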
Vince Virgilio
If a subkernel doesn't know a definition, it just passes the computation back to the main kernel.
For example,
In[51]:= fun[n_]:=Module[{a,tic,toc},tic=TimeUsed[];
a=N[Pi,n*10^6];
toc=TimeUsed[];
toc-tic];
In[52]:= fun[3]
Out[52]= 4.66056
In[55]:= b=Table[3,{i,1,8}];
In[72]:= LaunchKernels[]
Out[72]= {KernelObject[3,local],KernelObject[4,local]}
In[73]:= ParallelEvaluate[$ProcessID]
Out[73]= {21462,21463}
In[56]:= DistributeDefinitions[fun,b]
In[66]:= Map[fun[#]&,b]//AbsoluteTiming
Out[66]=
{37.759508,{4.67046,4.67199,4.6748,4.66635,4.66351,4.65885,4.69103,4.67446}}
In[67]:=
Parallelize[Map[fun[#]&,b],Method->"CoarsestGrained"]//AbsoluteTiming
Out[67]=
{20.064523,{4.72813,4.73638,4.72329,4.70464,4.7191,4.73912,4.71899,4.69738}}
In[68]:= Parallelize[Map[fun[#]&,b],Method->"FinestGrained"]//AbsoluteTiming
Out[68]=
{20.319690,{4.73018,4.72562,4.7092,4.70702,4.74691,4.73888,4.72667,4.72734}}
In[63]:= ParallelMap[fun[#]&,b]//AbsoluteTiming
Out[63]=
{20.238199,{4.71829,4.71306,4.72525,4.71633,4.72851,4.76228,4.73172,4.76264}}
In[69]:= ParallelTable[fun[3],{i,1,8}]//AbsoluteTiming
Out[69]=
{20.372381,{4.72952,4.73892,4.72444,4.74621,4.71921,4.7143,4.7259,4.71445}}
- Jaebum