
ParallelMap inefficient?


Mike Yukish

Feb 16, 2001, 4:28:24 AM
Hello,

I am starting to use the Parallel Computing Toolkit, and I've found
the ParallelMap[ ] function to be essentially broken, most likely due to
some sort of comm glitch. I was curious if anyone else has experienced a
similar problem, or if it is just my network.

If I run the following commands (listtt is a 5-element list of real
numbers) with one remote server active...

x1=SessionTime[];
ParallelMap[Sin,listtt]
SessionTime[]-x1

It takes 5-7 seconds, just to reach a slave computer that shares a hub
with the master. Obviously something is wrong. By
contrast, if I define the function on the slave

maparoni[x_]:=Map[Sin,x]

and do the following...

x1 = SessionTime[];
Do[
  With[{x = listtt}, RemoteEvaluate[maparoni[x]]], {20}]
SessionTime[] - x1

It only takes 5 seconds. Note the difference: I am executing the Sin[ ]
function twenty times as often, making twenty times as many discrete
data transfers, and passing twenty times the total amount of data, yet
it completes in less time. So the basic linking between computers
appears to work OK.

Anybody else have experience with the ParallelMap[ ] function? I see
similar performance with ParallelTable[ ].


Jens-Peer Kuska

Feb 18, 2001, 12:11:19 AM
Hi,

it is hopeless to assume that "take x computers and be x times faster"
holds. In fact, you have to design the algorithm very carefully to get
a speedup in a parallel program.

Here (at the faculty of computer science) we have a graduate course
on parallel computing, and in the laboratory the students have to solve
some tasks with parallel programs. The laboratory is very discouraging
because typically none of the solutions gives a speedup :-)

You should not expect such simple operations to be faster.
In principle, the time to send/receive the data must be much shorter
than the time for the execution of the commands.
Don't think that using more data will help you, because the time
to send/receive grows linearly with the data size. So your operations
on the data should have at least a larger time complexity.
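
To make that concrete, here is a hedged sketch using the timing idiom
and the 5-element listtt from the original post; the integrand is just
an arbitrary stand-in for an expensive per-element function:

x1 = SessionTime[];
ParallelMap[Sin, listtt];   (* cheap per element: transfer time dominates *)
SessionTime[] - x1

x1 = SessionTime[];
ParallelMap[NIntegrate[Sin[t^2 + #], {t, 0, 10}] &, listtt];   (* expensive per element *)
SessionTime[] - x1

Only in the second case is the per-element transfer cost small next to
the per-element compute cost, so only there can distributing the work
pay off.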

Regards
Jens

Mike Yukish

Feb 18, 2001, 10:05:14 PM
Jens-Peer Kuska wrote:

> Hi,
>
> it is hopeless to assume that "take x computers and be x times faster"
> holds. In fact, you have to design the algorithm very carefully to get
> a speedup in a parallel program.

Hi Jens. The ParallelMap[ ] routine is for algorithms that are "embarrassingly
parallel", i.e., there is no coupling between the processes. Therefore, if you
map N processors over a 50-element list, with a single master computer and N
slaves, an upper bound on the time might be

time to process = 50 * T_slowest_comm_master_slave
                + (50/N) * T_compute_slowest_computer
                +/- small_fudge_factor
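
Plugging in some made-up numbers just to illustrate the bound (both
timings are assumptions, not measurements):

tComm = 0.05;    (* seconds per master-slave exchange, assumed *)
tCompute = 0.2;  (* seconds per element on the slowest slave, assumed *)
n = 4;           (* number of slaves *)
50 tComm + (50/n) tCompute
(* 5. -- about five seconds, plus or minus the fudge factor *)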

When I hand-code the ParallelMap[ ] functionality myself, I see this sort of
performance. When I use the function supplied with the toolbox, I see
something different. That was the point.

> Here (at the faculty of computer science) we have a graduate course
> on parallel computing, and in the laboratory the students have to solve
> some tasks with parallel programs. The laboratory is very discouraging
> because typically none of the solutions gives a speedup :-)

I have experience with MPI and PVM on an IBM SP2 with 80 processors, and MPI
over a network of PCs.

Jens-Peer Kuska

Feb 20, 2001, 3:10:01 AM
Hi,

> Hi Jens. The ParallelMap[ ] routine is for algorithms that are "embarrassingly
> parallel", i.e., there is no coupling between the processes.

Since the toolkit has only the master/worker model, you must send
the data from the master to the workers and get the results back.

In the simple examples you gave, the overhead for exchanging data
will dominate almost everything.
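
A quick way to measure that per-call overhead (a hedged sketch, reusing
the SessionTime[ ] idiom from earlier in the thread):

x1 = SessionTime[];
Do[RemoteEvaluate[Null], {100}]    (* 100 trivial round trips *)
(SessionTime[] - x1)/100           (* approx. seconds per master-slave exchange *)

If that per-exchange time is comparable to the time Sin[ ] needs for
one element, no amount of parallelism will help.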

> Therefore, if you
> map N processors over a 50-element list, with a single master computer and N
> slaves, an upper bound on the time might be
>
> time to process = 50 * T_slowest_comm_master_slave
>                 + (50/N) * T_compute_slowest_computer
>                 +/- small_fudge_factor
>
> When I hand-code the ParallelMap[ ] functionality myself, I see this sort of
> performance. When I use the function supplied with the toolbox, I see
> something different. That was the point.

What kind of client tasks?

Regards
Jens

Mike Yukish

Feb 21, 2001, 3:36:56 AM
Hello again,

Jens-Peer Kuska wrote:

> In the simple examples you gave, the overhead for exchanging data
> will dominate almost everything.

I've since traded email with the Mathematica support folks, and here is the nugget
of my point.

Make a list of 100 elements.

list = {0, ...., 100};

Start one remote processor. Compare these calls...

ParallelMap[Sin, list];

Map[RemoteEvaluate[Sin[#]] &, list];

Both calls do the same thing and theoretically involve exactly the same amount of
communication and computation. Yet there is a six-fold difference between them in
the time spent on anything other than calculating: the first call takes 200 seconds,
the second 30 seconds. With two processors it is still better to Map[ ] across a
single remote than to ParallelMap[ ] across two remotes, and so on up to six or so
remote processors. These processors are PCs communicating across a 10 Mbit Ethernet
hub. An even faster call is:

RemoteEvaluate[Map[Sin, list]]

That call is almost instantaneous, which shows that it is much better to split a
list into N segments for N processors and farm those out yourself than to let
ParallelMap[ ] handle the comms for you. The advantages of ParallelMap[ ] are that
it handles the admin details and that it copes with cases where the task time
differs from element to element.
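
For what it's worth, here is a minimal sketch of that manual split. The
names follow my reading of the toolkit ($Slaves for the list of remote
kernels, Send/Receive for non-blocking dispatch and collection), so
treat it as an assumption rather than gospel, and it assumes the number
of slaves divides the list evenly:

n = Length[$Slaves];                       (* number of remote kernels *)
chunks = Partition[list, Length[list]/n];  (* one segment per slave *)
MapThread[Send[#1, Map[Sin, #2]] &, {$Slaves, chunks}];  (* dispatch all segments *)
results = Flatten[Map[Receive, $Slaves]];  (* collect, preserving order *)

This keeps the number of master-slave exchanges at 2 N instead of one
per list element, which is where ParallelMap[ ] seems to lose its time.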

