
Proposing a new module: Parallel::Loops


Peter Valdemar Mørch

Jun 22, 2010, 2:28:50 PM
perldoc perlmodlib suggests posting here before posting on CPAN, so
here goes:

I have a new module that I'd like to upload: Parallel::Loops, and
following is the bulk of the synopsis. Is the Parallel::Loops name
appropriate and does anybody have any comments on it before I post it
on CPAN?
Its repository (code, complete perldoc, etc.) can be found here:
http://github.com/pmorch/perl-Parallel-Loops

Synopsis:

    use Parallel::Loops;

    my $maxProcs = 5;
    my $pl = new Parallel::Loops($maxProcs);

    my @input = ( 0 .. 9 );

    my %output;
    $pl->tieOutput( \%output );

    $pl->foreach(
        \@input,
        sub {
            # This sub is "magically" executed in parallel forked child
            # processes

            # Let's just create a simple example, but this could be a
            # massive calculation that will be parallelized, so that
            # $maxProcs different processes are calculating sqrt
            # simultaneously for different values of $_ on different CPUs

            $output{$_} = sqrt($_);
        }
    );

Chris Nehren

Jun 24, 2010, 12:20:06 PM
On 2010-06-22, Peter Valdemar Mørch scribbled these curious markings:

> perldoc perlmodlib suggests posting here before posting on CPAN, so
> here goes:

I find it quaint that some people still follow that guideline. Most
folks just upload.

> I have a new module that I'd like to upload: Parallel::Loops, and
> following is the bulk of the synopsis. Is the Parallel::Loops name
> appropriate and does anybody have any comments on it before I post it
> on CPAN?

How does this differ from e.g. Coro or other similar modules?

> my $pl = new Parallel::Loops($maxProcs);

Indirect object syntax considered harmful:
http://www.shadowcat.co.uk/blog/matt-s-trout/indirect-but-still-fatal/

>
> my @input = ( 0 .. 9 );
>
> my %output;
> $pl->tieOutput( \%output );

Why are you using tie here?

--
Thanks and best regards,
Chris Nehren
Unless noted, all content I post is CC-BY-SA.

Peter Valdemar Mørch

Jun 24, 2010, 5:56:19 PM
On Jun 24, 6:20 pm, Chris Nehren <apei...@isuckatdomains.net.invalid>
wrote:

> How does this differ from e.g. Coro or other similar modules?

It differs from Coro especially because several processes are involved
in Parallel::Loops. Each iteration of the loop runs in its own process -
in parallel. Whereas Coro::Intro says:

> only one thread ever has the CPU, and if another thread wants
> the CPU, the running thread has to give it up

the idea behind Parallel::Loops is precisely to make it easy to use
several CPUs with code that resembles code for one CPU.

> > my %output;
> > $pl->tieOutput( \%output );
>
> Why are you using tie here?

Hmm... I thought the idea would be more obvious than it apparently
is...

Outside the $pl->foreach() loop, we're running in the parent process.
Inside the $pl->foreach() loop, we're running in a child process.
$pl->tieOutput() is actually the raison d'être of Parallel::Loops. When
the child process has a result, it stores it in %output (which is tied
with Tie::Hash behind the scenes in the child process).

Behind the scenes, when the child process exits, it sends the results
(the keys written to %output) back to the parent process's version/copy
of %output, so that the user of Parallel::Loops doesn't have to do any
inter-process communication.

Perhaps the Synopsis needs to be a bit more clear on these points.
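
To make that concrete, here is a stripped-down sketch of the idea -
emphatically not the module's actual code, just an illustration - where
a tie class records the child's writes, and the child ships them back
to the parent over a pipe right before it exits:

    use strict;
    use warnings;
    use Storable qw(freeze thaw);

    # The tie class: remember every key/value the child stores.
    package ChildResults;
    use Tie::Hash;
    our @ISA = ('Tie::StdHash');
    my %written;
    sub STORE {
        my ($self, $key, $value) = @_;
        $written{$key} = $value;              # record the child's write
        $self->SUPER::STORE($key, $value);
    }
    sub written { return \%written }

    package main;
    my %output;
    pipe( my $reader, my $writer ) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;

    if ( $pid == 0 ) {                        # child: tie, run the loop body, ship results
        close $reader;
        tie %output, 'ChildResults';
        $output{$_} = sqrt($_) for 0 .. 9;
        print {$writer} freeze( ChildResults::written() );
        close $writer;
        exit 0;
    }

    close $writer;                            # parent: collect the child's writes
    my $frozen = do { local $/; <$reader> };
    close $reader;
    waitpid $pid, 0;
    %output = ( %output, %{ thaw($frozen) } ) if defined $frozen && length $frozen;
    print "$_ => $output{$_}\n" for sort { $a <=> $b } keys %output;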

> > my $pl = new Parallel::Loops($maxProcs);
>
> Indirect object syntax considered harmful:
> http://www.shadowcat.co.uk/blog/matt-s-trout/indirect-but-still-fatal/

OK, thanks, I'll fix that

Ben Morrow

Jun 24, 2010, 9:16:47 PM

Quoth Peter Valdemar Mørch <4ux6...@sneakemail.com>:

> On Jun 24, 6:20 pm, Chris Nehren <apei...@isuckatdomains.net.invalid>
> wrote:
> > How does this differ from e.g. Coro or other similar modules?
>
> It differs from Coro especially because there are several processes
> involved in Parallel::Loops. Each of the iterations in the loop run in
> each their own process - in parallel. Whereas Coro::Intro has:
>
> > only one thread ever has the CPU, and if another thread wants
> > the CPU, the running thread has to give it up
>
> , the idea behind Parallel::Loops is exactly to make it easy to use
> several CPUs in what resembles code for one CPU.

OK; how is this different from forks and forks::shared?

Ben

Peter Valdemar Mørch

Jun 25, 2010, 4:14:11 AM
On Jun 25, 3:16 am, Ben Morrow <b...@morrow.me.uk> wrote:
> OK; how is this different from forks and forks::shared?

It is _much_ more similar to forks and forks::shared than to Coro.

While the forks and forks::shared API emulate the API of threads and
threads::shared (perfectly?), Parallel::Loops tries to emulate the
standard foreach and while loops as close as possible as in:

$pl->foreach(\@input, sub {
$output{$_} = do_some_hefty_calculation($_);
});

All the forking, waiting for subprocesses to finish etc. is done
behind the scenes. I find that I very often have large calculations
that need to operate on all the elements of an array or hash and that
really could be parallelized, and with this close-to-foreach syntax,
such code is easy to write and to understand/read later on.

I guess Parallel::Loops could have been written with forks and
forks::shared, and only provided syntactic sugar. (In fact it uses
Parallel::ForkManager and Tie::Hash/Tie::Array instead.)
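
Just to illustrate that the plumbing really can be hidden behind a
loop-shaped call, here is a rough sketch built directly on
Parallel::ForkManager's data-return feature (pfm_foreach is a made-up
name, and this is not how Parallel::Loops is actually implemented -
purely an illustration):

    use strict;
    use warnings;
    use Parallel::ForkManager;   # the data-return feature needs a reasonably recent version

    sub pfm_foreach {
        my ($maxProcs, $input, $body) = @_;
        my %results;
        my $pm = Parallel::ForkManager->new($maxProcs);
        $pm->run_on_finish(sub {
            my ($pid, $exit, $ident, $signal, $core, $data) = @_;
            %results = ( %results, %$data ) if ref $data eq 'HASH';
        });
        for my $item (@$input) {
            $pm->start and next;             # parent: move on to the next item
            local $_ = $item;                # child: run the loop body
            my %r = ( $item => $body->() );
            $pm->finish(0, \%r);             # ship this child's result to the parent
        }
        $pm->wait_all_children;
        return %results;
    }

    my %sqrt = pfm_foreach( 4, [0 .. 9], sub { sqrt($_) } );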

Perhaps $pl->share(\%output) is a better name than
$pl->tieOutput(\%output), though. I guess now is the time to change it! ;-)

I'm impressed that you guys take the time to read and comment. Thanks!

Ted Zlatanov

Jun 25, 2010, 11:33:17 AM
On Fri, 25 Jun 2010 01:14:11 -0700 (PDT) Peter Valdemar Mørch <4ux6...@sneakemail.com> wrote:

PVM> On Jun 25, 3:16 am, Ben Morrow <b...@morrow.me.uk> wrote:
>> OK; how is this different from forks and forks::shared?

PVM> It is _much_ more similar to forks and forks::shared than to Coro.

PVM> While the forks and forks::shared API emulate the API of threads and
PVM> threads::shared (perfectly?), Parallel::Loops tries to emulate the
PVM> standard foreach and while loops as close as possible as in:

PVM> $pl->foreach(\@input, sub {
PVM> $output{$_} = do_some_hefty_calculation($_);
PVM> });

I like that syntax better personally than join() and detach().

PVM> I guess Parallel::Loops could have been written with forks and
PVM> forks::shared, and only provided syntactic sugar. (In fact it uses
PVM> Parallel::ForkManager and Tie::Hash/Tie::Array instead.)

`forks' brings in socket IPC which can be an issue. Your approach seems
a little cleaner IIUC.

Ted

Ben Morrow

Jun 25, 2010, 6:34:14 PM

Quoth Ted Zlatanov <t...@lifelogs.com>:
> On Fri, 25 Jun 2010 01:14:11 -0700 (PDT) Peter Valdemar Mørch <4ux6...@sneakemail.com> wrote:
>
> PVM> On Jun 25, 3:16 am, Ben Morrow <b...@morrow.me.uk> wrote:
> >> OK; how is this different from forks and forks::shared?
>
> PVM> It is _much_ more similar to forks and forks::shared than to Coro.
>
> PVM> While the forks and forks::shared API emulate the API of threads and
> PVM> threads::shared (perfectly?), Parallel::Loops tries to emulate the
> PVM> standard foreach and while loops as close as possible as in:
>
> PVM> $pl->foreach(\@input, sub {
> PVM> $output{$_} = do_some_hefty_calculation($_);
> PVM> });
>
> I like that syntax better personally than join() and detach().

Personally I find

    my %output :shared;

    for my $i (@input) {
        async {
            $output{$i} = do_some_hefty_calculation($i);
        }
    }

somewhat clearer, but that's just a matter of taste. (With 5.10
presumably a 'my $_' would make $_ work too.)



> PVM> I guess Parallel::Loops could have been written with forks and
> PVM> forks::shared, and only provided syntactic sugar. (In fact it uses
> PVM> Parallel::ForkManager and Tie::Hash/Tie::Array instead.)
>
> `forks' brings in socket IPC which can be an issue. Your approach seems
> a little cleaner IIUC.

The IPC has to be done *somehow*. Sockets are probably as reliable as
any other mechanism.

Ben

Peter Valdemar Mørch

Jun 26, 2010, 3:15:07 AM
On Jun 26, 12:34 am, Ben Morrow <b...@morrow.me.uk> wrote:
> Personally I find
>
> my %output :shared;
>
> for my $i (@input) {
> async {
> $output{$i} = do_some_hefty_calculation($i);
> }
> }
>
> somewhat clearer, but that's just a matter of taste. (With 5.10
> presumably a 'my $_' would make $_ work too.)

In fact, I think that looks better too. I do have a few concerns:

* Having "my %output : shared" and just async without a
Parallel::Loops reference parameter inevitably leads to global
variables. I don't like them. One could have two different
calculations in different sections of the code, that don't need the
same variables "shared", so I'd prefer to have info about shared
variables associated with a specific Parallel::Loops instance. What do
you think?

* About the async {} instead of $pl->foreach: the implementation needs
to wait for the last loop iteration to finish, and only continue after
the '}' once all the processes have finished. I don't know how to do
that without something like:

    for my $i (@input) {
        # This fires up the parallel processes
        $pl->async {
            $output{$i} = do_some_hefty_calculation($i);
        }
    }
    # This waits for them all to finish before continuing.
    $pl->joinAll();

This syntax could easily co-exist with the $pl->foreach and $pl->while
syntax. I'm worried though that people will forget to call
$pl->joinAll()! I guess one could also have async return some reference
to the actual forked process (pid comes to mind) and then $pl->join($pid)
to wait for it to finish.

Regardless, I now think $pl->share(\%output) is a better name than
$pl->tieOutput(\%output).

The rest of this post is about "my %difficulties : with shared;" :-) -
this syntax is how (threads|forks)::shared does it too. I like it, but
don't yet understand how to implement it. I have looked at "perldoc
attributes" and experimented a little. I could get attributes like
"Shared" (== ucfirst("shared")) to work. "xshared" works but issues a
warning, and "shared" simply doesn't work (perl 5.10). Also, I guess it
isn't possible for several packages to be "listening" for attributes at
the same time, as they'd step on each other's exports of e.g.
sub MODIFY_SCALAR_ATTRIBUTES, wouldn't they?

Here is a little snippet I wrote to experiment:

me@it:~> cat attributes.pl
#!perl -w
use strict;
use attributes;
use Data::Dumper;

sub MODIFY_SCALAR_ATTRIBUTES {
    my ($pkg, $ref, @attributes) = @_;
    print Dumper(\@_, attributes::get($ref));
    return ();
}

my $shared : shared;
my $xshared : xshared;
my $Shared : Shared;

me@it:~> perl attributes.pl
$VAR1 = [
          'main',
          \undef
        ];
$VAR1 = [
          'main',
          \undef,
          'xshared'
        ];
SCALAR package attribute may clash with future reserved word: xshared at attributes.pl line 13.
$VAR1 = [
          'main',
          \undef,
          'Shared'
        ];

Peter Valdemar Mørch

Jun 26, 2010, 3:16:41 AM
On Jun 25, 5:33 pm, Ted Zlatanov <t...@lifelogs.com> wrote:
> I like that syntax better personally than join() and detach().

Thanks for the support! :-)

> `forks' brings in socket IPC which can be an issue.  Your approach seems
> a little cleaner IIUC.

As Ben says, it has to be done somehow. I use a pipe behind the
scenes.

Peter

Ben Morrow

Jun 26, 2010, 8:33:41 AM

Quoth Peter Valdemar Mørch <4ux6...@sneakemail.com>:

> On Jun 26, 12:34 am, Ben Morrow <b...@morrow.me.uk> wrote:
> > Personally I find
> >
> > my %output :shared;
> >
> > for my $i (@input) {
> > async {
> > $output{$i} = do_some_hefty_calculation($i);
> > }
> > }
> >
> > somewhat clearer, but that's just a matter of taste. (With 5.10
> > presumably a 'my $_' would make $_ work too.)
>
> In fact, I think that looks better too. I do have a few concerns:
>
> * Having "my %output : shared" and just async without a
> Parallel::Loops reference parameter inevitably leads to global
> variables. I don't like them. One could have two different
> calculations in different sections of the code, that don't need the
> same variables "shared", so I'd prefer to have info about shared
> variables associated with a specific Parallel::Loops instance. What do
> you think?

They're not global. %output can be scoped as tightly as you like around
the async call: async takes a closure, so it will make available (either
shared or as copies) any lexicals in scope at the time. (This is why $_
won't work: it isn't a lexical.)

> * About the async {} instead of $pl->foreach: The implementation needs
> to wait for the last loop to finish, and only continue after the '}'
> after all the processes have finished. I don't know how to do that
> unless something like:
>
> for my $i (@input) {
> # This fires up the parallel processes
> $pl->async {
> $output{$i} = do_some_hefty_calculation($i);
> }
> }
> # This waits for them all to finish before continuing.
> $pl->joinAll();

Well, again using forks, you would write

    my %output :shared;
    my @thr;

    for my $i (@input) {
        push @thr, async {
            $output{$i} = ...;
        }
    }
    $_->join for @thr;

> This syntax could easily co-exist with the $pl->foreach and $pl->while
> syntax.

Not like that it can't, since methods don't have prototypes. If you want
a method call it would have to look like

    $pl->async(sub { ... });

> I'm worried though that people will forget to call $pl->joinAll()!

Stick it in DESTROY.

> I guess one could also have async return some reference to
> the actual forked process (pid comes to mind) and then $pl->join($pid)
> to wait for it to finish.
>
> Regardless, I now think $pl->share(\%output) is a better name than
> $pl->tieOutput(\%output).
>
> The rest of this post is about "my %difficulties : with shared;" :-) -
> this syntax is how (threads|forks)::shared does it too. I like it, but
> don't yet understand how to implement it. Have looked at "perldoc
> attributes" and experimented a little. In fact I could get attributes
> like "Shared" (==ucfirst("shared")) to work. "xxshared" works, but
> issues a warning, but "shared" simply doesn't work (perl 5.10).

Yup. That's by design. Lowercase attributes are reserved to the core;
:shared, specifically, is handled internally as part of the threads
code, and is never seen by MODIFY_*_ATTRIBUTES. (Obviously it is
possible to hijack it, since forks manages to do so, but it can only be
done globally.)

> Also,
> I guess it isn't possible for several packages to be "listening" for
> attributes at the same time, as they'd step on each other's exports of
> e.g. sub MODIFY_SCALAR_ATTRIBUTES, wouldn't they?

That is certainly a possibility. IIRC Attribute::Handlers handles this
for you, since there's then only one MODIFY_*_ATTR sub to install.
Alternatively, keep a ref to the old sub (if there is one) and call it
if you don't see an attr you recognise.
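
A rough sketch of that chaining idea (the package and attribute names
are invented for the example):

    package Parallel::Loops::Attr;   # hypothetical helper, purely a sketch
    use strict;
    use warnings;

    my @shared;                      # refs of variables declared :Shared

    sub import {
        my $caller = caller;
        my $old = $caller->can('MODIFY_HASH_ATTRIBUTES');   # handler already installed?
        no strict 'refs';
        no warnings 'redefine';
        *{"${caller}::MODIFY_HASH_ATTRIBUTES"} = sub {
            my ($pkg, $ref, @attrs) = @_;
            my @mine = grep { $_ eq 'Shared' } @attrs;
            my @rest = grep { $_ ne 'Shared' } @attrs;
            push @shared, $ref if @mine;                     # claim :Shared ourselves
            @rest = $old->($pkg, $ref, @rest) if $old && @rest;
            return @rest;            # anything left over is reported as invalid
        };
    }

    1;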

Ben

Peter Valdemar Mørch

Jun 26, 2010, 3:25:32 PM
Commenting on Ben's post out of order:
> > $pl->async {
> > bla_bla_bla();

> > }
> > This syntax could easily co-exist with the $pl->foreach and $pl->while
> > syntax.
>
> Not like that it can't, since methods don't have prototypes.
...

> If you want a method call it would have to look like
>
> $pl->async(sub { ... });

Yes you're right, of course.

> > I'm worried though that people will forget to call $pl->joinAll()!
>
> Stick it in DESTROY.

I don't see how that would help. I'm thinking of a user writing
something like:

    $pl->share(\%results);
    foreach (0..4) {
        $pl->async(sub { $results{$_} = foobar($_) } );
    }
    $pl->joinAll();
    useResults(\%results);

In this case, at the time of the call to useResults, %results will
contain the finished results from all forked processes, because
$pl->joinAll() waits for them all to finish. If $pl->joinAll() doesn't
get called, the user will most likely see an empty %results. I don't
see how DESTROY comes into play here or could help.

> They're not global. %output can be scoped as tightly as you like around
> the async call: async takes a closure, so it will make available (either
> shared or as copies) any lexicals in scope at the time. (This is why $_
> won't work: it isn't a lexical.)

I think I haven't made my concern clear. Is it possible to do:

    my %resultsForCalc1 : Shared($pl1);

and have the sharing associated with a particular Parallel::Loops
instance (so my attribute handler gets a reference to $pl1, not the
string '$pl1')?

If so, cool. Don't read any further, I'm satisfied (BTW, how?). If
not, let's say one does this:

    my %resultsForCalc1 : Shared;
    my $pl1 = Parallel::Loops->new(4);
    $pl1->foreach([0..9], sub {
        $resultsForCalc1{$_} = doSomething($_);
    });
    useResults(\%resultsForCalc1);

    # Block above duplicated, just s/1/2/g
    my %resultsForCalc2 : Shared;
    my $pl2 = Parallel::Loops->new(4);
    $pl2->foreach([0..9], sub {
        $resultsForCalc2{$_} = doSomething($_);
    });
    useResults(\%resultsForCalc2);

Wouldn't the list ( \%resultsForCalc1, \%resultsForCalc2 ) have to be
global? How would I/perl keep track of the fact that the user only
wants to share %resultsForCalc1 in the first calculation and only
%resultsForCalc2 in the second?

By the way, how would one avoid %foo being handled as shared in the
following case, since it has gone out of scope?

    {
        my %foo : Shared;
    }
    my %resultsForCalc1 : Shared;
    my $pl1 = Parallel::Loops->new(4);
    $pl1->foreach([0..9], sub {
        $resultsForCalc1{$_} = doSomething($_);
    });
    useResults(\%resultsForCalc1);

I don't (yet?) see how I can detect which of the hashes with the
"Shared" attribute are in scope at the time of the $pl1->foreach()
call.

But even if I could detect which of all the shared hashes were in
scope "now", that may not be what the user wants. There could be other
reasons that the user wants %resultsForCalc1 (from way above) in an
outer scope without having it shared in some of the calculations where
it happens to be in scope.

Perhaps we're getting a little off-topic here, but now I'm curious
about the attributes business! ;-)

Peter

Ben Morrow

Jun 26, 2010, 4:52:12 PM

Quoth Peter Valdemar Mørch <4ux6...@sneakemail.com>:

> Commenting on Ben's post out of order:
>
> > > I'm worried though that people will forget to call $pl->joinAll()!
> >
> > Stick it in DESTROY.
>
> I don't see how that would help. I'm thinking of a user writing
> something like:
>
> $pl->share(\%results);
> foreach (0..4) {
> $pl->async(sub { $results{$_} = foobar($_) } );
> }
> $pl->joinAll();
> useResults(\%results);
>
> In this case, at the time of the call to useResults, %results will
> contain the finished results from all forked processes because
> $pl->joinAll() waits for them all to finish. If $pl->joinAll() doesn't get
> called, the user will most likely see an empty %results. I don't see
> how DESTROY comes in to play here or could help.

Well, if the user wrote

    my %results;
    {
        my $pl = Parallel::Loops->new;
        $pl->share(\%results);

        $pl->async(sub { $results{$_} = foobar($_) })
            for 0..4;
    }
    useResults \%results;

then a call to ->joinAll in DESTROY would ensure it was called. Since
variables (particularly those containing potentially-expensive objects,
like $pl) should be minimally scoped, this would be the correct way to
write that code.
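
Something like this, say (a sketch; joinAll and the pendingChildren
field are just the names we have been using in this thread, not a
published API):

    sub DESTROY {
        my ($self) = @_;
        # Join any children still outstanding when the object goes away, so
        # results get copied back even if the user forgot to call joinAll().
        $self->joinAll()
            if $self->{pendingChildren} && @{ $self->{pendingChildren} };
    }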

> > They're not global. %output can be scoped as tightly as you like around
> > the async call: async takes a closure, so it will make available (either
> > shared or as copies) any lexicals in scope at the time. (This is why $_
> > won't work: it isn't a lexical.)
>
> I think I haven't made my concern clear. Is it possible to do:
>
> my %resultsForCalc1 : Shared($pl1);
>
> and have the sharing associated with a particular Parallel::Loops
> instance (so my attribute handler gets a reference to $pl1, not the
> string '$pl1')?

Not easily. Apart from anything else, attribute declarations are
processed at compile-time, before your objects have been constructed.

I was still looking at the question 'why aren't you simply using
forks?'. forks handles all this for you.

> If so, cool. Don't read any further, I'm satisified (BTW, How?). If
> not, lets say one does this:
>
> my %resultsForCalc1 : Shared;
> my $pl1 = Parallel::Loops->new(4);
> $pl1->foreach([0..9], sub {
> $resultsForCalc1{$_} = doSomething($_);
> });
> useResults(\%resultsForCalc1);
>
> # Block above duplicated, just s/1/2/g
> my %resultsForCalc2 : Shared;
> my $pl2 = Parallel::Loops->new(4);
> $pl2->foreach([0..9], sub {
> $resultsForCalc2{$_} = doSomething($_);
> });
> useResults(\%resultsForCalc2);
>
> Wouldn't the list ( \%resultsForCalc1, \%resultsForCalc2 ) have to be
> global?

When you say 'global' you mean 'shared in all P::L instances', right?
Is this a problem? Since (presumably) you would be tying the variable in
the attr handler, just make sure DESTROY and UNTIE for the tied object
take it off the current list. That way, when the shared variable goes
out of scope it will no longer be considered a candidate for sharing.

(You don't even need to do that if you just weaken the refs in your
master list. Perl will replace any that go out of scope with undef.)
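
For instance (a small sketch; the names are invented for the example):

    use Scalar::Util qw(weaken);

    my @sharedVars;                    # master list of shared-variable refs

    sub register_shared {
        my ($ref) = @_;
        push @sharedVars, $ref;
        weaken( $sharedVars[-1] );     # slot becomes undef once the variable is freed
    }

    # When copying results back, only consider variables that are still alive:
    my @live = grep { defined } @sharedVars;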

I don't know how P::L deals with copying the results back. Presumably
you have no idea whether a variable has been modified in the sub-process
or not? What do you do if two sub-processes change the same shared var
in different ways?

> How would I/perl keep track of that the user only wants to
> share %resultsForCalc1 in the first calculation and only
> %resultsForCalc2 in the second?
>
> By the way, how would one avoid that %foo gets handled as shared in
> the following case, since it has gone out of scope?
>
> {
> my %foo : Shared;
> }
> my %resultsForCalc1 : Shared;
> my $pl1 = Parallel::Loops->new(4);
> $pl1->foreach([0..9], sub {
> $resultsForCalc1{$_} = doSomething($_);
> });
> useResults(\%resultsForCalc1);
>
> I don't (yet?) see how I can detect which of the hashes with the
> "Shared" attribute are in scope at the time of the $pl1->foreach()
> call.
>
> But even if I could detect which of all the shared hashes that were in
> scope "now", that may not be what the user wants. There could be other
> reasons that the user wants %resultsForCalc1 (from way above) in an
> outer scope and not have it shared in some of the calculations where
> it happens to be in scope.
>
> Perhaps we're getting a little off-topic here, but now I'm curious
> about the attributes business! ;-)

Not OT at all.

FWIW, I would cast this API rather differently. You don't seem to be
trying to emulate the forks API of 'you can do anything you like', but
instead restricting yourself to iterating over a list. In that case, why
not have the API like

    my $PL = Parallel::Loops->new(sub { dosomething($_) });
    my %results = $PL->foreach(0..9);

No need for any tying, and there's no chance of forgetting the
'->joinAll' since you don't get the results until it's been done. (The
subproc that runs the closure will, of course, get a COW copy of
anything currently in scope, so there's no need to worry about sharing
'read-only' data.)

Ben

Peter Valdemar Mørch

Jun 28, 2010, 4:05:07 AM
On Jun 26, 10:52 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> I was still looking at the question 'why aren't you simply using
> forks?'. forks handles all this for you.

Well, because I don't want the forks API. I want the foreach
syntax. :-) The main reason is that it is so much easier to write and
read later on.

I could've implemented it using forks, but I didn't. Forks _is_
mentioned in the "SEE ALSO" section so users have a chance to explore
alternatives.

> When you say 'global' you mean 'shared in all P::L instances', right?

Yes.

> Is this a problem?

A little bit. To me, that speaks in favor of

    my %output;
    $pl->share(\%output)

over

    my %output : Shared;

(apart from the fact that $pl->share() seems much simpler to
understand and implement)

> (You don't even need to do that if you just weaken the refs in your
> master list. Perl will replace any that go out of scope with undef.)

Ah, good point.

> I don't know how P::L deals with copying the results back. Presumably
> you have no idea whether a variable has been modified in the sub-process
> or not? What do you do if two sub-processes change the same shared var
> in different ways?

I've mentioned in the pod that only setting hash keys and pushing to
arrays is supported in the child. I'll add to that that setting the
same key from different iterations preserves an arbitrary one of them.
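
For illustration, the restriction in use would look something like
this (a sketch that assumes the share() rename discussed above and the
synopsis-style constructor):

    use Parallel::Loops;

    my $pl = Parallel::Loops->new(4);
    my (%output, @found);
    $pl->share( \%output, \@found );

    $pl->foreach( [0..9], sub {
        $output{$_} = sqrt($_);        # setting hash keys in the child: supported
        push @found, $_ if $_ % 2;     # pushing onto shared arrays: supported
        $output{winner} = $_;          # same key set from several iterations:
                                       # an arbitrary one of the values survives
    });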

> FWIW, I would cast this API rather differently.

Yeah, I'm beginning to gather that! :-) Fine, you won't be one of
P::L's users I take it...

> You don't seem to be
> trying to emulate the forks API of 'you can do anything you like', but
> instead restricting yourself to iterating over a list.

Exactly.

> In that case, why not have the API like
>
> my $PL = Parallel::Loops->new(sub { dosomething($_) });
> my %results = $PL->foreach(0..9);

I guess if I change that to:

    my $PL = Parallel::Loops->new( 4 );
    my %results = $PL->foreach( [0..9], sub {
        ( $_ => dosomething($_) )
    });

We could be in business. I'm presuming I can use wantarray() in the
foreach method to test if the caller is going to use the return value
and only transfer the return value from the child if it is going to be
used. It kind of breaks the analogy with foreach but doesn't hurt
otherwise, so why not.
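
As a quick sanity check that wantarray() sees the caller's context
through a method call, here is a throwaway sequential stand-in (no
forking; Demo is just a name for the example):

    use strict;
    use warnings;

    package Demo;
    sub new { bless {}, shift }

    sub foreach {
        my ($self, $input, $body) = @_;
        # In void context the caller won't use the results, so a real
        # implementation could skip shipping them back from the children.
        return unless defined wantarray;
        return map { $body->() } @$input;
    }

    package main;
    my $d = Demo->new;
    my %results = $d->foreach( [0..3], sub { ( $_ => $_ * 2 ) } );   # list context
    $d->foreach( [0..3], sub { () } );                               # void context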

> Well, if the user wrote
>
> my %results;
> {
> my $pl = Parallel::Loops->new;
> $pl->share(\%results);
> $pl->async(sub { $results{$_} = foobar($_) })
> for 0..4;
> }
> useResults \%results;
>
> then a call to ->joinAll in DESTROY would ensure it was called. Since
> variables (particularly those containing potentially-expensive object,
> like $pl) should be minimally-scoped, this would be the correct way to
> write that code.

I don't understand how that can be guaranteed. perldoc perltoot says:

> Perl's notion of the right time to call a destructor is not well-defined
> currently, which is why your destructors should not rely on when they
> are called.

Given that, how can I be sure that DESTROY has been called at the time
of the useResults call?

Peter

Ben Morrow

Jun 28, 2010, 9:29:59 AM

Quoth Peter Valdemar Mørch <4ux6...@sneakemail.com>:

> On Jun 26, 10:52 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> > I was still looking at the question 'why aren't you simply using
> > forks?'. forks handles all this for you.
>
> Well, because I don't want the forks API. I want the foreach
> syntax. :-) The main reason is that it is so much easier to write and
> read later on.

OK.

> > You don't seem to be
> > trying to emulate the forks API of 'you can do anything you like', but
> > instead restricting yourself to iterating over a list.
>
> Exactly.
>
> > In that case, why not have the API like
> >
> > my $PL = Parallel::Loops->new(sub { dosomething($_) });
> > my %results = $PL->foreach(0..9);
>
> I guess if I change that to:
>
> my $PL = Parallel::Loops->new( 4 );
> my %results = $PL->foreach( [0..9], sub {
> ( $_ => dosomething($_) )
> });
>
> We could be in business. I'm presuming I can use wantarray() in the
> foreach method to test if the caller is going to use the return value
> and only transfer the return value from the child if it is going to be
> used. It kind of breaks the analogy with foreach but doesn't hurt
> otherwise, so why not.

It's now more analogous to map than foreach, but I don't see that as a
problem.

>
> > Well, if the user wrote
> >
> > my %results;
> > {
> > my $pl = Parallel::Loops->new;
> > $pl->share(\%results);
> > $pl->async(sub { $results{$_} = foobar($_) })
> > for 0..4;
> > }
> > useResults \%results;
> >
> > then a call to ->joinAll in DESTROY would ensure it was called. Since
> > variables (particularly those containing potentially-expensive object,
> > like $pl) should be minimally-scoped, this would be the correct way to
> > write that code.
>
> I don't understand how that can be guaranteed. perldoc perltoot says:
>
> > Perl's notion of the right time to call a destructor is not well-defined
> > currently, which is why your destructors should not rely on when they
> > are called.
>
> Given that, how can i be sure that DESTROY has been called at the time
> of the useResults call?

Hmm, I'd forgotten that was there. It's complete nonsense: in Perl 5,
destructors are always called promptly, and there are *lots* of modules
relying on that fact so it isn't going to go away. (Perl 6 is a
different matter, of course.)

Ben

Willem

Jun 28, 2010, 11:07:10 AM
Peter Valdemar Mørch wrote:
)> > my %output;
)> > $pl->tieOutput( \%output );
)>

)> Why are you using tie here?
)
) Hmm... I thought the idea would be more obvious than it apparently
) is...
)
) Outside the $pl->foreach() loop, we're running in the parent process.
) Inside the $pl->foreach() loop, we're running in a child process. $pl-
)>tieOutput is actually the raison d'etre of Parallel::Loops. When the
) child process has a result, it stores it in %output (which is tied
) with Tie::Hash behind the scenes in the child process).
)
) Behind the scenes, when the child process exits, it sends the results
) (the keys written to %output) back to the parent process's version/
) copy of %output, so that the user of Parallel::Loops doesn't have to
) do any inter-process communication.

Isn't there some easier method, where you don't have to screw around with
output maps at all ?

If the following API would work, that would be the easiest, IMO:

my @result = async_map { do_something($_) } @array;

Where async_map takes care of all the details of creating the threads,
gathering all the output, et cetera. Or does that already exist ?

(The simple implementation is only a few lines of code, but it could
then be easily extended to use a limited number of threads, or keep
a thread pool handy, or something like that.)
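
Something along these lines, say (a thread-based sketch of the idea;
a real version would want to cap the number of threads):

    use strict;
    use warnings;
    use threads;

    sub async_map (&@) {
        my ($code, @list) = @_;
        my @thr = map {
            my $item = $_;
            threads->create( sub { local $_ = $item; $code->() } );
        } @list;
        return map { $_->join } @thr;
    }

    my @result = async_map { $_ * $_ } 1 .. 5;    # (1, 4, 9, 16, 25)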


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Peter Valdemar Mørch

Jun 29, 2010, 3:06:41 AM
On Jun 28, 5:07 pm, Willem <wil...@turtle.stack.nl> wrote:
> Isn't there some easier method, where you don't have to screw around with
> output maps at all ?
>
> If the following API would work, that would be the easiest, IMO:
>
>   my @result = async_map { do_something($_) } @array;
>
> Where async_map takes care of all the details of creating the threads,
> gathering all the output, et cetera.  Or does that already exist ?

It doesn't pre-exist to my knowledge. Not with such a simple syntax.

Ben wrote (in another branch of this thread,
http://groups.google.com/group/comp.lang.perl.misc/msg/0dbec9f2d0e37750):


> It's now more analogous to map than foreach, but I don't see that as a
> problem.

Given these two inputs (thank you!), I propose an addition to
$pl->foreach and $pl->while:

    my @result = $pl->map( \@array, sub { do_something($_) } );

And it will be $pl->map(...) rather than a bare async_map {}, so that
the object holds the number of processes to use. Alternatively,
async_map would have to be passed the number of processes to use, which
is also a possibility. Or the number of processes could be shared among
all Parallel::Loops async_map calls (which I like less).

> (The simple implementation is only a few lines of code, but it could
> then be easily extended to use a limited number of threads, or keep
> a thread pool handy, or something like that.)

The problem with a thread pool is that then we need to keep all
variables synchronized between them. And I'm focusing on forking - not
threads - here.

But yeah, it isn't that difficult to write. Already there is more pod
than code! :-) There have just been so many instances where I find
myself thinking: "This loop could and should be parallelized, but (I'm
too lazy|the schedule is too tight|who cares) right now."

Peter

Ted Zlatanov

Jun 29, 2010, 10:47:42 AM
On Fri, 25 Jun 2010 23:34:14 +0100 Ben Morrow <b...@morrow.me.uk> wrote:

BM> Personally I find

BM> my %output :shared;

BM> for my $i (@input) {
BM> async {
BM> $output{$i} = do_some_hefty_calculation($i);
BM> }
BM> }

BM> somewhat clearer, but that's just a matter of taste. (With 5.10
BM> presumably a 'my $_' would make $_ work too.)

I personally don't like "inline tagged" code blocks as much as passing
them off to a library subroutine. Inline tagging IMO creates spaghetti
code and is harder to refactor. But I can see the appeal :)

On Tue, 29 Jun 2010 00:06:41 -0700 (PDT) Peter Valdemar Mørch <4ux6...@sneakemail.com> wrote:

PVM> The problem with a thread pool is that then we need to keep all
PVM> variables synchronized between them. And I'm focusing on forking -
PVM> not threads - here.

Please don't try to make your module do everything for everyone. It's
OK to say "it won't support XYZ." Do a few things well rather than many
things badly.

Ted
