
Forking


Jason Price

Mar 3, 2004, 11:35:35 AM
to begi...@perl.org
Not sure if this is the right list for this - if it's not, please direct me
to the proper list.

Anyway, I'm trying to get my hands around forking, and was hoping you all
could help me out. Basically, I'm trying to find a way to fire off a remote
script on numerous boxes in parallel, returning their results to the parent
script. Here's the basic flow I'm after:

1. User brings up web page for an on-demand report. Provides user input,
hits submit, which fires off the parent script.
2. Parent script takes user input, and fires off a remote script located on
all servers provided by user input.
3. Remote scripts return results to an array in the parent script.
4. Parent script compiles results and formats output for web display.

The process currently works, but runs each remote server in series, which
takes a considerable amount of time. I've had a hell of a time finding a
good explanation of forking, and I only seem to be able to figure out how to
fork one process at a time. I'm also unsure if the parent can utilize
variables populated in a child, or if they're completely independent after
the fork.

Anyone have any advice, input, or code snippets for me?

Thanks.

Jason

Bob Showalter

Mar 3, 2004, 2:16:46 PM
to Price, Jason, begi...@perl.org
Price, Jason wrote:
> Not sure if this is the right list for this - if it's not, please
> direct me to the proper list.

You've come to the right place.

>
> Anyway, I'm trying to get my hands around forking, and was hoping you
> all could help me out. Basically, I'm trying to find a way to fire
> off a remote script on numerous boxes in parallel, returning their
> results to the parent script. Here's the basic flow I'm after:
>
> 1. User brings up web page for an on-demand report. Provides user
> input, hits submit, which fires off the parent script.
> 2. Parent script takes user input, and fires off a remote script
> located on all servers provided by user input.
> 3. Remote scripts return results to an array in the parent script.
> 4. Parent script compiles results and formats output for web display.
>
> The process currently works, but runs each remote server in series,
> which takes a considerable amount of time. I've had a hell of a time
> finding a good explanation of forking, and I only seem to be able to
> figure out how to fork one process at a time. I'm also unsure if the
> parent can utilize variables populated in a child, or if they're
> completely independent after the fork.

No. The parent cannot see any variables in the child. You need to use some
form of IPC to communicate between the processes. I would suggest using a
pipe. Here's an example of a parent that forks off three children, and then
reads data back from the children through a common pipe:

#!/usr/bin/perl

use strict;
$| = 1;
my $nc = 3;              # number of children to create
pipe READER, WRITER;     # pipe for communication

for my $c (1 .. $nc) {
    # create a child process
    defined(my $pid = fork) or die "Couldn't fork: $!";
    next if $pid;        # parent loops to create next child

    # child does its thing and writes back to parent through pipe
    close READER;
    select WRITER;
    $| = 1;
    print "Hello, I am child $c, and my PID is $$\n";
    sleep rand(5) + 1;
    print "Goodbye from child $c\n";
    exit;                # child exits (IMPORTANT!)
}

# parent reads from children
# pipe will close when last child exits
close WRITER;
while (<READER>) {
    print $_;
}

1 while wait() > 0;      # reap all exit statuses

Sample output:

$ perl myscript.pl
Hello, I am child 1, and my PID is 16774
Hello, I am child 2, and my PID is 16775
Hello, I am child 3, and my PID is 16776
Goodbye from child 2
Goodbye from child 1
Goodbye from child 3

If you need explanation of any of that, let me know.

Jason Price

Mar 3, 2004, 2:47:44 PM
to Bob Showalter, Price, Jason, begi...@perl.org
Bob,

Thanks for the input - it's quite helpful. However, I don't fully
understand some of the code - maybe you could help clear it up for me. The
parts I'm unclear on are:

- the usage of "pipe READER, WRITER", and then the subsequent references to
READER and WRITER.
- the usage of $|
- "1 while wait() > 0"

Hmm...I guess that's the majority of the script. :) I can follow what it
does, but I'm not entirely sure why it does it.

Also, is there any way I can self-contain the output from each child
process?

Thanks.

Jason

Wolf Blaum

Mar 3, 2004, 4:03:16 PM
to Price, Jason, Bob Showalter, begi...@perl.org
On Wednesday 03 March 2004 20:47, Price, Jason generously enriched virtual
reality by making up this one:

Hi

> Thanks for the input - it's quite helpful.

and nice:-)

> However, I don't fully
> understand some of the code - maybe you could help clear it up for me. The
> parts I'm unclear on are:
>
> - the usage of "pipe READER, WRITER", and then the subsequent references to
> READER and WRITER.

pipe takes two filehandles: a read handle and a write handle. All children
inherit copies of the parent process's filehandles.
Now the children close the read handle, since they only want to report back to
the parent, and they make the write handle the default filehandle for output
using the select command.

The parent, on the other hand, doesn't need the write handle and reads from the
read handle until EOF, which occurs after every child has closed its copy of
the write handle (by exiting) - the OS keeps track of the handle and treats the
pipe as closed only after the last process using the write end has closed it.


> - the usage of $|

Setting $| to a non-zero value enables autoflush, i.e. it turns output
buffering off and thus gives you a "hot" handle - for example, STDOUT is
usually line buffered. The default for $| is 0.

> - "1 while wait() > 0"

translates to: "Do nothing while I still have children out there." wait()
blocks until a child terminates and returns its PID; it returns -1 once there
are no children left, which is what ends the loop.

>
> Hmm...I guess that's the majority of the script. :) I can follow what it
> does, but I'm not entirely sure why it does it.
>
> Also, is there any way I can self-contain the output from each child
> process?

What about prepending a "$$ says:" to each line of child output and doing an
m// in the parent process?
(Not too smart, but it's all I can come up with, and less complicated than
having separate handles for each child :-)
I'd be interested in how you solve that.

HTH, Wolf

Bob Showalter

Mar 3, 2004, 3:57:27 PM
to Price, Jason, begi...@perl.org
Price, Jason wrote:
> Bob,
>
> Thanks for the input - it's quite helpful. However, I don't fully
> understand some of the code - maybe you could help clear it up for
> me. The parts I'm unclear on are:
>

Wolf's already explained most everything. I'll throw in a bit more...

> - the usage of "pipe READER, WRITER", and then the subsequent
> references to READER and WRITER.

pipe() creates a pair of handles that are connected. Anything written to
WRITER can be read from READER. After the fork, both parent and child have a
copy of each handle. So, the child can talk back to the parent by writing
data to WRITER.

Since all the children get a copy of WRITER, they are all writing to the
same pipe. That way, the parent can read the data from all the children on
READER. You might wonder whether data from the multiple writers might get
intermingled. As long as each child writes no more than PIPE_BUF bytes at a
time (a system-specific limit, which POSIX requires to be at least 512 bytes
IIRC), the writes will be atomic.
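
If you're curious what the limit actually is on your system, POSIX's
fpathconf() will report it for the pipe - a small side check you could drop in
right after the pipe() call in the script above (not part of the original
example):

use POSIX ();
# the atomic-write limit for this pipe (commonly 4096 on Linux, POSIX minimum 512)
my $pipe_buf = POSIX::fpathconf(fileno(WRITER), POSIX::_PC_PIPE_BUF());
print "PIPE_BUF here is $pipe_buf bytes\n";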

> - the usage of $|

It's important to flush each write to the pipe to avoid the intermingling.

> - "1 while wait() > 0"

That just reaps the exit statuses to prevent zombies; the children have
already exited (otherwise the read loop wouldn't have finished). You may or
may not care about the exit statuses. You can also usually use $SIG{CHLD} =
'IGNORE' prior to the forking loop if you don't care about them. See "perldoc
perlipc".

>
> Hmm...I guess that's the majority of the script. :) I can follow
> what it does, but I'm not entirely sure why it does it.
>
> Also, is there any way I can self-contain the output from each child
> process?

You can use a separate pipe for each child, but then you have the problem of
reading from multiple handles, which requires using select() or some such.
I'd go with Wolf's recommendation of having each child pass back its identity
with each message. You can use the identity to split the output back into
arrays or whatever.
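
Here's a minimal sketch of that approach, wired up for your original problem.
It assumes the remote scripts are started with ssh and that
/path/to/remote_script is a placeholder you'd fill in - adjust both, and the
host list, to whatever you actually use:

#!/usr/bin/perl
use strict;
use warnings;

# In the real CGI these would come from the user's form input
my @hosts = qw(server1 server2 server3);

pipe READER, WRITER or die "pipe: $!";

for my $host (@hosts) {
    defined(my $pid = fork) or die "Couldn't fork: $!";
    next if $pid;                    # parent loops to start the next child

    # child: run the remote script and tag each output line with the host
    close READER;
    select WRITER;
    $| = 1;
    for my $line (`ssh $host /path/to/remote_script`) {
        print "$host: $line";        # one line per print, so it stays atomic
    }
    exit;
}

# parent: read the shared pipe and sort lines into per-host arrays
close WRITER;
my %results;
while (my $line = <READER>) {
    chomp $line;
    my ($host, $text) = split /: /, $line, 2;
    push @{ $results{$host} }, $text;
}
1 while wait() > 0;                  # reap the children

for my $host (sort keys %results) {
    print "--- $host ---\n", map { "$_\n" } @{ $results{$host} };
}

Each tagged line goes out in a single print and stays under PIPE_BUF, so the
output sorts back out cleanly even though all the children share one pipe.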

Wolf Blaum

Mar 3, 2004, 5:31:46 PM
to Bob_Sh...@taylorwhite.com, begi...@perl.org
On Wednesday 03 March 2004 21:57, Bob Showalter generously enriched virtual
reality by making up this one:

Hi,

> > - "1 while wait() > 0"
>
> That just reaps the exit statuses to prevent zombies; the children have
> already exited (otherwise the loop wouldn't have exited.) You might want
> the exit statuses or not. You can also usually use $SIG{CHLD} = 'IGNORE'
> prior to the forking loop if you don't care about exit statuses. see
> "perldoc perlipc"

Uh, I apologize - anyway, could you explain that last part to me?
Do you create zombies if you don't handle the exit status of your child by
either wait()ing or setting the signal handler? And what does that zombie do
anyway?

Thx, wolf

Bob Showalter

Mar 3, 2004, 5:38:50 PM
to wolf blaum, begi...@perl.org

When a process ends, the OS keeps the process table entry around until the
parent reaps the exit status by calling wait(). If the parent process is
long-running and doesn't reap the children, you have these "zombie" process
table entries lying around. (The process itself is gone; just the process
table entry remains.)

If the parent exits, the children are inherited by the "init" process, which
periodically reaps any exit statuses.

So, the only problem is really if the parent is long-running.
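
For a long-running parent that doesn't want to block in wait(), the usual
idiom (it's the REAPER example in "perldoc perlipc") is a CHLD handler that
reaps whatever has exited so far without blocking - a quick sketch:

use POSIX ':sys_wait_h';

# reap any children that have already exited, without blocking
$SIG{CHLD} = sub {
    1 while waitpid(-1, WNOHANG) > 0;
};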

Wiggins D'Anconia

Mar 8, 2004, 11:08:45 PM
to begi...@perl.org
Bob Showalter wrote:

> Price, Jason wrote:
>
>
>>Hmm...I guess that's the majority of the script. :) I can follow
>>what it does, but I'm not entirely sure why it does it.
>>
>>Also, is there any way I can self-contain the output from each child
>>process?
>
>
> You can use a separate pipe for each child, but now you have the problem of
> reading from multiple handles, which requires using select() or some such. I
> go with Wolf's recommendation of having the child pass back his identity
> with each message. You can use the identity to split the output back into
> arrays or whatever.
>

This is where I do my usual little dance, though untimely as it may be
in this case, across the stage and say "POE" then exit again as if I was
still on vacation....

Though in this case I might be a tad more specific and say
POE::Wheel::Run...

http://danconia.org

Michael C. Davis

Mar 9, 2004, 10:13:23 AM
to Wiggins d'Anconia, begi...@perl.org
At 11:08 PM 3/8/04 -0500, Wiggins d'Anconia wrote:
>This is where I do my usual little dance, though untimely as it may be
>in this case, across the stage and say "POE" then exit again as if I was
>still on vacation....

OK, I'll bite. What's so great about POE, and why, oh, why, do you love it
so?

Shiping Wang

Mar 9, 2004, 10:35:26 AM
to begi...@perl.org, shiping Wang
Hello,

I'm trying to slice an AoA into two subsets, then put them back together.
The problem is that when I put them back, I can't get what I want. For
example, here is the original array:

a b 3 4
c d 2 5
e f 13 8

Slice it into two pieces:

print Dumper of array A
a b
c d
e f

print Dumper of array B

3 4
2 5
13 8
put them back once
Here we put two subsets back together
a b 3 4
c d 2 5
e f 13 8

So far it's OK. However, if I do it once more, it looks like this:
a b 3 4 a b 3 4
c d 2 5 c d 2 5
e f 13 8 e f 13 8

But what I'm trying to get is this:
a b a b
c d c d
e f e f

How can I do it, and where am I going wrong?

Thanks,

Shiping
_______________________________________________________________________________
use strict;
use warnings;
use Data::Dumper;

my (@a, @b);
my @all = (['a', 'b', 3, 4], ['c', 'd', 2, 5], ['e', 'f', 13, 8]);

print "Original array\n";
foreach my $i (@all) {
    print join " ", @{$i}, "\n";
    # cut array in columns;
    my @tmp1 = @{$i}[0,1];
    push @a, \@tmp1;
    my @tmp2 = @{$i}[2,3];
    push @b, \@tmp2;
}
print "\n";

print "print Dumper of array A\n";
foreach my $k (@a) {
    print join " ", @{$k}, "\n";
}
print "\n";
# print Dumper @a;

print "print Dumper of array B\n";
print "\n";
foreach my $k (@b) {
    print join " ", @{$k}, "\n";
}
# print Dumper @b;

my @backcolumn1 = @a;

if (scalar @backcolumn1 != scalar @b) {
    exit(0);
} else {
    for my $i (0 .. $#backcolumn1) {
        push @{$backcolumn1[$i]}, @{$b[$i]};
    }
}
print "\n";

print "Here we put two subset arrays back together\n";
foreach my $k (@backcolumn1) {
    print join " ", @{$k}, "\n";
}
######################################################;
my @backcolumn2 = @a;
if (scalar @backcolumn2 != scalar @a) {
    exit(0);
} else {
    for my $i (0 .. $#backcolumn2) {
        push @{$backcolumn2[$i]}, @{$a[$i]};
    }
}
print "\n";
sleep(1);
print "Here again we put another two subset arrays back together\n";
foreach my $k (@backcolumn2) {
    print join " ", @{$k}, "\n";
}

Wiggins D Anconia

Mar 9, 2004, 10:54:26 AM
to Michael C. Davis, begi...@perl.org

Well, that is a tough one. That is the problem with POE, steep learning
curve. The reason why it is so great is that it allows you to take some
of those things that "Perl makes possible" and turn them back into
things that "Perl makes easy".

In this case the desire to fork multiple processes and maintain
bi-directional communication with them becomes a nightmare of dealing
with forking code, pipes, and all of the other IPC nasties.
POE::Wheel::Run encapsulates all of that providing a very simple
interface for dealing with forking those processes and handling their
input/output through events. So all of the pipe/select stuff gets
hidden. Obviously there are some limitations of what you can do (aka
real-time kind of loses some of its meaning) etc. but for the most part
hiding the gory details is usually not a problem. The same can be said
for TCP servers, and other types of daemons.
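
To make that concrete, here's a rough sketch of what the original
ssh-in-parallel problem might look like with POE::Wheel::Run. Treat it as an
illustration of the interface rather than a drop-in solution - the hosts, the
ssh command line, and the event names are all made up:

use strict;
use warnings;
use POE qw(Wheel::Run);

POE::Session->create(
    inline_states => {
        _start => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];
            for my $host (qw(server1 server2 server3)) {
                my $wheel = POE::Wheel::Run->new(
                    Program     => [ 'ssh', $host, '/path/to/remote_script' ],
                    StdoutEvent => 'child_stdout',
                    CloseEvent  => 'child_closed',
                );
                $kernel->sig_child($wheel->PID, 'child_reaped');
                $heap->{host_of}{ $wheel->ID } = $host;
                $heap->{wheels}{ $wheel->ID }  = $wheel;
            }
        },
        child_stdout => sub {
            my ($heap, $line, $wheel_id) = @_[HEAP, ARG0, ARG1];
            push @{ $heap->{output}{ $heap->{host_of}{$wheel_id} } }, $line;
        },
        child_closed => sub {
            my ($heap, $wheel_id) = @_[HEAP, ARG0];
            delete $heap->{wheels}{$wheel_id};   # this child's output is done
        },
        child_reaped => sub { },                 # exit status shows up here if you care
    },
);

POE::Kernel->run();

Each child's output arrives as events, so the pipe/select bookkeeping from
earlier in the thread simply disappears.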

I love it because it made an application that was a major pain in the
rear-end incredibly elegant and simple in a very short amount of time
(aka after I got over the learning curve).

http://danconia.org

Michael C. Davis

Mar 9, 2004, 11:01:43 AM
to Wiggins d Anconia, begi...@perl.org
At 08:54 AM 3/9/04 -0700, Wiggins d Anconia wrote:
>In this case the desire to fork multiple processes and maintain
>bi-directional communication with them becomes a nightmare of dealing
>with forking code, pipes, and all of the other IPC nasties.
>POE::Wheel::Run encapsulates all of that providing a very simple
>interface for dealing with forking those processes and handling their
>input/output through events. So all of the pipe/select stuff gets
>hidden. Obviously there are some limitations of what you can do (aka
>real-time kind of loses some of its meaning) etc. but for the most part
>hiding the gory details is usually not a problem. The same can be said
>for TCP servers, and other types of daemons.

So, it's a tool for managing a family of forked processes? Its primary
focus is IPC-related functionality?

Wiggins D Anconia

Mar 9, 2004, 11:52:22 AM
to Michael C. Davis, begi...@perl.org

Not exactly; that would be pigeonholing it. While it can handle those
things very well, it is certainly not limited to them. It is easiest to
think of it as a framework for handling time slicing, or event
programming tasks, or multi-tasking processes. A good way to see what
it is capable of is to check out the cookbook found here:

http://poe.perl.org/?POE_Cookbook

But realize that those are provided as documentation rather than as an
exhaustive list of what it can do. There really isn't an exhaustive list
of what it can do since it is a framework rather than any specific
functionality. I was trying to come up with a good analogy but couldn't
seem to find an appropriate one....

http://danconia.org

Michael C. Davis

Mar 9, 2004, 12:24:49 PM
to Wiggins d Anconia, begi...@perl.org
At 09:52 AM 3/9/04 -0700, Wiggins d Anconia wrote:
> ...

>a framework for handling time slicing, or event
>programming tasks, or multi-tasking processes

If that's what it is, these terms localize it pretty well for me. "An
event-based framework for handling time-slicing in multitasking processes."

Wiggins D Anconia

Mar 9, 2004, 1:32:00 PM
to Michael C. Davis, begi...@perl.org

That represents a good generalization I think. The introduction on
poe.perl.org is better, but longer.

Of course using "event based framework for handling time-slicing in
multitasking processes" often won't work any better than a "famboozled
watchamathingie with xanatically whooziewhatsit" on a beginners list ;-)...

http://danconia.org

Michael C. Davis

Mar 9, 2004, 1:43:56 PM
to Wiggins d Anconia, begi...@perl.org
At 11:32 AM 3/9/04 -0700, Wiggins d Anconia wrote:
> ... "famboozled watchamathingie with xanatically whooziewhatsit" on a
beginners list ;-)...


Oh yeah, since you bring it up .... about the whooziewhatsit ...

Randal L. Schwartz

Mar 9, 2004, 5:58:32 PM
to begi...@perl.org
>>>>> "Michael" == Michael C Davis <mcdav...@knology.net> writes:

Michael> At 09:52 AM 3/9/04 -0700, Wiggins d Anconia wrote:
>> ...
>> a framework for handling time slicing, or event
>> programming tasks, or multi-tasking processes

Michael> If that's what it is, these terms localize it pretty well for me. "An
Michael> event-based framework for handling time-slicing in multitasking processes."

"An event loop on steroids"

Most of us have written programs to do many things, but usually in an
orderly fashion. If you've ever found yourself writing a program that
was trying to do many things *at once* in an orderly fashion, you'll
want an event loop. An event loop associates reactions with each of a
set of actions of interest.

A simple version of this might be like trying to watch two growing
files at once, and interleaving the results into one larger file.
With "simple" Perl, this is pretty hard, because you don't have
asynchronous I/O. With POE, you can grab the lines from "both" files
at the "same" time in an orderly way, performing identical or distinct
actions as each line is seen.
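
To make that two-files example concrete, here's a sketch with
POE::Wheel::FollowTail - not code from the column, and the file names are
invented:

use strict;
use warnings;
use POE qw(Wheel::FollowTail);

POE::Session->create(
    inline_states => {
        _start => sub {
            my $heap = $_[HEAP];
            for my $file ('/var/log/first.log', '/var/log/second.log') {
                my $wheel = POE::Wheel::FollowTail->new(
                    Filename   => $file,
                    InputEvent => 'got_line',
                );
                $heap->{file_of}{ $wheel->ID } = $file;
                $heap->{wheels}{ $wheel->ID }  = $wheel;
            }
        },
        got_line => sub {
            my ($heap, $line, $wheel_id) = @_[HEAP, ARG0, ARG1];
            # lines from both files arrive interleaved, as each one appears
            print "$heap->{file_of}{$wheel_id}: $line\n";
        },
    },
);

POE::Kernel->run();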

For example, in
<http://www.stonehenge.com/merlyn/PerlJournal/col01.html>, I used POE
to write a process that tails a file, noting the time at which each
line appears, and in parallel also responding to one (or many) web
browsers as a web server, delivering a color-coded version of the
trailing lines of the file. In
<http://www.stonehenge.com/merlyn/LinuxMag/col41.html>, I used POE in
a parallel link checker, overlapping DNS lookups with page fetch
response times to fetch things three to ten times faster in a single
process than I might have done without POE. In
<http://www.stonehenge.com/merlyn/PerlJournal/col09.html>, I tail a
logfile (I do that a lot :) and deliver it to an IRC channel,
throttled appropriately so that I don't get booted off for channel
flooding.

POE is like forking, without the complete separation of data
and troublesome IPC issues.

POE is like threads, but with built-in data locking.

POE requires some discipline, but can be very powerful when used
correctly.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Charles K. Clarkson

Mar 9, 2004, 7:29:09 PM
to begi...@perl.org
Shiping Wang <shi...@wubios.wustl.edu> wrote:

: I'm trying to slice an AoA into two subsets, then put them back together.
: The problem is that when I put them back, I can't get what I want.
:
: How can I do it, and where am I going wrong?

Let's take the last part first:

: ... and where am I going wrong?
[snip]
: my @backcolumn2 = @a;

Insert:

    print Dumper \@a;

and you'll see that @a is already "back together". The assignment
my @backcolumn1 = @a; copied only the references to the inner arrays, so
pushing onto @{$backcolumn1[$i]} also modified the rows @a points to.
Since @a now has four columns, the following adds it to itself.

: if (scalar @backcolumn2 != scalar @a) {
:     exit(0);
: } else {
:     for my $i (0 .. $#backcolumn2) {
:         push @{$backcolumn2[$i]}, @{$a[$i]};
:     }
: }
: print "\n";
: sleep(1);
: print "Here again we put another two subset arrays back together\n";
: foreach my $k (@backcolumn2) {
:     print join " ", @{$k}, "\n";
: }
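
If you want a copy you can push onto without also modifying @a, you have to
copy the inner arrays as well - a small illustrative sketch, not from the
original post:

    my @copy = map { [ @$_ ] } @a;   # new inner arrays, so pushing onto @copy leaves @a alone

The same applies to @backcolumn1 earlier; with shallow copies, both joins end
up writing into @a's own rows. The column_join_array routine below avoids this
by building new inner arrays.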


: How can I do it ... ?

You could use subroutines to make the program easier to read and
to limit the number of variable names you need.

First, let's write a sub for printing the AoA. This is a bit
fancier than your version, but the results are similar. The default
column width is 3.

sub dump_aoa {
    my $aoa          = shift;
    my $column_width = shift || 3;

    foreach my $array ( @$aoa ) {
        printf "% ${column_width}s" x @$array, @$array;
        print "\n";
    }
    print "\n\n";
}


Here's a routine to split the columns. It requires the column
quantity for the left array. It returns two array refs. One for the
left columns and one for the right columns. It needs better error
checking and a default $column to split on.

sub column_split_array {
    my $array_ref = shift;
    my $column    = shift;

    my( @left, @right );
    foreach my $array ( @$array_ref ) {
        push @left,  [ @$array[ 0 .. $column - 1 ] ];
        push @right, [ @$array[ $column .. $#$array ] ];
    }
    return( \@left, \@right );
}


Here's one for joining two AoAs. It takes references to two AoAs
and returns a reference to one AoA. It dies on arrays of unequal size,
but doesn't test the size of the inner arrays. It also needs better
error checking.

sub column_join_array {
    my( $left_array, $right_array ) = @_;
    die "Arrays must be same size.\n"
        unless @$left_array == @$right_array;

    my @return_array;
    foreach my $index ( 0 .. $#$left_array ) {
        push @return_array, [
            @{ $left_array->[ $index ] },
            @{ $right_array->[ $index ] },
        ];
    }
    return \@return_array;
}


With these defined your script could be written as:

my @array = (
    [ 'a', 'b', 1, 2 ],
    [ 'c', 'd', 3, 4 ],
    [ 'e', 'f', 5, 6 ],
);

my( $left, $right ) = column_split_array( \@array, 2 );

dump_aoa( $left );
dump_aoa( $right );

dump_aoa( column_join_array( $left, $right ) );

dump_aoa( column_join_array( $left, $left ) );


HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
254 968-8328
