Strange system() slowdown when using Inline::C

rbg

unread,

Mar 5, 2009, 12:15:01 PM3/5/09

to

Before reading the code below, I want to make it clear that this code
is just a *simple reproduction* of a bug I'm seeing in some
computational biology code. So please don't point out things like,
"why are you using system() to run date?!?". :-)

The problem I'm having is that when I make the array foo[] large, the
calls to system() slow down enormously. I'm measuring this problem
using `perl -d:DProf ...`

Test case 1 below is slow, test case 2 is faster. The only difference
is LEN = (1<<26) in test case 1 and LEN = (1<<10) in test case 2. I
welcome any ideas on how to debug this bizarre problem.

Thanks!
Ross

Test case 1:
-------------------------
use Inline C;
use strict;
hello();

sub foo {
for my $i (1..1000) {
system('date > /dev/null');
}
}
foo();
__END__
__C__
#define LEN (1<<26)
static char foo[LEN];

void hello() {
int i;
printf("hello world\n");
for (i = 0; i < LEN; i++) {
foo[i] = 0;
}
}
-------------------------

Test case 2
-------------------------
use Inline C;
use strict;
hello();

sub foo {
for my $i (1..1000) {
system('date > /dev/null');
}
}
foo();
__END__
__C__
#define LEN (1<<10)
static char foo[LEN];

void hello() {
int i;
printf("hello world\n");
for (i = 0; i < LEN; i++) {
foo[i] = 0;
}
}
-------------------------

rbg

unread,

Mar 5, 2009, 12:24:49 PM3/5/09

to

I should include the DProf output for these cases.

Test 1:

Total Elapsed Time = 12.65265 Seconds
User+System Time = 3.202649 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
90.5 2.900 2.900 1 2.9000 2.9000 main::foo

Test 2:

Total Elapsed Time = 4.697635 Seconds
User+System Time = 0.337635 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
77.0 0.260 0.260 1 0.2600 0.2600 main::foo

You can see that in test 2, foo() is roughly 10x faster.

sisyphus

unread,

Mar 5, 2009, 6:18:13 PM3/5/09

to

On Mar 6, 4:24 am, rbg <ross.girsh...@gmail.com> wrote:

> You can see that in test 2, foo() is roughly 10x faster.

What happens if hello() is written as a perl sub (and Inline::C is not
used at all) - something like (untested):

use constant LEN => 1<<26;

sub hello {
my $i;
my @foo;
print "hello world\n";
for($i = 0; $i < LEN; $i++) {
$foo[$i] = chr(0);
}
}

Does the speed with which foo() executes still vary in the same way,
depending upon the value of LEN ?
Or does the variation occur *only* when hello() is written as an
Inline::C sub ?

Cheers,
Rob

rbg

unread,

Mar 5, 2009, 6:21:21 PM3/5/09

to

Continuing the discussion with myself, it seems that the problem is
really much more basic.

This script:
--------
use strict;
my $bf = 'a' x (1<<26);

sub foo {
for my $i (1..1000) {
system('date > /dev/null');
}
}
foo();

--------

is much slower than this one than this one:

--------
use strict;
my $bf = 'a' x (1<<10);

sub foo {
for my $i (1..1000) {
system('date > /dev/null');
}
}
foo();

--------.

I can reproduce this issue ubuntu with perl 5.10.0 (and debian with
perl 5.8.8), but *not* on OS X with perl 5.8.8. It seems like it has
something to do with how the perl interpreter is cloned when system()
is called (seems that it isn't actually doing copy-on-write forking).
Anyone have a deeper understanding of this issue?

rbg

unread,

Mar 5, 2009, 6:22:50 PM3/5/09

to

On Mar 5, 5:21 pm, rbg <ross.girsh...@gmail.com> wrote:
> Continuing the discussion with myself, it seems that the problem is
> really much more basic.

Oops, I wrote that before I noticed someone else had replied! I
didn't intend to be rude :-).

A. Sinan Unur

unread,

Mar 5, 2009, 6:46:06 PM3/5/09

to

rbg <ross.g...@gmail.com> wrote in news:4eeea107-69d5-4e3d-9610-
7792ca...@c11g2000yqj.googlegroups.com:

> Continuing the discussion with myself, it seems that the problem is
> really much more basic.
>
> This script:
> --------
> use strict;
> my $bf = 'a' x (1<<26);
>
> sub foo {
> for my $i (1..1000) {
> system('date > /dev/null');
> }
> }
> foo();
> --------
>
> is much slower than this one than this one:
>
> --------
> use strict;
> my $bf = 'a' x (1<<10);
>
> sub foo {
> for my $i (1..1000) {
> system('date > /dev/null');
> }
> }
> foo();
> --------.

I don't see that.

This is perl, v5.10.0 built for MSWin32-x86-multi-thread
(with 5 registered patches, see perl -V for more detail)

Binary build 1004 [287188] provided by ActiveState

C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat t.pl

use strict;
my $bf = 'a' x (1<<26);

sub foo {
for my $i (1..1000) {

system('date /t > NUL');
}
}
foo();

TimeThis : Command Line : t.pl
TimeThis : Start Time : Thu Mar 05 18:39:11 2009
TimeThis : End Time : Thu Mar 05 18:40:11 2009
TimeThis : Elapsed Time : 00:01:00.078

C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat s.pl

use strict;
my $bf = 'a' x (1<<10);

sub foo {
for my $i (1..1000) {

system('date /t > NUL');
}
}
foo();

TimeThis : Command Line : s.pl
TimeThis : Start Time : Thu Mar 05 18:41:35 2009
TimeThis : End Time : Thu Mar 05 18:42:34 2009
TimeThis : Elapsed Time : 00:00:59.640

C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat z.pl
use strict;

sub foo {
for my $i (1..1000) {

system('date /t > NUL');
}
}
foo();

TimeThis : Command Line : z.pl
TimeThis : Start Time : Thu Mar 05 18:43:11 2009
TimeThis : End Time : Thu Mar 05 18:44:10 2009
TimeThis : Elapsed Time : 00:00:59.015

C:\DOCUME~1\asu1\LOCALS~1\Temp\q> cat u.pl

use strict;
my $bf = 'a' x (1<<26);

TimeThis : Command Line : u.pl
TimeThis : Start Time : Thu Mar 05 18:45:05 2009
TimeThis : End Time : Thu Mar 05 18:45:06 2009
TimeThis : Elapsed Time : 00:00:00.390

--
A. Sinan Unur <1u...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Big and Blue

unread,

Mar 5, 2009, 6:56:45 PM3/5/09

to

rbg wrote:
>
> I can reproduce this issue ubuntu with perl 5.10.0 (and debian with
> perl 5.8.8), but *not* on OS X with perl 5.8.8. It seems like it has
> something to do with how the perl interpreter is cloned when system()
> is called (seems that it isn't actually doing copy-on-write forking).
> Anyone have a deeper understanding of this issue?

The size of the process is *much* greater in the 1<<26 case, so there will be a much greater process
space to create (it may not copy the pages, but it has to create page table entries for it?).

Looking at top when these run:

1<<10 VIRT=20848, RES=2060
1<<26 VIRT=148m. RES=130m

If you undef $bf after setting it to the long string you'll find that, for the 1<<26 case you end up
with something in between, both in timing and size.

VIRT=86388, RES=66m

--
Just because I've written it doesn't mean that
either you or I have to believe it.

Ben Morrow

unread,

Mar 5, 2009, 7:12:47 PM3/5/09

to

Quoth "A. Sinan Unur" <1u...@llenroc.ude.invalid>:

>
> I don't see that.
>
> This is perl, v5.10.0 built for MSWin32-x86-multi-thread
> (with 5 registered patches, see perl -V for more detail)

Well, no. Since this is probably a problem with fork(2) of a large
address space, and Win32 doesn't use fork(2), you wouldn't.

Ben

A. Sinan Unur

unread,

Mar 5, 2009, 8:03:40 PM3/5/09

to

Ben Morrow <b...@morrow.me.uk> wrote in news:vc6686-it1.ln1
@osiris.mauzo.dyndns.org:

OK then ... I tried it on my Eee PC 701 with ArchLinux using the stock
perl v5.10.0 while downloading system updates.

time shows:

real 0m37.892s
user 0m17.959s
sys 0m17.882s

For control, omitting the large assignment to $bf yields

real 0m24.965s
user 0m17.586s
sys 0m6.416s

On the other hand, using

my $bf = 'a' x (1 << 10);

yields

real 0m24.748s
user 0m17.322s
sys 0m6.526s

Clearly, there is a slowdown with the larger $bf, but there is not a 10x
difference.

PS: Apologies for not copying and pasting.

Sinan

rbg

unread,

Mar 5, 2009, 9:19:00 PM3/5/09

to

Thanks for your responses.

I guess my real question is, can this performance penalty be
eliminated? My understanding is that linux uses copy-on-write for the
fork() system call, so forking a process is very cheap, and
independent of memory usage (modulo page table copying). My
understanding at this point is that, somehow, perl has found a way to
defeat the copy-on-write gains. This seems rather unfortunate. Is
there some way to tell perl, "when I call system() don't copy the
state of the interpreter"?

thanks
-ross

tus...@gmail.com

unread,

Mar 6, 2009, 10:43:23 AM3/6/09

to

On Mar 6, 1:12 am, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth "A. Sinan Unur" <1...@llenroc.ude.invalid>:

I have looked up perldoc -f system where it says:
"[...] Does exactly the same thing as exec LIST, except that a fork is
done first, and the parent process waits for the child process to
complete [...]"

I see that system() involves a fork, but a naïve question here: Why
does system() need to fork, can it not just simply run whatever needs
running (a shell program presumably) without forking and return to the
perl program to continue the program ? -- this would avoid the
unneccessary large address space.

Big and Blue

unread,

Mar 6, 2009, 7:14:23 PM3/6/09

to

tus...@gmail.com wrote:
>
> I see that system() involves a fork, but a naïve question here: Why
> does system() need to fork,

Because it needs another process to run the command in system (it actually does fork+exec). The
current process still needs to be there to pick up the story afterwards.

Ben Morrow

unread,

Mar 6, 2009, 7:05:20 PM3/6/09

to

Quoth tus...@gmail.com:

>
> I see that system() involves a fork, but a naïve question here: Why
> does system() need to fork, can it not just simply run whatever needs
> running (a shell program presumably) without forking and return to the
> perl program to continue the program ? -- this would avoid the
> unneccessary large address space.

Under traditional Unix systems there is no way to create a new process
without using fork(2).

Ben

Ben Morrow

unread,

Mar 6, 2009, 8:00:09 PM3/6/09

to

Quoth rbg <ross.g...@gmail.com>:

>
> I can reproduce this issue ubuntu with perl 5.10.0 (and debian with
> perl 5.8.8), but *not* on OS X with perl 5.8.8. It seems like it has
> something to do with how the perl interpreter is cloned when system()
> is called (seems that it isn't actually doing copy-on-write forking).
> Anyone have a deeper understanding of this issue?

Run

perl -V:d_vfork

on each of these systems. I strongly suspect Mac OS X is using vfork,
which is more efficient for a fork almost immediately followed by an
exec, whereas Linux is not, but it would be interesting to confirm this.

On FreeBSD, with d_vfork='define', I get no noticable increase in
execution time with a large string, but if I replace the system with

fork or exec "true";
waitpid -1, 0;

I do.

Ben

rbg

unread,

Mar 6, 2009, 9:06:38 PM3/6/09

to

Ben -- right, you are. On the OS X system I get d_vfork='define' and
on the linux systems I get d_vfork='undef'. Is this a compile time
configuration option or is there a way to change it at runtime?

Thanks,
ross

Ben Morrow

unread,

Mar 7, 2009, 8:12:52 AM3/7/09

to

Quoth rbg <ross.g...@gmail.com>:

> Ben -- right, you are. On the OS X system I get d_vfork='define' and
> on the linux systems I get d_vfork='undef'. Is this a compile time
> configuration option or is there a way to change it at runtime?

It's set at perl Configure time. You could try building perl with
-Dusevfork on the Configure line, but there may be a good reason for it
being disabled by default. Make sure you run all the tests.

Ben

rbg

unread,

Mar 7, 2009, 8:44:53 AM3/7/09

to

On Mar 7, 7:12 am, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth rbg <ross.girsh...@gmail.com>:

>
> It's set at perl Configure time. You could try building perl with
> -Dusevfork on the Configure line, but there may be a good reason for it
> being disabled by default. Make sure you run all the tests.
>
> Ben

The linux man pages caution strongly against using vfork() -- saying
that its deprecated and should not be used because fork() is
implemented efficiently using COW. This brings me back to the same
question: if fork() is behaving efficiently, what else is perl doing
on linux that is causing the execution time of system() to grow
(significantly) with process size? Maybe I need to dig into the perl
source to answer this question...

-ross

Ben Morrow

unread,

Mar 7, 2009, 10:19:46 AM3/7/09

to

Quoth rbg <ross.g...@gmail.com>:

> On Mar 7, 7:12 am, Ben Morrow <b...@morrow.me.uk> wrote:
> > Quoth rbg <ross.girsh...@gmail.com>:
> >
> > It's set at perl Configure time. You could try building perl with
> > -Dusevfork on the Configure line, but there may be a good reason for it
> > being disabled by default. Make sure you run all the tests.
>

> The linux man pages caution strongly against using vfork() -- saying
> that its deprecated and should not be used because fork() is
> implemented efficiently using COW.

Even with COW, fork(2) still has to copy all the page tables, which
makes it linear in the size of the process address space, albeit with a
much smaller constant than without COW. vfork(2) should be more-or-less
constant time (it will be linear in the number for open file
descriptors, since those are still cloned).

> This brings me back to the same question: if fork() is behaving
> efficiently, what else is perl doing on linux that is causing the
> execution time of system() to grow (significantly) with process size?
> Maybe I need to dig into the perl source to answer this question...

OK, this is odd. Running perl -e'system "true"' under ktrace gives me

<snip>
4640 perl CALL fork
4640 perl RET fork 4641/0x1221
4640 perl CALL close(0x4)
4640 perl RET close 0
4640 perl CALL sigaction(SIGINT,0xbfbfe338,0xbfbfe3d8)
4640 perl RET sigaction 0
4640 perl CALL sigaction(SIGQUIT,0xbfbfe338,0xbfbfe3c0)
4640 perl RET sigaction 0
4640 perl CALL wait4(0x1221,0xbfbfe3fc,<invalid>0,0)
4640 perl RET wait4 4641/0x1221
4640 perl CALL sigaction(SIGINT,0xbfbfe3d8,0)
4640 perl RET sigaction 0
4640 perl CALL sigaction(SIGQUIT,0xbfbfe3c0,0)
4640 perl RET sigaction 0
<snip>

so even though I have d_vfork='define' perl isn't actually using it for
system. It seems this changed with perl 5.8. Now I have no idea what's
going on... :).

Ben

rbg

unread,

Mar 7, 2009, 10:57:03 AM3/7/09

to

And on my linux system it's implemented with clone, not fork

strace perl -e'system "true"'
<snip>
brk(0x8b9c000) = 0x8b9c000
close(3) = 0
pipe([3, 4]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7e9a908) = 16784
--- SIGCHLD (Child exited) @ 0 (0) ---
close(4) = 0
rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
waitpid(16784, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 16784
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL}, NULL, 8) = 0
</snip>

Odd.

Peter J. Holzer

unread,

Mar 7, 2009, 3:51:39 PM3/7/09

to

On 2009-03-07 15:57, rbg <ross.g...@gmail.com> wrote:
> And on my linux system it's implemented with clone, not fork
>
> strace perl -e'system "true"'

> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
> SIGCHLD, child_tidptr=0xb7e9a908) = 16784
>

> Odd.

fork() and vfork() are just wrapper functions for clone() on Linux (the
"real" fork and vfork syscalls still exist for binary compatibility but
have been deprecated for a very long time and aren't used by any current
libc (AFAIK)).

hp

Peter J. Holzer

unread,

Mar 7, 2009, 3:46:24 PM3/7/09

to

On 2009-03-07 15:19, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth rbg <ross.g...@gmail.com>:
>> On Mar 7, 7:12 am, Ben Morrow <b...@morrow.me.uk> wrote:
>> > It's set at perl Configure time. You could try building perl with
>> > -Dusevfork on the Configure line, but there may be a good reason for it
>> > being disabled by default. Make sure you run all the tests.
>>
>> The linux man pages caution strongly against using vfork() -- saying
>> that its deprecated and should not be used because fork() is
>> implemented efficiently using COW.
>
> Even with COW, fork(2) still has to copy all the page tables, which
> makes it linear in the size of the process address space, albeit with a
> much smaller constant than without COW. vfork(2) should be more-or-less
> constant time (it will be linear in the number for open file
> descriptors, since those are still cloned).

This is correct. A simple C program which just calls (v)fork/waitpid in a
tight loop (the child process exits immediately) shows that the
execution time increases with the process size for fork, but is roughly
constant for vfork.

See http://www.hjp.at/os/fork-vs-vfork/ for a graph.

hp

Ilya Zakharevich

unread,

Mar 7, 2009, 10:12:08 PM3/7/09

to

On 2009-03-07, rbg <ross.g...@gmail.com> wrote:
> The linux man pages caution strongly against using vfork() -- saying
> that its deprecated and should not be used because fork() is
> implemented efficiently using COW. This brings me back to the same
> question: if fork() is behaving efficiently, what else is perl doing

As you know now, using fork() for startup of a child process was one
of the most brain-damaged decisions of designers of a certain one-user
non-multiprocessing OS for micro-computers.

When people say that "fork() is efficient" they mean just that it is
"more effecient than it was when one had no COW optimization".

Hope this helps,
Ilya

P.S. I wonder why people do not enable Perl to use the POSIX variant
of spawn(). All the necessary code is already there in DOSISH
branches of the codebase...