
Perl crash on regex


Jamie McCarthy

Sep 14, 1999, 3:00:00 AM
I've been chasing this bug for months. It's highly nonrepeatable,
and I'm putting out too many fires to produce a test case at the
moment. (The real-world code it's running in is many thousands
of lines long.) So I'll just ask whether anyone else has seen
something like this, or whether it's a known bug.

It manifested when sorting a large (4000+) list of URLs. The perl
process just crashed: nothing output to STDERR, no core file
left over, it was just gone. Often it would work for days at a
time; then there would be long periods when it would not work at
all. I have also seen crashes _near_ this segment of code: again,
the perl process just exits silently. They have not seemed
related to the sort but it's hard to tell.

The sort uses a custom routine to sort the URLs into an order
that's a little more natural than alphanumeric, which I call
"URLDomainOrder". For debugging, I added a filehandle to the
comparison function; it looks like this now:

[...]
    my @ers_keys = keys %$cur_ers_ref;
    open(UDOFH, ">/tmp/udofh") or die "can't write udofh, $!";
    select(UDOFH); $|=1; select(STDOUT);
    my @sorted_ers_keys =
        sort { &URLDomainOrder($a, $b, \*UDOFH) }
        @ers_keys;
    close UDOFH;
[...]

The comparison function starts off like this:

sub URLDomainOrder {

    my($a, $b, $fh) = @_;
    $fh = '' if !$fh;

    if ($fh) {
        print $fh "UDO1 a='$a' b='$b'\n";
    }

    my($a_www, $b_www,
       $a_host, $b_host,
       $a_port_given, $b_port_given,
       $a_host_num_str, $b_host_num_str,
       $a_domain, $b_domain,
       $a_path, $b_path) = ( ) x 12;
    my($a_host_num_val, $b_host_num_val) = (0, 0);
    my($a_port, $b_port) = (80, 80);

    if ($fh) {
        print $fh "UDO a='$a' b='$b'\n";
    }

    if ($a and $b) {

        if ($fh) {
            print $fh "UDO a='$a' b='$b'\n";
        }
        ($a_www, $a_host, $a_port, $a_path) = $a =~ m!^(?:http://)?(www\.)?([^/]+?)(\:\d+)?(/.*)!;
        if ($fh) {
            print $fh "UDO a_www='$a_www' a_host='$a_host' a_port='$a_port' a_path='$a_path'\n";
        }
        ($b_www, $b_host, $b_port, $b_path) = $b =~ m!^(?:http://)?(www\.)?([^/]+?)(\:\d+)?(/.*)!;
        if ($fh) {
            print $fh "UDO b_www='$b_www' b_host='$b_host' b_port='$b_port' b_path='$b_path'\n";
        }

        [...snip...]

The last lines left in my debugging log file before the crash are
these (and this is at least somewhat repeatable; it has crashed
exactly like this twice in a row):

UDO1 a='http://www.projo.com/report/pjb/stories/02573501.htm' b='http://www.projo.com/report/pjb/stories/02568772.htm'
UDO a='http://www.projo.com/report/pjb/stories/02573501.htm' b='http://www.projo.com/report/pjb/stories/02568772.htm'
UDO a='http://www.projo.com/report/pjb/stories/02573501.htm' b='http://www.projo.com/report/pjb/stories/02568772.htm'
UDO a_www='www.' a_host='projo.com' a_port='' a_path='/report/pjb/stories/02573501.htm'

It thus appears to have crashed on the last regex above (the match
against $b that assigns "$b_www"), since the a_* line was logged but
the b_* line never was.
Of course this regex works fine on the data normally, and in fact
the debugging log file is full of a few hundred other references to
this same URL in which the regex was successfully executed.
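
One way to check whether the input alone can trigger the failure is to replay the same pattern against the two logged URLs outside the real program. The following is only a minimal sketch under that assumption: the names parse_url and replay and the iteration count are hypothetical, and only the pattern and the URLs are taken from the post above.

```perl
use strict;
use warnings;

# The pattern from URLDomainOrder, compiled once.
my $url_re = qr!^(?:http://)?(www\.)?([^/]+?)(\:\d+)?(/.*)!;

# Hypothetical helper: return (www, host, port, path) for a URL,
# with unmatched optional groups turned into empty strings.
# Returns an empty list if the URL does not match at all.
sub parse_url {
    my ($url) = @_;
    my @parts = $url =~ $url_re;
    return map { defined $_ ? $_ : '' } @parts;
}

# Hypothetical helper: run the match $count times over each URL and
# die if any iteration fails to produce the four expected fields.
sub replay {
    my ($count, @urls) = @_;
    for my $i (1 .. $count) {
        for my $url (@urls) {
            my @fields = parse_url($url);
            die "no match on iteration $i: $url\n" unless @fields == 4;
        }
    }
    return $count * @urls;   # number of matches performed
}

my @pair = (
    'http://www.projo.com/report/pjb/stories/02573501.htm',
    'http://www.projo.com/report/pjb/stories/02568772.htm',
);
my $done = replay(10_000, @pair);
print "completed $done matches without a crash\n";
```

If a loop like this never fails, that would suggest the crash depends on program state (memory, signals) rather than on these particular strings.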

This behavior is with perl5.005_61, and though I haven't run the
debugging log with 5.005_03, essentially identical behavior was
occurring with 5.005_03 as well.

My guess is that perl has a bug causing a stray pointer, but I
don't know enough to chase this down.

I can do workarounds, but I don't like this. Any advice will be
appreciated.

--
Jamie McCarthy
ja...@mccarthy.org

Ilya Zakharevich

Sep 16, 1999, 3:00:00 AM
[A complimentary Cc of this posting was sent to Jamie McCarthy
<ja...@mccarthy.org>],
who wrote in article <37DEBA1C...@mccarthy.org>:

> It manifested when sorting a large (4000+) list of URLs. The perl
> process just crashed: nothing output to STDERR, no core file
> left over, it was just gone.

Are you sure you look for core in a correct directory? What is your
core-size limit? What is the exit code the parent process gets?

Ilya

Tom Phoenix

Sep 16, 1999, 3:00:00 AM
On Tue, 14 Sep 1999, Jamie McCarthy wrote:

> It manifested when sorting a large (4000+) list of URLs. The perl
> process just crashed: nothing output to STDERR, no core file
> left over, it was just gone.

Could core files be disabled, or could there be too little (non-reserved)
space remaining for the large core? Other than that, I can't see why there
would be no core.

Your sorting problem might benefit from the Schwartzian Transform, so that
you won't have to process each URL multiple times.
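
A Schwartzian Transform for this kind of sort might look like the sketch below. It is not the poster's URLDomainOrder: the names url_sort_key and sort_urls and the key layout (reversed host labels, then port, then path) are assumptions chosen for illustration, and only the regex is taken from the original post. The point is that each URL is parsed once, before sorting, rather than on every comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a URL once into a sort key.
# The key layout [domain_key, port, path] is an assumption.
sub url_sort_key {
    my ($url) = @_;
    my ($www, $host, $port, $path) =
        $url =~ m!^(?:http://)?(www\.)?([^/]+?)(\:\d+)?(/.*)!;
    $host = '' unless defined $host;
    $path = '' unless defined $path;

    # Default to port 80 when no ":NNNN" part was given.
    my $port_num = 80;
    if (defined $port && $port =~ /(\d+)/) {
        $port_num = $1;
    }

    # Reverse the host labels so URLs group by domain first
    # ("projo.com" becomes "com.projo").
    my $domain_key = join '.', reverse split /\./, lc $host;
    return [$domain_key, $port_num, $path];
}

# Schwartzian Transform: decorate, sort on the precomputed keys,
# then undecorate.
sub sort_urls {
    my @urls = @_;
    return
        map  { $_->[0] }
        sort {
               $a->[1][0] cmp $b->[1][0]
            || $a->[1][1] <=> $b->[1][1]
            || $a->[1][2] cmp $b->[1][2]
        }
        map  { [$_, url_sort_key($_)] }
        @urls;
}

print "$_\n" for sort_urls(
    'http://www.projo.com/report/pjb/stories/02573501.htm',
    'http://example.org:8080/a',
    'http://www.projo.com/report/pjb/stories/02568772.htm',
);
```

With 4000+ URLs the comparison block runs tens of thousands of times, so moving the regex out of it also reduces how often the suspect code path is exercised.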

> My guess is that perl has a bug causing a stray pointer, but I
> don't know enough to chase this down.

I'd suspect a memory leak. But I'd expect a core, too. It sounds as if
your perl (or your libraries) has a bug or misconfiguration.

If you're using perl's malloc, try your system's, and vice versa.
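
To find out which malloc a given perl binary was built with, the build configuration can be queried. The sketch below reads the usemymalloc entry from the standard Config module; the function name malloc_description is hypothetical. The same value is printed by running perl -V:usemymalloc from the shell. Switching to the other malloc means rebuilding perl with a different usemymalloc setting.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: describe which malloc this perl was built with,
# based on the usemymalloc build setting ('y' or 'n').
sub malloc_description {
    my $mine = $Config{usemymalloc};
    $mine = 'unknown' unless defined $mine;
    return $mine eq 'y' ? "perl's own malloc (usemymalloc=y)"
         : $mine eq 'n' ? "the system malloc (usemymalloc=n)"
         :                "undetermined malloc (usemymalloc=$mine)";
}

print "This perl uses ", malloc_description(), "\n";
```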

Good luck!

--
Tom Phoenix Perl Training and Hacking Esperanto
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/

Jamie McCarthy

Sep 16, 1999, 3:00:00 AM
Ilya Zakharevich wrote:

> Are you sure you look for core in a correct directory? What is your
> core-size limit? What is the exit code the parent process gets?

There are no core files on any filesystem on this machine.
My core size limit is 1000000 blocks, which it seems is 512 MB
(this perl process takes up 5-10 MB, 20 at most). I don't know
the exit code because its parent process has long since exited.
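
When the original parent is gone, one option is to launch the script from a small wrapper that waits for it and records how it ended. The sketch below is an assumption about how such a wrapper could look, not something used in this thread: the names decode_wait_status and run_and_report are hypothetical. It relies only on the documented layout of Perl's $? after system(): the low 7 bits hold the signal number, bit 128 reports a core dump, and the high byte holds the exit code.

```perl
use strict;
use warnings;

# Split a wait status (the value of $? after system/wait) into its parts.
sub decode_wait_status {
    my ($status) = @_;
    return {
        exit_code   => $status >> 8,
        signal      => $status & 127,
        core_dumped => ($status & 128) ? 1 : 0,
    };
}

# Hypothetical wrapper: run a command, wait for it, and return a
# one-line description of how it terminated.
sub run_and_report {
    my @cmd = @_;
    my $rc = system(@cmd);
    return "could not start: $!" if $rc == -1;

    my $s = decode_wait_status($?);
    if ($s->{signal}) {
        my $core = $s->{core_dumped} ? "core dumped" : "no core";
        return sprintf("killed by signal %d (%s)", $s->{signal}, $core);
    }
    return sprintf("exited with code %d", $s->{exit_code});
}

# Example: run a child perl that exits with status 3.
print run_and_report($^X, '-e', 'exit 3'), "\n";
```

A silent death with a nonzero signal and no core would point toward a signal or a resource limit; a plain exit code would point toward an exit or die somewhere in the code.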

Tom Phoenix wrote:

> Could core files be disabled, or could there be too little (non-reserved)
> space remaining for the large core? Other than that, I can't see why there
> would be no core.

There's plenty of space free on every filesystem (gigabytes).

> Your sorting problem might benefit from the Schwartzian Transform, so that
> you won't have to process each URL multiple times.

That's true. I've been meaning to get around to that :-)

> > My guess is that perl has a bug causing a stray pointer, but I
> > don't know enough to chase this down.
>
> I'd suspect a memory leak. But I'd expect a core, too. It sounds as if
> your perl (or your libraries) has a bug or misconfiguration.

I confirmed that it does happen both on the 5.00503 shipped with
Red Hat 6.0, and on 5.00561 as installed with "Configure -des".

> If you're using perl's malloc, try your system's, and vice versa.

That's an excellent idea. I'll try perl's malloc.

(Un)Fortunately, everything has worked fine for the last 48 hours
with no changes in the code; a few dozen of these large sorts
have been done in that time. So I won't know whether any changes
have made it better or worse. Phase of the moon, for all I know...

--
Jamie McCarthy

Mark Doyle

Sep 21, 1999, 3:00:00 AM
[posted and e-mailed]

In <37E121D1...@mccarthy.org> Jamie McCarthy wrote:

> (Un)Fortunately, everything has worked fine for the last 48 hours
> with no changes in the code; a few dozen of these large sorts
> have been done in that time. So I won't know whether any changes
> have made it better or worse. Phase of the moon, for all I know...

Another guess: There are known problems with perl signal handling in that
you can crash a non-threaded perl by sending it a signal at an inopportune
time. Perhaps your process is getting a signal of some sort (no pun
intended) and this is most likely to occur during the time the program
spends in the large sort which causes the process to go belly up silently?
From what I understand, it is a hit-or-miss kind of thing - sometimes the
signal is handled gracefully, sometimes (usually very rarely) it isn't. But
with large sorts, perhaps you would be more likely to run into the problem.
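
One way to test this guess is to hold signals pending while the sort runs and see whether the crashes stop. The sketch below does that with sigprocmask from the standard POSIX module; the function name with_signals_blocked and the choice of signals are assumptions. Blocked signals are not discarded: they are delivered once the old mask is restored. SIGKILL and SIGSTOP cannot be blocked. This is a diagnostic aid, not a fix for the underlying signal-handling problem.

```perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# Hypothetical helper: run $code with the given signals blocked,
# then restore the previous signal mask, even if $code dies.
sub with_signals_blocked {
    my ($code, @signals) = @_;
    my $block = POSIX::SigSet->new(@signals);
    my $old   = POSIX::SigSet->new;

    sigprocmask(SIG_BLOCK, $block, $old)
        or die "sigprocmask(SIG_BLOCK) failed: $!";

    my @result = eval { $code->() };
    my $err = $@;

    sigprocmask(SIG_SETMASK, $old)
        or die "sigprocmask(SIG_SETMASK) failed: $!";

    die $err if $err;
    return @result;
}

# Example: sort a list while common asynchronous signals are held back.
my @urls   = ('http://b.example/', 'http://a.example/');
my @sorted = with_signals_blocked(
    sub { return sort @urls },
    SIGHUP, SIGINT, SIGTERM, SIGALRM,
);
print "$_\n" for @sorted;
```

If the silent exits disappear with the sort wrapped this way, that would support the signal theory; if they continue, the cause is more likely elsewhere.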

Cheers,
Mark

Ilya Zakharevich

Sep 22, 1999, 3:00:00 AM
[A complimentary Cc of this posting was sent to Mark Doyle
<do...@aps.org>],
who wrote in article <7s84eq$7av$1...@constellation.acp.org>:

> Another guess: There are known problems with perl signal handling in that
> you can crash a non-threaded perl by sending it a signal at an inopportune
> time. Perhaps your process is getting a signal of some sort (no pun
> intended) and this is most likely to occur during the time the program
> spends in the large sort which causes the process to go belly up silently?
> From what I understand, it is a hit-or-miss kind of thing - sometimes the
> signal is handled gracefully, sometimes (usually very rarely) it isn't.

I think without my voodoo patch your "very rarely" is 1/30. With the
voodoo patch it was down to less than 1/100000. (I had seen a failure
only once on many *very* long tests - with a signal each 30ms tick. I
could not get a better granularity on OS/2.)

Ilya

P.S. I do not remember what the voodoo patch was doing. It *should
not have* made any difference... See archives for details.
