Needed: a pointer for a perl compare script


Dick O'Connor

Nov 30, 1990, 10:47:36 AM
I've been following this group for a while, cutting and saving sample scripts
like my Mom clips recipes, but I still haven't picked up enough pointers to
migrate one of my old Fortran "compare" programs to perl. If I could do
this, the users could make their own runs and I'd be free to contemplate a
higher order of existence (programming! :)

My program reads two files of differing format which are sorted by a unique
5-character label. When two labels match, a new record is written, with
info from file A (moved around a bit) written to the "left" and info from
file B (again, reformatted a little) written to the "right". Where a
given record from file A or B has no counterpart, the same new record is
written, with blanks on the "side" without counterpart information.

I know this is simple; it's a short program now. But there's something
I'm just not seeing that blocks my conversion to perl. A pointer to a
suggested construct would be wonderful...I'm happy to work out the details.

BTW I've been away for two weeks; did I miss The Announcement about The Book?

"Moby" Dick O'Connor djo...@u.washington.edu
Washington Department of Fisheries *I brake for salmonids*

Randal Schwartz

Nov 30, 1990, 2:48:13 PM
In article <12...@milton.u.washington.edu>, djo7613@hardy (Dick O'Connor) writes:
| My program reads two files of differing format which are sorted by a unique
| 5-character label. When two labels match, a new record is written, with
| info from file A (moved around a bit) written to the "left" and info from
| file B (again, reformatted a little) written to the "right". Where a
| given record from file A or B has no counterpart, the same new record is
| written, with blanks on the "side" without counterpart information.
|
| I know this is simple; it's a short program now. But there's something
| I'm just not seeing that blocks my conversion to perl. A pointer to a
| suggested construct would be wonderful...I'm happy to work out the details.


Hmm... (warning... untested code follows)...

If I could fit all of file A and B into memory (my preferred tactic), I'd
do something like this:

open(A,"Afile");
while (<A>) {
    chop;
    ($label,$rest) = unpack("a5a*",$_);
    $a{$label} = $rest;
    $both{$label}++;
}
close(A);
open(B,"Bfile");
while (<B>) {
    chop;
    ($label,$rest) = unpack("a5a*",$_);
    $b{$label} = $rest;
    $both{$label}++;
}
close(B);
for (sort keys %both) {
    $left  = defined($a{$_}) ? $a{$_} : "left default";
    $right = defined($b{$_}) ? $b{$_} : "right default";
    print "$_ $left $right\n";
}

This'd probably take some massaging, but I hope you get the general
idea. If they don't both fit into memory, you will have to do some
juggling to read from a or b depending on which of the current labels
are lower. This solution here is much simpler (and elegant :-) by
comparison.
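In the same untested spirit, the disk-based juggling might look something like this. This is only a sketch: the &mergefiles and &getrec names, the "left default"/"right default" fillers, and the record layout of a 5-character label followed by the rest of the line are all assumptions carried over from the hash version above.

```perl
# Merge two files already sorted on their 5-character label without
# holding either in memory: keep one current record per file and always
# emit the record with the lower label.  An empty label stands for EOF
# (so a genuinely empty label in the data would need different handling).
sub getrec {
    local($fh) = @_;
    local($line);
    return ("", "") unless defined($line = <$fh>);
    chop($line);
    return unpack("a5a*", $line);        # (label, rest-of-record)
}

sub mergefiles {
    local($af, $bf) = @_;
    local(@out, $la, $ra, $lb, $rb);
    open(MA, $af) || die "Can't open $af: $!";
    open(MB, $bf) || die "Can't open $bf: $!";
    ($la, $ra) = &getrec('MA');
    ($lb, $rb) = &getrec('MB');
    while ($la ne "" || $lb ne "") {
        if ($lb eq "" || ($la ne "" && $la lt $lb)) {       # only in A
            push(@out, "$la $ra right default");
            ($la, $ra) = &getrec('MA');
        } elsif ($la eq "" || $lb lt $la) {                 # only in B
            push(@out, "$lb left default $rb");
            ($lb, $rb) = &getrec('MB');
        } else {                                            # labels match
            push(@out, "$la $ra $rb");
            ($la, $ra) = &getrec('MA');
            ($lb, $rb) = &getrec('MB');
        }
    }
    close(MA); close(MB);
    @out;                                # merged records, in label order
}
```

Returning the records instead of printing them is just to keep the sketch testable; in practice you'd print each one as it is decided and never hold more than two records at a time.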

| BTW I've been away for two weeks; did I miss The Announcement about The Book?

The only announcement you may have missed is that Larry and I are
working intensely to incorporate the review comments and finish up the
final draft so that we can still make the Usenix deadline.

print "Just another Perl [book] hacker,"
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III |
| mer...@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/

Leslie Mikesell

Dec 4, 1990, 6:04:36 PM
In article <12...@milton.u.washington.edu> djo...@hardy.u.washington.edu.acs.washington.edu (Dick O'Connor) writes:
>I've been following this group for awhile, cutting and saving sample scripts
>like my Mom clips recipes, but I still haven't picked up enough pointers to
>migrate one of my old Fortran "compare" programs to perl. If I could do
>this, the users could make their own runs and I'd be free to contemplate a
>higher order of existence (programming! :)

>My program reads two files of differing format which are sorted by a unique
>5-character label. When two labels match, a new record is written, with
>info from file A (moved around a bit) written to the "left" and info from
>file B (again, reformatted a little) written to the "right". Where a
>given record from file A or B has no counterpart, the same new record is
>written, with blanks on the "side" without counterpart information.

Perl is the language of choice for this kind of thing but it may still
turn out to be non-trivial. It is also fairly hard to describe
so examples are generally needed. The merging subroutine is > half the
file so I'll just include the whole thing. The concept here is to store
old and new items into different associative arrays, sort the keys,
then make the comparison from the top of each list.

Here is a sample that takes a stream that looks like this from a
legislative database:
NOXIOUS WEEDS, PEST ERADICATION - 1.2.5

VT H 2 AUTHOR: ...
TOPIC: ...
SUBTOPIC: ...

SUMMARY:
........
........

STATUS:
.......
.......

VT H 5 AUTHOR:
etc...

and files items under /dir/state/number, where state is taken from the
first 2 characters of the bill id, and number is the last portion of the
topic line. Within the file, items are sorted by their bill id, with
an additional header added to note the date and whether the bill has
been signed.
A subsequent entry (possibly an update) with the same bill id will be
merged by extracting the SUMMARY: portion of the old entry and stuffing
it into the new data which will contain the current status.
The merging portion is done in the writestate subroutine.

Les Mikesell
l...@chinet.chi.il.us
#----------------
# merge.pl
# put legislative info into files:
# 1 directory per state, 1 file per topic
# merge w/current - if new includes summary, use it,
# else snarf summary from old
# collect one state from current input - then read current info & merge
#
# top dir of tree:
$dir = './test';
open (ERR,">>errlog") ;
#
%nitems=();
$haveitem = 0;
$havetopic = "";
$havestate = "";
$havesum = 0;
$instatus = 0;
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)=localtime(time);
$mon += 1;
$signed = 'n';
while (<>) { # read input file(s)
# strip page breaks and some other junk
if (/ Page/) { next;}
if ( / /) { next;} # these are always blank lines
if ( /^ *\f/) { next;} # form feeds - ignore
if ( /^ *$/) {
$bl=1;
next; # blank lines
}
unless (/\n$/) { # possible junk at EOF
if ($instatus) { # past a complete item
do saveitem();
do writestate();
} else { # incomplete item, junk it
$haveitem = 0;
$havestate = "";
$instatus = 0;
$havesum = 0;
}
next;
}
# find: AGRICULTURAL LOANS - 1.4.1
if ( /^[A-Z].* \- [0-9][0-9]*\.[0-9][0-9]*/) { # new topic
$topic=$_;
# print "$topic \n";
$topic =~ s/.* \- // ;
# print "$topic \n";
$topic =~ s/[^0-9]*$// ;
# print "$topic \n";
if ( $haveitem != 0 ) {
do saveitem(); # save current entry
do writestate(); # write current state
}
$havetopic = $topic;
print "Topic = $topic\n";
next;
}
# if start of new item, save last in %nitems
# if start of new state, merge last state
if (/^[A-Z][A-Z].*AUTHOR:/) { #new item
if ($haveitem) {do saveitem();}
/^([A-Z][A-Z]) *([0-9A-Za-z][0-9A-Za-z]*) *([0-9].*)AUTHOR:/;
if (length($1) != 2) {
print "**** BILL ID ERROR in $_ **** \n";
print ERR "**** BILL ID ERROR in $_ **** \n";
next;
}
$sortid=sprintf("%2.2s %s %s",$1,$2,$3);
$sortid =~ s/ *$// ;
$state=substr($_,0,2);
$state =~ y/A-Z/a-z/;
if ($havestate ne "" && $havestate ne $state) {
# $save = $_ ;
do writestate();
# print "Input line: was $save\n now $_\n";
}
$id = substr($_,0,14);
$id =~ s/ *$//; #trim trailing space
$havestate=$state;
$_ =~ s/.*AUTHOR:/AUTHOR:/ ;
$_ =~ s/ *$//;
$item = $_ ;
#print "New: Itemid = $id $sortid\n";
$haveitem = 1;
next;
}

if (/SUMMARY:/) { $havesum = 1;}
if ($haveitem == 0 ) { next;}
if (/STATUS:/) { $instatus = 1;}
if ($instatus ) {
if ( /RATIFIED/) { $signed = 's';} #NC variation
if ( /[Ss]igned/) { $signed = 's';}
}
if (/END OF REPORT/) {
do saveitem();
do writestate();
$havetopic="";
next;
}
$_ =~ s/ // ; # strip right indent
$item .= $_ ; # append current line to item
}

do saveitem();
do writestate();
exit;

# save item to %nitems array w/sortid as key
# add header line of:
# >smmddyy
# where s = s or n (signed or not)
# m = month
# d = day date item is written to database (now)
# y = year (2 digits)
# if duplicate keep one with summary field

sub saveitem {
if ($haveitem == 0 ) {return; }
if ($havetopic eq "" ) {return; }
$haveitem = 0;
$instatus = 0;
if ($havesum == 0) { # no summary, check for alternate
if ($nitems{$sortid} && $nitems{$sortid} =~ /SUMMARY:/) {
return;
}
}
$nitems{$sortid} = sprintf (">%s%02d%02d%02d\n%s%s\n%s",$signed,$mon,$mday,$year,"BILL ID: ",$id,$item) ;
$signed = 'n';
$havesum = 0;
return;
}

# load any current items in for merging
sub writestate {
local ($_) ; # important to not alter upper $_
local ($*) = 1 ; # multi-line match needed
%oitems=();
@olist=();
# sanity check - may not have any new input
if ($havetopic eq "" ) {return; }
if ($havestate eq "" ) {return; }
$nname = sprintf ("%s/%s/%s",$dir,$havestate,$havetopic);
print "Loading $nname \n" ;
if (open (IN,"<$nname")) {
# read in old items keeping old date
$wsortid="";
$item="";
while (<IN>) {
if (/^>/) { # added header line
if ($wsortid ne "") {
$oitems{$wsortid} = $item; #store previous item
$wsortid = "";
$item=""; # start new one
}
}
$item .= $_ ; # collect lines of item
# normalize key to match original input
if (/^BILL ID: (..) *([0-9A-Za-z][0-9A-Za-z]*) *([0-9].*)/) {
$wsortid=sprintf("%2.2s %s %s",$1,$2,$3);
#print "Old id: $wsortid\n";
}
}
if ($wsortid ne "") {
$oitems{$wsortid} = $item; # save the last one
}
close(IN);
@olist = sort (keys(%oitems)); # sort the old keys
$howmany = $#olist +1;
print "$howmany old bills\n" ;
}
@nlist = sort(keys(%nitems)); # sort the new keys
$howmany = $#nlist +1;
print "$howmany updates\n" ;
#now merge the lists and write out
print "Writing $nname \n" ;
if ($nname ne $lname ) {
close OUT;
unless (open (OUT,">$nname")) {
$dirname = sprintf ("%s/%s",$dir,$havestate);
printf "Creating $dirname\n";
mkdir ($dirname,0777);
open (OUT,">$nname") || die "Can't open $nname";
$lname = $nname ;
}
}
#print "@olist\n";
#print "@nlist\n";
$oldid=shift(@olist); # start with top two keys
$newid=shift(@nlist);
$current = "" ; # sanity check
while ( $oldid && $newid ) { # compare and merge
#print " oldid = $oldid newid = $newid\n";
if ($current ge $oldid || $current ge $newid) {
print "***** MERGE ERROR at $current *****\n";
print ERR "***** MERGE ERROR at $current *****\n";
}
if ($oldid eq $newid ) { # merge summary w/new
# if anything beyond date is changed use new
# this keeps the old date on duplicates
if ( ( ($t1) = $oitems{$oldid} =~ /BILL ID:([^\0]*)/ ) &&
( ($t2) = $nitems{$newid} =~ /BILL ID:([^\0]*)/ ) &&
($t1 eq $t2)) {
print "Match: unchanged using OLD $oldid\n";
print OUT $oitems{$oldid} ;
$current = $oldid ;
} else {
print "Match: using NEW $newid \n";
if ($nitems{$newid} =~ /SUMMARY:/) { #new has summary, toss old
print OUT $nitems{$newid} ;
$current = $newid ;
print "NEW has summary\n";
} else {
# snarf summary from old - note multi-line weirdness
if (($status) = $oitems{$oldid} =~ /(^SUMMARY:\n[^\0]*)^STATUS:/) {
# and insert into new - that was easy...
substr($nitems{$newid},index($nitems{$newid},"STATUS:\n"),0) = $status ;
}
printf OUT $nitems{$newid} ;
$current = $newid ;
print "OLD has summary\n";
}
}
# this was a match, shift both lists to next item
$oldid=shift(@olist);
$newid=shift(@nlist);
next;
}
# not a match, use alphabetically first item
if ($oldid lt $newid ) {
print "using OLD $oldid \n";
print OUT $oitems{$oldid} ;
$current = $oldid ;
$oldid = shift(@olist);
next;
}
# newid must be > oldid
print OUT $nitems{$newid} ;
$current = $newid ;
print "using NEW $newid \n";
$newid = shift(@nlist);
next;
}
# one of the arrays is empty - write remaining part of other array
if ($oldid) {
print OUT $oitems{$oldid} ;
print "using OLD $oldid \n";
foreach $oldid (@olist) {
print OUT $oitems{$oldid} ;
print "using OLD $oldid \n";
}
}
if ($newid) {
print OUT $nitems{$newid} ;
print "using NEW $newid \n";
foreach $newid (@nlist) {
print OUT $nitems{$newid} ;
print "using NEW $newid \n";
}
}
undef %nitems; #left over from trying to pin down a memory leak
undef %oitems; # in an old version of perl
%nitems=();
%oitems=();
$havestate="";
$haveitem=0;
}

Richard L. Goerwitz

Dec 5, 1990, 1:03:00 AM
In article <1990Dec04....@chinet.chi.il.us>

l...@chinet.chi.il.us (Leslie Mikesell) writes:
>
>>My program reads two files of differing format which are sorted by a unique
>>5-character label. When two labels match, a new record is written, with
>>info from file A (moved around a bit) written to the "left" and info from
>>file B (again, reformatted a little) written to the "right". Where a
>>given record from file A or B has no counterpart, the same new record is
>>written, with blanks on the "side" without counterpart information.
>
>Perl is the language of choice for this kind of thing but it may still
>turn out to be non-trivial....

I like reading this newsgroup, but this sort of statement comes up all too
often. Perl is not the only language around that is optimized for file,
string, and symbol processing, which has associative arrays, and handles
sorting and printing elegantly. If you can't think of any examples
offhand then mail me, and I'll be glad to provide you with a few. This is not
to say that we should not use perl. It is to say simply that it's a bit
outlandish to call it "*the* language of choice" for tasks like the one
described above.

-Richard (go...@sophist.uchicago.edu)

Chip Salzenberg

Dec 6, 1990, 12:09:27 PM
According to go...@quads.uchicago.edu (Richard L. Goerwitz):

>Perl is not the only language around that is optimized for file,
>string, and symbol processing, which has associative arrays, and handles
>sorting and printing elegantly. If you can't think of any examples
>offhand then mail me, and I'll be glad to provide you with a few.

Come now, Richard. If you criticize in public, you must put up your
facts in public. Name these other languages. Oh yes, and please
include availability and cost information.
--
Chip Salzenberg at Teltronics/TCT <ch...@tct.uucp>, <uunet!pdn!tct!chip>
"I'm really sorry I feel this need to insult some people..."
-- John F. Haugh II (He thinks HE'S sorry?)

Stephane Payrard

Dec 7, 1990, 4:04:10 PM

I agree very much with Chip; if you know a better tool than perl, you
should not leave us in the dark.

I am curious to know which tools are better for the dirty tasks
which involve string pattern-matching, some non-trivial processing, and
some system calls, with a big file (say .1 to .5 MB) as the main input,
in a reasonable amount of time (say less than a minute). Surely not nawk.

I am sure that nawk, sed, ex (or any tool, or combination of tools, that
comes with a standard Unix distribution) would never let me write
the kind of programs I have written with perl:
-it does not have the functionality offered by perl
-it does not have the performance of perl
-it offers no direct access to the OS (system calls)
I don't pretend that perl is an answer to every problem, but it is
certainly the best I know of for the class of programs I defined in the
first paragraph of this mail.


The idea of combining basic tools using pipes, backquotes or whatever
is a UNIX myth propagated by most of the UNIX books. Each time I have
tried to do a non-trivial task this way, it turned out to be almost
impossible for a simple-minded guy like me ;-). Each command/shell has
a different set of metacharacters; this makes the combination of
these atomic tools very tricky ("How many backquotes should I put before
this character?"). Moreover, none of these tools comes with a
debugger. And even if they did, it would not be very useful if your
script is a complicated combination of those tools.

I am sure that when the perl book comes out, it will be easier to
learn perl than to acquire the UNIX expertise necessary to use and
combine the UNIX atomic tools and shells (grep, sed, wc, awk, sh,
csh, expr...).

I am confident that, someday, someone will come up with a program that
allows perl to be used as an extensible interactive shell; this will
relegate sh and csh to the rank of historically interesting tools.

In the meantime you need the UNIX expertise, because the perl
documentation constantly refers to the UNIX documentation.

An extreme example of what can be done with perl:

I have written a 600-line program in perl which deals with a
PostScript file generated by FrameMaker; it allows me to preview and
interactively browse the corresponding document (using NeWS/TNT); it
extracts information to build menus; one of the menus allows me to go
directly to any chapter of the documentation, and keyboard accelerators
go from the current page to the next/previous one. This program
is able to browse a .5 MB file, generate a data file (used for
subsequent runs) and pop up the browser window in about 30 seconds
on a Sun 4/110, assuming the TNT toolkit is already loaded.
Subsequent runs pop up the window in 5-6 seconds.

Don't ask for this program: it uses a not-yet-released version of TNT
and makes many assumptions about the browsed file.

I am quite sure Larry never intended perl to be used to write
simple windowed tools, but with perl/TNT, it fits the bill.

It is quite exciting to use NeWS with perl because NeWS is an
interpreter as well, so perl can generate on the fly the NeWS code
which deals with the windowed part of the tool. I prefer not to
imagine a program such as the one I described written with whatever X
toolkit and Display PostScript. Fooey.


In fact, perl is so powerful that I am very much tempted to write
stuff I should write in C. And I will write even more in perl if Larry
comes up some day with an equivalent of the C structs, because pack()
and unpack() are a horrible kludge. I don't know how Larry
could fit such an extension into the language syntactically and
semantically.

stef

--
Stephane Payrard -- st...@eng.sun.com -- (415) 336 3726
SMI 2550 Garcia Avenue M/S 10-09 Mountain View CA 94043



Richard L. Goerwitz

Dec 7, 1990, 9:07:06 PM
In article <STEF.90D...@zweig.exodus> st...@eng.sun.com writes:
>>
>> According to go...@quads.uchicago.edu (Richard L. Goerwitz):
>> >Perl is not the only language around that is optimized for file,
>> >string, and symbol processing, which has associative arrays, and handles
>> >sorting and printing elegantly. If you can't think of any examples
>> >offhand then mail me, and I'll be glad to provide you with a few.
>>
>> Come now, Richard. If you criticize in public, you must put up your
>> facts in public. Name these other languages. Oh yes, and please
>> include availability and cost information.
>
>I agree very much with Chip; if you know a better tool than perl, you
>should not let us in the dark.

I think everyone is getting the wrong impression. When I posted, I had
just read a description of a very specific problem. I then read a
response in which someone declared perl uniquely able to handle it. While
in some cases this is true, it was not true in the case I had just read
about. The point was not that there were other tools out there which could
replace perl, but rather that certain features found in perl (e.g. good
string handling facilities, associative arrays, and what not) were by no
means unique, and that for problems which required such facilities, perl
was by no means a unique tool.

I fully expect that once perl stabilizes, and the documentation begins
to become readily accessible, it will become widely installed, and will
become the tool of choice for most tasks now whipped together using a
bunch of heterogeneous tools, and glued in place with /bin/sh. Perl is
filling a very important niche.

Please continue perling!

-Richard

Richard L. Goerwitz

Dec 7, 1990, 3:34:12 AM
In article <275E7B...@tct.uucp> ch...@tct.uucp (Chip Salzenberg) writes:
>According to go...@quads.uchicago.edu (Richard L. Goerwitz):
>>Perl is not the only language around that is optimized for file,
>>string, and symbol processing, which has associative arrays, and handles
>>sorting and printing elegantly. If you can't think of any examples
>>offhand then mail me, and I'll be glad to provide you with a few.
>
>Come now, Richard. If you criticize in public, you must put up your
>facts in public. Name these other languages. Oh yes, and please
>include availability and cost information.

Please, not one of those "come now" responses :-). I felt it completely
inappropriate to go into language comparisons here. I assumed that most
readers would know of other alternatives, and that my posting would serve
merely as a reminder not to get too outlandish in our claims about perl.
If you are going to press me, I'll gladly offer you a brief response
regarding alternatives I had in mind:

In the case mentioned, I didn't see how perl offered distinct advantages
over nawk. Nawk has most of the traits mentioned above, and is much more
widely available. If you would like a good example of a language that has
all of the characteristics noted above, then I'd suggest you look at Icon.
Icon is a general purpose programming language with Snobolish string-
handling capabilities, automatic type conversions, associative arrays,
and so on. Icon would have been just as easily applied to the particular
problem at hand as perl.

As for cost and availability, Icon is supported by government grant, and
has traditionally been PD. Despite its PD status, Icon is available
through a very fine distribution system, and the software itself is much
less buggy and much more stable than perl's. Icon is fully documented
in _The Icon Programming Language_ by Griswold & Griswold (2nd ed.;
Prentice Hall). It is available, not only for Unix, MS-DOS, and the Mac (as
in perl's case), but is also available for VM/CMS, Ultrix, MVS/XA, VMS,
Mach, AEGIS, OS/2, Amiga DOS, and probably others I haven't thought of.
You can ftp it from many sites, probably the most accessible being
cs.arizona.edu.

This posting is not intended as an argument against using perl, by the
way.

-Richard (go...@sophist.uchicago.edu)

Rich Kaul

Dec 7, 1990, 12:06:53 PM
In article <1990Dec7.0...@midway.uchicago.edu> go...@quads.uchicago.edu (Richard L. Goerwitz) writes:
>In the case mentioned, I didn't see how perl offered distinct advantages
>over nawk. Nawk has most of the traits mentioned above, and is much more
>widely available.

I would argue that nawk is not nearly as available as perl. There are
quite a few installed machines in which the old awk is all that is
available. Even today nawk is not nearly as common as most awk users
would like, since few manufacturers ship it -- if you depend on nawk,
it's best to carry a copy of gawk with you. If you have to carry a
copy of the sources to your tools with you, I'd take perl over awk
most any time. Perl has all the options you can ever use and then
some ;-).

-rich
--
Rich Kaul | It wouldn't be research if we
ka...@icarus.eng.ohio-state.edu | knew what we were doing.

Leslie Mikesell

Dec 9, 1990, 12:23:53 AM
In article <1990Dec8.0...@midway.uchicago.edu> go...@ellis.uchicago.edu (Richard L. Goerwitz) writes:
>The point was not that there were other tools out there which could
>replace perl, but rather that certain features found in perl (e.g. good
>string handling facilities, associative arrays, and what not) were by no
>means unique, and that for problems which required such facilities, perl
>was by no means a unique tool.

Ok, sticking to the text handling features relating to the original question,
there may be other languages that would easily sort text by keys. But there
was also a mention of needing to manipulate it when a match occurred.
Does anything else let you do those wonderful combination test, assign
and regexp extract like perl's:

if (($got1,$got2,$got3) = ($var =~ /(pattern1) (pattern2) (pattern3)/)) {
    ... do whatever you want with $got1 etc.
}

Or handle multi-line regexps like this piece from the example I posted
where it takes everything between a SUMMARY: line and STATUS: line
in one item and inserts it before the STATUS: in an update which lacks
the SUMMARY information?

local ($*) = 1 ; # multi-line match needed

[...]


# snarf summary from old - note multi-line

if (($status) = $oitems{$oldid} =~ /(^SUMMARY:\n[^\0]*)^STATUS:/) {
# and insert into new

substr($nitems{$newid},index($nitems{$newid},"STATUS:\n"),0) = $status ;
}

Yes, you could loop over the lines (or characters) explicitly, but why?

Les Mikesell
l...@chinet.chi.il.us

Dan Bernstein

Dec 9, 1990, 3:40:51 PM
Here are the three biggest things I can't really do in Perl but can do
with (some) other UNIX tools:

1. Compile some large subset of the language to portable C code.

2. Pass descriptors back and forth between programs. This is hellishly
useful for combining programs in different languages, for passing
messages securely, and for minimizing the overhead of a modular resource
controller. Practically every system in existence has some mechanism for
descriptor passing, but Perl doesn't standardize it.

3. Use signal-schedule (aka non-preemptive) threads. In various
languages I can schedule threads to execute when the program receives a
``signal''---including signals such as ``descriptor 2 is writable,''
``we have just taken control of resource x,'' etc. This makes coroutines
and multithreaded programs a joy rather than a pain to write. Different
kinds of signals are available under different UNIX variants, but Perl
could certainly standardize the basic mechanism.

If Perl had these features, my objections about portability, efficiency,
and interoperability would almost disappear.

---Dan

Richard L. Goerwitz

Dec 9, 1990, 2:32:05 PM
In article <1990Dec09.0...@chinet.chi.il.us>

l...@chinet.chi.il.us (Leslie Mikesell) writes:
>
>Ok, sticking to the text handling features relating to the original question,
>there may be other languages that would easily sort text by keys. But there
>was also a mention of needing to manipulate it when a match occured.
>Does anything else let you do those wonderful combination test, assign
>and regexp extract like perl's:
>
>if (($got1,$got2,$got3) = ($var =~ /(pattern1) (pattern2) (pattern3)/)) {
>    ... do whatever you want with $got1 etc.
>}
>
>Or handle multi-line regexps like this piece from the example I posted
>where it takes everything between a SUMMARY: line and STATUS: line
>in one item and inserts it before the STATUS: in an update which lacks
>the SUMMARY information?
>
>local ($*) = 1 ; # multi-line match needed
>[...]
># snarf summary from old - note multi-line
>if (($status) = $oitems{$oldid} =~ /(^SUMMARY:\n[^\0]*)^STATUS:/) {
># and insert into new
>substr($nitems{$newid},index($nitems{$newid},"STATUS:\n"),0) = $status ;
>}

Again, it's not the string processing tools that make perl unique.
It's the combination of tools and their particularly facile integration
with the operating system that make perl unique.

The regexp stuff you mention above is peanuts in languages like Snobol
and Icon. In fact, regular expressions are felt, by Snobol and Icon
programmers, to be insufficiently powerful for the sorts of things they
do. Multi-line matches, non-regular languages, and other bits of
trickery are the bread and butter of languages like Snobol and Icon.
Note, though, that to do the things you mention above takes more space
in at least Icon than perl - that is, if you restrict yourself to
patterns that can be recognized using a deterministic finite state
automaton. And for this restricted pattern-type, perl will probably run
faster than Icon and Snobol (but what about Spitbol?). There are ups
and downs to everything.

I guess what I'm saying is that statements like the one I'm responding
to above indicate that people really don't know about the grand old
tradition of nonnumeric processing we see in systems like COMIT (ee
gads), SNOBOL4, Spitbol, Icon, and offshoots like awk, nawk, and now
languages which incorporate elements from these, like perl. I really
never wanted to get into any argument here. I've never taken a
course from a computer science department in my life (I'm currently
finishing up a PhD in Near Eastern Languages), and I feel out of my
element. When people started taking me to task for saying that perl
wasn't uniquely suited to sorting, hashing, and matching tasks, I guess
I felt I had to say something.

As I've said before, perl is a neat tool, and if it had no usefulness,
I would not be here.

Keep on perling!

-Richard (go...@sophist.uchicago.edu)

Tom Christiansen

Dec 10, 1990, 9:43:54 PM
In article <9592:Dec920:40:51...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:
>2. Pass descriptors back and forth between programs. This is hellishly
>useful for combining programs in different languages, for passing
>messages securely, and for minimizing the overhead of a modular resource
>controller. Practically every system in existence has some mechanism for
>descriptor passing, but Perl doesn't standardize it.

I'm not sure what you want here. It's pretty easy in perl to
connect processes through a file descriptor:

if (open(HANDLE, "|-")) {
    # parent code writes to HANDLE
} else {
    # child code just reads from STDIN per usual
}

or else:

if (open(HANDLE, "-|")) {
    # parent code reads from HANDLE
} else {
    # child code just writes to STDOUT per usual
}

(I know -- I didn't check that open returned undefined.)
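For completeness, the missing check looks like this: open(HANDLE, "|-") returns undef when the fork fails, 0 in the child, and the child's pid in the parent. (A sketch; the temp file is an assumption of mine, there only so the child's work is visible to the parent afterwards.)

```perl
# Parent writes down the pipe; child reads STDIN and records what it got.
$out = "/tmp/pipe_demo.$$";           # set before the fork, so both sides agree
$pid = open(HANDLE, "|-");
die "Can't fork: $!" unless defined $pid;
if ($pid) {                           # parent
    print HANDLE "hello\n";
    close(HANDLE);                    # waits for the child to finish
} else {                              # child
    open(OUTF, ">$out") || die "child: $!";
    while (<STDIN>) { print OUTF "child got: $_"; }
    close(OUTF);
    exit 0;
}
```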

You can also play more elaborate games using explicit pipe() calls.
For unrelated processes, you're going to have to use named pipes
or sockets. How does C offer a more standard mechanism for
passing descriptors which Perl can't use?

--tom
--
Tom Christiansen tch...@convex.com convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
to look at yourself in the mirror the next morning." -me

Tom Christiansen

Dec 11, 1990, 1:51:33 AM
>1. Compile some large subset of the language to portable C code.

We usually say "well, but not evals of course." I've a suspicion
that this rules out a lot of code. For example, a user mailed
me recently with a problem that had a quick eval answer, and I'm
thinking that saying "no evals in compiled code" really shuts out
a large subset of the language. Here's the problem:

I'm still working with the problem I was attempting to describe to you
last night. It involves a simple search and replace, but the
delimiting of the search string will vary, and I would like the new
string to maintain the same variable delimiters. The strings can be
internally delimited by a combination of underscores, spaces and
newlines, and externally by newlines and commas. I want to replace
with the same delimiters. For example:

search string: my_search_string new string: this_is_it

may be matched by: replace should be:
- ------------------ ------------------
my search string this is it

my_search_string this_is_it

my_search this_is
string it

my this
search string is it

and so on.

I know there must be some straightforward way to do it, but so far
I have not figured it out. I've got the general one word case, and
fixed number of words, but not a variable number solution.


The code he was trying to use was this:

###########################################################################
#!/usr/bin/perl
#
# gl - global replace for variable format strings
$#ARGV == 3 || die "Invalid no. of arguments";
($infile, $outfile, $oldexp, $newexp) = @ARGV;
@old = split(/[ _]/,$oldexp);
@new = split(/[ _]/,$newexp);
open(in,"$infile") || die "Can't open $infile: $!";
open(out,">$outfile") || die "Can't open $outfile: $!";
$foo = <in>;
while ( <in> ) {
    $foo .= $_;
}
#First pass, single line to single line
# new expression may contain underscores
if (!$#old) {
# The following searches for the label making sure it begins
# and ends with a space, comma or newline and replaces the
# label and whatever separators it found around it.

$foo =~ s/(\d\n|,\n|,)([ ]*)$oldexp([ ]*)(,|\n,|\n\d)/\1\2$newexp\3\4/g;
print "Finished, output in $outfile.\n";
}
# Multi-line to multi line, equal size
# Need to parameterize for any size
if ($#old) {
$test = $foo;
$foo =~ s/(\d\n|,\n|,)([ ]*)$old[0]([ _\n])$old[1]([ _\n])$old[2]([ ]*)(,|\n,|\n\d)/\1\2$new[0]\3$new[1]\4$new[2]\5\6/g;
print "Finished 2, output in $outfile.\n";
}
print out $foo;
###########################################################################


Which I found to be pretty convoluted. My solution was this:

#!/usr/local/bin/perl
# sanity checks first
die "usage: $0 string1 string2 [files ...]" if @ARGV < 2;
die "unbalanced underbars"
unless ($count = $ARGV[0] =~ tr/_/_/) == ($ARGV[1] =~ tr/_/_/);
die "too many underbars" unless $count < 10;

($find = shift) =~ s/[\s_]/([\\s_]+)/g;
($repl = shift) =~ s/[\s_]/'$'.++$i/eg;
print STDERR "replacing all ``$find'' with ``$repl''\n";
undef $/;
$_ = <>;
eval "s/$find/$repl/g";
print;
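To make the trick concrete, here is roughly what those two substitutions build for the sample strings above (a sketch of the intermediate values worked out by hand, not output captured from the script):

```perl
# Sketch: what $find and $repl become for the sample arguments
# "my_search_string" and "this_is_it".
$i = 0;
($find = "my_search_string") =~ s/[\s_]/([\\s_]+)/g;
($repl = "this_is_it")       =~ s/[\s_]/'$'.++$i/eg;
# $find is now: my([\s_]+)search([\s_]+)string
# $repl is now: this$1is$2it

$_ = "my search\nstring";
eval "s/$find/$repl/g";
# $_ is now "this is\nit": each captured delimiter is reused in place.
```

The point is that each separator in the search string becomes a capturing group, and the corresponding position in the replacement becomes a backreference, so whatever delimiter actually matched is carried across.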


Notice that I've used not one but two evals in this little program.
Of course, this is too short to bother wanting to compile (unless
someone has other motivations than speed for compilations), but I
think it illustrates the problem: evals are just too darn convenient.
I don't really want to think about how I might do that if I couldn't
have an eval, but I don't know how to compile it with one either.

Harald Fuchs

Dec 11, 1990, 11:37:05 PM
tch...@convex.COM (Tom Christiansen) writes:
>For unrelated processes, you're going to have to use named pipes
>or sockets.
Named pipes? Hmm... how about adding the mknod system call to perl?
--

Harald Fuchs <fu...@it.uka.de> <fuchs%it.u...@relay.cs.net> ...
<fu...@telematik.informatik.uni-karlsruhe.dbp.de> *gulp*

Richard L. Goerwitz

Dec 12, 1990, 1:40:15 AM
In article <1990Dec12.0...@NCoast.ORG>
all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:

>On the other hand, I want to learn Icon. Show me something along the lines of
>the Perl manpage and I'll see what I can accomplish.

Brief overviews of Icon can be ftp'd from a number of sites, the best being
cs.arizona.edu. Cd to icon/ and grab technical reports 90-1, 90-2, and 90-6.
Don't expect Icon to fill perl's shoes. It's not a good system
administration language. It occupies a different niche.

-Richard

Peter Phillips

Dec 12, 1990, 1:45:30 AM
In article <110...@convex.convex.com> tch...@convex.COM (Tom Christiansen) writes:
>In article <9592:Dec920:40:51...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>1. Compile some large subset of the language to portable C code.
>
>We usually say "well, but not evals of course." I have a suspicion
>that this rules out a lot of code. For example, a user mailed me
>recently with a problem that had a quick eval answer, and I'm
>thinking that saying "no evals in compiled code" really shrinks
>that "large subset" of the language. Here's the problem:

[ string replacing problem omitted ]

>Notice that I've used not one but two evals in this little program.
>Of course, this is too short to bother wanting to compile (unless
>someone has other motivations than speed for compilations), but I
>think it illustrates the problem: evals are just too darn convenient.
>I don't really want to think about how I might do that if I couldn't
>have an eval, but I don't know how to compile it with one either.

For some perl scripts, eval is indispensable. The debugger wouldn't
work without it. For other scripts, eval can be replaced by less
powerful operations. Eval is often used to get at the regular
expression compiler built into perl. If perl had a regular expression
variable and a regular expression compile function, code fragments
like:

eval "s/$find/$repl/g";

could be replaced with the translatable-to-C code version:

$pat1 = &compile_pattern($find);
$pat2 = &compile_pattern($repl);
s/$pat1/$repl/g;

Something like this could be added to perl, I think.
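As it happens, later Perls grew exactly this feature in the form of the qr// operator, which compiles a pattern once into a first-class value. A sketch (qr// is a Perl 5 construct and postdates this thread; it is shown only as the eventual shape of the "pattern variable" idea):

```perl
# Sketch: qr// realizes the "regular expression variable" suggested
# above.  The pattern string is assumed user input; it is compiled
# once, with no string eval involved.
my $find = 'my([\s_]+)search([\s_]+)string';
my $pat  = qr/$find/;          # compiled pattern as a value
my $text = "my search string";
if ($text =~ $pat) {
    print "matched; delimiters were '$1' and '$2'\n";
}
```

The replacement side of the original problem still needs an extra evaluation level (s///ee or eval), so qr// covers the pattern half of the eval uses, not all of them.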

There are other common uses for eval, like simulating references.
I think with the right modifications, most uses of eval could be
eliminated. Perhaps the greatest and wisest perl hackers should
get together, examine their scripts which use eval, and decide
what reasonable extensions to perl would eliminate 90% of the
use for eval.

--
Peter Phillips, pphi...@cs.ubc.ca | "It's worse than that ... He has
{alberta,uunet}!ubc-cs!pphillip | no brain." -- McCoy, "Spock's Brain"

Brandon S. Allbery KB8JRR

Dec 11, 1990, 8:02:03 PM
As quoted from <110...@convex.convex.com> by tch...@convex.COM (Tom Christiansen):
+---------------

| In article <9592:Dec920:40:51...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:
| >2. Pass descriptors back and forth between programs. This is hellishly
|
| I'm not sure what you want here. It's pretty easy in perl to
| connect processes through a file descriptor:
+---------------

I think he means ioctl(streamfd, I_SENDFD, fd) or the socket equivalent.

Problem is, I use plenty of machines that *don't* support it. This is about
as portable as that alarm() replacement that uses setitimer... less so, in
fact, as SVR3 with Streams support has I_SENDFD.

++Brandon
--
Me: Brandon S. Allbery VHF/UHF: KB8JRR on 220, 2m, 440
Internet: all...@NCoast.ORG Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery Delphi: ALLBERY

Brandon S. Allbery KB8JRR

Dec 11, 1990, 7:56:36 PM
As quoted from <1990Dec8.0...@midway.uchicago.edu> by go...@ellis.uchicago.edu (Richard L. Goerwitz):
+---------------

| I fully expect that once perl stabilizes, and the documentation begins
| to become readily accessible, it will become widely installed, and will
+---------------

I daresay Perl is more widely installed than Icon. And more widely installed
than nawk.

As far as documentation goes --- the Perl manpage was enough to get me going
in Perl. I have yet to find Griswold & Griswold locally, and I'm not in a
position to order it from Prentice-Hall; the Icon interpreter sits, compiled
but unused, on my machine waiting for me to learn enough Icon to try to use
it. Additionally, the number of Perl examples in the distribution is more
than enough to get one started even without the manual. (I say "started", not
"fully knowledgeable"... but I can't even get started from the Icon examples.)

On the other hand, I want to learn Icon. Show me something along the lines of
the Perl manpage and I'll see what I can accomplish.

++Brandon

Tom Christiansen

Dec 12, 1990, 9:02:01 AM
In article <1990Dec12.0...@cs.ubc.ca> pphi...@cs.ubc.ca (Peter Phillips) writes:

:There are other common uses for eval, like simulating references.

:I think with the right modifications, most uses of eval could be
:eliminated. Perhaps the greatest and wisest perl hackers should
:get together, examine their scripts which use eval, and decide
:what reasonable extensions to perl would eliminate 90% of the
:use for eval.

Yes, although I think in many cases you can use the *foo notation,
and it will be faster, too. I hope that wouldn't be barred as
well, as it's far too useful.

Two other reasons for using eval are for dynamic formats and
for the creatures that h2ph creates, although as I show in h2pl,
these can often be reduced.

Plus don't forget that s///e counts as an eval also.

Let's keep a list here. I also suspect that there'll be a fair number
of perl hackers at USENIX next month. More at the end than at the
beginning if I have anything to do with it. :-)

Tom Christiansen

Dec 12, 1990, 2:12:16 PM
In article <fuchs.660976625@t500m0> fu...@it.uka.de (Harald Fuchs)
quotes me:
:>For unrelated processes, you're going to have to use named pipes

:>or sockets.
:Named pipes? Hmm... how about adding the mknod system call to perl?

Is there some reason why you can't use system("/etc/mknod ...")?
I recall that Larry doesn't like to add things that will only
be called once; it doesn't buy much time, and it really isn't
a bad thing to use other UNIX tools from within Perl.

"Ah," you say, "but I want to rewrite MAKEDEV in perl." Ok, then use
syscall and SYS_mknod instead. It doesn't seem too hard to me.

$SYS_mknod = 14;    # should really have gotten from the "right" place
$S_FIFO = 010000;   # ditto

syscall($SYS_mknod, "rendezvous", $S_FIFO|0666, 0);
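For comparison, later Perls bundle a portable spelling of the same thing: the POSIX module's mkfifo, which avoids hard-coding the syscall number and mode bits. A sketch (POSIX::mkfifo postdates this thread; the path is hypothetical):

```perl
# Sketch: the same rendezvous FIFO via POSIX::mkfifo, available in
# later Perls.  No syscall numbers need to be guessed.
use POSIX qw(mkfifo);

my $path = "/tmp/rendezvous_$$";    # hypothetical rendezvous point
mkfifo($path, 0666) or die "mkfifo $path: $!";
print "fifo ready at $path\n" if -p $path;
unlink $path;                       # tidy up for this demo
```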

Paul O'Neill

Dec 12, 1990, 10:43:36 PM
In article <110...@convex.convex.com> tch...@convex.COM (Tom Christiansen) writes:
> ..............
> eval "s/$find/$repl/g";
> ................
>

Gee, I've always glossed over this eval stuff. Now that I'm paying attention
I'm befuddled. Why is the eval needed, Tom?

Why does the substitution half-work without the eval? The $find is parsed
and found, but the $repl gets shoved in literally. I just hate it when I
don't have a model that will predict code's behavior and have to "just try
it" to see what it does.

>Notice that I've used not one but two evals in this little program.

Boy, I am dense. Where's the other one?

Thanks.


Paul O'Neill p...@oce.orst.edu DoD 000006
Coastal Imaging Lab
OSU--Oceanography
Corvallis, OR 97331 503-737-3251

Tom Neff

Dec 13, 1990, 3:08:43 AM
I won't be at USENIX but here are my thoughts on compiled Perl:

1. Even with limited functionality it would be a godsend.

2. For many of us, it would be enough to be able to make fast-loadable
"Perl object files," i.e., write all data structures to disk after
compilation & before execution. The resulting "compiled scripts"
would run faster because the parsing pass would be eliminated.
Especially wonderful with large scripts!

3. A lot of the really troublesome 'eval' examples are hacks for the
purpose of coaxing a little faster performance out of the interpreter.
Presumably in exchange for the inherent speed of a compiled script
you could give some of that up.

4. If the Perl 'eval' compiler were put into a shared library, compiled
scripts could run and have access to a single, reentrant copy of the
evaluator if they need it. Scripts themselves could stay small.

--
Anthrax Rampant in Kirghizia: Oo*oO Tom Neff
Izvestia Comment -- TASS * *O* * tn...@bfmny0.BFM.COM

Dale Worley

Dec 12, 1990, 4:46:23 PM

X-Name: Brandon S. Allbery KB8JRR

I have yet to find Griswold & Griswold locally, and I'm not in a
position to order it from Prentice-Hall;

Most bookstores will special order books.

Dale

Dale Worley Compass, Inc. wor...@compass.com
--
The workers ceased to be afraid of the bosses. It's as if they suddenly
threw off their chains. -- a Soviet journalist, about the Donruss coal strike

Len Weisberg

Dec 12, 1990, 3:43:58 PM
Peter Phillips writes:
> For some perl scripts, eval is indispensable. The debugger wouldn't
> work without it. For other scripts, eval can be replaced by less
> powerful operations. Eval is often used to get at the regular
> expression compiler built into perl.
> ... <some supporting details omitted> ...

> There are other common uses for eval, like simulating references.
> I think with the right modifications, most uses of eval could be
> eliminated. Perhaps the greatest and wisest perl hackers should
> get together, examine their scripts which use eval, and decide
> what reasonable extensions to perl would eliminate 90% of the
> use for eval.

Hear, hear!! My opinion exactly!!
Sorry for taking up bandwidth with this, but Peter has said it so well,
I just wanted to underline it.
I think the development outlined here would be a tremendous boost to
the usability and the use of perl.

- Len Weisberg - HP Corp Computing & Services - weis...@corp.HP.COM

Tom Neff

Dec 14, 1990, 3:59:01 AM
In article <15591:Dec1323:30:24...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:

>In article <9372...@bfmny0.BFM.COM> tn...@bfmny0.BFM.COM (Tom Neff) writes:
>> 2. For many of us, it would be enough to be able to make fast-loadable
>> "Perl object files," i.e., write all data structures to disk after
>> compilation & before execution.
>
>Supposedly perl -u does that, but it doesn't work on many systems.

Perl -u is supposed to undump your core image to create a SELF CONTAINED,
executable program. Where this does work, the result is HUGE, bigger
than Perl itself (by definition). What I want is to store JUST the
compiled script data, suitable for immediate interpretation by the
regular Perl program. The results should be quite small, and you save
the parsing pass later on.

I think 'checkpointing' would be a good way to go if the results stored
compactly... haven't seen Dan's invention yet, maybe that qualifies.

--
"We plan absentee ownership. I'll stick to `o' Tom Neff
building ships." -- George Steinbrenner, 1973 o"o tn...@bfmny0.BFM.COM

Dan Bernstein

Dec 13, 1990, 5:50:09 PM
In article <110...@convex.convex.com> tch...@convex.COM (Tom Christiansen) writes:
> In article <9592:Dec920:40:51...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >1. Compile some large subset of the language to portable C code.
> We usually say "well, but not evals of course."

Even without evals this would make Perl a lot more useful. Of course,
half the advantage disappears if the Perl-in-C library isn't freely
redistributable---but at least that, unlike the entire language, can be
rewritten in pieces. The other half of the advantage stays in any case:
no parsing time, single executable, easy hand optimization, easy use of
fast calculation.

And there's no reason an eval can't be compiled. ``It's too much work to
stick the compiler into the library!'' you say. Well, most evals in
practice are just fixed operations applied to variable string arguments.
There's no reason your example couldn't be compiled into fixed code---
the only parsing left after compilation would be the regexp parsing.

---Dan

Dan Bernstein

Dec 13, 1990, 5:59:40 PM
In article <1990Dec12.0...@NCoast.ORG> all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
> As quoted from <110...@convex.convex.com> by tch...@convex.COM (Tom Christiansen):
> | In article <9592:Dec920:40:51...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:
> | >2. Pass descriptors back and forth between programs. This is hellishly
> | I'm not sure what you want here. It's pretty easy in perl to
> | connect processes through a file descriptor:
> I think he means ioctl(streamfd, I_SENDFD, fd) or the socket equivalent.

Yes. I seem to have this huge pile of secure resource managers, all of
which create a descriptor pointing to a secure resource, then fork off a
child process with access to that descriptor. In the latest program I
tried an option for passing the descriptor up to another process, which
would take control. You can't imagine how much better life would be if
there were a standard protocol and library routine for this job.

Add non-preemptive threads to this message-passing language, and it
would finally be conceivable that UNIX system resources be implemented
in---and used by---Perl.

> Problem is, I use plenty of machines that *don't* support it. This is about
> as portable as that alarm() replacement that uses setitimer... less so, in
> fact, as SVR3 with Streams support has I_SENDFD.

Uh, other way around? SVR3 with Streams does indeed have I_SENDFD, which
is why descriptor passing *is* so portable.

---Dan

Dan Bernstein

Dec 13, 1990, 6:01:34 PM
In article <1990Dec12.0...@cs.ubc.ca> pphi...@cs.ubc.ca (Peter Phillips) writes:
> For some perl scripts, eval is indispensable. The debugger wouldn't
> work without it.

I imagine that the debugger would remain one of the advantages of the
interpreted language.

> Perhaps the greatest and wisest perl hackers should
> get together, examine their scripts which use eval, and decide
> what reasonable extensions to perl would eliminate 90% of the
> use for eval.

This is a good idea for any language.

---Dan

Dan Bernstein

Dec 13, 1990, 6:30:24 PM
In article <9372...@bfmny0.BFM.COM> tn...@bfmny0.BFM.COM (Tom Neff) writes:
> 2. For many of us, it would be enough to be able to make fast-loadable
> "Perl object files," i.e., write all data structures to disk after
> compilation & before execution.

Supposedly perl -u does that, but it doesn't work on many systems. As an
alternative I might suggest that you try to work my pmckpt checkpointer
into Perl. pmckpt 0.95 (which I just made available for anonymous ftp
from stealth.acf.nyu.edu) has been reported to work on (gasp) System V
machines, as well as my native environment. Both Larry and Tom seemed
slightly interested in the code a few weeks ago, but appear to have
abandoned it (sigh).

The reason pmckpt is so portable, btw, is that it doesn't use setjmp()
or longjmp(). Guess what it uses instead...

---Dan

Tom Christiansen

Dec 13, 1990, 3:59:28 PM
From the keyboard of p...@sapphire.OCE.ORST.EDU (Paul O'Neill):
:In article <110...@convex.convex.com> tch...@convex.COM (Tom Christiansen) writes:
:> eval "s/$find/$repl/g";
:Gee, I've always glossed over this eval stuff. Now that I'm paying attention

:I'm befuddled. Why is the eval needed, Tom?
:Why does the substitution half-work without the eval? The $find is parsed
:and found, but the $repl gets shoved in literally. I just hate it when I
:don't have a model that will predict code's behavior and have to "just try
:it" to see what it does.

Because perl only does one level of evaluation. If you want
more, you have to ask for it. There are $1 and $2 references
inside of $repl.

:>Notice that I've used not one but two evals in this little program.


:Boy, I am dense. Where's the other one?

It's hidden in the substitution that creates $repl:

($repl = shift) =~ s/[\s_]/'$'.++$i/eg;
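The one-level rule is easy to see in a four-line experiment (a sketch, using the same shape of $repl as the script above):

```perl
# Sketch of the one-level-of-evaluation rule described above.
$repl = 'this$1is$2it';

$_ = "my search string";
s/my([\s_]+)search([\s_]+)string/$repl/;
# One level: $repl expands to its value, but the $1 and $2 inside
# that value stay literal -- $_ is now 'this$1is$2it'.

$_ = "my search string";
eval 's/my([\s_]+)search([\s_]+)string/' . $repl . '/';
# Two levels: the value of $repl becomes program text, so its $1
# and $2 are interpolated in turn -- $_ is now "this is it".
```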

--tom

Thomas Gee

Dec 14, 1990, 4:19:26 PM
In article <1990Dec12.0...@cs.ubc.ca> pphi...@cs.ubc.ca (Peter Phillips) writes:
>In article <110...@convex.convex.com> tch...@convex.COM (Tom Christiansen) writes:
>>In article <9592:Dec920:40:51...@kramden.acf.nyu.edu> brn...@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>>1. Compile some large subset of the language to portable C code.
>
>For some perl scripts, eval is indispensable.

A related point on perl compilation. If I am correct, perl "compiles" the
input code to another internal representation, and interprets the result. This
results in a significant pause at invocation before the program (ie perl script)
begins executing.

Would it be possible to save the internal representation to which the script
is translated and feed that directly into the interpreter? I have at least
one system that uses a "vast" number of perl scripts which execute in
sequence, and the overhead for the initial translation is noticeable and
non-trivial.

I believe this suggestion did come up in the last "where's my perl compiler"
flood, but was never addressed.

Thanks,
Tom.


-------------------------------------------------------------------------------
Thomas Gee |
Aerospace Group | a man in search of a quote
DCIEM, DND |
Canada | g...@dretor.dciem.dnd.ca
-------------------------------------------------------------------------------

Brandon S. Allbery KB8JRR

Dec 13, 1990, 5:52:07 PM
As quoted from <110...@convex.convex.com> by tch...@convex.COM (Tom Christiansen):
+---------------

| In article <fuchs.660976625@t500m0> fu...@it.uka.de (Harald Fuchs)
| quotes me:
| :>For unrelated processes, you're going to have to use named pipes
| :>or sockets.
| :Named pipes? Hmm... how about adding the mknod system call to perl?
|
| Is there some reason why you can't use system("/etc/mknod ...")?
+---------------

You've done it again, Tom.

System V Release 2 and earlier had a stupid /etc/mknod that assumed that only
root was permitted to run it, even though non-root is allowed to make FIFOs.
(SVR2 and earlier: A/UX, AIX, 3B1 UNIX, etc.) It did *not* let mknod() fail;
it did *not* have root-only modes; it complained if geteuid() was not 0. Sigh.

+---------------


| "Ah," you say, "but I want to rewrite MAKEDEV in perl." Ok, then use
| syscall and SYS_mknod instead then. It doesn't seem too hard to me.

+---------------

Again --- show me syscall() for System V.

The original message was concerned with portability. BSD-specific responses
aren't portable....

Brandon S. Allbery KB8JRR

Dec 15, 1990, 11:10:34 AM
As quoted from <15024:Dec1322:59:40...@kramden.acf.nyu.edu> by brn...@kramden.acf.nyu.edu (Dan Bernstein):
+---------------
| Uh, other way around? SVR3 with Streams does indeed have I_SENDFD, which
| is why descriptor passing *is* so portable.
+---------------

Oops. Mental typo. Yeah, but the machine I use most often doesn't have
Streams (we have the add-on package but have yet to install it because the
network board we want to use it with is so unreliable...).

Non-preemptive multithreading: yesterday at work, I laid out a nonpreemptive
thread system of sorts. It's not particularly easy to rewrite something big
like Perl to use the implementation I came up with, but it's there. (I have
some fairly bizarre convolutions between a 4GL and a Prolog interpreter at
work to get a job done --- bizarre it may be, but it runs 20x faster than the
4GL-only version. The threading is for the interface to the Prolog, so if
necessary I can have more than one running.)

Brandon S. Allbery KB8JRR

Dec 15, 1990, 11:19:11 AM
As quoted from <9372...@bfmny0.BFM.COM> by tn...@bfmny0.BFM.COM (Tom Neff):
+---------------

| 2. For many of us, it would be enough to be able to make fast-loadable
| "Perl object files," i.e., write all data structures to disk after
| compilation & before execution. The resulting "compiled scripts"
| would run faster because the parsing pass would be eliminated.
| Especially wonderful with large scripts!
+---------------

I mentioned this to Larry once; he pointed out that Perl's internal structures
aren't particularly easy to save/restore in a portable way. Of course, it
might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
nonportable.

+---------------


| 4. If the Perl 'eval' compiler were put into a shared library, compiled
| scripts could run and have access to a single, reentrant copy of the
| evaluator if they need it. Scripts themselves could stay small.

+---------------

...and shared libraries are another nonportable feature. Not to mention that
I have yet to make any sense out of the SVR3 version. (Of course, that may
simply be *my* problem, not a problem with the shared library implementation.)

I may look into compiling a *subset* of Perl. It wouldn't accept everything,
and it might not treat everything the same as the interpreter does (i.e. "do"
would be treated as an include request... although most uses of this are now
subsumed by "require"), but the speed increase would probably be worth the
loss in functionality, as you say. Of course, I need to find time to do this
(grrr!).

Dan Bernstein

Dec 17, 1990, 4:14:54 AM
In article <1990Dec15.1...@NCoast.ORG> all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
> I mentioned this to Larry once; he pointed out that Perl's internal structures
> aren't particularly easy to save/restore in a portable way. Of course, it
> might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
> nonportable.

I wrote pmckpt exactly to prove that a checkpointer *can* be portable.
pmckpt assumes all the basic UNIX process structure. It doesn't make any
allowances for systems that don't conform (except that it automatically
figures out which way your stack grows). Yet people have reported pmckpt
working on several System V variants, as well as BSD. How much more
portable can you get?

---Dan

Leslie Mikesell

Dec 17, 1990, 2:23:46 PM
In article <1990Dec15.1...@NCoast.ORG> all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <9372...@bfmny0.BFM.COM> by tn...@bfmny0.BFM.COM (Tom Neff):
>+---------------
>| 2. For many of us, it would be enough to be able to make fast-loadable
>| "Perl object files," i.e., write all data structures to disk after
>| compilation & before execution. The resulting "compiled scripts"
>| would run faster because the parsing pass would be eliminated.
>| Especially wonderful with large scripts!
>+---------------

>I mentioned this to Larry once; he pointed out that Perl's internal structures
>aren't particularly easy to save/restore in a portable way. Of course, it
>might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
>nonportable.

A reasonable solution is to not require the saved copy to be portable or
even explicitly saved. Instead, add a statement and/or command line
option to specify a directory to cache the parsed output allowing
the usual expansions of ~/, $HOME, etc. to give a choice between saving
in a public-writable directory or making a private copy for each user.
Then, if the directory exists and some quick checks establish that
the cached copy was written later than the script on a machine with
the same variable types, the parsing pass could be skipped. Otherwise
a parsed copy would be saved in that directory (if permissions allow)
for the new run to use. I think this would be a big help on machines
with slow disks and demand paged executables since it would likely
avoid the need to page in a lot of the perl program that would otherwise
be needed for the compile pass. It might chew up some disk space,
but probably nowhere near the extent that perl -u does, and this way
you still get the advantage of shared text when multiple copies of perl
are running.
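The freshness check described here is only a couple of stat calls; a sketch in Perl itself (paths and the cache layout are invented for illustration):

```perl
# Sketch of the cache-validity test: use the cached parse only when it
# exists and is newer than the script.  A real version would also record
# the perl version and machine type inside the cache file, as suggested.
sub cache_is_fresh {
    my ($script, $cached) = @_;
    return 0 unless -e $cached;
    return (stat($cached))[9] > (stat($script))[9];   # compare mtimes
}

# Hypothetical usage:
# if (cache_is_fresh("myprog.pl", "$ENV{HOME}/.plcache/myprog.plc")) {
#     ...skip the parsing pass and load the cached copy...
# }
```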

Les Mikesell
l...@chinet.chi.il.us

Tom Neff

Dec 17, 1990, 7:51:14 PM
In article <1990Dec15.1...@NCoast.ORG> all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <9372...@bfmny0.BFM.COM> by tn...@bfmny0.BFM.COM (Tom Neff):
>| 2. For many of us, it would be enough to be able to make fast-loadable
>| "Perl object files," i.e., write all data structures to disk after
>| compilation & before execution. The resulting "compiled scripts"
>| would run faster because the parsing pass would be eliminated.
>| Especially wonderful with large scripts!
>
>I mentioned this to Larry once; he pointed out that Perl's internal structures
>aren't particularly easy to save/restore in a portable way. Of course, it
^^^^^^^^

>might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
>nonportable.

Is portability the issue here? This would be a proposed speed optimization
for individual sites. Precompiled scripts would not be inherently
portable across disparate OS's or machine architectures; but neither are
today's UNDUMP executables! Also, precompiled scripts might not be
portable across major Perl versions even on the same platform; but it
would be fairly straightforward to record the version number at the
beginning of the precompiled script file, so that Perl could check for
incompatibilities before beginning execution.
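Recording and checking such a stamp would take only a handful of lines; a sketch (the magic string, version, and file layout are invented for illustration, no real dump format is implied):

```perl
# Sketch of the version-stamp check proposed above.  "PERLOBJ" and the
# one-line header layout are made up for this example.
$MAGIC   = "PERLOBJ";
$VERSION = "3.0.44";     # hypothetical interpreter version string

# First line of a precompiled script file would read: "PERLOBJ 3.0.44"
sub header_ok {
    my ($line) = @_;
    my ($magic, $ver) = split ' ', $line;
    return defined($ver) && $magic eq $MAGIC && $ver eq $VERSION;
}
```

Perl would refuse the file (or fall back to reparsing the source) whenever header_ok fails.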


--
"DO NOT, repeat, DO NOT blow the hatch!" /)\ Tom Neff
"Roger....hatch blown!" \(/ tn...@bfmny0.BFM.COM

Brandon S. Allbery KB8JRR

Dec 19, 1990, 1:33:17 PM
As quoted from <1243...@bfmny0.BFM.COM> by tn...@bfmny0.BFM.COM (Tom Neff):
+---------------

| In article <1990Dec15.1...@NCoast.ORG> all...@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
| >aren't particularly easy to save/restore in a portable way. Of course, it
| ^^^^^^^^
+---------------

"Portable" may not be the word. I have used systems where this will fail
because a different execution of a program has a few things at different
addresses, so just restoring the data and bss from a file leaves pointers
dangling. (Consider that stdio is already initialized by the time the data
and bss are loaded.)

Felix Lee

Dec 21, 1990, 4:43:38 AM
Everyone seems to be giving up too easily. I'm nearly convinced that
Perl can be effectively compiled. I've decided to attempt a Perl to
Scheme compiler in my copious spare time (tm). Don't hold your
breath.
--
Felix Lee fl...@cs.psu.edu