Second try: Builtins

Aaron Sherman

Sep 6, 2002, 1:34:56 AM

to Perl6 Language List

This is still a monolith, but it's getting better. It's now stored in
P6C/Builtins/CORE.p6m in my tree. More functions are coded, and I now
differentiate between the functions that need external support (e.g.
POSIX/libc functions) and those that just need to be written (e.g.
sort).

I think I've covered all of the comments (other than breaking up the
file and making it part of the compilation process, which I'll work on
this weekend, and then submit this as a patch to p6i).

Answers to any of the questions that I've marked with "XXX" will be much
appreciated. I'm out of town for the weekend, but will be back and
catching up on mail Sunday night.

CORE.p6m

Nicholas Clark

Sep 6, 2002, 9:29:28 AM

to Aaron Sherman, Perl6 Language List

On Fri, Sep 06, 2002 at 01:34:56AM -0400, Aaron Sherman wrote:

> # INTERNAL q, qq, qw
> # XXX - how do I do quote-like operators? I know I saw someone say...
> # Need to do: qr (NEVER("qr")) and qx

presumably the way the perl5 tokeniser does them - by parsing the string
into a series of concatenated constants and variables, with some optionally
fed through uc/ucfirst/lc/lcfirst/quotemeta
(And scalar and list interpolators breaking back out to the real parser)
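
A minimal C sketch of that "concatenated constants and variables" idea, for illustration only; it is not how the perl5 tokeniser or P6C is actually written. The hypothetical `split_interp` handles only simple `$name` scalars and ignores escapes, `@`/`%` sigils and the uc/lc/quotemeta modifiers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One piece of an interpolated string: literal text or a scalar name. */
typedef enum { SEG_CONST, SEG_VAR } seg_kind;

typedef struct {
    seg_kind kind;
    const char *start; /* points into the source string, not NUL-terminated */
    size_t len;
} segment;

/* True if p points at '$' followed by the start of an identifier. */
static int at_var(const char *p)
{
    return p[0] == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_');
}

/* Hypothetical helper: split s into constant and $variable segments.
 * Returns the number of segments written (at most max). */
size_t split_interp(const char *s, segment *out, size_t max)
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (*s && n < max) {
        const char *p = s;
        if (at_var(s)) {
            p = s + 1;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            out[n].kind = SEG_VAR;
            out[n].start = s + 1;              /* name without the sigil */
            out[n].len = (size_t)(p - s - 1);
        } else {
            while (*p && !at_var(p))           /* literal text up to next $name */
                p++;
            out[n].kind = SEG_CONST;
            out[n].start = s;
            out[n].len = (size_t)(p - s);
        }
        n++;
        s = p;
    }
    return n;
}
```

For `"Hello $name!"` this yields three segments (`Hello `, `name`, `!`), which a code generator would then emit as a concatenation.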

> sub chomp($string is rw){
> my $irs = ${"/"}; # XXX What is $/ now?

per file handle. So does that mean each string needs a property to hold what
the record separator was for the file handle it was read from, at the time of
reading?
(Well, as record separators could be regexps^Wpatterns, actually I think
an offset to the start of the record separator will do.)
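
The offset idea might look something like this C sketch: each string read from a filehandle records where its separator began, so chomp never needs to know what the separator was. The `record` type and both functions are hypothetical, and a fixed-string separator stands in for a pattern (a pattern-based reader would store the match start instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a string that remembers where its record separator began. */
typedef struct {
    char buf[256];
    size_t len;
    size_t sep_offset;   /* == len when no separator was read */
} record;

/* Hypothetical reader step: store src and note the offset of the
 * separator sep if src ends with it. */
void record_set(record *r, const char *src, const char *sep)
{
    size_t n = strlen(src), sl = strlen(sep);
    assert(n < sizeof r->buf);
    memcpy(r->buf, src, n + 1);
    r->len = n;
    r->sep_offset = (sl > 0 && n >= sl && strcmp(src + n - sl, sep) == 0)
                        ? n - sl : n;
}

/* chomp: drop whatever the separator was; return chars removed. */
size_t record_chomp(record *r)
{
    size_t removed = r->len - r->sep_offset;
    r->buf[r->sep_offset] = '\0';
    r->len = r->sep_offset;
    return removed;
}
```

A second chomp on the same record removes nothing, because the offset now equals the length.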

> sub index($string, $substr, int $pos //= 0) {
> # XXX - slow dumb way... need to break out Knuth
> my $sl = $substr.length;
> for(my $i = $pos; $i+$sl <= $string.length; $i++) {
> return $i if substr($string,$i,$sl) eq $substr;
> }
> return -1;
> }
> sub rindex($string, $substr, $pos //= 0) {
> # XXX - slow dumb way
> my $sl = $substr.length;
> for(my $i = $string.length-$sl; $i >= $pos; $i--) {
> return $i if substr($string,$i,$sl) eq $substr;
> }
> return -1;
> }
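
One thing worth flagging in the draft above: in Perl 5, rindex's optional POSITION is an upper bound (it returns the last occurrence beginning at or before POSITION), while the draft treats `$pos` as a lower bound and defaults it to 0. A minimal C sketch of the Perl 5 semantics, using a naive scan over bytes and the hypothetical name `perl_rindex`:

```c
#include <assert.h>
#include <string.h>

/* Naive rindex with Perl 5 semantics: return the start of the last
 * occurrence of sub in str that begins at or before pos, or -1.
 * pos is clamped to [0, strlen(str)]; pass the string length (or any
 * larger value) to search the whole string. */
long perl_rindex(const char *str, const char *sub, long pos)
{
    assert(str != NULL && sub != NULL);
    long len = (long)strlen(str), sl = (long)strlen(sub);
    if (pos < 0) pos = 0;
    if (pos > len) pos = len;
    if (sl > len) return -1;
    /* start no later than pos, and no later than the last full fit */
    for (long i = (pos < len - sl) ? pos : len - sl; i >= 0; i--)
        if (memcmp(str + i, sub, (size_t)sl) == 0)
            return i;
    return -1;
}
```

So `perl_rindex("hello world hello", "hello", 11)` returns 0, as Perl 5's rindex does with the same arguments.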

I think that string-in-string searches are common functionality that ought
to be implemented in the parrot core, rather than every language and
extension that needs them having to reinvent the wheel.
(Also, I confess that I'm not up to date on following the parrot source, so
I don't know if the PMC v-tables for strings contain entries for these.
That would allow index to run more quickly where both strings had the same
(or compatible) encodings.)
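
If "break out Knuth" means Knuth-Morris-Pratt (an assumption), an index along these lines is the sort of thing that could live in the core. This is a C sketch over raw bytes only, with no encoding awareness; `kmp_index` is a hypothetical name, and its empty-substring and out-of-range POSITION handling is modelled on Perl 5's index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* index() via Knuth-Morris-Pratt: O(n + m) rather than the O(n * m)
 * worst case of the naive loop. Returns the first match starting at or
 * after pos, or -1. Byte strings only. */
long kmp_index(const char *str, const char *sub, long pos)
{
    assert(str != NULL && sub != NULL);
    long n = (long)strlen(str), m = (long)strlen(sub);
    if (pos < 0) pos = 0;
    if (pos > n) pos = n;
    if (m == 0) return pos;           /* empty SUBSTR matches at POSITION */
    if (m > n - pos) return -1;

    /* fail[i] = length of the longest proper prefix of sub[0..i]
     * that is also a suffix of it. */
    long *fail = malloc((size_t)m * sizeof *fail);
    if (fail == NULL) return -1;      /* sketch: treat OOM as "not found" */
    fail[0] = 0;
    for (long i = 1, k = 0; i < m; i++) {
        while (k > 0 && sub[i] != sub[k]) k = fail[k - 1];
        if (sub[i] == sub[k]) k++;
        fail[i] = k;
    }

    long result = -1;
    for (long i = pos, k = 0; i < n; i++) {
        while (k > 0 && str[i] != sub[k]) k = fail[k - 1];
        if (str[i] == sub[k]) k++;
        if (k == m) { result = i - m + 1; break; }
    }
    free(fail);
    return result;
}
```

The scan never moves backwards in `str`; after a partial match it falls back through `fail` instead of restarting one character later.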

> ############# IO
> # IO::... stuff to be moved out into IO classes:

> # sub socketpair($socket1,$socket2,$domain,$type,$protocol) { NEVER("socketpair") }

Why is socketpair never? It's a real Unix system call that provides
something that is impossible to completely fake from userspace
[connected Unix domain sockets with neither end bound to an address]
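
For reference, a small C sketch of the call in question (POSIX `socketpair`): it creates two already-connected AF_UNIX stream sockets, neither of which is ever bound to an address. The wrapper `socketpair_roundtrip` is hypothetical and exists only to show one end talking to the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a connected pair of Unix domain stream sockets, write msg to
 * one end and read it back from the other. Neither socket is bound to
 * a filesystem address. Returns bytes read, or -1 on error. */
ssize_t socketpair_roundtrip(const char *msg, char *buf, size_t bufsize)
{
    int sv[2];
    ssize_t got = -1;
    size_t len = strlen(msg);

    assert(bufsize > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (write(sv[0], msg, len) == (ssize_t)len)
        got = read(sv[1], buf, bufsize - 1);
    if (got >= 0)
        buf[got] = '\0';          /* NUL-terminate what was received */
    close(sv[0]);
    close(sv[1]);
    return got;
}
```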

Nicholas Clark

Aaron Sherman

Sep 6, 2002, 10:02:03 AM

to Nicholas Clark, Perl6 Language List

On Fri, 2002-09-06 at 09:29, Nicholas Clark wrote:
> On Fri, Sep 06, 2002 at 01:34:56AM -0400, Aaron Sherman wrote:
>
> > # INTERNAL q, qq, qw
> > # XXX - how do I do quote-like operators? I know I saw someone say...
> > # Need to do: qr (NEVER("qr")) and qx
>
> presumably the way the perl5 tokeniser does them - by parsing the string
> into a series of concatenated constants and variables, with some optionally
> fed through uc/ucfirst/lc/lcfirst/quotemeta
> (And scalar and list interpolators breaking back out to the real parser)

Ok, so I guess that all has to go in the parser, not the builtins. Perl
5 already provides builtin functions that are the back-ends for things
like C<< <> >>, C<qx>, etc., so those will be in the builtins, and the
parser can just call them. It may already do so; I've not had time to
look.

> > sub chomp($string is rw){
> > my $irs = ${"/"}; # XXX What is $/ now?
>
> per file handle. So does that mean each string needs a property to hold what
> the record separator was for the file handle it was read from, at the time of
> reading?

That's not terribly useful, since filehandles will auto-chomp in Perl 6
anyway. I propose the alternative:

sub chomp($string is rw, $sep //= rx/\n/) { ... }
sub chomp(@strings is rw, $sep //= rx/\n/) { ... }

This will mean that you can:

chomp(@lines=<>)

but not:

chomp($line1, $line2, $line3);

Is that going to be a problem?

> (Well, as record separators could be regexps^Wpatterns, actually I think
> an offset to the start of the record separator will do.)

I forgot they could be patterns. Need to go fix that!

> > sub index($string, $substr, int $pos //= 0) {
> > # XXX - slow dumb way... need to break out Knuth

[...]

> I think that string-in-string searches are common functionality that ought
> to be implemented in the parrot core, rather than every language and
> extension that needs them having to reinvent the wheel.

Good point. I will look at what parrot does now, and consider moving
index over to the Internal section.

> Why is socketpair never? It's a real Unix system call that provides

Because I was tired :)

I'm going to cut that section down to just a list of the functions that
need to be moved over to the IO modules as one long comment. It's just
there for a reminder right now.

Thanks for the comments and answers to some of my questions. The
feedback has been quite helpful!

Chuck Kulchar

Sep 7, 2002, 2:01:24 AM

to ni...@ccl4.org, a...@ajs.com, perl6-l...@perl.org

>> # INTERNAL q, qq, qw
>> # XXX - how do I do quote-like operators? I know I saw someone say...
>> # Need to do: qr (NEVER("qr")) and qx

>presumably the way the perl5 tokeniser does them - by parsing the string
>into a series of concatenated constants and variables, with some optionally
>fed through uc/ucfirst/lc/lcfirst/quotemeta
>(And scalar and list interpolators breaking back out to the real parser)

Actually, the way P6C does it is very different from the way perl5 does it; P6C matches straight through rather than finding the end and then reparsing the middle. This is because perl6 needs to handle stuff like "%var{"var"}", which wouldn't work at all in perl5. Check out P6C/Parser.pm and P6C/Tree/String.pm for more details.
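
A toy C illustration of why scanning straight through matters (this is not P6C's code, which lives in the Perl modules named above; `find_string_end` is hypothetical). Tracking braces while scanning lets the inner quotes in "%var{"var"}" be skipped as a nested literal, whereas a "find the closing quote first" pass would stop at the quote just before `var`.

```c
#include <assert.h>
#include <stddef.h>

/* Given s pointing just past an opening '"', return a pointer to the
 * matching closing '"', or NULL if the literal is unterminated.
 * Braces opened inside the string (e.g. a hash subscript) are counted,
 * and a quoted string inside braces is skipped recursively. */
const char *find_string_end(const char *s)
{
    int depth = 0;
    assert(s != NULL);
    for (; *s; s++) {
        if (*s == '\\' && s[1]) {
            s++;                          /* skip the escaped character */
        } else if (*s == '{') {
            depth++;
        } else if (*s == '}' && depth > 0) {
            depth--;
        } else if (*s == '"') {
            if (depth == 0)
                return s;                 /* closing quote of this literal */
            s = find_string_end(s + 1);   /* nested literal inside braces */
            if (s == NULL)
                return NULL;
        }
    }
    return NULL;
}
```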

Also, how do these perl6 builtins in perl6 work with the current P6C/Builtins.pm? (also, why are some that are already defined in pure pasm/part of the parrot core redefined as perl6 code?)

Joseph F. Ryan
ryan...@osu.edu

Sean O'Rourke

Sep 7, 2002, 10:53:30 AM

to Chuck Kulchar, ni...@ccl4.org, a...@ajs.com, perl6-l...@perl.org

On Sat, 7 Sep 2002, Chuck Kulchar wrote:
> Also, how do these perl6 builtins in perl6 work with the current
> P6C/Builtins.pm? (also, why are some that are already defined in pure
> pasm/part of the parrot core redefined as perl6 code?)

For the moment, "they don't". Eventually, I expect there will be some
sort of a "header file" with the builtin declarations (P6C parses and
interprets function declarations for this very reason), and a .pbc file
containing their code. As for why they're written in perl 6 code, I
expect it's easier to define their semantics in Perl than in assembly.

/s

Smylers

Sep 7, 2002, 2:22:23 PM

to perl6-l...@perl.org

Aaron Sherman wrote:

> sub chomp($string is rw){
> my $irs = ${"/"}; # XXX What is $/ now?

> if defined $irs {
> if $irs.isa(Object) {
> return undef;
> } elsif $irs.length == 0 {
> $string =~ s/ \n+ $ //;

Should that C<+> be there? I would expect chomp only to remove a single
line-break.

> sub reverse(@list) {
> my @r;
> my $last = @list.length - 1;
> for(my $i=$last;$i >= 0;$i--) {
> @r[$last-$i] = @list[$i];
> }
> return *@r;
> }

In a scalar context does C<reverse> still return a string with characters
reversed?

Smylers

Aaron Sherman

Sep 9, 2002, 5:09:38 PM

to Smylers, Perl6 Language List

On Sat, 2002-09-07 at 14:22, Smylers wrote:
> Aaron Sherman wrote:

> > sub chomp($string is rw){
[...]

> > } elsif $irs.length == 0 {
> > $string =~ s/ \n+ $ //;
>
> Should that C<+> be there? I would expect chomp only to remove a single
> line-break.

Note that this is in paragraph (i.e. C<$/=''>) mode....

> > sub reverse(@list) {
> > my @r;
> > my $last = @list.length - 1;
> > for(my $i=$last;$i >= 0;$i--) {
> > @r[$last-$i] = @list[$i];
> > }
> > return *@r;
> > }
>
> In a scalar context does C<reverse> still return a string with characters
> reversed?

Yes, but that would be:

sub reverse($string) {
return join '', reverse([split //, $string]);
}

This example is too inefficient for real use, but it does demonstrate the point.
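
For comparison, the usual efficient approach is an in-place swap from both ends, with no intermediate list of characters. A C sketch (`reverse_in_place` is a hypothetical name); it assumes one byte per character, whereas a real builtin would have to reverse by characters in the string's encoding.

```c
#include <assert.h>
#include <string.h>

/* Reverse a NUL-terminated byte string in place by swapping from both
 * ends: O(n) time, O(1) extra space. Assumes single-byte characters. */
char *reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 2)
        return s;
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
    return s;
}
```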

--
Aaron Sherman <a...@ajs.com>
http://www.ajs.com/~ajs

Aaron Sherman

Sep 9, 2002, 5:36:42 PM

to Sean O'Rourke, Perl6 Language List

Correct as far as it goes. The more general answer is that one of the
goals of this rewrite (as I was led to believe) was that the Perl
internals would be maintainable. If we write the well over 150 Perl 5
builtins in Parrot assembly, I think we can kiss that wish goodbye.

Some of this will have to be done in assembly, but hopefully only a very
small and modular core (e.g. my earlier proposal on how to handle pack,
sprintf and chr). The rest will be the subject of increasingly powerful
optimizations that the compiler will have to perform for user code
anyway. Ultimately I would hope that the only builtins that will be
represented 100% in assembly will be those that have a 1-to-1 mapping in
the parrot instruction set (e.g. scalar).

BTW: Current status is that I'm preparing to make some changes to the
compiler tonight. After that, I'll be ready to issue a patch against the
current tree. Over the weekend I focused on getting all of the builtins
to compile cleanly and I implemented a few other small pieces. We now
have a sprintf that can handle C<'%d'> and C<'%s'> along with some
simple modifiers, so C<printf("%02d%% of % 6s\n")> should work.
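
A rough C sketch of a sprintf subset at about that level: `%%`, `%d` and `%s`, the `0`, `-` and space flags, and a width. `mini_format` is hypothetical; it delegates integer-to-text conversion to the C library's snprintf as a stand-in for the low-level conversion primitive, and only applies zero padding to `%d`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append len bytes of src; write only while there is room for the final
 * NUL, but keep counting so the caller learns the full length. */
static void put(char *out, size_t size, size_t *pos, const char *src, size_t len)
{
    for (; len > 0; len--, src++, (*pos)++)
        if (*pos + 1 < size)
            out[*pos] = *src;
}

/* Hypothetical mini-sprintf. Returns the untruncated length, like snprintf. */
size_t mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;
    assert(out != NULL && size > 0);
    va_start(ap, fmt);
    while (*fmt) {
        if (*fmt != '%' || fmt[1] == '%') {
            put(out, size, &pos, fmt, 1);        /* literal, or '%' of "%%" */
            fmt += (*fmt == '%') ? 2 : 1;
            continue;
        }
        fmt++;                                   /* skip '%' */
        int zero = 0, left = 0, space = 0;
        size_t width = 0;
        for (;; fmt++) {
            if (*fmt == '0') zero = 1;
            else if (*fmt == '-') left = 1;
            else if (*fmt == ' ') space = 1;
            else break;
        }
        while (*fmt >= '0' && *fmt <= '9')
            width = width * 10 + (size_t)(*fmt++ - '0');
        char conv = *fmt, num[32];
        const char *body;
        if (conv == 'd') {
            int v = va_arg(ap, int);
            if (space && v >= 0) snprintf(num, sizeof num, " %d", v);
            else snprintf(num, sizeof num, "%d", v);
            body = num;
        } else if (conv == 's') {
            body = va_arg(ap, const char *);
        } else {
            break;                               /* unsupported: stop here */
        }
        fmt++;
        size_t blen = strlen(body);
        size_t pad = width > blen ? width - blen : 0;
        int zpad = zero && !left && conv == 'd'; /* zeros only for %d */
        if (zpad && (body[0] == '-' || body[0] == ' ')) {
            put(out, size, &pos, body, 1);       /* sign precedes the zeros */
            body++;
            blen--;
        }
        for (; !left && pad > 0; pad--)
            put(out, size, &pos, zpad ? "0" : " ", 1);
        put(out, size, &pos, body, blen);
        for (; left && pad > 0; pad--)
            put(out, size, &pos, " ", 1);
    }
    va_end(ap);
    out[pos < size ? pos : size - 1] = '\0';
    return pos;
}
```

With the format from the message, `mini_format(buf, sizeof buf, "%02d%% of % 6s\n", 7, "total")` produces `"07% of  total\n"`; the space flag is simply ignored for `%s` in this sketch.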

I'm making heavy use of C<given>, in the assumption that it will make
the code easy to optimize.

Nicholas Clark

Sep 9, 2002, 5:52:10 PM

to Aaron Sherman, Sean O'Rourke, Perl6 Language List

On Mon, Sep 09, 2002 at 05:36:42PM -0400, Aaron Sherman wrote:
> Correct as far as it goes. The more general answer is that one of the
> goals of this rewrite (as I was led to believe) was that the Perl
> internals would be maintainable. If we write the well over 150 Perl 5
> builtins in Parrot assembly, I think we can kiss that wish goodbye.

This may sound a bit arm wavy, but when I'm off messing up the core of
perl5, I don't find the perl5 ops are the maintenance problem.
Most of the op functions are quite small (partly due to the use of macros)
but they are all nicely self contained. (And all in 6 (4 before 5.8) source
files, out of a total of 36 source files)

The writhing mass of yuck comes from the interaction of the bits in the
various utility functions that they call in the other 26 or so files.
Plus the 2 files of the regexp engine, and the 2 files of the parser which
I attempt to avoid lest I go insane.

Hence I don't think that writing the perl builtins in parrot assembly
(or at least the majority that really need to go really fast) would be
a maintenance nightmare. That said, being able to write them in perl, with
an inliner and optimiser good enough to produce results better than calling
out to general purpose parrot assembler, would be nice.

Although my biased opinion is that it's probably best to write the perl builtins
as tidy C code rather than parrot assembler. But I know C better.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

Aaron Sherman

Sep 9, 2002, 10:10:12 PM

to Nicholas Clark, Perl6 Language List

On Mon, 2002-09-09 at 17:52, Nicholas Clark wrote:
> On Mon, Sep 09, 2002 at 05:36:42PM -0400, Aaron Sherman wrote:
> > Correct as far as it goes. The more general answer is that one of the
> > goals of this rewrite (as I was led to believe) was that the Perl
> > internals would be maintainable. If we write the well over 150 Perl 5
> > builtins in Parrot assembly, I think we can kiss that wish goodbye.
>
> This may sound a bit arm wavy, but when I'm off messing up the core of
> perl5, I don't find the perl5 ops are the maintenance problem.
> Most of the op functions are quite small (partly due to the use of macros)
> but they are all nicely self contained. (And all in 6 (4 before 5.8) source
> files, out of a total of 36 source files)

Keep in mind that the majority of Perl 5 builtins are of the form:

....munge parameters...
....call libc function of same name...
....munge return values....

In Perl 6 those will mostly be the same. Many of them will be moved out
to modules (e.g. the filehandle functions), but many others will remain
in the core (e.g. chdir, getppid, etc.) and simply be wrappers around the
C functions. Once the general-purpose interface to C is defined, these
functions can be implemented in a fairly short period of time.
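
As an illustration of that three-step shape, here is a C sketch of a chdir wrapper (the name `perl_chdir` is hypothetical). The parameter munging mirrors Perl 5's documented behaviour for chdir with no argument (use $ENV{HOME}, then $ENV{LOGDIR}), and the return munging turns libc's 0/-1 into true/false.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrapper following the munge / call / munge pattern:
 *   1. munge parameters: NULL means "go home", as in Perl 5's
 *      argument-less chdir ($ENV{HOME}, then $ENV{LOGDIR});
 *   2. call the libc function of the same name;
 *   3. munge the return value: 0/-1 becomes 1/0. If chdir itself
 *      failed, errno is left set for $! to report. */
int perl_chdir(const char *dir)
{
    if (dir == NULL) dir = getenv("HOME");
    if (dir == NULL) dir = getenv("LOGDIR");
    if (dir == NULL) return 0;
    return chdir(dir) == 0;
}
```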

Those that are left are internal Perl utilities that I break down into
several categories: string, math, list, internal and misc.

Of these, about 30-50% will probably be pure Perl. Another small
percentage will be assembly wrappers that call a one-for-one parrot
function (e.g. exit). The rest will be a complex mix of Perl and
assembly (e.g. sprintf which is mostly Perl, but needs assembly for
low-level type conversion).

> Although my biased opinion is that it's probably best to write the perl builtins
> as tidy C code rather than parrot assembler. But I know C better.

Yeah, that would be ideal for speed. I am willing to concede that that's
the way we'll have to go for some things, eventually. However, until we
have a pure Perl library (or as much so as we can), I don't think we'll
know where we need the speed boost most. What's more, this will force
the compiler to optimize as strongly as possible, which can only benefit
users.

Leopold Toetsch

Sep 10, 2002, 12:54:07 AM

to Aaron Sherman, Nicholas Clark, Perl6 Language List

Aaron Sherman wrote:

> Of these, about 30-50% will probably be pure Perl. Another small
> percentage will be assembly wrappers that call a one-for-one parrot
> function (e.g. exit). The rest will be a complex mix of Perl and
> assembly (e.g. sprintf which is mostly Perl, but needs assembly for
> low-level type conversion).

I'm just providing the necessary infrastructure inside imcc. The format
of the current Builtins will probably change slightly. I need the global
label (i.e. entry point) of the function for bsr fixup.

Sean did propose:

.extern sub_name _label

Plus something like (mine):

.sub sub_name {
saveall
.param PerlUndef Arg1
...
}

for PIR subs.

(Current imcc parses »ret« as end of sub, which we might change)

There is no real need to use PASM at all in the function, but imcc
(0.0.9) parses PASM instructions too.

BTW: are there any thoughts about "PackFile_FixupTable"?

leo

Smylers

Sep 10, 2002, 4:59:04 PM

to perl6-l...@perl.org

Aaron Sherman wrote:

> On Sat, 2002-09-07 at 14:22, Smylers wrote:
>
> > Should that C<+> be there? I would expect chomp only to remove a
> > single line-break.
>
> Note that this is in paragraph (i.e. C<$/=''>) mode....

Ah, yes. I quoted the wrong case above. The final branch deals with
the case when C<$/> (or equivalent) is set:

} else {
$string =~ s/<{"<[$irs]>"}>+$//;
return $0;
}

If C<$irs = "\n"> then I'd expect only a single trailing newline to be
removed, but that substitution still looks as though it'll get rid of as
many as are there.
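
Since `<[$irs]>+` as written looks like a character class repeated one or more times, it would appear both to strip a whole run of separators and to treat a multi-character separator such as `"\r\n"` as a set of characters. For comparison, a C sketch of Perl 5's chomp rules for a fixed-string separator: nothing is removed when `$/` is undef, all trailing newlines are removed in paragraph mode, and otherwise at most one copy of the separator. `chomp_sep` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* chomp with Perl 5 rules for a fixed-string separator:
 *   sep == NULL  ($/ undefined): remove nothing;
 *   sep == ""    (paragraph mode): remove all trailing newlines;
 *   otherwise:   remove at most ONE trailing copy of sep.
 * Returns the number of characters removed. */
size_t chomp_sep(char *s, const char *sep)
{
    assert(s != NULL);
    size_t len = strlen(s), removed = 0;
    if (sep == NULL)
        return 0;
    if (*sep == '\0') {
        while (len > 0 && s[len - 1] == '\n') {
            len--;
            removed++;
        }
    } else {
        size_t sl = strlen(sep);
        if (len >= sl && memcmp(s + len - sl, sep, sl) == 0) {
            len -= sl;
            removed = sl;
        }
    }
    s[len] = '\0';
    return removed;
}
```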

> > In a scalar context does C<reverse> still return a string with
> > characters reversed?
>
> Yes, but that would be:
>
> sub reverse($string) {
> return join '', reverse([split //, $string]);
> }

Perl 5's C<reverse> is sensitive to the context in which it is called
rather than the number of arguments. This is an 'element' reversal with
only one element:

$ perl -wle 'print reverse qw<abc>'

This is a 'character' reversal even though several strings have been
passed:

$ perl -wle 'print scalar reverse qw<abc def>'

So a C<reverse> with a single array parameter could be either type.
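
To make the distinction concrete, here is a C sketch of the two behaviours (function names hypothetical): list-context reverse reorders elements, while scalar-context reverse concatenates its arguments and then reverses the characters. In this model the compiler, not the argument count, would pick which one to call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* List-context reverse: reverse the order of the elements in place. */
void list_reverse(const char **items, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        const char *tmp = items[i];
        items[i] = items[n - 1 - i];
        items[n - 1 - i] = tmp;
    }
}

/* Scalar-context reverse: concatenate all elements, then reverse the
 * characters of the result (byte-wise; assumes single-byte chars).
 * out must have room for the concatenation plus a NUL. */
char *scalar_reverse(char *out, size_t size, const char **items, size_t n)
{
    size_t len = 0;
    assert(out != NULL && size > 0);
    for (size_t i = 0; i < n; i++) {
        size_t il = strlen(items[i]);
        assert(len + il < size);
        memcpy(out + len, items[i], il);
        len += il;
    }
    out[len] = '\0';
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = out[i];
        out[i] = out[len - 1 - i];
        out[len - 1 - i] = tmp;
    }
    return out;
}
```

With `{"abc"}` the list version leaves `abc` unchanged, and with `{"abc", "def"}` the scalar version yields `fedcba`, matching the two one-liners above.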

Smylers