The split opcode

James deBoer

Dec 10, 2004, 1:34:03 PM
to perl6-i...@perl.org
Currently, the split opcode is declared as 'split(out PMC, in STR, in
STR)' where $2 is a regex.

PGE, however, currently supports three types of regular expressions, and
more are likely going to be added. So, which type of regular expression
should split use?

Perl6's split function will likely use slightly different regular
expressions than TCL's split function or Python's. Picking any one
regular expression syntax (e.g. Perl6's) will force the other languages
to reimplement split's functionality.

A solution:

Declare split as 'split(out PMC, in PMC, in STR)' where $2 would be a
compiled PGE::Match object. This lets you pick what kind of regular
expression you want to use.

An example using Perl6's regular expressions:
    .local string pattern
    .local string input
    .local pmc rulesub
    .local pmc array
    pattern = "[\n ]"                    # pattern to compile
    rulesub = p6rule_compile(pattern)    # compile it to rulesub
    input = "I held out my arm\nbut she laughed"
    split array, rulesub, input          # array will be a list of words in input
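
The idea can also be sketched outside Parrot. The Python below is only an illustration, not PGE or the real opcode: `Matcher`, `compile_rule` and `split` are hypothetical names, and Python's `re` module stands in for whichever rule compiler a language would supply. The point is that split only needs "find the next match from this offset", so it does not care which regex dialect produced the matcher.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a compiled rule: given the text and a start
# offset, return the (start, end) span of the next match, or None.
Matcher = Callable[[str, int], Optional[Tuple[int, int]]]


def compile_rule(pattern: str) -> Matcher:
    """Wrap Python's re module as one possible matcher (an assumption, not PGE)."""
    compiled = re.compile(pattern)

    def find(text: str, pos: int) -> Optional[Tuple[int, int]]:
        m = compiled.search(text, pos)
        return m.span() if m else None

    return find


def split(matcher: Matcher, text: str) -> List[str]:
    """Split text on every non-empty match reported by matcher."""
    pieces: List[str] = []
    pos = 0   # start of the piece currently being collected
    scan = 0  # offset at which the next search begins
    while scan <= len(text):
        span = matcher(text, scan)
        if span is None:
            break
        start, end = span
        if end == start:
            # Ignore empty matches and move on, so the loop always advances.
            scan = start + 1
            continue
        pieces.append(text[pos:start])
        pos = scan = end
    pieces.append(text[pos:])
    return pieces


words = split(compile_rule("[\n ]"), "I held out my arm\nbut she laughed")
```

A Tcl or Python front end could pass a matcher built from its own dialect to the same `split` without reimplementing the splitting loop.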


Comments? Is there a simpler solution? Am I making a problem out of nothing?

- James


Patrick R. Michaud

Dec 10, 2004, 2:25:46 PM
to James deBoer, perl6-i...@perl.org
On Fri, Dec 10, 2004 at 01:34:03PM -0500, James deBoer wrote:
> Currently, the split opcode is declared as 'split(out PMC, in STR, in
> STR)' where $2 is a regex.
>
> PGE, however, currently supports three types of regular expressions, and
> more are likely going to be added. So, which type of regular expression
> should split use?
> [...]

> A solution:
>
> Declare split as 'split(out PMC, in PMC, in STR)' where $2 would be a
> compiled PGE::Match object. This lets you pick what kind of regular
> expression you want to use.

Slight correction: Thus far a "PGE::Match object" is the result of
performing a match between a rule and target string, not the compiled
form of the rule. At present a rule is just a subroutine that returns
PGE::Match objects. Eventually we may have a PGE::Rule class for
representing compiled rule objects, but we're not there yet. So, $2
would need to be a rule subroutine.

Going beyond that, we might want to just have a "split" method for
PGE::Rule objects, and leave the split opcode to do fast separation
of strings based on constant strings. But I'm not entirely familiar
with Parrot's opcode/MMD semantics so I'll follow others' leads on this
one...

Pm

James deBoer

Dec 10, 2004, 4:38:01 PM
to Patrick R. Michaud, perl6-i...@perl.org

Patrick R. Michaud wrote:

I would even go further than that and say that if we went with
PGE::Rule's "split", the split opcode should be obsoleted. I can't think
of a place where splitting on constant strings is not a special case of
splitting on a regular expression. Evaluating a very simple regular
expression (i.e. a constant string) should be fast enough that it is not
worth the effort to determine if a pattern can be sent through the split
opcode instead of PGE::Rule."split"().

However, using a split opcode that accepts a match subroutine has the
advantage that the PGE is not strictly required. It would be possible to
write your own subroutines if speed or code size were issues or if you
had some other crazy requirements.
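
To illustrate the "write your own subroutine" case, here is a rough Python sketch in which the match routine is a plain substring search and no regex engine is involved. The names `constant_matcher` and `split_with` are made up for this example, and the loop assumes the matcher never reports an empty match.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical matcher signature: (text, offset) -> (start, end) or None.
Matcher = Callable[[str, int], Optional[Tuple[int, int]]]


def constant_matcher(separator: str) -> Matcher:
    """Build a matcher for a constant string using plain substring search."""
    if not separator:
        raise ValueError("separator must not be empty")

    def find(text: str, pos: int) -> Optional[Tuple[int, int]]:
        start = text.find(separator, pos)
        return None if start < 0 else (start, start + len(separator))

    return find


def split_with(matcher: Matcher, text: str) -> List[str]:
    """Split text on each match; assumes matches are never empty."""
    pieces: List[str] = []
    pos = 0
    span = matcher(text, pos)
    while span is not None:
        start, end = span
        pieces.append(text[pos:start])
        pos = end
        span = matcher(text, pos)
    pieces.append(text[pos:])
    return pieces


fields = split_with(constant_matcher(", "), "a, b, c")
```

The same `split_with` would accept a regex-backed matcher, which is the sense in which constant-string splitting is a special case of the general one.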

This raises the question: How far do we want to let the PGE into our
everyday lives?

- James

Leopold Toetsch

Dec 10, 2004, 10:37:44 PM
to James deBoer, perl6-i...@perl.org
James deBoer <ja...@huronbox.com> wrote:

> I would even go further than that and say that if we went with
> PGE::Rule's "split", the split opcode should be obsoleted.

All these function/method-like opcodes will be refactored at some point.

WRT split (you write):

    PGE::Rule."split"()

in general

    $P0."split"(...)

where $P0 is a namespace or object that "can split". For a bit
more performance a user could do:

    cl = getclass "String"
    cl."split"(...)

assuming that the current split on strings moves to the String class.

> This raises the question: How far do we want to let the PGE into our
> everyday lives?

PGE is a Parrot core feature available and usable for all languages
using the Parrot engine.

> - James

leo

James deBoer

Dec 11, 2004, 9:36:32 AM
to l...@toetsch.at, perl6-i...@perl.org
Leopold Toetsch wrote:

>James deBoer <ja...@huronbox.com> wrote:
>
>
>
>>I would even go further than that and say that if we went with
>>PGE::Rule's "split", the split opcode should be obsoleted.
>>
>>
>
>All these function/method-like opcodes will be refactored at some point.
>
>WRT split (you write):
>
> PGE::Rule."split"()
>
>in general
>
> $P0."split"(...)
>
>where $P0 is a namespace or object that "can split". For some bits of
>more performance a user could do:
>
> cl = getclass "String"
> cl."split"(...)
>
>assuming that the current split on strings moves to the String class.
>
>

Ok. If we are moving things like split into objects at some point in the
future, should the split opcode be removed now?

(I'm guessing the answer is yes, since split is one of the opcodes
listed in your 'Too many opcodes' post of a few weeks back)

At this point the split opcode doesn't really do anything useful, and
any fixes/improvements to it would be lost when the logic is moved to
String/PerlString/PythonString/... objects.

- James

Leopold Toetsch

Dec 11, 2004, 10:29:54 AM
to James deBoer, perl6-i...@perl.org
James deBoer <ja...@huronbox.com> wrote:
> Ok. If we are moving things like split into objects at some point in the
> future, should the split opcode be removed now?

This again goes into: what's an opcode. There are two views:
- surface: i.e. what the assembler understands
- in core: what the runcore executes, that is, what the assembler did
  with the opcode

> (I'm guessing the answer is yes, since split is one of the opcodes
> listed in your 'Too many opcodes' post of a few weeks back)

So while the split opcode syntax in PASM/PIR can remain, I'm pretty sure
that the runcore will get a method call, which also means that an HLL
compiler can emit that method call directly because they are the same.

> At this point the split opcode doesn't really do anything useful, and
> any fixes/improvements to it would be lost when the logic is moved to
> String/PerlString/PythonString/... objects.

But, as the current syntax doesn't provide the needed functionality, we
could change it now into:

    Px."__split"(...) # a method call

or we change the opcode to be:

    =item B<split>(out PMC, in PMC, in STR)

where $2 is the class that "can split", e.g. String or PGE, and do a
method call inside the split opcode.
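
As a rough illustration of the second option (in Python rather than PASM/PIR, and with made-up names), the opcode becomes a thin wrapper that forwards to whatever object "can split". `CanSplit`, `ConstantSplitter`, `RegexSplitter` and `split_op` are hypothetical and only mirror the shape of the proposal; they are not Parrot classes.

```python
import re
from typing import List, Protocol


class CanSplit(Protocol):
    """Anything that offers a split method taking the input string."""

    def split(self, text: str) -> List[str]: ...


class ConstantSplitter:
    """Stand-in for a String-like class: splits on a constant separator."""

    def __init__(self, separator: str) -> None:
        self.separator = separator

    def split(self, text: str) -> List[str]:
        return text.split(self.separator)


class RegexSplitter:
    """Stand-in for a rule-like class: splits on a compiled pattern."""

    def __init__(self, pattern: str) -> None:
        self.compiled = re.compile(pattern)

    def split(self, text: str) -> List[str]:
        return self.compiled.split(text)


def split_op(splitter: CanSplit, text: str) -> List[str]:
    """The 'opcode': it carries no splitting logic, it only does the method call."""
    return splitter.split(text)


by_constant = split_op(ConstantSplitter(","), "a,b,c")
by_pattern = split_op(RegexSplitter("[\n ]"), "but she\nlaughed")
```

Because `split_op` only dispatches, a compiler could emit the method call directly and get the same result, which matches the point that the surface opcode and the method call end up being the same thing.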

> - James

leo
