PGE, however, currently supports three types of regular expressions, and
more are likely going to be added. So, which type of regular expression
should split use?
Perl6's split function will likely use slightly different regular
expressions than Tcl's split function or Python's. Picking any one
regular expression type (e.g. Perl6's) will force the other languages to
reimplement split's functionality.
A solution:
Declare split as 'split(out PMC, in PMC, in STR)' where $2 would be a
compiled PGE::Match object. This lets you pick what kind of regular
expression you want to use.
An example using Perl6's regular expressions:
    .local string pattern
    .local string input
    .local pmc rulesub
    .local pmc array
    pattern = "[\n ]"                    # pattern to compile
    rulesub = p6rule_compile(pattern)    # compile it to rulesub
    input = "I held out my arm\nbut she laughed"
    split array, rulesub, input          # array will be a list of words in input
Comments? Is there a simpler solution? Am I making a problem out of nothing?
- James
Slight correction: Thus far a "PGE::Match object" is the result of
performing a match between a rule and target string, not the compiled
form of the rule. At present a rule is just a subroutine that returns
PGE::Match objects. Eventually we may have a PGE::Rule class for
representing compiled rule objects, but we're not there yet. So, $2
would need to be a rule subroutine.
Going beyond that, we might want to just have a "split" method for
PGE::Rule objects, and leave the split opcode to do fast separation
of strings based on constant strings. But I'm not entirely familiar
with Parrot's opcode/MMD semantics so I'll follow others' leads on this
one...
Pm
Patrick R. Michaud wrote:
I would even go further than that and say that if we went with
PGE::Rule's "split", the split opcode should be obsoleted. I can't think
of a place where splitting on constant strings is not a special case of
splitting on a regular expression. Evaluating a very simple regular
expression (i.e. a constant string) should be fast enough that it is not
worth the effort to determine if a pattern can be sent through the split
opcode instead of PGE::Rule."split"().
However, using a split opcode that accepts a match subroutine has the
advantage that the PGE is not strictly required. It would be possible to
write your own subroutines if speed or code size were issues or if you
had some other crazy requirements.
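The idea of a split that accepts any match subroutine can be sketched outside Parrot. The following is a minimal Python illustration, not PIR and not PGE's actual interface: the `Matcher` signature, `constant_matcher`, `regex_matcher`, and `split` are all hypothetical names invented for this sketch. It assumes a matcher is simply a callable that reports the span of the next separator.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical matcher contract: given the text and a start position,
# return the (start, end) span of the next separator, or None.
Matcher = Callable[[str, int], Optional[Tuple[int, int]]]


def constant_matcher(sep: str) -> Matcher:
    """Matcher for a constant string; no regex engine involved."""
    def find(text: str, pos: int) -> Optional[Tuple[int, int]]:
        i = text.find(sep, pos)
        return None if i < 0 else (i, i + len(sep))
    return find


def regex_matcher(pattern: str) -> Matcher:
    """Matcher backed by Python's re module (standing in for a rule sub)."""
    compiled = re.compile(pattern)

    def find(text: str, pos: int) -> Optional[Tuple[int, int]]:
        m = compiled.search(text, pos)
        return None if m is None else m.span()
    return find


def split(matcher: Matcher, text: str) -> List[str]:
    """Split text on whatever the matcher reports as a separator."""
    pieces: List[str] = []
    pos = 0
    while True:
        span = matcher(text, pos)
        # Stop when nothing more matches. An empty match would never
        # advance the position, so this sketch treats it as the end too.
        if span is None or span[0] == span[1]:
            break
        start, end = span
        pieces.append(text[pos:start])
        pos = end
    pieces.append(text[pos:])
    return pieces


words = split(regex_matcher(r"[\n ]"), "I held out my arm\nbut she laughed")
fields = split(constant_matcher(", "), "a, b, c")
```

Here `split` knows nothing about which engine produced the matcher, which is the property that would let a language supply its own subroutine when PGE is not wanted.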
This raises the question: How far do we want to let the PGE into our
everyday lives?
- James
> I would even go further than that and say that if we went with
> PGE::Rule's "split", the split opcode should be obsoleted.
All these function/method-like opcodes will be refactored at some point.
WRT split (you write):
PGE::Rule."split"()
in general
$P0."split"(...)
where $P0 is a namespace or object that "can split". For a bit
more performance a user could do:
cl = getclass "String"
cl."split"(...)
assuming that the current split on strings moves to the String class.
> This raises the question: How far do we want to let the PGE into our
> everyday lives?
PGE is a Parrot core feature available and usable for all languages
using the Parrot engine.
> - James
leo
>James deBoer <ja...@huronbox.com> wrote:
>
>>I would even go further than that and say that if we went with
>>PGE::Rule's "split", the split opcode should be obsoleted.
>
>All these function/method-like opcodes will be refactored at some point.
>
>WRT split (you write):
>
> PGE::Rule."split"()
>
>in general
>
> $P0."split"(...)
>
>where $P0 is a namespace or object that "can split". For a bit
>more performance a user could do:
>
> cl = getclass "String"
> cl."split"(...)
>
>assuming that the current split on strings moves to the String class.
Ok. If we are moving things like split into objects at some point in the
future, should the split opcode be removed now?
(I'm guessing the answer is yes, since split is one of the opcodes
listed in your 'Too many opcodes' post of a few weeks back)
At this point the split opcode doesn't really do anything useful, and
any fixes/improvements to it would be lost when the logic is moved to
String/PerlString/PythonString/... objects.
- James
This again comes down to the question: what is an opcode? There are two views:
- surface: i.e. what the assembler understands
- in core: what the runcore executes, that is, what the assembler
turned the opcode into
> (I'm guessing the answer is yes, since split is one of the opcodes
> listed in your 'Too many opcodes' post of a few weeks back)
So while the split opcode syntax in PASM/PIR can remain, I'm pretty sure
that the runcore will get a method call. That also means that an HLL
compiler can emit that method call directly, because the two are the same.
> At this point the split opcode doesn't really do anything useful, and
> any fixes/improvements to it would be lost when the logic is moved to
> String/PerlString/PythonString/... objects.
But as the current syntax doesn't provide the needed functionality, we could
change it now into:
Px."__split"(...) # a method call
or we change the opcode to be:
=item B<split>(out PMC, in PMC, in STR)
where $2 is the class that "can split", e.g. String or PGE, and do a
method call inside the split opcode.
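The second option, an opcode that just forwards to a method on whatever "can split", might look roughly like this. This is a Python sketch of the dispatch only, not Parrot code: `split_op`, `StringSplitter`, and `RuleSplitter` are hypothetical stand-ins for the opcode, the String class, and a PGE rule, and Python's `re` module stands in for the rule engine.

```python
import re
from typing import List


class StringSplitter:
    """Stand-in for a String class that splits on a constant separator."""

    def __init__(self, sep: str) -> None:
        self.sep = sep

    def split(self, text: str) -> List[str]:
        return text.split(self.sep)


class RuleSplitter:
    """Stand-in for a compiled rule that splits on a pattern."""

    def __init__(self, pattern: str) -> None:
        self.rule = re.compile(pattern)

    def split(self, text: str) -> List[str]:
        return self.rule.split(text)


def split_op(splitter: object, text: str) -> List[str]:
    """Hypothetical 'opcode': look up the split method and call it."""
    method = getattr(splitter, "split", None)
    if not callable(method):
        raise TypeError("object cannot split")
    return method(text)


by_string = split_op(StringSplitter(","), "a,b,c")
by_rule = split_op(RuleSplitter(r"[\n ]"), "x y\nz")
```

Because `split_op` does nothing beyond the method call, a compiler could emit the method call directly and get the same result, which is the equivalence described above.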
> - James
leo