This makes sense when I think about what split is doing, but it is surprising at first glance. Perhaps this should be included as an example in the docs?
On Thu, 2005-05-12 at 12:22, David Storrs wrote:
> On May 12, 2005, at 11:59 AM, Autrijus Tang wrote:
> > On Thu, May 12, 2005 at 04:53:06PM +0200, "TSa (Thomas Sandlaß)" wrote:
> >> Autrijus Tang wrote:
> >>> pugs> split /(..)*/, 1234567890
> >>> ('', '12', '34', '56', '78', '90')
> >> Why the empty string match at the start?
> > I don't know, I didn't invent that! :-)
> > $ perl -le 'print join ",", split /(..)/, 123'
> > ,12,3
> This makes sense when I think about what split is doing, but it is
> surprising at first glance. Perhaps this should be included as an
> example in the docs?
perldoc -f split says:
"Splits a string into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted [...] If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). [...] Empty leading (or trailing) fields are produced when there are positive width matches at the beginning (or end) of the string [...] As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. Thus, "split(' ')" can be used to emulate awk's default behavior, whereas "split(/ /)" will give you as many null initial fields as there are leading spaces [...]"
And there you have it.
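The rules quoted from perldoc can be checked with a few lines of Perl 5. This is only a minimal sketch: the input strings and the helper `show` are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: render a list with visible quotes so that
# empty fields can be seen in the output.
sub show { return join ',', map { "'$_'" } @_ }

# The separator at the start leaves an empty leading field, which is kept;
# the empty fields after the two trailing separators are deleted.
my $edges = show(split /,/, ',a,b,,');

# The special pattern ' ' skips leading whitespace (awk's behaviour),
# while the regex / / yields one empty field per leading space.
my $awk   = show(split ' ', '  a b');
my $regex = show(split / /, '  a b');

print "$_\n" for $edges, $awk, $regex;
# prints:
# '','a','b'
# 'a','b'
# '','','a','b'
```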
-- Aaron Sherman <a...@ajs.com> Senior Systems Engineer and Toolsmith "It's the sound of a satellite saying, 'get me down!'" -Shriekback
JSD> which currently generates a list of ('','12','34','56','78','90')
JSD> In perl5 it would generate a list of ('','90') because only the last
JSD> pair of characters matched is kept (such is the nature of quantifiers
JSD> applied to capturing parens). But in perl6 quantified captures put all
JSD> of the matches into an array such that "abcdef" ~~ /(..)*/ will make
JSD> $0 = ['ab','cd','ef'].
JSD> I think that the above split should generate a list like this:
JSD> ('', [ '12','34','56','78','90'])
i disagree. if you want complex tree results, use a rule. split is for creating a single list of elements from a string. it is better to keep split simple, for it is commonly used in this domain. tree results are more for real parsing (which split is not intended to do), so use a parsing rule for that.
also note the coding style rule (i think randal created it) which is to use split when you want to throw things away (the delimiters) and m// when you want to keep things.
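A minimal Perl 5 sketch of that style rule, with made-up input strings: in split the pattern names what to discard, in m//g it names what to keep.

```perl
use strict;
use warnings;

# split: the pattern describes the delimiters, which are thrown away.
my @fields = split /,/, 'alpha,beta,gamma';

# m//g in list context: the pattern describes the pieces to keep.
my @pairs = '1234567890' =~ /(..)/g;

print join('|', @fields), "\n";   # alpha|beta|gamma
print join('|', @pairs),  "\n";   # 12|34|56|78|90
```

With m//g there are no empty fields to think about, because nothing is treated as a separator.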
uri
-- Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
> For longer strings it makes every other match an empty string.
Not quite. The matching parts are the strings "11", "22", "33", etc. And since what matches is what we're splitting on, we get the empty string between pairs of characters (including the leading empty string). The only reason you're getting the string that was matched in the output is because that's what you've asked split to do by placing parens around the pattern. (Type "perldoc -f split" at your command prompt and read all about it.)
To bring this back to perl6, autrijus' original query was regarding
pugs> split /(..)*/, 1234567890
which currently generates a list of ('','12','34','56','78','90'). In perl5 it would generate a list of ('','90') because only the last pair of characters matched is kept (such is the nature of quantifiers applied to capturing parens). But in perl6 quantified captures put all of the matches into an array such that "abcdef" ~~ /(..)*/ will make $0 = ['ab','cd','ef'].
I think that the above split should generate a list like this:
('', [ '12','34','56','78','90'])
> There are two matches each at 0, 2, 4, 6, 8 and 10.
> The empty match at the end seems to be skipped because
> position 12 is after the string?
No, the empty match at the end is skipped because that's the default behaviour of split. Preserve leading empty fields and discard empty trailing ones.
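The default can be seen by running the same split with and without a negative LIMIT. A small Perl 5 sketch; the input '1122' is chosen only for illustration.

```perl
use strict;
use warnings;

# Default: the empty field after the last separator is discarded.
my @default = split /(..)/, '1122';        # ('', '11', '', '22')

# A negative LIMIT preserves it.
my @all     = split /(..)/, '1122', -1;    # ('', '11', '', '22', '')

printf "default: %d elements, LIMIT -1: %d elements\n",
    scalar @default, scalar @all;
# prints: default: 4 elements, LIMIT -1: 5 elements
```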
> And for odd numbers of
> chars the before last position doesn't produce an empty
> match:
> perl -le 'print join ",", split /(..)/, 11223'
> ,11,,22,3
There's an empty field between the beginning of the string and "11", there's an empty field between the "11" and the "22", and finally there's a field at the end containing only "3"
Maybe, but it's because you're misunderstanding what split does (i can heartily recommend TFM in this case).
Let's start with a simpler case (inside debugger for help):
x split /../, 112233445566, -1 [ -1 to preserve all found fields ]
0  ''
1  ''
2  ''
3  ''
4  ''
5  ''
6  ''
Split uses the regular expression to find "separators" in the text, and then returns the contents of the fields between them. The above case looks like this:
     sep   sep   sep   sep   sep   sep
      |     |     |     |     |     |
      11    22    33    44    55    66
   |     |     |     |     |     |
 field field field field field field
Ok, let's try that with your second example:
x split /../, 11223, -1
0  ''
1  ''
2  3
     sep   sep
      |     |
      11    22    3
   |     |        |
 field field    field
Now, if the regular expression contains parentheses, additional list elements are created from each matching substring (quoted almost verbatim from TFM). So:
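As a sketch of that rule in plain Perl 5 (run as a script rather than in the debugger), here is the same input split with and without parentheses, keeping LIMIT at -1:

```perl
use strict;
use warnings;

my $text = '11223';

# Without parentheses only the fields come back.
my @fields    = split /../,   $text, -1;   # ('', '', '3')

# With parentheses each matched separator is added to the list
# right after the field that precedes it.
my @with_seps = split /(..)/, $text, -1;   # ('', '11', '', '22', '3')

# Print one index/value pair per line.
printf "%d  '%s'\n", $_, $with_seps[$_] for 0 .. $#with_seps;
# prints:
# 0  ''
# 1  '11'
# 2  ''
# 3  '22'
# 4  '3'
```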
And of course, if we remove the LIMIT from the equation, then any trailing empty fields will be removed. Ergo the results quoted at the top of this email. Hope this helps you (and anyone else who might have been confused) understand what is going on.
> JSD> which currently generates a list of ('','12','34','56','78','90')
> JSD> In perl5 it would generate a list of ('','90') because only the last
> JSD> pair of characters matched is kept (such is the nature of quantifiers
> JSD> applied to capturing parens). But in perl6 quantified captures put all
> JSD> of the matches into an array such that "abcdef" ~~ /(..)*/ will make
> JSD> $0 = ['ab','cd','ef'].
> JSD> I think that the above split should generate a list like this:
> JSD> ('', [ '12','34','56','78','90'])
> i disagree. if you want complex tree results, use a rule.
Well ... we *are* using a rule; it just doesn't have a name.
So, would you advocate too that
my @a = "foofoofoobarbarbar" ~~ /(foo)+ (bar)+/;
should flatten? Thus @a = ('foo','foo','foo','bar','bar','bar') rather than (['foo','foo','foo'],['bar','bar','bar']) ?
This may have even been discussed before, but we should probably make the decision whether or not to keep the delimiters depend on something other than the presence or absence of parentheses in the pattern. Perhaps the flattening/non-flattening behavior could be modulated the same way, probably as a modifier to split.
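To make the two shapes concrete, here is a sketch in Perl 5 only. It does not show Perl 6 or Pugs behaviour; the flattened and nested results are built by hand, and the anchored patterns are my own choice for the illustration.

```perl
use strict;
use warnings;

my $str = 'foofoofoobarbarbar';

# Perl 5: a quantified capture group keeps only its last repetition.
my @last = $str =~ /^(foo)+(bar)+$/;             # ('foo', 'bar')

# Flattened shape: a single list holding every repetition.
my @flat = $str =~ /(foo|bar)/g;                 # six strings

# Nested shape: capture each run whole, then break each run up.
my ($foos, $bars) = $str =~ /^((?:foo)+)((?:bar)+)$/;
my @nested = ( [ $foos =~ /foo/g ], [ $bars =~ /bar/g ] );

print scalar(@last), ' ', scalar(@flat), ' ', scalar(@{ $nested[0] }), "\n";
# prints: 2 6 3
```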
> split is for creating a single list of elements from a string. it is > better keep split simple for it is commonly used in this domain.
I'll wager that splits with non-capturing patterns are far and away the most common case. :-)
On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 12:03:55PM -0500, Jonathan Scott Duff wrote:
> : I think that the above split should generate a list like this:
> :
> : ('', [ '12','34','56','78','90'])
> Yes, though I would think of it more generally as
> ('', $0, '', $0, '', $0, ...)
> where in this case it just happens to be
> ('', $0)
> and $0 expands to ['12','34','56','78','90'] if you treat it as an array.
Exactly so. Principle of least surprise wins again! ;)
On Thu, May 12, 2005 at 02:56:37PM -0500, Jonathan Scott Duff wrote:
> On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> > Yes, though I would think of it more generally as
> > ('', $0, '', $0, '', $0, ...)
> > where in this case it just happens to be
> > ('', $0)
> > and $0 expands to ['12','34','56','78','90'] if you treat it as an array.
> Exactly so. Principle of least surprise wins again! ;)
On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:
> > On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> > > Yes, though I would think of it more generally as
> > > ('', $0, '', $0, '', $0, ...)
> > > where in this case it just happens to be
> > > ('', $0)
> > > and $0 expands to ['12','34','56','78','90'] if you treat it as an array.
Sorry if I'm getting ahead of the implementation but if it is returning $0 then shouldn't ref($0) return ::Rule::Result or somesuch? It would just look like an ::Array::Const if you treat it as such.
On Thu, May 12, 2005 at 08:33:40PM -0400, Rick Delaney wrote:
> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch? It would
> just look like an ::Array::Const if you treat it as such.
...also note that the $0 here is $/[0], also known as Perl 5's $1...
Indeed, the entire match result, that is $/, will always be a single ::Match object if a match succeeds.
On Thu, May 12, 2005 at 08:33:40PM -0400, Rick Delaney wrote:
> On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:
> > pugs> map { ref $_ } split /(..)*/, 1234567890
> > (::Str, ::Array::Const)
> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch? It would
> just look like an ::Array::Const if you treat it as such.
Er, where does this ::Rule::Result thing come from?
I was basing my implementation on Damian's:
Quantifiers (except C<?> and C<??>) cause a matched subrule or subpattern to return an array of C<Match> objects, instead of just a single object.
As well as the PGE's implementation of treating the quantified capture as a simple PerlArray PMC.
Rick Delaney wrote:
> On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:
>>>On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
>>>>Yes, though I would think of it more generally as
>>>> ('', $0, '', $0, '', $0, ...)
>>>>where in this case it just happens to be
>>>> ('', $0)
>>>>and $0 expands to ['12','34','56','78','90'] if you treat it as an array.
I don't understand this comment. The $0 here is an array of match-objects and when treated as array it returns an array of match-objects, not an array of strings. (see below)
> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch? It would
> just look like an ::Array::Const if you treat it as such.
With pugs (r2917) this doesn't return an Array of Strings but an Array of Match-objects:
No, it's not inconsistent. Think about the simpler case split /a/, 'aaaaa' which returns a list of empty strings (visible only with a negative LIMIT, since trailing empty fields are otherwise dropped). Now ask to keep the separators: split /(a)/, 'aaaaa' which will return ('', 'a', '', 'a', '', 'a', '', 'a', '', 'a'). Now look at split /(a)/, 'aaab' which returns ('', 'a', '', 'a', '', 'a', 'b'). Note: no empty string before the 'b'.
In the case of split /(..)/, "12345678" all those pairs of digits are separators, so again you get empty strings alternating with digit pairs. If the number of digits is odd, the last one isn't a separator, so it takes the place of the final empty string and there won't be an empty string in the list before it, i.e., split /(..)/, 12345 returns ('', '12', '', '34', '5');
This is another of those cases where the computer did exactly what you asked it to.
-- Mark Biggar m...@biggar.org mark.a.big...@comcast.net mbig...@paypal.com
> For longer strings it makes every other match an empty string.
> With the "Positions between chars" interpretation the above
> string is with '.' indicating position:
> There are two matches each at 0, 2, 4, 6, 8 and 10.
> The empty match at the end seems to be skipped because
> position 12 is after the string? And for odd numbers of
> chars the before last position doesn't produce an empty
> match:
> perl -le 'print join ",", split /(..)/, 11223'
> ,11,,22,3
> Am I the only one who finds that inconsistent?
> --
> TSa (Thomas Sandlaß)