split /(..)*/, 1234567890
Newsgroups: perl.perl6.language
From: autri...@autrijus.org (Autrijus Tang)
Date: Tue, 10 May 2005 22:53:35 +0800
Local: Tues, May 10 2005 10:53 am
Subject: split /(..)*/, 1234567890

In Pugs, the current logic for array submatches in split() is
to stringify each element, and return them separately in the
resulting list.  To wit:

    pugs> split /(..)*/, 1234567890
    ('', '12', '34', '56', '78', '90')

Is this sane?

Thanks,
/Autrijus/

Newsgroups: perl.perl6.language
From: autri...@autrijus.org (Autrijus Tang)
Date: Thu, 12 May 2005 23:59:26 +0800
Local: Thurs, May 12 2005 11:59 am
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 04:53:06PM +0200, "TSa (Thomas Sandlaß)" wrote:
> Autrijus Tang wrote:
> >    pugs> split /(..)*/, 1234567890
> >    ('', '12', '34', '56', '78', '90')

> >Is this sane?

> Why the empty string match at the start?

I don't know, I didn't invent that! :-)

    $ perl -le 'print join ",", split /(..)/, 123'
    ,12,3

Thanks,
/Autrijus/

Newsgroups: perl.perl6.language
From: dsto...@dstorrs.com (David Storrs)
Date: Thu, 12 May 2005 12:22:46 -0400
Local: Thurs, May 12 2005 12:22 pm
Subject: Re: split /(..)*/, 1234567890
On May 12, 2005, at 11:59 AM, Autrijus Tang wrote:

> On Thu, May 12, 2005 at 04:53:06PM +0200, "TSa (Thomas Sandlaß)"
> wrote:
>> Autrijus Tang wrote:

>>>    pugs> split /(..)*/, 1234567890
>>>    ('', '12', '34', '56', '78', '90')

>>> Is this sane?

>> Why the empty string match at the start?

> I don't know, I didn't invent that! :-)

>     $ perl -le 'print join ",", split /(..)/, 123'
>     ,12,3

This makes sense when I think about what split is doing, but it is  
surprising at first glance.  Perhaps this should be included as an  
example in the docs?

--Dks


Newsgroups: perl.perl6.language
From: a...@ajs.com (Aaron Sherman)
Date: Thu, 12 May 2005 12:59:20 -0400
Local: Thurs, May 12 2005 12:59 pm
Subject: Re: split /(..)*/, 1234567890

perldoc -f split says:

        "Splits a string into a list of strings and returns that list.
        By default, empty leading fields are preserved, and empty
        trailing ones are deleted [...] If PATTERN is also omitted,
        splits on whitespace (after skipping any leading whitespace).
        [...] Empty leading (or trailing) fields are produced when there
        are positive width matches at the beginning (or end) of the
        string [...] As a special case, specifying a PATTERN of space ('
        ') will split on white space just as "split" with no arguments
        does. Thus, "split(' ')" can be used to emulate awk's default
        behavior, whereas "split(/ /)" will give you as many null
        initial fields as there are leading spaces [...]"

And there you have it.

--
Aaron Sherman <a...@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback


Newsgroups: perl.perl6.language
From: u...@stemsystems.com (Uri Guttman)
Date: Thu, 12 May 2005 13:12:26 -0400
Local: Thurs, May 12 2005 1:12 pm
Subject: Re: split /(..)*/, 1234567890

>>>>> "JSD" == Jonathan Scott Duff <d...@pobox.com> writes:

  JSD> To bring this back to perl6, autrijus' original query was regarding

  JSD>       $ pugs -e 'say join ",", split /(..)*/, 1234567890'

  JSD> which currently generates a list of ('','12','34','56','78','90')
  JSD> In perl5 it would generate a list of ('','90') because only the last
  JSD> pair of characters matched is kept (such is the nature of quantifiers
  JSD> applied to capturing parens). But in perl6 quantified captures put all
  JSD> of the matches into an array such that "abcdef" ~~ /(..)*/ will make
  JSD> $0 = ['ab','cd','ef'].

  JSD> I think that the above split should generate a list like this:

  JSD>       ('', [ '12','34','56','78','90'])

i disagree. if you want complex tree results, use a rule. split is for
creating a single list of elements from a string. it is better to keep
split simple, for it is commonly used in this domain. tree results are
more for real parsing (which split is not intended to do) so use a
parsing rule for that.

also note the coding style rule (i think randal created it) which is to
use split when you want to throw things away (the delimiters) and m//
when you want to keep things.

uri

--
Uri Guttman  ------  u...@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


Newsgroups: perl.perl6.language
From: d...@pobox.com (Jonathan Scott Duff)
Date: Thu, 12 May 2005 12:03:55 -0500
Local: Thurs, May 12 2005 1:03 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 06:29:49PM +0200, "TSa (Thomas Sandlaß)" wrote:
> Autrijus Tang wrote:
> >I don't know, I didn't invent that! :-)

> >    $ perl -le 'print join ",", split /(..)/, 123'
> >    ,12,3

> Hmm,

> perl -le 'print join ",", split /(..)/, 112233445566'
> ,11,,22,,33,,44,,55,,66

> For longer strings it makes every other match an empty string.

Not quite. The matching part are the strings "11", "22", "33", etc.
And since what matches is what we're splitting on, we get the empty
string between pairs of characters (including the leading empty
string).    The only reason you're getting the string that was matched
in the output is because that's what you've asked split to do by
placing parens around the pattern.  (Type "perldoc -f split" at your
command prompt and read all about it)

To bring this back to perl6, autrijus' original query was regarding

        $ pugs -e 'say join ",", split /(..)*/, 1234567890'

which currently generates a list of ('','12','34','56','78','90')
In perl5 it would generate a list of ('','90') because only the last
pair of characters matched is kept (such is the nature of quantifiers
applied to capturing parens). But in perl6 quantified captures put all
of the matches into an array such that "abcdef" ~~ /(..)*/ will make
$0 = ['ab','cd','ef'].

I think that the above split should generate a list like this:

        ('', [ '12','34','56','78','90'])

Or, another example:

        $ pugs -e 'say join ",", split /(<[abc]>)*/, "xabxbxbcx"'
        # ('x', ['a','b'], 'x', ['b'], 'x', ['b','c'], 'x')

But that's just MHO.

> With the "Positions between chars" interpretation the above
> string is with '.' indication position:

> .1.1.2.2.3.3.4.4.5.5.6.6.
> 0 1 2 3 4 5 6 7 8 9 1 1 1
>                     0 1 2

> There are two matches each at 0, 2, 4, 6, 8 and 10.
> The empty match at the end seems to be skipped because
> position 12 is after the string?

No, the empty match at the end is skipped because that's the default
behaviour of split.  Preserve leading empty fields and discard empty
trailing ones.

> And for odd numbers of
> chars the before last position doesn't produce an empty
> match:
> perl -le 'print join ",", split /(..)/, 11223'
> ,11,,22,3

There's an empty field between the beginning of the string and "11",
there's an empty field between the "11" and the "22", and finally
there's a field at the end containing only "3"

> Am I the only one who finds that inconsistent?

Probably.  :-)

-Scott
--
Jonathan Scott Duff
d...@pobox.com


Newsgroups: perl.perl6.language
From: lists...@pimb.org (Jody Belka)
Date: Thu, 12 May 2005 19:13:22 +0200
Local: Thurs, May 12 2005 1:13 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 06:29:49PM +0200, "TSa (Thomas Sandlaß)" wrote:
> perl -le 'print join ",", split /(..)/, 112233445566'
> ,11,,22,,33,,44,,55,,66
[snipped]
> perl -le 'print join ",", split /(..)/, 11223'
> ,11,,22,3

> Am I the only one who finds that inconsistent?

Maybe, but it's because you're misunderstanding what split does (i can
heartily recommend TFM in this case).

Let's start with a simpler case (inside debugger for help):

x split /../, 112233445566, -1           [ -1 to preserve all found fields ]

0  ''
1  ''
2  ''
3  ''
4  ''
5  ''
6  ''

Split uses the regular expression to find "separators" in the text, and
then return the contents of the fields between them. The above case looks
like this:

     sep    sep    sep    sep    sep    sep
     |      |      |      |      |      |
     11     22     33     44     55     66
  |      |      |      |      |      |
field  field  field  field  field  field

Ok, let's try that with your second example:

x split /../, 11223, -1

0 ''
1 ''
2 3

     sep    sep
     |      |
     11     22  3
  |      |      |
field  field  field

Now, if the regular expression contains parentheses, additional list
elements are created from each matching substring (quoted almost verbatim
from TFM). So:

x split /(..)/, 112233445566, -1

0  ''
1  11
2  ''
3  22
4  ''
5  33
6  ''
7  44
8  ''
9  55
10  ''
11  66
12  ''

x split /(..)/, 11223, -1

0  ''
1  11
2  ''
3  22
4  3

And of course, if we remove the LIMIT from the equation, then any trailing
fields will be removed. Ergo the results quoted at the top of this email.
Hope this helps you (and anyone else who might have been confused) understand
what is going on.

J

--
Jody Belka
knew (at) pimb (dot) org


Newsgroups: perl.perl6.language
From: d...@pobox.com (Jonathan Scott Duff)
Date: Thu, 12 May 2005 12:50:08 -0500
Local: Thurs, May 12 2005 1:50 pm
Subject: Re: split /(..)*/, 1234567890

Well ... we *are* using a rule; it just doesn't have a name.

So, would you advocate too that

        my @a = "foofoofoobarbarbar" ~~ /(foo)+ (bar)+/;

should flatten? thus @a = ('foo','foo','foo','bar','bar','bar')
rather than (['foo','foo','foo'],['bar','bar','bar']) ?

This may have even been discussed before, but whether or not we keep
the delimiters should probably be determined by something other than the
presence or absence of parentheses in the pattern.  Perhaps the
flattening/non-flattening behavior could be modulated the same way,
probably as a modifier to split.

> split is for creating a single list of elements from a string. it is
> better keep split simple for it is commonly used in this domain.

I'll wager that splits with non-capturing patterns are far and away the
most common case. :-)

-Scott
--
Jonathan Scott Duff
d...@pobox.com


Newsgroups: perl.perl6.language
From: la...@wall.org (Larry Wall)
Date: Thu, 12 May 2005 12:01:59 -0700
Local: Thurs, May 12 2005 3:01 pm
Subject: Re: split /(..)*/, 1234567890
On Thu, May 12, 2005 at 12:03:55PM -0500, Jonathan Scott Duff wrote:

: I think that the above split should generate a list like this:
:
:       ('', [ '12','34','56','78','90'])

Yes, though I would think of it more generally as

    ('', $0, '', $0, '', $0, ...)

where in this case it just happens to be

    ('', $0)

and $0 expands to ['12','34','56','78','90'] if you treat it as an array.

Larry


Newsgroups: perl.perl6.language
From: d...@pobox.com (Jonathan Scott Duff)
Date: Thu, 12 May 2005 14:56:37 -0500
Local: Thurs, May 12 2005 3:56 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 12:03:55PM -0500, Jonathan Scott Duff wrote:
> : I think that the above split should generate a list like this:
> :
> :  ('', [ '12','34','56','78','90'])

> Yes, though I would think of it more generally as

>     ('', $0, '', $0, '', $0, ...)

> where in this case it just happens to be

>     ('', $0)

> and $0 expands to ['12','34','56','78','90'] if you treat it as an array.

Exactly so.  Principle of least surprise wins again! ;)

-Scott
--
Jonathan Scott Duff
d...@pobox.com


Newsgroups: perl.perl6.language
From: autri...@autrijus.org (Autrijus Tang)
Date: Fri, 13 May 2005 04:05:23 +0800
Local: Thurs, May 12 2005 4:05 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 02:56:37PM -0500, Jonathan Scott Duff wrote:
> On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> > Yes, though I would think of it more generally as

> >     ('', $0, '', $0, '', $0, ...)

> > where in this case it just happens to be

> >     ('', $0)

> > and $0 expands to ['12','34','56','78','90'] if you treat it as an array.

> Exactly so.  Principle of least surprise wins again! ;)

Thanks, implemented as such.

    pugs> map { ref $_ } split /(..)*/, 1234567890
    (::Str, ::Array::Const)

Thanks,
/Autrijus/

Newsgroups: perl.perl6.language
From: r...@bort.ca (Rick Delaney)
Date: Thu, 12 May 2005 20:33:40 -0400
Local: Thurs, May 12 2005 8:33 pm
Subject: Re: split /(..)*/, 1234567890

Sorry if I'm getting ahead of the implementation but if it is returning
$0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
just look like an ::Array::Const if you treat it as such.

--
Rick Delaney
r...@bort.ca


Newsgroups: perl.perl6.language
From: autri...@autrijus.org (Autrijus Tang)
Date: Fri, 13 May 2005 09:19:51 +0800
Local: Thurs, May 12 2005 9:19 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 08:33:40PM -0400, Rick Delaney wrote:
> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
> just look like an ::Array::Const if you treat it as such.

...also note that the $0 here is $/[0], also known as Perl 5's $1...

Indeed, the entire match result, that is $/, will always be a
single ::Match object if a match succeeds.

Thanks,
/Autrijus/

Newsgroups: perl.perl6.language
From: autri...@autrijus.org (Autrijus Tang)
Date: Fri, 13 May 2005 09:17:27 +0800
Local: Thurs, May 12 2005 9:17 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 08:33:40PM -0400, Rick Delaney wrote:
> On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:
> >     pugs> map { ref $_ } split /(..)*/, 1234567890
> >     (::Str, ::Array::Const)

> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
> just look like an ::Array::Const if you treat it as such.

Er, where does this ::Rule::Result thing come from?

I was basing my implementation on Damian's:

    Quantifiers (except C<?> and C<??>) cause a matched subrule or
    subpattern to return an array of C<Match> objects, instead of just a
    single object.

As well as the PGE's implementation of treating the quantified capture as a
simple PerlArray PMC.

Thanks,
/Autrijus/

Newsgroups: perl.perl6.language
From: mar...@laire.info (Markus Laire)
Date: Fri, 13 May 2005 11:21:51 +0300
Local: Fri, May 13 2005 4:21 am
Subject: Re: split /(..)*/, 1234567890

Rick Delaney wrote:
> On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:

>>>On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:

>>>>Yes, though I would think of it more generally as

>>>>    ('', $0, '', $0, '', $0, ...)

>>>>where in this case it just happens to be

>>>>    ('', $0)

>>>>and $0 expands to ['12','34','56','78','90'] if you treat it as an array.

I don't understand this comment. The $0 here is an array of
match-objects and when treated as array it returns an array of
match-objects, not an array of strings. (see below)

>>Thanks, implemented as such.

>>    pugs> map { ref $_ } split /(..)*/, 1234567890
>>    (::Str, ::Array::Const)

> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
> just look like an ::Array::Const if you treat it as such.

With pugs (r2917) this doesn't return an Array of Strings but an Array
of Match-objects:

     pugs> map { ref $_ } split /(..)*/, 1234567890
     (::Str, ::Array::Const)

     pugs> map { ref $_ } [split /(..)*/, 1234567890][1]
     (::Match, ::Match, ::Match, ::Match, ::Match)
     pugs> map { ~$_ } [split /(..)*/, 1234567890][1]
     ('12', '34', '56', '78', '90')
     pugs> map { $_.from } [split /(..)*/, 1234567890][1]
     (0, 2, 4, 6, 8)

--
Markus Laire
<Jam. 1:5-6>


Newsgroups: perl.perl6.language
From: lists...@pimb.org (Jody Belka)
Date: Thu, 12 May 2005 19:23:59 +0200
Local: Thurs, May 12 2005 1:23 pm
Subject: Re: split /(..)*/, 1234567890

On Thu, May 12, 2005 at 07:13:22PM +0200, Jody Belka wrote:
>      sep    sep    sep    sep    sep    sep
>      |      |      |      |      |      |
>      11     22     33     44     55     66
>   |      |      |      |      |      |
> field  field  field  field  field  field

whoops. add an extra field component in at the end of that of course.

J

--
Jody Belka
knew (at) pimb (dot) org


Newsgroups: perl.perl6.language
From: mark.a.big...@comcast.net (Mark A Biggar)
Date: Thu, 12 May 2005 18:21:33 +0000
Local: Thurs, May 12 2005 2:21 pm
Subject: Re: split /(..)*/, 1234567890
No, it's not inconsistent.  Think about the simpler case split /a/, 'aaaaa', which returns a list of empty strings.  Now ask to keep the separators:
split /(a)/, 'aaaaa', which will return ('', 'a', '', 'a', '', 'a', '', 'a', '', 'a').  Now look at
split /(a)/, 'aaab', which returns ('', 'a', '', 'a', '', 'a', 'b'). Note that there is no empty string before the 'b'.

In the case of split /(..)/, "12345678", all those pairs of digits are separators, so again you get empty strings alternating with digit pairs.  If the number of digits is odd, the last one isn't a separator, so it takes the place of the final empty string and there won't be an empty string in the list before it, i.e.,
split /(..)/, 12345 returns ('', '12', '', '34', '5');

This is another of those cases where the computer did exactly what you asked it to.

--
Mark Biggar
m...@biggar.org
mark.a.big...@comcast.net
mbig...@paypal.com

