
[perl #116086] split "\x20" doesn't work as documented


James E Keenan via RT

Dec 13, 2012, 9:37:19 PM
to perl5-...@perl.org
On Thu Dec 13 09:32:33 2012, estrai wrote:
> This is a bug report for perl from est...@estrai.com,
> generated with the help of perlbug 1.39 running under perl 5.17.7.
>
>
> -----------------------------------------------------------------
>
> Hi,
> split() has a special case for " " and "\x20" so they work like \s+
> (see perldoc -f split for details)
> In blead (also observed on 5.17.5) "\x20" doesn't seem to work as
> documented.
>
> * " " works as expected:
>
> % perl -MData::Dumper -E '$_=" a b c "; print Dumper [ split " " ]'
> $VAR1 = [
> 'a',
> 'b',
> 'c'
> ];
>
> * "\x20" doesn't:
>
> % perl -MData::Dumper -E '$_=" a b c "; print Dumper [ split "\x20" ]'
> $VAR1 = [
> '',
> '',
> 'a',
> '',
> '',
> 'b',
> 'c'
> ];
>
> estrai.
>

On 5.16.0 (at least), I get different results from yours:

#####
$ perl -MData::Dumper -E '$_ = "\x20a\x20b\x20c\x20";@r=split /\x20/;say Dumper \@r;'
$VAR1 = [
'',
'a',
'b',
'c'
];

$ perl -MData::Dumper -E '$_ = " a b c ";@r=split /\x20/;say Dumper \@r;'
$VAR1 = [
'',
'a',
'b',
'c'
];
#####

Nonetheless, these results appear to fail to match the documentation in
a *different* way from yours, viz., they fail the specification that
"any leading whitespace in EXPR is removed before splitting occurs."

Thank you very much.
Jim Keenan

---
via perlbug: queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=116086

Eric Brine

Dec 13, 2012, 10:31:18 PM
to perlbug-...@perl.org, perl5-...@perl.org
On Thu, Dec 13, 2012 at 9:37 PM, James E Keenan via RT <perlbug-...@perl.org> wrote:
> Nonetheless, these results appear to fail to match the documentation in
> a *different* way from yours, viz., they fail the specification that
> "any leading whitespace in EXPR is removed before splitting occurs."

That only applies when you pass a *string* consisting of a space. You are passing a regex. The document is quite clear about this.
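A minimal Perl sketch of the distinction Eric draws: the same single space passed once as a string literal and once as a regex. The variable names are illustrative only; these results should hold on any perl 5.

```perl
use strict;
use warnings;

my $s = " a b c ";

# String PATTERN of one space: the awk-style special case applies.
# Leading whitespace is dropped and whitespace runs act as one separator.
my @awk = split ' ', $s;

# Regex PATTERN of one space: no special case. The leading space yields an
# empty first field; trailing empty fields are discarded as usual.
my @re = split / /, $s;

print join('|', @awk), "\n";    # a|b|c
print join('|', @re),  "\n";    # |a|b|c
```

So Jim's /\x20/ examples behave exactly as documented for a regex; only the string form is special.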

Dave Mitchell

Dec 14, 2012, 8:06:35 AM
to Eric Brine, perlbug-...@perl.org, perl5-...@perl.org
I can confirm that it broke between 5.17.4 and 5.17.5:

$ perl5174 -e 'print "[$_]\n" for split "\x20", " a b c " '
[a]
[b]
[c]
$ perl5175 -e 'print "[$_]\n" for split "\x20", " a b c " '
[]
[a]
[b]
[]
[c]
$

and that the old behaviour worked back until at least 5_004_05.


--
The Enterprise's efficient long-range scanners detect a temporal vortex
distortion in good time, allowing it to be safely avoided via a minor
course correction.
-- Things That Never Happen in "Star Trek" #21

Father Chrysostomos via RT

Dec 14, 2012, 9:19:01 AM
to perl5-...@perl.org
On Fri Dec 14 05:07:24 2012, davem wrote:
> On Thu, Dec 13, 2012 at 10:31:18PM -0500, Eric Brine wrote:
> > On Thu, Dec 13, 2012 at 9:37 PM, James E Keenan via RT <
> > perlbug-...@perl.org> wrote:
> >
> > > Nonetheless, these results appear to fail to match the documentation in
> > > a *different* way from yours, viz., they fail the specification that
> > > "any leading whitespace in EXPR is removed before splitting occurs."
> > >
> >
> > That only applies when you pass a *string* consisting of a space. You are
> > passing a regex. The document is quite clear about this.
>
> I can confirm that it broke between 5.17.4 and 5.17.5:
>
> $ perl5174 -e 'print "[$_]\n" for split "\x20", " a b c " '
> [a]
> [b]
> [c]
> $ perl5175 -e 'print "[$_]\n" for split "\x20", " a b c " '
> []
> [a]
> [b]
> []
> [c]
> $
>
> and that the old behaviour worked back until at least 5_004_05.

I probably broke that (unintentionally).

--

Father Chrysostomos


---
via perlbug: queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=116086

Nicholas Clark

Dec 14, 2012, 9:25:29 AM
to Father Chrysostomos via RT, perl5-...@perl.org
On Fri, Dec 14, 2012 at 06:19:01AM -0800, Father Chrysostomos via RT wrote:
> On Fri Dec 14 05:07:24 2012, davem wrote:

> > I can confirm that it broke between 5.17.4 and 5.17.5:
> >
> > $ perl5174 -e 'print "[$_]\n" for split "\x20", " a b c " '
> > [a]
> > [b]
> > [c]
> > $ perl5175 -e 'print "[$_]\n" for split "\x20", " a b c " '
> > []
> > [a]
> > [b]
> > []
> > [c]
> > $
> >
> > and that the old behaviour worked back until at least 5_004_05.
>
> I probably broke that (unintentionally).

.../Porting/bisect.pl --target miniperl -e '@a = split "\x20", " a b c "; die if @a > 3'

thinks that you're right:

commit 5255171e6cd0accee6f76ea2980e32b3b5b8e171
Author: Father Chrysostomos <spr...@cpan.org>
Date: Sat Sep 22 17:54:12 2012 -0700

[perl #94490] const fold should not trigger special split " "

The easiest way to fix this was to move the special handling out of
the regexp engine. Now a flag is set on the split op itself for
this case. A real regexp is still created, as that is the most
convenient way to propagate locale settings, and it prevents the
need to rework pp_split to handle a null regexp.

This also means that custom regexp plugins no longer need to handle
split specially (which they all do currently).


I'm surprised that there wasn't a regression test that caught this.
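A regression test for this case might look something like the sketch below. It is a hypothetical standalone script using the core Test::More module, not the test that was later committed. On a perl that includes the eventual fix (assumed here to be 5.18.0 or later) all three checks should pass, while a 5.17.5-era blead should fail the "\x20" check, matching Dave's output above.

```perl
use strict;
use warnings;
use Test::More;

my $str = " a b c ";

# " " and "\x20" denote the same one-character string, so per the
# perlfunc wording both should get awk-style splitting.
is_deeply( [ split " ",    $str ], [qw(a b c)], 'split " " is awk-style' );
is_deeply( [ split "\x20", $str ], [qw(a b c)], 'split "\x20" is awk-style' );

# A regex of one space must not get the special treatment.
is_deeply( [ split / /, $str ], [ '', qw(a b c) ], 'split / / keeps the leading empty field' );

done_testing();
```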

Nicholas Clark

Father Chrysostomos via RT

Dec 15, 2012, 8:29:48 AM
to perl5-...@perl.org
On Fri Dec 14 19:03:21 2012, jkeenan wrote:
> Here is a patch for the regression test part of the solution. Please
> review.

Thank you. That looks good to me.

--

Father Chrysostomos

James E Keenan via RT

Dec 15, 2012, 2:40:33 PM
to perl5-...@perl.org
On Sat Dec 15 05:29:48 2012, sprout wrote:
> On Fri Dec 14 19:03:21 2012, jkeenan wrote:
> > Here is a patch for the regression test part of the solution. Please
> > review.
>
> Thank you. That looks good to me.
>

Patch applied in cd346b2859236d69de687d1baa46c23e19af2202.

We still need the fix for the problem.

Note that some of the tests added were TODO-ed. This may need
mentioning in pod/perldelta.pod if the underlying problem is not fixed
before the upcoming release.

Thank you very much.
Jim Keenan




Daniel Lukasiak

Dec 13, 2012, 12:32:33 PM
to bugs-bi...@rt.perl.org
# New Ticket Created by Daniel Lukasiak
# Please include the string: [perl #116086]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116086 >


This is a bug report for perl from est...@estrai.com,
generated with the help of perlbug 1.39 running under perl 5.17.7.


-----------------------------------------------------------------

Hi,
split() has a special case for " " and "\x20" so they work like \s+
(see perldoc -f split for details)
In blead (also observed on 5.17.5) "\x20" doesn't seem to work as documented.

* " " works as expected:

% perl -MData::Dumper -E '$_=" a b c "; print Dumper [ split " " ]'
$VAR1 = [
'a',
'b',
'c'
];

* "\x20" doesn't:

% perl -MData::Dumper -E '$_=" a b c "; print Dumper [ split "\x20" ]'
$VAR1 = [
'',
'',
'a',
'',
'',
'b',
'c'
];

estrai.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl 5.17.7:

Configured by estrai at Thu Dec 13 15:39:20 GMT 2012.

Summary of my perl5 (revision 5 version 17 subversion 7) configuration:
Snapshot of: 71446f2da4f4887cb6b60b4ddee4754faee70d3d
Platform:
osname=linux, osvers=2.6.32-5-amd64, archname=x86_64-linux
uname='linux roar 2.6.32-5-amd64 #1 smp sun sep 23 10:07:46 utc 2012 x86_64 gnulinux '
config_args='-de -Dprefix=/home/estrai/perl5/perlbrew/perls/perl-blead -Dusedevel -Aeval:scriptdir=/home/estrai/perl5/perlbrew/perls/perl-blead/bin'
hint=recommended, useposix=true, d_sigaction=define
useithreads=undef, usemultiplicity=undef
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2',
cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.4.5', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib /usr/lib/x86_64-linux-gnu /lib64 /usr/lib64
libs=-lnsl -ldb -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.11.3.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.11.3'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

Locally applied patches:


---
@INC for perl 5.17.7:
/home/estrai/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.17.7/x86_64-linux
/home/estrai/perl5/perlbrew/perls/perl-blead/lib/site_perl/5.17.7
/home/estrai/perl5/perlbrew/perls/perl-blead/lib/5.17.7/x86_64-linux
/home/estrai/perl5/perlbrew/perls/perl-blead/lib/5.17.7
.

---
Environment for perl 5.17.7:
HOME=/home/estrai
LANG=en_GB.utf8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/estrai/perl5/perlbrew/bin:/home/estrai/perl5/perlbrew/perls/perl-blead/bin:/home/estrai/.rbenv/shims:/home/estrai/bin:/Applications/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11/bin:/home/estrai/.rbenv/shims:/home/estrai/bin:/Applications/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11/bin:/home/estrai/bin:/Application/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/estrai/.rbenv/bin:/home/estrai/.rbenv/bin
PERLBREW_BASHRC_VERSION=0.53
PERLBREW_HOME=/home/estrai/.perlbrew
PERLBREW_MANPATH=/home/estrai/perl5/perlbrew/perls/perl-blead/man
PERLBREW_PATH=/home/estrai/perl5/perlbrew/bin:/home/estrai/perl5/perlbrew/perls/perl-blead/bin
PERLBREW_PERL=perl-blead
PERLBREW_ROOT=/home/estrai/perl5/perlbrew
PERLBREW_VERSION=0.53
PERL_BADLANG (unset)
SHELL=/usr/bin/zsh

James E Keenan via RT

Feb 16, 2013, 11:00:05 AM
to perl5-...@perl.org
On Sat Dec 15 11:40:33 2012, jkeenan wrote:
> On Sat Dec 15 05:29:48 2012, sprout wrote:
> > On Fri Dec 14 19:03:21 2012, jkeenan wrote:
> > > Here is a patch for the regression test part of the solution. Please
> > > review.
> >
> > Thank you. That looks good to me.
> >
>
> Patch applied in cd346b2859236d69de687d1baa46c23e19af2202.
>
> We still need the fix for the problem.
>
> Note that some of the tests added were TODO-ed. This may need
> mentioning in pod/perldelta.pod if the underlying problem is not fixed
> before the upcoming release.
>
> Thank you very much.
> Jim Keenan
>
>


The source code in the files that Father C touched in his patch has
shifted around quite a bit. Anyone who wanted to take up this RT might
start by looking at these locations:

op.c 9758
pp.c 5316 5349
regcomp.c 6326
regen/regcomp.pl 266
regexp.h 426

demerphq

Feb 16, 2013, 8:15:54 PM
to perl5-...@perl.org, bugs-bi...@rt.perl.org
On 13 December 2012 18:32, Daniel Lukasiak <perlbug-...@perl.org> wrote:
> # New Ticket Created by Daniel Lukasiak
> # Please include the string: [perl #116086]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116086 >
>
>
> This is a bug report for perl from est...@estrai.com,
> generated with the help of perlbug 1.39 running under perl 5.17.7.
>
>
> -----------------------------------------------------------------
>
> Hi,
> split() has a special case for " " and "\x20" so they work like \s+

Umm. I wasn't aware that we document that "\x20" works the same as " ".

It used to, as an implementation accident, but I don't believe that we
document that it should.

The docs look like this:

As a special case, specifying a PATTERN of space (' ') will split on
white space just as "split" with no arguments does. Thus, "split(' ')"
can be used to emulate awk's default behavior, whereas "split(/ /)"
will give you as many initial null fields (empty string) as there are
leading spaces. A "split" on "/\s+/" is like a "split(' ')" except that
any leading whitespace produces a null first field. A "split" with no
arguments really does a "split(' ', $_)" internally.

That doesn't say "\x20" works the same.

We changed which level of the perl parser handles escapes intended for
the regex engine.

Previous to this the \x20 would be resolved to a space, and as far as
the regex engine was concerned the pattern would be " ".

After this change the \x20 would be delivered to the regex engine
verbatim and the \x20 form would not be recognized by the heuristic
that handles the " " case.

This change was very desirable for many reasons, and as it doesn't
actually contradict the docs, unless Ricardo says otherwise I consider
this Not A Bug.
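The two levels Yves describes can be imitated from Perl code with a single-quoted string, which keeps the escape as text instead of resolving it to a space. This is only an analogy for what happened inside the parser; the variable names are illustrative.

```perl
use strict;
use warnings;

# Double-quoted: the lexer turns \x20 into a single space character.
my $dq = "\x20";

# Single-quoted: nothing is resolved; these are the four characters \ x 2 0.
my $sq = '\x20';

print length($dq), "\n";                       # 1
print length($sq), "\n";                       # 4
print $dq eq ' ' ? "same\n" : "different\n";   # same

# As a split PATTERN, $sq is compiled as the regex /\x20/. The regex engine
# understands \x20 and matches one space, but since the pattern text is not
# a lone space, the awk-style special case does not kick in.
my @fields = split $sq, " a b";
print join('|', @fields), "\n";                # |a|b
```

A check that looks at the pattern *text* sees `\x20` in the second case and never recognizes it as " ", which is the failure mode described above.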

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Tom Christiansen

Feb 16, 2013, 8:56:57 PM
to demerphq, perl5-...@perl.org, bugs-bi...@rt.perl.org
I've seen people try to do this:

my $delim = $if_something ? " " : qr/\s+/;
@fields = split $delim, $string;

And they are *very* surprised that it doesn't work the same as

@fields = split $if_something ? " " : /\s+/, $string;

That is really hard to explain, you know? Not easy to justify either.

Either the docs should clearly state that this space trick is a magic
literal that cannot be in a variable, or else it should be fixed so
that anything that shows up as a U+0020 counts as that.
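One way to get this kind of runtime choice without depending on when the magic " " applies is to spell out the awk behaviour explicitly. The helper names awk_split and fields are made up for this sketch; it should behave the same on any perl 5, because the separator is always a real regex.

```perl
use strict;
use warnings;

# Emulate split ' ' explicitly: drop leading whitespace, then split on runs
# of whitespace. The result does not depend on how perl decides whether
# the special " " case applies.
sub awk_split {
    my ($string) = @_;
    (my $trimmed = $string) =~ s/\A\s+//;
    return split /\s+/, $trimmed;
}

# A runtime choice between the two behaviours, with each branch written
# as code rather than as a magic string stored in a variable.
sub fields {
    my ($awk_style, $string) = @_;
    return $awk_style ? awk_split($string) : split(/\s+/, $string);
}

print join('|', fields(1, "  a b  c ")), "\n";   # a|b|c
print join('|', fields(0, "  a b  c ")), "\n";   # |a|b|c
```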

--tom

demerphq

Feb 16, 2013, 11:34:54 PM
to Tom Christiansen, perl5-...@perl.org, bugs-bi...@rt.perl.org
Well, it looks like this isn't going to be easy at all. And frankly I
don't understand why some of the changes were made. It looks like FC,
in an attempt to fix a "bug" (of dubious provenance), ended up making
this kind of thing a lot harder; in fact you could argue that the
change FC made in 5255171e6cd0accee6f76ea2980e32b3b5b8e171
makes this completely impossible.

I personally think that commit was a mistake, and that #94490 should
have been marked as "not a bug". I regret I didn't spot it at the time.

Had 5255171e6 not been committed fixing this would have been possible.

/grrrrr

Yves

git log -SSKIPWHITE

commit cccd1425414e6518c1fc8b7bcaccfb119320c513
Author: Father Chrysostomos <spr...@cpan.org>
Date: Thu Oct 11 09:27:18 2012 -0700

Define RXf_SPLIT and RXf_SKIPWHITE as 0

They are no longer used in core, and we need room for more flags.

The only CPAN modules that use them check whether RXf_SPLIT is set
(which no longer happens) before setting RXf_SKIPWHITE (which is
ignored).

commit 5255171e6cd0accee6f76ea2980e32b3b5b8e171
Author: Father Chrysostomos <spr...@cpan.org>
Date: Sat Sep 22 17:54:12 2012 -0700

[perl #94490] const fold should not trigger special split " "

The easiest way to fix this was to move the special handling out of
the regexp engine. Now a flag is set on the split op itself for
this case. A real regexp is still created, as that is the most
convenient way to propagate locale settings, and it prevents the
need to rework pp_split to handle a null regexp.

This also means that custom regexp plugins no longer need to handle
split specially (which they all do currently).

commit 7bd1e61447493a93405e0d15fe2f8a0b6bf71de1
Author: Yves Orton <deme...@gmail.com>
Date: Thu Jun 28 22:14:14 2007 +0000

Replace pattern parsing logic with optree "parsing" logic.

p4raw-id: //depot/perl@31496

commit 0ac6acaed7c2092a5668c6b70ddeaf3003e989d8
Author: Ævar Arnfjörð Bjarmason <av...@cpan.org>
Date: Thu Jun 28 20:06:50 2007 +0000

Move the RXf_WHITE logic for split " " into the regex engine
From: "Ævar Arnfjörð Bjarmason"
<ava...@gmail.com>
Message-ID: <51dd1af80706281306i4db...@mail.gmail.com>

(with tweaks)

p4raw-id: //depot/perl@31495

Dr.Ruud

Feb 17, 2013, 7:14:00 AM
to perl5-...@perl.org
On 2013-02-17 02:15, demerphq wrote:

> We changed which level of the perl parser handles escapes intended for
> the regex engine.
>
> Previous to this the \x20 would be resolved to a space, and as far as
> the regex engine was concerned the pattern would be " ".
>
> After this change the \x20 would be delivered to the regex engine
> verbatim and the \x20 form would not be recognized by the heuristic
> that handles the " " case.
>
> This change was very desirable for many reasons, and as it doesn't
> actually contradict the docs, unless Ricardo says otherwise I consider
> this Not A Bug.

See some split() cases below. So, #4 should behave as #7..10.

So, the PATTERN "\x20" should be compiled as /\x20/, not as " ".

--
Ruud


$ perl5.14.2 -MData::Dumper -wle '
use constant SPC => "\x20";
sub SPC0 () { "\x20" }
sub SPC1 { "\x20" }
my $SPC = "\x20";
sub p { print shift, ":\t[", join( "][", @_ ), "]" }

$_ = "\x20\x20\x20a b c\x20\x20";

p 1, split undef;
p 2, split;
p 3, split " ";
p 4, split "\x20";
p 5, split SPC;
p 6, split SPC0;
p 7, split SPC1;
p 8, split $SPC;
p 9, split / /;
p 10, split /\x20/;
p 11, split "a";
p 12, split /a/;
'
Use of uninitialized value in regexp compilation at -e line 9.
1: [ ][ ][ ][a][ ][b][ ][c][ ][ ]
2: [a][b][c]
3: [a][b][c]
4: [a][b][c]
5: [a][b][c]
6: [a][b][c]
7: [][][][a][b][c]
8: [][][][a][b][c]
9: [][][][a][b][c]
10: [][][][a][b][c]
11: [ ][ b c ]
12: [ ][ b c ]

demerphq

Feb 17, 2013, 7:43:16 AM
to Dr.Ruud, perl5-...@perl.org
On 17 February 2013 13:14, Dr.Ruud <rvtol+...@isolution.nl> wrote:
> On 2013-02-17 02:15, demerphq wrote:
>
>> We changed which level of the perl parser handles escapes intended for
>> the regex engine.
>>
>> Previous to this the \x20 would be resolved to a space, and as far as
>> the regex engine was concerned the pattern would be " ".
>>
>> After this change the \x20 would be delivered to the regex engine
>> verbatim and the \x20 form would not be recognized by the heuristic
>> that handles the " " case.
>>
>> This change was very desirable for many reasons, and as it doesn't
>> actually contradict the docs, unless Ricardo says otherwise I consider
>> this Not A Bug.
>
>
> See some split() cases below. So, #4 should behave as #7..10.
>
> So, the PATTERN "\x20" should be compiled as /\x20/, not as " ".

So you don't agree that the original ticket is a bug?

Dr.Ruud

Feb 17, 2013, 9:30:30 AM
to perl5-...@perl.org, demerphq, perl5-...@perl.org
Heheh, "the ticket is a bug".
I am really not sure whether the ticket is wrong or not.
So I will just live with any outcome.

- - - - - - -

If the compile-time "" operation (AKA qq: explicit string interpolation)
is always done first, as if at a 'pre-processor' level, then that is
easiest to explain and document, I think.
" " always becomes ' ', and "$x\n" always becomes ($x."\n"), etc.

But then we should compile the split-PATTERN "a*" as /a[*]/.
Also because the split-PATTERN " $" currently leads to:
'Final $ should be \$ or $name at -e line 5, within string'.

So the split-PATTERN ' $' currently behaves differently from " $".
Should split( q{\x20} ) then not also differ from split( qq{\x20} )?

The other way is to completely defer interpolation in split-PATTERN
context, and add some flag to the PATTERN if "" or qq was (not) around.

So I see issues with both ways. IMO the clearest is to not allow any
literal string but a single white space, which then rejects " $" as a
bad split-PATTERN, instead of as a bad string. Sure, that will break
some code.

--
Ruud

Daniel Łukasiak

Feb 19, 2013, 6:19:37 AM
to perlbug-...@perl.org
Hi,
it looks like split's documentation was reworded around 5.16 and it
now explicitly mentions "\x20", vide:

perldoc -f split

"As another special case, "split" emulates the default behavior of the
command line tool awk when the PATTERN is either omitted or a literal
string composed of a single space character (such as ' ' or "\x20", but
not e.g. "/ /"). In this case, any leading whitespace in EXPR is
removed before splitting occurs, and the PATTERN is instead treated as
if it were "/\s+/"; in particular, this means that any contiguous
whitespace (not just a single space character) is used as a separator.
However, this special treatment can be avoided by specifying the pattern
"/ /" instead of the string " ", thereby allowing only a single space
character to be a separator."

--
Daniel Łukasiak

demerphq

Feb 19, 2013, 6:44:01 AM
to Daniel Łukasiak, perlbug-...@perl.org
Hrm. I have a weird feeling I was involved in that change, so I guess
I have to eat my words.

It also seems to support the contention that FC's patch needs to be reverted.

cheers,

demerphq

Feb 24, 2013, 10:16:22 AM
to perlbug-...@perl.org, perl5-...@perl.org
On 15 December 2012 20:40, James E Keenan via RT
<perlbug-...@perl.org> wrote:
> On Sat Dec 15 05:29:48 2012, sprout wrote:
>> On Fri Dec 14 19:03:21 2012, jkeenan wrote:
>> > Here is a patch for the regression test part of the solution. Please
>> > review.
>>
>> Thank you. That looks good to me.
>>
>
> Patch applied in cd346b2859236d69de687d1baa46c23e19af2202.
>
> We still need the fix for the problem.
>
> Note that some of the tests added were TODO-ed. This may need
> mentioning in pod/perldelta.pod if the underlying problem is not fixed
> before the upcoming release.

I just pushed a patch sequence to get this fixed. My working copy is
at yves/revert_skipwhite, and I am smoking it at
smoke-me/yves-revert_skipwhite.

It is mostly a revert of 5255171e6cd0accee6f76ea2980e32b3b5b8e171.

I think FC's patch had some merit BTW, it just didn't work out in the game
of Jenga that is perl.

We may have to reopen RT #94490 if this is merged. It depends on how
you look at things.

The root of this patch sequence is the fact that:

split 0 || " ", $thing; #1

and

my $x=0; split $x || " ", $thing; #2

behave differently. The former behaves like:

split " ", $thing; #3

and the latter behaves like

split / /, $thing; #4

The patch that caused all these problems made #1 behave the same as #2
and #4, instead of its current behavior like #3.

Now, apart from the breakage in this ticket, I personally think that
#2 should behave like #3, which means that the behavior of #1 is just
fine.
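For reference, the four cases can be run side by side. The expected outputs in the comments assume perl 5.18.0 or later, where perlfunc states that any expression evaluating to the simple string " " triggers the special case. On the earlier perls discussed in this thread, #2 (and, after commit 5255171e6, also #1) behaved like #4 instead.

```perl
use strict;
use warnings;

my $thing = " a b c ";

my @one   = split 0 || " ", $thing;     # #1: folds to " " at compile time
my $x     = 0;
my @two   = split $x || " ", $thing;    # #2: " " only known at run time
my @three = split " ", $thing;          # #3: literal single space
my @four  = split / /, $thing;          # #4: regex, never special

# Expected on perl 5.18.0+: a|b|c, a|b|c, a|b|c, |a|b|c
print join('|', @$_), "\n" for \@one, \@two, \@three, \@four;
```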

Anyway, it's an interesting question, and I could not find an easy way
to make split $x || " " behave like split " " when $x is false. The
"special" behavior of #3 (called RXf_SKIPWHITE) is triggered only when
the pattern can be determined at compile time. So far I don't understand
the optree logic involved well enough to figure out how to make it work
as I expect.

An unrelated follow-up for this patch sequence might be to look into
RXf_SPLIT and RXf_WHITE and RXf_SKIPWHITE. We use more bits
than we need. RXf_SKIPWHITE is only relevant when RXf_SPLIT is set,
and both RXf_WHITE and RXf_SKIPWHITE are used only in pp_split().
RXf_SPLIT is set only when dealing with a split "..", $thing and not
when dealing with split /../, $thing. (IOW it is badly named, it
should be called RXf_SKIPWHITE_ALLOWED or something like that.)
Simplifying this mess, and putting some or all of these flags
somewhere else, might free up some RXf_ bits (which are scarce).

demerphq

Feb 24, 2013, 10:58:15 AM
to Tom Christiansen, perl5-...@perl.org, bugs-bi...@rt.perl.org
On 17 February 2013 02:56, Tom Christiansen <tch...@perl.com> wrote:
> I've seen people try to do this:
>
> my $delim = $if_something ? " " : qr/\s+/;
> @fields = split $delim, $string;
>
> And they are *very* surprised that it doesn't work the same as
>
> @fields = split $if_something ? " " : /\s+/, $string;

Well I think you got the example wrong. Those two actually are the
same, but if you replaced $if_something with a constant, then it would
be different, and agreed surprising.

The rule is that if the argument to split can be resolved at compile
time to a string containing a single space then the special behavior
takes place.

so:

use constant IF_SOMETHING => 1;
@fields = split IF_SOMETHING ? " " : /\s+/, $string;

behaves the same as:

@fields = split " ", $string;

not like:

@fields = split $if_something ? " " : /\s+/, $string;

or

my $delim = $if_something ? " " : qr/\s+/;
@fields = split $delim, $string;

I consider this to be a bug. The special behavior should cut in if
the argument to split is a string and not a regexp. But I haven't
figured out how to make this work properly.

> That is really hard to explain, you know? Not easy to justify either.

Yeah, I agree. It shouldn't work like this.

> Either the docs should clearly state that this space trick is a magic
> literal that cannot be in a variable, or else it should be fixed so
> that anything that shows up as a U+0020 counts as that.

We could document better how it works for now. I'd like to get it
fixed, however, so that the rule is that if the split argument is not
a qr// and not a bare m/.../, it gets the magic behavior. I
haven't figured out how to make that work yet, though.

IOW I believe that anything that resolves to a string eq to " " should
get the special behavior. Anything that is qr// or // quoted should
not, regardless of whether it is known at compile time.
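A small check of this rule, assuming perl 5.18.0 or later, where perlfunc documents it as adopted: a string equal to " " triggers the special case however it is written or stored, while a qr// object never does. The variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "  a  b ";

my $str_space   = " ";      # plain string equal to " "
my $str_escape  = "\x20";   # the same string, written with an escape
my $regex_space = qr/ /;    # a compiled regex: never the special case

my @from_space  = split $str_space,   $s;   # a|b
my @from_escape = split $str_escape,  $s;   # a|b
my @from_regex  = split $regex_space, $s;   # ||a||b

print join('|', @$_), "\n" for \@from_space, \@from_escape, \@from_regex;
```

The qr// case splits on every single space, so the two leading spaces and the double space between the words each leave an empty field (the trailing one is discarded).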

cheers,

Karl Williamson

Feb 24, 2013, 11:32:25 AM
to demerphq, Tom Christiansen, perl5-...@perl.org, bugs-bi...@rt.perl.org
On 02/24/2013 08:58 AM, demerphq wrote:
> IOW I believe that anything that resolves to a string eq to " " should
> get the special behavior. Anything that is qr// or // quoted should
> not, regardless of whether it is known at compile time.

That sounds reasonable to me

Tom Christiansen

Feb 24, 2013, 3:39:30 PM
to Karl Williamson, demerphq, perl5-...@perl.org, bugs-bi...@rt.perl.org
Karl Williamson <pub...@khwilliamson.com> wrote
on Sun, 24 Feb 2013 09:32:25 MST:
I confess that that is how I'd always thought that it *did* work -- until
I started testing it.

--tom

demerphq

Feb 24, 2013, 4:37:53 PM
to Tom Christiansen, Karl Williamson, perl5-...@perl.org, bugs-bi...@rt.perl.org
Well then you might be pleased to learn that I have finally managed to
make it work as we both seem to expect it to work.

Running a final make test now.

I swear I am going to have nightmares about the optree tonight.

yves orton via RT

Feb 24, 2013, 5:03:43 PM
to perl5-...@perl.org
On Sun Feb 24 13:38:33 2013, demerphq wrote:
> On 24 February 2013 21:39, Tom Christiansen <tch...@perl.com> wrote:
> > Karl Williamson <pub...@khwilliamson.com> wrote
> > on Sun, 24 Feb 2013 09:32:25 MST:
> >
> >>On 02/24/2013 08:58 AM, demerphq wrote:
> > >>> IOW I believe that anything that resolves to a string eq to " " should
> > >>> get the special behavior. Anything that is qr// or // quoted should
> > >>> not, regardless of whether it is known at compile time.
> > >
> > >> That sounds reasonable to me
> > >
> > > I confess that that is how I'd always thought that it *did* work -- until
> > > I started testing it.
>
> Well then you might be pleased to learn that I have finally managed to
> make it work as we both seem to expect it to work.
>
> Running a final make test now.

See RT #116911 for further details on this.

demerphq

Feb 24, 2013, 6:24:49 PM
to Ricardo Signes, Tom Christiansen, perl5-...@perl.org, bugs-bi...@rt.perl.org
On 25 February 2013 00:22, Ricardo Signes <perl...@rjbs.manxome.org> wrote:
> * demerphq <deme...@gmail.com> [2013-02-24T10:58:15]
>> IOW I believe that anything that resolves to a string eq to " " should
>> get the special behavior. Anything that is qr// or // quoted should
>> not, regardless of whether it is known at compile time.
>
> That sounds good.
>
> Thanks very much for working on this. I look forward to looking at it!

git checkout yves/revert_skipwhite

:-)

Ricardo SIGNES via RT

Feb 26, 2013, 10:00:44 PM
to perl5-...@perl.org
On Sun Feb 24 15:25:25 2013, demerphq wrote:
> git checkout yves/revert_skipwhite

Looks good!

--
rjbs

demerphq

Feb 27, 2013, 2:21:59 AM
to perlbug-...@perl.org, perl5-...@perl.org
On 27 February 2013 04:00, Ricardo SIGNES via RT
<perlbug-...@perl.org> wrote:
> On Sun Feb 24 15:25:25 2013, demerphq wrote:
>> git checkout yves/revert_skipwhite
>
> Looks good!

Thanks, but alas, Tony reported it hung his smokers. I have yet to get
time to investigate further.

Ricardo SIGNES via RT

Apr 3, 2013, 9:37:31 PM
to perl5-...@perl.org
This has been fixed!

--
rjbs