The Perl Cookbook, in Recipe 11.10 "Reading and Writing Hash Records to
Text Files", suggests the following code to read a hash from a file:
$/ = ""; # paragraph read mode
while (<>) {
my @fields = split /^([^:]+):\s*/m;
shift @fields; # for leading null field
push(@Array_of_Records, { map /(.*)/, @fields });
}
Or, a bit simpler for testing:
$text="a:b\nc:d\ne:f";
my @fields = split /^([^:]+):\s*/m,$text;
foreach $key (@fields)
{
print "field is $key\n";
}
This code works as intended, but I don't understand it. Why is
$fields[0] empty after the split? I would have expected it to contain
the "a" but that is found in $fields[1].
It is because this code is using split in an inverted fashion.
Normally, split is looking for substrings separated by delimiters,
returning the substrings and discarding the delimiters. Here,
parentheses are used to capture a portion of the delimiters, and split
is returning the captured portion intermixed with the substrings.
Therefore, the first field ('a' in your example) is actually part of a
delimiter, not part of a substring, and it is the portion of the string
that precedes the first delimiter that ends up in $fields[0]. Since
there is nothing before the 'a', $fields[0] is an empty string.
--
Jim Gibson
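A small Perl sketch of Jim's point, using the same test string as the question: splitting with and without the capturing parentheses shows that the parentheses only add the captured names ("a", "c", "e") between the ordinary fields, while the empty leading field is there either way. The show() helper is hypothetical and only makes newlines visible. Note that "b" and "d" keep their trailing newline; the Cookbook's map /(.*)/ is what strips it off.

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# Without capturing parentheses the delimiters ("a:", "c:", "e:") are
# simply discarded.
my @plain = split /^[^:]+:\s*/m, $text;

# With capturing parentheses the captured part of each delimiter is
# returned between the ordinary fields.
my @captured = split /^([^:]+):\s*/m, $text;

# Hypothetical helper: quote each element and make newlines visible.
sub show {
    return join ', ', map { my $s = $_; $s =~ s/\n/\\n/g; "'$s'" } @_;
}

print "plain:    ", show(@plain), "\n";       # plain:    '', 'b\n', 'd\n', 'f'
print "captured: ", show(@captured), "\n";    # captured: '', 'a', 'b\n', 'c', 'd\n', 'e', 'f'
```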
> $text="a:b\nc:d\ne:f";
> my @fields = split /^([^:]+):\s*/m,$text;
> foreach $key (@fields)
> {
> print "field is $key\n";
> }
>
> This code works as intended, but I don't understand it. Why is
> $fields[0] empty after the split? I would have expected it to contain
> the "a" but that is found in $fields[1].
Because there is nothing matching before the delimiter.
A plain split returns the items separated by the delimiter, without the
delimiter parts.
If you want parts from the delimiter also, you need to capture them.
The delimiter is ^([^:]+):\s*, which matches "a:". Everything
before the ":" is captured, so you get
undef, a, b, c, d, e, f
My understanding is that, since the separator is not the single blank " ",
and the string starts with a delimiter, there's always a null field "before"
the first delimiter, and split by default does not remove leading empty
fields (this is similar to what happens with awk when the separator is not
the default blank).
If you split on /:/, you should get no empty leading fields (though there
may be other reasons to prefer the original method).
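A quick check of the /:/ alternative on the same test string, as a sketch: there is indeed no empty leading field, but since the newline between records is not part of the delimiter, each value comes back glued to the next key. The last lines show one possible workaround that only suits this simple input (also splitting on newlines); it would break values that continue onto a second line.

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# Split on the bare colon: no empty leading field ...
my @fields = split /:/, $text;

# ... but each value runs into the following key.
for my $f (@fields) {
    (my $shown = $f) =~ s/\n/\\n/g;    # make the embedded newline visible
    print "<$shown>\n";
}
# prints <a>, <b\nc>, <d\ne>, <f>, one per line

# Treating the newline as a delimiter too gives clean pairs here, but only
# because no value in this sample spans more than one line.
my %pairs = split /:|\n/, $text;
print join(', ', map { "$_=$pairs{$_}" } sort keys %pairs), "\n";    # a=b, c=d, e=f
```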
--
D.
> The delimiter is ^([^:]+):\s* which matches a:. Everything
> before the : is captured, so you get
> undef, a, b, c, d, e, f
I think the first field is the empty string, rather than undef (but I may be
wrong).
--
D.
No, you are correct.
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
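D.'s correction can be checked directly; a sketch using the same test string shows that the leading field is a defined empty string. For contrast, split does return undef when a capturing group does not take part in a match, as the perlfunc entry for split describes; the alternation pattern below is only an illustration.

```perl
use strict;
use warnings;

my @fields = split /^([^:]+):\s*/m, "a:b\nc:d\ne:f";

# The leading field exists, is defined, and has length 0.
printf "defined: %s, length: %d\n",
    (defined $fields[0] ? 'yes' : 'no'), length $fields[0];    # defined: yes, length: 0

# undef does appear when a capturing group does not participate in a match:
# at each separator only one of the two alternatives matches.
my @mixed = split /(-)|(,)/, "1-10,20";
my $undefs = grep { !defined } @mixed;
print scalar(@mixed), " elements, $undefs undef\n";    # 7 elements, 2 undef
```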
> my @fields = split /^([^:]+):\s*/m;
> shift @fields; # for leading null field
Others have answered your real question, but I'll point out
that those 2 lines can be replaced with:
my(undef, @fields) = split /^([^:]+):\s*/m; # ignore leading null field
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
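Tad's one-liner dropped into the Cookbook loop, as a self-contained sketch: to avoid needing input files it reads from an in-memory filehandle (the sample records in $data are made up for illustration) and limits the paragraph-mode $/ with local.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by a blank line.
my $data = "name: Jim\nlang: Perl\n\nname: Tad\nlang: Perl\n";

# Read from an in-memory filehandle instead of the files given to <>.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @Array_of_Records;
{
    local $/ = "";    # paragraph read mode, limited to this block
    while (my $para = <$fh>) {
        # The list assignment discards the empty leading field,
        # replacing the separate "shift @fields".
        my (undef, @fields) = split /^([^:]+):\s*/m, $para;
        # /(.*)/ captures up to the first newline, dropping the "\n"
        # at the end of each value.
        push @Array_of_Records, { map /(.*)/, @fields };
    }
}
close $fh;

for my $rec (@Array_of_Records) {
    print join(', ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
}
# prints:
# lang=Perl, name=Jim
# lang=Perl, name=Tad
```

Because map /(.*)/ keeps only the first line of each value, a value that continues onto a further line would be truncated with this pattern.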
>> my @fields = split /^([^:]+):\s*/m;
>> shift @fields; # for leading null field
>
> Others have answered your real question, but I'll point out
> that those 2 lines can be replaced with:
>
> my(undef, @fields) = split /^([^:]+):\s*/m; # ignore leading null field
The "[^:]+" can match a newline, so a "^([^:]+):" in a multiline context
is probably better written as "^(.+?):".
$ echo -ne "A: aaa\nB: bbb\n bbb\nC: ccc\n" |
  perl -le'
    undef $/;
    $_ = <>;
    print;
    print "<$_>" for split /^([^:]+):\s*/m
  '
A: aaa
B: bbb
 bbb
C: ccc

<>
<A>
<aaa
>
<B>
<bbb
>
< bbb
C>
<ccc
>
$ echo -ne "A: aaa\nB: bbb\n bbb\nC: ccc\n" |
  perl -le'
    undef $/;
    $_ = <>;
    print;
    print "<$_>" for split /^(.+?):\s*/m
  '
A: aaa
B: bbb
 bbb
C: ccc

<>
<A>
<aaa
>
<B>
<bbb
 bbb
>
<C>
<ccc
>
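Combining Ruud's (.+?) with Tad's undef trick gives a small header parser; a sketch that assumes each field name occurs only once (a repeated name would overwrite the earlier value in the hash). The trailing newline is removed from each value, while the newline inside the continued value of B is kept.

```perl
use strict;
use warnings;

my $header = "A: aaa\nB: bbb\n bbb\nC: ccc\n";

# (.+?) cannot cross a newline, so the continuation line " bbb" stays
# part of B's value instead of being taken for a field name.
# Assumption: each field name appears only once (a hash keeps the last).
my (undef, %field) = split /^(.+?):\s*/m, $header;

# Remove the final newline of each value; values() returns aliases,
# so the substitution edits the hash in place.
s/\n\z// for values %field;

for my $name (sort keys %field) {
    (my $shown = $field{$name}) =~ s/\n/\\n/g;
    print "$name => '$shown'\n";
}
# prints:
# A => 'aaa'
# B => 'bbb\n bbb'
# C => 'ccc'
```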
A further variant, which also captures a header field value that is
whitespace only:
$ echo -ne "A: aaa\nB: bbb\n bbb\nC:\n \nD: ddd\n" |
  perl -le'
    undef $/;
    $_ = <>;
    print;
    print "<$_>" for split /^(.+?):[^\n\S]*/m
  '
A: aaa
B: bbb
 bbb
C:
 
D: ddd

<>
<A>
<aaa
>
<B>
<bbb
 bbb
>
<C>
<
 
>
<D>
<ddd
>
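The class [^\n\S] is a double negation: "not a newline and not a non-space character", i.e. any whitespace except newline. A sketch comparing it with \s* on just the C and D part of the input above shows why that matters: \s* may also swallow newlines, so the whitespace-only continuation line of C disappears into the delimiter.

```perl
use strict;
use warnings;

my $text = "C:\n \nD: ddd\n";

# \s* may also consume newlines, so it eats the line break after "C:" and
# the whitespace-only line that follows; C ends up with an empty value.
my @loose = split /^(.+?):\s*/m, $text;

# [^\n\S]* only consumes whitespace on the same line as the colon, so the
# value of C keeps its whitespace-only continuation line.
my @tight = split /^(.+?):[^\n\S]*/m, $text;

for my $pair (['\s*', \@loose], ['[^\n\S]*', \@tight]) {
    my ($label, $fields) = @$pair;
    (my $c = $fields->[2]) =~ s/\n/\\n/g;    # the value that follows "C"
    print "$label => C is '$c'\n";
}
# prints:
# \s* => C is ''
# [^\n\S]* => C is '\n \n'
```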
--
Affijn, Ruud
"Gewoon is een tijger." ("Ordinary is a tiger.")