Why split() drops trailing null fields?

E. Tye McQueen
Jul 2, 1992, 1:18:28 PM

I was just debugging a small problem with some perl scripts when I ran
across this in the man page under split():

If LIMIT is unspecified, trailing null fields are stripped
(which potential users of pop() would do well to remember).
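
To make the man-page sentence concrete, here is a minimal sketch in present-day Perl 5 (the thread concerns the 1992 perl, so treat it as an approximation of what Tye was seeing). The record string is invented: it ends in two empty fields, which a plain `split` drops and a LIMIT of 9999 keeps, and the `pop()` calls show why that matters.

```perl
use strict;
use warnings;

my $record = "alpha,beta,gamma,,";    # invented record ending in two empty fields

my @stripped = split(/,/, $record);          # ("alpha", "beta", "gamma")
my @kept     = split(/,/, $record, 9999);    # ("alpha", "beta", "gamma", "", "")

printf "%d fields without LIMIT, %d with LIMIT 9999\n", scalar(@stripped), scalar(@kept);
# prints "3 fields without LIMIT, 5 with LIMIT 9999"

# The pop() pitfall: the "last field" differs between the two results.
my $last_stripped = pop(@stripped);    # "gamma"
my $last_kept     = pop(@kept);        # ""
```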

I was curious: what is the reason for this? It never even occurred to
me that split() might do this or what use it might serve (though I'm
sure I'd read that sentence at least once a long time ago). My work-around
is to specify a limit of 9999. Thinking about it now, I guess
split() might run faster because it doesn't have to assign to the array
unless a non-null value is found (causing the middle null fields to get
put in when the next non-null field is found). But that guess is mostly
wild. Maybe it is for emulating the behavior of awk or something?

Still, I hate putting silly constants like 9999 in. I wouldn't be
surprised if split() pre-extends the array to the specified size. Would
it make more sense to allow a limit of 0 to disable this stripping of
trailing null fields? Perhaps a limit of -N could pre-extend the array
to N entries but allow it to grow past that in chunks of N or larger?
This would also make this feature more heavily documented.
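
For later readers: Perl 5's `split` ended up with a knob much like the one proposed here, except that the sentinel is a negative LIMIT rather than zero. As perlfunc documents it, a LIMIT of 0 behaves exactly like an omitted one (trailing empty fields stripped), while any negative LIMIT is treated as arbitrarily large, so nothing is stripped and no 9999 is needed. A small sketch with an invented input string:

```perl
use strict;
use warnings;

my $line = "x:y::";    # invented input ending in two empty fields

my @omitted  = split(/:/, $line);        # ("x", "y")          trailing empties stripped
my @zero     = split(/:/, $line, 0);     # ("x", "y")          0 is the same as omitting LIMIT
my @negative = split(/:/, $line, -1);    # ("x", "y", "", "")  negative LIMIT keeps everything

print join("|", map { "[$_]" } @negative), "\n";    # prints [x]|[y]|[]|[]
```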

Any enlightenment would be greatly appreciated.

t...@spillman.com Tye McQueen, E.
-------------------------------------------------------------
What do you mean that table has more than 9999 fields in it??
-------------------------------------------------------------

Larry Wall
Jul 3, 1992, 3:18:25 PM

In article <1992Jul02.1...@spillman.uucp> t...@spillman.uucp (E. Tye McQueen) writes:
: I was just debugging a small problem with some perl scripts when I ran
: across this in the man page under split():
:
: If LIMIT is unspecified, trailing null fields are stripped
: (which potential users of pop() would do well to remember).
:
: I was curious: what is the reason for this? It never even occurred to
: me that split() might do this or what use it might serve (though I'm
: sure I'd read that sentence at least once a long time ago). My work-around
: is to specify a limit of 9999. Thinking about it now, I guess
: split() might run faster because it doesn't have to assign to the array
: unless a non-null value is found (causing the middle null fields to get
: put in when the next non-null field is found). But that guess is mostly
: wild. Maybe it is for emulating the behavior of awk or something?

No, it's mainly because people would be surprised when they say

$_ = <STDIN>;
@foo = split;

and discover there's an extra null field in @foo from after the final newline.
This is exacerbated when you split on a single space, and you're processing
records you dd'ed over from some dinosaur that contain trailing spaces before
the newline. (In actual fact, it's not more efficient. It has to do the whole
split and then back off on the null fields. After all, it's only trailing
null fields that can be discarded...)
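
Larry's point, sketched in Perl 5 with a fixed string standing in for the `<STDIN>` line. The helper `strip_trailing_empty` is hypothetical and only mimics the behavior described above (do the whole split, then back off on the trailing null fields); it is not a claim about how perl implements `split` internally. Empty fields in the middle survive, since only trailing ones can be discarded.

```perl
use strict;
use warnings;

# Stand-in for a line read with <STDIN>: note the trailing newline.
$_ = "foo bar baz\n";

my @foo  = split;                 # same as split(' ', $_): ("foo", "bar", "baz")
my @full = split(' ', $_, -1);    # keeps the null field after the newline

# Hypothetical helper: take an unstripped field list and drop
# empty fields from the end only.
sub strip_trailing_empty {
    my @fields = @_;
    pop @fields while @fields && $fields[-1] eq '';
    return @fields;
}

my @backed_off = strip_trailing_empty(@full);
print scalar(@full), " fields before backing off, ", scalar(@backed_off), " after\n";
# prints "4 fields before backing off, 3 after"
```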

Likewise, people would be surprised to find a null line at the end if they say

foreach $line (split(/\n/, `ps`)) { ...

It's not just the trailing newline, but any delimiter that's used as a
terminator rather than a separator. You also get the situation where
the syntax of something or other requires N fields, but only the first
few of them are used. 80-column cards are just a variant of this.
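
The terminator-versus-separator distinction in a short Perl 5 sketch. A literal string stands in for the `ps` output (an assumption, so the example runs anywhere), and the colon record is an invented example of a format with fixed slots that are mostly unused.

```perl
use strict;
use warnings;

# Hypothetical stand-in for `ps` output: every line, including the last,
# ends in "\n", so "\n" acts as a terminator rather than a separator.
my $output = "PID TTY\n101 ttyp0\n102 ttyp1\n";

my @lines = split(/\n/, $output);        # 3 lines, no spurious empty one at the end
my @raw   = split(/\n/, $output, -1);    # 4 elements, the last one ""

foreach my $line (@lines) {
    print "got: $line\n";
}

# Invented record format with five colon-separated slots, only two used.
my @slots = split(/:/, "root:0:::");     # ("root", "0")
```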

: Still, I hate putting silly constants like 9999 in. I wouldn't be
: surprised if split() pre-extends the array to the specified size. Would
: it make more sense to allow a limit of 0 to disable this stripping of
: trailing null fields? Perhaps a limit of -N could pre-extend the array
: to N entries but allow it to grow past that in chunks of N or larger?
: This would also make this feature more heavily documented.

It doesn't have anything to do with pre-extending either. The limit is
just that--a maximum, not a minimum. It would be silly to pre-extend
to the specified size, for the simple reason that people sometimes DO
put 9999999 in. And I don't like doing silly things (except on purpose).
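
A quick Perl 5 illustration that LIMIT is an upper bound on the number of fields, not a size the result is padded to; the input string is made up.

```perl
use strict;
use warnings;

my $csv = "a,b,c,d";

my @two  = split(/,/, $csv, 2);       # ("a", "b,c,d"): at most 2 fields, the rest left intact
my @many = split(/,/, $csv, 9999);    # ("a", "b", "c", "d"): only 4 fields exist, no padding

printf "LIMIT 2 -> %d fields, LIMIT 9999 -> %d fields\n", scalar(@two), scalar(@many);
# prints "LIMIT 2 -> 2 fields, LIMIT 9999 -> 4 fields"
```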

: Any enlightenment would be greatly appreciated.

In actual fact, it doesn't normally matter whether split strips the
trailing null fields or not, unless you're interested in exactly how
many values were returned. If you say

($a,$b,$c) = split(/\s/, "a b \n");

then $c is going to evaluate to the null string either way. Note that
no positional information about real data is ever lost--if you say

split(/,/, "a,b,c,,,,,,,,,,,d,,,,")

you still know how far over the d was. In general, when you DO care how
many fields there were, you don't want the trailing null fields anyway.
If it happens that you DO want to know how many trailing null fields there
were, then and only then would you actually need to specify the limit as
some form of infinity.
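
Both of Larry's examples run through Perl 5. The variables are renamed from `$a`/`$b`/`$c` only because `$a` and `$b` are package variables that Perl 5's `sort` uses; a LIMIT of `-1` stands in for "some form of infinity" (a negative LIMIT is the Perl 5 spelling). The sketch reports where `d` lands rather than hard-coding counts.

```perl
use strict;
use warnings;

# First example: the third variable reads as the null string either way.
my ($first, $second, $third) = split(/\s/, "a b \n");
printf "third field is '%s'\n", defined $third ? $third : '';

# Second example: stripping trailing null fields loses no positions.
my $text   = "a,b,c,,,,,,,,,,,d,,,,";
my @fields = split(/,/, $text);        # trailing null fields stripped
my @all    = split(/,/, $text, -1);    # LIMIT as "infinity": nothing stripped

# "d" sits at the same index in both lists.
printf "d is at index %d in both results\n", $#fields;
printf "%d trailing null fields were dropped\n", scalar(@all) - scalar(@fields);
```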

But the situation doesn't arise often in practice.

And I don't think that 9999 is so silly. Most programmers (well, except
for maybe Cobol programmers :-) recognize /99+/ as a form of infinity.

(Admittedly, it'd be nice to have a symbolic representation for infinity.
It's a bit hard to get at DBL_MAX on some C implementations though...)

Larry

Erik E. Rantapaa
Jul 3, 1992, 4:41:38 PM

t...@spillman.uucp (E. Tye McQueen) writes:

>I was just debugging a small problem with some perl scripts when I ran
>across this in the man page under split():

> If LIMIT is unspecified, trailing null fields are stripped
> (which potential users of pop() would do well to remember).

>... My work-around is to specify a limit of 9999. ...

>Still, I hate putting silly constants like 9999 in.

No need to use silly constants. Just say

@result = split(/,/, $string.",dummyfield");
pop(@result);
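
Erik's trick wrapped in a hypothetical helper, `split_commas_keep_trailing`, and compared with Perl 5's negative LIMIT. A sketch under those assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper around the dummy-field trick.
sub split_commas_keep_trailing {
    my ($string) = @_;
    my @result = split(/,/, $string . ",dummyfield");
    pop(@result);    # discard the sentinel field
    return @result;
}

my @r = split_commas_keep_trailing("one,two,,");
my @n = split(/,/, "one,two,,", -1);    # the Perl 5 negative-LIMIT equivalent

print join('|', @r), "\n";    # prints "one|two||"
```

One edge case differs: for an empty `$string` the trick returns a single empty field, whereas splitting an empty string in Perl 5 yields no fields at all, whatever the LIMIT.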

Note that you have to do something different if you are splitting
on, say, \s+:

$addnull = (substr($string,-1) =~ /\s/); # better than =~ /\s$/
@result = split(/\s+/, $string." dummyfield");
pop(@result);
push(@result,'') if $addnull;
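
The whitespace variant as another hypothetical helper, `split_ws_keep_trailing`, checked against `split(/\s+/, $string, -1)` in Perl 5 on a few invented inputs. It assumes a non-empty input; the empty-string case is not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper around the \s+ version of the trick.
sub split_ws_keep_trailing {
    my ($string) = @_;
    my $addnull = (substr($string, -1) =~ /\s/);    # does the input end in whitespace?
    my @result = split(/\s+/, $string . " dummyfield");
    pop(@result);                                   # drop the sentinel
    push(@result, '') if $addnull;                  # restore the trailing null field
    return @result;
}

for my $s ("a b  ", "a b", " a b\n") {
    my @trick = split_ws_keep_trailing($s);
    my @neg   = split(/\s+/, $s, -1);
    print join('|', @trick) eq join('|', @neg) ? "same" : "different", "\n";
}
# prints "same" three times
```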

Now wouldn't it be nice to say

pop(@result = split(/re/, $string));

or

@result = pop(split(/re/, $string));
