Well, as best I can tell, the handling of null strings that match the
regexp is the only difference between the 3rd arg for patsplit() and the
3rd arg for split(), other than the cases where split() is using either
of the special-case FSs of "" or " ".
So the key is that split() takes an FS for the 3rd arg while patsplit()
takes a regexp, and while an FS is regexp-like, it has 3 special cases
that make it different from a regexp:
1) FS = "" -> unspecified by POSIX, some awks split into chars.
2) FS = " " -> leading/trailing blanks and newlines ignored, split on
   contiguous runs of blanks and newlines.
3) FS = a regexp that can match a null string -> treat it like a regexp
   that cannot match a null string (e.g. `,*` gets treated like `,+`).
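A quick way to see cases 2 and 3 for yourself is the sketch below. The
helper name count_fields is made up for illustration, and I left out
case 1 since its result depends on which awk you run:

```shell
# Print how many pieces split() produces for a string and a separator.
# count_fields is a hypothetical helper name, not part of any awk.
count_fields() {
    awk -v s="$1" -v fs="$2" 'BEGIN { print split(s, parts, fs) }'
}

count_fields '  a  b  ' ' '   # FS = " ": edges trimmed, runs of blanks delimit
count_fields 'a,,b' ','       # single character: every comma delimits
count_fields 'a,,b' ',+'      # regexp that cannot match a null string
count_fields 'a,,b' ',*'      # regexp that can match a null string
```

In the awks I know of (gawk, mawk, the one-true-awk) the last two calls
should report the same count, which is point 3 in action.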
While that 3rd point makes sense, I couldn't actually find anything
documenting the fact that a field separator isn't allowed to match a
null string (except in the case of FS="" in some awks). POSIX says:
---------
The following describes FS behavior:
    If FS is a null string, the behavior is unspecified.
    If FS is a single character:
        If FS is <space>, skip leading and trailing <blank> and
        <newline> characters; fields shall be delimited by sets of
        one or more <blank> or <newline> characters.
        Otherwise, if FS is any other character c, fields shall be
        delimited by each single occurrence of c.
    Otherwise, the string value of FS shall be considered to be an
    extended regular expression. Each occurrence of a sequence
    matching the extended regular expression shall delimit fields.
---------
So in the case of `-F'[^,]*'`, for example, the FS falls into the final
case above, and nothing in that text excludes a sequence of zero
characters from delimiting fields. It should really say "...a sequence
_of 1 or more characters_ matching..." I suppose.
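To make the contrast concrete, here is a rough sketch that feeds the
same pattern to FS and, where gawk is installed, to patsplit(). The
helper names fields_with_fs and fields_with_patsplit are made up for
illustration, and the second one assumes a gawk new enough to have
patsplit() (4.0 or later):

```shell
# Show each field in brackets when the record is split using FS.
fields_with_fs() {
    printf '%s\n' "$1" |
        awk -F"$2" '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
}

# Same display, but the pattern describes the fields via gawk's patsplit().
# Assumes gawk is on the PATH under the name "gawk".
fields_with_patsplit() {
    gawk -v s="$1" -v re="$2" 'BEGIN {
        n = patsplit(s, f, re)
        for (i = 1; i <= n; i++) printf "[%s]", f[i]
        print ""
    }'
}

fields_with_fs 'a,,b' '[^,]*'
if command -v gawk >/dev/null 2>&1; then
    fields_with_patsplit 'a,,b' '[^,]*'
fi
```

As far as I can tell, the FS version should keep `,,` together as one
field because the null match between the two commas does not delimit
anything, while patsplit() should report an empty field between `a`
and `b`.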
That difference in how null matches are handled makes it non-trivial to
implement patsplit() using existing functionality (i.e. split() with
the args swapped). Thanks to all who replied.
Ed.