The results of running the following script:

gawk 'BEGIN { n = split("(a,b,c,d)", a, /[(,)]/); printf("n=%d\n", n); for(i=1; i<=n; i++) printf(" -%s-\n",a[i]); }'
I see
But if FS is space, awk will skip empty fields. Why does awk work in different ways for cases where FS is space and where FS is regular expression?
> I see
> But if FS is space, awk will skip empty fields. Why does awk work in
> different ways for cases where FS is space and where FS is regular
> expression?
Because space is explicitly defined to be a special case.
Compare:
$ echo ' a b ' | awk '{print NF}'
2
$ echo ',,a,,b,,' | awk -F, '{print NF}'
7
From the standard:
The following describes FS behavior:
1. If FS is a null string, the behavior is unspecified.
2. If FS is a single character:
   1. If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or more <blank>s.
   2. Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.
3. Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
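The cases quoted above can be checked from the shell. This is only a small sketch, not part of the standard's text: the input strings and the variable names (nf_space, nf_comma, nf_ere) are made up for illustration, and plain awk is assumed to be a POSIX-conforming implementation such as gawk.

```shell
# Case 2.1: FS is a single space (the default). Runs of blanks delimit
# fields, and leading/trailing blanks are skipped.
nf_space=$(printf '  a  b  \n' | awk '{print NF}')

# Case 2.2: FS is any other single character. Every single comma
# delimits a field, so empty fields are counted.
nf_comma=$(printf ',,a,,b,,\n' | awk -F, '{print NF}')

# Case 3: FS is longer than one character, so it is treated as an ERE.
# Each "(", "," or ")" delimits a field, including the empty ones
# before "(" and after ")".
nf_ere=$(printf '(a,b,c,d)\n' | awk -F'[(,)]' '{print NF}')

echo "space: $nf_space  comma: $nf_comma  ere: $nf_ere"
```

This should print "space: 2  comma: 7  ere: 6". Only the single-space case drops the empty leading and trailing fields.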
On 8 May, 13:13, PRC <panruoc...@gmail.com> wrote:
[Please don't top-post.]
> I see
> But if FS is space, awk will skip empty fields. Why does awk work in
> different ways for cases where FS is space and where FS is regular
> expression?
In your example you didn't use FS; you passed a regexp to split(). In general, awk defines a few special cases for FS and RS when they are a single space or a null string, I suppose to keep the awk interface concise while still offering powerful features. Besides that, your program behaves exactly the same way if you specify a space as a regexp:
n = split(" a b c d ", a, / /);
In your application, since you know the data and delimiters, just change your loop
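The loop that followed is not preserved in this copy. One possible reading, offered only as a sketch and not as the poster's actual code: since the string starts and ends with a delimiter, the first and last array elements are always empty, so the loop can run from 2 to n-1. The shell variable name out is hypothetical, and awk stands in for the gawk used in the original script.

```shell
out=$(awk 'BEGIN {
    # "(" and ")" produce an empty first and last element, so n is 6.
    n = split("(a,b,c,d)", a, /[(,)]/)
    # Skip a[1] and a[n], which are known to be empty for this input.
    for (i = 2; i < n; i++)
        printf(" -%s-\n", a[i])
}')
echo "$out"
```

This prints only the four non-empty elements. It relies on knowing that the data is always wrapped in delimiters; for arbitrary input, testing each element against the empty string would be safer.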
>>I see
>>But if FS is space, awk will skip empty fields. Why does awk work in
>>different ways for cases where FS is space and where FS is regular
>>expression?
> Because space is explicitly defined to be a special case.
> 1. If FS is a null string, the behavior is unspecified.
> 2. If FS is a single character:
>    1. If FS is <space>, skip leading and trailing <blank>s; fields
>       shall be delimited by sets of one or more <blank>s.
>    2. Otherwise, if FS is any other character c, fields shall be
>       delimited by each single occurrence of c.
> 3. Otherwise, the string value of FS shall be considered to be an
>    extended regular expression. Each occurrence of a sequence matching
>    the extended regular expression shall delimit fields.
> And splitting in split() works the same way.
and if you want to literally use a single blank character as the field separator, specify it as '[ ]':
$ echo '  a  b  ' | awk '{print NF}'
2
$ echo '  a  b  ' | awk -F'[ ]' '{print NF}'
7
and if you want to use repetitions of a given character (or RE), specify it as '<pattern>+':
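For instance, a minimal sketch with a comma as the repeated character (the input string and the variable names nf_single and nf_run are made up for illustration):

```shell
# FS is a single comma: each comma delimits a field, so the empty
# fields between adjacent commas are counted.
nf_single=$(printf 'a,,b,,,c\n' | awk -F, '{print NF}')

# FS is the ERE ",+": a run of one or more commas acts as one delimiter.
nf_run=$(printf 'a,,b,,,c\n' | awk -F',+' '{print NF}')

echo "single: $nf_single  run: $nf_run"
```

This should print "single: 6  run: 3". Unlike the default space FS, a '<pattern>+' separator does not skip a leading or trailing delimiter, so input such as ',a,' still yields empty first and last fields.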