How awk `split' works?
Newsgroups: comp.unix.shell
From: PRC <panruoc...@gmail.com>
Date: Thu, 8 May 2008 03:49:28 -0700 (PDT)
Local: Thurs, May 8 2008 6:49 am
Subject: How awk `split' works?
The results of running the following script
gawk 'BEGIN {
    n = split("(a,b,c,d)", a, /[(,)]/);
    printf("n=%d\n", n);
    for(i=1; i<=n; i++)
        printf(" -%s-\n",a[i]);
}'

is
n=6
 --
 -a-
 -b-
 -c-
 -d-
 --

instead of
n=4
 -a-
 -b-
 -c-
 -d-
which is what I expected.

I have no idea how this result comes about.


Newsgroups: comp.unix.shell
From: Dave B <da...@addr.invalid>
Date: Thu, 08 May 2008 12:53:58 +0200
Local: Thurs, May 8 2008 6:53 am
Subject: Re: How awk `split' works?
On Thursday 8 May 2008 12:49, PRC wrote:

You are telling awk to use any of '(', ',' or ')' as the field separator for
splitting.

Given your string '(a,b,c,d)' awk sees six fields:

- one empty field before the '('
- a
- b
- c
- d
- one empty field after the ')'

Try this:

$ echo '(a,b,c,d)' | awk -F '[(,)]' '{print NF}'
6
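
To see where the two empty elements end up, here is a small sketch (not from the original post; the array name `parts` and the shell variable `fields` are arbitrary) that prints each element of the split() result together with its index:

```shell
# Print every element that split() produces, with its index,
# so the empty strings at both ends become visible.
fields=$(awk 'BEGIN {
    n = split("(a,b,c,d)", parts, /[(,)]/)
    for (i = 1; i <= n; i++)
        printf("%d=[%s]\n", i, parts[i])
}')
printf '%s\n' "$fields"
```

Elements 1 and 6 should come out as `[]`: the text before the '(' and the text after the ')' are both empty strings, and awk keeps them.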

--
D.


Newsgroups: comp.unix.shell
From: PRC <panruoc...@gmail.com>
Date: Thu, 8 May 2008 04:13:38 -0700 (PDT)
Local: Thurs, May 8 2008 7:13 am
Subject: Re: How awk `split' works?
I see.
But if FS is a space, awk skips empty fields. Why does awk behave
differently when FS is a space than when FS is a regular
expression?


Newsgroups: comp.unix.shell
From: Dave B <da...@addr.invalid>
Date: Thu, 08 May 2008 13:21:17 +0200
Local: Thurs, May 8 2008 7:21 am
Subject: Re: How awk `split' works?
On Thursday 8 May 2008 13:13, PRC wrote:

> I see.
> But if FS is a space, awk skips empty fields. Why does awk behave
> differently when FS is a space than when FS is a regular
> expression?

Because space is explicitly defined to be a special case.

Compare:

$ echo '  a  b  ' | awk '{print NF}'
2
$ echo ',,a,,b,,' | awk -F, '{print NF}'
7

From the standard:

The following describes FS behavior:

   1. If FS is a null string, the behavior is unspecified.

   2. If FS is a single character:

         1. If FS is <space>, skip leading and trailing <blank>s; fields
            shall be delimited by sets of one or more <blank>s.

         2. Otherwise, if FS is any other character c, fields shall be
            delimited by each single occurrence of c.

   3. Otherwise, the string value of FS shall be considered to be an
      extended regular expression. Each occurrence of a sequence matching
      the extended regular expression shall delimit fields.

And splitting in split() works the same way.
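
As a rough check of that last statement, the sketch below runs split() once for each of the cases 2.1, 2.2 and 3 quoted above. The array names `x`, `y`, `z` and the shell variable `counts` are made up for illustration:

```shell
# Exercise split() with the three separator cases from the quoted rules.
counts=$(awk 'BEGIN {
    # Case 2.1: a single space trims both ends and collapses runs of blanks.
    print split("  a  b  ", x, " ")
    # Case 2.2: any other single character delimits at every occurrence.
    print split(",,a,,b,,", y, ",")
    # Case 3: a longer string is treated as an extended regular expression.
    print split(",,a,,b,,", z, ",+")
}')
printf '%s\n' "$counts"
```

This should print 2, 7 and 4, matching what the same separators give for NF when passed with -F.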

--
D.


Newsgroups: comp.unix.shell
From: Janis <janis_papanag...@hotmail.com>
Date: Thu, 8 May 2008 04:55:03 -0700 (PDT)
Local: Thurs, May 8 2008 7:55 am
Subject: Re: How awk `split' works?
On 8 May, 13:13, PRC <panruoc...@gmail.com> wrote:

[Please don't top-post.]

> I see.
> But if FS is a space, awk skips empty fields. Why does awk behave
> differently when FS is a space than when FS is a regular
> expression?

In your example you didn't use FS. Generally, there are some
special cases built into the semantics of FS/RS for spaces
and null strings, I suppose to combine a concise awk
interface with powerful features. Besides that, your program
behaves exactly the same way if you specify a space as a
regexp:

    n = split(" a b c d ", a, / /);

In your application, since you know the data and delimiters,
just change your loop to:

    for(i=2; i<n; i++)
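
Put back into the original script, that change might look like the sketch below. It assumes the input always starts with '(' and ends with ')', so that elements 1 and n are always the empty ones; `parts` and `inner` are arbitrary names:

```shell
# Same split as in the original question, but the loop skips the first
# and last elements, which are the empty strings around the parentheses.
inner=$(awk 'BEGIN {
    n = split("(a,b,c,d)", parts, /[(,)]/)
    for (i = 2; i < n; i++)
        printf(" -%s-\n", parts[i])
}')
printf '%s\n' "$inner"
```

This prints only the four lines for a, b, c and d. If the input may lack the surrounding parentheses, the fixed loop bounds would drop real data, so this only suits input of a known shape.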

Janis


Newsgroups: comp.unix.shell
From: Ed Morton <mor...@lsupcaemnt.com>
Date: Thu, 08 May 2008 06:56:43 -0500
Local: Thurs, May 8 2008 7:56 am
Subject: Re: How awk `split' works?
On 5/8/2008 6:21 AM, Dave B wrote:

and if you want to literally use a single blank character as the field
separator, specify it as '[ ]':

$ echo '  a  b  ' | awk '{print NF}'
2
$ echo '  a  b  ' | awk -F'[ ]' '{print NF}'
7

and if you want to use repetitions of a given character (or RE), specify it as
'<pattern>+':

$ echo ',,a,,b,,' | awk -F, '{print NF}'
7
$ echo ',,a,,b,,' | awk -F',+' '{print NF}'
4

and if you want it treated the same as the default FS, you need to strip away
any leading and trailing occurrences of the FS:

$ echo ',,a,,b,,' | awk -F',+' '{gsub("^"FS"|"FS"$","");print NF}'
2

So, the default FS behavior is a shorthand that lets us write:

$ echo '  a  b  ' | awk '{print NF}'
2

instead of:

$ echo '  a  b  ' | awk -F'[[:blank:]]+' '{gsub("^"FS"|"FS"$","");print NF}'
2
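
The same strip-then-split idea can be applied to split() on a string. The sketch below uses a hypothetical helper called `trimsplit` (not an awk built-in, and not from this thread) and assumes `sep` is a regular expression other than a single space; the array names `f` and `g` are arbitrary:

```shell
n_fields=$(awk '
# trimsplit is a hypothetical helper: it removes one match of sep at the
# start and one at the end of s, then splits what is left on sep.
function trimsplit(s, arr, sep) {
    gsub("^(" sep ")|(" sep ")$", "", s)
    return split(s, arr, sep)
}
BEGIN {
    print trimsplit(",,a,,b,,", f, ",+")
    print trimsplit("(a,b,c,d)", g, "[(,)]")
}')
printf '%s\n' "$n_fields"
```

This should print 2 and then 4, so the string from the original question yields just a, b, c and d. Note that only a single match is stripped at each end: with a one-character pattern such as '[(,)]', an input like '((a)' would still leave an empty leading element.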

        Ed.

