Field separator characters ?

John Fitzsimons

Nov 11, 2009, 12:29:26 AM

Windows gawk newbie.

Can anyone here answer any/all of the following questions please ?

(1) Is there any way to find out what field separators are "okay" and
which aren't ? A list, or web page, perhaps ?

(2) Can one have multiple field separators for the one input file ?
For example, could one have a single space AND eg. ! So that eg.
Fieldone Fieldtwo!Fieldthree would be read as three fields ?

(3) If the answer for (2) is "yes" then what would the syntax for
setting FS be, please ?


Regards, John.

Ed Morton

Nov 11, 2009, 12:37:44 AM
John Fitzsimons wrote:
> Windows gawk newbie.
>
> Can anyone here answer any/all of the following questions please ?
>
> (1) Is there any way to find out what field separators are "okay" and
> which aren't ? A list, or web page, perhaps ?

Any Extended Regular Expression (ERE) is OK.

> (2) Can one have multiple field separators for the one input file ?
> For example, could one have a single space AND eg. ! So that eg.
> Fieldone Fieldtwo!Fieldthree would be read as three fields ?

Yes.


>
> (3) If the answer for (2) is "yes" then what would the syntax be for
> setting the FS be please ?

FS="[ !]" or FS="( |!)" would work.
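As a quick check of both forms, here is a minimal sketch run against the sample line from the question. The variable names `line`, `bracket` and `alternation` are only for illustration, and the example assumes a POSIX-style shell with some awk on the PATH.

```shell
# Sample input from the original question.
line='Fieldone Fieldtwo!Fieldthree'

# Bracket expression: a single space or a single "!" is a separator.
bracket=$(printf '%s\n' "$line" | awk -F '[ !]' '{print NF, $2, $3}')

# Alternation: the same two separators written as an ERE alternation.
alternation=$(printf '%s\n' "$line" | awk -F '( |!)' '{print NF, $2, $3}')

echo "$bracket"
echo "$alternation"
```

Both commands should report three fields, with Fieldtwo and Fieldthree as the second and third.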

Ed.

Loki Harfagr

Nov 11, 2009, 5:00:39 AM
Tue, 10 Nov 2009 23:37:44 -0600, Ed Morton did cat :

And, just in case, the OP after saying "this works":
$ echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]'

he'd add the question "but why doesn't this?":
$ echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]'

then an answer could be ;-):
$ echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]*'
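
The commands above look identical as archived, so the following is only a sketch of the likely point, assuming the failing input contained a run of consecutive separators. The input with three spaces and the variable names `single` and `run` are hypothetical. It uses `+` rather than `*` so that the pattern itself cannot match an empty string.

```shell
# Hypothetical input: three spaces between the first two fields.
line='Fieldone   Fieldtwo!Fieldthree'

# One separator character per match: each extra space produces an empty field.
single=$(printf '%s\n' "$line" | awk -F '[ !]' '{print NF ": [" $2 "]"}')

# One or more separator characters per match: the whole run counts once.
run=$(printf '%s\n' "$line" | awk -F '[ !]+' '{print NF ": [" $2 "]"}')

echo "single: $single"
echo "run:    $run"
```

With the single-character class the record splits into five fields and $2 is empty; with the repeated class it splits into three and $2 is Fieldtwo.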

--
just my 2 billion car industry bonds.

w_a_x_man

Nov 11, 2009, 10:18:30 AM
On Nov 10, 11:29 pm, John Fitzsimons <DELETEucwubq...@sneakemail.com>
wrote:

FS = "[! ]"
or
FS = " |!"

And try this one:
FS = "[^a-zA-Z]"
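
A small sketch of what that last pattern does: every character that is not an ASCII letter acts as a separator. The second input line and the variable names are made up for illustration.

```shell
# Sample input from the original question.
line='Fieldone Fieldtwo!Fieldthree'
letters_only=$(printf '%s\n' "$line" | awk -F '[^a-zA-Z]' '{print NF, $1, $2, $3}')

# Hypothetical input with other punctuation as separators.
other='alpha,beta;gamma'
punct=$(printf '%s\n' "$other" | awk -F '[^a-zA-Z]' '{print NF, $1, $2, $3}')

echo "$letters_only"
echo "$punct"
```

Note that digits also count as separators with this pattern, which may or may not be what is wanted.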

Janis Papanagnou

Nov 11, 2009, 8:18:54 PM

Why not this...?

echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]+'

I am slightly surprised that * gives the same result (with my gawk) as +.

Janis

Loki Harfagr

Nov 12, 2009, 2:10:47 AM
Thu, 12 Nov 2009 02:18:54 +0100, Janis Papanagnou did cat :

Good question :D)
maybe because the 'real' regexp meaning of * would be absurd in our case?
"a field separator of none to whatever char in the previous set" sounds
weird to my eyes (sic) ;-)

> I am slightly surprised that * gives the same result (with my gawk) as
> +.

Maybe ask the author, I'd say he decided to make * equivalent to the +
just to ease up the syntax? "Just supposing"...

pk

Nov 12, 2009, 4:34:59 AM
Janis Papanagnou wrote:

> Why not this...?
>
> echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]+'
>
> I am slightly surprised that * gives the same result (with my gawk) as +.

It's because FS is not allowed to match the empty string. This is sparsely
documented in the gawk source, and hardly documented anywhere else.

Grant

Nov 12, 2009, 4:43:12 AM

Doesn't FS='' separate each character (convert file to a byte stream)?

Grant.
--
http://bugsplatter.id.au

pk

Nov 12, 2009, 5:01:42 AM
Grant wrote:

Strictly speaking, per POSIX using FS='' would be undefined behavior. GNU
awk and some other awks allow it as a special case meaning "each character
is a field".
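
A minimal sketch of that special case, assuming gawk or another awk that implements the extension. Since POSIX leaves it undefined, the result may differ elsewhere. The variable name `chars` is only for illustration.

```shell
# An empty FS is not specified by POSIX; gawk treats it as
# "split the record into individual characters".
chars=$(printf 'abc\n' | awk 'BEGIN { FS = "" } { print NF }')

echo "$chars"
```

With gawk this reports one field per character of the input line.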

Janis' case was slightly different, in that FS was not an explicit empty RE
but rather a RE that would allow matching an empty string, eg "a*".

(I think this was also discussed in the past here)

Here are some examples:

$ awk 'BEGIN{s="foobar";gsub(/a*/,"X",s);print s}'
XfXoXoXbXrX

but:

$ echo 'foobar' | awk -F 'a*' '{print NF; print "--"$1"--"$2"--"}'
2
--foob--r--
