Can anyone here answer any/all of the following questions please ?
(1) Is there any way to find out what field separators are "okay" and
which aren't ? A list, or web page, perhaps ?
(2) Can one have multiple field separators for the one input file ?
For example, could one have a single space AND eg. ! So that eg.
Fieldone Fieldtwo!Fieldthree would be read as three fields ?
(3) If the answer to (2) is "yes" then what would the syntax for
setting the FS be please ?
Regards, John.
Any Extended Regular Expression (ERE) is OK.
> (2) Can one have multiple field separators for the one input file ?
> For example, could one have a single space AND eg. ! So that eg.
> Fieldone Fieldtwo!Fieldthree would be read as three fields ?
Yes.
>
> (3) If the answer to (2) is "yes" then what would the syntax for
> setting the FS be please ?
FS="[ !]" or FS="( |!)" would work.
Ed.
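For question (3), here is a minimal sketch of three common ways to set FS, using the sample line from the question. The variable names (line, by_option, by_begin, by_operand) are made up for illustration; each form should report three fields.

```shell
line='Fieldone Fieldtwo!Fieldthree'

# 1. The -F option on the command line.
by_option=$(printf '%s\n' "$line" | awk -F'[ !]' '{ print NF, $3 }')

# 2. An assignment inside a BEGIN block, before any input is read.
by_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ !]" } { print NF, $3 }')

# 3. A variable assignment given after the program text.
#    The alternation ( |!) is equivalent to the bracket expression [ !].
by_operand=$(printf '%s\n' "$line" | awk '{ print NF, $3 }' FS='( |!)')

printf '%s\n' "$by_option" "$by_begin" "$by_operand"
```

All three should print "3 Fieldthree". The BEGIN form keeps the separator next to the program that depends on it, which can be easier to maintain in a script file.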
And, just in case the OP, after saying "this works":
$ echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]'
then asks the question "but why doesn't this?":
$ echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]'
then an answer could be ;-):
$ echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]*'
--
just my 2 billion car industry bonds.
FS = "[! ]"
or
FS = " |!"
And try this one:
FS = "[^a-zA-Z]"
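A small sketch of what the negated bracket expression does: any single character that is not an ASCII letter becomes a separator. The second input line is invented for illustration, and the variable names are hypothetical.

```shell
# Space and ! are both non-letters, so the sample line still has three fields.
letters_only=$(printf '%s\n' 'Fieldone Fieldtwo!Fieldthree' |
    awk -F'[^a-zA-Z]' '{ print NF, $2 }')

# Digits and commas are non-letters too, so they also split fields.
mixed=$(printf '%s\n' 'alpha1beta,gamma' |
    awk -F'[^a-zA-Z]' '{ print NF, $3 }')

printf '%s\n' "$letters_only" "$mixed"
```

This should print "3 Fieldtwo" and then "3 gamma". Note that each non-letter is its own separator here, so two in a row would produce an empty field between them.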
Why not this...?
echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]+'
I am slightly surprised that * gives the same result (with my gawk) as +.
Janis
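To see where the + matters, here is a sketch with an input that has runs of separators. The input line (two spaces, two exclamation marks) is an assumption made for illustration, not one taken from the posts above.

```shell
line='Fieldone  Fieldtwo!!Fieldthree'   # two spaces, then two exclamation marks

# Without a quantifier every separator character ends a field,
# so each run of two leaves an empty field in the middle.
single=$(printf '%s\n' "$line" | awk -F'[ !]' '{ print NF }')

# With + a whole run of separators counts as one.
plus=$(printf '%s\n' "$line" | awk -F'[ !]+' '{ print NF, $2 }')

printf '%s\n' "$single" "$plus"
```

This should print 5 for the plain bracket expression and "3 Fieldtwo" for the + version.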
Good question :D)
maybe because the 'real' regexp meaning of * would be absurd in our case?
"a field separator of none to whatever char in the previous set" sounds
weird to my eyes (sic) ;-)
> I am slightly surprised that * gives the same result (with my gawk) as
> +.
Maybe ask the author, I'd say he decided to make * equivalent to the +
just to ease up the syntax? "Just supposing"...
> Why not this...?
>
> echo 'Fieldone Fieldtwo!Fieldthree' | awk '{print NF,$2}' FS='[ !]+'
>
> I am slightly surprised that * gives the same result (with my gawk) as +.
It's because a regexp FS is not allowed to match the empty string. This is sparsely
documented in the gawk source, and virtually not documented anywhere else.
Doesn't FS='' separate each character (convert file to a byte stream)?
Grant.
--
http://bugsplatter.id.au
Strictly speaking, per POSIX using FS='' would be undefined behavior. GNU
awk and some other awks allow it as a special case meaning "each character
is a field".
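Since an empty FS cannot be relied on everywhere, a sketch that avoids it is to walk the record with substr() and length(), which are standard awk functions. The input 'abc' and the variable name chars are illustrative only.

```shell
# Print each character of the record with its position, without touching FS.
chars=$(printf '%s\n' 'abc' |
    awk '{ for (i = 1; i <= length($0); i++) print i, substr($0, i, 1) }')

printf '%s\n' "$chars"
```

This should print "1 a", "2 b" and "3 c" on separate lines.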
Janis' case was slightly different, in that FS was not an explicit empty RE
but rather an RE that can match the empty string, e.g. "a*".
(I think this was also discussed in the past here)
Here are some examples:
$ awk 'BEGIN{s="foobar";gsub(/a*/,"X",s);print s}'
XfXoXoXbXrX
but:
$ echo 'foobar' | awk -F 'a*' '{print NF; print "--"$1"--"$2"--"}'
2
--foob--r--
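If the intent is "one or more a", spelling it a+ sidesteps the question of how a given awk treats an FS that can match the empty string. A minimal sketch; the second input, 'foobaar', is made up to show a run being treated as one separator.

```shell
# a+ cannot match the empty string, so the split is unambiguous.
one_a=$(echo 'foobar' | awk -F 'a+' '{ print NF; print "--" $1 "--" $2 "--" }')

# A run of two a's is still a single separator.
two_a=$(echo 'foobaar' | awk -F 'a+' '{ print NF; print "--" $1 "--" $2 "--" }')

printf '%s\n' "$one_a" "$two_a"
```

Both inputs should give 2 fields and the line --foob--r--.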