One problem I have run into on several occasions is having to
print the last field, or a certain field counted from the right
rather than from the left. For example, if I have this file:
this_line_consist_of_many_words
short_line2
this_line_has_even_more_words_separated_by_underscore
...
Using awk with underscore as the delimiter, I can easily print any word
using $1, $2, $3, etc.
$ awk 'BEGIN{OFS=FS="_"}{print $1" "$2" "$3}' temp.txt
But how can I print the last word (field), or the 2nd from the right,
if the number of fields varies from line to line?
I'm on Sun Solaris 7. I heard about ARGV and ARGC, but they don't return
anything (unless I'm not using them right).
Hint:
awk '{print $NF, $NF-2}'
--
William Park, Open Geometry Consulting, <openge...@yahoo.ca>
Linux solution for data management and processing.
Actually:
awk '{ print $NF, $(NF-2) }'
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 530 688 5518
Nof Ayalon Cell Phone: +972 51 297-545
D.N. Shimshon 99785 ISRAEL
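To make the difference between the two replies concrete, here is a small sketch using the sample lines from the question (the shell variable names are only for illustration). With parentheses, the expression inside $( ) is evaluated first and then used as a field number, so $(NF-1) is the second field from the right. Without them, $NF-2 is parsed as ($NF) - 2, an arithmetic expression on the last field.

```shell
lines='this_line_consist_of_many_words
short_line2
this_line_has_even_more_words_separated_by_underscore'

# $NF is the last field; $(NF-1) is the second field from the right.
from_right=$(printf '%s\n' "$lines" | awk -F_ '{ print $NF, $(NF-1) }')
printf '%s\n' "$from_right"

# Unparenthesized: the last field is converted to a number
# (0 for non-numeric text) and 2 is subtracted from it.
arith=$(printf '%s\n' "$lines" | awk -F_ '{ print $NF-2 }')
printf '%s\n' "$arith"
```

The first command prints "words many", "line2 short" and "underscore by"; the second prints -2 for every line, because none of the last fields is numeric.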
Btw.. any "elegant" (lean) solution to print all fields
except first 2 or such?
Besides taking $NF and for-looping from '2' to '$NF'?
Just curious ;)
ciao
--
"Unix was the first OS where you could carry the media and system
documentation around in a briefcase. This was fixed in BSD4.2."
Not without patching the source... :-)
> * Aharon (arn...@skeeve.com) wrote:
>
>> awk '{ print $NF, $(NF-2) }'
>
>
> Btw.. any "elegant" (lean) solution to print all fields
> except first 2 or such?
> Besides taking $NF and for-looping from '2' to '$NF'?
awk '{$1="";$2=""; print $0}'
works for me ;)
--
Walter
That's ok for some purposes, but not necessarily all. After assigning
to one of the fields, $0 is re-constructed, which may not exactly
preserve whitespace. On my system, gawk 3.1.1 and mawk 1.3.3 both
print leading space with the above. In addition, on a tab separated
file, the remaining tabs are converted to space characters. Whether
or not the above behavior is acceptable obviously depends on what the
application is.
Mike
--
Michael Zawrotny
Institute of Molecular Biophysics
Florida State University | email: zawr...@sb.fsu.edu
Tallahassee, FL 32306-4380 | phone: (850) 644-0069
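A quick way to see the effect described above, as a sketch on a made-up tab-separated line: after the assignments, $0 is rebuilt by joining all four fields (two of them now empty) with OFS, which is a single space by default.

```shell
# Four tab-separated fields: a, b, c, d.
rebuilt=$(printf 'a\tb\tc\td\n' | awk '{ $1 = ""; $2 = ""; print $0 }')

# Brackets make the leading spaces visible.
printf '[%s]\n' "$rebuilt"
```

This prints "[  c d]": two leading spaces left over from the emptied fields, and a space where the tab between c and d used to be.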
This has side effects in printing and sometimes in reassignment.
How about:
#!gawk
{
    for (i=0; i<2; i++) # Adjust to set the number of fields to delete
        sub(/^[ \t]*[^ \t]*[ \t]*/,"")
    print
}
There's probably also a way to do it with tricky application of the match
function, using features found in advanced versions of AWK (e.g., GAWK
& TAWK), where you would end up with an integer which points to the 3rd
field in $0, without modifying $0.
And, FWIW, you can probably do it in standard AWK with multiple invocations
of the match() function, but the advanced versions would allow you to do it
in 1.
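As a usage sketch for the sub() loop above, with the count passed in through a variable (the name n and the sample line are hypothetical, chosen only for this illustration): each pass removes one leading field together with the whitespace around it, and everything after that is left untouched.

```shell
rest=$(printf '  alpha  beta\tgamma   delta epsilon\n' |
  awk -v n=2 '{
    # Strip one leading field per iteration, n times in total.
    for (i = 0; i < n; i++)
      sub(/^[ \t]*[^ \t]*[ \t]*/, "")
    print
  }')
printf '[%s]\n' "$rest"
```

This prints "[gamma   delta epsilon]": the runs of spaces inside the remainder survive, unlike with the field-assignment approach.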
It depends on what you mean by elegant. You can create a pattern
which matches the first 2 (or n) fields plus the field separators,
use match() to get the offset, then use substr to print the rest.
For the default field separator
match($0, /^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+/)
print substr($0, RSTART+RLENGTH)
you can construct the RE with a loop once you have a pattern for the field
and a pattern for the field separator, but you don't seem to like loops...
The advantage of this is that it preserves the field separators, which
can't be done in general using looping solutions.
--
Patrick TJ McPhee
East York Canada
pt...@interlog.com
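The idea of building the RE in a loop from a field pattern and a separator pattern could look roughly like this for the underscore-separated file from the original question. The variable names (n, field, sep, re) are made up for the sketch, and skipping lines that have too few fields is an assumption, not something the post specifies.

```shell
lines='this_line_consist_of_many_words
short_line2'

rest=$(printf '%s\n' "$lines" | awk -v n=2 '
  BEGIN {
    field = "[^_]*"   # a field: any run of non-underscore characters
    sep   = "_"       # the field separator
    re = "^"
    for (i = 1; i <= n; i++)
      re = re field sep
  }
  # Print what follows the first n fields; lines with fewer than
  # n separators do not match and are skipped.
  match($0, re) { print substr($0, RSTART + RLENGTH) }
')
printf '%s\n' "$rest"
```

With n=2 this prints "consist_of_many_words" for the first line; the second line has only one separator and is skipped.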
> #!gawk
ugh :)
> for (i=0; i<2; i++) # Adjust to set the number of fields to delete
well, see above. Obviously there's no "shortcut" w/o constraints.
Case closed on my side :)
oh.. that's clever
> The advantage of this is that it preserves the field separators, which
> can't be done in general using looping solutions.
yeah, point being here. Further processing could rely on FS. hmm :)
Thanks
Lisa
> As I'm reading a config file and after I get the first 4 fields, the 5th
> field is a comment (which contains spaces) so I want $5..$(NF) to return
> to the calling script.
> So far I see $1="";$2=""...etc... any other solution?
No other solution. But you can write $1=$2="".
AWK (like C) is an ALGOLoid language.
Note that the $1=$2=...="" approach is equivalent to the for (i=5; ...)
approach - both involve looping at one end or the other.
The best way is to somehow find out the char position where you want to
start (i.e., where the chosen field is located) and then do substr(str,pos)
The details of how you get that starting position are, of course,
problem-specific.
print substr($0, match( $0, $5 ));
If the field is empty I'm hosed - actually, if any field is empty in my
config file, I'm hosed! Anyway, this seems to work for now.
Caio-caio,
Lisa
If you use gawk this untested snippet should work (run gawk with
--re-interval flag):
s = $0
Field5on = ""
if (sub(/([^[:space:]]+[[:space:]]+){4}/, "", s)) Field5on = s;
You could do this without re-interval, by rewriting the regular
expression. Setting OFS and FS affects this.
Anything you do with NF causes $0 to be recomputed, so you lose spacing
information. But, I guess you could do something like this, if that did
not matter (untested):
$4 = $4 # Force recomputation of $0
s = $0
NF = 4 # Truncate $0
Field5on = substr(s, length($0)+1)
--
Huey Long was once asked if he thought America would ever
become fascist. He responded, "Of course it will, but
we'll call it anti-fascism."
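One way the rewrite without an interval expression might look is to spell out the repeated group four times. This is a sketch under a few assumptions: the sample line is invented, [ \t] stands in for [[:space:]] so that it also runs on awks without POSIX character classes, and leading whitespace is allowed before the first field.

```shell
line='  host   8080  tcp   on   free text   with  spaces'

field5on=$(printf '%s\n' "$line" | awk '{
    s = $0
    # One "[^ \t]+[ \t]+" per skipped field, written out four times
    # in place of the {4} interval expression.
    if (sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", s))
      print s
  }')
printf '[%s]\n' "$field5on"
```

This prints "[free text   with  spaces]". The work is done on a copy of the record, so $0 itself is never rebuilt.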
That's close but it'd require the record to start with non-space
characters. I'd use this with a POSIX awk:
awk 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){4}/,"")'
With gawk:
gawk --re-interval '...'
or:
gawk --posix '...'
Note that gensub() is not available with --posix but is available with
--re-interval. If you need an interval expression (e.g. {1,} or {8} or
{2,4}) together with gensub(), you must use --re-interval rather than
--posix, so --re-interval is generally the preferred method.
Regards,
Ed.
% Think I got it... I was reading a config file in which the 5th field is
% the last field and can contain spaces (a comment field) - so after
% reading everywhere, this above substr gave me the answer:
%
% print substr($0, match( $0, $5 ));
The problem with this is that it fails if $5 contains characters which
are special in regular expressions, or if the contents of $5 appear
earlier in the line.
What will work is something like
BEGIN {
    # the default FS ignores leading whitespace
    firstfour = "^[ \t]*"
    # now add in a RE that matches non-whitespace followed by whitespace,
    # once for each field you want to skip
    for (i = 1; i <= 4; i++)
        firstfour = firstfour "[^ \t]+[ \t]+"
}
# the fifth field starts just after the string that matches firstfour
{
    match($0, firstfour)
    print substr($0, RSTART+RLENGTH)
}
--
Patrick TJ McPhee
North York Canada
pt...@interlog.com
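To illustrate the second failure mode mentioned above (the contents of $5 appearing earlier in the line), here is a sketch comparing the two approaches on a made-up config line whose fifth field, "note", also occurs as the second field.

```shell
line='x note y z note about x'

# match($0, $5) finds the first "note", at position 3, not field 5.
naive=$(printf '%s\n' "$line" | awk '{ print substr($0, match($0, $5)) }')

# Prefix-pattern version: skip exactly four fields and their separators.
robust=$(printf '%s\n' "$line" | awk '
  BEGIN {
    firstfour = "^[ \t]*"
    for (i = 1; i <= 4; i++)
      firstfour = firstfour "[^ \t]+[ \t]+"
  }
  {
    match($0, firstfour)
    print substr($0, RSTART + RLENGTH)
  }')

printf 'naive:  %s\nrobust: %s\n' "$naive" "$robust"
```

The naive version returns "note y z note about x", while the prefix-pattern version returns the intended "note about x".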
If your fifth field began with a pound sign (#), you could probably
do something simple like comment=$0; sub (/.*#/, "", comment).
That would probably require reformatting your config files, but that
should be a one-time operation. The benefit is that if you needed
to add a fifth field that is not part of the comment, just put it in
front of the pound sign. Also, to keep the non-comment fields, use
noncomment=$0; sub (/#.*/, "", noncomment);
Bill Seivert
ITYM:
sub (/[^#]*#/, "", comment)
to handle "#"s within comments, e.g. the incredibly useful:
i = 1 # set i to #1
> That would probably require reformatting your config files, but that
> should be a one-time operation. The benefit is that if you needed
> to add a fifth field that is not part of the comment, just put it in
> front of the pound sign. Also, to keep the non-comment fields, use
> noncomment=$0; sub (/#.*/, "", noncomment);
or something like:
sub(comment"$","",noncomment)
just in case you want to change the way you identify comments in future...
Ed.
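For reference, a sketch of how the three substitutions discussed so far behave on the sample line (the shell variable names are only for illustration). Because .* is greedy, /.*#/ removes everything up to the last "#", whereas /[^#]*#/ stops at the first one.

```shell
line='i = 1 # set i to #1'

# Greedy: deletes through the LAST "#".
greedy=$(printf '%s\n' "$line" | awk '{ c = $0; sub(/.*#/, "", c); print c }')

# Deletes through the FIRST "#", keeping "#"s inside the comment.
comment=$(printf '%s\n' "$line" | awk '{ c = $0; sub(/[^#]*#/, "", c); print c }')

# Deletes from the first "#" to end of line, keeping the non-comment part.
noncomment=$(printf '%s\n' "$line" | awk '{ n = $0; sub(/#.*/, "", n); print n }')

printf '[%s]\n[%s]\n[%s]\n' "$greedy" "$comment" "$noncomment"
```

This prints "[1]", "[ set i to #1]" and "[i = 1 ]"; the whitespace next to the first "#" is left in place in the last two.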
Ed Morton wrote:
> Bill Seivert wrote:
>
snip
>>> wow, last post here and I'm looking for a way to grab "the rest of
>>> the fields" without for-looping (if its possible)
>>> As I'm reading a config file and after I get the first 4 fields, the
>>> 5th field is a comment (which contains spaces) so I want $5..$(NF) to
>>> return to the calling script.
>>> So far I see $1="";$2=""...etc... any other solution?
>>> Nothing in Sed&Awk book or the awk FAQ
>>>
>>> Thanks
>>> Lisa
>>
>>
>>
>> If your fifth field began with a pound sign (#), you could probably
>> do something simple like comment=$0; sub (/.*#/, "", comment).
>
>
> ITYM:
>
> sub (/[^#]*#/, "", comment)
ITYM:
sub (/^[^#]*#/, "", comment)
Thanks, Ed, sometimes I forget to anchor my REs.
Bill Seivert
It'll match the same strings either way, i.e. any sequence of zero or
more non-# characters followed by a #.
> Thanks, Ed, sometimes I forget to anchor my REs.
It doesn't hurt, but you don't actually need to anchor it to the start
of the string in this case.
Ed.
But, Ed, in your previous post you mentioned
"to handle "#"s within comments, e.g. the incredibly useful:
i = 1 # set i to #1
"
which your unanchored RE would remove " set i to #", leaving comment as
" i = 1 #1", I think.
My anchored version should leave comment as " set i to #1".
Though I haven't tried either.
Bill Seivert
's/\(\..\).\(.\)$/\1e\2/'
They both would since the string is analysed left to right so both REs
match from the first non-# character (i.e. the first character at the
start of the string) to the first "#". Look:
$ echo "i = 1 # set i to #1" | awk 'sub (/[^#]*#/,"")'
set i to #1
$ echo "i = 1 # set i to #1" | awk 'sub (/^[^#]*#/,"")'
set i to #1
Maybe the confusion's because within the square brackets the "^" is the
negation symbol whereas outside it's the "start of string" symbol so
maybe you thought I was anchoring the # to something instead of negating it?
Regards,
Ed.