Extracting part of a record ?

John Fitzsimons

Nov 7, 2009, 10:08:36 PM

Windows Gawk newbie here.

Suppose I start with the following on one line.....

Path: news.sunsite.dk!dotsrc.org!news.net.uni-c.dk!aotearoa.belnet.be!news.belnet.be!newsfeed.kpn.net!pfeed09.wxs.nl!feeder.eternal-september.org!eternal-september.org!not-for-mail

The data content AND width change AND there isn't the normal space
delimiter one often has with a line of data.

(1) How would I extract the second bit of data ? In this case
"dotsrc.org"

(2) How would I extract the last bit of data ? In this case
eternal-september.org OR maybe eternal-september.org!not-for-mail

I think "not-for-mail" is at the end of all news header "paths" but I
am not sure of this. In any case I want the "eternal-september.org"
part whether the "not-for-mail" follows it, or not.

Can anyone here help with either/both queries please ?


Regards, John.

Bob Harris

Nov 7, 2009, 10:37:57 PM

In article <qkccf5pqt3ai6slb3...@4ax.com>,
John Fitzsimons <DELETEu...@sneakemail.com> wrote:

awk -f'!' { print $2 }

Grant

Nov 7, 2009, 11:06:58 PM

awk -f'!' { print $2; print $NF == "not-for-mail" ? $(NF-1) : $NF }

untested

Grant.
--
http://bugsplatter.id.au

pk

Nov 8, 2009, 6:55:43 AM

Grant wrote:

awk -F'!' '{print $2; print $NF == "not-for-mail" ? $(NF-1) : $NF }'

Ed Morton

Nov 8, 2009, 7:28:24 AM

awk -F'!' '{print $2; print ($NF == "not-for-mail" ? $(NF-1) : "") $NF}'

Ed.

John Fitzsimons

Nov 8, 2009, 10:20:53 PM

On Sat, 07 Nov 2009 22:37:57 -0500, Bob Harris
<nospam....@remove.Smith-Harris.us> wrote:

>In article <qkccf5pqt3ai6slb3...@4ax.com>,
> John Fitzsimons <DELETEu...@sneakemail.com> wrote:

>> Windows Gawk newbie here.

< snip >

>> Path: news.sunsite.dk!dotsrc.org!news.net.uni-c.dk!aotearoa.belnet.be!news.belnet.be!newsfeed.kpn.net!pfeed09.wxs.nl!feeder.eternal-september.org!eternal-september.org!not-for-mail

< snip >

>> (1) How would I extract the second bit of data ? In this case
>> "dotsrc.org"

>> (2) How would I extract the last bit of data ? In this case
>> eternal-september.org OR maybe eternal-september.org!not-for-mail

< snip >

>awk -f'!' { print $2 }

Thanks, but I cannot get any of the solutions to work. I am obviously
doing something wrong and/or it isn't a simple "cut and paste"
solution. I get...


gawk.exe -f mon.awk path.txt >suntest.txt

gawk: mon.awk:1: awk -f'!' { print $2 }
gawk: mon.awk:1: ^ invalid char ''' in expression


Regards, John.

Kenny McCormack

Nov 9, 2009, 6:59:13 AM

In article <go1ff5l1dku3sgi56...@4ax.com>,
John Fitzsimons <DELETEu...@sneakemail.com> wrote:
...

>Thanks, but I cannot get any of the solutions to work. I am obviously
>doing something wrong and/or it isn't a simple "cut and paste"
>solution. I get...
>
>
>gawk.exe -f mon.awk path.txt >suntest.txt
>
>gawk: mon.awk:1: awk -f'!' { print $2 }
>gawk: mon.awk:1: ^ invalid char ''' in expression

First of all, I don't think anybody really got what you were really
trying to do - so the responses have been of the "I don't know what you
are really trying to do, but here try this idea that just popped into my
head - it may get you started [towards wherever it is you think you are
trying to get to]" variety. (*)

Second, the example above assumes use of a Unix shell (where the single quotes work
correctly). You seem to be using Windows. The best advice for Windows AWK
users (except when using TAWK where, naturally enough, it all works
correctly) is to forget about doing stuff on the command line and just
put the script in a file.

Third, in the above command line, it needs to be -F (capital F), since
-f means something entirely different.

---------------------------------------------------------------------------
(*) Note: FWIW, IIRC, I *think* what you are looking for is something
like: for (i=1; i<=NF; i++) { split($i,T,"!");print T[1] }
But that's also just a guess.

John Fitzsimons

Nov 10, 2009, 3:17:54 AM

On Sat, 07 Nov 2009 22:37:57 -0500, Bob Harris
<nospam....@remove.Smith-Harris.us> wrote:

>In article <qkccf5pqt3ai6slb3...@4ax.com>,
> John Fitzsimons <DELETEu...@sneakemail.com> wrote:

Hi Bob,

>> Windows Gawk newbie here.

>> Suppose I start with the following on one line.....

>> Path: news.sunsite.dk!dotsrc.org!news.net.uni-c.dk!aotearoa.belnet.be!news.belnet.be!newsfeed.kpn.net!pfeed09.wxs.nl!feeder.eternal-september.org!eternal-september.org!not-for-mail

>> The data content AND width change AND there isn't the normal space
>> delimiter one often has with a line of data.

>> (1) How would I extract the second bit of data ? In this case
>> "dotsrc.org"

>> (2) How would I extract the last bit of data ? In this case
>> eternal-september.org OR maybe eternal-september.org!not-for-mail

>> I think "not-for-mail" is at the end of all news header "paths" but I
>> am not sure of this. In any case I want the "eternal-september.org"
>> part whether the "not-for-mail" follows it, or not.

>> Can anyone here help with either/both queries please ?

>awk -f'!' { print $2 }

Okay, as suggested I updated my gawk.

gawk\bin>gawk.exe --version
GNU Awk 3.1.6
Copyright (C) 1989, 1991-2007 Free Software Foundation.

The error I get is..

C:\gawk\bin>gawk.exe -f bob.awk path.txt >suntest.txt
gawk: bob.awk:1: awk -f'!' { print $2 }
gawk: bob.awk:1: ^ invalid char ''' in expression


Does anyone have any suggestions as to how to get gawk 3.1.6 to work
please ?

Regards, John.

John Fitzsimons

Nov 10, 2009, 3:17:54 AM

On Mon, 9 Nov 2009 11:59:13 +0000 (UTC), gaz...@shell.xmission.com
(Kenny McCormack) wrote:

>In article <go1ff5l1dku3sgi56...@4ax.com>,
>John Fitzsimons <DELETEu...@sneakemail.com> wrote:

< snip >

>First of all, I don't think anybody really got what you were really
>trying to do

Well, nobody said that they couldn't understand my query. I gave an
example of the source data..

Path: news.sunsite.dk!dotsrc.org!news.net.uni-c. etc. etc.

and said...

How would I extract the second bit of data ? In this case
"dotsrc.org"

I doubt that I could make that any clearer.

> - so the responses have been of the "I don't know what you
>are really trying to do, but here try this idea that just popped into my
>head - it may get you started [towards wherever it is you think you are
>trying to get to]" variety. (*)

>Second, the example above assumes use of a Unix shell (where the single quotes work
>correctly). You seem to be using Windows.

Seem to be ? ! The first line of my post said..

Windows Gawk newbie here.

>The best advice for Windows AWK
>users (except when using TAWK where, naturally enough, it all works
>correctly) is to forget about doing stuff on the command line and just
>put the script in a file.

Have never heard of TAWK. I did put the script in a file.

>Third, in the above command line, it needs to be -F (capital F), since
>-f means something entirely different.

>---------------------------------------------------------------------------
>(*) Note: FWIW, IIRC, I *think* what you are looking for is something
>like: for (i=1; i<=NF; i++) { split($i,T,"!");print T[1] }
>But that's also just a guess.

Okay, here is the result of putting Ed's version in a script file..


\gawk\bin>gawk.exe -F ed.awk path.txt >suntest.txt
gawk: path.txt
gawk: ^ syntax error


The suntest file had the following entry.

errcount: 1

Thank you for your feedback.


Regards, John.

Pierre Gaston

Nov 10, 2009, 4:18:22 AM

On 2009-11-10, John Fitzsimons <DELETEu...@sneakemail.com> wrote:
>
> \gawk\bin>gawk.exe -F ed.awk path.txt >suntest.txt
> gawk: path.txt
> gawk: ^ syntax error

You are confused. You need to distinguish the invocation of awk
from the awk code:

gawk.exe -F'!' '{print $2}' path.txt

This line invokes awk and passes several arguments to it; the main script
is '{print $2}'. I think this might fail for you because it is using
Unix-style quotes. So it might be useful to put the script
in a file foo.awk:

BEGIN{
   FS="!" # equivalent to the -F'!'
}
{
   print $2
}

Notice that gawk and -F are not included in the foo.awk file,
it contains only awk code. The command line shortcut has also
been transformed into awk code.

Now you can run this script like this:

gawk.exe -f foo.awk path.txt
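
The steps above can be sketched end to end as follows. This is a minimal sketch that assumes a POSIX shell (for example Cygwin on Windows) and a plain `awk` on the PATH; the file names foo.awk and path.txt come from the thread, and the sample Path line is a shortened version of the one in the original question. On a plain Windows command prompt the two files would be created in an editor instead, and only the last command applies.

```shell
# Sample input: a shortened copy of the Path header from the question, on one line.
printf '%s\n' 'Path: news.sunsite.dk!dotsrc.org!news.net.uni-c.dk!not-for-mail' > path.txt

# foo.awk holds awk code only: no "awk", no -F, no shell quoting.
cat > foo.awk <<'EOF'
BEGIN { FS = "!" }   # split each line on "!" instead of whitespace
{ print $2 }         # print the second "!"-separated field
EOF

# -f (lowercase) names the program file; -F (uppercase) would set the field separator.
awk -f foo.awk path.txt
```

With the sample line this prints dotsrc.org.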

--
pgas @ SDF Public Access UNIX System - http://sdf.lonestar.org

Kenny McCormack

Nov 10, 2009, 7:43:44 AM

In article <jq2if5te9adqiq7rl...@4ax.com>,
John Fitzsimons <DELETEu...@sneakemail.com> wrote:
...

>Okay, as suggested I updated my gawk.
>
>gawk\bin>gawk.exe --version
>GNU Awk 3.1.6
>Copyright (C) 1989, 1991-2007 Free Software Foundation.
>
>The error I get is..
>
>C:\gawk\bin>gawk.exe -f bob.awk path.txt >suntest.txt
>gawk: bob.awk:1: awk -f'!' { print $2 }
>gawk: bob.awk:1: ^ invalid char ''' in expression
>
>
>Does anyone have any suggestions as to how to get gawk 3.1.6 to work
>please ?

Oh, maybe use it correctly. These computers are so stuffy, ya know.
Gotta do things always *their* way. Talk about "my way or the highway!"...

Maybe read a manual. I know. Such a pain...

Why should *you* have to do all the work?
Can't it at least meet you half way?

Ed Morton

Nov 10, 2009, 8:17:11 AM

John Fitzsimons wrote:
<snip>

> Okay, here is the result of putting Ed's version in a script file..
>
>
> \gawk\bin>gawk.exe -F ed.awk path.txt >suntest.txt
> gawk: path.txt
> gawk: ^ syntax error

That should be -f, not -F. Try this:

gawk.exe -f ed.awk path.txt >suntest.txt

Ed.

Janis Papanagnou

Nov 10, 2009, 1:31:30 PM

You have an awk call in an awk program? That doesn't work.

I think this has already been mentioned, but it is probably buried
somewhere in this tapeworm of a thread:

awk -f ed.awk path.txt >suntest.txt

where ed.awk contains *only* the _awk code_, which was just

BEGIN {FS="!"}
{print $2; print ($NF == "not-for-mail" ? $(NF-1) : "") $NF}

(You don't need the BEGIN clause if you add the option -F "!"
on the command line. But better try it, as depicted, using BEGIN.)
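
One point worth checking for the second query: when the path ends in not-for-mail, the print expression above concatenates $(NF-1) and $NF with nothing between them. If only the last real host is wanted, a variant along the lines of pk's one-liner can be sketched like this. It assumes a POSIX shell, and `last_host` is a made-up name used only for illustration.

```shell
# last_host: print the last "!"-separated field of each line,
# stepping back one field when the line ends in "not-for-mail".
last_host() {
    awk -F'!' '{ print ($NF == "not-for-mail" ? $(NF-1) : $NF) }' "$@"
}

# Both sample lines yield the same host.
printf '%s\n' 'Path: news.sunsite.dk!dotsrc.org!eternal-september.org!not-for-mail' | last_host
printf '%s\n' 'Path: news.sunsite.dk!dotsrc.org!eternal-september.org' | last_host
```

In a script file the same rule is just the part between the single quotes, with FS set in a BEGIN block as shown above.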

Janis

Anton Treuenfels

Nov 10, 2009, 8:29:11 PM

"John Fitzsimons" <DELETEu...@sneakemail.com> wrote in message
news:jq2if5te9adqiq7rl...@4ax.com...

> Does anyone have any suggestions as to how to get gawk 3.1.6 to work
> please ?
>

I haven't used it but I suspect it works just fine. What the suggestions so
far have in common is changing the field separator from the default
'run-of-whitespace' to '!' (since that is what separates the fields of
interest to you). No doubt how to do this is adequately explained in the
documentation (typically involving the special variable 'FS'). Then to get
the second field you just use the normal '$2'.

That won't directly solve your second problem, since sometimes you seem to
want the last field and sometimes the last two. But at least you'll be able
to find them easily enough while you decide.

If you want to leave the default field separator alone and do it the hard
way instead, take a look at the index(), match() and substr() functions.
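
A rough sketch of that harder route, for comparison. It assumes a POSIX shell and a POSIX awk; `second_and_last` is a hypothetical name. Note that awk has no built-in rindex(), so match() with an end-anchored pattern stands in for a search from the right.

```shell
# second_and_last: answer both queries with string functions only, leaving FS alone.
second_and_last() {
    awk '{
        rest = substr($0, index($0, "!") + 1)   # everything after the first "!"
        bang = index(rest, "!")                  # position of the next "!", 0 if none
        print (bang ? substr(rest, 1, bang - 1) : rest)

        match($0, /[^!]*$/)                      # run of non-"!" characters at end of line
        print substr($0, RSTART)                 # RSTART is set by match()
    }' "$@"
}

printf '%s\n' 'Path: news.sunsite.dk!dotsrc.org!eternal-september.org' | second_and_last
```

This prints dotsrc.org and then eternal-september.org. A line with no "!" at all falls through and is printed whole, which may or may not be what is wanted.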

- Anton Treuenfels

John Fitzsimons

Nov 11, 2009, 12:29:25 AM

On Tue, 10 Nov 2009 09:18:22 +0000 (UTC), Pierre Gaston
<pg...@vinland.freeshell.org> wrote:

>On 2009-11-10, John Fitzsimons <DELETEu...@sneakemail.com> wrote:

>> \gawk\bin>gawk.exe -F ed.awk path.txt >suntest.txt
>> gawk: path.txt
>> gawk: ^ syntax error

>You are confused.

Absolutely. :-)

>You need to distinguish the invocation of awk
>from the awk code:

>gawk.exe -F'!' '{print $2}' path.txt

>This line invokes awk and passes several arguments to it; the main script
>is '{print $2}'. I think this might fail for you because it is using
>Unix-style quotes.

Okay.

>So it might be useful to put the script
>in a file foo.awk:

Done.

>BEGIN{
> FS="!" # equivalent to the -F'!'
>}
>{
> print $2
>}

>Notice that gawk and -F are not included in the foo.awk file,
>it contains only awk code. The command line shortcut has also
>been transformed into awk code.

>Now you can run this script like this:

>gawk.exe -f foo.awk path.txt

Thanks Pierre. That is a big step forward. I now no longer get
compile errors ! :-)

I also now get the output of my first query exactly as
wanted/expected. Thank you. Very much appreciated. :-)


Regards, John.

John Fitzsimons

Nov 11, 2009, 12:29:25 AM

On Tue, 10 Nov 2009 12:43:44 +0000 (UTC), gaz...@shell.xmission.com
(Kenny McCormack) wrote:

>In article <jq2if5te9adqiq7rl...@4ax.com>,
>John Fitzsimons <DELETEu...@sneakemail.com> wrote:

< snip >

>Maybe read a manual. I know. Such a pain...

< snip >

Actually, as someone who has done computer application tutoring I see
a few problems with that sort of comment. Last time I checked the .awk
manual was over 300 html pages. So..

(1) Newbies might not have the time to fully read 300+ pages.

(2) Even if they did it wouldn't mean that they understood what they
read.

(3) If they instead did a "search" then that would only work if they
were sure of the correct terms to be looking for.

(4) Someone used to coding can sometimes see an error in a couple of
seconds that a newbie might take days to find/work out. Assuming
he/she even got to that stage.

(5) Sometimes one only needs to understand a very small percentage
of a program to produce output that is worthwhile/significant to the
user.

HTH. :-)

Regards, John.

John Fitzsimons

Nov 11, 2009, 12:29:25 AM

On Tue, 10 Nov 2009 07:17:11 -0600, Ed Morton <morto...@gmail.com>
wrote:

>John Fitzsimons wrote:
><snip>

Now that I have made the changes suggested by Janis, and others,
it seems to work fine. Thank you. :-)

Regards, John.

John Fitzsimons

Nov 11, 2009, 12:29:25 AM

On Tue, 10 Nov 2009 19:31:30 +0100, Janis Papanagnou
<janis_pa...@hotmail.com> wrote:

>John Fitzsimons wrote:

< snip >

Hi Janis,

>> The error I get is..

>> C:\gawk\bin>gawk.exe -f bob.awk path.txt >suntest.txt
>> gawk: bob.awk:1: awk -f'!' { print $2 }
>> gawk: bob.awk:1: ^ invalid char ''' in expression

>You have an awk call in an awk program? That doesn't work.

>I think this has already been mentioned, but it is probably buried
>somewhere in this tapeworm of a thread:

> awk -f ed.awk path.txt >suntest.txt

>where ed.awk contains *only* the _awk code_, which was just

> BEGIN {FS="!"}
> {print $2; print ($NF == "not-for-mail" ? $(NF-1) : "") $NF}

>(You don't need the BEGIN clause if you add the option -F "!"
>on the command line. But better try it, as depicted, using BEGIN.)

Thank you for not only finding the errors, but also for explaining why
certain things didn't work correctly. Very much appreciated. :-)


Regards, John.

w_a_x_man

Nov 11, 2009, 10:32:46 AM

On Nov 10, 11:29 pm, John Fitzsimons <DELETEucwubq...@sneakemail.com>
wrote:
> On Tue, 10 Nov 2009 12:43:44 +0000 (UTC), gaze...@shell.xmission.com
> (Kenny McCormack) wrote:
> >In article <jq2if5te9adqiq7rlb4h01quiv4ejrv...@4ax.com>,
> >John Fitzsimons <DELETEucwubq...@sneakemail.com> wrote:
>
> < snip >
>
> >Maybe read a manual.  I know.  Such a pain...
>
> < snip >
>
> Actually, as someone who has done computer application tutoring I see
> a few problems with that sort of comment. Last time I checked the .awk
> manual was over 300 html pages. So..
>
> (1) Newbies might not have the time to fully read 300+ pages.

You know nothing about using manuals. One doesn't have to read
the entire manual; he simply skims the first part in order
to find out how to run an awk program.

>
> (2) Even if they did it wouldn't mean that they understood what they
> read.
>
> (3) If they instead did a "search" then that would only work if they
> were sure of the correct terms to be looking for.
>
> (4) Someone used to coding can sometimes see an error in a couple of
> seconds that a newbie might take days to find/work out. Assuming
> he/she even got to that stage.

What's a "he/she"? I never encountered that term in the
works of the great authors.

>
> (5) Sometimes one only needs to understand a very small percentage
> of a program to produce output that is worthwhile/significant to the
> user.
>
> HTH.  :-)

Are you blissfully unaware that you are the one desperately
in need of help?

>
> Regards, John.

--
A woman's place is in the home. --- Wise old saying.
