Message from discussion multi-line + sub-regex

Received: by 10.224.193.72 with SMTP id dt8mr17460634qab.7.1351613276992;
        Tue, 30 Oct 2012 09:07:56 -0700 (PDT)
Received: by 10.52.65.33 with SMTP id u1mr6184779vds.18.1351613276966; Tue, 30
 Oct 2012 09:07:56 -0700 (PDT)
Path: gf5ni6293802qab.0!nntp.google.com!e17no14138291qar.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.awk
Date: Tue, 30 Oct 2012 09:07:56 -0700 (PDT)
In-Reply-To: <k6oofi$7va$1@news.xmission.com>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=2001:660:5001:142:ea39:35ff:fe46:2882;
 posting-account=5syELgoAAABMLWsjbxhk8Wo7CLxGgTPG
NNTP-Posting-Host: 2001:660:5001:142:ea39:35ff:fe46:2882
References: <59217ed3-0e6e-4920-b071-5972567b3104@googlegroups.com> <k6oofi$7va$1@news.xmission.com>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <76402c5e-3b4c-4cbf-9af1-1e3f06608c27@googlegroups.com>
Subject: Re: multi-line + sub-regex
From: mathieu.malate...@gmail.com
Injection-Date: Tue, 30 Oct 2012 16:07:56 +0000
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, October 30, 2012 3:33:22 PM UTC+1, Kenny McCormack wrote:
> In article <59217ed3-0e6e-4920-b071-5972567b3104@googlegroups.com>,
>  <mathieu.malaterre> wrote:
> >Hi there,
> >
> >  I am trying to use awk to parse a multiline expression. A single one
> >of them looks like this:
> >
> >_begin bla
> >_attrib0 123
> >_attrib1 456
> >_attrib1 789
> >_attrib2 foo
> >_end
> >...
> >
> >I need to extract the value associated to _begin and _attrib1. So in the
> >example, the awk script should return (one per line):
> >
> >bla 456 789
> >
> >Thanks for comments !
>
> Here's a way to do it that involves manipulating the "internal variables" -
> a technique that I usually argue against (i.e., think should be used with
> caution) - but I think you will find it interesting.  Note to other group
> geeks: This is not necessarily the easiest (or least byte count) way to do
> this task, but, as I said, I find this approach interesting.
>
> # Yes, the output will have a trailing blank...
> BEGIN {ORS=" ";RS="_end\n";FS="\n| "}
> {for (i=1; i<=NF; i+=2)
>     if ($i == "_begin" || $i == "_attrib1") print $(i+1)
> printf "\n"}

This seems to work somewhat when input contains spaces, but fails when input contains tab characters. My input file is something like this:

_begin	hello world !
_attrib0	123
_attrib1	super duper
_attrib1	yet another value
_attrib2	foo
_end

So in my case the only separator is the tab; spaces are part of the values.
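
For what it's worth, here is a minimal sketch of a line-oriented alternative. It is not the RS-based technique quoted above: it reads one line at a time and splits on tab only, which avoids relying on a multi-character RS. The function name `extract_values` is hypothetical, and joining the output values with a tab (so that values containing spaces stay distinguishable) is my own assumption about the desired output.

```shell
# Hypothetical helper: reads "_tag<TAB>value" lines on stdin and prints
# one line per _begin ... _end record.
extract_values() {
    awk '
        BEGIN { FS = "\t" }                           # split on tab only, so spaces stay inside values
        $1 == "_begin"   { out = $2; next }           # start a new output line with the _begin value
        $1 == "_attrib1" { out = out "\t" $2; next }  # append each _attrib1 value, tab-separated
        $1 == "_end"     { print out; out = "" }      # emit the collected line and reset
    '
}

# Sample record from this post, with tabs written as \t.
printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n' | extract_values
```

The awk program between the single quotes could equally be saved to a file and run with `awk -f` on the real input.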

Thanks