multi-line + sub-regex

mathieu....@gmail.com

Oct 30, 2012, 10:02:14 AM
Hi there,

I am trying to use awk to parse multi-line records. A single record looks like this:

_begin bla
_attrib0 123
_attrib1 456
_attrib1 789
_attrib2 foo
_end
...

I need to extract the values associated with _begin and _attrib1. So for the example above, the awk script should print (one record per line):

bla 456 789

Thanks for comments !

pop

Oct 30, 2012, 10:25:18 AM
mathieu....@gmail.com said the following on 10/30/2012 9:02 AM:
simple way:

/_begin/{ lin=$2 }
/_attrib1/{ lin=lin" "$2 }
/_end/{ print lin; lin="" }

pop is Mark

Kenny McCormack

Oct 30, 2012, 10:33:22 AM
In article <59217ed3-0e6e-4920...@googlegroups.com>,
Here's a way to do it that involves manipulating the "internal variables" -
a technique that I usually argue against (i.e., think should be used with
caution) - but I think you will find it interesting. Note to other group
geeks: This is not necessarily the easiest (or least byte count) way to do
this task, but, as I said, I find this approach interesting.

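# Yes, the output will have a trailing blank...
# A multi-character RS needs gawk or mawk; POSIX leaves it unspecified.
BEGIN { ORS = " "; RS = "_end\n"; FS = "\n| " }
{
    # Fields alternate: tag, value, tag, value, ...
    for (i = 1; i <= NF; i += 2)
        if ($i == "_begin" || $i == "_attrib1") print $(i+1)
    printf "\n"
}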

--
But the Bush apologists hope that you won't remember all that. And they
also have a theory, which I've been hearing more and more - namely,
that President Obama, though not yet in office or even elected, caused the
2008 slump. You see, people were worried in advance about his future
policies, and that's what caused the economy to tank. Seriously.

(Paul Krugman - Addicted to Bush)

mathieu....@gmail.com

Oct 30, 2012, 12:07:56 PM
On Tuesday, October 30, 2012 3:33:22 PM UTC+1, Kenny McCormack wrote:
> In article <59217ed3-0e6e-4920...@googlegroups.com>,
This seems to work somewhat when the input contains spaces, but it fails when the input contains tab characters. My input file is something like

_begin hello world !
_attrib0 123
_attrib1 super duper
_attrib1 yet another value
_attrib2 foo
_end

So space is not a separator in my case, only tab.

Thanks

Kenny McCormack

Oct 30, 2012, 1:01:23 PM
In article <76402c5e-3b4c-4cbf...@googlegroups.com>,
You're a smart boy. You'll figure it out.
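
For anyone reading along, here is a minimal sketch of one way to handle the tab-separated case. It assumes each line is a tag, a single tab, then the value (which may contain spaces), and that a record ends with a bare _end line. The function name extract_records is made up for this example. It follows the same accumulate-and-print idea as the "simple way" earlier in the thread, only with the field separator set to a tab and the tags compared exactly instead of by regex:

```shell
# Hypothetical helper: reads records on stdin, prints one line per record.
# Assumption: tag and value are separated by exactly one tab.
extract_records() {
  awk -F '\t' '
    $1 == "_begin"   { line = $2; next }           # start a new record
    $1 == "_attrib1" { line = line " " $2; next }  # append each _attrib1 value
    $1 == "_end"     { print line; line = "" }     # emit the record and reset
  '
}

# Sample input modelled on the record quoted above, with real tabs.
printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n' |
  extract_records
```

Since the values themselves contain spaces, joining them with a space makes the output ambiguous; joining with "\t" instead of " " may be the better choice. The RS-based version above can probably be adapted the same way by setting FS to "\n|\t", though that still relies on a multi-character RS.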

--
The motto of the GOP "base": You can't be a billionaire, but at least you
can vote like one.