
awk here documents


Ed Morton

Jan 11, 2012, 8:34:42 AM
Someone at work asked me how to use a shell here-document inside an awk script
called from shell to produce some pre-formatted blocks of code (there are multiple
segments that do that inside his code-generator script) and I gave him this:

$ cat file
abc def ghi

$ cat tstAwkHere.sh
shellVar="foo"

awk '{
awkVar="bar"

print "----------------"

printf "%s", "'"$( awk '{printf "%s\\n",$0}' <<!

shell variable: <$shellVar>
shell positional parameter: <$2>

awk variable: <"awkVar">
awk field: <"\$2">

!

)"'"

print "----------------"

}' file

$ tstAwkHere.sh lmn opq rst
----------------

shell variable: <foo>
shell positional parameter: <opq>

awk variable: <bar>
awk field: <def>

----------------
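The trick in the script above is the inner awk. It prints every here-document line followed by a literal backslash-n, with no real newline. That text lands inside the outer awk program's `"..."` string constant, and when awk parses the program it turns each `\n` back into a newline. Below is a minimal sketch of that round trip. It assumes a POSIX shell and any POSIX awk, and `to_awk_literal` and `demo` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper mirroring the inner awk of tstAwkHere.sh: print each
# input line followed by a literal backslash and "n", with no real newline.
to_awk_literal() {
  awk '{ printf "%s\\n", $0 }'
}

# Splice that text into a double-quoted awk string constant; when the outer
# awk parses the program, each \n becomes a real newline again.
demo() {
  awk 'BEGIN { printf "%s", "'"$(printf 'line one\nline two\n' | to_awk_literal)"'" }'
}

printf 'a\nb\n' | to_awk_literal   # prints: a\nb\n (all on one line)
echo
demo                               # prints "line one" and "line two" on separate lines
```

The same mechanism marks where it can break. The template text is parsed as awk source, so a double quote in it ends the awk string early. A backslash in it, such as the `\n` inside a C string literal, is consumed by awk's escape processing. Generated C/C++ containing `"` or `\` therefore would not come through verbatim without extra escaping.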

Any _better_ suggestions? By better I mean clearer and simpler with the
formatted blocks of code still in-line where he needs them in each of his
functions just like you would do if this was a shell script.

Any potential disasters lurking?

Ed.

Janis Papanagnou

Jan 11, 2012, 9:56:11 AM
First I'd start to implement a cleaner parameter interface

awk -v shellVar="${shellVar}" -v posPar="$2" '...'
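Applied to the values in the original script, that `-v` interface might look like the sketch below. The function name `show_vars` is hypothetical, and its arguments stand in for the script's positional parameters.

```shell
#!/bin/sh
# Hypothetical function; its arguments play the role of the script's "$@".
show_vars() {
  shellVar="foo"
  # Shell values are handed to awk as awk variables, so the program text
  # stays inside one pair of single quotes with no quote switching.
  awk -v shellVar="$shellVar" -v posPar="$2" '{
    awkVar = "bar"
    print "shell variable: <" shellVar ">"
    print "shell positional parameter: <" posPar ">"
    print "awk variable: <" awkVar ">"
    print "awk field: <" $2 ">"
  }'
}

printf 'abc def ghi\n' | show_vars lmn opq rst
```

One caveat: awk applies escape-sequence processing to `-v` values, so a shell value containing backslashes arrives altered. Exporting the variable and reading `ENVIRON["shellVar"]` inside awk avoids that.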

then I'd try to avoid that quoting disaster, where the script switches
between shell and awk. (Sorry, no ad-hoc solution; I have to think a bit
about a "better" solution... My preferred way would be just to print
sequences of strings. If you want less syntactic overhead, something
like

function _ (a) { print a }

{
_("whatever here")
_("and there")
_("with posPar "posPar" or awkField "$2)
_("or awkVar "awkVar" or shellVar "shellVar)
_(etc...)
}

which I consider considerably clearer than any solution that mixes awk
commands with shell commands (printf), or awk quotes with shell quotes
("), thus contributing to obscurity.)
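Here is a runnable version of the `_()` pattern, as a sketch. The wrapper name `render` and its single argument are hypothetical, and `posPar` is passed with `-v` as suggested above.

```shell
#!/bin/sh
# Hypothetical wrapper; $1 supplies the value used for posPar.
render() {
  awk -v posPar="$1" '
    # One short call per output line; the name "_" keeps the template readable.
    function _(a) { print a }
    {
      awkVar = "bar"
      _("----------------")
      _("with posPar " posPar " or awkField " $2)
      _("or awkVar " awkVar)
      _("----------------")
    }
  '
}

printf 'abc def ghi\n' | render opq
```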

Janis

Ed Morton

Jan 11, 2012, 10:53:25 AM
I agree that's the way you'd really want to get at the shell variables and
parameters, I was just showing that, for better or worse, the shell variables
and positional parameters ARE accessible in the here document.

> then I'd try to avoid that quoting disaster, where the script switches
> between shell and awk.

I tried, I tried....

> (Sorry, no ad-hoc solution; I have to think a bit
> about a "better" solution... My preferred way would be just to print
> sequences of strings. If you want less syntactic overhead, something
> like
>
> function _ (a) { print a }
>
> {
> _("whatever here")
> _("and there")
> _("with posPar "posPar" or awkField "$2)
> _("or awkVar "awkVar" or shellVar "shellVar)
> _(etc...)
> }
>
> which I consider considerably clearer than any solution that mixes awk
> commands with shell commands (printf), or awk quotes with shell quotes
> ("), thus contributing to obscurity.)

The couple of people in my area who are separately looking for a solution to
this both find that approach unacceptable; they specifically want something that
looks like a here document so they can just copy/paste chunks of code from
existing programs into their scripts and tweak a couple of lines to add the
variable parts. They'd rather forgo the awk benefits to the rest of their tool
and just write the whole thing as a shell script than use the above function()
approach or have explicit prints for every line in awk.

Thanks,

Ed.

Janis Papanagnou

Jan 12, 2012, 9:29:08 AM
Hmm... I have only a vague impression of those "people in my area".

Other possibilities could be using tags or markers and pre-processing
that enhanced awk script

awk '
...awk script code goes here...
HEREDOC+
...heredoc template
inside tags...
HEREDOC-
...awk script code goes here...
'


awk '
...awk script code goes here...
#> ...heredoc template
#> inside tags...
...awk script code goes here...
'
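The second marker style has a useful property: the `#>` lines are ordinary awk comments, so the annotated script is still valid awk before any preprocessing. Below is a minimal preprocessor for it, given as a sketch. It assumes each template line is wrapped in `print "..."` without escaping its quotes, so a `"awkVar"` inside the text closes the string, splices in the variable, and reopens it. `expand_markers` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical preprocessor: "#> text" lines become awk print statements,
# all other lines are copied unchanged. Quotes in the text are NOT escaped,
# so "awkVar" in a template line splices in that awk variable.
expand_markers() {
  awk '
    /^[ \t]*#>/ {
      sub(/^[ \t]*#> ?/, "")
      print "print \"" $0 "\""
      next
    }
    { print }
  '
}

printf 'abc def ghi\n' |
awk -v awkVar="bar" "$(expand_markers <<'EOF'
{
print "----------------"
#> awk variable: <"awkVar">
#> awk field: <"$2">
print "----------------"
}
EOF
)"
```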

But it's hard to know what your folks are willing to accept and what
they refuse. With a strong emphasis on plain heredoc shell syntax I'd
probably stick to shell anyway. Your people do need awk functionality
as well? I'm not that familiar with perl, but perl supports both awk
functionality and shell heredoc functionality; is that an option?

In case that the requirements are:
+ it must be based on awk and shell
+ it may not be shell alone
+ it may not be perl
+ it may not have interpreted pre-processing steps
+ any awk part of the solution must conform to requirements: [unknown]
+ it must be a clear programming pattern

then I'll bite; I think there's no solution. If any of the requirements
can be dropped, though...

BTW, I seem to recall that there was already a thread about heredocs in
awk (in c.l.a or in c.u.s) recently (last year, or so).

Janis

Ed Morton

Jan 12, 2012, 10:05:59 AM
They're just people who need to write a bunch of repetitive C code for every new
feature and would like to generate it instead of write it. They're fairly
familiar with shell and awk but not at all familiar with perl.

> Other possibilities could be using tags or markers and pre-processing
> that enhanced awk script
>
> awk '
> ...awk script code goes here...
> HEREDOC+
> ...heredoc template
> inside tags...
> HEREDOC-
> ...awk script code goes here...
> '
>
>
> awk '
> ...awk script code goes here...
> #> ...heredoc template
> #> inside tags...
> ...awk script code goes here...
> '

Hmm, interesting possibility. I'll give that a try and see how it looks/works.

> But it's hard to know what your folks are willing to accept and what
> they refuse. With a strong emphasis on plain heredoc shell syntax I'd
> probably stick to shell anyway. Your people do need awk functionality
> as well?

I don't think awk has anything anyone _needs_ over shell, it just has some
useful features that make it a bit easier to use for text processing so it'd
save them some time/effort if they could write an awk script containing a here
document rather than a full shell script.

> I'm not that familiar with perl, but perl supports both awk
> functionality and shell heredoc functionality; is that an option?

No, the people I work with don't know perl at all, it's not available on all of
the machines we use and we don't have permission to install it so perl is
definitely out. Plus, most importantly, perl syntax is icky :-).

> In case that the requirements are:
> + it must be based on awk and shell

Right.

> + it may not be shell alone

It could be shell alone. If it's just shell you lose some awk benefits in the
rest of the script, and if it's just awk you lose the here doc benefit of shell.

> + it may not be perl

Right.

> + it may not have interpreted pre-processing steps

That may be possible, depending how it works.

> + any awk part of the solution must conform to requirements: [unknown]

The only requirement is that the here doc part looks like a here doc in as much
as you can copy/paste a chunk of text from some other file into the script
without adding control constructs around every line (start/end quotes, trailing
backslashes, separate function calls, etc.).

> + it must be a clear programming pattern

Right.

> then I'll bite; I think there's no solution. If any of the requirements
> can be dropped, though...
>
> BTW, I seem to recall that there was already a thread about heredocs in
> awk (in c.l.a or in c.u.s) recently (last year, or so).

Yeah, that was probably me too. I had this issue back then as well and was never
really happy with the end result so I thought I'd give it one more try since the
subject just came up again at work.

Thanks for the suggestions, especially that pre-processing one. I'll post what I
come up with for more feedback.

Ed.

Ed Morton

Jan 12, 2012, 12:56:26 PM
Here's my attempt at the pre-processing solution:

$ cat file
abc def ghi

$ cat bin/tstAwkHere.sh
awk -v awkVar="bar" \
"$( awk '
/^[[:space:]]*_HERE{/ { pfx="print \""; sfx="\""; next }
/^[[:space:]]*_HERE}/ { pfx=""; sfx=""; next }
{ print pfx $0 sfx }
' <<"!"
{ # actual script starts here
awkVar="bar"

print "----------------"

_HERE{

awk variable: <"awkVar">
awk field: <"$2">

_HERE}

print "----------------"

} # actual script ends here
!
)" file

$ tstAwkHere.sh lmn opq rst
----------------

awk variable: <bar>
awk field: <def>

----------------
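If the pattern is going to be reused across several generator functions, the preprocessing awk can be wrapped in a shell function once. The sketch below assumes a POSIX shell. `here2awk` and `gen_block` are hypothetical names. The braces are matched with the bracket expressions `[{]` and `[}]` to avoid relying on how a given awk treats a bare `{` in a regular expression.

```shell
#!/bin/sh
# Hypothetical wrapper around the _HERE{ ... _HERE} preprocessor: reads an
# annotated awk program on stdin and prints plain awk on stdout.
here2awk() {
  awk '
    /^[ \t]*_HERE[{]/ { pfx = "print \""; sfx = "\""; next }
    /^[ \t]*_HERE[}]/ { pfx = ""; sfx = ""; next }
    { print pfx $0 sfx }
  '
}

# Hypothetical generator function using it; the quoted delimiter keeps the
# shell from touching $2 or the quotes in the template.
gen_block() {
  awk -v awkVar="bar" "$(here2awk <<'EOF'
{
_HERE{
awk variable: <"awkVar">
awk field: <"$2">
_HERE}
}
EOF
)"
}

printf 'abc def ghi\n' | gen_block
```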

You can no longer access shell variables or positional parameters from within
the here document, and the awk field "$2" no longer needs to be escaped (\$2).
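The difference comes from quoting the here-document delimiter (`<<"!"` rather than `<<!`). When any part of the delimiter is quoted, the shell copies the body verbatim. It performs no parameter expansion, command substitution, or backslash processing on it. A small demonstration follows, assuming a POSIX shell. The `set --` line just fakes the positional parameters.

```shell
#!/bin/sh
set -- lmn opq rst   # stand-in for the script's arguments

# Unquoted delimiter: the shell expands $2 inside the body.
unquoted=$(cat <<!
field: <$2>
!
)

# Quoted delimiter: the body is passed through untouched.
quoted=$(cat <<"!"
field: <$2>
!
)

printf '%s\n' "$unquoted"   # field: <opq>
printf '%s\n' "$quoted"     # field: <$2>
```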

I worry that the main script is now delimited by double quotes instead of single
quotes, i.e.

awk -v awkVar="bar" "<generated script>"
vs
awk -v awkVar="bar" '<generated script>'

I never use double quotes around my awk scripts so I'm not sure what exactly the
differences will be (and I'd rather not find out!).

Any way to get single quotes instead (issue is getting '$(cmd)' to execute cmd)?

Obviously I'm trying to avoid using a temp file for the generated script.

Ed.

Janis Papanagnou

Jan 12, 2012, 1:10:53 PM
On 12.01.2012 18:56, Ed Morton wrote:
>
> I never use double quotes around my awk scripts so I'm not sure what exactly
> the differences will be (and I'd rather not find out!).

The shell itself interprets the double quotes around the awk script. One
consequence is that you have to escape any of awk's double quotes (which
are used as string delimiters); another is that $-expressions are
evaluated by the shell, so field access (for example) will have to be
escaped as well, otherwise the shell will evaluate those.

Janis

Ed Morton

Jan 12, 2012, 1:15:30 PM
That's what I thought but that doesn't actually happen and I don't know why so
now I'm not sure what actually WILL happen and when. Look:

$ awk 'BEGIN{ print "Hello World!" }'
Hello World!

$ awk "BEGIN{ print "Hello World!" }"
bash: !": event not found

$ cat foo.sh
awk \
"$( awk '
/^[[:space:]]*_HERE{/ { pfx="print \""; sfx="\""; next }
/^[[:space:]]*_HERE}/ { pfx=""; sfx=""; next }
{ print pfx $0 sfx }
' <<"!"
BEGIN{ print "Hello World!" }
!
)"
$ ./foo.sh
Hello World!
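The generated script survives the double quotes because the shell does not re-scan the output of a command substitution. Quotes, `$` and `!` in that output are plain characters, and the surrounding double quotes only suppress word splitting and pathname expansion. Only text typed literally between the double quotes is subject to the expansions Janis describes. The sketch below contrasts the two cases. The function names are hypothetical, and it is meant to be run as a script, since `!` only matters in an interactive bash with history expansion enabled.

```shell
#!/bin/sh
# Program text produced by a command; the quoted delimiter keeps it literal.
gen_prog() {
  cat <<'EOF'
BEGIN { print "Hello World!", "$2 is not expanded" }
EOF
}

# Generated text inside "$(...)": passed to awk exactly as gen_prog printed it.
from_subst() { awk "$(gen_prog)"; }

# Similar text typed directly inside double quotes: the inner quotes need
# escaping and the shell expands $2 (empty here) before awk sees it.
typed_direct() { awk "BEGIN { print \"Hello World!\", \"$2 is expanded\" }"; }

from_subst      # Hello World! $2 is not expanded
typed_direct    # Hello World!  is expanded
```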

Regards,

Ed.

Janis Papanagnou

Jan 12, 2012, 4:20:12 PM
On 12.01.2012 19:15, Ed Morton wrote:
> On 1/12/2012 12:10 PM, Janis Papanagnou wrote:
>> On 12.01.2012 18:56, Ed Morton wrote:
>>>
>>> I never use double quotes around my awk scripts so I'm not sure what exactly
>>> the differences will be (and I'd rather not find out!).
>>
>> The shell itself interprets the double quotes around the awk script. One
>> consequence is that you have to escape any of awk's double quotes (which
>> are used as string delimiters); another is that $-expressions are
>> evaluated by the shell, so field access (for example) will have to be
>> escaped as well, otherwise the shell will evaluate those.
>>
>> Janis
>
> That's what I thought but that doesn't actually happen and I don't know why so
> now I'm not sure what actually WILL happen and when. Look:

You're right; some shells evaluate more characters even inside double quotes.
You can already see that bash problem (inherited from csh) in a simpler
context like

$ printf "Hello World!"
bash: !": event not found

Don't use csh, tcsh, or bash, or any other shell that evaluates the /!/
and breaks standard programs. But if you really want to use one of those,
always escape the /!/; though funnily, you'll then get Hello World\!
as the response, so switch to a better shell or work around those bash effects.

>
> $ awk 'BEGIN{ print "Hello World!" }'
> Hello World!
>
> $ awk "BEGIN{ print "Hello World!" }"
> bash: !": event not found

What you otherwise get (e.g. with ksh or zsh)

awk: cmd. line:1: BEGIN{ print Hello
awk: cmd. line:1: ^ unexpected newline or end of string

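Notice in the message that the shell has removed the inner quotes: the
unquoted space after Hello split the text into two arguments, so awk
received only BEGIN{ print Hello as its program.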

What you get if, as suggested, you escape the quotes

$ awk "BEGIN{ print \"Hello World!\" }"
Hello World!


Janis

Ben Bacarisse

Jan 12, 2012, 4:43:24 PM
Janis Papanagnou <janis_pa...@hotmail.com> writes:
<snip>
> Don't use csh, tcsh, or bash, or any other shell that evaluates the /!/
> and breaks standard programs. But if you really want to use one of those,
> always escape the /!/; though funnily, you'll then get Hello World\!
> as the response, so switch to a better shell or work around those bash
> effects.

History expansion in bash is not on by default when running scripts, and
can be turned off in interactive shells with "set +H".

<snip>
--
Ben.

Janis Papanagnou

Jan 12, 2012, 4:53:58 PM
I suspected that you can turn it off, but didn't explore the manual; it's
not my favourite shell, y'know. Thanks for the information, though. :-)
The OP was using it in interactive mode, though, where it seems to be on
by default.

Janis


Kaz Kylheku

Jan 12, 2012, 5:32:25 PM
On 2012-01-12, Ed Morton <morto...@gmail.com> wrote:
>> Hmm.. - I've got just a vague impression of those "people in my area".
>
> They're just people who need to write a bunch of repetitive C code for every new
> feature and would like to generate it instead of write it. They're fairly
> familiar with shell and awk but not at all familiar with perl.

I wonder, what are the odds this code generation could be avoided with some
C++?

Do you have an example?

> No, the people I work with don't know perl at all, it's not available on all of
> the machines we use and we don't have permission to install it so perl is
> definitely out. Plus, most importantly, perl syntax is icky :-).

You don't have permission to install development tools that are used
to generate C code on machines that are used for development and builds?

This is the kind of thing that happens when people come to work to exercise
their egos rather than do what is best for the organization and the customers.

What I would do is look into having the developers re-generate the code and
check it into the repository, so the build machines (where the interpreter to
run the generation scripts must not be installed) do not actually have to run
the generation steps.

I can understand about not being able to install cruft on customer boxes, of
course. Does the C generation have to happen on customer boxes? If you can
compile on customer boxes, you could probably stick perl (or <interpreter of
your choice>) into the build, and have your build build it for itself just
within its little environment, not requiring it to go into /usr/bin.

Ed Morton

Jan 12, 2012, 6:47:05 PM
The main issue is really that the developers don't know and don't want to learn
perl as they can do whatever they want with just the generally available shell
tools. If people wanted perl I expect we could work around all the technical
issues. They're also not looking to rewrite all the legacy code (which actually
is C++ by the way) to make enhancements easier since doing that would introduce
risk of breakage which would mean a big re-testing effort. All they want is a
cheap and cheerful way to use tools they're already familiar with to generate
some of the more monotonous code they currently need to write by hand.

Ed.

Robert Bonomi

Jan 13, 2012, 3:58:15 PM
In article <jek35j$4go$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
>
>Any _better_ suggestions? By better I mean clearer and simpler with the
>formatted blocks of code still in-line where he needs them in each of his
>functions just like you would do if this was a shell script.

Would a script, cat(1)-ing a series of heredocs into a tempfile, e.g.:
TMPFILE=/tmp/uniquename.$$

cat <<-EOF > "$TMPFILE"
awkVar="bar"

print "----------------"

printf "%s", "'"$( awk '{printf "%s\\n",$0}'
EOF
cat <<-EOF >> "$TMPFILE"

shell variable: <$shellVar>
shell positional parameter: <$2>

awk variable: <"awkVar">
awk field: <"\$2">
EOF
cat <<-EOF >> "$TMPFILE"
)"'"
print "----------------"
EOF

And then following up with:
awk -f "$TMPFILE" file
rm "$TMPFILE"


satisfy your users ?
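As posted, the temp-file script still carries the shell-only `'"$(...)"'` splice from the first script into a file that awk reads directly, where it no longer means anything. Below is a sketch of the same idea that does run. It assumes the `mktemp` utility is available (it is on Linux and the BSDs). The function name is hypothetical, and the template lines are turned into print statements the same way as in the `_HERE{` preprocessor.

```shell
#!/bin/sh
# Hypothetical temp-file variant: assemble the awk program from
# here-documents, run it with awk -f, then remove the file.
run_with_tmpfile() {
  tmpfile=$(mktemp) || return 1

  # Plain awk code: quoted delimiter, so the shell leaves it alone.
  cat > "$tmpfile" <<'EOF'
{
awkVar = "bar"
print "----------------"
EOF

  # Template text: each line becomes  print "..."  (quotes are not escaped,
  # so "awkVar" and "$2" splice in awk expressions, as in _HERE{ ... _HERE}).
  awk '{ print "print \"" $0 "\"" }' >> "$tmpfile" <<'EOF'
awk variable: <"awkVar">
awk field: <"$2">
EOF

  cat >> "$tmpfile" <<'EOF'
print "----------------"
}
EOF

  awk -f "$tmpfile" "$@"
  status=$?
  rm -f "$tmpfile"
  return "$status"
}

printf 'abc def ghi\n' | run_with_tmpfile
```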