On 2016-01-24, Ed Morton <morto...@gmail.com> wrote:
> On 1/23/2016 11:59 PM, Kaz Kylheku wrote:
>> On 2016-01-24, Hongyi Zhao <hongy...@gmail.com> wrote:
>>> If I want to use a shell variable after the getline's redirect symbol:
>>>
>>> i.e., change the following:
>>>
>>> getline < "./ovpn_file_list.d/all-ovpn-file.list"
>>>
>>> into something like this:
>>>
>>> getline < "var-pointed-to-all-ovpn-file.list"
>>>
>>> How can I achieve this goal?
>>
>> I'm surprised that after all this time you don't know how to pass
>> a piece of data from the shell into Awk. This topic has been rehashed
>> numerous times.
>>
>> The most transparent way is to pass the variable via the environment:
>>
>> $ MY_VARIABLE=whatever awk 'BEGIN { print ENVIRON["MY_VARIABLE"] }'
>> whatever
>>
>> Other than that, there are hacky approaches where you have to be clever,
>> depending on the content of the variable. There is the "intermediately
>> hacky" method using Awk's -v option:
>>
>> $ awk -v foo=whatever 'BEGIN { print foo }'
>> whatever
>>
>> Here, "whatever" is considered Awk syntax; it cannot be arbitrary
>> characters.
>
> That statement is incorrect, "whatever" is just a numeric-string stored in a
> variable and can be arbitrary characters.

Contradicted immediately below:

> literally, though - a backslash followed by a character is treated as the
> expansion of that escape sequence, so "\t" becomes a literal tab char and if you
> want the literal string "\t" you need to escape the backslash "\\t".

\ and t are two arbitrary characters. Their conversion to TAB is due to
Awk syntax. It's not the whole syntax, of course, just some of it,
namely the syntax of a STRING token. You can't put a "pattern action"
phrase into a -v, obviously.

This syntactic processing doesn't happen under environment passing:

$ foo='\t' awk 'BEGIN { print ENVIRON["foo"] }'
\t
$ awk -vfoo='\t' 'BEGIN { print foo }' # prints a single TAB character
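
Since a TAB is hard to see on a terminal, comparing lengths makes the
difference easier to observe. This is only a sketch; env_len and opt_len
are names made up for the illustration:

```shell
# Pass the same two characters (backslash, t) both ways and compare lengths.
env_len=$(foo='\t' awk 'BEGIN { print length(ENVIRON["foo"]) }')   # taken verbatim
opt_len=$(awk -v foo='\t' 'BEGIN { print length(foo) }')           # escape processed
echo "ENVIRON: $env_len, -v: $opt_len"
# prints: ENVIRON: 2, -v: 1
```

The environment copy keeps both characters, while the -v copy has been
collapsed into one TAB by the STRING escape processing.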

POSIX and implementations seem to be at odds with each other here in
some ways. Both mawk and gawk (and gawk --posix) reproduce the quotes
in this example:

$ awk -v foo='"bar"' 'BEGIN { print foo }' # gawk or mawk
"bar"

But POSIX says, very clearly:

  "The characters following the equal sign will be interpreted as if
  they appeared in the awk program preceded and followed by a
  double-quote (") character, as a STRING token (see Grammar), except
  that if the last character is an unescaped backslash, it will be
  interpreted as a literal backslash rather than as the first character
  of the sequence \". The variable will be assigned the value of that
  STRING token."

According to this, we should get an empty string, because "bar" is
put between quotes, resulting in ""bar"", and when a STRING token
is extracted from that, we get an empty one, with the bar"" remaining
untokenized. Or else we should get a syntax error, because the input
didn't correspond to a STRING token in its entirety.

About STRING, the specification says:

  "A string constant will be terminated by the first unescaped occurrence
  of the character '"' after the one that begins the string constant."

Maybe this is undefined behavior? In C, an undefined behavior situation
exists when the token pasting ## operator produces something that isn't
lexically a single, valid preprocessor token. This is vaguely similar:
the material from -v is pasted up into a string literal, but the result
isn't a single string literal token.

By the way, it is also written that "[A] newline character will not occur
within a string constant." That's a very significant way of not
supporting "arbitrary characters". Gawk (with or without --posix) and
Mawk again *do* support a newline rather than diagnosing a bad string
literal syntax, which is better behavior of course, but out of spec:

$ awk -v foo='hey
there' 'BEGIN { print foo }'
hey
there
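
Coming back to the original question, here is a minimal sketch of the
environment approach applied to getline. LIST_FILE is a hypothetical
variable name, and a temporary file stands in for the real list of
ovpn files; adapt both to taste:

```shell
# LIST_FILE is a made-up name; the temp file is a stand-in for the real list.
LIST_FILE=$(mktemp)
printf 'a.ovpn\nb.ovpn\n' > "$LIST_FILE"
export LIST_FILE

count=$(awk 'BEGIN {
    file = ENVIRON["LIST_FILE"]        # file name taken verbatim, no escape processing
    while ((getline line < file) > 0)  # read the file one line at a time
        n++
    close(file)
    print n                            # number of lines read
}')
echo "$count"
rm -f "$LIST_FILE"
```

Because the name travels through ENVIRON, a path containing backslashes
reaches getline unchanged, which is not the case with -v.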