GAWK Issue/question/curiosity #2: (can a function return an array?)

Kenny McCormack

Mar 26, 2015, 10:06:49 AM
Can I have a user-defined function that returns an array?
Like: function foo() { A[1];return A }

As far as I can tell, the answer is "no"; whenever I try I get:

gawk4: cmd. line:1: fatal: attempt to use array `A' in a scalar context

I'm aware that the usual workaround is to pass an array in as a parameter
and have the function modify the array. But that's not really as
convenient as if the function could just return the array.

Incidentally, this works in TAWK:

C:>awkw 'function foo() { local A;A[1];return A } BEGIN {A[2];A=foo();for (i in A) print i}'
1
C:>

Note that if you leave out the "local A", then the final print will print
both 1 and 2.
--
"Remember when teachers, public employees, Planned Parenthood, NPR and PBS
crashed the stock market, wiped out half of our 401Ks, took trillions in
TARP money, spilled oil in the Gulf of Mexico, gave themselves billions in
bonuses, and paid no taxes? Yeah, me neither."

Kenny McCormack

Apr 7, 2015, 7:36:21 AM
In article <mf13po$4se$3...@news.xmission.com>,
Kenny McCormack <gaz...@shell.xmission.com> wrote:
>Can I have a user-defined function that returns an array?
>Like: function foo() { A[1];return A }
>
>As far as I can tell, the answer is "no"; whenever I try I get:
>
> gawk4: cmd. line:1: fatal: attempt to use array `A' in a scalar context
>
>I'm aware that the usual workaround is to pass an array in as a parameter
>and have the function modify the array. But that's not really as
>convenient as if the function could just return the array.
>
>Incidentally, this works in TAWK:
>
>C:>awkw 'function foo() { local A;A[1];return A } BEGIN {A[2];A=foo();for (i in
>A) print i}'
>1
>C:>
>
>Note that if you leave out the "local A", then the final print will print
>both 1 and 2.

Bump!!!

--
Those on the right constantly remind us that America is not a
democracy; now they claim that Obama is a threat to democracy.

Andrew Schorr

Apr 7, 2015, 8:45:47 PM
On Tuesday, April 7, 2015 at 7:36:21 AM UTC-4, Kenny McCormack wrote:
> Kenny McCormack wrote:
> >Can I have a user-defined function that returns an array?
> >Like: function foo() { A[1];return A }
> >
> >As far as I can tell, the answer is "no"; whenever I try I get:
> >
> > gawk4: cmd. line:1: fatal: attempt to use array `A' in a scalar context
> >
> Bump!!!

I think you answered your own question. What more is there to say?

Regards,
Andy

Kenny McCormack

Apr 7, 2015, 9:17:42 PM
In article <4ebb8860-5c88-46de...@googlegroups.com>,
OK, thanks. The point of many of these posts of mine is to get
confirmation from the experts. That's you.

So, you can't return an array from a function in GAWK, right?

Andrew Schorr

Apr 8, 2015, 7:31:15 PM
On Tuesday, April 7, 2015 at 9:17:42 PM UTC-4, Kenny McCormack wrote:
> OK, thanks. The point of many of these posts of mine is to get
> confirmation from the experts. That's you.
>
> So, you can't return an array from a function in GAWK, right?

Correct. That is not currently possible.

Using the latest gawk master branch:

bash-4.2$ cat -n /tmp/test.awk
     1  function foo() {
     2     a[1] = 2
     3     a[2] = 3
     4     return a
     5  }
     6
     7  BEGIN {
     8     x = foo()
     9     for (i in x)
    10        print i, x[i]
    11  }
bash-4.2$ ./gawk -f /tmp/test.awk
gawk: /tmp/test.awk:4: fatal: attempt to use array `a' in a scalar context

This works:
bash-4.2$ cat -n /tmp/test.awk
     1  function foo(a) {
     2     a[1] = 2
     3     a[2] = 3
     4  }
     5
     6  BEGIN {
     7     foo(x)
     8     for (i in x)
     9        print i, x[i]
    10  }
bash-4.2$ ./gawk -f /tmp/test.awk
1 2
2 3

Regards,
Andy


Kenny McCormack

Apr 8, 2015, 9:16:36 PM
In article <90639ac6-f8ad-4b25...@googlegroups.com>,
Yup. Thanks again.

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...

Joep van Delft

Apr 9, 2015, 4:43:07 AM
Thanks for these examples. This has tripped me up as well. But
contrast the second example with this:

% gawk '
function foo(a) {
    a = "aaa"
}
BEGIN {
    foo(x)
    print x
}'

It prints nothing (also not with "return a" added to the function).

There is an asymmetry here, which I did not find explicitly
documented in the gawk manual. One can construct hints in this
direction from sections "9.2.3.2 Controlling Variable Scope" and
"9.2.3.3 Passing Function Arguments By Value Or By Reference" and
"9.2.4 The return Statement".

What I take from these paragraphs (without being familiar with the
gawk source), is that:

* Function output: "return" returns an exit _value_; and cannot be
  misappropriated to return another type (array or function)
* Function input: Arrays are passed by reference, as opposed to
  scalars. Therefore, the scalar scoping rules do not apply. This
  _can_ be misappropriated to `return' arrays.
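
A minimal sketch of that asymmetry, assuming any POSIX-style awk (plain
`awk' is used here; gawk should behave the same way). The function
names set_scalar and set_array are made up for illustration:

```shell
# Scalar parameters are copies; array parameters refer to the array in the caller.
out=$(awk '
function set_scalar(s) { s = "aaa" }      # assigns to a local copy only
function set_array(a)  { a[1] = "aaa" }   # assigns into the array passed by the caller
BEGIN {
    set_scalar(x)
    set_array(y)
    printf "x=[%s] y[1]=[%s]\n", x, y[1]
}')
printf '%s\n' "$out"
```

The scalar x stays empty after the call, while y[1] holds the value
stored inside the function.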

It would be good to have this documented somehow. I remember
searching for the string "return array" multiple times in the
documentation.

Kind regards,

Joep




Janis Papanagnou

Apr 9, 2015, 5:43:09 AM
On 09.04.2015 10:43, Joep van Delft wrote:
>>[...]
>
> Thanks for these examples. This has tripped me up as well. But
> contrast the second example with this:
>
> % gawk '
> function foo(a) {
>     a = "aaa"
> }
> BEGIN {
>     foo(x)
>     print x
> }'
>
> It prints nothing (also not with "return a" added to the function).

A 'return a' in the above example makes no sense if you do not access the
return value where the function is called as 'y=foo(x)' or even as
'x=foo(x)'. This is a normal function call.
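
A small sketch of the difference, assuming a POSIX-style awk; it is the
same foo() as above with the 'return a' added:

```shell
# The return value only matters if the caller actually uses it.
out=$(awk '
function foo(a) { a = "aaa"; return a }   # a is a local copy; return hands its value back
BEGIN {
    foo(x)               # return value discarded, x is still empty
    printf "[%s]\n", x
    x = foo(x)           # return value assigned, x is now "aaa"
    printf "[%s]\n", x
}')
printf '%s\n' "$out"
```

The first line printed is empty brackets, the second shows the
assigned value.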

>
> There is an asymmetry here, which I did not find explicitly
> documented in the gawk manual. One can construct hints in this
> direction from sections "9.2.3.2 Controlling Variable Scope" and
> "9.2.3.3 Passing Function Arguments By Value Or By Reference" and
> "9.2.4 The return Statement".

Chapter 9.2.3.3 is very clear in the first two sentences already.

>
> What I take from these paragraphs (without being familiar with the
> gawk source), is that:
>
> * Function output: "return" returns an exit _value_; and cannot be
> misappropriated to return another type (array or function)
> * Function input: Arrays are passed by reference, as opposed to
> scalars. Therefore, the scalar scoping rules do not apply. This
> _can_ be misappropriated to `return' arrays.

It's (IMO) quite obvious; you have a pass "by value" for simple types
and pass "by reference" for arrays. (This is, BTW, not untypical how
parameter passing works in other programming languages as well.)

>
> It would be good to have this documented somehow. I remember
> searching for the string "return array" multiple times in the
> documentation.

Personally I think it's pretty clear already.

Janis

>
> Kind regards,
>
> Joep
>
>
>
>

Andrew Schorr

Apr 9, 2015, 10:02:38 AM
On Thursday, April 9, 2015 at 4:43:07 AM UTC-4, Joep van Delft wrote:
> There is an asymmetry here, which I did not find explicitly
> documented in the gawk manual. One can construct hints in this
> direction from sections "9.2.3.2 Controlling Variable Scope" and
> "9.2.3.3 Passing Function Arguments By Value Or By Reference" and
> "9.2.4 The return Statement".

I think it is explicitly documented. In 9.2.3.3, it says:

In `awk', when you declare a function, there is no way to declare
explicitly whether the arguments are passed "by value" or "by
reference".

Instead, the passing convention is determined at runtime when the
function is called, according to the following rule: if the argument is
an array variable, then it is passed by reference. Otherwise, the
argument is passed by value.

That seems pretty clear to me. It is also in the gawk man page:

Functions are executed when they are called from within expressions in
either patterns or actions. Actual parameters supplied in the function
call are used to instantiate the formal parameters declared in the
function. Arrays are passed by reference, other variables are passed
by value.

It's also stated in the POSIX awk specification here:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

Function parameters, if present, can be either scalars or arrays; the behavior is undefined if an array name is passed as a parameter that the function uses as a scalar, or if a scalar expression is passed as a parameter that the function uses as an array. Function parameters shall be passed by value if scalar and by reference if array name.
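
One consequence of that rule that is easy to miss: a single array
element is a scalar expression, so it is passed by value even though
the array it lives in would be passed by reference. A rough sketch,
assuming a POSIX-style awk; bump and bump_first are illustrative names
only:

```shell
# An array element is a scalar, so it is copied; the whole array is not.
out=$(awk '
function bump(v)       { v++ }       # v is a scalar copy
function bump_first(a) { a[1]++ }    # a refers to the array in the caller
BEGIN {
    n[1] = 10
    bump(n[1])         # passes the value 10, not the element itself
    printf "%d\n", n[1]
    bump_first(n)      # passes the array by reference
    printf "%d\n", n[1]
}')
printf '%s\n' "$out"
```

This prints 10 and then 11: only the call that received the array name
changed the element.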

Regards,
Andy

Joep van Delft

Apr 9, 2015, 7:29:36 PM
I did not formulate adequately what I have been trying to say, as
both Janis' and Andrew's replies indicate.

The question is: How to make a function "productive" on an array
(possibly complicated by the repeated advice not to abuse global
variables). I do not believe that this question is explicitly
answered in the documentation. This is one of the few places where my
lack of formal CS education has hindered me locating the conclusive
hint from the documentation.

Rereading, the most concrete clue is:

[W]hen arrays are the parameters to functions, they are not
copied. Instead, the array itself is made available for
direct manipulation by the function. This is usually termed
call by reference. Changes made to an array parameter inside
the body of a function are visible outside that function.

NOTE: Changing an array parameter inside a function can be
very dangerous if you do not watch what you are doing. (...)

What I take from this: *DANGER ZONE*. It would have saved me a trip
to $SEARCH_ENGINE and some experimentation if the official,
recommended way to `return an array' would be explicit (-ly
referenced) here.


Kind regards,


Joep

Janis Papanagnou

Apr 10, 2015, 5:25:24 AM
On 10.04.2015 01:29, Joep van Delft wrote:
> [...]
>
> The question is: How to make a function "productive" on an array
> (possibly complicated by the repeated advice not to abuse global
> variables). I do not believe that this question is explicitly
> answered in the documentation. This is one of the few places where my
> lack of formal CS education has hindered me locating the conclusive
> hint from the documentation.
>
> Rereading, the most concrete clue is:
>
> [W]hen arrays are the parameters to functions, they are not
> copied. Instead, the array itself is made available for
> direct manipulation by the function. This is usually termed
> call by reference. Changes made to an array parameter inside
> the body of a function are visible outside that function.
>
> NOTE: Changing an array parameter inside a function can be
> very dangerous if you do not watch what you are doing. (...)
>
> What I take from this: *DANGER ZONE*.

There's a "concept" in CS and programming that's called "side effects";
and the general wisdom is to avoid "side effects" in programming.

The point is, though, that awk is full of side effects built into the
language _by design_ (global access to $0, $1, etc., global variables
like FS, RSTART, etc.)

The warning that you found in the documentation (as I suspect) is to
make folks explicitly aware that there are call-by-reference semantics
for arrays, and changing an array will affect the respective array in
the calling context. No more and no less; really nothing to get excited
about. :-)

A "safe" way in that respect is using any of the Functional Languages.
But prevalent imperative languages often allow such Side Effects. You
should certainly know what you're doing.

In awk, since it's a (UI wise) terse language, it may not be obvious
that simple types and arrays are passed with different concepts to a
function; there's no specific syntactic construct (like C++'s '&', or
C's '*', or Pascal's 'var'). Therefore an explicit hint, or warning,
is not bad to be present in the documentation.

> It would have saved me a trip
> to $SEARCH_ENGINE and some experimentation if the official,
> recommended way to `return an array' would be explicit (-ly
> referenced) here.

(I don't understand what you are saying in that sentence.)

Janis

>
>
> Kind regards,
>
>
> Joep
>

Janis Papanagnou

Apr 10, 2015, 5:48:11 AM
On 10.04.2015 11:25, Janis Papanagnou wrote:
> On 10.04.2015 01:29, Joep van Delft wrote:
> [...]
>> It would have saved me a trip
>> to $SEARCH_ENGINE and some experimentation if the official,
>> recommended way to `return an array' would be explicit (-ly
>> referenced) here.
>
> (I don't understand what you are saying in that sentence.)

Ah, got it! "would be explicit" / "would be explicitly referenced".
(And I thought, what the heck is "-ly referenced".)

You want a paragraph that says something like:
* If you want a function to operate on an existing array just
  pass that array to the function as parameter, and the function
  will modify that array.
* If you want the function to create a new array, filled with
  data, provide an array parameter to the function, call that
  function with an actual (array-)parameter that is unused, or
  use the delete operation in the function before filling in the
  new data.
* If you want to let the function operate on a copy of an array
  in the calling environment, do an explicit copy (using a 'for'
  loop, either counted or associative) of the data into a new
  array and pass that new (cloned) array to the function.
!! Be aware that any data in the respective arrays in the calling
   environment will change accordingly!
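
The three cases above could be sketched roughly like this, assuming a
POSIX-style awk. The function names double_all, squares and copy are
made up for the example, and split("", arr) is used as a portable
stand-in for deleting the whole array:

```shell
out=$(awk '
# Case 1: operate on an existing array in place.
function double_all(a, n,    i) {
    for (i = 1; i <= n; i++) a[i] *= 2
}

# Case 2: hand back a new array through an array parameter.
# split("", res) empties res before it is filled.
function squares(res, n,    i) {
    split("", res)
    for (i = 1; i <= n; i++) res[i] = i * i
    return n                       # scalar return value: the element count
}

# Case 3: clone an array so the original is left alone.
function copy(src, dst,    k) {
    split("", dst)
    for (k in src) dst[k] = src[k]
}

BEGIN {
    n = squares(sq, 3)             # sq now holds 1, 4, 9
    copy(sq, tmp)
    double_all(tmp, n)             # modifies tmp only
    for (i = 1; i <= n; i++) printf "%d %d\n", sq[i], tmp[i]
}')
printf '%s\n' "$out"
```

The left column shows sq unchanged and the right column shows the
doubled copy. The extra names after the wide gap in each parameter list
are the usual awk convention for local variables.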

Well, I suppose that's indeed all basic CS/programming knowledge,
if you know about the pass-by-value vs. pass-by-reference concepts.

I admit that such description (or examples) may help anyway.

Janis

Kenny McCormack

Apr 10, 2015, 5:54:36 AM
In article <mg84u3$c6r$1...@news.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> wrote:
...
>> It would have saved me a trip
>> to $SEARCH_ENGINE and some experimentation if the official,
>> recommended way to `return an array' would be explicit (-ly
>> referenced) here.
>
>(I don't understand what you are saying in that sentence.)

Note: The comments below represent what I *think* the previous poster was
thinking and trying to convey in his post. They represent also what I
think about things, so it is possible that I am being presumptuous in
assigning these thoughts to him. So, take it in that light.

My comments are:
1) I agree that the way the GAWK docu is worded is a little obscure and
strained. I think it could have been worded better - but it's not
worth going into it here. It's in the all-too-common position of
being clear enough for us CS types, but unnecessarily complicated
for non-CS types. (As I said in another post, the point of
scripting languages like GAWK [and others] is to allow
non-programmers to program...)
2) I think we can all agree that the TAWK way is better. Being able to
return an array directly is certainly nice.

--
"I heard somebody say, 'Where's Nelson Mandela?' Well,
Mandela's dead. Because Saddam killed all the Mandelas."

George W. Bush, on the former South African president who
is still very much alive, Sept. 20, 2007
