Incorrect / Inconsistent behavior with nameref assignments in functions

Binarus

Aug 28, 2020, 9:03:44 AM
to bug-...@gnu.org, ba...@packages.debian.org
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu'
-DCONF_VENDOR='pc' -DLOCALEDIR$
uname output: Linux cerberus 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1
(2020-07-05) x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 4.4
Patch Level: 12
Release Status: release


Description:
------------

Under certain circumstances, assignments of namerefs to local variables
in functions behave in a way which makes namerefs completely useless.
Furthermore, the behavior is not consistent.

The two scripts given below really should produce the same output.
Instead, the output is different. To reproduce, run the two scripts and
observe the difference in the output.

In summary, the -a qualifier alters the assignment in an undocumented
and surprising way. Both scripts should work identically as expected.

Important additional information: I have tried the same with exactly the
same results in bash 5.0.3 and bash 5.0.11.


Repeat-By:
----------

Consider the following two scripts:

SCRIPT 1:

#!/bin/bash

function Dummy() {

    local -n namerefArray="$1"
    local -a -i myArray=("${namerefArray[@]}")

    local -p
}

declare -a -i myArray=('1' '2' '3')

Dummy 'myArray'


SCRIPT 2:

#!/bin/bash

function Dummy() {

    local -n namerefArray="$1"
    local myArray=("${namerefArray[@]}")

    local -p
}

declare -a -i myArray=('1' '2' '3')

Dummy 'myArray'


OUTPUT OF SCRIPT 1:

myArray=()
namerefArray=myArray


OUTPUT OF SCRIPT 2:

myArray=([0]="1" [1]="2" [2]="3")
namerefArray=myArray


That is, in SCRIPT 1, the assignment to myArray in the function destroys
the contents of the variable referenced by namerefArray. I think that
this is a bug, but I am not completely sure about it.

However, what I am quite sure about is that the behavior, whether or not
it is correct, should not differ between the two variants. The
attributes -a and -i in this case should not make any difference, but
actually make a fundamental difference.


Thank you very much, and best regards,

Binarus


Greg Wooledge

Aug 28, 2020, 11:28:52 AM
to bug-...@gnu.org
On Fri, Aug 28, 2020 at 10:56:34AM +0200, Binarus wrote:
> #!/bin/bash
>
> function Dummy() {
>
> local -n namerefArray="$1"
> local -a -i myArray=("${namerefArray[@]}")
>
> local -p
> }
>
> declare -a -i myArray=('1' '2' '3')

You've got a local variable with the same name as the global variable
that you're attempting to pass by reference. This will not work.

Namerefs (declare -n) in bash are *not* like uplevel commands in Tcl.
They cause the referenced variable name to be evaluated just like any
other variable would be, starting at the current function scope, then
going up to the caller, and so on.
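The following minimal sketch makes the lookup rule concrete. It assumes bash 4.3 or later, the first release with local -n. The function show and the variables value and other are illustrative names and do not come from the original scripts:

```shell
#!/bin/bash
# Hypothetical demo: a nameref stores only a *name*. Each use of the
# nameref looks that name up with ordinary dynamic scoping, starting
# in the current function.

show() {
    local -n ref="$1"     # ref stands for whatever variable $1 names
    local value="local"   # a local that happens to share the caller's name
    echo "$ref"
}

value="global"
other="global"
show value   # prints "local": the function's own 'value' is found first
show other   # prints "global": no local 'other', so lookup continues upward
```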

If you want to use namerefs in a function in bash, you MUST go out of
your way to minimize the chances of a collision between the caller's
variable reference and ANY local variable of the function. Not just
the nameref itself, but any other incidental variables used in the
function. (As you aptly demonstrated here.)

So, you can't write functions like this:

func1() {
    declare -n ref="$1"
    local i
    ...
}

Instead, you need crazy things like this:

func1() {
    declare -n _func1_ref="$1"
    local _func1_i
    ...
}

And then you just have to pray that the caller respects you enough not
to use variables named with _func1_ prefixes.

There is no 100% bulletproof solution to this issue.
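Below is a sketch of a complete function written in that prefixed style. The function sum_into and its _sum_into_ prefix are hypothetical. The point is only that every local, including both namerefs, carries the prefix. That leaves the caller free to use names like ref, i or total without colliding with anything inside the function:

```shell
#!/bin/bash
# Hypothetical function following the prefix convention: sum the elements
# of the array named by $1 and store the total in the variable named by $2.
sum_into() {
    local -n _sum_into_arr="$1" _sum_into_out="$2"
    local _sum_into_i _sum_into_total=0
    for _sum_into_i in "${_sum_into_arr[@]}"; do
        (( _sum_into_total += _sum_into_i ))
    done
    _sum_into_out=$_sum_into_total   # written through the nameref
}

# The caller can use short, common names such as ref, i and total.
ref=(1 2 3)
total=0
sum_into ref total
echo "$total"   # prints 6
```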

See also <https://mywiki.wooledge.org/BashProgramming#Functions>.

Oğuz

Aug 28, 2020, 11:37:08 AM
to bug-...@gnu.org
On Friday, 28 August 2020, Greg Wooledge <woo...@eeg.ccf.org> wrote:

> On Fri, Aug 28, 2020 at 10:56:34AM +0200, Binarus wrote:
> > #!/bin/bash
> >
> > function Dummy() {
> >
> > local -n namerefArray="$1"
> > local -a -i myArray=("${namerefArray[@]}")
> >
> > local -p
> > }
> >
> > declare -a -i myArray=('1' '2' '3')
>
> You've got a local variable with the same name as the global variable
> that you're attempting to pass by reference. This will not work.
>
>
These scripts yield identical output on bash-5.1 though.


> Namerefs (declare -n) in bash are *not* like uplevel commands in Tcl.
> They cause the referenced variable name to be evaluated just like any
> other variable would be, starting at the current function scope, then
> going up to the caller, and so on.
>
> If you want to use namerefs in a function in bash, you MUST go out of
> your way to minimize the chances of a collision between the caller's
> variable reference and ANY local variable of the function. Not just
> the nameref itself, but any other incidental variables used in the
> function. (As you aptly demonstrated here.)
>
> So, you can't write functions like this:
>
> func1() {
> declare -n ref="$1"
> local i
> ...
> }
>
> Instead, you need crazy things like this:
>
> func1() {
> declare -n _func1_ref="$1"
> local _func1_i
> ...
> }
>
>
This doesn't make the slightest sense. What is the point of having local
variables then?


--
Oğuz

Greg Wooledge

Aug 28, 2020, 12:14:48 PM
to bug-...@gnu.org
On Fri, Aug 28, 2020 at 06:37:00PM +0300, Oğuz wrote:
> On Friday, 28 August 2020, Greg Wooledge <woo...@eeg.ccf.org> wrote:
> > func1() {
> > declare -n _func1_ref="$1"
> > local _func1_i
> > ...
> > }
> >
> This doesn't make the slightest sense. What is the point of having local
> variables then?

You need to distinguish between two types of functions: ones which
use namerefs, and ones which do not.

If a function doesn't use name references, then you may safely use
local variables with any names you like. Nothing has to change.

If a function is going to use a name reference, then you need to
bullet-proof it from head to toe. All local variables have to be
mangled to minimize the chance of collisions. It's an entirely
different problem.

Binarus

Aug 28, 2020, 12:20:13 PM
to bug-...@gnu.org


On 28.08.2020 17:28, Greg Wooledge wrote:
> On Fri, Aug 28, 2020 at 10:56:34AM +0200, Binarus wrote:
>> #!/bin/bash
>>
>> function Dummy() {
>>
>> local -n namerefArray="$1"
>> local -a -i myArray=("${namerefArray[@]}")
>>
>> local -p
>> }
>>
>> declare -a -i myArray=('1' '2' '3')
>
> You've got a local variable with the same name as the global variable
> that you're attempting to pass by reference. This will not work.
>
> Namerefs (declare -n) in bash are *not* like uplevel commands in Tcl.
> They cause the referenced variable name to be evaluated just like any
> other variable would be, starting at the current function scope, then
> going up to the caller, and so on.
>
> If you want to use namerefs in a function in bash, you MUST go out of
> your way to minimize the chances of a collision between the caller's
> variable reference and ANY local variable of the function. Not just
> the nameref itself, but any other incidental variables used in the
> function. (As you aptly demonstrated here.)
>
> So, you can't write functions like this:
>
> func1() {
> declare -n ref="$1"
> local i
> ...
> }
>
> Instead, you need crazy things like this:
>
> func1() {
> declare -n _func1_ref="$1"
> local _func1_i
> ...
> }

I had already worked that out myself and set up a naming convention
in my script similar to your suggestion before reporting the bug. It
took me a while to recognize the problem, because I wouldn't have
expected behavior that renders namerefs nearly useless.

However, the main question is why leaving out the -a and -i in the
second script makes things work as expected. Can we rely on that
behavior and circumvent the problem that way? Both scripts should give
the same output, shouldn't they? So it's a bug that they don't, and we
can't rely on it?

For reference, I also posted a question about it on Stack Overflow
yesterday (nicer code formatting):
https://stackoverflow.com/questions/63629172/weird-behavior-when-assigning-local-namerefs-to-local-variables-in-functions?noredirect=1#comment112528357_63629172

> And then you just have to pray that the caller respects you enough not
> to use variables named with _func1_ prefixes.
>
> There is no 100% bulletproof solution to this issue.
>
> See also <https://mywiki.wooledge.org/BashProgramming#Functions>.

Since I am the caller :-), this won't pose a problem. I have commented
that script extensively.

However, this whole thing does not make much sense. I have put too much
work into this script; otherwise, I'd dump it and restart with Perl or
Python.

Binarus

Aug 28, 2020, 12:25:31 PM
to bug-...@gnu.org


On 28.08.2020 17:37, Oğuz wrote:
> On Friday, 28 August 2020, Greg Wooledge <woo...@eeg.ccf.org> wrote:
>
>> On Fri, Aug 28, 2020 at 10:56:34AM +0200, Binarus wrote:
>>> #!/bin/bash
>>>
>>> function Dummy() {
>>>
>>> local -n namerefArray="$1"
>>> local -a -i myArray=("${namerefArray[@]}")
>>>
>>> local -p
>>> }
>>>
>>> declare -a -i myArray=('1' '2' '3')
>>
>> You've got a local variable with the same name as the global variable
>> that you're attempting to pass by reference. This will not work.
>>
>>
> These scripts yield identical output on bash-5.1 though.

Thank you very much for testing! This is interesting. I couldn't get my
hands on a system with 5.1 yet.

With 5.1, do both scripts behave like SCRIPT 1 with the older versions
or like SCRIPT 2 with the older versions?
>> Namerefs (declare -n) in bash are *not* like uplevel commands in Tcl.
>> They cause the referenced variable name to be evaluated just like any
>> other variable would be, starting at the current function scope, then
>> going up to the caller, and so on.
>>
>> If you want to use namerefs in a function in bash, you MUST go out of
>> your way to minimize the chances of a collision between the caller's
>> variable reference and ANY local variable of the function. Not just
>> the nameref itself, but any other incidental variables used in the
>> function. (As you aptly demonstrated here.)
>>
>> So, you can't write functions like this:
>>
>> func1() {
>> declare -n ref="$1"
>> local i
>> ...
>> }
>>
>> Instead, you need crazy things like this:
>>
>> func1() {
>> declare -n _func1_ref="$1"
>> local _func1_i
>> ...
>> }
>>
>>
> This doesn't make the slightest sense. What is the point of having local
> variables then?

Or namerefs ... I totally agree. Either locals or namerefs are just
unusable given that situation; you have to choose between them.

Greg Wooledge

Aug 28, 2020, 1:17:34 PM
to bug-...@gnu.org
On Fri, Aug 28, 2020 at 06:20:04PM +0200, Binarus wrote:
> However, the main question is why leaving out the -a and -i in the
> second script makes things work as expected.

I'll leave aside my usual rant about -i for now. Here's your original
code:

#!/bin/bash

function Dummy() {

    local -n namerefArray="$1"
    local -a -i myArray=("${namerefArray[@]}")

    local -p
}

declare -a -i myArray=('1' '2' '3')

Dummy 'myArray'


Here's a variant of it, to try to figure out what's going on.

unicorn:~$ cat f1
#!/bin/bash

f() {
    local -n ref="$1"
    local -a -i myArray=("${ref[@]}" 420 69)
    local -p
}

declare -a -i myArray=('1' '2' '3')
f 'myArray'
declare -p myArray
unicorn:~$ ./f1
myArray=([0]="420" [1]="69")
ref=myArray
declare -ai myArray=([0]="1" [1]="2" [2]="3")


Here we can see that a local array variable is created, and populated with
our two static elements. After returning from the function, the caller's
array is still untouched, so we did not trample the global variable.
It was definitely local.

So the "error" here appears to be that the expansion of "${ref[@]}"
results in zero words.

If we take out the -i then we get the same result.

If we put the -i back in and take out the -a, we also get the same result.

If we take out BOTH the -a and the -i, then we get:

unicorn:~$ ./f1
myArray=([0]="1" [1]="2" [2]="3" [3]="420" [4]="69")
ref=myArray
declare -ai myArray=([0]="1" [1]="2" [2]="3")

Your guess is as good as mine as to what's happening here. I'll just
continue to stick with my defensive programming.
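The four variants can be run side by side on any particular bash with a script like the one below, for anyone who wants to repeat this comparison. It is a sketch. The probe_* function names are made up, and each function spells its options out literally so that it matches the code above exactly. No expected output is given for the probes, because that output is exactly what differs between bash versions. Only the final declare -p line should be the same everywhere:

```shell
#!/bin/bash
# Sketch: one probe per attribute combination from the experiment above.
declare -a -i myArray=(1 2 3)

probe_none() { local -n ref="$1"; local myArray=("${ref[@]}" 420 69); echo "no options: ${myArray[*]}"; }
probe_a() { local -n ref="$1"; local -a myArray=("${ref[@]}" 420 69); echo "-a: ${myArray[*]}"; }
probe_i() { local -n ref="$1"; local -i myArray=("${ref[@]}" 420 69); echo "-i: ${myArray[*]}"; }
probe_ai() { local -n ref="$1"; local -a -i myArray=("${ref[@]}" 420 69); echo "-a -i: ${myArray[*]}"; }

echo "bash $BASH_VERSION"
for probe in probe_none probe_a probe_i probe_ai; do
    "$probe" myArray
done

# The caller's array is left untouched in every case.
declare -p myArray
```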

Robert Elz

Aug 28, 2020, 2:28:44 PM
to Binarus, bug-...@gnu.org
Date: Fri, 28 Aug 2020 18:25:23 +0200
From: Binarus <li...@binarus.de>
Message-ID: <8313a366-6ecd-5e87...@binarus.de>


| > This doesn't make the slightest sense. What is the point of having local
| > variables then?
|
| Or namerefs ... I totally agree. Either locals or namerefs are just
| unusable given that situation; you have to choose between them.

Not really, you just have to stop thinking like a C/python/... programmer
and think like a shell programmer instead.

namerefs (as I understand them, I don't write bash scripts) provide
a way to make use of a variable name that can vary, without needing
to resort to "eval $var='whatever'" type tricks. But aside from being
easier to use, that's essentially what they do, the nameref variable
is simply replaced by its value, and used as if that was typed.
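The equivalence described here can be sketched in bash as follows. The names set_by_eval, set_by_nameref and target are hypothetical. Both functions store their second argument in the variable named by the first, but only the eval version pastes that name into shell code:

```shell
#!/bin/bash
# Hypothetical helpers: both store $2 in the variable whose name is in $1.

set_by_eval() {
    # $1 is pasted into code that eval runs, so it must be a trusted name.
    eval "$1=\$2"
}

set_by_nameref() {
    # A prefixed nameref name, to avoid colliding with the name in $1.
    local -n _set_by_nameref_ref="$1"
    _set_by_nameref_ref=$2
}

set_by_eval target 'hello world'
echo "$target"    # prints "hello world"
set_by_nameref target 'via nameref'
echo "$target"    # prints "via nameref"
```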

locals (in general, bash's model is slightly different in some details)
are not new variables really (they're not like a local variable in
C or whatever) - instead the best way to think of a local variable is
that the (executable) "local" command copies the old value (+ attributes)
to somewhere unknown to the script (and inaccessible), and then when
the function containing "local" returns, those values and attributes
are copied back again (by default bash also unsets local vars initially,
other shells don't, they simply retain whatever value they had previously).
That is, there really is just one variable of each name, and it is global
(other functions that you call access your local variable when they refer
to it by name, not a global they would get if the func with the local
statement was not active ...).
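Here is a small bash sketch of this model, with illustrative names. The function inner has no local of its own, so it sees whichever x is visible when it is called. The sketch also shows that, with default shell options, bash starts a freshly declared "local x" out unset rather than inheriting the old value:

```shell
#!/bin/bash
# Sketch of dynamic scoping; all names are illustrative.
x=global

inner() { echo "inner sees: $x"; }

outer() {
    local x=outer   # shadows the global x until outer returns
    inner           # prints "inner sees: outer"
}

fresh() {
    local x           # with default options, bash starts this local unset
    echo "${x-unset}" # prints "unset"
}

outer
inner               # prints "inner sees: global": the old value is back
fresh
```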

So when you combine those things everything actually makes perfect
sense - as long as you're expecting it to work like this, and not
hoping that sh programming is just C with a different syntax.

kre

ps: I agree that the options given to local should make no difference,
but it sounds as if whatever issue that was has already been fixed.




Oğuz

Aug 28, 2020, 4:08:38 PM
to Binarus, bug-...@gnu.org
On Friday, 28 August 2020, Binarus <li...@binarus.de> wrote:

>
>
> On 28.08.2020 17:37, Oğuz wrote:
> > On Friday, 28 August 2020, Greg Wooledge <woo...@eeg.ccf.org> wrote:
> >
> >> On Fri, Aug 28, 2020 at 10:56:34AM +0200, Binarus wrote:
> >>> #!/bin/bash
> >>>
> >>> function Dummy() {
> >>>
> >>> local -n namerefArray="$1"
> >>> local -a -i myArray=("${namerefArray[@]}")
> >>>
> >>> local -p
> >>> }
> >>>
> >>> declare -a -i myArray=('1' '2' '3')
> >>
> >> You've got a local variable with the same name as the global variable
> >> that you're attempting to pass by reference. This will not work.
> >>
> >>
> > These scripts yield identical output on bash-5.1 though.
>
> Thank you very much for testing! This is interesting. I couldn't get my
> hands on a system with 5.1 yet.
>
>
5.1 is still in beta phase.


> With 5.1, do both scripts behave like SCRIPT 1 with the older versions
> or like SCRIPT 2 with the older versions?


They behave like SCRIPT 2, as expected: the local `myArray' is populated
with the contents of the global `myArray'. _Identical_ was the wrong word,
though; the second script doesn't copy the integer attribute.
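The following sketch isolates the attribute difference by copying an integer array twice, once without and once with -a -i. The names src, plain, ints and copy_demo are illustrative. They are deliberately distinct, so the same-name problem from this thread does not come into play:

```shell
#!/bin/bash
# Sketch: names are illustrative and kept distinct from the caller's array.
declare -a -i src=(1 2 3)

copy_demo() {
    local -n _copy_demo_ref="$1"
    local plain=("${_copy_demo_ref[@]}")       # values only, no integer attribute
    local -a -i ints=("${_copy_demo_ref[@]}")  # values plus the integer attribute
    declare -p plain ints
    plain+=('2+3')                 # stored as the literal string 2+3
    ints+=('2+3')                  # evaluated arithmetically and stored as 5
    echo "${plain[3]} ${ints[3]}"  # prints "2+3 5"
}

copy_demo src
```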


--
Oğuz

Koichi Murase

Aug 28, 2020, 7:46:46 PM
to Binarus, bug-...@gnu.org, ba...@packages.debian.org
2020-08-28 22:04 Binarus <li...@binarus.de>:
> Description:
> ------------
>
> Under certain circumstances, assignments of namerefs to local variables
> in functions behave in a way which makes namerefs completely useless.
> Furthermore, the behavior is not consistent.

This is actually not related to namerefs and has already been fixed in
Bash 5.1 and the devel branch. Consider the following code:

a=1; f1() { local a=$a; local; }; f1
a=2; f2() { local -a a=("$a"); local; }; f2

The results for `f1' are the same for all the Bash versions
2.0..devel, but the results for `f2' vary between versions. Here is a
summary of the results from the different versions of Bash:

- 2.0..3.0: f1: a=1, f2: a=([0]="1")
- 3.1: f1: a=1, f2: a=([0]="")
- 3.2..4.2: f1: a=1, f2: a=([0]="1")
- 4.3..5.0: f1: a=1, f2: a=([0]="")
- 5.1..dev: f1: a=1, f2: a=([0]="1")

I checked the detailed changes. The behavior of `f2' in 3.1 was
reported as a bug in the following thread.

https://lists.gnu.org/archive/html/bug-bash/2006-05/msg00025.html

It was fixed in 8b35878f (commit bash-20060504 snapshot). However,
the bug seems to have been reintroduced in 36eb585c (commit bash-20121221
snapshot). This regression has been reported at

https://savannah.gnu.org/support/index.php?109669

Finally, it was again fixed in c6c7ae81 (commit bash-20200427
snapshot).
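Koichi's test case can be rerun on any installed bash with a script like the following. This sketch prints with declare -p instead of a bare local. It assumes that the output format of local differs between versions; compare the local -p output in the original report with the 5.1-alpha output later in this thread. According to the summary above, affected versions should show an empty element for f2:

```shell
#!/bin/bash
# Sketch of Koichi's test case; declare -p replaces the bare `local`.
echo "bash $BASH_VERSION"

a=1; f1() { local a=$a; declare -p a; }; f1
a=2; f2() { local -a a=("$a"); declare -p a; }; f2
```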

--
Koichi

Binarus

Aug 29, 2020, 12:58:12 AM
to bug-...@gnu.org

On 28.08.2020 22:08, Oğuz wrote:
>
> 5.1 is still in beta phase.

I see. I couldn't use it anyway because the script in question (where I
make heavy use of namerefs and nested function calls) is for production,
and for security reasons I won't run software on those servers that I
have to compile myself (I would have to check for security updates
and probably compile and install each new version myself).

Switching those servers to a rolling release distribution like Arch is
not an option either.

>
> With 5.1, do both scripts behave like SCRIPT 1 with the older versions
> or like SCRIPT 2 with the older versions?
>
>
> They behave like SCRIPT 2, as expected: the local `myArray' is populated
> with the contents of the global `myArray'. _Identical_ was the wrong word,
> though; the second script doesn't copy the integer attribute.

This is good news. At least we can expect the problem to be solved in
production releases in the near future. If the Debian maintainers agree
that the problem is a bug, they will eventually backport the respective
changes. If not, I'll probably have to wait at least 5 years until
bash 5.1 arrives in Debian (the current testing and even the current
unstable apparently ship 5.0).

The second script not copying the integer attribute is no problem. We
can just use the first script :-)

Once again, thank you very much for all your effort and testing!

Best regards,

Binarus

Binarus

Aug 29, 2020, 1:20:22 AM
to bug-...@gnu.org
First of all, thank you very much for your effort and the clear
explanation.

On 28.08.2020 20:28, Robert Elz wrote:
> Date: Fri, 28 Aug 2020 18:25:23 +0200
> From: Binarus <li...@binarus.de>
> Message-ID: <8313a366-6ecd-5e87...@binarus.de>
>
>
> | > This doesn't make the slightest sense. What is the point of having local
> | > variables then?
> |
> | Or namerefs ... I totally agree. Either locals or namerefs are just
> | unusable given that situation; you have to choose between them.

> namerefs (as I understand them, I don't write bash scripts) provide
> a way to make use of a variable name that can vary, without needing
> to resort to "eval $var='whatever'" type tricks. But aside from being
> easier to use, that's essentially what they do, the nameref variable
> is simply replaced by its value, and used as if that was typed.

One of my problems was that my two scripts behave differently in the
current release versions of bash. It may be questionable which behavior
is the correct one, but at least the two scripts should behave the same.
Otherwise, it's a bug.

> locals (in general, bash's model is slightly different in some details)
> are not new variables really (they're not like a local variable in
> C or whatever) - instead the best way to think of a local variable is
> that the (executable) "local" command copies the old value (+ attributes)
> to somewhere unknown to the script (and inaccessible), and then when
> the function containing "local" returns, those values and attributes
> are copied back again (by default bash also unsets local vars initially,
> other shells don't, they simply retain whatever value they had previously).
> That is, there really is just one variable of each name, and it is global
> (other functions that you call access your local variable when they refer
> to it by name, not a global they would get if the func with the local
> statement was not active ...).

This explanation is very good and understandable - thank you very much!
But unfortunately, it makes it even clearer that namerefs are nearly
useless in the current release versions of bash.

I am very grateful that the bash developers have apparently recognized
and solved that problem in 5.1, and have apparently partially abandoned
the policy you have described above. I really hope that this happened
intentionally rather than by accident.

It will be very interesting to see whether the man page for 5.1 differs
from that for the previous versions in this respect. I'll eventually try
to get my hands on the manual for 5.1 and find out.

Thanks again, and best regards,

Binarus

Binarus

Aug 29, 2020, 1:45:53 AM
to bug-...@gnu.org, ba...@packages.debian.org
Thank you very much for your effort, testing and support!

On 29.08.2020 01:46, Koichi Murase wrote:
> 2020-08-28 22:04 Binarus <li...@binarus.de>:
>> Description:
>> ------------
>>
>> Under certain circumstances, assignments of namerefs to local variables
>> in functions behave in a way which makes namerefs completely useless.
>> Furthermore, the behavior is not consistent.
>
> This is actually not related to namerefs and has already been fixed in
> Bash 5.1 and the devel branch. Consider the following code:
>
> a=1; f1() { local a=$a; local; }; f1
> a=2; f2() { local -a a=("$a"); local; }; f2
>
> The results for `f1' are the same for all the Bash versions
> 2.0..devel, but the results for `f2' vary between versions. Here is a
> summary of the results from the different versions of Bash:
>
> - 2.0..3.0: f1: a=1, f2: a=([0]="1")
> - 3.1: f1: a=1, f2: a=([0]="")
> - 3.2..4.2: f1: a=1, f2: a=([0]="1")
> - 4.3..5.0: f1: a=1, f2: a=([0]="")
> - 5.1..dev: f1: a=1, f2: a=([0]="1")

This is very interesting. I have never written code like that, and
therefore I did not run into the problem until I started making heavy
use of namerefs. So I have tested my code in three different bash
versions, all of which are buggy ...

I am surprised that a bug of such severity could survive several years.
I don't know when 4.3 came out, but my version of 4.4 is from 2016, and
5.1 is not out yet, so the bug survived at least 4 years (not taking
into account devel or beta versions, which are not an option for most
people).

> I checked the detailed changes. The behavior of `f2' in 3.1 was
> reported as a bug in the following thread.
>
> https://lists.gnu.org/archive/html/bug-bash/2006-05/msg00025.html
>
> It was fixed in 8b35878f (commit bash-20060504 snapshot). However,
> the bug seems to have been reintroduced in 36eb585c (commit bash-20121221
> snapshot). This regression has been reported at
>
> https://savannah.gnu.org/support/index.php?109669
>
> Finally, it was again fixed in c6c7ae81 (commit bash-20200427
> snapshot).

Thank you very much again for that invaluable information! I am
wondering when Debian will include bash 5.1. It looks like Debian
testing and Debian unstable are on bash 5.0, so it will probably take
several years.

Best regards,

Binarus


Koichi Murase

Aug 29, 2020, 8:59:40 PM
to Binarus, bug-...@gnu.org
2020-08-29 14:46 Binarus <li...@binarus.de>:
> I am wondering when Debian will include bash 5.1. It looks like
> Debian testing and Debian unstable are on bash 5.0, so it will
> probably take several years.

Actually the problem of the function `Dummy' will not be solved even
in bash 5.1. There is another, similar problem with your function.
A user might specify `namerefArray' as the name of an outer array,
which results in a circular-reference error.

$ cat testR2c.sh
function Dummy {
    local -n namerefArray="$1"
    local -a -i myArray=("${namerefArray[@]}")
    local -p
}
declare -a -i namerefArray=('1' '2' '3')
Dummy namerefArray

$ bash-5.1-alpha testR2c.sh
testR2c.sh: line 4: local: warning: namerefArray: circular name reference
testR2c.sh: line 4: warning: namerefArray: circular name reference
testR2c.sh: line 5: warning: namerefArray: circular name reference
testR2c.sh: line 5: warning: namerefArray: circular name reference
declare -a myArray=([0]="1" [1]="2" [2]="3")
declare -n namerefArray="namerefArray"

If you want to work around the problem, there are several ways.

* One of the simplest ways is to use different variable names as
already suggested by other people. However, when the variable names
are not under your control for some reason (e.g., the function is
provided to users who may use it in arbitrary ways, or the script
imports different shell-script frameworks), the probability of a name
collision is not 0%.

* Another way is to copy to the local array only when the name is
different from `myArray':

function Dummy {
    [[ $1 == myArray ]] ||
        eval "local -a myArray=(\"\${$1[@]}\")"
    declare -p myArray
}

When you want to add `-i' attribute to the array or to modify the
array without affecting the original outer array, you may first save
the value to another local array and next copy the array to the
array that you want to edit.

function Dummy {
    [[ $1 == inputArray ]] ||
        eval "local -a inputArray=(\"\${$1[@]}\")"
    local -ia myArray=("${inputArray[@]}")
    declare -p myArray
}

* If you want to use namerefs to eliminate the use of `eval', you
could do something like the following, but I think it is more natural
and readable to use eval:

function Dummy {
    [[ $1 == refArray ]] || local -n refArray=$1
    [[ $1 == inputArray ]] || local -i inputArray=("${refArray[@]}")
    local -ia myArray=("${inputArray[@]}")
    declare -p myArray
}
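The following usage sketch exercises the last variant. It calls the function with caller arrays that happen to be named after each of the function's own variables; the caller-side values are hypothetical. One deliberate change: the staging array is declared -a rather than -i here, since it only holds a copy. This is an editorial choice, not part of the original suggestion:

```shell
#!/bin/bash
# Last variant from above; the staging array uses -a instead of -i
# (editorial choice).
function Dummy {
    [[ $1 == refArray ]] || local -n refArray=$1
    [[ $1 == inputArray ]] || local -a inputArray=("${refArray[@]}")
    local -ia myArray=("${inputArray[@]}")
    declare -p myArray
}

# Hypothetical caller arrays reusing each of the function's own names.
declare -a -i myArray=(1 2 3) refArray=(4 5 6) inputArray=(7 8 9)
for name in myArray refArray inputArray; do
    Dummy "$name"   # each call should print the matching caller's values
done
```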

Best regards,

Koichi

Binarus

Aug 30, 2020, 6:24:12 AM
to bug-...@gnu.org

On 30.08.2020 02:59, Koichi Murase wrote:
> 2020-08-29 14:46 Binarus <li...@binarus.de>:
>> I am wondering when debian will include bash 5.1. It looks like
>> debian testing and debian unstable are on bash 5.0, so it will
>> probably take several years.
>
> Actually the problem of the function `Dummy' will not be solved even
> in bash 5.1. There is another, similar problem with your function.
> A user might specify `namerefArray' as the name of an outer array,
> which results in a circular-reference error.

Actually, this is what first happened to me and what led me to the
problem described in my original post.

I was trying to solve the circular reference problem by establishing a
combination of a simple variable naming convention and a copying method. The
idea was that all nameref variable names would start with the string
"nameref" (designating the nameref "type"), that all other variable
names would start with other strings, and that I would never pass a
variable which is already a nameref to other functions as a parameter.
Then, as the first action in each function which deals with namerefs, I
would copy the contents of each nameref (whose name starts with
"nameref") to a local variable (whose name doesn't start with
"nameref"), and (if needed) would write back the contents from the local
variable to the nameref variable at the end of the function.

This would make sure that circular references couldn't occur. My example
script 1 is the implementation of this idea, which worked in that it
prevented the circular references ...
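Here is a minimal sketch of the convention described above, using hypothetical names (double_all, work, idx). The only nameref starts with "nameref". The function copies the data in, works on the copy, and copies it back out at the end. Note that the local copy's name (work) must still differ from the caller's array name. The prefix rule protects only the nameref, and a local copy sharing the caller's name is exactly the collision from the original report:

```shell
#!/bin/bash
# Sketch of copy-in / copy-out; double_all, work and idx are hypothetical.
double_all() {
    local -n namerefData="$1"
    local -a -i work=("${namerefData[@]}")   # copy in
    local -i idx
    for idx in "${!work[@]}"; do
        work[idx]=$(( work[idx] * 2 ))
    done
    namerefData=("${work[@]}")               # copy out through the nameref
}

declare -a -i values=(1 2 3)
double_all values
declare -p values
```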

In my case, this idea is not as dumb as the seemingly unnecessary
copying makes it sound, because most of my functions have to operate
on copies of the data they are passed (mostly arrays) anyway.

Later, while the problem of the circular reference error had indeed been
solved, I noticed that the script silently produced wrong results.

This is the key point and the difference between the two sorts of error:

The bug I have reported (I still believe that "bug" is the correct term)
makes scripts silently produce wrong results. This is an absolute no-go.
If I hadn't tested it thoroughly with edge cases, I might never have
noticed it, which could have led to serious damage (the scripts in
question will be part of a backup system).

In contrast, a circular reference makes bash throw a visible,
appropriate message. I am fine with bash throwing errors or warnings and
possibly aborting script execution. When I see this, I can fix the
problem. Therefore, I don't have any problem with bash 5.1 still not
allowing that sort of circular reference (as long as it keeps throwing
messages when it encounters that situation).

But again, I consider it a massive problem when data is silently not
copied as expected, and the contents of the variable referenced by the
nameref even get destroyed (at least within the function itself) just
by using it on the RHS of an assignment.

> $ cat testR2c.sh
> function Dummy {
> local -n namerefArray="$1"
> local -a -i myArray=("${namerefArray[@]}")
> local -p
> }
> declare -a -i namerefArray=('1' '2' '3')
> Dummy namerefArray
>
> $ bash-5.1-alpha testR2c.sh
> testR2c.sh: line 4: local: warning: namerefArray: circular name reference
> testR2c.sh: line 4: warning: namerefArray: circular name reference
> testR2c.sh: line 5: warning: namerefArray: circular name reference
> testR2c.sh: line 5: warning: namerefArray: circular name reference
> declare -a myArray=([0]="1" [1]="2" [2]="3")
> declare -n namerefArray="namerefArray"
>
> If you want to work around the problem, there are several ways.
>
> * One of the simplest ways is to use different variable names as
> already suggested by other people. However, when the variable names
> are not under your control for some reason (e.g., the function is
> provided to users who may use it in arbitrary ways, or the script
> imports different shell-script frameworks), the probability of a name
> collision is not 0%.

This solution would be good enough for me, because bash warns me about
the problem if it occurs, and I then can change the script accordingly;
I (currently) don't need to provide the functions to other users.

> * Another way is to copy to the local array only when the name is
> different from `myArray':
>
> function Dummy {
> [[ $1 == myArray ]] ||
> eval "local -a myArray=(\"\${$1[@]}\")"
> declare -p myArray
> }

Thank you very much for that idea!

However, eval is evil. If I ever had to provide that function to other
users (which currently is not the case), I would have a problem if
another user called it like this:

declare -a -i myArray1=('1' '2' '3')
Dummy 'myArray1[@]}"); echo Gotcha!; #'

Output:

root@cerberus:~/scripts# ./test6
Gotcha!
declare -a myArray=([0]="1" [1]="2" [2]="3")

I guess it would be very complicated, if possible at all, to protect the
code inside eval against every sort of such attacks.
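To see why that argument injects code, it can help to print the string that `eval' would run instead of running it. The following sketch (not part of the original messages) builds the same string as Koichi's `Dummy' does, using the malicious argument from above:

```bash
#!/bin/bash
# Build the exact string that Dummy hands to eval, but only print it.
arg='myArray1[@]}"); echo Gotcha!; #'
# \" and \$ stay literal; $arg is expanded; the trailing [@] is literal text.
code="local -a myArray=(\"\${$arg[@]}\")"
printf '%s\n' "$code"
```

The `"); ' inside the argument closes the compound assignment early, `echo Gotcha!' becomes a separate command, and the trailing `#' turns the leftover `[@]}")' into a comment.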

> When you want to add `-i' attribute to the array or to modify the
> array without affecting the original outer array, you may first save
> the value to another local array and next copy the array to the
> array that you want to edit.
>
> function Dummy {
> [[ $1 == inputArray ]] ||
> eval "local -a inputArray=(\"\${$1[@]}\")"
> local -ia myArray=("${inputArray[@]}")
> declare -p myArray
> }

For the reason detailed above, I'd like to avoid eval at all costs
(although I currently do not need to handle attacks from malicious users).

> * If you want to use namerefs to eliminate the use of `eval', maybe
> you could do like the following but I think it is more natural and
> readable to use eval:
>
> function Dummy {
> [[ $1 == refArray ]] || local -n refArray=$1
> [[ $1 == inputArray ]] || local -i inputArray=("${refArray[@]}")
> local -ia myArray=("${inputArray[@]}")
> declare -p myArray
> }

Thank you very much! This is a good idea which I hadn't thought about
yet. It provides a clean solution to the circular reference problem.
However (hoping that I don't get flamed for a dumb question), I don't
understand why we need inputArray at all in that code. Wouldn't the
following function be sufficient?

function Dummy {
[[ $1 == refArray ]] || local -n refArray=$1
local -ia myArray=("${refArray[@]}")
declare -p myArray
}

Unfortunately, these solutions (while solving the circular reference
problem) don't solve my original problem. I can't use bash 5.1 in the
production environment where the scripts will run, so I have to work
around the bug I have described in the first message of this thread.

I think I'll stick with my current (extremely ugly, but reliable)
solution: Assign a number to each function, and decorate each local
variable in each function with that number, i.e. prepend or append that
number to each local variable name.
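For illustration, the numbering workaround described above might look like the following minimal sketch. The number 17 and the `f17_' prefix are arbitrary, hypothetical choices; the only point is that the locals get names a caller is unlikely to pass in, so the nameref never resolves to one of them.

```bash
#!/bin/bash
# Hypothetical "function number 17": every local carries the f17_ prefix.
function Dummy {
  local -n f17_ref="$1"                        # nameref to the caller's array
  local -a -i f17_myArray=("${f17_ref[@]}")    # no local named myArray exists
  declare -p f17_myArray
}

declare -a -i myArray=('1' '2' '3')
Dummy 'myArray'
```

This avoids the collision only as long as no caller passes a name that starts with `f17_'.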

Thank you very much again for your valuable time and knowledge!

Best regards,

Binarus

Greg Wooledge

Aug 30, 2020, 10:51:16 AM
to bug-...@gnu.org
On Sun, Aug 30, 2020 at 12:24:03PM +0200, Binarus wrote:
> On 30.08.2020 02:59, Koichi Murase wrote:
> > * Another way is to copy to the local array only when the name is
> > different from `myArray':
> >
> > function Dummy {
> > [[ $1 == myArray ]] ||
> > eval "local -a myArray=(\"\${$1[@]}\")"
> > declare -p myArray
> > }
>
> Thank you very much for that idea!
>
> However, eval is evil. If I ever had to provide that function to other
> users (which currently is not the case), I would have a problem if
> another user called it like this:
>
> declare -a -i myArray1=('1' '2' '3')
> Dummy 'myArray1[@]}"); echo Gotcha!; #'
>
> Output:
>
> root@cerberus:~/scripts# ./test6
> Gotcha!
> declare -a myArray=([0]="1" [1]="2" [2]="3")

The evil thing here is code injection. Obviously eval is one way to
perform code injection, but it's not the *only* way. Eval itself isn't
evil; if anything, it's all of the other forms of code injection,
which people don't suspect, that are truly insidious.

https://mywiki.wooledge.org/CodeInjection
https://mywiki.wooledge.org/BashWeaknesses

You're trying to do something that you feel should be possible -- passing
an array to a function by reference. Every other language can do this,
right? So bash should be able to do this... right? Nope.

Passing variables by reference (especially arrays) is one of the
major missing features of bash. Everyone wants it. Many, many people
have attempted it. The sheer insanity of some of the attempts is
astounding.

https://fvue.nl/wiki/Bash:_Passing_variables_by_reference

That's a slightly older page, but he found an exploit in "unset" which
does bizarre things when called at different function scope levels, and
managed to use it to manipulate the existence of variables at various
function scopes.

If you absolutely *need* to pass a variable by reference, don't use bash.
That's the best advice I can give you.

Koichi Murase

Aug 30, 2020, 2:02:14 PM
to Binarus, bug-...@gnu.org
2020-08-30 19:24 Binarus <li...@binarus.de>:
> Actually, this is what first happened to me and what led me to the
> problem described in my original post.
>
> [...]

Thank you for your explanation! Now I see your situation.

>> * Another way is to copy to the local array only when the name is
>> different from `myArray':
>>
>> [...]
>
> However, eval is evil. If I ever had to provide that function to
> other users (which currently is not the case), I would have a
> problem if another user called it like this:

Yes, I recognize the problem when the function isn't used properly.
But the use of eval itself is not fatal. If another user can call
the function as in your example,

Dummy 'myArray1[@]}"); echo Gotcha!; #'

that means that the user can already run arbitrary commands. The user
could have directly written

echo 'Gotcha!'

The real problems occur when the user writes something like

Dummy "$input_to_program"

with `input_to_program' provided by a third user who should not be
able to run arbitrary commands, and the input is neither checked nor
sanitized. In this case, the problem should be avoided by checking or
sanitizing the input. This check can be made inside the function
Dummy, but it can also be made at the time the shell receives the
input.

> declare -a -i myArray1=('1' '2' '3')
> Dummy 'myArray1[@]}"); echo Gotcha!; #'
>
> Output:
>
> root@cerberus:~/scripts# ./test6
> Gotcha!
> declare -a myArray=([0]="1" [1]="2" [2]="3")

Unfortunately, your original function `Dummy' has the same
vulnerability. As Greg has written, there are many other constructs
that trigger command execution in the shell, because that is the
purpose of the shell. With your original method,

$ cat testR2d.sh
function Dummy {
local -n namerefArray="$1"
local -a -i myArray=("${namerefArray[@]}")
}
myArray1=("$@")
Dummy 'myArray1'
$ bash testR2d.sh 'a[$(echo Gotcha1 >/dev/tty)]'
Gotcha1

This is caused by the arithmetic evaluation that the `-i' attribute
triggers. In arithmetic evaluation, array subscripts undergo additional
expansion, so the string $(echo Gotcha1 >/dev/tty) is expanded as
a command substitution.
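One way to keep the `-i' attribute without letting arithmetic evaluation see attacker-controlled text is to check every element before the copy. The following is a sketch, not something proposed in the thread, and it assumes that only plain decimal integers are legitimate input: `all-integers' is a hypothetical helper, and it also rejects leading zeros because bash would read them as octal.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if every argument is a plain decimal integer.
function all-integers {
  local elem reg='^-?(0|[1-9][0-9]*)$'
  for elem in "$@"; do
    [[ $elem =~ $reg ]] || return 1
  done
}

function Dummy {
  local -n namerefArray="$1"
  # Expanding the elements as plain strings is harmless; only the -i
  # assignment below would evaluate them arithmetically.
  all-integers "${namerefArray[@]}" || return 1
  local -a -i myArray=("${namerefArray[@]}")
  declare -p myArray
}

myArray1=('1' '2' '3')
Dummy 'myArray1'                               # copied normally
myArray1=('a[$(echo Gotcha1 >&2)]')
Dummy 'myArray1' || echo 'rejected' >&2        # never reaches the -i assignment
```

This does not address the separate name-collision problem from the original post; it only narrows what can reach arithmetic evaluation.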

Namerefs behave the same way, so using a nameref is not much safer
than using `eval'.

$ cat testR2e.sh
function Dummy2 {
local -n namerefScalar=$1
local var=$namerefScalar
}
Dummy2 "$1"
$ bash testR2e.sh 'a[$(echo Gotcha2 >/dev/tty)]'
Gotcha2

Yes, I guess it would be a valid strategy to disallow any use of
`eval' because humans will make mistakes no matter how careful we are.
But, there are still different traps, so anyway we need to carefully
check or sanitize inputs even when we don't use `eval'.

> I guess it would be very complicated, if possible at all, to protect
> the code inside eval against every sort of such attacks.

I think the standard approach is to check the input before passing it
to `eval', and it is not complicated. You can simply check whether the
array name has the expected form:

function is-valid-array-name {
local reg='^[_[:alpha:]][_[:alnum:]]*$'
[[ $1 =~ $reg ]]
}

# Check it inside Dummy
function Dummy {
is-valid-array-name "$1" || return 1
[[ $1 == myArray ]] || eval "local -a myArray=(\"\${$1[@]}\")"
declare -p myArray
}

# Or check it when it receives the array name (I prefer this)
is-valid-array-name "$1" || exit 1
input_data=$1
Dummy "$input_data"
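A quick way to see which inputs the guard above lets through is to run it over a few candidates, including the two injection strings from this thread. Note that the check only validates the *form* of the name; it does not verify that a variable of that name exists or that it is an array.

```bash
#!/bin/bash
function is-valid-array-name {
  local reg='^[_[:alpha:]][_[:alnum:]]*$'
  [[ $1 =~ $reg ]]
}

# Single quotes keep the attack strings inert; they are only tested, never run.
for name in myArray1 _tmp 'myArray1[@]}"); echo Gotcha!; #' \
            'a[$(echo Gotcha2 >/dev/tty)]' '1abc' ''; do
  if is-valid-array-name "$name"; then
    printf 'accepted: %s\n' "$name"
  else
    printf 'rejected: %s\n' "$name"
  fi
done
```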

>> * If you want to use namerefs to eliminate the use of `eval', maybe
>> you could do like the following [...]
>
> However (hoping that I don't get flamed for a dumb question), I
> don't understand why we need inputArray at all in that code.

Sorry, I should have explained it in detail. The step of `inputArray'
is only needed when you want to modify `myArray' locally in the
function `Dummy' keeping the original array unmodified. Without
`inputArray',

$ cat testR2f.sh
function Dummy {
[[ $1 == refArray ]] || local -n refArray=$1
[[ $1 == myArray ]] || local -ia myArray=("${refArray[@]}")
myArray[0]=${myArray[0]%/}
}
myArray=(my/dir/)
declare -p myArray
Dummy myArray
declare -p myArray
$ bash testR2f.sh
declare -a myArray=([0]="my/dir/")
declare -a myArray=([0]="my/dir")

This is because, when the outer array has the same name `myArray', the
function Dummy sees the outer array directly instead of the local
copy.
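The following sketch takes Koichi's earlier nameref-only version (the one with `inputArray') and adds a local modification, to show that the caller's array keeps its values even when it is itself called `myArray' or `inputArray'. The added `myArray[0]+=100' line is purely illustrative; because of `-i', `+=' adds arithmetically.

```bash
#!/bin/bash
function Dummy {
  # Create the nameref only if it would not point at itself.
  [[ $1 == refArray ]] || local -n refArray=$1
  # Copy through an intermediate array, unless the caller's array already
  # has that name (then it is read directly).
  [[ $1 == inputArray ]] || local -i inputArray=("${refArray[@]}")
  local -ia myArray=("${inputArray[@]}")
  myArray[0]+=100          # modifies only the local copy
  declare -p myArray
}

myArray=(1 2 3)
Dummy myArray            # local copy starts with 101
declare -p myArray       # the caller's array is unchanged
```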

> Wouldn't the following function be sufficient?
>
> function Dummy {
> [[ $1 == refArray ]] || local -n refArray=$1
> local -ia myArray=("${refArray[@]}")
> declare -p myArray
> }

No, that function solves the problem of the collision with `refArray'
(the circular reference) but not the problem of the collision with
`myArray' (the problem in your original post).

> Unfortunately, these solutions (while solving the circular reference
> problem) don't solve my original problem.

Have you tried? In the examples in my previous reply, I intended to
solve both problems with older versions of Bash.

> I think I'll stick with my current (extremely ugly, but reliable)
> solution

Yes, I think that is the simplest solution in your case.


Best regards,

Koichi

Binarus

Aug 31, 2020, 2:34:22 AM
to bug-...@gnu.org

On 30.08.2020 16:50, Greg Wooledge wrote:

> The evil thing here is code injection. Obviously eval is one way to
> perform code injection, but it's not the *only* way. Eval itself isn't
> evil; if anything, it's all of the other forms of code injection,
> which people don't suspect, that are truly insidious.
>
> https://mywiki.wooledge.org/CodeInjection
> https://mywiki.wooledge.org/BashWeaknesses
>
> You're trying to do something that you feel should be possible -- passing
> an array to a function by reference. Every other language can do this,
> right? So bash should be able to do this... right? Nope.
>
> Passing variables by reference (especially arrays) is one of the
> major missing features of bash. Everyone wants it. Many, many people
> have attempted it. The sheer insanity of some of the attempts is
> astounding.
>
> https://fvue.nl/wiki/Bash:_Passing_variables_by_reference
>
> That's a slightly older page, but he found an exploit in "unset" which
> does bizarre things when called at different function scope levels, and
> managed to use it to manipulate the existence of variables at various
> function scopes.
>
> If you absolutely *need* to pass a variable by reference, don't use bash.
> That's the best advice I can give you.

You are absolutely right, and I have understood this in the meantime.
Unfortunately, there is a substantial amount of work (and thus, money)
in these scripts, and there is a timeline, so the moment when I could
have dumped bash for Perl or Python passed some time ago.

Hence, I really have to finish these bash scripts, but I have learned my
lesson and in the future won't use bash for anything that is more
complex than a one-liner. Even though bash 5.1 seems to solve my current
problem, I suspect that there are more surprises like this which I just
haven't come across yet.

Binarus

Aug 31, 2020, 3:57:23 AM
to bug-...@gnu.org

On 30.08.2020 20:01, Koichi Murase wrote:
> 2020-08-30 19:24 Binarus <li...@binarus.de>:

> Yes, I recognize the problem when the function isn't properly used.
> But, the use of eval itself is not fatal. When another user can call
> the function as in your example,
>
> Dummy 'myArray1[@]}"); echo Gotcha!; #'
>
> that means that the user can already run arbitrary commands. The user
> could have directly written
>
> echo 'Gotcha!'
>
> The real problems occur when the user writes something like
>
> Dummy "$input_to_program"
>
> with `input_to_program' provided by a third user who should not be
> able to run arbitrary commands, and the input is neither checked nor
> sanitized. In this case, the problem should be avoided by checking or
> sanitizing the input. This check can be made inside the function
> Dummy, but it can also be made at the time the shell receives the
> input.

Thank you very much for the clarification. Of course, you are right. If
I provided the functions in a file which users can source, security
wouldn't be a problem at all (except in cases where users accidentally
make mistakes that lead to the HDD being wiped ...). But if I wrapped
them in a script, provided the interface via command line parameters,
and that script ran SUID, a new class of problems would arise.

Thank you very much for these insights! Actually, I haven't yet thought
systematically about command line interfaces and safety, because these
scripts won't run SUID and don't provide command line interfaces.
However, the eval code injection stared me in the face, so I thought I'd
comment on it. Perhaps I shouldn't have, because I don't actually have
that problem at the moment, and I don't intend to harden these scripts
anyway; as things currently stand, only root will be able to execute
them.

>> I guess it would be very complicated, if possible at all, to protect
>> the code inside eval against every sort of such attacks.
>
> I think the standard way is to check the input before passing it to
> `eval' and is not complicated. You can just check if the array name
> has an expected form:
>
> function is-valid-array-name {
> local reg='^[_[:alpha:]][_[:alnum:]]*$'
> [[ $1 =~ $reg ]]
> }
>
> # Check it inside Dummy
> function Dummy {
> is-valid-array-name "$1" || return 1
> [[ $1 == myArray ]] || eval "local -a myArray=(\"\${$1[@]}\")"
> declare -p myArray
> }
>
> # Or check it when it receives the array name (I prefer this)
> is-valid-array-name "$1" || exit 1
> input_data=$1
> Dummy "$input_data"

Thank you very much. These snippets are nice and clean. I'll file all
your messages and code patterns and use them as reference if I ever
write a bash script again.

>>> * If you want to use namerefs to eliminate the use of `eval', maybe
>>> you could do like the following [...]
>>
>> However (hoping that I don't get flamed for a dumb question), I
>> don't understand why we need inputArray at all in that code.
>
> Sorry, I should have explained it in detail. The step of `inputArray'
> is only needed when you want to modify `myArray' locally in the
> function `Dummy' keeping the original array unmodified. Without
> `inputArray',
>
> $ cat testR2f.sh
> function Dummy {
> [[ $1 == refArray ]] || local -n refArray=$1
> [[ $1 == myArray ]] || local -ia myArray=("${refArray[@]}")
> myArray[0]=${myArray[0]%/}
> }
> myArray=(my/dir/)
> declare -p myArray
> Dummy myArray
> declare -p myArray
> $ bash testR2f.sh
> declare -a myArray=([0]="my/dir/")
> declare -a myArray=([0]="my/dir")
>
> This is because, when the outer array has the same name `myArray', the
> function Dummy sees the outer array directly instead of the local
> copy.

I wrongly assumed that this code was meant for bash 5.1, where we
wouldn't need to work around my original problem, but only around the
circular name reference problem (in bash 5.1, if I got it right, we
could declare myArray without any precondition, so it would be local in
any case). I apologize for this misunderstanding.

>> Wouldn't the following function be sufficient?
>>
>> function Dummy {
>> [[ $1 == refArray ]] || local -n refArray=$1
>> local -ia myArray=("${refArray[@]}")
>> declare -p myArray
>> }
>
> No, that function solves the problem of the collision with `refArray'
> (the circular reference) but not the problem of the collision with
> `myArray' (the problem in your original post).

See previous comment - I assumed that we were talking about bash 5.1.
Sorry for not having clarified this ...

>> Unfortunately, these solutions (while solving the circular reference
>> problem) don't solve my original problem.
>
> Have you tried? In the examples in my previous reply, I intended to
> solve both problems with older versions of Bash.

You are absolutely right. I have been too fast, and the day was too long:

I immediately concentrated on your third example, because I generally
try to avoid eval (although it is not per se evil, as I have learned
now). Then I looked into bash 5.1 and came to the conclusion that my
Dummy from above (three lines in the body without inputArray) is
equivalent to your third example in bash 5.1.

Some hours later, I saw that my Dummy did not solve my original problem
in previous bash versions, but I had forgotten that your example had one
more line :-) Silly me!

In summary, thanks to your help, we now have a clean and understandable
solution for my original problem as well as for the circular reference
problem.

>> I think I'll stick with my current (extremely ugly, but reliable)
>> solution
>
> Yes, I think that is the simplest solution in your case.

This is the only point where I dare to disagree :-) Your solution is so
far superior to this ugly hack that I will begin changing my scripts
immediately after finishing this message. Decorating variables with
numbers is so silly and hard to maintain that it can only be a last
resort; furthermore, it breaks my existing naming conventions.

Thank you very much for sharing your invaluable knowledge and for
putting so much time and effort into helping us!

Best regards,

Binarus

Chet Ramey

Aug 31, 2020, 11:21:37 AM
to Binarus, bug-...@gnu.org, ba...@packages.debian.org, chet....@case.edu
On 8/28/20 4:56 AM, Binarus wrote:

> Bash Version: 4.4
> Patch Level: 12
> Release Status: release
>
>
> Description:
> ------------
>
> Under certain circumstances, assignments of namerefs to local variables
> in functions behaves in a way which makes namerefs completely useless.
> Furthermore, the behavior is not consistent.

There is an order of evaluation problem as explained later in the thread,
not specific to namerefs. It's fixed in bash-5.1.

The underlying issue is making `declare [options] foo=bar' expand the
argument like an assignment statement as POSIX specifies. This makes
`declare' more like a hybrid reserved word instead of a builtin. In some
cases, the options matter and affect how the argument gets expanded. You
end up having to do something like `declare [options] foo; foo=bar' to get
the expansion right, but that introduces several corner cases.
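Following the order-of-evaluation explanation above, another eval-free workaround for releases before 5.1 may be to finish expanding the caller's array before any local `myArray' exists. This variant does not appear in the thread and is only a sketch: it parks the values in the function's own positional parameters with `set --'. It still fails if the caller's array is named `namerefArray' (the circular-reference case), and the `-i' arithmetic-evaluation caveat discussed earlier still applies.

```bash
#!/bin/bash
function Dummy {
  local -n namerefArray="$1"
  # Expand now, while namerefArray still resolves to the caller's array.
  set -- "${namerefArray[@]}"      # affects only this function's "$@"
  # From here on, no nameref is expanded, so a local myArray cannot shadow it.
  local -a -i myArray=("$@")
  declare -p myArray
}

declare -a -i myArray=('1' '2' '3')
Dummy 'myArray'
```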

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ch...@case.edu http://tiswww.cwru.edu/~chet/

Chet Ramey

Aug 31, 2020, 11:24:06 AM
to bug-...@gnu.org, chet....@case.edu
On 8/28/20 11:28 AM, Greg Wooledge wrote:

>
> You've got a local variable with the same name as the global variable
> that you're attempting to pass by reference. This will not work.
>
> Namerefs (declare -n) in bash are *not* like uplevel commands in Tcl.
> They cause the referenced variable name to be evaluated just like any
> other variable would be, starting at the current function scope, then
> going up to the caller, and so on.

Namerefs don't change bash's dynamic variable scoping. The two conflict
and make namerefs less useful than they might be.