array problem: space between words

Luis P. Mendes

Feb 24, 2009, 1:56:06 PM

Hi,

I'm trying to build an array whose elements belong to a text string
separated by the semicolon.

Like this:
texto_campos="VALOR EM VENDAS;valor"

oifs=$IFS
IFS=';'
array2=( $texto_campos ) IFS=$oifs

when I print the elements of the array, like:
for v in ${array2[@]}
do
echo "%% $v"
done

%% VALOR
%% EM
%% VENDAS
%% valor

I'd like it to print:
%% VALOR EM VENDAS
%% valor

The array is built according to ';' but also to space between words.

I tried to put some double quotes around the two strings, but had no
success in achieving what I want.

What am I missing?

Luis

Stephane CHAZELAS

Feb 24, 2009, 2:10:02 PM

2009-02-24, 18:56(+00), Luis P. Mendes:

> Hi,
>
> I'm trying to build an array whose elements belong to a text string
> separated by the semicolon.
>
> Like this:
> texto_campos="VALOR EM VENDAS;valor"
>
> oifs=$IFS
> IFS=';'

you also need "set -f"

> array2=( $texto_campos ) IFS=$oifs
>
> when I print the elements of the array, like:
> for v in ${array2[@]}

for f in "${array2[@]}"

or

IFS=';'
set -f
for f in $texto_campos

> do
> echo "%% $v"
> done
>
> %% VALOR
> %% EM
> %% VENDAS
> %% valor
>
> I'd like it to print:
> %% VALOR EM VENDAS
> %% valor
>
> The array is built according to ';' but also to space between words.
>
> I tried to put some double quotes around the two strings, but had no
> success in achieving what I want.
>
> What am I missing?

The quotes.

Note that arrays are not a standard Unix shell feature.
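
Putting the three remarks together (split on ';' only, "set -f" while splitting, quotes when expanding), a minimal bash sketch could look like this. The variable names are the ones from the original post; restoring with "set +f" assumes filename generation was enabled beforehand.

```bash
texto_campos="VALOR EM VENDAS;valor"

oifs=$IFS                  # remember the current separators
IFS=';'                    # split on semicolons only
set -f                     # no filename generation while splitting
array2=( $texto_campos )   # deliberately unquoted: this is the split
set +f                     # assumption: globbing was enabled before
IFS=$oifs                  # restore the previous separators

# Quoted expansion: one word per element, embedded spaces preserved.
for v in "${array2[@]}"
do
  echo "%% $v"
done
```

With the quotes in place the loop body runs twice, once per array element.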

--
Stéphane

Ed Morton

Feb 24, 2009, 2:12:28 PM

On Feb 24, 12:56 pm, "Luis P. Mendes" <luislupe...@gmailXXX.com>
wrote:

> Hi,
>
> I'm trying to build an array whose elements belong to a text string
> separated by the semicolon.
>
> Like this:
> texto_campos="VALOR EM VENDAS;valor"
>
> oifs=$IFS
> IFS=';'
> array2=( $texto_campos ) IFS=$oifs
>
> when I print the elements of the array, like:
> for v in ${array2[@]}
> do
> echo "%% $v"
> done
>
> %% VALOR
> %% EM
> %% VENDAS
> %% valor
>
> I'd like it to print:
> %% VALOR EM VENDAS
> %% valor

$ for ((v=0;v<${#array2[@]};v++))
do
echo "% % ${array2[v]}"
done
% % VALOR EM VENDAS
% % valor

Stephane CHAZELAS

Feb 24, 2009, 2:21:26 PM

2009-02-24, 11:12(-08), Ed Morton:
[...]

> $ for ((v=0;v<${#array2[@]};v++))
> do
> echo "% % ${array2[v]}"
> done
> % % VALOR EM VENDAS
> % % valor

Note that this is not correct in the general case (except with
zsh in ksh emulation): it doesn't loop through the values of
the array, because bash and ksh arrays are sparse arrays, that
is, more like associative arrays (though the keys are limited to
non-negative integers). You can have an array that has values for
keys 0, 2 and 4 only, for instance:

$ ksh -c 'a[0]=1; a[2]=2; a[4]=3; echo "${#a[@]}"'
3

That loop would output:

% % 1
% %
% % 2

instead of

% % 1
% % 2
% % 3

zsh arrays are real arrays and start at index 1 (to be
consistent with $@) except in ksh emulation.
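
For bash, the difference between counting and looping over the values can be sketched like this. The helper arrays by_count and by_value are only there to make the result easy to inspect; they are not from the thread.

```bash
a=()
a[0]=1; a[2]=2; a[4]=3

count=${#a[@]}        # number of set elements
indices="${!a[*]}"    # the keys actually in use

# Counting loop: visits keys 0, 1 and 2, and never reaches key 4.
by_count=()
for ((v = 0; v < ${#a[@]}; v++)); do
  by_count+=("[${a[v]}]")
done

# Looping over the quoted values visits exactly the set elements.
by_value=()
for x in "${a[@]}"; do
  by_value+=("[$x]")
done

echo "count=$count indices=$indices"
echo "by count: ${by_count[*]}"
echo "by value: ${by_value[*]}"
```

The counting loop picks up an empty slot for key 1 and misses the element at key 4, while the value loop sees all three elements.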

--
Stéphane

David W. Hodgins

Feb 24, 2009, 3:31:12 PM

On Tue, 24 Feb 2009 13:56:06 -0500, Luis P. Mendes <luisl...@gmailxxx.com> wrote:

> for v in ${array2[@]}

> What am I missing?

Quotes around the variable.

for v in "${array2[@]}"

Regards, Dave Hodgins

--
Change nomail.afraid.org to ody.ca to reply by email.
(nomail.afraid.org has been set up specifically for
use in usenet. Feel free to use it yourself.)

Mart Frauenlob

Feb 27, 2009, 5:33:22 AM

To avoid the problem with unset array members (in bash), and to make
sure you catch all defined (even if empty, but not NUL) members, you
need to use this method:

for i in ${!array[@]}; do
echo ${array[i]} ...
done

This `!array[@]' style syntax (I think) was taken from ksh around bash
rel. 2.x?, as I remember reading a change log some day.
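
A small bash sketch of that loop, with the expansions quoted and with one element that is set but empty. The array contents and the helper array "seen" are made up for illustration.

```bash
array=()
array[1]="first"
array[3]=""          # set, but empty
array[7]="last"

seen=()
for i in "${!array[@]}"; do      # iterate over the keys in use
  seen+=("$i=<${array[i]}>")
done
printf '%s\n' "${seen[@]}"
```

The empty element at key 3 is still visited, and the unset keys in between are skipped.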

If you just want to retrieve the last index of an array, which you
expect with ${#array[@]} (but which fails because of an unset index),
you are lost with bash tools, AFAIK.

I wrote my own function to retrieve the last array member's index:

# Echo the last index of the array named by $1 (0 if it has no members).
last_arr_member()
{
set -- $(eval echo \${!${1}[@]})
if ((${#} > 0)); then
echo "${!#}"
else
echo "0"
fi
}

Greets

Mart

Stephane CHAZELAS

Feb 27, 2009, 11:25:08 AM

2009-02-27, 11:33(+01), Mart Frauenlob:
[...]

> If you just want to retrieve the last index of an array, which you
> expect with ${#array[@]} (but which fails because of an unset index),
> you are lost with bash tools, AFAIK.

It's the same problem with ksh.

You'd expect ${!arr[@]: -1}
to return that, but that's not the case.

> I did write my own function to retrieve the last array member:
>
> last_arr_member() # FUNCTION to echo the amount of members an array ($1)
> contains
> {
> set -- $(eval echo \${!${1}[@]})
> if ((${#} > 0)); then
> echo "${!#}"
> else
> echo "0"
> fi
> }

[...]

That returns 0 if there's no member or if there's

No need for the pipe:

last_array_member() {
eval 'set -- -1 "${!'"$1"'[@]}"'
eval 'echo "${'"$#"'}"'
}

Which works in ksh93 as well.

--
Stéphane

Mart Frauenlob

Feb 27, 2009, 8:04:08 PM

Stephane CHAZELAS wrote:
> 2009-02-27, 11:33(+01), Mart Frauenlob:
> [...]
>> If you just want to retrieve the last index of an array, which you
>> expect with ${#array[@]} (but which fails because of an unset index),
>> you are lost with bash tools, AFAIK.
>
> It's the same problem with ksh.
>
> You'd expect ${!arr[@]: -1}
> to return that, but that's not the case.
>
>> I did write my own function to retrieve the last array member:
>>
>> last_arr_member() # FUNCTION to echo the amount of members an array ($1)
>> contains
>> {
>> set -- $(eval echo \${!${1}[@]})
>> if ((${#} > 0)); then
>> echo "${!#}"
>> else
>> echo "0"
>> fi
>> }
> [...]
>
> That returns 0 if there's no member or if there's
>
> No need for the pipe:

hm, I don't think an arithmetic evaluation causes a pipe.

>
> last_array_member() {
> eval 'set -- -1 "${!'"$1"'[@]}"'
> eval 'echo "${'"$#"'}"'
> }
>
> Which works in ksh93 as well.

Nice, but do you really want to return -1 if the array is empty or undefined?

Inspired by your way of doing the first eval, I wrote the following:

last_arr_member()
{
eval 'set -- 0 ${!'${1}'[@]}'
echo "${!#}"
}

Removing my 'echo', and your 'eval', this is the quickest way for bash I
can think of by now (and even quite nice to read imho).

Greets

Mart

Stephane CHAZELAS

Feb 28, 2009, 8:26:24 AM

2009-02-28, 02:04(+01), Mart Frauenlob:

> Stephane CHAZELAS wrote:
>> 2009-02-27, 11:33(+01), Mart Frauenlob:
>> [...]
>>> If you just want to retrieve the last index of an array, which you
>>> expect with ${#array[@]} (but which fails because of an unset index),
>>> you are lost with bash tools, AFAIK.
>>
>> It's the same problem with ksh.
>>
>> You'd expect ${!arr[@]: -1}
>> to return that, but that's not the case.
>>
>>> I did write my own function to retrieve the last array member:
>>>
>>> last_arr_member() # FUNCTION to echo the amount of members an array ($1)
>>> contains
>>> {
>>> set -- $(eval echo \${!${1}[@]})
>>> if ((${#} > 0)); then
>>> echo "${!#}"
>>> else
>>> echo "0"
>>> fi
>>> }
>> [...]
>>
>> That returns 0 if there's no member or if there's

Sorry, that sentence wasn't finished:

"That returns 0 if there's no member as well as if there's one
member at index 0, so you can't differentiate them." Returning
-1 would make more sense, as then the returned value plus one
can be used to append an element to the array.
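
To illustrate the point about -1, here is a bash sketch that uses the echo-based last_array_member function quoted in this thread. The arrays "empty" and "sparse" are made-up test data.

```bash
last_array_member() {
  eval 'set -- -1 "${!'"$1"'[@]}"'   # positional parameters: -1, then the keys
  eval 'echo "${'"$#"'}"'            # print the last positional parameter
}

empty=()
sparse=([0]=a [5]=b)

last_empty=$(last_array_member empty)
last_sparse=$(last_array_member sparse)

# "last index plus one" is a usable slot to append to in both cases.
empty[last_empty+1]="new"
sparse[last_sparse+1]="new"

echo "empty:  last was $last_empty, keys now ${!empty[*]}"
echo "sparse: last was $last_sparse, keys now ${!sparse[*]}"
```

Had the function returned 0 for the empty array, the append would have landed at key 1 and left a hole at key 0.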

>> No need for the pipe:
>
> hm, I don't think an arithmetic evaluation causes a pipe.

But command substitution does (though ksh93 has some
optimisations that avoid the fork and pipe when all the commands
run in the command substitution are builtins).
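
One visible consequence in bash is that whatever runs inside $(...) runs in a subshell environment, so its variable changes are lost. A minimal sketch (the function name "bump" and the variable names are hypothetical):

```bash
counter=0
bump() {
  counter=$((counter + 1))
  echo "$counter"
}

out=$(bump)              # runs in a subshell: the increment is not kept
after_subst=$counter

bump >/dev/null          # runs in the current shell: the increment is kept
after_direct=$counter

echo "output=$out after_subst=$after_subst after_direct=$after_direct"
```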

>>
>> last_array_member() {
>> eval 'set -- -1 "${!'"$1"'[@]}"'
>> eval 'echo "${'"$#"'}"'
>> }
>>
>> Which works in ksh93 as well.
>
> Nice, just do you like to return -1 if the array is empty or undefined?

Yes, see above.

>
> Inspired by your way of doing the first eval, I did wrote the following:
>
> last_arr_member()
> {
> eval 'set -- 0 ${!'${1}'[@]}'
> echo "${!#}"
> }
>
> Removing my 'echo', and your 'eval', this is the quickest way for bash I
> can think of by now (and even quite nice to read imho).

[...]

But then, it ceases to work in ksh and it's dependent on the
value of IFS as you forgot to quote ${1}. Also, that means that
if the caller wants to retrieve that value, it has to do:

value=$(last_arr_member arr)

which means a fork and a pipe (not in ksh93).

You could do:

last_arr_member() {
eval 'set -- -1 ${!'"$1"'[@]}'
eval "REPLY=\${$#}"
}
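
A usage sketch for that REPLY-based variant in bash. The array "arr" and its contents are made up; the function body is the one given in this post.

```bash
last_arr_member() {
  eval 'set -- -1 ${!'"$1"'[@]}'
  eval "REPLY=\${$#}"
}

arr=([2]=x [9]=y)

last_arr_member arr      # no $(...), so no subshell is involved
last=$REPLY              # copy it: REPLY may be overwritten by later calls
arr[last+1]="z"

echo "last=$last keys=${!arr[*]}"
```

Copying $REPLY into a variable straight away avoids surprises if another command (such as read) sets it later.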

Another useful function could be one that returns the first free
element:

first_free_member() {
local i
REPLY=0
eval 'set -- ${!'"$1"'[@]}'
for i do
[ "$REPLY" -eq "$i" ] || break
REPLY=$(($REPLY+1))
done
}
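
A usage sketch for first_free_member in bash, with the loop header spelled out as for i in "$@". The three test arrays are made up; as noted elsewhere in the thread, the function cannot be used on an array that is itself called i.

```bash
first_free_member() {
  local i
  REPLY=0
  eval 'set -- ${!'"$1"'[@]}'        # positional parameters: the keys in use
  for i in "$@"; do
    [ "$REPLY" -eq "$i" ] || break   # first gap found
    REPLY=$(($REPLY+1))
  done
}

gap=([0]=a [1]=b [3]=d)
dense=(x y z)
none=()

first_free_member gap;   free_gap=$REPLY
first_free_member dense; free_dense=$REPLY
first_free_member none;  free_none=$REPLY

echo "gap=$free_gap dense=$free_dense none=$free_none"
```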

--
Stéphane

Mart Frauenlob

Mar 1, 2009, 4:41:25 AM

yep, you're right :)

>>> No need for the pipe:
>> hm, I don't think an arithmetic evaluation causes a pipe.
>
> But command substitution does (though ksh93 has some
> optimisations that fakes the fork and pipe when all the commands
> run in the command subtitution are builtin).
>

I see, didn't know that.
I use command substitution a lot, because I find it very useful.
Guess we could start a long design discussing thread here :)

>>> last_array_member() {
>>> eval 'set -- -1 "${!'"$1"'[@]}"'
>>> eval 'echo "${'"$#"'}"'
>>> }
>>>
>>> Which works in ksh93 as well.
>> Nice, just do you like to return -1 if the array is empty or undefined?
>
> Yes, see above.
>
>> Inspired by your way of doing the first eval, I did wrote the following:
>>
>> last_arr_member()
>> {
>> eval 'set -- 0 ${!'${1}'[@]}'
>> echo "${!#}"
>> }
>>
>> Removing my 'echo', and your 'eval', this is the quickest way for bash I
>> can think of by now (and even quite nice to read imho).
> [...]
>
> But then, it ceases to work in ksh and it's dependent on the
> value of IFS as you forgot to quote ${1}. Also, that means that
> if the caller wants to retrieve that value, it has to do:
>
> value=$(last_arr_member arr)
>
> which means a fork and a pipe (not in ksh93).
>
> You could do:
>
> last_arr_member() {
> eval 'set -- -1 ${!'"$1"'[@]}'
> eval "REPLY=\${$#}"
> }
>

I must admit I don't know ksh at all, but I suspected that it would only
work that way in bash.
Yes, I use command substitution in the calling functions, to do things
like iterating over an array in reverse. I guess I don't care much about
the fork/pipe, even now that I know I'm creating them... :)

It just makes the code shorter and easy to read imho.
Also I don't like to use 'global' variables that much. But as said
before, matter of choice, as both ways work.

To also bring in a possibly useful (bash) example: :)

arr_members_sum() {
eval echo \${\#"${1}"[@]}
}

append_array() # append an arrays content to another array
{
local -i arr_idx
local first_arr="${1}" sec_arr="${2}"
(($(arr_members_sum "${sec_arr}") > 0)) || return 0
local -i arr_height=$(($(last_arr_member "${first_arr}") + 1))

for arr_idx in $(eval echo \${!"${sec_arr}"[@]}); do
eval "${first_arr}"[arr_height++]="\${${sec_arr}[arr_idx]}"
done
}

> Another useful function could be one that returns the first free
> element:
>
> first_free_member() {
> local i
> REPLY=0
> eval 'set -- ${!'"$1"'[@]}'
> for i do
> [ "$REPLY" -eq "$i" ] || break
> REPLY=$(($REPLY+1))
> done
> }
>
>

Sorry to the original poster, if this has gone off topic.

Greets

Mart

Mart Frauenlob

Mar 1, 2009, 5:19:16 AM

Mart Frauenlob wrote:
> To also bring in an eventually useful (bash) example: :)

arr_members_sum() {
eval echo \${\#"${1}"[@]}
}

last_arr_member() {
eval 'set -- -1 ${!'${1}'[@]}'
echo "${!#}"
}

> Sorry to the original poster, if this has gone off topic.
>

Don't want to spam, but I just looked through my functions, and thought
that one would fit much better into the topic about empty indexes in
arrays. And it's just a one liner.

check_array_empty_indicies()
{
(($(($(last_arr_member "${1}") + 1)) == $(arr_members_sum "${1}")))
}

Greets

Mart

Stephane CHAZELAS

Mar 1, 2009, 6:03:11 AM

2009-03-01, 10:41(+01), Mart Frauenlob:
[...]

>> But command substitution does (though ksh93 has some
>> optimisations that fakes the fork and pipe when all the commands
>> run in the command subtitution are builtin).
>>
>
> I see, didn't know that.
> I use command substitution a lot, because I find it very useful.
> Guess we could start a long design discussing thread here :)

Yes, that's how it's meant to be used: return the output of a
command (though there's a misfeature in it in that it removes
all the trailing newline characters from that output instead of
just the last one which makes it unreliable in the general
case).
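
The trailing-newline behaviour, and one common way around it, can be sketched as follows. The guard character "." is an arbitrary choice.

```bash
raw=$(printf 'text\n\n\n')             # all trailing newlines are removed
echo "length without guard: ${#raw}"

guarded=$(printf 'text\n\n\n'; echo .) # a final "." protects the newlines
guarded=${guarded%.}                   # strip the guard again
echo "length with guard: ${#guarded}"
```

The first variable holds only the four characters of "text"; the second keeps the three newlines as well.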

The only problem is with performance. That becomes really a
problem when you do that kind of things several times, which
shouldn't happen when you write a script with a shell script
approach, but then using arrays generally means that's not what
you're doing.

[...]

>> last_arr_member() {
>> eval 'set -- -1 ${!'"$1"'[@]}'
>> eval "REPLY=\${$#}"
>> }
>>
>
> I must admit, to not know ksh at all, but I thought so, that it'll only
> work that way in bash.

Note that bash is mostly an unfinished clone of ksh (with some
interactive features from csh), so what can be said of bash can
generally be said as well of ksh.

[...]

> It just makes the code shorter and easy to read imho.
> Also I don't like to use 'global' variables that much. But as said
> before, matter of choice, as both ways work.

Note that $REPLY is already used for things like "read" (and
more in zsh for instance).

As we're returning an integer, a more obvious solution could
have been:

f() {
...
return "$value"
}

But unfortunately, in bash and AT&T ksh (and with them only,
ash, pdksh or zsh are OK, though ash only allows positive
values), it returns only the lower 8 bits of it.

~$ ksh -c 'f() { return "$1"; }; f 1234; echo $?'
210
~$ bash -c 'f() { return "$1"; }; f 1234; echo $?'
210
~$ pdksh -c 'f() { return "$1"; }; f 1234; echo $?'
1234
~$ zsh -c 'f() { return "$1"; }; f 1234; echo $?'
1234
~$ ash -c 'f() { return "$1"; }; f 1234; echo $?'
1234

Though of course:

~$ zsh -c 'f() { return "$1"; }; (f 1234); echo $?'
210

as then it becomes the exit status of the subshell which is on 8
bits.

> To also bring in an eventually useful (bash) example: :)
>
> arr_members_sum() {
> eval echo \${\#"${1}"[@]}
> }

Again, there's an unquoted parameter expansion (so depends on
$IFS)

arr_members_sum() {
eval "echo \"\${#${1}[@]}\""
# or eval 'echo "${#'"$1"'[@]}"'
}

Which also works in ksh and zsh.

> append_array() # append an arrays content to another array
> {
> local -i arr_idx
> local first_arr="${1}" sec_arr="${2}"
> (($(arr_members_sum "${sec_arr}") > 0)) || return 0
> local -i arr_height=$(($(last_arr_member "${first_arr}") + 1))
>
> for arr_idx in $(eval echo \${!"${sec_arr}"[@]}); do
> eval "${first_arr}"[arr_height++]="\${${sec_arr}[arr_idx]}"
> done
> }

[...]

Again, a lot of which is dependent on the current value of $IFS.

$IFS was invented for shells that didn't have arrays (to have
some sort of list support via the splitting of string
variables), so there's no point using it when dealing with
arrays.

Also note that it doesn't work for arrays whose names are used
as local variables in that function.

With recent versions of bash and ksh93, you can do:

array1+=("${array2[@]}")

which also works with zsh. Otherwise:

append_array() {
local i n a b
a=$1 b=$2
eval 'set -- -1 "${!'"$a"'[@]}"'
eval '
n=${'"$#"'}
for i in "${'"$b"'[@]}"; do
n=$(($n + 1))
'"$a"'[$n]=$i
done
'
}

(which again doesn't work for arrays called i, n, a, b).

In all those approaches /holes/ in the second array are not
preserved.
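
A bash sketch of what "holes are not preserved" means for the += form, followed by a hypothetical key-by-key copy that keeps the gaps by shifting every key by an offset. The offset approach is an illustration only, not something proposed in the thread, and it assumes the target array itself has no holes.

```bash
array1=(x)
array2=([0]=a [5]=b)

array1+=("${array2[@]}")         # values are appended back to back
echo "keys after +=: ${!array1[*]}"

merged=(x)
offset=${#merged[@]}             # assumption: merged has no holes
for k in "${!array2[@]}"; do
  merged[offset+k]=${array2[k]}  # keep the distance between the keys
done
echo "keys after offset copy: ${!merged[*]}"
```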

As ksh and bash arrays are associative arrays, there's all the
usual associative array functions that one may want to implement
(like merge arrays, intersections...)

Now, reading that kind of code, one may rightly choose to stay
away from arrays in shells.

--
Stéphane

Stephane CHAZELAS

Mar 1, 2009, 6:59:21 AM

2009-02-27, 11:33(+01), Mart Frauenlob:
[...]

> for i in ${!array[@]}; do
> echo ${array[i]} ...
> done
>
> This `!array[@]' style syntax (I think) was taken from ksh around bash
> rel. 2.x?, as I remember reading a change log some day.

[...]

That was introduced in bash 3. Before that, even for something
as simple as copying an array, you had to do convoluted stuff to
parse the output of declare -a.

--
Stéphane

hpt

Mar 3, 2009, 2:33:25 AM

On Feb 25, 3:10 am, Stephane CHAZELAS <stephane_chaze...@yahoo.fr>
wrote:

> 2009-02-24, 18:56(+00), Luis P. Mendes:
>
> > Hi,
>
> > I'm trying to build an array whose elements belong to a text string
> > separated by the semicolon.
>
> > Like this:
> > texto_campos="VALOR EM VENDAS;valor"
>
> > oifs=$IFS
> > IFS=';'
>
> you also need "set -f"
>
> > array2=( $texto_campos ) IFS=$oifs
>
> > when I print the elements of the array, like:
> > for v in ${array2[@]}
>
> for f in "${array2[@]}"
>
> or
>
> IFS=';'
> set -f
> for f in $texto_campos

Why use the "set -f" ?

hpt

Mar 3, 2009, 3:47:01 AM

On Mar 1, 7:03 pm, Stephane CHAZELAS <stephane_chaze...@yahoo.fr>
wrote:

Why does unquoted parameter expansion depend on $IFS?

>
> arr_members_sum() {
> eval "echo \"\${#${1}[@]}\""
> # or eval 'echo "${#'"$1"'[@]}"'
>
> }
>
> Which also works in ksh and zsh.
>
> > append_array() # append an arrays content to another array
> > {
> > local -i arr_idx
> > local first_arr="${1}" sec_arr="${2}"
> > (($(arr_members_sum "${sec_arr}") > 0)) || return 0
> > local -i arr_height=$(($(last_arr_member "${first_arr}") + 1))
>
> > for arr_idx in $(eval echo \${!"${sec_arr}"[@]}); do
> > eval "${first_arr}"[arr_height++]="\${${sec_arr}[arr_idx]}"
> > done
> > }
>
> [...]
>
> Again, a lot of which is dependent on the current value of $IFS.
>

> $IFS was invented for shells that didn't have arrays (to have

pk

Mar 3, 2009, 4:53:58 AM

On Tuesday 3 March 2009 08:33, hpt wrote:

>> IFS=';'
>> set -f
>> for f in $texto_campos
> Why use the "set -f" ?

Because otherwise the shell will (try to) do pathname expansion, for example
if $texto_campos contains characters like *, ? etc.
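
A bash sketch of the difference, using a scratch directory created with mktemp -d (assumed to be available and writable). The file names and the test string are made up.

```bash
dir=$(mktemp -d)
touch "$dir/one" "$dir/two"
texto="first;$dir/*;last"

oifs=$IFS
IFS=';'
globbed=( $texto )     # the "*" is expanded to the matching file names
set -f
literal=( $texto )     # the "*" stays as it is
set +f                 # assumption: globbing was enabled before
IFS=$oifs

echo "without set -f: ${#globbed[@]} words"
echo "with set -f:    ${#literal[@]} words"
rm -r "$dir"
```

Without set -f the middle field turns into one word per matching file, so the array ends up with more elements than there are fields.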

Stephane CHAZELAS

Mar 3, 2009, 1:42:15 PM

2009-03-3, 00:47(-08), hpt:
[...]

>> > arr_members_sum() {
>> > eval echo \${\#"${1}"[@]}
>> > }
>>
>> Again, there's an unquoted parameter expansion (so depends on
>> $IFS)
> Why unquoted parameter expansion depends on $IFS?

[...]

Because leaving a variable unquoted *is* the splitting operator
in POSIX shells, or more exactly, unquoted parameter expansion
(or command substitution and more in some shells) are subject to
word splitting and filename generation (in contexts where that
makes sense)

To perform word splitting, you set $IFS to the list of
separators (terminators in some shells), disable filename
generation (set -f) unless you also want it to be performed.

For instance, to split the output of ls -t into a list of files
to pass to the cat command:

IFS='
'
set -f
cat -- $(ls -t)

Or

ls_output=$(ls -t)
IFS='
'
set -f
cat -- $ls_output

Or to loop through the elements of $PATH (note that the behavior
varies across shells if $PATH ends in a : character):

IFS=:
set -f
for i in $PATH
do something with "$i"
done

That's confusing for a lot of people and leads to plenty of
problems when people forget that. See the rc and zsh (when not
in POSIX mode) shells for a better design.
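
Since the IFS and set -f settings above stay in effect for the rest of the script, one way to contain them in bash is a small helper function. This is only a sketch: split_list and REPLY_LIST are hypothetical names, and the sample list is made up.

```bash
split_list() {                 # usage: split_list SEPARATOR STRING
  local IFS=$1                 # the separator applies inside this function only
  local restore_glob=
  case $- in
    *f*) ;;                    # noglob was already on: leave it on
    *) restore_glob=yes ;;
  esac
  set -f
  REPLY_LIST=( $2 )            # deliberately unquoted: split, no globbing
  if [ -n "$restore_glob" ]; then set +f; fi
}

split_list : "/usr/bin:/opt/my tools/bin:/tmp/*"
for entry in "${REPLY_LIST[@]}"; do
  echo "entry: $entry"
done
```

The caller's IFS and globbing settings are the same after the call as before it.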

--
Stéphane