equivalent of Linux readlink -f in pure bash?

Jon Seymour

Aug 8, 2011, 9:44:43 PM
to bug-...@gnu.org
Has anyone ever come across an equivalent to Linux's readlink -f that
is implemented purely in bash?

(I need readlink's function on AIX where it doesn't seem to be available).

jon.

Bob Proulx

Aug 8, 2011, 10:49:54 PM
to Jon Seymour, bug-...@gnu.org

Try this:

ls -l /path/to/some/link | awk '{print$NF}'

Sure, it doesn't handle whitespace in filenames, but what classic AIX
Unix symlink would have whitespace in it? :-)

Bob

Jon Seymour

Aug 8, 2011, 11:24:54 PM
to Jon Seymour, bug-...@gnu.org

readlink -f will fully resolve links in the path itself (rather than
just the link at the end of the path), which was the behaviour I needed.

It seems cd -P does most of what I need for directories and so
handling things other than directories is a small tweak on that.

Anyway, thanks for that!

jon.
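Here is a minimal sketch of the `cd -P` approach Jon mentions. Changing into a directory with `cd -P` and then printing `pwd -P` resolves every symlinked component of a directory path. The function name `phys_dir` and the demo tree are hypothetical. The body runs in a subshell, so the caller's working directory is left unchanged.

```bash
#!/usr/bin/env bash
# phys_dir (hypothetical name): print the physical, symlink-free form of a
# directory path. The ( ... ) body runs in a subshell, so the caller's
# current directory is not changed.
phys_dir() (
  cd -P -- "$1" && pwd -P
)

# Demo on a throwaway tree: "alias" is a relative symlink to "real".
tmp=$(mktemp -d)
mkdir -p "$tmp/real/sub"
ln -s real "$tmp/alias"
phys_dir "$tmp/alias/sub"   # prints the physical path ending in /real/sub
rm -rf "$tmp"
```

This only covers directories. For a file, you resolve the parent directory this way and then check the final component separately.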

Bob Proulx

Aug 8, 2011, 11:36:43 PM
to Jon Seymour, bug-...@gnu.org
Jon Seymour wrote:
> readlink -f will fully resolve links in the path itself (rather than
> just the link at the end of the path), which was the behaviour I needed.

Ah, yes, well, as you could tell that was just a partial solution
anyway.

> It seems cd -P does most of what I need for directories and so
> handling things other than directories is a small tweak on that.

You might try cd'ing there and then using pwd -P to get the canonical
directory name. I am thinking something like this:

#!/bin/sh
p="$1"
dir=$(dirname "$p")
base=$(basename "$p")
physdir=$(cd "$dir"; pwd -P)
realpath=$(cd "$dir"; ls -l "$base" | awk '{print$NF}')
echo "$physdir/$realpath" | sed 's|//*|/|g'
exit 0

Again, another very quick and partial solution. But perhaps something
good enough just the same.

Bob

Jon Seymour

Aug 9, 2011, 12:09:06 AM
to Jon Seymour, bug-...@gnu.org

I always use sed for this purpose, so:

$(cd "$dir"; ls -l "$base" | sed "s/.*->//")

But, with pathological linking structures, this isn't quite enough -
particularly if the target of the link itself contains paths, some of
which may contain links :-)

jon.

Bob Proulx

Aug 9, 2011, 12:14:46 AM
to Jon Seymour, bug-...@gnu.org
Jon Seymour wrote:
> I always use sed for this purpose, so:
>
> $(cd "$dir"; ls -l "$base" | sed "s/.*->//")
>
> But, with pathological linking structures, this isn't quite enough -
> particularly if the target of the link itself contains paths, some of
> which may contain links :-)

Agreed! Symlinks with arbitrary data, such as holding small shopping
lists in the target value, are so much fun. I am more concerned that
arbitrary data such as "->" might exist in there more so than
whitespace. That is why I usually don't use a pattern expression.
But I agree it is another way to go. But it is easier to say
whitespace is bad in filenames than to say whitespace is bad and oh
yes you can't have "->" in there either. :-)

Bob

Jon Seymour

Aug 9, 2011, 12:36:52 AM
to Jon Seymour, bug-...@gnu.org

Ok, I think this does it...

readlink_f()
{
    local path="$1"
    test -z "$path" && echo "usage: readlink_f path" 1>&2 && exit 1;

    local dir

    if test -L "$path"
    then
        local link=$(ls -l "$path" | sed "s/.*-> //")
        if test "$link" = "${link#/}"
        then
            # relative link
            dir="$(dirname "$path")"
            readlink_f "${dir%/}/$link"
        else
            # absolute link
            readlink_f "$link"
        fi
    elif test -d "$path"
    then
        (cd "$path"; pwd -P) # normalize it
    else
        dir="$(cd $(dirname "$path"); pwd -P)"
        base="$(basename "$path")"
        echo "${dir%/}/${base}"
    fi
}

Jon Seymour

Aug 9, 2011, 12:41:44 AM
to Jon Seymour, bug-...@gnu.org
On Tue, Aug 9, 2011 at 2:36 PM, Jon Seymour <jon.s...@gmail.com> wrote:
> On Tue, Aug 9, 2011 at 2:14 PM, Bob Proulx <b...@proulx.com> wrote:
>> Jon Seymour wrote:
>>> I always use sed for this purpose, so:
>>>
>>>    $(cd "$dir"; ls -l "$base" | sed "s/.*->//")
>>>
>>> But, with pathological linking structures, this isn't quite enough -
>>> particularly if the target of the link itself contains paths, some of
>>> which may contain links :-)
>>
>> Agreed!  Symlinks with arbitrary data, such as holding small shopping
>> lists in the target value, are so much fun.  I am more concerned that
>> arbitrary data such as "->" might exist in there more so than
>> whitespace.  That is why I usually don't use a pattern expression.
>> But I agree it is another way to go.  But it is easier to say
>> whitespace is bad in filenames than to say whitespace is bad and oh
>> yes you can't have "->" in there either.  :-)
>>
>
> Ok, I think this does it...
>
> readlink_f()
> {
> ...
> }

And I make no claims whatsoever about whether this is vulnerable to
infinite recursion!

jon.
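Jon's recursive readlink_f has no protection against link cycles. One way to add it is to pass a depth budget down through the recursion. The sketch below keeps his structure but is not his code. The following are all editorial choices:

- the name `readlink_f_limited`
- the default limit of 40
- the `case` test for absolute targets
- `--` before path arguments
- `return` instead of `exit`, so that an error does not terminate the calling shell

It still parses `ls -ld` output, so a link whose own name contains " -> " will be misread.

```bash
#!/usr/bin/env bash
# readlink_f_limited PATH [DEPTH]  (hypothetical name)
# Follows symlinks like readlink_f above, but gives up after DEPTH hops
# (default 40, an arbitrary limit) instead of recursing forever.
readlink_f_limited() {
  local path=$1 depth=${2:-40} link dir
  if [ -z "$path" ]; then
    echo "usage: readlink_f_limited path [depth]" >&2
    return 1
  fi
  if [ "$depth" -le 0 ]; then
    echo "readlink_f_limited: too many levels of symbolic links: $path" >&2
    return 1
  fi

  if [ -L "$path" ]; then
    # Extract the target from "... name -> target" (shortest match).
    link=$(ls -ld -- "$path") || return 1
    link=${link#* -> }
    case $link in
      /*) readlink_f_limited "$link" $((depth - 1)) ;;     # absolute target
      *)  dir=$(dirname -- "$path")                         # relative target:
          readlink_f_limited "${dir%/}/$link" $((depth - 1)) ;;  # from link's dir
    esac
  elif [ -d "$path" ]; then
    (cd -P -- "$path" && pwd -P)   # normalise a directory
  else
    # Resolve the parent directory physically, then re-attach the name.
    dir=$(cd -P -- "$(dirname -- "$path")" && pwd -P) || return 1
    printf '%s\n' "${dir%/}/$(basename -- "$path")"
  fi
}
```

With a cycle such as `x -> y -> x`, the function prints a message on stderr and returns 1 once the budget runs out.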

Bob Proulx

Aug 9, 2011, 12:51:26 AM
to Jon Seymour, bug-...@gnu.org
Jon Seymour wrote:
> readlink_f()
> {
> local path="$1"
> test -z "$path" && echo "usage: readlink_f path" 1>&2 && exit 1;

An extra ';' there that doesn't hurt but isn't needed.

> local dir
>
> if test -L "$path"
> then
> local link=$(ls -l "$path" | sed "s/.*-> //")

I would be inclined to also require the space before the "->", just
because it is slightly more paranoid:

local link=$(ls -l "$path" | sed "s/.* -> //")

> if test "$link" = "${link#/}"
> then
> # relative link
> dir="$(dirname "$path")"

As an aside, you don't need to quote the right-hand side of an
assignment: the shell performs no word splitting there. It is okay to
assign without quotes here, and I think it reads slightly better without.

dir=$(dirname "$path")

> readlink_f "${dir%/}/$link"
> else
> # absolute link
> readlink_f "$link"
> fi
> elif test -d "$path"
> then
> (cd "$path"; pwd -P) # normalize it
> else
> dir="$(cd $(dirname "$path"); pwd -P)"
> base="$(basename "$path")"

Same comment here about over-quoting. If nothing else it means that
syntax highlighting is different.

dir=$(cd $(dirname "$path"); pwd -P)
base=$(basename "$path")

> echo "${dir%/}/${base}"
> fi
> }

And of course those are just suggestions and nothing more. Feel free
to ignore.

Note that there is a recent movement to change that dash greater-than
combination into a true Unicode arrow graphic emitted by 'ls'. I think
things paused when there were several different bike-shed suggestions
about which Unicode arrow symbol people wanted there. I haven't seen
any actual movement for a while and I think that is a good thing.

Bob
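Here is a small demonstration of Bob's point about assignments, as a sketch with a made-up path:

- The right-hand side of an assignment is not word-split, so an unquoted `$(...)` there keeps its spaces.
- The same expansion used as a command argument is split.
- The quotes around `"$path"` inside the substitution are still needed, because that is an ordinary command argument.

```bash
#!/usr/bin/env bash
path='/tmp/my dir/file'           # made-up path containing a space

dir=$(dirname "$path")            # assignment: no word splitting happens here
printf '[%s]\n' "$dir"            # prints [/tmp/my dir]

# The same unquoted expansion as a command argument is split into two words.
printf '[%s]\n' $(dirname "$path")   # prints [/tmp/my] and then [dir]
```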

Jon Seymour

Aug 9, 2011, 1:12:34 AM
to Jon Seymour, bug-...@gnu.org

Tips appreciated, thanks.

jon.

Andreas Schwab

Aug 9, 2011, 4:56:26 AM
to Jon Seymour, bug-...@gnu.org
Bob Proulx <b...@proulx.com> writes:

> Same comment here about over-quoting. If nothing else it means that
> syntax highlighting is different.
>
> dir=$(cd $(dirname "$path"); pwd -P)

You are missing a pair of quotes here. :-)

Andreas.
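Here is Andreas's point, sketched. In `dir=$(cd $(dirname "$path"); pwd -P)` the outer assignment is safe, but the inner `$(dirname ...)` is an argument to `cd` and is therefore word-split. The directory names in the demo are made up. The demo also uses `&&` rather than `;`, an editorial change so that a failed `cd` does not print the current directory instead.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir "$tmp/with space"
path="$tmp/with space/file"

# Unquoted inner substitution: cd sees ".../with" and "space" and fails.
bad=$(cd $(dirname "$path") 2>/dev/null && pwd -P) || echo 'unquoted form failed'

# Quoted at both levels: cd gets a single argument.
good=$(cd "$(dirname "$path")" && pwd -P)
printf '%s\n' "$good"             # physical path ending in "/with space"

rm -rf "$tmp"
```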

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Stephane CHAZELAS

Aug 9, 2011, 5:24:40 AM
to
2011-08-9, 11:44(+10), Jon Seymour:

> Has anyone ever come across an equivalent to Linux's readlink -f that
> is implemented purely in bash?
>
> (I need readlink's function on AIX where it doesn't seem to be available).
[...]

What about:

readlink_f() (
  link=$1 max_iterations=40
  while [ "$max_iterations" -gt 0 ]; do
    max_iterations=$(($max_iterations - 1))
    dir=$(dirname -- "$link") || exit
    base=$(basename -- "$link") || exit
    dir=$(cd -P -- "$dir" && pwd -P) || exit
    link=${dir%/}/$base
    if [ ! -L "$link" ]; then
      printf '%s\n' "$link"
      exit
    fi
    link=$(ls -ld -- "$link") || exit
    link=${link#* -> }
  done
  printf >&2 'Loop detected\n'
  exit 1
)

--
Stephane

Bernd Eggink

Aug 9, 2011, 5:29:00 AM
to bug-...@gnu.org
On 09.08.2011 03:44, Jon Seymour wrote:
> Has anyone ever come across an equivalent to Linux's readlink -f that
> is implemented purely in bash?

You can find my version here:

http://sudrala.de/en_d/shell-getlink.html

As it contains some corrections from Greg Wooledge, it should handle
even pathological situations. ;)

Bernd

--
http://sudrala.de

Stephane CHAZELAS

Aug 9, 2011, 6:02:28 AM
to
2011-08-09, 11:29(+02), Bernd Eggink:
[...]

> function getlink # ([-l] path)

Why use the ksh syntax instead of the standard one?

> {
> # Path of the file a symbolic link is pointing to.
> # -l: follow link chain, print last target
> # no option: print 1st target unchanged
>
> typeset dir file last link opt oldPWD=$PWD ret=0
>
> (( OPTIND = 1 ))
>
> while getopts "l" opt
> do
> case $opt in
> (l) last=1
> ;;
> esac
> done
>
> shift $(( OPTIND - 1 ))
> file=$1
>
> if [[ $last ]] # last link
> then
> while true
> do
> dir=$(dirname "$file")

What if $file starts with "-"?

> [[ ! -d $dir ]] &&
> {
> ret=1
> break
> }
>
> # remove slashes at end
>
> while [[ $file == */ ]]
> do
> file=${file%/}
> done

What if $file is "/"?

> file=${file##*/} # file name
> command cd -P "$dir"

What if that command fails?

> [[ ! -h $file ]] && break

What if there are symlinks in the path components?

> link=$(command ls -l -- "$file"; printf x)

Good point about command substitution discarding trailing
newlines. I forgot that in the solution I gave.

> link=${link%$'\nx'}
> remove="$file -> "
> file=${link#*"$remove"}
> done
>
> printf "%s\n" "$PWD/$file"
> command cd $oldPWD

What if $oldPWD has blanks or wildcards?


> elif [[ ! -h $file ]] # 1st link
> then
> printf "%s\n" "$file"
> else
> link=$(ls -l "$file")
> printf "%s\n" "${link##*-> }"

What about trailing newlines here, and links with "-> " in their
path?

> fi
>
> return $ret

What if $IFS contains 0 or 1?

> }
>
> declare -fc getlink

What's the -c about?

--
Stephane
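Two of Stephane's questions above can be checked directly. This sketch, with a made-up directory name, shows two things:

- `--` lets `dirname` and `basename` accept a path that starts with "-".
- An unquoted `cd $oldPWD` breaks on a directory name containing a blank, while `cd "$oldPWD"` works.

```bash
#!/usr/bin/env bash
# 1. "--" ends option parsing, so a leading "-" is treated as part of the path.
dirname -- '-x/y'        # prints -x
basename -- '-x/y'       # prints y

# 2. Quoting $oldPWD prevents word splitting (and globbing) of a saved directory.
tmp=$(mktemp -d)
mkdir "$tmp/a b"
oldPWD="$tmp/a b"
cd / && cd "$oldPWD" && pwd                          # works
cd / && { cd $oldPWD 2>/dev/null || echo 'unquoted cd failed'; }
cd / && rm -rf "$tmp"
```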

Stephane CHAZELAS

Aug 9, 2011, 6:10:48 AM
to
2011-08-9, 09:24(+00), Stephane CHAZELAS:

Sorry, it's wrong if there are relative paths in symlinks (or
trailing newlines).

fixed_cmd_subst() {
  eval '
    '"$1"'=$('"$2"'; ret=$?; echo .; exit "$ret")
    set -- "$1" "$?"
    '"$1"'=${'"$1"'%??}
  '
  return "$2"
}

readlink_f() (
  link=$1 max_iterations=40
  while [ "$max_iterations" -gt 0 ]; do
    max_iterations=$(($max_iterations - 1))

    fixed_cmd_subst dir 'dirname -- "$link"' || exit
    fixed_cmd_subst base 'basename -- "$link"' || exit
    cd -P -- "$dir" || exit
    link=${PWD%/}/$base

    if [ ! -L "$link" ]; then
      printf '%s\n' "$link"
      exit
    fi

    fixed_cmd_subst link 'ls -ld -- "$link"' || exit
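This isolates the trailing-newline problem Stephane mentions. `$(...)` removes all trailing newlines from the captured output, so a name ending in a newline is silently changed. The sketch below uses a made-up sample name and the same idea as his `fixed_cmd_subst`:

1. Append a sentinel character inside the substitution.
2. Strip the sentinel plus the single newline the command itself printed.

```bash
#!/usr/bin/env bash
name=$'file\n'                          # a (legal) file name ending in a newline

plain=$(basename -- "$name")            # every trailing newline is stripped
safe=$(basename -- "$name"; printf .)   # captures "file\n\n.": sentinel protects it
safe=${safe%??}                         # drop "." and basename's own newline

printf '%q\n' "$plain" "$safe"          # shows the two results unambiguously
```

This short form loses the command's exit status. Stephane's helper additionally saves `$?` inside the substitution and returns it.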

Steven W. Orr

Aug 9, 2011, 9:50:49 AM
to bug-...@gnu.org
On 8/9/2011 5:29 AM, Bernd Eggink wrote:
> On 09.08.2011 03:44, Jon Seymour wrote:
>> Has anyone ever come across an equivalent to Linux's readlink -f that
>> is implemented purely in bash?
>
> You can find my version here:
>
> http://sudrala.de/en_d/shell-getlink.html
>
> As it contains some corrections from Greg Wooledge, it should handle even
> pathological situations. ;)
>
> Bernd
>

I'd just like to make a couple of suggestions for your script (I hope these
are welcome):

*) You reset OPTIND to 1 but you didn't declare it local. This will cause any
caller of getlink which uses getopts to reset its variable to 1. (I mention
this because it cost me a couple of hours a while back.)

When calling getopts, especially from a function that is not intended
to be used at the top level for processing command line options, you
should declare local copies of OPTIND, OPTARG and OPTERR.

*) To remove the trailing slashes, instead of

while [[ $file == */ ]]
do
file=${file%/}
done

file=${file##*/} # file name

just say
file="${file%${file##*[!/]}}"

*) Instead of

[[ ! -d $dir ]] &&
{
ret=1
break
}

how about this for slightly cleaner?

[[ -d $dir ]] ||
{
ret=1
break
}

--
Time flies like the wind. Fruit flies like a banana. Stranger things have .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net

Jon Seymour

Aug 9, 2011, 10:50:01 AM
to Bernd Eggink, bug-...@gnu.org
On Tue, Aug 9, 2011 at 7:29 PM, Bernd Eggink <mon...@sudrala.de> wrote:
> On 09.08.2011 03:44, Jon Seymour wrote:
>>
>> Has anyone ever come across an equivalent to Linux's readlink -f that
>> is implemented purely in bash?
>
> You can find my version here:
>
>        http://sudrala.de/en_d/shell-getlink.html
>
> As it contains some corrections from Greg Wooledge, it should handle even
> pathological situations. ;)
>
> Bernd
>

Thanks for that. ${link##*-> } is a neater way to extract the link.

It does seem that a link created like so: ln -sf "a -> b" c is going to
create problems for both your script and mine [ not that I actually
care about such a perverse case :-) ]

jon.
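Here is Jon's perverse case, sketched with a simulated `ls -l` line so the result does not depend on a particular `ls`. The owner, size and date fields are made up.

- The shortest-match `${line#* -> }` from Stephane's version keeps the target "a -> b" intact.
- The longest-match `${line##*-> }` returns only "b".

Neither is fully safe: the shortest match fails instead when the link's own name contains " -> ".

```bash
#!/usr/bin/env bash
# What `ls -l c` might print after: ln -s "a -> b" c
line='lrwxrwxrwx 1 jon staff 6 Aug  9 10:50 c -> a -> b'

shortest=${line#* -> }   # remove up to the FIRST " -> "
longest=${line##*-> }    # remove up to the LAST "-> "
printf '%s\n' "$shortest" "$longest"   # prints "a -> b" then "b"
```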

Stephane CHAZELAS

Aug 9, 2011, 10:54:43 AM
to
2011-08-09, 09:50(-04), Steven W. Orr:
[...]

> *) To remove the trailing slashes, instead of
>
> while [[ $file == */ ]]
> do
> file=${file%/}
> done
>
> file=${file##*/} # file name
>
> just say
> file="${file%${file##*[!/]}}"
[...]

file=${file%"${file##*[!/]}"}

Same problem with "/" being changed to "" though.

--
Stephane
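Here is one way to handle the "/" case raised here. Strip the trailing run of slashes only when the name contains a non-slash character, and map an all-slash name to "/". The function name is hypothetical. As Stephane suggests, the inner expansion is quoted, so it is removed literally rather than treated as a pattern. POSIX leaves a leading "//" implementation-defined; this sketch simply treats it as "/".

```bash
#!/usr/bin/env bash
# strip_trailing_slashes NAME  (hypothetical helper)
strip_trailing_slashes() {
  local f=$1
  case $f in
    *[!/]*) f=${f%"${f##*[!/]}"} ;;   # has a non-slash: drop the trailing run
    ?*)     f=/ ;;                     # only slashes: keep the root
  esac
  printf '%s\n' "$f"
}

strip_trailing_slashes 'a/b//'   # prints a/b
strip_trailing_slashes /         # prints /
```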

Bernd Eggink

Aug 10, 2011, 6:00:22 AM
to bug-...@gnu.org
On 09.08.2011 15:50, Steven W. Orr wrote:

> On 8/9/2011 5:29 AM, Bernd Eggink wrote:
>> On 09.08.2011 03:44, Jon Seymour wrote:
>>> Has anyone ever come across an equivalent to Linux's readlink -f
>>> that is implemented purely in bash?
>>
>> You can find my version here:
>>
>> http://sudrala.de/en_d/shell-getlink.html
>>
>> As it contains some corrections from Greg Wooledge, it should
>> handle even pathological situations. ;)
>>
>> Bernd
>>
>
> I'd just like to make a couple of suggestions for your script (I hope
> these are welcome):

You are welcome!

> *) You reset OPTIND to 1 but you didn't declare it local. This will
> cause any caller of getlink which uses getopts to reset its variable
> to 1. (I mention this because it cost me a couple of hours a while
> back.)

The reason I didn't declare OPTIND local is that OPTIND is handled
specially by the shell; there is always exactly _one_ instance of this
variable. In other words, OPTIND is always global, even if declared
local (which is indeed pretty weird). Try this:

-------------------------------
function f
{
local OPTIND=1

echo "\$1=$1"
}

while getopts "abcdefg" opt
do
echo "opt=$opt"
f $opt
done
--------------------------------

Calling the script like this works fine:
script -a -b -c

But calling it like this leads to an endless loop:
script -abc

One could of course save and restore the original:

-------------------------------
function f
{
local oldind=$OPTIND

OPTIND=1
echo "\$1=$1"
OPTIND=$oldind
}
-------------------------------

However, this also loops endlessly. The reason is most likely that bash
maintains an additional internal variable holding the index of the
current character, relative to the current word. While this variable is
not directly accessible by the user, it is set to 0 whenever OPTIND is
assigned a value.

So the only safe way is to _never_ use getopts within another getopts
block, but always wait until the first one has finished.
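This sketch combines Bernd's advice with Steven's. It finishes the script's own `getopts` loop first, and only then calls functions that run `getopts` themselves; those functions get local `OPTIND`/`OPTARG`. The function `show`, its `-p` option and the `set --` stand-in for real arguments are hypothetical. Per this thread, the nested case (calling such a function from inside the outer loop) was unreliable in bash at the time, so the sketch avoids it.

```bash
#!/usr/bin/env bash
# show [-p PREFIX] WORDS...  (hypothetical helper with its own options)
show() {
  local OPTIND=1 OPTARG opt prefix='>'   # private getopts state
  while getopts 'p:' opt; do
    case $opt in
      p) prefix=$OPTARG ;;
      *) return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  printf '%s %s\n' "$prefix" "$*"
}

set -- -v                      # stand-in for the script's real arguments
verbose=0
while getopts 'v' opt; do      # 1. parse the script's own options completely
  case $opt in
    v) verbose=1 ;;
  esac
done
shift $((OPTIND - 1))

show -p '#' hello              # 2. only now call getopts-using functions; prints "# hello"
```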

> When calling getopts, especially from a function that is intended to
> not be used at a top level for processing command line options, you
> should declare local copies of OPTIND, OPTARG and OPTERR.
>

> *) To remove the trailing slashes, instead of
>
> while [[ $file == */ ]]
> do
> file=${file%/}
> done
>
> file=${file##*/} # file name
>
> just say
> file="${file%${file##*[!/]}}"

Yes, you can do that, but I find my version a bit more legible. Also,
for file=/ it returns a single slash, while yours returns an empty
string. (Hmm... the next statement in my script also creates an empty
string, but this is a bug and will be fixed).

> *) Instead of
>
> [[ ! -d $dir ]] &&
> {
> ret=1
> break
> }
>
> how about this for slightly cleaner?
>
> [[ -d $dir ]] ||
> {
> ret=1
> break
> }

I think that's just a matter of taste.

Greetings,
Bernd

--
http://sudrala.de

Stephane CHAZELAS

Aug 10, 2011, 6:47:59 AM
to
2011-08-10, 12:00(+02), Bernd Eggink:
[...]

> function f
> {
> local OPTIND=1
>
> echo "\$1=$1"
> }
>
> while getopts "abcdefg" opt
> do
> echo "opt=$opt"
> f $opt
> done
> --------------------------------
>
> Calling the script like this works fine:
> script -a -b -c
>
> But calling it like this leads to an endless loop:
> script -abc
[...]

> However, this also loops endlessly. The reason is most likely that bash
> maintains an additional internal variable holding the index of the
> current character, relative to the current word. While this variable is
> not directly accessible by the user, it is set to 0 whenever OPTIND is
> assigned a value.
[...]

That would be a bug in bash in my opinion. If OPTIND is marked
local to the function, it shouldn't affect the behavior of
parent contexts.

Note that that bug is also in ksh93, pdksh, mksh and posh
(though slightly different in that one), but not in ash nor zsh.

Note that if you set OPTIND local, you probably want to do the
same for OPTARG (and maybe OPTERR).

--
Stephane

Clark J. Wang

Aug 11, 2011, 2:55:22 AM
to Bernd Eggink, bug-...@gnu.org
On Wed, Aug 10, 2011 at 6:00 PM, Bernd Eggink <mon...@sudrala.de> wrote:

> On 09.08.2011 15:50, Steven W. Orr wrote:
>
>> *) You reset OPTIND to 1 but you didn't declare it local. This will
>> cause any caller of getlink which uses getopts to reset its variable
>> to 1. (I mention this because it cost me a couple of hours a while
>> back.)
>>
>
> The reason I didn't declare OPTIND local is that OPTIND is handled
> specially by the shell; there is always exactly _one_ instance of this
> variable. In other words, OPTIND is always global, even if declared local
> (which is indeed pretty weird). Try this:
>
>

I always declare OPTIND as local. I didn't know it does not work at all.
Bug?


> -------------------------------
> function f
> {
> local OPTIND=1
>
> echo "\$1=$1"
> }
>
> while getopts "abcdefg" opt
> do
> echo "opt=$opt"
> f $opt
> done
> --------------------------------
>
> Calling the script like this works fine:
> script -a -b -c
>
> But calling it like this leads to an endless loop:
> script -abc
>
> One could of course save and restore the original:
>
> -------------------------------
> function f
> {
> local oldind=$OPTIND
>
> OPTIND=1
> echo "\$1=$1"
> OPTIND=$oldind
> }
> -------------------------------
>
> However, this also loops endlessly. The reason is most likely that bash
> maintains an additional internal variable holding the index of the current
> character, relative to the current word. While this variable is not directly
> accessible by the user, it is set to 0 whenever OPTIND is assigned a value.
>

Bernd Eggink

Aug 11, 2011, 2:55:30 PM
to bug-...@gnu.org

This has been fixed. Also getlink() now outputs paths always in
normalized form. Any feedback appreciated!

Bernd

--
http://sudrala.de

Clark J. Wang

Jan 10, 2012, 2:55:19 AM
to Stephane CHAZELAS, Chet Ramey, Bash Bug - ML
Seems like ksh93 (tested with version 93u 2011-02-08) implicitly declares
OPTIND and OPTARG in functions defined in the `function NAME {}' syntax and
everything works fine. But if OPTIND or OPTARG are explicitly declared as
local it may not work as expected.

Wish Chet would consider fixing this problem in future bash releases. :)

Chet Ramey

Jan 10, 2012, 7:55:51 AM
to dear...@gmail.com, stephane...@yahoo.fr, bug-...@gnu.org, ch...@po.cwru.edu, chet....@case.edu
> > > However, this also loops endlessly. The reason is most likely that bash
> > > maintains an additional internal variable holding the index of the
> > > current character, relative to the current word. While this variable is
> > > not directly accessible by the user, it is set to 0 whenever OPTIND is
> > > assigned a value.
> > [...]
> >
> > That would be a bug in bash in my opinion. If OPTIND is marked
> > local to the function, it shouldn't affect the behavior of
> > parent contexts.
> >
> > Note that that bug is also in ksh93, pdksh, mksh and posh
> > (though slightly different in that one), but not in ash nor zsh.
> >
>
> Seems like ksh93 (tested with version 93u 2011-02-08) implicitly declares
> OPTIND and OPTARG in functions defined in the `function NAME {}' syntax and
> everything works fine. But if OPTIND or OPTARG are explicitly declared as
> local it may not work as expected.
>
> Wish Chet would consider fixing this problem in future bash releases. :)

Bash doesn't have the `posix' and `non-posix' distinction between function
syntax that ksh93 makes. All variables are global unless explicitly
declared as local. The other thing that Posix requires is that setting
OPTIND to 1 resets getopts' internal state. I'll take another look and
see if I can work around the issue: resetting getopts when returning from
the shell function restores the value `1' (since we're still processing
the first argument), which causes getopts' internal state to reset.
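Here is a sketch of the POSIX rule Chet cites: assigning `OPTIND=1` resets `getopts`, which is what lets one function parse several separate argument lists. `OPTIND` is deliberately left global here, and the function name `parse` is hypothetical.

```bash
#!/usr/bin/env bash
# parse ARGS...  (hypothetical): print the option letters found in ARGS.
parse() {
  local opt seen=
  OPTIND=1                 # reset getopts before scanning a new list
  while getopts 'ab' opt; do
    seen=$seen$opt
  done
  printf '%s\n' "$seen"
}

parse -ab                  # prints ab
parse -b -a                # prints ba
```

Without the `OPTIND=1` line, the second call would start from whatever `OPTIND` the first call left behind.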

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/

Sung Pae

Jan 11, 2012, 3:31:42 AM
to bug-...@gnu.org
Hello,

There is a small bug in the default completion system due to a
disagreement between a few functions in what is considered a command
delimiter. I have tested this on the current devel HEAD in the git repo
(d00a2d6).

Define a default completion:

_completion_loader() {
    complete -W "COMPLETE ME" "$1" && return 124
}

complete -F _completion_loader -D

Try to trigger completion for ">" (^I is a literal tab):

$ > ^I^I
$ complete -p
complete -F _completion_loader -D
complete -W 'COMPLETE ME' >

Check to make sure things are working:

$ ls ^I^I
COMPLETE ME
$ complete -p
complete -F _completion_loader -D
complete -W 'COMPLETE ME' >
complete -W 'COMPLETE ME' ls

Works just fine for "ls". Now, set a COMPSPEC for the empty string, "":

$ complete -W "EMPTY STRING" ""

And try to complete ">" again:

$ > ^I^I
EMPTY STRING
$ complete -p
complete -W 'EMPTY STRING'
complete -F _completion_loader -D
complete -W 'COMPLETE ME' >
complete -W 'COMPLETE ME' ls

It works this time, but only because it is loading the COMPSPEC for the
empty string, instead of for ">".

The reason for this is that gen_progcomp_completions() first searches
the prog_completes table with the results from find_cmd_name() [1],
which given a command line "> ", returns "" (the empty string).

Failing this, it will call the default completion function
_completion_loader(), and pass it a word list composed from the results
of command_line_to_word_list(), which when given the _same_ command line
"> ", returns the list [">", ""].

Our default completion function then faithfully inserts a COMPSPEC into
prog_completes for the key ">" and returns 124 to restart the completion
loop. However, since the prog_completes table is never looked up with
the key ">", it is never actually found.

Of course, once a COMPSPEC for the empty string is inserted into the
table (which can also be done by trying to complete an empty command
line), completing ">" works as if one was completing the empty string,
which will most likely be assigned the same dynamic completion, thus
having the unfortunate appearance of suddenly working as intended.

I have not attached a patch to this bug report, because I know
the subtle differences of rl_basic_word_break_characters,
COMMAND_SEPARATORS, and the value embedded in find_cmd_name(), are
important, and that any changes, for better or worse, are scrupulously
recorded for ALL TIME. I have implemented a couple of solutions, but who
am _I_ to say what is a command delimiter?

So while a proper fix is complicated, there is a simple workaround
that can be implemented in userland: simply define the completion for
the empty string ("") for the keys that slip between the cracks of the
different command delimiter lists (i.e. "<", ">", ">>"):

complete -o default ""

Cheers,
Sung Pae


[1]: Not directly, but by provenance. These are the stack traces for the
functions progcomp_search() and progcomp_insert():

(HASH SEARCH)
0 progcomp_search + 39
1 gen_progcomp_completions + 40
2 programmable_completions + 115
3 attempt_shell_completion + 1201
4 gen_completion_matches + 58
5 rl_complete_internal + 294
6 _rl_dispatch_subseq + 623
7 readline_internal_char + 267
8 readline + 101
9 yy_readline_get + 185
10 shell_getc + 490
11 read_token + 525
12 yyparse + 1189
13 parse_command + 184
14 read_command + 200
15 reader_loop + 355
16 main + 6292
17 start + 52
18 0x0 + 2

(HASH INSERTION)
0 progcomp_insert + 101
1 complete_builtin + 943
2 execute_builtin + 500
3 execute_command_internal + 8420
4 execute_command + 138
5 execute_command_internal + 16943
6 execute_command_internal + 14939
7 execute_function + 1339
8 execute_shell_function + 135
9 gen_compspec_completions + 1107
10 gen_progcomp_completions + 149
11 programmable_completions + 261
12 attempt_shell_completion + 1201
13 gen_completion_matches + 58
14 rl_complete_internal + 294
15 _rl_dispatch_subseq + 623
16 readline_internal_char + 267
17 readline + 101
18 yy_readline_get + 185
19 shell_getc + 490
20 read_token + 525
21 yyparse + 1189
22 parse_command + 184
23 read_command + 200
24 reader_loop + 355
25 main + 6292
26 start + 52
27 0x0 + 2


Chet Ramey

Jan 16, 2012, 12:29:37 PM
to Sung Pae, bug-...@gnu.org, chet....@case.edu
On 1/11/12 3:31 AM, Sung Pae wrote:
> Hello,
>
> There is a small bug in the default completion system due to a
> disagreement between a few functions in what is considered a command
> delimiter. I have tested this on the current devel HEAD in the git repo
> (d00a2d6).

I think you've uncovered a potential problem, but not the one you think.
The important thing is that `>' is not a command separator and cannot be
a command name. As such, it kind of slips between the cracks of the
completion system.
That's because `>' is not part of a command name. It *ends* a command
name. It's a shell metacharacter, and unquoted metacharacters separate
words.

>
> Failing this, it will call the default completion function
> _completion_loader(), and pass it a word list composed from the results
> of command_line_to_word_list(), which when given the _same_ command line
> "> ", returns the list [">", ""].

Yes, those are separate words.

> Our default completion function then faithfully inserts a COMPSPEC into
> prog_completes for the key ">" and returns 124 to restart the completion
> loop. However, since the prog_completes table is never looked up with
> the key ">", it is never actually found.

True, the completion system will allow you to define a compspec for
something that will never be used as a command name.

> Of course, once a COMPSPEC for the empty string is inserted into the
> table (which can also be done by trying to complete an empty command
> line), completing ">" works as if one was completing the empty string,
> which will most likely be assigned the same dynamic completion, thus
> having the unfortunate appearance of suddenly working as intended.

Completing `>' *is* the same as completing the empty string, because `>'
is not a command name, or part of a command name. If you think about
how the shell parser operates, a sequence of characters like "foo>bar"
is three words: the command name `foo', the operator `>' and the word
`bar'. Removing `foo' doesn't change the meaning of `>' and `bar'.
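Chet's "foo>bar" point can be checked in a scratch directory; this is a sketch, and the file names are arbitrary. Because `>` is an unquoted metacharacter, no spaces are needed around it. A line can even consist of a redirection with no command name at all, which fits Chet's explanation that there is no command name to look up when completing right after `>`.

```bash
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1   # scratch directory

# The shell splits this into: echo, foo, the operator >, and bar.
echo foo>bar
cat bar                       # prints foo

# A redirection with no command name is valid: it just creates/truncates baz.
>baz
[ -s baz ] || echo 'baz exists and is empty'
```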

>
> I have not attached a patch to this bug report, because I know
> the subtle differences of rl_basic_word_break_characters,
> COMMAND_SEPARATORS, and the value embedded in find_cmd_name(), are
> important, and that any changes, for better or worse, are scrupulously
> recorded for ALL TIME. I have implemented a couple of solutions, but who
> am _I_ to say what is a command delimiter?

Well, Posix says what is and what is not a command delimiter. Let's see
if I can explain what these do a little better.

Readline has its own idea about completion: it breaks the command line
into units on which the user might find it convenient to perform
completion. That's why characters like `:' and `@' are in readline's word
break characters when they are not shell metacharacters.

find_cmd_start and find_cmd_end are intended to find the boundaries of the
command containing the word to be completed. That means find_cmd_start
uses characters after which it might be reasonable to find a command name.
That's why `{' appears in the value of COMMAND_SEPARATORS even though it's
not a shell operator.

find_cmd_end is where there is a potential problem. Analogous to
find_cmd_start, it should use characters that can reasonably end a
command, which means it should include `)' rather than `('. It all works
out in the end, since `(' starts the next command rather than ending the
current one, but there's room for improvement there.

find_cmd_name is different. It uses word boundaries, not command
boundaries. That's why it uses the shell break characters as delimiters
(look at syntax.h for the value of shell_break_characters), which are
those characters that delimit tokens when the shell is parsing input.

> So while a proper fix is complicated, there is a simple workaround
> that can be implemented in userland: simply define the completion for
> the empty string ("") for the keys that slip between the cracks of the
> different command delimiter lists (i.e. "<", ">", ">>"):

While we can talk about whether or not it's reasonable to extend the
completion system to incorporate redirections and other shell operators,
there's no bug in the way things currently work.

Sung Pae

Jan 16, 2012, 3:54:56 PM
to chet....@case.edu, bug-...@gnu.org
On 16 Jan 2012, at 11:29 AM, Chet Ramey wrote:

> I think you've uncovered a potential problem, but not the one you
> think. The important thing is that `>' is not a command separator and
> cannot be a command name. As such, it kind of slips between the cracks
> of the completion system.

> > Of course, once a COMPSPEC for the empty string is inserted into
> > the table (which can also be done by trying to complete an empty
> > command line), completing ">" works as if one was completing the
> > empty string,

> Completing `>' *is* the same as completing the empty string, because
> `>' is not a command name, or part of a command name. If you think
> about how the shell parser operates, a sequence of characters like
> "foo>bar" is three words: the command name `foo', the operator `>' and
> the word `bar'. Removing `foo' doesn't change the meaning of `>' and
> `bar'.

I see. Trying to complete the command line:

$ >^I

is the same as completing:

$ "">^I

I had mistaken `>` for the command name.

> > I have not attached a patch to this bug report, because I know
> > the subtle differences of rl_basic_word_break_characters,
> > COMMAND_SEPARATORS, and the value embedded in find_cmd_name(), are
> > important,

> Well, Posix says what is and what is not a command delimiter. Let's
> see if I can explain what these do a little better.

Thank you for the explanation; it's clear now. I suspected the
reasoning for this ran deeper than I had imagined.

> While we can talk about whether or not it's reasonable to extend
> the completion system to incorporate redirections and other shell
> operators, there's no bug in the way things currently work.

The only issue then is that completing a bare `<`, `>` and `>>` results
in the default completion command receiving the redirection operator as
its first argument. The manual, however, states (my emphasis):

    For instance, assuming that there is a library of compspecs,
    each kept in a file corresponding to the *name of the command*,
    the following default completion function would load completions
    dynamically:

        _completion_loader()
        {
            . "/etc/bash_completion.d/$1.sh" >/dev/null 2>&1 && return 124
        }
        complete -D -F _completion_loader

Is it not reasonable then that _completion_loader should receive the
empty string instead of the redirection operator in this corner case?

Cheers,
Sung Pae


Chet Ramey

Jan 17, 2012, 3:25:30 PM
to Sung Pae, bug-...@gnu.org, chet....@case.edu
On 1/16/12 3:54 PM, Sung Pae wrote:

> The only issue then is that completing a bare `<`, `>` and `>>` results
> in the default completion command receiving the redirection operator as
> its first argument. The manual, however, states (my emphasis):
>
>     For instance, assuming that there is a library of compspecs,
>     each kept in a file corresponding to the *name of the command*,
>     the following default completion function would load completions
>     dynamically:
>
>         _completion_loader()
>         {
>             . "/etc/bash_completion.d/$1.sh" >/dev/null 2>&1 && return 124
>         }
>         complete -D -F _completion_loader
>
> Is it not reasonable then that _completion_loader should receive the
> empty string instead of the redirection operator in this corner case?

That's an excellent point. These corner cases violate one of the
assumptions the programmable completion code makes. I know
how to fix it -- the correct information exists, and is used to search
for the right compspec. That null command just needs to be passed to
the right function. This fix will be in the next version of bash.
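For bash versions that predate that fix, a userland workaround along the lines Sung describes might look like the following. This is a hedged sketch, not the fix Chet refers to: the helper `_completion_cmd_name` is hypothetical, only the three operators reported in this thread are handled, and the loader is otherwise the manual's example unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the word bash passes to the default completion
# function onto the command name the compspec library is keyed on. A bare
# redirection operator is not a command name, so treat it as the empty string.
_completion_cmd_name()
{
    case $1 in
        '<' | '>' | '>>') printf '%s' '' ;;   # operators reported in this thread
        *)                printf '%s' "$1" ;;
    esac
}

# The manual's example loader, changed only to look up the normalized name.
# With an empty name it looks for ".sh", just as completing an empty
# command line would.
_completion_loader()
{
    local cmd
    cmd=$(_completion_cmd_name "$1")
    . "/etc/bash_completion.d/$cmd.sh" >/dev/null 2>&1 && return 124
}
complete -D -F _completion_loader
```

Extend the `case` pattern if other operators turn up with the same behavior.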