
brace expansion


John Doe

Jul 6, 2015, 3:23:20 PM
When I do the following in bash and ksh

aa=AA ; ab=AB ; ac=AC
echo $a{b,c,d}

I get different results:

bash: AA AB AC
ksh: a b c

In bash manual I can read that "Brace expansion is performed before any
other expansions, and any characters special to other expansions are
preserved in the result. It is strictly textual.". On the other hand in
the book "Learning the Korn shell" I can read that brace expansion is
performed much later, right before splitting into words.

Is this the magic moment I need to abandon the idea of portability in
broader sense? I mean, bash and ksh both support this feature but they
do it at different stages (maybe even in different way internally, I
don't know).

Is this something that goes along with learning shell scripting, i.e.
read the "fine points" and don't assume the same result even when the
same feature is supported? Or did my example mislead me again and
everything was OK, and I just need to learn more so I'm not tricked by
behavior like the one above?

John

John Doe

Jul 6, 2015, 3:30:52 PM
On 06.07.2015 21:23, John Doe wrote:
> When I do the following in bash and ksh
>
> aa=AA ; ab=AB ; ac=AC
> echo $a{b,c,d}

^^^^^^^^^
Should be: $a{a,b,c}
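For reference, with the corrected command the two behaviors can be
reproduced directly (a sketch; the bash comment follows the documented
brace-before-parameter ordering, while the ksh comment just reflects the
output reported above and the ordering described in "Learning the Korn
Shell"):

```shell
# bash: brace expansion runs first, so $a{a,b,c} -> $aa $ab $ac -> AA AB AC
bash -c 'aa=AA; ab=AB; ac=AC; echo $a{a,b,c}'

# ksh (braces expand late): $a is unset and expands to nothing first,
# leaving {a,b,c}, which the OP observed expanding to: a b c
command -v ksh >/dev/null 2>&1 &&
    ksh -c 'aa=AA; ab=AB; ac=AC; echo $a{a,b,c}'
```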

Barry Margolin

Jul 6, 2015, 3:41:50 PM
In article <559ad5a3$0$55333$b1db1813$7946...@news.astraweb.com>,
John Doe <john.doe@notpresent> wrote:

> When I do the following in bash and ksh
>
> aa=AA ; ab=AB ; ac=AC
> echo $a{b,c,d}
>
> I get different results:
>
> bash: AA AB AC
> ksh: a b c
>
> In bash manual I can read that "Brace expansion is performed before any
> other expansions, and any characters special to other expansions are
> preserved in the result. It is strictly textual.". On the other hand in
> the book "Learning the Korn shell" I can read that brace expansion is
> performed much later, right before splitting into words.

There was a long thread just last week about the quirks in brace
expansion, and it was noted that it's different in different shells.

> Is this the magic moment I need to abandon the idea of portability in
> broader sense? I mean, bash and ksh both support this feature but they
> do it at different stages (maybe even in different way internally, I
> don't know).

Brace expansion is not a POSIX feature; it's an extension that a number
of shells have added. But they each did it a little differently.

If you want portability, stick to standard features, not extensions.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

John Doe

Jul 6, 2015, 6:45:22 PM
On 06.07.2015 21:41, Barry Margolin wrote:
> There was a long thread just last week about the quirks in brace
> expansion, and it was noted that it's different in different shells.
>
> brace expansion is not a POSIX feature, it's an extension that a number
> of shells have added. But they each did it a little differently.
>
> If you want portability, stick to standard features, not extensions.

Indeed there was, thanks!

John

Martijn Dekker

Jul 6, 2015, 8:26:21 PM
In article <559ad5a3$0$55333$b1db1813$7946...@news.astraweb.com>,
John Doe <john.doe@notpresent> wrote:

> Is this the magic moment I need to abandon the idea of portability in
> broader sense?

I've done a lot of portable POSIX shell programming lately. One thing I
quickly learned was not to rely on shell manuals first, as they
typically make no distinction between what's standard and what's not,
and even those that do often have inaccuracies.

Instead, I quite simply use the POSIX specification directly as a
reference manual. Anything that's specified there should be portable,
unless there are compliance bugs in the shell you're using. Those are
the cases where I do testing and look at the shell manuals.

Start here:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/contents.html

Hope this helps,

- M.

wil...@wilbur.25thandclement.com

Jul 8, 2015, 3:15:07 PM
Martijn Dekker <mar...@inlv.demon.nl> wrote:
> In article <559ad5a3$0$55333$b1db1813$7946...@news.astraweb.com>,
> John Doe <john.doe@notpresent> wrote:
>
>> Is this the magic moment I need to abandon the idea of portability in
>> broader sense?

I wouldn't abandon portability. ksh and bash might predominate, but stricter
shells like ash, dash, and others are much more common than one might think.
And because of recent issues with bash, there have been, will continue to be,
and _should_ be systematic efforts to simplify and constrain shell
environments, especially in sensitive systems. Plus, of course, the default
shell on Debian is dash, not bash.

Learn how to use (and abuse) the positional parameter list, $@, and its
interaction with the `for` statement and field splitting (i.e. $IFS).
Practice leveraging pathname expansion. And learn how to use parameter
expansion substitutions, which are immensely useful for manipulating strings
without resorting to sed or awk. IME these constructs, alone and in
combination, will usually suffice as a practical matter in lieu of
non-portable extensions such as arrays, brace expansion, etc. And of course
there's always eval, but I rarely resort to that hammer.
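A small illustration of the kind of parameter expansion substitutions I
mean (plain POSIX sh, no sed or awk; the path is a made-up example):

```shell
path=/usr/local/src/archive.tar.gz   # hypothetical example path

dir=${path%/*}       # strip shortest suffix matching /*  -> /usr/local/src
file=${path##*/}     # strip longest prefix matching */   -> archive.tar.gz
base=${file%%.*}     # strip longest suffix matching .*   -> archive
ext=${file#*.}       # strip shortest prefix matching *.  -> tar.gz

printf '%s\n' "$dir" "$file" "$base" "$ext"
```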

Sometimes you _will_ need to use a non-portable construct to get your job
done efficiently. But you can't be a good judge of those exigencies unless
you master the portable constructs.

FWIW, the preamble to most of my scripts starts with

set -e # strict errors
set -u # don't expand unbound variables
set -f # disable pathname expansion
set -C # noclobber

which helps to minimize bugs. But see below.

> I've done a lot of portable POSIX shell programming lately. One thing I
> quickly learned was not to rely on shell manuals first, as they
> typically make no distinction between what's standard and what's not,
> and even those that do often have inaccuracies.
>
> Instead, I quite simply use the POSIX specification directly as a
> reference manual. Anything that's specified there should be portable,
> unless there are compliance bugs in the shell you're using. Those are
> the cases where I do testing and look at the shell manuals.
>
> Start here:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/contents.html
>

I keep telling other engineers that they should have a copy of the latest
POSIX/SUSv3 specification on their desktop. It's a free download, and it's
grep'able. The specification is almost always clearer and more concise, both
for the shell as well as for system C interfaces. But almost everybody is
too lazy to do this. It's very disheartening.

Here's a list of portability issues I've run into for a single script I
maintain, and which I document at the head of the script

https://github.com/wahern/luapath/blob/master/runlua

These are all related to POSIX-defined behavior, except for the shebang.

PORTING NOTES

unset) On NetBSD (confirmed up to 6.1.5) unset NAME will exit with a
failure status if no such variable is set. If errexit (set -e) is
enabled then the shell will exit. See NetBSD PR 49595.

#!) Linux and OS X permit recursive shebang execution, which some users
might wish to take advantage of. However, neither will field-split
interpreter arguments, instead passing the remainder of the shebang line
as a single positional argument. So we manually field-split any first
argument.

Solaris (confirmed 11.1), AIX (confirmed 7.1), OpenBSD (confirmed 5.5),
NetBSD (confirmed 5.1.2, 6.1.1), and FreeBSD (confirmed 9.0) will search
for the interpreter recursively, following shebang interpreter paths
until a binary interpreter is found. But they will not add each
intervening interpreter path to the positional argument list. If you
don't know the paths you cannot execute them recursively.

$@) On some BSD shells (confirmed NetBSD 5.1.2, 6.1.1, OpenBSD 5.5)
expansion of an empty $@ will wrongly trigger an error if nounset (set
-u) is in effect.

noclobber) On some BSD shells (confirmed NetBSD 5.1.2, 6.1.1) the
noclobber (set -C) option will wrongly cause redirection to /dev/null
using the redirect operator (">") to fail. Use the appending redirect
operator (">>") as a workaround.

trap EXIT) ksh88 (confirmed AIX 7.1) wrongly executes an EXIT trap when
the calling function returns, rather than when the shell exits. Note
ksh93 does not exhibit this bug.

$@ and null IFS) ksh88 (confirmed AIX 7.1) pdksh (confirmed pdksh 5.2.14)
and pdksh derivatives (confirmed OpenBSD 5.6 ksh, NetBSD 6.1 ksh) will
expand $@ as a single field if IFS is null (set but empty). As a
workaround we set IFS to a control character when juggling paths. ksh93,
bash, and ash correctly expand $@ when IFS is null.
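To make the last two notes concrete, the workarounds look roughly like
this (a sketch; \037 is my arbitrary choice of a control character
assumed absent from path names):

```shell
set -u

# 1. Guard an empty "$@" on shells where it trips nounset: the old
#    ${1+"$@"} idiom expands to nothing when there are no arguments.
for arg in ${1+"$@"}; do
    printf 'arg: %s\n' "$arg"
done

# 2. Split on a control character instead of a null IFS.
sep=$(printf '\037')            # ASCII unit separator
list="/tmp/a b${sep}/tmp/c d"
set -f                          # suppress pathname expansion while splitting
IFS=$sep
set -- $list
unset IFS                       # back to default field splitting
set +f
printf 'path: %s\n' "$@"
```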

In general I've found portable shell programming to be quite practical. The
biggest problems I've run into involve the behavior of $@ and IFS. I've made
a concerted effort to handle path names with internal whitespace, including
embedded newlines. The only _correct_ way to do this is by using an empty
IFS, plus some hackery to preserve trailing newlines in command
substitutions. pdksh derivatives in particular have subtle bugs in this
area. Combined with `set -u` they're my biggest source of headaches.
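The trailing-newline hackery is the usual sentinel trick (sketch; the
`x` sentinel byte is arbitrary):

```shell
# $(...) strips all trailing newlines, so emit a sentinel byte after the
# real output and strip it off afterwards; the newlines before it survive.
out=$(printf 'line1\n\n'; printf x)
out=${out%x}
printf '%s' "$out" | wc -c   # counts "line1" plus both trailing newlines
```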

Geoff Clare

Jul 9, 2015, 8:41:08 AM
william wrote:

> I keep telling other engineers that they should have a copy of the latest
> POSIX/SUSv3 specification on their desktop. It's a free download, and it's
> grep'able.

The latest (since 2008) is SUSv4, not SUSv3.

> In general I've found portable shell programming to be quite practical. The
> biggest problems I've run into involve the behavior of $@ and IFS. I've made
> a concerted effort to handle path names with internal whitespace, including
> embedded newlines. The only _correct_ way to do this is by using an empty
> IFS, plus some hackery to preserve trailing newlines in command
> substitutions.

Newlines in filenames occur for one of three reasons:

1. by accident
2. by malicious intent
3. in order to test whether they are handled correctly

IMO, the best way to handle them in a shell script is to report them
as an error and refuse to do anything with them. The sooner files
resulting from reasons 1 and 2 are brought to someone's attention so
that they can be removed/renamed, the less likely they are to cause
misbehaviour of other scripts.
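In script terms that check is cheap; a minimal sketch (the function name
is mine):

```shell
nl='
'

# refuse to operate on names containing a newline; report them instead
check_name() {
    case $1 in
        *"$nl"*)
            printf 'error: file name contains a newline, refusing\n' >&2
            return 1
            ;;
    esac
    return 0
}
```

A traversal loop can then do `check_name "$f" || continue` (or exit).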

--
Geoff Clare <net...@gclare.org.uk>

wil...@wilbur.25thandclement.com

Jul 9, 2015, 4:13:48 PM
Geoff Clare <ge...@clare.see-my-signature.invalid> wrote:
> william wrote:
>
>> I keep telling other engineers that they should have a copy of the latest
>> POSIX/SUSv3 specification on their desktop. It's a free download, and
>> it's grep'able.
>
> The latest (since 2008) is SUSv4, not SUSv3.

Thanks. I had my doubts. I should've confirmed rather than blithely moving
on.

>> In general I've found portable shell programming to be quite practical.
>> The biggest problems I've run into involve the behavior of $@ and IFS.
>> I've made a concerted effort to handle path names with internal
>> whitespace, including embedded newlines. The only _correct_ way to do
>> this is by using an empty IFS, plus some hackery to preserve trailing
>> newlines in command substitutions.
>
> Newlines in filenames occur for one of three reasons:
>
> 1. by accident
> 2. by malicious intent
> 3. in order to test whether they are handled correctly
>
> IMO, the best way to handle them in a shell script is to report them
> as an error and refuse to do anything with them. The sooner files
> resulting from reasons 1 and 2 are brought to someone's attention so
> that they can be removed/renamed, the less likely they are to cause
> misbehaviour of other scripts.

I'll admit that handling embedded newlines may be more of an academic
exercise[1] than a practical feature in the vast majority of cases.

OTOH, a script passing over or bailing on filenames with embedded newlines
could very well be a security issue[2]. For example, /tmp is often cleaned
out by a shell script invoked from cron. If I wanted to circumvent the
policy I'd now have a mechanism.

I reuse my file tree traversal code in other scripts, many of which are
invoked from non-interactive contexts, so keeping the implementation robust
is not a completely idle exercise.

[1] David Wheeler's request that POSIX exclude control characters from file
names spurred my interest in the issue.

http://austingroupbugs.net/view.php?id=251&nbn=51

I think that's a poor idea. (As is requiring UTF-8, for that matter, as OS X
has done.) And so I felt compelled to explore the problem in earnest.

[2] In college I discovered an intruder on the university's main user Unix
system, used for all manner of student and class activities. The intruder
hid his setuid program in plain sight under / with a filename using only one
or a few control characters. The sysadmins were insufficiently curious, and
either never saw it or couldn't be bothered to figure out how to invoke it.
After gaining root to the system, I told the sysadmins and explained how to
invoke the program from the command-line. Long story short, an intruder with
the same M.O. had hacked multiple systems on the Bell Atlantic network.
David Wheeler might say this is a perfect example of why control characters
should be disallowed. But control characters are merely one way to obfuscate
file names. OS X's Unicode normalization of file names makes some files
inaccessible over network mounts, for example. I wouldn't venture to try to
enumerate all the possible exploit possibilities, especially when you
include exploits that circumvent ad hoc application policies, not merely the
traditional OS privilege systems. I'd simply prefer to focus on correctness.
An algorithm that can be proven to handle _any_ kind of file name is more
robust than one that tries to enumerate, detect, and avoid malicious file
names.

Geoff Clare

Jul 10, 2015, 8:41:08 AM
william wrote:

> Geoff Clare <ge...@clare.see-my-signature.invalid> wrote:
>>
>> Newlines in filenames occur for one of three reasons:
>>
>> 1. by accident
>> 2. by malicious intent
>> 3. in order to test whether they are handled correctly
>>
>> IMO, the best way to handle them in a shell script is to report them
>> as an error and refuse to do anything with them. The sooner files
>> resulting from reasons 1 and 2 are brought to someone's attention so
>> that they can be removed/renamed, the less likely they are to cause
>> misbehaviour of other scripts.
>
> I'll admit that handling embedded newlines may be more of an academic
> exercise[1] than a practical feature in the vast majority of cases.
>
> OTOH, a script passing over or bailing on filenames with embedded newlines
> could very well be a security issue[2]. For example, /tmp is often cleaned
> out by a shell script invoked from cron. If I wanted to circumvent the
> policy I'd now have a mechanism.

Obviously there are some cases that should be exceptions to the
general rule. In the case of a /tmp cleaner, IMO the ideal design
would be that it removes files with newlines in the name that meet
its normal removal criteria, but it also checks for the existence
of any files with newlines in the name that don't meet its removal
criteria and reports them.
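A sketch of the reporting half of that design (assuming a find that
accepts a literal newline inside a -name pattern, which GNU and BSD find
do; the function name is mine):

```shell
nl='
'

# report (but do not remove) file names under $1 that embed a newline,
# so a human can rename or delete them; -print makes the embedded
# newline span lines, which is fine for a human-readable report
report_newline_names() {
    find "$1" -name "*${nl}*" -print >&2
}
```

The /tmp cleaner would call this after its normal removal pass, e.g.
`report_newline_names /tmp`.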

--
Geoff Clare <net...@gclare.org.uk>

Kenny McCormack

Jul 10, 2015, 9:33:55 AM
In article <sjr27c...@wilbur.25thandClement.com>,
<wil...@wilbur.25thandClement.com> wrote:
...
>[1] David Wheeler's request that POSIX exclude control characters from file
>names spurred my interest in the issue.
>
> http://austingroupbugs.net/view.php?id=251&nbn=51
>
>I think that's a poor idea.

That's interesting. I didn't follow your reasons for not liking it (they
seem pretty convoluted and inward-looking to me).

I think it is pretty obvious and beyond any reasonable debate that if we
could do it all over again we'd do it differently - we'd follow Wheeler's
suggestion. Therefore, the only realistic arguments at this point in time
would be those based on backwards compatibility and other such concerns -
i.e., how do we get there from here?

Anyway, I have long thought that the Unix filename model was silly, so I'm
clearly in favor of Wheeler's idea. Maybe it could be done as an optional
feature - i.e., leave the default as it is, but have a mode that you could
turn on (maybe call it "secure filename mode"), with the strong
recommendation that new systems and development going forward use this
mode.

I'd be interested to hear others ideas about why they don't like this
proposal. It seems a no-brainer to me, but I may be missing something.

--
Religion is regarded by the common people as true,
by the wise as foolish,
and by the rulers as useful.

(Seneca the Younger, 65 AD)

Kaz Kylheku

Jul 10, 2015, 2:59:21 PM
On 2015-07-09, <wil...@wilbur.25thandClement.com> <wil...@wilbur.25thandClement.com> wrote:
> Geoff Clare <ge...@clare.see-my-signature.invalid> wrote:
>> william wrote:
>>
>>> I keep telling other engineers that they should have a copy of the latest
>>> POSIX/SUSv3 specification on their desktop. It's a free download, and
>>> it's grep'able.
>>
>> The latest (since 2008) is SUSv4, not SUSv3.
>
> Thanks. I had my doubts. I should've confirmed rather than blithely moving
> on.

What doubts? You said you found portable programming to be practical,
right?

Maybe part of the reason for that is that you don't use the latest standard.

If you start taking advantage of the differences between the latest POSIX
and the previous POSIX, there goes your portability.

Writing to a 2008 standard creates the portability risk that your code might
not work in places that do not implement the 2008 standard, or not fully.

Kaz Kylheku

Jul 10, 2015, 3:08:45 PM
On 2015-07-10, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <sjr27c...@wilbur.25thandClement.com>,
> <wil...@wilbur.25thandClement.com> wrote:
> ...
>>[1] David Wheeler's request that POSIX exclude control characters from file
>>names spurred my interest in the issue.
>>
>> http://austingroupbugs.net/view.php?id=251&nbn=51
>>
>>I think that's a poor idea.
>
> That's interesting. I didn't follow your reasons for not liking it (they
> seem pretty convoluted and inward-looking to me).

I have long thought about a Linux kernel patch which banishes garbage
from file names. That would be the way to go with this: make a distro
around such a kernel and see how it goes.

> I think it is pretty obvious and beyond any reasonable debate that if we
> could do it all over again we'd do it differently - we'd follow Wheeler's
> suggestion. Therefore, the only realistic arguments at this point in time
> would be those based on backwards compatibility and other such concerns -
> i.e., how do we get there from here?

See above: just do it, get a distro going, and take it from there.

Maybe we reach the point where you can say:

"Hey look, the 'no stupid chars in files' distro is seven years old now,
and has tons of users. The sky hasn't fallen."

wil...@wilbur.25thandclement.com

Jul 10, 2015, 4:45:06 PM
Kaz Kylheku <k...@kylheku.com> wrote:
> On 2015-07-09, <wil...@wilbur.25thandClement.com> <wil...@wilbur.25thandClement.com> wrote:
>> Geoff Clare <ge...@clare.see-my-signature.invalid> wrote:
>>> william wrote:
>>>
>>>> I keep telling other engineers that they should have a copy of the
>>>> latest POSIX/SUSv3 specification on their desktop. It's a free
>>>> download, and it's grep'able.
>>>
>>> The latest (since 2008) is SUSv4, not SUSv3.
>>
>> Thanks. I had my doubts. I should've confirmed rather than blithely
>> moving on.
>
> What doubts? You said you found portable programming to be practical,
> right?

Portable != standards-compliant. I mention that upfront because your
derisive tone seems premised on that conflation.

> Maybe part of the reason for that is that you don't use the latest
> standard.
>
> If you start taking advantage of the differences between the latest POSIX
> and the previous POSIX, there goes your potability.

The apparent paradox that moving from one POSIX standard to another could
cause programs to cease functioning on extant systems should be a strong
hint that your definition of portability is flawed.

> Writing to a 2008 standard creates the portability risk that your code
> might not work in places that do not implement the 2008 standard, or not
> fully.

I don't write _to_ the 2008 standard, or any other standard. I use the most
recent standard as my principal reference for Unix programming. The standard
is generally clearer, and easier to read and navigate, than vendor
references. And while I know of no Unix system that is _actually_ 100%
compliant with any version of the standard--certification
notwithstanding--they are all asymptotically approaching compliance. When
taking aim at a moving target, you lead it.

fdopendir was only added to OS X with release 10.10. But I certainly didn't
hesitate to make use of fdopendir in my projects merely because it wasn't
available in OS X at the time. My use of fdopendir didn't harm portability
as it was trivially emulated. And even if it weren't, portability isn't a
binary characteristic.

I regularly test my projects on recent releases of AIX, FreeBSD,
Linux/glibc, Linux/musl, Minix, NetBSD, OpenBSD, OS X, and Solaris. For
shell programming I endeavor that my code works with /bin/sh on those
systems. Of course, sometimes I can only test on the subset that provide
equivalent interfaces, for example BSD kqueue, Linux epoll, and Solaris
Ports.

wil...@wilbur.25thandclement.com

Jul 10, 2015, 5:30:08 PM
Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <sjr27c...@wilbur.25thandClement.com>,
> <wil...@wilbur.25thandClement.com> wrote:
> ...
>>[1] David Wheeler's request that POSIX exclude control characters from file
>>names spurred my interest in the issue.
>>
>> http://austingroupbugs.net/view.php?id=251&nbn=51
>>
>>I think that's a poor idea.
>
> That's interesting. I didn't follow your reasons for not liking it (they
> seem pretty convoluted and inward-looking to me).
<snip>
> I'd be interested to hear others ideas about why they don't like this
> proposal. It seems a no-brainer to me, but I may be missing something.

The discussion in the thread at that link enumerates some counterarguments.

In version 5.1 of the PUC Lua interpreter the authors included a bytecode
verifier, as scripts could be loaded either as precompiled bytecode or as
source code. This verifier was removed completely in 5.2 after several
exploits were published that permitted circumventing the verifier, providing
a path to exploiting bugs in the VM. Rather than attempting to improve the
verifier, it was thought preferable to remove the complexity of the bytecode
verifier from the implementation, and focus on the correctness of the VM.

For substantially similar reasons, that's why I think focusing on removing
control characters from file names is misguided.

Think removable media, network mounts, legacy systems, and simply the fact
that in many, perhaps most cases, code which processes file names can often
be fed input derived from sources other than simply readdir--e.g. pipes,
files, etc. In other words, it's like placing an impervious, indestructible
stick at the battle front where previous soldiers passed, and daring the
enemy to tear it down. They may simply choose to go around it.

Another change for Lua 5.2 was that the script loading routines would no
longer load bytecode by default. Instead, through a parameter you had to
explicitly enable the bytecode loading capability. Unfortunately, this
approach is impractical in the context of the Unix API, because as discussed
in the thread you can't change the behavior of readdir, etc. And there
should be some showing that a substantial body of sensitive and exploitable
code (and of exploitable shell code in particular) cannot be fed file names
by some other route. By contrast, there's precisely one way to load code
into the Lua interpreter, and the Lua authors have never been afraid to
break backward compatibility.

Finally, the increased use of Unicode has and will continue to introduce
many additional problems and exploits. For the most part the proposal only
addresses issues in the context of shell programming, whereas in most other
contexts control characters are either not problematic, or if they are then
codes 1-31 are only the tip of the iceberg of text processing and display
issues.

In short, focusing on control characters is a mostly useless, if not
harmful, diversion.

Janis Papanagnou

Jul 10, 2015, 6:00:58 PM
When I made a similar proposal a couple of years ago in a Usenet discussion
I was made aware (IIRC) that the current definition allows support for
all kinds of file systems, in all possible locales with all possible
character sets. The accompanying statement was (IIRC) that any abstraction
should be done at a higher layer. - I'm not convinced, though.

Janis

Wayne

Jul 10, 2015, 9:00:21 PM
The proper way to deal with an external (untrusted) string of bytes
that are to be interpreted as characters is well documented, and
several programming libraries exist for this (such as ICU). I was
wondering if there was a way of safely processing such data, such
as externally created filenames, from the shell? (I don't think
it would necessarily be a good idea, but academically, is it
possible?)

For those who are curious, you can read about it
at <https://www.securecoding.cert.org>:

First the bytes are normalized as Unicode in some assumed encoding.
This is done by removing any bytes that aren't part of valid Unicode
sequences. The result is then converted to a canonical form.
Then the sanitized, normalized data is validated. Best practice
is to check the sequence of characters against an allowed
whitelist. (Depending on your needs, you can abort on any
non-safe text detected, or remove non-safe text and continue.)

Finally, for any text to be used in a log file, the text should
be encoded one way or another if it contains any non-ASCII printable
characters.

All that is rather a lot to ask for in a shell script, but I was
wondering if it were possible at all, perhaps using iconv and
other utilities?
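One piece of it is approachable: iconv can at least handle the "reject
invalid byte sequences" step by round-tripping the bytes (a sketch; the
function name is mine, and this does no normalization, which would need
something like ICU's uconv):

```shell
# succeed only when the argument is a valid UTF-8 byte string;
# iconv exits nonzero on an invalid or incomplete sequence
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

A traversal could then do `is_utf8 "$name" || printf 'rejecting unsafe
name\n' >&2`.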

The complexity of doing this properly is an argument in favor of
having POSIX do as much as possible for the user. Personally,
I would be in favor of defining file names as UTF-8 (without BOM)
character strings, normalized as NFC, with illegal names either
sanitized (by stripping or encoding them) or treated as illegal.
Note that would also take care of existing files with (now)
illegal names.

(Yeah I know it will never happen, this almost certainly would
cause more problems than it would solve, but it's fun to dream.)

--
Wayne

Dan Espen

Jul 10, 2015, 9:14:14 PM
Wayne <nos...@all.invalid> writes:

> All that is rather a lot to ask for in a shell script, but I was
> wondering if it were possible at all, perhaps using iconv and
> other utilities?

Simply do shell testing inside Emacs.
You will see all the funny characters.

--
Dan Espen

Kaz Kylheku

Jul 10, 2015, 9:49:14 PM
On 2015-07-10, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <sjr27c...@wilbur.25thandClement.com>,
> <wil...@wilbur.25thandClement.com> wrote:
> ...
>>[1] David Wheeler's request that POSIX exclude control characters from file
>>names spurred my interest in the issue.
>>
>> http://austingroupbugs.net/view.php?id=251&nbn=51
>>
>>I think that's a poor idea.
>
> That's interesting. I didn't follow your reasons for not liking it (they
> seem pretty convoluted and inward-looking to me).
>
> I think it is pretty obvious and beyond any reasonable debate that if we
> could do it all over again we'd do it differently - we'd follow Wheeler's
> suggestion.

By the way, it's probably worth considering other operating system namespaces,
not only the file system.

Well, not all namespaces. For instance, names of socket endpoints are binary
addresses ("struct sockaddr" and derivatives); you can't ban control characters
from that. :)

How about, say, environment variables? Would you ban funny characters from
appearing in environment variable names, so that exec*() calls which try to
pass such variables will fail?

Or how about the names of POSIX semaphores? (sem_open?) Since (I think)
they cannot be traversed, it doesn't matter what they contain. I mean, a
program must know the exact name to sem_open; there isn't any way to
list through them. If two pieces of an application use "^Z^Z" as a semaphore
rendezvous name, that's strictly between them.

The host name should be sane. The sethostname syscall should scrub nasties
from the name.

By the way, these checks are overhead! Every time a string comes into
the kernel in some call that requires sanitization, it has to be checked,
character by character.

> Anyway, I have long thought that the Unix filename model was silly, so I'm
> clearly in favor of Wheeler's idea. Maybe it could be done as an optional
> feature - i.e., leave the default as it is, but have a mode that you could
> turn on (maybe call it "secure filename mode"), with the strong
> recommendation that new systems and development going forward use this
> mode.

E.g. perhaps a mount option?

# Level 5 Kenny! Names must consist of [A-Z0-9.], damn it!

mount -o kenny=5 # ...

wil...@wilbur.25thandclement.com

Jul 13, 2015, 3:45:07 PM
One of the problems with enforcing Unicode in kernel space is that all
normalization forms are necessarily tied to a particular version of the
Unicode standard, as new combining sequences, precomposed characters, and
other elements are added or changed. This creates ample opportunity to
circumvent what is supposed to be a security measure. It can make what was
once a valid byte sequence invalid, or vice-versa. This is compounded by the
fact that whitelists and blacklists would presumably be much more dynamic.

And let's not forget that now the kernel is parsing and likely
translating each and every pathname using a non-trivial function. Path names
could shrink or expand in kernel space where before they were immutable. I'd
bet a decent amount of money that such a feature would introduce buffer
bugs. The value of immutable data is priceless in the context of writing
correct code.

OS X Unicode-normalizes path names in the kernel, and AFAIK it's roundly
criticized--some claim they did it wrong, others think they shouldn't have
done it at all. (OTOH, I shouldn't be surprised that I would only hear
commentary from complainers, while supporters remain silent.)

To me it (kernel enforcement) just seems like a beguiling feature that in
the grand scheme of things is much more costly and ineffective than it
immediately appears.

I would like to see somebody or some group promulgate a file name munging
API that addressed these and other issues, especially much bigger security
issues such as grapheme visual disambiguation--which doesn't rely on
whitelists or blacklists and the selection and maintenance dilemmas they
bring.

At a minimum it would be nice if bugs in extant shells were fixed,
specifically $@ expansion and proper handling of empty IFS. (I try to submit
bug reports as I encounter issues.) These things are concrete and
uncontroversial. If such things can't be fixed, why should we be confident
some other, much more complex solution, could be executed properly?