Security implications of using open() on untrusted strings.

r0g

Nov 24, 2008, 12:44:45 AM

Hi there,

I'm trying to validate some user input which is for the most part simple
regexery however I would like to check filenames and I would like this
code to be multiplatform.

I had hoped the os module would have a function that would tell me if a
proposed filename would be valid on the host system but it seems not. I
have considered whitelisting but it seems a bit unfair to make the rest
of the world suffer the naming restrictions of windows. Moreover it
seems both inelegant and hard work to research the valid file/directory
naming conventions of every platform that this app could conceivably run
on and write regex's for all of them so...

I'm tempted to go the witch dunking route, stick it in an open() between
a try: & except: and see if it floats. However...

Although it's a desktop (not internet facing) app I'm a little squeamish
piping raw user input into a filesystem function like that and this app
will be dealing with some particularly sensitive data so I want to be
careful and minimize exposure where practical.

Has programming PHP and Web stuff for years made me overly paranoid
about this, or should I still be scrubbing input like this before I
feed it to filesystem functions? If so, does anyone know of a module
that may help, or have any other advice?

Note: In this particular case the user input is only specifying the name
of a file that will be opened for writing _not_ reading and the
interface is GUI only (wxWidgets).

Regards,

Roger.

Steven D'Aprano

Nov 24, 2008, 2:29:33 AM

On Mon, 24 Nov 2008 00:44:45 -0500, r0g wrote:

> Hi there,
>
> I'm trying to validate some user input which is for the most part simple
> regexery however I would like to check filenames and I would like this
> code to be multiplatform.
>
> I had hoped the os module would have a function that would tell me if a
> proposed filename would be valid on the host system but it seems not. I
> have considered whitelisting but it seems a bit unfair to make the rest
> of the world suffer the naming restrictions of windows. Moreover it
> seems both inelegant and hard work to research the valid file/directory
> naming conventions of every platform that this app could conceivably run
> on and write regex's for all of them so...

That's probably why nobody has written a function for the os module to do
the same... and just wait until you get into the murky universe of
cross-platform Unicode filenames.

Honestly, I think your best bet is to just trust the file system to
recognize a bad file name and raise an exception. What counts as a bad
file name is surprisingly hard to define, especially if you want to be
cross-platform. See here for more details:

http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python
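
A minimal sketch of the "let the file system decide" approach, assuming
Python 3, where a rejected name surfaces as OSError (or ValueError for an
embedded NUL character). The helper name open_for_writing is hypothetical
and not part of the os module:

```python
def open_for_writing(path):
    """Try to open path for writing and let the OS judge the name.

    Returns (file_object, None) on success, or (None, message) if the
    open fails. Hypothetical helper, for illustration only.
    """
    try:
        # Note: mode "w" truncates an existing file of the same name.
        return open(path, "w"), None
    except (OSError, ValueError) as exc:
        # OSError: invalid name, missing directory, no permission, ...
        # ValueError: embedded NUL character in the name (Python 3).
        return None, str(exc)
```

The caller can show the returned message to the user and ask for another
name, without the program having to know any platform's naming rules.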

--
Steven

r0g

Nov 24, 2008, 3:47:30 AM

Yep, I spotted that too which is why white-listing is my fallback plan.
My question is really about the security of using unfiltered data in a
filesystem function though. Are there particular exploits that could
make use of such unfiltered calls? For example I'd imagine jailbreaking
might be a concern if the app isn't run under its own restricted user
account. Do others here consider this when designing applications and
what techniques/modules, if any, do you use to sanitize path/filename input?

Roger.

Thomas Bellman

Nov 24, 2008, 4:22:02 AM

r0g <aioe...@technicalbloke.com> wrote:

> Although it's a desktop (not internet facing) app I'm a little squeamish
> piping raw user input into a filesystem function like that and this app
> will be dealing with some particularly sensitive data so I want to be
> careful and minimize exposure where practical.

> Has programming PHP and Web stuff for years made me overly paranoid
> about this, or should I still be scrubbing input like this before I
> feed it to filesystem functions? If so, does anyone know of a module
> that may help, or have any other advice?

> Note: In this particular case the user input is only specifying the name
> of a file that will be opened for writing _not_ reading and the
> interface is GUI only (wxWidgets).

Is the user *running* the application the same as the user who
feeds it input? If it is, then there is no need to filter the
filenames, since that user could just do "rm bad-file" (or "DEL
BAD-FILE" on MS-Windows) anyway to destroy it.

(Of course, if you are passing the filename to, e.g, os.system(),
you would need to quote it properly, but that is to avoid
surprising the user; it is one thing to let the user overwrite a
file named "foo; rm -rf $HOME", quite another to pass that string
unquoted to /bin/sh when the user thought he was just typing a
filename.)
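
A sketch of that distinction, assuming Python 3.7 or later (shlex.quote
and subprocess.run are newer than this thread; the helper name
echo_argument is made up for illustration). Passing the name as a single
argv entry avoids the shell altogether, while shlex.quote is for cases
where a shell command string cannot be avoided:

```python
import shlex
import subprocess
import sys

filename = "foo; rm -rf $HOME"  # legal (if unfriendly) file name on Unix

def echo_argument(name):
    """Hand name to a child process as one argv entry, with no shell involved."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")

# If a shell string really is required, quote the name first.
quoted = shlex.quote(filename)
```

With the list form, the child sees the string exactly as typed, so nothing
in it is interpreted as a command separator or variable.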

--
Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
"I don't think [that word] means what you ! bellman @ lysator.liu.se
think it means." -- The Princess Bride ! Make Love -- Nicht Wahr!

Terry Reedy

Nov 24, 2008, 11:54:14 AM

to pytho...@python.org

r0g wrote:

> Yep, I spotted that too which is why white-listing is my fallback plan.
> My question is really about the security of using unfiltered data in a
> filesystem function though. Are there particular exploits that could
> make use of such unfiltered calls?

The classic one would be submitting a filename such as 'a'*1000, but
current OSes should be immune from that sort of thing by now.
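
As a rough illustration, assuming Python 3 and a file system with the
usual limit of about 255 characters per name component, an over-long name
is simply refused with an error. The function name try_long_name is
hypothetical:

```python
import os

def try_long_name(directory, length=1000):
    """Attempt to create a file whose name is `length` characters long.

    Returns None if the OS accepted the name, else the errno it reported.
    """
    path = os.path.join(directory, "a" * length)
    try:
        with open(path, "w"):
            pass
    except OSError as exc:
        return exc.errno
    return None
```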

Jorgen Grahn

Nov 24, 2008, 3:00:38 PM

On Mon, 24 Nov 2008 00:44:45 -0500, r0g <aioe...@technicalbloke.com> wrote:
> Hi there,
>
> I'm trying to validate some user input which is for the most part simple
> regexery however I would like to check filenames and I would like this
> code to be multiplatform.
>
> I had hoped the os module would have a function that would tell me if a
> proposed filename would be valid on the host system but it seems not. I
> have considered whitelisting but it seems a bit unfair to make the rest
> of the world suffer the naming restrictions of windows. Moreover it
> seems both inelegant and hard work to research the valid file/directory
> naming conventions of every platform that this app could conceivably run
> on and write regex's for all of them so...
>
> I'm tempted to go the witch dunking route, stick it in an open() between
> a try: & except: and see if it floats. However...
>
> Although it's a desktop (not internet facing) app I'm a little squeamish
> piping raw user input into a filesystem function like that and this app
> will be dealing with some particularly sensitive data so I want to be
> careful and minimize exposure where practical.

Take the Unix 'ls' command (or MS-DOS 'dir'). That's two programs
which let users pipe raw input into the filesystem functions, and they
certainly have handled some very sensitive data over the years.

> Has programming PHP and Web stuff for years made me overly paranoid
> about this [...]

Yes. ;-)

Please explain one thing: what are you looking for? It's not
"accesses a file outside the user's home directory", "accesses an
infinite file like /dev/zero" or something like that, or you would
have said so. Nor does the "user" input seem to come from some other user
than the one your program is running as, nor from some input source
which the user cannot be held responsible for.

Seems to me you simply want to know beforehand that the reading will
work. But you can never check that! You can stat(2) the file, or
open-and-close it -- and then a microsecond later, someone deletes the
file, or replaces it with another one, or write-protects it, or mounts
a file system on top of its directory, or drops a nuke over the city,
or ...
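
One way to sidestep the check-then-use gap, sketched under the assumption
of Python 3.3 or later: instead of testing whether the file exists and
then opening it, ask the OS to do both in a single step with mode "x".
The helper name create_exclusively is hypothetical:

```python
def create_exclusively(path, data):
    """Create path and write data, failing if the file already exists.

    Mode "x" maps to an exclusive create, so the existence check and
    the creation happen as one operation rather than two.
    """
    try:
        with open(path, "x") as handle:
            handle.write(data)
    except FileExistsError:
        return False
    return True
```

Any other failure (bad name, no permission) still arrives as an OSError
from the open itself, which is the only point where the answer is reliable.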

Two more notes:

- os.open is not like os.system. If os.open ends up doing
anything other than trying to open the file corresponding to the
string you feed it, it's Python's fault, not yours.

Compare with a language (does Perl allow this?) where if the string
is "rm -rf /|", open will run "rm -rf /" and start reading its output.
*That* interface would have been

- if the OS ends up doing something different when calling open(2) or
creat(2) or whatever using that string, it's the OS's fault, not
yours.

Or am I missing something?

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!

r0g

Nov 25, 2008, 2:26:32 AM

No Jorgen, that's exactly what I needed to know i.e. that sending
unfiltered text to open() is not negligent or likely to allow any
badness to occur.

As far as what I was looking for: I was not looking for anything in
particular as I couldn't think of any specific cases where this could be
a problem however... my background is websites (where input sanitization
is rule number one) and some of the web exploits I've learned to
mitigate over the years aren't ones I would have necessarily figured out
for myself, e.g. CSRF. So I thought I'd ask you guys in case there's
anything I haven't considered that I should consider! Thankfully it
seems I don't have too much to worry about :-)

The only situation where I can foresee potential for mischief is if the
program, or part thereof, is running as a more privileged user than the
user it is accepting input from. Thankfully I don't think that will be
necessary in the prog I'm working on right now as I don't need packet
capture / low numbered ports etc.

Thanks for your answer and thanks to everybody else for all their
comments too.

Roger.

Lawrence D'Oliveiro

Nov 25, 2008, 2:40:57 AM

Jorgen Grahn wrote:

> Seems to me you simply want to know beforehand that the reading will
> work. But you can never check that! You can stat(2) the file, or
> open-and-close it -- and then a microsecond later, someone deletes the
> file, or replaces it with another one, or write-protects it, or mounts
> a file system on top of its directory, or drops a nuke over the city,
> or ...

Depends on what exactly you're trying to guard against. Your comments would apply, for example, to a set-uid program being run by a potentially hostile local user (except that Linux doesn't allow set-uid scripts).

But if the script is being run, for example, via a Web interface, where processes on the local system can be trusted but the remote user cannot, then it is perfectly legitimate to use calls like stat(2) to enforce your own permission checks before allowing an operation.
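
One example of such an application-level check, offered only as a sketch:
confining a remote user's path to a base directory before touching it.
This assumes Python 3.5 or later for os.path.commonpath, and the function
name is_inside is hypothetical:

```python
import os

def is_inside(base_dir, user_path):
    """Return True if user_path, resolved against base_dir, stays within it."""
    base = os.path.realpath(base_dir)
    # realpath resolves ".." components and symlinks before comparing.
    target = os.path.realpath(os.path.join(base, user_path))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

As noted earlier in the thread, a check like this is only as good as the
assumption that nothing on the local system changes the path between the
check and the use.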

Jorgen Grahn

Nov 25, 2008, 4:58:27 PM

On Tue, 25 Nov 2008 20:40:57 +1300, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> Jorgen Grahn wrote:
>
>> Seems to me you simply want to know beforehand that the reading will
>> work. But you can never check that! You can stat(2) the file, or
>> open-and-close it -- and then a microsecond later, someone deletes the
>> file, or replaces it with another one, or write-protects it, or mounts
>> a file system on top of its directory, or drops a nuke over the city,
>> or ...
>
> Depends on what exactly you're trying to guard against. Your
> comments would apply, for example, to a set-uid program being run by a
> potentially hostile local user

Yeah, I know. I covered that in the part you snipped: "Nor does the
'user' input seem to come from some other user than the one your program
is running as, nor from some input source which the user cannot be held
responsible for."

/Jorgen

Jorgen Grahn

Nov 25, 2008, 5:12:28 PM

On Tue, 25 Nov 2008 02:26:32 -0500, r0g <aioe...@technicalbloke.com> wrote:
> Jorgen Grahn wrote:
...

>> Or am I missing something?

> No Jorgen, that's exactly what I needed to know i.e. that sending
> unfiltered text to open() is not negligent or likely to allow any
> badness to occur.
>
> As far as what I was looking for: I was not looking for anything in
> particular as I couldn't think of any specific cases where this could be
> a problem however... my background is websites (where input sanitization
> is rule number one) and some of the web exploits I've learned to
> mitigate over the years aren't ones I would have necessarily figured out
> for myself, e.g. CSRF

I have no idea what CSRF is, but I know what you mean. And it applies
in the safe and cozy Unix account world too -- that the exploits are
surprising, I mean. Maybe I made it out to be *too* safe in my
previous posting. But still ...

> So I thought I'd ask you guys in case there's
> anything I haven't considered that I should consider! Thankfully it
> seems I don't have too much to worry about :-)

... no, in this case you're just doing what everybody else does,
and you have no alternative plan (filter for what?)

There ought to be some list "common attacks on applications run by
local Unix users" which one could learn from. Maybe it's not obvious
that the content of a local file should, in many situations, be
handled as untrusted. In the meantime, there's things like this:

http://www.debian.org/security/2008/

Many of them are local exploits.

News123

Nov 25, 2008, 5:37:25 PM

Jorgen Grahn wrote:
> Compare with a language (does Perl allow this?) where if the string
> is "rm -rf /|", open will run "rm -rf /" and start reading its output.
> *That* interface would have been

Good example (for Perl).

The problem doesn't exist in Python:
open("rm -rf / |") would try to open a file with exactly that name and
it would fail if it doesn't exist.
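
A quick way to see this, assuming Python 3 (the function name
open_is_literal is made up for the example). The string is treated purely
as a path, so the call fails with an OSError subclass and no command runs:

```python
def open_is_literal(name="rm -rf / |"):
    """Show that open() treats its argument as a path, never as a command."""
    try:
        with open(name):
            pass
    except OSError as exc:
        # On Unix this is typically FileNotFoundError; Windows rejects
        # the "|" character in the name with a different OSError.
        return type(exc).__name__
    return "opened"
```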

In Perl the script author has the choice to be safe (three-argument
open) or to allow stupid or nice things with a two-argument open.

In Perl:
open($fh, "rm -rf / |") would execute the command "rm -rf /" and pass
its output to Perl.

In Perl:
open($fh, "<", "rm -rf / |") would work as in Python.

The only similar pitfall in Python would be popen() in a context like

    filename = userinput()
    p = os.popen("md5sum " + filename)

where you would get unexpected behavior if filename were something like
"bla ; rm -rf /"
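
For this particular case the shell can be avoided entirely. A sketch,
assuming Python 3 and that an MD5 digest is all that is wanted from
md5sum; the helper name md5_of_file is hypothetical:

```python
import hashlib

def md5_of_file(filename, chunk_size=65536):
    """Return the hex MD5 digest of a file without invoking any shell."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Here a name such as "bla ; rm -rf x" is just an unusual file name, since
it only ever reaches open().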

Sometimes I miss the 'dangerous variation' in Python, and I explicitly
add code so that the filename '-' is treated as stdin for files to be
read and as stdout for files to be written to.
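
One possible shape for such a helper, as a sketch only (the name open_dash
and its behavior are assumptions, not code from this thread):

```python
import contextlib
import sys

@contextlib.contextmanager
def open_dash(name, mode="r"):
    """Yield stdin/stdout when name is '-', else a regular file.

    Regular files are closed on exit; the standard streams are left open.
    """
    if name == "-":
        writing = any(flag in mode for flag in "wax")
        yield sys.stdout if writing else sys.stdin
    else:
        with open(name, mode) as handle:
            yield handle
```

Because the special case is an explicit opt-in, a literal file named "-"
can still be reached by writing it as "./-".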

bye N

Jorgen Grahn

Nov 26, 2008, 9:00:06 AM

On Tue, 25 Nov 2008 23:37:25 +0100, News123 <new...@free.fr> wrote:
> Jorgen Grahn wrote:
>> Compare with a language (does Perl allow this?) where if the string
>> is "rm -rf /|", open will run "rm -rf /" and start reading its output.
>> *That* interface would have been

> Good example. (for perl):

I should actually have removed that paragraph from my posting.
I was about to write "*That* interface would have been dangerous!" but
then I thought "Hm, isn't the user supposed to be in control of that
string, and isn't it his fault if he enters 'rm -rf /|', just as if
he entered the name of his most valuable file?"

I don't know ...

> The problem doesn't exist in python
> open("rm -rf / |") would try to open a file with exactly that name and
> it would fail if it doesn't exist.
>
> In perl the perl script author has the choice to be safe (three argument
> open) or to allow stupid or nice things with a two argument open.

...

> Sometimes I miss the 'dangerous variation' in python and I explicitly
> add code in python that the filename '-' will be treated as stdin for
> files to be read and as stdout for files to be written to

That's something I frequently do, too. And I see no harm in it, if I
document it and people expect it (for those who don't know, reserving
'-' for this is a Unix tradition).