Yeah, I think there are a lot of people out there who would like
something like this, but it's not quite clear how to go about it. If
you search Google Groups, there are a lot of examples of how you can use
Python's object introspection to retrieve "unsafe" functions.
I wish there was a way to, say, exec something with no builtins and with
import disabled, so you would have to specify all the available
bindings, e.g.:
exec user_code in dict(ClassA=ClassA, ClassB=ClassB)
but I suspect that even this wouldn't really solve the problem, because
you can do things like:
py> class ClassA(object):
... pass
...
py> object, = ClassA.__bases__
py> object
<type 'object'>
py> int = object.__subclasses__()[2]
py> int
<type 'int'>
so you can retrieve a lot of the builtins. I don't know how to retrieve
__import__ this way, but as soon as you figure that out, you can then
do pretty much anything you want to.
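
A minimal sketch of the same escape under Python 3, where exec is a function rather than a statement. The names user_code, namespace and recovered are made up for illustration, and int is looked up by name because the index into __subclasses__() differs between interpreter versions:

```python
class ClassA(object):
    pass

# Hypothetical untrusted snippet: it is handed only ClassA, yet it climbs
# to object via __bases__ and then searches object's subclasses for int.
user_code = """
base, = ClassA.__bases__
recovered = [c for c in base.__subclasses__() if c.__name__ == 'int'][0]
"""

# No builtins are exposed, only the single whitelisted binding.
namespace = {"__builtins__": {}, "ClassA": ClassA}
exec(user_code, namespace)

print(namespace["recovered"])  # <class 'int'>
```

Emptying __builtins__ removes the names, but not the objects reachable through attribute access on whatever is still provided.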
Steve
Steve
Safe eval recipe posted to cookbook:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/364469
Couldn't safe exec be programmed similarly?
'import' and 'from' are syntax, so trivially avoided
Likewise, function calls are easily intercepted
As you say, attribute access to core functions appears to present the
challenge. It is easy to intercept attribute access, harder to know
what's safe. If there were a known set of 'dangerous' objects e.g.,
sys, file, os etc... then these could be checked by identity against any
attribute returned
Of course, execution would be painfully slow, due to
double-interpretation.
Michael
Steve
This recipe only evaluates constant expressions:
"Description:
Evaluate constant expressions, including list, dict and tuple using the
abstract syntax tree created by compiler.parse"
It means you can't eval arbitrary Python code -- it's basically just a
data parser. Handy in some situations, but not the equivalent of a
limited Python virtual machine.
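
To make the "data parser" point concrete, here is a rough sketch of a constant-only evaluator. It is not the recipe's code: the recipe uses compiler.parse, while this sketch assumes the Python 3 ast module, and const_eval is a made-up name. The standard library's ast.literal_eval offers similar behaviour.

```python
import ast

def const_eval(source):
    """Evaluate a literal made only of constants, lists, tuples and dicts."""
    def build(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.List):
            return [build(e) for e in node.elts]
        if isinstance(node, ast.Tuple):
            return tuple(build(e) for e in node.elts)
        if isinstance(node, ast.Dict):
            return {build(k): build(v) for k, v in zip(node.keys, node.values)}
        # Names, calls, attribute access, operators: all rejected.
        raise ValueError("unsupported syntax: " + type(node).__name__)
    return build(ast.parse(source, mode="eval").body)

print(const_eval("[1, {'a': (2, 3)}]"))  # [1, {'a': (2, 3)}]
```

Anything that is not a literal, such as a function call, falls through to the ValueError, which is why this is a parser for data rather than a restricted interpreter.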
> Likewise, function calls are easily intercepted
I'm not sure I follow this... How do you intend to intercept all
function calls?
> As you say, attribute access to core functions appears to present the
> challenge. It is easy to intercept attribute access, harder to know
> what's safe. If there were a known set of 'dangerous' objects e.g.,
> sys, file, os etc... then these could be checked by identity against any
> attribute returned
It sounds like you're suggesting overriding the global attribute access
mechanism. Is that right? So that every time Python encountered an
attribute access, you would verify that the attribute being accessed is
not on the 'dangerous' list? I don't know how to do that without
basically rewriting some of Python's C code, though certainly I'm no
expert in the area...
Also, I'm not sure identity is sufficient:
py> import sys
py> import new
py> new.module('newsys')
py> newsys = new.module('newsys')
py> newsys.__dict__.update(sys.__dict__)
py> newsys is sys
False
py> newsys == sys
False
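
The new module used above is gone in Python 3. A sketch of the same experiment, assuming types.ModuleType as the replacement:

```python
import sys
import types

# Build a fresh module object and copy everything from sys into it.
newsys = types.ModuleType("newsys")
newsys.__dict__.update(sys.__dict__)

print(newsys is sys)  # False
# The copy is a different object, but it carries the same functions.
print(newsys.getrecursionlimit() == sys.getrecursionlimit())  # True
```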
Steve
Ahh, gotcha. Thanks for the clarification.
I haven't ever spent much time dealing with Python's ASTs, but my guess
is doing anything here is probably worth putting off until the AST
branch is merged into main CVS for Python 2.5. (I understand there are
supposed to be some substantial changes, but I don't know exactly what
they are or what they affect.)
> Right - the crux of the problem is how to identify dangerous objects.
> My point is that if such a test is possible, then safe exec is very
> easily implemented within current Python. If it is not, then it is
> essentially impossible.
>
[snip]
>
> It might still be possible to have a reliable test within a
> problem-specific domain i.e., white-listing.
Yeah, that was basically my intent -- provide a white-list of the usable
objects. I wonder how complicated this would be... You also probably
have to white-list the types of all the attributes of the objects you
provide...
Steve
> >>This is a serious issue.
> >>
> >>It's also one that brings Tcl, mentioned several
> >>times in this thread, back into focus. Tcl presents
> >>the notion of "safe interpreter", that is, a sub-
> >>ordinate virtual machine which can interpret only
> >>specific commands. It's a thrillingly powerful and
> >>correct solution to the main problem Jeff and others
> >>have described.
> >
> > A better (and of course *vastly* more powerful but unfortunately only
> > a dream ;-) is a similarly limited python virtual machine.....
>
> Yeah, I think there are a lot of people out there who would like
> something like this, but it's not quite clear how to go about it. If
> you search Google Groups, there are a lot of examples of how you can use
> Python's object introspection to retrieve "unsafe" functions.
IMHO a safe Python would consist of a special mode that disallows all
system calls that could spy on or harm data (IO etc.) and imports of
non-whitelisted modules. Additionally, a loop counter in the interpreter
loop would ensure that the code does not stall the process/machine.
>>> sys.safecall(func, maxcycles=1000)
could enter the safe mode and call the func.
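
sys.safecall does not exist. The loop counter part of the idea can be roughly approximated in pure Python with sys.settrace, as in the sketch below. The names call_with_line_budget and LineBudgetExceeded are invented here, the budget counts executed source lines rather than interpreter cycles, and nothing in it restricts IO or imports:

```python
import sys

class LineBudgetExceeded(Exception):
    pass

def call_with_line_budget(func, max_lines=1000):
    """Call func(), aborting once it has executed more than max_lines lines."""
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        if event == "line":
            executed += 1
            if executed > max_lines:
                raise LineBudgetExceeded("budget of %d lines used up" % max_lines)
        return tracer  # keep tracing inside this frame and nested calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(previous)  # always restore the previous tracer

def spin():
    n = 0
    while True:  # would never finish without the budget
        n += 1
        n -= 1

try:
    call_with_line_budget(spin, max_lines=100)
except LineBudgetExceeded as exc:
    print("stopped:", exc)
```

Code running inside C functions is not traced, and the traced code can call sys.settrace itself if it can reach sys, so this illustrates the counter only and is not a sandbox.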
I am not sure how big the patch would be; it is mainly a C macro at the
beginning of every relevant function that checks the current "mode" and
raises an exception if it is not correct. The import handler would need to
check if the module is whitelisted (based on the path etc.).
Python is too dynamic to get this working while just using tricks that
manipulate some builtins/globals etc.
Kind regards,
Alexander
Indeed. But it's easy to extend this to arbitrary constructs. You just need to
decide what code to emit for the other 50 or so ast node types. Many of those
are boiler-plate binops.
>
>> Likewise, function calls are easily intercepted
>
> I'm not sure I follow this... How do you intend to intercept all
> function calls?
Sorry, should have been more precise. In the AST, function calls have their own
node type, so it is easy to 'intercept' them and execute them conditionally.
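
A sketch of what intercepting the call node could look like with the Python 3 ast module (the thread predates it, so this is an assumption about the approach, and ALLOWED_CALLS, CallChecker and check_calls are made-up names). It only vets calls before execution; it does not interpret the code:

```python
import ast

ALLOWED_CALLS = {"len", "min", "max"}  # hypothetical whitelist

class CallChecker(ast.NodeVisitor):
    def visit_Call(self, node):
        func = node.func
        # Only plain names from the whitelist may be called; method calls
        # and calls on computed expressions are rejected outright.
        if not (isinstance(func, ast.Name) and func.id in ALLOWED_CALLS):
            raise ValueError("call not allowed: " + ast.dump(func))
        self.generic_visit(node)  # also check calls nested in the arguments

def check_calls(source):
    CallChecker().visit(ast.parse(source))

check_calls("x = max(len([1, 2]), 3)")  # passes silently
```

Checking the callee by name says nothing about which object that name is bound to at run time, which is the identification problem discussed in this thread.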
>
[snip]
>
> It sounds like you're suggesting overriding the global attribute access
> mechanism. Is that right? So that every time Python encountered an
> attribute access, you would verify that the attribute being accessed is
> not on the 'dangerous' list?
Just in the context of the AST-walker, yes
> I don't know how to do that without
> basically rewriting some of Python's C code, though certainly I'm no
> expert in the area...
Not messing with the CPython interpreter
>
> Also, I'm not sure identity is sufficient:
>
> py> import sys
> py> import new
> py> new.module('newsys')
> py> newsys = new.module('newsys')
> py> newsys.__dict__.update(sys.__dict__)
> py> newsys is sys
> False
> py> newsys == sys
> False
Right - the crux of the problem is how to identify dangerous objects. My point
is that if such a test is possible, then safe exec is very easily implemented
within current Python. If it is not, then it is essentially impossible.
Let's assume that it is indeed not possible to know in general whether an object
is safe, either by inspecting its attributes, or by matching its identity
against a black list.
It might still be possible to have a reliable test within a problem-specific
domain i.e., white-listing. This, I think, is what you meant when you said:
> I wish there was a way to, say, exec something with no builtins and with import disabled, so you would have to specify all the available bindings, e.g.:
>
> exec user_code in dict(ClassA=ClassA, ClassB=ClassB)
I believe that if you can come up with a white-list, then the rest of the
problem is easy.
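
As an illustration of the white-list idea, and not a proven sandbox, the sketch below accepts an expression only if every name it mentions is in the supplied bindings and no attribute name starts with an underscore. run_whitelisted is an invented name and the Python 3 ast module is assumed:

```python
import ast
from types import SimpleNamespace

def run_whitelisted(expr, bindings):
    """Evaluate expr with access to the given bindings and nothing else."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id not in bindings:
            raise NameError("name not white-listed: " + node.id)
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError("attribute not allowed: " + node.attr)
    code = compile(tree, "<user>", "eval")
    return eval(code, {"__builtins__": {}}, dict(bindings))

point = SimpleNamespace(x=2, y=5)  # stand-in for a white-listed object
print(run_whitelisted("p.x + p.y", {"p": point}))  # 7
```

The check is purely syntactic, so it is only as good as the white-list: if a binding such as getattr is provided, attribute names can be built from strings at run time and the underscore test never sees them.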
Michael
I'll suggest yet another perspective: add another indirection.
As the virtual machine becomes more available to introspection,
it might become natural to define a *very* restricted interpreter
which we can all agree is safe, PLUS a means to extend that
specific instance of the VM with, say, new definitions of bindings
for particular AST nodes. Then the developer has the means to
"build out" his own VM in a way he can judge useful and safe for
his own situation. Rather than the Java there-is-one-"safe"-for-
all approach, Pythoneers would have the tools to create safety.
That does sound good. And evolutionary, because the very restricted VM could be
implemented today (in Python), and subsequently PyPy (or whatever) could
optimize it.
The safe eval recipe I referred to earlier in the thread is IMO a trivial
example of this approach. Of course, its restrictions are extreme - only
constant expressions, but it is straightforwardly extensible to any subset of
the language.
The limitation that I see with this approach is that it is not, in general,
syntax that is safe or unsafe (with the notable exception of 'import' and its
relatives). Rather, it is the library objects, especially the built-ins, that
present the main source of risk.
So, if I understand your suggestion, it would require assessing the safety of
the built-in objects, as well as providing an interpreter that could control
access to them, possibly with fine-grain control at the attribute level.
M
> >>> sys.safecall(func, maxcycles=1000)
> could enter the safe mode and call the func.
This might be even enhanced like this:
>>> import sys
>>> sys.safecall(func, maxcycles=1000,
allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
allowed_modules=['_sre'])
Every access to objects that are not in the specified domains is
restricted by the interpreter. Additionally, external modules (which are
not expected to be "decorated" with those security checks) have to be in the
modules whitelist to work flawlessly (i.e. not generate exceptions).
Any comments about this from someone who already hacked CPython?
Kind regards,
Alexander
Yes, this comes up every couple months and there is only one answer:
This is the job of the OS.
Java largely succeeds at doing sandboxy things because it was written that
way from the ground up (to behave both like a program interpreter and an OS).
Python the language was not, and the CPython interpreter definitely was not.
Search groups.google.com for previous discussions of this on c.l.py
-Jack
Could you give some useful queries? Every time I do this search, I get
a few results, but never anything that really goes into the security
holes in any depth. (They're usually something like "look, given
object, I can get int", not "look, given object, I can get eval,
__import__, etc.")
Steve
A search on "rexec bastion" will give you most of the threads;
search on "rexec bastion diederich" to see the other times I tried to
stop the threads by recommending reading the older ones *wink*.
Thread subjects:
Replacement for rexec/Bastion?
Creating a capabilities-based restricted execution system
Embedding Python in Python
killing thread ?
-Jack
See the past threads I recommend in another just-posted reply.
Common browser implementations of Javascript have almost no features, can't
import C-based libraries, and can easily enter endless loops or eat all
available memory. You could make a fork of python that matches that feature
set, but I don't know why you would want to.
-Jack
Thanks for the keywords -- I hadn't tried anything like any of these.
Unfortunately, they leave me with the same feeling as before... The
closest example that I saw that actually showed a security hole made use
of __builtins__. As you'll note from the beginning of this thread, I
was considering the case where no builtins are provided and imports are
disabled.
I also read a number of messages that had the same problems I do -- too
many threads just say "look at google groups", without saying what to
search for. They also often spend most of their time talking about
abstract problems, without showing code that illustrates how to break
the "security". For example, I never found anything close to describing
how to retrieve, say, 'eval' or '__import__' given only 'object'.
What would be really nice is a wiki that had examples of how to derive
"unsafe" functions from 'object'. I'd be glad to put one together, but
so far, I can't find many examples... If you want to consider reading
and writing of files as "unsafe", then I guess this might be one:
file = object.__subclasses__()[16]
If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
to 'eval' or '__import__', that would help out a lot...
Steve
I already wrote about the "RestrictedPython" which is part of Zope,
didn't I?
Please search the archive to find a description...
Dieter
Not in this thread.
> Please search the archive to find a description...
>
Interesting. I'd be interested in whether it requires a full Zope
install and how easy (or otherwise) it is to setup. I'll investigate.
Regards,
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
>
> Dieter
>>> object.__subclasses__()
[<type 'type'>, <type 'weakref'>, <type 'int'>, <type 'basestring'>,
<type 'list'>, <type 'NoneType'>, <type 'NotImplementedType'>, <type
'module'>, <type 'zipimport.zipimporter'>, <type 'posix.stat_result'>,
<type 'posix.statvfs_result'>, <type 'dict'>, <type 'function'>, <class
'site._Printer'>, <class 'site._Helper'>, <type 'set'>, <type 'file'>]
Traipse through these, find one class that has an unbound method, get
that unbound method's func_globals, bingo.
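
A sketch of that walk for Python 3, where unbound methods no longer exist and func_globals is spelled __globals__. The function name reach_builtins is made up, and which class turns up first depends on the interpreter and on what has been imported:

```python
import types

def reach_builtins():
    """Starting from object alone, find a Python-coded method and return
    the builtins namespace stored in its globals."""
    for cls in object.__subclasses__():
        for attr in vars(cls).values():
            if isinstance(attr, types.FunctionType):
                found = attr.__globals__.get("__builtins__")
                if found is None:
                    continue
                # __builtins__ may be the module itself or its dict.
                namespace = found if isinstance(found, dict) else vars(found)
                return cls, namespace
    return None, None

cls, namespace = reach_builtins()
print(cls, "eval" in namespace, "__import__" in namespace)
```

Methods implemented in C are skipped because they are not instances of types.FunctionType and carry no __globals__.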
Alex
So long as any Python modules are imported using the same restricted environment
their func_globals won't contain eval() or __import__ either.
And C methods don't have func_globals at all.
However, we're talking about building a custom interpreter here, so there's no
reason not to simply find the dangerous functions at the C-level and replace
their bodies with "PyErr_SetString(PyExc_Exception, "Access to this operation
not allowed in restricted build"); return NULL;".
Then it doesn't matter *how* you get hold of file(), it still won't work. (I can
hear the capabilities folks screaming already. . .)
Combine that with a pre-populated read-only sys.modules and a restricted custom
interpreter would be quite doable. Execute it in a separate process and things
should be fairly solid.
Cheers,
Nick.
--
Nick Coghlan | ncog...@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
Thanks for the help! I'd played around with object.__subclasses__ for
a while, but I hadn't realized that func_globals was what I should be
looking for.
Here's one route to __builtins__:
py> string_Template = object.__subclasses__()[17]
py> builtins = string_Template.substitute.func_globals['__builtins__']
py> builtins['eval']
<built-in function eval>
py> builtins['__import__']
<built-in function __import__>
Steve
> Alex Martelli wrote:
> > Steven Bethard <steven....@gmail.com> wrote:
> > ...
> >
> >>If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
> >>to 'eval' or '__import__', that would help out a lot...
> >
> >>>>object.__subclasses__()
...
> > Traipse through these, find one class that has an unbound method, get
> > that unbound method's func_globals, bingo.
>
> So long as any Python modules are imported using the same restricted
> environment their func_globals won't contain eval() or __import__ either.
Sure, as long as you don't need any standard library module using eval
from Python (or can suitably restrict them or the eval they use), etc,
you can patch up this specific vulnerability.
> And C methods don't have func_globals at all.
Right, I used "unbound method" in the specific sense of "instance of
types.UnboundMethodType" (bound ones or any Python-coded function you
can get your paws on work just as well).
> However, we're talking about building a custom interpreter here, so there's no
It didn't seem to me that Steven's question was so restricted; and since
he thanked me for my answer (which of course is probably inapplicable to
some custom interpreter that's not written yet) it appears to me that my
interpretation of his question was correct, and my answer useful to him.
> reason not to simply find the dangerous functions at the C-level and replace
> their bodies with "PyErr_SetString(PyExc_Exception, "Access to this operation
> not allowed in restricted build"); return NULL;".
>
> Then it doesn't matter *how* you get hold of file(), it still won't work.
> (I can hear the capabilities folks screaming already. . .)
Completely removing Python-level access to anything dangerous might be a
safer approach than trying to patch one access route after another, yes.
> Combine that with a pre-populated read-only sys.modules and a restricted
> custom interpreter would be quite doable. Execute it in a separate process
> and things should be fairly solid.
If you _can_ execute (whatever) in a separate process, then an approach
based on BSD's "jail" or equivalent features of other OS's may be able
to give you all you need, without needing other restrictions to be coded
in the interpreter (or whatever else you run in that process).
Alex
One thing my company has done is written a ``safe_eval()`` that uses a
regex to disable double-underscore access.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
"19. A language that doesn't affect the way you think about programming,
is not worth knowing." --Alan Perlis
will the regex catch getattr(object, 'subclasses'.join(['_'*2]*2)...?-)
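
For concreteness, a sketch of that bypass. The actual safe_eval is not shown in this thread, so naive_safe_eval below is a made-up stand-in that rejects any source text containing two consecutive underscores and exposes getattr and object:

```python
import re

DUNDER = re.compile(r"__")

def naive_safe_eval(expr):
    """Hypothetical regex-guarded eval: refuse any literal double underscore."""
    if DUNDER.search(expr):
        raise ValueError("double underscore not allowed")
    return eval(expr, {"__builtins__": {}, "getattr": getattr, "object": object})

# The attribute name is assembled at run time, so the source text never
# contains two underscores in a row and the regex has nothing to match.
bypass = "getattr(object, 'subclasses'.join(['_'*2]*2))"
method = naive_safe_eval(bypass)
print(int in method())  # True
```

The filter inspects the text of the expression, while the attribute lookup happens on a string computed during evaluation.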
Alex
Alex> will the regex catch getattr(object,
Alex> 'subclasses'.join(['_'*2]*2)...?-)
Now he has two problems. ;-)
Skip
I nearly asked that question, then I realised that 'getattr' is quite
easy to remove from the global namespace for the code in question, and
assumed that they had already thought of that.
Stephen.
OK then -- vars(type(object)) is a dict which has [[the unbound-method
equivalent of]] object.__subclasses__ at its entry for key
'__subclasses__'. Scratch 'vars' in addition to 'getattr'. And 'eval'
of course, or else building up the string 'object.__subclasses__' (in a
way the regex won't catch) then eval'ing it is easy. I dunno, maybe I'm
just being pessimistic, I guess...
Alex
Heheh. No. Then again, security is only as strong as its weakest link,
and that quick hack makes this part of our application as secure as the
rest.
> OK then -- vars(type(object)) is a dict which has [[the unbound-method
> equivalent of]] object.__subclasses__ at its entry for key
> '__subclasses__'. Scratch 'vars' in addition to 'getattr'. And 'eval'
> of course, or else building up the string 'object.__subclasses__' (in a
> way the regex won't catch) then eval'ing it is easy. I dunno, maybe I'm
> just being pessimistic, I guess...
You can defeat the regexp without any builtin besides object:
>>> eval("# coding: utf7\n"
"+AG8AYgBqAGUAYwB0AC4AXwBfAHMAdQBiAGMAbABhAHMAcwBlAHMAXwBf-")
<built-in method __subclasses__ of type object at 0x81010e0>
>>>
Bernhard
--
Intevation GmbH http://intevation.de/
Skencil http://skencil.org/
Thuban http://thuban.intevation.org/
No, I think you are being realistic. I thought one of the basic tenets of
computer security was "that which is not expressly allowed is forbidden".
Any attempt at security that attempts to find and plug the security holes
while leaving the basic insecure system intact is almost certainly going to
miss something.
Skip
I guess security is drastically different from all other programming
spheres because you DO have an adversary, who you should presume to be
at least as clever as you are. In most tasks, good enough is good
enough and paranoia doesn't pay; when an adversary IS there, only the
paranoid survive...;-)
Alex
> Fuzzyman wrote:
> > Cameron Laird wrote:
> > [snip..]
> >
> >>This is a serious issue.
> >>
> >>It's also one that brings Tcl, mentioned several
> >>times in this thread, back into focus. Tcl presents
> >>the notion of "safe interpreter", that is, a sub-
> >>ordinate virtual machine which can interpret only
> >>specific commands. It's a thrillingly powerful and
> >>correct solution to the main problem Jeff and others
> >>have described.
> >
> > A better (and of course *vastly* more powerful but unfortunately only
> > a dream ;-) is a similarly limited python virtual machine.....
>
> Yeah, I think there are a lot of people out there who would like
> something like this, but it's not quite clear how to go about it. If
> you search Google Groups, there are a lot of examples of how you can use
> Python's object introspection to retrieve "unsafe" functions.
Wouldn't it be better to attach to all code objects some kind of access right
marker and to create an opcode that calls a function while reducing the
access rights? After all, security would be easier to achieve if you
prevented the execution of all the dangerous code rather than trying to
hide all the entry points to it.
Yes, I'd stopped following the thread for a bit, and the discussion had moved
further afield than I realised :)
> If you _can_ execute (whatever) in a separate process, then an approach
> based on BSD's "jail" or equivalent features of other OS's may be able
> to give you all you need, without needing other restrictions to be coded
> in the interpreter (or whatever else you run in that process).
I think that's where these discussions have historically ended. . . making a
Python-specific sandbox gets complicated enough that it ends up making more
sense to just use an OS-based sandbox that lets you execute arbitrary binaries
relatively safely.
The last suggestion I recall along these lines was chroot() plus a monitoring
daemon that killed the relevant subprocess if it started consuming too much
memory or looked like it had got stuck in an infinite loop.
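
The "kill it if it looks stuck" half of that suggestion can be sketched with the standard subprocess module, as below. run_untrusted is an invented name, the chroot and the memory monitoring are left out entirely, and a wall-clock timeout stands in for the monitoring daemon:

```python
import subprocess
import sys

def run_untrusted(source, timeout_s=5.0):
    """Run source in a separate interpreter process; return None on timeout."""
    try:
        return subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None

result = run_untrusted("print(6 * 7)")
print(result.stdout.strip())

print(run_untrusted("while True: pass", timeout_s=0.5))  # None
```

The child still has the full privileges of the parent's user account, so the OS-level confinement discussed in this thread would have to be added around it.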
"Yes, but" -- that ``if'' at the start of this quote paragraph of mine
is, I believe, a meaningful qualification. It is not obvious to me that
all applications and platforms can usefully execute untrusted Python
code in a separate jail'd process; so, I think there would still be use
cases for an in-process sandbox, although it's surely true that making
one would not be trivial.
Alex
The Xen virtual server[1] was recently mentioned on slashdot[2].
It is more lightweight and faster than full scale machine emulators because
it uses a modified system kernel (so it only works on *nixes it has been
ported to). You can set the virtual memory of each instance to keep
programs from eating the world. I don't know about CPU, you might still
have to monitor & kill instances that peg the CPU.
If anyone does this, a HOWTO would be appreciated!
-Jack
...it also uses python for its control programs.
--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick