Simple and safe evaluator

bvdp

Jun 11, 2008, 4:25:32 PM

Is there a simple/safe expression evaluator I can use in a Python
program? I just want to pass along a string in the form "1 + 44 / 3" or
perhaps "1 + (-4.3*5)" and get a numeric result.

I can do this with eval() but I really don't want to subject my users to
the problems with that method.

In this use I don't need python to worry about complex numbers,
variables or anything else. Just do the math on a set of values. Would
eval() with some restricted list of permitted operators do the trick?

I'm feeling too lazy to write/debug my own parser for this :)

Thanks, Bob.

Simon Forman

Jun 11, 2008, 5:05:59 PM

Funny, I need exactly the same kind of parser myself right now.
Fredrik Lundh has posted some code-and-explanation on an excellent
simple parser that's easy to extend. http://effbot.org/zone/simple-iterator-parser.htm

Just make it recognize the operator tokens you're interested in, and if
a string parses w/o errors then you know it's safe to eval().

I probably won't get to writing this myself for a few days or a week,
but if you do will you post it here (or send me a copy)? I'll do the
same if I get to it sooner.

Regards,
~Simon

Matimus

Jun 11, 2008, 5:30:56 PM

On Jun 11, 1:25 pm, bvdp <b...@mellowood.ca> wrote:

Here is something that I wrote using the _ast module. It works pretty
well, and might be a good example for others wanting to experiment
with the _ast module. On a related note... if anybody wants to provide
feedback on this code it would be much appreciated. It involves a lot
of if/elif branches, and feels ugly.

Matt

[code]
import _ast

class SafeEvalError(Exception):
    pass

class UnsafeCode(SafeEvalError):
    pass

# safe types:
#   Sequences:
#       list, tuple, dict, set, frozen_set*
#   Literals: str, unicode, int, long, complex, float
def safe_eval(text):
    "similar to eval, but only works on literals"
    ast = compile(text, "<string>", 'exec', _ast.PyCF_ONLY_AST)
    return _traverse(ast.body[0].value)

def _traverse(ast):
    if isinstance(ast, _ast.List):
        return [_traverse(el) for el in ast.elts]
    elif isinstance(ast, _ast.Tuple):
        return tuple(_traverse(el) for el in ast.elts)
    elif isinstance(ast, _ast.Dict):
        return dict(
            zip(
                (_traverse(k) for k in ast.keys),
                (_traverse(v) for v in ast.values)
            )
        )
    elif isinstance(ast, _ast.Str):
        return ast.s
    elif isinstance(ast, _ast.Num):
        return ast.n
    elif isinstance(ast, _ast.Expr):
        return _traverse(ast.value)
    elif isinstance(ast, _ast.BinOp):
        if isinstance(ast.op, _ast.Add):
            return _traverse(ast.left) + _traverse(ast.right)
        elif isinstance(ast.op, _ast.Sub):
            return _traverse(ast.left) - _traverse(ast.right)
        elif isinstance(ast.op, _ast.Div):
            return _traverse(ast.left) / _traverse(ast.right)
        elif isinstance(ast.op, _ast.FloorDiv):
            return _traverse(ast.left) // _traverse(ast.right)
        elif isinstance(ast.op, _ast.Mod):
            return _traverse(ast.left) % _traverse(ast.right)
        elif isinstance(ast.op, _ast.Mult):
            return _traverse(ast.left) * _traverse(ast.right)
        elif isinstance(ast.op, _ast.Pow):
            return _traverse(ast.left) ** _traverse(ast.right)
        elif isinstance(ast.op, _ast.BitAnd):
            return _traverse(ast.left) & _traverse(ast.right)
        elif isinstance(ast.op, _ast.BitOr):
            return _traverse(ast.left) | _traverse(ast.right)
        elif isinstance(ast.op, _ast.BitXor):
            return _traverse(ast.left) ^ _traverse(ast.right)
        elif isinstance(ast.op, _ast.LShift):
            return _traverse(ast.left) << _traverse(ast.right)
        elif isinstance(ast.op, _ast.RShift):
            return _traverse(ast.left) >> _traverse(ast.right)
    elif isinstance(ast, _ast.BoolOp):
        if isinstance(ast.op, _ast.And):
            return all(_traverse(v) for v in ast.values)
        if isinstance(ast.op, _ast.Or):
            return any(_traverse(v) for v in ast.values)
    elif isinstance(ast, _ast.UnaryOp):
        if isinstance(ast.op, _ast.Invert):
            # bitwise NOT (the ~ was missing here)
            return ~_traverse(ast.operand)
        if isinstance(ast.op, _ast.USub):
            return -_traverse(ast.operand)
        if isinstance(ast.op, _ast.UAdd):
            return +_traverse(ast.operand)
        if isinstance(ast.op, _ast.Not):
            return not _traverse(ast.operand)

    # anything not handled above is rejected
    raise UnsafeCode()

if __name__ == "__main__":
    print safe_eval("[1,2,3,{'hello':1}, (1,-2,3)], 4j, 1+5j, ~1+2*3")
[/code]

bvdp

Jun 11, 2008, 5:49:57 PM

Oh, this is interesting. Similar to some other code I found on the web
which grabs a list of permitted token values using the dis module:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286134

I was really hoping for a builtin on this :)

Thanks.

bvdp

Jun 11, 2008, 5:53:41 PM

I'll have to read Fredrik's code a few more times, but I think it makes
as much sense as anything else. Of course, I could take the lazy man's
way out and just do a left->right evaluation without any ()s, etc.,
which in my project would work. But, honestly, I thought it'd be easier.
I was going to use eval() until I realized that it was not a good idea.
Darn shame we have to work so hard to prevent some jerk's malicious code
from affecting our stuff. Oh well, that's life.

bvdp

Jun 11, 2008, 7:38:58 PM

I'm finding my quest for a safe eval() quite frustrating :)

Any comments on this: Just forget about getting Python to do this and,
instead, grab my set of values (from a user-supplied text file) and call
an external program like 'bc' to do the dirty work. I think that this
would prevent someone from embedding os.system("rm ...") in what I thought
would be a math expression and having it maybe do damage? Perhaps I'm
getting too paranoid in my old age.

I guess this would slow things down a bit, but that is not a big
concern. Bigger concern would be that I'm not sure if 'bc' or whatever
is guaranteed to be on other platforms than *nix. And if I want to be
really paranoid, I could worry that someone had planted a bad 'bc' on
the target.

Matimus

Jun 11, 2008, 8:05:22 PM

The solution I posted should work and is safe. It may not seem very
readable, but it is using Python's internal parser to parse the passed-in
string into an abstract syntax tree (rather than code). Normally
Python would just use the AST internally to create code. Instead I've
written the code that walks the tree and evaluates it. By rejecting
anything but simple operators and literals, it never runs arbitrary code.

If you only want numbers you can even remove a bunch of code:
[code]
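import _ast

class SafeEvalError(Exception):
    pass

class UnsafeCode(SafeEvalError):
    pass

def num_eval(text):
    "similar to eval, but only works on numerical values."
    ast = compile(text, "<string>", 'eval', _ast.PyCF_ONLY_AST)
    return _traverse(ast.body)

def _traverse(ast):
    # Num and BinOp branches carried over from the first listing
    if isinstance(ast, _ast.Num):
        return ast.n
    elif isinstance(ast, _ast.BinOp):
        if isinstance(ast.op, _ast.Add):
            return _traverse(ast.left) + _traverse(ast.right)
        elif isinstance(ast.op, _ast.Sub):
            return _traverse(ast.left) - _traverse(ast.right)
        elif isinstance(ast.op, _ast.Div):
            return _traverse(ast.left) / _traverse(ast.right)
        elif isinstance(ast.op, _ast.FloorDiv):
            return _traverse(ast.left) // _traverse(ast.right)
        elif isinstance(ast.op, _ast.Mod):
            return _traverse(ast.left) % _traverse(ast.right)
        elif isinstance(ast.op, _ast.Mult):
            return _traverse(ast.left) * _traverse(ast.right)
        elif isinstance(ast.op, _ast.Pow):
            return _traverse(ast.left) ** _traverse(ast.right)
        elif isinstance(ast.op, _ast.BitAnd):
            return _traverse(ast.left) & _traverse(ast.right)
        elif isinstance(ast.op, _ast.BitOr):
            return _traverse(ast.left) | _traverse(ast.right)
        elif isinstance(ast.op, _ast.BitXor):
            return _traverse(ast.left) ^ _traverse(ast.right)
        elif isinstance(ast.op, _ast.LShift):
            return _traverse(ast.left) << _traverse(ast.right)
        elif isinstance(ast.op, _ast.RShift):
            return _traverse(ast.left) >> _traverse(ast.right)
    elif isinstance(ast, _ast.UnaryOp):
        if isinstance(ast.op, _ast.Invert):
            return ~_traverse(ast.operand)
        elif isinstance(ast.op, _ast.USub):
            return -_traverse(ast.operand)
        elif isinstance(ast.op, _ast.UAdd):
            return +_traverse(ast.operand)

    raise UnsafeCode()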
[/code]

To use:
print num_eval("1 + 44 / 3")
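
For readers on current Python 3, where the public `ast` module replaces `_ast` and numeric literals parse to `ast.Constant`, the same idea can be sketched with a lookup table instead of the if/elif chain. This is only a sketch under those assumptions, not Matt's code: the operator whitelist in `_BIN_OPS` and `_UNARY_OPS` is an illustrative choice, and `/` is true division in Python 3, so results differ from the Python 2.5 output discussed in this thread.

```python
import ast
import operator

# Whitelisted operators (an assumed selection); anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


class UnsafeCode(Exception):
    pass


def num_eval(text):
    """Evaluate a purely numeric expression without calling eval()."""
    tree = ast.parse(text, mode="eval")
    return _traverse(tree.body)


def _traverse(node):
    if isinstance(node, ast.Constant):
        # Accept only int and float literals (bool, str, etc. fall through).
        if type(node.value) in (int, float):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        func = _BIN_OPS[type(node.op)]
        return func(_traverse(node.left), _traverse(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_traverse(node.operand))
    # Names, attribute access, calls and everything else end up here.
    raise UnsafeCode(ast.dump(node))


print(num_eval("1 + 44 / 3"))
print(num_eval("1 + (-4.3*5)"))
```

Adding an operator is then a one-line change to the table, and an expression such as `(42).__class__` is rejected because `ast.Attribute` is never handled.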

bvdp

Jun 11, 2008, 8:15:49 PM

Matimus wrote:

>
> The solution I posted should work and is safe. It may not seem very
> readable, but it is using Pythons internal parser to parse the passed
> in string into an abstract symbol tree (rather than code). Normally
> Python would just use the ast internally to create code. Instead I've
> written the code to do that. By avoiding anything but simple operators
> and literals it is guaranteed safe.
>

Just wondering ... how safe would:

eval(s, {"__builtins__":None}, {} )

be? From my testing it seems that it parses out numbers properly (int
and float) and does simple math like +, -, **, etc. It doesn't do
functions like int(), sin(), etc ... but that is fine for my purposes.

Just playing a bit, it seems to give the same results as your code using
ast does. I may be missing something!

Paul McGuire

Jun 11, 2008, 8:25:51 PM

This example ships with pyparsing, and can be extended to support
built-in functions: http://pyparsing.wikispaces.com/space/showimage/fourFn.py.

-- Paul

George Sakkis

Jun 12, 2008, 12:16:08 AM

Probably you do; within a couple of minutes I came up with this:

>>> s = """
... (t for t in 42 .__class__.__base__.__subclasses__()
... if t.__name__ == 'file').next()('/etc/passwd')
... """

>>> eval(s, {"__builtins__":None}, {} )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in <module>
IOError: file() constructor not accessible in restricted mode

Not an exploit yet but I wouldn't be surprised if there is one. Unless
you fully trust your users, an ast-based approach is your best bet.

George

Hans Nowak

Jun 12, 2008, 7:50:49 AM

bvdp wrote:
>
> Is there a simple/safe expression evaluator I can use in a python
> program. I just want to pass along a string in the form "1 + 44 / 3" or
> perhaps "1 + (-4.3*5)" and get a numeric result.
>
> I can do this with eval() but I really don't want to subject my users to
> the problems with that method.
>
> In this use I don't need python to worry about complex numbers,
> variables or anything else. Just do the math on a set of values. Would
> eval() with some restricted list of permitted operators do the trick?

This solution may be overly simple (especially compared to the AST-based
solution suggested earlier), but... if all you need is numbers and operators,
*maybe* you can get away with stripping all letters from the input string (and
possibly the underscore), and then evaluating it:

import re
import traceback

re_letters = re.compile("[a-zA-Z_]+")

def safe_eval(s):
    s = re_letters.sub("", s)
    return eval(s)

# try it out...

>>> safe_eval("2+2")
4

>>> safe_eval("4 * (8 / 3.1) ** 7.2")
3685.5618352828474

>>> safe_eval("(2).__class__.__base__.__subclasses__()")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "safe_eval.py", line 12, in safe_eval
    return eval(s)
  File "<string>", line 1
    (2)...()
        ^
SyntaxError: invalid syntax

...It's primitive, but it might work for your purposes.

--
Hans Nowak (zephyrfalcon at gmail dot com)
http://4.flowsnake.org/

Grant Edwards

Jun 12, 2008, 10:13:41 AM

On 2008-06-12, Hans Nowak <zephyrfalcon!NO_SPAM!@gmail.com> wrote:
> bvdp wrote:
>>
>> Is there a simple/safe expression evaluator I can use in a python
>> program. I just want to pass along a string in the form "1 + 44 / 3" or
>> perhaps "1 + (-4.3*5)" and get a numeric result.
>>
>> I can do this with eval() but I really don't want to subject my users to
>> the problems with that method.
>>
>> In this use I don't need python to worry about complex
>> numbers, variables or anything else. Just do the math on a set
>> of values. Would eval() with some restricted list of permitted
>> operators do the trick?
>
> This solution may be overly simply (especially compared to the
> AST-based solution suggested earlier), but... if all you need
> is numbers and operators, *maybe* you can get away with
> stripping all letters from the input string (and possibly the
> underscore), and then evaluating it:

It won't work for numbers expressed in scientific notation
(e.g. 1.23e-3).

--
Grant Edwards (grante at visi.com)
Yow! All right, you degenerates! I want this place evacuated in 20 seconds!
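
To make Grant's point concrete, here is a small sketch of what the letter-stripping regular expression from Hans's post does to a number in scientific notation. It reuses the `re_letters` pattern from that post; the input string is the one Grant mentions.

```python
import re

# Same pattern as in the letter-stripping safe_eval above.
re_letters = re.compile("[a-zA-Z_]+")

stripped = re_letters.sub("", "1.23e-3")
print(repr(stripped))   # the exponent marker "e" has been removed

# The stripped text is still a valid expression, so eval() does not fail;
# it quietly computes 1.23 minus 3 instead of 0.00123.
print(eval(stripped))
```

The failure is silent: no exception is raised, the user just gets the wrong number.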

Matimus

Jun 12, 2008, 1:28:46 PM

You can get access to any new-style class that has been loaded. This
exploit works on my machine (Windows XP).

[code]
# This assumes that ctypes was loaded, but keep in mind any classes
# that have been loaded are potentially accessible.

import ctypes

s = """
(
    t for t in 42 .__class__.__base__.__subclasses__()
    if t.__name__ == 'LibraryLoader'
).next()(
    (
        t for t in 42 .__class__.__base__.__subclasses__()
        if t.__name__ == 'CDLL'
    ).next()
).msvcrt.system('dir') # replace 'dir' with something nasty
"""

eval(s, {"__builtins__":None}, {})
[/code]

Matt

bvdp

Jun 12, 2008, 1:51:31 PM

Yes, this is probably a good point. But, I don't see this as an exploit
in my program. Again, I could be wrong ... certainly not the first time
that has happened :)

In my case, the only way a user can use eval() is via my own parsing
which restricts this to a limited usage. So, the code setting up the
eval() exploit has to be entered via the "safe" eval to start with. So,
IF the code you present can be installed from within my program's
scripts ... then yes there can be a problem. But for the life of me I
don't see how this is possible. In my program we're just looking at
single lines in a script and doing commands based on the text.
Setting/evaluating macros is one "command" and I just want a method to
do something like "Set X 25 * 2" and passing the "25 * 2" string to
python works. If the user creates a script with "Set X os.system('rm
*')" and I used a clean eval() then we could have a meltdown ... but if
we stick with the eval(s, {"__builtins__":None}, {}) I don't see how the
malicious script could do the class modifications you suggest.

I suppose that someone could modify my program code and then cause my
eval() to fail (be unsafe). But, if we count on program modifications to
be doorways to exploits then we might as well just pull the plug.

Bob.

George Sakkis

Jun 12, 2008, 3:14:50 PM

You probably missed the point in the posted examples. A malicious user
doesn't need to modify your program code to have access to far more
than you would hope, just devise an appropriate string s and pass it
to your "safe" eval.

Here's a simpler example to help you see the back doors that open. So
you might think that eval(s, {"__builtins__":None}, {}) doesn't
provide access to the `file` type. At first it looks so:

>>> eval('file', {"__builtins__":None}, {})
NameError: name 'file' is not defined
>>> eval('open', {"__builtins__":None}, {})
NameError: name 'open' is not defined

"Ok, I am safe from users messing with files since they can't even
access the file type" you reassure yourself. Then someone comes in and
passes to your "safe" eval this string:
>>> s = "(t for t in (42).__class__.__base__.__subclasses__() if t.__name__ == 'file').next()"
>>> eval(s, {"__builtins__":None}, {})
<type 'file'>

Oops.

Fortunately the file() constructor has apparently some extra logic
that prevents it from being used in restricted mode, but that doesn't
change the fact that file *is* available in restricted mode; you just
can't spell it "file".

25 builtin types are currently available in restricted mode -- without
any explicit import -- and this number has been increasing over the
years:

$ python2.3 -c 'print eval("(42).__class__.__base__.__subclasses__().__len__()", {"__builtins__":None}, {})'
13

$ python2.4 -c 'print eval("(42).__class__.__base__.__subclasses__().__len__()", {"__builtins__":None}, {})'
17

$ python2.5 -c 'print eval("(42).__class__.__base__.__subclasses__().__len__()", {"__builtins__":None}, {})'
25

$ python2.6a1 -c 'print eval("(42).__class__.__base__.__subclasses__().__len__()", {"__builtins__":None}, {})'
32

Regards,
George
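
The same traversal still works on Python 3, even though the `file` type itself no longer exists there. The following is a minimal sketch, assuming a stock CPython 3 interpreter; the variable names are illustrative. It only lists the reachable classes and does not try to use them.

```python
# Emptying __builtins__ hides names such as open, but attribute access on a
# literal still reaches object and, from there, every loaded subclass.
payload = "(42).__class__.__base__.__subclasses__()"
classes = eval(payload, {"__builtins__": None}, {})

names = {cls.__name__ for cls in classes}
print(len(classes), "classes reachable from a bare integer literal")
print("dict" in names, "type" in names)
```

The exact count depends on the interpreter version and on which modules have already been imported, which is the point George makes with the numbers above.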

bvdp

Jun 12, 2008, 3:31:35 PM

George Sakkis wrote:

> You probably missed the point in the posted examples. A malicious user
> doesn't need to modify your program code to have access to far more
> than you would hope, just devise an appropriate string s and pass it
> to your "safe" eval.

Oops, I did miss the point. I was assuming that the modifying stuff was
being done before the call to the eval(). I was wrong.

I'll have to get the ast based code incorporated into my code and just
use it. Darn, but it seems that each and every time one sees a simple
solution to a simple problem ... :)

Thanks.

bvdp

Jun 16, 2008, 4:47:30 PM

Okay guys. I have the _ast based safe eval installed and working in my
program. It appears to be working just fine. Thanks for the help.

Now, a few more questions:

1. I see that _ast is a 2.5 module?? So, for folks using my code with
<2.5 I could do something like this:

# I've got some imports here to look after the error() and warning() funcs ....

emsg_done = 0
etx = ""

def unsafe_eval(s):
    """ safe eval for < python 2.5 (lacks _ast) """
    global emsg_done
    if not emsg_done:
        warning("You are using an unsafe eval() function. Please "
                "upgrade to Python version 2.5 or greater.")
        emsg_done=1
    # need error trap here as well ...
    return eval(s, {"__builtins__":None}, {} )

def safe_eval(text):
    "similar to eval, but only works on numerical values."
    global etx
    try:
        ast = compile(text, "<string>", 'eval', _ast.PyCF_ONLY_AST)
    except:
        error("Expression error in '%s'" % text)
    etx = text # for error reporting, bvdp
    return _traverse(ast.body)

try:
    import _ast
    num_eval = safe_eval
except:
    num_eval = unsafe_eval

# rest of matt's ast code follows.

Which appears to do the following: if there isn't an _ast module we just
define an alternate, not-so-safe, function and warn the user; otherwise
we use the safe version. I'm a bit uncomfortable with the import _ast
being after the function which uses the code, but it seems to work.

2. I thought I'd be happy with * / + -, etc. Of course now I want to add
a few more funcs like int() and sin(). How would I do that?

Thanks. This is looking very nice indeed.

Bob.

George Sakkis

Jun 16, 2008, 5:30:37 PM

On Jun 16, 4:47 pm, bvdp <b...@mellowood.ca> wrote:

> 2. I thought I'd be happy with * / + -, etc. Of course now I want to add
> a few more funcs like int() and sin(). How would I do that?

For the builtin eval, just populate the globals dict with the names
you want to make available:

import math

globs = {'__builtins__' : None}

# expose selected builtins
for name in 'True False int float round abs divmod'.split():
    globs[name] = eval(name)

# expose selected math constants and functions
for name in 'e pi sqrt exp log ceil floor sin cos tan'.split():
    globs[name] = getattr(math,name)

return eval(s, globs, {})

The change to the _ast version is left as an exercise to the reader ;)

George

bvdp

Jun 16, 2008, 6:02:56 PM

George Sakkis wrote:
> On Jun 16, 4:47 pm, bvdp <b...@mellowood.ca> wrote:
>
>> 2. I thought I'd be happy with * / + -, etc. Of course now I want to add
>> a few more funcs like int() and sin(). How would I do that?
>
> For the builtin eval, just populate the globals dict with the names
> you want to make available:
>
> import math
>
> globs = {'__builtins__' : None}
>
> # expose selected builtins
> for name in 'True False int float round abs divmod'.split():
> globs[name] = eval(name)
>
> # expose selected math constants and functions
> for name in 'e pi sqrt exp log ceil floor sin cos tan'.split():
> globs[name] = getattr(math,name)
>
> return eval(s, globs, {})
>

Thanks. That was easy :)

> The change to the _ast version is left as an exercise to the reader ;)

And I have absolutely no idea on how to do this. I can't even find the
_ast import file on my system. I'm assuming that the _ast definitions
are buried in the C part of python, but that is just a silly guess.

Bob.
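
One way the "exercise" could be approached is to teach the tree walker about two more node types: names and calls. The sketch below is written for Python 3's `ast` module rather than the 2.5 `_ast` used in this thread, and the tables `_FUNCS` and `_CONSTS`, the function name `func_eval`, and the small operator set are all assumptions chosen for illustration.

```python
import ast
import math
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
# Only these names may appear in an expression (hypothetical whitelist).
_FUNCS = {"int": int, "abs": abs, "sqrt": math.sqrt, "sin": math.sin}
_CONSTS = {"pi": math.pi, "e": math.e}


def func_eval(text):
    """Evaluate numbers, + - * /, and calls to whitelisted functions."""
    return _walk(ast.parse(text, mode="eval").body)


def _walk(node):
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_walk(node.left), _walk(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_walk(node.operand)
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    # A call is allowed only when the callee is a bare whitelisted name,
    # so attribute lookups such as x.__class__ never get evaluated.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_walk(arg) for arg in node.args])
    raise ValueError("unsupported expression: %s" % ast.dump(node))


print(func_eval("int(2.7) + sqrt(16)"))
print(func_eval("2 * pi"))
```

Because the function objects come from the lookup table and never from the expression's own namespace, a string like `open('x')` is rejected without `open` ever being looked up.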

Message has been deleted

bvdp

Jun 16, 2008, 11:32:57 PM

swee...@acm.org wrote:

> On Jun 17, 8:02 am, bvdp <b...@mellowood.ca> wrote:
>
>> Thanks. That was easy :)
>>
>>> The change to the _ast version is left as an exercise to the reader ;)
>> And I have absolutely no idea on how to do this. I can't even find the
>> _ast import file on my system. I'm assuming that the _ast definitions
>> are buried in the C part of python, but that is just a silly guess.
>>
>> Bob.
>

> If you just need numeric expressions with a small number of functions,
> I would suggest checking the expression string first with a simple
> regular expression, then using the standard eval() to evaluate the
> result. This blocks the attacks mentioned above, and is simple to
> implement. This will not work if you want to allow string values in
> expressions though.
>
> import re
> def safe_eval( expr, safe_cmds=[] ):
>     toks = re.split( r'([a-zA-Z_\.]+|.)', expr )
>     bad = [t for t in toks if len(t)>1 and t not in safe_cmds]
>     if not bad:
>         return eval( expr )
>

Yes, this appears to be about as good (better?) an idea as any.
Certainly beats writing my own recursive descent parser for this :)

And it is not dependent on python versions. Cool.

I've run a few tests with your code and it appears to work just fine.
Just a matter of populating the safe_cmds[] array and putting in some
error traps. Piece of cake. And should be fast as well.

Thanks!!!

Bob.

Simon Forman

Jun 19, 2008, 7:10:09 PM

On Jun 16, 8:32 pm, bvdp <b...@mellowood.ca> wrote:

FWIW, I got around to implementing a function that checks if a string
is safe to evaluate (that it consists only of numbers, operators, and
"(" and ")"). Here it is. :)

import cStringIO, tokenize

def evalSafe(source):
    '''
    Return True if a source string is composed only of numbers, operators
    or parentheses, otherwise return False.
    '''
    try:
        src = cStringIO.StringIO(source).readline
        src = tokenize.generate_tokens(src)
        src = (token for token in src if token[0] is not tokenize.NL)

        for token in src:
            ttype, tstr = token[:2]

            if (
                tstr in "()" or
                ttype in (tokenize.NUMBER, tokenize.OP)
                and not tstr == ',' # comma is an OP.
                ):
                continue
            raise SyntaxError("unsafe token: %r" % tstr)

    except (tokenize.TokenError, SyntaxError):
        return False

    return True

for s in (

    '(1 2)', # Works, but isn't math..

    '1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 ))', # Works

    '1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 )',
    # Raises TokenError due to missing close parenthesis.

    '(1, 2)', # Raises SyntaxError due to comma.

    'a * 21', # Raises SyntaxError due to identifier.

    'import sys', # Raises SyntaxError.

    ):
    print evalSafe(s), '<--', repr(s)
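
On Python 3, `cStringIO` is gone and the tokenizer always emits NEWLINE and ENDMARKER tokens, so a port of the check above needs small changes. The sketch below is one possible adaptation, not Simon's code: the name `tokens_are_safe` and the explicit `ALLOWED_OPS` whitelist (used instead of "every operator except the comma") are assumptions made for illustration.

```python
import io
import tokenize

# Assumed whitelist of operator strings; everything else is refused.
ALLOWED_OPS = {"+", "-", "*", "/", "//", "%", "**", "(", ")"}
# Structural tokens that carry no content of their own.
IGNORED = {tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER}


def tokens_are_safe(source):
    """Return True if source holds only numbers and whitelisted operators."""
    try:
        readline = io.StringIO(source).readline
        for tok in tokenize.generate_tokens(readline):
            if tok.type in IGNORED:
                continue
            if tok.type == tokenize.NUMBER:
                continue
            if tok.type == tokenize.OP and tok.string in ALLOWED_OPS:
                continue
            return False  # identifier, string, comma, dot, ...
    except (tokenize.TokenError, SyntaxError):
        return False  # e.g. an unclosed parenthesis
    return True


for s in ("(1 2)", "1001 * 99 / (73.8 ** 88 % (88 + 23e-10))",
          "(1, 2)", "a * 21", "import sys"):
    print(tokens_are_safe(s), "<--", repr(s))
```

As in the original, this only checks tokens, not grammar, so `(1 2)` passes the check and then fails with a SyntaxError when evaluated.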

Aahz

Jun 19, 2008, 11:54:31 PM

In article <f407c296-9e02-47a6...@u12g2000prd.googlegroups.com>,

Simon Forman <sajm...@gmail.com> wrote:
>
>FWIW, I got around to implementing a function that checks if a string
>is safe to evaluate (that it consists only of numbers, operators, and
>"(" and ")"). Here it is. :)

What's safe about "10000000 ** 10000000"?
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha

bvdp

Jun 20, 2008, 1:00:46 PM

Aahz wrote:
> In article <f407c296-9e02-47a6...@u12g2000prd.googlegroups.com>,
> Simon Forman <sajm...@gmail.com> wrote:
>> FWIW, I got around to implementing a function that checks if a string
>> is safe to evaluate (that it consists only of numbers, operators, and
>> "(" and ")"). Here it is. :)
>
> What's safe about "10000000 ** 10000000"?

Guess it depends on your definition of safe. I think that in most cases
folks looking for "safe" are concerned about a malicious interjection of
a command like "rm *" ... your example hangs the system for a long time
and eventually will error out when it runs out of memory, but (probably)
doesn't cause data corruption.

It would be nice if in a future version of Python we could have a
safe/limited eval() ... which would limit the resources.
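
Short of interpreter support, an evaluator that walks the syntax tree itself can at least bound the one operator that explodes fastest. The sketch below shows a guarded power function that such an evaluator could call for `**`; the limits `MAX_BASE` and `MAX_EXPONENT` and the name `bounded_pow` are arbitrary assumptions, and this does nothing about other costly inputs such as very long chains of multiplications.

```python
MAX_EXPONENT = 1000     # assumed limit, tune for the application
MAX_BASE = 10 ** 6      # assumed limit, tune for the application


def bounded_pow(base, exponent):
    """Like base ** exponent, but refuse operands that could blow up."""
    if abs(base) > MAX_BASE or abs(exponent) > MAX_EXPONENT:
        raise ValueError("operands too large for **")
    return base ** exponent


print(bounded_pow(2, 10))
try:
    bounded_pow(10000000, 10000000)
except ValueError as exc:
    print("rejected:", exc)
```

The check runs before any arithmetic happens, so Aahz's example is turned away immediately instead of consuming memory.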