
easy eval() fix?


Geoff Gerrietts

Oct 15, 2003, 4:16:57 PM
to Python List
On one of the projects I've worked with, early development featured a
pattern where primitive data would be repr()'ed into a string, then
eval()'ed back out of the string. Later in the project's evolution,
this was seen to have some previously unconsidered security
implications.

The hard way to fix this is to go back and change all the places where
data were repr()'ed into a string, and use some simpler system. But at
its easiest, that's a great deal of work.

I'm hoping that there's some module or project out there that I
haven't seen that will translate simple primitives into python objects
without doing variable interpolation or expression evaluation, etc.

I know it's a pretty steep order, but I also know it's something I've
seen before, in the form of the plist libraries under Objective C.

Anyone got a quick fix?

Thanks,
--G.

--
Geoff Gerrietts <geoff at gerrietts dot net>
"Ordinarily he was insane, but he had lucid moments
when he was merely stupid." --Heinrich Heine

Alex Martelli

Oct 15, 2003, 6:33:36 PM
Geoff Gerrietts wrote:
...

> I'm hoping that there's some module or project out there that I
> haven't seen that will translate simple primitives into python objects
> without doing variable interpolation or expression evaluation, etc.

Given that "simple primitives" ARE "python objects", I'm not too
clear on what you're asking for. Perhaps marshal.dumps (and
marshal.loads to recover the objects)? Pickle, though, does evaluate
expressions when reloading from a string (you can't generate an
instance of a generic class without "evaluating an expression").
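Alex's marshal suggestion, sketched for present-day Python 3 (the thread predates it). marshal handles only core built-in types, its format can change between Python versions, and the Python documentation warns that it is not intended to be secure against erroneous or maliciously constructed data, so this avoids eval() but is no fix for untrusted input.

```python
import marshal

# Round-trip plain data through marshal instead of repr()/eval().
data = {"a": "b", "c": "d", "nums": [1, 2, 3, 4]}
blob = marshal.dumps(data)      # a bytes object, not Python source text
restored = marshal.loads(blob)  # rebuilds the objects; no name lookup or expression evaluation
print(restored == data)
```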


Alex

John Roth

Oct 15, 2003, 7:00:39 PM

"Geoff Gerrietts" <ge...@gerrietts.net> wrote in message
news:mailman.122.1066249...@python.org...

> On one of the projects I've worked with, early development featured a
> pattern where primitive data would be repr()'ed into a string, then
> eval()'ed back out of the string. Later in the project's evolution,
> this was seen to have some previously unconsidered security
> implications.
>
> The hard way to fix this is to go back and change all the places where
> data were repr()'ed into a string, and use some simpler system. But at
> its easiest, that's a great deal of work.
>
> I'm hoping that there's some module or project out there that I
> haven't seen that will translate simple primitives into python objects
> without doing variable interpolation or expression evaluation, etc.
>
> I know it's a pretty steep order, but I also know it's something I've
> seen before, in the form of the plist libraries under Objective C.
>
> Anyone got a quick fix?

I don't know of a module that does this, but I'm not altogether
certain it wouldn't be possible to put something together that
would suit what you need in around the same time it took to
write the message.

What are the primitive types you need to convert from repr()
string format back to their object format?

John Roth
>
> Thanks,
> --G.
>
> --


Geoff Gerrietts

Oct 15, 2003, 8:53:15 PM
to John Roth, pytho...@python.org
Quoting John Roth (newsg...@jhrothjr.com):
>
> I don't know of a module that does this, but I'm not altogether
> certain it wouldn't be possible to put something together that would
> suit what you need in around the same time it took to write the
> message.

You might be surprised how quickly I type. ;)

> What are the primitive types you need to convert from repr() string
> format back to their object format?

Literal statements.

A list of integers:
[1, 2, 3, 4]
A list of strings:
['1', '2', '3', '4']
A string/string dict:
{'a': 'b', 'c': 'd'}

Imagine the variations; they are plentiful.

On the other hand, anything that actually performs "operations" is not
permissible.

As an example of an error case:
[10 ** (10 *** 10)]

This should not, for instance, choke the box for a day evaluating the
expression; it should (probably) throw an exception but any scenario
that does not allow the code to chew CPU time is a win over eval().

Also, eval and exec do all their work inside a namespace where names
get resolved to bound objects etcetera. That's not desirable. Nor is
it desirable to permit an object to be called.

What I'm interested in -- what eval seems most used for, certainly in
this project -- is a general-purpose tool for transforming a string
containing a literal statement into the Python data structure.
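Later Python versions (2.6 and up, so not the 2.1.3 Geoff mentions further down) ship a standard-library function aimed at exactly this: ast.literal_eval, which parses the string and accepts only literal displays such as strings, numbers, tuples, lists, dicts, sets, booleans and None. A minimal Python 3 sketch using the three examples above; the rejected inputs are illustrative, not from the original project. Its documentation notes that it can still raise MemoryError or RecursionError on malformed input, so it does not replace OS-level resource limits.

```python
import ast

# Geoff's three example inputs: each is parsed, never executed.
for text in ["[1, 2, 3, 4]", "['1', '2', '3', '4']", "{'a': 'b', 'c': 'd'}"]:
    print(repr(ast.literal_eval(text)))

# Operations, names and calls are refused rather than evaluated,
# and the malformed "***" example fails at the parsing stage.
for text in ["10 ** 10", "__import__('os')", "x", "[10 ** (10 *** 10)]"]:
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        print("rejected %r (%s)" % (text, type(exc).__name__))
```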

I toyed with using the parser module to do this. I still may try to do
that, but I don't know enough about AST parse trees to understand why
so many apparently unrelated symbols show up in the parse tree, and so
I'm reluctant to start down this road without an ample budget of time
to come to an understanding of such things.

I don't have that ample budget of time in my project schedule, so I
thought I would check to see if there was a quick fix available.

Thanks,
--G.

--
Geoff Gerrietts <geoff at gerrietts dot net> http://www.gerrietts.net/
"Politics, as a practice, whatever its professions, has always been the
systematic organization of hatreds." --Henry Adams

John Roth

Oct 15, 2003, 9:44:28 PM

"Geoff Gerrietts" <ge...@gerrietts.net> wrote in message
news:mailman.130.1066265...@python.org...

Are the strings allowed to contain commas? Are the structures
allowed to contain embedded structures? If neither of those is
true, it's relatively easy to crack the input and build a result.
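For John's easy case (no commas, colons or quotes inside strings, nothing nested), splitting is enough. A minimal Python 3 sketch; parse_flat is a hypothetical name, and items are assumed to be quoted strings without escapes or int/float numbers. int() and float() only convert text, so nothing is evaluated.

```python
def parse_flat(text):
    """Parse a flat list, or a flat dict, of strings and numbers.

    Assumes no string contains a comma, colon, quote or backslash
    escape, and that nothing is nested.
    """
    text = text.strip()
    if len(text) < 2 or text[0] + text[-1] not in ("[]", "{}"):
        raise ValueError("expected a flat list or dict: %r" % text)
    body = text[1:-1].strip()
    items = [item.strip() for item in body.split(",")] if body else []

    def atom(tok):
        # A quoted string: drop the quotes (no escape handling).
        if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
            return tok[1:-1]
        # Anything else must be a number.
        try:
            return int(tok)
        except ValueError:
            return float(tok)  # raises ValueError for non-numbers

    if text[0] == "[":
        return [atom(item) for item in items]
    result = {}
    for item in items:
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("missing ':' in %r" % item)
        result[atom(key.strip())] = atom(value.strip())
    return result
```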

If you have to handle strings with embedded commas (or colons)
and also recursive structures, you'll need a finite state machine
(plus a stack to keep track of the nesting).
It's not a particularly difficult one to handle since there are only
8 symbols ([, ], {, }, ,, :, ', ") that have to be handled. Everything
else is just a literal symbol that's either the string that comes out
of the FSM, or it can be fed into int() or float().
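The harder case, with embedded commas and nesting, can be sketched as a regular-expression tokenizer feeding a small recursive-descent parser; the recursion plays the role of the stack the state machine needs for nesting. This is one possible Python 3 implementation, not code from the thread: all names are hypothetical, only lists, dicts, strings, numbers, None, True and False are handled, only a few backslash escapes are decoded, and tuples are left out for brevity. Very deep nesting will hit Python's recursion limit.

```python
import re

# One token per match; the leading \s* skips whitespace between tokens.
TOKEN_RE = re.compile(r"""\s*(?:
      (?P<punct>[\[\]{},:])
    | (?P<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")
    | (?P<number>-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)
    | (?P<word>(?:None|True|False)\b)
    )""", re.VERBOSE)
ESCAPES = {"\\": "\\", "'": "'", '"': '"', "n": "\n", "t": "\t", "r": "\r"}
CONSTANTS = {"None": None, "True": True, "False": False}
CLOSERS = {"[": "]", "{": "}"}


def tokenize_literal(text):
    """Split text into (kind, text) tokens, rejecting anything unknown."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at offset %d" % pos)
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def _unquote(token):
    """Strip the quotes and decode the few escapes listed in ESCAPES."""
    out, chars = [], iter(token[1:-1])
    for ch in chars:
        if ch == "\\":
            esc = next(chars)  # the regex guarantees a character follows
            if esc not in ESCAPES:
                raise ValueError("unsupported escape \\%s" % esc)
            out.append(ESCAPES[esc])
        else:
            out.append(ch)
    return "".join(out)


def _parse_value(tokens, pos):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "string":
        return _unquote(text), pos + 1
    if kind == "number":
        is_float = any(c in text for c in ".eE")
        return (float(text) if is_float else int(text)), pos + 1
    if kind == "word":
        return CONSTANTS[text], pos + 1
    if text in CLOSERS:  # an opening '[' or '{'
        return _parse_container(tokens, pos + 1, text)
    raise ValueError("unexpected %r" % text)


def _parse_container(tokens, pos, opener):
    """Parse list items or dict pairs up to the matching closer."""
    closer, items = CLOSERS[opener], []
    while True:
        if pos < len(tokens) and tokens[pos] == ("punct", closer):
            pos += 1
            break
        if opener == "{":
            key, pos = _parse_value(tokens, pos)
            if pos >= len(tokens) or tokens[pos] != ("punct", ":"):
                raise ValueError("expected ':'")
            value, pos = _parse_value(tokens, pos + 1)
            items.append((key, value))
        else:
            item, pos = _parse_value(tokens, pos)
            items.append(item)
        if pos < len(tokens) and tokens[pos] == ("punct", ","):
            pos += 1
        elif pos >= len(tokens) or tokens[pos] != ("punct", closer):
            raise ValueError("expected ',' or %r" % closer)
    return (dict(items) if opener == "{" else items), pos


def parse_literal(text):
    """Turn a repr()-style literal into data without evaluating anything."""
    tokens = tokenize_literal(text)
    value, pos = _parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing input after the literal")
    return value
```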

It might even be easier to verify that the input string contains
only those special characters plus string and number literals, and
then feed it into eval the way you're doing now. That would address
the security concern by verifying that the input can't cause any
harm.
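John's "verify, then eval" variant, sketched with the standard tokenize module in Python 3. screened_eval and the whitelists are hypothetical: only brackets, braces, parentheses, commas, colons and minus signs, number and string literals, and the names None, True and False get through. F-strings are refused because they can run code, and eval() receives an empty __builtins__ as a second line of defense. A screen like this is only as safe as its whitelist, so treat it as a starting point rather than a vetted sandbox.

```python
import io
import tokenize

# Hypothetical whitelists; '-' is there for negative numbers like -1.
ALLOWED_OPS = {"[", "]", "{", "}", "(", ")", ",", ":", "-"}
ALLOWED_NAMES = {"None", "True", "False"}
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def screened_eval(text):
    """eval() text only if every token is on the whitelist."""
    text = text.strip()
    try:
        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
            if tok.type == tokenize.OP and tok.string in ALLOWED_OPS:
                continue
            if tok.type == tokenize.NAME and tok.string in ALLOWED_NAMES:
                continue
            if tok.type == tokenize.NUMBER or tok.type in LAYOUT:
                continue
            if tok.type == tokenize.STRING:
                # Prefix letters before the opening quote, e.g. "rb" or "f".
                prefix = tok.string[: len(tok.string) - len(tok.string.lstrip("rRbBuUfF"))]
                if "f" not in prefix.lower():  # f-strings would run code
                    continue
            raise ValueError("disallowed token %r" % tok.string)
    except tokenize.TokenError as exc:  # e.g. unbalanced brackets
        raise ValueError(str(exc)) from exc
    # Second line of defense: no builtins visible to the expression.
    return eval(text, {"__builtins__": {}}, {})
```

Because "(" and ")" are allowed, an input such as [1](2) passes the screen and then fails inside eval() with an ordinary TypeError rather than doing anything harmful.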

John Roth

>
> Thanks,
> --G.


Bengt Richter

Oct 17, 2003, 5:17:24 AM
On Wed, 15 Oct 2003 17:53:15 -0700, Geoff Gerrietts <ge...@gerrietts.net> wrote:

>Quoting John Roth (newsg...@jhrothjr.com):
>>
>> I don't know of a module that does this, but I'm not altogether
>> certain it wouldn't be possible to put something together that would
>> suit what you need in around the same time it took to write the
>> message.
>
>You might be surprised how quickly I type. ;)
>
>> What are the primitive types you need to convert from repr() string
>> format back to their object format?
>
>Literal statements.
>
>A list of integers:
> [1, 2, 3, 4]
>A list of strings:
> ['1', '2', '3', '4']
>A string/string dict:
> {'a': 'b', 'c': 'd'}
>
>Imagine the variations; they are plentiful.
>

Maybe looking into what the compiler produces? E.g.,

>>> import compiler
>>> compiler.transformer.parse('[1, 2, 3, 4]')
Module(None, Stmt([Discard(List([Const(1), Const(2), Const(3), Const(4)]))]))
>>> compiler.transformer.parse("['1', '2', '3', '4']")
Module(None, Stmt([Discard(List([Const('1'), Const('2'), Const('3'), Const('4')]))]))
>>> compiler.transformer.parse("{'a': 'b', 'c': 'd'}")
Module(None, Stmt([Discard(Dict([(Const('a'), Const('b')), (Const('c'), Const('d'))]))]))

You could repr that and then eval the string in an environment with your own definitions
of all those capitalized names; see the example at the end.

>On the other hand, anything that actually performs "operations" is not
>permissible.
>
>As an example of an error case:
> [10 ** (10 *** 10)]
>

>>> print '%s'% compiler.transformer.parse('[10 ** (10 *** 10)]')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "D:\python23\lib\compiler\transformer.py", line 50, in parse
    return Transformer().parsesuite(buf)
  File "D:\python23\lib\compiler\transformer.py", line 120, in parsesuite
    return self.transform(parser.suite(text))
  File "<string>", line 1
    [10 ** (10 *** 10)]
                 ^
SyntaxError: invalid syntax

No problem catching syntax errors ;-)

>This should not, for instance, choke the box for a day evaluating the
>expression; it should (probably) throw an exception but any scenario
>that does not allow the code to chew CPU time is a win over eval().
>

Time and memory resource quotas are an OS job, I think.

>Also, eval and exec do all their work inside a namespace where names
>get resolved to bound objects etcetera. That's not desirable. Nor is
>it desirable to permit an object to be called.
>
>What I'm interested in -- what eval seems most used for, certainly in
>this project -- is a general-purpose tool for transforming a string
>containing a literal statement into the Python data structure.
>
>I toyed with using the parser module to do this. I still may try to do
>that, but I don't know enough about AST parse trees to understand why
>so many apparently unrelated symbols show up in the parse tree, and so
>I'm reluctant to start down this road without an ample budget of time
>to come to an understanding of such things.

I think taking the repr of the tree (which is what happens when you print the top node;
I presume the nodes repr themselves recursively down their subtrees) and evaluating
that as in the example below might work as a screener.

>
>I don't have that ample budget of time in my project schedule, so I
>thought I would check to see if there was a quick fix available.
>

Here is a first go at implementing the idea mentioned above.
Not tested very much. I don't know what you need to exclude from compilation.

You will note a start at customizing -- i.e., I allowed calls to names
in the ok_to_call list, in case you need that. It's also a start on ideas for how
to detect and allow particular things, or maybe disallow some things.

For some reason I disallowed keyword parameters in calls, and attribute access in general,
but you will need to think about each AST name and decide whether it plays a role
in code you want to accept for compilation.

The tuple return for ok names is sure suggestive of lisp/scheme ;-)
It doesn't seem like a big jump to do a translation of simple Python to simple
scheme, but that just fell out ;-)

====< cksrc.py >===================================================
# cksrc.py -- check Python source for syntax errors and "dangerous" stuff
# V .01a bokr 2003-10-17

# NOTE: USE AT YOUR OWN RISK -- NO WARRANTY! MAKE YOUR OWN EDITS!!
# This is an experimental hack to demonstrate a nascent idea, no more.

# The following symbols from Python 2.3 were retrieved by
#     [x for x in dir(compiler.symbols.ast) if x[:2].istitle()]
# and edited to comment out what to allow, the rest being "dangerous".
#
# Not everything has been tested, never mind thoroughly.
#
import sets
dangerous = sets.Set([
    # 'Add', 'And',
    'AssAttr', 'AssList', 'AssName', 'AssTuple',
    'Assert', 'Assign', 'AugAssign', 'Backquote',
    # 'Bitand', 'Bitor', 'Bitxor', 'Break',
    'CallFunc', 'Class',
    # 'Compare', 'Const', 'Continue', 'Dict', 'Discard',
    # 'Div', 'Ellipsis',
    'EmptyNode', # ??
    'Exec',
    # 'Expression', 'FloorDiv',
    'For', 'From', 'Function',
    'Getattr', 'Global', 'If', 'Import', 'Invert',
    'Keyword', # ??
    'Lambda',
    # 'LeftShift', 'List', 'ListComp', 'ListCompFor', 'ListCompIf',
    'ListType', 'Mod',
    # 'Module', 'Mul', 'Name', 'Node',
    # 'Not', 'Or',
    'Pass',
    # 'Power',
    'Print', 'Printnl',
    'Raise', 'Return',
    # 'RightShift', 'Slice', 'Sliceobj', 'Stmt',
    # 'Sub', 'Subscript',
    'TryExcept', 'TryFinally',
    # 'Tuple', 'TupleType',
    # 'UnaryAdd', 'UnarySub',
    'While', 'Yield',
])

## define a set of ok names to call
ok_to_call = sets.Set('bool int foo'.split()) # etc

import compiler
checkMethods = {}

class Error(Exception): pass


# Build an environment dict with functions defined for all the AST names
# above, returning an innocuous tuple for accepted names, and raising an
# Error exception for the names left un-commented in the "dangerous" set.
class NameChecker(object):
    def __init__(self, name, dangerous=True):
        self.name = name; self.dangerous = dangerous
    def __call__(self, *args):
        result = (self.name,) + args
        # allow calls to the specific functions named in ok_to_call
        if self.name == 'CallFunc' and args[0][0] == 'Name' and args[0][1] in ok_to_call:
            return result
        if self.dangerous: raise Error, '%r not allowed!' % (result,)
        return result

for name in [x for x in dir(compiler.symbols.ast) if x[:2].istitle()]:
    checkMethods[name] = NameChecker(name, name in dangerous)

def cksrc(src, verbose=False):
    """
    Check Python source for syntax errors or banned usage.
    Return True if "ok" (USE THIS AT YOUR OWN RISK!), False otherwise.
    If verbose, print the source, its compiler.transformer.parse AST
    representation, and the result of evaluating the text of that AST in
    an environment where the node names are functions returning tuples for
    "safe" nodes, and raising an Error exception if a "dangerous" node's
    name is called.
    """
    env = checkMethods.copy() #XXX maybe can eliminate copy
    if verbose: print '%r =>' % src
    try: ast_repr = repr(compiler.transformer.parse(src))
    except Exception, e:
        if verbose: print '%s: %s' % (e.__class__.__name__, e)
        return False # not ok
    else:
        if verbose: print '%r =>' % ast_repr
        try:
            v = eval(ast_repr, env)
            if verbose: print v
            return True # ok
        except Exception, e:
            if verbose: print '%s: %s' % (e.__class__, e)
            return False # not ok

if __name__ == '__main__':
    import sys
    usage = """
Usage: cksrc.py [-v] [- | -f file | -i]* | -- expression
    (quote expression elements as necessary to prevent shell interpretation)
    -v  for verbose output (default for interactive)
    -v- to turn off verbose
    -h  to print this message
    -   to read source from stdin (prompted if tty)
    -f file to read from file
    -i  to enter expressions interactively (empty line quits)
    -- expression to take rest of command line as source"""

    args = sys.argv[1:]
    verbose = False; vopt = ''
    if not args: raise SystemExit, usage
    while args:
        src = ''
        opt = args.pop(0)
        if opt == '-v': vopt = opt; verbose = True; continue
        elif opt == '-v-': vopt = opt; verbose = False; continue
        elif opt == '-h': print usage; continue
        elif opt == '-':
            if sys.stdin.isatty():   # was missing the call parentheses
                print 'Enter Python source and end with ^Z (Windows) or ^D (Unix)'
                verbose = True and vopt != '-v-'
            src = sys.stdin.read()
            print 'cksrc returned ok ==%s' % cksrc(src, verbose)
        elif opt == '-f':
            if not args: raise SystemExit, usage
            f = file(args.pop(0))
            src = f.read()
            f.close()
            print 'cksrc returned ok ==%s' % cksrc(src, verbose)
        elif opt == '-i':
            src = 'anything'; verbose = True and vopt != '-v-'
            print 'Enter expression (or just press Enter to quit):'
            while src:
                src = raw_input('Expr> ').rstrip()
                if src: print 'cksrc returned ok ==%s' % cksrc(src, verbose)
        elif opt == '--':
            verbose = True and vopt != '-v-'
            src = ' '.join(args); args = []
            print 'cksrc returned ok ==%s' % cksrc(src, verbose)
===================================================================

A few examples:

[ 1:45] C:\pywk\rexec>cksrc.py -i
Enter expression (or just press Enter to quit):
Expr> [1, 2, 3, 4]
'[1, 2, 3, 4]' =>
'Module(None, Stmt([Discard(List([Const(1), Const(2), Const(3), Const(4)]))]))' =>
('Module', None, ('Stmt', [('Discard', ('List', [('Const', 1), ('Const', 2), ('Const', 3), ('Const', 4)]))]))
cksrc returned ok ==True
Expr> ['1', '2', '3', '4']
"['1', '2', '3', '4']" =>
"Module(None, Stmt([Discard(List([Const('1'), Const('2'), Const('3'), Const('4')]))]))" =>
('Module', None, ('Stmt', [('Discard', ('List', [('Const', '1'), ('Const', '2'), ('Const', '3'), ('Const', '4')]))]))
cksrc returned ok ==True
Expr> {'a':'b', 'c':'d'}
"{'a':'b', 'c':'d'}" =>
"Module(None, Stmt([Discard(Dict([(Const('a'), Const('b')), (Const('c'), Const('d'))]))]))" =>
('Module', None, ('Stmt', [('Discard', ('Dict', [(('Const', 'a'), ('Const', 'b')), (('Const', 'c'), ('Const', 'd'))]))]))
cksrc returned ok ==True
Expr> a,b
'a,b' =>
"Module(None, Stmt([Discard(Tuple([Name('a'), Name('b')]))]))" =>
('Module', None, ('Stmt', [('Discard', ('Tuple', [('Name', 'a'), ('Name', 'b')]))]))
cksrc returned ok ==True
Expr> a b
'a b' =>
SyntaxError: unexpected EOF while parsing (line 1)
cksrc returned ok ==False
Expr> 1*3+3**4
'1*3+3**4' =>
'Module(None, Stmt([Discard(Add((Mul((Const(1), Const(3))), Power((Const(3), Const(4))))))]))' =>
('Module', None, ('Stmt', [('Discard', ('Add', (('Mul', (('Const', 1), ('Const', 3))), ('Power', (('Const', 3), ('Const', 4))))))]))
cksrc returned ok ==True
Expr> 1*2+3**4
'1*2+3**4' =>
'Module(None, Stmt([Discard(Add((Mul((Const(1), Const(2))), Power((Const(3), Const(4))))))]))' =>
('Module', None, ('Stmt', [('Discard', ('Add', (('Mul', (('Const', 1), ('Const', 2))), ('Power', (('Const', 3), ('Const', 4))))))]))
cksrc returned ok ==True
Expr>

Checking on being able to call foo:

Enter expression (or just press Enter to quit):
Expr> foo()
'foo()' =>
"Module(None, Stmt([Discard(CallFunc(Name('foo'), [], None, None))]))" =>
('Module', None, ('Stmt', [('Discard', ('CallFunc', ('Name', 'foo'), [], None, None))]))
cksrc returned ok ==True
Expr> bar(foo())
'bar(foo())' =>
"Module(None, Stmt([Discard(CallFunc(Name('bar'), [CallFunc(Name('foo'), [], None, None)], None, None))]))" =>
__main__.Error: ('CallFunc', ('Name', 'bar'), [('CallFunc', ('Name', 'foo'), [], None, None)], None, None) not allowed!
cksrc returned ok ==False

Note that the order of evaluation let it accept the foo() call that sets up the argument
value for the bar(foo()) call, but not the bar call itself. The other way around, it stops
on bar right away:

Expr> foo(bar())
'foo(bar())' =>
"Module(None, Stmt([Discard(CallFunc(Name('foo'), [CallFunc(Name('bar'), [], None, None)], None, None))]))" =>
__main__.Error: ('CallFunc', ('Name', 'bar'), [], None, None) not allowed!
cksrc returned ok ==False
Expr>
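The compiler package used above was removed in Python 3, where the ast module fills the same role. Rather than round-tripping the tree through repr() and eval(), ast.walk can visit every node and check its type directly. A minimal sketch assuming Python 3.8+ (where literals parse as ast.Constant); check_literal_source, ALLOWED_NODES and OK_TO_CALL are hypothetical names. Unlike cksrc.py it also refuses arithmetic, per Geoff's "no operations" requirement, and allows bare names only if they appear in OK_TO_CALL. Like cksrc(), it only answers ok or not ok.

```python
import ast

# Hypothetical whitelist in the spirit of the "dangerous" set in cksrc.py,
# but stricter: no arithmetic or other operators.
ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.List, ast.Tuple, ast.Dict, ast.Set,
    ast.UnaryOp, ast.USub, ast.UAdd, ast.Load,
)
OK_TO_CALL = {"bool", "int"}  # mirrors ok_to_call in cksrc.py


def check_literal_source(src):
    """Return True if src is one expression made only of allowed nodes."""
    try:
        tree = ast.parse(src, mode="eval")
    except (SyntaxError, ValueError, RecursionError, MemoryError):
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            # Only plain positional calls to whitelisted names.
            if not (isinstance(node.func, ast.Name)
                    and node.func.id in OK_TO_CALL
                    and not node.keywords):
                return False
        elif isinstance(node, ast.Name):
            if node.id not in OK_TO_CALL:
                return False
        elif not isinstance(node, ALLOWED_NODES):
            return False
    return True
```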

HTH. Fun stuff, anyway ;-)

Regards,
Bengt Richter

Geoff Gerrietts

Oct 17, 2003, 12:20:36 PM
to Python List, Bengt Richter
Quoting Bengt Richter (bo...@oz.net):
> Maybe looking into what the compiler produces? E.g.,

Wow! Much better, and I like your solution. Unfortunately, I went from
"stuck on 1.5.2" to "stuck on 2.1.3" so no compiler module. At least
it's an improvement.

I have this feeling I could somehow extract most of this information
from the ASTs produced by the parser, but it would be a challenge. I'm
leaning toward John Roth's "build a little state machine" particularly
since I think I can do that with some regular expressions, and if it
becomes necessary, do THAT in C.

Of course this is all a great deal of work to circumvent situations
where things weren't done "the right way" the first time around. I may
have a go at scoping out "the right way" first.
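One candidate for "the right way", assuming the values are limited to lists, string-keyed dicts, strings, numbers, booleans and None: store them in a data-only format such as JSON instead of repr(). The json module (Python 2.6+) decodes without evaluating anything. A minimal Python 3 sketch:

```python
import json

records = [[1, 2, 3, 4], ["1", "2", "3", "4"], {"a": "b", "c": "d"}]
for value in records:
    text = json.dumps(value)          # replaces repr(value)
    assert json.loads(text) == value  # replaces eval(text)
    print(text)

# Code-like input is a parse error, not something to run.
try:
    json.loads("__import__('os')")
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("rejected:", exc)

# Tuples come back as lists, and non-string dict keys become strings.
print(json.loads(json.dumps({1: (2, 3)})))  # {'1': [2, 3]}
```

Because of those last two conversions, this is a format change rather than a drop-in replacement for repr()/eval().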

Thanks everyone for your help!

--G.

--
Geoff Gerrietts "Whenever people agree with me I always
<geoff at gerrietts net> feel I must be wrong." --Oscar Wilde
